# Title: Deep Learning Model for Cardiovascular Disease Prediction
### Author: Iuri Lima

Description:
This script builds a deep learning model using TensorFlow to predict the presence or absence of cardiovascular disease based on various medical and demographic features.

The dataset used in this example contains the following features:
- Age (in days)
- Height (in cm)
- Weight (in kg)
- Gender (categorical code: 1 = women, 2 = men)
- Systolic blood pressure (ap_hi)
- Diastolic blood pressure (ap_lo)
- Cholesterol level (1 = normal, 2 = above normal, 3 = well above normal)
- Glucose level (1 = normal, 2 = above normal, 3 = well above normal)
- Smoking status (0 = non-smoker, 1 = smoker)
- Alcohol intake (0 = no, 1 = yes)
- Physical activity (0 = inactive, 1 = active)

The target variable is the presence or absence of cardiovascular disease (0 = absent, 1 = present).
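Because age is stored in days, it can be easier to read in years. The following is a minimal sketch, assuming an average year length of 365.25 days; the helper name `age_days_to_years` and the sample value are hypothetical and not part of the original script.

```python
DAYS_PER_YEAR = 365.25  # assumption: average year length, including leap years


def age_days_to_years(age_days):
    """Convert an age in days (as stored in the dataset) to years."""
    return age_days / DAYS_PER_YEAR


# Hypothetical sample value: 18,393 days is a little over 50 years.
print(round(age_days_to_years(18393), 1))
```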

The script loads the data with pandas and standardizes the features using the StandardScaler class from the scikit-learn library. It then splits the data into training (80%) and testing (20%) sets using the train_test_split function from scikit-learn.
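Standardization rescales each feature to zero mean and unit variance using z = (x - mean) / std. The script fits the scaler on the full dataset before splitting. A common alternative is to compute the statistics on the training portion only and reuse them for the testing portion. The sketch below illustrates that alternative in plain Python; the function names and the weight values are hypothetical.

```python
from statistics import fmean, pstdev


def fit_standardizer(column):
    """Return the mean and population standard deviation of one feature column."""
    return fmean(column), pstdev(column)


def standardize(column, mean, std):
    """Apply z = (x - mean) / std to every value in the column."""
    return [(x - mean) / std for x in column]


# Hypothetical weights in kg, already split into training and testing portions.
train_weights = [60.0, 70.0, 80.0, 90.0]
test_weights = [75.0, 100.0]

# The statistics come from the training data only.
mean, std = fit_standardizer(train_weights)
train_scaled = standardize(train_weights, mean, std)
# The testing data reuses the training statistics.
test_scaled = standardize(test_weights, mean, std)
```

Fitting on the training portion only keeps information about the testing set out of the preprocessing step.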

The deep learning model is built using the Keras API from TensorFlow. It is a fully connected neural network with two hidden layers of 64 and 32 units, both with ReLU activations, and a single-unit output layer with a sigmoid activation function. The model is compiled with the binary cross-entropy loss function and the Adam optimizer, and trained for 50 epochs with a batch size of 32, using the testing set as validation data.
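To get a feel for the size of this network, the number of trainable parameters can be counted by hand: each fully connected layer has one weight per input-unit pair plus one bias per unit. The sketch below assumes 12 input columns (the 11 listed features plus an identifier column in the CSV); that input width is an assumption, and `dense_param_count` is a hypothetical helper.

```python
def dense_param_count(n_inputs, layer_units):
    """Count weights and biases for a stack of fully connected layers."""
    total = 0
    for units in layer_units:
        total += n_inputs * units + units  # weight matrix plus one bias per unit
        n_inputs = units                   # this layer's output feeds the next layer
    return total


# Assumption: 12 input columns; layers of 64, 32, and 1 units as described above.
print(dense_param_count(12, [64, 32, 1]))  # 832 + 2080 + 33 = 2945
```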

Finally, the script evaluates the model's performance on the training and testing sets using the evaluate method from Keras.
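With this configuration, evaluation reports two numbers: the binary cross-entropy loss and the accuracy. The plain-Python sketch below shows how such values can be computed from sigmoid outputs. It is an illustration, not the Keras implementation; the 0.5 threshold, the clipping constant, and the sample data are assumptions.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum(int(p > threshold) == y for y, p in zip(y_true, y_prob))
    return hits / len(y_true)


# Hypothetical labels and sigmoid outputs, for illustration only.
labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.4, 0.6]
loss = binary_cross_entropy(labels, probs)
acc = binary_accuracy(labels, probs)
print(f"loss={loss:.4f}, accuracy={acc:.2f}")
```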

Dependencies:
- pandas (v1.1.5 or higher)
- scikit-learn (v0.24.2 or higher)
- tensorflow (v2.5.0 or higher)

Usage:
1. Install the required dependencies using pip or conda:
   - pip install pandas scikit-learn tensorflow
   - conda install pandas scikit-learn tensorflow

2. Download the Cardiovascular Disease dataset from Kaggle:
   - https://www.kaggle.com/sulianova/cardiovascular-disease-dataset
   - Save the dataset as "cardio_train.csv" in "/content/sample_data/", the path the script reads, or update the path passed to pd.read_csv.

3. Run the script in a Python environment:
   - python deep_learning_model.py

Output:
The script outputs the training and testing accuracy of the deep learning model on the Cardiovascular Disease dataset.



# Step 1: Import necessary libraries
import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

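# Step 2: Load and preprocess data
# The CSV file is semicolon-delimited, hence sep=';'.
df = pd.read_csv('/content/sample_data/cardio_train.csv', sep=';')
# Every column except the last is used as a feature (this keeps any
# identifier column present in the file); the last column is the target.
X = df.iloc[:, :-1].values
y = df.iloc[:, -1].values
# Standardize each feature to zero mean and unit variance.
# Note: the scaler is fitted on the full dataset, before the train/test split.
scaler = StandardScaler()
X = scaler.fit_transform(X)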

# Step 3: Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 4: Build and train a deep learning model using TensorFlow
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(X_train.shape[1],)),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=50, batch_size=32, validation_data=(X_test, y_test))


# Step 5: Evaluate model performance on training and testing sets
train_loss, train_acc = model.evaluate(X_train, y_train, verbose=0)
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
print(f'Training accuracy: {train_acc:.4f}')
print(f'Testing accuracy: {test_acc:.4f}')


Epoch 1/50
...
Epoch 50/50
Training accuracy: 0.7414
Testing accuracy: 0.7379


The model reaches a training accuracy of 0.7414 and a testing accuracy of 0.7379 on the Cardiovascular Disease dataset. Training accuracy measures how well the model fits the data it was trained on, while testing accuracy measures its performance on data that was held out from training.

The training and testing accuracies differ by less than half a percentage point, which suggests that the model has not overfit: it has learned general patterns rather than memorizing the training data. By contrast, a training accuracy well above the testing accuracy would indicate overfitting and poor generalization to new data.
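The comparison can be made explicit by computing the gap between the two accuracies. In the sketch below, the function names are hypothetical and the 0.05 tolerance is an arbitrary assumption chosen only for illustration; there is no universal cutoff for overfitting.

```python
def generalization_gap(train_acc, test_acc):
    """Difference between training and testing accuracy."""
    return train_acc - test_acc


def looks_overfit(train_acc, test_acc, tolerance=0.05):
    """Flag a gap larger than the (arbitrary) tolerance."""
    return generalization_gap(train_acc, test_acc) > tolerance


# Accuracies reported by the script above.
gap = generalization_gap(0.7414, 0.7379)
print(f"Gap: {gap:.4f}")
print(f"Looks overfit: {looks_overfit(0.7414, 0.7379)}")
```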

Overall, a testing accuracy of 0.7379 means the model correctly classifies roughly 74% of held-out records as having or not having cardiovascular disease, based on the features provided in the dataset. Further analysis and fine-tuning would be needed to improve its accuracy and its applicability to real-world scenarios.
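One possible direction for further analysis is to look beyond accuracy at how the errors are distributed, since a missed case and a false alarm are counted the same way by accuracy. The sketch below computes precision and recall from hard 0/1 predictions in plain Python; the function names and the sample labels are hypothetical, not results from the model above.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels and binary predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def precision_recall(y_true, y_pred):
    """Precision = tp / (tp + fp); recall = tp / (tp + fn)."""
    tp, _, fp, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels and predictions, for illustration only.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```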