<a href="https://colab.research.google.com/github/prodramp/publiccode/blob/master/machine_learning/comet_ml/UCI_Heart_Disease_Keras_Cometml_ModelReg.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Original Data Source Link:
https://archive.ics.uci.edu/ml/datasets/Heart+Disease

CSV Formatted Dataset:
https://www.kaggle.com/ronitf/heart-disease-uci?select=heart.csv

The dataset uses 14 key attributes out of the original 76. They are listed below with their descriptions:

- age: The person’s age in years
- sex: The person’s sex (1 = male, 0 = female)
- cp: The chest pain experienced (Value 1: typical angina, Value 2: atypical angina, Value 3: non-anginal pain, Value 4: asymptomatic)
- trestbps: The person’s resting blood pressure
- chol: The person’s cholesterol measurement in mg/dl
- fbs: The person’s fasting blood sugar (> 120 mg/dl, 1 = true; 0 = false)
- restecg: Resting electrocardiographic measurement (0 = normal, 1 = having ST-T wave abnormality, 2 = showing probable or definite left ventricular hypertrophy by Estes’ criteria)
- thalach: The person’s maximum heart rate achieved
- exang: Exercise induced angina (1 = yes; 0 = no)
- oldpeak: ST depression induced by exercise relative to rest (‘ST’ relates to positions on the ECG plot)
- slope: the slope of the peak exercise ST segment (Value 1: upsloping, Value 2: flat, Value 3: downsloping)
- ca: The number of major vessels (0–3)
- thal: A blood disorder called thalassemia (3 = normal; 6 = fixed defect; 7 = reversible defect)
- target: Heart disease (0 = no, 1 = yes)
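
Because several of these columns are integer codes, it can help to map them to readable labels while exploring the data. The sketch below is illustrative only: the three-row frame is made-up sample data, and `SEX_LABELS` and `CP_LABELS` are hypothetical helper names built from the value descriptions above.

```python
import pandas as pd

# Hypothetical lookup tables, copied from the attribute descriptions above.
SEX_LABELS = {0: "female", 1: "male"}
CP_LABELS = {
    1: "typical angina",
    2: "atypical angina",
    3: "non-anginal pain",
    4: "asymptomatic",
}

# Made-up rows for illustration; not taken from heart.csv.
sample = pd.DataFrame({"sex": [1, 0, 1], "cp": [4, 2, 3]})

# Add label columns next to the original integer codes.
decoded = sample.assign(
    sex_label=sample["sex"].map(SEX_LABELS),
    cp_label=sample["cp"].map(CP_LABELS),
)
print(decoded)
```

If a CSV file uses a different coding than the one documented here, `map` returns NaN for the unknown codes, which makes the mismatch easy to spot.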

Note: Make sure you have downloaded the dataset from the Kaggle link first.

- Next, upload heart.csv from the local file system to the Google Colab server.
- You can also read the uploaded bytes directly with pandas:
  - df = pd.read_csv(io.BytesIO(uploaded['heart.csv']))
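
In Colab, `files.upload()` returns a dictionary that maps each file name to its raw bytes, which is why the bytes are wrapped in `io.BytesIO` before being handed to pandas. The sketch below imitates that dictionary with a hand-written `uploaded` value (an assumption for illustration, so the code also runs outside Colab); the two data rows and the `df_demo` name are made up.

```python
import io

import pandas as pd

# Stand-in for the dict returned by google.colab.files.upload():
# file name -> raw file bytes. The rows below are invented sample data.
uploaded = {
    "heart.csv": b"age,sex,target\n63,1,1\n45,0,0\n",
}

# Wrap the bytes in a file-like object so read_csv can parse them.
df_demo = pd.read_csv(io.BytesIO(uploaded["heart.csv"]))
print(df_demo.shape)
```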


In [None]:
!pip show comet_ml

In [None]:
!pip install comet_ml

In [None]:
from comet_ml import Experiment

In [None]:
experiment = Experiment(
    api_key="YOUR_COMET_ML_API_KEY_HERE",  # replace with your own key; never publish a real key
    project_name="keras_heart_disease", 
    workspace="avkash-prodramp-com",
    auto_metric_logging=True,
    auto_param_logging=True,
    log_graph=True,
    auto_metric_step_rate=True,
    parse_args=True,
    auto_histogram_weight_logging=True,
    auto_histogram_gradient_logging=True,
    auto_histogram_activation_logging=True,
    auto_histogram_epoch_rate=True,
)

In [None]:
from google.colab import files
uploaded = files.upload()

In [None]:
!ls -l

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

In [None]:
import io
df = pd.read_csv('heart.csv')

In [None]:
df

In [None]:
df.shape

In [None]:
df.columns

In [None]:
f = sns.countplot(x='target', data=df)
f.set_title("Heart disease distribution (Target distribution)")
f.set_xticklabels(['Heart disease - No', 'Heart disease - Yes'])  # ticks follow target values 0, 1
plt.xlabel("");

In [None]:
f = sns.countplot(x='target', data=df, hue='sex')
plt.legend(['Female', 'Male'])
f.set_title("Heart Disease by gender")
f.set_xticklabels(['Heart disease - No', 'Heart disease - Yes'])  # ticks follow target values 0, 1
plt.xlabel("");

In [None]:
plt.figure(figsize=(12, 10))  # set the figure size before drawing so it applies to this plot
heat_map = sns.heatmap(df.corr(method='pearson'), annot=True, fmt='.2f', linewidths=2)
heat_map.set_xticklabels(heat_map.get_xticklabels(), rotation=45);

Now we need to split the dataset into input features and the target.
- The target has only the target column
- The input data has all the columns except the target column
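
The same split can also be written by column name instead of by position, which keeps working if the column order changes. This is a minimal sketch on an invented three-row frame; `df_demo`, `X_demo`, and `y_demo` are hypothetical names, not variables from the notebook.

```python
import pandas as pd

# Tiny invented frame standing in for the heart-disease data.
df_demo = pd.DataFrame(
    {"age": [63, 45, 58], "chol": [233, 210, 284], "target": [1, 0, 1]}
)

# Inputs: every column except the label. Target: the label column only.
X_demo = df_demo.drop(columns=["target"])
y_demo = df_demo[["target"]]

print(X_demo.shape, y_demo.shape)
```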

In [None]:
df_input = pd.DataFrame(df.iloc[:, 0:13])

In [None]:
df_input.shape

In [None]:
df_input

In [None]:
df_target = pd.DataFrame(df.iloc[:, 13],columns=['target'])

In [None]:
df_target.shape

In [None]:
df_target

Standardize the input features so that all columns are on a comparable scale before they are fed to the neural network.

- StandardScaler removes the mean and scales the data to unit variance.

More info: 
- https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html
- https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html#sphx-glr-auto-examples-preprocessing-plot-all-scaling-py
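
To make the transformation concrete, the standardization can be reproduced by hand in NumPy: subtract each column's mean and divide by its standard deviation. The feature values below are invented, and the variable names are only for illustration.

```python
import numpy as np

# Invented feature matrix: 4 samples, 2 features on very different scales.
X = np.array([[29.0, 120.0], [45.0, 200.0], [61.0, 240.0], [77.0, 320.0]])

mean = X.mean(axis=0)
std = X.std(axis=0)  # population std (ddof=0), the estimator StandardScaler uses

# z-score: each column ends up with mean 0 and unit variance.
X_scaled = (X - mean) / std

print(X_scaled.mean(axis=0))  # approximately 0 for each column
print(X_scaled.std(axis=0))   # approximately 1 for each column
```

The notebook fits the scaler on the full dataset. A common alternative is to compute the mean and standard deviation on the training split only and reuse them for the validation and test data.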


In [None]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
df_scaled_input = scaler.fit_transform(df_input)

In [None]:
df_scaled_input


- Splitting the dataset into two sets:
  - training set
  - testing set
- Library used:
   - The scikit-learn function sklearn.model_selection.train_test_split() splits the source data
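
As an optional variation (not what the notebook does), `train_test_split` accepts a `stratify` argument that keeps the class proportions roughly equal across the splits. The sketch below uses invented random data and hypothetical variable names to show a 70/30 split followed by an 80/20 split of the training part.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Invented data: 100 samples, 13 features, balanced binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 13))
y = np.array([0, 1] * 50)

# First split: hold out 30% for testing, preserving the class balance.
X_train_full, X_test, y_train_full, y_test = train_test_split(
    X, y, test_size=0.30, random_state=5, stratify=y
)

# Second split: carve 20% of the remaining data out for validation.
X_tr, X_va, y_tr, y_va = train_test_split(
    X_train_full, y_train_full, test_size=0.20, random_state=1,
    stratify=y_train_full,
)

print(len(X_tr), len(X_va), len(X_test))
```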

In [None]:
from sklearn.model_selection import train_test_split
df_input_train, df_input_test, df_target_train, df_target_test = train_test_split(df_scaled_input, df_target, test_size = 0.30, random_state = 5)
print("Input data split (train & test)")
print(df_input_train.shape)
print(df_input_test.shape)

print("Target data Split (Train & Test)")
print(df_target_train.shape)
print(df_target_test.shape)

In [None]:
## Hold out part of the training data for validation
X_train, X_val, y_train, y_val = train_test_split(df_input_train, df_target_train, test_size=0.2, random_state=1)
print("Training split (inputs & target)")
print(X_train.shape)
print(y_train.shape)

print("Validation split (inputs & target)")
print(X_val.shape)
print(y_val.shape)

**Build the Keras classifier to predict the heart disease**

In [None]:
from keras.models import Sequential
from keras.layers import Dense

**Keras Settings:**

- In the first line, we set the model as Sequential.
- All Layers
  - Added 4 fully connected Dense layers: three hidden and one output.
  - All layers are from the Dense class.
- First Layer (Dense(30, input_dim=13, activation='tanh'))
  - The first layer has 30 neurons and an input dimension of 13, one for each of the 13 input columns.
  - The activation function is tanh.
- Second Layer (Dense(20, activation='tanh'))
  - It has 20 neurons and the tanh activation function.
- Third Layer (Dense(10, activation='tanh'))
  - It has 10 neurons and the tanh activation function.
- Output Layer (Dense(1, activation='sigmoid'))
  - The output layer has a single neuron (output).
  - The sigmoid activation function is selected for binary classification problems.
  - Our target data is binary: heart disease, yes or no.
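
As a quick sanity check on the architecture, the number of trainable parameters in a stack of Dense layers can be worked out by hand: each layer has one weight per input-unit pair plus one bias per unit. The sketch below does this for the layer widths used in the model cell (30, 20, 10, 1) with 13 inputs; `dense_params` is a hypothetical helper, not a Keras function.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_units + n_units


layer_units = [30, 20, 10, 1]  # widths of the Dense layers in the model cell
n_inputs = 13                  # number of input columns

total = 0
for units in layer_units:
    count = dense_params(n_inputs, units)
    print(f"Dense({units}): {count} parameters")
    total += count
    n_inputs = units  # this layer's output feeds the next layer

print("Total trainable parameters:", total)
```

These per-layer counts should agree with the Param # column printed by `model.summary()`.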


In [None]:
model = Sequential()
model.add(Dense(30, input_dim=13, activation='tanh'))
model.add(Dense(20, activation='tanh'))
model.add(Dense(10, activation='tanh'))
model.add(Dense(1, activation='sigmoid'))

**Compile and fit Step:**

The compile function takes 3 arguments:
- The adam optimizer: An algorithm for first-order gradient-based optimization.
- The binary_crossentropy loss function: the logarithmic loss, which Keras provides for binary classification problems as binary_crossentropy
- The accuracy metric: to evaluate the performance of the model during training and testing

The fit function is called with the following parameters:
- epochs = 100
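
To illustrate what the binary_crossentropy loss measures, here is a minimal NumPy sketch of the logarithmic loss. It is a hand-written illustration under the usual definition, not the Keras implementation; the function name, the `eps` clipping value, and the example numbers are assumptions.

```python
import numpy as np


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean log loss for binary labels and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities so that log(0) never occurs.
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    losses = y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob)
    return float(-np.mean(losses))


# Confident and correct predictions give a small loss ...
confident_and_right = binary_crossentropy([1, 0], [0.9, 0.1])
# ... while confident but wrong predictions are penalized heavily.
confident_and_wrong = binary_crossentropy([1, 0], [0.1, 0.9])
print(confident_and_right, confident_and_wrong)
```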

**Plotting Keras Model**
- https://keras.io/api/utils/model_plotting_utils/


In [None]:
from keras.utils import plot_model  # public import path, as documented at the link above

In [None]:
plot_model(model, to_file='model_plot.png', show_shapes=True, show_layer_names=True)

In [None]:
%matplotlib inline

import keras
from IPython.display import clear_output

In [None]:
# updatable live loss plot with Keras Model Training
# a minimal example (sort of)
# Code Source: https://gist.github.com/stared/dfb4dfaf6d9a8501cd1cc8b8cb806d2e

class PlotLosses(keras.callbacks.Callback):
    def on_train_begin(self, logs=None):
        self.i = 0
        self.x = []
        self.losses = []
        self.val_losses = []

        self.fig = plt.figure()

        self.logs = []

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}  # avoid a shared mutable default argument
        self.logs.append(logs)
        self.x.append(self.i)
        self.losses.append(logs.get('loss'))
        self.val_losses.append(logs.get('val_loss'))
        self.i += 1
        
        clear_output(wait=True)
        plt.plot(self.x, self.losses, label="loss")
        plt.plot(self.x, self.val_losses, label="val_loss")
        plt.legend()
        plt.show();
        
plot_losses = PlotLosses()

In [None]:
model.compile(optimizer='adam',loss='binary_crossentropy',metrics=['accuracy'])
#history = model.fit(df_input_train, df_target_train, epochs=100, verbose=1)

# Note: To print the live training graph, you need to add the callback function as below:
# history = model.fit(X_train, y_train, epochs=100, callbacks=[plot_losses], verbose=1, validation_data=(X_val, y_val))
history = model.fit(X_train, y_train, epochs=100, verbose=1, validation_data=(X_val, y_val))

In [None]:
# Live plotting of model training
# Code Source: https://gist.github.com/stared/dfb4dfaf6d9a8501cd1cc8b8cb806d2e

class PlotLearning(keras.callbacks.Callback):
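    def on_train_begin(self, logs=None):
        self.i = 0
        self.x = []
        self.losses = []
        self.val_losses = []
        self.acc = []
        self.val_acc = []
        self.fig = plt.figure()

        self.logs = []

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}  # avoid a shared mutable default argument
        self.logs.append(logs)
        self.x.append(self.i)
        self.losses.append(logs.get('loss'))
        self.val_losses.append(logs.get('val_loss'))
        # With metrics=['accuracy'] the log keys are 'accuracy' and 'val_accuracy',
        # matching the history keys read later in this notebook.
        self.acc.append(logs.get('accuracy'))
        self.val_acc.append(logs.get('val_accuracy'))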
        self.i += 1
        f, (ax1, ax2) = plt.subplots(1, 2, sharex=True)
        
        clear_output(wait=True)
        
        ax1.set_yscale('log')
        ax1.plot(self.x, self.losses, label="loss")
        ax1.plot(self.x, self.val_losses, label="val_loss")
        ax1.legend()
        
        ax2.plot(self.x, self.acc, label="accuracy")
        ax2.plot(self.x, self.val_acc, label="validation accuracy")
        ax2.legend()
        
        plt.show();
        
plot_learn_loss_with_acc = PlotLearning()

In [None]:
model.compile(optimizer='adam',loss='binary_crossentropy',metrics=['accuracy'])
#history = model.fit(df_input_train, df_target_train, epochs=100, verbose=1)

# Note: To print the live training graph, you need to add the callback function as below:
history = model.fit(X_train, y_train, epochs=100, callbacks=[plot_learn_loss_with_acc], verbose=1, validation_data=(X_val, y_val))

In [None]:
model.summary()
score = model.evaluate(X_train, y_train, verbose=0)
print('Training accuracy = ', score[1])  # evaluated on X_train, not on the test set

In [None]:
history

In [None]:
history.history.keys()

In [None]:
pd.DataFrame(history.history).plot(figsize=(8,5))
plt.show()

In [None]:
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']

In [None]:
epochs = range(1, len(acc) + 1)

plt.plot(epochs, acc, 'o', label='Training acc', color="green")  # marker only; color set by keyword
plt.plot(epochs, val_acc, '-', label='Validation acc', color="red")
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()

plt.figure()

plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, '-', label='Validation loss', color="orange")
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()

plt.show()

In [None]:
from sklearn.metrics import confusion_matrix

# Predicted probabilities on the held-out test set, thresholded at 0.5
Target_Classification = model.predict(df_input_test)
Target_Classification = (Target_Classification > 0.5)

print(confusion_matrix(df_target_test, Target_Classification))
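
To show how the confusion matrix relates to the usual summary metrics, the sketch below builds one by hand and derives accuracy, precision, and recall from it. The labels and probabilities are invented for eight hypothetical patients and are not model output.

```python
import numpy as np

# Invented labels and predicted probabilities for eight patients.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_prob = np.array([0.9, 0.2, 0.4, 0.8, 0.6, 0.1, 0.7, 0.3])

y_pred = (y_prob > 0.5).astype(int)  # same 0.5 cut-off as the cell above

tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # true negatives
fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # false positives
fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # false negatives
tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # true positives

# Same layout as sklearn.metrics.confusion_matrix:
# rows are true labels, columns are predicted labels.
matrix = np.array([[tn, fp], [fn, tp]])

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(matrix)
print(accuracy, precision, recall)
```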

In [None]:
import numpy as np
best_model_accuracy = history.history['accuracy'][np.argmin(history.history['loss'])]
print(best_model_accuracy)

In [None]:
model.save('model/keras-heart-disease.bin')

In [None]:
!ls

In [None]:
!ls model/

In [None]:
from tensorflow import keras

In [None]:
model_x = keras.models.load_model('model/keras-heart-disease.bin')

In [None]:
print(model_x.metrics[0])
print(model_x.metrics[1])

In [None]:
model_x.summary()
score_x = model_x.evaluate(X_train, y_train, verbose=0)
print('Training accuracy (reloaded model) = ', score_x[1])  # evaluated on X_train

In [None]:
!pwd

In [None]:
!ls /content/model

In [None]:
experiment.display()

In [None]:
experiment.log_model("Keras Heart Disease Model", "/content/model/keras-heart-disease.bin")

In [None]:
experiment.log_dataset_hash(X_train)

In [None]:
experiment.end()