## About the project

In this project, I use a dataset from [Kaggle](https://www.kaggle.com/datasets/andrewmvd/heart-failure-clinical-data) to predict whether patients with heart failure survive, based on serum creatinine, ejection fraction, and other factors such as age, anaemia, and diabetes.

This project is part of the Codecademy skill path Build Deep Learning Models with TensorFlow.

## Load the data

Import libraries

In [1]:
import pandas as pd
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.model_selection import train_test_split
from collections import Counter
from sklearn.compose import ColumnTransformer
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, InputLayer
from sklearn.metrics import classification_report
from tensorflow.keras.utils import to_categorical
import numpy as np

Load the data from `heart_failure_clinical_records_dataset.csv` into a pandas DataFrame and assign the resulting DataFrame to a variable called `data`.

In [None]:
data = pd.read_csv("heart_failure_clinical_records_dataset.csv")

Print the distribution of the `death_event` column in the `data` DataFrame using `collections.Counter`. This is the column the model will predict.

In [None]:
# Counter is already imported in the first cell.
print('Classes and number of values in the dataset', Counter(data['death_event']))
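The class counts matter because a skewed label distribution makes plain accuracy harder to interpret. As a minimal sketch, assuming a small hypothetical list of labels in place of the real column, the counts can be turned into class proportions like this:

```python
from collections import Counter

# Hypothetical labels standing in for the `death_event` column.
labels = ["no", "no", "yes", "no", "yes", "no"]

counts = Counter(labels)
total = sum(counts.values())

# Share of each class; a strong imbalance means accuracy alone can mislead.
proportions = {cls: n / total for cls, n in counts.items()}

print(counts)
print(proportions)
```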

Extract the label column `death_event` from the `data` DataFrame and assign the result to a variable called `y`.

In [None]:
y = data.iloc[:, -1]  # the label is the last column
# or: y = data["death_event"]

Extract the feature columns `['age','anaemia','creatinine_phosphokinase','diabetes','ejection_fraction','high_blood_pressure','platelets','serum_creatinine','serum_sodium','sex','smoking','time']` from the `data` DataFrame and assign the result to a variable called `x`.

In [None]:
x = data[['age','anaemia','creatinine_phosphokinase','diabetes','ejection_fraction','high_blood_pressure','platelets','serum_creatinine','serum_sodium','sex','smoking','time']]

## Data preprocessing

Use the `pandas.get_dummies()` function to convert the categorical features in the DataFrame instance `x` to one-hot encoding vectors and assign the result back to variable `x`.

In [None]:
x = pd.get_dummies(x)
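To illustrate what `pandas.get_dummies()` does, here is a minimal sketch on a hypothetical two-column frame (the values are made up, not taken from the dataset). String-valued columns are expanded into one indicator column per category, while numeric columns pass through unchanged:

```python
import pandas as pd

# Hypothetical frame: one numeric column and one string-valued categorical column.
toy = pd.DataFrame({"age": [60, 45, 70], "smoking": ["yes", "no", "yes"]})

encoded = pd.get_dummies(toy)

# The numeric column is kept as-is; "smoking" becomes two indicator columns.
print(list(encoded.columns))
print(encoded)
```

If every feature column is already numeric (for example 0/1 flags), the call returns the frame unchanged.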

Use the `sklearn.model_selection.train_test_split()` function to split the data into training features, test features, training labels, and test labels, in that order. Set the `test_size` parameter to the fraction of the data to hold out for testing, and pass a fixed value for `random_state` so the split is reproducible. Store the results in the `X_train`, `X_test`, `Y_train`, and `Y_test` variables.

In [None]:
X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size = 0.3, random_state = 0)
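The effect of `test_size` and `random_state` can be checked on a small example. The sketch below uses hypothetical arrays of 10 samples; with `test_size=0.3`, 3 samples go to the test set and 7 to the training set, and repeating the call with the same `random_state` gives the same split:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 10 samples with 2 features each, and binary labels.
features = np.arange(20).reshape(10, 2)
labels = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1])

f_train, f_test, l_train, l_test = train_test_split(
    features, labels, test_size=0.3, random_state=0
)

# Same random_state, same split.
again = train_test_split(features, labels, test_size=0.3, random_state=0)
same_split = np.array_equal(f_test, again[1])

print(f_train.shape, f_test.shape, same_split)
```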

Initialize a ColumnTransformer object by using StandardScaler to scale the numeric features in the dataset: `['age','creatinine_phosphokinase','ejection_fraction','platelets','serum_creatinine','serum_sodium','time']`. Assign the resulting object to a variable called `ct`

In [None]:
# StandardScaler (imported in the first cell) standardizes each listed column to zero mean and unit variance.
ct = ColumnTransformer([("numeric", StandardScaler(), ['age','creatinine_phosphokinase','ejection_fraction','platelets','serum_creatinine','serum_sodium','time'])])

Use the `ColumnTransformer.fit_transform()` function to train the scaler instance ct on the training data `X_train` and assign the result back to `X_train`.

In [None]:
X_train = ct.fit_transform(X_train)

Use the `ColumnTransformer.transform()` function to scale the test data `X_test` using the trained scaler `ct`, and assign the result back to `X_test`.

In [None]:
X_test = ct.transform(X_test)
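Two details of this step are easy to miss: the test set is scaled with statistics learned from the training set only, and `ColumnTransformer` drops any column not listed in a transformer unless `remainder="passthrough"` is set. The sketch below shows both on hypothetical frames (the column values are made up):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Hypothetical train/test frames with one numeric and one binary column.
train = pd.DataFrame({"age": [40.0, 50.0, 60.0], "smoking": [0, 1, 0]})
test = pd.DataFrame({"age": [50.0, 70.0], "smoking": [1, 1]})

# Default remainder="drop": only the listed columns survive the transform.
ct_drop = ColumnTransformer([("numeric", StandardScaler(), ["age"])])
# remainder="passthrough" keeps the unlisted columns, unscaled.
ct_keep = ColumnTransformer(
    [("numeric", StandardScaler(), ["age"])], remainder="passthrough"
)

train_drop = ct_drop.fit_transform(train)
train_keep = ct_keep.fit_transform(train)
# The test set reuses the mean and standard deviation learned from `train`.
test_keep = ct_keep.transform(test)

print(train_drop.shape, train_keep.shape)
print(test_keep)
```

With the default setting, the model in this notebook therefore sees only the seven scaled numeric columns. Whether to pass the binary columns through is a modelling choice.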

## Prepare labels for classification

Initialize an instance of `LabelEncoder` and assign it to a variable called `le`.

In [None]:
le = LabelEncoder()

Using the `LabelEncoder.fit_transform()` function, fit the encoder instance le to the training labels `Y_train`, while at the same time converting the training labels according to the trained encoder.

In [None]:
Y_train = le.fit_transform(Y_train.astype(str))

Using the `LabelEncoder.transform()` function, encode the test labels `Y_test` using the trained encoder `le`.

In [None]:
Y_test = le.transform(Y_test.astype(str))
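The encoder should learn its class-to-integer mapping from the training labels and then apply that same mapping to the test labels. A minimal sketch with hypothetical string labels:

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical string labels for the two splits.
train_labels = ["no", "yes", "no", "no"]
test_labels = ["yes", "no"]

encoder = LabelEncoder()
# fit_transform learns the mapping (classes are sorted: "no" -> 0, "yes" -> 1).
train_encoded = encoder.fit_transform(train_labels)
# transform reuses the learned mapping without refitting.
test_encoded = encoder.transform(test_labels)

print(encoder.classes_, train_encoded, test_encoded)
```

Refitting on the test labels would usually give the same mapping for a binary label, but it could differ if a class were missing from one split.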

Using the `tensorflow.keras.utils.to_categorical()` function, transform the encoded training labels `Y_train` into a binary vector and assign the result back to `Y_train`.

In [None]:
# to_categorical is already imported in the first cell.
Y_train = to_categorical(Y_train)

Using the `tensorflow.keras.utils.to_categorical()` function, transform the encoded test labels `Y_test` into a binary vector and assign the result back to `Y_test`.

In [None]:
Y_test = to_categorical(Y_test)
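The "binary vector" here is a one-hot encoding: each integer label becomes a row with a single 1 in the position of its class. The following NumPy sketch reproduces the idea on hypothetical labels. It is an illustration of the transformation, not the Keras implementation:

```python
import numpy as np

# Integer class labels as produced by a label encoder (hypothetical values).
encoded = np.array([0, 1, 1, 0])
num_classes = 2

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(num_classes, dtype="float32")[encoded]

print(one_hot)
```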

## Design the model

Initialize a `tensorflow.keras.models.Sequential` model instance called model.

In [None]:
model = Sequential()

Create an input layer instance of `tensorflow.keras.layers.InputLayer` and add it to the model instance model using the `Model.add()` function.

In [None]:
model.add(InputLayer(input_shape=(X_train.shape[1],)))

Create a hidden layer instance of `tensorflow.keras.layers.Dense` with relu activation function and 12 hidden neurons, and add it to the model instance model.

In [None]:
model.add(Dense(12, activation='relu'))

Create an output layer instance of `tensorflow.keras.layers.Dense` with a softmax activation function (because of classification) with the number of neurons corresponding to the number of classes in the dataset.

In [None]:
model.add(Dense(2, activation='softmax'))
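To make the architecture concrete, the sketch below reimplements the forward pass of this network in plain NumPy: a ReLU hidden layer with 12 units followed by a 2-unit softmax output. The input width of 7 is an assumption based on the seven columns listed in the `ColumnTransformer`, and the random weights only stand in for trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_hidden, n_classes = 7, 12, 2  # 7 is assumed from the scaled columns

# Randomly initialised weights stand in for trained parameters.
W1, b1 = rng.normal(size=(n_features, n_hidden)), np.zeros(n_hidden)
W2, b2 = rng.normal(size=(n_hidden, n_classes)), np.zeros(n_classes)

def forward(x):
    hidden = np.maximum(0.0, x @ W1 + b1)                 # ReLU hidden layer
    logits = hidden @ W2 + b2
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)           # softmax: rows sum to 1

batch = rng.normal(size=(4, n_features))
probs = forward(batch)

# Trainable parameters: (7*12 + 12) + (12*2 + 2)
n_params = W1.size + b1.size + W2.size + b2.size

print(probs.sum(axis=1), n_params)
```

Under the same assumption about the input width, `model.summary()` should report the same parameter count.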

Using the `Model.compile()` function, compile the model instance model using the `categorical_crossentropy` loss, adam optimizer and accuracy as metrics.

In [None]:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
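For intuition about the loss, categorical cross-entropy is the average of the negative log-probability that the model assigns to the true class. The sketch below computes it by hand in NumPy on hypothetical predictions. The helper name and the clipping constant are illustrative choices, not the Keras internals:

```python
import numpy as np

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_prob)); eps avoids log(0)."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true * np.log(y_prob), axis=1)))

# One-hot true labels and two hypothetical sets of predicted probabilities.
y_true = np.array([[1.0, 0.0], [0.0, 1.0]])
confident = np.array([[0.9, 0.1], [0.2, 0.8]])
uncertain = np.array([[0.5, 0.5], [0.5, 0.5]])

confident_loss = categorical_crossentropy(y_true, confident)
uncertain_loss = categorical_crossentropy(y_true, uncertain)

# Confident, correct predictions give a lower loss than a 50/50 guess.
print(confident_loss, uncertain_loss)
```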

## Train and evaluate the model

Using the `Model.fit()` function, fit the model instance model to the training data `X_train` and training labels `Y_train`. Set the number of epochs to 100 and the batch size parameter to 16.

In [None]:
model.fit(X_train, Y_train, epochs = 100, batch_size = 16, verbose=1)
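The `epochs` and `batch_size` settings together determine how many weight updates the model performs. As a rough sketch, assuming a hypothetical training set of 209 rows (the real count depends on the dataset and the split), the arithmetic looks like this:

```python
import math

def updates_per_run(n_samples, batch_size, epochs):
    """Return (batches per epoch, total weight updates) for mini-batch training."""
    steps_per_epoch = math.ceil(n_samples / batch_size)  # last batch may be partial
    return steps_per_epoch, steps_per_epoch * epochs

# 209 training rows is a hypothetical figure used only for illustration.
steps, total = updates_per_run(n_samples=209, batch_size=16, epochs=100)

print(steps, total)
```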

Using the `Model.evaluate()` function, evaluate the trained model instance model on the test data `X_test` and test labels `Y_test`. Assign the result to a variable called loss (representing the final loss value) and a variable called acc (representing the accuracy metrics), respectively.

In [None]:
loss, acc = model.evaluate(X_test, Y_test, verbose=0)
print("Loss", loss, "Accuracy:", acc)

## Generating a classification report

Use the `Model.predict()` function to get the predictions for the test data `X_test` with the trained model instance `model`. Assign the result to a variable called `y_estimate`.

In [None]:
y_estimate = model.predict(X_test, verbose=0)

Use the `numpy.argmax()` function to select the index of the predicted class (the one with the highest probability) for each row in `y_estimate`. Assign the result back to `y_estimate`.

In [None]:
y_estimate = np.argmax(y_estimate, axis=1)

Use the `numpy.argmax()` method to select the indices of the true classes for each label encoding in `Y_test`. Assign the result to a variable called `y_true`.

In [None]:
y_true = np.argmax(Y_test, axis=1)
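Both `argmax` calls convert a two-column array back into a vector of class indices: one from predicted probabilities, the other from one-hot labels. A minimal sketch with hypothetical values:

```python
import numpy as np

# Hypothetical softmax outputs and one-hot true labels for three samples.
probs = np.array([[0.80, 0.20], [0.30, 0.70], [0.55, 0.45]])
one_hot_true = np.array([[1, 0], [0, 1], [0, 1]])

# axis=1 picks the column index of the largest value in each row.
predicted = np.argmax(probs, axis=1)
actual = np.argmax(one_hot_true, axis=1)

accuracy = float(np.mean(predicted == actual))

print(predicted, actual, accuracy)
```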

Print additional metrics, such as F1-score, using the `sklearn.metrics.classification_report()` function by providing it with `y_true` and `y_estimate` vectors as input parameters.

In [None]:
print(classification_report(y_true, y_estimate))
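To show where the numbers in the report come from, the sketch below computes precision, recall, and F1 for the positive class by hand and compares them with `classification_report(..., output_dict=True)`. The label vectors are hypothetical:

```python
import numpy as np
from sklearn.metrics import classification_report

# Hypothetical true and predicted class indices.
truth = np.array([0, 0, 0, 1, 1])
preds = np.array([0, 0, 1, 1, 0])

# Counts for the positive class (label 1).
tp = int(np.sum((preds == 1) & (truth == 1)))
fp = int(np.sum((preds == 1) & (truth == 0)))
fn = int(np.sum((preds == 0) & (truth == 1)))

precision = tp / (tp + fp)   # of the predicted positives, how many were right
recall = tp / (tp + fn)      # of the actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)

# output_dict=True returns the same metrics as a nested dict keyed by label.
report = classification_report(truth, preds, output_dict=True)

print(precision, recall, f1, report["1"]["f1-score"])
```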