In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
# Input data files are available in the read-only "../input/" directory
import tensorflow as tf
from tensorflow import feature_column
from tensorflow.keras import layers
from sklearn.model_selection import train_test_split


This example follows the [Classify structured data with feature columns](https://www.tensorflow.org/tutorials/structured_data/feature_columns) tutorial. A lot of the code was copied, but as you can see, I look up anything I don't understand.

**This is going to be a simple structured data classification with TensorFlow. This is my first time using TensorFlow (though I already have some knowledge of neural networks), so please give me feedback in the comments so I can become more versed in this cool library/tool!**

In [None]:
HeartAttackFile = '../input/heart-attack-analysis-prediction-dataset/heart.csv'
O2SaturationFile= '../input/heart-attack-analysis-prediction-dataset/o2Saturation.csv'

In [None]:
HeartAttackData = pd.read_csv(HeartAttackFile)
O2SaturationData = pd.read_csv(O2SaturationFile)

# EDA - Exploratory Data Analysis Stage

Here we will look at the data and labels and inspect which parts are more detrimental or useful. We will also clean the data up and, once it is clean, split it into training, validation and test sets.

In [None]:
HeartAttackData.describe()

We check that all of the data is present. Every column holds integer values except *oldpeak*, which is a float.

In [None]:
HeartAttackData.info()
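
`info()` already reports non-null counts, but a more direct check is to count missing values per column. Here is a minimal sketch on a made-up frame (the `toy` DataFrame and its values are invented for illustration, they are not rows from heart.csv):

```python
import numpy as np
import pandas as pd

# Made-up frame with one deliberately missing value in 'age'.
toy = pd.DataFrame({
    "age": [63, 37, np.nan],
    "oldpeak": [2.3, 3.5, 1.4],
})

# Number of missing entries in each column.
missing_per_column = toy.isna().sum()
print(missing_per_column)

# Column dtypes, to spot which columns are floats.
print(toy.dtypes)
```

Running the same two calls on `HeartAttackData` should show whether any column needs cleaning before the split.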

In [None]:
HeartAttackData.hist(bins = 25, figsize=(20,20))

Looking at the label descriptions, the floating-point column is vague to me: the description of oldpeak just states "the previous peak". We will leave it out of the feature lists for now and test later whether it affects performance.

In [None]:
#HeartAttackData = HeartAttackData.drop(columns=['oldpeak'])
CatColumns = ['sex','exng','caa','cp','fbs','restecg']
NumColumns = ['age','trtbps','chol','thalachh']


In [None]:
train, test = train_test_split(HeartAttackData, test_size=0.2)
train, val = train_test_split(train, test_size=0.2)
print(len(train), 'train examples')
print(len(val), 'validation examples')
print(len(test), 'test examples')
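
Splitting twice with `test_size=0.2` leaves roughly 64% of the rows for training, 16% for validation and 20% for testing. The sketch below shows that arithmetic on 100 made-up row ids. The `random_state=42` argument is my own addition (the notebook above does not set one); fixing it makes the split the same on every run, which should make accuracy numbers easier to compare between runs.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rows = np.arange(100)  # 100 made-up row ids standing in for the DataFrame rows

# First split: hold out 20% of everything as the test set.
train_rows, test_rows = train_test_split(rows, test_size=0.2, random_state=42)
# Second split: hold out 20% of the remaining 80% as the validation set.
train_rows, val_rows = train_test_split(train_rows, test_size=0.2, random_state=42)

print(len(train_rows), 'train examples')
print(len(val_rows), 'validation examples')
print(len(test_rows), 'test examples')
```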


> tf.data is an API for building complex input pipelines from data. It is designed to handle large amounts of data, read from different formats, and perform complex transformations.

In this case we are just taking the columns of the table and feeding them into the neural network as separate features/inputs.

In [None]:
# A utility function to create a tf.data dataset from a pandas DataFrame.
# The dataset yields (features dict, labels) batches that the neural network can consume.
def df_to_dataset(dataframe, shuffle=True, batch_size=32):
    dataframe = dataframe.copy()  # Copy so that pop() below does not modify the caller's DataFrame
    labels = dataframe.pop('output')
    ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))
    if shuffle:
        ds = ds.shuffle(buffer_size=len(dataframe))
    ds = ds.batch(batch_size)
    return ds

We want to use the function above to turn our training, validation and test **dataframes** into **datasets**!

In [None]:
batch_size = 32 # A small batch size is used for demonstration purposes
train_ds = df_to_dataset(train, batch_size=batch_size)
val_ds = df_to_dataset(val, shuffle=False, batch_size=batch_size)
test_ds = df_to_dataset(test, shuffle=False, batch_size=batch_size)
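
To see what such a dataset actually yields, here is a small sketch on a made-up DataFrame (the `toy` frame, its values and the batch size of 2 are invented for illustration). Each element is a pair: a dictionary mapping column names to a batch of values, and a batch of labels.

```python
import pandas as pd
import tensorflow as tf

# Made-up frame with the same kind of layout as heart.csv (values are invented).
toy = pd.DataFrame({
    "age": [63, 37, 41, 56, 57],
    "sex": [1, 1, 0, 1, 0],
    "output": [1, 1, 1, 0, 0],
})

# Separate the label column and build a {column name: values} dictionary.
labels = toy.pop("output").to_numpy()
features = {name: toy[name].to_numpy() for name in toy.columns}
toy_ds = tf.data.Dataset.from_tensor_slices((features, labels)).batch(2)

batch_sizes = []
for feature_batch, label_batch in toy_ds:
    batch_sizes.append(int(label_batch.shape[0]))
    print(sorted(feature_batch.keys()), feature_batch["age"].numpy(), label_batch.numpy())
```

With five rows and a batch size of 2, the last batch holds only one row, which is why the final batch of a real dataset can be smaller than `batch_size`.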


In [None]:
feature_columns = []
for header in NumColumns:
  feature_columns.append(feature_column.numeric_column(header))

for col_name in CatColumns:
  categorical_column = feature_column.categorical_column_with_vocabulary_list(
      col_name, HeartAttackData[col_name].unique())
  indicator_column = feature_column.indicator_column(categorical_column)
  feature_columns.append(indicator_column)


The above code maps the numerical and categorical data into separate feature columns. Numerical columns are passed through as they are, while each categorical column is one-hot encoded through an indicator column. We do this because the categorical values are whole-number codes rather than quantities; we don't want the network to treat them as if a 0.5 could sit between two categories. I could use bucketized columns for the age column, but I will pursue that later.
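
As far as I understand it, an indicator column turns each categorical value into a one-hot vector over the vocabulary. The sketch below reproduces that idea in plain NumPy rather than with the feature column API itself; the `cp_values` array is made up, and `np.unique` sorts the vocabulary whereas pandas `unique()` keeps the order of first appearance, so the column order may differ from what TensorFlow produces.

```python
import numpy as np

# Made-up values standing in for a categorical column such as 'cp'.
cp_values = np.array([3, 2, 1, 1, 0])
vocabulary = np.unique(cp_values)  # sorted vocabulary of distinct codes

# Find each value's position in the vocabulary, then pick the matching
# row of the identity matrix to get a one-hot vector per value.
positions = np.searchsorted(vocabulary, cp_values)
one_hot = np.eye(len(vocabulary))[positions]

print(vocabulary)
print(one_hot)
```

Each row contains exactly one 1, so a category is never represented by a fractional or "in between" value.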

DenseFeatures is what converts the feature columns into a single tensor. Just as a simple perceptron can be described as "matrix multiplication", more complex neural networks operate on tensors like this one.

In [None]:
feature_layer = tf.keras.layers.DenseFeatures(feature_columns)


We will now create, compile and train the model:

The following code has 3 major operations:
* It creates the model. `tf.keras.Sequential` takes a list of the layers that make up the neural network. The feature layer we just specified is the "input layer". Each `layers.Dense(128, ...)` is a regular densely connected layer (all outputs to all inputs, essentially a dot product). The activation string names the activation function; we don't want just a linear function, so we specify relu, which is **Rectified Linear Unit**. Then we add Dropout to reduce the chance of overfitting (dropout randomly disables a portion of the neurons during training). Finally, the last Dense layer is the output, where all the values pass through a linear function to give a single raw score (a logit).

* `model.compile()` configures the model for training. It takes an optimizer, an objective (loss) function and a list of metrics to measure model performance with. Because the output layer is linear, the loss is created with `from_logits=True`.

* `model.fit()` then trains the model for a number of epochs (an epoch is one full pass over the training data). We also pass the validation data here, so the loss and the metrics we chose are reported on it after every epoch.
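
To make the layer descriptions above more concrete, here is a rough NumPy sketch of a single forward pass through the same stack of layers. It is not how Keras implements them internally; the input batch, the random weights and the helper names `dense`, `relu` and `dropout` are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, w, b):
    # A dense layer is a matrix multiplication plus a bias.
    return x @ w + b

def relu(x):
    # Rectified Linear Unit: negative values become 0.
    return np.maximum(x, 0.0)

def dropout(x, rate, rng):
    # Training-time dropout: zero a random portion of the units and
    # rescale the rest so the expected sum stays the same.
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)

x = rng.normal(size=(4, 6))  # made-up batch: 4 rows, 6 input features
w1, b1 = rng.normal(size=(6, 128)) * 0.1, np.zeros(128)
w2, b2 = rng.normal(size=(128, 128)) * 0.1, np.zeros(128)
w3, b3 = rng.normal(size=(128, 1)) * 0.1, np.zeros(1)

h = relu(dense(x, w1, b1))
h = relu(dense(h, w2, b2))
h = dropout(h, 0.1, rng)
logits = dense(h, w3, b3)             # linear output: one raw score per row
probs = 1.0 / (1.0 + np.exp(-logits))  # sigmoid turns a logit into a probability

print(logits.shape, probs.ravel())
```

The last two lines show why the loss needs `from_logits=True`: the network itself stops at the raw scores, and the sigmoid is applied inside the loss.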

In [None]:
model = tf.keras.Sequential([
  feature_layer,
  layers.Dense(128, activation='relu'),
  layers.Dense(128, activation='relu'),
  layers.Dropout(.1),
  layers.Dense(1)
])

model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=['accuracy'])

model.fit(train_ds,
          validation_data=val_ds,
          epochs=10)


In [None]:
loss, accuracy = model.evaluate(test_ds)
print("Accuracy", accuracy)
model.save('HeartAttackClassifier')

As you can see, a model that is only 45 to 63% accurate is not good, and I would like this to be higher. I will therefore refine the model further later. For now, though, this was a good exercise for getting to grips with TensorFlow basics and the capabilities it has.
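
One thing that may be worth checking when refining the model: the last layer outputs logits, not probabilities. My understanding is that the `'accuracy'` metric string compares the raw model output against a threshold of 0.5, whereas for logits the natural decision boundary is 0 (which is where the sigmoid equals 0.5). The NumPy sketch below uses made-up logits and labels to show how the two thresholds can give different accuracy figures for the same predictions.

```python
import numpy as np

# Made-up logits and true labels for five examples.
logits = np.array([-2.0, -0.3, 0.2, 0.4, 1.5])
labels = np.array([0, 0, 1, 1, 1])

# Thresholding the raw logits at 0.5 misses the positives with small logits.
pred_raw = (logits > 0.5).astype(int)
acc_raw = (pred_raw == labels).mean()

# Converting to probabilities first and thresholding at 0.5
# is the same as thresholding the logits at 0.
probs = 1.0 / (1.0 + np.exp(-logits))
pred_prob = (probs > 0.5).astype(int)
acc_prob = (pred_prob == labels).mean()

print("accuracy, logits thresholded at 0.5:", acc_raw)
print("accuracy, probabilities thresholded at 0.5:", acc_prob)
```

If this does apply here, one option to try would be passing `tf.keras.metrics.BinaryAccuracy(threshold=0.0)` as the metric instead of the `'accuracy'` string; I have not verified how much it changes the reported numbers for this dataset.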

In [None]:
reloaded_model = tf.keras.models.load_model('HeartAttackClassifier')
