<a href="https://colab.research.google.com/github/shuvayan/Agile_Data_Code/blob/master/M1_NB_MiniProject_2_Structured_Data_Classification_SD.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Advanced Certification Programme in AI and MLOps
## A programme by IISc and TalentSprint
### Mini-Project Notebook: Structured Data Classification

## Problem Statement

Predict whether a patient has heart disease.

## Learning Objectives

At the end of the experiment, you will be able to

* Understand the Cleveland Clinic Foundation Heart Disease dataset
* Pre-process this dataset using Keras layers: IntegerLookup, StringLookup & Normalization
* Understand and use the Keras concatenate layer
* Build a neural network model and architecture using the Keras Functional API
* Make predictions on unseen data

## Introduction

This example demonstrates how to do structured data classification, starting from a raw
CSV file. Our data includes both numerical and categorical features. We will use Keras
preprocessing layers to normalize the numerical features and vectorize the categorical
ones.

Note that this example should be run with TensorFlow 2.5 or higher.

## Dataset

[Our dataset](https://archive.ics.uci.edu/ml/datasets/heart+Disease) is provided by the
Cleveland Clinic Foundation for Heart Disease.
It's a CSV file with 303 rows. Each row contains information about a patient (a
**sample**), and each column describes an attribute of the patient (a **feature**). We
use the features to predict whether a patient has a heart disease (**binary
classification**).

Here's the description of each feature:
<br><br>

Column| Description| Feature Type
------------|--------------------|----------------------
Age | Age in years | Numerical
Sex | (1 = male; 0 = female) | Categorical
CP | Chest pain type (0, 1, 2, 3, 4) | Categorical
Trestbps | Resting blood pressure (in mm Hg on admission) | Numerical
Chol | Serum cholesterol in mg/dl | Numerical
FBS | Fasting blood sugar > 120 mg/dl (1 = true; 0 = false) | Categorical
RestECG | Resting electrocardiogram results (0, 1, 2) | Categorical
Thalach | Maximum heart rate achieved | Numerical
Exang | Exercise induced angina (1 = yes; 0 = no) | Categorical
Oldpeak | ST depression induced by exercise relative to rest | Numerical
Slope | Slope of the peak exercise ST segment | Numerical
CA | Number of major vessels (0-3) colored by fluoroscopy | Both numerical & categorical
Thal | 3 = normal; 6 = fixed defect; 7 = reversible defect | Categorical
Target | Diagnosis of heart disease (1 = true; 0 = false) | Target

<br><br>
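As a rough sketch, the table above can be mirrored in Python as lists keyed by the lowercase column names used in the CSV. The variable names are hypothetical, and the grouping is an assumption: `ca` is treated as categorical and `slope` as numerical here, although other groupings are defensible.

```python
# Hypothetical grouping of the 13 input columns by feature type.
# Assumption: `ca` is handled as categorical and `slope` as numerical.
INT_CATEGORICAL = ["sex", "cp", "fbs", "restecg", "exang", "ca"]
STRING_CATEGORICAL = ["thal"]
NUMERICAL = ["age", "trestbps", "chol", "thalach", "oldpeak", "slope"]
TARGET = "target"

ALL_FEATURES = INT_CATEGORICAL + STRING_CATEGORICAL + NUMERICAL

# Every feature should appear exactly once, and the target is kept separate
assert len(ALL_FEATURES) == len(set(ALL_FEATURES))
assert TARGET not in ALL_FEATURES
print(len(ALL_FEATURES), "input features")
```

Keeping the groups in one place makes it easier to loop over features later instead of repeating each column name by hand.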

In [2]:
#@title Download the data
!wget -qq https://cdn.iisc.talentsprint.com/AIandMLOps/Datasets/heart.csv
print("Data Downloaded Successfully!!")

Data Downloaded Successfully!!


## Grading = 10 Points

### Import Required Packages

In [3]:
import tensorflow as tf
import numpy as np
import pandas as pd
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.layers import IntegerLookup     # Used in Feature processing
from tensorflow.keras.layers import Normalization     # Used in Feature processing
from tensorflow.keras.layers import StringLookup      # Used in Feature processing

In [47]:
print(tf.__version__)

2.15.0


# Part A

## Load the data and create batches [2 Marks]

### Load data into a Pandas dataframe

Hint:: pd.read_csv

In [4]:
file_url = "/content/heart.csv"
## YOUR CODE HERE
heart_df = pd.read_csv(file_url)

In [7]:
heart_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 303 entries, 0 to 302
Data columns (total 14 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   age       303 non-null    int64  
 1   sex       303 non-null    int64  
 2   cp        303 non-null    int64  
 3   trestbps  303 non-null    int64  
 4   chol      303 non-null    int64  
 5   fbs       303 non-null    int64  
 6   restecg   303 non-null    int64  
 7   thalach   303 non-null    int64  
 8   exang     303 non-null    int64  
 9   oldpeak   303 non-null    float64
 10  slope     303 non-null    int64  
 11  ca        303 non-null    int64  
 12  thal      303 non-null    object 
 13  target    303 non-null    int64  
dtypes: float64(1), int64(12), object(1)
memory usage: 33.3+ KB


Check the shape of the dataset:

In [5]:
## YOUR CODE HERE
heart_df.shape

(303, 14)

Check the preview of a few samples:

Hint:: head()

In [9]:
## YOUR CODE HERE
heart_df.head()

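Index | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target
------|-----|-----|----|----------|------|-----|---------|---------|-------|---------|-------|----|------|-------
0 | 63 | 1 | 1 | 145 | 233 | 1 | 2 | 150 | 0 | 2.3 | 3 | 0 | fixed | 0
1 | 67 | 1 | 4 | 160 | 286 | 0 | 2 | 108 | 1 | 1.5 | 2 | 3 | normal | 1
2 | 67 | 1 | 4 | 120 | 229 | 0 | 2 | 129 | 1 | 2.6 | 2 | 2 | reversible | 0
3 | 37 | 1 | 3 | 130 | 250 | 0 | 0 | 187 | 0 | 3.5 | 3 | 0 | normal | 0
4 | 41 | 0 | 2 | 130 | 204 | 0 | 2 | 172 | 0 | 1.4 | 1 | 0 | normal | 0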


Draw some inference from the data. What does the target column indicate?

The last column, "target", indicates whether the patient has a heart disease (1) or not
(0).

### Split the data into a training and validation set

Hint:: Use .sample() method from Pandas.

Refer to the `sample` documentation [here](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sample.html).

The official pandas reference for the `drop` method is [here](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop.html).

In [10]:
## YOUR CODE HERE
train_data = heart_df.sample(frac=0.8, random_state=42)

# Rows that were not sampled for training form the held-out (validation) set
test_data = heart_df.drop(train_data.index)

# Optionally, reset the index of both splits
train_data.reset_index(drop=True, inplace=True)
test_data.reset_index(drop=True, inplace=True)

## YOUR CODE HERE

print(
    "Using %d samples for training and %d for validation"
    % (len(train_data), len(test_data))
)

Using 242 samples for training and 61 for validation


### Converting into tensorflow dataset & creating batches

Generate `tf.data.Dataset` objects for each dataframe:

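Each `Dataset` yields a tuple `(input, target)` where `input` is a dictionary of features
and `target` is the label. The implementation below one-hot encodes the label with
`tf.one_hot(..., depth=2)`, so `0` becomes `[1., 0.]` and `1` becomes `[0., 1.]`.

Refer to the `from_tensor_slices` documentation [here](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#from_tensor_slices) to create the tuple with TensorFlow.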

In [11]:
def dataframe_to_dataset(dataframe):
    ## YOUR CODE HERE
    dataframe = dataframe.copy()
    # Extract features and targets
    targets = dataframe.pop('target')
    features = dict(dataframe)
    # Convert targets to one-hot encoding
    targets = tf.one_hot(targets, depth=2)
    # Convert features and targets to tf.data.Dataset
    ds = tf.data.Dataset.from_tensor_slices((features, targets))
    return ds

train_ds = dataframe_to_dataset(train_data)
test_ds = dataframe_to_dataset(test_data)

In [12]:
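# Example usage: inspect one (features, target) pair from the dataset
# (`features` is used instead of `input` to avoid shadowing the Python builtin)
for features, target in train_ds.take(1):
    print("Input features:", features)
    print("Target:", target)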

Input features: {'age': <tf.Tensor: shape=(), dtype=int64, numpy=49>, 'sex': <tf.Tensor: shape=(), dtype=int64, numpy=1>, 'cp': <tf.Tensor: shape=(), dtype=int64, numpy=3>, 'trestbps': <tf.Tensor: shape=(), dtype=int64, numpy=118>, 'chol': <tf.Tensor: shape=(), dtype=int64, numpy=149>, 'fbs': <tf.Tensor: shape=(), dtype=int64, numpy=0>, 'restecg': <tf.Tensor: shape=(), dtype=int64, numpy=2>, 'thalach': <tf.Tensor: shape=(), dtype=int64, numpy=126>, 'exang': <tf.Tensor: shape=(), dtype=int64, numpy=0>, 'oldpeak': <tf.Tensor: shape=(), dtype=float64, numpy=0.8>, 'slope': <tf.Tensor: shape=(), dtype=int64, numpy=1>, 'ca': <tf.Tensor: shape=(), dtype=int64, numpy=3>, 'thal': <tf.Tensor: shape=(), dtype=string, numpy=b'normal'>}
Target: tf.Tensor([1. 0.], shape=(2,), dtype=float32)


In [None]:
# Visualizing one datapoint from the formatted data
## YOUR CODE HERE

#### Create the batch of the datasets:

Refer [here](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#batch) to create a batch using tensorflow.

In [14]:
## YOUR CODE HERE  ## For train
# Define the batch size
BATCH_SIZE = 32

# Create batches from the train dataset
train_batched_dataset = train_ds.batch(BATCH_SIZE)

## YOUR CODE HERE  ## For val
test_batched_dataset = test_ds.batch(BATCH_SIZE)

for i in train_batched_dataset.as_numpy_iterator():
    print(i)
    break


({'age': array([49, 55, 54, 57, 66, 53, 53, 39, 50, 59, 56, 67, 44, 54, 70, 65, 56,
       67, 48, 54, 45, 41, 51, 50, 59, 60, 71, 65, 58, 46, 51, 52]), 'sex': array([1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1,
       0, 0, 1, 0, 0, 1, 1, 1, 1, 1]), 'cp': array([3, 4, 3, 2, 4, 4, 4, 3, 3, 4, 2, 3, 4, 4, 3, 3, 4, 4, 4, 3, 4, 4,
       4, 3, 1, 4, 2, 1, 4, 2, 4, 2]), 'trestbps': array([118, 128, 110, 130, 120, 140, 142, 138, 140, 135, 120, 115, 110,
       110, 160, 160, 132, 120, 122, 150, 142, 110, 130, 120, 160, 150,
       160, 138, 114, 101, 140, 134]), 'chol': array([149, 205, 214, 236, 302, 203, 226, 220, 233, 234, 236, 564, 197,
       206, 269, 360, 184, 237, 222, 232, 309, 172, 305, 219, 273, 258,
       302, 282, 318, 197, 298, 201]), 'fbs': array([0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 1, 0, 1, 0, 0]), 'restecg': array([2, 1, 0, 2, 2, 2, 2, 0, 0, 0, 0, 2, 2, 2, 0, 2, 2, 0, 2, 2, 2, 2,
       0, 0, 2, 2, 0,

## Feature preprocessing with Keras layers [3 Marks]

### Categorical Features Encoding

The following features are categorical features encoded as integers:

- `sex`
- `cp`
- `fbs`
- `restecg`
- `exang`
- `ca`

We will encode these features using **one-hot encoding**. We have two options
here:

 - Use `CategoryEncoding()`, which requires knowing the range of input values
 and will throw an error on input outside the range.
 - Use `IntegerLookup()`, which will build a lookup table for inputs and **reserve
 an output index for unknown input values**.

For this example, we want a simple solution that will handle out-of-range inputs
at inference, so we will use `IntegerLookup()`.
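To illustrate the reserved out-of-vocabulary (OOV) slot, here is a minimal sketch on made-up values. It assumes the layer's default `num_oov_indices=1`; the array contents and variable names are hypothetical and are not taken from the heart dataset.

```python
import numpy as np
import tensorflow as tf

# Hypothetical integer category values seen during training
train_values = np.array([0, 1, 2, 3, 1, 0])

# The default num_oov_indices=1 reserves one output slot for unseen values
lookup = tf.keras.layers.IntegerLookup(output_mode="one_hot")
lookup.adapt(train_values)

# The vocabulary lists the OOV placeholder first, then the learned values
vocabulary = lookup.get_vocabulary()

# 3 was seen during adapt(); 7 was not, so it falls into the OOV slot
encoded = np.asarray(lookup(np.array([3, 7])))

print(vocabulary)
print(encoded)
```

Each row of `encoded` has one more column than there are distinct training values. Note that passing `num_oov_indices=0` removes that extra slot, in which case unseen values are no longer handled gracefully.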

We also have a categorical feature encoded as a string: `thal`. We will create an
index of all its possible values and encode it using the `StringLookup()` layer.

Create a function `encode_categorical_feature` that takes four parameters:
1. The feature to be encoded.
2. The name of the feature in the dataset.
3. The dataset containing the feature.
4. A boolean indicating whether the feature is a string.

**Refer :** StringLookup [here](https://www.tensorflow.org/api_docs/python/tf/keras/layers/StringLookup) and IntegerLookup [here](https://www.tensorflow.org/api_docs/python/tf/keras/layers/IntegerLookup).

In [15]:
def encode_categorical_feature(feature, name, dataset, is_string):
    # Pick the lookup layer that matches the feature's dtype
    lookup_class = StringLookup if is_string else IntegerLookup

    ### YOUR CODE HERE
    # Create a lookup layer which will turn raw values into one-hot vectors
    # (num_oov_indices=0 reserves no output slot for unseen values)
    lookup_layer = lookup_class(output_mode="one_hot", num_oov_indices=0)

    # Prepare an array that only holds the values of our feature
    ## YOUR CODE HERE
    feature_dataset = dataset.map(lambda x, y: x[feature])
    feature_values = np.array(list(feature_dataset.as_numpy_iterator()))

    # Learn the set of possible values and assign them a fixed integer index
    ## YOUR CODE HERE
    lookup_layer.adapt(feature_values)

    # Turn the raw values into one-hot encoded vectors
    ## YOUR CODE HERE
    encoded_feature = lookup_layer(feature_values)

    return encoded_feature


In [24]:
# Encode the 'sex' feature
result = encode_categorical_feature('sex','sex', train_ds, False)
print(result)

tf.Tensor(
[[1. 0.]
 [0. 1.]
 [0. 1.]
 [0. 1.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [0. 1.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [0. 1.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [0. 1.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [0. 1.]
 [0. 1.]
 [1. 0.]
 [0. 1.]
 [0. 1.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [0. 1.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [0. 1.]
 [1. 0.]
 [1. 0.]
 [0. 1.]
 [1. 0.]
 [0. 1.]
 [0. 1.]
 [1. 0.]
 [0. 1.]
 [1. 0.]
 [0. 1.]
 [0. 1.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [0. 1.]
 [0. 1.]
 [1. 0.]
 [0. 1.]
 [1. 0.]
 [0. 1.]
 [1. 0.]
 [0. 1.]
 [0. 1.]
 [0. 1.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [0. 1.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [0. 1.]
 [1. 0.]
 [0. 1.]
 [0. 1.]
 [1. 0.]
 [0. 1.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [0. 1.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [0. 1.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [0. 1.]
 [1. 0.]
 [1. 0.]
 [0. 1.]
 [1. 0.]
 [0. 1.]

In [17]:
# Next Steps

### Numerical features Normalization
The following features are continuous numerical features:

- `age`
- `trestbps`
- `chol`
- `thalach`
- `oldpeak`
- `slope`

For each of these features, we will use a `Normalization()` layer to make sure the mean
of each feature is 0 and its standard deviation is 1.
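The arithmetic behind this normalization can be sketched in plain NumPy. This is not the Keras layer itself; it assumes the population standard deviation (`ddof=0`) and uses the first five `thalach` values from the data preview as a toy column.

```python
import numpy as np

# First five `thalach` values from the preview, used as a toy column
thalach = np.array([150, 108, 129, 187, 172], dtype="float64")

mean = thalach.mean()
std = thalach.std()  # population standard deviation (ddof=0)

# Subtract the mean and divide by the standard deviation (z-score)
normalized = (thalach - mean) / std

print(normalized.round(3))
print("mean:", round(float(normalized.mean()), 6), "std:", round(float(normalized.std()), 6))
```

After the transformation the toy column has mean 0 and standard deviation 1, which is the property the `Normalization()` layer is meant to provide once it has been adapted on the training data.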


- Define a function `encode_numerical_feature` to apply featurewise normalization to numerical features.


Refer to the Normalization documentation [here](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Normalization).

In [60]:
def encode_numerical_feature(feature, name, dataset):

    # Create a Normalization layer for our feature
    normalizer = tf.keras.layers.Normalization(axis=None)

    # Prepare a Dataset that only yields our feature
    feature_dataset = dataset.map(lambda x,y: x[feature])
    feature_dataset = np.array(list(feature_dataset.as_numpy_iterator()))

    # Learn the statistics of the data
    normalizer.adapt(feature_dataset)

    # Normalize the input feature
    encoded_feature = normalizer(feature_dataset)

    return encoded_feature

In [61]:
result = encode_numerical_feature('thalach','thalach', train_ds)
print(result)

tf.Tensor(
[-0.942273   -0.766764    0.46179914  1.1638352   0.15465835  0.33016738
 -1.6004318   0.1985356   0.6811854   0.5934309   1.3393443   0.54955363
  1.295467   -1.7320637  -1.5565546   0.15465835 -1.8636954  -3.355522
  1.6903622   0.7689399  -0.02085067  0.46179914 -0.24023694  0.46179914
 -0.98615026  0.41792187  0.6373082   1.1638352  -0.32799146  0.37404463
 -1.117782    0.46179914 -1.8636954   0.6811854   0.54955363  0.8566944
 -0.942273    0.5056764  -0.6790095   0.37404463 -0.72288674  0.9883262
 -0.06472792  0.37404463  0.6811854  -1.4249228  -0.02085067  0.6811854
  0.6373082  -0.06472792 -0.98615026  0.41792187  1.2077124   1.6903622
  0.81281716 -1.3810456  -0.6790095  -1.6881863  -0.8983958   0.37404463
 -0.15248244  0.5934309   1.1199579  -0.32799146 -0.63513225 -0.6790095
  0.06690384  0.2862901   0.54955363  0.94444895  0.15465835  0.54955363
 -1.0739048   1.4270988  -1.0739048   0.9005717   1.1638352  -0.24023694
 -0.942273    1.5148532   0.1985356  -0.2841141

In [62]:
result = encode_numerical_feature('trestbps','trestbps', train_ds)
print(result)

tf.Tensor(
[-0.7914619  -0.21817915 -1.2500881  -0.10352261 -0.6768054   0.46976015
  0.5844167   0.35510358  0.46976015  0.18311878 -0.6768054  -0.96344674
 -1.2500881  -1.2500881   1.6163256   1.6163256   0.01113395 -0.6768054
 -0.5621488   1.0430429   0.5844167  -1.2500881  -0.10352261 -0.6768054
  1.6163256   1.0430429   1.6163256   0.35510358 -1.020775   -1.7660426
  0.46976015  0.12579049  0.8137298  -0.390164    0.46976015 -1.3647447
 -1.2500881   2.1896083  -0.10352261  0.35510358 -0.390164   -0.6768054
  0.46976015 -0.3328357  -0.6768054  -0.10352261  1.0430429  -0.44749224
  0.46976015  2.1896083  -1.1354315   1.0430429  -1.3647447   0.46976015
  0.01113395  0.46976015 -1.2500881  -0.10352261  0.18311878 -1.8233708
 -0.6768054   1.501669   -0.6768054  -0.21817915  3.9094567  -1.1354315
 -0.10352261 -1.5367295   0.46976015  0.24044704  0.46976015 -0.6768054
 -1.2500881   0.46976015  0.46976015  0.01113395 -0.7914619  -1.2500881
 -1.2500881  -0.6768054   0.35510358  0.69907326 

In [63]:
result = encode_numerical_feature('oldpeak','oldpeak', train_ds)
print(result)

tf.Tensor(
[-0.2585769   0.73468673  0.40359885 -0.92075264 -0.5896648   1.6451783
 -0.92075264 -0.92075264 -0.42412084 -0.5068928  -0.2585769   0.40359885
 -0.92075264 -0.92075264  1.4796345  -0.2585769   0.8174586  -0.09303298
 -0.92075264  0.40359885 -0.92075264 -0.92075264  0.072511    0.40359885
 -0.92075264  1.2313185  -0.5896648   0.23805489  2.721214   -0.92075264
  2.5556698  -0.2585769   0.73468673 -0.75520873  0.072511   -0.92075264
  1.3968624  -0.75520873  1.0657747  -0.92075264  0.56914276 -0.92075264
  0.73468673 -0.92075264 -0.42412084  0.072511    2.0590382  -0.92075264
 -0.92075264  1.3968624   0.40359885  0.40359885 -0.42412084 -0.92075264
  0.072511   -0.01026099  0.072511    1.0657747   1.3968624  -0.8379807
 -0.2585769  -0.92075264 -0.92075264 -0.92075264  2.390126   -0.8379807
 -0.5068928  -0.42412084  2.0590382  -0.8379807   0.56914276  0.56914276
 -0.42412084 -0.92075264 -0.75520873 -0.92075264 -0.92075264  0.072511
  0.32082686  2.2245822  -0.92075264  1.89349

# Part B

## Building the model [4 Marks]

#### Need to instantiate a Keras tensor for all features
Use the Keras `Input()` [function](https://keras.io/api/layers/core_layers/input/).
* Create a list of inputs to be fed to the model.
* This list consists of the tensors returned by `keras.Input()`, one per feature.
* The inputs cover all the features.
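One possible way to create these inputs is sketched below. It assumes every feature is a scalar per sample (`shape=(1,)`), that integer categorical columns use `int64`, and that `thal` is a string; the dictionary `inputs` and the two name lists are hypothetical helpers, not part of the assignment template.

```python
from tensorflow import keras

# Hypothetical name lists, grouped by how each column is encoded
int_categorical = ["sex", "cp", "fbs", "restecg", "exang", "ca"]
numerical = ["age", "trestbps", "chol", "thalach", "oldpeak", "slope"]

inputs = {}

# Categorical features encoded as integers
for name in int_categorical:
    inputs[name] = keras.Input(shape=(1,), name=name, dtype="int64")

# Categorical feature encoded as a string
inputs["thal"] = keras.Input(shape=(1,), name="thal", dtype="string")

# Numerical features (default float dtype)
for name in numerical:
    inputs[name] = keras.Input(shape=(1,), name=name)

# Dictionaries preserve insertion order, so this matches the grouping above
all_inputs = list(inputs.values())
print(len(all_inputs))
```

Naming each `Input` after its column lets the model accept the feature dictionaries yielded by the `tf.data.Dataset` objects.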


In [None]:
# Categorical features encoded as integers
## YOUR CODE HERE

# Categorical feature encoded as string
## YOUR CODE HERE

# Numerical features
## YOUR CODE HERE

all_inputs = [sex, cp, fbs, restecg,exang,ca,thal,age,trestbps,chol,thalach,oldpeak,slope]
## This list of input objects is fed to the model as its inputs

### Encoding above features
Use the `encode_categorical_feature` and `encode_numerical_feature` functions defined above on the respective features.

In [None]:
# Integer categorical features
sex_encoded = ## YOUR CODE HERE
cp_encoded = ## YOUR CODE HERE
fbs_encoded = ## YOUR CODE HERE
restecg_encoded = ## YOUR CODE HERE
exang_encoded = ## YOUR CODE HERE
ca_encoded = ## YOUR CODE HERE

# String categorical features
thal_encoded = ## YOUR CODE HERE

# Numerical features
age_encoded = ## YOUR CODE HERE
trestbps_encoded = ## YOUR CODE HERE
chol_encoded = ## YOUR CODE HERE
thalach_encoded = ## YOUR CODE HERE
oldpeak_encoded = ## YOUR CODE HERE
slope_encoded = ## YOUR CODE HERE

##### Understanding the result of  encoder functions

In [None]:
sex_encoded = ## YOUR CODE HERE
sex_encoded

In [None]:
cp_encoded = ## YOUR CODE HERE
cp_encoded

### Using Functional API for building model
Build three model architectures with 1, 2, and 3 hidden layers, each with a different number of neurons. Train each architecture and compare the training and validation accuracy.

* These inputs will be passed to `keras.Model`.
* Concatenate the encoded features using `layers.concatenate()`.
* Add the Dense layers and compile the model.
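The steps above can be sketched on a reduced, two-feature toy problem. Everything here is an assumption made for illustration: the toy values, the helper `build_model`, the layer sizes, the dropout rate, and the single sigmoid output trained with `binary_crossentropy` on 0/1 labels.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Toy training columns (hypothetical values, not the real dataset)
age_values = np.array([[63.0], [67.0], [37.0], [41.0]], dtype="float32")
sex_values = np.array([[1], [1], [0], [0]], dtype="int64")

# One symbolic Keras tensor per feature
age = keras.Input(shape=(1,), name="age")
sex = keras.Input(shape=(1,), name="sex", dtype="int64")

# Preprocessing layers are adapted on data, then applied to the symbolic inputs
normalizer = layers.Normalization()
normalizer.adapt(age_values)
lookup = layers.IntegerLookup(output_mode="one_hot")
lookup.adapt(sex_values)

# Concatenate the encoded features into a single feature vector
all_features = layers.concatenate([normalizer(age), lookup(sex)])


def build_model(hidden_units):
    """Stack one Dense + Dropout pair per entry in `hidden_units`."""
    x = all_features
    for units in hidden_units:
        x = layers.Dense(units, activation="relu")(x)
        x = layers.Dropout(0.5)(x)
    output = layers.Dense(1, activation="sigmoid")(x)
    model = keras.Model(inputs=[age, sex], outputs=output)
    model.compile(
        optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"]
    )
    return model


# Three architectures with 1, 2, and 3 hidden layers (sizes are arbitrary)
architectures = {1: [32], 2: [32, 16], 3: [64, 32, 16]}
models = {depth: build_model(units) for depth, units in architectures.items()}

# Forward pass on a single hypothetical sample to confirm the output shape
preview = models[1].predict(
    [np.array([[50.0]], dtype="float32"), np.array([[1]], dtype="int64")],
    verbose=0,
)
print(preview.shape)
```

If the targets are one-hot encoded with depth 2, the output layer would instead need two units with a softmax activation and a `categorical_crossentropy` loss.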

In [None]:
all_inputs = [sex, cp, fbs, restecg,exang,ca,thal,age,trestbps,chol,thalach,oldpeak,slope]

all_features = layers.concatenate## YOUR CODE HERE

## In the Functional API, a neural network is built by calling each layer on the output of the previous one.
## For example: x = layers.Dense(32, activation="relu")(all_features)
## Similarly, layers.concatenate is called on the encoded features:
## layers.concatenate([encoded_feature_object1, encoded_feature_object2, ...]), where each
## encoded_feature_object is a return value of encode_categorical_feature or encode_numerical_feature.

## YOUR CODE HERE     # Dense Layer
## YOUR CODE HERE     # Dropout
output = ## YOUR CODE HERE
model = keras.Model## YOUR CODE HERE
model.compile## YOUR CODE HERE

### Visualize the connectivity graph using `keras.utils.plot_model`:

In [None]:
# `rankdir='LR'` is to make the graph horizontal.
## YOUR CODE HERE

### Train the model

(Change the Colab notebook's runtime to GPU for faster training)

In [None]:
## YOUR CODE HERE

## Inference on new data [1 Mark]

To get a prediction for a new sample, you can simply call `model.predict()`. There are
just two things you need to do:

1. Wrap scalars into a list so as to have a batch dimension (models only process batches
of data, not single samples).
2. Call `convert_to_tensor` on each feature.

Note: The predicted output should be either 0 or 1, based on a probability threshold of 0.5.
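The thresholding step can be sketched with NumPy on made-up probabilities. The arrays below are hypothetical stand-ins for what `model.predict()` might return, and treating a probability of exactly 0.5 as class 1 is an assumption.

```python
import numpy as np

# Hypothetical sigmoid outputs, shaped (batch, 1)
probabilities = np.array([[0.27], [0.50], [0.81]])

# Apply the 0.5 threshold to turn probabilities into 0/1 labels
predicted_labels = (probabilities >= 0.5).astype(int).ravel()
print(predicted_labels)

# Hypothetical two-unit softmax outputs (one column per class):
# here the predicted class is the column with the larger probability
class_probabilities = np.array([[0.73, 0.27], [0.19, 0.81]])
predicted_classes = class_probabilities.argmax(axis=1)
print(predicted_classes)
```

For a two-class softmax output, taking the `argmax` is equivalent to thresholding the second column at 0.5, apart from exact ties.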

In [None]:
## YOUR CODE HERE

input_dict = {name: tf.convert_to_tensor([value]) for name, value in sample.items()}
predictions = model.predict(input_dict)

## YOUR CODE HERE