## Heart Disease Inference

This notebook covers only the inference part of the heart disease classification solution.

Inference is the process of applying the same transformations that were used during training (data pre-processing, feature engineering, and so on) to new data, and then using the trained ML model to generate predictions.
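One way to keep the training and inference transformations identical is to bundle them with the model. The sketch below is an illustration on synthetic data, not the method used in this notebook: it assumes scikit-learn's `Pipeline`, and the array names, step names, and hyperparameters are placeholders.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (hypothetical, not the heart disease dataset).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 4))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)
X_new = rng.normal(size=(5, 4))

# Training: the scaler and the classifier are fitted together.
pipeline = Pipeline([
    ("scaler", MinMaxScaler()),
    ("model", AdaBoostClassifier(n_estimators=25, random_state=0)),
])
pipeline.fit(X_train, y_train)

# Inference: predict() re-applies the already-fitted scaler, then the model.
predictions = pipeline.predict(X_new)
print(predictions)
```

With this layout, a single saved object carries both the pre-processing parameters and the model, so the inference code cannot drift from the training code.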

### Import Modules

In [1]:
import pandas as pd
import numpy as np

### Get Inference Data

In [2]:
# In real-time use cases, replace this file read with a live data source
# (for example, a database connection).
data = pd.read_csv("../../2_Introduction_to_Inference_Process/inference_heart_disease.csv")

# Drop exact duplicate rows, then confirm that none remain.
data = data.drop_duplicates()
assert not data.duplicated().any()

# Separate the features from the ground-truth labels.
inference_df = data.copy()
inference_data = inference_df.drop(columns='target')
labels = inference_df['target']

In [3]:
inference_data.columns

Index(['age', 'sex', 'chest_pain_type', 'resting_bp', 'cholestoral',
       'fasting_blood_sugar', 'restecg', 'max_hr', 'exang', 'oldpeak', 'slope',
       'num_major_vessels', 'thal'],
      dtype='object')
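Before pre-processing, it can help to confirm that the incoming data has the columns listed above. The following is a minimal sketch: `check_schema` is a hypothetical helper, not part of the original notebook, and the sample frame is made up to show a mismatch.

```python
import pandas as pd

# Column names copied from the output above.
EXPECTED_COLUMNS = ['age', 'sex', 'chest_pain_type', 'resting_bp', 'cholestoral',
                    'fasting_blood_sugar', 'restecg', 'max_hr', 'exang', 'oldpeak',
                    'slope', 'num_major_vessels', 'thal']


def check_schema(df, expected):
    """Hypothetical helper: return (missing, unexpected) column names."""
    missing = [c for c in expected if c not in df.columns]
    unexpected = [c for c in df.columns if c not in expected]
    return missing, unexpected


# Made-up frame with most columns absent and one extra column.
sample = pd.DataFrame({'age': [66], 'sex': [1], 'thal': [2], 'extra': [0]})
missing, unexpected = check_schema(sample, EXPECTED_COLUMNS)
print("missing:", missing)
print("unexpected:", unexpected)
```

Failing fast on a schema mismatch is usually easier to debug than a shape error raised later by the model.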

In [4]:
inference_data.head()

index,age,sex,chest_pain_type,resting_bp,cholestoral,fasting_blood_sugar,restecg,max_hr,exang,oldpeak,slope,num_major_vessels,thal
0,66,1,0,120,302,0,0,151,0,0.4,1,0,2
1,52,1,0,112,230,0,1,160,0,0.0,2,1,2
2,63,0,1,140,195,0,1,179,0,0.0,2,2,2
3,46,1,2,150,231,0,1,147,0,3.6,1,0,2
4,63,1,0,130,254,0,0,147,0,1.4,1,1,3


### Apply Same Pre-processing

In [5]:
from sklearn import preprocessing

features_to_encode = ['thal', 'slope', 'chest_pain_type', 'restecg']

# Columns the model expects, including every one-hot category
encoded_df = pd.DataFrame(columns= ['age', 'sex', 'resting_bp', 'cholestoral', 'fasting_blood_sugar',
   'max_hr', 'exang', 'oldpeak', 'num_major_vessels', 'thal_0', 'thal_1',
   'thal_2', 'thal_3', 'slope_0', 'slope_1', 'slope_2',
   'chest_pain_type_0', 'chest_pain_type_1', 'chest_pain_type_2',
   'chest_pain_type_3', 'restecg_0', 'restecg_1', 'restecg_2'])
placeholder_df = pd.DataFrame()

# One-hot encode the specified categorical features with get_dummies
for f in features_to_encode:
    if f in inference_data.columns:
        encoded = pd.get_dummies(inference_data[f])
        encoded = encoded.add_prefix(f + '_')
        placeholder_df = pd.concat([placeholder_df, encoded], axis=1)
    else:
        print(f'Feature not found: {f}')

# Copy each expected column from the raw or the encoded data. Categories that do
# not occur in the inference data stay empty, which keeps the column count fixed
# and prevents a dimension mismatch at prediction time.
for feature in encoded_df.columns:
    if feature in inference_data.columns:
        encoded_df[feature] = inference_data[feature]
    if feature in placeholder_df.columns:
        encoded_df[feature] = placeholder_df[feature]

# Fill the columns that had no matching category with 0
encoded_df = encoded_df.fillna(0)

# Min-max normalization.
# NOTE: fit_transform fits a new scaler on the inference batch, so the scaling
# ranges come from this data rather than from the training data. If a scaler
# fitted during training is available, prefer calling its transform() here.
val = encoded_df.values
min_max_normalizer = preprocessing.MinMaxScaler()
norm_val = min_max_normalizer.fit_transform(val)
df2 = pd.DataFrame(norm_val)

processed_inference_data = df2.copy()
processed_inference_data



index,0,1,2,3,4,5,6,7,8,9,...,13,14,15,16,17,18,19,20,21,22
0,1.0,1.0,0.464286,0.814607,0.0,0.457447,0.0,0.111111,0.0,0.0,...,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0
1,0.621622,1.0,0.321429,0.410112,0.0,0.553191,0.0,0.0,0.5,0.0,...,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0
2,0.918919,0.0,0.821429,0.213483,0.0,0.755319,0.0,0.0,1.0,0.0,...,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0
3,0.459459,1.0,1.0,0.41573,0.0,0.414894,0.0,1.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0
4,0.918919,1.0,0.642857,0.544944,0.0,0.414894,0.0,0.388889,0.5,0.0,...,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0
5,0.324324,1.0,0.642857,0.320225,0.0,0.638298,0.0,0.555556,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0
6,0.27027,0.0,0.0,0.235955,0.0,0.755319,0.0,0.0,0.0,0.0,...,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0
7,0.513514,1.0,0.642857,0.55618,1.0,0.446809,1.0,0.0,1.0,0.0,...,0.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0
8,0.756757,1.0,0.285714,1.0,0.0,0.37234,1.0,0.833333,0.5,0.0,...,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0
9,0.486486,1.0,0.785714,0.561798,0.0,0.510638,0.0,0.0,0.0,0.0,...,0.0,0.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0
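The column-alignment step above can also be written more compactly. The sketch below assumes `DataFrame.reindex` as an alternative to the explicit loops. It reuses the column list from the notebook and the first two rows of the `inference_data.head()` output, and `encode_like_training` is a hypothetical helper name.

```python
import pandas as pd

# Column list copied from the pre-processing cell.
TRAINING_COLUMNS = ['age', 'sex', 'resting_bp', 'cholestoral', 'fasting_blood_sugar',
                    'max_hr', 'exang', 'oldpeak', 'num_major_vessels', 'thal_0', 'thal_1',
                    'thal_2', 'thal_3', 'slope_0', 'slope_1', 'slope_2',
                    'chest_pain_type_0', 'chest_pain_type_1', 'chest_pain_type_2',
                    'chest_pain_type_3', 'restecg_0', 'restecg_1', 'restecg_2']
FEATURES_TO_ENCODE = ['thal', 'slope', 'chest_pain_type', 'restecg']


def encode_like_training(df):
    """Hypothetical helper: one-hot encode, then align to the training columns."""
    encoded = pd.get_dummies(df, columns=FEATURES_TO_ENCODE, dtype=float)
    # reindex adds any category column missing from this batch (filled with 0),
    # drops columns the model never saw, and restores the training order.
    return encoded.reindex(columns=TRAINING_COLUMNS, fill_value=0)


# First two rows of the inference data shown earlier.
sample = pd.DataFrame({
    'age': [66, 52], 'sex': [1, 1], 'chest_pain_type': [0, 0],
    'resting_bp': [120, 112], 'cholestoral': [302, 230],
    'fasting_blood_sugar': [0, 0], 'restecg': [0, 1], 'max_hr': [151, 160],
    'exang': [0, 0], 'oldpeak': [0.4, 0.0], 'slope': [1, 2],
    'num_major_vessels': [0, 1], 'thal': [2, 2],
})
aligned = encode_like_training(sample)
print(aligned.shape)
```

Both sample rows have `thal` equal to 2, so `thal_0`, `thal_1`, and `thal_3` exist only because `reindex` created them.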


### Load Saved Model

In [6]:
import joblib
model = joblib.load('../aditya_model1_adaboost.joblib')
model
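The same `joblib` mechanism could persist a fitted scaler alongside the model, so that inference reuses the training-time scaling ranges instead of re-fitting `MinMaxScaler` on the inference batch. This is a minimal sketch under stated assumptions: the training values and the file name `min_max_scaler.joblib` are hypothetical, and the original notebook does not save a scaler.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical training values for two features.
train_values = np.array([[29.0, 94.0], [77.0, 200.0], [54.0, 130.0]])

# Training time: fit the scaler once and save it.
scaler = MinMaxScaler().fit(train_values)

with tempfile.TemporaryDirectory() as tmp_dir:
    scaler_path = os.path.join(tmp_dir, "min_max_scaler.joblib")  # hypothetical file name
    joblib.dump(scaler, scaler_path)

    # Inference time: load the fitted scaler instead of fitting a new one.
    loaded_scaler = joblib.load(scaler_path)

# transform() uses the minimum and maximum learned from the training values.
new_values = np.array([[53.0, 147.0]])
scaled = loaded_scaler.transform(new_values)
print(scaled)  # [[0.5 0.5]]
```

Because `transform()` does not re-fit, a single inference row is scaled against the training range rather than against itself.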

### Prediction on Inference Data

In [7]:
model.predict(processed_inference_data)

array([1., 1., 1., 1., 0., 1., 1., 0., 0., 1., 0., 0., 0., 1., 0., 1., 1.,
       1., 1., 0.])

### Scoring Check on Predictions

In [8]:
from sklearn.metrics import accuracy_score
accuracy_score(labels, model.predict(processed_inference_data))

0.85
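Accuracy is the fraction of predictions that match the labels. With the 20 predictions shown above, a score of 0.85 corresponds to 17 correct predictions. The sketch below reproduces the calculation by hand and adds a confusion matrix, using small hypothetical label arrays rather than the notebook's data.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels and predictions, for illustration only.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# Accuracy by hand: share of positions where the prediction equals the label.
manual_accuracy = (y_true == y_pred).mean()
library_accuracy = accuracy_score(y_true, y_pred)
print(manual_accuracy, library_accuracy)  # 0.75 0.75

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
print(cm)
```

The confusion matrix separates false positives from false negatives, which a single accuracy value hides.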