# Health care: Data set on Heart attack possibility

<img src = "https://www.verdict.co.uk/wp-content/uploads/2019/11/gene-therapy-heart-attacks-1440x1080.jpg" width=400><br>

This database contains 76 attributes, but all published experiments refer to using a subset of 14 of them. In particular, the Cleveland database is the only one that has been used by ML researchers to date.
The "target" field refers to the presence of heart disease in the patient. It is integer-valued: 0 = no/less chance of heart attack and 1 = more chance of heart attack.

In [None]:
!pip install autoviz

# AutoViz
<img src="https://raw.githubusercontent.com/AutoViML/AutoViz/master/logo.png" width=200> <br>

AutoViz automatically visualizes a dataset of any size with a single line of code. Give it any input file (CSV, txt, or JSON) and it will produce the plots.

*Link* : https://github.com/AutoViML/AutoViz

In [None]:
import numpy as np
import pandas as pd
from autoviz.AutoViz_Class import AutoViz_Class

In [None]:
data = pd.read_csv('../input/health-care-data-set-on-heart-attack-possibility/heart.csv')
data.head()

In [None]:
data.shape
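
Since `target` is the field being predicted, it can help to check its class balance and look for missing values before plotting or modelling. The following is a minimal sketch, assuming a DataFrame with a binary `target` column. The `summarize_target` helper and the toy frame are hypothetical and only illustrate the idea; in this notebook you would pass `data` instead.

```python
import pandas as pd


def summarize_target(df: pd.DataFrame, target_col: str = "target") -> pd.DataFrame:
    """Return per-class counts and shares for the target column (hypothetical helper)."""
    counts = df[target_col].value_counts().sort_index()
    summary = pd.DataFrame({"count": counts, "share": counts / counts.sum()})
    summary.index.name = target_col
    return summary


# Toy stand-in frame for illustration only (NOT rows from heart.csv)
toy = pd.DataFrame({"age": [63, 37, 41, 56, 57], "target": [1, 1, 0, 0, 1]})

toy_summary = summarize_target(toy)
print(toy_summary)

# Number of missing values in each column
print(toy.isna().sum())
```

A strongly skewed `share` column would suggest looking at per-class precision and recall rather than accuracy alone.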

In [None]:
dft = AutoViz_Class().AutoViz(
    "../input/health-care-data-set-on-heart-attack-possibility/heart.csv",
    sep=",",
    depVar="target",
    dfte="pandasDF",
    verbose=1,
    lowess=False,
    chart_format="svg",
    max_rows_analyzed=350,
    max_cols_analyzed=15,
)

Thus, the AutoViz package enables you to get good generic visualizations on simple datasets without having to go through the struggle of coding each plot separately.
> **If there are any other such graphing libraries please do comment about them.**

# Classification Modelling

In [None]:
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.ensemble import RandomForestClassifier
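from sklearn.metrics import ConfusionMatrixDisplay, classification_report

# plot_confusion_matrix was removed in scikit-learn 1.2;
# ConfusionMatrixDisplay.from_estimator takes the same (estimator, X, y) arguments.
plot_confusion_matrix = ConfusionMatrixDisplay.from_estimator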

In [None]:
X_train, X_test, y_train, y_test = train_test_split(
    data.drop('target', axis=1), data.target, random_state=32, test_size=0.2
)

In [None]:
# Baseline forest with bootstrapping, so an out-of-bag score can be computed
rfc = RandomForestClassifier(n_jobs=-1, max_features='sqrt', n_estimators=50, oob_score=True, bootstrap=True)

# 'auto' was an alias of 'sqrt' for classifiers and was removed in scikit-learn 1.3
param_grid = {
    'n_estimators': [70, 100, 150],
    'max_features': ['sqrt', 'log2'],
    'random_state': [0, 32, 48],
}

CV_rfc = GridSearchCV(estimator=rfc, param_grid=param_grid, cv=6)

In [None]:
%time CV_rfc.fit(X_train,y_train)
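
Once the search is fitted, `GridSearchCV` exposes the winning parameter combination and its mean cross-validated score through `best_params_` and `best_score_`. The following is a minimal sketch on synthetic data from `make_classification` (a stand-in, not the heart dataset) with a smaller, hypothetical grid so that it runs quickly. The `demo_` names are illustrative; in this notebook the same attributes would be read from `CV_rfc`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data; 13 features is an assumption (the 14-attribute subset minus target)
X_demo, y_demo = make_classification(n_samples=200, n_features=13, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=32)

# Deliberately small grid, for illustration only
demo_grid = {"n_estimators": [10, 20], "max_features": ["sqrt", "log2"]}
demo_search = GridSearchCV(RandomForestClassifier(random_state=0), demo_grid, cv=3)
demo_search.fit(X_tr, y_tr)

print(demo_search.best_params_)       # best parameter combination found
print(demo_search.best_score_)        # mean cross-validated accuracy of that combination
print(demo_search.score(X_te, y_te))  # accuracy of the refitted best model on held-out data
```

Because `refit=True` by default, calling `predict` or `score` on the search object uses the best estimator retrained on the full training split.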

In [None]:
%time preds = CV_rfc.predict(X_test)

In [None]:
plot_confusion_matrix(CV_rfc, X_test, y_test)

In [None]:
print(classification_report(y_test,preds))
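
The precision and recall values in the report can be derived directly from the confusion matrix counts. The following is a minimal sketch with hand-made toy labels (not model output from this notebook) that computes both for the positive class and compares them against scikit-learn's own functions.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Toy labels for illustration only
y_true_toy = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred_toy = [0, 1, 1, 1, 0, 0, 1, 0]

# For binary labels, ravel() yields the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true_toy, y_pred_toy).ravel()

precision_manual = tp / (tp + fp)  # share of predicted positives that are correct
recall_manual = tp / (tp + fn)     # share of actual positives that were found

print(precision_manual, precision_score(y_true_toy, y_pred_toy))
print(recall_manual, recall_score(y_true_toy, y_pred_toy))
```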

## Experimenting Without Bootstrapping

In [None]:
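# bootstrap defaults to True, so it must be disabled explicitly;
# oob_score is dropped because out-of-bag estimation requires bootstrapping
rfc2 = RandomForestClassifier(n_jobs=-1, max_features='sqrt', n_estimators=50, bootstrap=False)

# 'auto' was an alias of 'sqrt' for classifiers and was removed in scikit-learn 1.3
param_grid = {
    'n_estimators': [70, 100, 150],
    'max_features': ['sqrt', 'log2'],
    'random_state': [0, 32, 48],
}

CV_rfc2 = GridSearchCV(estimator=rfc2, param_grid=param_grid, cv=6)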
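
Out-of-bag scoring evaluates each tree on the rows left out of its bootstrap resample, so scikit-learn only allows `oob_score=True` together with `bootstrap=True`. The following is a minimal sketch on synthetic data (a stand-in, not the heart dataset) that shows both settings. The variable names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data for illustration only
X_demo, y_demo = make_classification(n_samples=200, n_features=13, random_state=0)

# With bootstrapping: each tree trains on a resample, so an OOB estimate exists
with_boot = RandomForestClassifier(n_estimators=50, bootstrap=True, oob_score=True, random_state=0)
with_boot.fit(X_demo, y_demo)
print("OOB accuracy:", with_boot.oob_score_)

# Without bootstrapping: every tree sees all rows, so there are no OOB samples
without_boot = RandomForestClassifier(n_estimators=50, bootstrap=False, random_state=0)
without_boot.fit(X_demo, y_demo)
print("has oob_score_:", hasattr(without_boot, "oob_score_"))

# Requesting OOB scoring without bootstrapping is rejected when fit is called
try:
    RandomForestClassifier(bootstrap=False, oob_score=True).fit(X_demo, y_demo)
    oob_error = None
except ValueError as exc:
    oob_error = str(exc)
print(oob_error)
```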

In [None]:
%time CV_rfc2.fit(X_train,y_train)

In [None]:
%time preds2 = CV_rfc2.predict(X_test)

In [None]:
plot_confusion_matrix(CV_rfc2, X_test, y_test)
print(classification_report(y_test,preds2))