**Context**


This database contains 76 attributes, but all published experiments refer to using a subset of 14 of them. In particular, the Cleveland database is the only one that has been used by ML researchers to this date. The "goal" field refers to the presence of heart disease in the patient. It is integer valued from 0 (no presence) to 4.

Attribute Information: 
 1. age 
 2. sex 
 3. chest pain type (4 values) 
 4. resting blood pressure 
 5. serum cholesterol in mg/dl 
 6. fasting blood sugar > 120 mg/dl
 7. resting electrocardiographic results (values 0,1,2)
 8. maximum heart rate achieved 
 9. exercise induced angina 
 10. oldpeak = ST depression induced by exercise relative to rest 
 11. the slope of the peak exercise ST segment 
 12. number of major vessels (0-3) colored by fluoroscopy 
 13. thal: 3 = normal; 6 = fixed defect; 7 = reversible defect

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
df=pd.read_csv("../input/heart.csv")

In [None]:
df.head(5)

In [None]:
df.info()

In [None]:
df.describe()

In [None]:
f, ax = plt.subplots(1, 2, figsize=(15, 7))
f.suptitle("Heart disease?", fontsize=18.)
# sort_index() orders the counts as 0, 1 so that they line up with the "No", "Yes" labels
counts = df.target.value_counts().sort_index()
colors = (sns.color_palette()[0], sns.color_palette()[2])
counts.plot.bar(ax=ax[0], rot=0, color=colors).set(xticklabels=["No", "Yes"])
counts.plot.pie(labels=("No", "Yes"), autopct="%.2f%%", label="", fontsize=13., ax=ax[1],
                colors=colors, wedgeprops={"linewidth": 1.5, "edgecolor": "#F7F7F7"})
# texts[1] and texts[3] are the percentage labels drawn inside the wedges
ax[1].texts[1].set_color("#F7F7F7"), ax[1].texts[3].set_color("#F7F7F7")

In [None]:
fig, ax = plt.subplots(4,2, figsize=(16,16))
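# Density histogram with a KDE overlay for each column (distplot is deprecated since seaborn 0.11)
sns.histplot(df.age, bins=20, kde=True, stat="density", ax=ax[0,0])
sns.histplot(df.oldpeak, bins=20, kde=True, stat="density", ax=ax[0,1])
sns.histplot(df.trestbps, bins=20, kde=True, stat="density", ax=ax[1,0])
sns.histplot(df.chol, bins=20, kde=True, stat="density", ax=ax[1,1])
sns.histplot(df.ca, bins=20, kde=True, stat="density", ax=ax[2,0])
sns.histplot(df.thal, bins=20, kde=True, stat="density", ax=ax[2,1])
sns.histplot(df.thalach, bins=20, kde=True, stat="density", ax=ax[3,0])
sns.histplot(df.slope, bins=20, kde=True, stat="density", ax=ax[3,1])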
plt.show()

**Correlation between features**

Variables within a dataset can be related for lots of reasons. It can be useful in data analysis and modeling to better understand the relationships between variables. The statistical relationship between two variables is referred to as their correlation.

A correlation can be positive, meaning both variables move in the same direction, or negative, meaning that when one variable's value increases, the other variable's value decreases. Correlation can also be neutral or zero, meaning that the variables have no linear relationship.
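
To make the three cases concrete, here is a minimal sketch of the Pearson correlation coefficient, which is what `DataFrame.corr()` computes by default. The helper name `pearson` and the small lists are made up for illustration and are not taken from heart.csv.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variance sums; the 1/n factors cancel in the ratio
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up values for illustration only
x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # moves with x
falling = [10, 8, 6, 4, 2]   # moves against x
flat = [2, 1, 3, 1, 2]       # no linear trend in x

r_pos = pearson(x, rising)    # close to +1
r_neg = pearson(x, falling)   # close to -1
r_zero = pearson(x, flat)     # close to 0
print(r_pos, r_neg, r_zero)
```

Each cell of the heatmap below holds one such coefficient for a pair of columns.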

In [None]:
plt.figure(figsize=(16,12))
sns.heatmap(df.corr(),annot=True,cmap='YlGnBu',fmt='.2f',linewidths=2)

**Violin Plots**

A violin plot is a method of plotting numeric data. It is similar to a box plot, with a rotated kernel density plot added on each side, so it also shows the probability density of the data at different values (in the simplest case this could be a histogram).

A violin plot is more informative than a plain box plot. While a box plot only shows summary statistics such as the median and the interquartile range, the violin plot shows the full distribution of the data. The difference is particularly useful when the data distribution is multimodal (more than one peak). In this case a violin plot clearly shows the presence of the different peaks, their positions and their relative amplitudes. This information cannot be represented by a simple box plot, which only reports summary statistics. The inner part of a violin plot usually shows the mean (or median) and the interquartile range.
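
The point about multimodal data can be checked numerically. The sketch below builds a synthetic two-cluster sample (an assumption for illustration, not the heart data), computes the quartiles a box plot would draw, and then approximates the density shape with simple bin counts.

```python
import random
import statistics

random.seed(0)
# Synthetic bimodal sample: two clusters centred at 2.5 and 7.5
sample = ([random.gauss(2.5, 0.5) for _ in range(500)]
          + [random.gauss(7.5, 0.5) for _ in range(500)])

# What a box plot draws: the three quartiles
q1, median, q3 = statistics.quantiles(sample, n=4)

# What a violin plot adds: the shape of the distribution,
# approximated here by counts in ten unit-wide bins over [0, 10)
bins = [0] * 10
for value in sample:
    if 0 <= value < 10:
        bins[int(value)] += 1

# A bin counts as a peak if it is higher than both of its neighbours
peaks = [i for i in range(1, 9) if bins[i] > bins[i - 1] and bins[i] > bins[i + 1]]

print("quartiles:", q1, median, q3)
print("bin counts:", bins)
print("peak bins:", peaks)
```

The quartiles alone give no hint that the median falls in the nearly empty gap between the two clusters, whereas the bin counts show both peaks.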

In [None]:
fig,ax = plt.subplots(nrows=4, ncols=2, figsize=(18,18))
plt.suptitle('Violin Plots',fontsize=24)
sns.violinplot(x='cp', data=df, ax=ax[0,0], palette='Set3')
sns.violinplot(x='trestbps', data=df, ax=ax[0,1], palette='Set3')
sns.violinplot(x='chol', data=df, ax=ax[1,0], palette='Set3')
sns.violinplot(x='fbs', data=df, ax=ax[1,1], palette='Set3')
sns.violinplot(x='restecg', data=df, ax=ax[2,0], palette='Set3')
sns.violinplot(x='thalach', data=df, ax=ax[2,1], palette='Set3')
sns.violinplot(x='exang', data=df, ax=ax[3,0], palette='Set3')
sns.violinplot(x='age', data=df, ax=ax[3,1], palette='Set3')
plt.show()

# Predictive modelling 

In [None]:
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import GradientBoostingClassifier

X = df.iloc[:, :-1]
y = df.iloc[:, -1]


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

**Logistic Regression**


Logistic regression is the appropriate regression analysis to conduct when the dependent variable is binary. Like all regression analyses, it is a predictive analysis. It is used to describe data and to explain the relationship between one binary dependent variable and one or more nominal, ordinal, interval or ratio-level independent variables.
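
As a rough sketch of how such a model turns features into a binary prediction: a weighted sum of the features is passed through the sigmoid function to obtain a probability, which is then thresholded. The weights, bias and feature values below are hypothetical; a fitted model learns its coefficients from the training data.

```python
import math

def sigmoid(z):
    """Map any real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Probability of the positive class for a single sample."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(score)

# Hypothetical coefficients for two made-up features
weights = [0.8, -1.2]
bias = 0.1

p_high = predict_proba([2.0, 0.5], weights, bias)  # score = 0.1 + 1.6 - 0.6 = 1.1
p_low = predict_proba([0.5, 2.0], weights, bias)   # score = 0.1 + 0.4 - 2.4 = -1.9

# The usual decision rule thresholds the probability at 0.5
label_high = int(p_high >= 0.5)
label_low = int(p_low >= 0.5)
print(p_high, label_high, p_low, label_low)
```

A positive score gives a probability above 0.5 and therefore class 1; a negative score gives class 0.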

In [None]:
# Model (max_iter raised to give the default solver more iterations on the unscaled features)
LR = LogisticRegression(max_iter=1000)

# Fitting the model
LR.fit(X_train, y_train)

# Prediction
y_pred = LR.predict(X_test)

# Accuracy
print("Accuracy ", LR.score(X_test, y_test)*100)

# Plot the confusion matrix (rows: true labels, columns: predicted labels)
sns.set(font_scale=1.5)
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt='g')
plt.show()

**Decision Tree**

A decision tree builds regression or classification models in the form of a tree structure. It breaks a dataset down into smaller and smaller subsets while an associated decision tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes. A decision node (e.g., Outlook) has two or more branches (e.g., Sunny, Overcast and Rainy), each representing values of the attribute tested. A leaf node (e.g., Hours Played) represents a decision on the numerical target. The topmost decision node in the tree, which corresponds to the best predictor, is called the root node. Decision trees can handle both categorical and numerical data.
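
The way a single decision node is chosen can be sketched as follows: try every candidate threshold on a feature and keep the one that leaves the purest subsets. Gini impurity is used here because it is the default criterion of `DecisionTreeClassifier`; the helper names and the small age/label lists are made up for illustration.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best 'value <= threshold' split."""
    best = (None, float("inf"))
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if impurity < best[1]:
            best = (threshold, impurity)
    return best

# Made-up ages and labels, not taken from heart.csv
ages = [35, 42, 48, 51, 58, 63, 67]
disease = [0, 0, 0, 1, 1, 1, 1]

threshold, impurity = best_split(ages, disease)
print(threshold, impurity)
```

A full tree repeats this search over all features at every node and then recurses into the two resulting subsets.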

In [None]:
# Model
DT = DecisionTreeClassifier()

# Fitting the model
DT.fit(X_train, y_train)

# Prediction
y_pred = DT.predict(X_test)

# Accuracy
print("Accuracy ", DT.score(X_test, y_test)*100)

# Plot the confusion matrix (rows: true labels, columns: predicted labels)
sns.set(font_scale=1.5)
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt='g')
plt.show()

In [None]:
from sklearn import tree
import graphviz

In [None]:
#Plotting the graph
tree_graph = tree.export_graphviz(DT, out_file=None)
graphviz.Source(tree_graph)

# Partial dependence plot

While feature importance shows what variables most affect predictions, partial dependence plots show how a feature affects predictions.
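
The idea behind a partial dependence curve can be sketched without any library: force one feature to a fixed value in every row, average the model's predictions, and repeat for each value on a grid. In the sketch below `toy_model` is a hypothetical stand-in for a fitted model and the rows are made up; only the column names mirror the notebook.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is forced to each grid value in every row."""
    curve = []
    for value in grid:
        # Copy each row, overwrite only the feature of interest, keep the rest as observed
        modified = [{**row, feature: value} for row in rows]
        curve.append(sum(predict(r) for r in modified) / len(modified))
    return curve

# Hypothetical stand-in for a fitted model: a tiny hand-written decision rule
def toy_model(row):
    if row["age"] > 55:
        return 0.8
    return 0.6 if row["trestbps"] > 140 else 0.2

# Made-up rows for illustration
rows = [
    {"age": 45, "trestbps": 120},
    {"age": 60, "trestbps": 150},
    {"age": 52, "trestbps": 145},
    {"age": 66, "trestbps": 130},
]

age_grid = [40, 50, 60, 70]
curve_age = partial_dependence(toy_model, rows, "age", age_grid)
print(list(zip(age_grid, curve_age)))
```

The curve jumps between ages 50 and 60 because the toy rule splits at 55, and it is flat elsewhere.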

In [None]:
# Use exactly the columns the model was trained on (this excludes target and keeps the float column oldpeak)
feature_names = X.columns.tolist()

In [None]:
from matplotlib import pyplot as plt
from pdpbox import pdp, get_dataset, info_plots

# Create the data that we will plot
pdp_goals = pdp.pdp_isolate(model=DT, dataset=df, model_features=feature_names, feature='age')

# plot it
pdp.pdp_plot(pdp_goals, 'age')
plt.show()

In [None]:
pdp_dist = pdp.pdp_isolate(model=DT, dataset=df, model_features=feature_names, feature='trestbps')

pdp.pdp_plot(pdp_dist, 'trestbps')
plt.show()

This graph may seem too simple to represent reality, but that is because the model itself is simple. The steps in the curve correspond to the split thresholds in the decision tree above, so the plot represents exactly the model's structure.

# 2D Partial Dependence Plots

2D partial dependence plots are very useful for understanding interactions between features.

In [None]:
features_to_plot = ['age', 'trestbps']
inter1  =  pdp.pdp_interact(model=DT, dataset=df, model_features=feature_names, features=features_to_plot)

pdp.pdp_interact_plot(pdp_interact_out=inter1, feature_names=features_to_plot, plot_type='contour')
plt.show()

This graph shows predictions for any combination of age and resting blood pressure. For example, we see the highest predictions when age is in the late 50s.
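
The same averaging idea extends to two features at once, which is what makes interactions visible. Below is a minimal sketch, assuming a hypothetical `toy_model` in which blood pressure only matters for older patients; the rows and grid values are made up.

```python
def partial_dependence_2d(predict, rows, feat_a, grid_a, feat_b, grid_b):
    """Grid of average predictions with two features forced to each pair of values."""
    surface = []
    for a in grid_a:
        line = []
        for b in grid_b:
            # Overwrite both features in every row, keep all other columns as observed
            modified = [{**row, feat_a: a, feat_b: b} for row in rows]
            line.append(sum(predict(r) for r in modified) / len(modified))
        surface.append(line)
    return surface

# Hypothetical model with an interaction between age and trestbps
def toy_model(row):
    if row["age"] > 55 and row["trestbps"] > 140:
        return 0.9
    return 0.3 + 0.1 * row["exang"]

# Made-up rows for illustration
rows = [
    {"age": 45, "trestbps": 120, "exang": 0},
    {"age": 60, "trestbps": 150, "exang": 1},
]

surface = partial_dependence_2d(toy_model, rows, "age", [50, 60], "trestbps", [130, 150])
for line in surface:
    print(line)
```

Only the cell where both age and blood pressure are high stands out, a pattern that two separate one-dimensional curves could not show.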

**Gradient Boosting**

Gradient boosting is a machine learning technique for regression and classification problems, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. It builds the model in a stage-wise fashion like other boosting methods do, and it generalizes them by allowing optimization of an arbitrary differentiable loss function.
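
The stage-wise idea can be sketched in a few lines for the simplest case, squared loss on a single feature, where the negative gradient is just the residual. This is a simplified illustration and not what `GradientBoostingClassifier` does internally (it boosts deeper trees on a classification loss); the function names and data are made up.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump that minimises squared error on the residuals."""
    best = None
    distinct = sorted(set(xs))
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def gradient_boost(xs, ys, n_stages=50, learning_rate=0.1):
    """Stage-wise boosting: each weak learner is fitted to the current residuals."""
    base = sum(ys) / len(ys)              # stage 0: a constant prediction
    stumps = []
    predictions = [base] * len(xs)
    for _ in range(n_stages):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Take a small step in the direction suggested by the new weak learner
        predictions = [p + learning_rate * stump(x) for x, p in zip(xs, predictions)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Made-up data with a jump between x = 4 and x = 5
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]

mean_y = sum(ys) / len(ys)
mse_base = mse(lambda x: mean_y, xs, ys)
mse_boost = mse(gradient_boost(xs, ys), xs, ys)
print(mse_base, mse_boost)
```

Each stump on its own is a weak model, but the sum of many small corrections fits the training data far better than the constant baseline.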

In [None]:
# Model
model = GradientBoostingClassifier()

# Fitting the model
model.fit(X_train, y_train)

# Prediction
y_pred = model.predict(X_test)

# Accuracy
print("Accuracy ", model.score(X_test, y_test)*100)

# Plot the confusion matrix (rows: true labels, columns: predicted labels)
sns.set(font_scale=1.5)
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt='g')
plt.show()
