### Data

This dataset contains 14 attributes collected from physical testing of patients: blood samples are taken and each patient also performs a brief exercise test. The "goal" field, stored in the target column, indicates the presence of heart disease as an integer (0 for no presence, 1 for presence). Confirming heart disease with certainty can be quite an invasive process, so a model that accurately predicts the likelihood of heart disease could help avoid expensive and invasive procedures.

**Content**

Attribute Information:

* age
* sex
* chest pain type (4 values)
* resting blood pressure
* serum cholesterol in mg/dl
* fasting blood sugar > 120 mg/dl
* resting electrocardiographic results (values 0,1,2)
* maximum heart rate achieved
* exercise induced angina
* oldpeak = ST depression induced by exercise relative to rest
* the slope of the peak exercise ST segment
* number of major vessels (0-3) colored by fluoroscopy
* thal: 3 = normal; 6 = fixed defect; 7 = reversible defect
* target: 0 for no presence of heart disease, 1 for presence of heart disease

Original Source: https://archive.ics.uci.edu/ml/datasets/Heart+Disease
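The attribute list can be kept next to the data as a small lookup table, which also makes it easy to check that a loaded file has the expected columns. The sketch below is only an illustration: the short column names are an assumption based on the commonly distributed heart.csv file (only age, trestbps, chol, thalach, oldpeak and target appear later in this notebook), and `missing_columns` is a hypothetical helper.

```python
# Assumed short column names mapped to the attribute descriptions above.
# Verify the keys against df.columns before relying on them.
COLUMN_DESCRIPTIONS = {
    "age": "age",
    "sex": "sex",
    "cp": "chest pain type (4 values)",
    "trestbps": "resting blood pressure",
    "chol": "serum cholesterol in mg/dl",
    "fbs": "fasting blood sugar > 120 mg/dl",
    "restecg": "resting electrocardiographic results (values 0, 1, 2)",
    "thalach": "maximum heart rate achieved",
    "exang": "exercise induced angina",
    "oldpeak": "ST depression induced by exercise relative to rest",
    "slope": "slope of the peak exercise ST segment",
    "ca": "number of major vessels (0-3) colored by fluoroscopy",
    "thal": "3 = normal; 6 = fixed defect; 7 = reversible defect",
    "target": "0 = no heart disease, 1 = heart disease",
}


def missing_columns(columns):
    """Return the expected column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in COLUMN_DESCRIPTIONS if name not in present]


# Example with a deliberately incomplete column list.
print(missing_columns(["age", "sex", "target"]))
```

Once the data is loaded, `missing_columns(df.columns)` should return an empty list if the file matches this assumed layout.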

In [None]:
# import the required libraries 
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

In [None]:
# read the data 
df = pd.read_csv("../input/heart-disease-uci/heart.csv")

In [None]:
# display top 5 rows
df.head()

In [None]:
# main info 
df.info()

In [None]:
# summary statistics
df.describe()

In [None]:
# binary classification?
df['target'].unique()

### Exploratory Data Analysis and Visualization

In [None]:
plt.figure(figsize = (7, 3), dpi = 100)
sns.countplot(data=df, x='target')
plt.title("Count of each class in the target variable")
plt.show()

In [None]:
# scatter and kde plots
sns.pairplot(df[['age','trestbps', 'chol','thalach','oldpeak','target']], hue='target')
plt.show()

In [None]:
# correlation heatmap
plt.figure(figsize=(14,10))
sns.heatmap(df.corr(), cmap='viridis', annot=True, vmin=-1, vmax=1, fmt=".2f")
plt.show()

### Building the model 

In [None]:
# Separate the features from the labels into 2 objects, X and y
X = df.drop('target',axis=1)
y = df['target']

In [None]:
# Perform a train test split on the data, with the test size of 10% and a random_state of 101
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=101)
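With a test size of only 10%, the class proportions in the test set can drift from those of the full data. One possible variation, not what this notebook does, is to pass `stratify` to `train_test_split`. The sketch below uses synthetic arrays (`X_demo`, `y_demo` are hypothetical stand-ins for X and y) to show the effect.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the notebook's X and y (hypothetical data).
rng = np.random.default_rng(101)
X_demo = rng.normal(size=(300, 5))
y_demo = np.array([0] * 140 + [1] * 160)

# stratify=y_demo keeps the class ratio (almost) identical in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.1, random_state=101, stratify=y_demo
)

print("overall positive rate:", y_demo.mean())
print("test positive rate:   ", y_te.mean())
```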

In [None]:
# Create a StandardScaler, fit it on the training features only, then apply the same transform to the test features
scaler = StandardScaler()

scaled_X_train = scaler.fit_transform(X_train)
scaled_X_test = scaler.transform(X_test)
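Fitting the scaler on the training set only means the test set is transformed with the training mean and standard deviation, so no information from the test set leaks into preprocessing. A minimal sketch on synthetic data (the `*_demo` names are hypothetical) shows the consequence: the scaled training columns have mean 0 and standard deviation 1, while the scaled test columns are only approximately centered.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for X_train and X_test (hypothetical data).
rng = np.random.default_rng(0)
train_demo = rng.normal(loc=50.0, scale=10.0, size=(200, 3))
test_demo = rng.normal(loc=50.0, scale=10.0, size=(20, 3))

scaler_demo = StandardScaler()
scaled_train_demo = scaler_demo.fit_transform(train_demo)  # learns mean/std here
scaled_test_demo = scaler_demo.transform(test_demo)        # reuses training mean/std

print("train mean:", scaled_train_demo.mean(axis=0).round(6))
print("train std: ", scaled_train_demo.std(axis=0).round(6))
print("test mean: ", scaled_test_demo.mean(axis=0).round(3))
```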

In [None]:
#Logistic Regression Model
from sklearn.linear_model import LogisticRegressionCV

log_model = LogisticRegressionCV()
log_model.fit(scaled_X_train,y_train)

In [None]:
# report back the C parameter 
log_model.C_
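`C_` holds the regularization strength chosen by cross-validation, and it is picked from the grid stored in `Cs_`. As a rough illustration on synthetic data (`X_demo`, `y_demo` and `demo_model` are hypothetical names, and `max_iter=1000` is set here only to avoid convergence warnings), the two can be inspected side by side:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV

# Synthetic stand-in for scaled_X_train / y_train (hypothetical data).
X_demo, y_demo = make_classification(
    n_samples=200, n_features=5, n_informative=3, n_redundant=0, random_state=101
)

demo_model = LogisticRegressionCV(max_iter=1000)
demo_model.fit(X_demo, y_demo)

# Cs_ is the grid of candidate values; C_ is the value selected from that grid.
print("candidate C values:", demo_model.Cs_)
print("selected C:        ", demo_model.C_)
```

With the default `Cs=10`, the grid contains ten values spaced logarithmically between 1e-4 and 1e4.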

In [None]:
# list the estimator's constructor parameters (the cross-validated C itself is in log_model.C_)
log_model.get_params()

In [None]:
# Creating a visualization of the coefficients by using a barplot of their values
coefs = pd.Series(index=X.columns,data=log_model.coef_[0])
coefs = coefs.sort_values()

plt.figure(figsize=(10,5), dpi = 100)
sns.barplot(x=coefs.index,y=coefs.values)
plt.title('Visualization of the model coefficients')
plt.show()
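Because the features were standardized, each coefficient is the change in log-odds of the positive class per one standard deviation increase in that feature. Exponentiating gives odds ratios, which can be easier to read. The sketch below uses made-up coefficient values (`demo_coefs` is hypothetical, standing in for the `coefs` series above).

```python
import numpy as np
import pandas as pd

# Hypothetical coefficients, standing in for log_model.coef_[0].
demo_coefs = pd.Series({"feature_a": -0.7, "feature_b": 0.0, "feature_c": 0.4})

# exp(coefficient) = multiplicative change in the odds per one-SD increase.
odds_ratios = np.exp(demo_coefs).sort_values()
print(odds_ratios)
```

An odds ratio below 1 means the feature lowers the odds of the positive class, 1 means no effect, and above 1 means it raises them.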

### Model Performance Evaluation

In [None]:
#import evaluation metrics 
from sklearn.metrics import confusion_matrix, classification_report, ConfusionMatrixDisplay

In [None]:
#predictions
y_pred = log_model.predict(scaled_X_test)

#confusion matrix
confusion_matrix(y_test,y_pred)

In [None]:
#plot confusion matrix
# plot_confusion_matrix was removed in scikit-learn 1.2; ConfusionMatrixDisplay replaces it
ConfusionMatrixDisplay.from_estimator(log_model, scaled_X_test, y_test);

In [None]:
#classification report
print(classification_report(y_test,y_pred))
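For a screening-style problem it can help to read sensitivity and specificity directly off the confusion matrix. The following sketch uses small made-up label arrays (`y_true_demo` and `y_pred_demo` are hypothetical, not the notebook's results) to show the arithmetic.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels and predictions, for illustration only.
y_true_demo = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
y_pred_demo = np.array([0, 0, 0, 1, 1, 1, 1, 1, 0, 1])

# For binary labels, ravel() returns the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

sensitivity = tp / (tp + fn)  # recall for class 1: diseased patients caught
specificity = tn / (tn + fp)  # recall for class 0: healthy patients cleared

print(f"sensitivity = {sensitivity:.3f}, specificity = {specificity:.3f}")
```

Sensitivity here is the same number the classification report lists as recall for class 1, and specificity is the recall for class 0.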

In [None]:
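#Performance Curves 
# plot_precision_recall_curve and plot_roc_curve were removed in scikit-learn 1.2
from sklearn.metrics import PrecisionRecallDisplay, RocCurveDisplay

PrecisionRecallDisplay.from_estimator(log_model, scaled_X_test, y_test);

In [None]:
RocCurveDisplay.from_estimator(log_model, scaled_X_test, y_test);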
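The curves can also be summarized as single numbers: area under the ROC curve and average precision, both computed from the predicted probability of the positive class. This is a minimal sketch on synthetic data; `X_demo`, `y_demo` and `demo_model` are hypothetical, and a plain `LogisticRegression` is used as a stand-in for the fitted model above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the scaled features and labels (hypothetical data).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_redundant=0, random_state=101
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.1, random_state=101
)

demo_model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Both metrics need scores, not hard 0/1 predictions: use P(class = 1).
proba_pos = demo_model.predict_proba(X_te)[:, 1]
roc_auc = roc_auc_score(y_te, proba_pos)
avg_precision = average_precision_score(y_te, proba_pos)

print(f"ROC AUC = {roc_auc:.3f}, average precision = {avg_precision:.3f}")
```

In the notebook itself the equivalent call would use `log_model.predict_proba(scaled_X_test)[:, 1]` together with `y_test`.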