**Heart disease** describes a range of conditions that affect your heart. Diseases under the heart disease umbrella include blood vessel diseases, such as coronary artery disease; heart rhythm problems (arrhythmias); and heart defects you're born with (congenital heart defects), among others.
Facts related to Heart Disease:
* One in every four deaths in the U.S. is related to heart disease.
* Coronary heart disease, arrhythmia, and myocardial infarction are some examples of heart disease.
* Heart disease might be treated with medication or surgery.
* Quitting smoking and exercising regularly can help prevent heart disease.

There are many types of heart disease that affect different parts of the organ and occur in different ways.

1. Congenital heart disease

This is a general term for some deformities of the heart that have been present since birth. Examples include:

Septal defects: There is a hole in the wall (septum) that separates the left and right chambers of the heart.
Obstruction defects: The flow of blood through various chambers of the heart is partially or totally blocked.
Cyanotic heart disease: A defect in the heart causes a shortage of oxygen around the body.

2. Arrhythmia

Arrhythmia is an irregular heartbeat. There are several ways in which a heartbeat can lose its regular rhythm. These include:
tachycardia, when the heart beats too fast
bradycardia, when the heart beats too slowly
premature ventricular contractions, or additional, abnormal beats
fibrillation, when the heartbeat is irregular
Arrhythmias occur when the electrical impulses in the heart that coordinate the heartbeat do not work properly. These make the heart beat in a way it should not, whether that be too fast, too slowly, or too erratically.
Irregular heartbeats are common, and all people experience them. They feel like a fluttering or a racing heart. However, when they change too much or occur because of a damaged or weak heart, they need to be taken more seriously and treated.

3. Coronary artery disease

The coronary arteries supply the heart muscle with nutrients and oxygen by circulating blood.

Coronary arteries can become diseased or damaged, usually because of plaque deposits that contain cholesterol. Plaque buildup narrows the coronary arteries, and this causes the heart to receive less oxygen and nutrients.

4. Dilated cardiomyopathy

The heart chambers become dilated as a result of heart muscle weakness and cannot pump blood properly. The most common reason is that not enough oxygen reaches the heart muscle, due to coronary artery disease. This usually affects the left ventricle.

5. Myocardial infarction

This is also known as a heart attack, cardiac infarction, and coronary thrombosis. An interrupted blood flow damages or destroys part of the heart muscle. This is usually caused by a blood clot that develops in one of the coronary arteries and can also occur if an artery suddenly narrows or spasms.

6. Heart failure

Also known as congestive heart failure, heart failure occurs when the heart does not pump blood around the body efficiently.

The left or right side of the heart might be affected. Rarely, both sides are. Coronary artery disease or high blood pressure can, over time, leave the heart too stiff or weak to fill and pump properly.

7. Hypertrophic cardiomyopathy

This is a genetic disorder in which the wall of the left ventricle thickens, making it harder for blood to be pumped out of the heart. This is the leading cause of sudden death in athletes. A parent with hypertrophic cardiomyopathy has a 50 percent chance of passing the disorder on to their children.

8. Mitral regurgitation

Also known as mitral valve regurgitation, mitral insufficiency, or mitral incompetence, this occurs when the mitral valve in the heart does not close tightly enough. This allows blood to flow back into the heart when it should leave. As a result, blood cannot move through the heart or the body efficiently.

People with this type of heart condition often feel tired and out of breath.

9. Mitral valve prolapse

The valve between the left atrium and left ventricle does not fully close; instead, it bulges upward, back into the atrium. In most people, the condition is not life-threatening, and no treatment is required. Some people, especially if the condition is marked by mitral regurgitation, may require treatment.

10. Pulmonary stenosis

It becomes hard for the heart to pump blood from the right ventricle into the pulmonary artery because the pulmonary valve is too tight. The right ventricle has to work harder to overcome the obstruction. An infant with severe stenosis can turn blue. Older children will generally have no symptoms.

Treatment is needed if the pressure in the right ventricle is too high, and a balloon valvuloplasty or open-heart surgery may be performed to clear an obstruction.

Let's get our hands dirty and implement a binary classifier that classifies patients with a high risk of having a heart attack.

# Importing Necessary Libraries

In [None]:
import warnings
warnings.filterwarnings("ignore")
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
import pickle
import os
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import MinMaxScaler
from sklearn.pipeline import Pipeline
from sklearn.metrics import roc_auc_score, accuracy_score, confusion_matrix, classification_report
from sklearn.model_selection import GridSearchCV

In [None]:
data = pd.read_csv('../input/heart-disease-uci/heart.csv')
data.head()

# Exploratory Data Analysis [EDA]

In [None]:
def data_info(data):
    """Print the shape of the DataFrame and the number of numerical and categorical columns."""
    print('1) Number of columns:', data.shape[1])
    print('2) Number of rows:', data.shape[0])
    print('3) Total number of data points:', data.size)
    # Columns with dtype 'O' (object) are treated as categorical, all others as numerical
    numerical_features = [f for f in data.columns if data[f].dtypes != 'O']
    print('4) Count of numerical features:', len(numerical_features))
    cat_features = [c for c in data.columns if data[c].dtypes == 'O']
    print('5) Count of categorical features:', len(cat_features))

data_info(data)

In [None]:
# Missing value identification
data.isna().sum()

In [None]:
sns.set()
sns.scatterplot(x="age",y="cp",hue="target",data=data)

In [None]:
# Class balance of the target variable
target_counts = data['target'].value_counts()
print(target_counts)
sns.countplot(x="target",data=data)

In [None]:
# Correlation heatmap of all columns
corrmap = data.corr()
plt.figure(figsize=(30,20))
g = sns.heatmap(corrmap, annot=True, cmap="RdYlGn")

# Target feature splitting 

In [None]:
# Split the dataset into the dependent feature ('target') and the independent features.
# 'target' is the variable the model will learn to predict.
x = data.drop('target', axis=1)
y = data['target']
x.shape, y.shape

# Standard Scaling

In [None]:
from sklearn.model_selection import train_test_split

# Standard-scaling the independent features
sc = StandardScaler()
X = sc.fit_transform(x)
# Using 80% of the data for training and 20% (1/5 of the total) for testing;
# random_state is fixed here only so that the split is reproducible
x_train,x_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=42)
x_train.shape,x_test.shape,y_train.shape,y_test.shape
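
The cell above fits the scaler on the full dataset before splitting, so the test rows influence the mean and standard deviation used for scaling. A common alternative is to split first and fit the scaler on the training rows only. The following is a minimal pure-Python sketch of that idea on a single made-up feature; `fit_scaler`, `transform`, and the toy values are hypothetical and are not part of this notebook's pipeline.

```python
import random
import statistics

def fit_scaler(values):
    """Return the mean and population standard deviation of the training values."""
    return statistics.fmean(values), statistics.pstdev(values)

def transform(values, mean, std):
    """Standardize values with statistics learned elsewhere: z = (x - mean) / std."""
    return [(v - mean) / std for v in values]

# Toy feature column (made-up readings, for illustration only)
feature = [120, 130, 140, 125, 150, 160, 135, 145, 128, 152]

# Shuffle with a fixed seed, then hold out 20% of the rows for testing
rng = random.Random(0)
rows = feature[:]
rng.shuffle(rows)
n_test = int(len(rows) * 0.2)
test_rows, train_rows = rows[:n_test], rows[n_test:]

# Fit on the training rows only, then reuse the same statistics for the test rows
mean, std = fit_scaler(train_rows)
train_scaled = transform(train_rows, mean, std)
test_scaled = transform(test_rows, mean, std)

print(len(train_scaled), len(test_scaled))
```

With scikit-learn, the same effect is usually obtained by calling `fit_transform` on the training split and `transform` on the test split, or by placing the scaler inside a `Pipeline`.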

# Model Building
Building a Logistic Regression model for the binary classification task

In [None]:
# Logistic Regression was chosen after trying different techniques and algorithms.
model = LogisticRegression()
model.fit(x_train,y_train)
pred = model.predict(x_test)
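
Logistic regression scores each patient with a weighted sum of the features and passes that score through the logistic (sigmoid) function to obtain a probability; for a binary problem, `predict` returns class 1 when that probability exceeds 0.5. A minimal sketch with made-up numbers follows. The `weights`, `bias`, and `row` values are hypothetical and are not the fitted model's coefficients.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Probability of the positive class for one row of (scaled) features."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(score)

# Hypothetical coefficients and one hypothetical row of standardized features
weights = [0.8, -0.5, 0.3]
bias = 0.1
row = [1.2, 0.4, -0.7]

p = predict_proba(row, weights, bias)
label = int(p > 0.5)  # class 1 when the probability exceeds 0.5
print(round(p, 3), label)
```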

# Evaluating the model

In [None]:
from sklearn.model_selection import cross_val_score
CV_Result = cross_val_score(model,x_train,y_train, cv=4, n_jobs=-1)
print(CV_Result)
print(CV_Result.mean())
print(CV_Result.std())
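
`cross_val_score` with `cv=4` splits the training data into four folds and scores the model four times, each time holding out a different fold. For a classifier, scikit-learn stratifies these folds by class; the sketch below leaves stratification out and only shows how plain k-fold indices rotate. `k_fold_indices` is a hypothetical helper, not a scikit-learn function.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for plain, unshuffled k-fold splitting."""
    # Spread the remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, val_idx
        start += size

folds = list(k_fold_indices(10, 4))
for train_idx, val_idx in folds:
    print('validate on', val_idx, 'train on', len(train_idx), 'rows')
```

The mean of the per-fold scores estimates how well the model generalizes, and the standard deviation indicates how sensitive that estimate is to the particular split.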

In [None]:
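# Defining a function that prints accuracy, the confusion matrix, and ROC AUC
def metrics_info(y_true, y_pred, y_score):
    print('accuracy score:', accuracy_score(y_true, y_pred))
    print('confusion matrix:\n', confusion_matrix(y_true, y_pred))
    # ROC AUC is computed from predicted probabilities rather than hard labels
    print('ROC AUC score:', roc_auc_score(y_true, y_score))

metrics_info(y_test, pred, model.predict_proba(x_test)[:, 1])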
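
To read the printed numbers: for binary labels, scikit-learn's `confusion_matrix` puts the true classes on the rows and the predicted classes on the columns, so the layout is `[[TN, FP], [FN, TP]]`. The pure-Python sketch below, on made-up labels (`y_true_demo` and `y_pred_demo` are hypothetical), shows how accuracy, sensitivity, and specificity follow from those four counts.

```python
def confusion_counts(y_true, y_pred):
    """Count (tn, fp, fn, tp) for binary labels coded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tn = pairs.count((0, 0))
    fp = pairs.count((0, 1))
    fn = pairs.count((1, 0))
    tp = pairs.count((1, 1))
    return tn, fp, fn, tp

# Made-up labels, for illustration only
y_true_demo = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred_demo = [1, 0, 0, 1, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_counts(y_true_demo, y_pred_demo)
accuracy = (tp + tn) / len(y_true_demo)
sensitivity = tp / (tp + fn)  # share of actual positives that were caught
specificity = tn / (tn + fp)  # share of actual negatives correctly cleared

print([[tn, fp], [fn, tp]])
print(accuracy, sensitivity, specificity)
```

In a screening setting such as this one, a false negative (a missed high-risk patient) is often costlier than a false positive, so sensitivity is worth checking alongside accuracy.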

# Hyper-parameter tuning Logistic Regression model

In [None]:
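# Only search penalty/solver pairs that LogisticRegression supports:
# 'l1' works with 'liblinear' and 'saga'; 'l2' works with all five solvers
parameters = [
    {'penalty': ['l1'], 'C': [1.0, 10], 'solver': ['liblinear', 'saga']},
    {'penalty': ['l2'], 'C': [1.0, 10], 'solver': ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga']},
]
clf = GridSearchCV(model,parameters)
clf.fit(x_train,y_train)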
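
Penalty and solver cannot be combined freely in `LogisticRegression`: the `newton-cg`, `lbfgs`, and `sag` solvers do not support the `l1` penalty, while `liblinear` and `saga` support both `l1` and `l2`. The sketch below enumerates the full cross product of the values searched here in plain Python and counts how many combinations are supported. The `SUPPORTED` table is hand-written from that rule for illustration and is not read from scikit-learn.

```python
from itertools import product

# Penalties each solver accepts, restricted to the two penalties searched here
SUPPORTED = {
    'newton-cg': {'l2'},
    'lbfgs': {'l2'},
    'liblinear': {'l1', 'l2'},
    'sag': {'l2'},
    'saga': {'l1', 'l2'},
}

penalties = ['l1', 'l2']
c_values = [1.0, 10]
solvers = list(SUPPORTED)

# Full cross product versus the subset the solvers can actually fit
all_combos = list(product(penalties, c_values, solvers))
valid_combos = [(p, c, s) for p, c, s in all_combos if p in SUPPORTED[s]]

print(len(all_combos), 'combinations in the full cross product')
print(len(valid_combos), 'combinations with a supported penalty/solver pair')
```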

In [None]:
print('Best Penalty:', clf.best_estimator_.get_params()['penalty'])
print('Best C:', clf.best_estimator_.get_params()['C'])
print('Best Solver:', clf.best_estimator_.get_params()['solver'])
print(clf.best_estimator_.get_params())

In [None]:
from sklearn.model_selection import cross_val_score
CV_Result = cross_val_score(clf,X,y, cv=4, n_jobs=-1)
print(CV_Result)
print(CV_Result.mean())
print(CV_Result.std())

# Building a sklearn data pipeline

In [None]:
# Scale the features, then fit a logistic regression with fixed hyper-parameters
inp = [('scale', StandardScaler()), ('model', LogisticRegression(C=1.0, penalty='l1', solver='saga', tol=0.0001))]
pipe = Pipeline(inp)
pipe.fit(x,y)

# Model Testing

In [None]:
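# One patient's values, in the same column order as the training features
sample = [59,1,0,135,234,0,1,161,0,0.5,1,0,3]
# Probability of the positive class (target = 1), expressed as a percentage
pred = pipe.predict_proba([sample])[0][1]
pred = round(pred*100,2)
print('The person has roughly a ' + str(pred) + '% chance of having heart disease. Health is wealth. Take care!')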

Saving the trained pipeline to disk for future use with Python's pickle library.

In [None]:
# The with-block closes the file even if pickling fails
with open('hd_model.pkl','wb') as file:
    pickle.dump(pipe,file)
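
Loading the file back works the same way in reverse with `pickle.load`. Below is a minimal round-trip sketch that uses a plain dictionary as a stand-in for the fitted pipeline and a temporary directory so that nothing is overwritten; `stand_in_model` and `demo_model.pkl` are hypothetical names.

```python
import os
import pickle
import tempfile

# Stand-in object; in the notebook this would be the fitted pipeline
stand_in_model = {'name': 'logistic_regression', 'C': 1.0, 'penalty': 'l1'}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo_model.pkl')
    # Write the object to disk in binary mode
    with open(path, 'wb') as f:
        pickle.dump(stand_in_model, f)
    # Read it back into a new object
    with open(path, 'rb') as f:
        restored = pickle.load(f)

print(restored == stand_in_model)
```

Only unpickle files from a source you trust, since `pickle.load` can execute arbitrary code. A pickled scikit-learn model should also be loaded with the same library version that produced it.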