# Heart Disease Diagnosis and Prediction

<img src="https://www.nosm.ca/wp-content/uploads/2018/06/HeartDoctor-Background.jpg" style="width:100%;height:300px;" />

# Introduction 

#### Dataset information 

This database contains 76 attributes, but all published experiments refer to using a subset of 14 of them. In particular, the Cleveland database is the only one that has been used by ML researchers to this date. The "goal" field refers to the presence of heart disease in the patient. It is integer valued from 0 (no presence) to 4.


#### Attribute Information: 
> 1. age 
> 2. sex 
> 3. chest pain type (4 values) 
> 4. resting blood pressure 
> 5. serum cholesterol in mg/dl
> 6. fasting blood sugar > 120 mg/dl
> 7. resting electrocardiographic results (values 0,1,2)
> 8. maximum heart rate achieved 
> 9. exercise induced angina 
> 10. oldpeak = ST depression induced by exercise relative to rest 
> 11. the slope of the peak exercise ST segment 
> 12. number of major vessels (0-3) colored by fluoroscopy 
> 13. thal: 3 = normal; 6 = fixed defect; 7 = reversible defect

# Contents

1. [Data Cleaning](#dc)
2. [Models](#md) <br>
    2.1 [Random Forest](#rd)<br>
    2.2 [LightGBM](#lgm)<br>
    2.3 [XGBoost](#xgb)
3. [Top feature causing heart disease](#tf)
4. [Investigating strongly correlated features](#ic)

In [None]:
!pip install git+https://github.com/fastai/fastai@2e1ccb58121dc648751e2109fc0fbf6925aa8887 2>/dev/null 1>/dev/null

In [None]:
from fastai.imports import *
from fastai.structured import *
from pandas_summary import DataFrameSummary
from sklearn.ensemble import RandomForestClassifier
from IPython.display import display
from sklearn import metrics
from sklearn.model_selection import train_test_split
import numpy as np 
import pandas as pd

%load_ext autoreload
%autoreload 2
%matplotlib inline 
pd.options.mode.chained_assignment = None

# Data Cleaning <a class="anchor" id="dc"></a>

In [None]:
df = pd.read_csv("../input/heart.csv")

In [None]:
## from this kernel: https://www.kaggle.com/tentotheminus9/what-causes-heart-disease-explaining-the-model

df.columns = ['age', 'sex', 'chest_pain_type', 'resting_blood_pressure', 'cholesterol', 'fasting_blood_sugar', 'rest_ecg', 'max_heart_rate_achieved',
       'exercise_induced_angina', 'st_depression', 'st_slope', 'num_major_vessels', 'thalassemia', 'target']

# Map the integer codes to readable labels. Calling replace on the whole column
# avoids chained assignment; codes not listed in a mapping are left unchanged.
df['sex'] = df['sex'].replace({0: 'female', 1: 'male'})

df['chest_pain_type'] = df['chest_pain_type'].replace({
    1: 'typical angina', 2: 'atypical angina',
    3: 'non-anginal pain', 4: 'asymptomatic'})

df['fasting_blood_sugar'] = df['fasting_blood_sugar'].replace({
    0: 'lower than 120mg/dl', 1: 'greater than 120mg/dl'})

df['rest_ecg'] = df['rest_ecg'].replace({
    0: 'normal', 1: 'ST-T wave abnormality', 2: 'left ventricular hypertrophy'})

df['exercise_induced_angina'] = df['exercise_induced_angina'].replace({0: 'no', 1: 'yes'})

df['st_slope'] = df['st_slope'].replace({1: 'upsloping', 2: 'flat', 3: 'downsloping'})

# The 'reversable defect' spelling is kept because later cells use the dummy column name
df['thalassemia'] = df['thalassemia'].replace({
    1: 'normal', 2: 'fixed defect', 3: 'reversable defect'})

In [None]:
df.head()

In [None]:
def missing_data_ratio(df):
    all_data_na = (df.isnull().sum() / len(df)) * 100
    all_data_na = all_data_na.drop(all_data_na[all_data_na == 0].index).sort_values(ascending=False)[:30]
    missing_data = pd.DataFrame({'Missing Ratio' :all_data_na})
    return missing_data
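
As a quick sanity check of the missing-ratio logic, the sketch below applies the same idea to a small made-up frame. The `toy` frame and its column names are illustrative only and are not part of the heart dataset; the function is restated so the block runs on its own. Columns without gaps are filtered out, so an empty result means the data is complete.

```python
import numpy as np
import pandas as pd

def missing_data_ratio(df):
    # Percentage of missing values per column, keeping only columns with gaps
    ratio = df.isnull().sum() / len(df) * 100
    ratio = ratio[ratio > 0].sort_values(ascending=False)
    return pd.DataFrame({"Missing Ratio": ratio})

# Hypothetical toy data: 'b' misses 2 of 4 values, 'c' misses 1 of 4
toy = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [np.nan, 1.0, np.nan, 2.0],
    "c": [1.0, np.nan, 3.0, 4.0],
})
toy_missing = missing_data_ratio(toy)
print(toy_missing)
```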

In [None]:
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

In [None]:
import pandas_profiling

In [None]:
profile = pandas_profiling.ProfileReport(df)

#### Quick exploration with pandas profiling 

In [None]:
missing_data_ratio(df)

In [None]:
profile

In [None]:
df.columns

In [None]:
df.chest_pain_type = df.chest_pain_type.astype("category")
df.exercise_induced_angina = df.exercise_induced_angina.astype("category")
df.fasting_blood_sugar = df.fasting_blood_sugar.astype("category")
df.rest_ecg = df.rest_ecg.astype("category")
df.sex = df.sex.astype("category")
df.st_slope = df.st_slope.astype("category")
df.thalassemia = df.thalassemia.astype("category")

In [None]:
df = pd.get_dummies(df, drop_first=True)
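
To make the effect of `drop_first=True` concrete, here is a minimal sketch on a hypothetical one-column frame (the `toy` data is made up). With a categorical column, the first category in sorted order is dropped and becomes the baseline, encoded as all zeros in the remaining dummy columns.

```python
import pandas as pd

# Hypothetical toy column with three categories
toy = pd.DataFrame({"st_slope": ["flat", "upsloping", "downsloping", "flat"]})
toy["st_slope"] = toy["st_slope"].astype("category")

# Categories are sorted alphabetically, so 'downsloping' is the dropped baseline
encoded = pd.get_dummies(toy, drop_first=True)
print(encoded.columns.tolist())
print(encoded)
```

Dropping one level per feature avoids perfectly redundant columns, which matters for the correlation analysis that follows.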

In [None]:
df_p,y,_=proc_df(df,"target")

In [None]:
df_p.head()

In [None]:
import seaborn as sns
corr = df_p.corr()
mask = np.zeros_like(corr, dtype=bool)  # np.bool was removed from recent NumPy releases
mask[np.triu_indices_from(mask)] = True
f, ax = plt.subplots(figsize=(11, 9))

sns.heatmap(corr, 
            xticklabels=corr.columns.values,
            yticklabels=corr.columns.values,mask=mask,cmap='gist_rainbow',vmax=.8,center=0,
            square=True, linewidths=.5, cbar_kws={"shrink": .5})

In [None]:
corr_matrix = df_p.corr().abs()
upper = corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k=1).astype(bool))
to_drop = [column for column in upper.columns if any(upper[column] > 0.85)]

print("variables to drop {}".format(to_drop))
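
The filter above can be tried on synthetic data to see which column it flags. This is only a sketch: the `toy` frame, its column names, and the random seed are assumptions for illustration. For each highly correlated pair, the later column of the pair is the one marked for removal.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
x = rng.normal(size=200)

# Hypothetical features: 'x_copy' is nearly identical to 'x', 'noise' is unrelated
toy = pd.DataFrame({
    "x": x,
    "x_copy": x + rng.normal(scale=0.01, size=200),
    "noise": rng.normal(size=200),
})

corr_matrix = toy.corr().abs()
# Keep only the strict upper triangle so each pair is inspected once
upper = corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k=1).astype(bool))
to_drop_toy = [col for col in upper.columns if (upper[col] > 0.85).any()]
print(to_drop_toy)
```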

In [None]:
plt.subplot(121)
ax = sns.countplot(x=df_p["st_slope_upsloping"])

plt.subplot(122)
ax = sns.countplot(x=df_p["st_slope_flat"])




In [None]:
plt.subplot(121)
ax = sns.countplot(x=df_p["thalassemia_reversable defect"])

plt.subplot(122)
ax = sns.countplot(x=df_p["thalassemia_fixed defect"])


In [None]:
df_p.columns

In [None]:
df_p.drop(to_drop,axis=1,inplace=True)

In [None]:
df_p.columns

In [None]:
corr = df_p.corr()
mask = np.zeros_like(corr, dtype=bool)  # np.bool was removed from recent NumPy releases
mask[np.triu_indices_from(mask)] = True
f, ax = plt.subplots(figsize=(11, 9))

sns.heatmap(corr, 
            xticklabels=corr.columns.values,
            yticklabels=corr.columns.values,mask=mask,cmap='gist_rainbow',vmax=1,center=0,vmin=-1,
            square=True, linewidths=.5, cbar_kws={"shrink": .5})

In [None]:
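import scipy.stats
from scipy.cluster import hierarchy as hc

In [None]:
def hierarchy_tree(df):
    # Spearman rank correlation between columns, rounded so the matrix stays exactly symmetric
    corr = np.round(scipy.stats.spearmanr(df).correlation, 4)
    # Turn correlation into a distance (1 - corr), in the condensed form that linkage expects
    corr_condensed = hc.distance.squareform(1 - corr)
    z = hc.linkage(corr_condensed, method='average')
    fig = plt.figure(figsize=(16,10))
    dendrogram = hc.dendrogram(z, labels=df.columns, orientation='left', leaf_font_size=16)
    plt.show()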

In [None]:
hierarchy_tree(df_p)
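
The clustering step behind the dendrogram can be sketched without any plotting. The example below uses a made-up three-column frame (names and seed are assumptions) and checks that the two near-duplicate columns are the first pair to be merged, which is what a short branch in the dendrogram indicates.

```python
import numpy as np
import pandas as pd
from scipy.cluster import hierarchy as hc
from scipy.spatial.distance import squareform
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
x = rng.normal(size=100)

# Hypothetical columns: 'x_twin' closely follows 'x', 'z' is independent
toy = pd.DataFrame({
    "x": x,
    "x_twin": x + rng.normal(scale=0.05, size=100),
    "z": rng.normal(size=100),
})

corr = np.round(spearmanr(toy).correlation, 4)   # 3x3 rank-correlation matrix
dist = squareform(1 - corr)                      # condensed distance vector
z = hc.linkage(dist, method="average")

# The first row of the linkage matrix holds the two closest columns
first_pair = sorted(toy.columns[int(i)] for i in z[0, :2])
print(first_pair)
```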

##### After removing the correlated features, the data is ready for trying some machine learning models on it 

# Models <a class="anchor" id="md"></a>

#### Random Forest  <a class="anchor" id="rd"></a>

In [None]:
from sklearn.model_selection import RandomizedSearchCV

In [None]:
rf_param_grid = {
                 'max_depth' : [4, 6, 8,10],
                 'n_estimators': range(1,30),
                 'max_features': ['sqrt', 'auto', 'log2'],
                 'min_samples_split': [2, 3, 10,20],
                 'min_samples_leaf': [1, 3, 10,18],
                 'bootstrap': [True, False],
                 
                 }

In [None]:
m = RandomForestClassifier()

In [None]:
m_r = RandomizedSearchCV(param_distributions=rf_param_grid, 
                                    estimator = m, scoring = "accuracy", 
                                    verbose = 0, n_iter = 200, cv = 5)

In [None]:
%time m_r.fit(df_p, y)

In [None]:
m_r.best_score_

In [None]:
m_r.best_params_
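
For readers who want to try the search pattern quickly, here is a scaled-down sketch on synthetic data. Everything specific in it is an assumption for illustration: the data comes from `make_classification`, the grid is smaller, and `n_iter=5` replaces the 200 iterations used above. Passing `random_state` makes the sampled combinations reproducible, which the cells above do not do.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the heart data (assumption: binary target)
X_toy, y_toy = make_classification(n_samples=200, n_features=8, random_state=0)

param_distributions = {
    "max_depth": [4, 6, 8],
    "n_estimators": range(5, 30),
    "min_samples_leaf": [1, 3, 10],
}

search = RandomizedSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,            # only 5 sampled combinations, to keep the sketch fast
    scoring="accuracy",
    cv=5,
    random_state=0,      # reproducible sampling of parameter combinations
)
search.fit(X_toy, y_toy)
print(search.best_params_, search.best_score_)
```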

In [None]:
rf_bp = m_r.best_params_

In [None]:
rf_classifier=RandomForestClassifier(n_estimators=rf_bp["n_estimators"],
                                     min_samples_split=rf_bp['min_samples_split'],
                                     min_samples_leaf=rf_bp['min_samples_leaf'],
                                     max_features=rf_bp['max_features'],
                                     max_depth=rf_bp['max_depth'],
                                     bootstrap=rf_bp['bootstrap'])

In [None]:
rf_classifier.fit(df_p,y)

#### LightGBM  <a class="anchor" id="lgm"></a>

In [None]:
import lightgbm as lgbm

In [None]:
lgbm_model = lgbm.LGBMClassifier()

In [None]:
lgbm_params = {
    "n_estimators":[10,100,1000,2000],
    'boosting_type': ['dart','gbdt'],          
    'learning_rate': [0.05,0.1,0.2],       
    'min_split_gain': [0.0,0.1,0.5,0.7],     
    'min_child_weight': [0.001,0.003,0.01],     
    'num_leaves': [10,21,41,61],            
    'min_child_samples': [10,20,30,60,100]
              }

In [None]:
lgbm_c = RandomizedSearchCV(param_distributions=lgbm_params, 
                                    estimator = lgbm_model, scoring = "accuracy", 
                                    verbose = 0, n_iter = 200, cv = 4)

In [None]:
%time lgbm_c.fit(df_p,y)

In [None]:
lgbm_bp =lgbm_c.best_params_

In [None]:
lgbm_model = lgbm.LGBMClassifier(num_leaves=lgbm_bp["num_leaves"],
                                 n_estimators=lgbm_bp["n_estimators"],
                                 min_split_gain=lgbm_bp["min_split_gain"],
                                 min_child_weight=lgbm_bp["min_child_weight"],
                                 min_child_samples=lgbm_bp["min_child_samples"],
                                 learning_rate=lgbm_bp["learning_rate"],
                                 boosting_type=lgbm_bp["boosting_type"])

In [None]:
lgbm_model.fit(df_p,y)

#### XGBoost  <a class="anchor" id="xgb"></a>

In [None]:
import xgboost as xgb

In [None]:
xgb_classifier = xgb.XGBClassifier()

In [None]:
gbm_param_grid = {
    'n_estimators': range(1,20),
    'max_depth': range(1, 10),
    'learning_rate': [.1,.4, .45, .5, .55, .6],
    'colsample_bytree': [.6, .7, .8, .9, 1],
    'booster':["gbtree"],
     'min_child_weight': [0.001,0.003,0.01],
}

In [None]:
xgb_random = RandomizedSearchCV(param_distributions=gbm_param_grid, 
                                    estimator = xgb_classifier, scoring = "accuracy", 
                                    verbose = 0, n_iter = 200, cv = 5)

In [None]:
%time xgb_random.fit(df_p,y)

In [None]:
xgb_bp = xgb_random.best_params_

In [None]:
xgb_model=xgb.XGBClassifier(n_estimators=xgb_bp["n_estimators"],
                            min_child_weight=xgb_bp["min_child_weight"],
                            max_depth=xgb_bp["max_depth"],
                            learning_rate=xgb_bp["learning_rate"],
                            colsample_bytree=xgb_bp["colsample_bytree"],
                            booster=xgb_bp["booster"])

In [None]:
xgb_model.fit(df_p,y)

#### Choosing the best model 

In [None]:
def feature_imp(df,model):
    fi = pd.DataFrame()
    fi["feature"] = df.columns
    fi["importance"] = model.feature_importances_
    return fi

In [None]:
from IPython.core.display import HTML

def multi_table(table_list):
    ''' Accepts a list of IpyTable objects and returns a table which contains each IpyTable in a cell
    '''
    return HTML(
        '<table><tr style="background-color:white;">' + 
        ''.join(['<td>' + table._repr_html_() + '</td>' for table in table_list]) +
        '</tr></table>'
    )

In [None]:
rf_fm = rf_feat_importance(rf_classifier,df_p)
lgbm_fm = feature_imp(df_p,lgbm_model)
xgb_fm = feature_imp(df_p,xgb_model)

In [None]:
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score
from sklearn.metrics import accuracy_score

In [None]:
oof_rf = np.zeros(len(df_p))
oof_lgbm = np.zeros(len(df_p))
oof_xgb = np.zeros(len(df_p))

rf_fm  = []
lgbm_fm  = []
xgboost_fm  = []

rf_acc  = []
lgbm_acc  = []
xgboost_acc  = []

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)  # random_state only takes effect with shuffle=True
for train_index, test_index in skf.split(df_p, y):
        
        rf_classifier=RandomForestClassifier(n_estimators=rf_bp["n_estimators"],
                                     min_samples_split=rf_bp['min_samples_split'],
                                     min_samples_leaf=rf_bp['min_samples_leaf'],
                                     max_features=rf_bp['max_features'],
                                     max_depth=rf_bp['max_depth'],
                                     bootstrap=rf_bp['bootstrap'])
        
        
        
        rf_classifier.fit(df_p.loc[train_index],y[train_index])
        oof_rf[test_index] = rf_classifier.predict_proba(df_p.loc[test_index])[:,1]
        
        rf_acc.append(accuracy_score(y[test_index],rf_classifier.predict(df_p.loc[test_index])))
        
        rf_fm.append(feature_imp(df_p,rf_classifier))
        
        
        xgb_model=xgb.XGBClassifier(n_estimators=xgb_bp["n_estimators"],
                            min_child_weight=xgb_bp["min_child_weight"],
                            max_depth=xgb_bp["max_depth"],
                            learning_rate=xgb_bp["learning_rate"],
                            colsample_bytree=xgb_bp["colsample_bytree"],
                            booster=xgb_bp["booster"])
        
        xgb_model.fit(df_p.loc[train_index],y[train_index])
        oof_xgb[test_index] = xgb_model.predict_proba(df_p.loc[test_index])[:,1]

        
        xgboost_acc.append(accuracy_score(y[test_index],xgb_model.predict(df_p.loc[test_index])))

        xgboost_fm.append(feature_imp(df_p,xgb_model))
    
        
        
        lgbm_model = lgbm.LGBMClassifier(num_leaves=lgbm_bp["num_leaves"],
                                 n_estimators=lgbm_bp["n_estimators"],
                                 min_split_gain=lgbm_bp["min_split_gain"],
                                 min_child_weight=lgbm_bp["min_child_weight"],
                                 min_child_samples=lgbm_bp["min_child_samples"],
                                 learning_rate=lgbm_bp["learning_rate"],
                                 boosting_type=lgbm_bp["boosting_type"])
        
        lgbm_model.fit(df_p.loc[train_index],y[train_index])
        oof_lgbm[test_index] = lgbm_model.predict_proba(df_p.loc[test_index])[:,1]
        
        lgbm_acc.append(accuracy_score(y[test_index],lgbm_model.predict(df_p.loc[test_index])))

        
        lgbm_fm.append(feature_imp(df_p,lgbm_model))
        

In [None]:
print("random forest accuracy :{} lgbm accuracy : {} xgboost accuracy : {}".format(np.mean(rf_acc),np.mean(lgbm_acc),np.mean(xgboost_acc)))
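
The out-of-fold (OOF) scheme used in the loop above can be reduced to a few lines. The sketch below is not the notebook's setup: it uses synthetic data and a logistic regression as a stand-in model, purely to show the mechanics. Each row receives a probability from a model that never saw it during training, so the OOF vector can later be scored as a whole.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in data (assumption: binary target)
X_toy, y_toy = make_classification(n_samples=200, n_features=8, random_state=0)

oof = np.zeros(len(X_toy))   # one out-of-fold probability per row
fold_acc = []

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for train_idx, test_idx in skf.split(X_toy, y_toy):
    model = LogisticRegression(max_iter=1000)   # stand-in for RF / LightGBM / XGBoost
    model.fit(X_toy[train_idx], y_toy[train_idx])
    oof[test_idx] = model.predict_proba(X_toy[test_idx])[:, 1]
    fold_acc.append(accuracy_score(y_toy[test_idx], model.predict(X_toy[test_idx])))

print(np.mean(fold_acc))
```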

In [None]:
def roc_ac(oof):
    return roc_auc_score(y,oof)

In [None]:
print("random forest ROC :{} lgbm ROC : {} xgboost ROC : {}".format(roc_ac(oof_rf),roc_ac(oof_lgbm),roc_ac(oof_xgb)))
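
ROC AUC can be read as the probability that a randomly chosen positive case is scored higher than a randomly chosen negative one. The helper below, `roc_auc_by_pairs`, is a hypothetical illustration of that definition on four made-up scores; it is not how `roc_auc_score` is implemented, but it yields the same value for this kind of input.

```python
def roc_auc_by_pairs(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Made-up labels and scores: 3 of the 4 positive/negative pairs are ordered correctly
y_toy = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc_by_pairs(y_toy, scores)
print(auc)  # 0.75
```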

In [None]:
from sklearn.metrics import classification_report

In [None]:
print("#"*10+" Random Forest "+"#"*10)

In [None]:
print(classification_report(y, (oof_rf>=0.5)*1))

In [None]:
print("#"*10+" XGBoost "+"#"*10)

In [None]:
print(classification_report(y, (oof_xgb>=0.5)*1))

In [None]:
print("#"*10+" LightGBM "+"#"*10)

In [None]:
print(classification_report(y, (oof_lgbm>=0.5)*1))
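
The reports above turn probabilities into classes with a fixed 0.5 cut-off. As a small illustration of what that choice does, the hypothetical helper `precision_recall` below recomputes precision and recall by hand on made-up values for two thresholds; lowering the threshold trades precision for recall.

```python
def precision_recall(y_true, proba, threshold=0.5):
    # Turn probabilities into 0/1 predictions, then count the outcomes
    pred = [1 if p >= threshold else 0 for p in proba]
    tp = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Made-up labels and predicted probabilities
y_toy = [1, 1, 1, 0, 0, 0]
proba = [0.9, 0.6, 0.4, 0.7, 0.2, 0.1]

p_half, r_half = precision_recall(y_toy, proba, threshold=0.5)
p_low, r_low = precision_recall(y_toy, proba, threshold=0.3)
print(p_half, r_half)
print(p_low, r_low)
```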

In [None]:
from sklearn.metrics import roc_curve

In [None]:
fpr, tpr, thresholds = roc_curve(y, oof_xgb)

fig, ax = plt.subplots()
ax.plot(fpr, tpr)
ax.plot([0, 1], [0, 1], transform=ax.transAxes, ls="--", c=".3")
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.0])
plt.rcParams['font.size'] = 12
plt.title('ROC curve XGBOOST')
plt.xlabel('False Positive Rate (1 - Specificity)')
plt.ylabel('True Positive Rate (Sensitivity)')
plt.grid(True)

In [None]:
fpr, tpr, thresholds = roc_curve(y, oof_lgbm)

fig, ax = plt.subplots()
ax.plot(fpr, tpr)
ax.plot([0, 1], [0, 1], transform=ax.transAxes, ls="--", c=".3")
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.0])
plt.rcParams['font.size'] = 12
plt.title('ROC curve LGBM')
plt.xlabel('False Positive Rate (1 - Specificity)')
plt.ylabel('True Positive Rate (Sensitivity)')
plt.grid(True)

In [None]:
fpr, tpr, thresholds = roc_curve(y, oof_rf)

fig, ax = plt.subplots()
ax.plot(fpr, tpr)
ax.plot([0, 1], [0, 1], transform=ax.transAxes, ls="--", c=".3")
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.0])
plt.rcParams['font.size'] = 12
plt.title('ROC curve RANDOM FOREST')
plt.xlabel('False Positive Rate (1 - Specificity)')
plt.ylabel('True Positive Rate (Sensitivity)')
plt.grid(True)

In [None]:
rf_feature_imp=rf_fm[0]
for i in range(1,5):
    rf_feature_imp.importance += rf_fm[i].importance
rf_feature_imp.importance = rf_feature_imp.importance / 5
rf_feature_imp = rf_feature_imp.sort_values(by="importance",ascending=False)

In [None]:
xgb_feature_imp=xgboost_fm[0]
for i in range(1,5):
    xgb_feature_imp.importance += xgboost_fm[i].importance
xgb_feature_imp.importance = xgb_feature_imp.importance / 5
xgb_feature_imp = xgb_feature_imp.sort_values(by="importance",ascending=False)

In [None]:
lgbm_feature_imp=lgbm_fm[0]
for i in range(1,5):
    lgbm_feature_imp.importance += lgbm_fm[i].importance
lgbm_feature_imp.importance = lgbm_feature_imp.importance / 5
lgbm_feature_imp = lgbm_feature_imp.sort_values(by="importance",ascending=False)
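
The three cells above average the per-fold importance tables by summing them in place. An alternative sketch, assuming tables shaped like the output of `feature_imp`, stacks the folds and takes the mean per feature; the numbers and feature names below are made up. This avoids hard-coding the number of folds and does not modify the first fold's table.

```python
import pandas as pd

# Hypothetical per-fold importance tables, shaped like the output of feature_imp
fold_tables = [
    pd.DataFrame({"feature": ["age", "sex"], "importance": [0.6, 0.4]}),
    pd.DataFrame({"feature": ["age", "sex"], "importance": [0.8, 0.2]}),
]

mean_imp = (
    pd.concat(fold_tables)
      .groupby("feature", as_index=False)["importance"]
      .mean()
      .sort_values(by="importance", ascending=False)
)
print(mean_imp)
```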

In [None]:
def scale(df,col):
    return (df[col] - df[col].min(axis=0)) / (df[col].max(axis=0) - df[col].min(axis=0))
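
Min-max scaling matters here because the libraries report importances on different scales (for example, LightGBM's default importance counts splits, while random forest importances sum to 1). A minimal sketch on made-up numbers, with the function restated so the block runs on its own:

```python
import pandas as pd

def scale(df, col):
    # Min-max scaling: the smallest value maps to 0, the largest to 1
    return (df[col] - df[col].min()) / (df[col].max() - df[col].min())

# Hypothetical importance table
toy_imp = pd.DataFrame({
    "feature": ["a", "b", "c"],
    "importance": [10.0, 30.0, 50.0],
})
toy_imp["imp_scaled"] = scale(toy_imp, "importance")
print(toy_imp)
```

Note that the least important feature always ends up at exactly 0 after scaling, so a filter such as "scaled importance greater than 0" removes it from a plot.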

In [None]:
xgb_feature_imp["imp_sclaed"]=scale(xgb_feature_imp,"importance")
lgbm_feature_imp["imp_sclaed"]=scale(lgbm_feature_imp,"importance")
rf_feature_imp["imp_sclaed"]=scale(rf_feature_imp,"importance")

In [None]:
xgb_feature_imp.columns

In [None]:
ax = xgb_feature_imp[xgb_feature_imp.imp_sclaed>0].plot('feature', 'imp_sclaed', 'barh', figsize=(12,7), legend=False)


for bar in ax.patches:
    bar.set_facecolor('#888888')


ax.patches[0].set_facecolor('#aa3333')
ax.patches[1].set_facecolor('#aa3333')
ax.patches[2].set_facecolor('#aa3333')

In [None]:
ax = lgbm_feature_imp[lgbm_feature_imp.imp_sclaed>0].plot('feature', 'imp_sclaed', 'barh', figsize=(12,7), legend=False)


for bar in ax.patches:
    bar.set_facecolor('#888888')


ax.patches[0].set_facecolor('#aa3333')
ax.patches[1].set_facecolor('#aa3333')
ax.patches[2].set_facecolor('#aa3333')

# Top features causing Heart disease  <a class="anchor" id="tf"></a>

In [None]:
ax = rf_feature_imp[rf_feature_imp.imp_sclaed>0].plot('feature', 'imp_sclaed', 'barh', figsize=(12,7), legend=False)


for bar in ax.patches:
    bar.set_facecolor('#888888')


ax.patches[0].set_facecolor('#aa3333')
ax.patches[1].set_facecolor('#aa3333')
ax.patches[2].set_facecolor('#aa3333')

#### The models are close to each other in accuracy and ROC AUC, so I've averaged the feature importances of the 3 models to get the best of the 3

In [None]:
combined_feature_imp = pd.DataFrame(columns=rf_feature_imp.columns)

In [None]:
combined_feature_imp=(rf_feature_imp.sort_index().drop("feature",axis=1) + xgb_feature_imp.sort_index().drop("feature",axis=1)+ lgbm_feature_imp.sort_index().drop("feature",axis=1))/3

In [None]:
combined_feature_imp["feature"] = rf_feature_imp.sort_index().feature

In [None]:
combined_feature_imp=combined_feature_imp.sort_values(by="imp_sclaed",ascending=False)

In [None]:
combined_feature_imp["imp_sclaed"]=scale(combined_feature_imp,"imp_sclaed")

In [None]:
ax = combined_feature_imp[combined_feature_imp.imp_sclaed>0].plot('feature', 'imp_sclaed', 'barh', figsize=(12,7), legend=False)


for bar in ax.patches:
    bar.set_facecolor('#888888')


ax.patches[0].set_facecolor('#aa3333')
ax.patches[1].set_facecolor('#aa3333')
ax.patches[2].set_facecolor('#aa3333')
ax.patches[3].set_facecolor('#aa3333')
ax.patches[4].set_facecolor('#aa3333')
plt.title("Top 5 most important features according to 3 ML models (XGB, LGBM, RF)")

### Combining the 3 models, we can achieve an AUC score of 90%

In [None]:
# The weights already sum to 1, so the blend is a weighted average and needs no further division
print("Combined AUC :{}".format(roc_ac(oof_rf*0.4+oof_lgbm*0.2+oof_xgb*0.4)))
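
The blend is a weighted average of the three out-of-fold probability vectors. The sketch below uses made-up probabilities for five patients (the arrays and weights are illustrative) and also shows that multiplying or dividing the blend by a positive constant keeps the ordering of the scores, which is all that ROC AUC depends on.

```python
import numpy as np

# Hypothetical out-of-fold probabilities from three models for five patients
oof_a = np.array([0.9, 0.2, 0.6, 0.3, 0.8])
oof_b = np.array([0.7, 0.4, 0.5, 0.1, 0.9])
oof_c = np.array([0.8, 0.3, 0.4, 0.2, 0.7])

weights = np.array([0.4, 0.2, 0.4])   # weights sum to 1
blend = weights[0] * oof_a + weights[1] * oof_b + weights[2] * oof_c

# Rescaling by a positive constant changes the values but not their order
same_order = bool((np.argsort(blend) == np.argsort(blend / 3)).all())
print(blend, same_order)
```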

# Investigating strongly correlated features <a class="anchor" id="ic"></a>

In [None]:
df_p["target"] = y

In [None]:
df_p.columns

In [None]:
#[df_p.st_slope_flat[df_p.st_slope_flat==1],df_p["thalassemia_fixed defect"][df_p["thalassemia_fixed defect"]==1],df_p["chest_pain_type_typical angina"][df_p["chest_pain_type_typical angina"]==1],df_p.target],max_heart_rate_achieved).style.background_gradient(cmap='summer_r')


In [None]:
top_hd_f= combined_feature_imp.feature[combined_feature_imp.imp_sclaed>0.5].values

In [None]:

for val in top_hd_f:
    plt.figure(figsize=(20,6))
    ax=sns.countplot(x=val,hue="target", data=df_p)
    plt.legend(labels=["Doesn't have HD", 'Has HD'],bbox_to_anchor=(0.4,1))
    plt.show()
    

### What the top four factors of heart disease mean: major vessels / thalassemia / ST depression / max heart rate achieved

#### num_major_vessels

After some research, I found that num_major_vessels is the result of a coronary artery test that shows how many major vessels are blocked in the heart. A blockage happens when the heart's blood supply is blocked or interrupted by a build-up of fatty substances in the coronary arteries. <b>It seems logical that more major vessels is a good thing, and therefore reduces the probability of heart disease.</b>

To learn more about the test, click on the image below.

<a href="https://www.youtube.com/watch?v=GhNT2G1fkJg"><img src="https://myheart.net/wp-content/uploads/2014/06/heart-blockage-featured_result.jpg" width="80%" height="20%" ></a>

#### ST_depression

<b>S-T Segment</b><br><br>
The ST segment is the flat, isoelectric section of the ECG between the end of the S wave (the J point) and the beginning of the T wave.
<ul>
<li>The ST Segment represents the interval between ventricular depolarization and repolarization.</li>
<li>The most important cause of ST segment abnormality (elevation or depression) is myocardial ischaemia or infarction.</li></ul>

![](https://litfl.com/wp-content/uploads/2018/10/ECG-waves-segments-and-intervals-LITFL-ECG-library-3.jpg)

#### Thalassemia 

β-Thalassemia is an inherited hemoglobin disorder resulting in chronic hemolytic anemia that typically requires life-long transfusion therapy. ... Heart disease is mainly expressed by a particular cardiomyopathy that progressively leads to heart failure and death.

<img src="https://www.aktuelbilgiler.com/wp-content/uploads/2019/01/1-11.jpg">

#### Max heart rate achieved 

This factor is self-explanatory: the maximum heart rate achieved.

#### Angina 

Angina is chest pain or discomfort caused when your heart muscle doesn't get enough oxygen-rich blood. It may feel like pressure or squeezing in your chest. The discomfort also can occur in your shoulders, arms, neck, jaw, or back. Angina pain may even feel like indigestion. But, angina is not a disease.

In [None]:
df_temp=df_p[ (df_p["st_depression"] <1.0) & (df_p.num_major_vessels == 0)]
plt.figure(figsize=(10,8))
ax=sns.countplot(x="target", data=df_temp)
total = float(len(df_temp))
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height/2,
            '{:1.2f}%'.format((height/total)*100),
            ha="center",color="white",size=20) 
    
    
plt.xticks([0,1],["Doesn't have HD",'Has HD'])

plt.show()
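
The percentages drawn on the bars can also be computed directly, which is handy for checking the plot. The sketch below uses a made-up frame that borrows the column names of `df_p`; the values are illustrative only and say nothing about the real data.

```python
import pandas as pd

# Hypothetical toy frame with the same column names as df_p
toy = pd.DataFrame({
    "st_depression":     [0.0, 0.5, 2.0, 0.2, 1.5, 0.8],
    "num_major_vessels": [0,   0,   1,   0,   0,   0],
    "target":            [1,   1,   0,   0,   0,   1],
})

# Same filter as the cell above, then the share of each target class in percent
subset = toy[(toy["st_depression"] < 1.0) & (toy["num_major_vessels"] == 0)]
share = subset["target"].value_counts(normalize=True) * 100
print(share)
```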



In [None]:
df_temp=df_p[ (df_p["st_depression"] <1.0) & (df_p.num_major_vessels == 0) & (df_p.max_heart_rate_achieved>150) & (df_p["thalassemia_fixed defect"] == 1)]
plt.figure(figsize=(10,8))
ax=sns.countplot(x="target", data=df_temp)
total = float(len(df_temp))
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height/2,
            '{:1.2f}%'.format((height/total)*100),
            ha="center",color="white",size=20) 
    
    
plt.xticks([0,1],["Doesn't have HD",'Has HD'])

plt.show()



In [None]:
df_p.columns

In [None]:
df_temp=df_p[ (df_p.st_slope_flat==1) & (df_p["thalassemia_fixed defect"] == 1) & (df_p["chest_pain_type_typical angina"] ==1)  ]
plt.figure(figsize=(10,8))
ax=sns.countplot(x="target", data=df_temp)
total = float(len(df_temp))
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height/2,
            '{:1.2f}%'.format((height/total)*100),
            ha="center",color="white",size=20) 
    
    
plt.xticks([0,1],["Doesn't have HD",'Has HD'])

plt.show()



#### The number of major vessels from the coronary artery test and aspects of the ECG dominated the results; those 2 can predict up to 85% of heart disease according to the dataset we have 

## Thank you for reading ( ͡ᵔ ͜ʖ ͡ᵔ )