# Machine Learning on Heart Disease

# Table of Contents

* Dataset: Heart Disease UCI
    * Libraries
    * Reading the Data
    * A First Look at the Data
    * Preparing the Data for Processing
* Visualizing the Data and Drawing Insights
* Classification Models
    * Logistic Regression
    * K-Nearest Neighbour (KNN)
    * Support Vector Machine (SVM)
    * Naive Bayes
    * Decision Tree
    * Random Forest
    * Evaluating Classification Models
* Comparing and Visualizing the Accuracy of the Classification Models


# Dataset: Heart Disease UCI

### Context
This database contains 76 attributes, but all published experiments refer to using a subset of 14 of them. In particular, the Cleveland database is the only one that has been used by ML researchers to this date. The "goal" field refers to the presence of heart disease in the patient. It is integer valued from 0 (no presence) to 4.

Attribute Information:

* age
* sex
* chest pain type (4 values)
* resting blood pressure
* serum cholesterol in mg/dl
* fasting blood sugar > 120 mg/dl
* resting electrocardiographic results (values 0, 1, 2)
* maximum heart rate achieved
* exercise induced angina
* oldpeak = ST depression induced by exercise relative to rest
* the slope of the peak exercise ST segment
* number of major vessels (0-3) colored by fluoroscopy
* thal: 3 = normal; 6 = fixed defect; 7 = reversible defect

The names and social security numbers of the patients were recently removed from the database and replaced with dummy values. One file has been "processed", the one containing the Cleveland database. All four unprocessed files also exist in this directory.

To see Test Costs (donated by Peter Turney), please see the folder "Costs"

### Acknowledgements
#### Creators:

* Hungarian Institute of Cardiology, Budapest: Andras Janosi, M.D.
* University Hospital, Zurich, Switzerland: William Steinbrunn, M.D.
* University Hospital, Basel, Switzerland: Matthias Pfisterer, M.D.
* V.A. Medical Center, Long Beach and Cleveland Clinic Foundation: Robert Detrano, M.D., Ph.D.

#### Donor:

David W. Aha (aha '@' ics.uci.edu) (714) 856-8779

### Inspiration
Experiments with the Cleveland database have concentrated on simply attempting to distinguish presence (values 1,2,3,4) from absence (value 0).
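
The presence/absence distinction described above can be sketched as follows. This is a minimal illustration on made-up values; `goal` is a hypothetical array name chosen for the example, not a column read from the dataset file.

```python
import numpy as np

# Hypothetical "goal" values on the original 0-4 scale (made up, not patient data)
goal = np.array([0, 2, 1, 0, 4, 3, 0])

# Collapse values 1-4 (presence) into 1 and keep 0 (absence) as 0
target = (goal > 0).astype(int)

print(target)
```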

See if you can find any other trends in heart data to predict certain cardiovascular events or find any clear indications of heart health.


https://archive.ics.uci.edu/ml/datasets/Heart+Disease

## Libraries

In [None]:
# Math and data handling

import numpy as np
import pandas as pd

# Visualization

import matplotlib.pyplot as plt
import seaborn as sns

## Reading the Data

In [None]:
data = pd.read_csv("../input/heart-disease-uci/heart.csv")

## A First Look at the Data

In [None]:
print("First 5 rows of the data")
data.head()

In [None]:
print("Last 5 rows of the data")
data.tail()

In [None]:
print("Rows/Columns")
data.shape

In [None]:
print("Number of distinct values in each column")
data.nunique()

In [None]:
print("Correlation information for the data")

# A visualized version will be easier to read and understand.
data.corr()

In [None]:
print("Basic statistics of the data")
data.describe()

In [None]:
print("Information about the data")
data.info()

## Preparing the Data for Processing

In [None]:
print("Missing values shown as a table")
data.isnull()

In [None]:
print("Missing values summed per column")

data.isnull().sum(axis=0)

In [None]:
print("Inspecting the column names")
data.columns

## Visualizing the Data and Drawing Insights

In [None]:
f, ax = plt.subplots(figsize=(10, 10))

# annot=True: also write the numbers inside the colored cells
# linewidths=5: width of the lines between the cells
# linecolor="red": color of the lines between the cells
# ax=ax: draw on the axes created above

sns.heatmap(data.corr(), annot=True, linewidths=5, linecolor="red", fmt=".1f", ax=ax)
plt.show()

In [None]:
sns.barplot(x="cp",
           y="target",
           data=data)

In [None]:
sns.catplot(x="cp",
           y="target",
           data=data,
           kind="violin")

In [None]:
sns.catplot(y="age",x="target",data=data,kind="violin")

## Defining the x and y Data

In [None]:
y = data.target.values

x_data = data.drop(["target"], axis=1)

## Normalization

In [None]:
# Min-max scaling per column; DataFrame.min() and DataFrame.max() reduce column-wise by default
x = (x_data - x_data.min()) / (x_data.max() - x_data.min())

In [None]:
x.head()
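
The normalization step above rescales each column to the [0, 1] range using (value - column minimum) / (column maximum - column minimum). A minimal sketch on a toy frame shows the effect; the values are made up and the column names are only illustrative.

```python
import pandas as pd

# Toy frame with made-up values; the column names are only illustrative
toy = pd.DataFrame({"age": [30, 45, 60], "chol": [180, 240, 300]})

# Min-max scaling per column: (value - column min) / (column max - column min)
toy_scaled = (toy - toy.min()) / (toy.max() - toy.min())

print(toy_scaled)
```

After scaling, the smallest value in every column becomes 0 and the largest becomes 1, so features measured in different units contribute on a comparable scale to distance-based models such as KNN.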

## Train Test Split

In [None]:
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x,y,test_size=0.2, random_state=42)
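
To make the effect of `test_size=0.2` concrete, here is a small sketch on made-up data. `X_toy` and `y_toy` are hypothetical names used only for this illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (made up): 10 samples, 2 features, binary labels
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.array([0, 1] * 5)

# test_size=0.2 holds out 20% of the rows; random_state fixes the shuffle
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.2, random_state=42)

print(X_tr.shape, X_te.shape)
```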

# Classification Models

## Logistic Regression

In [None]:
from sklearn.linear_model import LogisticRegression

lr = LogisticRegression()

lr.fit(x_train, y_train)

print("Logistic Regression test accuracy {}".format(lr.score(x_test, y_test)))

## K-Nearest Neighbour (KNN)

In [None]:
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors = 3) # k

knn.fit(x_train,y_train)

In [None]:
predict = knn.predict(x_test)
predict

In [None]:
print("KNN (k={}) test accuracy: {}".format(3, knn.score(x_test, y_test)))

In [None]:
# find k value
score_list = []
for each in range(1,15):
    knn2 = KNeighborsClassifier(n_neighbors = each)
    knn2.fit(x_train,y_train)
    score_list.append(knn2.score(x_test,y_test))
    
plt.figure(figsize=(20,10))
plt.plot(range(1,15),score_list)

plt.show()

In [None]:
# If some k value had scored better than the one we chose, we could retrain the model with that k. But we had already picked the best-performing one.
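
The loop above scores each candidate k on the test set, which can make the chosen k look better than it would on truly unseen data. One common alternative, not used in this notebook, is to choose k by cross-validation on the training data only. The sketch below assumes synthetic stand-in data from `make_classification` instead of the heart data, and the `_demo` names are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (assumption): 300 samples, 13 features
X_demo, y_demo = make_classification(n_samples=300, n_features=13, random_state=42)
X_tr_demo, X_te_demo, y_tr_demo, y_te_demo = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Mean 5-fold cross-validated accuracy on the training set for each k
cv_scores = {}
for k in range(1, 15):
    model = KNeighborsClassifier(n_neighbors=k)
    cv_scores[k] = cross_val_score(model, X_tr_demo, y_tr_demo, cv=5).mean()

best_k = max(cv_scores, key=cv_scores.get)
print("best k by cross-validation:", best_k)

# The test set is used only once, for the final estimate
final_model = KNeighborsClassifier(n_neighbors=best_k).fit(X_tr_demo, y_tr_demo)
test_accuracy = final_model.score(X_te_demo, y_te_demo)
print("test accuracy:", test_accuracy)
```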

## Support Vector Machine (SVM) Classification

In [None]:
from sklearn.svm import SVC

svm = SVC(random_state = 1)
svm.fit(x_train,y_train)

In [None]:
print("SVM test accuracy {}".format(svm.score(x_test, y_test)))

## Naive Bayes

In [None]:
from sklearn.naive_bayes import GaussianNB

nb = GaussianNB()
nb.fit(x_train,y_train)

In [None]:
print("Naive Bayes test accuracy {}".format(nb.score(x_test, y_test)))

## Decision Tree Classification

In [None]:
from sklearn.tree import DecisionTreeClassifier

dt = DecisionTreeClassifier()
dt.fit(x_train,y_train)

print("Decision Tree test accuracy {}".format(dt.score(x_test, y_test)))

## Random Forest

In [None]:
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=100, random_state=1)
rf.fit(x_train, y_train)

print("Random Forest test accuracy {}".format(rf.score(x_test, y_test)))

## Evaluating Classification Models

In [None]:
y_pred = rf.predict(x_test)
y_true = y_test

In [None]:
#%% confusion matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_true,y_pred)


In [None]:
# %% cm visualization
f, ax = plt.subplots(figsize =(5,5))
sns.heatmap(cm,annot = True,linewidths=0.5,linecolor="red",fmt = ".0f",ax=ax)
plt.xlabel("y_pred")
plt.ylabel("y_true")
plt.show()
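
A confusion matrix also yields accuracy, precision and recall directly. The sketch below uses a made-up 2x2 matrix (not the result of the cells above) in scikit-learn's layout, where rows are true classes and columns are predicted classes; `cm_demo` is a hypothetical name.

```python
import numpy as np

# Made-up 2x2 confusion matrix: rows = true class, columns = predicted class
cm_demo = np.array([[25, 4],
                    [3, 29]])

# For a binary problem, ravel() returns the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = cm_demo.ravel()

accuracy = (tp + tn) / cm_demo.sum()   # share of all predictions that are correct
precision = tp / (tp + fp)             # share of predicted positives that are correct
recall = tp / (tp + fn)                # share of actual positives that are found

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```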


# Comparing and Visualizing the Accuracy of the Classification Models


In [None]:
lr_score = lr.score(x_test,y_test)
knn_score = knn.score(x_test,y_test)
svm_score = svm.score(x_test,y_test)
nb_score = nb.score(x_test,y_test)
dt_score = dt.score(x_test,y_test)

models_data =["Lr","Knn","Svm","Nb","Dt"]
score_data = [lr_score,knn_score,svm_score,nb_score,dt_score]


In [None]:
score_data

In [None]:
df = pd.DataFrame({'Models': models_data,'Score':score_data})
df

In [None]:
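# One palette color per model (len(x) would be the number of data rows, not the number of models)
sns.barplot(x=models_data, y=score_data, palette=sns.cubehelix_palette(len(models_data)))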

In [None]:
# ascending=False: sort in descending order
new_index = (df["Score"].sort_values(ascending=False)).index.values

# reorder the rows by the new index
sorted_data = df.reindex(new_index)

In [None]:
sorted_data
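
The sort-then-reindex steps above can also be written as a single `sort_values` call on the frame. The sketch below compares the two forms on made-up scores; the `demo` frame and its values are illustrative only and are not results from this notebook.

```python
import pandas as pd

# Made-up scores for illustration only
demo = pd.DataFrame({"Models": ["A", "B", "C"], "Score": [0.80, 0.90, 0.85]})

# Two-step version: sort the Score column, then reindex the frame by the sorted index
two_step = demo.reindex(demo["Score"].sort_values(ascending=False).index)

# One-call version: sort the whole frame by Score
one_call = demo.sort_values("Score", ascending=False)

print(one_call)
```

Both forms keep the original index labels, so the resulting frames are identical.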

In [None]:
plt.figure(figsize=(10,5)) 

sns.barplot(x="Models", y="Score",data=sorted_data)