# INTRODUCTION
 
In this kernel, I will explain both supervised and unsupervised machine learning algorithms.
 
1. [Load and Understand Data](#1)
1. [Supervised Learning](#2)
    1. [k-NN (K Nearest Neighbour Algorithm)](#3)
    1. [Regression](#4)
        1. [Linear Regression](#5)
        1. [Regularized Regression](#6)
        1. [Logistic Regression](#7)
    1. [SVM - Support Vector Machine](#8)

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns
# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

<a id = "1" ></a>
## Load and Understand Data

In [None]:
data = pd.read_csv('/kaggle/input/heart-disease-uci/heart.csv')
data.head()

In [None]:
data.info()

The dataset has 303 entries and 14 attributes: 13 attributes are of integer type and 1 attribute is a float.

In [None]:
data.target.unique()

The 'target' attribute is the class label of our data: it indicates whether or not a person has heart disease.

In [None]:
data[['target','age']].groupby('age', as_index = True).mean().sort_values('age', ascending = False)

In [None]:
data[['cp','target']].groupby('target', as_index = True).mean().sort_values('cp', ascending = False)

In [None]:
plt.scatter(data.age, data.target, color = 'red')
plt.xlabel('age')
plt.ylabel('target')
plt.show()

In [None]:
data.info()
data.head()

In [None]:
plt.scatter(data.trestbps, data.chol)
plt.show()

In [None]:
colors = ["green" if each == 1 else "red" for each in data.target]
pd.plotting.scatter_matrix(data.loc[:, data.columns != "target"],
                          c = colors,
                          figsize = (15,15),
                          diagonal = 'hist',
                          s = 200,
                          alpha = 0.5,
                          edgecolor = "black")
plt.show()

In [None]:
# Count how many samples belong to each class
sns.countplot(x = 'target', data = data)
plt.show()

The two classes have similar counts, so the data is roughly balanced.

<a id = "2" ></a>
## Supervised Learning

<a id = "3" ></a>
### k-NN (K Nearest Neighbour Algorithm)

In [None]:
x_data = data.loc[:, data.columns != 'target']
y = data.iloc[:,13].values

In [None]:
# Mean normalization of x: (value - column mean) / (column max - column min)
x = (x_data - x_data.mean()) / (x_data.max() - x_data.min())

In [None]:
#train - test split
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.20, random_state = 42)

In [None]:
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors = 3)
knn.fit(x_train, y_train)

In [None]:
knn.score(x_test, y_test)

With k = 3 the test accuracy is about 0.83. Can a different value of k give a better score?

In [None]:
# Try k = 1..19 and record the training and test accuracy for each value
train_accuracy = []
test_accuracy = []

for each in range(1, 20):
    knn2 = KNeighborsClassifier(n_neighbors = each)
    knn2.fit(x_train, y_train)

    train_accuracy.append(knn2.score(x_train, y_train))
    test_accuracy.append(knn2.score(x_test, y_test))

plt.figure(figsize = (5, 5))
plt.plot(range(1, 20), train_accuracy, color = 'blue', label = 'Training accuracy')
plt.plot(range(1, 20), test_accuracy, color = 'red', label = 'Testing accuracy')
plt.legend()
plt.xlabel('k values')
plt.ylabel('Scores')
plt.show()

# List index 0 corresponds to k = 1. Because k is chosen using the test set,
# this best score is an optimistic estimate of performance on new data.
best_index = int(np.argmax(test_accuracy))
print("Best accuracy is {} with K = {}".format(test_accuracy[best_index], best_index + 1))

<a id = "4" ></a>
### Regression

<a id = "5" ></a>
#### Linear Regression


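Linear regression fits $y = ax + b$, where $y$ is the target, $x$ is the feature, and $a$ (slope) and $b$ (intercept) are the model parameters.

* The parameters are chosen to minimize an error function, called the loss function. Linear regression uses Ordinary Least Squares (OLS) as its loss function.
* OLS: simply summing the residuals $y_i - \hat{y}_i$ would let positive and negative residuals cancel each other, so OLS sums the squared residuals instead: $\sum_i (y_i - \hat{y}_i)^2$.
* Score: the `score` method returns the coefficient of determination, $R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$, where $\bar{y}$ is the mean of the actual values.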
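
A minimal sketch of OLS and the $R^2$ score described above, on hypothetical toy data rather than the heart dataset. It computes the single-feature OLS slope and intercept in closed form with NumPy, computes $R^2 = 1 - SS_{res}/SS_{tot}$ by hand, and cross-checks both against scikit-learn's `LinearRegression`. The names `x_toy`, `y_toy` and `reg_toy` are introduced here only so they do not overwrite `x`, `y` and `reg` used in the cells below.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data (not the heart dataset): y is roughly 2x + 1 plus noise
rng = np.random.default_rng(0)
x_toy = rng.uniform(0, 10, size=50)
y_toy = 2.0 * x_toy + 1.0 + rng.normal(0, 1.0, size=50)

# Closed-form OLS for one feature: a = cov(x, y) / var(x), b = mean(y) - a * mean(x)
x_mean, y_mean = x_toy.mean(), y_toy.mean()
a = np.sum((x_toy - x_mean) * (y_toy - y_mean)) / np.sum((x_toy - x_mean) ** 2)
b = y_mean - a * x_mean

# R^2 = 1 - SS_res / SS_tot
y_hat = a * x_toy + b
ss_res = np.sum((y_toy - y_hat) ** 2)
ss_tot = np.sum((y_toy - y_mean) ** 2)
r2 = 1 - ss_res / ss_tot

# Cross-check with scikit-learn (which expects a 2-D feature array)
reg_toy = LinearRegression().fit(x_toy.reshape(-1, 1), y_toy)
print(f"slope:     {a:.4f} vs {reg_toy.coef_[0]:.4f}")
print(f"intercept: {b:.4f} vs {reg_toy.intercept_:.4f}")
print(f"R^2:       {r2:.4f} vs {reg_toy.score(x_toy.reshape(-1, 1), y_toy):.4f}")
```

Both columns of each printed line should agree, since `LinearRegression` minimizes the same sum of squared residuals and `score` uses the same $R^2$ definition.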


In [None]:
x = data.trestbps.values.reshape(-1, 1)
y = data.chol.values.reshape(-1, 1)
plt.scatter(data.trestbps, data.chol)
plt.show()

In [None]:
from sklearn.linear_model import LinearRegression
reg = LinearRegression()
reg.fit(x, y)

In [None]:
# Evenly spaced trestbps values spanning the data range, used to draw the fitted line
x_ = np.arange(x.min(), x.max(), 0.1).reshape(-1, 1)
predicted = reg.predict(x_)

plt.scatter(data.trestbps, data.chol)
plt.plot(x_, predicted, color = 'black')
plt.xlabel("trestbps")
plt.ylabel("chol")
plt.show()

print("R^2 score : {}".format(reg.score(x, y)))

The R^2 score is very low: trestbps explains very little of the variation in chol, so this pair of features is not suitable for linear regression.

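Cross-validation score

* K-fold cross-validation (CV) splits the data into K folds; each fold is used once for evaluation while the model is trained on the other K - 1 folds.
* As K increases, the computational cost increases, because the model is trained K times.
* `cross_val_score(reg, x, y, cv=5)` applies `reg` (linear regression) to the `x` and `y` defined above with K = 5, i.e. it repeats split, train, predict 5 times and returns the 5 scores.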
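
A minimal sketch of what `cross_val_score` does for K = 5, on hypothetical toy data. It splits the data with `KFold(n_splits=5)`, trains a fresh `LinearRegression` on four folds, scores it ($R^2$) on the held-out fold, and repeats for every fold. For a regressor and an integer `cv`, scikit-learn uses an unshuffled `KFold`, so the manual scores should match `cross_val_score` exactly. The names `X_toy`, `y_toy` and `n_folds` are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Hypothetical toy data (not the heart dataset): one feature, linear trend plus noise
rng = np.random.default_rng(1)
X_toy = rng.uniform(0, 10, size=(100, 1))
y_toy = 3.0 * X_toy[:, 0] - 2.0 + rng.normal(0, 2.0, size=100)

# Manual K-fold CV: each fold is held out once while the model trains on the other K - 1
n_folds = 5
manual_scores = []
for train_idx, val_idx in KFold(n_splits=n_folds).split(X_toy):
    model = LinearRegression().fit(X_toy[train_idx], y_toy[train_idx])
    manual_scores.append(model.score(X_toy[val_idx], y_toy[val_idx]))  # R^2 on held-out fold

# The same computation done by scikit-learn
auto_scores = cross_val_score(LinearRegression(), X_toy, y_toy, cv=n_folds)

print("manual:         ", np.round(manual_scores, 4))
print("cross_val_score:", np.round(auto_scores, 4))
print("average:", np.mean(auto_scores))
```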


In [None]:
from sklearn.model_selection import cross_val_score
reg = LinearRegression()
k = 5
cv_res = cross_val_score(reg, x, y, cv = k)
print('CV Scores: ',cv_res)
print('CV scores average: ',np.sum(cv_res)/k)

<a id = "6" ></a>
#### Regularized Regression

To avoid overfitting, we use regularization, which penalizes large coefficients.

Ridge regression: the first regularization technique, also called L2 regularization.
* Ridge regression loss function = OLS + alpha * sum(parameter^2)
* alpha is a parameter we need to choose before fitting and predicting; picking alpha is similar to picking k in k-NN. alpha is a hyperparameter chosen for the best accuracy and model complexity, and this process is called hyperparameter tuning.
* If alpha is zero, the loss function is just OLS, i.e. ordinary linear regression.
* A small alpha can lead to overfitting.
* A large alpha can lead to underfitting. What counts as small or large varies from problem to problem.
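
A minimal sketch of the effect of alpha and of tuning it, on hypothetical toy data (40 samples, 10 features, only 3 of which influence the target). The norm of the Ridge coefficient vector shrinks as alpha grows, and `GridSearchCV` picks alpha by 5-fold cross-validated $R^2$, in the same spirit as searching over k for k-NN. The alpha grid is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Hypothetical toy data: 40 samples, 10 features, only the first 3 affect y
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(40, 10))
true_coef = np.array([3.0, -2.0, 1.0])
y_toy = X_toy[:, :3] @ true_coef + rng.normal(0, 1.0, size=40)

# A larger alpha means a stronger penalty, so the coefficients shrink toward zero
alphas = [0.01, 1.0, 100.0]
norms = []
for alpha in alphas:
    coef = Ridge(alpha=alpha).fit(X_toy, y_toy).coef_
    norms.append(np.linalg.norm(coef))
    print(f"alpha = {alpha:>6}: norm of coefficients = {norms[-1]:.3f}")

# Hyperparameter tuning: choose alpha by 5-fold cross-validated R^2
param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0, 100.0]}
search = GridSearchCV(Ridge(), param_grid=param_grid, cv=5)
search.fit(X_toy, y_toy)
print("best alpha:", search.best_params_["alpha"])
print("best CV R^2:", round(search.best_score_, 3))
```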
    
Lasso regression: the second regularization technique, also called L1 regularization.
* Lasso regression loss function = OLS + alpha * sum(|parameter|)
* It can be used to select important features of the data, because it shrinks the coefficients of less useful features to exactly zero; the features whose coefficients are not shrunk to zero are the ones Lasso selects.
* To choose features this way, the regression data needs more than one feature; the cells below still use the single feature `trestbps`.
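
A minimal sketch of Lasso-based feature selection, on hypothetical toy data where only the first two of five features influence the target. Lasso sets the coefficients of the irrelevant features to exactly zero, while Ridge only shrinks them and ordinary least squares leaves them small but nonzero. The alpha values (10.0 for Ridge, 0.5 for Lasso) are arbitrary choices for illustration; note that scikit-learn's `Lasso` scales the squared-error term by 1/(2 * n_samples), so its alpha is not on the same scale as `Ridge`'s.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge

# Hypothetical toy data: 200 samples, 5 features, only features 0 and 1 affect y
rng = np.random.default_rng(42)
X_toy = rng.normal(size=(200, 5))
y_toy = 4.0 * X_toy[:, 0] - 3.0 * X_toy[:, 1] + rng.normal(0, 0.5, size=200)

ols_toy = LinearRegression().fit(X_toy, y_toy)
ridge_toy = Ridge(alpha=10.0).fit(X_toy, y_toy)
lasso_toy = Lasso(alpha=0.5).fit(X_toy, y_toy)

print("OLS:  ", np.round(ols_toy.coef_, 3))
print("Ridge:", np.round(ridge_toy.coef_, 3))
print("Lasso:", np.round(lasso_toy.coef_, 3))

# Features whose Lasso coefficient is not exactly zero are the "selected" ones
selected = np.flatnonzero(lasso_toy.coef_)
print("Features selected by Lasso:", selected)
```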




In [None]:
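from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

x_train,x_test,y_train,y_test = train_test_split(x, y, random_state = 2, test_size = 0.3)
# `normalize=True` was removed from scikit-learn's linear models in version 1.2,
# so the feature is standardized in a pipeline instead. Scores may differ from the
# old `normalize=True` output, because the two options scale the feature differently.
ridge = make_pipeline(StandardScaler(), Ridge(alpha = 0.1))
ridge.fit(x_train,y_train)
ridge_predict = ridge.predict(x_test)
print('Ridge score: ',ridge.score(x_test,y_test))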

In [None]:
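from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# `normalize=True` was removed in scikit-learn 1.2; standardize the feature in a pipeline instead
ls = make_pipeline(StandardScaler(), Lasso(alpha = 0.1))
ls.fit(x_train,y_train)
ls_predict = ls.predict(x_test)
print('Lasso score: ',ls.score(x_test,y_test))
# The fitted Lasso model is the last step of the pipeline
print('Lasso coefficients: ',ls[-1].coef_)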

In [None]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix

x_data = data.loc[:, data.columns != 'target']
y = data.iloc[:,13].values
# Mean normalization of x: (value - column mean) / (column max - column min)
x = (x_data - x_data.mean()) / (x_data.max() - x_data.min())

x_train,x_test,y_train,y_test = train_test_split(x, y, test_size = 0.2, random_state = 1)

rf = RandomForestClassifier(random_state = 42, n_estimators = 100)
rf.fit(x_train,y_train)
y_pred = rf.predict(x_test)
y_true = y_test
cm = confusion_matrix(y_true, y_pred)
print('Confusion matrix: \n',cm)
print('Classification report: \n',classification_report(y_test,y_pred))

In [None]:
sns.heatmap(cm, annot = True, linewidth = 0.5, linecolor = 'red')
plt.show()

<a id = "7" ></a>
#### Logistic Regression


In [None]:
from sklearn.metrics import roc_curve
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, classification_report

In [None]:
x_data = data.loc[:, data.columns != 'target']
y = data.iloc[:,13].values
# Mean normalization of x: (value - column mean) / (column max - column min)
x = (x_data - x_data.mean()) / (x_data.max() - x_data.min())

x_train,x_test,y_train,y_test = train_test_split(x, y, test_size = 0.2, random_state = 1)

In [None]:
logistic = LogisticRegression()
logistic.fit(x_train, y_train)
y_pred_prob = logistic.predict_proba(x_test)[:,1]
fpr, tpr, thresholds = roc_curve(y_test, y_pred_prob)

plt.plot([0, 1], [0, 1], 'k--')
plt.plot(fpr, tpr)
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC')
plt.show()

<a id = "8" ></a>
### SVM - Support Vector Machine


In [None]:
x_data = data.loc[:, data.columns != 'target']
y = data.iloc[:,13].values
# Mean normalization of x: (value - column mean) / (column max - column min)
x = (x_data - x_data.mean()) / (x_data.max() - x_data.min())

x_train,x_test,y_train,y_test = train_test_split(x, y, test_size = 0.2, random_state = 1)

In [None]:
from sklearn.svm import SVC

svm = SVC(random_state = 1)
svm.fit(x_train, y_train)

In [None]:
print("SVM accuracy: {}".format(svm.score(x_test, y_test)))