### K-Nearest Neighbors (KNN)
Predictions are made from the similarity between observations. KNN is a non-parametric learning method. To predict the dependent variable 'Y' of an observation whose independent variable values are given, its similarity to every other observation in the table is computed, in practice as a distance. The observations that turn out to be closest to it then determine the predicted 'Y'.
##### Computation steps:
* Choose the number of neighbors (k).
* Compute the distance between the unknown point and every other point.
* Sort the distances and select the k nearest observations.
* For classification, return the most frequent class among those neighbors; for regression, return the mean of their target values.
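
The steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not how `KNeighborsRegressor` is implemented: the function name `knn_predict`, the `task` argument, the choice of Euclidean distance, and the toy data are all made up for this example.

```python
import math
from collections import Counter
from statistics import mean


def knn_predict(X_train, y_train, x_new, k=3, task="regression"):
    """Predict the target of x_new from its k nearest training points."""
    # Step 2: Euclidean distance from the new point to every training point.
    distances = [math.dist(x_new, x) for x in X_train]
    # Step 3: indices of the k smallest distances.
    nearest = sorted(range(len(X_train)), key=lambda i: distances[i])[:k]
    neighbor_targets = [y_train[i] for i in nearest]
    # Step 4: mean for regression, majority vote for classification.
    if task == "regression":
        return mean(neighbor_targets)
    return Counter(neighbor_targets).most_common(1)[0][0]


# Toy data, invented for illustration only.
X_toy = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [10.0, 10.0]]
y_reg = [10.0, 20.0, 30.0, 100.0]
y_cls = ["a", "a", "b", "b"]

reg_pred = knn_predict(X_toy, y_reg, [2.1, 2.1], k=3)
cls_pred = knn_predict(X_toy, y_cls, [2.1, 2.1], k=3, task="classification")
print(reg_pred, cls_pred)
```

With k = 3 the far-away point at (10, 10) is ignored, so only the three nearby observations contribute to the prediction.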

In [None]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt
from sklearn.preprocessing import scale, StandardScaler
from sklearn import model_selection
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn import neighbors
from sklearn.svm import SVR
from warnings import filterwarnings
filterwarnings('ignore')

In [None]:
df = pd.read_csv('../input/hitters-baseball-data/Hitters.csv')
df = df.dropna()
dms = pd.get_dummies(df[['League','Division','NewLeague']])
y = df['Salary']
X_ = df.drop(['Salary','League','Division','NewLeague'],axis = 1).astype('float64')
X = pd.concat([X_, dms[['League_N','Division_W','NewLeague_N']]],axis = 1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 42)

#### Model and Prediction

In [None]:
knn_model = KNeighborsRegressor().fit(X_train,y_train)

In [None]:
knn_model.n_neighbors

In [None]:
knn_model.metric

In [None]:
dir(knn_model)

In [None]:
knn_model.predict(X_test)[0:5]

In [None]:
y_pred = knn_model.predict(X_test)
np.sqrt(mean_squared_error(y_test,y_pred))

#### Model Tuning

In [None]:
RMSE = []
for k in range(1,11):
    knn_model = KNeighborsRegressor(n_neighbors = k).fit(X_train,y_train)
    y_pred = knn_model.predict(X_test)
    rmse = np.sqrt(mean_squared_error(y_test,y_pred))
    RMSE.append(rmse)
    print('k =', k, 'RMSE:', rmse)

In [None]:
# Use GridSearchCV to find the optimal parameter.

In [None]:
knn_params = {'n_neighbors': np.arange(1,30,1)}

In [None]:
knn = KNeighborsRegressor()
knn_cv_model = GridSearchCV(knn, knn_params, cv = 10).fit(X_train,y_train)

In [None]:
knn_cv_model.best_params_
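
To make the idea behind the grid search concrete, here is a rough pure-Python sketch of choosing k by cross-validation. It is not what `GridSearchCV` does internally: the helper names `knn_regress` and `cv_rmse`, the contiguous unshuffled folds, and the toy data are assumptions for illustration. Note also that this sketch ranks candidates by RMSE, whereas `GridSearchCV` with a regressor and no `scoring` argument ranks them by the estimator's default score (R²).

```python
import math
from statistics import mean


def knn_regress(X_train, y_train, x_new, k):
    """Mean target of the k training points closest to x_new."""
    order = sorted(range(len(X_train)),
                   key=lambda i: math.dist(x_new, X_train[i]))
    return mean(y_train[i] for i in order[:k])


def cv_rmse(X, y, k, n_folds=5):
    """Average validation RMSE of k-NN over n_folds contiguous folds."""
    n = len(X)
    fold_rmses = []
    for f in range(n_folds):
        # Rows [start, stop) form the validation fold; the rest is training.
        start, stop = f * n // n_folds, (f + 1) * n // n_folds
        train_idx = [i for i in range(n) if i < start or i >= stop]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        sq_errors = [(knn_regress(X_tr, y_tr, X[i], k) - y[i]) ** 2
                     for i in range(start, stop)]
        fold_rmses.append(math.sqrt(mean(sq_errors)))
    return mean(fold_rmses)


# Toy data, invented for illustration only.
X_toy = [[float(i)] for i in range(20)]
y_toy = [2.0 * i for i in range(20)]

candidates = [1, 2, 3, 5]
scores = {k: cv_rmse(X_toy, y_toy, k) for k in candidates}
best_k = min(scores, key=scores.get)  # lowest average RMSE wins
print(scores, best_k)
```

Each candidate k is scored only on rows that were held out of its training fold, which is why the selected value tends to be more reliable than one picked by looking at test-set RMSE directly.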

In [None]:
# Final model, refit with the best n_neighbors found by the grid search.
knn_tuned = KNeighborsRegressor(n_neighbors = knn_cv_model.best_params_['n_neighbors']).fit(X_train,y_train)
y_pred = knn_tuned.predict(X_test)
RMSE = np.sqrt(mean_squared_error(y_test,y_pred))
RMSE