### Ridge Regression
The goal is to find the coefficients that minimize the residual sum of squares while applying a penalty to those coefficients.
* It is resistant to overfitting.
* It is biased, but has low variance.
* It performs better than ordinary least squares (OLS) when there are many parameters.
* It helps mitigate the curse of dimensionality.
* It is effective when multicollinearity is present.
* It builds the model with all variables. It does not remove irrelevant variables from the model; it shrinks their coefficients toward zero.
* The penalty parameter lambda (called `alpha` in scikit-learn) plays a critical role. It controls the relative influence of the two terms in the objective: the residual sum of squares and the penalty.
* Finding a good value for lambda is important. Cross-validation (CV) is used for this.
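
To make the penalty concrete: ridge minimizes $\sum_i (y_i - \beta_0 - x_i^\top \beta)^2 + \lambda \sum_j \beta_j^2$. The sketch below is a minimal illustration on synthetic data (the arrays `X_demo`, `y_demo` and the helper `ridge_closed_form` are made up for this example and are not part of the notebook). It solves the penalized normal equations directly, compares the result with `Ridge`, and shows that a larger lambda shrinks the coefficient vector.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (an assumption for illustration, not the Hitters data)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
y_demo = X_demo @ np.array([3.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

def ridge_closed_form(X, y, lam):
    """Hypothetical helper: solve (Xc'Xc + lam*I) beta = Xc'yc on centered data.

    Centering keeps the intercept out of the penalty.
    """
    X_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - X_mean, y - y_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - X_mean @ beta
    return beta, intercept

beta_small, _ = ridge_closed_form(X_demo, y_demo, lam=0.1)
beta_large, _ = ridge_closed_form(X_demo, y_demo, lam=1000.0)

# scikit-learn's Ridge with the same penalty, for comparison
sk_coef = Ridge(alpha=0.1).fit(X_demo, y_demo).coef_

print("closed form:", beta_small)
print("sklearn    :", sk_coef)
print("norm, lam=0.1 vs lam=1000:", np.linalg.norm(beta_small), np.linalg.norm(beta_large))
```

The coefficients are pulled toward zero as lambda grows, but none of them is set exactly to zero, which matches the point above that ridge keeps every variable in the model.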

In [None]:
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn import model_selection
import matplotlib.pyplot as plt
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

In [None]:
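# Load the Hitters data and drop rows with missing values
df = pd.read_csv('../input/hitters-baseball-data/Hitters.csv')
df = df.dropna()
# One-hot encode the categorical columns
dms = pd.get_dummies(df[['League','Division','NewLeague']])
y = df['Salary']
# Numeric predictors only: drop the target and the raw categorical columns
X_ = df.drop(['Salary','League','Division','NewLeague'],axis = 1).astype('float64')
# Keep one dummy per categorical variable to avoid redundant columns
X = pd.concat([X_, dms[['League_N','Division_W','NewLeague_N']]],axis=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 42)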
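
The cell above keeps only one dummy column per categorical variable (`League_N`, `Division_W`, `NewLeague_N`). As a small sketch of the same idea, `pd.get_dummies` can drop the first level itself via `drop_first=True`. The `toy` frame below is a hypothetical stand-in with made-up rows, not the Hitters data.

```python
import pandas as pd

# Hypothetical toy frame with two two-level categorical columns
toy = pd.DataFrame({
    'League': ['A', 'N', 'N'],
    'Division': ['E', 'W', 'E'],
})

# drop_first=True removes the first (alphabetically smallest) level of each column,
# leaving one indicator per two-level variable
toy_dummies = pd.get_dummies(toy, drop_first=True)
print(list(toy_dummies.columns))
```

Keeping both levels of a two-level variable would add a column that is fully determined by the other one, so a single indicator carries the same information.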

In [None]:
X.head()

In [None]:
y.head()

In [None]:
df.shape

In [None]:
ridge_model = Ridge(alpha = 0.1).fit(X_train, y_train)
ridge_model

In [None]:
ridge_model.coef_

In [None]:
ridge_model.intercept_

In [None]:
np.linspace(10,-2,100) # generates 100 evenly spaced numbers from 10 down to -2

In [None]:
lambdalar = 10 ** np.linspace(10,-2,100) * 0.5
lambdalar
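
The grid above is evenly spaced in the exponent, so the candidate values are spread logarithmically from very large to very small penalties. A short sketch of what that means, using illustrative names (`grid_a`, `grid_b`) that are not part of the notebook: the same grid can be written with `np.logspace`.

```python
import numpy as np

# Same construction as in the notebook: powers of 10, scaled by 0.5
grid_a = 10 ** np.linspace(10, -2, 100) * 0.5

# Equivalent formulation with np.logspace (base 10 by default)
grid_b = 0.5 * np.logspace(10, -2, 100)

# Largest and smallest candidate penalties
print(grid_a[0], grid_a[-1])
print(np.allclose(grid_a, grid_b))
```

A logarithmic grid is a common choice here because the useful range of the penalty is usually not known in advance and can span several orders of magnitude.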

In [None]:
ridge_model = Ridge()
katsayilar = []

for i in lambdalar:
    ridge_model.set_params(alpha = i)
    ridge_model.fit(X_train,y_train)
    katsayilar.append(ridge_model.coef_)
katsayilar

In [None]:
ax = plt.gca()
ax.plot(lambdalar,katsayilar)  # one line per coefficient
ax.set_xscale('log')
ax.set_xlabel('lambda (alpha)')
ax.set_ylabel('coefficients')

#### Prediction

In [None]:
ridge_model = Ridge().fit(X_train,y_train)
y_pred = ridge_model.predict(X_train)

In [None]:
y_pred[0:10]

In [None]:
y_train[0:10]

In [None]:
# training error
RMSE = np.sqrt(mean_squared_error(y_train,y_pred))
RMSE

In [None]:
# cross-validated RMSE on the training set (a more reliable error estimate than the plain training error)
np.sqrt(np.mean(-cross_val_score(ridge_model, X_train, y_train, cv = 10, scoring = 'neg_mean_squared_error')))
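
To show what the one-line `cross_val_score` expression computes, here is a minimal sketch that runs the 10 folds by hand on synthetic data and compares the result. The data from `make_regression` and the names `X_demo`, `y_demo`, `manual_rmse` and `cv_rmse` are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_score

# Synthetic regression data (not the Hitters data)
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Manual 10-fold CV: fit on 9 folds, score on the held-out fold
fold_mse = []
for train_idx, val_idx in KFold(n_splits=10).split(X_demo):
    model = Ridge(alpha=1.0).fit(X_demo[train_idx], y_demo[train_idx])
    fold_pred = model.predict(X_demo[val_idx])
    fold_mse.append(mean_squared_error(y_demo[val_idx], fold_pred))
manual_rmse = np.sqrt(np.mean(fold_mse))

# Same computation via cross_val_score; scores are negated MSEs, hence the minus sign
scores = cross_val_score(Ridge(alpha=1.0), X_demo, y_demo, cv=10,
                         scoring='neg_mean_squared_error')
cv_rmse = np.sqrt(np.mean(-scores))

print(manual_rmse, cv_rmse)
```

Each observation is predicted by a model that never saw it during fitting, which is why this estimate is usually closer to the test error than the plain training RMSE.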

In [None]:
# test error
y_pred = ridge_model.predict(X_test)
RMSE = np.sqrt(mean_squared_error(y_test,y_pred))
RMSE

#### Model Tuning (Hyperparameter Tuning)

In [None]:
ridge_model = Ridge().fit(X_train,y_train)
y_pred = ridge_model.predict(X_test)
np.sqrt(mean_squared_error(y_test,y_pred))

In [None]:
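# start at 1 so that no candidate is 0 (alpha = 0 is unpenalized least squares, which some RidgeCV versions reject)
lambdalar1 = np.random.randint(1,1000,100)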

In [None]:
lambdalar2 = 10**np.linspace(10,-2,100)*0.5

In [None]:
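# `normalize` was removed from RidgeCV in scikit-learn 1.2; it is omitted here so that
# alpha is selected on the same unscaled features the final model is fit on
ridgecv = RidgeCV(alphas = lambdalar2, scoring = 'neg_mean_squared_error', cv = 10)
ridgecv.fit(X_train, y_train)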

In [None]:
ridgecv.alpha_ # optimal alpha selected by cross-validation

In [None]:
# final model
ridge_tuned = Ridge(alpha = ridgecv.alpha_).fit(X_train,y_train)

In [None]:
y_pred = ridge_tuned.predict(X_test)
np.sqrt(mean_squared_error(y_test,y_pred))
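
Ridge penalizes all coefficients equally, so the scale of the features affects the fit. The `normalize` argument of `RidgeCV` was removed in scikit-learn 1.2; one commonly used alternative is to standardize inside a pipeline. The sketch below is an assumption-laden illustration on synthetic data, and `StandardScaler` is a swapped-in technique that is not numerically identical to the old `normalize=True` behavior. All names (`X_demo`, `pipe`, `best_alpha`, `test_rmse`) are made up for this example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data (not the Hitters data)
X_demo, y_demo = make_regression(n_samples=300, n_features=8, noise=15.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=42)

# Candidate penalties on a logarithmic grid
alphas = 10 ** np.linspace(3, -2, 50)

# Standardize the features, then pick alpha by 10-fold CV
pipe = make_pipeline(
    StandardScaler(),
    RidgeCV(alphas=alphas, scoring='neg_mean_squared_error', cv=10),
)
pipe.fit(X_tr, y_tr)

# make_pipeline names each step after its lowercased class name
best_alpha = pipe.named_steps['ridgecv'].alpha_
test_rmse = np.sqrt(mean_squared_error(y_te, pipe.predict(X_te)))
print(best_alpha, test_rmse)
```

Because the scaler and the model live in one pipeline, the same scaling is applied at prediction time, so the selected alpha and the final model see features on the same scale. Note that the scaler here is fit on the full training set before the internal CV runs; wrapping the whole pipeline in `GridSearchCV` would rescale within each fold instead.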

In [None]:
# same procedure for lambdalar1 (`normalize` omitted: removed from RidgeCV in scikit-learn 1.2)
ridgecv = RidgeCV(alphas = lambdalar1, scoring = 'neg_mean_squared_error', cv = 10)
ridgecv.fit(X_train, y_train)
print('alpha : ',ridgecv.alpha_)
ridge_tuned = Ridge(alpha = ridgecv.alpha_).fit(X_train,y_train)
y_pred = ridge_tuned.predict(X_test)
np.sqrt(mean_squared_error(y_test,y_pred))