### **Ridge regression**

Ridge regression is a regularized form of linear regression used to reduce overfitting, especially when the independent variables suffer from multicollinearity, that is, when two or more of them are strongly correlated with each other.

In ridge regression, a penalty term (also called a regularization term) is added to the least-squares loss. The penalty is proportional to the sum of the squared regression coefficients, and a regularization parameter (the ridge parameter, usually written λ or alpha) controls its strength. The penalty shrinks the coefficients toward zero and reduces their variance, which stabilizes the estimates when predictors are collinear and lowers the risk of overfitting.
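Written out for n observations and p predictors, ridge picks the coefficients that minimize the squared error plus λ times the sum of the squared slopes. The intercept b0 is conventionally left out of the penalty, which is what the notebook code below does by zeroing the first diagonal entry of its penalty matrix:

$$
\hat{b} = \arg\min_{b}\; \sum_{i=1}^{n}\Big(y_i - b_0 - \sum_{j=1}^{p} b_j x_{ij}\Big)^2 + \lambda \sum_{j=1}^{p} b_j^2
$$

With X holding a leading column of ones and $I_0$ the identity matrix with its (0,0) entry set to 0, the minimizer has the closed form

$$
\hat{b} = \left(X^\top X + \lambda I_0\right)^{-1} X^\top y
$$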

Ridge regression is particularly useful when the model has many independent variables and those variables are highly correlated. Note, however, that ridge regression does not remove multicollinearity from the data; it only makes the coefficient estimates more stable in its presence than ordinary least squares does.
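To see the stabilizing effect, here is a small sketch on synthetic, made-up data where the second predictor is almost a copy of the first. The helper `ridge_coefficients` is a hypothetical name for illustration; it solves the closed form without an intercept, so the data are centered first. With λ = 0 (plain least squares) the two coefficients typically come out large and of opposite sign, while ridge spreads the weight across both, and adding λ to the diagonal lowers the condition number of the matrix being solved.

```python
import numpy as np

def ridge_coefficients(X, y, lam):
    """Closed-form ridge without an intercept (assumes X and y are centered)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)  # nearly a copy of x1 -> strong collinearity
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(size=n)

# Center so that no intercept column is needed
X = X - X.mean(axis=0)
y = y - y.mean()

ols = ridge_coefficients(X, y, 0.0)    # lam = 0 is ordinary least squares
ridge = ridge_coefficients(X, y, 10.0)

print("OLS coefficients:  ", ols)
print("Ridge coefficients:", ridge)
print("cond(X^T X):       ", np.linalg.cond(X.T @ X))
print("cond(X^T X + 10 I):", np.linalg.cond(X.T @ X + 10.0 * np.eye(2)))
```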

The effectiveness of ridge regression depends on the appropriate choice of ridge parameter, which can be determined through techniques such as cross-validation.
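A minimal k-fold cross-validation sketch in plain NumPy, assuming mean absolute error as the score and synthetic, made-up data. The helper names `fit_ridge_np`, `predict_ridge_np` and `cv_mae` are hypothetical. Each fold is standardized with its own training statistics, the intercept is not penalized, and the same folds are reused for every candidate λ so the scores are comparable.

```python
import numpy as np

def fit_ridge_np(X, y, lam):
    """Ridge fit with an unpenalized intercept; scales X with this fold's own mean/std."""
    mean, std = X.mean(axis=0), X.std(axis=0, ddof=1)
    Z = np.column_stack([np.ones(len(X)), (X - mean) / std])
    penalty = lam * np.eye(Z.shape[1])
    penalty[0, 0] = 0.0  # do not penalize the intercept
    b = np.linalg.solve(Z.T @ Z + penalty, Z.T @ y)
    return b, mean, std

def predict_ridge_np(X, b, mean, std):
    """Apply the training-fold scaling, then y_hat = Z @ b."""
    Z = np.column_stack([np.ones(len(X)), (X - mean) / std])
    return Z @ b

def cv_mae(X, y, lam, k=5, seed=0):
    """Mean absolute error of ridge, averaged over k folds (same folds for every lam)."""
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, k)
    fold_errors = []
    for i in range(k):
        val = folds[i]
        trn = np.concatenate([folds[j] for j in range(k) if j != i])
        b, mean, std = fit_ridge_np(X[trn], y[trn], lam)
        pred = predict_ridge_np(X[val], b, mean, std)
        fold_errors.append(np.mean(np.abs(y[val] - pred)))
    return float(np.mean(fold_errors))

# Synthetic, made-up data: two strongly correlated predictors
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
X[:, 1] = X[:, 0] + rng.normal(scale=0.1, size=300)
y = 2.0 * X[:, 0] + rng.normal(size=300)

alphas = [10.0**i for i in range(-2, 4)]
cv_errors = [cv_mae(X, y, a) for a in alphas]
best_alpha = alphas[int(np.argmin(cv_errors))]
print(best_alpha, cv_errors)
```

Only the training data takes part in this search, so the test set stays untouched until the final evaluation.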

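The prediction equation is the same as in ordinary linear regression:

**y_hat = b0 + b1x**

b0 -> y-intercept

b1 -> x-coefficient (with several predictors there is one coefficient per predictor: b1, b2, ...)

y_hat -> predicted value

lambda -> ridge penalty (helps handle collinearity by penalizing b1 and the other coefficients, but not b0)

collinear: columns (predictors) that are strongly correlated with each other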
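With the two predictors used in this notebook the equation becomes y_hat = b0 + b1 * athletes + b2 * events. As a worked check, the snippet below plugs in the fitted coefficients (alpha = 2) and the standardized features of test row 309, both copied from the notebook outputs further down. The result, roughly -0.915, matches the first prediction (-0.914959) up to the rounding of the displayed values.

```python
# Coefficients for alpha = 2 and standardized features of test row 309,
# copied from the outputs later in this notebook
b0, b_athletes, b_events = 10.691496, 61.857734, -34.63292
athletes_z, events_z = -0.553313, -0.653142

# y_hat = b0 + b1 * athletes + b2 * events
y_hat = b0 + b_athletes * athletes_z + b_events * events_z
print(round(y_hat, 4))
```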

In [22]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

In [23]:
teams = pd.read_csv('teams.csv')
teams.head(5)

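|   | team | year | athletes | events | age | height | weight | prev_medals | medals |
|---|---|---|---|---|---|---|---|---|---|
| 0 | AFG | 1964 | 8 | 8 | 22.0 | 161.0 | 64.2 | 0.0 | 0 |
| 1 | AFG | 1968 | 5 | 5 | 23.2 | 170.2 | 70.0 | 0.0 | 0 |
| 2 | AFG | 1972 | 8 | 8 | 29.0 | 168.3 | 63.8 | 0.0 | 0 |
| 3 | AFG | 1980 | 11 | 11 | 23.6 | 168.4 | 63.2 | 0.0 | 0 |
| 4 | AFG | 2004 | 5 | 5 | 18.6 | 170.8 | 64.8 | 0.0 | 0 |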


In [24]:
teams.info()
print(teams.shape)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2014 entries, 0 to 2013
Data columns (total 9 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   team         2014 non-null   object 
 1   year         2014 non-null   int64  
 2   athletes     2014 non-null   int64  
 3   events       2014 non-null   int64  
 4   age          2014 non-null   float64
 5   height       2014 non-null   float64
 6   weight       2014 non-null   float64
 7   prev_medals  2014 non-null   float64
 8   medals       2014 non-null   int64  
dtypes: float64(4), int64(4), object(1)
memory usage: 141.7+ KB
(2014, 9)


In [25]:
train, test = train_test_split(teams, test_size=.2, random_state=1)

In [26]:
predictors = ['athletes', 'events']
target = 'medals'

In [27]:
X = train[predictors].copy()
y = train[[target]].copy()

In [28]:
X

|   | athletes | events |
|---|---|---|
| 1322 | 6 | 6 |
| 1872 | 119 | 80 |
| 953 | 4 | 4 |
| 1117 | 2 | 2 |
| 1993 | 43 | 25 |
| ... | ... | ... |
| 1791 | 40 | 25 |
| 1096 | 36 | 23 |
| 1932 | 719 | 245 |
| 235 | 13 | 11 |


In [29]:
y

|   | medals |
|---|---|
| 1322 | 0 |
| 1872 | 5 |
| 953 | 0 |
| 1117 | 0 |
| 1993 | 0 |
| ... | ... |
| 1791 | 1 |
| 1096 | 1 |
| 1932 | 264 |
| 235 | 0 |


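Standardize the X values (mean 0, standard deviation 1) so that the penalty treats every coefficient on the same scale: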

In [30]:
x_mean = X.mean()
x_std = X.std()

In [31]:
print(x_mean, x_std)

athletes    74.409063
events      35.990068
dtype: float64 athletes    127.250043
events       48.978737
dtype: float64


In [32]:
X = (X - x_mean) / x_std

In [34]:
X.describe()

# mean = 0, std = 1

|   | athletes | events |
|---|---|---|
| count | 1611.0 | 1611.0 |
| mean | -2.370681e-17 | -9.923781e-18 |
| std | 1.0 | 1.0 |
| min | -0.5768883 | -0.714393 |
| 25% | -0.5297371 | -0.6123079 |
| 50% | -0.4197174 | -0.4489717 |
| 75% | -0.02679027 | 0.183956 |
| max | 6.008571 | 4.634867 |
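Two details of the scaling step are easy to get wrong, so here is a small NumPy sketch of them. The first five training rows shown above are used as the training data, and the two test rows are made up for illustration. First, pandas' `DataFrame.std()` uses the sample standard deviation (`ddof=1`) while NumPy's `std` defaults to `ddof=0`. Second, the test rows must be scaled with the training mean and standard deviation, never with their own statistics.

```python
import numpy as np

# First five training rows shown above: (athletes, events)
X_train = np.array([[6, 6], [119, 80], [4, 4], [2, 2], [43, 25]], dtype=float)
# Hypothetical test rows, made up for illustration
X_test = np.array([[5, 5], [150, 90]], dtype=float)

mean = X_train.mean(axis=0)
std = X_train.std(axis=0, ddof=1)   # ddof=1 matches pandas' DataFrame.std() default

X_train_z = (X_train - mean) / std
X_test_z = (X_test - mean) / std    # reuse the *training* statistics for the test rows

print(X_train_z.mean(axis=0))         # ~[0, 0] up to floating-point rounding
print(X_train_z.std(axis=0, ddof=1))  # ~[1, 1]
print(X_test_z)
```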


In [35]:
X['intercept'] = 1
X = X[['intercept'] + predictors]

In [36]:
X

|   | intercept | athletes | events |
|---|---|---|---|
| 1322 | 1 | -0.537596 | -0.612308 |
| 1872 | 1 | 0.350420 | 0.898552 |
| 953 | 1 | -0.553313 | -0.653142 |
| 1117 | 1 | -0.569030 | -0.693976 |
| 1993 | 1 | -0.246829 | -0.224384 |
| ... | ... | ... | ... |
| 1791 | 1 | -0.270405 | -0.224384 |
| 1096 | 1 | -0.301839 | -0.265219 |
| 1932 | 1 | 5.065546 | 4.267361 |
| 235 | 1 | -0.482586 | -0.510223 |


In [37]:
X.T

|   | 1322 | 1872 | 953 | 1117 | 1993 | 385 | 1287 | 1831 | 0 | 1159 | ... | 960 | 847 | 1669 | 715 | 905 | 1791 | 1096 | 1932 | 235 | 1061 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| intercept | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| athletes | -0.537596 | 0.35042 | -0.553313 | -0.56903 | -0.246829 | -0.482586 | -0.537596 | 0.138239 | -0.521879 | -0.152527 | ... | -0.199678 | -0.160386 | -0.529737 | -0.529737 | -0.341132 | -0.270405 | -0.301839 | 5.065546 | -0.482586 | -0.19182 |
| events | -0.612308 | 0.898552 | -0.653142 | -0.693976 | -0.224384 | -0.571474 | -0.612308 | 0.102288 | -0.571474 | -0.163133 | ... | -0.285636 | -0.101882 | -0.612308 | -0.591891 | -0.367304 | -0.224384 | -0.265219 | 4.267361 | -0.510223 | 0.041037 |


In [39]:
# alpha is the ridge penalty (lambda in the formula)
alpha = 2
I = np.identity(X.shape[1])
I

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [40]:
I[0][0] = 0

In [42]:
I
# we don't want to penalize the intercept, so its diagonal entry is 0

array([[0., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
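Why leave the intercept out of the penalty? A quick sketch on synthetic, made-up data: when the predictors are standardized (each column sums to zero) and the intercept is unpenalized, the fitted intercept is exactly the mean of y. If the intercept is penalized as well, it is pulled toward zero by a factor n / (n + alpha), which biases every prediction.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
features = rng.normal(size=(n, 2))
features = (features - features.mean(axis=0)) / features.std(axis=0, ddof=1)  # standardized
y = 10 + features @ np.array([3.0, -1.0]) + rng.normal(size=n)

X = np.column_stack([np.ones(n), features])  # leading column of ones = intercept
alpha = 50.0

def solve(penalty):
    """Closed-form ridge for a given penalty matrix."""
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)

I_free = np.eye(3)
I_free[0, 0] = 0                          # intercept unpenalized, as in the notebook
b_free = solve(alpha * I_free)
b_shrunk = solve(alpha * np.eye(3))       # intercept penalized too

print(b_free[0], y.mean())                # equal: the intercept is the mean of y
print(b_shrunk[0], y.sum() / (n + alpha)) # shrunk toward 0 by n / (n + alpha)
```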

In [43]:
penalty = alpha * I
penalty

array([[0., 0., 0.],
       [0., 2., 0.],
       [0., 0., 2.]])

In [46]:
B = np.linalg.inv(X.T @ X + penalty) @ X.T @ y

In [47]:
B

|   | medals |
|---|---|
| 0 | 10.691496 |
| 1 | 61.857734 |
| 2 | -34.63292 |


In [48]:
B.index = ['intercept', 'athletes', 'events']
B

|   | medals |
|---|---|
| intercept | 10.691496 |
| athletes | 61.857734 |
| events | -34.63292 |
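The line `np.linalg.inv(X.T @ X + penalty) @ X.T @ y` forms an explicit inverse. Solving the linear system (X^T X + penalty) B = X^T y with `np.linalg.solve` gives the same coefficients and is generally preferred for numerical stability and speed. A sketch on synthetic, made-up data; `ridge_closed_form` is a hypothetical helper name:

```python
import numpy as np

def ridge_closed_form(X, y, alpha):
    """Solve (X^T X + alpha * I0) b = X^T y, where I0 skips the intercept column 0."""
    penalty = alpha * np.eye(X.shape[1])
    penalty[0, 0] = 0.0
    A = X.T @ X + penalty
    return np.linalg.solve(A, X.T @ y)   # no explicit inverse is formed

rng = np.random.default_rng(42)
Z = rng.normal(size=(50, 2))
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0, ddof=1)
X = np.column_stack([np.ones(50), Z])
y = rng.normal(size=50)

b_solve = ridge_closed_form(X, y, alpha=2.0)
b_inv = np.linalg.inv(X.T @ X + 2.0 * np.diag([0.0, 1.0, 1.0])) @ X.T @ y
print(np.allclose(b_solve, b_inv))  # True
```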


In [50]:
test_X = test[predictors]
# scale with the *training* mean and std so train and test share the same scale
test_X = (test_X - x_mean) / x_std
test_X['intercept'] = 1
test_X = test_X[['intercept'] + predictors]
test_X

|   | intercept | athletes | events |
|---|---|---|---|
| 309 | 1 | -0.553313 | -0.653142 |
| 285 | 1 | 0.594035 | 1.000637 |
| 919 | 1 | -0.144668 | 0.102288 |
| 120 | 1 | 0.146098 | 0.531045 |
| 585 | 1 | -0.301839 | -0.122299 |
| ... | ... | ... | ... |
| 541 | 1 | -0.380425 | -0.408138 |
| 1863 | 1 | -0.191820 | 0.143122 |
| 622 | 1 | -0.058224 | 0.388126 |
| 1070 | 1 | -0.569030 | -0.693976 |


In [51]:
prediction = test_X @ B
prediction

|   | medals |
|---|---|
| 309 | -0.914959 |
| 285 | 12.782156 |
| 919 | -1.799893 |
| 120 | 1.337116 |
| 585 | -3.744014 |
| ... | ... |
| 541 | 1.294285 |
| 1863 | -6.130765 |
| 622 | -6.352080 |
| 1070 | -0.472980 |


### sklearn comparison

In [52]:
from sklearn.linear_model import Ridge

In [53]:
ridge = Ridge(alpha=alpha)

In [54]:
ridge.fit(X[predictors], y)

In [55]:
ridge.coef_

array([[ 61.85773366, -34.63292036]])

In [56]:
ridge.intercept_

array([10.69149597])

In [58]:
sklearn_predictions = ridge.predict(test_X[predictors])
sklearn_predictions

array([[-9.14958971e-01],
       [ 1.27821560e+01],
       [-1.79989300e+00],
       [ 1.33711574e+00],
       [-3.74401434e+00],
       [ 2.14000934e+01],
       [ 5.49275861e+00],
       [-2.24089563e+00],
       [-1.57792730e+00],
       [-6.93969528e-01],
       [ 2.13215706e+00],
       [ 6.93022088e+01],
       [ 7.31015350e+01],
       [ 1.51695579e+01],
       [ 8.13660846e+01],
       [ 8.43710203e+01],
       [-5.61896535e-01],
       [ 2.33795627e-01],
       [-6.05703919e-01],
       [-4.72980085e-01],
       [ 1.19439581e+01],
       [-1.75478394e+00],
       [-9.14958971e-01],
       [ 3.22386656e-01],
       [-3.61129051e+00],
       [ 1.41084181e+01],
       [-5.46747166e+00],
       [ 4.99243294e-01],
       [ 4.65195738e+00],
       [-4.73305506e-01],
       [ 2.22237519e+00],
       [-1.00355000e+00],
       [ 3.10600754e+00],
       [-1.31313047e+00],
       [ 1.07526223e+02],
       [-2.68287451e+00],
       [ 1.23126255e+02],
       [-3.08104601e+00],
       ...

In [59]:
prediction - sklearn_predictions

|   | medals |
|---|---|
| 309 | 6.217249e-14 |
| 285 | -4.032330e-13 |
| 919 | -2.811085e-13 |
| 120 | -4.103384e-13 |
| 585 | -2.167155e-13 |
| ... | ... |
| 541 | 0.000000e+00 |
| 1863 | -3.774758e-13 |
| 622 | -4.902745e-13 |
| 1070 | 9.059420e-14 |
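Differences of order 1e-13 are floating-point rounding, so the two implementations agree. One way to see why: scikit-learn's `Ridge` with `fit_intercept=True` does not penalize the intercept and, roughly speaking, handles it by centering the data. The NumPy sketch below (synthetic, made-up data) checks that the notebook's "column of ones plus a zero in the penalty" formulation gives exactly the same coefficients as solving for the slopes on centered data and recovering the intercept afterwards, even when the features are not standardized.

```python
import numpy as np

rng = np.random.default_rng(7)
n, alpha = 80, 2.0
F = rng.normal(size=(n, 2)) * [5.0, 0.5] + [3.0, -1.0]   # raw, unstandardized features
y = rng.normal(size=n)

# (1) Notebook style: column of ones, penalty matrix with a zero in the intercept slot
X = np.column_stack([np.ones(n), F])
P = alpha * np.eye(3)
P[0, 0] = 0.0
b_aug = np.linalg.solve(X.T @ X + P, X.T @ y)

# (2) Centering style: solve for the slopes on centered data, then recover the intercept
Fc, yc = F - F.mean(axis=0), y - y.mean()
slopes = np.linalg.solve(Fc.T @ Fc + alpha * np.eye(2), Fc.T @ yc)
intercept = y.mean() - F.mean(axis=0) @ slopes

print(np.allclose(b_aug, np.concatenate([[intercept], slopes])))
```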


### find the optimal alpha

In [63]:
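def ridge_fit(train, predictors, target, alpha):
    X = train[predictors].copy()
    y = train[[target]].copy()

    # standardize the predictors with the training statistics
    x_mean = X.mean()
    x_std = X.std()
    X = (X - x_mean) / x_std

    # add a column of ones for the intercept and put it first
    X["intercept"] = 1
    X = X[["intercept"] + predictors]

    # penalty matrix: alpha on the diagonal, 0 for the intercept
    penalty = alpha * np.identity(X.shape[1])
    penalty[0][0] = 0

    # closed-form ridge solution: B = (X^T X + penalty)^-1 X^T y
    B = np.linalg.inv(X.T @ X + penalty) @ X.T @ y
    # label rows from the predictor list instead of hardcoding the names
    B.index = ["intercept"] + predictors
    return B, x_mean, x_std

# function version of the fitting steps above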

In [61]:
B, x_mean, x_std = ridge_fit(train, predictors, target, alpha)

In [64]:
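def ridge_predict(test, predictors, x_mean, x_std, B):
    # scale the test data with the training mean and std
    test_X = test[predictors]
    test_X = (test_X - x_mean) / x_std
    test_X["intercept"] = 1
    test_X = test_X[["intercept"] + predictors]

    # y_hat = X @ B (the columns of test_X line up with B.index)
    predictions = test_X @ B
    return predictions

# function version of the prediction steps above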

In [65]:
from sklearn.metrics import mean_absolute_error

errors = []
alphas = [10**i for i in range(-2,4)]

In [66]:
alphas

[0.01, 0.1, 1, 10, 100, 1000]

In [67]:
for alpha in alphas:
    B, x_mean, x_std = ridge_fit(train, predictors, target, alpha)
    predictions = ridge_predict(test, predictors, x_mean, x_std, B)
    
    errors.append(mean_absolute_error(test[target], predictions))

In [68]:
errors

[6.309640830161126,
 6.3060443319528865,
 6.272283376431585,
 6.114051204717739,
 7.156811236590467,
 6.9780545895757315]
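One plausible reason the error rises again for alpha = 100 and 1000 is that a large penalty shrinks the slopes toward zero until the model underfits. The sketch below, on synthetic, made-up data, fits the same closed form for the same grid of alphas and records the size of the penalized coefficients; with standardized predictors and an unpenalized intercept, that size can only go down as alpha grows, while the intercept stays at the mean of y.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 120
Z = rng.normal(size=(n, 2))
Z[:, 1] = Z[:, 0] + 0.3 * rng.normal(size=n)         # correlated predictors
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0, ddof=1)     # standardized
X = np.column_stack([np.ones(n), Z])
y = 5 + 4 * Z[:, 0] - 2 * Z[:, 1] + rng.normal(size=n)

alphas = [10.0**i for i in range(-2, 4)]
slope_norms = []
for alpha in alphas:
    P = alpha * np.eye(3)
    P[0, 0] = 0.0                                # intercept stays unpenalized
    b = np.linalg.solve(X.T @ X + P, X.T @ y)
    slope_norms.append(np.linalg.norm(b[1:]))    # size of the penalized coefficients
    print(f"alpha={alpha:>8}: intercept={b[0]:.3f}, slopes={np.round(b[1:], 3)}")
```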

In [70]:
alphas
# choose the alpha with the lowest error (alpha = 10, MAE ≈ 6.11)

[0.01, 0.1, 1, 10, 100, 1000]
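The same choice can be made programmatically with `np.argmin`, using the values copied from the two outputs above. Keep in mind that picking alpha by its error on the test set makes that test error an optimistic estimate; choosing alpha with cross-validation on the training data avoids this.

```python
import numpy as np

# Values copied from the `alphas` and `errors` outputs above
alphas = [0.01, 0.1, 1, 10, 100, 1000]
errors = [6.309640830161126, 6.3060443319528865, 6.272283376431585,
          6.114051204717739, 7.156811236590467, 6.9780545895757315]

# index of the smallest mean absolute error
best_alpha = alphas[int(np.argmin(errors))]
print(best_alpha)  # 10
```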