# 1. Answer to the questions

1. Derive an analytical solution for the regression task. Use a vector form of equation.  
    - Y = X*w                 - linear regression in matrix form
    - SSE = Σ(Y - Y_pred)^2   - minimize the sum of squared errors
    - d/dw(Σ(Y - X*w)^2) = 0  - set the derivative with respect to w to zero
    - -2X^T(Y - X*w) = 0      - take the derivative
    - -2X^TY + 2X^TX\*w = 0   - expand the brackets
    - $$ w = (X^TX)^{-1}X^TY $$   
2. What changes in solution when adding L1 and L2 regularization to the loss function.
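    - Both penalties address overfitting by adding a term to the loss that discourages large weights.
    - L1 - drives the weights of insignificant (for example, linearly dependent) features to zero; in general there is no closed-form solution.
    - L2 - keeps the weights from growing (the penalty is the sum of squared weights); the analytical solution becomes w = (X^TX + λI)^{-1}X^TY.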
3. Explain why L1 regularization is commonly used to select features. Why are there a lot of weights that are equal to 0 after the model is fitted?
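    - The penalty is proportional to the absolute value of each weight (the first power), so its gradient does not shrink as a weight approaches zero. Weights of weakly informative features are therefore pushed all the way to zero instead of merely becoming small, and the model keeps fewer features.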
4. Explain how you can use the same models (Linear Regression, Ridge, etc) but make it possible to fit nonlinear dependencies?
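    - Nonlinear transformations of the inputs (for example, polynomial terms) become new axes (columns) and enter the model as ordinary terms of the linear equation; the model stays linear in the weights.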
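
The answers to questions 1 and 4 can be checked together on synthetic data. The sketch below is only an illustration: the quadratic target, the sample size and all variable names are assumptions, not part of the assignment. It builds a design matrix with a bias column and a squared column, then solves the normal equation.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=200)
y = 1.0 + 2.0 * x + 3.0 * x**2  # noiseless quadratic target (assumed for the demo)

# Design matrix: bias, x and x^2. The model is nonlinear in x but linear in w.
X = np.column_stack([np.ones_like(x), x, x**2])

# Normal equation (X^T X) w = X^T y, solved without forming the inverse explicitly.
w = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against NumPy's least-squares solver.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w, w_lstsq)
```

Solving the linear system instead of calling `np.linalg.inv` gives the same weights and is usually the more numerically stable choice.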

# 2. Introduction - repeat all preprocessing steps from the previous lesson

## 2.1 Import libraries.

In [220]:
# ! pip3.12 install scikit-learn statsmodels matplotlib seaborn -q

In [None]:
! pip install kaggle
! mkdir .kaggle data
! chmod 600 .kaggle/kaggle.json
! kaggle competitions download two-sigma-connect-rental-listing-inquiries
! unzip two-sigma-connect-rental-listing-inquiries.zip
! unzip data/train.json.zip
! unzip data/test.json.zip

In [221]:
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error
from sklearn.linear_model import Ridge, Lasso, ElasticNet
from sklearn.preprocessing import PolynomialFeatures
import re

## 2.2 Read train and test parts.

In [222]:
df_train = pd.read_json('data/train.json')
df_train.head(3)

Unnamed: 0,bathrooms,bedrooms,building_id,created,description,display_address,features,latitude,listing_id,longitude,manager_id,photos,price,street_address,interest_level
4,1.0,1,8579a0b0d54db803821a35a4a615e97a,2016-06-16 05:55:27,Spacious 1 Bedroom 1 Bathroom in Williamsburg!...,145 Borinquen Place,"[Dining Room, Pre-War, Laundry in Building, Di...",40.7108,7170325,-73.9539,a10db4590843d78c784171a107bdacb4,[https://photos.renthop.com/2/7170325_3bb5ac84...,2400,145 Borinquen Place,medium
6,1.0,2,b8e75fc949a6cd8225b455648a951712,2016-06-01 05:44:33,BRAND NEW GUT RENOVATED TRUE 2 BEDROOMFind you...,East 44th,"[Doorman, Elevator, Laundry in Building, Dishw...",40.7513,7092344,-73.9722,955db33477af4f40004820b4aed804a0,[https://photos.renthop.com/2/7092344_7663c19a...,3800,230 East 44th,low
9,1.0,2,cd759a988b8f23924b5a2058d5ab2b49,2016-06-14 15:19:59,**FLEX 2 BEDROOM WITH FULL PRESSURIZED WALL**L...,East 56th Street,"[Doorman, Elevator, Laundry in Building, Laund...",40.7575,7158677,-73.9625,c8b10a317b766204f08e613cef4ce7a0,[https://photos.renthop.com/2/7158677_c897a134...,3495,405 East 56th Street,medium


In [223]:
df_test = pd.read_json('data/test.json')
df_test.head(3)

Unnamed: 0,bathrooms,bedrooms,building_id,created,description,display_address,features,latitude,listing_id,longitude,manager_id,photos,price,street_address
0,1.0,1,79780be1514f645d7e6be99a3de696c5,2016-06-11 05:29:41,Large with awesome terrace--accessible via bed...,Suffolk Street,"[Elevator, Laundry in Building, Laundry in Uni...",40.7185,7142618,-73.9865,b1b1852c416d78d7765d746cb1b8921f,[https://photos.renthop.com/2/7142618_1c45a2c8...,2950,99 Suffolk Street
1,1.0,2,0,2016-06-24 06:36:34,Prime Soho - between Bleecker and Houston - Ne...,Thompson Street,"[Pre-War, Dogs Allowed, Cats Allowed]",40.7278,7210040,-74.0,d0b5648017832b2427eeb9956d966a14,[https://photos.renthop.com/2/7210040_d824cc71...,2850,176 Thompson Street
2,1.0,0,0,2016-06-17 01:23:39,Spacious studio in Prime Location. Cleanbuildi...,Sullivan Street,"[Pre-War, Dogs Allowed, Cats Allowed]",40.726,7174566,-74.0026,e6472c7237327dd3903b3d6f6a94515a,[https://photos.renthop.com/2/7174566_ba3a35c5...,2295,115 Sullivan Street


## 2.3 Preprocess “interest level” feature.

In [224]:
df_train['interest_level'] = df_train['interest_level'].astype('category').cat.codes
df_train['interest_level'].value_counts()

interest_level
1    34284
2    11229
0     3839
Name: count, dtype: int64
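
Note that `cat.codes` numbers the categories in sorted (alphabetical) order, so the codes above do not follow the natural low < medium < high ordering. A minimal sketch on a toy series; the `ordinal_map` dictionary is a hypothetical alternative, not something the notebook uses:

```python
import pandas as pd

levels = pd.Series(["medium", "low", "high", "low"])

# cat.codes follows the sorted category order: high=0, low=1, medium=2.
alphabetical_codes = levels.astype("category").cat.codes

# Hypothetical alternative: an explicit mapping that preserves low < medium < high.
ordinal_map = {"low": 0, "medium": 1, "high": 2}
ordinal_codes = levels.map(ordinal_map)

print(alphabetical_codes.tolist())
print(ordinal_codes.tolist())
```

For a linear model the ordinal mapping may be easier to interpret, since a single weight then describes a monotone effect of interest.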

# 3. Intro data analysis part 2

## 3.1 Let's generate additional features for better model quality. Consider a column named “features”. It consists of a list of highlights of the current flat.

In [225]:
df_train.features.iloc[0]

['Dining Room',
 'Pre-War',
 'Laundry in Building',
 'Dishwasher',
 'Hardwood Floors',
 'Dogs Allowed',
 'Cats Allowed']

## 3.2 Remove unused symbols ([,],’,” and space) from the column.

In [226]:
df_train['features'] = df_train['features'].apply(lambda x: list([re.sub(r'[\[\]\'\"\s]', '', item) for item in x]))
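
To see what this pattern does to a single value, here is a small standalone sketch; the helper name `clean_feature` and the sample strings are made up for the illustration:

```python
import re

def clean_feature(item: str) -> str:
    # Drop square brackets, single and double quotes, and all whitespace.
    return re.sub(r'[\[\]\'\"\s]', '', item)

cleaned = [clean_feature(item) for item in ["Laundry in Building", "Pre-War", "['Dogs Allowed']"]]
print(cleaned)
```

Because whitespace is removed as well, multi-word highlights collapse into a single token such as `LaundryinBuilding`.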


In [227]:
df_train['features_n'] = df_train['features'].apply(lambda x: ','.join(map(str, x)))

In [228]:
df_train.info()

<class 'pandas.core.frame.DataFrame'>
Index: 49352 entries, 4 to 124009
Data columns (total 16 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   bathrooms        49352 non-null  float64
 1   bedrooms         49352 non-null  int64  
 2   building_id      49352 non-null  object 
 3   created          49352 non-null  object 
 4   description      49352 non-null  object 
 5   display_address  49352 non-null  object 
 6   features         49352 non-null  object 
 7   latitude         49352 non-null  float64
 8   listing_id       49352 non-null  int64  
 9   longitude        49352 non-null  float64
 10  manager_id       49352 non-null  object 
 11  photos           49352 non-null  object 
 12  price            49352 non-null  int64  
 13  street_address   49352 non-null  object 
 14  interest_level   49352 non-null  int8   
 15  features_n       49352 non-null  object 
dtypes: float64(3), int64(3), int8(1), object(9)
memory usage: 6.1+

## 3.3 Split values in every row with separator “,” and collect the result in one huge list for the whole dataset. You may use DataFrame.iterrows().

In [229]:
list_df_futures = df_train.explode('features')['features'].dropna().to_list()
list_df_futures[:10]

['DiningRoom',
 'Pre-War',
 'LaundryinBuilding',
 'Dishwasher',
 'HardwoodFloors',
 'DogsAllowed',
 'CatsAllowed',
 'Doorman',
 'Elevator',
 'LaundryinBuilding']

## 3.4 How many unique values does the resulting list contain?

In [230]:
len(set(list_df_futures))

1545

## 3.5 Let's get to know the new library - Collections. With this package you could effectively get quantity statistics about your data.

## 3.6 Count the most popular features of our huge list and take the top 20 for this moment.

In [231]:
from collections import Counter

counter = Counter(list_df_futures)

counter.most_common(20)

[('Elevator', 25915),
 ('CatsAllowed', 23540),
 ('HardwoodFloors', 23527),
 ('DogsAllowed', 22035),
 ('Doorman', 20898),
 ('Dishwasher', 20426),
 ('NoFee', 18062),
 ('LaundryinBuilding', 16344),
 ('FitnessCenter', 13252),
 ('Pre-War', 9148),
 ('LaundryinUnit', 8738),
 ('RoofDeck', 6542),
 ('OutdoorSpace', 5268),
 ('DiningRoom', 5136),
 ('HighSpeedInternet', 4299),
 ('Balcony', 2992),
 ('SwimmingPool', 2730),
 ('LaundryInBuilding', 2593),
 ('NewConstruction', 2559),
 ('Terrace', 2283)]
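
`Counter` compares strings exactly, which is why `LaundryinBuilding` and `LaundryInBuilding` are counted separately above. The expected answer in 3.7 keeps both spellings, so the case folding below is only an optional, hypothetical variant shown on a toy list:

```python
from collections import Counter

sample = ["Elevator", "LaundryinBuilding", "Elevator", "LaundryInBuilding", "Elevator"]

# Exact matching: the two spellings are different keys.
exact_counts = Counter(sample)

# Hypothetical variant: fold case before counting so spelling variants merge.
folded_counts = Counter(name.lower() for name in sample)

print(exact_counts.most_common(2))
print(folded_counts.most_common(2))
```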

## 3.7 If everything correct you should get next values:  'Elevator', 'HardwoodFloors', 'CatsAllowed', 'DogsAllowed', 'Doorman', 'Dishwasher', 'NoFee', 'LaundryinBuilding', 'FitnessCenter', 'Pre-War', 'LaundryinUnit', 'RoofDeck', 'OutdoorSpace', 'DiningRoom', 'HighSpeedInternet', 'Balcony', 'SwimmingPool', 'LaundryInBuilding', 'NewConstruction', 'Terrace'.

## 3.8 Now generate 20 new features based on top 20 values: 1 if value in “feature” column else 0.

In [232]:
top_20 = counter.most_common(20)

top_20_features = [feature for feature, count in top_20]
top_20_features

['Elevator',
 'CatsAllowed',
 'HardwoodFloors',
 'DogsAllowed',
 'Doorman',
 'Dishwasher',
 'NoFee',
 'LaundryinBuilding',
 'FitnessCenter',
 'Pre-War',
 'LaundryinUnit',
 'RoofDeck',
 'OutdoorSpace',
 'DiningRoom',
 'HighSpeedInternet',
 'Balcony',
 'SwimmingPool',
 'LaundryInBuilding',
 'NewConstruction',
 'Terrace']

In [233]:
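def onehotencoder(feature_lists, features_list):
  # Build an (n_rows, n_features) matrix: 1 if the row's list contains the feature, else 0.
  result = np.zeros((len(feature_lists), len(features_list)))
  for r, row_features in enumerate(feature_lists):
    row_set = set(row_features)  # set lookup instead of scanning the list for every feature
    for fi, feature in enumerate(features_list):
      if feature in row_set:
        result[r, fi] = 1
  return result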
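
The same indicator matrix can also be built column by column with pandas. This is a sketch on toy data, with `feature_lists` and `top_features` as stand-ins for the notebook's column and top-20 list:

```python
import pandas as pd

feature_lists = pd.Series([["Elevator", "Doorman"], [], ["Doorman", "Terrace"]])
top_features = ["Elevator", "Doorman"]

# One 0/1 column per selected feature; the column names come directly from the list.
encoded = pd.DataFrame({
    name: feature_lists.apply(lambda items, name=name: int(name in items))
    for name in top_features
})
print(encoded)
```

This variant keeps the original index and names the columns immediately, so no separate rename step is needed.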

In [234]:
df_encoded = pd.DataFrame(onehotencoder(df_train['features'], top_20_features))


In [235]:
rename_dict = dict(zip(range(20), top_20_features))
rename_dict

{0: 'Elevator',
 1: 'CatsAllowed',
 2: 'HardwoodFloors',
 3: 'DogsAllowed',
 4: 'Doorman',
 5: 'Dishwasher',
 6: 'NoFee',
 7: 'LaundryinBuilding',
 8: 'FitnessCenter',
 9: 'Pre-War',
 10: 'LaundryinUnit',
 11: 'RoofDeck',
 12: 'OutdoorSpace',
 13: 'DiningRoom',
 14: 'HighSpeedInternet',
 15: 'Balcony',
 16: 'SwimmingPool',
 17: 'LaundryInBuilding',
 18: 'NewConstruction',
 19: 'Terrace'}

In [236]:
df_encoded = df_encoded.rename(columns=rename_dict)

In [237]:
df_encoded

Unnamed: 0,Elevator,CatsAllowed,HardwoodFloors,DogsAllowed,Doorman,Dishwasher,NoFee,LaundryinBuilding,FitnessCenter,Pre-War,LaundryinUnit,RoofDeck,OutdoorSpace,DiningRoom,HighSpeedInternet,Balcony,SwimmingPool,LaundryInBuilding,NewConstruction,Terrace
0,0.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
1,1.0,0.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,1.0,0.0,1.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
49347,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
49348,1.0,1.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
49349,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
49350,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [238]:
df_train_with_features = pd.concat([df_train.reset_index(), df_encoded.reset_index()], axis=1)
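
The index reset matters because `pd.concat(..., axis=1)` aligns rows by index label, and `df_train` is indexed 4, 6, 9, ... while `df_encoded` is indexed 0, 1, 2, .... A toy sketch of the difference (the two small frames are invented for the illustration):

```python
import pandas as pd

listings = pd.DataFrame({"price": [2400, 3800]}, index=[4, 6])
flags = pd.DataFrame({"Elevator": [0.0, 1.0]})  # default index 0, 1

# Without a reset the indexes do not match: the result is the union of labels, padded with NaN.
misaligned = pd.concat([listings, flags], axis=1)

# Resetting the index first pairs the rows by position.
aligned = pd.concat([listings.reset_index(drop=True), flags], axis=1)

print(misaligned)
print(aligned)
```

With `drop=True` the old index is discarded; a plain `reset_index()` keeps it as an extra `index` column, which then has to be dropped later.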

## 3.9 Extend our feature set with 'bathrooms', 'bedrooms', 'interest_level' and let's create a special variable feature_list with all feature names. Now there are 23 values. All models should be trained on these 23 features.

In [239]:
df = df_train_with_features.drop(['building_id',
                             'created',
                             'description',
                             'display_address',
                             'features',
                             'latitude',
                             'listing_id',
                             'longitude',
                             'manager_id',
                             'photos',
                             'street_address',
                             'features_n',
                             'index'], axis=1)

In [240]:
df.shape

(49352, 24)

In [241]:
from sklearn.model_selection import train_test_split

train, test = train_test_split(df, test_size=0.2 , random_state=0)

In [242]:
train.shape

(39481, 24)

In [243]:
test.shape

(9871, 24)

In [244]:
# .to_numpy()
X_train = train.drop(['price'], axis=1)
Y_train = train[['price']]

In [245]:
X_test = test.drop(['price'], axis=1)
Y_test = test[['price']]

# 4. Models implementation - linear regression

## 4.1 Implement python class for a linear regression algorithm with two basic methods - fit and predict. Use stochastic gradient descent to find optimal weights of the model. For better understanding we recommend additionally implementing separate versions of the algorithm with the analytical solution and non-stochastic gradient descent under the hood.

In [246]:
# def mse_mat(X, w, y):
#     y_p = X.dot(w)
#     return np.sum((y - y_p)**2) / len(y_p)

# def gr_mse_mat(X, w, y):
#     y_p = X.dot(w)
#     return 2/len(X)*(y-y_p).dot(-X)

# w = np.zeros(X.shape[1])
# eps = 0.0001
# learn_rate = 0.01
# next_w = w.copy()

# for i in range(99):
#     cur_w = next_w
#     next_w = cur_w - learn_rate*gr_mse_mat(X, cur_w, y)
#     print(f'Iter: {i}')
#     print(f'Cur w: {cur_w}')
#     print(f'Next w: {next_w}')
#     # print(f'MSE: {mse_mat(X, cur_w, y)}')
#     print('------------------------------------------------------------------------------------------------')
#     if np.linalg.norm(cur_w - next_w, ord=2) <= eps:
#         break
    

In [247]:
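class MyLinearRegression:
  # Linear regression fitted by full-batch gradient descent on the MSE loss.
  # No intercept is added: pass a column of ones in X if a bias term is needed.
  def __init__(self, alpha = 0.01, eps = 0.001, n_cycles=500, print_eps=False) -> None:
    self.w = 0
    self.alpha = alpha        # learning rate
    self.eps = eps            # stop when the L1 norm of the weight update drops below eps
    self.n_cycles = n_cycles  # maximum number of gradient steps
    self.print_eps = print_eps

  def fit(self, X:np.ndarray, Y:np.ndarray) -> None:
    alpha = self.alpha
    amount_samples, amount_features = X.shape
    w = np.ones(amount_features).reshape(-1, 1)
    for _ in range(self.n_cycles):
      w_prev = w.copy()
      f = (X @ w_prev)                        # predictions, shape (n, 1)
      err = f - Y
      grad = 2 * X.T.dot(err)/amount_samples  # gradient of the MSE
      w = w_prev - alpha * grad
      # Stop on divergence or when the update becomes negligible.
      if (np.abs(w).max()>1e25) or (np.absolute(w - w_prev).sum() < self.eps):
        break
      elif self.print_eps:
        print(np.absolute(w - w_prev).sum())
    self.w = w

  def predict(self, X:np.ndarray) -> np.ndarray:
    return X.dot(self.w)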
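
The task statement also asks for a stochastic variant. The following is a minimal mini-batch SGD sketch on synthetic data, not the notebook's implementation; the function name, the explicit bias column, the batch size and the learning rate are all assumptions chosen for the demo.

```python
import numpy as np

def sgd_linear_regression(X, y, lr=0.05, n_epochs=100, batch_size=16, seed=0):
    """Mini-batch SGD on the MSE loss; returns weights with the bias first."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Xb = np.column_stack([np.ones(n), X])  # prepend a bias column
    w = np.zeros(d + 1)
    for _ in range(n_epochs):
        order = rng.permutation(n)  # reshuffle the samples every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            err = Xb[idx] @ w - y[idx]
            grad = 2 * Xb[idx].T @ err / len(idx)  # MSE gradient on the batch only
            w -= lr * grad
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
y = 3.0 + X @ np.array([1.5, -2.0])  # noiseless target, so the exact weights are recoverable
w = sgd_linear_regression(X, y)
print(w)
```

Each step uses only one batch, so the cost per update does not grow with the dataset size. On the rental data the features would likely need scaling first for a fixed learning rate to behave well.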

## 4.2 Give definition for R squared (R2) coefficient and implement function for calculation.

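R2 (the coefficient of determination) describes how well the regression line fits the data: R2 = 1 - RSS/TSS, where RSS is the residual sum of squares and TSS is the total sum of squares of the target around its mean. The best-fitting line is the one that minimizes RSS and therefore maximizes R2. R2 equals 1 for a perfect fit, 0 for a model that is no better than predicting the mean, and is negative for a model that is worse than that.
https://habr.com/ru/articles/558084/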

In [248]:
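def r_squared(y_true, y_pred):
    # R^2 = 1 - RSS / TSS. Inputs are flattened so that DataFrame, Series
    # and ndarray arguments of shape (n,) or (n, 1) are handled the same way.
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    rss = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    tss = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1 - rss / tss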
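
As a worked check of the definition R2 = 1 - RSS/TSS, the sketch below evaluates it on four hand-picked points. The helper name `r2_from_sums` and the numbers are illustrative assumptions. It also shows why RSS must be a sum of squared residuals: the variance of the residuals would ignore a constant bias in the predictions.

```python
import numpy as np

def r2_from_sums(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    rss = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    tss = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - rss / tss

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
r2 = r2_from_sums(y_true, y_pred)  # RSS = 1.5, TSS = 29.1875

# Predictions that are off by a constant 10: the residuals have zero variance,
# yet the fit is poor, and the sum-of-squares definition reports that.
shifted_pred = y_true + 10.0
r2_shifted = r2_from_sums(y_true, shifted_pred)
residual_variance = np.var(y_true - shifted_pred)

print(r2, r2_shifted, residual_variance)
```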

## 4.3 Make predictions with your algorithm and estimate model with MAE, RMSE and R2 metrics.

In [249]:
my_lin_reg = MyLinearRegression()
my_lin_reg.fit(X_train.to_numpy(), Y_train.to_numpy())
yp_my_train = my_lin_reg.predict(X_train.to_numpy())
yp_my_test = my_lin_reg.predict(X_test.to_numpy())
yp_my_train

array([[2200.34005989],
       [2639.23147669],
       [3603.65590906],
       ...,
       [1806.94272377],
       [2149.34727534],
       [7248.39437791]])

In [250]:
mae_my_lin_reg_train = mean_absolute_error(Y_train, yp_my_train)
mae_my_lin_reg_test = mean_absolute_error(Y_test, yp_my_test)

In [251]:
rmse_my_lin_reg_train = np.sqrt(mean_squared_error(Y_train, yp_my_train))
rmse_my_lin_reg_test = np.sqrt(mean_squared_error(Y_test, yp_my_test))

In [252]:
r2_my_lin_reg_train = r_squared(Y_train, yp_my_train)
r2_my_lin_reg_test = r_squared(Y_test, yp_my_test)


In [253]:
r2_my_lin_reg_train

0.005444148539332394

## 4.4 Initialize LinearRegression() from sklearn.linear_model, fit model and predict train and test parts, same as previous lesson.

In [254]:
lin_reg = LinearRegression()
lin_reg.fit(X_train, Y_train)
yp_train = lin_reg.predict(X_train)
yp_test = lin_reg.predict(X_test)
yp_train

array([[2210.43817982],
       [2458.99829103],
       [4064.1247018 ],
       ...,
       [1884.66146579],
       [2096.88857321],
       [7480.36729635]])

## 4.5 Compare quality metrics and make sure that difference is small. (Between your implementations and sklearn).

In [255]:
print(f"MAE my : TRAIN:{mae_my_lin_reg_train:.2f} TEST:{mae_my_lin_reg_test:.2f}")
print(f"MAE lib: TRAIN:{mean_absolute_error(Y_train, yp_train):.2f} TEST:{mean_absolute_error(Y_test, yp_test):.2f}")


MAE my : TRAIN:1142.32 TEST:1055.72
MAE lib: TRAIN:1192.99 TEST:1105.73


In [256]:
print(f"RMSE my : TRAIN:{rmse_my_lin_reg_train:.2f} TEST:{rmse_my_lin_reg_test:.2f}")
print(f"RMSE lib: TRAIN:{np.sqrt(mean_squared_error(Y_train, yp_train)):.2f} TEST:{np.sqrt(mean_squared_error(Y_test, yp_test)):.2f}")

RMSE my : TRAIN:23985.77 TEST:10860.49
RMSE lib: TRAIN:23984.30 TEST:10860.25


In [257]:
print(f"R2 my : TRAIN:{r2_my_lin_reg_train:.5f} TEST:{r2_my_lin_reg_test:.5f}")
print(f"R2 lib: TRAIN:{r_squared(Y_train, yp_train):.5f} TEST:{r_squared(Y_test, yp_test):.5f}")

R2 my : TRAIN:0.00544 TEST:0.02396
R2 lib: TRAIN:0.00557 TEST:0.02403



As we can see, RMSE and R2 of the custom implementation and of the library are nearly identical; MAE differs by about 50 (roughly 4-5%).

## 4.6 Save metrics as in the previous lesson in a table with columns model, train, test for MAE table, RMSE table and R2 coefficient.

In [258]:
result_MAE = pd.DataFrame({'model': [], 'train': [], 'test': []})
result_RMSE = pd.DataFrame({'model': [], 'train': [], 'test': []})
result_R2 = pd.DataFrame({'model': [], 'train': [], 'test': []})

In [259]:
result_MAE.loc[len(result_MAE)] = ['linreg_default', mae_my_lin_reg_train, mae_my_lin_reg_test]
result_RMSE.loc[len(result_RMSE)] = ['linreg_default', rmse_my_lin_reg_train, rmse_my_lin_reg_test]
result_R2.loc[len(result_R2)] = ['linreg_default', r2_my_lin_reg_train, r2_my_lin_reg_test]

# 5. Regularized models implementation - Ridge, Lasso, ElasticNet

## 5.1 Implement Ridge, Lasso, ElasticNet algorithms: extend loss function with L2, L1 and both regularization correspondingly.

Analytical solution  
The function

$$ L(w) = ||Xw - Y||^2_2 + ||\Gamma w||^2_2 $$

is minimized at:

$$ w = (X^TX + \Gamma^T \Gamma)^{-1}X^TY $$

In our case $\Gamma^T\Gamma = \lambda I$ if there is no intercept term; otherwise:
$\Gamma^T\Gamma =
\left(\begin{matrix}
\lambda I & 0 \\
0 & 0
\end{matrix}\right)$

In [260]:
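class MyL2:
  # Ridge regression via the closed-form solution.
  def __init__(self) -> None:
    self.w = 0
  def fit(self, X, Y) -> None:
    amount_samples, amount_features = X.shape
    L = 20  # regularization strength (lambda)
    IL = L * np.eye(amount_features)  # Gamma^T Gamma = lambda * I
    # Leave the first coefficient unpenalized, as one would for an intercept.
    # Note: X has no explicit bias column here, so this exempts the first feature.
    IL[0][0] = 0
    A = np.linalg.inv(X.T @ X + IL)
    # Row-vector form of (X^T X + Gamma^T Gamma)^{-1} X^T Y; valid because A is symmetric.
    w = Y.iloc[:,0] @ X @ A
    self.w = w
  def get_w(self):
      return self.w
  def predict(self, X) -> pd.DataFrame:
    return pd.DataFrame(X.dot(self.w)).to_numpy().reshape(-1,1)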
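
For comparison, here is a compact sketch of the same closed form with an explicit bias column placed last, so that the penalty matrix has exactly the block structure diag(λI, 0). The function name, the synthetic data and the λ values are assumptions for the illustration only.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Solve (Xb^T Xb + P) w = Xb^T y, where P penalizes every weight except the bias."""
    n, d = X.shape
    Xb = np.column_stack([X, np.ones(n)])  # bias column goes last
    penalty = lam * np.eye(d + 1)
    penalty[-1, -1] = 0.0                  # the intercept is not regularized
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, 2.0, 3.0]) + 5.0 + 0.1 * rng.normal(size=100)

w_ols = ridge_closed_form(X, y, lam=0.0)       # no penalty: ordinary least squares
w_ridge = ridge_closed_form(X, y, lam=1000.0)  # strong penalty: slopes shrink toward zero
print(w_ols, w_ridge)
```

With λ = 0 the result coincides with ordinary least squares; as λ grows the slope weights shrink while the unpenalized intercept stays free.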

**Gradient:**
$$
\frac{\partial{L}}{\partial{w}}
= \frac{1}{n}(X^TXw - X^TY) + \lambda sign(w)
= \frac{1}{n}X^T(y_{pred} - Y) + \lambda sign(w)
$$

where, as an approximation, we treat $|\cdot|$ as a differentiable function whose derivative is $sign(\cdot)$

In [261]:

class MyL1:
  def __init__(self, n_cycles=500, print_eps=False, eps = 0.1, alpha = 0.03 ) -> None:
    self.w = 0
    self.n_cycles = n_cycles
    self.print_eps = print_eps
    self.eps = eps
    self.alpha= alpha

  def fit(self, X, Y) -> None:
    alpha = self.alpha # learning rate
    lm = 0.3 # lambda, L1 penalty coefficient
    amount_samples, amount_features = X.shape
    w = np.random.normal(size=amount_features)
    np_soft_sign = np.vectorize(lambda x, eps=1e-7: np.sign(x) if abs(x) > eps else x / eps)
    for _ in range(self.n_cycles):
      signw = np_soft_sign(w) # sign of the current weights (self.w is only set at the end of fit)
      f = X.dot(w)
      err = f - np.array(Y.iloc[:,0])
      grad = 2 * X.T.dot(err)/amount_samples + lm*signw
      if (w.max()>1e25) or (np.absolute(alpha * grad).sum() < self.eps):
        break
      elif self.print_eps:
        print(np.absolute(alpha * grad).sum())
      w -= alpha * grad
    self.w = w
  def predict(self, X) -> pd.DataFrame:
    return pd.DataFrame(X.dot(self.w)).to_numpy().reshape(-1,1)
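
A sign-based gradient step rarely lands on exactly zero, so it does not by itself reproduce the sparsity that Lasso is known for. The sketch below uses a different technique, proximal gradient descent (ISTA) with soft-thresholding, purely for comparison; the function names, the loss scaling 1/(2n), the synthetic data and λ = 0.5 are all assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    # Shrink every entry toward zero by t; entries with |v| <= t become exactly 0.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Minimize (1/(2n)) * ||Xw - y||^2 + lam * ||w||_1 by proximal gradient steps."""
    n, d = X.shape
    lipschitz = np.linalg.norm(X, 2) ** 2 / n  # Lipschitz constant of the smooth part's gradient
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - grad / lipschitz, lam / lipschitz)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_w = np.array([4.0, 0.0, 0.0, -3.0, 0.0])
y = X @ true_w + 0.1 * rng.normal(size=200)

w = lasso_ista(X, y, lam=0.5)
print(w)
```

The weights of the three uninformative columns come out as exact zeros, while the two informative weights are shrunk toward zero by roughly λ.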

In [262]:
class MyElasticNet:
  def __init__(self, l1_ratio = 0.5, n_cycles=500, print_eps=False, eps=0.1, alpha = 0.03) -> None:
    self.w = 0
    self.l1_ratio = l1_ratio
    self.n_cycles = n_cycles
    self.print_eps = print_eps
    self.eps = eps
    self.alpha= alpha

  def fit(self, X, Y) -> None:
    alpha = self.alpha # learning rate
    lm = 0.3 # lambda, L1 penalty coefficient
    amount_samples, amount_features = X.shape
    w = np.random.normal(size=amount_features)
    np_soft_sign = np.vectorize(lambda x, eps=1e-7: np.sign(x) if abs(x) > eps else x / eps)
    for _ in range(self.n_cycles):
      signw = np_soft_sign(w) # sign of the current weights (self.w is only set at the end of fit)
      f = X.dot(w)
      err = f - np.array(Y.iloc[:,0])
      grad = 2 * X.T.dot(err)/amount_samples + (1-self.l1_ratio)*w + (self.l1_ratio)*lm*signw
      if (w.max()>1e25) or (np.absolute(alpha * grad).sum() < self.eps):
        break
      elif self.print_eps:
        print(np.absolute(alpha * grad).sum())
      w -= alpha * grad
    self.w = w
  def predict(self, X) -> pd.DataFrame:
    return pd.DataFrame(X.dot(self.w)).to_numpy().reshape(-1,1)

## 5.2 Make predictions with your algorithm and estimate the model with MAE, RMSE and R2 metrics.

### Ridge

In [263]:
my_L2 = MyL2()
my_L2.fit(X_train, Y_train)
yp_my_L2_train = my_L2.predict(X_train)
yp_my_L2_test = my_L2.predict(X_test)

In [264]:
mae_l2_train = mean_absolute_error(Y_train, yp_my_L2_train)
mae_l2_test = mean_absolute_error(Y_test, yp_my_L2_test)

In [265]:
rmse_l2_train = np.sqrt(mean_squared_error(Y_train, yp_my_L2_train))
rmse_l2_test = np.sqrt(mean_squared_error(Y_test, yp_my_L2_test))

In [266]:
r2_l2_train = r_squared(Y_train, yp_my_L2_train)
r2_l2_test = r_squared(Y_test, yp_my_L2_test)


### Lasso

In [267]:
my_L1 = MyL1()
my_L1.fit(X_train, Y_train)
yp_my_l1_train = my_L1.predict(X_train)
yp_my_l1_test = my_L1.predict(X_test)

In [268]:
mae_l1_train = mean_absolute_error(Y_train, yp_my_l1_train)
mae_l1_test = mean_absolute_error(Y_test, yp_my_l1_test)

In [269]:
rmse_l1_train = np.sqrt(mean_squared_error(Y_train, yp_my_l1_train))
rmse_l1_test = np.sqrt(mean_squared_error(Y_test, yp_my_l1_test))

In [270]:
r2_l1_train = r_squared(Y_train, yp_my_l1_train)
r2_l1_test = r_squared(Y_test, yp_my_l1_test)


### Elastic Net

In [271]:
my_EN = MyElasticNet()
my_EN.fit(X_train, Y_train)
yp_my_en_train = my_EN.predict(X_train)
yp_my_en_test = my_EN.predict(X_test)

In [272]:
mae_en_my_train = mean_absolute_error(Y_train, yp_my_en_train)
mae_en_my_test = mean_absolute_error(Y_test, yp_my_en_test)

In [273]:
rmse_en_my_train = np.sqrt(mean_squared_error(Y_train, yp_my_en_train))
rmse_en_my_test = np.sqrt(mean_squared_error(Y_test, yp_my_en_test))

In [274]:
r2_en_my_train = r_squared(Y_train, yp_my_en_train)
r2_en_my_test = r_squared(Y_test, yp_my_en_test)


## 5.3 Initialize Ridge(), Lasso() and ElasticNet() from sklearn.linear_model, fit model and make prediction for train and test samples, same as previous lesson.

In [275]:
ridge_model = Ridge()
ridge_model.fit(X_train, Y_train)
ridge_train_predictions = ridge_model.predict(X_train)
ridge_test_predictions = ridge_model.predict(X_test)

In [276]:
mae_ridge_train = mean_absolute_error(Y_train, ridge_train_predictions)
mae_ridge_test = mean_absolute_error(Y_test, ridge_test_predictions)

In [277]:
rmse_ridge_train = np.sqrt(mean_squared_error(Y_train, ridge_train_predictions))
rmse_ridge_test = np.sqrt(mean_squared_error(Y_test, ridge_test_predictions))

In [278]:
r2_ridge_train = r_squared(Y_train, ridge_train_predictions)
r2_ridge_test = r_squared(Y_test, ridge_test_predictions)


In [279]:
lasso_model = Lasso()
lasso_model.fit(X_train, Y_train)
lasso_train_predictions = lasso_model.predict(X_train)
lasso_test_predictions = lasso_model.predict(X_test)

In [280]:
mae_lasso_train = mean_absolute_error(Y_train, lasso_train_predictions)
mae_lasso_test = mean_absolute_error(Y_test, lasso_test_predictions)

In [281]:
rmse_lasso_train = np.sqrt(mean_squared_error(Y_train, lasso_train_predictions))
rmse_lasso_test = np.sqrt(mean_squared_error(Y_test, lasso_test_predictions))

In [282]:
r2_lasso_train = r_squared(Y_train, lasso_train_predictions.reshape(-1,1))
r2_lasso_test = r_squared(Y_test, lasso_test_predictions.reshape(-1,1))


In [283]:
en_model = ElasticNet( l1_ratio=0.5)
en_model.fit(X_train, Y_train)
en_train_predictions = en_model.predict(X_train)
en_test_predictions = en_model.predict(X_test)

In [284]:
mae_en_train = mean_absolute_error(Y_train, en_train_predictions)
mae_en_test = mean_absolute_error(Y_test, en_test_predictions)

In [285]:
rmse_en_train = np.sqrt(mean_squared_error(Y_train, en_train_predictions))
rmse_en_test = np.sqrt(mean_squared_error(Y_test, en_test_predictions))

In [286]:
r2_en_train = r_squared(Y_train, en_train_predictions.reshape(-1,1))
r2_en_test = r_squared(Y_test, en_test_predictions.reshape(-1,1))


## 5.4 Compare quality metrics and make sure that difference is small. (Between your implementations and sklearn).

### Ridge

In [287]:
print(f"L2 MAE my : TRAIN:{mae_l2_train:.2f} TEST:{mae_l2_test:.2f}")
print(f"L2 MAE lib: TRAIN:{mae_ridge_train:.2f} TEST:{mae_ridge_test:.2f}")


L2 MAE my : TRAIN:1191.36 TEST:1103.74
L2 MAE lib: TRAIN:1192.93 TEST:1105.66


In [288]:
print(f"L2 RMSE my : TRAIN:{rmse_l2_train:.2f} TEST:{rmse_l2_test:.2f}")
print(f"L2 RMSE lib: TRAIN:{rmse_ridge_train:.2f} TEST:{rmse_ridge_test:.2f}")

L2 RMSE my : TRAIN:23984.32 TEST:10859.92
L2 RMSE lib: TRAIN:23984.30 TEST:10860.24


In [289]:
print(f"L2 R2 my : TRAIN:{r2_l2_train:.5f} TEST:{r2_l2_test:.5f}")
print(f"L2 R2 lib: TRAIN:{r2_ridge_train:.5f} TEST:{r2_ridge_test:.5f}")

L2 R2 my : TRAIN:0.00556 TEST:0.02408
L2 R2 lib: TRAIN:0.00557 TEST:0.02403


### Lasso

In [290]:
print(f"L1 MAE my : TRAIN:{mae_l1_train:.2f} TEST:{mae_l1_test:.2f}")
print(f"L1 MAE lib: TRAIN:{mae_lasso_train:.2f} TEST:{mae_lasso_test:.2f}")


L1 MAE my : TRAIN:1179.05 TEST:1092.32
L1 MAE lib: TRAIN:1189.08 TEST:1101.78


In [291]:
print(f"L1 RMSE my : TRAIN:{rmse_l1_train:.2f} TEST:{rmse_l1_test:.2f}")
print(f"L1 RMSE lib: TRAIN:{rmse_lasso_train:.2f} TEST:{rmse_lasso_test:.2f}")

L1 RMSE my : TRAIN:23984.46 TEST:10859.54
L1 RMSE lib: TRAIN:23984.30 TEST:10860.04


In [292]:
print(f"L1 R2 my : TRAIN:{r2_l1_train:.5f} TEST:{r2_l1_test:.5f}")
print(f"L1 R2 lib: TRAIN:{r2_lasso_train:.5f} TEST:{r2_lasso_test:.5f}")

L1 R2 my : TRAIN:0.00555 TEST:0.02415
L1 R2 lib: TRAIN:0.00557 TEST:0.02407


### Elastic Net

In [293]:
print(f"ElasticNet MAE my : TRAIN:{mae_en_my_train:.2f} TEST:{mae_en_my_test:.2f}")
print(f"ElasticNet MAE lib: TRAIN:{mae_en_train:.2f} TEST:{mae_en_test:.2f}")


ElasticNet MAE my : TRAIN:1094.58 TEST:1006.41
ElasticNet MAE lib: TRAIN:1142.44 TEST:1056.60


In [294]:
print(f"ElasticNet RMSE my : TRAIN:{rmse_en_my_train:.2f} TEST:{rmse_en_my_test:.2f}")
print(f"ElasticNet RMSE lib: TRAIN:{rmse_en_train:.2f} TEST:{rmse_en_test:.2f}")

ElasticNet RMSE my : TRAIN:23994.16 TEST:10875.32
ElasticNet RMSE lib: TRAIN:24003.12 TEST:10895.25


In [295]:
print(f"ElasticNet R2 my : TRAIN:{r2_en_my_train:.5f} TEST:{r2_en_my_test:.5f}")
print(f"ElasticNet R2 lib: TRAIN:{r2_en_train:.5f} TEST:{r2_en_test:.5f}")

ElasticNet R2 my : TRAIN:0.00479 TEST:0.02130
ElasticNet R2 lib: TRAIN:0.00400 TEST:0.01771


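The differences are small for Ridge and Lasso. For ElasticNet they are somewhat larger (train MAE 1094.58 vs 1142.44, train R2 0.00479 vs 0.00400) but remain of the same order.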

## 5.5 Save metrics as in the previous lesson in a table with columns model, train, test for MAE table, RMSE table and R2 coefficient.

In [296]:
result_MAE.loc[len(result_MAE)] = ['ridge_default', mae_l2_train, mae_l2_test]
result_RMSE.loc[len(result_RMSE)] = ['ridge_default', rmse_l2_train, rmse_l2_test]
result_R2.loc[len(result_R2)] = ['ridge_default', r2_l2_train, r2_l2_test]

In [297]:
result_MAE.loc[len(result_MAE)] = ['lasso_default', mae_l1_train, mae_l1_test]
result_RMSE.loc[len(result_RMSE)] = ['lasso_default', rmse_l1_train, rmse_l1_test]
result_R2.loc[len(result_R2)] = ['lasso_default', r2_l1_train, r2_l1_test]

In [298]:
result_MAE.loc[len(result_MAE)] = ['elasticnet_default', mae_en_my_train, mae_en_my_test]
result_RMSE.loc[len(result_RMSE)] = ['elasticnet_default', rmse_en_my_train, rmse_en_my_test]
result_R2.loc[len(result_R2)] = ['elasticnet_default', r2_en_my_train, r2_en_my_test]

# 6. Feature normalization

## 6.1 First of all, please write several examples why and where features normalization is mandatory and vice versa.

```When feature normalization is mandatory:```  
1. When features have different scales: for example, if one feature is measured in millimeters and another in kilometers, then without normalization the model may misjudge the importance of these features because of their different scales.
2. In scale-sensitive algorithms: distance-based methods such as k-nearest neighbors (k-NN) and regularized linear models (Ridge, Lasso) can lose considerable accuracy when features have different scales. Normalization helps these algorithms work more effectively.  
3. When using gradient descent: in models trained iteratively, such as logistic regression or neural networks, normalization can speed up training because gradient descent converges to the optimum faster.  

```When feature normalization may be unnecessary:```  
1. When the features are already on the same scale: additional normalization may be unnecessary and can even discard useful information (for example, the original units).  
2. In tasks where feature scale does not affect the result: for example, decision-tree classifiers split on thresholds of individual features, so rescaling a feature does not change the resulting splits and normalization is not needed.  
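
The effect of scale on a distance-based method can be seen on four made-up points with one feature in millimeters and one in kilometers. In this sketch (all numbers and names are assumptions) the nearest neighbor of the first point changes once both columns are min-max scaled:

```python
import numpy as np

points = np.array([
    [1000.0, 1.0],   # A: (mm, km)
    [1100.0, 1.0],   # B
    [1000.0, 5.0],   # C
    [2000.0, 10.0],  # D
])

def nearest_to_first(data, candidates=(1, 2)):
    # Index of the candidate row closest to row 0 in Euclidean distance.
    dists = {i: np.linalg.norm(data[0] - data[i]) for i in candidates}
    return min(dists, key=dists.get)

col_min, col_max = points.min(axis=0), points.max(axis=0)
scaled = (points - col_min) / (col_max - col_min)

raw_nearest = nearest_to_first(points)     # the millimeter column dominates the distance
scaled_nearest = nearest_to_first(scaled)  # both columns contribute comparably
print(raw_nearest, scaled_nearest)
```

On raw values C looks closest to A only because kilometers are numerically small; after scaling, B is the closer point.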

## 6.2 Here let's consider the first of classical normalization methods - MinMaxScaler. Please write a mathematical formula for the method.

$$
X_{scaled} = a + \frac{{(X - X_{min})(b-a)}}{X_{max} - X_{min}}
$$
$$ X_{max} - \text{maximum of the feature} $$ 
$$ X_{min} - \text{minimum of the feature} $$
$$ [a,b] - \text{new range of values} $$

## 6.3 Implement your own function for MinMaxScaler feature normalization

In [299]:
def MyMinMaxScaler(X, a=0, b=1):
    X_scaled = X.copy()
    for column in X.columns:
        X_scaled[column] = a + (X[column] - X[column].min())*(b - a) / (X[column].max() - X[column].min())
    return X_scaled.to_numpy()

## 6.4 Initialize MinMaxScaler() from sklearn.preprocessing.

In [300]:
from sklearn.preprocessing import MinMaxScaler

minmax = MinMaxScaler()


## 6.4 Compare feature normalization with your own method and from sklearn.

In [301]:
MyMinMaxScaler(X_train)

array([[0.15384615, 0.125     , 0.5       , ..., 0.        , 0.        ,
        0.        ],
       [0.15384615, 0.        , 0.5       , ..., 0.        , 0.        ,
        0.        ],
       [0.15384615, 0.        , 0.5       , ..., 0.        , 0.        ,
        0.        ],
       ...,
       [0.15384615, 0.        , 0.5       , ..., 0.        , 0.        ,
        0.        ],
       [0.15384615, 0.        , 1.        , ..., 0.        , 0.        ,
        0.        ],
       [0.30769231, 0.375     , 0.5       , ..., 0.        , 1.        ,
        0.        ]])

In [302]:
minmax.fit_transform(X_train)

array([[0.15384615, 0.125     , 0.5       , ..., 0.        , 0.        ,
        0.        ],
       [0.15384615, 0.        , 0.5       , ..., 0.        , 0.        ,
        0.        ],
       [0.15384615, 0.        , 0.5       , ..., 0.        , 0.        ,
        0.        ],
       ...,
       [0.15384615, 0.        , 0.5       , ..., 0.        , 0.        ,
        0.        ],
       [0.15384615, 0.        , 1.        , ..., 0.        , 0.        ,
        0.        ],
       [0.30769231, 0.375     , 0.5       , ..., 0.        , 1.        ,
        0.        ]])

In [303]:
mean_absolute_error(MyMinMaxScaler(X_train), minmax.fit_transform(X_train))

6.724452637579508e-21

## 6.5 Repeat steps from b to e for one more normalization method StandardScaler.

$$
x_{scaled} = \frac{x - \mu}{\sigma}
$$
where:
$$ \mu - \text{arithmetic mean of the feature} $$
$$ \sigma - \text{standard deviation of the feature} $$

In [304]:
def MyStandartScaler(X):
    X_scaled = X.copy()
    for column in X.columns:
        X_scaled[column] =(X[column] - X[column].mean())/X[column].std()
    return X_scaled.to_numpy()

In [305]:
from sklearn.preprocessing import StandardScaler

stdscl = StandardScaler()

In [306]:
MyStandartScaler(X_train)

array([[-0.42386827, -0.48840193, -0.28367682, ..., -0.23486314,
        -0.23396127, -0.2193851 ],
       [-0.42386827, -1.38301994, -0.28367682, ..., -0.23486314,
        -0.23396127, -0.2193851 ],
       [-0.42386827, -1.38301994, -0.28367682, ..., -0.23486314,
        -0.23396127, -0.2193851 ],
       ...,
       [-0.42386827, -1.38301994, -0.28367682, ..., -0.23486314,
        -0.23396127, -0.2193851 ],
       [-0.42386827, -1.38301994,  1.60086711, ..., -0.23486314,
        -0.23396127, -0.2193851 ],
       [ 1.57395222,  1.30083409, -0.28367682, ..., -0.23486314,
         4.27410352, -0.2193851 ]])

In [307]:
stdscl.fit_transform(X_train)

array([[-0.42387364, -0.48840812, -0.28368041, ..., -0.23486612,
        -0.23396423, -0.21938787],
       [-0.42387364, -1.38303746, -0.28368041, ..., -0.23486612,
        -0.23396423, -0.21938787],
       [-0.42387364, -1.38303746, -0.28368041, ..., -0.23486612,
        -0.23396423, -0.21938787],
       ...,
       [-0.42387364, -1.38303746, -0.28368041, ..., -0.23486612,
        -0.23396423, -0.21938787],
       [-0.42387364, -1.38303746,  1.60088738, ..., -0.23486612,
        -0.23396423, -0.21938787],
       [ 1.57397215,  1.30085057, -0.28368041, ..., -0.23486612,
         4.27415765, -0.21938787]])

In [308]:
mean_absolute_error(stdscl.fit_transform(X_train), MyStandartScaler(X_train))

9.516511820300382e-06
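The small but nonzero difference is consistent with the two definitions of standard deviation involved: pandas `Series.std()` defaults to the sample estimate (`ddof=1`), while `StandardScaler` uses the population estimate (`ddof=0`). The ratio of the first entries above (-0.42386827 and -0.42387364) is close to $\sqrt{(n-1)/n}$ for $n = 39481$. A minimal sketch on synthetic data follows; the DataFrame `demo` and all variable names are hypothetical and not part of the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data, used only to illustrate the ddof effect.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(1000, 3)), columns=["a", "b", "c"])

sk_scaled = StandardScaler().fit_transform(demo)

# pandas default: sample standard deviation (ddof=1)
sample_scaled = ((demo - demo.mean()) / demo.std()).to_numpy()
# population standard deviation (ddof=0), the convention StandardScaler follows
population_scaled = ((demo - demo.mean()) / demo.std(ddof=0)).to_numpy()

max_diff_sample = np.abs(sk_scaled - sample_scaled).max()
max_diff_population = np.abs(sk_scaled - population_scaled).max()

# The two manual versions differ by the constant factor sqrt((n - 1) / n).
n = len(demo)
ratio = sample_scaled[0, 0] / population_scaled[0, 0]
```

Passing `ddof=0` to `std()` in a hand-written scaler should therefore bring it in line with `StandardScaler` up to floating-point error.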

# 7. Fit models with normalization

## 7.1 Fit all models - Linear regression, Ridge, Lasso and ElasticNet with MinMaxScaler.

In [309]:
mm_tr = MyMinMaxScaler(X_train)
mm_ts = MyMinMaxScaler(X_test)
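Scaling the test sample with its own minimum and maximum puts train and test on different scales whenever their ranges differ. A common alternative is to learn the scaling parameters on the training sample only and reuse them for the test sample. The sketch below uses sklearn's `MinMaxScaler` on tiny hypothetical frames (`train_demo`, `test_demo`); it is an illustration, not the notebook's own scaler.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data.
train_demo = pd.DataFrame({"rooms": [1.0, 2.0, 3.0, 5.0]})
test_demo = pd.DataFrame({"rooms": [2.0, 4.0]})

scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(train_demo)  # learns min=1, max=5 from train
test_scaled = scaler.transform(test_demo)        # reuses the train min and max

# Scaling the test sample on its own maps 2 -> 0 and 4 -> 1 instead.
test_scaled_alone = MinMaxScaler().fit_transform(test_demo)
```

With the shared scaler, the value 2 maps to 0.25 in both samples, so the fitted weights see the same feature scale at prediction time.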

In [310]:
lr_mm = MyLinearRegression()
lr_mm.fit(mm_tr, Y_train.to_numpy())

In [311]:
l2_mm = MyL2()
l2_mm.fit(mm_tr, Y_train)

In [312]:
l1_mm = MyL1()
l1_mm.fit(mm_tr, Y_train)

In [313]:
en_mm = MyElasticNet()
en_mm.fit(mm_tr, Y_train)

## 7.2 Fit all models - Linear regression, Ridge, Lasso and ElasticNet with StandardScaler.

In [314]:
ss_tr = MyStandartScaler(X_train)
ss_ts = MyStandartScaler(X_test)

In [315]:
stds = StandardScaler()
s_tr = stds.fit_transform(X_train)
s_ts = stds.transform(X_test)  # reuse the mean and std learned on the training sample

In [316]:
lr_ss = MyLinearRegression()
lr_ss.fit(ss_tr, Y_train.to_numpy())

In [317]:
lr_ss.predict(ss_tr)

array([[-1646.94196648],
       [-1374.51847521],
       [  198.05932313],
       ...,
       [-1963.33037731],
       [-1752.86954559],
       [ 3635.51775686]])

In [318]:
l2_ss = MyL2()
l2_ss.fit(ss_tr, Y_train)

In [319]:
l1_ss = MyL1()
l1_ss.fit(ss_tr, Y_train)

In [320]:
en_ss = MyElasticNet()
en_ss.fit(ss_tr, Y_train)

## 7.3 Add all results to our dataframe with metrics on samples.

MinMax

In [321]:
result_MAE.loc[len(result_MAE)] = ['linreg_minmax', 
    mean_absolute_error(Y_train, lr_mm.predict(mm_tr)), 
    mean_absolute_error(Y_test, lr_mm.predict(mm_ts))]
result_RMSE.loc[len(result_RMSE)] = ['linreg_minmax', 
    np.sqrt(mean_squared_error(Y_train, lr_mm.predict(mm_tr))), 
    np.sqrt(mean_squared_error(Y_test, lr_mm.predict(mm_ts)))]
result_R2.loc[len(result_R2)] = ['linreg_minmax', 
    r_squared(Y_train, lr_mm.predict(mm_tr)), 
    r_squared(Y_test, lr_mm.predict(mm_ts))]

In [322]:
result_MAE.loc[len(result_MAE)] = ['ridge_minmax', 
    mean_absolute_error(Y_train, l2_mm.predict(mm_tr)), 
    mean_absolute_error(Y_test, l2_mm.predict(mm_ts))]
result_RMSE.loc[len(result_RMSE)] = ['ridge_minmax', 
    np.sqrt(mean_squared_error(Y_train, l2_mm.predict(mm_tr))), 
    np.sqrt(mean_squared_error(Y_test, l2_mm.predict(mm_ts)))]
result_R2.loc[len(result_R2)] = ['ridge_minmax', 
    r_squared(Y_train, l2_mm.predict(mm_tr)), 
    r_squared(Y_test, l2_mm.predict(mm_ts))]

In [323]:
result_MAE.loc[len(result_MAE)] = ['lasso_minmax', 
    mean_absolute_error(Y_train, l1_mm.predict(mm_tr)), 
    mean_absolute_error(Y_test, l1_mm.predict(mm_ts))]
result_RMSE.loc[len(result_RMSE)] = ['lasso_minmax', 
    np.sqrt(mean_squared_error(Y_train, l1_mm.predict(mm_tr))), 
    np.sqrt(mean_squared_error(Y_test, l1_mm.predict(mm_ts)))]
result_R2.loc[len(result_R2)] = ['lasso_minmax', 
    r_squared(Y_train, l1_mm.predict(mm_tr)), 
    r_squared(Y_test, l1_mm.predict(mm_ts))]

In [324]:
result_MAE.loc[len(result_MAE)] = ['elasticnet_minmax', 
    mean_absolute_error(Y_train, en_mm.predict(mm_tr)), 
    mean_absolute_error(Y_test, en_mm.predict(mm_ts))]
result_RMSE.loc[len(result_RMSE)] = ['elasticnet_minmax', 
    np.sqrt(mean_squared_error(Y_train, en_mm.predict(mm_tr))), 
    np.sqrt(mean_squared_error(Y_test, en_mm.predict(mm_ts)))]
result_R2.loc[len(result_R2)] = ['elasticnet_minmax', 
    r_squared(Y_train, en_mm.predict(mm_tr)), 
    r_squared(Y_test, en_mm.predict(mm_ts))]

StandardScaler

In [325]:
result_MAE.loc[len(result_MAE)] = ['linreg_std_scaler', 
    mean_absolute_error(Y_train, lr_ss.predict(ss_tr)), 
    mean_absolute_error(Y_test, lr_ss.predict(ss_ts))]
result_RMSE.loc[len(result_RMSE)] = ['linreg_std_scaler', 
    np.sqrt(mean_squared_error(Y_train, lr_ss.predict(ss_tr))), 
    np.sqrt(mean_squared_error(Y_test, lr_ss.predict(ss_ts)))]
result_R2.loc[len(result_R2)] = ['linreg_std_scaler', 
    r_squared(Y_train, lr_ss.predict(ss_tr)), 
    r_squared(Y_test, lr_ss.predict(ss_ts))]

In [326]:
result_MAE.loc[len(result_MAE)] = ['ridge_std_scaler', 
    mean_absolute_error(Y_train, l2_ss.predict(ss_tr)), 
    mean_absolute_error(Y_test, l2_ss.predict(ss_ts))]
result_RMSE.loc[len(result_RMSE)] = ['ridge_std_scaler', 
    np.sqrt(mean_squared_error(Y_train, l2_ss.predict(ss_tr))), 
    np.sqrt(mean_squared_error(Y_test, l2_ss.predict(ss_ts)))]
result_R2.loc[len(result_R2)] = ['ridge_std_scaler', 
    r_squared(Y_train, l2_ss.predict(ss_tr)), 
    r_squared(Y_test, l2_ss.predict(ss_ts))]

In [327]:
result_MAE.loc[len(result_MAE)] = ['lasso_std_scaler', 
    mean_absolute_error(Y_train, l1_ss.predict(ss_tr)), 
    mean_absolute_error(Y_test, l1_ss.predict(ss_ts))]
result_RMSE.loc[len(result_RMSE)] = ['lasso_std_scaler', 
    np.sqrt(mean_squared_error(Y_train, l1_ss.predict(ss_tr))), 
    np.sqrt(mean_squared_error(Y_test, l1_ss.predict(ss_ts)))]
result_R2.loc[len(result_R2)] = ['lasso_std_scaler', 
    r_squared(Y_train, l1_ss.predict(ss_tr)), 
    r_squared(Y_test, l1_ss.predict(ss_ts))]

In [328]:
result_MAE.loc[len(result_MAE)] = ['elasticnet_std_scaler', 
    mean_absolute_error(Y_train, en_ss.predict(ss_tr)), 
    mean_absolute_error(Y_test, en_ss.predict(ss_ts))]
result_RMSE.loc[len(result_RMSE)] = ['elasticnet_std_scaler', 
    np.sqrt(mean_squared_error(Y_train, en_ss.predict(ss_tr))), 
    np.sqrt(mean_squared_error(Y_test, en_ss.predict(ss_ts)))]
result_R2.loc[len(result_R2)] = ['elasticnet_std_scaler', 
    r_squared(Y_train, en_ss.predict(ss_tr)), 
    r_squared(Y_test, en_ss.predict(ss_ts))]

# 8. Overfit models

## 8.1 Let's consider an overfitted model in practice. From theory, you know that polynomial regression is easy to overfit. So let's create a toy example and see how regularization works in real life.

## 8.2 In the last lesson we created polynomial features of degree 10. Repeat these steps from the previous lesson here, using only the 3 basic features: 'bathrooms', 'bedrooms', 'interest_level'.

## 8.3 Train all our implemented algorithms (Linear regression, Ridge, Lasso and ElasticNet) on the set of polynomial features.

In [329]:
result_MAE

Unnamed: 0,model,train,test
0,linreg_default,1142.316182,1055.719475
1,ridge_default,1191.359802,1103.737095
2,lasso_default,1179.047184,1092.315885
3,elasticnet_default,1094.575476,1006.413052
4,linreg_minmax,1478.33095,1383.945672
5,ridge_minmax,1189.028121,1110.485224
6,lasso_minmax,1322.840459,1252.565611
7,elasticnet_minmax,1582.268167,1483.717172
8,linreg_std_scaler,3853.382678,3769.009081
9,ridge_std_scaler,3853.398033,3769.044396


In [330]:
poly = PolynomialFeatures(degree=10)

In [331]:
X_train_poly = poly.fit_transform(X_train[['bathrooms', 'bedrooms', 'interest_level']])
X_test_poly = poly.transform(X_test[['bathrooms', 'bedrooms', 'interest_level']])  # transformer is already fitted on train

In [332]:
lr_poly = MyLinearRegression(alpha=0.0000001, print_eps=True)
lr_poly.fit(X_train_poly, Y_train.to_numpy())

126319498.54287638
1882597844050823.5
3.2685369153616364e+22


In [333]:
X_train_poly.shape

(39481, 286)
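The 286 columns follow from the number of monomials of total degree at most $d$ in $n$ variables, $\binom{n+d}{d}$, which `PolynomialFeatures` generates by default (including the constant term). A short check for $n = 3$ and $d = 10$; the array of ones is a hypothetical placeholder input.

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

n_features, degree = 3, 10

# Number of monomials of total degree <= 10 in 3 variables.
expected_columns = comb(n_features + degree, degree)

# Placeholder input: only the output width matters here.
poly_demo = PolynomialFeatures(degree=degree)
demo_out = poly_demo.fit_transform(np.ones((2, n_features)))
```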

In [334]:
lr_poly.predict(X_train_poly)

array([[5.76382204e+29],
       [7.46762655e+27],
       [7.46762655e+27],
       ...,
       [7.46762655e+27],
       [9.59357111e+27],
       [1.95514996e+34]])

In [335]:
l2_poly = MyL2()
l2_poly.fit(X_train_poly, Y_train)

In [336]:
l2_poly.predict(X_train_poly)

array([[3314.42015612],
       [2613.53355164],
       [2613.53355164],
       ...,
       [2613.53355164],
       [2248.95533853],
       [5733.38548122]])

In [337]:
l1_poly = MyL1(print_eps=True)
l1_poly.fit(X_train_poly, Y_train)

8402844063899.294
4.16301554410074e+25
2.1922912766496836e+38


In [338]:
l1_poly.predict(X_train_poly)

array([[2.19229128e+38],
       [2.85036382e+36],
       [2.85036382e+36],
       ...,
       [2.85036382e+36],
       [3.66129347e+36],
       [7.43342803e+42]])

In [339]:
en_poly = MyElasticNet(print_eps=True)
en_poly.fit(X_train_poly, Y_train)

4184883396546.573
1.4871781985122735e+25
7.512819854136654e+37


In [340]:
en_poly.predict(X_train_poly)

array([[7.51281985e+37],
       [1.00877762e+36],
       [1.00877762e+36],
       ...,
       [1.00877762e+36],
       [1.29407192e+36],
       [2.53760605e+42]])
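The values printed with `print_eps=True` and the predictions above grow by many orders of magnitude. This suggests that the iterative fits diverge on raw polynomial features, whose magnitudes reach the tenth power of the inputs. One common remedy, offered here as an assumption rather than as the notebook's method, is to standardize the polynomial features before fitting. The sketch below only compares column magnitudes before and after scaling on hypothetical integer data.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Hypothetical small-integer features, similar in spirit to room counts.
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 6, size=(200, 3)).astype(float)

raw = PolynomialFeatures(degree=10, include_bias=False).fit_transform(X_demo)
scaled = StandardScaler().fit_transform(raw)

raw_max = np.abs(raw).max()        # dominated by terms such as x**10
scaled_max = np.abs(scaled).max()  # every column now has unit variance
```

With features on a comparable scale, a single learning rate is far less likely to overshoot along the largest columns.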

## 8.4 Store results of quality metrics in the result dataframe.

In [341]:
result_MAE.loc[len(result_MAE)] = ['linreg_poly', 
    mean_absolute_error(Y_train, lr_poly.predict(X_train_poly)), 
    mean_absolute_error(Y_test, lr_poly.predict(X_test_poly))]
result_RMSE.loc[len(result_RMSE)] = ['linreg_poly', 
    np.sqrt(mean_squared_error(Y_train, lr_poly.predict(X_train_poly))), 
    np.sqrt(mean_squared_error(Y_test, lr_poly.predict(X_test_poly)))]
result_R2.loc[len(result_R2)] = ['linreg_poly', 
    r_squared(Y_train, lr_poly.predict(X_train_poly)), 
    r_squared(Y_test, lr_poly.predict(X_test_poly))]

In [342]:
result_MAE.loc[len(result_MAE)] = ['ridge_poly', 
    mean_absolute_error(Y_train, l2_poly.predict(X_train_poly)), 
    mean_absolute_error(Y_test, l2_poly.predict(X_test_poly))]
result_RMSE.loc[len(result_RMSE)] = ['ridge_poly', 
    np.sqrt(mean_squared_error(Y_train, l2_poly.predict(X_train_poly))), 
    np.sqrt(mean_squared_error(Y_test, l2_poly.predict(X_test_poly)))]
result_R2.loc[len(result_R2)] = ['ridge_poly', 
    r_squared(Y_train, l2_poly.predict(X_train_poly)), 
    r_squared(Y_test, l2_poly.predict(X_test_poly))]

In [343]:
result_MAE.loc[len(result_MAE)] = ['lasso_poly', 
    mean_absolute_error(Y_train, l1_poly.predict(X_train_poly)), 
    mean_absolute_error(Y_test, l1_poly.predict(X_test_poly))]
result_RMSE.loc[len(result_RMSE)] = ['lasso_poly', 
    np.sqrt(mean_squared_error(Y_train, l1_poly.predict(X_train_poly))), 
    np.sqrt(mean_squared_error(Y_test, l1_poly.predict(X_test_poly)))]
result_R2.loc[len(result_R2)] = ['lasso_poly', 
    r_squared(Y_train, l1_poly.predict(X_train_poly)), 
    r_squared(Y_test, l1_poly.predict(X_test_poly))]

In [344]:
result_MAE.loc[len(result_MAE)] = ['elasticnet_poly', 
    mean_absolute_error(Y_train, en_poly.predict(X_train_poly)), 
    mean_absolute_error(Y_test, en_poly.predict(X_test_poly))]
result_RMSE.loc[len(result_RMSE)] = ['elasticnet_poly', 
    np.sqrt(mean_squared_error(Y_train, en_poly.predict(X_train_poly))), 
    np.sqrt(mean_squared_error(Y_test, en_poly.predict(X_test_poly)))]
result_R2.loc[len(result_R2)] = ['elasticnet_poly', 
    r_squared(Y_train, en_poly.predict(X_train_poly)), 
    r_squared(Y_test, en_poly.predict(X_test_poly))]

## 8.5 Analyze the results and choose the best model in your opinion.

In [345]:
result_MAE

Unnamed: 0,model,train,test
0,linreg_default,1142.316,1055.719
1,ridge_default,1191.36,1103.737
2,lasso_default,1179.047,1092.316
3,elasticnet_default,1094.575,1006.413
4,linreg_minmax,1478.331,1383.946
5,ridge_minmax,1189.028,1110.485
6,lasso_minmax,1322.84,1252.566
7,elasticnet_minmax,1582.268,1483.717
8,linreg_std_scaler,3853.383,3769.009
9,ridge_std_scaler,3853.398,3769.044


In [346]:
result_RMSE

Unnamed: 0,model,train,test
0,linreg_default,23985.77,10860.49
1,ridge_default,23984.32,10859.92
2,lasso_default,23984.46,10859.54
3,elasticnet_default,23994.16,10875.32
4,linreg_minmax,24027.45,10945.46
5,ridge_minmax,23984.33,10880.42
6,lasso_minmax,24003.72,10904.76
7,elasticnet_minmax,24053.21,10993.01
8,linreg_std_scaler,24290.99,11492.33
9,ridge_std_scaler,24290.97,11492.49


In [347]:
result_R2

Unnamed: 0,model,train,test
0,linreg_default,0.005444149,0.02396308
1,ridge_default,0.005563895,0.02407545
2,lasso_default,0.005552452,0.02414552
3,elasticnet_default,0.004787732,0.02129814
4,linreg_minmax,0.002153741,0.008903205
5,ridge_minmax,0.005563117,0.02281667
6,lasso_minmax,0.004015498,0.01597976
7,elasticnet_minmax,0.0008452241,0.003493213
8,linreg_std_scaler,0.005563642,0.02404681
9,ridge_std_scaler,0.00556582,0.02401476


The best models, in my opinion:
1. Default: Elastic Net (L1 + L2)
2. MinMax: Ridge (L2)
3. StandardScaler: Ridge (L2)
4. Polynomial features: Ridge (L2)

## 8.6 Additionally, try different alpha parameters of regularization in the algorithms, choose the best one and analyze the results.

In [348]:
my_lin_reg = MyLinearRegression()
my_lin_reg.fit(X_train.to_numpy(), Y_train.to_numpy())
yp_my_train = my_lin_reg.predict(X_train.to_numpy())
yp_my_test = my_lin_reg.predict(X_test.to_numpy())
yp_my_train

array([[2200.34005989],
       [2639.23147669],
       [3603.65590906],
       ...,
       [1806.94272377],
       [2149.34727534],
       [7248.39437791]])

In [349]:
my_L2 = MyL2()
my_L2.fit(X_train, Y_train)
yp_my_L2_train = my_L2.predict(X_train)
yp_my_L2_test = my_L2.predict(X_test)

In [350]:
my_L1 = MyL1()
my_L1.fit(X_train, Y_train)
yp_my_L1_train = my_L1.predict(X_train)
yp_my_L1_test = my_L1.predict(X_test)

In [351]:
lr_best_a = 0.0001
first_lr = MyLinearRegression(alpha=lr_best_a)
first_lr.fit(X_train.to_numpy(), Y_train.to_numpy())
lr_min_mse = mean_squared_error(Y_test.to_numpy(), first_lr.predict(X_test.to_numpy()))
for alpha in [num / 10000 for num in range(1, 1000, 10)]:
    my_lin_reg = MyLinearRegression(alpha=alpha)
    my_lin_reg.fit(X_train.to_numpy(), Y_train.to_numpy())
    yp = my_lin_reg.predict(X_test.to_numpy())
    if mean_squared_error(Y_test.to_numpy(), yp) < lr_min_mse:
        lr_best_a = alpha
        lr_min_mse = mean_squared_error(Y_test.to_numpy(), yp)

In [352]:
lr_best_a

0.0191

In [353]:
lr_min_mse

117925521.08554548

In [354]:
l1_best_a = 0.0001
first_l1 = MyL1(alpha=l1_best_a)
first_l1.fit(X_train, Y_train)
l1_min_mse = mean_squared_error(Y_test, first_l1.predict(X_test))
for alpha in [num / 10000 for num in range(1, 1000, 10)]:
    my_l1 = MyL1(alpha=alpha)
    my_l1.fit(X_train, Y_train)
    yp = my_l1.predict(X_test)
    if mean_squared_error(Y_test, yp) < l1_min_mse:
        l1_best_a = alpha
        l1_min_mse = mean_squared_error(Y_test, yp)

In [355]:
l1_best_a

0.0191

In [356]:
l1_min_mse

117925523.02742486

In [357]:
en_best_a = 0.001
first_en = MyElasticNet(alpha=en_best_a)
first_en.fit(X_train, Y_train)
en_min_mse = mean_squared_error(Y_test, first_en.predict(X_test))
for alpha in [num / 1000 for num in range(1, 100, 5)]:
    my_en = MyElasticNet(alpha=alpha)
    my_en.fit(X_train, Y_train)
    yp = my_en.predict(X_test)
    if mean_squared_error(Y_test, yp) < en_min_mse:
        en_best_a = alpha
        en_min_mse = mean_squared_error(Y_test, yp)

In [358]:
en_best_a

0.096

In [359]:
en_min_mse

118272327.66754018
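The loops above pick `alpha` by the MSE on the test sample, so the reported test error is optimistically biased. A common alternative is to hold out a validation part of the training sample and select on it. The sketch below uses sklearn's `Ridge` on synthetic data instead of the notebook's own classes; all names are hypothetical. Here `alpha` is Ridge's regularization strength, which may not match what `alpha` means in `MyLinearRegression`.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical synthetic regression problem.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 5))
true_w = np.array([3.0, -2.0, 0.0, 1.0, 0.5])
y_demo = X_demo @ true_w + rng.normal(scale=0.5, size=300)

# Split the training data again: fit on one part, select alpha on the other.
X_fit, X_val, y_fit, y_val = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)

alphas = [0.001, 0.01, 0.1, 1.0, 10.0]
val_mse = {}
for alpha in alphas:
    model = Ridge(alpha=alpha).fit(X_fit, y_fit)
    val_mse[alpha] = mean_squared_error(y_val, model.predict(X_val))

# The test sample is never touched during selection.
best_alpha = min(val_mse, key=val_mse.get)
```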

# 9. Native models

## 9.1 Calculate metrics for mean and median from previous lesson and add results to final dataframe.

In [360]:
result_MAE.loc[len(result_MAE)] = ['native_mean', 
                                   mean_absolute_error(Y_train, Y_train.assign(price=Y_train['price'].mean())), 
                                   mean_absolute_error(Y_test, Y_test.assign(price=Y_test['price'].mean()))]
result_RMSE.loc[len(result_RMSE)] = ['native_mean', 
                                   np.sqrt(mean_squared_error(Y_train, Y_train.assign(price=Y_train['price'].mean()))), 
                                   np.sqrt(mean_squared_error(Y_test, Y_test.assign(price=Y_test['price'].mean())))]
result_R2.loc[len(result_R2)] = ['native_mean', 
                                 r_squared(Y_train, Y_train.assign(price=Y_train['price'].mean())), 
                                 r_squared(Y_test, Y_test.assign(price=Y_test['price'].mean()))]

In [361]:
result_MAE.loc[len(result_MAE)] = ['native_median', 
                                   mean_absolute_error(Y_train, Y_train.assign(price=Y_train['price'].median())), 
                                   mean_absolute_error(Y_test, Y_test.assign(price=Y_test['price'].median()))]
result_RMSE.loc[len(result_RMSE)] = ['native_median', 
                                   np.sqrt(mean_squared_error(Y_train, Y_train.assign(price=Y_train['price'].median()))), 
                                   np.sqrt(mean_squared_error(Y_test, Y_test.assign(price=Y_test['price'].median())))]
result_R2.loc[len(result_R2)] = ['native_median', 
                                 r_squared(Y_train, Y_train.assign(price=Y_train['price'].median())), 
                                 r_squared(Y_test, Y_test.assign(price=Y_test['price'].median()))]
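A constant baseline normally takes its constant from the training sample and applies it to the test sample, so that no information from the test targets leaks into the prediction. A minimal sketch with sklearn's `DummyRegressor` on hypothetical prices (`y_train_demo`, `y_test_demo`), shown as one possible way to set this up:

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_absolute_error

# Hypothetical prices.
y_train_demo = np.array([1000.0, 2000.0, 3000.0, 10000.0])
y_test_demo = np.array([1500.0, 2500.0])

# DummyRegressor ignores the features, so placeholders are enough.
X_train_dummy = np.zeros((len(y_train_demo), 1))
X_test_dummy = np.zeros((len(y_test_demo), 1))

mean_model = DummyRegressor(strategy="mean").fit(X_train_dummy, y_train_demo)
median_model = DummyRegressor(strategy="median").fit(X_train_dummy, y_train_demo)

# Train mean is 4000 and train median is 2500; both are applied to the test targets.
mae_mean = mean_absolute_error(y_test_demo, mean_model.predict(X_test_dummy))
mae_median = mean_absolute_error(y_test_demo, median_model.predict(X_test_dummy))
```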

# 10. Compare results

## 10.1 Print your final tables

In [362]:
result_MAE

Unnamed: 0,model,train,test
0,linreg_default,1142.316,1055.719
1,ridge_default,1191.36,1103.737
2,lasso_default,1179.047,1092.316
3,elasticnet_default,1094.575,1006.413
4,linreg_minmax,1478.331,1383.946
5,ridge_minmax,1189.028,1110.485
6,lasso_minmax,1322.84,1252.566
7,elasticnet_minmax,1582.268,1483.717
8,linreg_std_scaler,3853.383,3769.009
9,ridge_std_scaler,3853.398,3769.044


In [363]:
result_RMSE

Unnamed: 0,model,train,test
0,linreg_default,23985.77,10860.49
1,ridge_default,23984.32,10859.92
2,lasso_default,23984.46,10859.54
3,elasticnet_default,23994.16,10875.32
4,linreg_minmax,24027.45,10945.46
5,ridge_minmax,23984.33,10880.42
6,lasso_minmax,24003.72,10904.76
7,elasticnet_minmax,24053.21,10993.01
8,linreg_std_scaler,24290.99,11492.33
9,ridge_std_scaler,24290.97,11492.49


In [364]:
result_R2

Unnamed: 0,model,train,test
0,linreg_default,0.005444149,0.02396308
1,ridge_default,0.005563895,0.02407545
2,lasso_default,0.005552452,0.02414552
3,elasticnet_default,0.004787732,0.02129814
4,linreg_minmax,0.002153741,0.008903205
5,ridge_minmax,0.005563117,0.02281667
6,lasso_minmax,0.004015498,0.01597976
7,elasticnet_minmax,0.0008452241,0.003493213
8,linreg_std_scaler,0.005563642,0.02404681
9,ridge_std_scaler,0.00556582,0.02401476


## 10.2 What is the best model?

The best models, in my opinion:
1. Default: Elastic Net (L1 + L2)
2. MinMax: Ridge (L2)
3. StandardScaler: Ridge (L2)
4. Polynomial features: Ridge (L2)

## 10.3 What is the most stable model?

lasso_minmax

# 11. Additional task

## 11.1 There are some tricks with the target variable that can improve model quality. If the target has a heavy-tailed distribution, you can apply a monotonic function to “improve” the distribution. In practice, logarithmic functions are often used. We recommend doing this exercise and comparing the results. Don’t forget to apply the inverse transformation before comparing metrics.

In [365]:
y_log = np.log(Y_train)
model_log = MyLinearRegression(alpha=0.0191)
model_log.fit(X_train.to_numpy(), y_log)
y_pred_log = model_log.predict(X_test.to_numpy())
y_pred = np.exp(y_pred_log)

In [366]:
mean_absolute_error(Y_test, y_pred)

13945907906.565697
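An MAE of this size is far above the untransformed `linreg_default` test MAE of about 1055.72. It suggests that the fit in log space did not converge, or that some predictions became very large before `np.exp` was applied. For comparison, here is a minimal sketch of the same trick with sklearn's `LinearRegression` on synthetic heavy-tailed data. The `log1p`/`expm1` pair and all variable names are assumptions made for the illustration; the notebook itself uses `np.log` and `np.exp`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Hypothetical positive, heavy-tailed target: exponential of a linear signal plus noise.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 3, size=(500, 2))
log_signal = 7 + 0.5 * X_demo[:, 0] + 0.3 * X_demo[:, 1]
y_demo = np.exp(log_signal + rng.normal(scale=0.2, size=500))

X_fit, X_eval = X_demo[:400], X_demo[400:]
y_fit, y_eval = y_demo[:400], y_demo[400:]

# Fit in log space, then invert the transformation before computing metrics.
log_model = LinearRegression().fit(X_fit, np.log1p(y_fit))
pred_log = np.expm1(log_model.predict(X_eval))

# Reference model fitted on the raw target.
raw_model = LinearRegression().fit(X_fit, y_fit)
pred_raw = raw_model.predict(X_eval)

mae_log = mean_absolute_error(y_eval, pred_log)
mae_raw = mean_absolute_error(y_eval, pred_raw)
```

Both MAE values are in the original price units, so they can be compared directly.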

In [367]:
my_lin_reg = MyLinearRegression()
my_lin_reg.fit(X_train.to_numpy(), Y_train.to_numpy())
yp_my_train = my_lin_reg.predict(X_train.to_numpy())
yp_my_test = my_lin_reg.predict(X_test.to_numpy())
yp_my_test

array([[6814.73351337],
       [5634.67497719],
       [4084.96340062],
       ...,
       [3429.37838622],
       [3664.48603369],
       [3752.89784742]])

In [368]:
mean_absolute_error(Y_test, yp_my_test)

1055.7194751495945

## 11.2 The next trick concerns outliers. The slope of the linear regression line depends strongly on outlier points. Often you should remove these points, but (attention!) only from the training data. Explain why they are removed only from the training sample. We recommend doing this exercise and comparing the results.

In [369]:
def cut_by_quantile(df, col_name, q_small, q_big):
    q_low, q_high = df[col_name].quantile([q_small, q_big])
    return df[(df[col_name] > q_low) & (df[col_name] < q_high)]

In [370]:
train = cut_by_quantile(train, 'price', 0.01, 0.99)

In [371]:
# .to_numpy()
X_train = train.drop(['price'], axis=1)
Y_train = train[['price']]

In [372]:
X_test = test.drop(['price'], axis=1)
Y_test = test[['price']]

In [373]:
my_lin_reg = MyLinearRegression()
my_lin_reg.fit(X_train.to_numpy(), Y_train.to_numpy())
yp_my_train = my_lin_reg.predict(X_train.to_numpy())
yp_my_test = my_lin_reg.predict(X_test.to_numpy())

In [374]:
mean_absolute_error(Y_test, yp_my_test)

922.5107124961879

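Yes, the error decreased: the test MAE dropped from 1055.72 to 922.51. Outliers are removed only from the training sample because the test sample has to stay representative of the data the model will actually receive.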

## 11.3 It is also a useful exercise to implement the linear regression algorithm with batch training.

In [388]:
class MyLinearRegressionBatch:
    def __init__(self, alpha=0.01, n_cycles=100):
        self.alpha = alpha        # learning rate
        self.n_cycles = n_cycles  # number of passes over the data
        self.w = None

    def fit(self, X, Y):
        n_samples, n_features = X.shape
        # Column vector, so the (n_features, 1) gradient does not broadcast w into a matrix
        self.w = np.random.randn(n_features, 1)

        for _ in range(self.n_cycles):
            for _ in range(n_samples):
                # Each update uses one randomly drawn sample (a batch of size 1)
                random_index = np.random.randint(n_samples)
                xi = X[random_index:random_index+1]
                yi = Y[random_index:random_index+1]
                gradients = 2 * xi.T.dot(xi.dot(self.w) - yi)
                self.w = self.w - self.alpha * gradients

    def predict(self, X):
        return X.dot(self.w)


In [395]:

X_b = np.c_[np.ones((X_train.shape[0], 1)), X_train]
model = MyLinearRegressionBatch()

model.fit(X_b, Y_train.to_numpy())

predictions = model.predict(X_b)

In [416]:
mean_absolute_error(Y_train, predictions[:,0])

758.6376191612227

Yes, the error decreased, although this MAE is measured on the training sample.
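The class above updates the weights from one random sample at a time. A mini-batch variant averages the gradient over several samples per update, which usually gives smoother steps. The sketch below is one possible formulation on synthetic data; the function `minibatch_gd`, its parameters and the data are hypothetical and not taken from the notebook.

```python
import numpy as np

def minibatch_gd(X, y, lr=0.05, n_epochs=200, batch_size=32, seed=0):
    """Fit linear weights by mini-batch gradient descent on the MSE loss."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    for _ in range(n_epochs):
        order = rng.permutation(n_samples)  # reshuffle the samples every epoch
        for start in range(0, n_samples, batch_size):
            idx = order[start:start + batch_size]
            residual = X[idx] @ w - y[idx]
            # Gradient of the mean squared error over the current batch.
            grad = 2.0 * X[idx].T @ residual / len(idx)
            w -= lr * grad
    return w

# Hypothetical data: a bias column plus two standard-normal features.
rng = np.random.default_rng(1)
X_demo = np.c_[np.ones(500), rng.normal(size=(500, 2))]
true_w = np.array([4.0, 2.0, -1.0])
y_demo = X_demo @ true_w + rng.normal(scale=0.1, size=500)

w_hat = minibatch_gd(X_demo, y_demo)
```

On real data with features on very different scales, the learning rate would likely need to be much smaller, or the features scaled first.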