In this session we will try a regression task. The data is in this same folder; we will practice on a dataset of movies from IMDB.

Before training a model, the data has to be prepared:

- find or collect the data
- clean and preprocess it
- convert it into matrices


In [None]:
!pip install gensim -U
!pip install --upgrade numpy

In [1]:
import numpy

In [2]:
# import the required libraries
import pandas as pd

# import seaborn as sns
import matplotlib.pyplot as plt
# %matplotlib inline

import gensim
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error




In [89]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [166]:
data = pd.read_csv('/content/drive/MyDrive/IMDB-Movie-Data.csv')
print(data.shape)

data.head(3)

(1000, 12)


Unnamed: 0,Rank,Title,Genre,Description,Director,Actors,Year,Runtime (Minutes),Rating,Votes,Revenue (Millions),Metascore
0,1,Guardians of the Galaxy,"Action,Adventure,Sci-Fi",A group of intergalactic criminals are forced ...,James Gunn,"Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S...",2014,121,8.1,757074,333.13,76.0
1,2,Prometheus,"Adventure,Mystery,Sci-Fi","Following clues to the origin of mankind, a te...",Ridley Scott,"Noomi Rapace, Logan Marshall-Green, Michael Fa...",2012,124,7.0,485820,126.46,65.0
2,3,Split,"Horror,Thriller",Three girls are kidnapped by a man with a diag...,M. Night Shyamalan,"James McAvoy, Anya Taylor-Joy, Haley Lu Richar...",2016,117,7.3,157606,138.12,62.0


## What to do with NaN?
There are three options:

In [None]:
# 1. Drop the rows that contain NaN
print(data.isna().any())
data.shape

In [None]:
print(data.shape)
tmp = data.dropna()
tmp.shape

In [None]:
# 2. Replace NaN with 0
print(data.shape)
tmp = data.fillna(0)
print(tmp.shape)

In [167]:
# 3. Replace NaN with the column mean

# compute the means of the columns that have missing values
meta_mean = data['Metascore'].mean()
rev_mean = data['Revenue (Millions)'].mean()

# replace the gaps with the means
# (assigned back to the column, since fillna(inplace=True) on a selected column is unreliable in recent pandas)
data['Metascore'] = data['Metascore'].fillna(meta_mean)
data['Revenue (Millions)'] = data['Revenue (Millions)'].fillna(rev_mean)

# check whether any NaN remain
data.isna().any()

Rank                  False
Title                 False
Genre                 False
Description           False
Director              False
Actors                False
Year                  False
Runtime (Minutes)     False
Rating                False
Votes                 False
Revenue (Millions)    False
Metascore             False
dtype: bool
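
To compare the three options side by side, here is a minimal sketch on a toy dataframe. The frame `toy` and its values are made up for illustration; only the column names are borrowed from the dataset.

```python
import numpy as np
import pandas as pd

# Toy frame (made up for illustration) with one missing value per column
toy = pd.DataFrame({
    "Metascore": [76.0, np.nan, 62.0, 59.0],
    "Revenue (Millions)": [333.13, 126.46, np.nan, 270.32],
})

dropped = toy.dropna()                                  # option 1: drop rows containing NaN
zero_filled = toy.fillna(0)                             # option 2: replace NaN with 0
mean_filled = toy.fillna(toy.mean(numeric_only=True))   # option 3: replace NaN with the column mean

print(dropped.shape, zero_filled.shape, mean_filled.shape)
print(mean_filled)
```

Dropping rows loses data, zeros can distort a column whose real values are far from 0, and the mean keeps every row at the cost of slightly shrinking the column's variance.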

## Preparing the data

We will try to predict a movie's rating from its description, year, runtime in minutes, vote count, box-office revenue, and Metascore

The "Rating" column becomes the **target variable, or target** (y)<br>
The rest of the data makes up the **features** (X)

In [6]:
data.Description

0      A group of intergalactic criminals are forced ...
1      Following clues to the origin of mankind, a te...
2      Three girls are kidnapped by a man with a diag...
3      In a city of humanoid animals, a hustling thea...
4      A secret government agency recruits some of th...
                             ...                        
995    A tight-knit team of rising investigators, alo...
996    Three American college students studying abroa...
997    Romantic sparks occur between two dance studen...
998    A pair of friends embark on a mission to reuni...
999    A stuffy businessman finds himself trapped ins...
Name: Description, Length: 1000, dtype: object

In [168]:
# prepare the movie descriptions
data["text"] = data.Description.apply(lambda x: x.lower().split()) 

data["text"]

0      [a, group, of, intergalactic, criminals, are, ...
1      [following, clues, to, the, origin, of, mankin...
2      [three, girls, are, kidnapped, by, a, man, wit...
3      [in, a, city, of, humanoid, animals,, a, hustl...
4      [a, secret, government, agency, recruits, some...
                             ...                        
995    [a, tight-knit, team, of, rising, investigator...
996    [three, american, college, students, studying,...
997    [romantic, sparks, occur, between, two, dance,...
998    [a, pair, of, friends, embark, on, a, mission,...
999    [a, stuffy, businessman, finds, himself, trapp...
Name: text, Length: 1000, dtype: object

In [None]:
data.text.values

In [169]:
input_text = list(data.text.values)

In [170]:
documents = [TaggedDocument(doc, [i]) for i, doc in enumerate(input_text)]
documents[10:12]

[TaggedDocument(words=['the', 'adventures', 'of', 'writer', 'newt', 'scamander', 'in', 'new', "york's", 'secret', 'community', 'of', 'witches', 'and', 'wizards', 'seventy', 'years', 'before', 'harry', 'potter', 'reads', 'his', 'book', 'in', 'school.'], tags=[10]),
 TaggedDocument(words=['the', 'story', 'of', 'a', 'team', 'of', 'female', 'african-american', 'mathematicians', 'who', 'served', 'a', 'vital', 'role', 'in', 'nasa', 'during', 'the', 'early', 'years', 'of', 'the', 'u.s.', 'space', 'program.'], tags=[11])]

Train the model on the movie description texts (the parameters can be varied)

In [95]:
model = Doc2Vec(documents, vector_size=5, window=2, min_count=1, workers=4)

In [96]:
model.save("D2V.model") # save the model

In [97]:
# this is how to look at the vectors of the texts the model was trained on
# the index in [] next to documents is the index of the text in the dataset

model.dv[documents[0].tags[0]]


array([-0.02176344,  0.10645068, -0.12121619, -0.07537015,  0.01603152],
      dtype=float32)

Now the vectors need to be added to the dataset alongside the other features

In [98]:
# build a list with the vector for each text
vectors = []
for x in documents:
    vec = list(model.dv[x.tags][0])
    vectors.append(vec)

In [99]:
# this gives a dataframe with each vector component in its own column
split_df = pd.DataFrame(vectors,
                        columns=['v1', 'v2', 'v3','v4',"v5"])

split_df


Unnamed: 0,v1,v2,v3,v4,v5
0,-0.021763,0.106451,-0.121216,-0.075370,0.016032
1,0.081098,0.017959,-0.019524,-0.427588,0.001407
2,0.215759,0.485485,0.027760,-0.473435,-0.153536
3,-0.027700,0.442416,0.305442,-0.608460,-0.164788
4,0.082145,0.483151,0.022781,-0.404735,-0.002980
...,...,...,...,...,...
995,0.293254,0.515370,0.084858,-0.418623,-0.215410
996,0.033281,0.526380,0.168328,-0.562434,0.007402
997,-0.100525,0.358681,-0.002413,-0.007090,-0.205610
998,-0.008109,0.250834,0.148509,-0.186417,-0.079836


In [100]:
# now join it to the main dataframe
result = data.join(split_df, how='left')
result.shape

(1000, 18)

In [17]:
result

Unnamed: 0,Rank,Title,Genre,Description,Director,Actors,Year,Runtime (Minutes),Rating,Votes,Revenue (Millions),Metascore,text,v1,v2,v3,v4,v5
0,1,Guardians of the Galaxy,"Action,Adventure,Sci-Fi",A group of intergalactic criminals are forced ...,James Gunn,"Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S...",2014,121,8.1,757074,333.130000,76.0,"[a, group, of, intergalactic, criminals, are, ...",-0.050145,-0.011905,-0.108754,-0.055358,0.079037
1,2,Prometheus,"Adventure,Mystery,Sci-Fi","Following clues to the origin of mankind, a te...",Ridley Scott,"Noomi Rapace, Logan Marshall-Green, Michael Fa...",2012,124,7.0,485820,126.460000,65.0,"[following, clues, to, the, origin, of, mankin...",0.083701,-0.031407,0.017762,-0.531490,0.075088
2,3,Split,"Horror,Thriller",Three girls are kidnapped by a man with a diag...,M. Night Shyamalan,"James McAvoy, Anya Taylor-Joy, Haley Lu Richar...",2016,117,7.3,157606,138.120000,62.0,"[three, girls, are, kidnapped, by, a, man, wit...",0.157337,0.295756,0.053068,-0.470184,-0.028435
3,4,Sing,"Animation,Comedy,Family","In a city of humanoid animals, a hustling thea...",Christophe Lourdelet,"Matthew McConaughey,Reese Witherspoon, Seth Ma...",2016,108,7.2,60545,270.320000,59.0,"[in, a, city, of, humanoid, animals,, a, hustl...",-0.049044,0.332151,0.387964,-0.781172,-0.029383
4,5,Suicide Squad,"Action,Adventure,Fantasy",A secret government agency recruits some of th...,David Ayer,"Will Smith, Jared Leto, Margot Robbie, Viola D...",2016,123,6.2,393727,325.020000,40.0,"[a, secret, government, agency, recruits, some...",0.036583,0.311901,0.075132,-0.485156,0.149358
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,996,Secret in Their Eyes,"Crime,Drama,Mystery","A tight-knit team of rising investigators, alo...",Billy Ray,"Chiwetel Ejiofor, Nicole Kidman, Julia Roberts...",2015,111,6.2,27585,82.956376,45.0,"[a, tight-knit, team, of, rising, investigator...",0.253390,0.384887,0.134213,-0.494408,-0.092510
996,997,Hostel: Part II,Horror,Three American college students studying abroa...,Eli Roth,"Lauren German, Heather Matarazzo, Bijou Philli...",2007,94,5.5,73152,17.540000,46.0,"[three, american, college, students, studying,...",-0.033330,0.325146,0.195590,-0.547036,0.149406
997,998,Step Up 2: The Streets,"Drama,Music,Romance",Romantic sparks occur between two dance studen...,Jon M. Chu,"Robert Hoffman, Briana Evigan, Cassie Ventura,...",2008,98,6.2,70699,58.010000,50.0,"[romantic, sparks, occur, between, two, dance,...",-0.108421,0.320174,0.052072,-0.140763,-0.129928
998,999,Search Party,"Adventure,Comedy",A pair of friends embark on a mission to reuni...,Scot Armstrong,"Adam Pally, T.J. Miller, Thomas Middleditch,Sh...",2014,93,5.6,4881,82.956376,22.0,"[a, pair, of, friends, embark, on, a, mission,...",-0.030535,0.166419,0.165587,-0.180988,-0.021532


In [101]:
# redefine the dataset, keeping only what matters

data_sm = result[['Runtime (Minutes)',"Year",
                'Rating', 'Votes',
                'Revenue (Millions)','Metascore','v1', 'v2', 'v3','v4',"v5"]
              ]


data_sm.head(3)

Unnamed: 0,Runtime (Minutes),Year,Rating,Votes,Revenue (Millions),Metascore,v1,v2,v3,v4,v5
0,121,2014,8.1,757074,333.13,76.0,-0.021763,0.106451,-0.121216,-0.07537,0.016032
1,124,2012,7.0,485820,126.46,65.0,0.081098,0.017959,-0.019524,-0.427588,0.001407
2,117,2016,7.3,157606,138.12,62.0,0.215759,0.485485,0.02776,-0.473435,-0.153536


## Preparing the matrices

In [102]:
# define X and y

X = data_sm.drop(["Rating"],axis=1).values 

display(X, X.shape)

array([[ 1.21000000e+02,  2.01400000e+03,  7.57074000e+05, ...,
        -1.21216185e-01, -7.53701478e-02,  1.60315223e-02],
       [ 1.24000000e+02,  2.01200000e+03,  4.85820000e+05, ...,
        -1.95235666e-02, -4.27588463e-01,  1.40741654e-03],
       [ 1.17000000e+02,  2.01600000e+03,  1.57606000e+05, ...,
         2.77595092e-02, -4.73434716e-01, -1.53536499e-01],
       ...,
       [ 9.80000000e+01,  2.00800000e+03,  7.06990000e+04, ...,
        -2.41294969e-03, -7.09009357e-03, -2.05610499e-01],
       [ 9.30000000e+01,  2.01400000e+03,  4.88100000e+03, ...,
         1.48509011e-01, -1.86417475e-01, -7.98363611e-02],
       [ 8.70000000e+01,  2.01600000e+03,  1.24350000e+04, ...,
         1.55649811e-01, -4.27935869e-01,  4.07249816e-02]])

(1000, 10)

In [20]:
data_sm.isna().any()

Runtime (Minutes)     False
Year                  False
Rating                False
Votes                 False
Revenue (Millions)    False
Metascore             False
v1                    False
v2                    False
v3                    False
v4                    False
v5                    False
dtype: bool

In [103]:
y = data_sm['Rating'].values # the array of rating values (the target), kept separately
y.shape

(1000,)

It is often useful to [normalize](https://en.wikipedia.org/wiki/Normalization_(statistics)) the data: this helps when the features are expressed in different units of measurement. 
StandardScaler is used for this; it rescales each column to zero mean and unit variance.
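
The transformation itself is simple enough to write by hand. The sketch below applies z = (x - mean) / std to the Runtime and Votes values of the first four movies from the table above; the variable names are made up for this example.

```python
import numpy as np

# Runtime (minutes) and Votes of the first four movies: two very different scales
X_toy = np.array([[121.0, 757074.0],
                  [124.0, 485820.0],
                  [117.0, 157606.0],
                  [108.0,  60545.0]])

mean = X_toy.mean(axis=0)          # per-column mean
std = X_toy.std(axis=0)            # per-column population std (ddof=0), as StandardScaler uses
X_scaled = (X_toy - mean) / std    # z = (x - mean) / std

print(X_scaled.mean(axis=0))       # close to 0 for every column
print(X_scaled.std(axis=0))        # close to 1 for every column
```

After this step both columns live on the same scale, so the size of a coefficient no longer depends on the unit the feature happened to be measured in.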

Before normalization:

In [22]:
list(X[0])

[121.0,
 2014.0,
 757074.0,
 333.13,
 76.0,
 -0.050145380198955536,
 -0.011905302293598652,
 -0.10875413566827774,
 -0.05535794794559479,
 0.07903745770454407]

In [104]:
# use the standardizer
sc = StandardScaler()

X_train, X_test, y_train, y_test = train_test_split(sc.fit_transform(X), y, random_state=42)

After:

In [24]:
list(sc.fit_transform(X)[0])

[0.4163497512303056,
 0.37979525138136244,
 3.1126899627963738,
 2.5961363010556906,
 1.0233613578368184,
 -1.3801969745484304,
 -1.8414515823153261,
 -2.2349698896946917,
 2.182290534870191,
 0.33313671367619974]

Now the data is more convenient to work with and to train on
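
One caveat: in the cell above the scaler is fitted on all of X before the split, so the test rows influence the means and variances. A common alternative is to split first and fit the scaler on the training part only. The sketch below shows that order on synthetic data; `X_demo` and `y_demo` are stand-ins, not the IMDB features.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the feature matrix and target (not the IMDB data)
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=100.0, scale=20.0, size=(200, 3))
y_demo = rng.normal(size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=42)

scaler = StandardScaler()
X_tr_scaled = scaler.fit_transform(X_tr)   # statistics are learned from the training rows only
X_te_scaled = scaler.transform(X_te)       # the test rows reuse those statistics

print(X_tr_scaled.mean(axis=0).round(6))   # training columns are centred
print(X_te_scaled.mean(axis=0).round(3))   # test columns are only approximately centred
```

On a dataset of this size the difference in the metrics is likely to be small, but this order keeps the test set untouched until evaluation.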

In [105]:
# define the regressor model
# the regularization strength can be varied with the alpha parameter
regressor = Ridge() 


# train
regressor.fit(X_train, y_train)

Ridge()

In [106]:
# let's predict the result for the test set

y_preds = regressor.predict(X_test)

### Evaluating the algorithm's results

As the metric we will use the [mean absolute error](https://www.youtube.com/watch?v=ZejnwbcU8nw). It shows the average deviation from the correct answer in the same units as the target

*(there are, of course, [other options](https://towardsdatascience.com/what-are-the-best-metrics-to-evaluate-your-regression-model-418ca481755b))*
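
To see what the metric measures, it can be computed by hand. In the sketch below the true ratings and the predictions are made-up numbers, and MSE and RMSE are included for comparison.

```python
import math

# Made-up ratings and predictions, purely for illustration
y_true = [8.1, 7.0, 7.3, 6.2]
y_pred = [7.6, 7.2, 6.8, 6.9]

errors = [p - t for t, p in zip(y_true, y_pred)]

mae = sum(abs(e) for e in errors) / len(errors)    # mean absolute error, in rating points
mse = sum(e ** 2 for e in errors) / len(errors)    # mean squared error, in squared rating points
rmse = math.sqrt(mse)                              # square root brings it back to rating points

print(f"MAE = {mae:.4f}, MSE = {mse:.4f}, RMSE = {rmse:.4f}")
```

Because the errors are squared before averaging, RMSE reacts more strongly to a few large misses than MAE does.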

In [27]:
mean_absolute_error(y_test, y_preds) 

0.4996954350084848

Try different values of the regularization parameter alpha when training the model. How do they affect the size of the error?

## Homework

In [28]:
import math

In [179]:
def metrics(y_test, y_preds):
  mae = mean_absolute_error(y_test, y_preds) 
  mse = mean_squared_error(y_test, y_preds) 
  rmse = math.sqrt(mean_squared_error(y_test, y_preds))
  return(f'MAE = {mae:.4f}, MSE = {mse:.4f}, RMSE = {rmse:.4f}')

In [107]:
# Results from the seminar
metrics(y_test, y_preds)

MAE = 0.4993, MSE = 0.4732, RMSE = 0.6879


### Punctuation

In [108]:
from string import punctuation
def preprocess(text):
    tokens = text.lower().split()
    tokens = [token.strip(punctuation) for token in tokens]
    return tokens

In [159]:
data["text"] = data.Description.apply(preprocess) 
input_text = list(data.text.values)
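
To check what `strip(punctuation)` actually changes, here is a small sketch on a made-up sentence. The helper `strip_tokens` is hypothetical; unlike `preprocess` above, it also drops tokens that become empty after stripping (for example a standalone dash), which is an addition and not part of the notebook's function.

```python
from string import punctuation

def strip_tokens(text):
    """Lowercase, split on whitespace, strip punctuation from token edges, drop empty tokens."""
    tokens = (token.strip(punctuation) for token in text.lower().split())
    return [token for token in tokens if token]

# Made-up sentence that mixes the punctuation cases seen in the descriptions
sample = "In a city of animals, a tight-knit team - New York's best - meets."

naive = sample.lower().split()     # whitespace split only: 'animals,' and 'meets.' keep their punctuation
cleaned = strip_tokens(sample)     # edges stripped, inner hyphens and apostrophes kept

print(naive)
print(cleaned)
```

Only the edges of a token are stripped, so `tight-knit` and `york's` stay intact, while `animals,` and `meets.` lose the trailing punctuation.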

In [160]:
def return_result(input_text, vector_size=5, window=2, min_count=1, workers=4):
  documents = [TaggedDocument(doc, [i]) for i, doc in enumerate(input_text)]
  model = Doc2Vec(documents, vector_size=vector_size, window=window, min_count=min_count, workers=workers)
  vectors = []
  for x in documents:
      vec = list(model.dv[x.tags][0])
      vectors.append(vec)
  # build the column names from vector_size instead of hard-coding five of them
  vec_cols = [f'v{i + 1}' for i in range(vector_size)]
  split_df = pd.DataFrame(vectors, columns=vec_cols)
  result = data.join(split_df, how='left')
  data_sm = result[['Runtime (Minutes)', 'Year',
                    'Rating', 'Votes',
                    'Revenue (Millions)', 'Metascore'] + vec_cols]
  X = data_sm.drop(["Rating"],axis=1).values 
  y = data_sm['Rating'].values 

  sc = StandardScaler()
  X_train, X_test, y_train, y_test = train_test_split(sc.fit_transform(X), y, random_state=42)
  regressor = Ridge() 
  regressor.fit(X_train, y_train)
  y_preds = regressor.predict(X_test)
  return metrics(y_test, y_preds)

In [162]:
return_result(input_text)

MAE = 0.4997, MSE = 0.4753, RMSE = 0.6894


Removing punctuation had almost no effect: MAE went up by 0.0004

### Without normalization


In [163]:
X = data_sm.drop(["Rating"],axis=1).values 
y = data_sm['Rating'].values 

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

regressor = Ridge() 
regressor.fit(X_train, y_train)
y_preds = regressor.predict(X_test)

In [164]:
metrics(y_test, y_preds)

MAE = 0.4980, MSE = 0.4720, RMSE = 0.6870


Surprisingly, the result without normalization is (very slightly) better: MAE is about 0.001 lower.

### Different model parameters

In [180]:
# vary the window size
windows = [1, 3, 5, 8, 10, 20]
for w in windows:
  res = return_result(input_text, vector_size=5, window=w, min_count=1, workers=4)
  print(f'Window={w}', res)

Window=1 MAE = 0.5002, MSE = 0.4756, RMSE = 0.6896
Window=3 MAE = 0.5003, MSE = 0.4753, RMSE = 0.6894
Window=5 MAE = 0.4986, MSE = 0.4739, RMSE = 0.6884
Window=8 MAE = 0.4981, MSE = 0.4723, RMSE = 0.6873
Window=10 MAE = 0.4995, MSE = 0.4753, RMSE = 0.6894
Window=20 MAE = 0.4985, MSE = 0.4741, RMSE = 0.6885


The lowest MAE is at a window of 8.


In [182]:
# vary the minimum word frequency
m_count = [1, 3, 5, 8, 10, 20, 50]
for m in m_count:
  res = return_result(input_text, vector_size=5, window=2, min_count=m, workers=4)
  print(f'Min count={m}', res)

Min count=1 MAE = 0.4983, MSE = 0.4749, RMSE = 0.6892
Min count=3 MAE = 0.4994, MSE = 0.4760, RMSE = 0.6899
Min count=5 MAE = 0.4997, MSE = 0.4763, RMSE = 0.6901
Min count=8 MAE = 0.4987, MSE = 0.4751, RMSE = 0.6893
Min count=10 MAE = 0.4983, MSE = 0.4751, RMSE = 0.6892
Min count=20 MAE = 0.4978, MSE = 0.4764, RMSE = 0.6902
Min count=50 MAE = 0.5000, MSE = 0.4765, RMSE = 0.6903


The lowest MAE is at a minimum frequency of 20, which filters out the rarest words; MSE, however, is lowest at min_count=1.

In [183]:
# and now the number of workers
workers = [1, 3, 5, 8, 10, 20, 50]
for w in workers:
  res = return_result(input_text, vector_size=5, window=2, min_count=1, workers=w)
  print(f'Workers={w}', res)

Workers=1 MAE = 0.5002, MSE = 0.4743, RMSE = 0.6887
Workers=3 MAE = 0.5001, MSE = 0.4754, RMSE = 0.6895
Workers=5 MAE = 0.4996, MSE = 0.4742, RMSE = 0.6886
Workers=8 MAE = 0.4999, MSE = 0.4745, RMSE = 0.6889
Workers=10 MAE = 0.4985, MSE = 0.4736, RMSE = 0.6882
Workers=20 MAE = 0.4986, MSE = 0.4749, RMSE = 0.6891
Workers=50 MAE = 0.4991, MSE = 0.4738, RMSE = 0.6883


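The lowest MAE is at workers=10, but `workers` only sets the number of training threads in Doc2Vec, so it should not change model quality by itself. The spread here is run-to-run noise: the same settings (window=2, min_count=1, workers=4) gave MAE = 0.4997 in one run above and 0.4983 in another.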
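
Differences in the third or fourth decimal place are hard to interpret from a single run. One way to gauge the noise is to repeat the evaluation several times and look at the spread. In this notebook the split is fixed and the variation comes from Doc2Vec training; the sketch below illustrates the same idea by varying the train/test split instead, on synthetic data, so that it needs only scikit-learn.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic regression data (not the IMDB features)
rng = np.random.default_rng(0)
X_syn = rng.normal(size=(1000, 5))
y_syn = X_syn @ rng.normal(size=5) + rng.normal(scale=0.7, size=1000)

maes = []
for seed in range(10):
    # A different random split on every iteration
    X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, random_state=seed)
    preds = Ridge().fit(X_tr, y_tr).predict(X_te)
    maes.append(mean_absolute_error(y_te, preds))

print(f"MAE mean = {np.mean(maes):.4f}, std = {np.std(maes):.4f}")
```

If the gap between two settings is smaller than this standard deviation, it is safer not to read it as an improvement.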

### Different values of alpha for Ridge

In [191]:
X_train, X_test, y_train, y_test = train_test_split(sc.fit_transform(X), y, random_state=42)

In [192]:
def do_ridge_regression(n):
  regressor = Ridge(alpha=n)
  regressor.fit(X_train, y_train)
  y_preds1 = regressor.predict(X_test)
  return metrics(y_test, y_preds1)


In [197]:
alphas = [0.001, 0.01, 0.1, 0.5, 1, 2, 5, 10, 20, 50, 100, 1000, 2000, 5000]
for a in alphas:
  print(f'Alpha={a}', do_ridge_regression(a))

Alpha=0.001 MAE = 0.4994, MSE = 0.4731, RMSE = 0.6878
Alpha=0.01 MAE = 0.4994, MSE = 0.4731, RMSE = 0.6878
Alpha=0.1 MAE = 0.4994, MSE = 0.4731, RMSE = 0.6878
Alpha=0.5 MAE = 0.4994, MSE = 0.4731, RMSE = 0.6878
Alpha=1 MAE = 0.4993, MSE = 0.4732, RMSE = 0.6879
Alpha=2 MAE = 0.4993, MSE = 0.4732, RMSE = 0.6879
Alpha=5 MAE = 0.4991, MSE = 0.4734, RMSE = 0.6880
Alpha=10 MAE = 0.4989, MSE = 0.4737, RMSE = 0.6883
Alpha=20 MAE = 0.4986, MSE = 0.4745, RMSE = 0.6888
Alpha=50 MAE = 0.4983, MSE = 0.4771, RMSE = 0.6907
Alpha=100 MAE = 0.4994, MSE = 0.4823, RMSE = 0.6945
Alpha=1000 MAE = 0.5661, MSE = 0.5966, RMSE = 0.7724
Alpha=2000 MAE = 0.6218, MSE = 0.6825, RMSE = 0.8262
Alpha=5000 MAE = 0.6906, MSE = 0.8027, RMSE = 0.8960


For alpha between 0.001 and 100 there is no significant difference in the results, although the lowest MAE is at alpha=50.
For alpha>=1000, however, the mean error increases considerably.
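
A plausible reading of this pattern is that the ridge penalty only starts to matter once alpha becomes comparable with the number of training rows, and from then on it pulls the coefficients toward zero. The sketch below tracks the size of the coefficient vector for several alpha values; the data is synthetic and the true coefficients are arbitrary assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (not the IMDB features): y depends linearly on three standardized features
rng = np.random.default_rng(0)
X_syn = rng.normal(size=(300, 3))
y_syn = X_syn @ np.array([0.5, -0.3, 0.2]) + rng.normal(scale=0.5, size=300)

norms = {}
for alpha in [0.01, 1, 100, 1000, 5000]:
    model = Ridge(alpha=alpha).fit(X_syn, y_syn)
    norms[alpha] = float(np.linalg.norm(model.coef_))   # overall size of the coefficients
    print(f"alpha={alpha}: ||coef|| = {norms[alpha]:.4f}")
```

The norm barely moves for small alpha and shrinks steadily for large alpha, which matches an error that stays flat at first and then grows as the model underfits.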

### Linear Regression

In [198]:
regressor = LinearRegression()
regressor.fit(X_train, y_train)
y_preds = regressor.predict(X_test)

In [199]:
metrics(y_test, y_preds)

'MAE = 0.4994, MSE = 0.4731, RMSE = 0.6878'

The error is comparable to that of Ridge.

### Lasso

In [201]:
def do_lasso_regression(n):
  regressor = Lasso(alpha=n)
  regressor.fit(X_train, y_train)
  y_preds = regressor.predict(X_test)
  return metrics(y_test, y_preds)

In [202]:
alphas = [0.001, 0.01, 0.1, 0.5, 1, 2, 5, 10, 20, 50, 100, 1000, 2000, 5000]
for a in alphas:
  print(f'Alpha={a}', do_lasso_regression(a))

Alpha=0.001 MAE = 0.4986, MSE = 0.4726, RMSE = 0.6875
Alpha=0.01 MAE = 0.4941, MSE = 0.4713, RMSE = 0.6865
Alpha=0.1 MAE = 0.4971, MSE = 0.4956, RMSE = 0.7040
Alpha=0.5 MAE = 0.7475, MSE = 0.9174, RMSE = 0.9578
Alpha=1 MAE = 0.7736, MSE = 0.9734, RMSE = 0.9866
Alpha=2 MAE = 0.7736, MSE = 0.9734, RMSE = 0.9866
Alpha=5 MAE = 0.7736, MSE = 0.9734, RMSE = 0.9866
Alpha=10 MAE = 0.7736, MSE = 0.9734, RMSE = 0.9866
Alpha=20 MAE = 0.7736, MSE = 0.9734, RMSE = 0.9866
Alpha=50 MAE = 0.7736, MSE = 0.9734, RMSE = 0.9866
Alpha=100 MAE = 0.7736, MSE = 0.9734, RMSE = 0.9866
Alpha=1000 MAE = 0.7736, MSE = 0.9734, RMSE = 0.9866
Alpha=2000 MAE = 0.7736, MSE = 0.9734, RMSE = 0.9866
Alpha=5000 MAE = 0.7736, MSE = 0.9734, RMSE = 0.9866


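Only very small alpha values (<=0.1) work for Lasso. With larger alpha the mean error rises sharply and ends up considerably higher than with ridge regression. From alpha=1 upward the metrics stop changing at all, which is consistent with Lasso having shrunk every coefficient to zero, leaving a constant prediction.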
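
The identical metrics from alpha=1 upward in the output above can be reproduced on synthetic data. Once alpha exceeds a threshold, every Lasso coefficient is exactly zero and the model predicts the mean of the training target. The data and the name `alpha_max` below are illustrative assumptions; the threshold formula applies to the objective scikit-learn's `Lasso` minimizes with an intercept.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data (not the IMDB features); the 6.5 offset mimics a typical rating level
rng = np.random.default_rng(0)
X_syn = rng.normal(size=(300, 3))
y_syn = 6.5 + X_syn @ np.array([0.5, -0.3, 0.2]) + rng.normal(scale=0.5, size=300)

# Smallest alpha at which all Lasso coefficients are zero (computed on centred data)
X_c = X_syn - X_syn.mean(axis=0)
y_c = y_syn - y_syn.mean()
alpha_max = float(np.abs(X_c.T @ y_c).max() / len(y_syn))

small = Lasso(alpha=0.01).fit(X_syn, y_syn)
large = Lasso(alpha=2 * alpha_max).fit(X_syn, y_syn)

print("alpha_max:", round(alpha_max, 4))
print("coef at alpha=0.01:", small.coef_)
print("coef at 2*alpha_max:", large.coef_)
print("prediction:", large.predict(X_syn[:3]), "mean of y:", y_syn.mean())
```

With standardized features and a target whose spread is below one rating point, that threshold is well under 1, so every alpha from 1 to 5000 yields the same constant model.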