<a href="https://colab.research.google.com/github/jproctor-rebecca/DS-Unit-2-Linear-Models/blob/master/module3-ridge-regression/LS_DS_213_assignment_RJProctor.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Lambda School Data Science

*Unit 2, Sprint 1, Module 3*

---

# Ridge Regression

## Assignment

We're going back to our other **New York City** real estate dataset. Instead of predicting apartment rents, you'll predict property sales prices.

But not just for condos in Tribeca...

- [ ] Use a subset of the data where `BUILDING_CLASS_CATEGORY` == `'01 ONE FAMILY DWELLINGS'` and the sale price was more than 100 thousand and less than 2 million.
- [ ] Do train/test split. Use data from January — March 2019 to train. Use data from April 2019 to test.
- [ ] Do one-hot encoding of categorical features.
- [ ] Do feature selection with `SelectKBest`.
- [ ] Fit a ridge regression model with multiple features. Use the `normalize=True` parameter (or do [feature scaling](https://scikit-learn.org/stable/modules/preprocessing.html) beforehand — use the scaler's `fit_transform` method with the train set, and the scaler's `transform` method with the test set)
- [ ] Get mean absolute error for the test set.
- [ ] As always, commit your notebook to your fork of the GitHub repo.

The [NYC Department of Finance](https://www1.nyc.gov/site/finance/taxes/property-rolling-sales-data.page) has a glossary of property sales terms and NYC Building Class Code Descriptions. The data comes from the [NYC OpenData](https://data.cityofnewyork.us/browse?q=NYC%20calendar%20sales) portal.


## Stretch Goals

Don't worry, you aren't expected to do all these stretch goals! These are just ideas to consider and choose from.

- [ ] Add your own stretch goal(s) !
- [ ] Instead of `Ridge`, try `LinearRegression`. Depending on how many features you select, your errors will probably blow up! 💥
- [ ] Instead of `Ridge`, try [`RidgeCV`](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.RidgeCV.html).
- [ ] Learn more about feature selection:
    - ["Permutation importance"](https://www.kaggle.com/dansbecker/permutation-importance)
    - [scikit-learn's User Guide for Feature Selection](https://scikit-learn.org/stable/modules/feature_selection.html)
    - [mlxtend](http://rasbt.github.io/mlxtend/) library
    - scikit-learn-contrib libraries: [boruta_py](https://github.com/scikit-learn-contrib/boruta_py) & [stability-selection](https://github.com/scikit-learn-contrib/stability-selection)
    - [_Feature Engineering and Selection_](http://www.feat.engineering/) by Kuhn & Johnson.
- [ ] Try [statsmodels](https://www.statsmodels.org/stable/index.html) if you’re interested in a more inferential statistical approach to linear regression and feature selection, looking at p-values and 95% confidence intervals for the coefficients.
- [ ] Read [_An Introduction to Statistical Learning_](http://faculty.marshall.usc.edu/gareth-james/ISL/ISLR%20Seventh%20Printing.pdf), Chapters 1-3, for more math & theory, but in an accessible, readable way.
- [ ] Try [scikit-learn pipelines](https://scikit-learn.org/stable/modules/compose.html).

## Import Data & Data Wrangling

In [1]:
%%capture
import sys

# If you're on Colab:
if 'google.colab' in sys.modules:
    DATA_PATH = 'https://raw.githubusercontent.com/LambdaSchool/DS-Unit-2-Applied-Modeling/master/data/'
    !pip install category_encoders==2.*

# If you're working locally:
else:
    DATA_PATH = '../data/'
    
# Ignore this Numpy warning when using Plotly Express:
# FutureWarning: Method .ptp is deprecated and will be removed in a future version. Use numpy.ptp instead.
import warnings
warnings.filterwarnings(action='ignore', category=FutureWarning, module='numpy')


In [2]:
import pandas as pd
import pandas_profiling

# Read New York City property sales data
df = pd.read_csv(DATA_PATH+'condos/NYC_Citywide_Rolling_Calendar_Sales.csv', 
                 parse_dates=['SALE DATE'],
                 index_col='SALE DATE')

# Change column names: replace spaces with underscores
df.columns = [col.replace(' ', '_') for col in df]

# SALE_PRICE was read as strings.
# Remove symbols, convert to integer
df['SALE_PRICE'] = (
    df['SALE_PRICE']
    .str.replace('$', '', regex=False)   # literal match; '$' is a regex anchor
    .str.replace('-', '', regex=False)
    .str.replace(',', '', regex=False)
    .astype(int)
)


In [3]:
# BOROUGH is a numeric column, but arguably should be a categorical feature,
# so convert it from a number to a string
df['BOROUGH'] = df['BOROUGH'].astype(str)


In [4]:
# Reduce cardinality for NEIGHBORHOOD feature

# Get a list of the top 10 neighborhoods
top10 = df['NEIGHBORHOOD'].value_counts()[:10].index

# At locations where the neighborhood is NOT in the top 10, 
# replace the neighborhood with 'OTHER'
df.loc[~df['NEIGHBORHOOD'].isin(top10), 'NEIGHBORHOOD'] = 'OTHER'


In [5]:
df.head()

SALE DATE,BOROUGH,NEIGHBORHOOD,BUILDING_CLASS_CATEGORY,TAX_CLASS_AT_PRESENT,BLOCK,LOT,EASE-MENT,BUILDING_CLASS_AT_PRESENT,ADDRESS,APARTMENT_NUMBER,ZIP_CODE,RESIDENTIAL_UNITS,COMMERCIAL_UNITS,TOTAL_UNITS,LAND_SQUARE_FEET,GROSS_SQUARE_FEET,YEAR_BUILT,TAX_CLASS_AT_TIME_OF_SALE,BUILDING_CLASS_AT_TIME_OF_SALE,SALE_PRICE
2019-01-01,1,OTHER,13 CONDOS - ELEVATOR APARTMENTS,2,716,1246,,R4,"447 WEST 18TH STREET, PH12A",PH12A,10011.0,1.0,0.0,1.0,10733,1979.0,2007.0,2,R4,0
2019-01-01,1,OTHER,21 OFFICE BUILDINGS,4,812,68,,O5,144 WEST 37TH STREET,,10018.0,0.0,6.0,6.0,2962,15435.0,1920.0,4,O5,0
2019-01-01,1,OTHER,21 OFFICE BUILDINGS,4,839,69,,O5,40 WEST 38TH STREET,,10018.0,0.0,7.0,7.0,2074,11332.0,1930.0,4,O5,0
2019-01-01,1,OTHER,13 CONDOS - ELEVATOR APARTMENTS,2,592,1041,,R4,"1 SHERIDAN SQUARE, 8C",8C,10014.0,1.0,0.0,1.0,0,500.0,0.0,2,R4,0
2019-01-01,1,UPPER EAST SIDE (59-79),15 CONDOS - 2-10 UNIT RESIDENTIAL,2C,1379,1402,,R1,"20 EAST 65TH STREET, B",B,10065.0,1.0,0.0,1.0,0,6406.0,0.0,2,R1,0


In [6]:
df['SALE_PRICE'].head(200)


SALE DATE
2019-01-01          0
2019-01-01          0
2019-01-01          0
2019-01-01          0
2019-01-01          0
               ...   
2019-01-03    1600000
2019-01-03    5243947
2019-01-03     150000
2019-01-03     849000
2019-01-03    1200000
Name: SALE_PRICE, Length: 200, dtype: int64

## EDA

In [7]:
df.shape

(23040, 20)

In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 23040 entries, 2019-01-01 to 2019-04-30
Data columns (total 20 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   BOROUGH                         23040 non-null  object 
 1   NEIGHBORHOOD                    23040 non-null  object 
 2   BUILDING_CLASS_CATEGORY         23040 non-null  object 
 3   TAX_CLASS_AT_PRESENT            23039 non-null  object 
 4   BLOCK                           23040 non-null  int64  
 5   LOT                             23040 non-null  int64  
 6   EASE-MENT                       0 non-null      float64
 7   BUILDING_CLASS_AT_PRESENT       23039 non-null  object 
 8   ADDRESS                         23040 non-null  object 
 9   APARTMENT_NUMBER                5201 non-null   object 
 10  ZIP_CODE                        23039 non-null  float64
 11  RESIDENTIAL_UNITS               23039 non-null  float64
 12  COMMERCIAL_UNIT

In [9]:
# plot the relative frequency of each building class
import matplotlib
import matplotlib.axes
import matplotlib.pyplot as plt

import pandas as pd
import numpy as np

df['BUILDING_CLASS_AT_TIME_OF_SALE'].value_counts(normalize=True).plot(kind='barh')
#plt.show()


<matplotlib.axes._subplots.AxesSubplot at 0x7fcc0a1539b0>

In [10]:
# plot the distribution of the dependent variable/target
df['SALE_PRICE'].hist()


<matplotlib.axes._subplots.AxesSubplot at 0x7fcc0a1539b0>

#### Subset Data (initial subset of data)

1. Keep sales where the price was more than 100 thousand and less than 2 million.
2. Keep rows where `BUILDING_CLASS_CATEGORY` == `'01 ONE FAMILY DWELLINGS'` (single-family homes).

Both conditions are combined into a single boolean mask.



In [11]:
mask = (df['SALE_PRICE']>100000) & (df['SALE_PRICE']<2000000) & (df['BUILDING_CLASS_CATEGORY']=='01 ONE FAMILY DWELLINGS')
df = df[mask]
df.head()

SALE DATE,BOROUGH,NEIGHBORHOOD,BUILDING_CLASS_CATEGORY,TAX_CLASS_AT_PRESENT,BLOCK,LOT,EASE-MENT,BUILDING_CLASS_AT_PRESENT,ADDRESS,APARTMENT_NUMBER,ZIP_CODE,RESIDENTIAL_UNITS,COMMERCIAL_UNITS,TOTAL_UNITS,LAND_SQUARE_FEET,GROSS_SQUARE_FEET,YEAR_BUILT,TAX_CLASS_AT_TIME_OF_SALE,BUILDING_CLASS_AT_TIME_OF_SALE,SALE_PRICE
2019-01-01,3,OTHER,01 ONE FAMILY DWELLINGS,1,5495,801,,A9,4832 BAY PARKWAY,,11230.0,1.0,0.0,1.0,6800,1325.0,1930.0,1,A9,550000
2019-01-01,4,OTHER,01 ONE FAMILY DWELLINGS,1,7918,72,,A1,80-23 232ND STREET,,11427.0,1.0,0.0,1.0,4000,2001.0,1940.0,1,A1,200000
2019-01-02,2,OTHER,01 ONE FAMILY DWELLINGS,1,4210,19,,A1,1260 RHINELANDER AVE,,10461.0,1.0,0.0,1.0,3500,2043.0,1925.0,1,A1,810000
2019-01-02,3,OTHER,01 ONE FAMILY DWELLINGS,1,5212,69,,A1,469 E 25TH ST,,11226.0,1.0,0.0,1.0,4000,2680.0,1899.0,1,A1,125000
2019-01-02,3,OTHER,01 ONE FAMILY DWELLINGS,1,7930,121,,A5,5521 WHITTY LANE,,11203.0,1.0,0.0,1.0,1710,1872.0,1940.0,1,A5,620000


In [12]:
# Use data with sale prices from 100K to 2M (for train and test)
#min_cutoff = 100000
#max_cutoff =2000000
#mask = min_cutoff < X.SALE_PRICE < max_cutoff

# split data into training and validation sets using mask (cuttoff sale price)
#X, y = X.loc[mask], y.loc[mask]
#X, y = X.loc[~mask], y.loc[~mask]


#### Train/validation split
1. Use data from January — March 2019 to train. 
2. Use data from April 2019 to test.

In [13]:
# subset data into target array
target = 'SALE_PRICE'
y = df[target]

# subset data into feature matrix: drop the target, the empty EASE-MENT column,
# and the NEIGHBORHOOD, ADDRESS, and APARTMENT_NUMBER columns
X = df.drop([target] + ['NEIGHBORHOOD', 'EASE-MENT',
                        'ADDRESS', 'APARTMENT_NUMBER'], axis=1)


In [14]:
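# Use data from January through March 2019 to train
# Use data from April 2019 to test
# NOTE: the strict `<` against '2019-03-31' places any sales dated March 31
# in the validation set; a cutoff of '2019-04-01' would keep them in training
cutoff = '2019-03-31'
mask = X.index < cutoff

# split data into training and validation sets using mask (cutoff date)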
X_train, y_train = X.loc[mask], y.loc[mask]
X_val, y_val = X.loc[~mask], y.loc[~mask]
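
The boundary behaviour of a date mask like this one can be checked on made-up data. In the sketch below, the frame `toy` and its prices are hypothetical; it only illustrates that a strict `<` against March 31 leaves that day out of training, while comparing against April 1 keeps all of March.

```python
import pandas as pd

# Hypothetical sales indexed by date (values are made up for illustration)
toy = pd.DataFrame(
    {'SALE_PRICE': [500000, 600000, 700000, 800000]},
    index=pd.to_datetime(['2019-03-30', '2019-03-31', '2019-04-01', '2019-04-02']),
)

# Strict comparison against March 31 excludes that day from training
train_a = toy[toy.index < '2019-03-31']

# Comparing against April 1 keeps every March sale in training
train_b = toy[toy.index < '2019-04-01']
test_b = toy[toy.index >= '2019-04-01']

print(len(train_a), len(train_b), len(test_b))
```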


In [15]:
# sanity check: the two splits together account for every row
assert X_train.shape[0] + X_val.shape[0] == X.shape[0]


In [16]:
y_train.head()

SALE DATE
2019-01-01    550000
2019-01-01    200000
2019-01-02    810000
2019-01-02    125000
2019-01-02    620000
Name: SALE_PRICE, dtype: int64

#### Establish Training Data Baseline


In [17]:
# plot the distribution of the training target
y_train.hist()


<matplotlib.axes._subplots.AxesSubplot at 0x7fcc0a1539b0>

In [18]:
# mean of raw values of training dataset
y_train.mean()


621573.7423214999

In [19]:
# baseline MAE on the training set: predict the mean price for every sale
from sklearn.metrics import mean_absolute_error

print('Baseline MAE: ', mean_absolute_error(y_train, [y_train.mean()]*len(y_train)))


Baseline MAE:  214721.52773001452


In [20]:
X.head()

SALE DATE,BOROUGH,BUILDING_CLASS_CATEGORY,TAX_CLASS_AT_PRESENT,BLOCK,LOT,BUILDING_CLASS_AT_PRESENT,ZIP_CODE,RESIDENTIAL_UNITS,COMMERCIAL_UNITS,TOTAL_UNITS,LAND_SQUARE_FEET,GROSS_SQUARE_FEET,YEAR_BUILT,TAX_CLASS_AT_TIME_OF_SALE,BUILDING_CLASS_AT_TIME_OF_SALE
2019-01-01,3,01 ONE FAMILY DWELLINGS,1,5495,801,A9,11230.0,1.0,0.0,1.0,6800,1325.0,1930.0,1,A9
2019-01-01,4,01 ONE FAMILY DWELLINGS,1,7918,72,A1,11427.0,1.0,0.0,1.0,4000,2001.0,1940.0,1,A1
2019-01-02,2,01 ONE FAMILY DWELLINGS,1,4210,19,A1,10461.0,1.0,0.0,1.0,3500,2043.0,1925.0,1,A1
2019-01-02,3,01 ONE FAMILY DWELLINGS,1,5212,69,A1,11226.0,1.0,0.0,1.0,4000,2680.0,1899.0,1,A1
2019-01-02,3,01 ONE FAMILY DWELLINGS,1,7930,121,A5,11203.0,1.0,0.0,1.0,1710,1872.0,1940.0,1,A5


In [21]:
# identify non-numeric columns (categoricals and numbers stored as strings) before encoding
categoricals = X_train.select_dtypes(exclude='number').columns.tolist()
categoricals

['BOROUGH',
 'BUILDING_CLASS_CATEGORY',
 'TAX_CLASS_AT_PRESENT',
 'BUILDING_CLASS_AT_PRESENT',
 'LAND_SQUARE_FEET',
 'BUILDING_CLASS_AT_TIME_OF_SALE']
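
`LAND_SQUARE_FEET` shows up in this list because pandas read it as strings, so one-hot encoding treats every distinct lot size as its own category. A minimal sketch of converting such a column to numbers first, assuming the strings hold digits, possibly with thousands separators (the sample values below are made up):

```python
import pandas as pd

# Hypothetical sample of a numeric column that was read as strings
toy = pd.DataFrame({'LAND_SQUARE_FEET': ['6,800', '4,000', '1,710', '0']})

# Strip thousands separators, then convert; errors='coerce' turns anything
# unparseable into NaN instead of raising
toy['LAND_SQUARE_FEET'] = pd.to_numeric(
    toy['LAND_SQUARE_FEET'].str.replace(',', '', regex=False),
    errors='coerce',
)

print(toy['LAND_SQUARE_FEET'].tolist())
```

After a conversion like this the column would no longer be picked up by `select_dtypes(exclude='number')`.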

#### One-hot encoding of categorical features

In [22]:
from sklearn.linear_model import LinearRegression
from category_encoders import OneHotEncoder



In [23]:
# Instantiate Transformer
# creates a new column for each distinct value in every column listed in
# `categoricals`
transformer_1 = OneHotEncoder(use_cat_names=True, 
                              cols=categoricals)

# Fit transformer to the data
transformer_1.fit(X_train)

# Transform our training data
XT_train = transformer_1.transform(X_train)


#### Feature selection with SelectKBest

In [24]:
from sklearn.feature_selection import SelectKBest, f_regression

In [25]:
#XT_train['']

In [26]:
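# Instantiate the Transformer
# NOTE: no score_func is passed, so SelectKBest uses its default, f_classif,
# which treats each distinct sale price as a class; the imported f_regression
# is the scoring function meant for a continuous target
transformer_2 = SelectKBest(k=19)     # k is a hyperparameter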

# Fit transformer to the training data
transformer_2.fit(XT_train, y_train)

# Transform the training data
XTT_train = transformer_2.transform(XT_train)


Features [... 913 916 927] are constant.
  f = msb / msw
  f = msb / msw
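
`SelectKBest` falls back to `f_classif` when no `score_func` is given, and that scorer is meant for categorical targets. For a continuous target such as sale price, `f_regression` is the usual choice. A minimal sketch on synthetic data (the frame `X_toy`, its column names, and the coefficients are made up for illustration):

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_regression

# Synthetic data: only 'a' and 'b' drive the target
rng = np.random.default_rng(0)
X_toy = pd.DataFrame(rng.normal(size=(200, 5)), columns=list('abcde'))
y_toy = 3 * X_toy['a'] - 2 * X_toy['b'] + rng.normal(scale=0.1, size=200)

# f_regression scores each feature by its univariate linear relationship
# with a continuous target
selector = SelectKBest(score_func=f_regression, k=2)
X_selected = selector.fit_transform(X_toy, y_toy)

# get_support() returns a boolean mask over the input columns
selected_names = X_toy.columns[selector.get_support()].tolist()
print(selected_names)
```

Reading the names back through `get_support()` also makes it easier to see which encoded columns survived the selection step.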


#### Stretch Goal 2 - LinearRegression
Depending on how many features you select, your errors will probably blow up! 💥

##### Training Metric

In [27]:
# Instantiate Model
predictor = LinearRegression()

# Fit model to transformed training data
predictor.fit(XTT_train, y_train)


LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

##### Validation Metric

In [28]:
# Make predictions on the transformed training data
# (the MAE below is therefore a training metric)
y_pred_LR = predictor.predict(XTT_train)


# Calculate MAE
print('Training MAE:', mean_absolute_error(y_train, y_pred_LR))

Training MAE: 210817.62292398006


#### Stretch Goal 1 - Define your own Stretch Goal
LinearRegression with Polynomials


###### Training Metric

In [None]:
# fit a linear regression with polynomial features
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

# instantiate model
# degree=2 keeps the expansion small (210 columns from 19 features);
# a degree of 15 would generate about 1.9 billion columns and break the runtime
poly_model = make_pipeline(PolynomialFeatures(degree=2),
                           LinearRegression())

# Fit model to transformed training data
poly_model.fit(XTT_train, y_train)
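
A polynomial expansion of n features up to degree d produces C(n + d, d) columns when the bias term is included, which is what makes a high degree infeasible with 19 selected features. The helper `n_polynomial_features` below is a hypothetical name used only for this illustration:

```python
from math import comb

def n_polynomial_features(n_features, degree):
    """Number of terms of total degree <= `degree`, including the bias term."""
    return comb(n_features + degree, degree)

# 19 input features at degree 2 and at degree 15
print(n_polynomial_features(19, 2))
print(n_polynomial_features(19, 15))
```

Checking the count before fitting is a cheap way to decide whether a given degree is worth attempting.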

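###### Validation Metric

In [30]:
# Make predictions on the transformed training data
# NOTE: this calls `predictor` (the plain LinearRegression), not `poly_model`,
# so the MAE below repeats the linear regression result
y_pred_LRpoly = predictor.predict(XTT_train)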


# Calculate MAE
print('Training MAE:', mean_absolute_error(y_train, y_pred_LRpoly))


Training MAE: 210817.62292398006


#### Fit a ridge regression model with multiple features. 

1. Use the normalize=True parameter 

or

1.  Use feature scaling beforehand — scaler's fit_transform method with the train set, and the scaler's transform method with the test set

###### Training Metric

In [31]:
# fit a ridge regression on the selected features
from sklearn.linear_model import Ridge
# typically you would run multiple ridge regressions with different alphas

# Instantiate
# NOTE: the features are not scaled here (normalize is left at False)
ridge_model = Ridge(alpha=0.2) # Alpha is a HYPERPARAMETER
      # alpha sets the strength of the L2 penalty on the coefficients:
      # a larger alpha increases bias and reduces variance

# Fit to training data
ridge_model.fit(XTT_train, y_train)


Ridge(alpha=0.2, copy_X=True, fit_intercept=True, max_iter=None,
      normalize=False, random_state=None, solver='auto', tol=0.001)

###### Validation Metrics

In [32]:
# Make predictions on the transformed training data
# (the MAE below is therefore a training metric)
y_pred_ridge = ridge_model.predict(XTT_train)


# Calculate MAE
print('Training MAE:', mean_absolute_error(y_train, y_pred_ridge))

Training MAE: 211468.48421794592
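
The assignment asks for scaled features and for an error on held-out data. Recent scikit-learn releases no longer accept `normalize=True` on `Ridge`, so scaling in a pipeline is the more portable route. A minimal sketch on synthetic data, where `X_tr`, `X_va`, `y_tr`, `y_va`, and the coefficients are made-up stand-ins for the selected training and validation features:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features on very different scales
rng = np.random.default_rng(42)
scales = np.array([1, 10, 100, 1000])
X_tr = rng.normal(size=(300, 4)) * scales
X_va = rng.normal(size=(100, 4)) * scales
coefs = np.array([5.0, 0.5, 0.05, 0.005])
y_tr = X_tr @ coefs + rng.normal(size=300)
y_va = X_va @ coefs + rng.normal(size=100)

# The pipeline fits the scaler on the training data only, then reuses the
# learned means and standard deviations when predicting on validation data
model = make_pipeline(StandardScaler(), Ridge(alpha=0.2))
model.fit(X_tr, y_tr)

train_mae = mean_absolute_error(y_tr, model.predict(X_tr))
val_mae = mean_absolute_error(y_va, model.predict(X_va))
print(f'Training MAE: {train_mae:.2f}')
print(f'Validation MAE: {val_mae:.2f}')
```

In the notebook, the same idea would mean applying the fitted encoder and selector to `X_val` with `transform` before predicting, never `fit`.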


#### Stretch Goal 3 - RidgeCV 


###### Training Metrics

In [None]:
# fit a RidgeCV regression on the selected features
from sklearn.linear_model import RidgeCV

# instantiate model
# the constructor takes candidate alphas, not the data
# (these are scikit-learn's default candidates)
ridgeCV_model = RidgeCV(alphas=(0.1, 1.0, 10.0))

# fit model to transformed training data
ridgeCV_model.fit(XTT_train, y_train)

###### Validation Metrics

In [None]:
# Make predictions on the transformed training data
y_pred_ridgeCV = ridgeCV_model.predict(XTT_train)


# Calculate MAE and R^2 (the estimator's built-in score)
print('Training MAE:', mean_absolute_error(y_train, y_pred_ridgeCV))
print('Training R^2:', ridgeCV_model.score(XTT_train, y_train))
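
`RidgeCV` takes its candidate regularization strengths in the constructor and the data in `fit`. A minimal sketch on synthetic data; the candidate list `candidate_alphas` and the toy arrays are arbitrary choices for illustration, not tuned values:

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# Synthetic regression problem
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 3))
y_toy = X_toy @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=200)

# Candidate regularization strengths go to the constructor; data goes to fit()
candidate_alphas = [0.01, 0.1, 1.0, 10.0, 100.0]
ridge_cv = RidgeCV(alphas=candidate_alphas)
ridge_cv.fit(X_toy, y_toy)

# alpha_ holds the candidate chosen by cross-validation
print('Chosen alpha:', ridge_cv.alpha_)

# score() returns R^2 on the data passed in
r2_train = ridge_cv.score(X_toy, y_toy)
print('Training R^2:', r2_train)
```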


In [None]:
# plot predicted vs. actual sale price for each fitted model

import matplotlib.pyplot as plt

# create figure
fig, ax = plt.subplots(figsize=(10,5))

# predictions from each model, keyed by legend label
predictions = {
    'linear regression': y_pred_LR,
    'linear regression (polynomial)': y_pred_LRpoly,
    'ridge regression': y_pred_ridge,
    'ridge regression (CV)': y_pred_ridgeCV,
}

# with 19 features there is no single x-axis, so plot actual vs. predicted
for label, y_pred in predictions.items():
    ax.scatter(y_train, y_pred, s=5, alpha=0.3, label=label)

# reference line where predicted == actual
ax.plot([y_train.min(), y_train.max()], [y_train.min(), y_train.max()],
        color='black', linestyle='--', label='perfect prediction')

# define figure properties
ax.set_xlabel('Actual Sale Price')
ax.set_ylabel('Predicted Sale Price')
ax.legend()

# save and show figure
#plt.savefig('mod3_obj4_ridge.png', transparent=False, dpi=150)
plt.show()

#### Stretch Goal 4 Learn more
- ["Permutation importance"](https://www.kaggle.com/dansbecker/permutation-importance)
- [scikit-learn's User Guide for Feature Selection](https://scikit-learn.org/stable/modules/feature_selection.html)
- [mlxtend](http://rasbt.github.io/mlxtend/) library
- scikit-learn-contrib libraries: [boruta_py](https://github.com/scikit-learn-contrib/boruta_py) & [stability-selection](https://github.com/scikit-learn-contrib/stability-selection)
- [_Feature Engineering and Selection_](http://www.feat.engineering/) by Kuhn & Johnson.


The idea behind stability selection is to inject more noise into the original problem by generating bootstrap samples of the data, and to use a base feature selection algorithm (like the LASSO) to find out which features are important in every sampled version of the data. The results on each bootstrap sample are then aggregated to compute a stability score for each feature in the data. Features can then be selected by choosing an appropriate threshold for the stability scores.
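
The description above can be sketched in a few lines. This is a simplified illustration and not the stability-selection library's API: the LASSO penalty `alpha=0.5`, the 50 bootstrap samples, and the 0.8 threshold are arbitrary assumptions, and the data are synthetic.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n_samples, n_features = 200, 6
X_toy = rng.normal(size=(n_samples, n_features))
# Only the first two features influence the target in this made-up example
y_toy = 4 * X_toy[:, 0] - 3 * X_toy[:, 1] + rng.normal(size=n_samples)

n_bootstraps = 50
selected_counts = np.zeros(n_features)
for _ in range(n_bootstraps):
    # Draw a bootstrap sample (rows sampled with replacement)
    idx = rng.integers(0, n_samples, size=n_samples)
    lasso = Lasso(alpha=0.5).fit(X_toy[idx], y_toy[idx])
    # A feature counts as selected when LASSO keeps a nonzero coefficient
    selected_counts += (lasso.coef_ != 0)

# Stability score: fraction of bootstrap samples in which each feature was kept
stability_scores = selected_counts / n_bootstraps
stable_features = np.flatnonzero(stability_scores >= 0.8)
print(stability_scores)
print(stable_features)
```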

Kuhn & Johnson - highly technical and dry; cannot wade through the dense technical jargon



#### Stretch Goal 5 Inferential Statistics
https://www.statsmodels.org/stable/index.html




#### Stretch Goal 6 Intro to Statistical Learning

3.5 Multiple Linear Regression

The F-statistic can be used to test the null hypothesis that every
coefficient is zero, that is, that none of the predictors is related to the response. If the p-value
corresponding to the F-statistic is very low, there is clear evidence that at least one predictor is related to the response.


The RSE estimates the standard deviation of the response from the
population regression line. The R² statistic records
the percentage of variability in the response that is explained by
the predictors.
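
Both statistics follow directly from the residuals: RSE = sqrt(RSS / (n - p - 1)) and R² = 1 - RSS / TSS, where n is the number of observations and p the number of predictors. A minimal sketch on synthetic data (the arrays and coefficients are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 2                       # observations, predictors
X_toy = rng.normal(size=(n, p))
y_toy = 1.0 + X_toy @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=n)

# Least-squares fit with an intercept column
A = np.column_stack([np.ones(n), X_toy])
beta, *_ = np.linalg.lstsq(A, y_toy, rcond=None)
residuals = y_toy - A @ beta

rss = np.sum(residuals ** 2)                  # residual sum of squares
tss = np.sum((y_toy - y_toy.mean()) ** 2)     # total sum of squares

rse = np.sqrt(rss / (n - p - 1))   # estimate of the error's standard deviation
r2 = 1 - rss / tss                 # share of variance explained
print(rse, r2)
```

Here the noise was generated with standard deviation 0.5, so the RSE should land near that value.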

The accuracy associated
with a prediction depends on whether we wish to predict an
individual response, Y = f(X) + ε (prediction intervals), or the average response, f(X) (confidence intervals). Prediction intervals will always
be wider than confidence intervals because they account for the uncertainty
associated with ε, the irreducible error.

Residual plots can be used to
identify non-linearity. If the relationships are linear, then the residual
plots should display no pattern. Transformations of the predictors
can be included in the linear regression
model to accommodate non-linear relationships.
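
A small numeric illustration of that point, on synthetic data with a quadratic relationship. The helper `fit_residuals` is a hypothetical name, and the correlation between the residuals and x² is used here only as a crude stand-in for the pattern one would look for in a residual plot:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-3, 3, size=200)
y = 1.0 + 2.0 * x + 1.5 * x ** 2 + rng.normal(scale=0.5, size=200)

def fit_residuals(design, target):
    """Least-squares fit; returns the residuals."""
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ beta

ones = np.ones_like(x)
resid_linear = fit_residuals(np.column_stack([ones, x]), y)
resid_quadratic = fit_residuals(np.column_stack([ones, x, x ** 2]), y)

# Residuals of the linear fit still track x^2; adding the transformed
# predictor removes that leftover structure
corr_linear = np.corrcoef(resid_linear, x ** 2)[0, 1]
corr_quadratic = np.corrcoef(resid_quadratic, x ** 2)[0, 1]
print(corr_linear, corr_quadratic)
```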

The standard linear regression model assumes an additive relationship
between the predictors and the response.

In practice, the true relationship between X and Y is rarely exactly linear.
