# Assignment2+3-R

# Assignments 2+3 - Simple and Multiple Linear Regression (I+II)

# Overview of the steps

- Assignment 2:
    1. Load the data and get an overview of the data
    2. Perform simple linear regressions
    3. Use the simple linear regression models
    4. Perform multiple linear regressions
    5. Use the multiple linear regression model
- Assignment 3:
    6. Add interaction terms
    7. Apply non-linear transformations to some predictors
    8. Use qualitative predictors

# 1.2 Steps of Assignment 2 in detail

## 1.2.1 Load the data and get an overview of the data

Load the data file `Boston.csv`.

In [30]:

import pandas as pd
from IPython.display import display, Markdown
import numpy as np
import matplotlib.pyplot as plt
%matplotlib notebook
import statsmodels.api as sm
from statsmodels.formula.api import ols
import scipy.stats  # import the submodule explicitly; scipy.stats.pearsonr is used below
import seaborn as sns
default_figsize=(8, 6)


boston_df = pd.read_csv('../ISLR/data/Boston.csv', index_col=[0])

Display the number of columns (the predictors plus the response `medv`) and their names:

In [31]:
print(len(boston_df.columns))
print(boston_df.columns)

14
Index(['crim', 'zn', 'indus', 'chas', 'nox', 'rm', 'age', 'dis', 'rad', 'tax',
       'ptratio', 'black', 'lstat', 'medv'],
      dtype='object')


Print a statistical summary of the predictors and the response `medv`:

In [32]:
boston_df.describe(include='all')

| | crim | zn | indus | chas | nox | rm | age | dis | rad | tax | ptratio | black | lstat | medv |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 |
| mean | 3.613524 | 11.363636 | 11.136779 | 0.06917 | 0.554695 | 6.284634 | 68.574901 | 3.795043 | 9.549407 | 408.237154 | 18.455534 | 356.674032 | 12.653063 | 22.532806 |
| std | 8.601545 | 23.322453 | 6.860353 | 0.253994 | 0.115878 | 0.702617 | 28.148861 | 2.10571 | 8.707259 | 168.537116 | 2.164946 | 91.294864 | 7.141062 | 9.197104 |
| min | 0.00632 | 0.0 | 0.46 | 0.0 | 0.385 | 3.561 | 2.9 | 1.1296 | 1.0 | 187.0 | 12.6 | 0.32 | 1.73 | 5.0 |
| 25% | 0.082045 | 0.0 | 5.19 | 0.0 | 0.449 | 5.8855 | 45.025 | 2.100175 | 4.0 | 279.0 | 17.4 | 375.3775 | 6.95 | 17.025 |
| 50% | 0.25651 | 0.0 | 9.69 | 0.0 | 0.538 | 6.2085 | 77.5 | 3.20745 | 5.0 | 330.0 | 19.05 | 391.44 | 11.36 | 21.2 |
| 75% | 3.677083 | 12.5 | 18.1 | 0.0 | 0.624 | 6.6235 | 94.075 | 5.188425 | 24.0 | 666.0 | 20.2 | 396.225 | 16.955 | 25.0 |
| max | 88.9762 | 100.0 | 27.74 | 1.0 | 0.871 | 8.78 | 100.0 | 12.1265 | 24.0 | 711.0 | 22.0 | 396.9 | 37.97 | 50.0 |


Display the number of data points:

In [33]:
len(boston_df)

506

Display the data in a table:

> Top 20 rows are shown.

In [34]:
n = 20
# info() prints its report directly and returns None, so display() also shows `None` below
display(boston_df.info(verbose=True))
display(boston_df.head(n))

<class 'pandas.core.frame.DataFrame'>
Int64Index: 506 entries, 1 to 506
Data columns (total 14 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   crim     506 non-null    float64
 1   zn       506 non-null    float64
 2   indus    506 non-null    float64
 3   chas     506 non-null    int64  
 4   nox      506 non-null    float64
 5   rm       506 non-null    float64
 6   age      506 non-null    float64
 7   dis      506 non-null    float64
 8   rad      506 non-null    int64  
 9   tax      506 non-null    int64  
 10  ptratio  506 non-null    float64
 11  black    506 non-null    float64
 12  lstat    506 non-null    float64
 13  medv     506 non-null    float64
dtypes: float64(11), int64(3)
memory usage: 59.3 KB


None

Plot some predictors (at least two) against the response values. We choose `lstat`, `rm`, and `age`.

In [35]:
default_alpha = .05
display(Markdown(f'### Significance level: {default_alpha}'))

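def fit_lr(x, y):
    """Fit a simple OLS model of y on x, including an intercept."""
    X = sm.add_constant(x)
    return sm.OLS(y, X).fit()

def plot(x, y, xlab, ylab, mod_fit=None, alpha=default_alpha):
    """Scatter y against x; with a fitted model, add the regression line and its confidence band."""
    fig, ax = plt.subplots(figsize=default_figsize)
    ax.plot(x, y, 'yo')
    if mod_fit is not None:
        X = sm.add_constant(x)
        regr = mod_fit.predict(X)
        ax.plot(x, regr, 'k')
        # Confidence band for the mean response at the given alpha
        prediction = mod_fit.get_prediction(X)
        frame = prediction.summary_frame(alpha=alpha)
        zipped = pd.concat([x, frame.mean_ci_lower, frame.mean_ci_upper], axis=1)
        # fill_between needs the x values in ascending order
        zipped.sort_values(x.name, inplace=True)
        ax.fill_between(zipped[x.name], zipped[frame.mean_ci_lower.name], zipped[frame.mean_ci_upper.name], color='k', alpha=.3)
    ax.set_xlabel(xlab)
    ax.set_ylabel(ylab)
    fig.show()

def format_pearsonr(values):
    # values is the (correlation coefficient, p-value) pair returned by scipy.stats.pearsonr
    return f'R = {values[0]}, p < {values[1]}'

def fit_lr_plot_full(x, y, xlab, ylab):
    """Fit the model, print the Pearson correlation, plot the fit, and return the fitted model."""
    mod_fit = fit_lr(x, y)
    print(format_pearsonr(scipy.stats.pearsonr(x, y)))
    plot(x, y, xlab, ylab, mod_fit)
    return mod_fit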

### Significance level: 0.05

In [36]:
lstat_mod_fit = fit_lr_plot_full(boston_df['lstat'], boston_df['medv'], 'percent of households with low socioeconomic status', 'median house value')

R = -0.737662726174015, p < 5.081103394386392e-88


In [37]:
rm_mod_fit = fit_lr_plot_full(boston_df['rm'], boston_df['medv'], 'average number of rooms per house', 'median house value')

R = 0.6953599470715394, p < 2.4872288710071593e-74


In [38]:
age_mod_fit = fit_lr_plot_full(boston_df['age'], boston_df['medv'], 'average age of houses', 'median house value')

R = -0.3769545650045963, p < 1.5699822091877261e-18


## 1.2.2 Perform simple linear regressions
Fit simple linear regression models with `medv` as the response, one for each of several (at least two) predictors.
We choose `lstat`, `rm`, and `age`.

In [39]:
def print_lr(mod_fit):
    print(mod_fit.summary())
    print('Residuals:', mod_fit.resid.describe())

In [40]:
print_lr(lstat_mod_fit)

                            OLS Regression Results                            
Dep. Variable:                   medv   R-squared:                       0.544
Model:                            OLS   Adj. R-squared:                  0.543
Method:                 Least Squares   F-statistic:                     601.6
Date:                Mon, 21 Feb 2022   Prob (F-statistic):           5.08e-88
Time:                        12:28:56   Log-Likelihood:                -1641.5
No. Observations:                 506   AIC:                             3287.
Df Residuals:                     504   BIC:                             3295.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         34.5538      0.563     61.415      0.0

In [41]:
print_lr(rm_mod_fit)

                            OLS Regression Results                            
Dep. Variable:                   medv   R-squared:                       0.484
Model:                            OLS   Adj. R-squared:                  0.483
Method:                 Least Squares   F-statistic:                     471.8
Date:                Mon, 21 Feb 2022   Prob (F-statistic):           2.49e-74
Time:                        12:28:56   Log-Likelihood:                -1673.1
No. Observations:                 506   AIC:                             3350.
Df Residuals:                     504   BIC:                             3359.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const        -34.6706      2.650    -13.084      0.0

In [42]:
print_lr(age_mod_fit)

                            OLS Regression Results                            
Dep. Variable:                   medv   R-squared:                       0.142
Model:                            OLS   Adj. R-squared:                  0.140
Method:                 Least Squares   F-statistic:                     83.48
Date:                Mon, 21 Feb 2022   Prob (F-statistic):           1.57e-18
Time:                        12:28:56   Log-Likelihood:                -1801.5
No. Observations:                 506   AIC:                             3607.
Df Residuals:                     504   BIC:                             3615.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         30.9787      0.999     31.006      0.0

## Interpret the results
### Regression results interpretation of `medv` - median house value
Since $R^{2}$ is calculated as $1 - \frac{RSS}{TSS}$, it shows, in simple words, which share of the variance of the response around its mean is explained by the model: 0 means the model is no better than predicting the mean, and 1 means a perfect fit.
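
The definition $1 - \frac{RSS}{TSS}$ can be checked by hand. The following is a minimal sketch on made-up toy values (not taken from `Boston.csv`); the helper name `r_squared` is introduced here only for illustration. It shows the two extreme cases: a perfect fit and a model that always predicts the mean.

```python
def r_squared(y, y_hat):
    """Return R^2 = 1 - RSS/TSS for observed values y and fitted values y_hat."""
    y_mean = sum(y) / len(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual sum of squares
    tss = sum((yi - y_mean) ** 2 for yi in y)              # total sum of squares
    return 1 - rss / tss

# Toy data, made up for illustration only
y_obs = [3.0, 5.0, 7.0, 9.0]
perfect_fit = [3.0, 5.0, 7.0, 9.0]   # fitted values equal to the observations
mean_only = [6.0, 6.0, 6.0, 6.0]     # fitted values equal to the mean of y_obs

r2_perfect = r_squared(y_obs, perfect_fit)
r2_mean_only = r_squared(y_obs, mean_only)
print(r2_perfect, r2_mean_only)
```
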
#### Against `lstat` - percent of households with low socioeconomic status
The slope is negative, so the predicted `medv` decreases as `lstat` increases, which is consistent with a negative correlation. The slope is about -0.95: each additional percentage point of `lstat` is associated with a decrease of about 0.95 units in the predicted `medv`. The chart confirms the downward trend.

The intercept is approximately 34.55. It is the predicted `medv` at `lstat` = 0, that is, the point where the regression line crosses the $y$ axis, which agrees with the chart.

The Pearson correlation coefficient is approximately -0.74, which indicates a fairly strong negative linear association between `lstat` and `medv`. With a coefficient of this size, most of the points are expected to lie inside a narrow ellipse that is elongated along the `x` axis and tilted downwards (negative slope). The chart mostly agrees, except for $x$ values between 0 and 10.

The `p-value` is very small, so the null hypothesis that there is no correlation between `lstat` and `medv` is rejected at the 0.05 significance level: the data give strong evidence of a non-zero correlation.

The $R^{2}$ is 0.544, so the model explains about 54% of the variance of `medv`. This is far from 0, which would mean that `lstat` explains nothing, but also well below 1, which would mean a perfect fit.

So, according to $R^{2}$, `lstat` explains a considerable share of the variation in `medv`, but it is hard to tell whether this value is good enough for this particular domain.
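
In a simple linear regression with one predictor, $R^{2}$ equals the square of the Pearson correlation coefficient, so the two metrics discussed here carry the same information. A quick check with the correlation values printed above for the three predictors (the dictionary name `pearson_r` is introduced here only for illustration):

```python
# Pearson correlation coefficients printed by fit_lr_plot_full above
pearson_r = {
    'lstat': -0.737662726174015,
    'rm': 0.6953599470715394,
    'age': -0.3769545650045963,
}

# Squaring each coefficient reproduces the R-squared of the corresponding model summary
r_squared_from_r = {name: round(r ** 2, 3) for name, r in pearson_r.items()}
print(r_squared_from_r)  # {'lstat': 0.544, 'rm': 0.484, 'age': 0.142}
```

The results match the `R-squared` values 0.544, 0.484, and 0.142 reported in the three OLS summaries.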

#### Against `rm` - average number of rooms per house
The slope is positive, so the predicted `medv` increases with `rm`, which is consistent with a positive correlation. The slope is about 9.1: one additional room is associated with an increase of about 9.1 units in the predicted `medv`. The chart confirms the upward trend.

The intercept is approximately -34.67. It is the predicted `medv` at `rm` = 0, which lies far outside the observed range of `rm` (3.561 to 8.78). The intercept is therefore an extrapolation without practical meaning, and it cannot be checked against the chart, although the value is plausible.
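
To see why a negative intercept is still compatible with the data, the fitted line can be evaluated at a few values of `rm`. This is a small sketch that uses the fitted parameters and the `describe()` statistics reported in this notebook; the function name `predict_medv` is hypothetical.

```python
# Fitted parameters reported for medv ~ rm
intercept, slope = -34.670621, 9.102109
# Summary statistics from the describe() output
rm_min, rm_mean = 3.561, 6.284634
medv_mean = 22.532806

def predict_medv(rm):
    """Evaluate the fitted regression line at a given number of rooms."""
    return intercept + slope * rm

at_zero = predict_medv(0.0)      # the intercept itself: an extrapolation to rm = 0
at_min = predict_medv(rm_min)    # prediction at the smallest observed rm
at_mean = predict_medv(rm_mean)  # an OLS line passes through (mean of x, mean of y)
print(at_zero, at_min, at_mean)
```

The prediction at the mean of `rm` reproduces the mean of `medv` (up to rounding of the reported parameters), while the line already drops below zero at the smallest observed `rm`, which hints that a straight line is a rough description at the low end of the range.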

The Pearson correlation coefficient is approximately 0.7, which indicates a fairly strong positive linear association between `rm` and `medv`. With a coefficient of this size, most of the points are expected to lie inside a narrow ellipse that is elongated along the `x` axis and tilted upwards (positive slope). The chart mostly agrees, except for $x$ values between 7.5 and 9.

The `p-value` is very small, so the null hypothesis that there is no correlation between `rm` and `medv` is rejected at the 0.05 significance level: the data give strong evidence of a non-zero correlation.

The $R^{2}$ is 0.484, so the model explains about 48% of the variance of `medv`. This is far from 0, which would mean that `rm` explains nothing, but also well below 1, which would mean a perfect fit.

So, according to $R^{2}$, `rm` explains a considerable share of the variation in `medv`, but it is hard to tell whether this value is good enough for this particular domain.

#### Against `age` - average age of houses
The slope is negative (about -0.12), so the predicted `medv` decreases as `age` increases, which is consistent with a negative correlation. The chart agrees with this.

The intercept is approximately 31. It is the predicted `medv` at `age` = 0, that is, the point where the regression line crosses the $y$ axis, which agrees with the chart.

The Pearson correlation coefficient is approximately -0.377, which indicates a weak negative linear association between `age` and `medv`. With a coefficient of this size, the points are expected to be spread loosely inside an almost round ellipse that is only slightly elongated along the `x` axis and tilted downwards (negative slope). It is hard to draw the same conclusion from the chart.

The `p-value` is very small, so the null hypothesis that there is no correlation between `age` and `medv` is rejected at the 0.05 significance level: the data give strong evidence of a non-zero correlation.

The $R^{2}$ is 0.142, so `age` explains only about 14% of the variance of `medv`.

Overall, there is a statistically significant but weak linear relation between `age` and `medv`, and it is unclear whether linear regression is the best way to describe this relation.

Obtain confidence intervals for the coefficient estimates of the individual models.

In [43]:
def confint(mod_fit, alpha=default_alpha):
    return mod_fit.conf_int(alpha).rename(columns={0: f'{alpha * 50}%', 1: f'{100 - alpha * 50}%'})

def describe_axes(x, y):
    df = pd.concat([x, y], axis=1)
    df = df.describe()
    df.loc['max - min'] = df.loc['max'] - df.loc['min']
    return df

In [44]:
display(Markdown('#### Linear Regression params:'), lstat_mod_fit.params)
display(Markdown('#### Confidence Intervals:'), confint(lstat_mod_fit))
display(Markdown('#### Axes Data Described:'), describe_axes(boston_df['lstat'], boston_df['medv']))

#### Linear Regression params:

const    34.553841
lstat    -0.950049
dtype: float64

#### Confidence Intervals:

#### Axes Data Described:

In [45]:
display(Markdown('#### Linear Regression params:'), rm_mod_fit.params)
display(Markdown('#### Confidence Intervals:'), confint(rm_mod_fit))
display(Markdown('#### Axes Data Described:'), describe_axes(boston_df['rm'], boston_df['medv']))

#### Linear Regression params:

const   -34.670621
rm        9.102109
dtype: float64

#### Confidence Intervals:

#### Axes Data Described:

In [46]:
display(Markdown('#### Linear Regression params:'), age_mod_fit.params)
display(Markdown('#### Confidence Intervals:'), confint(age_mod_fit))
display(Markdown('#### Axes Data Described:'), describe_axes(boston_df['age'], boston_df['medv']))

#### Linear Regression params:

const    30.978678
age      -0.123163
dtype: float64

#### Confidence Intervals:

#### Axes Data Described:

## Interpret the results
### Confidence interval interpretation of `medv` - median house value
A 95% confidence level was used (alpha = 0.05). For each coefficient, the interval runs from the 2.5% bound to the 97.5% bound: if the sampling were repeated many times, about 95% of the intervals built this way would contain the true value of the coefficient.

To judge whether a confidence interval is wide or narrow, it must be compared with the Linear Regression parameters, taking into account the dataset characteristics.
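
A coefficient confidence interval is the estimate plus or minus a critical $t$ value times the standard error. The sketch below recomputes the interval for the intercept of `medv ~ lstat` from the coefficient and standard error shown in its summary. The critical value 1.965 is an assumed, rounded value of the 0.975 quantile of the $t$ distribution with 504 degrees of freedom (it could be obtained with `scipy.stats.t.ppf(0.975, 504)`); the function name `conf_int` is hypothetical.

```python
def conf_int(estimate, std_err, t_crit):
    """Return (lower, upper) bounds of a two-sided confidence interval."""
    half_width = t_crit * std_err
    return estimate - half_width, estimate + half_width

# Assumed rounded critical value: 0.975 quantile of t with 504 degrees of freedom
T_CRIT_95 = 1.965

# Intercept and its standard error from the summary of medv ~ lstat
lower, upper = conf_int(34.5538, 0.563, T_CRIT_95)
width = upper - lower
print(round(lower, 3), round(upper, 3), round(width, 3))
```

The width of about 2.2 agrees with the statement that the interval for the `lstat` intercept is a bit wider than 2.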

The difference between min (5) and max (50) for the dependent variable `medv` is 45.
#### Against `lstat` - percent of households with low socioeconomic status
The intervals for `lstat` are rather narrow: a bit wider than 2 for the intercept and around 0.15 for the slope.

They can be considered narrow compared with the range of `lstat`, which is 36.24 (from a minimum of 1.73 to a maximum of 37.97). A likely explanation is that the points are tightly packed around the regression line, which can be seen from the chart.

The standard deviation of `lstat` is rather high, which does not contradict this: a wide spread of predictor values helps to pin down the slope. Both the mean and the quantiles indicate that the data is denser at the lower end of the `lstat` range.
#### Against `rm` - average number of rooms per house
The intervals for `rm` are:
- relatively wide for the intercept: around 10;
- moderate for the slope: around 1.7.

They can be considered wide compared with the range of `rm`, which is 5.219 (from a minimum of 3.561 to a maximum of 8.78). A possible explanation is a large number of outliers; the fairly large standard deviation is consistent with this.
#### Against `age` - average age of houses
The intervals for `age` are rather narrow: a bit wider than 3 for the intercept and around 0.05 for the slope.

They can be considered narrow compared with the range of `age`, which is 97.1 (from a minimum of 2.9 to a maximum of 100). A likely explanation is that the points are tightly packed together near the upper end of the `age` range, which can be seen from the chart.

Surprisingly, the standard deviation is rather high, which can be partially explained by the uneven distribution of the points along the $x$ axis.

## 1.2.3 Use the simple linear regression models

Predict the `medv` response values for some selected predictor values. Calculate the prediction intervals for these values.

In [47]:
def predict_with_pi(mod_fit, x, alpha=default_alpha, xlab='x', ylab='y'):
    # add_constant accepts a Series, a DataFrame, or a plain sequence of values
    X = sm.add_constant(x)
    regr = mod_fit.predict(X)
    regr_info = mod_fit.get_prediction(X).summary_frame(alpha=alpha)

    if isinstance(x, pd.DataFrame) and len(x.columns) > 1:
        xlab = '+'.join(x.columns)
        x = x.apply(lambda r: '; '.join(str(v) for v in r), axis=1, result_type='reduce').squeeze()
    df = pd.concat([pd.Series(x), pd.Series(regr), regr_info.obs_ci_lower, regr_info.obs_ci_upper], axis=1, keys=[xlab, ylab, ylab + '_lwr', ylab + '_upr'])
    df.set_index(xlab, inplace=True)
    return df

In [48]:
display(Markdown('#### Prediction:'), predict_with_pi(lstat_mod_fit, (5, 10, 15), xlab='lstat', ylab='medv'))
display(Markdown('#### Axes Data Described:'), describe_axes(boston_df['lstat'], boston_df['medv']))

#### Prediction:

#### Axes Data Described:

In [49]:
display(Markdown('#### Prediction:'), predict_with_pi(rm_mod_fit, (5, 6.5, 8), xlab='rm', ylab='medv'))
display(Markdown('#### Axes Data Described:'), describe_axes(boston_df['rm'], boston_df['medv']))

#### Prediction:

#### Axes Data Described:

In [50]:
display(Markdown('#### Prediction:'), predict_with_pi(age_mod_fit, (25, 50, 75), xlab='age', ylab='medv'))
display(Markdown('#### Axes Data Described:'), describe_axes(boston_df['age'], boston_df['medv']))

#### Prediction:

#### Axes Data Described:

## Interpret the results
### Prediction interval interpretation of `medv` - median house value
The confidence level is 95% (significance level alpha = 0.05). By definition, a prediction interval is the range that is expected to contain a new observation of the dependent variable at the given value of the independent variable with 95% confidence.

To draw conclusions, the prediction intervals have to be compared with the training data, specifically with the minimum and maximum values of `medv`.
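
For a simple linear regression, both the confidence interval of the mean response and the prediction interval can be written out by hand; the prediction interval has an extra `1 +` under the square root because it also covers the noise of a single new observation. The sketch below uses made-up toy data (not `Boston.csv`), a hypothetical helper `intervals_at`, and an assumed rounded critical value 2.306 (the 0.975 quantile of the $t$ distribution with 8 degrees of freedom).

```python
import math

def intervals_at(xs, ys, x0, t_crit):
    """Return the fit at x0 with its confidence and prediction intervals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(rss / (n - 2))  # residual standard error
    fit = intercept + slope * x0
    leverage = 1 / n + (x0 - x_bar) ** 2 / sxx
    ci_half = t_crit * s * math.sqrt(leverage)      # uncertainty of the mean response
    pi_half = t_crit * s * math.sqrt(1 + leverage)  # adds the noise of one new observation
    return fit, (fit - ci_half, fit + ci_half), (fit - pi_half, fit + pi_half)

# Toy data, made up for illustration only
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]
T_CRIT = 2.306  # assumed rounded 0.975 quantile of t with n - 2 = 8 degrees of freedom

fit, ci, pi = intervals_at(xs, ys, 5.5, T_CRIT)
print(fit, ci, pi)
```

The prediction interval is always wider than the confidence interval at the same point, which is why the prediction intervals in this section are much wider than the confidence band drawn around the regression lines.
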
#### Against `lstat` - percent of households with low socioeconomic status
The prediction interval for `lstat` is wide, around 24.5 for all 3 values. That is more than half of the range of `medv` values (45), according to the summary of the axes. It means that the values of the dependent variable are widely dispersed even for similar values of the independent variable.
#### Against `rm` - average number of rooms per house
The prediction interval for `rm` is wide, around 26.3 for all 3 values. That is more than half of the range of `medv` values (45), according to the summary of the axes. It means that the values of the dependent variable are widely dispersed even for similar values of the independent variable.
#### Against `age` - average age of houses
The prediction interval for `age` is wide, around 33.4 for all 3 values. That is more than half of the range of `medv` values (45), according to the summary of the axes. It means that the values of the dependent variable are widely dispersed even for similar values of the independent variable.

## 1.2.4 Perform multiple linear regressions
Fit `medv` as the response with all the previously selected predictors together.

In [51]:
multi_mod_fit = ols('medv~lstat+rm+age', boston_df).fit()
print_lr(multi_mod_fit)

                            OLS Regression Results                            
Dep. Variable:                   medv   R-squared:                       0.639
Model:                            OLS   Adj. R-squared:                  0.637
Method:                 Least Squares   F-statistic:                     296.2
Date:                Mon, 21 Feb 2022   Prob (F-statistic):          1.20e-110
Time:                        12:28:56   Log-Likelihood:                -1582.4
No. Observations:                 506   AIC:                             3173.
Df Residuals:                     502   BIC:                             3190.
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -1.1753      3.182     -0.369      0.7

#### Interpret the results.
First, the $p\:values$ show that the `lstat` and `rm` variables are significant: `P>|t|` is displayed as 0, so the null hypothesis that the corresponding coefficient is zero is rejected.

The `age` variable, on the other hand, has a high $p\:value$. Given `lstat` and `rm`, there is no evidence that `age` contributes to the model, so it is a candidate for removal.

The intercept also has a high $p\:value$, so it cannot be distinguished from 0: the position of the regression surface along the $y$ axis is uncertain.
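
The `P>|t|` column follows from the `t` column, which is the coefficient divided by its standard error. With about 500 residual degrees of freedom the $t$ distribution is close to the standard normal, so the two-sided $p\:value$ can be approximated with the standard library alone. This is only a sketch under that normal approximation (statsmodels itself uses the $t$ distribution); the function name `two_sided_p_normal` is hypothetical.

```python
import math

def two_sided_p_normal(t_stat):
    """Two-sided p-value under a standard normal approximation: 2 * (1 - Phi(|t|))."""
    return math.erfc(abs(t_stat) / math.sqrt(2))

# Intercept and its standard error from the summary of medv ~ lstat + rm + age
coef, std_err = -1.1753, 3.182
t_stat = coef / std_err
p_value = two_sided_p_normal(t_stat)
print(round(t_stat, 3), round(p_value, 2))
```

The result reproduces the `t` value of -0.369 and a $p\:value$ of roughly 0.7 for the intercept, far above the 0.05 significance level.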

Taking everything into consideration, the predictors might be correlated with each other, which can have an adverse effect on the precision of the coefficient estimates of the linear regression model.

## Fit `medv` as response with all available predictors altogether.

In [52]:
all_mod_fit = ols('medv~' + '+'.join(c for c in boston_df.columns if c != 'medv'), boston_df).fit()
print_lr(all_mod_fit)

                            OLS Regression Results                            
Dep. Variable:                   medv   R-squared:                       0.741
Model:                            OLS   Adj. R-squared:                  0.734
Method:                 Least Squares   F-statistic:                     108.1
Date:                Mon, 21 Feb 2022   Prob (F-statistic):          6.72e-135
Time:                        12:28:57   Log-Likelihood:                -1498.8
No. Observations:                 506   AIC:                             3026.
Df Residuals:                     492   BIC:                             3085.
Df Model:                          13                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     36.4595      5.103      7.144      0.0

## Interpret the results.
Only variables with $p\:value<0.05$ can be considered significant (the null hypothesis that the coefficient of the variable is zero, given the other predictors, can be rejected).

According to the linear regression model summary and the $p\:values$ of the variables, all variables except `indus` and `age` are significant. The independent variables might also be correlated with each other, which might adversely affect the precision of the coefficient estimates.

## Check the correlation between the predictors.

In [53]:
def corrmat(df, ylab, render=display):
    """Does not do symbol-coded chart."""
    def pearsonr_pval(x,y):
        return scipy.stats.pearsonr(x,y)[1]
    df = df.drop(ylab, axis=1)
    render(Markdown('Pearson:'))
    corr = df.corr(method='pearson')
    render(corr)
    render(Markdown('P values:'))
    render(df.corr(method=pearsonr_pval))
    render(Markdown('Pearson (chart):'))
    fig, ax = plt.subplots(figsize=default_figsize)
    sns.heatmap(corr.round(2), ax=ax, annot=True, vmax=1, vmin=-1, center=0, cmap='vlag')
    plt.show()


In [54]:
corrmat(boston_df, 'medv')

Pearson:

P values:

Pearson (chart):

## Interpret the results.

The correlation matrix shows strong correlations between several predictors, which can inflate the standard errors of the coefficient estimates and thus impair the linear regression model.

The weakest correlation is between `chas` and the other variables. The strongest correlation is between `tax` and `rad` (positive). Other strong correlations include:
- positive: `indus` & `nox`, `indus` & `tax`, `nox` & `age`;
- negative: `indus` & `dis`, `nox` & `dis`, `age` & `dis`.

Note that the `age` variable is strongly correlated with other variables, which is consistent with the large $p\:value$ of `age` in the Linear Regression `medv~lstat+rm+age` (see the interpretation above).
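
One common way to quantify how much such correlations inflate the variance of a coefficient estimate is the variance inflation factor (VIF). It is not computed in this notebook; the sketch below is an illustration on made-up toy predictors for the special case of exactly two predictors, where the VIF reduces to $1 / (1 - r^2)$ with $r$ the Pearson correlation between them. The helper names `pearson` and `vif_two_predictors` are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF of either predictor in a model with exactly two predictors."""
    r = pearson(x1, x2)
    return 1 / (1 - r ** 2)

# Toy predictors, made up for illustration only
x1 = [1, 2, 3, 4, 5, 6]
nearly_collinear = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]  # almost exactly 2 * x1
unrelated = [3, 1, 4, 1, 5, 2]                       # only weakly related to x1

vif_high = vif_two_predictors(x1, nearly_collinear)
vif_low = vif_two_predictors(x1, unrelated)
print(vif_high, vif_low)
```

A VIF close to 1 means the predictor is almost independent of the other one; a large VIF means its coefficient is estimated much less precisely than it would be without the correlated partner.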

## 1.2.5 Use the multiple linear regression model
Predict the `medv` response values for some selected predictor values. Calculate the prediction intervals for these values.

In [55]:
lstat_x=pd.Series((5,10,15), name='lstat')
rm_x=pd.Series((5,6.5,8), name='rm')
X = pd.merge(lstat_x, rm_x, how='cross')

lstat_rm_mod_fit = ols('medv~lstat+rm', boston_df).fit()
display(Markdown('#### Prediction:'), predict_with_pi(lstat_rm_mod_fit, X, ylab='medv'))
display(Markdown('#### Axes Data Described:'), describe_axes(boston_df[['lstat', 'rm']], boston_df['medv']))

#### Prediction:

#### Axes Data Described:

## Interpret the results.

The regression results with prediction intervals have to be analyzed together with the data about the `medv` axis.

The prediction intervals are around 21 to 22 wide, which is narrower than for any of the single-variable linear regressions. In this sense the multiple-variable linear regression makes more sense in this context, although the improvement is modest, since the intervals of the single-variable regressions were only slightly wider.

# 1.3 Steps of Assignment 3 in detail

Check again the accuracy of the linear regression.

In [56]:
multi2_mod_fit = ols('medv ~ lstat+rm+nox+dis+ptratio', boston_df).fit()
print_lr(multi2_mod_fit)

                            OLS Regression Results                            
Dep. Variable:                   medv   R-squared:                       0.708
Model:                            OLS   Adj. R-squared:                  0.705
Method:                 Least Squares   F-statistic:                     242.6
Date:                Mon, 21 Feb 2022   Prob (F-statistic):          3.67e-131
Time:                        12:28:57   Log-Likelihood:                -1528.7
No. Observations:                 506   AIC:                             3069.
Df Residuals:                     500   BIC:                             3095.
Df Model:                           5                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     37.4992      4.613      8.129      0.0

## 1.3.1 Add interaction terms
Fit a model with interaction terms. Don’t forget to also include the plain predictors.
> Assuming `lstat*rm` has the same meaning in Python, since the used API is called ["Using R-style formulas"](https://www.statsmodels.org/dev/example_formulas.html).
> The regression coefficients match with the assignment.

In [57]:
inter_mod_fit = ols('medv~lstat*rm+nox+dis+ptratio', boston_df).fit()
print_lr(inter_mod_fit)

                            OLS Regression Results                            
Dep. Variable:                   medv   R-squared:                       0.778
Model:                            OLS   Adj. R-squared:                  0.775
Method:                 Least Squares   F-statistic:                     290.8
Date:                Mon, 21 Feb 2022   Prob (F-statistic):          2.48e-159
Time:                        12:28:57   Log-Likelihood:                -1459.9
No. Observations:                 506   AIC:                             2934.
Df Residuals:                     499   BIC:                             2963.
Df Model:                           6                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      3.1518      4.880      0.646      0.5

## Interpret the results.
### Interpreting regression of `medv ~ lstat+rm+nox+dis+ptratio`.
The Ordinary Least Squares (OLS) regression without interaction terms has good metrics. The residual plots are not analysed, but even the mean and quantiles of the residuals are not very large, which suggests that the model is relatively good.

Both $R^2$ and $R^2\:adjusted$ are around 0.7, which is a good enough result for an OLS regression model. The probability of the F-statistic is very close to 0, so the null hypothesis that all slope coefficients are 0 is rejected: at least one predictor is related to `medv`.
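
The adjusted $R^2$ penalises $R^2$ for the number of predictors, which matters when models of different size are compared. As a sketch, it can be recomputed from the rounded `R-squared`, the number of observations, and `Df Model` reported in the summaries of the models with and without the interaction term; the function name `adjusted_r2` is hypothetical.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# R-squared, No. Observations, and Df Model taken from the two summaries
adj_plain = adjusted_r2(0.708, 506, 5)        # medv ~ lstat+rm+nox+dis+ptratio
adj_interaction = adjusted_r2(0.778, 506, 6)  # medv ~ lstat*rm+nox+dis+ptratio
print(round(adj_plain, 3), round(adj_interaction, 3))  # 0.705 0.775
```

With 506 observations and only a handful of predictors the penalty is small, which is why the adjusted values stay close to the plain $R^2$.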

The $p\:values$ of all the coefficients, including the intercept, are displayed as 0, so each coefficient differs significantly from 0 given the other predictors. The standard errors of the coefficients do not seem too large.
### Interpreting regression of `medv~lstat*rm+nox+dis+ptratio`.
After introducing the interaction term, the OLS regression model results changed slightly.
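
In an R-style formula, `lstat*rm` is shorthand for `lstat + rm + lstat:rm`, so the model receives both plain predictors plus one extra column holding their product. The following sketch builds these columns by hand for a few made-up rows; the helper `expand_interaction` is hypothetical and only mimics what the formula machinery does for these two numeric predictors.

```python
# Made-up rows for illustration, not taken from Boston.csv
rows = [
    {'lstat': 5.0, 'rm': 6.5},
    {'lstat': 10.0, 'rm': 6.0},
    {'lstat': 15.0, 'rm': 5.5},
]

def expand_interaction(row):
    """Return the design-matrix columns implied by the formula term lstat*rm."""
    return {
        'Intercept': 1.0,
        'lstat': row['lstat'],
        'rm': row['rm'],
        'lstat:rm': row['lstat'] * row['rm'],  # the interaction column
    }

design = [expand_interaction(r) for r in rows]
for d in design:
    print(d)
```

Because of the product column, the effect of `lstat` on the prediction is no longer a single number: it depends on the value of `rm`, and vice versa.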

Most of the metrics are the same. However, 3 things are different:
- both $R^2$ and $R^2\:adjusted$ are now closer to 0.8, which means that the independent variables together with the new interaction term explain `medv` better;
- the $p\:value$ of the intercept is now larger than the significance level, which means that the position of the regression surface along the `medv` axis is uncertain;
- the absolute values of the mean and quantiles of the residuals seem to decrease slightly, which might also be a good sign.

Overall, the model with the interaction term seems to be slightly better than the previous one, because $R^2$ and the residuals can be considered more important than the $p\:value$ of the intercept. However, it is still debatable whether the new model is definitively better.
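
Another criterion that could support this comparison, although it is not discussed above, is the AIC reported in both summaries: it rewards a higher log-likelihood and penalises every additional parameter, and lower values are better. The sketch below recomputes it from the `Log-Likelihood` and `Df Model` values of the two summaries, counting the intercept as one more parameter; the function name `aic` is hypothetical.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 * log-likelihood."""
    return 2 * n_params - 2 * log_likelihood

# Log-Likelihood from the summaries; parameters = Df Model + 1 (intercept)
aic_plain = aic(-1528.7, 6)        # medv ~ lstat+rm+nox+dis+ptratio
aic_interaction = aic(-1459.9, 7)  # medv ~ lstat*rm+nox+dis+ptratio
print(round(aic_plain), round(aic_interaction))  # 3069 2934
```

Both values agree with the `AIC` entries of the summaries, and the lower AIC of the model with the interaction term points in the same direction as the $R^2$ comparison.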

## 1.3.2 Apply non-linear transformations to some predictors
Fit a model with non-linear transformations of the predictor terms. Don’t forget to also include the plain predictors.
> The R formula is `medv~lstat*rm+I((lstat*rm)^2)+nox+dis+ptratio`. The operator `^` is replaced with `**` to accommodate Python syntax. So, the equivalent Python formula is `medv~lstat*rm+I((lstat*rm)**2)+nox+dis+ptratio`.

In [63]:
nonlin_mod_fit = ols('medv~lstat*rm+I((lstat*rm)**2)+nox+dis+ptratio', boston_df).fit()
print_lr(nonlin_mod_fit)

                            OLS Regression Results                            
Dep. Variable:                   medv   R-squared:                       0.781
Model:                            OLS   Adj. R-squared:                  0.778
Method:                 Least Squares   F-statistic:                     253.9
Date:                Mon, 21 Feb 2022   Prob (F-statistic):          8.05e-160
Time:                        12:55:26   Log-Likelihood:                -1455.8
No. Observations:                 506   AIC:                             2928.
Df Residuals:                     498   BIC:                             2961.
Df Model:                           7                                         
Covariance Type:            nonrobust                                         
                           coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------
Intercept               10.5522 