# **💻 Regression Complete**

## What is Regression Analysis

There are two types of supervised machine learning algorithms: **Regression** and **classification**. The former predicts continuous value outputs while the latter predicts discrete outputs. For instance, predicting the price of a house in dollars is a regression problem whereas predicting whether a tumor is malignant or benign is a classification problem.

The features being predicted are called the **dependent variables**, also known as outputs, responses, or target variables.

The features used to make the prediction are called the **independent variables**, also known as inputs or predictors.

Regression problems usually have one continuous and unbounded dependent variable. The inputs, however, can be *continuous, discrete, or even categorical* data such as gender, nationality, brand, and so on.

Regression analysis is a predictive modelling technique that investigates the relationship between a dependent (target) variable and one or more independent (predictor) variables. It is used for forecasting, time series modelling, and exploring cause-and-effect relationships between variables.

**Explanatory Vs Predictive Modeling**

- **Explanatory model**: explaining or quantifying the average effect of inputs on an output
- **Predictive model**: predicting the outcome values for new records given their input values

1. A good explanatory model is one that fits the data closely whereas a predictive model is one that predicts new cases accurately.

2. In explanatory models, where the goal is to estimate a hypothesized relationship in the population, the entire dataset is used to fit the model in order to maximise the amount of information.

3. In predictive models, where the goal is to predict outcomes of new individual cases, the dataset is typically split into a training set and a validation set. The training set is used to estimate the model, and the validation (or holdout) set is used to assess the model's predictive performance.

4. The performance of explanatory models is measured by how closely the data fit the model, whereas the performance of predictive models is measured by predictive accuracy (how well the model predicts new data points).

5. In explanatory models the focus is on coefficients (Beta) and in predictive models the focus is on the predictions (y_pred)
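
The training/validation workflow described in points 3 to 5 can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic (a hypothetical linear relationship plus noise), and the 70/30 split and `random_state` values are arbitrary choices, not something prescribed by the text.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data: y depends linearly on x plus random noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3 + 2 * X[:, 0] + rng.normal(0, 1, size=100)

# Hold out 30% of the records as a validation set
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Estimate the model on the training set only
model = LinearRegression().fit(X_train, y_train)

# Explanatory focus: the estimated coefficients
print('intercept:', model.intercept_, 'slope:', model.coef_)

# Predictive focus: accuracy on records the model has not seen
train_r2 = model.score(X_train, y_train)
valid_r2 = model.score(X_valid, y_valid)
print('training R^2:', train_r2)
print('validation R^2:', valid_r2)
```

The validation score is the number to watch when prediction is the goal; the coefficients matter most when explanation is the goal.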

**Yi = β0 + β1 X1i + β2 X2i + β3 X3i + εi**

The assumptions behind this regression equation are:
1. The noise or error ε (or equivalently Y) follows a normal distribution.
2. The choice of variables and their form is correct (linear).
3. The cases are independent of each other.
4. The variability in Y values for a given set of predictors is the same regardless of the values of the predictors (homoskedasticity)


An important fact for the ***predictive goal*** is that even if we drop the first assumption and allow the noise to follow an arbitrary distribution, the estimates (beta) are still very good for prediction.

The assumption of a normal distribution is required in explanatory modelling, where it is used for constructing confidence intervals and statistical tests for the model parameters.

Even if the other assumptions are violated, it is still possible that the resulting predictions are sufficiently accurate, and predictive performance is the main priority. Satisfying the assumptions is of secondary interest, and residual analysis can give clues to potentially improved models to examine.
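
As a rough idea of what a basic residual analysis might look like, here is a small sketch. The six data points are assumed purely for illustration, and the spread comparison at the end is a crude, informal check rather than a formal test of homoskedasticity.

```python
import numpy as np

# Illustrative data (assumed for this sketch)
x = np.array([5, 15, 25, 35, 45, 55], dtype=float)
y = np.array([5, 20, 14, 32, 22, 38], dtype=float)

# Least-squares line; np.polyfit returns the highest-degree coefficient first
slope, intercept = np.polyfit(x, y, deg=1)

fitted = intercept + slope * x
residuals = y - fitted

print('residuals:', residuals)
# With an intercept in the model, least-squares residuals sum to (numerically) zero
print('sum of residuals:', residuals.sum())

# Crude spread comparison between low and high fitted values;
# a large difference could hint at heteroskedasticity
low = fitted <= np.median(fitted)
print('residual std (low fitted):', residuals[low].std())
print('residual std (high fitted):', residuals[~low].std())
```

In practice you would usually plot the residuals against the fitted values and look for patterns such as curvature or a funnel shape.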


## What are the Variants of Regression

- Linear Regression
- Polynomial Regression
- Stepwise Regression
- Ridge Regression
- Lasso Regression
- ElasticNet Regression

**Example of interpretation of a regression equation:**

Say we are interested in the relationship between family food consumption and family income. We calculate a regression equation, in which consumption is denoted C and income I, both measured in dollars, of:

C = 1375 + .064 I

What is the intercept? 1375

What does it mean? That for a family with no income their food consumption is $1,375.

What is the regression coefficient? How is it interpreted? 

For every dollar increase in family income there is a .064 dollar increase in food consumption. 
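
The interpretation of the intercept and the coefficient can be checked by evaluating the equation directly. The function name below is a hypothetical helper introduced only for this sketch, and the income of 50,000 dollars is an arbitrary example value.

```python
def predicted_consumption(income):
    """Evaluate the fitted equation C = 1375 + 0.064 * I (both in dollars)."""
    intercept = 1375.0   # predicted consumption when income is zero
    slope = 0.064        # extra dollars of consumption per extra dollar of income
    return intercept + slope * income

print(predicted_consumption(0))
print(predicted_consumption(50_000))
# The difference between two incomes one dollar apart is (approximately) the slope
print(predicted_consumption(50_001) - predicted_consumption(50_000))
```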

Note that we would generally have hypothesized the relationship, and which variable is dependent and which is independent, before fitting. The roles of I and C could have been reversed, and the direction (sign) could have been the opposite. The choice would likely reflect an a priori theory we held.


**Overall Goodness of Fit Test**

Total sum of squares = sum of squares due to regression + sum of squares about regression:

 
TSS = SSD + SSE (SSE is also known as the error, ε)

R², or the coefficient of determination, is defined as the proportion of the variation in Y about its mean that is explained by the linear influence of the variation of X.

Mathematically it is described by R² = SSD/TSS and ranges between 0 and 1. Closer to zero indicates a poorer model; closer to one indicates a better model.
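
The decomposition of the total sum of squares and the resulting coefficient of determination can be computed by hand. The sketch below assumes a small illustrative data set and a straight-line least-squares fit obtained with NumPy; the variable names `tss`, `ssd`, and `sse` simply mirror the abbreviations used in the text.

```python
import numpy as np

# Illustrative data (assumed for this sketch)
x = np.array([5, 15, 25, 35, 45, 55], dtype=float)
y = np.array([5, 20, 14, 32, 22, 38], dtype=float)

# Straight-line least-squares fit
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = intercept + slope * x

tss = np.sum((y - y.mean()) ** 2)      # total sum of squares
ssd = np.sum((y_hat - y.mean()) ** 2)  # sum of squares due to regression
sse = np.sum((y - y_hat) ** 2)         # sum of squares about regression (error)

r2 = ssd / tss
print('TSS:', tss, 'SSD + SSE:', ssd + sse)
print('R^2:', r2)
```

The two printed sums of squares should agree up to floating-point rounding, which is the identity stated in the text.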


There are five basic steps when you’re implementing linear regression:

1. Import the packages and classes you need.
2. Provide data to work with and, if needed, apply appropriate transformations.
3. Create a regression model and fit it with existing data.
4. Check the results of model fitting to know whether the model is satisfactory.
5. Apply the model for predictions.

# Simple Regression (Sklearn)

Step 1: Import packages and classes

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

Step 2: Provide data

In [2]:
X = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))

In [3]:
y = np.array([5, 20, 14, 32, 22, 38])

- Now you have two arrays: the input X and the output y. You should call .reshape() on the input because this array is required to be two-dimensional, or to be more precise, to have one column and as many rows as necessary. That’s exactly what the argument (-1, 1) of .reshape() specifies.

In [4]:
X

array([[ 5],
       [15],
       [25],
       [35],
       [45],
       [55]])

In [5]:
X.shape

(6, 1)

In [6]:
y

array([ 5, 20, 14, 32, 22, 38])

In [7]:
y.shape

(6,)

Step 3: Create a model and fit it

In [8]:
from sklearn.linear_model import LinearRegression

In [9]:
model = LinearRegression()

In [10]:
model

LinearRegression()

In [11]:
model.fit(X, y)

LinearRegression()

- With .fit(), you calculate the optimal values of the weights 𝑏₀ and 𝑏₁, using the existing input and output (x and y) as the arguments. In other words, .fit() fits the model. It returns self, which is the variable model itself. That’s why you can replace the last two statements with this one:
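
A minimal sketch of that single chained statement, repeating the Step 2 data so that the block runs on its own:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])

# .fit() returns the fitted estimator itself, so creation and fitting can be chained
model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)
```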

Step 4: Get results

Once you have your model fitted, you can get the results to check whether the model works satisfactorily and interpret it.

You can obtain the coefficient of determination (𝑅²) with .score() called on model:

In [12]:
R_sq = model.score(X, y)
print('coefficient of determination:', R_sq)

coefficient of determination: 0.7158756137479542


When you’re applying .score(), the arguments are also the predictor X and the response y, and the return value is 𝑅².

The fitted model exposes the attributes .intercept_, which holds the intercept 𝑏₀, and .coef_, which holds the slope 𝑏₁:

In [13]:
print('intercept:', model.intercept_)

print('slope:', model.coef_)

intercept: 5.633333333333329
slope: [0.54]


- The code above illustrates how to get 𝑏₀ and 𝑏₁. You can notice that .intercept_ is a scalar, while .coef_ is an array.

- The value 𝑏₀ = 5.63 (approximately) illustrates that your model predicts the response 5.63 when 𝑥 is zero. The value 𝑏₁ = 0.54 means that the predicted response rises by 0.54 when 𝑥 is increased by one.

You can also provide y as a two-dimensional array. In this case, you’ll get a similar result: the example below is very similar to the previous one, except that .intercept_ is a one-dimensional array with the single element 𝑏₀, and .coef_ is a two-dimensional array with the single element 𝑏₁.

In [14]:
new_model = LinearRegression().fit(X, y.reshape((-1, 1)))
print('intercept:', new_model.intercept_)

print('slope:', new_model.coef_)

intercept: [5.63333333]
slope: [[0.54]]


Step 5: Predict response

In [15]:
y_pred = model.predict(X)
print('predicted response:', y_pred, sep='\n')

predicted response:
[ 8.33333333 13.73333333 19.13333333 24.53333333 29.93333333 35.33333333]


In [16]:
y_pred = model.intercept_ + model.coef_ * X
print('predicted response:', y_pred, sep='\n')

predicted response:
[[ 8.33333333]
 [13.73333333]
 [19.13333333]
 [24.53333333]
 [29.93333333]
 [35.33333333]]


The output here differs from the previous example only in dimensions. The predicted response is now a two-dimensional array, while in the previous case, it had one dimension.

If you reduce the number of dimensions of X to one, these two approaches will yield the same result. You can do this by replacing X with X.reshape(-1), X.flatten(), or X.ravel() when multiplying it with model.coef_.
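
The effect of flattening the input can be seen without refitting anything. In this sketch the intercept and slope are simply copied from the output printed earlier, which is an assumption made to keep the block self-contained.

```python
import numpy as np

X = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))

# Values copied from the fitted model's output above
intercept = 5.633333333333329
coef = np.array([0.54])

y_pred_2d = intercept + coef * X          # broadcasting keeps the (6, 1) shape
y_pred_1d = intercept + coef * X.ravel()  # flattened input gives shape (6,)

print(y_pred_2d.shape, y_pred_1d.shape)
print(np.allclose(y_pred_2d.ravel(), y_pred_1d))
```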

In [17]:
X_new = [[5], [30]]

In [18]:
y_new = model.predict(X_new)
print(y_new)

[ 8.33333333 21.83333333]


In [19]:
X_new = np.arange(5).reshape((-1, 1))
print(X_new)

[[0]
 [1]
 [2]
 [3]
 [4]]


In [20]:
y_new = model.predict(X_new)
print(y_new)

[5.63333333 6.17333333 6.71333333 7.25333333 7.79333333]


# Multiple Regression (Sklearn)

Steps 1 and 2: Import packages and classes, and provide data

In [76]:
import numpy as np
from sklearn.linear_model import LinearRegression
X = [[0, 1], [5, 1], [15, 2], [25, 5], [35, 11], [45, 15], [55, 34], [60, 35]]
y = [4, 5, 20, 14, 32, 22, 38, 43]
X, y = np.array(X), np.array(y)

In [77]:
X

array([[ 0,  1],
       [ 5,  1],
       [15,  2],
       [25,  5],
       [35, 11],
       [45, 15],
       [55, 34],
       [60, 35]])

In [78]:
X.shape

(8, 2)

In [79]:
y

array([ 4,  5, 20, 14, 32, 22, 38, 43])

In [80]:
y.shape

(8,)

Step 3: Create a model and fit it

In [81]:
model = LinearRegression().fit(X, y)

Step 4: Get results

In [82]:
R_sq = model.score(X, y)
print('coefficient of determination:', R_sq)

print('intercept:', model.intercept_)

print('slope:', model.coef_)

coefficient of determination: 0.8615939258756775
intercept: 5.52257927519819
slope: [0.44706965 0.25502548]


- You obtain the value of 𝑅² using .score() and the values of the estimators of regression coefficients with .intercept_ and .coef_. Again, .intercept_ holds the bias 𝑏₀, while now .coef_ is an array containing 𝑏₁ and 𝑏₂ respectively.

- In this example, the intercept is approximately 5.52, and this is the value of the predicted response when 𝑥₁ = 𝑥₂ = 0. The increase of 𝑥₁ by 1 yields the rise of the predicted response by 0.45. Similarly, when 𝑥₂ grows by 1, the response rises by 0.26.

Step 5: Predict response

In [83]:
y_pred = model.predict(X)
print('predicted response:', y_pred, sep='\n')

predicted response:
[ 5.77760476  8.012953   12.73867497 17.9744479  23.97529728 29.4660957
 38.78227633 41.27265006]


In [84]:
y_pred = model.intercept_ + np.sum(model.coef_ * X, axis=1)
print('predicted response:', y_pred, sep='\n')

predicted response:
[ 5.77760476  8.012953   12.73867497 17.9744479  23.97529728 29.4660957
 38.78227633 41.27265006]


- You can predict the output values by multiplying each column of the input with the appropriate weight, summing the results and adding the intercept to the sum.
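
The same computation can also be written as a matrix-vector product, which some readers may find easier to relate to the regression equation. The intercept and weights below are copied from the printed results above (the weights are rounded), so the predictions match only up to that rounding.

```python
import numpy as np

X = np.array([[0, 1], [5, 1], [15, 2], [25, 5],
              [35, 11], [45, 15], [55, 34], [60, 35]])

# Values copied from the printed results above (the slopes are rounded)
intercept = 5.52257927519819
coef = np.array([0.44706965, 0.25502548])

y_pred_sum = intercept + np.sum(coef * X, axis=1)  # weight each column, then sum across each row
y_pred_dot = intercept + X @ coef                  # same computation as a matrix-vector product

print(np.allclose(y_pred_sum, y_pred_dot))
print(y_pred_dot)
```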

In [85]:
x_new = np.arange(10).reshape((-1, 2))
print(x_new)

[[0 1]
 [2 3]
 [4 5]
 [6 7]
 [8 9]]


In [86]:
y_new = model.predict(x_new)
print(y_new)

[ 5.77760476  7.18179502  8.58598528  9.99017554 11.3943658 ]


# Multiple Regression (Sklearn)

In [63]:
from sklearn import datasets
X, y = datasets.make_regression(n_samples = 10, n_features = 2, n_targets=1, bias = 22, random_state=12)

In [64]:
from sklearn.linear_model import LinearRegression

In [65]:
regmodel = LinearRegression()

In [66]:
regmodel.fit(X,y)

LinearRegression()

In [67]:
regmodel.score(X,y)

1.0

In [68]:
regmodel.intercept_

21.999999999999993

In [69]:
regmodel.coef_

array([97.8058079 , 33.46475291])

In [70]:
y_pred = regmodel.predict(X)
y_pred

array([-136.22864486, -102.90092174,  -28.81905468,   54.97466562,
        -60.17466564,  -33.49812995,  105.87214515,  275.8751669 ,
         10.41261199,   48.20304167])

In [71]:
y_pred = regmodel.intercept_ + regmodel.coef_[0]*X[:,0]+regmodel.coef_[1]*X[:,1]
y_pred

array([-136.22864486, -102.90092174,  -28.81905468,   54.97466562,
        -60.17466564,  -33.49812995,  105.87214515,  275.8751669 ,
         10.41261199,   48.20304167])

In [74]:
X_new = [[5, 20]]

In [75]:
y_new = regmodel.predict(X_new)
y_new

array([1180.32409772])

# Simple Regression (StatsModel)

In [None]:
import numpy as np
import statsmodels.api as sm

In [None]:
#Dummy Data
X = [1,2,3,4,5,6,7,8,9,10]
X = np.array(X).reshape(-1,1)
y = 5 + X*2.5*np.random.randint(20)

In [None]:
X = sm.add_constant(X)

In [None]:
model = sm.OLS(y, X)

In [None]:
results = model.fit()

In [None]:
print(results.summary())

                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                  1.000
Method:                 Least Squares   F-statistic:                 4.929e+31
Date:                Tue, 08 Sep 2020   Prob (F-statistic):          1.90e-124
Time:                        15:19:48   Log-Likelihood:                 293.49
No. Observations:                  10   AIC:                            -583.0
Df Residuals:                       8   BIC:                            -582.4
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          5.0000   3.31e-14   1.51e+14      0.0



# Multiple Regression (StatsModel)

Sklearn has no regression summary report of the kind R produces. The main reason is that sklearn is aimed at predictive modelling and machine learning, where the evaluation criteria are based on performance on previously unseen data (such as predictive 𝑅² for regression).


Statsmodels also helps us determine which of our variables are statistically significant through their p-values. By the common convention, a variable is considered statistically significant if its p-value is below 0.05.

[Statsmodels user guide: regression and linear models](https://www.statsmodels.org/devel/user-guide.html#regression-and-linear-models)

Step 1: Import packages

In [None]:
import numpy as np
import statsmodels.api as sm

Step 2: Provide data and transform inputs

In [None]:
X = [[0, 1], [5, 1], [15, 2], [25, 5], [35, 11], [45, 15], [55, 34], [60, 35]]
y = [4, 5, 20, 14, 32, 22, 38, 43]
X, y = np.array(X), np.array(y)

In [None]:
X

array([[ 0,  1],
       [ 5,  1],
       [15,  2],
       [25,  5],
       [35, 11],
       [45, 15],
       [55, 34],
       [60, 35]])

The input and output arrays are created, but the job is not done yet.

You need to add the column of ones to the inputs if you want statsmodels to calculate the intercept 𝑏₀. It doesn’t take 𝑏₀ into account by default. This is just one function call:

In [None]:
X = sm.add_constant(X)

That’s how you add the column of ones to X with add_constant(). It takes the input array X as an argument and returns a new array with the column of ones inserted at the beginning. This is how X looks now:

In [None]:
X

array([[ 1.,  0.,  1.],
       [ 1.,  5.,  1.],
       [ 1., 15.,  2.],
       [ 1., 25.,  5.],
       [ 1., 35., 11.],
       [ 1., 45., 15.],
       [ 1., 55., 34.],
       [ 1., 60., 35.]])

You can see that the modified X has three columns: the first column of ones (corresponding to the intercept 𝑏₀) as well as the two columns of the original features.
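
To make the role of the constant column concrete, the same array can be built with plain NumPy. This is only an illustrative equivalent of the result shown above, not a description of how statsmodels implements add_constant() internally; the names `X_raw` and `X_with_const` are introduced just for this sketch.

```python
import numpy as np

X_raw = np.array([[0, 1], [5, 1], [15, 2], [25, 5],
                  [35, 11], [45, 15], [55, 34], [60, 35]])

# Prepend a column of ones so the first coefficient plays the role of the intercept
X_with_const = np.column_stack([np.ones(len(X_raw)), X_raw])

print(X_with_const.shape)
print(X_with_const[:3])
```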

Step 3: Create a model and fit it

The regression model based on ordinary least squares is an instance of the class statsmodels.regression.linear_model.OLS. This is how you can obtain one:

In [None]:
model = sm.OLS(y, X)

You should be careful here! Notice that the first argument is the output, followed by the input. There are several more optional parameters.

To find more information about this class, please visit the official documentation page.

Once your model is created, you can apply .fit() on it:

In [None]:
results = model.fit()


By calling .fit(), you obtain the variable results, which is an instance of the class statsmodels.regression.linear_model.RegressionResultsWrapper. This object holds a lot of information about the regression model.

Step 4: Get results

The variable results refers to the object that contains detailed information about the results of linear regression.

You can call .summary() to get the table with the results of linear regression:

In [None]:
print(results.summary())

                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.862
Model:                            OLS   Adj. R-squared:                  0.806
Method:                 Least Squares   F-statistic:                     15.56
Date:                Tue, 08 Sep 2020   Prob (F-statistic):            0.00713
Time:                        15:19:48   Log-Likelihood:                -24.316
No. Observations:                   8   AIC:                             54.63
Df Residuals:                       5   BIC:                             54.87
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          5.5226      4.431      1.246      0.2

  "anyway, n=%i" % int(n))


This table is very comprehensive. You can find many statistical values associated with linear regression including 𝑅², 𝑏₀, 𝑏₁, and 𝑏₂.

In this particular case, you might obtain the warning related to kurtosistest. This is due to the small number of observations provided.

You can extract any of the values from the table above. Here’s an example:

In [None]:
print('coefficient of determination:', results.rsquared)

print('adjusted coefficient of determination:', results.rsquared_adj)

print('regression coefficients:', results.params)

coefficient of determination: 0.8615939258756776
adjusted coefficient of determination: 0.8062314962259487
regression coefficients: [5.52257928 0.44706965 0.25502548]


That’s how you obtain some of the results of linear regression:

- .rsquared holds 𝑅².

- .rsquared_adj represents adjusted 𝑅² (𝑅² corrected according to the number of input features).
- .params refers to the array with 𝑏₀, 𝑏₁, and 𝑏₂ respectively.
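
The relationship between 𝑅² and adjusted 𝑅² can be verified with the standard correction formula. The helper function below is hypothetical and written only for this check; it takes the 𝑅² printed above together with the 8 observations and 2 input features of this example.

```python
def adjusted_r2(r2, n_obs, n_features):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_features - 1)

# R^2 from the output above, 8 observations, 2 input features
print(adjusted_r2(0.8615939258756776, n_obs=8, n_features=2))
```

The result should agree with the value reported by .rsquared_adj up to floating-point rounding.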

You can also notice that these results are identical to those obtained with scikit-learn for the same problem.

To find more information about the results of linear regression, please visit the official documentation page.

Step 5: Predict response

You can obtain the predicted response on the input values used for creating the model using .fittedvalues or .predict() with the input array as the argument:

In [None]:
print('predicted response:', results.fittedvalues, sep='\n')



print('predicted response:', results.predict(X), sep='\n')

predicted response:
[ 5.77760476  8.012953   12.73867497 17.9744479  23.97529728 29.4660957
 38.78227633 41.27265006]
predicted response:
[ 5.77760476  8.012953   12.73867497 17.9744479  23.97529728 29.4660957
 38.78227633 41.27265006]


This is the predicted response for known inputs. If you want predictions with new regressors, you can also apply .predict() with new data as the argument:

In [None]:
X

array([[ 1.,  0.,  1.],
       [ 1.,  5.,  1.],
       [ 1., 15.,  2.],
       [ 1., 25.,  5.],
       [ 1., 35., 11.],
       [ 1., 45., 15.],
       [ 1., 55., 34.],
       [ 1., 60., 35.]])

In [None]:
X_new = sm.add_constant(np.arange(10).reshape((-1, 2)))
print(X_new)

[[1. 0. 1.]
 [1. 2. 3.]
 [1. 4. 5.]
 [1. 6. 7.]
 [1. 8. 9.]]


In [None]:
X_new.shape

(5, 3)

In [None]:
y_new = results.predict(X_new)
print(y_new)

[ 5.77760476  7.18179502  8.58598528  9.99017554 11.3943658 ]


In [None]:
y_new.shape

(5,)

# Multiple Regression (StatsModel)

In [None]:
import numpy as np
import statsmodels.api as sm

In [None]:
# Dummy data
X = [[0, 1], [5, 1], [15, 2], [25, 5], [35, 11], [45, 15], [55, 34], [60, 35]]
y = [4, 5, 20, 14, 32, 22, 38, 43]
X, y = np.array(X), np.array(y)

In [None]:
X = sm.add_constant(X)

In [None]:
model = sm.OLS(y, X)

In [None]:
results = model.fit()

In [None]:
print(results.summary())

                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.862
Model:                            OLS   Adj. R-squared:                  0.806
Method:                 Least Squares   F-statistic:                     15.56
Date:                Tue, 08 Sep 2020   Prob (F-statistic):            0.00713
Time:                        15:19:48   Log-Likelihood:                -24.316
No. Observations:                   8   AIC:                             54.63
Df Residuals:                       5   BIC:                             54.87
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          5.5226      4.431      1.246      0.2



In [None]:
results.rsquared

0.8615939258756776

In [None]:
results.rsquared_adj

0.8062314962259487

In [None]:
results.params

array([5.52257928, 0.44706965, 0.25502548])

In [None]:
results.fittedvalues

array([ 5.77760476,  8.012953  , 12.73867497, 17.9744479 , 23.97529728,
       29.4660957 , 38.78227633, 41.27265006])

In [None]:
results.predict(X)

array([ 5.77760476,  8.012953  , 12.73867497, 17.9744479 , 23.97529728,
       29.4660957 , 38.78227633, 41.27265006])

In [None]:
X_new = sm.add_constant(np.arange(10).reshape((-1, 2)))

In [None]:
y_new = results.predict(X_new)
y_new

array([ 5.77760476,  7.18179502,  8.58598528,  9.99017554, 11.3943658 ])

# Polynomial Regression (Sklearn)

Step 1: Import packages and classes

- In addition to numpy and sklearn.linear_model.LinearRegression, you should also import the class PolynomialFeatures from sklearn.preprocessing

In [None]:
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

Step 2a: Provide data

In [None]:
X = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([15, 11, 2, 8, 25, 32])

- Now you have the input and output in a suitable format. Keep in mind that you need the input to be a two-dimensional array. That’s why .reshape() is used.

Step 2b: Transform input data

This is the new step you need to implement for polynomial regression!

- Polynomial regression includes 𝑥² (and perhaps higher-order terms) as additional features. For that reason, you should transform the input array X to contain additional column(s) with the values of 𝑥² (and possibly more features).

- It’s possible to transform the input array in several ways (like using insert() from numpy), but the class PolynomialFeatures is very convenient for this purpose. Let’s create an instance of this class:

[PolynomialFeatures documentation](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html)

In [None]:
transformer = PolynomialFeatures(degree=2, include_bias=False)

The variable transformer refers to an instance of PolynomialFeatures which you can use to transform the input X.

You can provide several optional parameters to PolynomialFeatures:

- degree is an integer (2 by default) that represents the degree of the polynomial regression function.
- interaction_only is a Boolean (False by default) that decides whether to include only interaction features (True) or all features (False).
- include_bias is a Boolean (True by default) that decides whether to include the bias (intercept) column of ones (True) or not (False).

Before applying transformer, you need to fit it with .fit()

In [None]:
transformer.fit(X)

PolynomialFeatures(degree=2, include_bias=False, interaction_only=False,
                   order='C')

Once transformer is fitted, it’s ready to create a new, modified input. You would apply .transform() to do that, as in X_ = transformer.transform(X). You can also fit and transform in a single statement:

In [None]:
X_ = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)

That’s fitting and transforming the input array in one statement with .fit_transform(). It also takes the input array and effectively does the same thing as .fit() and .transform() called in that order. It also returns the modified array. This is how the new input array looks:

In [None]:
X

array([[ 5],
       [15],
       [25],
       [35],
       [45],
       [55]])

In [None]:
X_

array([[   5.,   25.],
       [  15.,  225.],
       [  25.,  625.],
       [  35., 1225.],
       [  45., 2025.],
       [  55., 3025.]])

The modified input array contains two columns: one with the original inputs and the other with their squares.
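
For a single input column, the same two-column array can be assembled by hand, which shows that nothing more than squaring is involved. This sketch uses plain NumPy and the name `X_manual` is introduced only here.

```python
import numpy as np

X = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))

# Degree-2 features without the bias column: [x, x**2]
X_manual = np.hstack([X, X ** 2]).astype(float)
print(X_manual)
```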

Step 3: Create a model and fit it

This step is also the same as in the case of linear regression. You create and fit the model:

In [None]:
model = LinearRegression().fit(X_, y)

The regression model is now created and fitted. It’s ready for application.

You should keep in mind that the first argument of .fit() is the modified input array X_ and not the original X.

Step 4: Get results

In [None]:
R_sq = model.score(X_, y)
print('coefficient of determination:', R_sq)

print('intercept:', model.intercept_)

print('coefficients:', model.coef_)

coefficient of determination: 0.8908516262498564
intercept: 21.37232142857144
coefficients: [-1.32357143  0.02839286]


Again, .score() returns 𝑅². Its first argument is also the modified input X_, not X. The values of the weights are stored in .intercept_ and .coef_: .intercept_ represents 𝑏₀, while .coef_ references the array that contains 𝑏₁ and 𝑏₂ respectively.

You can obtain a very similar result with different transformation and regression arguments:

In [None]:
X_ = PolynomialFeatures(degree=2, include_bias=True).fit_transform(X)

If you call PolynomialFeatures with the default parameter include_bias=True (or if you just omit it), you’ll obtain the new input array X_ with the additional leftmost column containing only ones. This column corresponds to the intercept. This is how the modified input array looks in this case:

In [None]:
X_

array([[1.000e+00, 5.000e+00, 2.500e+01],
       [1.000e+00, 1.500e+01, 2.250e+02],
       [1.000e+00, 2.500e+01, 6.250e+02],
       [1.000e+00, 3.500e+01, 1.225e+03],
       [1.000e+00, 4.500e+01, 2.025e+03],
       [1.000e+00, 5.500e+01, 3.025e+03]])

The first column of X_ contains ones, the second has the values of x, while the third holds the squares of x.

The intercept is already included with the leftmost column of ones, and you don’t need to include it again when creating the instance of LinearRegression. Thus, you can provide fit_intercept=False. This is how the next statement looks:

In [None]:
model = LinearRegression(fit_intercept=False).fit(X_, y)

In [None]:
R_sq = model.score(X_, y)
print('coefficient of determination:', R_sq)

print('intercept:', model.intercept_)

print('coefficients:', model.coef_)

coefficient of determination: 0.8908516262498564
intercept: 0.0
coefficients: [21.37232143 -1.32357143  0.02839286]


You see that now .intercept_ is zero, but .coef_ actually contains 𝑏₀ as its first element. Everything else is the same.
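
One way to see why the intercept moves into the coefficient array is to solve the same least-squares problem directly on a design matrix that already contains the column of ones. The sketch below uses NumPy's general least-squares solver as a stand-in; it is not a claim about which solver scikit-learn uses internally.

```python
import numpy as np

x = np.array([5, 15, 25, 35, 45, 55], dtype=float)
y = np.array([15, 11, 2, 8, 25, 32], dtype=float)

# Design matrix with an explicit column of ones: [1, x, x**2]
A = np.column_stack([np.ones_like(x), x, x ** 2])

# Least-squares solution; no separate intercept is estimated
weights, *_ = np.linalg.lstsq(A, y, rcond=None)
print('weights (b0, b1, b2):', weights)
```

The first weight plays the role of 𝑏₀, which matches the first element of .coef_ in the output above.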

Step 5: Predict response

If you want to get the predicted response, just use .predict(), but remember that the argument should be the modified input X_ instead of the old X:

In [None]:
y_pred = model.predict(X_)
print('predicted response:', y_pred, sep='\n')

predicted response:
[15.46428571  7.90714286  6.02857143  9.82857143 19.30714286 34.46428571]


Summary

In [None]:
# Step 1: Import packages
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

In [None]:
# Step 2a: Provide data
X = [[0, 1], [5, 1], [15, 2], [25, 5], [35, 11], [45, 15], [55, 34], [60, 35]]
y = [4, 5, 20, 14, 32, 22, 38, 43]
X, y = np.array(X), np.array(y)

In [None]:
# Step 2b: Transform input data
X_ = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)

In [None]:
# Step 3: Create a model and fit it
model = LinearRegression().fit(X_, y)

In [None]:
# Step 4: Get results
R_sq = model.score(X_, y)
intercept, coefficients = model.intercept_, model.coef_

In [None]:
# Step 5: Predict
y_pred = model.predict(X_)

print('coefficient of determination:', R_sq)

print('intercept:', intercept)

print('coefficients:', coefficients, sep='\n')


print('predicted response:', y_pred, sep='\n')

coefficient of determination: 0.9453701449127822
intercept: 0.8430556452395876
coefficients:
[ 2.44828275  0.16160353 -0.15259677  0.47928683 -0.4641851 ]
predicted response:
[ 0.54047408 11.36340283 16.07809622 15.79139    29.73858619 23.50834636
 39.05631386 41.92339046]


In this case, there are six regression coefficients (including the intercept), as shown in the estimated regression function **𝑓(𝑥₁, 𝑥₂) = 𝑏₀ + 𝑏₁𝑥₁ + 𝑏₂𝑥₂ + 𝑏₃𝑥₁² + 𝑏₄𝑥₁𝑥₂ + 𝑏₅𝑥₂².**
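
To connect the printed coefficients with this function, the polynomial can be evaluated by hand. The values below are copied from the output above and are rounded to eight decimals, so the predictions agree with .predict() only approximately; the ordering of the terms (𝑥₁, 𝑥₂, 𝑥₁², 𝑥₁𝑥₂, 𝑥₂²) is taken from the regression function stated in the text.

```python
import numpy as np

X = np.array([[0, 1], [5, 1], [15, 2], [25, 5],
              [35, 11], [45, 15], [55, 34], [60, 35]], dtype=float)
x1, x2 = X[:, 0], X[:, 1]

# Values copied from the printed output above (rounded to 8 decimals)
b0 = 0.8430556452395876
b1, b2, b3, b4, b5 = 2.44828275, 0.16160353, -0.15259677, 0.47928683, -0.4641851

# f(x1, x2) = b0 + b1*x1 + b2*x2 + b3*x1^2 + b4*x1*x2 + b5*x2^2
y_manual = b0 + b1 * x1 + b2 * x2 + b3 * x1 ** 2 + b4 * x1 * x2 + b5 * x2 ** 2
print(y_manual)
```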

You can also notice that polynomial regression yielded a higher coefficient of determination than multiple linear regression for the same problem. At first, you could think that obtaining such a large 𝑅² is an excellent result. It might be.

However, in real-world situations, having a complex model and 𝑅² very close to 1 might also be a sign of overfitting. To check the performance of a model, you should test it with new data, that is with observations not used to fit (train) the model.
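
A small experiment can make the overfitting warning tangible. The sketch below is built on assumptions: the data are synthetic (a hypothetical noisy straight line), the degrees 1 and 6 are arbitrary, and NumPy's polyfit is used in place of the scikit-learn pipeline to keep the block short.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Hypothetical noisy linear data on [0, 1]."""
    x = rng.uniform(0, 1, size=n)
    y = 2 * x + rng.normal(0, 0.2, size=n)
    return x, y

x_train, y_train = make_data(12)   # small training set
x_test, y_test = make_data(200)    # new observations not used for fitting

def mse(coeffs, x, y):
    """Mean squared error of the polynomial with the given coefficients."""
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

results = {}
for degree in (1, 6):
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    results[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_test, y_test))
    print(f'degree {degree}: train MSE = {results[degree][0]:.4f}, '
          f'test MSE = {results[degree][1]:.4f}')
```

The more flexible polynomial cannot do worse on the training data, so a lower training error alone says little; the comparison on the unseen test data is what indicates whether the extra complexity helps.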

# Polynomial Regression

- Implementing polynomial regression with scikit-learn is very similar to linear regression. There is only one extra step: you need to transform the array of inputs to include non-linear terms such as 𝑥².

[Wiki](https://en.wikipedia.org/wiki/Polynomial_regression)

In [None]:
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

In [None]:
X = np.arange(20).reshape(-1, 2)

In [None]:
y = np.arange(10)

In [None]:
poly_interaction = PolynomialFeatures(2)
poly_interaction

PolynomialFeatures(degree=2, include_bias=True, interaction_only=False,
                   order='C')

In [None]:
X_int = poly_interaction.fit_transform(X)
X_int

array([[  1.,   0.,   1.,   0.,   0.,   1.],
       [  1.,   2.,   3.,   4.,   6.,   9.],
       [  1.,   4.,   5.,  16.,  20.,  25.],
       [  1.,   6.,   7.,  36.,  42.,  49.],
       [  1.,   8.,   9.,  64.,  72.,  81.],
       [  1.,  10.,  11., 100., 110., 121.],
       [  1.,  12.,  13., 144., 156., 169.],
       [  1.,  14.,  15., 196., 210., 225.],
       [  1.,  16.,  17., 256., 272., 289.],
       [  1.,  18.,  19., 324., 342., 361.]])

In [None]:
# interaction_only=True keeps the interaction term x1*x2 but drops the squares x1**2 and x2**2
poly_nointeraction = PolynomialFeatures(interaction_only=True)
poly_nointeraction

PolynomialFeatures(degree=2, include_bias=True, interaction_only=True,
                   order='C')

In [None]:
X_noint = poly_nointeraction.fit_transform(X)

In [None]:
polymodel = LinearRegression()

In [None]:
modelint = polymodel.fit(X_int,y)
modelint

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

In [None]:
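# Use a separate estimator: refitting polymodel would overwrite modelint, which refers to the same object
modelnoint = LinearRegression().fit(X_noint, y)
modelnoint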

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

# Ridge Regression

Ridge regression addresses some of the problems of Ordinary Least Squares by imposing a penalty on the size of the coefficients. The ridge coefficients minimize a penalized residual sum of squares:

$$\min_{w} \lVert Xw - y \rVert_2^2 + \alpha \lVert w \rVert_2^2$$

The complexity parameter $\alpha \geq 0$ controls the amount of shrinkage: the larger the value of $\alpha$, the greater the shrinkage, and the more robust the coefficients become to collinearity.

[Tikhonov regularization](https://en.wikipedia.org/wiki/Tikhonov_regularization)
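
The shrinkage effect can be observed directly. The sketch below uses two nearly collinear synthetic predictors (an assumption made for illustration) and prints the coefficient norm for a few penalty strengths.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Two nearly collinear predictors (synthetic, illustrative only).
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1 + 0.01 * rng.normal(size=50)
X_col = np.column_stack([x1, x2])
y_col = x1 + x2 + 0.1 * rng.normal(size=50)

alphas = [0.01, 1.0, 100.0]
norms = []
for alpha in alphas:
    coef = Ridge(alpha=alpha).fit(X_col, y_col).coef_
    norms.append(np.linalg.norm(coef))
    print(f"alpha={alpha:>6}: coef={coef}, norm={norms[-1]:.3f}")
```

The norm of the coefficient vector falls as the penalty grows, which is the shrinkage described above.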

In [None]:
from sklearn import linear_model

In [None]:
ridge = linear_model.Ridge()

In [None]:
ridge = linear_model.Ridge(alpha=.5)
ridge

Ridge(alpha=0.5, copy_X=True, fit_intercept=True, max_iter=None,
      normalize=False, random_state=None, solver='auto', tol=0.001)

In [None]:
ridge.fit([[0, 0], [0, 0], [1, 1]], [0, .1, 1])

Ridge(alpha=0.5, copy_X=True, fit_intercept=True, max_iter=None,
      normalize=False, random_state=None, solver='auto', tol=0.001)

In [None]:
ridge.coef_

array([0.34545455, 0.34545455])

In [None]:
ridge.intercept_

0.13636363636363638

In [None]:
ridge1 = linear_model.RidgeCV(alphas=np.logspace(-6, 6, 13))
ridge1

RidgeCV(alphas=array([1.e-06, 1.e-05, 1.e-04, 1.e-03, 1.e-02, 1.e-01, 1.e+00, 1.e+01,
       1.e+02, 1.e+03, 1.e+04, 1.e+05, 1.e+06]),
        cv=None, fit_intercept=True, gcv_mode=None, normalize=False,
        scoring=None, store_cv_values=False)

In [None]:
ridge1.fit([[0, 0], [0, 0], [1, 1]], [0, .1, 1])

RidgeCV(alphas=array([1.e-06, 1.e-05, 1.e-04, 1.e-03, 1.e-02, 1.e-01, 1.e+00, 1.e+01,
       1.e+02, 1.e+03, 1.e+04, 1.e+05, 1.e+06]),
        cv=None, fit_intercept=True, gcv_mode=None, normalize=False,
        scoring=None, store_cv_values=False)

In [None]:
ridge1.alpha_

0.01

In [None]:
ridge1.intercept_

0.052357320099234994

In [None]:
ridge1.coef_

array([0.47146402, 0.47146402])

# LASSO Regression

The Lasso is a linear model that estimates sparse coefficients. It is useful in some contexts due to its tendency to prefer solutions with fewer non-zero coefficients, effectively reducing the number of features upon which the given solution is dependent. For this reason Lasso and its variants are fundamental to the field of compressed sensing. Under certain conditions, it can recover the exact set of non-zero coefficients.
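
The sparsity can be checked by counting non-zero coefficients. This is a sketch on a synthetic problem (20 features, 3 of them informative, which is an assumption for illustration) that compares Lasso with Ridge at the same penalty strength.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic problem (assumption): 20 features, only 3 of them informative.
X_sp, y_sp = make_regression(
    n_samples=100, n_features=20, n_informative=3, noise=5.0, random_state=0
)

lasso_sp = Lasso(alpha=1.0).fit(X_sp, y_sp)
ridge_sp = Ridge(alpha=1.0).fit(X_sp, y_sp)

n_lasso = int(np.sum(lasso_sp.coef_ != 0))
n_ridge = int(np.sum(ridge_sp.coef_ != 0))
print("non-zero coefficients, Lasso:", n_lasso)
print("non-zero coefficients, Ridge:", n_ridge)
```

Ridge shrinks every coefficient but keeps all of them, whereas Lasso sets some exactly to zero.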

In [None]:
from sklearn import linear_model

In [None]:
reg = linear_model.Lasso(alpha=0.1)

In [None]:
reg.fit([[0, 0], [1, 1]], [0, 1])

Lasso(alpha=0.1, copy_X=True, fit_intercept=True, max_iter=1000,
      normalize=False, positive=False, precompute=False, random_state=None,
      selection='cyclic', tol=0.0001, warm_start=False)

In [None]:
from sklearn.linear_model import LassoCV
from sklearn.datasets import make_regression
X, y = make_regression(noise=4, random_state=0) # Generate a random regression problem.
regl = LassoCV(cv=5, random_state=0).fit(X, y)

In [None]:
regl.alpha_

0.39641795520113104

In [None]:
regl.score(X, y)

0.9993566905623871

In [None]:
regl.predict(X[:1,])

array([-78.49519808])

# ElasticNet Regression

ElasticNet is a linear regression model trained with both L1- and L2-norm regularization of the coefficients. This combination allows for learning a sparse model where few of the weights are non-zero, like Lasso, while still maintaining the regularization properties of Ridge. We control the convex combination of L1 and L2 using the l1_ratio parameter.

Elastic-net is useful when there are multiple features which are correlated with one another. Lasso is likely to pick one of these at random, while elastic-net is likely to pick both.

A practical advantage of trading-off between Lasso and Ridge is that it allows Elastic-Net to inherit some of Ridge’s stability under rotation.
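
The behaviour on correlated features can be sketched with the extreme case of two identical columns. The data are synthetic, and the penalty values and tolerance are illustrative assumptions. Which column Lasso keeps depends on the solver's update order rather than on the data.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

# Two identical predictors (synthetic, the extreme case of correlation).
rng = np.random.default_rng(0)
x = rng.normal(size=50)
X_dup = np.column_stack([x, x])
y_dup = 3.0 * x + 0.1 * rng.normal(size=50)

lasso_dup = Lasso(alpha=0.5).fit(X_dup, y_dup)
# Tight tolerance so the two coefficients settle at their common value.
enet_dup = ElasticNet(alpha=0.5, l1_ratio=0.5, tol=1e-8, max_iter=10_000).fit(X_dup, y_dup)

print("Lasso     :", lasso_dup.coef_)
print("ElasticNet:", enet_dup.coef_)
```

Lasso puts all the weight on one copy, while the L2 part of the elastic-net penalty spreads the weight across both.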

In [None]:
from sklearn.linear_model import ElasticNet
from sklearn.datasets import make_regression

In [None]:
X, y = make_regression(n_features=2, noise = 4, random_state=0)
regr = ElasticNet(random_state=0)
regr.fit(X, y)

print(regr.coef_)

print(regr.intercept_)

print(regr.predict([[0, 0]]))

[18.69025284 64.8595966 ]
1.2357369852358757
[1.23573699]


In [None]:
from sklearn.linear_model import ElasticNetCV
from sklearn.datasets import make_regression

In [None]:
X, y = make_regression(n_features=2, random_state=0)
regr = ElasticNetCV(cv=5, random_state=0)
regr.fit(X, y)

print(regr.alpha_)

print(regr.intercept_)

print(regr.predict([[0, 0]]))

0.1994727942696716
0.3988829654276791
[0.39888297]


**sklearn.linear_model: Linear Models**

The sklearn.linear_model module implements a variety of linear models.


**Classical linear regressors**

    linear_model.LinearRegression(*[, …])

Ordinary least squares Linear Regression.

    linear_model.Ridge([alpha, fit_intercept, …])

Linear least squares with l2 regularization.

    linear_model.RidgeCV([alphas, …])

Ridge regression with built-in cross-validation.

    linear_model.SGDRegressor([loss, penalty, …])

Linear model fitted by minimizing a regularized empirical loss with SGD.

**Regressors with variable selection**

The following estimators have built-in variable selection fitting procedures, but any estimator using a L1 or elastic-net penalty also performs variable selection: typically SGDRegressor or SGDClassifier with an appropriate penalty.

    linear_model.ElasticNet([alpha, l1_ratio, …])

Linear regression with combined L1 and L2 priors as regularizer.

    linear_model.ElasticNetCV(*[, l1_ratio, …])

Elastic Net model with iterative fitting along a regularization path.

    linear_model.Lars(*[, fit_intercept, …])

Least Angle Regression model a.k.a. LAR.

    linear_model.LarsCV(*[, fit_intercept, …])

Cross-validated Least Angle Regression model.

    linear_model.Lasso([alpha, fit_intercept, …])

Linear Model trained with L1 prior as regularizer (aka the Lasso)

    linear_model.LassoCV(*[, eps, n_alphas, …])

Lasso linear model with iterative fitting along a regularization path.

    linear_model.LassoLars([alpha, …])

Lasso model fit with Least Angle Regression a.k.a. Lars.

    linear_model.LassoLarsCV(*[, fit_intercept, …])

Cross-validated Lasso, using the LARS algorithm.

    linear_model.LassoLarsIC([criterion, …])

Lasso model fit with Lars using BIC or AIC for model selection

    linear_model.OrthogonalMatchingPursuit(*[, …])

Orthogonal Matching Pursuit model (OMP)

    linear_model.OrthogonalMatchingPursuitCV(*)

Cross-validated Orthogonal Matching Pursuit model (OMP).


## R

### Import Libraries

In [None]:
#R Magic Command
%load_ext rpy2.ipython


In [None]:
%%R
library(tidyverse)

R[write to console]: ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──

R[write to console]: ✔ ggplot2 3.3.2     ✔ purrr   0.3.4
✔ tibble  3.0.3     ✔ dplyr   1.0.2
✔ tidyr   1.1.1     ✔ stringr 1.4.0
✔ readr   1.3.1     ✔ forcats 0.5.0

R[write to console]: ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()



# **Day 3 (PRO) - Modeling with Real Data (Preprocessing / Feature Engineering)**

## Python

### Import Libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

### Dataset

In [None]:
from sklearn import datasets
X, y = datasets.make_regression(n_samples = 1000, n_features = 20, n_targets=1, bias = 22, random_state=12)

### Exploratory Data Analysis

### Preprocessing

#### Data Conversion

#### Categorical Variable

#### Missing Value

#### Standardization

### Feature Engineering

#### Factor Extraction

#### Train Test Split

In [None]:
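from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=12)  # assumed 80/20 split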

### Model Building

### Model Evaluation

### Model Hyperparameter Tuning

### Model Selection

### Model Prediction

### Interpretation

## R

### Import Libraries

In [None]:
#R Magic Command
%load_ext rpy2.ipython

The rpy2.ipython extension is already loaded. To reload it, use:
  %reload_ext rpy2.ipython


In [None]:
%%R
library(tidyverse)

R[write to console]: ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──

R[write to console]: ✔ ggplot2 3.3.2     ✔ purrr   0.3.4
✔ tibble  3.0.3     ✔ dplyr   1.0.0
✔ tidyr   1.1.0     ✔ stringr 1.4.0
✔ readr   1.3.1     ✔ forcats 0.5.0

R[write to console]: ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()



# Libraries and Codes Used

## Python

In [None]:
from sklearn import datasets
X, y = datasets.make_regression(n_samples = 1000, n_features = 20, n_targets=1, bias = 22, random_state=12)

In [None]:
from sklearn.linear_model import LinearRegression
linear = LinearRegression()
linear

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

In [None]:
from sklearn.linear_model import Ridge
ridge = Ridge()
ridge

Ridge(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=None,
      normalize=False, random_state=None, solver='auto', tol=0.001)

In [None]:
from sklearn.linear_model import Lasso
lasso = Lasso()
lasso

Lasso(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=1000,
      normalize=False, positive=False, precompute=False, random_state=None,
      selection='cyclic', tol=0.0001, warm_start=False)

In [None]:
from sklearn.linear_model import ElasticNet
elastic = ElasticNet()
elastic

ElasticNet(alpha=1.0, copy_X=True, fit_intercept=True, l1_ratio=0.5,
           max_iter=1000, normalize=False, positive=False, precompute=False,
           random_state=None, selection='cyclic', tol=0.0001, warm_start=False)

In [None]:
linear.fit(X,y)
ridge.fit(X,y)
lasso.fit(X,y)
elastic.fit(X,y)

ElasticNet(alpha=1.0, copy_X=True, fit_intercept=True, l1_ratio=0.5,
           max_iter=1000, normalize=False, positive=False, precompute=False,
           random_state=None, selection='cyclic', tol=0.0001, warm_start=False)

In [None]:
linear.coef_
linear.intercept_
linear.predict(X)

In [None]:
from sklearn.linear_model import RidgeCV
ridgecv = RidgeCV()
ridgecv

RidgeCV(alphas=array([ 0.1,  1. , 10. ]), cv=None, fit_intercept=True,
        gcv_mode=None, normalize=False, scoring=None, store_cv_values=False)

In [None]:
from sklearn.linear_model import LassoCV
lassocv = LassoCV()
lassocv

LassoCV(alphas=None, copy_X=True, cv=None, eps=0.001, fit_intercept=True,
        max_iter=1000, n_alphas=100, n_jobs=None, normalize=False,
        positive=False, precompute='auto', random_state=None,
        selection='cyclic', tol=0.0001, verbose=False)

In [None]:
from sklearn.linear_model import ElasticNetCV
elasticcv = ElasticNetCV()
elasticcv

ElasticNetCV(alphas=None, copy_X=True, cv=None, eps=0.001, fit_intercept=True,
             l1_ratio=0.5, max_iter=1000, n_alphas=100, n_jobs=None,
             normalize=False, positive=False, precompute='auto',
             random_state=None, selection='cyclic', tol=0.0001, verbose=0)
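
The cross-validated estimators above only choose a penalty once `fit` is called. The sketch below fits all three on the same generator used in this section. The added `noise=10.0`, the explicit `cv=5` and the variable names `X_cv`, `y_cv` are assumptions made so that the choice of alpha has an effect.

```python
from sklearn import datasets
from sklearn.linear_model import ElasticNetCV, LassoCV, RidgeCV

# Same generator as above, with noise added (assumption) so that alpha matters.
X_cv, y_cv = datasets.make_regression(
    n_samples=1000, n_features=20, n_targets=1, bias=22, noise=10.0, random_state=12
)

ridgecv = RidgeCV(alphas=[0.1, 1.0, 10.0]).fit(X_cv, y_cv)
lassocv = LassoCV(cv=5, random_state=0).fit(X_cv, y_cv)
elasticcv = ElasticNetCV(cv=5, random_state=0).fit(X_cv, y_cv)

# alpha_ holds the penalty strength selected by cross-validation.
for name, est in [("RidgeCV", ridgecv), ("LassoCV", lassocv), ("ElasticNetCV", elasticcv)]:
    print(name, est.alpha_, round(est.score(X_cv, y_cv), 4))
```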

# Reference

[Statistical forecasting: notes on regression and time series analysis](http://people.duke.edu/~rnau/411home.htm)

# Interview Questions

In [None]:
# Q What does the y-intercept tell us in regression?
# The predicted value of Y when X = 0.(c)
#	The estimated change in average Y per unit change in X
#	The predicted value of Y
#	The variation around the line of regression

In [None]:
# Q What does the standard error of the estimate measure?
#	The total variation of the Y variable
#	The variation around the regression line (c)
#	The explained variation
#	The variation of the X variable

In [None]:
# Q What information is contained in the coefficient of determination?
# The coefficient of correlation is larger than one
#	Whether r has any significance
#	We should not partition the total variation
#	The proportion of total variation in Y that is explained by X (c)


In [None]:
# Q What do residuals represent?
# The difference between the actual Y values and the mean of Y.
#	The difference between the actual Y values and the predicted Y values. (c)
#	The square root of the slope.
#	The predicted value of Y for the average X value.

In [None]:
# Q Ridge Regression is a technique for analyzing multiple regression data that suffer from multicollinearity
# True (C)
# False

In [None]:
# Q Which statement about adjusted R² is correct?
# adjusted for the number of predictors in the model
# incorporates the model's degrees of freedom
# increases if the new term improves the model accuracy
# all of the above (C)

In [None]:
# Q What is the range of alpha in ridge regression
# 0 - 1
# 0 - 10
# 0 - 100
# 0 - inf (C)
# Explanation: setting alpha (λ) to 0 is the same as using OLS; the larger its value, the more strongly the size of the coefficients is penalized.
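
The two ends of this range can be checked numerically. This is a sketch on synthetic data (the dataset and the value `1e6` are illustrative assumptions). In practice the scikit-learn documentation advises using `LinearRegression` rather than `Ridge(alpha=0)` for numerical reasons.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge

X_a, y_a = make_regression(n_samples=200, n_features=5, noise=5.0, random_state=0)

ols = LinearRegression().fit(X_a, y_a)
ridge_zero = Ridge(alpha=0.0).fit(X_a, y_a)   # no penalty
ridge_big = Ridge(alpha=1e6).fit(X_a, y_a)    # very strong penalty

print(np.allclose(ols.coef_, ridge_zero.coef_))   # True
print(np.abs(ridge_big.coef_).max())              # close to 0
```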

In [None]:
# Q As alpha increases in ridge regression,
# the variance decreases and the bias increases (C)
# the variance increases and the bias decreases
# the variance decreases and the bias decreases
# the variance increases and the bias increases

In [None]:
# Q Ridge regression assumes the predictors are scaled to z-scores because
# scaling ensures that the penalty term penalizes each coefficient equally
# If the predictors are not standardized, their standard deviations are not all equal to one
# none of the above
# all of the above (C)
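
Why scaling matters for the penalty can be sketched by changing the units of one predictor. The data, the factor of 1000 and `alpha=10.0` are illustrative assumptions. `StandardScaler` converts each column to z-scores before the ridge fit.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_m = rng.normal(size=(100, 2))
y_m = X_m[:, 0] + X_m[:, 1] + 0.5 * rng.normal(size=100)

# Same data, but the first predictor is expressed in units 1000 times smaller.
X_mm = X_m * np.array([1000.0, 1.0])

# Without scaling, the ridge fit depends on the units of each predictor.
raw_a = Ridge(alpha=10.0).fit(X_m, y_m).predict(X_m)
raw_b = Ridge(alpha=10.0).fit(X_mm, y_m).predict(X_mm)

# With z-scores, both versions of the data give the same predictions.
std_a = make_pipeline(StandardScaler(), Ridge(alpha=10.0)).fit(X_m, y_m).predict(X_m)
std_b = make_pipeline(StandardScaler(), Ridge(alpha=10.0)).fit(X_mm, y_m).predict(X_mm)

print(np.allclose(raw_a, raw_b))  # False
print(np.allclose(std_a, std_b))  # True
```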

In [None]:
# Q Which option is correct?
# Least Absolute Shrinkage and Selection Operator penalizes the sum of the coefficients' absolute values (L1 penalty)
# ridge regression penalizes sum of squared coefficients (the so-called L2 penalty) 
# for high values of alpha many coefficients are exactly zeroed under lasso 
# for high values of alpha coefficients are never zero in ridge regression

In [None]:
# Q A candy bar manufacturer is interested in estimating how sales are influenced by the price of their product. They randomly choose 7 cities of similar characteristics and offer a candy bar at different prices in each of them, then collect the following data
# City = [Delhi, Mumbai, Hyderabad, Chennai, Pune, Bangalore, Chandigarh]
# Price = [2, 2.4, 3.2, 1.8, 2.2, 2.6, 1.9]
# Sales = [100, 80, 60, 120, 95, 65, 115]
#  If candy bar sales are the dependent variable (Y) and the company conducts a simple linear regression, what is the estimated slope parameter for the candy bar price and sales data?
from sklearn import linear_model
import numpy as np
reg = linear_model.LinearRegression()
X = np.array([2, 2.4, 3.2, 1.8, 2.2, 2.6, 1.9]).reshape(-1,1)
reg.fit(X, [100, 80, 60, 120, 95, 65, 115])
print(reg.coef_)
print(reg.intercept_)

# Q What are the predicted sales at a price of Rs 3?
y = reg.intercept_ + 3 * reg.coef_
y


[-45.07042254]
194.37625754527164


array([59.16498994])

In [None]:
# Q What are the coefficients of linear regression of X,y = ([[0, 0], [1, 1], [2, 2]], [0, 1, 2])

from sklearn import linear_model
reg = linear_model.LinearRegression()
reg.fit([[0, 0], [1, 1], [2, 2]], [0, 1, 2])

reg.coef_

array([0.5, 0.5])

In [None]:
# Q What is the coefficient of ridge regression of X,y = ([[0, 0], [0, 0], [1, 1]], [0, .1, 2]) at alpha = 1
from sklearn import linear_model
reg = linear_model.Ridge(alpha= 1)
reg.fit([[0, 0], [0, 0], [1, 1]], [0, .1, 2])

reg.coef_

array([0.55714286, 0.55714286])

In [None]:
# Q What does lasso regression predict at [1, 1] for X,y = ([[0, 0], [1, 1]], [0, 1]) at alpha = 0.1
from sklearn import linear_model
reg = linear_model.Lasso(alpha=0.1)
reg.fit([[0, 0], [1, 1]], [0, 1])

reg.predict([[1, 1]])

array([0.8])

In [None]:
# Q3. What are the coefficients of linear regression
from sklearn import linear_model
from sklearn import datasets
X, y = datasets.make_regression(n_samples = 1000, n_features = 2, n_targets=1, bias = 22, random_state=231)
reg = linear_model.LinearRegression()
reg.fit(X,y)
reg.coef_

array([19.96527514, 74.94024527])

In [None]:
# Q3. What are the coefficients of ridge regression
from sklearn import linear_model
from sklearn import datasets
X, y = datasets.make_regression(n_samples = 1000, n_features = 2, n_targets=1, bias = 22, random_state=111)
reg = linear_model.Ridge()
reg.fit(X,y)
reg.coef_

array([59.00652908,  3.52060773])