# Linear Regression

Linear regression assumes a linear (straight-line) relationship between the input variables (X) and a single output variable (y).

More specifically, it assumes that the output (y) can be calculated from a linear combination of the input variables (X). When there is a single input variable, the method is called simple linear regression.

In simple linear regression, statistics computed from the training data (means, variances, and covariances) are enough to estimate the coefficients the model needs to make predictions on new data.

The line for a simple linear regression model can be written as follows, where b0 is the intercept and b1 is the slope:

y = b0 + b1 * x
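
The statistics-based estimate mentioned above can be sketched with NumPy. For simple linear regression, the least-squares slope is the sum of cross-deviations of x and y divided by the sum of squared deviations of x, and the intercept follows from the means. The sample values are the ones used in the scikit-learn example later in this document; the variable names are illustrative.

```python
import numpy as np

x = np.array([5, 15, 25, 35, 45, 55], dtype=float)
y = np.array([5, 20, 14, 32, 22, 38], dtype=float)

x_mean, y_mean = x.mean(), y.mean()

# Slope: sum of cross-deviations divided by the sum of squared x deviations
b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
# Intercept: chosen so the line passes through the point of means
b0 = y_mean - b1 * x_mean

print('b0:', round(b0, 4))
print('b1:', round(b1, 4))
```

These estimates agree with the intercept (about 5.63) and slope (0.54) that scikit-learn reports for the same data in Step 4.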

### Simple Linear Regression With Scikit-learn

There are five basic steps when you're implementing linear regression:

1. Import the packages and classes you need.
2. Provide data to work with and, if necessary, apply appropriate transformations.
3. Create a regression model and fit it with existing data.
4. Check the results of model fitting to know whether the model is satisfactory.
5. Apply the model for predictions.

These steps apply, with minor variations, to most regression approaches and implementations.

#### Step 1: Import packages and classes

In [1]:
import numpy as np
from sklearn.linear_model import LinearRegression

The class sklearn.linear_model.LinearRegression will be used to perform linear and polynomial regression and make predictions accordingly.

#### Step 2: Provide data

In [2]:
x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])

You have two arrays: the input x and the output y. You should call .reshape() on x because this array is required to be two-dimensional, or more precisely, to have one column and as many rows as necessary.

In [3]:
print(x)
print('x shape : ', x.shape)
print(y)
print('y shape : ', y.shape)

[[ 5]
 [15]
 [25]
 [35]
 [45]
 [55]]
x shape :  (6, 1)
[ 5 20 14 32 22 38]
y shape :  (6,)


#### Step 3: Create a model and fit it

Let's create an instance of the class LinearRegression, which will represent the regression model:

In [4]:
model = LinearRegression()

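LinearRegression accepts several optional parameters: fit_intercept, copy_X, n_jobs, and positive. Older releases also accepted normalize, which was deprecated in scikit-learn 1.0 and removed in 1.2.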
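
As a brief illustration of one of these parameters, the sketch below fits the same data with fit_intercept=False, which forces the line through the origin. The data values are repeated so the snippet runs on its own, and the name model_no_intercept is illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])

# fit_intercept=False fixes b0 at zero, so only the slope b1 is estimated
model_no_intercept = LinearRegression(fit_intercept=False).fit(x, y)

print('intercept: ', model_no_intercept.intercept_)
print('slope: ', model_no_intercept.coef_)
```

The default, fit_intercept=True, is what the rest of this walkthrough uses.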

In [5]:
model.fit(x, y)

LinearRegression()

With .fit(), you calculate the optimal values of the weights b0 and b1, using the existing input and output (x and y) as the arguments.

#### Step 4: Get results

Once you have your model fitted, you can get the results to check whether the model works satisfactorily and interpret it.

In [6]:
r_sq = model.score(x, y)
print('coefficient of determination: ', r_sq)

coefficient of determination:  0.7158756137479542
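
To make the meaning of this score concrete, here is a minimal sketch that recomputes the coefficient of determination by hand as 1 minus the ratio of the residual sum of squares to the total sum of squares. The names ss_res, ss_tot, and r_sq_manual are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])
model = LinearRegression().fit(x, y)

y_hat = model.predict(x)
ss_res = np.sum((y - y_hat) ** 2)     # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares
r_sq_manual = 1 - ss_res / ss_tot

print('manual R^2: ', r_sq_manual)
print('matches .score(): ', np.isclose(r_sq_manual, model.score(x, y)))
```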


The attributes of model are .intercept_, which represents the coefficient b0, and .coef_, which represents b1:

In [7]:
print('intercept: ', model.intercept_)
print('slope: ', model.coef_)

intercept:  5.633333333333329
slope:  [0.54]


Notice that .intercept_ is a scalar, while .coef_ is an array.

The value b0 = 5.63 means that your model predicts the response 5.63 when x is zero. The value b1 = 0.54 means that the predicted response rises by 0.54 when x is increased by one.

#### Step 5: Predict response

In [8]:
y_pred = model.predict(x)
print('predicted response: ', y_pred)

predicted response:  [ 8.33333333 13.73333333 19.13333333 24.53333333 29.93333333 35.33333333]


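A nearly identical way to predict the response is to multiply each element of x by model.coef_ and add model.intercept_. The only difference is that the result is a two-dimensional array, whereas .predict() returns a one-dimensional one: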

In [9]:
y_pred = model.intercept_ + model.coef_ * x
print('predicted response: ', y_pred)

predicted response:  [[ 8.33333333]
 [13.73333333]
 [19.13333333]
 [24.53333333]
 [29.93333333]
 [35.33333333]]
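
In practice, the fitted model is usually applied to inputs it has not seen. The sketch below assumes a hypothetical array x_new of new inputs (the values 0 through 4 are arbitrary) and also checks that the manual formula gives the same numbers once it is flattened to one dimension.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])
model = LinearRegression().fit(x, y)

# Hypothetical new inputs; like x, they must be two-dimensional (one column)
x_new = np.arange(5).reshape((-1, 1))
y_new = model.predict(x_new)
print('new inputs: ', x_new.ravel())
print('predicted response: ', y_new)

# The manual formula yields a column; .ravel() flattens it to match .predict()
y_manual = (model.intercept_ + model.coef_ * x_new).ravel()
print('manual formula agrees: ', np.allclose(y_manual, y_new))
```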


### Multiple Linear Regression With Scikit-learn

#### Steps 1 and 2: Import packages and classes, and provide data

In [10]:
x = [[0, 1], [5, 1], [15, 2], [25, 5], [35, 11], [45, 15], [55, 34], [60, 35]]
y = [4, 5, 20, 14, 32, 22, 38, 43]
x, y = np.array(x), np.array(y)

In [11]:
print(x)

[[ 0  1]
 [ 5  1]
 [15  2]
 [25  5]
 [35 11]
 [45 15]
 [55 34]
 [60 35]]


In [12]:
print(y)

[ 4  5 20 14 32 22 38 43]


In multiple linear regression, x is a two-dimensional array with at least two columns, while y is usually a one-dimensional array.

#### Step 3: Create a model and fit it

In [13]:
model = LinearRegression().fit(x, y)

#### Step 4: Get results

In [14]:
r_sq = model.score(x, y)
print('coefficient of determination: ', r_sq)
print('intercept: ', model.intercept_)
print('slope: ', model.coef_)

coefficient of determination:  0.8615939258756775
intercept:  5.52257927519819
slope:  [0.44706965 0.25502548]


In this example, the intercept is approximately 5.52, which is the value of the predicted response when x1 = x2 = 0. Increasing x1 by 1 while holding x2 fixed raises the predicted response by about 0.45. Similarly, increasing x2 by 1 while holding x1 fixed raises the response by about 0.26.
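
This interpretation can be checked directly. The sketch below predicts at three hypothetical probe points (the origin and one unit step along each input) and compares the differences with the fitted coefficients. The names probes, at_origin, step_x1, and step_x2 are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([[0, 1], [5, 1], [15, 2], [25, 5],
              [35, 11], [45, 15], [55, 34], [60, 35]])
y = np.array([4, 5, 20, 14, 32, 22, 38, 43])
model = LinearRegression().fit(x, y)

# Probe points: the origin, then one unit step along each input
probes = np.array([[0, 0], [1, 0], [0, 1]])
at_origin, step_x1, step_x2 = model.predict(probes)

print('prediction at x1 = x2 = 0: ', at_origin)
print('change for +1 in x1: ', step_x1 - at_origin)
print('change for +1 in x2: ', step_x2 - at_origin)
```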

#### Step 5: Predict response

In [15]:
y_pred = model.predict(x)
print('predicted response: ', y_pred)

predicted response:  [ 5.77760476  8.012953   12.73867497 17.9744479  23.97529728 29.4660957
 38.78227633 41.27265006]
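
As in the simple case, the prediction is just the linear combination of the inputs. A minimal sketch, assuming the same data as above, uses the matrix product x @ model.coef_ to multiply each column by its weight and sum the results. The name y_manual is illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([[0, 1], [5, 1], [15, 2], [25, 5],
              [35, 11], [45, 15], [55, 34], [60, 35]])
y = np.array([4, 5, 20, 14, 32, 22, 38, 43])
model = LinearRegression().fit(x, y)

# b0 + b1 * x1 + b2 * x2 for every row, written as a matrix-vector product
y_manual = model.intercept_ + x @ model.coef_

print('predicted response: ', y_manual)
print('matches .predict(): ', np.allclose(y_manual, model.predict(x)))
```

Because x @ model.coef_ is one-dimensional, the result has the same shape as the output of .predict().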


### Polynomial Regression With Scikit-learn

#### Step 1: Import packages and classes

In addition to numpy and sklearn.linear_model.LinearRegression, you should also import the class PolynomialFeatures from sklearn.preprocessing:

In [16]:
from sklearn.preprocessing import PolynomialFeatures

#### Step 2: Provide data

In [17]:
x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([15, 11, 2, 8, 25, 32])

In [18]:
transformer = PolynomialFeatures(degree = 2, include_bias = False)

* degree is an integer that represents the degree of the polynomial regression function
* include_bias is a Boolean that decides whether to include the bias column of ones or not
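
The effect of include_bias is easiest to see on a tiny input. The sketch below uses a hypothetical two-row array x_demo and compares the transformed output with and without the bias column.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

x_demo = np.array([2, 3]).reshape((-1, 1))

without_bias = PolynomialFeatures(degree=2, include_bias=False).fit(x_demo).transform(x_demo)
with_bias = PolynomialFeatures(degree=2, include_bias=True).fit(x_demo).transform(x_demo)

print(without_bias)  # columns: x, x^2
print(with_bias)     # columns: 1, x, x^2
```

Since LinearRegression fits its own intercept by default, the extra column of ones is not needed here, which is why include_bias = False is used in this example.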

Before applying transformer, you need to fit it with .fit():

In [19]:
transformer.fit(x)

PolynomialFeatures(include_bias=False)

Once transformer is fitted, it's ready to create a new, modified input.

In [20]:
x_pln = transformer.transform(x)

That's the transformation of the input array with .transform(). It takes the input array as the argument and returns the modified array.

You can also use .fit_transform() to fit the transformer and transform the input in a single call.
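
A minimal sketch of the one-call form, checked against the two-step version. The names two_step and one_step are illustrative.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))

# Two steps: fit the transformer, then transform the input
two_step = PolynomialFeatures(degree=2, include_bias=False).fit(x).transform(x)
# One step: fit and transform together
one_step = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)

print('same result: ', np.array_equal(two_step, one_step))
```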

In [21]:
print(x_pln)

[[   5.   25.]
 [  15.  225.]
 [  25.  625.]
 [  35. 1225.]
 [  45. 2025.]
 [  55. 3025.]]


#### Step 3: Create a model and fit it

In [22]:
model = LinearRegression().fit(x_pln, y)

#### Step 4: Get results

In [23]:
r_sq = model.score(x_pln, y)
print('coefficient of determination: ', r_sq)
print('intercept: ', model.intercept_)
print('coefficients: ', model.coef_)

coefficient of determination:  0.8908516262498563
intercept:  21.37232142857144
coefficients:  [-1.32357143  0.02839286]
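
These three numbers define the fitted polynomial y = b0 + b1 * x + b2 * x^2. The sketch below evaluates that polynomial by hand and, as an independent cross-check, compares the coefficients with NumPy's np.polyfit, which solves the same least-squares problem. The names x_flat and y_manual are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([15, 11, 2, 8, 25, 32])

x_pln = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)
model = LinearRegression().fit(x_pln, y)
b0 = model.intercept_
b1, b2 = model.coef_

# Evaluate the fitted polynomial by hand
x_flat = x.ravel()
y_manual = b0 + b1 * x_flat + b2 * x_flat ** 2
print('matches .predict(): ', np.allclose(y_manual, model.predict(x_pln)))

# np.polyfit returns coefficients from the highest power down: [b2, b1, b0]
print('matches np.polyfit: ', np.allclose(np.polyfit(x_flat, y, 2), [b2, b1, b0]))
```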


#### Step 5: Predict response

In [24]:
y_pred = model.predict(x_pln)
print('predicted response: ', y_pred)

predicted response:  [15.46428571  7.90714286  6.02857143  9.82857143 19.30714286 34.46428571]
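
The transform and fit steps can also be chained so that new inputs do not have to be transformed by hand. The sketch below is one possible arrangement using scikit-learn's make_pipeline; the name pipeline and the new input values in x_new are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([15, 11, 2, 8, 25, 32])

# The pipeline applies PolynomialFeatures, then LinearRegression
pipeline = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
)
pipeline.fit(x, y)

# Raw, untransformed inputs can be passed straight to .predict()
x_new = np.array([10, 20, 30]).reshape((-1, 1))
print('predicted response: ', pipeline.predict(x_new))
```

Calling pipeline.predict(x) on the original inputs reproduces the predicted response shown above.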
