In [1]:
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

In [2]:
# Call .reshape() on x because this array must be two-dimensional:
# one column and as many rows as necessary.
# That's exactly what the argument (-1, 1) of .reshape() specifies.
x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])
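
To see what the reshape does, here is a minimal sketch comparing the array shapes before and after. The names x_flat and x_col are illustrative and do not appear elsewhere in this notebook.

```python
import numpy as np

# Hypothetical names for illustration: x_flat is the raw 1-D array,
# x_col is the same data arranged as a single column.
x_flat = np.array([5, 15, 25, 35, 45, 55])
x_col = x_flat.reshape((-1, 1))

print(x_flat.shape)  # (6,)
print(x_col.shape)   # (6, 1)
```

The -1 tells NumPy to infer the number of rows from the array's length, so the same call works for any number of observations.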

In [3]:
# Create an instance of the class LinearRegression,
# which represents the regression model to be fitted to the existing data.
model = LinearRegression()

You can provide several optional parameters to LinearRegression:
    1. fit_intercept is a Boolean (True by default) that decides whether to calculate the intercept 𝑏₀ (True) or consider it equal to zero (False).
    2. normalize is a Boolean (False by default) that decides whether to normalize the input variables (True) or not (False). This parameter was deprecated in scikit-learn 1.0 and removed in 1.2, so it is only available in older versions.
    3. copy_X is a Boolean (True by default) that decides whether to copy (True) or overwrite the input variables (False).
    4. n_jobs is an integer or None (default) and represents the number of jobs used in parallel computation. None usually means one job, and -1 means using all processors.
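
As a rough illustration of the first parameter, the sketch below fits the same data with fit_intercept=False, which forces the line through the origin. The name model_origin is hypothetical and used only here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])

# Illustrative name: a model constrained to pass through the origin.
model_origin = LinearRegression(fit_intercept=False).fit(x, y)

# No intercept is estimated, so .intercept_ is reported as 0.0
print('intercept:', model_origin.intercept_)
# The slope differs from the default fit because the line must pass through (0, 0)
print('slope:', model_origin.coef_)
```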

In [4]:
# To start using the model, first call .fit() on it:

model.fit(x, y)

LinearRegression()

With .fit(), you calculate the optimal values of the weights 𝑏₀ and 𝑏₁, using the existing input and output (x and y) as the arguments. In other words, .fit() fits the model. It returns self, which is the variable model itself. That's why you can replace the last two statements with this one:

In [5]:
model = LinearRegression().fit(x, y)

Once you have your model fitted, you can get the results to check whether the model works satisfactorily and interpret it.

In [6]:
# Obtain the coefficient of determination (𝑅²) with .score() called on model:
r_sq = model.score(x, y)
print('coefficient of determination:', r_sq)

coefficient of determination: 0.7158756137479542


When you're applying .score(), the arguments are also the predictor x and the response y, and the return value is 𝑅².
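
For reference, 𝑅² can also be computed by hand as one minus the ratio of the residual sum of squares to the total sum of squares. The following is a sketch under that standard definition; the names y_hat, ss_res, ss_tot, and r_sq_manual are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])
model = LinearRegression().fit(x, y)

y_hat = model.predict(x)

# Residual sum of squares: variation the model leaves unexplained
ss_res = np.sum((y - y_hat) ** 2)
# Total sum of squares: variation of y around its mean
ss_tot = np.sum((y - y.mean()) ** 2)

r_sq_manual = 1 - ss_res / ss_tot
print('manual R^2:', r_sq_manual)
print('model.score:', model.score(x, y))
```

Both values should agree up to floating-point rounding.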

The attributes of model are .intercept_, which represents the intercept 𝑏₀, and .coef_, which represents the slope 𝑏₁:

In [7]:
print('intercept:', model.intercept_)
print('slope:', model.coef_)

intercept: 5.633333333333329
slope: [0.54]


The code above illustrates how to get 𝑏₀ and 𝑏₁. Note that .intercept_ is a scalar, while .coef_ is an array with one entry per input variable.
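
For simple linear regression with a single input, the same two values can be checked against the textbook least-squares formulas. This is a sketch of those formulas in NumPy, not a description of how scikit-learn computes them internally; b0 and b1 are illustrative names.

```python
import numpy as np

x = np.array([5, 15, 25, 35, 45, 55])
y = np.array([5, 20, 14, 32, 22, 38])

x_mean, y_mean = x.mean(), y.mean()

# Slope: covariation of x and y divided by the variation of x
b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
# Intercept: the fitted line passes through the point of means
b0 = y_mean - b1 * x_mean

print('intercept:', b0)
print('slope:', b1)
```

The results should match model.intercept_ and model.coef_[0] up to floating-point rounding.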

In [8]:
# Once there is a satisfactory model, you can use it for predictions with either existing or new data.
# To obtain the predicted response, use .predict():

In [9]:
y_pred = model.predict(x)
print('predicted response:', y_pred, sep='\n')

predicted response:
[ 8.33333333 13.73333333 19.13333333 24.53333333 29.93333333 35.33333333]


In [10]:
# A nearly identical way to predict the response is to apply the fitted equation directly.
# The result is two-dimensional because x is, whereas .predict() returns a one-dimensional array.
y_pred = model.intercept_ + model.coef_ * x
print('predicted response:', y_pred, sep='\n')

predicted response:
[[ 8.33333333]
 [13.73333333]
 [19.13333333]
 [24.53333333]
 [29.93333333]
 [35.33333333]]
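
The two outputs above hold the same numbers but differ in shape. A minimal sketch of the difference, and of one way to get a one-dimensional result from the manual formula (the names y_manual and y_manual_flat are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])
model = LinearRegression().fit(x, y)

y_pred = model.predict(x)                      # one-dimensional
y_manual = model.intercept_ + model.coef_ * x  # broadcasts against the (6, 1) array x

# A matrix-vector product collapses the column dimension
y_manual_flat = model.intercept_ + x @ model.coef_

print(y_pred.shape)         # (6,)
print(y_manual.shape)       # (6, 1)
print(y_manual_flat.shape)  # (6,)
```

The matrix-product form also generalizes to several input variables, where element-wise multiplication would not.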


In practice, regression models are often applied for forecasts. This means that you can use fitted models to calculate the outputs based on some other, new inputs as shown below:


In [14]:
x_new = np.arange(5).reshape((-1, 1))
y_new = model.predict(x_new)
print(y_new)

[5.63333333 6.17333333 6.71333333 7.25333333 7.79333333]
