In [8]:
# Linear regression: Linear regression is a basic and commonly used type of predictive analysis. Regression addresses
# two questions: (1) Does a set of predictor variables do a good job of predicting an outcome (dependent) variable?
# (2) Which variables in particular are significant predictors of the outcome variable, and how do they affect it,
# as indicated by the magnitude and sign of the beta estimates? These regression estimates
# describe the relationship between one dependent variable and one or more independent variables. The simplest form of the
# regression equation, with one dependent and one independent variable, is y = c + b*x, where y = estimated dependent
# variable score, c = constant (the intercept, written 𝑏₀ in the steps below), b = regression coefficient (the slope, 𝑏₁), and
# x = score on the independent variable.
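
# As a quick illustration of the formula above, the sketch below evaluates y = c + b*x in plain Python.
# The function name predict and the values c = 2.0 and b = 0.5 are made up for illustration; they are
# not estimates from the data used in this notebook.

```python
def predict(x, c, b):
    """Return the estimated dependent variable score y = c + b*x."""
    return c + b * x


# Hypothetical constant and regression coefficient, chosen only for illustration.
c = 2.0
b = 0.5

y_hat = predict(10, c, b)
print(y_hat)  # 2.0 + 0.5 * 10 = 7.0
```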

# Linear regression in Python 
# Step 1: Import packages and classes

# The first step is to import the package numpy and the class LinearRegression from sklearn.linear_model:
import numpy as np
from sklearn.linear_model import LinearRegression

In [9]:
# Step 2: Provide data

# The second step is defining data to work with. The inputs (regressors, 𝑥)
# and output (response, 𝑦) should be arrays (instances of the class numpy.ndarray) or
# similar objects. This is the simplest way of providing data for regression:
x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])

# Now you have two arrays: the input x and the output y. You call .reshape((-1, 1)) on x because scikit-learn
# requires the input to be two-dimensional, with one column per feature and one row per observation. Here that
# means one column and six rows. The output y can stay one-dimensional.
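
# To see what .reshape((-1, 1)) does, the short sketch below compares the shape of the array before and after
# the call. The names x_flat and x_col are illustrative and do not appear elsewhere in this notebook.

```python
import numpy as np

# One-dimensional array: six values, no explicit column axis.
x_flat = np.array([5, 15, 25, 35, 45, 55])

# -1 lets NumPy infer the number of rows; 1 fixes the number of columns.
x_col = x_flat.reshape((-1, 1))

print(x_flat.shape)  # (6,)
print(x_col.shape)   # (6, 1)
```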


In [10]:
# Step 3: Create a model and fit it

# The next step is to create a linear regression model and fit it using the existing data.

# Let’s create an instance of the class LinearRegression, which will represent the regression model:


model = LinearRegression()


# This statement creates the variable model as an instance of LinearRegression. You can provide several optional parameters to LinearRegression:

# fit_intercept is a Boolean (True by default) that decides whether to calculate the intercept 𝑏₀ (True) or consider it equal to zero (False).
# normalize is a Boolean (False by default) that decides whether to normalize the input variables (True) or not (False).
#   It exists only in older scikit-learn versions such as the one used here: it was deprecated in 1.0 and removed in 1.2.
# copy_X is a Boolean (True by default) that decides whether to copy (True) or overwrite the input variables (False).
# n_jobs is an integer or None (default) and represents the number of jobs used in parallel computation. None usually means one job and -1 means using all processors.
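
# As a sketch of how an optional parameter changes the model, the example below passes fit_intercept=False,
# which forces the fitted line through the origin. The variable name model_origin is illustrative; the data
# repeats the arrays from Step 2 so the block runs on its own.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])

# With fit_intercept=False the intercept is not estimated and is fixed at zero.
model_origin = LinearRegression(fit_intercept=False).fit(x, y)

print(model_origin.intercept_)  # 0.0
print(model_origin.coef_)       # single slope for the one input column
```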

In [11]:
# Step 3 (continued): With .fit(), you calculate the optimal values of the weights 𝑏₀ and 𝑏₁, using the existing input
# and output (x and y) as the arguments. In other words, .fit() fits the model. It returns self, which is the variable model itself.
model.fit(x, y)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
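
# Because .fit() returns the model itself, creating and fitting can be combined in one statement. The sketch
# below checks that the returned object is the same instance. The names returned and model_one_line are
# illustrative; the data repeats the arrays from Step 2 so the block runs on its own.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])

model = LinearRegression()
returned = model.fit(x, y)

# .fit() hands back the same object it was called on.
print(returned is model)  # True

# Equivalent one-line form: create and fit in a single statement.
model_one_line = LinearRegression().fit(x, y)
```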

In [14]:
#  Step 4: Get results

# Once you have your model fitted, you can get the results to check whether the model works satisfactorily and interpret it.

# You can obtain the coefficient of determination (𝑅²) with .score() called on model:
r_sq = model.score(x, y)
print('coefficient of determination:', r_sq)

coefficient of determination: 0.7158756137479542
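
# For a regressor, .score() returns 𝑅² = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and
# SS_tot is the total sum of squares around the mean of y. The sketch below recomputes that value by hand as a
# cross-check. The names y_pred, ss_res, ss_tot and r_sq_manual are illustrative; the data repeats the arrays
# from Step 2 so the block runs on its own.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])

model = LinearRegression().fit(x, y)

# Predictions of the fitted model on the training inputs.
y_pred = model.predict(x)

ss_res = np.sum((y - y_pred) ** 2)    # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares

r_sq_manual = 1 - ss_res / ss_tot
print(np.isclose(r_sq_manual, model.score(x, y)))  # True
```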
