# Basic Linear regression

In this example we manufacture our own data using a linear function.
Then we use `LinearRegression` from `sklearn` to create a model that predicts the result.
For more details see: [sklearn.linear_model.LinearRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html)

In [None]:
import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
from joblib import dump

### Generate the data

You can set any size here.

In [None]:
size = 10
original_data = np.array([[n, 2*n] for n in range(size)], dtype="float32")

In [None]:
original_data

We transpose the data so that each row holds one variable, which makes it easier to work with.

In [None]:
data = np.transpose(original_data.copy())
data
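As a minimal sketch of what the transpose does, the snippet below uses a hypothetical three-row array (`pairs`, not part of the notebook) in place of `original_data`. The point is only to show that the rows of pairs become two rows, one per variable.

```python
import numpy as np

# Hypothetical stand-in for original_data: each row is [n, 2*n].
pairs = np.array([[n, 2 * n] for n in range(3)], dtype="float32")
rows = np.transpose(pairs)

print(pairs.shape)  # one row per sample
print(rows.shape)   # one row per variable
print(rows[0])      # the n values
print(rows[1])      # the 2*n values
```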

Fix the seed so we can repeat the same example with the same random data.

In [None]:
np.random.seed(42)

First we'll use the data exactly as the linear function created it. Later we'll add a bit of noise to see how that impacts the results.

In [None]:
noise_level = 0
noise = np.random.random(size)
print(noise)
data[1] = data[1] + noise * noise_level
data
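To get a feeling for what `noise_level` does, here is a small sketch that applies a few illustrative levels to the same clean values. The names `clean_y` and `noisy_y` and the chosen levels are assumptions made for this example. Since `np.random.random` draws from the interval [0, 1), the largest shift is always smaller than `noise_level` itself.

```python
import numpy as np

np.random.seed(42)
size = 10

# Clean y values from the same linear function, y = 2*n.
clean_y = np.array([2 * n for n in range(size)], dtype="float32")
noise = np.random.random(size)  # uniform values in [0.0, 1.0)

for noise_level in (0, 1, 5):  # illustrative levels
    noisy_y = clean_y + noise * noise_level
    max_shift = np.max(np.abs(noisy_y - clean_y))
    print(noise_level, max_shift)
```

With `noise_level = 0` the values stay exactly as the linear function produced them.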

### Show the data

Let's plot the data to get a feeling for what it looks like.

In [None]:
plt.scatter(data[0], data[1]);

In [None]:
data[0]

The "features" or the "independent variables" are usually stored in a variable called X (capital letter).
In most cases there are many features and thus X is a matrix, but in our simplified case we only have one feature.
Therefore we need to reshape it into a matrix with a single column.

In [None]:
X = data[0].reshape((-1, 1))
X
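The following sketch shows the shape change on a hypothetical five-element array (`feature` and `X_demo` are names chosen for this example). The `-1` asks NumPy to work out the number of rows, and the `1` fixes the number of columns.

```python
import numpy as np

feature = np.arange(5, dtype="float32")  # one-dimensional, shape (5,)
X_demo = feature.reshape((-1, 1))        # two-dimensional, shape (5, 1)

print(feature.shape)
print(X_demo.shape)  # 5 samples, 1 feature each
```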

There are usually far fewer "results" or "dependent variables"; in many cases there is only one. The values are usually stored in the variable y.

In [None]:
y = data[1]
y

### Train the model

We are looking for a linear function like `y = ax+b` for which our cost (the sum of the squared differences between the predicted and the actual `y` values) is the smallest.
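As a rough illustration of the cost, the sketch below computes the sum of squared residuals for a few candidate lines on noise-free data. The `cost` helper and the candidate values are assumptions made for this example. It then uses `np.polyfit` with degree 1 as an independent least-squares fit to find the best `a` and `b`.

```python
import numpy as np

x = np.arange(10, dtype="float64")
y = 2 * x  # same linear function as in the notebook, without noise


def cost(a, b, x, y):
    """Sum of squared differences between a*x + b and the observed y."""
    return np.sum((a * x + b - y) ** 2)


# Illustrative candidates: only a=2, b=0 matches the data exactly.
for a, b in [(1.0, 0.0), (2.0, 1.0), (2.0, 0.0)]:
    print(a, b, cost(a, b, x, y))

# A degree-1 polynomial fit returns [slope, intercept].
a_fit, b_fit = np.polyfit(x, y, 1)
print(a_fit, b_fit)
```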

We start by creating an object to hold our model.

In [None]:
model = LinearRegression()

In [None]:
print(model)

Then we "train" the model by calling the `fit` method.

In [None]:
model.fit(X, y)

### Evaluate the model

`intercept_` is where the line crosses the `y` axis. (It is `b` in the above expression.)
This is approximately 0 in our case.

In [None]:
model.intercept_

`coef_` is how steep the line is. (It is `a` in the above expression.)

In [None]:
model.coef_
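To see both attributes with non-trivial values, here is a small sketch that fits a separate model on data from a different, hypothetical line, `y = 3x + 1`. The names `X_demo`, `y_demo`, and `demo_model` are chosen for this example only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X_demo = np.arange(10, dtype="float64").reshape((-1, 1))
y_demo = 3 * X_demo[:, 0] + 1  # hypothetical line: slope 3, intercept 1

demo_model = LinearRegression().fit(X_demo, y_demo)

print(demo_model.intercept_)  # close to 1, the b of this line
print(demo_model.coef_)       # one entry per feature, close to 3
```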

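The `score` method returns the [coefficient of determination](https://en.wikipedia.org/wiki/Coefficient_of_determination) (R²) of the prediction. A value of 1 means the model fits the data perfectly.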

In [None]:
print('coefficient of determination:', model.score(X, y))
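The value can also be computed by hand as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below does this on separately generated noisy data and compares the result with `score`. The data and the names `X_demo`, `y_demo`, and `demo_model` are assumptions for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X_demo = np.arange(10, dtype="float64").reshape((-1, 1))
y_demo = 2 * X_demo[:, 0] + rng.random(10)  # linear data plus some noise

demo_model = LinearRegression().fit(X_demo, y_demo)
predicted = demo_model.predict(X_demo)

ss_res = np.sum((y_demo - predicted) ** 2)      # residual sum of squares
ss_tot = np.sum((y_demo - y_demo.mean()) ** 2)  # total sum of squares
r2_manual = 1 - ss_res / ss_tot

print(r2_manual)
print(demo_model.score(X_demo, y_demo))
```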

In [None]:
model.predict([[10], [7]])

In [None]:
x1, x2 = min(data[0]), max(data[0]) # 0, size-1
y1, y2 = model.predict([[x1], [x2]])
plt.plot([x1, x2], [y1, y2], color="red");
plt.scatter(data[0], data[1]);

### Save the model

Save the model to a file using some kind of serialization. `joblib` is frequently used for this.

In [None]:
dump(model, 'linear.joblib')

Run the script `basic_linear_regression_predict.py` on the command line to see how to use the model.
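The contents of that script are not shown here, so the following is only a sketch of how a saved model might be loaded and used, not a copy of it. It trains a small stand-in model, saves it to a temporary directory to avoid touching the real `linear.joblib`, and restores it with `joblib.load`.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.linear_model import LinearRegression

# Stand-in model trained on the same kind of data, y = 2*x.
X_demo = np.arange(10, dtype="float64").reshape((-1, 1))
y_demo = 2 * X_demo[:, 0]
trained = LinearRegression().fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "linear.joblib")
    dump(trained, path)    # serialize the fitted model
    restored = load(path)  # read it back as a new object

predictions = restored.predict([[10], [7]])
print(predictions)
```

Note that a joblib file should only be loaded from a trusted source, because loading can execute arbitrary code.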