The main aim of linear regression is to predict a real-valued, continuous output Y from one or more inputs.
Here we will first generate a synthetic dataset from the equation **Y = MX + C**, and then move on to other sample datasets.

### Steps we will follow
1. Split the data into training data and test data
2. Use the training data to fit the model
3. Predict the output for the test data
4. Get the score
5. Get the intercept and coefficient

In [1]:
# Import Sklearn
from sklearn import linear_model

In [2]:
# Create the data
# We use a for loop to build the dataset
# X will hold the input data
X = []
# Y will hold the output
Y = []

# Set M and C to arbitrary values; after fitting, the model should recover them
M = 1.4356
C = 3.5678
for i in range(0, 200):
    X.append(i)
    y = M * i + C
    Y.append(y)
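
The same dataset can also be built without a loop. This is an alternative sketch, not how the notebook builds the data: it uses NumPy's `arange` and element-wise arithmetic and produces NumPy arrays (named `X_np` and `Y_np` here so they do not overwrite the lists above).

```python
import numpy as np

# Same M and C as in the loop above
M = 1.4356
C = 3.5678

# X = 0, 1, ..., 199 as a NumPy array
X_np = np.arange(200)
# Element-wise Y = M*X + C, no explicit loop needed
Y_np = M * X_np + C

print(X_np[:5])
print(Y_np[:5])
```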
    

In [3]:
# Check the generated X and Y
# Only show the first 5 values
X[0:5]

[0, 1, 2, 3, 4]

In [4]:
Y[0:5]

[3.5678, 5.0034, 6.439, 7.8746, 9.3102]

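Now that the data has been generated, we will divide it into training data and test data, holding out 30 percent for testing.
30% of 200 is 60, so 140 values go into the training set and 60 into the test set. Because we slice the lists rather than shuffle them, the test set holds the last 60 values (X from 140 to 199), so the model is tested on inputs larger than any it saw during training.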

In [5]:
X_train, X_test = X[0:140], X[140:]
Y_train, Y_test = Y[0:140], Y[140:]
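
scikit-learn also provides `train_test_split` in `sklearn.model_selection`. As a sketch, passing `shuffle=False` with `test_size=0.3` should reproduce the manual slice above (first 140 values for training, last 60 for testing); the default `shuffle=True` would instead pick the 60 test points at random. The data is rebuilt inside the snippet so it runs on its own.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Rebuild the synthetic data so this cell is self-contained
X_all = np.arange(200)
Y_all = 1.4356 * X_all + 3.5678

# shuffle=False keeps the original order, so the last 30% becomes the test set
X_tr, X_te, Y_tr, Y_te = train_test_split(X_all, Y_all, test_size=0.3, shuffle=False)

print(len(X_tr), len(X_te))
```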

In [6]:
# Check the length of each part to verify that the data was split correctly
print(f"Train data length {len(X_train)} {len(Y_train)}")
print(f"Test data length {len(X_test)} {len(Y_test)}")

Train data length 140 140
Test data length 60 60


In [7]:
# Create a LinearRegression model object
model = linear_model.LinearRegression()

The LinearRegression model has a method called **fit** which takes the inputs X and outputs Y and trains the model.
X must be a 2-D array of samples with one row per sample and one column per feature, i.e. shape (n_samples, n_features), while Y holds a single target value for each row.

#### For example
Suppose the equation is **Y = w0 + w1*X1 + w2*X2 + w3*X3**.
Each input sample then contains three values, **X1, X2, X3**,
and to feed this data to the model it must be arranged with one row per sample:

```
[
 [X1a, X2a, X3a],
 [X1b, X2b, X3b],
 [X1c, X2c, X3c],
 [X1d, X2d, X3d]
]
```

Here a, b, c and d each denote one row (sample) of the data. The same layout shown as a table, with the output Y in the last column:

|X1|X2|X3|Y|
|--|--|--|--|
|1|2|3|7|
|3|4|6|10|
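
To make the layout concrete, here is a small sketch with three features. The two table rows become a 2-D NumPy array of shape (2, 3). The equation used to generate the larger training set, Y = 1 + 2*X1 + 3*X2 + 4*X3, is a made-up example for illustration only; with noise-free data and more samples than features, `LinearRegression` should recover one weight per feature plus the intercept.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# The two rows of the table: one row per sample, one column per feature
X_table = np.array([[1, 2, 3],
                    [3, 4, 6]])
print(X_table.shape)  # (2, 3): 2 samples, 3 features

# Hypothetical equation Y = 1 + 2*X1 + 3*X2 + 4*X3 on 50 random samples
rng = np.random.default_rng(0)
X3 = rng.random((50, 3))              # shape (50, 3)
Y3 = 1 + X3 @ np.array([2, 3, 4])     # shape (50,)

model3 = LinearRegression().fit(X3, Y3)
print(model3.coef_)       # one weight per feature, close to [2, 3, 4]
print(model3.intercept_)  # close to 1
```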

So far X is a flat list:

[0, 1, 2, 3, 4, ...]

but the model needs a 2-D array with one feature value per row:

[[0], [1], [2], [3], [4], ...]

To convert the data we will use NumPy.

In [8]:
import numpy as np

In [9]:
X_train, X_test = np.array(X_train).reshape(-1,1),np.array(X_test).reshape(-1,1) 
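
What `reshape(-1, 1)` does can be checked on a small list: `-1` tells NumPy to infer the number of rows from the data, and `1` asks for a single column, giving one row per sample and one feature per row.

```python
import numpy as np

X_small = [0, 1, 2, 3, 4]
X_col = np.array(X_small).reshape(-1, 1)  # -1: infer rows, 1: one column

print(X_col.shape)     # (5, 1)
print(X_col.tolist())  # [[0], [1], [2], [3], [4]]
```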

In [10]:
# Now we will fit the data
model.fit(X_train, Y_train)

In [11]:
# Check whether the model has recovered the coefficient (M) and the intercept (C)
model.coef_

array([1.4356])

In [12]:
model.intercept_

3.5678000000000196

Comparing these with the values we set earlier (M = 1.4356, C = 3.5678), we can see that the model has recovered both M and C, up to floating-point rounding.
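
For a single feature, ordinary least squares (the problem `LinearRegression` solves) has a closed form: the slope is M = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and the intercept is C = ȳ − M·x̄. The sketch below applies that formula directly to the 140 training points as a cross-check; scikit-learn itself uses a general least-squares routine rather than this exact formula, but both should give the same line.

```python
import numpy as np

# Training inputs and outputs, same values as X_train / Y_train above
x = np.arange(140, dtype=float)
y = 1.4356 * x + 3.5678

x_mean, y_mean = x.mean(), y.mean()

# Closed-form ordinary least squares for one feature
M_hat = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
C_hat = y_mean - M_hat * x_mean

print(M_hat, C_hat)
```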

In [13]:
# Now predict the outputs for the test data
y_pred = model.predict(X_test)
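
For a linear model, `predict` amounts to multiplying the inputs by `coef_` and adding `intercept_`. The self-contained sketch below refits the same model and checks that this explicit computation matches `predict`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X_all = np.arange(200).reshape(-1, 1)
Y_all = 1.4356 * X_all.ravel() + 3.5678

lr = LinearRegression().fit(X_all[:140], Y_all[:140])

# predict() vs. the explicit linear formula X @ coef_ + intercept_
pred = lr.predict(X_all[140:])
manual = X_all[140:] @ lr.coef_ + lr.intercept_

print(np.allclose(pred, manual))  # True
```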

In [14]:
# Compare a predicted output with the actual test output to see how well the model has performed
# Here we pick an arbitrary index
print("Predicted output: " + str(y_pred[31]))
print("Actual output: " + str(Y_test[31]))

Predicted output: 249.05540000000002
Actual output: 249.0554
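
A single index only checks one point. As a sketch, the comparison can be extended to all 60 test points with `mean_absolute_error` from `sklearn.metrics` plus the largest absolute difference; for this noise-free data both should be tiny, limited only by floating-point rounding.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

X_all = np.arange(200).reshape(-1, 1)
Y_all = 1.4356 * X_all.ravel() + 3.5678
X_tr, X_te = X_all[:140], X_all[140:]
Y_tr, Y_te = Y_all[:140], Y_all[140:]

y_hat = LinearRegression().fit(X_tr, Y_tr).predict(X_te)

# Average and worst-case absolute difference over the whole test set
print(mean_absolute_error(Y_te, y_hat))
print(np.max(np.abs(Y_te - y_hat)))
```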


In [15]:
# Check the model score
# score() takes the test inputs and the actual test outputs, makes its own predictions internally,
# and returns the coefficient of determination R^2 (not a percentage accuracy)
# 1.0 means the predictions match the test outputs perfectly
model.score(X_test, Y_test)

1.0
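
`score` returns R² = 1 − SS_res / SS_tot, where SS_res is the sum of squared differences between actual and predicted outputs and SS_tot is the sum of squared differences between the actual outputs and their mean. R² can even be negative for a model that does worse than always predicting the mean, which is one reason it is not a percentage accuracy. The sketch below computes it by hand and compares it with `score` and `sklearn.metrics.r2_score`. The Gaussian noise added to Y (standard deviation 5, seed 0) is an assumption made only to show that noisy data gives a score below 1.0; the exact value depends on the noise.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X_all = np.arange(200).reshape(-1, 1)
# Hypothetical noisy version of Y = M*X + C (noise std 5 is arbitrary)
Y_noisy = 1.4356 * X_all.ravel() + 3.5678 + rng.normal(0, 5, size=200)

X_tr, X_te = X_all[:140], X_all[140:]
Y_tr, Y_te = Y_noisy[:140], Y_noisy[140:]
reg = LinearRegression().fit(X_tr, Y_tr)
y_hat = reg.predict(X_te)

# R^2 computed by hand
ss_res = np.sum((Y_te - y_hat) ** 2)
ss_tot = np.sum((Y_te - Y_te.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, reg.score(X_te, Y_te), r2_score(Y_te, y_hat))
```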