## Linear Regression-Based Algorithm

The most basic machine learning algorithm that can be applied to this data is linear regression.
A linear regression model produces an equation that describes the relationship between the independent variables and the dependent variable.

The equation for linear regression can be written as:

$$Y = W_1 X_1 + W_2 X_2 + \dots + W_n X_n$$

Here, $X_1, X_2, \dots, X_n$ represent the independent variables, while the coefficients $W_1, W_2, \dots, W_n$ represent the weights.
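
As a minimal sketch of this equation, the snippet below evaluates Y as a weighted sum with NumPy and then checks that scikit-learn's `LinearRegression` recovers the same weights from noise-free synthetic data. The weights, the sample size, and the names `true_w` and `demo_model` are made up for illustration and do not come from the stock dataset. `fit_intercept=False` is an assumption chosen so that the fitted model has exactly the form written above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical weights W1..W3 and 50 random samples of X1..X3 (illustration only)
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5])
X = rng.normal(size=(50, 3))

# Y = W1*X1 + W2*X2 + W3*X3, computed for every row at once
y = X @ true_w

# No intercept term, to mirror the equation as written
demo_model = LinearRegression(fit_intercept=False)
demo_model.fit(X, y)

print(demo_model.coef_)  # close to [2.0, -1.0, 0.5]
```

Because the synthetic targets contain no noise, least squares reproduces the weights up to floating-point error; on real price data the fit would only be approximate.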

We use the date column to extract features such as day, month, year, and day of the week (for example, Monday or Friday), and then fit a linear regression model on them.
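
To illustrate the idea on a small scale, the sketch below derives a few calendar features from three hand-picked dates using the pandas `.dt` accessor. The dates and the `Mon_fri` column name are assumptions for this example only. Note that `.dt.dayofweek` counts Monday as 0, whereas `isocalendar()` counts Monday as 1.

```python
import pandas as pd

# Hypothetical sample dates (not taken from the dataset)
dates = pd.Series(pd.to_datetime(['2018-01-01', '2018-03-30', '2018-12-31']))

features = pd.DataFrame({
    'Year': dates.dt.year,
    'Month': dates.dt.month,
    'Day': dates.dt.day,
    'Dayofweek': dates.dt.dayofweek,   # Monday=0 ... Sunday=6
    'Dayofyear': dates.dt.dayofyear,
    'Is_month_end': dates.dt.is_month_end,
    'Is_year_start': dates.dt.is_year_start,
})

# Flag Mondays and Fridays (assumed reading of the "mon/fri" feature)
features['Mon_fri'] = features['Dayofweek'].isin([0, 4])

print(features)
```

The vectorized accessor avoids an explicit Python loop, which may be preferable when the date column has a proper datetime dtype.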


In [None]:
# import packages
import pandas as pd
import numpy as np

# to plot within notebook
import matplotlib.pyplot as plt
%matplotlib inline

# setting figure size
from matplotlib.pylab import rcParams

rcParams['figure.figsize'] = 20, 10

# for normalizing data
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(0, 1))

# read the file
df = pd.read_csv('NSE-TATAGLOBAL.csv')

# print the head
df.head()


In [None]:
# setting index as date
df['Date'] = pd.to_datetime(df.Date,format='%Y-%m-%d')
df.index = df['Date']


In [None]:
# sorting
data = df.sort_index(ascending=True, axis=0)

In [None]:
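# creating a separate dataset with only the Date and Close columns
new_data = pd.DataFrame(index=range(0, len(data)), columns=['Date', 'Close'])

# data is indexed by date, so read rows by position with .iloc
# and write with .at to avoid chained assignment
for i in range(0, len(data)):
    new_data.at[i, 'Date'] = data['Date'].iloc[i]
    new_data.at[i, 'Close'] = data['Close'].iloc[i]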


In [None]:
total = len(new_data)
cut_length = 1687  # number of rows (oldest first) used for training; the rest are for validation
remain = total - cut_length
total, cut_length, remain


In [None]:
# attr = ['Year', 'Month', 'Week', 'Day', 'Dayofweek', 'Dayofyear',
#             'Is_month_end', 'Is_month_start', 'Is_quarter_end', 'Is_quarter_start', 'Is_year_end', 'Is_year_start']

year, month, week, day, day_of_week, day_of_year = [], [], [], [], [], []
Is_month_end, Is_month_start, Is_quarter_end, Is_quarter_start, Is_year_end, Is_year_start = [], [], [], [], [], []
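# pd.to_datetime ensures each item is a pandas Timestamp, whatever the column dtype
for ts in pd.to_datetime(new_data['Date']):
    # isocalendar() returns (ISO year, ISO week number, ISO weekday with Monday=1)
    _, dt_week, dt_weekday = ts.isocalendar()
    # use the calendar year so that Year stays consistent with Month and Dayofyear
    year.append(ts.year)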
    month.append(ts.month)
    week.append(dt_week)
    day.append(ts.day)
    day_of_week.append(dt_weekday)
    day_of_year.append(ts.timetuple().tm_yday)
    Is_month_end.append(ts.is_month_end)
    Is_month_start.append(ts.is_month_start)
    Is_quarter_end.append(ts.is_quarter_end)
    Is_quarter_start.append(ts.is_quarter_start)
    Is_year_end.append(ts.is_year_end)
    Is_year_start.append(ts.is_year_start)

print(len(year), len(month), len(week), len(day), len(day_of_week), len(day_of_year), len(Is_month_end),
      len(Is_month_start), len(Is_quarter_end), len(Is_quarter_start), len(Is_year_start), len(Is_year_end))


In [None]:
new_data['Year'] = year
new_data['Month'] = month
new_data['Week'] = week
new_data['Day'] = day
new_data['Dayofweek'] = day_of_week
new_data['Dayofyear'] = day_of_year
new_data['Is_month_end'] = Is_month_end
new_data['Is_month_start'] = Is_month_start
new_data['Is_quarter_end'] = Is_quarter_end
new_data['Is_quarter_start'] = Is_quarter_start
new_data['Is_year_end'] = Is_year_end
new_data['Is_year_start'] = Is_year_start

new_data.drop('Date', axis=1, inplace=True)  # the extracted features above replace the raw date


In [None]:
# split into train and validation
train = new_data[:cut_length]
valid = new_data[cut_length:]

new_data.shape, train.shape, valid.shape
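
Because the rows were sorted by date before the split, slicing by position keeps the oldest rows for training and the most recent rows for validation. The sketch below shows the same idea on a made-up ten-day series; `toy`, `toy_cut`, and the prices are hypothetical.

```python
import pandas as pd

# Hypothetical ten-day price series, already sorted by date
toy = pd.DataFrame({
    'Date': pd.date_range('2020-01-01', periods=10, freq='D'),
    'Close': [float(v) for v in range(100, 110)],
})

toy_cut = 8  # assumed cut point: first 8 rows train, last 2 validate
toy_train = toy.iloc[:toy_cut]
toy_valid = toy.iloc[toy_cut:]

# Every training date precedes every validation date,
# so no future data leaks into training
print(toy_train['Date'].max() < toy_valid['Date'].min())  # True
```

A random shuffle would mix future rows into the training set, which is why a positional split is the usual choice for time-ordered data.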


In [None]:
x_train = train.drop('Close', axis=1)
y_train = train['Close']
x_valid = valid.drop('Close', axis=1)
y_valid = valid['Close']


In [None]:
# y_train


In [None]:
# implement linear regression
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(x_train, y_train)
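
After fitting, the learned weights are available as `coef_`. scikit-learn's `LinearRegression` also fits an intercept by default, exposed as `intercept_`. The sketch below pairs each weight with its column name on a made-up table; `toy_x`, `toy_y`, and `toy_model` are hypothetical names, and the target is constructed as 3*a - 2*b + 5 so the result is easy to check.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up feature table and target (illustration only): target = 3*a - 2*b + 5
toy_x = pd.DataFrame({'a': [0.0, 1.0, 2.0, 3.0], 'b': [1.0, 0.0, 2.0, 1.0]})
toy_y = 3 * toy_x['a'] - 2 * toy_x['b'] + 5

toy_model = LinearRegression().fit(toy_x, toy_y)

# Pair each learned weight with its column name
weights = pd.Series(toy_model.coef_, index=toy_x.columns)
print(weights)
print(toy_model.intercept_)
```

The same pattern, applied to `model` and `x_train.columns`, would show which date features the fitted model weights most heavily.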


In [None]:
# make predictions and compute the root mean squared error (RMSE)
preds = model.predict(x_valid)
rms = np.sqrt(np.mean((np.asarray(y_valid, dtype=float) - preds) ** 2))
rms
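
The RMSE is the square root of the mean of the squared differences between the actual and predicted values. As a worked example with made-up numbers (not results from this dataset):

```python
import numpy as np

# Made-up targets and predictions, for illustration only
y_true = np.array([10.0, 12.0, 14.0, 16.0])
y_pred = np.array([11.0, 12.0, 13.0, 18.0])

errors = y_true - y_pred      # [-1, 0, 1, -2]
mse = np.mean(errors ** 2)    # (1 + 0 + 1 + 4) / 4 = 1.5
rmse = np.sqrt(mse)           # sqrt(1.5)

print(rmse)
```

Because the errors are squared before averaging, a single large miss (here the error of 2) contributes more to the RMSE than several small ones.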


In [None]:
# plot actual vs. predicted closing prices
valid = valid.copy()  # work on a copy to avoid pandas' SettingWithCopyWarning
valid['Predictions'] = preds

plt.plot(train['Close'], label='Train')
plt.plot(valid['Close'], label='Validation')
plt.plot(valid['Predictions'], label='Predictions')
plt.legend()
