<h1 style="color:orange;font-family:courier" align="center">Time Series Analysis And Prediction</h1>
<h2 style="color:orange;font-family:verdana;" align="center">NIFTY-50 Stock Market</h2>
<h3 style="color:orange;" align="center">BAJAJ-AUTO</h3>

<h1 style="color:royalblue;">Introduction</h1>
Time series analysis is a statistical technique for data observed over a series of particular time periods or intervals, and it is often used for trend analysis. Data are commonly classified into three types:

* **Time series data:** A set of observations on the values that a variable takes at different times.

* **Cross-sectional data:** Data of one or more variables, collected at the same point in time.

* **Pooled data:** A combination of time series data and cross-sectional data.

<i style="color:royalblue;">Time Series Analysis is used for many applications such as:</i>
 * Economic Forecasting
 * Sales Forecasting
 * Budgetary Analysis
 * Stock Market Analysis
 * Yield Projections
 * Process and Quality Control
 * Inventory Studies
 * Workload Projections
 * Utility Studies
 * Census Analysis
     
<h2 style="color:royalblue;">Components of Time Series</h2>
Time series data consist of four components:

 * Trend Component: the variation that moves up or down in a reasonably predictable pattern over a long period.

 * Seasonality Component: the variation that is regular and periodic and repeats itself over a specific period such as a day, week, month, or season.

 * Cyclical Component: the variation that corresponds with business or economic 'boom-bust' cycles or follows its own peculiar cycles.

 * Random Component: the variation that is erratic or residual and does not fall under any of the above three classifications.

To make this concept clearer, here is a visual interpretation of the various components of a time series. You can view the original diagram with its context [here](https://www.atap.gov.au/tools-techniques/travel-demand-modelling/6-forecasting-evaluation).

![](https://kite.com/wp-content/uploads/2019/08/variations-of-time-series.jpg )
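
The components can also be separated numerically. The sketch below is a minimal illustration on synthetic data (not the BAJAJ-AUTO series), assuming an additive model. The variable names and the 12-step seasonal period are assumptions made for the example. It estimates the trend with a centred moving average and the seasonal pattern with per-position means, and treats what is left as the random component. The cyclical component is not modelled separately here.

```python
import numpy as np
import pandas as pd

# Synthetic series (assumption): linear trend + 12-step seasonality + small noise.
rng = np.random.default_rng(0)
period, n_obs = 12, 72
t = np.arange(n_obs)
true_trend = 100 + 0.5 * t
true_seasonal = 10 * np.sin(2 * np.pi * t / period)
series = pd.Series(true_trend + true_seasonal + rng.normal(scale=0.5, size=n_obs))

# Trend: centred 2x12 moving average, so one full seasonal cycle averages out.
trend = series.rolling(period, center=True).mean().rolling(2).mean().shift(-1)

# Seasonality: mean of the detrended values at each position within the cycle.
detrended = series - trend
seasonal = detrended.groupby(t % period).transform("mean")

# Random component: whatever the trend and seasonal estimates do not explain.
residual = series - trend - seasonal

print(pd.DataFrame({"trend": trend, "seasonal": seasonal, "residual": residual}).dropna().head())
```

The trend estimate is undefined (NaN) for the first and last half cycle, because the centred window needs observations on both sides.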
<h3 style="color:royalblue;">Traditional Techniques:</h3> The fitting of time series models can be an ambitious undertaking. There are many methods of model fitting including the following:
 * Box-Jenkins ARIMA models
 * Box-Jenkins Multivariate Models
 * Holt-Winters Exponential Smoothing (single, double, triple)

The user's application and preference will decide the selection of the appropriate technique. Covering all of these methods is beyond the scope of this notebook. The overview presented here starts by looking at some basic smoothing techniques:
 * Averaging Methods
 * Exponential Smoothing Techniques.
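
As a small illustration of the second smoothing technique listed above, here is a sketch of simple (single) exponential smoothing in plain Python. The function name `simple_exponential_smoothing` and the toy values are assumptions for the example. Initialising with the first observation is one common convention, not the only one.

```python
def simple_exponential_smoothing(values, alpha):
    """Smooth a sequence with s_t = alpha * x_t + (1 - alpha) * s_(t-1)."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    for x in values:
        if not smoothed:
            # Initialise with the first observation (one common convention).
            smoothed.append(float(x))
        else:
            smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


toy_prices = [10.0, 20.0, 30.0]  # made-up values
print(simple_exponential_smoothing(toy_prices, alpha=0.5))  # [10.0, 15.0, 22.5]
```

A larger `alpha` follows recent observations more closely, while a smaller `alpha` gives a smoother series.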
 
<h3 style="color:royalblue;">Modern Technique:</h3>
Tutorials for all of these techniques are given in this [notebook](https://www.kaggle.com/rohanrao/a-modern-time-series-tutorial) by @rohanrao:
* Auto ARIMAX
* Facebook Prophet
* LightGBM
* LSTM 

<h1 style="color:royalblue;">NIFTY 50</h1>
The NIFTY 50 index is the National Stock Exchange of India's benchmark broad-based stock market index for the Indian equity market. The full form of NIFTY is National Index Fifty. It represents the weighted average of 50 Indian company stocks in 13 sectors and is one of the two main stock indices used in India, the other being the BSE Sensex.

<p>Nifty is owned and managed by India Index Services and Products (IISL), which is a wholly owned subsidiary of the NSE Strategic Investment Corporation Limited. IISL had a marketing and licensing agreement with Standard and Poor's for co-branding equity indices until 2013. The Nifty 50 was launched 1 April 1996, and is one of the many stock indices of Nifty.</p>
**Source:** https://en.wikipedia.org/wiki/NIFTY_50

<h1 style="color:royalblue;">BAJAJ-AUTO</h1>
<h4 style="color:darkblue;">The Company</h4>
The Bajaj Group is amongst the top 10 business houses in India. Its footprint stretches over a wide range of industries, spanning automobiles (two wheelers manufacturer and three wheelers manufacturer), home appliances, lighting, iron and steel, insurance, travel and finance. The group's flagship company, Bajaj Auto, is ranked as the world's fourth largest three and two wheeler manufacturer and the Bajaj brand is well-known across several countries in Latin America, Africa, Middle East, South and South East Asia. Founded in 1926, at the height of India's movement for independence from the British, the group has an illustrious history. The integrity, dedication, resourcefulness and determination to succeed which are characteristic of the group today, are often traced back to its birth during those days of relentless devotion to a common cause. Jamnalal Bajaj, founder of the group, was a close confidant and disciple of Mahatma Gandhi. In fact, Gandhiji had adopted him as his son.
<p>In 2007, Bajaj Auto acquired a 14% stake in KTM that has since grown to 48%. This partnership catalysed Bajaj Auto’s endeavour to democratise motorcycle racing in India. Bajaj Auto today exclusively manufactures Duke range of KTM bikes and exports them worldwide. In FY2018, KTM was the fastest growing motorcycle brand in the country</p>

**Source: https://www.bajajauto.com/**
<h4 style="color:orange;">Before starting, I would like to thank @parulpandey and @rohanrao for some very amazing inspiration.</h4>
<h4> Kernel Inspiration:</h4>

* https://www.kaggle.com/parulpandey/getting-started-with-time-series-using-pandas

* https://www.kaggle.com/rohanrao/a-modern-time-series-tutorial

# 1. Importing Packages and Collecting Data

In [None]:
'''Import basic modules'''
import pandas as pd
import numpy as np
import datetime as dt
from datetime import datetime    
from pandas import Series 
import statsmodels.api as sm

'''import visualization'''
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style("whitegrid")
%matplotlib inline
import altair as alt # visualization

'''Display markdown formatted output like bold, italic bold etc.'''
from IPython.display import Markdown, display
def bold(string):
    display(Markdown(string))
    
'''Ignore deprecation and future, and user warnings.'''   
import warnings as wrn
wrn.filterwarnings('ignore', category = DeprecationWarning) 
wrn.filterwarnings('ignore', category = FutureWarning) 
wrn.filterwarnings('ignore', category = UserWarning)

In [None]:
"""Let's look at the Bajaj-Auto stock price dataset"""
stock = pd.read_csv("../input/nifty50-stock-market-data/BAJAJ-AUTO.csv")
stock.head()

In [None]:
"""Let's look at the data info"""
stock.info()

Now that the data has been loaded, let's take a look at its various columns for further analysis.

* **The Open and Close columns** indicate the opening and closing price of the stocks on a particular day.
* **The High and Low columns** provide the highest and the lowest price for the stock on a particular day, respectively.
* **The Volume column** tells us the total volume of stocks traded on a particular day.

The **volume weighted average price (VWAP)** is a trading benchmark used by traders that gives the average price a security has traded at throughout the day, based on both volume and price. It is important because it provides traders with insight into both the trend and value of a security. [source](https://www.investopedia.com/terms/v/vwap.asp)
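
To make the VWAP definition concrete, here is a minimal sketch that computes it from individual trades as the sum of price times volume divided by total volume. The function name `vwap` and the trade values are made up for illustration. The dataset already ships a precomputed `VWAP` column, so this is only to show the arithmetic.

```python
def vwap(prices, volumes):
    """Volume weighted average price: sum(price * volume) / sum(volume)."""
    total_volume = sum(volumes)
    if total_volume == 0:
        raise ValueError("total volume must be positive")
    return sum(p * v for p, v in zip(prices, volumes)) / total_volume


# Hypothetical trades for one day: most of the volume changed hands near 101.
trade_prices = [100.0, 102.0, 101.0]
trade_volumes = [10, 30, 60]
day_vwap = vwap(trade_prices, trade_volumes)
print(round(day_vwap, 2))  # 101.2
```

The plain average of the three prices is 101.0. The VWAP is pulled towards the prices at which more shares were traded.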

# Data Preparation

In [None]:
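stock.Date = pd.to_datetime(stock.Date, format="%Y-%m-%d")
stock["month"] = stock.Date.dt.month
# Series.dt.week was removed in pandas 2.0; isocalendar().week is the replacement
stock["week"] = stock.Date.dt.isocalendar().week.astype(int)
stock["day"] = stock.Date.dt.day
stock["day_of_week"] = stock.Date.dt.dayofweek
# fill missing numeric values with the column mean (non-numeric columns are skipped)
stock.fillna(stock.mean(numeric_only=True), inplace=True)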

stock.set_index("Date", drop=False, inplace=True)
stock.head()

# Exploratory Data Analysis
Let's explore the data and look at details at year, month and day level

# <font color="brown">Volume Weighted Average Price</font>

In [None]:
bars = alt.Chart(stock).mark_trail(color='orange').encode(
    x = 'Date:T',
    y = 'VWAP:Q',
).properties(
    title={
    "text":['Volume Weighted Average Price (VWAP)'],
    "subtitle":['There is a continuous increase in the VWAP till 2018 and a dip in 2019'],
    "fontSize":15,
    "fontWeight": 'bold',
    "font":'Courier New',
    }
)

text = bars.mark_text(
    align='left',
    baseline='middle',
    dx=3  # Nudges text to right so it doesn't appear on top of the bar
)   

(bars + text).properties( height=300, width=600)

In [None]:
vwap_df = stock[['VWAP']]
start_date = datetime(2017,1,1)
end_date = datetime(2018,12,1)
temp = vwap_df[(start_date <= vwap_df.index) & (vwap_df.index <= end_date)].reset_index()
bars = alt.Chart(temp).mark_trail(color='orange').encode(
    x = 'Date:T',
    y = 'VWAP:Q',
).properties(
    title={
    "text":['Trend of VWAP, 2017-2018'],
    "subtitle":['VWAP between 1 January 2017 and 1 December 2018'],
    "fontSize":15,
    "fontWeight": 'bold',
    "font":'Courier New',
    }
)

text = bars.mark_text(
    align='left',
    baseline='middle',
    dx=3  # Nudges text to right so it doesn't appear on top of the bar
)   

(bars + text).properties( height=300, width=600)

## <font color="brown">Open and Close Stock Price</font>

In [None]:
# set base to create custom legend and plots
base = alt.Chart(stock).transform_calculate(
legend1="'Close prices of stocks'",
legend2="'Open price of stock'",

)
scale = alt.Scale(domain=["Close prices of stocks", "Open price of stock"], range=['blue', 'violet', ])

# timeseries plot of close prices of stocks in blue colour
line1 = base.mark_line(color='blue').encode(
x = 'Date:T',
y = 'Close:Q',
color=alt.Color('legend1:N', scale=scale, title=''),
)

# timeseries plot of open prices of stocks in violet colour
line2 = base.mark_line(color='violet').encode(
x = 'Date:T',
y = 'Open:Q',
color=alt.Color('legend2:N', scale=scale, title='')
)

text = line1.mark_text(
    align='left',
    baseline='middle',
    dx=3  # Nudges text to right so it doesn't appear on top of the bar
)   


(line1 + line2 + text).properties(
    title={"text":['Timeseries Plot of Close and Open Price of Stock Over Year'],
           "fontSize":15,
           "fontWeight": 'bold',
           "font":'Courier New',},
    height=500, width=600
).interactive()

In [None]:
# set base to create custom legend and plots
base = alt.Chart(stock).transform_calculate(
legend1="'High prices of stocks'",
legend2="'Low price of stock'",

)
scale = alt.Scale(domain=["High prices of stocks", "Low price of stock"], range=['red', 'green', ])

# timeseries plot of high prices of stocks in red colour
line1 = base.mark_line(color='red').encode(
x = 'Date:T',
y = 'High:Q',
color=alt.Color('legend1:N', scale=scale, title=''),
)

# timeseries plot of low prices of stocks in green colour

line2 = base.mark_line(color='green').encode(
x = 'Date:T',
y = 'Low:Q',
color=alt.Color('legend2:N', scale=scale, title='')
)

text = line1.mark_text(
    align='left',
    baseline='middle',
    dx=3  # Nudges text to right so it doesn't appear on top of the bar
)   


(line1 + line2 + text).properties(
    title= {"text":['Timeseries Plot of High and Low Price of Stock Over Year'],
                   "fontSize":15,
                   "fontWeight": 'bold',
                   "font":'Courier New',},
    height=500, width=600
).interactive()

# <font color="brown">Moving Average (MA)</font>
The moving average (MA) is a simple technical analysis tool that smooths out price data by creating a constantly updated average price. The average is taken over a specific period of time, like 10 days, 20 minutes, 30 weeks or any time period the trader chooses. [source](https://www.investopedia.com/articles/active-trading/052014/how-use-moving-average-buy-stocks.asp#:~:text=The%20moving%20average%20(MA)%20is,time%20period%20the%20trader%20chooses.)
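
A minimal sketch of the calculation on toy numbers, using the same pandas `rolling(...).mean()` call as the cells in this section. The values are made up. The example also shows what `min_periods=1` changes: without it, the first `window - 1` rows are NaN because the window is not yet full.

```python
import pandas as pd

close = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])  # made-up closing prices

# Strict 3-period moving average: undefined until three values are available.
ma_strict = close.rolling(window=3).mean()

# With min_periods=1 the average is taken over however many values exist so far.
ma_partial = close.rolling(window=3, min_periods=1).mean()

print(pd.DataFrame({"close": close, "ma_strict": ma_strict, "ma_partial": ma_partial}))
```

With `min_periods=1` the early values are averages over fewer than `window` observations, so they are noisier than the rest of the line.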

#### MA of stock over the Weeks

In [None]:
# dataframe Moving Average for weeks 4, 16, 28, 40, 52 

df_ma = pd.DataFrame()
# Resample to weekly frequency by averaging the daily closes, giving one row per calendar week in the data
df_ma['Close'] = stock['Close'].resample('W').mean()

# calculating moving average
df_ma['weeks_4'] = df_ma['Close'].rolling(window = 4, min_periods = 1).mean()
df_ma['weeks_16'] = df_ma['Close'].rolling(window = 16, min_periods = 1).mean()
df_ma['weeks_28'] = df_ma['Close'].rolling(window = 28, min_periods = 1).mean()
df_ma['weeks_40'] = df_ma['Close'].rolling(window = 40, min_periods = 1).mean()
df_ma['weeks_52'] = df_ma['Close'].rolling(window = 52, min_periods = 1).mean()

df_ma

In [None]:
# set base to create custom legend and plots
base = alt.Chart(df_ma.reset_index()).transform_calculate(
legend1="'Close price'",
legend2="'MA of weeks 4'",
legend3="'MA of weeks 16'",
legend4="'MA of weeks 28'",
legend5="'MA of weeks 40'",
legend6="'MA of weeks 52'"
)
scale = alt.Scale(domain=["Close price", 
                          "MA of weeks 4",
                          "MA of weeks 16",
                          "MA of weeks 28",
                          "MA of weeks 40",
                          "MA of weeks 52"], 
                  range=['blue', 
                         'gold', 
                         'darkgreen', 
                         'slategray', 
                         'deeppink',
                         'firebrick'])

line1 = base.mark_line(color='blue').encode(
x = 'Date:T',
y = 'Close:Q',
color=alt.Color('legend1:N', scale=scale, title=''),
)

line2 = base.mark_line(color='gold').encode(
x = 'Date:T',
y = 'weeks_4:Q',
color=alt.Color('legend2:N', scale=scale, title='')
)

line3 = base.mark_line(color='darkgreen').encode(
x = 'Date:T',
y = 'weeks_16:Q',
color=alt.Color('legend3:N', scale=scale, title='')
)

line4 = base.mark_line(color='slategray').encode(
x = 'Date:T',
y = 'weeks_28:Q',
color=alt.Color('legend4:N', scale=scale, title='')
)

line5 = base.mark_line(color='deeppink').encode(
x = 'Date:T',
y = 'weeks_40:Q',
color=alt.Color('legend5:N', scale=scale, title='')
)

line6 = base.mark_line(color='firebrick').encode(
x = 'Date:T',
y = 'weeks_52:Q',
color=alt.Color('legend6:N', scale=scale, title='')
)

text = line1.mark_text(
    align='left',
    baseline='middle',
    dx=3  # Nudges text to right so it doesn't appear on top of the bar
)   


(line1 + line2 + line3 + line4 + line5 + line6 +text).properties(
    title={"text":['Moving Average Plot of Stock Over the Weeks'],
           "fontSize":15,
           "fontWeight": 'bold',
           "font":'Courier New',},
    height=500, width=600
).interactive()

#### MA of stock over the Days

In [None]:
# DataFrame.asfreq('D', method='pad') reindexes the data to calendar days and forward-fills the newly created rows.
# Source: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.asfreq.html
# The market is closed on Saturdays and Sundays, so Friday's close price is carried forward over those days.
stock_day = stock.asfreq('D', method ='pad')

df_rw = pd.DataFrame()
df_rw['Close'] = stock['Close']
df_rw['day_10'] = df_rw['Close'].rolling(window = 10, min_periods = 1).mean()
df_rw['day_50'] = df_rw['Close'].rolling(window = 50, min_periods = 1).mean()
df_rw

In [None]:
# set base to create custom legend and plots (daily moving averages from df_rw)
base = alt.Chart(df_rw.reset_index()).transform_calculate(
legend1="'Close price'",
legend2="'MA of day 10'",
legend3="'MA of day 50'",

)
scale = alt.Scale(domain=["Close price", 
                          "MA of day 10",
                          "MA of day 50",], 
                  range=['blue',   
                         'pink', 
                         'firebrick'])

line1 = base.mark_line(color='blue').encode(
x = 'Date:T',
y = 'Close:Q',
color=alt.Color('legend1:N', scale=scale, title=''),
)

line2 = base.mark_line(color='pink').encode(
x = 'Date:T',
y = 'day_10:Q',
color=alt.Color('legend2:N', scale=scale, title='')
)

line3 = base.mark_line(color='firebrick').encode(
x = 'Date:T',
y = 'day_50:Q',
color=alt.Color('legend3:N', scale=scale, title='')
)


text = line1.mark_text(
    align='left',
    baseline='middle',
    dx=3  # Nudges text to right so it doesn't appear on top of the bar
)   


(line1 + line2 + line3  +text).properties(
    title={"text":['Moving Average Plot of Stock Over the Days'],
           "fontSize":15,
           "fontWeight": 'bold',
           "font":'Courier New',},
    height=500, width=600
).interactive()

# <font color="brown">Autocorrelation</font>
Autocorrelation is a mathematical representation of the degree of similarity between a given time series and a lagged version of itself over successive time intervals. It is the same as calculating the correlation between two different time series, except autocorrelation uses the same time series twice: once in its original form and once lagged one or more time periods. [source](https://www.investopedia.com/terms/a/autocorrelation.asp) 
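
The definition above can be written out in a few lines of NumPy. This is a minimal sketch of one common sample estimator: lagged products of the mean-centred series divided by the total sum of squares. The function name `autocorrelation` and the toy series are assumptions for the example.

```python
import numpy as np


def autocorrelation(x, lag):
    """Sample autocorrelation of x with a copy of itself shifted by `lag` steps."""
    x = np.asarray(x, dtype=float)
    if not 0 <= lag < len(x):
        raise ValueError("lag must be between 0 and len(x) - 1")
    centered = x - x.mean()
    # Pair each value with the value `lag` steps later, then normalise.
    numerator = np.sum(centered[: len(x) - lag] * centered[lag:])
    return float(numerator / np.sum(centered ** 2))


toy = [1, 2, 3, 4, 5]  # made-up, steadily increasing series
print(autocorrelation(toy, 0))  # 1.0
print(autocorrelation(toy, 1))  # 0.4
```

At lag 0 the series is compared with itself, so the value is always 1. A trending series such as a stock price typically stays strongly correlated with its own recent past.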

In [None]:
from statsmodels.tsa.stattools import acf, pacf
# data for partial autocorrelation plot
lags = 50
source = pd.DataFrame({
    'lags': list(range(lags+1)),
    'PACF': pacf(stock["Close"], nlags=lags)
})

# plotting the partial autocorrelation (see https://www.statsmodels.org/dev/generated/statsmodels.graphics.tsaplots.plot_pacf.html)
bar = alt.Chart(source).mark_bar().encode(
    x='lags:Q',
    y='PACF:Q',

)
circle = alt.Chart(source).mark_circle(size = 50, color='red').encode(
    x='lags:Q',
    y='PACF:Q',

)
text = bar.mark_text(
    align='left',
    baseline='middle',
    dx=3  # Nudges text to right so it doesn't appear on top of the bar
)  

(bar + circle + text).properties(
    title={"text":['Partial AutoCorrelation Plot With 50 Lags of Stocks'],
                   "fontSize":15,
                   "fontWeight": 'bold',
                   "font":'Courier New',},
    height=500, width=600
)

# Facebook Prophet

Prophet follows the sklearn model API. We create an instance of the Prophet class and then call its fit and predict methods.

The input to Prophet is always a dataframe with two columns: ds and y. The ds (datestamp) column should be of a format expected by Pandas, ideally YYYY-MM-DD for a date or YYYY-MM-DD HH:MM:SS for a timestamp. The y column must be numeric, and represents the measurement we wish to forecast.
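
As a small illustration of that input format, the sketch below reshapes a toy frame into the `ds`/`y` layout using only pandas. The three rows are made-up values, not real BAJAJ-AUTO prices, and Prophet itself is not called here.

```python
import pandas as pd

# Made-up rows that mimic the Date and VWAP columns of the dataset.
raw = pd.DataFrame({
    "Date": ["2020-01-01", "2020-01-02", "2020-01-03"],
    "VWAP": [3180.5, 3165.2, 3172.9],
})

prophet_df = (
    raw.assign(Date=pd.to_datetime(raw["Date"], format="%Y-%m-%d"))  # parse datestamps
       [["Date", "VWAP"]]                                            # keep the two needed columns
       .rename(columns={"Date": "ds", "VWAP": "y"})                  # names Prophet expects
)
print(prophet_df.dtypes)
```

Any numeric column can play the role of `y`. This notebook forecasts `VWAP`.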

The data is split into training and validation sets:

* **train:** Data from 26th May, 2008 to 31st December, 2019.
* **valid:** Data from 1st January, 2020 onwards.

```Note that the default parameters are used for Prophet. They can be tuned to improve the results.```

In [None]:
df_train = stock[stock.Date < "2020"]
df_valid = stock[stock.Date >= "2020"]

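try:
    from prophet import Prophet    # package name since Prophet 1.0
except ImportError:
    from fbprophet import Prophet  # older releases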
model = Prophet()
model.fit(df_train[["Date", "VWAP"]].rename(columns={"Date": "ds", "VWAP": "y"}))

forecast = model.predict(df_valid[["Date", "VWAP"]].rename(columns={"Date": "ds", "VWAP": "y"}))

In [None]:
model.plot_components(forecast)

In [None]:
model.plot(forecast)
plt.title('Volume Weighted Average Price (VWAP) With Predicted Values', fontsize=15)
plt.show()

# <font color="brown">XGBoost Modeling and Forecasting</font>
### <font color="dodgerblue">Predicting next day stock price</font>

Preparing data for the regression: the model takes the previous sixty values as input and predicts the next value. We therefore prepare the data with the sixty previous values as 'X' and the current value as 'Y'.

We take the last 6 months, from 01-01-2020 to 30-06-2020, for testing and the remaining 2870 days for training.

Then we try to predict the next day, that is, 01-07-2020.
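
The windowing idea can be sketched on a toy series. The helper `make_windows` is a hypothetical name used only for this illustration, and the window length is 3 instead of 60 so the output is easy to read.

```python
import numpy as np


def make_windows(values, window):
    """Turn a 1-D series into (X, y): `window` previous values -> next value."""
    values = np.asarray(values, dtype=float)
    if len(values) <= window:
        raise ValueError("series must be longer than the window")
    X = np.array([values[i - window:i] for i in range(window, len(values))])
    y = values[window:]
    return X, y


toy_series = np.arange(10)  # 0, 1, ..., 9
X_toy, y_toy = make_windows(toy_series, window=3)
print(X_toy.shape, y_toy.shape)  # (7, 3) (7,)
print(X_toy[0], y_toy[0])        # [0. 1. 2.] 3.0
```

Each row of `X_toy` holds the values immediately before the corresponding entry of `y_toy`, so a regressor trained on these pairs learns a one-step-ahead prediction.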

In [None]:
# build sliding-window samples: 60 previous closes as features, the next close as target
def train_data(data):
    x = data['Close'].to_numpy()          # plain array, so indexing is positional
    X_train = []
    y_train = []
    for i in range(60, 2870):             # the first 2870 days are used for training
        X_train.append(x[i-60:i])
        y_train.append(x[i])
    X_train, y_train = np.array(X_train), np.array(y_train)
    return X_train, y_train

def test_data(data):
    x = data['Close'].to_numpy()[-182:]   # last 182 days (about six months)
    X_test = []
    y_test = []
    for i in range(60, 182):
        X_test.append(x[i-60:i])
        y_test.append(x[i])
    X_test, y_test = np.array(X_test), np.array(y_test)
    return X_test, y_test

In [None]:
# build the training and test windows
X_train, y_train = train_data(stock_day)

X_test, y_test = test_data(stock_day)

# shape of the input and output
X_train.shape, y_train.shape, X_test.shape, y_test.shape

In [None]:
# train test split plot
train_line = alt.Chart(stock[stock.Date < "2020"]['Close'].reset_index()).mark_line(color='blue').encode(
x = 'Date:T',
y = 'Close:Q',
)

test_line = alt.Chart(stock[stock.Date >= "2020"]['Close'].reset_index()).mark_line(color='orange').encode(
x = 'Date:T',
y = 'Close:Q',
)

(train_line + test_line ).properties(
        title='TRAIN/TEST SPLIT',
        height=200, width=600
    ) 

# <font color="brown">Creating Model</font>

In [None]:
import xgboost
from hyperopt import hp
from scipy.stats import uniform, randint
from sklearn.model_selection import cross_val_score, KFold
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from xgboost import plot_importance, plot_tree
from sklearn.metrics import mean_squared_error, mean_absolute_error

xgb = xgboost.XGBRegressor(random_state = 101, tree_method='gpu_hist')

In [None]:
# Train and evaluate the model on the train/test set
xgb.fit(X_train, y_train)
prediction = xgb.predict(X_test)
mae = np.round(mean_absolute_error(y_test, prediction), 5)
mse = np.round(mean_squared_error(y_test, prediction), 5)
print(' ')
print('Mean Absolute Error :',mae)
print('Mean Squared Error :',mse)

For both mean absolute error and mean squared error, smaller is better. It looks like the model does not perform well. However, a single train/test split has its drawbacks: it introduces bias because not all observations are used for testing, and it also reduces the size of the training data. To overcome this we can use a technique called cross validation, where all of the data is used for training and testing in turn, which may reduce the bias introduced by the train/test split. Among the different cross validation methods, we use k-fold cross validation. In sklearn, the function `cross_val_score` calculates the k-fold cross validation score.
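
The idea behind k-fold cross validation can be illustrated without any library. The following is a simplified sketch, not sklearn's implementation: the helper `kfold_indices` is a hypothetical name, and it splits the samples into contiguous folds without shuffling.

```python
import numpy as np


def kfold_indices(n_samples, n_splits):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    folds = np.array_split(np.arange(n_samples), n_splits)
    for i, val_idx in enumerate(folds):
        # Train on every fold except the one held out for validation.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, val_idx


splits = list(kfold_indices(n_samples=10, n_splits=5))
for train_idx, val_idx in splits:
    print("train:", train_idx, "validate:", val_idx)
```

The model is fitted once per fold and the scores are averaged, so every observation contributes to both training and validation.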

# Cross Validation

In [None]:
# kfold split
kfold = KFold(n_splits=10, shuffle=True)
val_score = cross_val_score(xgb, X_train, y_train, cv=kfold, n_jobs= -1, scoring = 'neg_mean_absolute_error')
val_score = (-1 * val_score)
val_score = np.round(val_score.mean(), 5)
print('CV Mean Absolute Error :', val_score)

The cross-validated mean absolute error of the model has decreased compared with the previous train/test split.

# Optimization of Hyperparameters

Let's start optimizing the hyperparameters of the model.

For optimization we use random search, in the hope of finding better hyperparameters and thus improving the model's accuracy. Are the default model parameters the best bet? Let's find out.
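
Random search itself is simple: draw parameter combinations at random, score each one, and keep the best. The sketch below shows the loop in plain Python. The names `random_search`, `toy_objective` and `toy_space` are hypothetical, and the objective is a made-up stand-in for a cross-validated error, not an actual model fit.

```python
import random


def random_search(objective, space, n_iter, seed=0):
    """Sample n_iter random candidates from `space` and keep the lowest score."""
    rng = random.Random(seed)
    best_candidate, best_value = None, float("inf")
    for _ in range(n_iter):
        candidate = {name: rng.choice(options) for name, options in space.items()}
        value = objective(candidate)
        if value < best_value:
            best_candidate, best_value = candidate, value
    return best_candidate, best_value


def toy_objective(p):
    # Stand-in for a validation error; smallest at max_depth=10, learning_rate=0.05.
    return (p["max_depth"] - 10) ** 2 + (p["learning_rate"] - 0.05) ** 2


toy_space = {"max_depth": [5, 10, 15, 20], "learning_rate": [0.01, 0.03, 0.05, 0.07]}
found_params, found_score = random_search(toy_objective, toy_space, n_iter=50)
print(found_params, found_score)
```

Unlike grid search, the number of evaluations is fixed by `n_iter` rather than by the size of the grid, which keeps the cost predictable when many hyperparameters are tuned.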

In [None]:
## Create a function to tune hyperparameters of the selected model.
def tune_hyperparameters(model, params, X_train, y_train):
    global best_params, best_score  # also exposed as globals for later cells
    ## Construct randomized search object with 5 fold cross validation
    kfold = KFold(n_splits=5, shuffle=True)
    regcv = RandomizedSearchCV(estimator=model, 
                               param_distributions=params, 
                               cv = kfold, 
                               verbose = 1, 
                               scoring = 'neg_mean_absolute_error', 
                               n_jobs = -1)
    regcv.fit(X_train, y_train)
    best_params = regcv.best_params_ 
    best_score = np.round((-1 * regcv.best_score_), 5)
    return best_params, best_score

In [None]:
# Define hyperparameters of XGBoost
params = {
        'n_estimators': randint(1000, 2000),
        "learning_rate": uniform(0.01, 0.06),
        'max_depth': [5, 10, 15, 20],
        'min_child_weight': [1, 5, 10, 15],
        'subsample': [0.7, 0.05, 0.1],
        'gamma': [0.1, 0.5, 0.05],
        'colsample_bytree': [0.1, 0.7, 0.05],
        'alpha' : [0.5, 1, 5],
        'lambda': [0.1, 1, 3],
}

# Define the model 
xgb_opt = xgboost.XGBRegressor(random_state = 101, tree_method='gpu_hist')

# Tuning for the Bajaj-Auto data
best_params, best_score = tune_hyperparameters(xgb_opt, params, X_train, y_train)
print('best params:{} & best_mae_score:{:0.5f}' .format(best_params, best_score))

# Retrain and Predict Using Best Hyperparameters

In [None]:
# Prepare the data for predicting the Close Value on 01-07-2020.

jul_x = stock_day['Close'][len(stock_day['Close'])-60:]
jul_X_test = []
jul_X_test.append(jul_x[0:])
jul_X_test = np.array(jul_X_test)
jul_X_test

In [None]:
# fitting and predicting with the optimized parameters
xgb_opt = xgboost.XGBRegressor(**best_params)

xgb_opt.fit(X_train, y_train)
# predicting the closing values for the test windows (last 6 months of data)
pred = xgb_opt.predict(X_test)
# predicting the closing price of 1 July 2020
jul_pred = xgb_opt.predict(jul_X_test)
print('')
print('Prediction of close value of BAJAJ-AUTO for 1st July 2020:', jul_pred)

In [None]:
pred.shape

In [None]:
stock_day['Close'][len(stock_day['Close'])-122:].shape

In [None]:
# Creating a dataframe of the test targets, their predictions, and the prediction for 1st July 2020

test_pred_df = stock_day['Close'][len(stock_day['Close'])-122:].reset_index()

# prediction value
test_pred_df['prediction'] = pred

# prediction value of 1 July 2020
new_row = {'Date': pd.to_datetime('2020-07-01'), 'jul_pred': float(jul_pred[0])}
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
test_pred_df = pd.concat([test_pred_df, pd.DataFrame([new_row])], ignore_index=True)

In [None]:
base = alt.Chart(test_pred_df).transform_calculate(
legend1="'Actual price of last 6 months'",
legend2="'Predicted price of last 6 months'",
legend3="'First July 2020 Close price Prediction'"
)
scale = alt.Scale(domain=['Actual price of last 6 months', 'Predicted price of last 6 months', 'First July 2020 Close price Prediction'], range=['blue', 'red', 'black'], zero=False)

# timeseries plot of actual close prices in blue colour
line1 = base.mark_line(color='blue').encode(
x = 'Date:T',
y = 'Close:Q',
color=alt.Color('legend1:N', scale=scale, title=''),
)

# timeseries plot of predicted close prices in red colour
line2 = base.mark_line(color='red').encode(
x = 'Date:T',
y = 'prediction:Q',
color=alt.Color('legend2:N', scale=scale, title='')
)


circle = base.mark_point(color='black',size=50).encode(
x = 'Date:T',
y = 'jul_pred:Q',
color=alt.Color('legend3:N', scale=scale, title='')
)

(line1 + line2 + circle).properties(
    title={"text":['First July Prediction'],
                   "fontSize":15,
                   "fontWeight": 'bold',
                   "font":'Courier New',},
    height=200, width=650
).interactive()

**We have successfully predicted the next day stock price.**

## <font color="teal">Please give me your feedback, and if you find my kernel helpful, an UPVOTE will be appreciated.</font>