In [None]:
!pip install git+https://github.com/ourownstory/neural_prophet.git  
!pip install livelossplot    # needed for the plot_live_loss parameter of NeuralProphet's fit(),
# which draws live training and validation loss plots.

# **Important note**
**This is my first time series analysis project that uses predictors, and I am new to this topic. If you have any suggestions on how to improve the algorithms or the EDA used here, please leave a comment. It will be highly appreciated.**

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
from matplotlib import pyplot as plt
%matplotlib inline
import seaborn as sns
from neuralprophet import NeuralProphet
from fbprophet import Prophet  # in Prophet >= 1.0 the package is named prophet: from prophet import Prophet
pd.set_option('display.max_columns', None)  
pd.set_option('display.expand_frame_repr', False)
pd.set_option('max_colwidth', None)
import warnings
warnings.filterwarnings("ignore")

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))


In this notebook we look at time series EDA and forecasting in three ways, including NeuralProphet, a PyTorch-based successor to Facebook's Prophet. Our goal is to see how the models perform with:

1) only 2 features, the date and y, for NeuralProphet;

2) the same 2 features for Prophet;

3) extra features created by feature engineering, used as additional regressors in Prophet.

Let's start!

Our dataset is stock market data for the Nifty-50 index stocks from NSE (National Stock Exchange of India) over the last 21 years (2000-2020); specifically, we use the data for Britannia.

From the [official website](http://britannia.co.in/about-us/overview): "Britannia Industries is one of India’s leading food companies with a 100 year legacy and annual revenues in excess of Rs. 9000 Cr. Britannia is among the most trusted food brands, and manufactures India’s favorite brands like Good Day, Tiger, NutriChoice, Milk Bikis and Marie Gold which are household names in India. Britannia’s product portfolio includes Biscuits, Bread, Cakes, Rusk, and Dairy products including Cheese, Beverages, Milk and Yoghurt". 

In [None]:
df_full = pd.read_csv('/kaggle/input/nifty50-stock-market-data/BRITANNIA.csv', parse_dates=["Date"])
df_full.set_index("Date", drop=False, inplace=True)
df_full

The variable we are going to predict is **VWAP, the volume weighted average price**.
VWAP helps compare the current price of a stock to a benchmark, making it easier for investors to decide when to enter and exit the market. It can also help investors choose their approach to a stock (active or passive) and make the right trade at the right time.
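VWAP is the average traded price weighted by the number of shares traded at each price: VWAP = Σ(pᵢ · vᵢ) / Σ vᵢ over all trades in the period. The dataset already provides a daily VWAP column, so we never compute it ourselves; the minimal sketch below only illustrates the formula on a few made-up intraday trades (the `vwap` helper and the trade values are hypothetical).

```python
import numpy as np

def vwap(prices, volumes):
    """Volume weighted average price: sum(price * volume) / sum(volume)."""
    prices = np.asarray(prices, dtype=float)
    volumes = np.asarray(volumes, dtype=float)
    if volumes.sum() <= 0:
        raise ValueError("total volume must be positive")
    return float(np.dot(prices, volumes) / volumes.sum())

# hypothetical intraday trades: price and number of shares traded
trade_prices = [3000.0, 3010.0, 2990.0]
trade_volumes = [100, 300, 100]

print(vwap(trade_prices, trade_volumes))  # 3004.0
```

The plain average of the three prices is 3000, but VWAP is 3004 because most of the volume traded at 3010.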

First, let's look at the data and check for missing values.

In [None]:
df_full.VWAP.plot(figsize=(14, 7))  # plot the target over time

In [None]:
df_full.info()  # column types and non-null counts

The plot shows several sharp jumps and drops in VWAP; these are changepoints, points where the trend of the series changes abruptly.
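Prophet and NeuralProphet handle such changepoints with a piecewise-linear trend whose slope may change at a set of candidate dates (the `n_changepoints` parameter). The NumPy sketch below is a simplified illustration of that idea, not either library's implementation; the function name and the toy values are hypothetical. At each changepoint sⱼ the slope changes by δⱼ while the level stays continuous: g(t) = k·t + m + Σⱼ δⱼ · max(t − sⱼ, 0).

```python
import numpy as np

def piecewise_linear_trend(t, k, m, changepoints, deltas):
    """Continuous piecewise-linear trend:
    g(t) = k*t + m + sum_j delta_j * max(t - s_j, 0).
    The slope changes by delta_j at changepoint s_j; there are no jumps in level."""
    t = np.asarray(t, dtype=float)
    s = np.asarray(changepoints, dtype=float)
    d = np.asarray(deltas, dtype=float)
    # hinge terms, shape (len(t), number of changepoints)
    hinges = np.maximum(t[:, None] - s[None, :], 0.0)
    return k * t + m + hinges @ d

t = np.arange(10)
# base slope 1, one changepoint at t = 5 where the slope drops by 2 (to -1)
trend = piecewise_linear_trend(t, k=1.0, m=0.0, changepoints=[5], deltas=[-2.0])
print(trend)
```

The trend rises with slope 1 up to t = 5 and then falls with slope -1. With 80 candidate changepoints the model can follow many such bends, which is why the number of changepoints (and their regularisation) matters for how closely the trend follows sharp moves in VWAP.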

There are also missing values: around 50% of Trades and around 10% of Deliverable Volume and %Deliverable are missing. We will fill them with the column means.
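The share of missing values per column is simply the mean of `isna()`, and mean imputation can be restricted to numeric columns with `mean(numeric_only=True)`, since text columns such as Symbol have no mean. The sketch below shows both steps on a small made-up frame standing in for `df_full` (the values are hypothetical).

```python
import numpy as np
import pandas as pd

# toy frame standing in for df_full (values are made up)
toy = pd.DataFrame({
    "Symbol": ["BRITANNIA"] * 4,
    "Trades": [1000.0, np.nan, np.nan, 3000.0],
    "Deliverable Volume": [50.0, 70.0, np.nan, 90.0],
})

# share of missing values per column (0.5 means 50% missing)
missing_share = toy.isna().mean()
print(missing_share)

# fill each numeric column with its own mean; the text column is left untouched
filled = toy.fillna(toy.mean(numeric_only=True))
print(filled)
```

Mean imputation is the simplest option; for a time series, forward-filling with `ffill()` is a common alternative that keeps values closer to their neighbours in time.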

In [None]:
df_full.fillna(df_full.mean(numeric_only=True), inplace=True)  # column means; non-numeric columns are skipped

In [None]:
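df = df_full[["Date", "VWAP"]].rename(columns={"Date": "ds", "VWAP": "y"})  # column names required by (Neural)Prophet
test_length = 365   # number of rows held out for testing; rows are trading days, so this spans more than one calendar year
df_train = df.iloc[:-test_length]  # split data into train and test sets
df_test = df.iloc[-test_length:].copy()  # copy, because forecast columns are added to it later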

In [None]:
nprophet_model = NeuralProphet(
    n_changepoints=80,         # number of candidate points where the trend may change
    yearly_seasonality=True,   # include yearly seasonality
    weekly_seasonality=False,  # skip weekly seasonality
    daily_seasonality=True,    # include daily seasonality
    batch_size=64,             # neural network training parameter
    learning_rate=1.0)         # neural network training parameter

metrics = nprophet_model.fit(df_train, freq="D", plot_live_loss=True, epochs=120)
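# freq="D" produces calendar-day future dates, but df_test only contains trading days,
# so extend the horizon up to the last test date to cover every test row
horizon_days = (df_test.ds.max() - df_train.ds.max()).days
future_df = nprophet_model.make_future_dataframe(df_train, periods=horizon_days,
                                                n_historic_predictions=len(df_train))
preds_df_1 = nprophet_model.predict(future_df)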

In [None]:
nprophet_model.plot(preds_df_1)

In [None]:
preds_df_1.set_index("ds", drop=False, inplace=True)
preds_df_1[["yhat1"]].plot(figsize=(14, 7))
df.y.plot(figsize=(14, 7), legend = True)

In the next cell we create rolling features: the mean and standard deviation over windows of 7 and 30 trading days.
This idea comes from this [notebook](https://www.kaggle.com/rohanrao/a-modern-time-series-tutorial/comments). The difference is that the author shifted the features by only one position, which is fine for predicting the next day's value. Our forecast horizon is `test_length` = 365 rows, so we shift the features by that amount to make sure none of them uses data from the test set.
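To check why shifting by the forecast horizon prevents leakage, the sketch below uses a tiny made-up series with small stand-in values (`test_length = 3`, `window = 2`) instead of the notebook's 365 and 7/30. After `rolling(...).mean().shift(test_length)`, the feature at row i only depends on rows up to i - test_length, so overwriting the whole test block must leave every feature unchanged.

```python
import numpy as np
import pandas as pd

test_length = 3   # small stand-in for the notebook's 365
window = 2        # small stand-in for the 7- and 30-row windows

# toy series: 10 rows with values 0..9
s = pd.Series(np.arange(10, dtype=float))
train_end = len(s) - test_length   # index of the first test row

# rolling mean, then shift by the forecast horizon
rolled = s.rolling(window=window, min_periods=0).mean()
feature = rolled.shift(test_length)
print(pd.DataFrame({"value": s, "rolling_mean": rolled, "shifted_feature": feature}))

# overwrite the test block with extreme values: the features must not change
s_modified = s.copy()
s_modified.iloc[train_end:] = 1000.0
feature_modified = s_modified.rolling(window=window, min_periods=0).mean().shift(test_length)
print(feature.equals(feature_modified))  # True: no test-set values leak into the features
```

With a shift of 1, as in the original tutorial, the last test rows would use values from inside the test block, which is fine for one-step-ahead forecasting but not for a 365-row horizon.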

After that, we will first build a classical Prophet model using only Date and VWAP (renamed to 'ds' and 'y', as the algorithm requires), and then a Prophet model that uses the rolling features as extra regressors.

In [None]:
df_full.reset_index(drop=True, inplace=True)
lag_features = ["High", "Low", "Volume", "Turnover", "Trades"]
window2 = 7
window3 = 30

df_rolled_7d = df_full[lag_features].rolling(window=window2, min_periods=0)
df_rolled_30d = df_full[lag_features].rolling(window=window3, min_periods=0)

df_mean_7d = df_rolled_7d.mean().shift(test_length).reset_index().astype(np.float32)
df_mean_30d = df_rolled_30d.mean().shift(test_length).reset_index().astype(np.float32)

df_std_7d = df_rolled_7d.std().shift(test_length).reset_index().astype(np.float32)
df_std_30d = df_rolled_30d.std().shift(test_length).reset_index().astype(np.float32)

for feature in lag_features:
    df_full[f"{feature}_mean_lag{window2}"] = df_mean_7d[feature]
    df_full[f"{feature}_mean_lag{window3}"] = df_mean_30d[feature]
    
    df_full[f"{feature}_std_lag{window2}"] = df_std_7d[feature]
    df_full[f"{feature}_std_lag{window3}"] = df_std_30d[feature]

df_full.fillna(df_full.mean(numeric_only=True), inplace=True)  # fill the NaNs created by shifting; non-numeric columns are skipped

df_full.set_index("Date", drop=False, inplace=True)
df_full.head(5)

In [None]:
df_tr = df_full.iloc[:-test_length] #another splitting of data into train and validation sets
df_val = df_full.iloc[-test_length:]

features = ["High_mean_lag7", "High_std_lag7", "Low_mean_lag7", 
            "Low_std_lag7","Volume_mean_lag7", "Volume_std_lag7", 
            "Turnover_mean_lag7","Turnover_std_lag7", "Trades_mean_lag7", 
            "Trades_std_lag7","High_mean_lag30", "High_std_lag30", 
            "Low_mean_lag30", "Low_std_lag30","Volume_mean_lag30", 
            "Volume_std_lag30", "Turnover_mean_lag30",
            "Turnover_std_lag30", "Trades_mean_lag30", "Trades_std_lag30"]
  # rolling features that will be added to Prophet as extra regressors

In [None]:
model_fbp_1 = Prophet() #classical Prophet without any tuning
model_fbp_1.fit(df_train) 

forecast_1 = model_fbp_1.predict(df_test)
df_test["Forecast_Prophet_1"] = forecast_1.yhat.values  # add yhat to df_test for the graphs below

In [None]:
df_test[["y", "Forecast_Prophet_1"]].plot(figsize=(14, 7)) #plotting

In [None]:
# compare Prophet's predictions with the true values over the whole period
forecast_1.set_index("ds", drop=False, inplace=True)
forecast_1[["yhat"]].plot(figsize=(14, 7))
df.y.plot(figsize=(14, 7), legend = True)

In [None]:
model_fbp_2 = Prophet()  # Prophet with the rolling features as extra regressors
for feature in features:
    model_fbp_2.add_regressor(feature)

model_fbp_2.fit(df_tr[["Date", "VWAP"] + features].rename(columns={"Date": "ds", "VWAP": "y"}))

forecast_2 = model_fbp_2.predict(df_val[["Date", "VWAP"] + features].rename(columns={"Date": "ds", "VWAP": "y"}))

In [None]:
df_test["Forecast_Prophet_2"] = forecast_2.yhat.values #adding predictions to test-set for a further graph

In [None]:
# plot the predictions made with regressors against the true values
forecast_2.set_index("ds", drop=False, inplace=True) 
forecast_2[["yhat"]].plot(figsize=(14, 7))
df.y.plot(figsize=(14, 7), legend = True)

In [None]:
#plotting two predictions made by Prophet - without ("Forecast_Prophet_1") 
#and with regressors ("Forecast_Prophet_2")
df_test[["y", "Forecast_Prophet_1", 'Forecast_Prophet_2']].plot(figsize=(14, 7)) 

In [None]:
from sklearn.metrics import mean_absolute_error
print("Results for the test set:")
print("MAE of Prophet without regressors:", mean_absolute_error(df_test.y, df_test.Forecast_Prophet_1))
print("MAE of Prophet with regressors:", mean_absolute_error(df_test.y, df_test.Forecast_Prophet_2))
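# align NeuralProphet forecasts with the test set by date: df_test contains only trading days,
# while NeuralProphet forecasts calendar days (test rows without a matching forecast date are dropped)
np_eval = df_test[["y"]].join(preds_df_1["yhat1"], how="inner")
print("MAE of NeuralProphet:", mean_absolute_error(np_eval.y, np_eval.yhat1))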

What all three models have in common is that they **failed** to capture the decline in values over the last years: each model treated those low values as noise and predicted VWAP to keep increasing.

The graphs show that Prophet with regressors did the worst job, predicting VWAP above 8000 at some points while the true values never exceeded 3600.

Classical Prophet (without regressors) did a slightly better job in terms of MAE and generalised much better.

**NeuralProphet** beats both of them in terms of generalisation and MAE.
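All of these comparisons rest on the mean absolute error, MAE = (1/n) · Σ |yᵢ − ŷᵢ|, which is expressed in the same units as VWAP and can be read as the average size of a forecast miss. The sketch below computes it with NumPy on made-up values; for 1-D inputs and no sample weights it matches what scikit-learn's `mean_absolute_error` returns. The helper name `mae` is hypothetical.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error: average of |y_true - y_pred|, in the units of the target."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    return float(np.mean(np.abs(y_true - y_pred)))

# made-up true values and forecasts: absolute errors are 100, 100 and 200
print(mae([3000, 3200, 3400], [3100, 3100, 3600]))
```

Because MAE does not square the errors, a few very large misses (such as forecasts above 8000) raise it less than they would raise RMSE, so it is worth checking the plots alongside the metric.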