# Project Covid impact on video streaming services

This project focuses on the Netflix stock price. We demonstrate here how time series modeling (and, more generally, most machine learning methods) fails when an unpredicted crisis occurs.

## Get the Data 

Load the financial data for Netflix (`NFLX`) from Yahoo Finance with the `yfinance` library, between January 2017 and September 2020.

In [1]:
pip install pystan==2.19.1.1

Note: you may need to restart the kernel to use updated packages.


In [2]:
pip install fbprophet 

Note: you may need to restart the kernel to use updated packages.


In [3]:
from fbprophet import Prophet

In [4]:
!pip install yfinance



In [5]:
import yfinance as yf  
import pandas as pd
import numpy as np

In [6]:
pip install plotly

Note: you may need to restart the kernel to use updated packages.


Our goal is to predict future stock prices of Netflix Inc. from pre-Covid data. Then we will add the first month, then the first two months, after the Covid crisis was declared a pandemic to our training data.

In [7]:
data = yf.download('NFLX','2017-01-01','2020-09-01')

[*********************100%***********************]  1 of 1 completed


In [8]:
data = data.reset_index()

In [9]:
data

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume
0,2017-01-03,124.959999,128.190002,124.309998,127.489998,127.489998,9437900
1,2017-01-04,127.489998,130.169998,126.550003,129.410004,129.410004,7843600
2,2017-01-05,129.220001,132.750000,128.899994,131.809998,131.809998,10185500
3,2017-01-06,132.080002,133.880005,129.809998,131.070007,131.070007,10657900
4,2017-01-09,131.479996,131.990005,129.889999,130.949997,130.949997,5771800
...,...,...,...,...,...,...,...
917,2020-08-25,488.190002,492.470001,485.089996,490.579987,490.579987,5727700
918,2020-08-26,492.500000,549.039978,492.079987,547.530029,547.530029,20373700
919,2020-08-27,537.780029,541.000000,521.250000,526.270020,526.270020,9062900
920,2020-08-28,532.000000,539.000000,522.000000,523.890015,523.890015,4417500


### Function to convert the date column and extract the day, month and year (useful for later)

In [10]:
def to_date(df):
    df['Date'] = pd.to_datetime(df['Date'])
    df['Day'], df['Month'], df['Year'] = df['Date'].dt.day, df['Date'].dt.month, df['Date'].dt.year
    return df

In [11]:
data = to_date(data)
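
To check what `to_date` adds, here is a minimal sketch that applies an equivalent helper to a two-row toy frame. The name `add_date_parts` and the sample values are assumptions made for illustration; unlike `to_date`, this variant works on a copy instead of modifying its input.

```python
import pandas as pd

def add_date_parts(df):
    """Return a copy of df with Date parsed and Day/Month/Year columns added."""
    out = df.copy()  # work on a copy so the caller's frame is left untouched
    out['Date'] = pd.to_datetime(out['Date'])
    out['Day'] = out['Date'].dt.day
    out['Month'] = out['Date'].dt.month
    out['Year'] = out['Date'].dt.year
    return out

# Toy input (illustrative values, not real quotes)
toy = pd.DataFrame({'Date': ['2017-01-03', '2020-02-27'], 'Close': [1.0, 2.0]})
parts = add_date_parts(toy)
print(parts[['Day', 'Month', 'Year']])
```

Working on a copy is one way to avoid the `SettingWithCopyWarning` that pandas emits when a helper writes into a slice of another frame.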

### Load data pre covid in a separate df

In [12]:
pre_cvd = pd.DataFrame(yf.download('NFLX','2017-01-01','2020-02-28'))

[*********************100%***********************]  1 of 1 completed


In [13]:
pre_cvd = pre_cvd.reset_index()

In [14]:
pre_cvd = to_date(pre_cvd)

In [15]:
pre_cvd

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,Day,Month,Year
0,2017-01-03,124.959999,128.190002,124.309998,127.489998,127.489998,9437900,3,1,2017
1,2017-01-04,127.489998,130.169998,126.550003,129.410004,129.410004,7843600,4,1,2017
2,2017-01-05,129.220001,132.750000,128.899994,131.809998,131.809998,10185500,5,1,2017
3,2017-01-06,132.080002,133.880005,129.809998,131.070007,131.070007,10657900,6,1,2017
4,2017-01-09,131.479996,131.990005,129.889999,130.949997,130.949997,5771800,9,1,2017
...,...,...,...,...,...,...,...,...,...,...
788,2020-02-21,385.329987,387.320007,377.899994,380.070007,380.070007,3930100,21,2,2020
789,2020-02-24,364.760010,372.820007,361.000000,368.700012,368.700012,6936400,24,2,2020
790,2020-02-25,372.000000,375.649994,357.720001,360.089996,360.089996,6481200,25,2,2020
791,2020-02-26,366.309998,382.000000,365.000000,379.239990,379.239990,8934100,26,2,2020


In [16]:
# .copy() gives each year its own DataFrame, so the assignments made in
# to_date() do not raise SettingWithCopyWarning
df17 = data[data['Year'] == 2017].copy()
df18 = data[data['Year'] == 2018].copy()
df19 = data[data['Year'] == 2019].copy()

df17, df18, df19 = to_date(df17), to_date(df18), to_date(df19)


### How have Netflix's stock prices fluctuated in the last 4 years?

In [17]:
import plotly.express as px
import plotly.io as pio
pio.renderers.default = "iframe_connected"

fig = px.line(data, x="Date", y="Close", color="Year") #text="Year")
fig.update_traces(textposition="bottom right")
fig.show()

In [19]:
pre_cvd.describe()

Unnamed: 0,Open,High,Low,Close,Adj Close,Volume,Day,Month,Year
count,793.0,793.0,793.0,793.0,793.0,793.0,793.0,793.0,793.0
mean,275.362333,279.340113,271.05551,275.383216,275.383216,8557221.0,15.769231,6.277427,2018.099622
std,83.524386,84.909198,81.959662,83.452266,83.452266,5234469.0,8.738829,3.512777,0.906716
min,124.959999,128.190002,124.309998,127.489998,127.489998,2019300.0,1.0,1.0,2017.0
25%,187.850006,189.940002,185.75,187.860001,187.860001,5164600.0,8.0,3.0,2017.0
50%,296.119995,300.329987,290.850006,295.76001,295.76001,6980000.0,16.0,6.0,2018.0
75%,349.0,354.0,343.230011,349.359985,349.359985,10407900.0,23.0,9.0,2019.0
max,421.380005,423.209991,413.079987,418.970001,418.970001,58410400.0,31.0,12.0,2020.0


2018 saw the largest swings in the stock price: it climbed above 400 dollars, then fell back to about 230 dollars a few months later.
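
To back this reading of the chart with numbers, one option is to compute the spread between the highest and lowest close for each year. The sketch below does this on a toy frame with made-up prices; `yearly_close_range` is a hypothetical helper name, and it only assumes `Year` and `Close` columns like the ones in `data`.

```python
import pandas as pd

def yearly_close_range(df):
    """Per-year min, max and range (max - min) of the Close column."""
    stats = df.groupby('Year')['Close'].agg(['min', 'max'])
    stats['range'] = stats['max'] - stats['min']
    return stats

# Toy frame with made-up prices, only to show the shape of the result
toy = pd.DataFrame({
    'Year':  [2017, 2017, 2018, 2018, 2018],
    'Close': [100.0, 120.0, 200.0, 400.0, 230.0],
})
toy_stats = yearly_close_range(toy)
print(toy_stats)
```

On the notebook's data, `yearly_close_range(data)` would give the same table for the real prices.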

### Let's try to forecast Netflix's stock prices from pre-Covid data with the FbProphet model.

In [20]:
p_df = pd.DataFrame({
    "ds":pre_cvd["Date"],
    "y": pre_cvd["Close"]
}).reset_index(drop=True)

p_df

Unnamed: 0,ds,y
0,2017-01-03,127.489998
1,2017-01-04,129.410004
2,2017-01-05,131.809998
3,2017-01-06,131.070007
4,2017-01-09,130.949997
...,...,...
788,2020-02-21,380.070007
789,2020-02-24,368.700012
790,2020-02-25,360.089996
791,2020-02-26,379.239990


In [21]:
m = Prophet()
m.fit(p_df)

INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.


<fbprophet.forecaster.Prophet at 0x7f503e0a97f0>

### Now that the model is fitted on our data, let's create 100 more days artificially to test our forecast.

The stock market is not open every day. Let's extract the non-trading days that are missing from our original dataset into an object `k`:

In [22]:
k = pd.date_range(start="2017-01-03", end="2020-08-31").difference(data.Date)

In [26]:
k = k.to_frame(index=False, name='Date')

In [23]:
def resetindex(df):
    # Renumber the rows from 0 and discard the old index
    return df.reset_index(drop=True)

We create 150 more calendar days, since the weekends and market holidays among them will be dropped right after.

In [24]:
future = m.make_future_dataframe(periods=150)
future = resetindex(future)

In [27]:
k

Unnamed: 0,Date
0,2017-01-07
1,2017-01-08
2,2017-01-14
3,2017-01-15
4,2017-01-16
...,...
410,2020-08-16
411,2020-08-22
412,2020-08-23
413,2020-08-29


With a loop, we store in a list all the indexes of the rows we will need to drop in our 'future' df.

In [28]:
# Row labels of `future` whose date is a non-trading day (weekend or market holiday)
result = future.index[future['ds'].isin(k['Date'])].tolist()

In [29]:
len(result)

47

In [31]:
# Drop all non-trading rows in one call; dropping inside a loop over
# range(len(future)) shrinks the frame while it is still being indexed
future = future.drop(index=result)

In [32]:
future = resetindex(future)
future

Unnamed: 0,ds
0,2017-01-03
1,2017-01-04
2,2017-01-05
3,2017-01-06
4,2017-01-09
...,...
891,2020-07-20
892,2020-07-21
893,2020-07-22
894,2020-07-23


Now we have our dataset with about 100 more trading days to predict on (150 calendar days minus the 47 non-trading days we dropped), and its dates match our original dataset!
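
The filtering step above can also be written without explicit loops. The following is a minimal sketch, assuming a frame with a `ds` column and a collection of closed days; `drop_closed_days` and the toy dates are hypothetical names and values chosen for illustration.

```python
import pandas as pd

def drop_closed_days(future, closed_days):
    """Keep only the rows of `future` whose 'ds' is not in `closed_days`."""
    mask = ~future['ds'].isin(closed_days)
    return future.loc[mask].reset_index(drop=True)

# Toy example: one calendar week, with the weekend marked as closed
toy_future = pd.DataFrame({'ds': pd.date_range('2020-02-27', '2020-03-04')})
toy_closed = pd.to_datetime(['2020-02-29', '2020-03-01'])
toy_trading = drop_closed_days(toy_future, toy_closed)
print(toy_trading)
```

In this notebook the call would look like `drop_closed_days(future, k['Date'])`.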

### Now we make predictions on our newly created data with the model that we fitted on our real data.

In [33]:
forecast = m.predict(future)
forecast

Unnamed: 0,ds,trend,yhat_lower,yhat_upper,trend_lower,trend_upper,additive_terms,additive_terms_lower,additive_terms_upper,weekly,weekly_lower,weekly_upper,yearly,yearly_lower,yearly_upper,multiplicative_terms,multiplicative_terms_lower,multiplicative_terms_upper,yhat
0,2017-01-03,130.269728,76.963215,122.532379,130.269728,130.269728,-30.662721,-30.662721,-30.662721,1.284924,1.284924,1.284924,-31.947645,-31.947645,-31.947645,0.0,0.0,0.0,99.607007
1,2017-01-04,130.221433,79.979283,122.638009,130.221433,130.221433,-29.161377,-29.161377,-29.161377,0.679177,0.679177,0.679177,-29.840554,-29.840554,-29.840554,0.0,0.0,0.0,101.060057
2,2017-01-05,130.173138,80.837157,126.021452,130.173138,130.173138,-26.720595,-26.720595,-26.720595,1.027514,1.027514,1.027514,-27.748109,-27.748109,-27.748109,0.0,0.0,0.0,103.452543
3,2017-01-06,130.124843,81.518973,128.412143,130.124843,130.124843,-25.271143,-25.271143,-25.271143,0.412374,0.412374,0.412374,-25.683516,-25.683516,-25.683516,0.0,0.0,0.0,104.853701
4,2017-01-09,129.979958,86.172478,132.068067,129.979958,129.979958,-19.945931,-19.945931,-19.945931,-0.176853,-0.176853,-0.176853,-19.769077,-19.769077,-19.769077,0.0,0.0,0.0,110.034027
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
891,2020-07-20,382.498420,379.508460,433.511735,365.270991,397.011005,25.410149,25.410149,25.410149,-0.176853,-0.176853,-0.176853,25.587002,25.587002,25.587002,0.0,0.0,0.0,407.908568
892,2020-07-21,382.686126,378.607233,436.516277,365.266117,397.446004,24.908455,24.908455,24.908455,1.284924,1.284924,1.284924,23.623531,23.623531,23.623531,0.0,0.0,0.0,407.594580
893,2020-07-22,382.873832,374.445942,433.311266,365.321860,397.881003,22.318272,22.318272,22.318272,0.679177,0.679177,0.679177,21.639095,21.639095,21.639095,0.0,0.0,0.0,405.192104
894,2020-07-23,383.061538,372.790301,429.499125,365.300795,398.220505,20.674901,20.674901,20.674901,1.027514,1.027514,1.027514,19.647386,19.647386,19.647386,0.0,0.0,0.0,403.736438


## Let's see how our predictions match with the actual data!

### Create a new dataframe that includes the real data for the ~100 days we artificially created:

In [34]:
real_d = pd.DataFrame(yf.download('NFLX','2017-01-03','2020-07-25'))
real_d['Date'] = real_d.index

[*********************100%***********************]  1 of 1 completed


In [35]:
real_d.shape

(896, 7)

In [182]:
import matplotlib.pyplot as plt


fig = px.line(real_d, x="Date", y=[real_d["Close"], forecast['yhat']]) #text="Year")
fig.update_traces(textposition="bottom right")
#fig.add_scatter(forecast2, x='ds', y='yhat')
fig.show()

### As we can see, the model was not able to predict the major ups and downs that occurred when Covid started.

### Let's take a look at some metrics:

In [41]:
from sklearn.metrics import mean_squared_error, mean_absolute_error 
mse = mean_squared_error(y_true=real_d['Close'],
                   y_pred=forecast['yhat'])

mae = mean_absolute_error(y_true=real_d['Close'],
                   y_pred=forecast['yhat'])

print("The MAE is : {}".format(mae))

print("The MSE is : {}".format(mse))

The MAE is : 16.50708683283382
The MSE is : 527.4027634530402
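
Note that the metrics above compare the two columns position by position, which is only valid when both frames hold exactly the same dates in the same order. A more defensive variant joins the two frames on the date first. This is a sketch under assumptions: `aligned_errors` is a hypothetical helper, and the toy values are made up.

```python
import pandas as pd

def aligned_errors(actual, predicted):
    """Join actual ('Date', 'Close') and predicted ('ds', 'yhat') on the date,
    then return (MAE, MSE) over the dates present in both frames."""
    merged = actual.merge(predicted, left_on='Date', right_on='ds', how='inner')
    err = merged['Close'] - merged['yhat']
    return float(err.abs().mean()), float((err ** 2).mean())

toy_actual = pd.DataFrame({
    'Date': pd.to_datetime(['2020-03-02', '2020-03-03', '2020-03-04']),
    'Close': [10.0, 12.0, 14.0],
})
toy_pred = pd.DataFrame({
    'ds': pd.to_datetime(['2020-03-03', '2020-03-04', '2020-03-05']),
    'yhat': [11.0, 16.0, 20.0],
})
toy_mae, toy_mse = aligned_errors(toy_actual, toy_pred)
print(toy_mae, toy_mse)
```

Since `real_d` carries `Date` both as its index and as a column, it would need `real_d.reset_index(drop=True)` before being passed to such a helper.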


### This isn't bad, but it includes all the pre-Covid data, which the model was trained on and therefore fits quite well. Let's see how the model performed specifically on the days that we created.

Create two objects, `y_pred` and `y_true`, with the predictions and the real data for the ~100 forecast days.

In [42]:
start_date = '2020-02-28'
end_date = '2020-07-24'
y_pred = (forecast['ds'] > start_date) & (forecast['ds'] <= end_date)
y_pred = forecast.loc[y_pred]
y_pred

Unnamed: 0,ds,trend,yhat_lower,yhat_upper,trend_lower,trend_upper,additive_terms,additive_terms_lower,additive_terms_upper,weekly,weekly_lower,weekly_upper,yearly,yearly_lower,yearly_upper,multiplicative_terms,multiplicative_terms_lower,multiplicative_terms_upper,yhat
794,2020-03-02,356.219587,357.752099,399.156673,356.219587,356.219587,22.383624,22.383624,22.383624,-0.176853,-0.176853,-0.176853,22.560478,22.560478,22.560478,0.0,0.0,0.0,378.603212
795,2020-03-03,356.407293,357.898201,403.906780,356.407293,356.407293,24.413392,24.413392,24.413392,1.284924,1.284924,1.284924,23.128468,23.128468,23.128468,0.0,0.0,0.0,380.820685
796,2020-03-04,356.594999,358.887628,403.992433,356.594999,356.594999,24.358679,24.358679,24.358679,0.679177,0.679177,0.679177,23.679501,23.679501,23.679501,0.0,0.0,0.0,380.953678
797,2020-03-05,356.782705,357.943388,403.525427,356.782705,356.782705,25.234436,25.234436,25.234436,1.027514,1.027514,1.027514,24.206922,24.206922,24.206922,0.0,0.0,0.0,382.017142
798,2020-03-06,356.970411,360.037552,404.412990,356.970411,356.970411,25.116075,25.116075,25.116075,0.412374,0.412374,0.412374,24.703701,24.703701,24.703701,0.0,0.0,0.0,382.086486
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
891,2020-07-20,382.498420,379.508460,433.511735,365.270991,397.011005,25.410149,25.410149,25.410149,-0.176853,-0.176853,-0.176853,25.587002,25.587002,25.587002,0.0,0.0,0.0,407.908568
892,2020-07-21,382.686126,378.607233,436.516277,365.266117,397.446004,24.908455,24.908455,24.908455,1.284924,1.284924,1.284924,23.623531,23.623531,23.623531,0.0,0.0,0.0,407.594580
893,2020-07-22,382.873832,374.445942,433.311266,365.321860,397.881003,22.318272,22.318272,22.318272,0.679177,0.679177,0.679177,21.639095,21.639095,21.639095,0.0,0.0,0.0,405.192104
894,2020-07-23,383.061538,372.790301,429.499125,365.300795,398.220505,20.674901,20.674901,20.674901,1.027514,1.027514,1.027514,19.647386,19.647386,19.647386,0.0,0.0,0.0,403.736438


In [43]:
start_date = '2020-02-28'
end_date = '2020-07-24'
y_true = (real_d.index > start_date) & (real_d.index <= end_date)
y_true = real_d.loc[y_true]
y_true

Date,Open,High,Low,Close,Adj Close,Volume,Date
2020-03-02,373.109985,381.359985,364.500000,381.049988,381.049988,6997900,2020-03-02
2020-03-03,381.029999,393.519989,367.399994,368.769989,368.769989,8364600,2020-03-03
2020-03-04,377.769989,384.010010,370.510010,383.790009,383.790009,5487300,2020-03-04
2020-03-05,381.000000,391.399994,368.640015,372.779999,372.779999,8747000,2020-03-05
2020-03-06,367.700012,371.309998,356.850006,368.970001,368.970001,8147200,2020-03-06
...,...,...,...,...,...,...,...
2020-07-20,489.140015,504.500000,484.200012,502.410004,502.410004,11940300,2020-07-20
2020-07-21,506.000000,506.220001,488.609985,490.100006,490.100006,9113700,2020-07-21
2020-07-22,492.190002,497.200012,487.200012,489.820007,489.820007,6954100,2020-07-22
2020-07-23,491.130005,491.899994,472.019989,477.579987,477.579987,7722000,2020-07-23


In [44]:
from sklearn.metrics import mean_squared_error, mean_absolute_error 
mse = mean_squared_error(y_true=y_true['Close'],
                   y_pred=y_pred['yhat'])

mae = mean_absolute_error(y_true=y_true['Close'],
                   y_pred=y_pred['yhat'])

print("The MAE is : {}".format(mae))

print("The MSE is : {}".format(mse))

The MAE is : 38.42954233792358
The MSE is : 2252.912023094565


### The metrics show that, as we anticipated, the model did not predict well once Covid started: the MAE more than doubles, from 16.5 to 38.4.
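
Because the MSE is expressed in squared dollars, it is hard to read next to the MAE. Taking its square root (the RMSE) brings it back to dollars: roughly 23 over the whole series and roughly 47 on the forecast window. The short computation below uses the two MSE values printed above; the variable names are only illustrative.

```python
import math

# MSE values copied from the two metric cells above
mse_whole_series = 527.4027634530402     # training period plus forecast window
mse_forecast_window = 2252.912023094565  # forecast window only

rmse_whole_series = math.sqrt(mse_whole_series)
rmse_forecast_window = math.sqrt(mse_forecast_window)

print(f"RMSE, whole series:    {rmse_whole_series:.2f}")
print(f"RMSE, forecast window: {rmse_forecast_window:.2f}")
```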

## Now let's add data from the Covid period to our training, and see how the model performs!

Create a new df including 2 months post covid:

In [68]:
with_cvd = pd.DataFrame(yf.download('NFLX','2017-01-03','2020-05-01'))

[*********************100%***********************]  1 of 1 completed


In [69]:
with_cvd

Date,Open,High,Low,Close,Adj Close,Volume
2017-01-03,124.959999,128.190002,124.309998,127.489998,127.489998,9437900
2017-01-04,127.489998,130.169998,126.550003,129.410004,129.410004,7843600
2017-01-05,129.220001,132.750000,128.899994,131.809998,131.809998,10185500
2017-01-06,132.080002,133.880005,129.809998,131.070007,131.070007,10657900
2017-01-09,131.479996,131.990005,129.889999,130.949997,130.949997,5771800
...,...,...,...,...,...,...
2020-04-24,425.000000,427.170013,415.880005,424.989990,424.989990,8658900
2020-04-27,425.000000,429.000000,420.839996,421.380005,421.380005,6277500
2020-04-28,419.989990,421.000000,402.910004,403.829987,403.829987,10101200
2020-04-29,399.529999,415.859985,393.600006,411.890015,411.890015,9693100


In [70]:
with_cvd2 = pd.DataFrame({
    "ds":with_cvd.index,
    "y": with_cvd["Close"]
}).reset_index(drop=True)

with_cvd2

Unnamed: 0,ds,y
0,2017-01-03,127.489998
1,2017-01-04,129.410004
2,2017-01-05,131.809998
3,2017-01-06,131.070007
4,2017-01-09,130.949997
...,...,...
832,2020-04-24,424.989990
833,2020-04-27,421.380005
834,2020-04-28,403.829987
835,2020-04-29,411.890015


In [71]:
m1 = Prophet()
m1.fit(with_cvd2)

INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.


<fbprophet.forecaster.Prophet at 0x7f502cd79130>

### We create around 100 days to predict with our new model

In [72]:
future2 = m1.make_future_dataframe(periods=100)
future2 = resetindex(future2)

In [73]:
future2

Unnamed: 0,ds
0,2017-01-03
1,2017-01-04
2,2017-01-05
3,2017-01-06
4,2017-01-09
...,...
932,2020-08-04
933,2020-08-05
934,2020-08-06
935,2020-08-07


In [74]:
# Row labels of `future2` whose date is a non-trading day (weekend or market holiday)
result2 = future2.index[future2['ds'].isin(k['Date'])].tolist()

In [76]:
# Drop all non-trading rows in one call; dropping inside a loop over
# range(len(future2)) shrinks the frame while it is still being indexed
future2 = future2.drop(index=result2)

In [77]:
future2 = resetindex(future2)

### Predictions 

In [78]:
forecast2 = m1.predict(future2)
forecast2

Unnamed: 0,ds,trend,yhat_lower,yhat_upper,trend_lower,trend_upper,additive_terms,additive_terms_lower,additive_terms_upper,weekly,weekly_lower,weekly_upper,yearly,yearly_lower,yearly_upper,multiplicative_terms,multiplicative_terms_lower,multiplicative_terms_upper,yhat
0,2017-01-03,133.373227,80.754056,126.771918,133.373227,133.373227,-29.748560,-29.748560,-29.748560,1.934636,1.934636,1.934636,-31.683197,-31.683197,-31.683197,0.0,0.0,0.0,103.624667
1,2017-01-04,133.326425,82.826441,126.754103,133.326425,133.326425,-28.475058,-28.475058,-28.475058,1.176760,1.176760,1.176760,-29.651818,-29.651818,-29.651818,0.0,0.0,0.0,104.851367
2,2017-01-05,133.279623,83.985650,130.740130,133.279623,133.279623,-25.962300,-25.962300,-25.962300,1.662953,1.662953,1.662953,-27.625253,-27.625253,-27.625253,0.0,0.0,0.0,107.317322
3,2017-01-06,133.232820,86.663355,130.741957,133.232820,133.232820,-24.645404,-24.645404,-24.645404,0.972186,0.972186,0.972186,-25.617590,-25.617590,-25.617590,0.0,0.0,0.0,108.587416
4,2017-01-09,133.092413,91.465042,135.876830,133.092413,133.092413,-19.292668,-19.292668,-19.292668,0.540371,0.540371,0.540371,-19.833039,-19.833039,-19.833039,0.0,0.0,0.0,113.799745
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
901,2020-08-03,401.508625,380.061708,430.942691,392.469107,410.253413,3.266297,3.266297,3.266297,0.540371,0.540371,0.540371,2.725926,2.725926,2.725926,0.0,0.0,0.0,404.774921
902,2020-08-04,401.791322,380.792961,431.457961,392.469802,410.770264,3.512603,3.512603,3.512603,1.934636,1.934636,1.934636,1.577967,1.577967,1.577967,0.0,0.0,0.0,405.303925
903,2020-08-05,402.074019,377.326113,430.329739,392.575845,411.261749,1.694019,1.694019,1.694019,1.176760,1.176760,1.176760,0.517259,0.517259,0.517259,0.0,0.0,0.0,403.768039
904,2020-08-06,402.356717,377.994543,427.998253,392.606054,411.716113,1.206351,1.206351,1.206351,1.662953,1.662953,1.662953,-0.456602,-0.456602,-0.456602,0.0,0.0,0.0,403.563067


### Create df with full days to compare our predictions

In [79]:
real_d2 = pd.DataFrame(yf.download('NFLX','2017-01-03','2020-08-08'))
real_d2['Date'] = real_d2.index

[*********************100%***********************]  1 of 1 completed


In [90]:
fig = px.line(real_d2, x="Date", y=[real_d2["Close"], forecast2['yhat']]) #text="Year")
fig.update_traces(textposition="bottom right")
#fig.add_scatter(forecast2, x='ds', y='yhat')
fig.show()

### There is not much difference in the predictions. The model was able to identify that the trend was going up, but it still can't predict huge variations.

In [82]:
from sklearn.metrics import mean_squared_error, mean_absolute_error 
mse = mean_squared_error(y_true=real_d2['Close'],
                   y_pred=forecast2['yhat'])

mae = mean_absolute_error(y_true=real_d2['Close'],
                   y_pred=forecast2['yhat'])

print("The MAE is : {}".format(mae))

print("The MSE is : {}".format(mse))

The MAE is : 15.727066251179895
The MSE is : 507.12675929538636


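### The metrics are only slightly better than with the model trained on pre-Covid data alone (MAE 15.7 vs 16.5). Both figures are computed over the whole series, most of which the model was trained on, so they mostly reflect in-sample fit. Even so, this suggests that the effects of the Covid period were very hard to forecast accurately!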

## Is there any way to improve our predictions?

### Let's try cross-validation over a short, 5-day horizon:

In [216]:
from fbprophet.diagnostics import cross_validation
cv_results = cross_validation(model=m1, initial='366 days', period='30 days', horizon = '5 days')

INFO:fbprophet:Making 29 forecasts with cutoffs between 2018-01-06 00:00:00 and 2020-04-25 00:00:00


In [217]:
cv_results

Unnamed: 0,ds,yhat,yhat_lower,yhat_upper,y,cutoff
0,2018-01-08,205.997996,201.783328,210.056659,212.050003,2018-01-06
1,2018-01-09,206.955047,202.789865,210.893514,209.309998,2018-01-06
2,2018-01-10,207.646011,203.426221,212.181758,212.520004,2018-01-06
3,2018-01-11,208.207024,203.939865,212.439319,217.240005,2018-01-06
4,2018-02-06,273.351483,267.281670,279.192043,265.720001,2018-02-05
...,...,...,...,...,...,...
94,2020-03-31,364.864871,340.836386,388.628719,375.500000,2020-03-26
95,2020-04-27,400.170618,377.725276,422.661773,421.380005,2020-04-25
96,2020-04-28,401.617574,380.021109,425.376523,403.829987,2020-04-25
97,2020-04-29,400.634907,377.730584,423.839685,411.890015,2020-04-25
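
To see where the 29 cutoffs in the log line come from, here is a simplified sketch of a rolling-origin scheme: start one horizon before the last observation and step back by `period` while at least `initial` of history remains. This is an approximation written for illustration, not fbprophet's own code; `rolling_cutoffs` is a hypothetical name, and the last training date (2020-04-30) is inferred from the download range.

```python
import pandas as pd

def rolling_cutoffs(first_date, last_date, initial, period, horizon):
    """Simplified rolling-origin cutoffs: begin `horizon` before the last
    observation and step back by `period` while at least `initial` of
    history remains before the cutoff."""
    first_date, last_date = pd.Timestamp(first_date), pd.Timestamp(last_date)
    initial, period, horizon = (pd.Timedelta(x) for x in (initial, period, horizon))
    cutoffs = []
    cutoff = last_date - horizon
    while cutoff >= first_date + initial:
        cutoffs.append(cutoff)
        cutoff -= period
    return sorted(cutoffs)

# First and last training dates are assumptions taken from the data shown above
cutoffs = rolling_cutoffs('2017-01-03', '2020-04-30',
                          initial='366 days', period='30 days', horizon='5 days')
print(len(cutoffs), cutoffs[0].date(), cutoffs[-1].date())
```

With these inputs the sketch yields 29 cutoffs running from 2018-01-06 to 2020-04-25, which matches the log line. Each cutoff trains a model on the data up to that date and scores it on the following 5 days.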


In [218]:
fig = px.line(cv_results, x="ds", y=[cv_results["y"], cv_results['yhat']]) #text="Year")
fig.update_traces(textposition="bottom right")
fig.show()

In [219]:
mse = mean_squared_error(y_true=cv_results['y'],
                   y_pred=cv_results['yhat'])

mae = mean_absolute_error(y_true=cv_results['y'],
                   y_pred=cv_results['yhat'])

print("The MAE is : {}".format(mae))

print("The MSE is : {}".format(mse))

The MAE is : 14.160408897628741
The MSE is : 347.4378394884628


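### With cross-validation we obtain lower errors, especially on the MSE (347 vs 507). Keep in mind that each forecast here only looks 5 days ahead, so these figures are not directly comparable with the ~100-day forecasts above.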
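
One way to read cross-validation output in more detail is to group the errors by how far ahead each prediction was made (`ds - cutoff`). The sketch below does this with plain pandas on a made-up frame that has the same columns as `cv_results`; `mae_by_horizon` is a hypothetical helper. fbprophet also provides `performance_metrics` in `fbprophet.diagnostics` for this kind of summary.

```python
import pandas as pd

def mae_by_horizon(cv):
    """Mean absolute error grouped by forecast horizon (ds - cutoff)."""
    horizon = cv['ds'] - cv['cutoff']
    abs_err = (cv['y'] - cv['yhat']).abs()
    return abs_err.groupby(horizon).mean()

# Made-up values: two cutoffs, each with a 1-day and a 2-day-ahead forecast
toy_cv = pd.DataFrame({
    'ds':     pd.to_datetime(['2020-01-02', '2020-01-03', '2020-02-02', '2020-02-03']),
    'cutoff': pd.to_datetime(['2020-01-01', '2020-01-01', '2020-02-01', '2020-02-01']),
    'y':      [10.0, 10.0, 20.0, 20.0],
    'yhat':   [11.0, 13.0, 17.0, 15.0],
})
toy_by_h = mae_by_horizon(toy_cv)
print(toy_by_h)
```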

### Let's try to tune some of the model's hyperparameters:

In [177]:
m2 = Prophet(
    growth='linear',
    seasonality_mode='additive',
    seasonality_prior_scale=3,
    changepoint_prior_scale=0.05,
    holidays_prior_scale=8.0,
    mcmc_samples=3,
    interval_width=0.8,
)

In [178]:
m2.fit(with_cvd2)

INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.


<fbprophet.forecaster.Prophet at 0x7f5032378970>

In [179]:
forecast3 = m2.predict(future2)
forecast3

Unnamed: 0,ds,trend,yhat_lower,yhat_upper,trend_lower,trend_upper,additive_terms,additive_terms_lower,additive_terms_upper,weekly,weekly_lower,weekly_upper,yearly,yearly_lower,yearly_upper,multiplicative_terms,multiplicative_terms_lower,multiplicative_terms_upper,yhat
0,2017-01-03,133.732529,-173.923280,515.593746,124.427197,141.568597,38.366498,-8.152258,83.997497,20.084836,-4.412946,38.613485,18.281662,-11.528088,45.384013,0.0,0.0,0.0,172.099027
1,2017-01-04,133.974697,-201.480740,493.312588,124.667472,141.810251,13.588117,-3.692166,43.512013,-6.420737,-49.123225,21.369350,20.008854,-6.032119,48.318613,0.0,0.0,0.0,147.562813
2,2017-01-05,134.216865,-176.460196,500.835976,124.907748,142.051905,27.433803,-11.247355,79.213650,5.830392,-4.991478,36.576153,21.603411,-6.524954,50.779480,0.0,0.0,0.0,161.650668
3,2017-01-06,134.459032,-183.460394,504.268225,125.148024,142.293559,21.765145,-8.890862,66.755318,-1.279954,-25.652784,13.996313,23.045099,-7.832748,52.759005,0.0,0.0,0.0,156.224177
4,2017-01-09,135.185536,-209.276335,511.718543,125.868850,143.018522,23.338703,-7.606221,45.645926,-2.914367,-12.709988,5.802082,26.253070,-11.097289,55.846921,0.0,0.0,0.0,158.524239
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
901,2020-08-03,431.133670,121.475563,822.392342,418.894084,441.625719,21.218058,-12.014491,63.380758,-2.914367,-12.709988,5.802082,24.132425,-3.773862,57.578676,0.0,0.0,0.0,452.351728
902,2020-08-04,431.358440,88.260266,816.628511,419.106393,441.866198,44.684944,0.854891,85.750630,20.084836,-4.412946,38.613485,24.600108,-1.578704,47.574153,0.0,0.0,0.0,476.043384
903,2020-08-05,431.583210,100.610023,813.555862,419.318702,442.107925,18.275886,5.479171,37.909791,-6.420737,-49.123225,21.369350,24.696624,0.272296,54.602396,0.0,0.0,0.0,449.859096
904,2020-08-06,431.807979,132.155906,842.799817,419.531010,442.356723,30.277053,-1.727784,63.526270,5.830392,-4.991478,36.576153,24.446661,1.812922,61.212534,0.0,0.0,0.0,462.085032


In [180]:
fig = px.line(real_d2, x="Date", y=[real_d2["Close"], forecast3['yhat']]) #text="Year")
fig.update_traces(textposition="bottom right")
fig.show()

In [181]:
mse = mean_squared_error(y_true=real_d2['Close'],
                   y_pred=forecast3['yhat'])

mae = mean_absolute_error(y_true=real_d2['Close'],
                   y_pred=forecast3['yhat'])

print("The MAE is : {}".format(mae))

print("The MSE is : {}".format(mse))

The MAE is : 47.052074270423496
The MSE is : 3398.0756974291357


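### With these hyperparameters, the metrics are much worse than with the default model (MAE 47.1 vs 15.7). A likely cause is `mcmc_samples=3`, which switches Prophet from MAP estimation to MCMC sampling with only three samples, hence the very wide `yhat_lower`/`yhat_upper` intervals. On a larger scale, though, the model was still able to identify an ongoing trend.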
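
Rather than picking one set of values by hand, the tuning could be organised as a small grid search. The skeleton below is a sketch under assumptions: `grid_search` and `dummy_score` are hypothetical names, and the scorer is a stand-in so that the block runs on its own. A real scorer would fit a Prophet model with the given parameters and return, for example, its cross-validated MAE.

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Evaluate score_fn(**params) for every combination in param_grid and
    return (best_params, best_score), where lower scores are better."""
    names = list(param_grid)
    best_params, best_score = None, float('inf')
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Stand-in scorer (an assumption for illustration, not a real model evaluation)
def dummy_score(changepoint_prior_scale, seasonality_prior_scale):
    return abs(changepoint_prior_scale - 0.1) + abs(seasonality_prior_scale - 1.0)

grid = {
    'changepoint_prior_scale': [0.01, 0.05, 0.1, 0.5],
    'seasonality_prior_scale': [0.1, 1.0, 10.0],
}
best_params, best_score = grid_search(grid, dummy_score)
print(best_params, best_score)
```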