---
## Time Series
---

<!---Create stock_df and save as .pkl
stocks_df = pd.read_csv("raw_data/all_stocks_5yr.csv")
stocks_df["clean_date"] = pd.to_datetime(stocks_df["date"], format="%Y-%m-%d")
stocks_df.drop(["date", "clean_date", "volume", "Name"], axis=1, inplace=True)
stocks_df.rename(columns={"string_date": "date"}, inplace=True)
pickle.dump(stocks_df, open("write_data/all_stocks_5yr.pkl", "wb"))
--->

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import pickle

In [None]:
stocks_df = pickle.load(open("write_data/all_stocks_5yr.pkl", "rb"))
stocks_df.head()

### 1. Convert the `date` feature from a string to a `datetime` object (displayed as YYYY-MM-DD) and set `date` as the index of `stocks_df`.

In [None]:
stocks_df["date"] = pd.to_datetime(stocks_df["date"], format="%B %d, %Y")
stocks_df.set_index(keys="date", inplace=True)
stocks_df.head()
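
As a quick check of how the format string works, here is a minimal sketch on a toy frame. The sample dates and prices are hypothetical and only assume that the raw `date` strings look like "February 08, 2013", which is what `%B %d, %Y` expects.

```python
import pandas as pd

# Hypothetical sample rows; the real values live in stocks_df
toy_df = pd.DataFrame(
    {
        "date": ["February 08, 2013", "February 11, 2013", "March 01, 2013"],
        "open": [15.07, 14.89, 13.37],
    }
)

# %B = full month name, %d = zero-padded day, %Y = four-digit year
toy_df["date"] = pd.to_datetime(toy_df["date"], format="%B %d, %Y")
toy_df = toy_df.set_index("date")

# A DatetimeIndex prints in ISO style (YYYY-MM-DD)
print(toy_df.index[0].strftime("%Y-%m-%d"))
```

Passing an explicit `format` makes the parse fail loudly if a row does not match, instead of guessing.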

### 2. Perform monthly downsampling on `stocks_df` that takes the mean of the `open`, `high`, `low`, and `close` features on a monthly basis. Store the results in `stocks_monthly_df`.

> Hint: `stocks_monthly_df` should have 61 rows and 4 columns after you perform downsampling (one row per month from Feb. 2013 to Feb. 2018).

In [None]:
stocks_monthly = stocks_df.resample("MS")
stocks_monthly_df = stocks_monthly.mean()
stocks_monthly_df.head()

In [None]:
stocks_monthly_df.shape
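
To see what `resample("MS").mean()` does, here is a minimal sketch on a toy daily series. The values are made up for illustration; only the mechanics carry over to `stocks_df`.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data: 28 days of February plus 31 days of March 2013
idx = pd.date_range("2013-02-01", "2013-03-31", freq="D")
toy = pd.DataFrame({"open": np.arange(len(idx), dtype=float)}, index=idx)

# "MS" groups rows by calendar month and labels each group with the month start
monthly = toy.resample("MS").mean()
print(monthly)
```

Going from many daily rows to one row per month reduces the frequency of the data, and the resulting index holds the first day of each month.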

### 3. Create a line graph that visualizes the monthly open stock prices from `stocks_monthly_df` for the purposes of identifying if average monthly open stock price is stationary or not using the rolling mean and rolling standard deviation.

> Hint: 
> * store your sliced version of `stocks_monthly_df` in a new DataFrame called `open_monthly_df`;
> * use a window size of 3 to represent one quarter of time in a year

In [None]:
open_monthly_df = stocks_monthly_df.loc[:, "open"]

rolmean = open_monthly_df.rolling(window=3, center=False).mean()
rolstd = open_monthly_df.rolling(window=3, center=False).std()

fig, ax = plt.subplots(figsize=(13, 10))
ax.plot(open_monthly_df, color="blue", label="Average monthly opening stock price")
ax.plot(rolmean, color="red", label="Rolling quarterly mean")
ax.plot(rolstd, color="black", label="Rolling quarterly std. deviation")
ax.set_ylim(0, 120)
ax.legend()
fig.suptitle("Average monthly open stock prices, Feb. 2013 to Feb. 2018")
fig.tight_layout()

The average monthly open stock price is not stationary: the rolling mean is not flat but changes over time, whereas a stationary series would keep a roughly constant mean and standard deviation.
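
The rolling-mean check can be illustrated on two toy series, one with a trend and one that oscillates around a fixed level. This is a minimal sketch with made-up numbers, not the stock data.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2013-02-01", periods=12, freq="MS")

# Hypothetical series: one rises steadily, one oscillates around 11
trending = pd.Series(np.arange(10.0, 22.0), index=idx)
flat = pd.Series([10.0, 12.0] * 6, index=idx)

# Same window as above: 3 months, i.e. one quarter
trend_mean = trending.rolling(window=3).mean()
flat_mean = flat.rolling(window=3).mean()

# The first window - 1 values are NaN because the window is not yet full
print(trend_mean.head(4))

# Spread of the rolling mean: large for the trending series, small for the flat one
print(trend_mean.max() - trend_mean.min())
print(flat_mean.max() - flat_mean.min())
```

A rolling mean that drifts far from its starting level, as in the trending series, is the visual cue for non-stationarity.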

### 4. Use the Dickey-Fuller Test to identify if `open_monthly_df` is stationary. Does this confirm your answer from Question 3? Explain why the time series is stationary or not based on the output from the Dickey-Fuller Test.

In [None]:
from statsmodels.tsa.stattools import adfuller

dftest = adfuller(open_monthly_df)

dfoutput = pd.Series(dftest[0:4], index=['Test Statistic','p-value','#Lags Used','Number of Observations Used'])

print('Results of Dickey-Fuller Test:')
print(dfoutput)

The p-value is close to 1, so we cannot reject the test's null hypothesis that the series has a unit root. In other words, the time series is not stationary, which confirms the answer from Question 3.

### 5. Looking at the decomposition of the time series in `open_monthly_df`, it looks like the seasonal peaks all have the same value. To confirm or deny this, create a function that returns a dictionary where each key is a year and each value is the maximum value from the `seasonal` object for that year.

In [None]:
from statsmodels.tsa.seasonal import seasonal_decompose
decomposition = seasonal_decompose(np.log(open_monthly_df))

# Gather the seasonal component of the decomposed object
seasonal = decomposition.seasonal

# Plot the seasonal component
plt.figure(figsize=(13, 10))
plt.plot(seasonal, label="Seasonality", color="blue")
plt.title("Seasonality of average monthly open stock prices")
plt.ylabel("Seasonal component of log average monthly open stock price")
plt.tight_layout()
plt.show()

In [None]:
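def calc_yearly_max(seasonal_series):
    """Returns the max seasonal value for each year"""
    output = {}
    # Use the argument rather than the global `seasonal` variable
    for year in seasonal_series.index.year.unique():
        year_str = str(year)
        # Partial string indexing selects all rows in that year
        output[year_str] = seasonal_series.loc[year_str].max()
    return output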

In [None]:
calc_yearly_max(seasonal)
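
The same per-year maximum can also be computed without an explicit loop by grouping on the year of the index. The sketch below uses a hypothetical seasonal series (one 12-month sine pattern repeated for two years) rather than the real decomposition output, and its dictionary keys are integer years instead of strings.

```python
import numpy as np
import pandas as pd

# Hypothetical seasonal component: the same 12-month pattern repeated twice
idx = pd.date_range("2013-01-01", periods=24, freq="MS")
pattern = np.sin(2 * np.pi * np.arange(12) / 12)
toy_seasonal = pd.Series(np.tile(pattern, 2), index=idx)

# Group by calendar year and take the maximum of each group
yearly_max = toy_seasonal.groupby(toy_seasonal.index.year).max().to_dict()
print(yearly_max)
```

Because the pattern repeats exactly, every full year has the same maximum. A year that is only partly covered by the data, such as the first or last year of `open_monthly_df`, can show a different maximum if the peak month falls outside the covered range.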