# Project: Financial time series forecasting

The following project can be done individually or in pairs, but you are not allowed to share your solution with anyone else. Read the instructions below carefully!

- The aim of the project is that you learn how to set up an analytics project end-to-end. A secondary aim is that you understand how to work with a time series data set and forecast based on such data. A third aim is that you gain insight into how to interpret data and results.

- The solution must address each grade in the written order. Therefore, to complete grade five you must have completed the other grades first.

- Some information may unintentionally be missing from the description, so please go through it well in advance so that you have time to ask for assistance.

- Make sure to rewrite your code as functions where you can.


In [None]:
# First upgrade the environment.
# https://pypi.org/project/yfinance
from subprocess import run
# add what you will need
modules =[
#     'pandas_datareader',
#     'yfinance',
    'pandas_market_calendars',
    'plotly', 
    'numpy',
    'scikit-learn',   # PyPI name; the 'sklearn' package is deprecated and will not install
    'pandas',
    'matplotlib'
]
proc = run(f'pip install {" ".join(modules)} --upgrade --no-input', 
       shell=True, 
       text=True, 
       capture_output=True, 
       timeout=40)
print(proc.stderr)

In [None]:
import pandas as pd
from pathlib import Path
import numpy as np

import matplotlib
import matplotlib.pyplot as plt
from matplotlib.ticker import ScalarFormatter, FuncFormatter, StrMethodFormatter
%matplotlib inline

import plotly as ply
import plotly.graph_objects as go

import sklearn
from sklearn.preprocessing import StandardScaler, Normalizer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit

from functools import reduce
from operator import mul
from pprint import PrettyPrinter
pprint = PrettyPrinter().pprint

<a id='g1'></a>
# Grade 1
## Implement a complete process for forecasting a single stock.

You should do the following steps:
- Use the [EURUSD data set](https://people.arcada.fi/~parland/hjd5_8amp_Gt3/EURUSD1m.zip) (52Mb)
- Subsample the data to one-day time steps (remember to use .agg())
- Create a label column for your forecast by shifting the Close value one step. You will predict one day ahead; insert the new column into a new dataframe or the existing one
- Split the data 80/20 (train/test). Be careful: you are splitting a time series
- [Normalize or standardize](https://scikit-learn.org/stable/modules/preprocessing.html) wisely so you don't allow information leakage from the test subset
- Calculate the feature [Larry Williams' %R](https://www.investopedia.com/terms/w/williamsr.asp) from the paper [Predicting the Direction of Stock Market Index Movement Using an Optimized Artificial Neural Network Model](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4873195) (implement it in code and insert the values in complementary columns). Note that you need to implement your own calculation of each feature and be able to explain the code.
- You can use something like this to understand Larry Williams' %R better: https://school.stockcharts.com/doku.php?id=technical_indicators:williams_r
- Drop all data other than Close and the features before inference. You don't want to feed the time column into the model; it is not a feature to base predictions on.
- Set up a [linear model](https://scikit-learn.org/stable/modules/linear_model.html#ordinary-least-squares) 
- Fit/train the linear model to the training data
- Forecast 1 day ahead based on the test data and compare it to the closing values
- Calculate the [R² error](https://en.wikipedia.org/wiki/Coefficient_of_determination) on both the training data set and the test. Please format numbers to four [significant digits](https://en.wikipedia.org/wiki/Significant_figures).
- Compare the errors and explain the outcome
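The Grade 1 steps above can be sketched end-to-end on synthetic data. This is a minimal sketch, assuming a minute-level dataframe with `Open`, `High`, `Low`, `Close` and `Volume` columns on a `DatetimeIndex`; the random-walk prices, the hypothetical helper `daily_pipeline` and the 14-day period are illustrative assumptions, not part of the assignment.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler


def daily_pipeline(minutes: pd.DataFrame, period: int = 14):
    """Hypothetical helper: resample, add %R and label, split 80/20, fit, score."""
    # 1. Subsample minute bars to daily bars with .agg()
    daily = minutes.resample('D').agg({'Open': 'first', 'High': 'max',
                                       'Low': 'min', 'Close': 'last',
                                       'Volume': 'sum'}).dropna()
    # 2. Williams %R over the last `period` days
    hh = daily['High'].rolling(period).max()
    ll = daily['Low'].rolling(period).min()
    daily['LWR'] = (hh - daily['Close']) / (hh - ll) * -100
    # 3. Label = next day's close; the last row has no label and is dropped
    daily['Label'] = daily['Close'].shift(-1)
    daily = daily.dropna()
    # 4. Chronological 80/20 split (no shuffling for a time series)
    cut = int(len(daily) * 0.8)
    train, test = daily.iloc[:cut], daily.iloc[cut:]
    features = ['Close', 'LWR']
    # 5. Fit the scaler on the training rows only, so the test set cannot leak in
    scaler = StandardScaler().fit(train[features])
    X_train = scaler.transform(train[features])
    X_test = scaler.transform(test[features])
    model = LinearRegression().fit(X_train, train['Label'])
    r2_train = model.score(X_train, train['Label'])
    r2_test = model.score(X_test, test['Label'])
    return len(train), len(test), r2_train, r2_test


# Synthetic random-walk minute data (an assumption standing in for EURUSD1m)
rng = np.random.default_rng(0)
idx = pd.date_range('2020-01-01', periods=60 * 24 * 120, freq='min')
close = 1.10 + np.cumsum(rng.normal(0, 1e-4, len(idx)))
minutes = pd.DataFrame({'Open': close, 'High': close + 5e-5,
                        'Low': close - 5e-5, 'Close': close,
                        'Volume': 1}, index=idx)

n_train, n_test, r2_train, r2_test = daily_pipeline(minutes)
print(f"train rows: {n_train}, test rows: {n_test}")
print(f"R² train: {r2_train:.4g}, R² test: {r2_test:.4g}")
```

The `:.4g` format spec prints four significant digits, which is what the assignment asks for; `:.4f` would instead print four decimals.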


In [None]:
#Download the zipped CSV directly into a dataframe (it is not saved locally)
df = pd.read_csv('https://people.arcada.fi/~parland/hjd5_9amp_Gt3/EURUSD1m.zip')


#Convert the int64 dates (YYYYMMDD) to datetime via strings
df['NewDate'] = pd.to_datetime(df['Date'].astype(str), format='%Y%m%d')
#Set the dates as a DatetimeIndex so the data can be resampled by day
df = df.set_index(pd.DatetimeIndex(df['NewDate'].values))

In [None]:
#Resample the minute bars into a new daily dataframe, aggregating each OHLCV column appropriately
df2Daily = df.resample('D').agg({'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last', 'Volume': 'sum'})

#Drop days without any minute data (e.g. weekends), which contain only NaN
df2Daily.dropna(inplace=True)

In [None]:
#Williams %R over a 14-day look-back: (highest high - close) / (highest high - lowest low) * -100
period = 14
df2Daily['LWR'] = ((df2Daily.High.rolling(period).max()-df2Daily.Close)/
                   (df2Daily.High.rolling(period).max()-df2Daily.Low.rolling(period).min())*100)*-1

In [None]:
#Label = next day's close (Close shifted one step back); the last row has no label and is dropped
df2Daily['Label'] = df2Daily['Close'].shift(-1)
df2Daily.dropna(inplace=True)

In [None]:
#Splitting the train and test data
train, test = df2Daily[:int(len(df2Daily)*.8)], df2Daily[int(len(df2Daily)*.8):]

In [None]:
#Features (X) and target (Y) for the linear regression
X_train = train[['Close','LWR']]
Y_train = train['Label']

X_test = test[['Close','LWR']]
Y_test = test['Label']

In [None]:
#Standardize features and target, fitting the scalers on the training data only
#so that no information from the test period leaks into the model
from sklearn import preprocessing
Xscaler = preprocessing.StandardScaler()
Yscaler = preprocessing.StandardScaler()

X_scaled = Xscaler.fit_transform(np.array(X_train))
Y_scaled = Yscaler.fit_transform(np.array(Y_train).reshape(-1,1))

#The test data is only transformed, using the training statistics
X_scaledTest = Xscaler.transform(np.array(X_test))
Y_scaledTest = Yscaler.transform(np.array(Y_test).reshape(-1,1))

In [None]:
#Creating the linear regression model on the scaled training data
regressor = LinearRegression().fit(X_scaled,Y_scaled)

#The model works in scaled units: predict on the scaled test features,
#then map the predictions back to price units with the target scaler
prediction = Yscaler.inverse_transform(regressor.predict(X_scaledTest)).ravel()

#R² using regressor.score, formatted to four significant digits
print("Regression train score:", "{0:.4g}".format(regressor.score(X_scaled, Y_scaled)))
print("Regression test score: ", "{0:.4g}".format(regressor.score(X_scaledTest, Y_scaledTest)))

In [None]:
#Copy the test period of the daily dataframe so the predictions can be stored next to it
DailyPredict = df2Daily.copy().tail(len(X_scaledTest))

#Insert the prediction array into the dataframe as a new column
DailyPredict['Prediction'] = prediction

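### I assume that the R² value is higher on the training set than on the test set because the test set is much smaller (623 vs 2491 days). Since $R^2 = 1 - SSResid/SSTot$ compares the model's errors with the spread of the labels in the same set, a short test window that covers a narrower price range gives a lower $R^2$ even when the daily prediction errors are about as large as in the training period.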
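To illustrate how R² depends on the spread of the labels in the evaluated set, here is a minimal sketch with synthetic numbers (an assumption, not the EURUSD data): identical prediction errors are added to a wide-ranging and a narrow-ranging series of the same length.

```python
import numpy as np
from sklearn.metrics import r2_score

rng = np.random.default_rng(1)
errors = rng.normal(0, 0.005, 500)        # identical prediction errors for both cases

wide = np.linspace(1.00, 1.40, 500)       # long window: wide price range
narrow = np.linspace(1.10, 1.14, 500)     # short window: narrow price range

r2_wide = r2_score(wide, wide + errors)
r2_narrow = r2_score(narrow, narrow + errors)
print(f"wide range R²:   {r2_wide:.4g}")
print(f"narrow range R²: {r2_narrow:.4g}")
```

With the same number of points and the same errors, only $SSTot$ changes, so the narrow window scores noticeably lower.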

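### **Larry Williams' %R**

$ \%R = \frac{H_n - C_t}{H_n - L_n}\times(-100) $, where $H_n$ and $L_n$ are the highest high and lowest low over the last $n$ periods and $C_t$ is the current close.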
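A minimal sketch of the formula as a function, assuming a daily dataframe with `High`, `Low` and `Close` columns; the helper name `williams_r` and the toy prices are illustrative.

```python
import pandas as pd


def williams_r(df: pd.DataFrame, n: int = 14) -> pd.Series:
    """Williams %R: (H_n - C_t) / (H_n - L_n) * -100, ranging from -100 to 0."""
    highest_high = df['High'].rolling(n).max()   # H_n
    lowest_low = df['Low'].rolling(n).min()      # L_n
    return (highest_high - df['Close']) / (highest_high - lowest_low) * -100


toy = pd.DataFrame({'High':  [10, 11, 12, 13],
                    'Low':   [8, 9, 10, 11],
                    'Close': [9, 10, 11, 11]})
r = williams_r(toy, n=3)
print(r.tolist())  # first n-1 values are NaN while the window fills
```

A close at the n-day high gives 0 and a close at the n-day low gives -100; here the third day closes a quarter of the way down from the 3-day high (-25) and the fourth day halfway down (-50).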

### Coefficient of determination ($R^2$)

$$R^2 = 1 - \frac {SSResid}{SSTot}$$

#### Residual Sum of Squares: $SSResid = \sum_{i} (y_i - \hat{y_i})^2$

#### Total Sum of Squares: $SSTot = \sum_{i} (y_i - \bar{y})^2$

#### A baseline model, which always predicts $\bar {y}$, will have $R^2 = 0$
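The definitions can be checked numerically. A minimal sketch with made-up values: R² computed from $SSResid$ and $SSTot$ matches scikit-learn's `r2_score`, and the baseline that always predicts $\bar{y}$ scores 0.

```python
import numpy as np
from sklearn.metrics import r2_score

y = np.array([1.10, 1.12, 1.11, 1.15, 1.14])       # actual labels (made up)
y_hat = np.array([1.11, 1.11, 1.12, 1.14, 1.15])   # model forecasts (made up)

ss_resid = np.sum((y - y_hat) ** 2)                # SSResid
ss_tot = np.sum((y - y.mean()) ** 2)               # SSTot
r2 = 1 - ss_resid / ss_tot

baseline = np.full_like(y, y.mean())               # always predicts the mean
print(f"R² by hand: {r2:.4g}, sklearn: {r2_score(y, y_hat):.4g}")
print(f"baseline R²: {r2_score(y, baseline):.4g}")
```

A model worse than the baseline gets a negative $R^2$, which can happen on a test set even when the training score is high.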

<a id='g2'></a>
# Grade 2
## Illustrate data using plotly (or other) library

- Calculate the additional feature [Stochastic slow %D](https://tradingsim.com/blog/slow-stochastics)
- Create a figure based on OHLC candles covering the test period; you can reuse it from past assignments
- Second, add line chart(s) that illustrate the *label* (actual data) and the *forecast* in the same figure over the OHLC candles. The lines should have different colors and include the names of the series.
- Add subplot(s) with the features so we can see them time-aligned
- What patterns can you observe from the line figure?
# Found no errors in this description

### Stochastic %K	
<br>
<span style='font-size:20px'>
$\frac{(C_t − L_n)}{(H_n − L_n)}\times100$
</span>
    
### Stochastic %D
<br>
<span style='font-size:20px'>
$\sum\nolimits_{i=0}^{n-1}\frac{\%K_{t-i}}n$
</span>
    
### Stochastic slow %D
<br>
<span style='font-size:25px'>
$\frac{\sum_{i=0}^{n-1}\%D_{t-i}}n$
</span>

In [None]:
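#Periods: 14-day look-back for %K, 3-day smoothing for %D and slow %D
period = 14
shortPeriod = 3

def stochastic(data, period=14, shortPeriod=3):
    """Return fast %K, fast %D, slow %K and slow %D for an OHLC dataframe."""
    lowestLow = data['Low'].rolling(period).min()
    highestHigh = data['High'].rolling(period).max()
    #Fast stochastic
    fastK = (data['Close'] - lowestLow) / (highestHigh - lowestLow) * 100
    fastD = fastK.rolling(shortPeriod).mean()
    #Slow stochastic: slow %K equals fast %D, slow %D smooths it once more
    slowK = fastK.rolling(shortPeriod).mean()
    slowD = slowK.rolling(shortPeriod).mean()
    return fastK, fastD, slowK, slowD

#Compute on the full daily history so the test window has no warm-up NaNs,
#then keep only the rows of the test period
fastK, fastD, slowK, slowD = (s.loc[DailyPredict.index] for s in stochastic(df2Daily, period, shortPeriod))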

In [None]:
#Plotting the different features
from plotly.subplots import make_subplots

#Three time-aligned rows: EUR/USD candles, stochastic slow %D, Williams %R
fig = make_subplots(rows=3, cols=1, shared_xaxes=True, 
               vertical_spacing=0.03, subplot_titles=('EUR/USD', ''), 
               row_width=[0.4,0.4,0.9] )

#Plotting OHLC
fig.add_trace(go.Candlestick(x=DailyPredict.index, open=DailyPredict["Open"], high=DailyPredict["High"],
                low=DailyPredict["Low"], close=DailyPredict["Close"], name="EUR/USD"), 
                row=1, col=1, )

fig.update_layout(
    height=1000
)

#Feature subplots; slowD uses its own index so its dates stay aligned
fig.add_trace(go.Scatter(x=slowD.index, y=slowD,line=dict(width=2, color='#5fad75'), name="Stochastic slow %D"), row=2, col=1)
fig.add_trace(go.Scatter(x=DailyPredict.index, y=DailyPredict['LWR'],line=dict(width=2, color='#7d5a8c'), name="Larry Williams %R"), row=3, col=1)

#Forecast and label (next-day close) as lines over the candles in the first row
fig.add_trace(go.Scatter(x=DailyPredict.index, y=DailyPredict['Prediction'],line=dict(width=2, color='#FF00FF'), name="Prediction"), row=1, col=1)
fig.add_trace(go.Scatter(x=DailyPredict.index, y=DailyPredict['Label'],line=dict(width=2, color='#00FFFF'), name="Label"), row=1, col=1)
fig.update(layout_xaxis_rangeslider_visible=False)
fig.show()

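## I notice that dips in the EUR/USD price coincide with dips in both Larry Williams %R and stochastic slow %D around the same dates. This is expected: with the same 14-day look-back, %R equals fast %K minus 100, and slow %D is a smoothed fast %K, so both indicators measure where the close sits within its recent high-low range.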
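Why the two indicators move together can be checked directly. A minimal sketch with synthetic random-walk prices (an assumption, not the EURUSD data): with the same look-back, Williams %R equals fast %K minus 100, and slow %D is fast %K smoothed twice with 3-day moving averages, as in the formulas for %D and slow %D.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
close = pd.Series(1.10 + np.cumsum(rng.normal(0, 0.004, 200)))
high = close + rng.uniform(0, 0.003, 200)
low = close - rng.uniform(0, 0.003, 200)

n = 14
hh, ll = high.rolling(n).max(), low.rolling(n).min()
fast_k = (close - ll) / (hh - ll) * 100
williams_r = (hh - close) / (hh - ll) * -100
slow_d = fast_k.rolling(3).mean().rolling(3).mean()

# %R and %K differ only by a constant 100, so their dips line up exactly
print(np.allclose((williams_r - (fast_k - 100)).dropna(), 0))
# slow %D is a lagged, smoothed %K, so it tends to follow %R closely
print(round(williams_r.corr(slow_d), 2))
```

Algebraically, $\%K - 100 = \frac{C_t - L_n - (H_n - L_n)}{H_n - L_n}\times100 = \frac{H_n - C_t}{H_n - L_n}\times(-100)$, so the two features carry the same information and adding both to a linear model adds little.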


<a id='g3'></a>
# Grade 3

- Calculate additional feature [RSI (relative strength index)](https://www.investopedia.com/terms/r/rsi.asp)
- Add the feature as a subplot to the illustration from the previous step
- Set up an [ElasticNet](https://scikit-learn.org/stable/modules/linear_model.html#elastic-net) model
- Fit/train the ElasticNet to the training data
- Forecast and calculate the R² error on both the training data set and the test
- Combine line chart(s) that illustrate the *label* (actual data) and the *forecast* from both models in the previous figure.
- Compare the errors and explain the outcome
# Found no errors in this description either
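Hyperparameters such as `alpha` and `l1_ratio` should be chosen without looking at the test set. A minimal sketch, assuming standardized training arrays like `X_scaled` and `Y_scaled` (synthetic data stands in for them here): scikit-learn's `ElasticNetCV` accepts a `TimeSeriesSplit` as `cv`, so each validation fold lies after its training fold in time. The grid values are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV
from sklearn.model_selection import TimeSeriesSplit
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the daily features (Close, LWR) and next-day label
rng = np.random.default_rng(3)
close = 1.10 + np.cumsum(rng.normal(0, 0.004, 500))
lwr = rng.uniform(-100, 0, 500)
X = np.column_stack([close[:-1], lwr[:-1]])
y = close[1:]

# Chronological 80/20 split, scaler fitted on the training part only
cut = int(len(X) * 0.8)
scaler = StandardScaler().fit(X[:cut])
X_train, X_test = scaler.transform(X[:cut]), scaler.transform(X[cut:])
y_train, y_test = y[:cut], y[cut:]

# l1_ratio must lie in [0, 1]; alpha is searched on a log scale
enet = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.7, 0.9, 0.95, 0.99, 1.0],
                    alphas=np.logspace(-6, 0, 13),
                    cv=TimeSeriesSplit(n_splits=5))
enet.fit(X_train, y_train)

print(f"alpha={enet.alpha_:.1e}, l1_ratio={enet.l1_ratio_}")
print(f"R² train: {enet.score(X_train, y_train):.4g}")
print(f"R² test:  {enet.score(X_test, y_test):.4g}")
```

Only after the hyperparameters are fixed on the training folds is the test set used, once, for the final R² comparison with the linear model.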

### RSI
<br>
<span style='font-size:25px'>
$100-\frac{100}{\left(1+\frac{\frac{\sum_{i=0}^{n-1}Up_{t-i}}{\text{n}}}{\frac{\sum_{i=0}^{n-1}Dw_{t-i}}{\text{n}}}\right)} $
    </span>
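A minimal sketch of the formula above with simple n-period averages of the gains $Up$ and absolute losses $Dw$; the helper name `rsi_sma` and the toy prices are illustrative. Many charting tools use Wilder's smoothing instead of a simple average, so their RSI values can differ slightly.

```python
import pandas as pd


def rsi_sma(close: pd.Series, n: int = 14) -> pd.Series:
    """RSI = 100 - 100 / (1 + mean(Up) / mean(Dw)) using simple n-period means."""
    delta = close.diff()
    up = delta.clip(lower=0)          # Up_t: positive changes, else 0
    dw = -delta.clip(upper=0)         # Dw_t: size of negative changes, else 0
    rs = up.rolling(n).mean() / dw.rolling(n).mean()
    return 100 - 100 / (1 + rs)


prices = pd.Series([10.0, 11.0, 10.0, 12.0, 11.0])
r = rsi_sma(prices, n=4)
print(r.round(2).tolist())
```

In the last window the average gain is 0.75 and the average loss 0.5, so $RS = 1.5$ and $RSI = 100 - 100/2.5 = 60$. A series that only rises has no losses, $RS$ becomes infinite and the RSI reaches 100.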

In [None]:
#Simple moving average of one column
def SMA(data, period=14, column='Close'):
    return data[column].rolling(window=period).mean()


#Calculating RSI with simple n-period averages of gains and losses (formula above)
def RSI(data, period=14, column='Close'):
    delta = data[column].diff(1)
    #Up: positive changes, Dw: size of negative changes (0 otherwise)
    moves = pd.DataFrame({'up': delta.clip(lower=0), 'down': -delta.clip(upper=0)})
    AVG_Gain = SMA(moves, period, column='up')
    AVG_Loss = SMA(moves, period, column='down')
    RS = AVG_Gain / AVG_Loss
    return 100.0 - (100.0/(1.0+RS))

#Compute on the full daily history (no warm-up gap in the test window),
#then insert the test-period values into the dataframe
DailyPredict['RSI'] = RSI(df2Daily).loc[DailyPredict.index]
DailyPredict.dropna(inplace=True)

In [None]:
#Graph with RSI

from plotly.subplots import make_subplots

#Four time-aligned rows: EUR/USD candles, stochastic slow %D, Williams %R, RSI
fig = make_subplots(rows=4, cols=1, shared_xaxes=True, 
               vertical_spacing=0.03, subplot_titles=('EUR/USD', ''), 
               row_width=[0.4,0.4,0.4,0.9] )

#Plotting OHLC
fig.add_trace(go.Candlestick(x=DailyPredict.index, open=DailyPredict["Open"], high=DailyPredict["High"],
                low=DailyPredict["Low"], close=DailyPredict["Close"], name="EUR/USD"), 
                row=1, col=1, )

fig.update_layout(
    height=1000
)

#Feature subplots; slowD uses its own index so it stays aligned even if rows were dropped above
fig.add_trace(go.Scatter(x=slowD.index, y=slowD,line=dict(width=2, color='#5fad75'), name="Stochastic slow %D"), row=2, col=1)
fig.add_trace(go.Scatter(x=DailyPredict.index, y=DailyPredict['LWR'],line=dict(width=2, color='#7d5a8c'), name="Larry Williams %R"), row=3, col=1)
fig.add_trace(go.Scatter(x=DailyPredict.index, y=DailyPredict['RSI'],line=dict(width=2, color='#2FAC9F'), name="RSI"), row=4, col=1)

#Forecast and label (next-day close) as lines over the candles in the first row
fig.add_trace(go.Scatter(x=DailyPredict.index, y=DailyPredict['Prediction'],line=dict(width=2, color='#FF00FF'), name="Prediction"), row=1, col=1)
fig.add_trace(go.Scatter(x=DailyPredict.index, y=DailyPredict['Label'],line=dict(width=2, color='#00FFFF'), name="Label"), row=1, col=1)

fig.update(layout_xaxis_rangeslider_visible=False)
fig.show()

In [None]:
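from sklearn.linear_model import ElasticNet
from sklearn.model_selection import TimeSeriesSplit

#Baseline: R² of the plain linear regression on the test set
lr_test_score = regressor.score(X_scaledTest, Y_scaledTest)

#Choose alpha and l1_ratio by time-series cross-validation on the TRAINING data only,
#so the test set is never used for tuning. l1_ratio must lie in [0, 1].
best_cv, best_a, best_r = -np.inf, None, None
tscv = TimeSeriesSplit(n_splits=5)
for alpha in np.logspace(-6, 0, 7):
    for ratio in [0.1, 0.5, 0.7, 0.9, 0.95, 0.99, 1.0]:
        scores = []
        for tr_idx, val_idx in tscv.split(X_scaled):
            enet = ElasticNet(alpha=alpha, l1_ratio=ratio)
            enet.fit(X_scaled[tr_idx], Y_scaled[tr_idx].ravel())
            scores.append(enet.score(X_scaled[val_idx], Y_scaled[val_idx].ravel()))
        if np.mean(scores) > best_cv:
            best_cv, best_a, best_r = np.mean(scores), alpha, ratio

#Refit on the whole training set with the chosen hyperparameters
enet = ElasticNet(alpha=best_a, l1_ratio=best_r).fit(X_scaled, Y_scaled.ravel())
enet_train = enet.score(X_scaled, Y_scaled.ravel())
enet_test = enet.score(X_scaledTest, Y_scaledTest.ravel())

print("ElasticNet with Close and LWR")
print(f'{"Best alpha":<20}{best_a:.1e}')
print(f'{"Best l1_ratio":<20}{best_r:.2f}')
print(f'{"Train R²":<20}{enet_train:.4g}')
print(f'{"Test R²":<20}{enet_test:.4g}')
print(f'{"Gain over LinearR":<20}{enet_test - lr_test_score:.1e}')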