# Using ESN to build an e2e prediction system with more features (WIP)
A continuation of the `1_esn_le2e.ipynb` notebook that aims to improve accuracy by adding more features, such as common technical-analysis (TA) indicators. This is an exploration rather than a final result.

## I. Obtaining data
Here, we get the data from the Yahoo Finance API (via the `yfinance` package), which limits bulk and daily usage, so this notebook is restricted to exploratory use.
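
Because the API is rate-limited, one way to avoid repeated downloads while experimenting is to cache the history on disk. The following is a minimal sketch, not part of the original notebook: `load_or_fetch` and the cache file name are hypothetical, and the fetch step is passed in as a callable so the helper itself never touches the network.

```python
from pathlib import Path

import pandas as pd


def load_or_fetch(cache_path, fetch):
    """Return the cached DataFrame if present, else call fetch() and cache it.

    `fetch` is any zero-argument callable returning a DataFrame
    (hypothetical helper, assumed here for illustration).
    """
    cache_path = Path(cache_path)
    if cache_path.exists():
        # Reuse the local copy instead of hitting the API again
        return pd.read_csv(cache_path, index_col=0, parse_dates=True)
    frame = fetch()
    frame.to_csv(cache_path)
    return frame


# Possible usage with the variables from the configuration cell:
# tikr_history = load_or_fetch(
#     f"{stock_name}_{interval}.csv",
#     lambda: yf.Ticker(stock_name).history(
#         start=start_date, end=end_date, interval=interval
#     ),
# )
```

Note that a CSV round trip may not preserve the timezone-aware index exactly as `yfinance` returns it, so the index may need to be re-parsed after loading.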

In [2]:
import yfinance as yf
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from easyesn import PredictionESN

Configurations

In [3]:
stock_name = "MSFT"
start_date="2021-01-01"
end_date="2025-03-28"
interval="1d"
n_steps = 7
train_test_split = 0.8

Create the instance for the given ticker

In [None]:
tikr = yf.Ticker(stock_name)
tikr_history = tikr.history(start=start_date, end=end_date, interval=interval)
#tikr_history = tikr.history(period="1mo", interval="5m")
#tikr_history = tikr.history(period="max")

If desired, we could also run sentiment analysis on the current news.

In [None]:
#print(tikr.news)
#print(tikr.info)

In [None]:
print("Shape of df", tikr_history.shape)
tikr_history.head(5)

In [None]:
tikr_history.columns

#### Let's focus on limited columns
We keep the price columns (Open, High, Low, Close) along with Volume, and drop the remaining columns (Dividends, Stock Splits).

In [None]:
tikr_history = tikr_history.drop(["Dividends", "Stock Splits"], axis=1)
tikr_history.reset_index(inplace=True)
df = tikr_history
df.head(5)

In [None]:
import pandas_ta as ta
# Add RSI
df['RSI_14'] = ta.rsi(df['Close'], length=14)

# Add MACD (default values: fast=12, slow=26, signal=9)
macd = ta.macd(df['Close'])
df['MACD'] = macd['MACD_12_26_9']
df['MACD_Signal'] = macd['MACDs_12_26_9']

# Add Bollinger Bands
bollinger = ta.bbands(df['Close'], length=20, std=2)
df['BB_Upper'] = bollinger['BBU_20_2.0']
df['BB_Lower'] = bollinger['BBL_20_2.0']
df['BB_Middle'] = bollinger['BBM_20_2.0']

# Add EMA (Exponential Moving Average)
df['EMA_10'] = ta.ema(df['Close'], length=10)
df['EMA_50'] = ta.ema(df['Close'], length=50)

# Add ATR (Average True Range)
df['ATR_14'] = ta.atr(df['High'], df['Low'], df['Close'], length=14)
df.dropna(inplace=True)

print(df.head())
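
To make the RSI feature less of a black box, here is a minimal pandas sketch of RSI with Wilder-style smoothing. It is an illustration only: `rsi_wilder` is a hypothetical helper, and its warm-up handling may differ slightly from what `pandas_ta` produces.

```python
import numpy as np
import pandas as pd


def rsi_wilder(close: pd.Series, length: int = 14) -> pd.Series:
    """Relative Strength Index using Wilder smoothing (alpha = 1 / length)."""
    delta = close.diff()
    gain = delta.clip(lower=0.0)    # positive moves, zero otherwise
    loss = -delta.clip(upper=0.0)   # magnitude of negative moves
    avg_gain = gain.ewm(alpha=1.0 / length, min_periods=length, adjust=False).mean()
    avg_loss = loss.ewm(alpha=1.0 / length, min_periods=length, adjust=False).mean()
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)


# A strictly rising series has no losses, so its RSI saturates at the top
demo_close = pd.Series(np.arange(1.0, 41.0))
print(rsi_wilder(demo_close).iloc[-1])
```

The first rows are NaN while the smoothing window fills, which is why the notebook calls `dropna` after adding the indicators.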


## II. Building ESN model

## Data preparation
- Separating the train and test data with an 80-20% split
- Scaling the data
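
One consequence of fitting the scaler on the training split only is that test values can fall outside the [0, 1] range when prices trend beyond the training extremes. The toy sketch below shows this with a plain NumPy min-max formula; the numbers are made up for illustration and are not MSFT data.

```python
import numpy as np

# Toy upward-trending "prices": 100, 110, ..., 200
prices = np.linspace(100.0, 200.0, 11)

# Chronological 80/20 split, as in the notebook
split = int(len(prices) * 0.8)
train, test = prices[:split], prices[split:]

# Min-max statistics come from the training split only
lo, hi = train.min(), train.max()
train_scaled = (train - lo) / (hi - lo)
test_scaled = (test - lo) / (hi - lo)

print("train range:", train_scaled.min(), train_scaled.max())
print("test range: ", test_scaled.min(), test_scaled.max())
```

Fitting on the training split avoids leaking future information into the model, at the cost of test inputs that may lie outside the range the reservoir saw during training.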

In [None]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler


# Define features
FEATURES = ["Open", "High", "Low", "Close", "Volume", "RSI_14"]

# Train-Test Split
train_size = int(len(df) * train_test_split)
train_df = df.iloc[:train_size]
test_df = df.iloc[train_size:]

# Scaling
scaler = MinMaxScaler(feature_range=(0, 1))
train_scaled = scaler.fit_transform(train_df[FEATURES])
test_scaled = scaler.transform(test_df[FEATURES])

# Prepare Data for ESN Training
def create_sequences(data, n_steps):
    X, y = [], []
    for i in range(len(data) - n_steps):
        X.append(data[i:i + n_steps].flatten())  # Flatten past n_steps
        y.append(data[i + n_steps, FEATURES.index("Close")].flatten())  # Next day's Close price (scaled)
    return np.array(X), np.array(y)

# Define sequence length (lookback window)
n_steps = 6  # Overrides the n_steps from the configuration cell; adjust based on model performance

x_train, y_train = create_sequences(train_scaled, n_steps)
x_test, y_test = create_sequences(test_scaled, n_steps)

print("X_train shape:", x_train.shape)  # Should be (samples, n_steps * features)
print("y_train shape:", y_train.shape)  # Should be (samples, 1)


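Observe the shapes of `x` and `y` above. Let's double-check the alignment: the target `y_train[0]` is the Close value at row `n_steps` of the scaled data, and that same row is the first row of the window `x_train[n_steps]`. Subtracting the Close entry (index 3) of `x_train[n_steps]` from `y_train[0]` should therefore print 0.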

In [None]:
print(y_train[0][0]-x_train[n_steps][3])
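
The same alignment is easier to see on a tiny array. This is a sketch with a hypothetical helper, `make_windows`, that mirrors the windowing logic but takes the target column as an explicit argument.

```python
import numpy as np


def make_windows(data, n_steps, target_col):
    """Flatten each n_steps-row window into one input; the target is the next row's target_col."""
    X, y = [], []
    for i in range(len(data) - n_steps):
        X.append(data[i:i + n_steps].ravel())
        y.append([data[i + n_steps, target_col]])
    return np.array(X), np.array(y)


# 5 rows, 4 "features"; column 3 plays the role of Close
toy = np.arange(20, dtype=float).reshape(5, 4)
n_steps_toy, close_col = 2, 3

X_toy, y_toy = make_windows(toy, n_steps_toy, close_col)
print(X_toy.shape, y_toy.shape)                      # (3, 8) (3, 1)
print(y_toy[0, 0] - X_toy[n_steps_toy, close_col])   # 0.0
```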

In [None]:
%%time
esn = PredictionESN(
    n_input = x_train.shape[1],
    n_output = y_train.shape[1],
    n_reservoir= 100,
    spectralRadius=1,
    leakingRate=1,
    feedback=False,
    randomSeed = 42
)
esn.fit(x_train, y_train, transientTime="Auto", verbose=1)
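
For readers unfamiliar with what the reservoir size, spectral radius, and leaking rate control, here is a minimal NumPy sketch of a leaky echo state network with a ridge-regression readout, trained on a toy sine wave. It is not easyesn's implementation; every name, the ridge value, and the washout length are assumptions chosen for illustration.

```python
import numpy as np


def run_reservoir(inputs, w_in, w_res, leaking_rate=1.0):
    """Drive the reservoir with the inputs and collect the state at every step."""
    n_res = w_res.shape[0]
    state = np.zeros(n_res)
    states = np.empty((len(inputs), n_res))
    for t, u in enumerate(inputs):
        # Bias term plus input, combined with the recurrent contribution
        pre = w_in @ np.concatenate(([1.0], u)) + w_res @ state
        # Leaky integration: leaking_rate=1 means the state is fully replaced
        state = (1.0 - leaking_rate) * state + leaking_rate * np.tanh(pre)
        states[t] = state
    return states


rng = np.random.default_rng(42)
n_in, n_res = 1, 50

# Fixed random weights; only the readout is trained
w_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in + 1))
w_res = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
# Rescale so the largest absolute eigenvalue (spectral radius) is 0.9
w_res *= 0.9 / np.max(np.abs(np.linalg.eigvals(w_res)))

# Toy task: predict the next value of a sine wave
series = np.sin(np.linspace(0.0, 20.0 * np.pi, 1000))
u, target = series[:-1, None], series[1:, None]

states = run_reservoir(u, w_in, w_res)
washout = 100  # discard the initial transient states
S, Y = states[washout:], target[washout:]

# Ridge-regression readout: solve (S^T S + ridge * I) W = S^T Y
ridge = 1e-6
w_out = np.linalg.solve(S.T @ S + ridge * np.eye(n_res), S.T @ Y)

train_mse = float(np.mean((S @ w_out - Y) ** 2))
print("train MSE:", train_mse)
```

The washout here plays a role similar to the transient time passed to `fit`: early states still depend on the arbitrary zero initial state, so they are left out of the readout fit.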

In [None]:
# Predict on the test inputs with the model fitted on the training data
y_hat_scaled = esn.predict(x_test)
# The scaler expects len(FEATURES) columns, so repeat the prediction across
# all columns, inverse-transform, and keep only the Close column
y_hat_scaled_full = np.repeat(y_hat_scaled, len(FEATURES), axis=1)
y_hat = scaler.inverse_transform(y_hat_scaled_full)[:, FEATURES.index("Close")]
print("y_test.shape, y_hat.shape", y_test.shape, y_hat.shape)
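
An alternative to padding the prediction out to the full feature width is to undo the scaling for a single column directly. The sketch below relies on the fitted `MinMaxScaler` attributes `min_` and `scale_` (the forward transform is `X * scale_ + min_`); `inverse_single_column` is a hypothetical helper and the data is random.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler


def inverse_single_column(scaled_values, scaler, col):
    """Undo MinMaxScaler scaling for one column without building a full-width array."""
    scaled_values = np.asarray(scaled_values, dtype=float).ravel()
    return (scaled_values - scaler.min_[col]) / scaler.scale_[col]


# Demo on random data with 6 columns, like FEATURES in this notebook
rng = np.random.default_rng(0)
raw = rng.uniform(50.0, 150.0, size=(20, 6))
demo_scaler = MinMaxScaler(feature_range=(0, 1)).fit(raw)
scaled = demo_scaler.transform(raw)

col = 3  # position of the column to restore
restored = inverse_single_column(scaled[:, col], demo_scaler, col)
print(np.allclose(restored, raw[:, col]))
```

Whichever approach is used, the column index must match the column the model was trained to predict; otherwise the prediction is rescaled with another feature's range.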

In [None]:
y_test_orig = test_df[n_steps:].reset_index(drop=True)
# Visualise the Close price predictions
plt.figure(figsize = (8, 2))
plt.plot(y_test_orig['Close'], color = 'red', linestyle = "--", label = 'y_test')
plt.plot(y_hat, color = 'green', label = 'y_hat')
plt.title('y_hat vs y_test')
plt.ylabel('Close')
plt.legend()
plt.show()

Zoom in on the last 100 values to see the differences.

In [None]:
y_test_orig = test_df[n_steps:].reset_index(drop=True)
# Visualise the Close price predictions (last 100 values)
plt.figure(figsize = (12, 2))
plt.plot(y_test_orig['Close'][-100:].to_list(), color = 'red', linestyle = "--", label = 'y_test')
plt.plot(y_hat[-100:], color = 'green', label = 'y_hat_esn')
plt.title('y_hat vs y_test')
plt.ylabel('Close')
plt.legend()
plt.show()
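
Plots alone make it hard to judge how close the predictions are, so a few error metrics can complement them. This is a minimal sketch: `regression_report` is a hypothetical helper, and the example numbers are invented rather than taken from the model.

```python
import numpy as np


def regression_report(y_true, y_pred):
    """Return RMSE, MAE, and MAPE (in percent) for two equally long sequences."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    err = y_pred - y_true
    return {
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "mae": float(np.mean(np.abs(err))),
        "mape_pct": float(np.mean(np.abs(err / y_true)) * 100.0),
    }


# Invented example values, only to show the call
print(regression_report([100.0, 200.0], [110.0, 190.0]))
```

In the notebook, the two arguments would be the actual test prices and `y_hat`. Comparing the result against a naive "tomorrow equals today" forecast is a useful sanity check, since daily prices are strongly autocorrelated.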

Export the model for future use (ESN)

In [None]:
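import dill as pickle

# Save the trained ESN. To reload it later:
#   with open(f"stock_esn_model_{n_steps}.pkl", "rb") as f: esn = pickle.load(f)
with open(f"stock_esn_model_{n_steps}.pkl", "wb") as f:
    pickle.dump(esn, f)

# Also save the scaler, since predictions must be inverse-transformed with it
with open(f"scaler_esn_model_{n_steps}.pkl", "wb") as f:
    pickle.dump(scaler, f)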

END OF NOTEBOOK