The goal of this code is the building of an SI-RCNN model to forcast intraday directional movements.

The first step is the loading of seven technical indicators from our stock of choice. For the remit of this assignment we used the S&P 500.

We made use of the following 7 indicators:


1. Stochastic %K
2. William’s %R
3. Stochastic %D
4. A/D Oscillator
5. Momentum
6. Disparity
7. Rate of Change

In [23]:
import yfinance as yf
import pandas as pd
import numpy as np
import ta

data = yf.download("^GSPC", start="2023-01-01", end="2025-01-01", interval="1d")
data.dropna(inplace=True)

# 1. SMA (Simple Moving Average - 20 days)
data['SMA_20'] = data['Close'].rolling(window=20).mean()

# 2. EMA (Exponential Moving Average - 20 days)
data['EMA_20'] = data['Close'].ewm(span=20, adjust=False).mean()

# 3. RSI (Relative Strength Index - 14 days)
delta = data['Close'].diff()
gain = np.where(delta > 0, delta, 0)
loss = np.where(delta < 0, -delta, 0)
avg_gain = pd.Series(gain.reshape(-1)).rolling(window=14).mean()
avg_loss = pd.Series(loss.reshape(-1)).rolling(window=14).mean()
rs = avg_gain / avg_loss
data['RSI_14'] = 100 - (100 / (1 + rs))

# 4. MACD (Moving Average Convergence Divergence)
ema_12 = data['Close'].ewm(span=12, adjust=False).mean()
ema_26 = data['Close'].ewm(span=26, adjust=False).mean()
data['MACD'] = ema_12 - ema_26

# 5. Stochastic Oscillator %K (14-day)
low_14 = data['Low'].rolling(window=14).min()
high_14 = data['High'].rolling(window=14).max()
data['Stochastic_K'] = 100 * ((data['Close'] - low_14) / (high_14 - low_14))

# 6. ATR (Average True Range - 14 days)
high_low = data['High'] - data['Low']
high_close = np.abs(data['High'] - data['Close'].shift())
low_close = np.abs(data['Low'] - data['Close'].shift())
true_range = pd.concat([high_low, high_close, low_close], axis=1).max(axis=1)
data['ATR'] = true_range.rolling(window=14).mean()

# 7. OBV (On-Balance Volume)
obv = (np.sign(data['Close'].diff()) * data['Volume']).fillna(0).cumsum()
data['OBV'] = obv

# Keep only indicator columns
indicators = data[['SMA_20', 'EMA_20', 'RSI_14', 'MACD', 'Stochastic_K', 'ATR', 'OBV']].dropna()

[*********************100%***********************]  1 of 1 completed


In [7]:
import pandas as pd
import numpy as np
import yfinance as yf

# Ensure the data is sorted by date
df = yf.download("^GSPC", start="2023-01-01", end="2025-01-01", interval="1d")
df.reset_index(inplace=True)
df.dropna(inplace=True)


# Parameters
lookback = 14  # typical lookback for most of these indicators

# 1. Stochastic %K
low_min = df['Low'].rolling(window=lookback).min()
high_max = df['High'].rolling(window=lookback).max()
df['Stochastic_%K'] = 100 * ((df['Close'] - low_min) / (high_max - low_min))

# 2. Williams %R
df["Williams_%R"] = -100 * ((high_max - df['Close']) / (high_max - low_min))

# 3. Stochastic %D (3-period SMA of %K)
df['Stochastic_%D'] = df['Stochastic_%K'].rolling(window=3).mean()

# 4. A/D Oscillator (Accumulation/Distribution Line)
ad = ((df['Close'] - df['Low']) - (df['High'] - df['Close'])) / (df['High'] - df['Low']) * df['Volume']
df['AD_Line'] = ad.cumsum()
df['AD_Oscillator'] = df['AD_Line'] - df['AD_Line'].shift(lookback)

# 5. Momentum (Close - Close n periods ago)
df['Momentum'] = df['Close'] - df['Close'].shift(lookback)

# 6. Disparity (Close / Moving Average * 100)
df['Disparity'] = (df['Close'] / df['Close'].rolling(window=lookback).mean()) * 100

# 7. Rate of Change (ROC)
df['ROC'] = ((df['Close'] - df['Close'].shift(lookback)) / df['Close'].shift(lookback)) * 100

# Display relevant columns
technical_indicators = df[['Date', 'Stochastic_%K', 'Williams_%R', 'Stochastic_%D','AD_Oscillator', 'Momentum', 'Disparity', 'ROC']]
technical_indicators.dropna(inplace=True)
print(technical_indicators.head())

YF.download() has changed argument auto_adjust default to True


[*********************100%***********************]  1 of 1 completed

Price        Date Stochastic_%K Williams_%R Stochastic_%D AD_Oscillator  \
Ticker                                                                    
15     2023-01-25     90.252829   -9.747171     90.951325  1.302313e+10   
16     2023-01-26     99.547583   -0.452417     93.453797  1.936106e+10   
17     2023-01-27     89.097404  -10.902596     92.965938  1.618603e+10   
18     2023-01-30     64.761217  -35.238783     84.468735  1.680585e+10   
19     2023-01-31     91.560900   -8.439100     81.806507  1.764679e+10   

Price     Momentum   Disparity       ROC  
Ticker                                    
15      163.250000  101.656858  4.236991  
16      252.329834  102.309142  6.626134  
17      175.479980  102.241487  4.505170  
18      125.679932  100.688509  3.229112  
19      157.350098  101.875888  4.014801  



A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  technical_indicators.dropna(inplace=True)


The next step was to create another input of financial news sentence embeddings, for this we used the FNSPID dataset which hold millions of financial news records covering S&P 500 companies.

https://github.com/Zdong104/FNSPID_Financial_News_Dataset

In [20]:
from gensim.models import Word2Vec
from sklearn.preprocessing import OneHotEncoder
import nltk
import re

# Ensure NLTK resources are available
nltk.download('punkt')

# Load the dataset
df = pd.read_csv('All_external_subset.csv')  
#print(df['Article_title'].head())
df = df.sort_values('Date').reset_index(drop=True)  

def preprocess_title(title):
    title = title.lower()
    title = title.replace("’", "'").replace("‘", "'").replace("“", '"').replace("”", '"')
    tokens = re.findall(r"\b[a-zA-Z']+\b", title)
    return tokens

df['tokens'] = df['Article_title'].apply(preprocess_title)
print(df['tokens'].head)

# Build vocabulary
all_tokens = [token for tokens in df['tokens'] for token in tokens]
vocab = sorted(set(all_tokens))
word_to_index = {word: idx for idx, word in enumerate(vocab)}

# One-hot encode titles
encoder = OneHotEncoder(handle_unknown='ignore')
encoder.fit(np.array(vocab).reshape(-1, 1))

def one_hot_encode(tokens):
    indices = [word_to_index[word] for word in tokens if word in word_to_index]
    one_hot = np.zeros(len(vocab))
    one_hot[indices] = 1
    return one_hot

df['one_hot'] = df['tokens'].apply(one_hot_encode)

# Train Word2Vec model
model = Word2Vec(sentences=df['tokens'], vector_size=100, window=5, min_count=1, workers=4)

# Generate sentence embeddings by averaging word vectors
def get_sentence_vector(tokens):
    vectors = [model.wv[word] for word in tokens if word in model.wv]
    if vectors:
        return np.mean(vectors, axis=0)
    else:
        return np.zeros(model.vector_size)

df['sentence_vector'] = df['tokens'].apply(get_sentence_vector)
print(df['sentence_vector'].head)

# Now, df['sentence_vector'] contains the embedding for each title

<bound method NDFrame.head of 0     [int'l, air, transport, authority, chief, econ...
1     [shares, of, several, healthcare, companies, a...
2     [stifel, maintains, hold, on, agilent, technol...
3     [shares, of, several, healthcare, companies, a...
4     [shares, of, several, companies, in, the, auto...
5                   [agilent, withdraws, and, guidance]
6     [agilent, reports, has, become, top, level, sp...
7     [agilent, reports, fda, approval, for, pd, com...
8     [ubs, maintains, neutral, on, agilent, technol...
9     [shares, of, several, healthcare, companies, a...
10    [shares, of, several, healthcare, companies, a...
11    [how, bill, ackman, successfully, navigated, c...
12    [pershing, square, shows, fund, raises, stake,...
13    [roundup, how, buffett, einhorn, ackman, and, ...
14    [agilent, technologies, receives, fda, approva...
15                      [earnings, scheduled, for, may]
16    [agilent, technologies, shares, are, trading, ...
17    [stocks, mov

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\gianm\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
