We will use the finta library to compute our technical-indicator features.

In [None]:
pip install finta



In [None]:
import pandas as pd
import numpy as np
from finta import TA

train_file = 'https://raw.githubusercontent.com/joshuasir/ml-project/main/datasets-stock-exchange/stock-market-train.csv'
test_file = 'https://raw.githubusercontent.com/joshuasir/ml-project/main/datasets-stock-exchange/stock-market-test.csv'



df = pd.read_csv(train_file)

df_test = pd.read_csv(test_file)

In [None]:
df['Date'] = pd.to_datetime(df['Date'])

In [None]:
df.sort_values('Date')

Unnamed: 0,Index,Date,Open,High,Low,Close,Adj Close,Volume,CloseUSD
32734,N225,1965-01-05,1257.719971,1257.719971,1257.719971,1257.719971,1257.719971,0.0,12.577200
32735,N225,1965-01-06,1263.989990,1263.989990,1263.989990,1263.989990,1263.989990,0.0,12.639900
32736,N225,1965-01-07,1274.270020,1274.270020,1274.270020,1274.270020,1274.270020,0.0,12.742700
32737,N225,1965-01-08,1286.430054,1286.430054,1286.430054,1286.430054,1286.430054,0.0,12.864301
32738,N225,1965-01-12,1288.540039,1288.540039,1288.540039,1288.540039,1288.540039,0.0,12.885400
...,...,...,...,...,...,...,...,...,...
83368,J203.JO,2019-07-09,57731.398440,57731.398440,56923.851560,56963.800780,56963.800780,0.0,3987.466055
83369,J203.JO,2019-07-10,56963.800780,57852.609380,56963.800780,57597.871090,57597.871090,0.0,4031.850976
83370,J203.JO,2019-07-11,57597.871090,58033.390630,57132.058590,57273.140630,57273.140630,0.0,4009.119844
83371,J203.JO,2019-07-12,57273.140630,57548.250000,57162.230470,57277.351560,57277.351560,0.0,4009.414609


We will complement our features with more indicators: WMA, STC, MOM, and CCI. CCI acts as a momentum-based oscillator. MOM replaces OBV: instead of OBV's steps, we look at the speed of the trend. STC complements MACD in measuring momentum over a given period.
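
To make MOM and CCI concrete, here is a minimal pure-Python sketch of their textbook definitions. It is an illustration only: the helper names `momentum` and `cci`, the default periods, and the 0.015 constant are assumptions made here, and finta's own implementation may differ in details.

```python
from statistics import mean

def momentum(prices, period=10):
    """Price change over `period` steps; None until enough history exists."""
    return [None if i < period else prices[i] - prices[i - period]
            for i in range(len(prices))]

def cci(highs, lows, closes, period=20, constant=0.015):
    """Commodity Channel Index: distance of the typical price from its
    rolling mean, scaled by the rolling mean absolute deviation."""
    typical = [(h + l + c) / 3 for h, l, c in zip(highs, lows, closes)]
    out = []
    for i in range(len(typical)):
        if i < period - 1:
            out.append(None)  # not enough history for a full window
            continue
        window = typical[i - period + 1:i + 1]
        avg = mean(window)
        mad = mean(abs(tp - avg) for tp in window)
        # A perfectly flat window has zero deviation; report 0.0 by convention here.
        out.append((typical[i] - avg) / (constant * mad) if mad else 0.0)
    return out
```

Both helpers return `None` during the warm-up period, which mirrors why the pipeline later calls `dropna()` on the indicator table.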

In [None]:
def calc_target(closing, opening):
  # Label each day 1 if it closed above its open, else 0.
  return [int(close > open_) for open_, close in zip(opening, closing)]

def calc_MACD(EMA_12, EMA_26):
  # MACD line: fast EMA minus slow EMA, element by element.
  return [first - second for first, second in zip(EMA_12, EMA_26)]

def add_indicators(df, days=50):
  # Build the indicator table one market index at a time (`days` is currently unused).
  frames = []

  for index in df.Index.unique():
    df_index = df[df['Index']==index].reset_index(drop=True).copy()
    ohlc = df_index[['Index','Open','High','Low','Close','Volume','Date']].set_index('Date').sort_index()
    macd = TA.MACD(ohlc,column='Open')  # compute once, reuse for MACD and SIGNAL
    Indicators = pd.DataFrame({
         'SMA' : TA.SMA(ohlc,column='Open'),
         'EMA' : TA.EMA(ohlc,column='Open'),
         'WMA' : TA.WMA(ohlc,column='Open'),
         'RSI' : TA.RSI(ohlc,14,column='Open'),
         'MOM' : TA.MOM(ohlc,column='Open'),
         'STC' : TA.STC(ohlc,column='Open'),
         'CCI' : TA.CCI(ohlc),
         'Target' : calc_target(closing = ohlc['Close'],opening = ohlc['Open']),
         'MACD' : macd['MACD'],
         'SIG' : macd['SIGNAL']
         # 'Index' : ohlc['Index']
         })
    frames.append(Indicators)
  # DataFrame.append was removed in pandas 2.0; concatenate the per-index frames instead.
  return pd.concat(frames) if frames else pd.DataFrame()

In [None]:
new_df = add_indicators(df).sort_index().dropna()

new_df_test = add_indicators(df_test).sort_index().dropna()

Our model will try to learn how to read the indicators:
* Moving averages (SMA, EMA, WMA): if the price is lower than the moving average, the price is expected to go up, and vice versa.
* MACD, STC: if the value is greater than its previous value, the price is expected to go up, and vice versa.
* MOM: if MOM is positive, the price is expected to go up, and vice versa.
* RSI: if RSI is lower than 30, the price is expected to go up; if it is higher than 70, the price is expected to go down. If RSI is between 30 and 70, the price is expected to go up when today's RSI is higher than yesterday's, and vice versa.
* CCI: if CCI is lower than -200, the price is expected to go up; if it is higher than 200, the price is expected to go down. If CCI is between -200 and 200, the price is expected to go up when today's CCI is higher than yesterday's, and vice versa.
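
The rules above can be sketched as small helper functions. This only illustrates the intended reading of each indicator; it is not part of the pipeline, and the model is expected to learn such patterns on its own. All function names are hypothetical, 1 means "price expected to go up" and 0 means "price expected to go down", and mapping ties to 0 is an assumption made here.

```python
def band_signal(today, yesterday, low, high):
    """Oscillator rule shared by RSI and CCI.

    Below `low` -> up (1), above `high` -> down (0); inside the band,
    follow the direction of the oscillator itself.
    """
    if today < low:
        return 1
    if today > high:
        return 0
    return int(today > yesterday)

def rsi_signal(today, yesterday):
    # RSI uses the 30/70 band.
    return band_signal(today, yesterday, low=30, high=70)

def cci_signal(today, yesterday):
    # CCI uses the -200/200 band.
    return band_signal(today, yesterday, low=-200, high=200)

def mom_signal(mom):
    # Positive momentum -> price expected to go up.
    return int(mom > 0)

def moving_average_signal(price, moving_average):
    # Price below the moving average -> price expected to go up.
    return int(price < moving_average)

def trend_signal(today, yesterday):
    # MACD and STC: a rising value -> price expected to go up.
    return int(today > yesterday)
```

For example, `rsi_signal(25, 40)` returns 1 because the RSI is below 30, while `rsi_signal(50, 55)` returns 0 because the RSI is inside the band and falling.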

In [None]:
from sklearn.model_selection import train_test_split

# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement.
new_df = pd.concat([new_df, new_df_test]).reset_index(drop=True)

new_df,new_df_test = train_test_split(new_df,test_size=.3,stratify=new_df['Target'])

In [None]:
new_df.to_csv('stock-market-train-clean.csv',index=False)
new_df_test.to_csv('stock-market-test-clean.csv',index=False)