 #  <font color='Green'>Forecasting the trend of NIFTY50</font> 
 # <font color='Green'>-----------------------------------------------</font> 
 

 ## <font color='red'>Introduction</font> 

<b>NIFTY50 futures are the most traded index contracts in the Indian stock markets.<br></br>There are over 2 crore active traders in India, and just 10% of them are successful.<br></br>The success of a day or swing trader depends on how well they predict the current day's trend.

The objective of this case study is to predict the current day's trend just after the Indian stock markets open.
<br></br>Day and swing traders can then use the model's predictions to plan their trades for the day.</b>

## <font color='red'>ML Formulation</font> 
<b>The trend on any day is either Bullish (the day's closing price is greater than its opening price) or Bearish (the closing price is less than the opening price), so this is a binary classification problem.</b>
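The labelling rule above can be sketched as follows. This is a minimal illustration, assuming a dataframe with `Open` and `Close` columns; the helper name `label_trend` is hypothetical, and dropping flat days (close equal to open) is an assumption, since the formulation does not say how they are handled.

```python
import pandas as pd

def label_trend(ohlc: pd.DataFrame) -> pd.Series:
    """Return 1 for bullish days (Close > Open) and 0 for bearish days (Close < Open)."""
    # Flat days match neither definition, so this sketch drops them (an assumption).
    directional = ohlc[ohlc["Close"] != ohlc["Open"]]
    return (directional["Close"] > directional["Open"]).astype(int)

# The first three rows of the NIFTY50 futures data used later in this notebook.
sample = pd.DataFrame(
    {"Open": [1894.0, 1929.5, 1962.8], "Close": [1925.3, 1950.8, 1957.15]},
    index=["2004-01-01", "2004-01-02", "2004-01-05"],
)
labels = label_trend(sample)
```

Here the first two days are labelled bullish and the third bearish.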

## <font color='red'>Performance Metrics</font> 

<b>1. Accuracy (as the data is balanced)<br></br>2. F1-Score (as the cost of false positives and false negatives is very high)</b>
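As a reference for how the two metrics are computed, here is a small dependency-free sketch. The function name `accuracy_and_f1` and the toy labels are illustrative only; treating Bullish as the positive class (label 1) is an assumption.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Compute accuracy and F1 for binary labels without external libraries."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    # F1 = 2*TP / (2*TP + FP + FN); reported as 0 when that denominator is 0.
    denom = 2 * tp + fp + fn
    f1 = (2 * tp / denom) if denom else 0.0
    return accuracy, f1

y_true = [1, 0, 1, 1, 0, 0]   # toy labels: 1 = Bullish, 0 = Bearish
y_pred = [1, 0, 0, 1, 1, 0]
acc, f1 = accuracy_and_f1(y_true, y_pred)
```

F1 combines precision and recall, so it penalises both wrongly predicted bullish days and missed ones.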

## <font color='red'>Data Sources</font> 

<b>Data is extracted from the following sources:</b><br></br>

        1. NSE (National Stock Exchange of India), using the nsepy library.
        2. Yahoo Finance, using the yfinance library.
        3. FII data is also scraped from way2wealth.com.

 ## <font color='red'>Data and Feature Extraction</font> 

In [None]:
#pip install nsepy

In [1]:
from datetime import date,timedelta
from nsepy import get_history
from nsepy.derivatives import get_expiry_date
import pandas as pd
from tqdm import tqdm

In [2]:
def Get_Futures_Data(Index,from_year,to_year):
    
    """
    Fetch the day-wise historical NIFTY50 futures data with the nsepy library, one monthly contract at a time.
    The open, high, low, close and volume columns of this data are used as features.
    """
    
    total_months = (to_year - from_year+1)*12
    start_date   =  date(from_year,1,1)
    expiry_year  =  start_date.year
    expiry_month = start_date.month
    futures_expiry_date = get_expiry_date(year=expiry_year, month=expiry_month) 
    futures_expiry_date = list(futures_expiry_date)
    futures_expiry_date.sort()
    futures_expiry_date = futures_expiry_date[-1]
    end_date = futures_expiry_date
    Nifty50_data =  get_history(symbol=Index,
                    start= start_date,
                    end  = end_date,
                    index=True,
                    futures=True,
                    expiry_date=futures_expiry_date)
    Nifty50_data = Nifty50_data[['Expiry','Open','High','Low','Close','Number of Contracts','Turnover','Open Interest','Change in OI']]
    
    for i in tqdm(range(total_months-1)):
        prev_expiry_year =  end_date.year
        prev_expiry_month = end_date.month
        prev_expiry_day = end_date.day
        
        start_date   =  end_date + timedelta(days=1)
        
        expiry_year  =  start_date.year
        expiry_month =  prev_expiry_month+1
        
        if(prev_expiry_month == 12):
            expiry_month = 1
            if(prev_expiry_day!= 31):
                expiry_year += 1

        futures_expiry_date = get_expiry_date(year=expiry_year, month=expiry_month) 
        futures_expiry_date = list(futures_expiry_date)
        futures_expiry_date.sort()
        futures_expiry_date = futures_expiry_date[-1]
        end_date = futures_expiry_date

        curr_month_Nifty50_data =  get_history(symbol=Index,
                                   start= start_date,
                                   end  = end_date,
                                   index=True,
                                   futures=True,
                                   expiry_date=futures_expiry_date)
        curr_month_Nifty50_data = curr_month_Nifty50_data[['Expiry','Open','High','Low','Close','Number of Contracts','Turnover','Open Interest','Change in OI']]
        # DataFrame.append was removed in pandas 2.0; pd.concat stacks the rows the same way.
        Nifty50_data = pd.concat([Nifty50_data, curr_month_Nifty50_data])
        
    return Nifty50_data
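The contract rollover inside the loop (next month, with a year change after December) can be expressed as a small helper. This is a sketch of the same rule, not part of nsepy; the name `next_contract_month` is hypothetical.

```python
def next_contract_month(prev_expiry_year, prev_expiry_month):
    """Return (year, month) of the monthly contract that follows the given expiry month."""
    if prev_expiry_month == 12:
        # A December expiry rolls over to January of the following year.
        return prev_expiry_year + 1, 1
    return prev_expiry_year, prev_expiry_month + 1

after_january = next_contract_month(2004, 1)
after_december = next_contract_month(2004, 12)
```

Working from the previous expiry's year and month directly avoids the special case on the expiry day being the 31st.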

In [5]:
Nifty50_data_df = Get_Futures_Data('Nifty',2004,2020)
Nifty50_data_df.to_csv('Nifty50_Futures_Data_2004_to_2020.csv')

100%|████████████████████████████████████████████████████████████████████████████████| 203/203 [00:40<00:00,  5.03it/s]


In [3]:
import talib
import pandas as pd

In [90]:
def candle_stick_pattern_recognizer(data):
    """
    Take the open, high, low and close prices of the data (returned by the previous function)
    and use the talib library to recognise well-known candlestick patterns.
    Each pattern becomes a feature column that is joined to the dataframe returned by the previous function.
    Returns the features dataframe built up to this stage.
    """
    cols_drop = ['Expiry','Number of Contracts','Turnover','Open Interest','Change in OI']
    data = data.drop(columns=cols_drop)
    
    candle_patterns = talib.get_function_groups()['Pattern Recognition']
    for pattern in candle_patterns:
        data[pattern] = getattr(talib, pattern)(data['Open'],data['High'],data['Low'],data['Close'])
    
    print("TA-LIB can identify a total of {0} candle stick patterns".format(len(candle_patterns)))
    
    deleted_patterns = 0
    for pattern in candle_patterns:
        # A pattern that is 0 on every row was never recognised in this data.
        if (data[pattern] == 0).all():
            #print(pattern)
            deleted_patterns += 1
            data = data.drop(columns=[pattern])
        else:
            data[pattern] = data[pattern].apply(lambda pat_val: (pat_val/200))
   
    print("In the given data, {0} patterns are identified".format(len(candle_patterns)-deleted_patterns))
    return data

In [91]:
nifty_data = pd.read_csv('Nifty50_Futures_Data_2004_to_2020.csv')
nifty_data = nifty_data.set_index('Date')

In [92]:
nifty_data_with_candle_stick_patterns = candle_stick_pattern_recognizer(nifty_data.copy())

TA-LIB can identify a total of 61 candle stick patterns
In the given data, 51 patterns are identified
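The pruning and scaling step of `candle_stick_pattern_recognizer` can be tried without TA-Lib on a toy frame. This sketch assumes pattern columns hold integer signals where 0 means "not detected" (TA-Lib pattern functions typically return 0 or ±100); the helper name `prune_and_scale_patterns` and the toy values are made up for illustration.

```python
import pandas as pd

def prune_and_scale_patterns(frame: pd.DataFrame, pattern_cols, scale=200):
    """Drop pattern columns that are 0 on every row and divide the rest by `scale`."""
    frame = frame.copy()
    never_fired = [col for col in pattern_cols if (frame[col] == 0).all()]
    frame = frame.drop(columns=never_fired)
    kept = [col for col in pattern_cols if col not in never_fired]
    if kept:
        frame[kept] = frame[kept] / scale
    return frame, never_fired

toy = pd.DataFrame({
    "Close": [10.0, 11.0, 12.0],
    "CDLDOJI": [0, 100, 0],      # detected once
    "CDL2CROWS": [0, 0, 0],      # never detected, so it carries no information
})
pruned, dropped = prune_and_scale_patterns(toy, ["CDLDOJI", "CDL2CROWS"])
```

Dividing by 200 maps a ±100 signal to ±0.5, which keeps the pattern features on a small, comparable scale.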


In [93]:
nifty_data_with_candle_stick_patterns = nifty_data_with_candle_stick_patterns.drop(columns=['Open','High','Low','Close'])
nifty_data = nifty_data.drop(columns=['Expiry'])
nifty50_with_candle_patterns = pd.concat([nifty_data,nifty_data_with_candle_stick_patterns],axis=1)
nifty50_with_candle_patterns.to_csv('Nifty50_With_Candle_Patterns.csv')

In [94]:
nifty50_with_candle_patterns.head(3)

Date,Open,High,Low,Close,Number of Contracts,Turnover,Open Interest,Change in OI,CDL2CROWS,CDL3INSIDE,...,CDLSPINNINGTOP,CDLSTALLEDPATTERN,CDLSTICKSANDWICH,CDLTAKURI,CDLTASUKIGAP,CDLTHRUSTING,CDLTRISTAR,CDLUNIQUE3RIVER,CDLUPSIDEGAP2CROWS,CDLXSIDEGAP3METHODS
2004-01-01,1894.0,1928.0,1890.1,1925.3,74471,28513000000.0,7422600,128600,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2004-01-02,1929.5,1961.0,1929.5,1950.8,73078,28370950000.0,7770000,347400,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2004-01-05,1962.8,1969.75,1931.0,1957.15,101796,39728740000.0,8339000,569000,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [2]:
import pandas as pd
import datetime
import calendar
from talib import SMA,EMA,BBANDS,MOM,MACD,RSI,ADX,MFI,OBV,ADOSC
from tqdm import tqdm

In [3]:
def add_technical_indicators(data):
    """
    Take the open, high, low, close and volume data (returned by the previous function)
    and use the talib library to calculate technical indicators such as EMA, Bollinger Bands, RSI,
    MFI and many more. All of these are later joined in a dataframe to form the feature matrix.
    Returns the features dataframe built up to this stage.
    """
    cols_2_drop = (list(data.columns))
    
    # EMA and Bollinger Bands
    #........................
    data['EMA_3']  = EMA(data['Close'],3)
    data['EMA_5']  = EMA(data['Close'],5)
    data['EMA_8']  = EMA(data['Close'],8)
    data['EMA_13'] = EMA(data['Close'],13)
    data['EMA_21'] = EMA(data['Close'],21)
    data['EMA_34'] = EMA(data['Close'],34)
    data['EMA_55'] = EMA(data['Close'],55)
    data['EMA_100'] = EMA(data['Close'],100)
    data['EMA_200'] = EMA(data['Close'],200)
    data['BollUp'], data['BollMid'], data['BollDown'] = BBANDS(data['Close'], timeperiod=5, nbdevup=2, nbdevdn=2, matype=0)
   
    # Momentum Indicators
    #....................
    data['trend_3']  = MOM(data['Close'], timeperiod=3)
    data['trend_5']  = MOM(data['Close'], timeperiod=5)
    data['trend_8']  = MOM(data['Close'], timeperiod=8)
    data['trend_13'] = MOM(data['Close'], timeperiod=13)
    data['trend_34'] = MOM(data['Close'], timeperiod=34)
    data['trend_55'] = MOM(data['Close'], timeperiod=55)
    data['trend_100'] = MOM(data['Close'], timeperiod=100)
    data['trend_200'] = MOM(data['Close'], timeperiod=200)
    data['MACD'],data['MACDsignal'],macdhist = MACD(data['Close'], fastperiod=12, slowperiod=26, signalperiod=9)  
    data['RSI-14'] = RSI(data['Close'],timeperiod=14)
    data['ADX-14'] = ADX(data['High'],data['Low'],data['Close'], timeperiod=14)

    # Volume Indicators
    #..................
    data['MFI-14'] = MFI(data['High'],data['Low'],data['Close'],data['Volume'],timeperiod=14)
    data['OBV']    = OBV(data['Close'],data['Volume'])
    data['ADOSC']  = ADOSC(data['High'],data['Low'],data['Close'],data['Volume'],fastperiod=3,slowperiod=10)
    
    
    #Daily pivots calculation
    #........................
    # Previous-day values; the first row has no previous day, so it falls back to its own values.
    # (Assigning through data.iloc[0][...] only writes to a temporary copy, so fill_value is used instead.)
    data['Pre_Close'] = data['Close'].shift(periods=1, fill_value=data['Close'].iloc[0])
    data['Pre_High']  = data['High'].shift(periods=1, fill_value=data['High'].iloc[0])
    data['Pre_Low']   = data['Low'].shift(periods=1, fill_value=data['Low'].iloc[0])

    data['Pivot'] =  (data['Pre_High']+data['Pre_Low']+data['Pre_Close'])/3
    data['R1']    =  (data['Pivot']*2) - data['Pre_Low']
    data['R2']    =  data['Pivot'] + data['Pre_High'] - data['Pre_Low']
    data['S1']    =  (data['Pivot']*2) - data['Pre_High']
    data['S2']    =  data['Pivot'] - data['Pre_High'] + data['Pre_Low']

    del data['Pre_High']
    del data['Pre_Low']
    del data['Pre_Close']
    
    
    # Weekly pivot points calculation
    #................................
    week_indices = []
    week_indices.append(2)

    for i in tqdm(range(2,len(data)-1)):
        current_day = data.iloc[i]['Day']
        next_day    = data.iloc[i+1]['Day']
        if (current_day >= next_day):
            week_indices.append(i+1)

    low_prices   = data['Low']
    high_prices  = data['High']
    close_prices = data['Close']
    weekly_pivots = []
    weekly_r1 =[]
    weekly_s1 =[]
    weekly_r2 =[]
    weekly_s2 =[]
    
    curr_week_index = len(week_indices)-1
    while (curr_week_index >= 1):
        prev_week = week_indices[curr_week_index-1]
        curr_week = week_indices[curr_week_index]
        # The last detected week runs to the end of the data; any other week ends where the next one starts.
        is_last_week = (curr_week_index == len(week_indices)-1)

        prev_week_high = max(list(data['High'].iloc[prev_week:curr_week]))
        prev_week_low  = min(list(data['Low'].iloc[prev_week:curr_week]))
        prev_week_close = data['Close'].iloc[curr_week-1]

        Pivot_for_curr_week =  (prev_week_high + prev_week_low + prev_week_close)/3
        R1_for_curr_week    =  (Pivot_for_curr_week*2) - prev_week_low
        R2_for_curr_week    =  Pivot_for_curr_week+prev_week_high - prev_week_low
        S1_for_curr_week    =  (Pivot_for_curr_week*2) - prev_week_high
        S2_for_curr_week    =  Pivot_for_curr_week - prev_week_high + prev_week_low

        if is_last_week:
            num_days_in_curr_week = len(data) - curr_week
        else:
            num_days_in_curr_week = week_indices[curr_week_index+1] - curr_week

        weekly_pivots = [Pivot_for_curr_week]*(num_days_in_curr_week) + weekly_pivots
        weekly_r1     = [R1_for_curr_week]*(num_days_in_curr_week) + weekly_r1 
        weekly_r2     = [R2_for_curr_week]*(num_days_in_curr_week) + weekly_r2
        weekly_s1     = [S1_for_curr_week]*(num_days_in_curr_week) + weekly_s1
        weekly_s2     = [S2_for_curr_week ]*(num_days_in_curr_week)+ weekly_s2

        curr_week_index = curr_week_index -1
        #End of While Loop
    
    # Padding for the initial rows (2004 data) that have no previous week: the first day's High is used as a placeholder.
    len_diff = len(data)-len(weekly_pivots)                            
    weekly_pivots = [(data["High"].iloc[0])]*len_diff  + weekly_pivots
    weekly_r1     = [(data["High"].iloc[0])]*len_diff  + weekly_r1 
    weekly_r2     = [(data["High"].iloc[0])]*len_diff  + weekly_r2
    weekly_s1     = [(data["High"].iloc[0])]*len_diff  + weekly_s1 
    weekly_s2     = [(data["High"].iloc[0])]*len_diff  + weekly_s2     
                         
    #print(len(weekly_pivots))
    #print(len(data))
    data['Weekly_Pivots'] = weekly_pivots
    data['Weekly_R1']     = weekly_r1 
    data['Weekly_R2']     = weekly_r2
    data['Weekly_S1']     = weekly_s1
    data['Weekly_S2']     = weekly_s2
    
    
    #MVWAP-7,MVWAP-14,MVWAP-21
    #.........................
    Mvwap_7  =[]
    Mvwap_14 =[]
    Mvwap_21 =[]
    
    data["Typical Price"] = (data['High']+data['Low']+data['Close'])/3
    
    for i in tqdm(range(6,len(data))):
        cumm_TP = ( (data['Typical Price'].iloc[i-6:i+1]) * (data['Volume'].iloc[i-6:i+1]) ).sum()
        cumm_vol = data['Volume'].iloc[i-6:i+1].sum()
        Mvwap_7.append(cumm_TP/cumm_vol)
        
    for i in tqdm(range(13,len(data))):
        cumm_TP = ( (data['Typical Price'].iloc[i-13:i+1]) * (data['Volume'].iloc[i-13:i+1]) ).sum()
        cumm_vol = data['Volume'].iloc[i-13:i+1].sum()
        Mvwap_14.append(cumm_TP/cumm_vol)                        
        
    for i in tqdm(range(20,len(data))):
        cumm_TP = ( (data['Typical Price'].iloc[i-20:i+1]) * (data['Volume'].iloc[i-20:i+1]) ).sum()
        cumm_vol = data['Volume'].iloc[i-20:i+1].sum()
        Mvwap_21.append(cumm_TP/cumm_vol)
       
        
    # Pad the initial rows with the first day's typical price so that each list matches the dataframe length.
    some_val  = (data['Typical Price'].iloc[0] * data['Volume'].iloc[0]) / (data['Volume'].iloc[0])
    Mvwap_7   = [some_val]*6 + Mvwap_7 
    Mvwap_14  = [some_val]*13 + Mvwap_14
    Mvwap_21  = [some_val]*20 + Mvwap_21
    
    data['MVWAP-7']  = Mvwap_7 
    data['MVWAP-14'] = Mvwap_14
    data['MVWAP-21'] = Mvwap_21
    
    
    # PRICE ACTION INDICATORS
    # ..................
    
    data['Day_trend']                       =  data.apply(lambda day: ((day['Close']-day['Open'])*100)/ (day['Open']),axis=1 )
    
    data['Day_trend_strength_up_shadow']    =  data.apply(lambda day: ((day['High']-day['Close']))/(abs(day['Close']-day['Open'])+0.0001)
                                                                      if (day['Day_trend'] >= 0)
                                                                      else ((day['High']-day['Open']))/(abs(day['Close']-day['Open'])+0.0001)
                                                                     ,axis=1)

    data['Day_trend_strength_low_shadow']   =  data.apply(lambda day: ((day['Open']-day['Low']))/(abs(day['Close']-day['Open'])+0.0001)
                                                                      if (day['Day_trend'] >= 0)
                                                                      else ((day['Close']-day['Low']))/(abs(day['Close']-day['Open'])+0.0001)
                                                                     ,axis=1)
    
    data['pre_close'] = data['Close'].shift()
    data = data.iloc[1:]
    data['gap_up_down_opening'] = data.apply(lambda day: ((day['Open']-day['pre_close'])*100)/ (day['pre_close']),axis=1 )
   
    data['Pre_vol'] = data['Volume'].shift()
    data = data.iloc[1:]
    data['Change_in_Volume'] = data.apply(lambda day: ((day['Volume']-day['Pre_vol'])*100)/(day['Pre_vol']),axis=1 )
    
    cols_2_drop += ['pre_close','Pre_vol']
    data = data.drop(columns=cols_2_drop)
    return data
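The daily and weekly pivot levels in `add_technical_indicators` use the same five formulas, applied to the previous day's or previous week's high, low and close. A minimal sketch of those formulas on their own follows; the function name `classic_pivot_levels` is hypothetical.

```python
def classic_pivot_levels(prev_high, prev_low, prev_close):
    """Pivot, resistance (R1, R2) and support (S1, S2) levels from the previous period's prices."""
    pivot = (prev_high + prev_low + prev_close) / 3
    return {
        "Pivot": pivot,
        "R1": 2 * pivot - prev_low,
        "R2": pivot + prev_high - prev_low,
        "S1": 2 * pivot - prev_high,
        "S2": pivot - prev_high + prev_low,
    }

# Levels for 2004-01-02, computed from the 2004-01-01 high, low and close.
levels = classic_pivot_levels(prev_high=1928.0, prev_low=1890.1, prev_close=1925.3)
```

R2 and S2 sit one full previous-period range (high minus low) above and below the pivot.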


In [28]:
nifty_with_candle_sticks = pd.read_csv('Nifty50_With_Candle_Patterns.csv')
nifty_with_candle_sticks = nifty_with_candle_sticks.set_index('Date')
del nifty_with_candle_sticks['Turnover']
nifty_with_candle_sticks = nifty_with_candle_sticks.rename(columns={'Number of Contracts':'Volume'})

nifty_with_candle_sticks = nifty_with_candle_sticks.reset_index()
nifty_with_candle_sticks = nifty_with_candle_sticks.rename(columns={'index':'Date'})
nifty_with_candle_sticks['Date']  = nifty_with_candle_sticks['Date'].apply(lambda d: datetime.datetime.strptime(d, '%Y-%m-%d'))
nifty_with_candle_sticks['Day']   = nifty_with_candle_sticks['Date'].apply(lambda date: date.weekday())
nifty_with_candle_sticks['Month'] = nifty_with_candle_sticks['Date'].apply(lambda date: date.month)
nifty_with_candle_sticks['Year']  = nifty_with_candle_sticks['Date'].apply(lambda date: date.year)
nifty_with_candle_sticks = nifty_with_candle_sticks.set_index('Date')

nifty_tech_indicators = add_technical_indicators(nifty_with_candle_sticks.copy())
nifty_with_candle_patterns_n_tech_indicators = pd.concat([nifty_with_candle_sticks,nifty_tech_indicators],axis=1)
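The MVWAP columns built inside `add_technical_indicators` are volume-weighted averages of the typical price over a moving window. The same quantity can be sketched with pandas rolling sums; the helper name `moving_vwap` and the toy prices are assumptions, and unlike the loops above this version leaves the first `window - 1` rows as NaN instead of padding them.

```python
import pandas as pd

def moving_vwap(frame: pd.DataFrame, window: int) -> pd.Series:
    """Volume-weighted average of the typical price over the last `window` rows."""
    typical_price = (frame["High"] + frame["Low"] + frame["Close"]) / 3
    price_volume = (typical_price * frame["Volume"]).rolling(window).sum()
    volume = frame["Volume"].rolling(window).sum()
    return price_volume / volume

toy = pd.DataFrame({
    "High":   [10.0, 12.0, 14.0],
    "Low":    [8.0, 10.0, 12.0],
    "Close":  [9.0, 11.0, 13.0],
    "Volume": [100, 200, 300],
})
vwap_2 = moving_vwap(toy, window=2)
```

Calling it with windows of 7, 14 and 21 would give the three MVWAP features without explicit Python loops.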

In [29]:
nifty_with_candle_patterns_n_tech_indicators.head(3)

Date,Open,High,Low,Close,Volume,Open Interest,Change in OI,CDL2CROWS,CDL3INSIDE,CDL3LINESTRIKE,...,Weekly_S2,Typical Price,MVWAP-7,MVWAP-14,MVWAP-21,Day_trend,Day_trend_strength_up_shadow,Day_trend_strength_low_shadow,gap_up_down_opening,Change_in_Volume
2004-01-01,1894.0,1928.0,1890.1,1925.3,74471,7422600,128600,0.0,0.0,0.0,...,,,,,,,,,,
2004-01-02,1929.5,1961.0,1929.5,1950.8,73078,7770000,347400,0.0,0.0,0.0,...,,,,,,,,,,
2004-01-05,1962.8,1969.75,1931.0,1957.15,101796,8339000,569000,0.0,0.0,0.0,...,1928.0,1952.633333,1914.466667,1914.466667,1914.466667,-0.287854,1.230067,4.628237,0.615132,39.297737


In [30]:
nifty_with_candle_patterns_n_tech_indicators.to_csv('Nifty50_With_Candle_Patterns_N_Tech_Indicators.csv')

In [4]:
from selenium.webdriver.chrome import webdriver
from selenium.common.exceptions import NoSuchElementException
import yfinance as yf
import time
import pandas as pd
import datetime
from tqdm import tqdm

In [175]:
# REF: https://medium.com/datadriveninvestor/python-utility-for-scrapping-historical-fii-data-ml-data-mining-a804a885df15

from selenium.webdriver.chrome import webdriver
from selenium.common.exceptions import NoSuchElementException

def Scrap_FII_Data(start_date,end_date):
    """
    Scrape FII (foreign institutional investor) data from way2wealth.com.
    Features are formed from the FII data, which is later saved in a CSV file.
    """
    
    cols = ['Date','FII_Futures_Net','FII_Options_Net']
    fii_data_df = pd.DataFrame(columns=cols)
    browser_options = webdriver.Options()
    browser_options.add_argument("--start-maximized")
    browser_driver = webdriver.WebDriver("chromedriver.exe",options=browser_options)
    browser_driver.get("https://www.way2wealth.com/derivatives/fiiactivity/");
    time.sleep(10);
    start_date_field = browser_driver.find_element_by_id("ContentPlaceHolder1_ctl00_txtstart")
    end_date_field   = browser_driver.find_element_by_id("ContentPlaceHolder1_ctl00_txtend")
    start_date_field.clear();
    end_date_field.clear(); 

    start_date_field.send_keys(start_date);
    end_date_field.send_keys(end_date);
    
    go_button = browser_driver.find_element_by_id("ContentPlaceHolder1_ctl00_btngo")
    browser_driver.execute_script("arguments[0].click();", go_button)
    
    # Setting up the xpath for the next-pagination of the table
    pagination = 2
    xpath_before_pagination = "//*[@id='ContentPlaceHolder1_ctl00_grdboard']/tbody/tr[19]/td/table/tbody/tr/td[";
    xpath_after_pagination  = "]/a";
    next_pagination_xpath = xpath_before_pagination+str(pagination)+xpath_after_pagination;
    
    time.sleep(10);
    element_present=True
    p=1
    while element_present: 
    # Getting the values(Date, FII net futures and options) for each pagination/page of the table
    
        date_element_present=True
        beforeXpath= "//*[@id='ContentPlaceHolder1_ctl00_grdboard']/tbody/tr[";
        afterXpath_Date="]/td[1]";
        afterXpath_IndexNet="]/td[5]";
        afterXpath_IndexOptionNet="]/td[13]";
            
        for row in range (1,19):
            # Getting the values(Date, FII net futures and options) for each row in a page of the table
            contentXpath = beforeXpath+str(row)+afterXpath_IndexNet;
            try:
                IndexNet = browser_driver.find_element_by_xpath(contentXpath);
                date_element_present=True
            except NoSuchElementException:
                date_element_present=False
                
            if date_element_present:
                contentXpath = beforeXpath+str(row)+afterXpath_Date;                    
                Date = browser_driver.find_element_by_xpath(contentXpath).text;
                    
                contentXpath = beforeXpath+str(row)+afterXpath_IndexNet;
                IndexNet     = browser_driver.find_element_by_xpath(contentXpath).text;
                    
                contentXpath   = beforeXpath+str(row)+afterXpath_IndexOptionNet;
                IndexOptionNet = browser_driver.find_element_by_xpath(contentXpath).text;
                    
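                # DataFrame.append was removed in pandas 2.0; build a one-row frame and concatenate it.
                new_row = pd.DataFrame([{'Date': Date,
                                         'FII_Futures_Net': IndexNet,
                                         'FII_Options_Net': IndexOptionNet}])
                fii_data_df = pd.concat([fii_data_df, new_row], ignore_index=True)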
        #print("Page {0} done , Next-Pagination {1}".format(p,pagination))
        p=p+1
                    
        try:
            element_present=True   
            next_pagination = browser_driver.find_element_by_xpath(next_pagination_xpath);
            pagination = pagination+1;
            next_pagination_xpath = xpath_before_pagination+str(pagination)+xpath_after_pagination;
            
        except NoSuchElementException:
                print(next_pagination_xpath)
                element_present=False
                
        if element_present:
            browser_driver.execute_script("arguments[0].click();",next_pagination);
            time.sleep(10);
        else:
            break;
            
    browser_driver.quit();
    return fii_data_df
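The scraper assembles its XPath strings by concatenating fixed prefixes and suffixes with a row or page number. The same strings can be produced by two small helpers, sketched below. The names `cell_xpath` and `pager_link_xpath` are hypothetical; the element id and the pager row (19) are copied from the function above and depend on the page layout at the time of scraping.

```python
GRID_ID = "ContentPlaceHolder1_ctl00_grdboard"

def cell_xpath(row: int, column: int, grid_id: str = GRID_ID) -> str:
    """XPath of one table cell, e.g. column 1 = date, 5 = futures net, 13 = options net."""
    return f"//*[@id='{grid_id}']/tbody/tr[{row}]/td[{column}]"

def pager_link_xpath(page: int, pager_row: int = 19, grid_id: str = GRID_ID) -> str:
    """XPath of the link to the given page number in the table's pager row."""
    return f"//*[@id='{grid_id}']/tbody/tr[{pager_row}]/td/table/tbody/tr/td[{page}]/a"

date_cell = cell_xpath(row=3, column=1)
second_page_link = pager_link_xpath(page=2)
```

Keeping the templates in one place makes the column numbers easier to update if the site's table changes.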

In [182]:
from datetime import date,timedelta

time_periods = []
start_date = date(2009,1,31)
end_date = date(2001,1,1)   # placeholder earlier than the cut-off date, so the loop below runs at least once

while (end_date<date(2020,12,8)):
    end_date   = start_date + timedelta(days=180)
    s_d   = start_date.strftime('%m/%d/%Y')
    e_d   = end_date.strftime('%m/%d/%Y')
    entry = (s_d,e_d)
    time_periods.append(entry)
    start_date = end_date + timedelta(days=1)
    
period_count = 1
for period in tqdm(time_periods):
    fii_data = Scrap_FII_Data(period[0],period[1])
    fii_data.to_csv('fii_'+str(period_count)+'.csv')
    period_count += 1
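The loop that builds `time_periods` splits the full date range into consecutive windows, each ending 180 days after it starts. A generator version of the same idea is sketched below; the name `date_windows` is hypothetical, and clipping the final window to the last day is a small deviation from the loop above, which lets the last window run past the cut-off.

```python
from datetime import date, timedelta

def date_windows(first_day, last_day, span_days=180):
    """Yield consecutive (start, end) pairs covering first_day..last_day without overlap."""
    start = first_day
    while start <= last_day:
        # Each window ends span_days after it starts, clipped to the overall last day.
        end = min(start + timedelta(days=span_days), last_day)
        yield start, end
        start = end + timedelta(days=1)

windows = list(date_windows(date(2009, 1, 31), date(2020, 12, 8)))
# Same MM/DD/YYYY strings that are typed into the date fields of the page.
formatted = [(s.strftime('%m/%d/%Y'), e.strftime('%m/%d/%Y')) for s, e in windows]
```

For this date range the generator yields 24 windows, matching the 24 `fii_*.csv` files read in the next cell.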
    

In [204]:
file_name = ('./Source Data Sets/Fii/fii_1.csv')
fii_final_data = pd.read_csv(file_name)
fii_final_data = fii_final_data.iloc[::-1]
del fii_final_data['Unnamed: 0']
    
for i in range(2,25):
    file_name = ('./Source Data Sets/Fii/fii_'+str(i)+'.csv')
    curr_fii_data = pd.read_csv(file_name)
    curr_fii_data = curr_fii_data.iloc[::-1]
    del curr_fii_data['Unnamed: 0']
    fii_final_data = pd.concat([fii_final_data,curr_fii_data],axis=0)
    
fii_final_data = fii_final_data.set_index('Date')
fii_final_data['pre_fut_net'] = fii_final_data['FII_Futures_Net'].shift()
fii_final_data['pre_opt_net'] = fii_final_data['FII_Options_Net'].shift()
fii_final_data = fii_final_data.iloc[1:]

fii_final_data['FII_Futures_Net_change'] = fii_final_data['FII_Futures_Net'] - fii_final_data['pre_fut_net'] 
fii_final_data['FII_Options_Net_change'] = fii_final_data['FII_Options_Net'] - fii_final_data['pre_opt_net'] 

del fii_final_data['pre_fut_net']
del fii_final_data['pre_opt_net']

#print(fii_final_data)
fii_final_data.to_csv('Final_FII_Data.csv')
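The day-over-day change features above are built with `shift()` followed by a subtraction. For numeric columns, `DataFrame.diff()` gives the same result in one step, as this sketch shows. The values and date labels are made up for illustration; if the scraped values were stored as text they would first need converting, for example with `pd.to_numeric`.

```python
import pandas as pd

fii = pd.DataFrame(
    {"FII_Futures_Net": [120.0, -30.0, 45.0], "FII_Options_Net": [10.0, 25.0, 5.0]},
    index=["day_1", "day_2", "day_3"],   # placeholder labels, oldest first
)

# diff() subtracts the previous row from each row; the first row has no predecessor.
changes = fii.diff().add_suffix("_change")
fii_with_changes = pd.concat([fii, changes], axis=1).iloc[1:]
```

The first row is dropped because its change is undefined, mirroring the `iloc[1:]` in the cell above.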

In [None]:
def get_world_stock_indices_data(index,start_date,end_date):
    """
    Get the day-wise historical data of a world stock index with the yfinance library.
    Features are formed from each country's index data.
    Each country's features are saved in a separate CSV file.
    
    """
    if   (index=='^DJI'):
        country = 'US'
    elif (index=='^FTSE'):
        country = 'Lon'
    elif (index=='^FCHI'):
        country = 'Fran'
    elif (index=='000001.SS'):
        country = 'China'
    elif (index=='^N225'):
        country = 'Japan'
    elif (index=='^STI'):
        country = 'Singp'
    elif (index=='^HSI'):
        country = 'Honkon'
        
        
    data = yf.download(index, start=start_date, end=end_date)
    print(country)
    print(data[['Open','High','Low','Close']].head(5))
    print("."*60)
    
    country_Openchange_from_pre_clo   = country+'_gap_up_down_opening'
    country_day_trend                 = country+'_day_trend'
    country_trend_strength_up_shadow  = country+'_trend_strength_up_shadow'
    country_trend_strength_low_shadow = country+'_trend_strength_low_shadow'
    
    
    data[country_day_trend]                   =  data.apply(lambda day: ((day['Close']-day['Open'])*100)/ (day['Open']),axis=1 )
    
    data[country_trend_strength_up_shadow]    =  data.apply(lambda day: ((day['High']-day['Close']))/(abs(day['Close']-day['Open'])+0.0001)
                                                         if (day[country_day_trend] >= 0)
                                                         else ((day['High']-day['Open']))/(abs(day['Close']-day['Open'])+0.0001)
                                                         ,axis=1)
    
    data[country_trend_strength_low_shadow]   =  data.apply(lambda day: ((day['Open']-day['Low']))/(abs(day['Close']-day['Open'])+0.0001)
                                                         if (day[country_day_trend] >= 0)
                                                         else ((day['Close']-day['Low']))/(abs(day['Close']-day['Open'])+0.0001)
                                                         ,axis=1)
    
    
    if (index == '^FTSE'):
        cols_2_drop = ['Open','High','Low','Close','Adj Close','Volume']
    else:
        data['pre_close'] = data['Close'].shift()
        data = data.iloc[1:]
        data[country_Openchange_from_pre_clo]   = data.apply(lambda day: ((day['Open']-day['pre_close'])*100)/ (day['pre_close']),axis=1 )

        cols_2_drop = ['Open','High','Low','Close','Adj Close','Volume','pre_close']
        
    data = data.drop(columns=cols_2_drop)
    return data
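The three candle features computed with row-wise `apply` calls (day trend in percent, upper shadow and lower shadow relative to the candle body) can also be written as column operations. The sketch below is an equivalent formulation under the assumption that opening prices are positive; the function name `candle_features` and the `demo` prefix are hypothetical. The sample row is the 2004-01-05 NIFTY50 futures candle shown earlier in this notebook.

```python
import pandas as pd

def candle_features(ohlc: pd.DataFrame, prefix: str, eps: float = 0.0001) -> pd.DataFrame:
    """Day trend (%) and shadow lengths relative to the candle body, one row per day."""
    body = (ohlc["Close"] - ohlc["Open"]).abs() + eps        # eps avoids division by zero
    body_top = ohlc[["Open", "Close"]].max(axis=1)           # Close on up days, Open on down days
    body_bottom = ohlc[["Open", "Close"]].min(axis=1)
    return pd.DataFrame({
        f"{prefix}_day_trend": (ohlc["Close"] - ohlc["Open"]) * 100 / ohlc["Open"],
        f"{prefix}_trend_strength_up_shadow": (ohlc["High"] - body_top) / body,
        f"{prefix}_trend_strength_low_shadow": (body_bottom - ohlc["Low"]) / body,
    })

row = pd.DataFrame(
    {"Open": [1962.8], "High": [1969.75], "Low": [1931.0], "Close": [1957.15]},
    index=["2004-01-05"],
)
feats = candle_features(row, prefix="demo")
```

On this row the result agrees with the `Day_trend`, `Day_trend_strength_up_shadow` and `Day_trend_strength_low_shadow` values in the feature table above.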


#..............................................................................

us_dow_index_data = get_world_stock_indices_data('^DJI',"2009-1-30","2020-12-08")
us_dow_index_data.to_csv('us_index_data.csv')

lon_index_data = get_world_stock_indices_data('^FTSE',"2009-1-30","2020-12-08")
lon_index_data.to_csv('london_index_data.csv')

fran_index_data = get_world_stock_indices_data('^FCHI',"2009-1-30","2020-12-08")
fran_index_data.to_csv('france_index_data.csv')

chi_index_data = get_world_stock_indices_data('000001.SS',"2009-1-30","2020-12-08")
chi_index_data.to_csv('china_index_data.csv')

jap_index_data = get_world_stock_indices_data('^N225',"2009-1-30","2020-12-08")
jap_index_data.to_csv('japan_index_data.csv')

honkon_index_data = get_world_stock_indices_data('^HSI',"2009-1-30","2020-12-08")
honkon_index_data.to_csv('honkong_index_data.csv')

singa_index_data = get_world_stock_indices_data('^STI',"2009-1-30","2020-12-08")
singa_index_data.to_csv('singapore_index_data.csv')



# 000001.SS     - china      (7.00am  to 12.30pm)    IST         2hrs        ---- feb 2nd 2009   -2h candle  (UTC+8) Shanghai
# ^HSI          - hongkong   (6.45am  to 1.30am)     IST         2hrs 15mins ---- feb 2nd 2009   -3h candle
# ^N225         - japan      (5.30am  to 11.30am)    IST         3hrs 30mins ---- feb 2nd 2009   -4h candle
# ^NSEI         - India

# ^DJI          - US         (7.00pm  to 1.30am)     IST      
# ^FTSE         - London     (1.30pm  to  10pm)      IST
# ^FCHI         - CAC 40     (12.30pm to 10pm)       IST

In [None]:
def get_Todays_World_Indices_Data_Till_9AM(Country,c_data):
    """
    Get the day-wise historical data of an Asian market index up to 9 am (IST).
    Features are formed from each country's index data.
    Each country's features are saved in a separate CSV file.
    """
    data = c_data
    country = Country
    add_utc_hours = 8
    
    if  (country == 'china' or country == 'hongkong'):
        add_utc_hours = 8
    elif (country == 'japan'):
        add_utc_hours = 9
  
        
    data['Date_Time'] = data['time'].apply(lambda date_unix:( (datetime.datetime.fromtimestamp(date_unix))
                                                              .strftime('%Y-%m-%d %H:%M'))                 
                                           )

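    # pandas 2.0 requires the split limit to be passed as the keyword argument n.
    data[['Date','Time']] = data['Date_Time'].str.split(' ', n=1, expand=True)
    # Keep only the candles whose hour is 09; copy so that the feature columns are added to an independent frame.
    data = data[data['Time'].str[0:2]=='09'].copy()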

    
    country_day_trend                 = country+'_today_init_trend'
    country_trend_strength_up_shadow  = country+'_today_init_trend_strength_up_shadow'
    country_trend_strength_low_shadow = country+'_today_init_trend_strength_low_shadow'
    
    data[country_day_trend]                   =  data.apply(lambda day: ((day['close']-day['open'])*100)/ (day['open']),axis=1 )
    data[country_trend_strength_up_shadow]    =  data.apply(lambda day: ((day['high']-day['close']))/(abs(day['close']-day['open'])+0.0001)
                                                             if (day[country_day_trend] >= 0)
                                                             else ((day['high']-day['open']))/(abs(day['close']-day['open'])+0.0001)
                                                             ,axis=1)

    data[country_trend_strength_low_shadow]   =  data.apply(lambda day: ((day['open']-day['low']))/(abs(day['close']-day['open'])+0.0001)
                                                             if (day[country_day_trend] >= 0)
                                                             else ((day['close']-day['low']))/(abs(day['close']-day['open'])+0.0001)
                                                             ,axis=1)
    
    cols_2_drop =['open','close','high','low','Volume','Volume MA','Date_Time','Time','time']
    data = data.drop(columns=cols_2_drop)
    data = data.set_index('Date')
    return data



china_data_df = pd.read_csv('SSE_DLY_000001, 120.csv')
china_data    = get_Todays_World_Indices_Data_Till_9AM('china',china_data_df)
china_data.to_csv('china_todays_init_data.csv')

hon_kon_data_df = pd.read_csv('HSI_HSI, 180.csv')
hon_kon_data    = get_Todays_World_Indices_Data_Till_9AM('hongkong',hon_kon_data_df)
hon_kon_data.to_csv('hongkong_todays_init_data.csv')

japan_data_df = pd.read_csv('TVC_NI225, 240.csv')
japan_data    = get_Todays_World_Indices_Data_Till_9AM('japan',japan_data_df)
japan_data.to_csv('japan_todays_init_data.csv')
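
The three candle features formed above (initial trend in percent, upper-shadow strength and lower-shadow strength) can also be computed without row-wise `apply`. The following is only a minimal sketch, not the notebook's own code: the helper name `candle_features` and the sample prices are hypothetical, and it assumes the input has `open`, `high`, `low` and `close` columns as in the function above.

```python
import pandas as pd

EPS = 0.0001  # same small constant as above, avoids division by zero on flat candles


def candle_features(df, prefix):
    """Vectorised trend / shadow-strength features (hypothetical helper)."""
    body = (df['close'] - df['open']).abs() + EPS
    trend = (df['close'] - df['open']) * 100 / df['open']
    bullish = trend >= 0
    # Upper shadow: from the high down to the top of the candle body.
    upper = (df['high'] - df['close']).where(bullish, df['high'] - df['open'])
    # Lower shadow: from the bottom of the candle body down to the low.
    lower = (df['open'] - df['low']).where(bullish, df['close'] - df['low'])
    return pd.DataFrame({
        prefix + '_today_init_trend': trend,
        prefix + '_today_init_trend_strength_up_shadow': upper / body,
        prefix + '_today_init_trend_strength_low_shadow': lower / body,
    }, index=df.index)


# Toy data: one bullish candle and one bearish candle.
sample = pd.DataFrame({'open': [100.0, 100.0], 'high': [103.0, 101.0],
                       'low': [99.0, 97.0], 'close': [102.0, 98.0]})
features = candle_features(sample, 'demo')
```

Both shadows are divided by the candle body, so a value well above 1 indicates a shadow that is long relative to the body.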


In [None]:
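from dateutil import parser  # assumption: `parser.parse` below refers to dateutil's parser

def get_brent_crude_prices_data():
    """
    This function reads historical Brent crude oil prices from two CSV files and forms
    the day-over-day price change feature. The caller saves the returned DataFrame to a csv file.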
    """
    crude_oil = pd.read_csv('BrentCrude.csv')
    crude_oil['Date'] = crude_oil['Date'].apply(lambda date: (parser.parse(date)).strftime('%d-%m-%Y')) 
    end_index = crude_oil[ crude_oil['Date']=='30-01-2009'].index[0]
    crude_oil = crude_oil[:end_index+1]
    crude_oil = crude_oil.iloc[::-1]
    crude_oil['pre_price'] = crude_oil['Brent Crude Price'].shift()
    crude_oil['change_in_crude_price'] = crude_oil['Brent Crude Price']-crude_oil['pre_price']
    crude_oil = crude_oil.set_index('Date')
    del crude_oil['pre_price']
    del crude_oil['Brent Crude Price']
    crude_oil = crude_oil[1:]
    #crude_oil.head()
    #crude_oil .tail()


    crude_oil_2 = pd.read_csv('HistoricalQuotes.csv')
    crude_oil_2['Date'] = crude_oil_2['Date'].apply(lambda date: (parser.parse(date)).strftime('%d-%m-%Y')) 
    end_index = crude_oil_2[ crude_oil_2['Date']=='21-06-2019'].index[0]
    crude_oil_2 = crude_oil_2[:end_index+1]
    crude_oil_2= crude_oil_2.iloc[::-1]
    crude_oil_2 = crude_oil_2.rename(columns={' Close':'Brent Crude Price'})
    crude_oil_2['pre_price'] = crude_oil_2['Brent Crude Price'].shift()
    crude_oil_2['change_in_crude_price'] = crude_oil_2['Brent Crude Price']-crude_oil_2['pre_price']
    crude_oil_2 = crude_oil_2.set_index('Date')
    del crude_oil_2['pre_price']
    del crude_oil_2['Brent Crude Price']
    crude_oil_2 = crude_oil_2[1:]
    
    crude_oil_df = pd.concat([crude_oil,crude_oil_2],axis=0)
    #crude_oil_2.head(5)
    #crude_oil .tail()
    return crude_oil_df

    
crude_oil_df = get_brent_crude_prices_data()
crude_oil_df.to_csv("Brent_crude_prices_data.csv")
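
Both halves of `get_brent_crude_prices_data` apply the same steps to a different file: trim the table at a cut-off date, reverse it to oldest-first, and subtract the previous day's price. A sketch of those steps as a single helper might look like the following. The name `daily_change` and the toy prices are hypothetical, and the helper assumes, as the code above does, that the input is sorted newest-first with `Date` already formatted as `dd-mm-YYYY`.

```python
import pandas as pd


def daily_change(df, price_col, last_date, out_col='change_in_crude_price'):
    """Day-over-day price change from a newest-first table (hypothetical helper)."""
    end_index = df[df['Date'] == last_date].index[0]
    df = df[:end_index + 1].iloc[::-1]       # drop rows older than last_date, then oldest-first
    out = df.set_index('Date')[price_col].diff().to_frame(out_col)
    return out[1:]                           # the first row has no previous price


# Toy data, newest first; the 29-01-2009 row lies before the cut-off and is dropped.
toy = pd.DataFrame({
    'Date': ['04-02-2009', '03-02-2009', '02-02-2009', '30-01-2009', '29-01-2009'],
    'Brent Crude Price': [44.0, 43.5, 45.0, 46.0, 47.0],
})
changes = daily_change(toy, 'Brent Crude Price', '30-01-2009')
```

`Series.diff()` replaces the explicit `shift` and subtraction, so no temporary `pre_price` column is needed.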

In [None]:
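import yfinance as yf  # assumption: `yf` below is the usual alias for the yfinance library

def get_usd_inr_change():
    """
    This function gets the USD_INR historical data by making use of the yfinance library.
    The day-over-day change in the closing rate is formed from the obtained data; the caller saves it to a csv file.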
    """
    data = yf.download("INR=X", start="2009-1-30",end="2020-12-8",prepost = True)
    
    cols_2_drop  = ['Open','High','Low','Adj Close','Volume']
    data = data.drop(columns=cols_2_drop)

    data['pre_close'] = data['Close'].shift()
    data['Change_in_dollar_rate'] = data['Close']-data['pre_close']
    data = data.iloc[2:]
    del data['pre_close']
    del data['Close']

    return data

usd_inr_df = get_usd_inr_change()
usd_inr_df.to_csv("USD_INR_data.csv")

In [None]:
from nsepy import get_history
from datetime import date

def get_india_vix():
    """
    This function reads the older India VIX history from a csv file and fetches the recent history through nsepy.
    The day-over-day percentage change in VIX is formed from the combined data; the caller saves it to a csv file.
    """
    
    vix_data_1 = pd.read_csv('./Source Data Sets/Others/India VIX Historical Data.csv')
    
    end_index = vix_data_1[ vix_data_1['Date'] == '30-01-2009'].index[0]
    vix_data_1 = vix_data_1.iloc[:end_index+1]
    
    vix_data_1 = vix_data_1.iloc[::-1]
    vix_data_1  = vix_data_1.set_index('Date')
    
    vix_data_2 =get_history(symbol="INDIAVIX",
                start=date(2019,6,24),
                end=date(2020,12,8),
                index=True)

    vix_data_2 = vix_data_2.dropna()
    vix_data_2 = vix_data_2[['Close']]
    
    vix_data_2 = vix_data_2.rename(columns={'Close':'VIX'})
    VIX_DATA = pd.concat([vix_data_1,vix_data_2],axis=0)
    
    VIX_DATA['pre_VIX'] = VIX_DATA['VIX'].shift()
    VIX_DATA  = VIX_DATA.iloc[1:]
    VIX_DATA['VIX'] = ( (VIX_DATA['VIX']-VIX_DATA['pre_VIX']) / (VIX_DATA['pre_VIX']) ) *100
    del VIX_DATA['pre_VIX']
      
    return VIX_DATA
    

india_vix_df = get_india_vix()
india_vix_df.to_csv('India_VIX_data.csv')
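
The percentage change formed in `get_india_vix` corresponds to what pandas' `Series.pct_change` returns, scaled by 100. The short sketch below compares the two forms on made-up values; the numbers are hypothetical and only illustrate the arithmetic.

```python
import pandas as pd

vix = pd.Series([20.0, 22.0, 19.8], name='VIX')   # toy closing values, oldest first

# Manual form, as in get_india_vix: (today - yesterday) / yesterday * 100
manual = ((vix - vix.shift()) / vix.shift() * 100).iloc[1:]

# Built-in form; fill_method=None disables any filling of missing values
builtin = (vix.pct_change(fill_method=None) * 100).iloc[1:]
```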

In [21]:
cols_order = [
             'Open','High','Low','Close','Typical Price','Day_trend','Day_trend_strength_up_shadow','Day_trend_strength_low_shadow',
             'gap_up_down_opening','Volume','Change_in_Volume','Open Interest','Change in OI',
             'Pivot','R1','R2','S1','S2',
             'Weekly_Pivots','Weekly_R1','Weekly_R2','Weekly_S1','Weekly_S2',
             'MVWAP-7','MVWAP-14','MVWAP-21',
             'trend_3','trend_5','trend_8','trend_13','trend_34','trend_55','trend_100','trend_200',
             'EMA_3','EMA_5','EMA_8','EMA_13','EMA_21','EMA_34','EMA_55','EMA_100','EMA_200',
             'BollUp','BollMid','BollDown','MACD','MACDsignal','RSI-14','ADX-14','MFI-14','OBV','ADOSC','Day','Month',
             'CDL2CROWS','CDL3INSIDE','CDL3LINESTRIKE','CDL3OUTSIDE','CDLADVANCEBLOCK','CDLBELTHOLD','CDLBREAKAWAY',
             'CDLCLOSINGMARUBOZU','CDLCOUNTERATTACK','CDLDARKCLOUDCOVER','CDLDOJI','CDLDOJISTAR','CDLDRAGONFLYDOJI',
             'CDLENGULFING','CDLEVENINGDOJISTAR','CDLEVENINGSTAR','CDLGAPSIDESIDEWHITE','CDLGRAVESTONEDOJI',
             'CDLHAMMER','CDLHANGINGMAN','CDLHARAMI','CDLHARAMICROSS','CDLHIGHWAVE','CDLHIKKAKE','CDLHIKKAKEMOD',
             'CDLHOMINGPIGEON','CDLINNECK','CDLINVERTEDHAMMER','CDLLADDERBOTTOM','CDLLONGLEGGEDDOJI','CDLLONGLINE',
             'CDLMARUBOZU','CDLMATCHINGLOW','CDLMORNINGDOJISTAR','CDLMORNINGSTAR','CDLONNECK','CDLPIERCING',
             'CDLRICKSHAWMAN','CDLSEPARATINGLINES','CDLSHOOTINGSTAR','CDLSHORTLINE','CDLSPINNINGTOP','CDLSTALLEDPATTERN',
             'CDLSTICKSANDWICH','CDLTAKURI','CDLTASUKIGAP','CDLTHRUSTING','CDLTRISTAR','CDLUNIQUE3RIVER',
             'CDLUPSIDEGAP2CROWS','CDLXSIDEGAP3METHODS'
             ]

In [22]:
data = data[cols_order]
data.to_csv('Nifty_Technical_Analysis_Data.csv')

In [160]:
Final_df = pd.read_csv('Nifty_Technical_Analysis_Data.csv')
Final_df = Final_df.set_index('Date')
Final_df.shape

(2895, 106)

<b><font color='red'>JOINING ALL THE DATA PRESENT ACROSS MULTIPLE CSV FILES</font></b>

In [32]:
# import os
# Files = os.listdir()

Files = [ 
          'Final_FII_Data.csv','India_VIX_data.csv', 'USD_INR_data.csv','Brent_crude_prices_data.csv',
          'us_index_data.csv','london_index_data.csv', 'france_index_data.csv','singapore_index_data.csv',
          'china_index_data.csv','china_todays_init_data.csv',
          'honkong_index_data.csv', 'hongkong_todays_init_data.csv',
          'japan_index_data.csv', 'japan_todays_init_data.csv',
        ]

for file in Files:
    data = pd.read_csv(file)
    #print(str( len(data['Date']) ) + ' : ' + str( len(data['Date'].unique()) ) ) 
    data = data.set_index('Date')
    Final_df = Final_df.join(data)
    
Final_df.to_csv('Final_Source_Data.csv')
print(Final_df.shape)

In [199]:
# Forward-fill missing values; fillna(method='ffill') is deprecated in recent pandas versions
Final_df = Final_df.ffill(axis=0)
Final_df.to_csv('Final_Source_Data.csv')
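
`DataFrame.join` performs a left join on the index by default, so every NIFTY trading date is kept and any date missing from a joined file (for example, presumably, a holiday in a foreign market) shows up as `NaN`. The forward fill then carries the last known value into those gaps. A minimal sketch on toy frames, with hypothetical values and column names:

```python
import pandas as pd

base = pd.DataFrame(
    {'Close': [10.0, 11.0, 12.0]},
    index=pd.Index(['01-02-2009', '02-02-2009', '03-02-2009'], name='Date'))
extra = pd.DataFrame(
    {'VIX': [1.5, -0.5]},
    index=pd.Index(['01-02-2009', '03-02-2009'], name='Date'))

joined = base.join(extra)   # left join on Date; 02-02-2009 has no VIX row, so it becomes NaN
filled = joined.ffill()     # carry the last known value forward into the gap
```

Forward filling cannot fill rows that come before the first available value of a column; those stay `NaN` until a later `dropna`.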

In [61]:
Final_df = pd.read_csv('Final_Source_Data.csv')
Final_df = Final_df.set_index('Date')
#print(len((Final_df.columns)))

In [62]:
cols_rename_dict = {
'Open' : 'Open_T' , 'High'  : 'High_T' ,'Low'  : 'Low_T' , 'Close' : 'Close_T' ,'Day_trend' : 'Day_trend_T' ,'gap_up_down_opening' : 'gap_up_down_opening_T', 
'Pivot' : 'Pivot_T','R1': 'R1_T','R2': 'R2_T','S1':'S1_T','S2':'S2_T','Weekly_Pivots':'Weekly_Pivots_T','Weekly_R1':'Weekly_R1_T','Weekly_R2':'Weekly_R2_T',
'Weekly_S1':'Weekly_S1_T','Weekly_S2':'Weekly_S2_T','RSI-14' : 'RSI-14_T' ,'MFI-14' : 'MFI-14_T','Day' : 'Day_T','Month': 'Month_T',
'Singp_gap_up_down_opening':'Singp_gap_up_down_opening_T','China_gap_up_down_opening':'China_gap_up_down_opening_T',
'china_today_init_trend':'china_today_init_trend_T','china_today_init_trend_strength_up_shadow':'china_today_init_trend_strength_up_shadow_T',
'china_today_init_trend_strength_low_shadow' : 'china_today_init_trend_strength_low_shadow_T' ,'Honkon_gap_up_down_opening' : 'Honkon_gap_up_down_opening_T',
'hongkong_today_init_trend': 'hongkong_today_init_trend_T','hongkong_today_init_trend_strength_up_shadow' : 'hongkong_today_init_trend_strength_up_shadow_T',
'hongkong_today_init_trend_strength_low_shadow': 'hongkong_today_init_trend_strength_low_shadow_T','Japan_gap_up_down_opening' : 'Japan_gap_up_down_opening_T', 
'japan_today_init_trend' : 'japan_today_init_trend_T',
'japan_today_init_trend_strength_up_shadow' : 'japan_today_init_trend_strength_up_shadow_T', 
'japan_today_init_trend_strength_low_shadow' :  'japan_today_init_trend_strength_low_shadow_T'
}


shift_helper_data_df  = Final_df

Final_df_copy = Final_df.copy()
Final_df_copy = Final_df_copy.rename(columns = cols_rename_dict)
Today_df = Final_df_copy[ (list(cols_rename_dict.values())) ]


In [63]:
print(Final_df.shape)
print(Today_df.shape)

(2893, 149)
(2893, 33)


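<b><font color='red'>COMBINING THE PREVIOUS 5 DAYS' DATA</font>

Each current day's input data point will contain:<br><br>
    

T  day features (the current day's features)<br>
T1 day features (the previous day's features)<br>
T2 day features (the features from 2 days earlier)<br>
T3 day features (the features from 3 days earlier)<br>
T4 day features (the features from 4 days earlier)<br>
T5 day features (the features from 5 days earlier)<br>
   
</b>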

In [64]:
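def combine_last_5_days(data,shift_data):
    """
    This function joins the previous 5 days' data onto each current day's data point.
    `data` holds the current day's (T) columns. `shift_data` is shifted down one more row before each join;
    each join renames the columns added by the previous join with the suffix _T1 ... _T4,
    and the last join also adds the suffix _T5 to the newly joined columns.
    """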
    final_data = data.copy()
    shift_data = shift_data.shift(1)
    final_data = final_data.join(shift_data)
    
    for i in range(2,6):
        suffix_l = '_T'+str(i-1)
        suffix_r = '_T'+str(i)
        shift_data = shift_data.shift(1)
        
        if(i==5):
            final_data = final_data.join(shift_data,lsuffix=suffix_l,rsuffix=suffix_r)
        else:
            final_data = final_data.join(shift_data,lsuffix=suffix_l)
            
    return final_data

FINAL_DATA_DF = combine_last_5_days(Today_df,shift_helper_data_df )
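
The suffix handling in `combine_last_5_days` relies on `join` renaming overlapping columns one step after they are added. A sketch of an alternative that names each lagged copy explicitly is shown below. It is not the notebook's method: `add_lagged_copies` and the toy frame are hypothetical, and the sketch should yield the same `_T1` ... `_T5` column names when called with `n_lags=5`.

```python
import pandas as pd


def add_lagged_copies(today, history, n_lags=5):
    """Append `history` shifted by 1..n_lags rows, suffixed _T<lag> (hypothetical helper)."""
    parts = [today]
    for lag in range(1, n_lags + 1):
        # shift(lag) moves each row's values `lag` rows down, i.e. to `lag` trading days later
        parts.append(history.shift(lag).add_suffix('_T' + str(lag)))
    return pd.concat(parts, axis=1)


# Toy data: four consecutive days labelled a..d.
history = pd.DataFrame({'Close': [1.0, 2.0, 3.0, 4.0]}, index=list('abcd'))
today = history.rename(columns={'Close': 'Close_T'})
lagged = add_lagged_copies(today, history, n_lags=2)
```

The first `n_lags` rows contain `NaN` in the lagged columns, which is why a `dropna` is needed before modelling.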

<b><font color='red'>For each day's data point, forming the Y data (i.e. the output, or class, data) and saving the final data as FINAL_DATA.csv</font></b>

In [None]:
FINAL_DATA_DF = FINAL_DATA_DF.dropna()
FINAL_DATA_DF['Y_change'] = FINAL_DATA_DF['Close_T']-FINAL_DATA_DF['Open_T']
# FINAL_DATA_DF['Y_trend']  = FINAL_DATA_DF.apply(lambda day: 0 if (day['Y_change'] < 0) else 1, axis =1)
# FINAL_DATA_DF['Y_high']   = FINAL_DATA_DF['High_T']
# FINAL_DATA_DF['Y_low']    = FINAL_DATA_DF['Low_T']

cols_2_drop = ['Close_T','High_T','Low_T','Day_trend_T']

FINAL_DATA_DF = FINAL_DATA_DF.drop(columns=cols_2_drop)
FINAL_DATA_DF.to_csv('FINAL_DATA.csv')
FINAL_DATA_DF.shape
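
The binary class label can be derived from `Y_change` in one vectorised step. The sketch below follows the rule in the commented-out `Y_trend` line above (0 when the change is negative, otherwise 1); the sample values are hypothetical.

```python
import pandas as pd

y_change = pd.Series([12.5, -3.0, 0.0])    # toy close-minus-open values
y_trend = (y_change >= 0).astype(int)      # 1 = bullish (or unchanged), 0 = bearish
```

Under this rule a day whose close equals its open is labelled 1, the same as a bullish day.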

In [3]:
import pandas as pd
data = pd.read_csv('FINAL_DATA - changes.csv')

In [4]:
data.shape

(2888, 599)