<div class="alert alert-block alert-warning">
    <h1> <strong> JPX Tokyo Stock Exchange Prediction </strong> 
    </h1>
</div>  

<div class="alert alert-block alert-info">
    <h5>Success in any financial market requires one to identify solid investments. When a stock or derivative is undervalued, it makes sense to buy. If it's overvalued, perhaps it's time to sell. While these finance decisions were historically made manually by professionals, technology has ushered in new opportunities for retail investors. Data scientists, specifically, may be interested to explore quantitative trading, where decisions are executed programmatically based on predictions from trained models.

There are plenty of existing quantitative trading efforts used to analyze financial markets and formulate investment strategies. To create and execute such a strategy requires both historical and real-time data, which is difficult to obtain especially for retail investors. This competition will provide financial data for the Japanese market, allowing retail investors to analyze the market to the fullest extent.

Japan Exchange Group, Inc. (JPX) is a holding company operating one of the largest stock exchanges in the world, Tokyo Stock Exchange (TSE), and derivatives exchanges Osaka Exchange (OSE) and Tokyo Commodity Exchange (TOCOM). JPX is hosting this competition and is supported by AI technology company AlpacaJapan Co.,Ltd.

This competition will compare your models against real future returns after the training phase is complete. The competition will involve building portfolios from the stocks eligible for predictions (around 2,000 stocks). Specifically, each participant ranks the stocks from highest to lowest expected returns and is evaluated on the difference in returns between the top and bottom 200 stocks. You'll have access to financial data from the Japanese market, such as stock information and historical stock prices to train and test your model.

The task is to rank stocks by Target, the rate of change of the closing price (Close) between the next business day and the business day after it, for each stock on each date.
    </h5>
</div>  

<div class="alert alert-block alert-warning">
    <h2> <strong> Metrics Used </strong> 
    </h2>
</div>  

<div class="alert alert-block alert-info">
    <h5>
        In this competition, scores are computed under the following conditions.

1. For each stock ($k$), the model takes as input the closing prices ($C_{(k, t)}$) up to business day ($t$), along with other data available every business day, and predicts the rate of change ($r_{(k, t)}$) of the closing price from the following business day ($C_{(k, t+1)}$) to the business day after that ($C_{(k, t+2)}$). These predictions determine the top 200 and bottom 200 stocks.

    $$
    r_{(k, t)} = \frac{C_{(k, t+2)} - C_{(k, t+1)}}{C_{(k, t+1)}}
    $$
    
2. For the top 200 predicted stocks ($up_i\;\;(i = 1, 2, \ldots, 200)$), multiply each stock's rate of change by a linear weight running from 2 (rank 1) down to 1 (rank 200), and denote the weighted sum, normalized by the average weight, as $S_{up}$.

    $$
    S_{up} = \frac{\sum^{200}_{i=1}\left(r_{({up_i}, t)} \cdot \text{linear function}(2, 1)_i\right)}{\text{Average}(\text{linear function}(2, 1))}
    $$
    
3. For the bottom 200 predicted stocks ($down_i\;\;(i = 1, 2, \ldots, 200)$), multiply each stock's rate of change by a linear weight running from 2 (bottom rank 1) down to 1 (bottom rank 200), and denote the weighted sum, normalized by the average weight, as $S_{down}$.

    $$
    S_{down} = \frac{\sum^{200}_{i=1}\left(r_{({down_i}, t)} \cdot \text{linear function}(2, 1)_i\right)}{\text{Average}(\text{linear function}(2, 1))}
    $$
    
4. The result of subtracting $S_{down}$ from $S_{up}$ is $R_{day}$ and is called "**daily spread return**".

    $$
    R_{day} = S_{up} - S_{down}
    $$
    
5. The daily spread return is calculated every business day during the public/private period, giving a time series for that period. The score is the mean of this time series divided by its standard deviation ($day_1$ to $day_x$ are the business days of the public/private period):

    $$
    Score = \frac{Average(R_{day_1-day_x})}{STD(R_{day_1-day_x})}
    $$
    </h5>
</div>  
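
The formulas above do not spell out "linear function(2, 1)". One reading, consistent with the `np.linspace(start=2, stop=1, num=200)` call used in the code cells of this notebook, is a set of 200 equally spaced weights. The symbols $w_i$ and $\bar{w}$ are introduced here for illustration only and are not part of the competition text.

$$
w_i = 2 - \frac{i - 1}{199}, \quad i = 1, 2, \ldots, 200, \qquad \bar{w} = \frac{1}{200}\sum^{200}_{i=1} w_i = 1.5
$$

Under this reading, the top-ranked stock counts twice as much as the 200th, and dividing by $\bar{w} = 1.5$ rescales the weighted sum so that it is comparable to an unweighted one.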


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os
import warnings
warnings.filterwarnings("ignore")

sample = pd.read_csv("../input/jpx-tokyo-stock-exchange-prediction/example_test_files/sample_submission.csv")
print(sample.info())
sample.head(5)

<div class="alert alert-block alert-warning">
    <h3> 
    Let's understand the computation.
        <br> 
    Below is a sample computation focusing on only one stock:
    </h3>
</div>  

In [None]:
stock_prices = pd.read_csv("../input/jpx-tokyo-stock-exchange-prediction/train_files/stock_prices.csv")
stock_prices['Date'] = pd.to_datetime(stock_prices['Date']) 
stock_A = stock_prices[stock_prices['SecuritiesCode']==1301].reset_index(drop = True)
stock_A.head(5)

<div class="alert alert-block alert-info">
    <h4>


1. For each stock ($k$), the model takes as input the closing prices ($C_{(k, t)}$) up to business day ($t$), along with other data available every business day, and predicts the rate of change ($r_{(k, t)}$) of the closing price from the following business day ($C_{(k, t+1)}$) to the business day after that ($C_{(k, t+2)}$). These predictions determine the top 200 and bottom 200 stocks.

    $$
    r_{(k, t)} = \frac{C_{(k, t+2)} - C_{(k, t+1)}}{C_{(k, t+1)}}
    $$
<br> <br>  The rate is recomputed from the closing prices here so that it can be checked against the provided Target column.
    </h4>
</div>  
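
Before running this on real data, it may help to see the formula on a series small enough to check by hand. The sketch below uses made-up closing prices (the `toy` frame and its values are assumptions for illustration, not competition data).

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices for one stock on four consecutive business days.
toy = pd.DataFrame({"Close": [100.0, 110.0, 121.0, 108.9]})

# C_(k, t+1) and C_(k, t+2): shift the series up by one and by two rows.
close_t1 = toy["Close"].shift(-1)
close_t2 = toy["Close"].shift(-2)

# r_(k, t) = (C_(k, t+2) - C_(k, t+1)) / C_(k, t+1)
toy["rate"] = (close_t2 - close_t1) / close_t1

print(toy)
```

The last two rows come out as NaN because no closing price exists two business days ahead of them, which is why the most recent rows of real data cannot have a target either.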

In [None]:
stock_A['close_shift_1'] = stock_A['Close'].shift(-1)
stock_A['close_shift_2'] = stock_A['Close'].shift(-2)
stock_A['rate'] = (stock_A['close_shift_2'] - stock_A['close_shift_1'])/stock_A['close_shift_1']   
stock_A.head(3)

<div class="alert alert-block alert-info">
    <h5>
        Let's try ranking the stocks for a particular day.
The lower the rank number (i.e. the larger the positive rate of change), the larger the target, and hence the more profitable the stock is to buy; conversely, the highest rank numbers mark the stocks most profitable to sell.
    </h5>
</div>  
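
The ranking convention can be sketched on a handful of stocks first. The securities codes and targets below are made up for illustration; only the `rank(ascending=False, method='first') - 1` pattern is taken from this notebook.

```python
import pandas as pd

# Hypothetical targets for five stocks on a single date.
day = pd.DataFrame({
    "SecuritiesCode": [1001, 1002, 1003, 1004, 1005],
    "Target": [0.02, -0.01, 0.05, 0.00, -0.03],
})

# Rank 0 goes to the largest target; method="first" breaks ties by row order,
# so every rank from 0 to len(day) - 1 appears exactly once.
day["Rank"] = day["Target"].rank(ascending=False, method="first").astype(int) - 1
day = day.sort_values("Rank").reset_index(drop=True)

print(day)
```

Using `method="first"` matters because the scoring function asserts that the ranks run from 0 to the number of stocks minus 1 without gaps or duplicates.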

In [None]:
test_rank = stock_prices[stock_prices['Date'] == '2021-11-02'].reset_index(drop = True)
test_rank['rank'] = test_rank['Target'].rank(ascending=False,method='first')-1
test_rank = test_rank.sort_values('rank').reset_index(drop = True)
test_rank.head(3)

<div class="alert alert-block alert-info">
    <h4>
     
2. For the top 200 predicted stocks ($up_i\;\;(i = 1, 2, \ldots, 200)$), multiply each stock's rate of change by a linear weight running from 2 (rank 1) down to 1 (rank 200), and denote the weighted sum, normalized by the average weight, as $S_{up}$.

    $$
    S_{up} = \frac{\sum^{200}_{i=1}\left(r_{({up_i}, t)} \cdot \text{linear function}(2, 1)_i\right)}{\text{Average}(\text{linear function}(2, 1))}
    $$
    
    </h4>
</div>  


In [None]:
#get top 200 of already ranked
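# .copy() so that the new columns are added to an independent frame, not a view of test_rank
test_rank_top200 = test_rank.iloc[:200,:].copy()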

#assign weights
weights = np.linspace(start=2,stop=1,num=200)
print("sample weights include: {}".format(weights[0:5]))

#multiply weights by target
test_rank_top200['weights'] = weights
test_rank_top200['calc_weights'] = test_rank_top200['weights'] *test_rank_top200['Target'] 

#Calculate Sup
Sup = test_rank_top200['calc_weights'].sum()/np.mean(weights)
print("Calculated Sup is: {}".format(Sup))
test_rank_top200.head(3)

<div class="alert alert-block alert-info">
    <h4>
   
3. For the bottom 200 predicted stocks ($down_i\;\;(i = 1, 2, \ldots, 200)$), multiply each stock's rate of change by a linear weight running from 2 (bottom rank 1) down to 1 (bottom rank 200), and denote the weighted sum, normalized by the average weight, as $S_{down}$.

    $$
    S_{down} = \frac{\sum^{200}_{i=1}\left(r_{({down_i}, t)} \cdot \text{linear function}(2, 1)_i\right)}{\text{Average}(\text{linear function}(2, 1))}
    $$
    
</h4>
    </div>

In [None]:
#bottom 200  and re-sort
test_rank_bottom200 = test_rank.iloc[-200:,:]
test_rank_bottom200 = test_rank_bottom200.sort_values("rank",ascending = False).reset_index(drop=True)

#assign weights
weights = np.linspace(start=2,stop=1,num=200)
print("sample weights include: {}".format(weights[0:5]))

#multiply weights by target
test_rank_bottom200['weights'] = weights
test_rank_bottom200['calc_weights'] = test_rank_bottom200['weights'] *test_rank_bottom200['Target'] 

#Calculate Sdown
Sdown = test_rank_bottom200['calc_weights'].sum()/np.mean(weights)
print("Calculated Sdown is: {}".format(Sdown))
test_rank_bottom200.head(3)

<div class="alert alert-block alert-info">
    <h4>
   
4. The result of subtracting $S_{down}$ from $S_{up}$ is $R_{day}$ and is called "**daily spread return**".

    $$
    R_{day} = S_{up} - S_{down}
    $$
  <br>
        This is calculated for every business day.
    </h4>
    </div>

In [None]:
daily_spread_return = Sup-Sdown
print("The Daily spread return is : {}".format(daily_spread_return))

<div class="alert alert-block alert-info">
    <h4>
   
5. The daily spread return is calculated every business day during the public/private period, giving a time series for that period. The score is the mean of this time series divided by its standard deviation ($day_1$ to $day_x$ are the business days of the public/private period):

    $$
    Score = \frac{Average(R_{day_1-day_x})}{STD(R_{day_1-day_x})}
    $$
      </h4>
    </div>
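
This last step has no code cell of its own, so here is a minimal sketch of it on made-up daily spread returns (the values in `daily_spread` are assumptions for illustration). The condensed function in this notebook calls pandas `.std()`, which is the sample standard deviation (`ddof=1`); whether the official scorer uses the same convention is not stated in the text above, so the population version is shown alongside for comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical daily spread returns over five business days.
daily_spread = pd.Series([0.5, -0.25, 0.75, 0.0, 0.25])

# pandas defaults to the sample standard deviation (ddof=1).
score = daily_spread.mean() / daily_spread.std()

# NumPy defaults to the population standard deviation (ddof=0),
# which is smaller and therefore gives a larger ratio.
score_population = daily_spread.mean() / np.std(daily_spread.to_numpy())

print(score, score_population)
```

With only a few days the two conventions differ noticeably; over a long evaluation period the gap shrinks.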

<div class="alert alert-block alert-warning">
    <h3> 
Modified Condensed Formula
    </h3>
</div>

<div class="alert alert-block alert-info">
<h3> <strong> Args: </strong> </h3>
 <ul>
     <li> df (pd.DataFrame): predicted results </li>
     <li> portfolio_size (int): # of equities to buy/sell </li>
     <li> toprank_weight_ratio (float): the relative weight of the most highly ranked stock compared to the least. </li>
     <li> Returns: (float): spread return </li>
        </ul>
    
 
 </div>
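
To see how `portfolio_size` and `toprank_weight_ratio` interact, the per-day computation can be restated compactly and run on a frame small enough to verify by hand. The helper `toy_spread_return` and the `toy` frame are hypothetical names and data introduced for this sketch; they only mirror the logic of the function in this section and are not a replacement for it.

```python
import numpy as np
import pandas as pd

def toy_spread_return(day: pd.DataFrame, portfolio_size: int, toprank_weight_ratio: float) -> float:
    """Hypothetical helper: weighted top-minus-bottom return for one date."""
    weights = np.linspace(start=toprank_weight_ratio, stop=1, num=portfolio_size)
    targets = day.sort_values("Rank")["Target"].to_numpy()
    top = targets[:portfolio_size]           # best-ranked stocks, rank 0 first
    bottom = targets[::-1][:portfolio_size]  # worst-ranked stocks, last rank first
    return float(((top - bottom) * weights).sum() / weights.mean())

# Hypothetical frame: four stocks on two dates, already ranked 0..3 per date.
toy = pd.DataFrame({
    "Date": ["2021-01-04"] * 4 + ["2021-01-05"] * 4,
    "Rank": [0, 1, 2, 3, 0, 1, 2, 3],
    "Target": [0.03, 0.01, -0.01, -0.02, 0.02, 0.00, 0.01, -0.04],
})

# One spread return per date, with 2 stocks on each side and weights [2, 1].
spreads = pd.Series({
    date: toy_spread_return(day, portfolio_size=2, toprank_weight_ratio=2)
    for date, day in toy.groupby("Date")
})

print(spreads)
```

On the first date the weighted top side is 2 * 0.03 + 1 * 0.01 and the weighted bottom side is 2 * (-0.02) + 1 * (-0.01); their difference divided by the mean weight 1.5 gives that day's spread return.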



In [None]:
def calc_spread_return_sharpe(df: pd.DataFrame, portfolio_size: int = 200, toprank_weight_ratio: float = 2) -> float:
    """
    Args:
        df (pd.DataFrame): predicted results
        portfolio_size (int): # of equities to buy/sell
        toprank_weight_ratio (float): the relative weight of the most highly ranked stock compared to the least.
    Returns:
        (float): sharpe ratio
    """
    def _calc_spread_return_per_day(df, portfolio_size, toprank_weight_ratio):
        """
        Args:
            df (pd.DataFrame): predicted results
            portfolio_size (int): # of equities to buy/sell
            toprank_weight_ratio (float): the relative weight of the most highly ranked stock compared to the least.
        Returns:
            (float): spread return
        """
        assert df['Rank'].min() == 0
        assert df['Rank'].max() == len(df['Rank']) - 1
        weights = np.linspace(start=toprank_weight_ratio, stop=1, num=portfolio_size)
        purchase = (df.sort_values(by='Rank')['Target'][:portfolio_size] * weights).sum() / weights.mean()
        short = (df.sort_values(by='Rank', ascending=False)['Target'][:portfolio_size] * weights).sum() / weights.mean()
        return purchase - short

    buf = df.groupby('Date').apply(_calc_spread_return_per_day, portfolio_size, toprank_weight_ratio)
    sharpe_ratio = buf.mean() / buf.std()
    return sharpe_ratio

In [None]:
plt.figure(figsize = (18,9))

securities_count_annual = stock_prices.groupby("Date")["SecuritiesCode"].count().reset_index()
plt.plot(securities_count_annual['Date'],securities_count_annual['SecuritiesCode'])

We will only use the period in which 2,000 stocks are present on each date, i.e. roughly 2021.

In [None]:
securities_count_annual =securities_count_annual[securities_count_annual['SecuritiesCode'] == 2000]

print("Cases where there are 2000 stocks are from {} to {}".format(securities_count_annual['Date'].min().date(),
securities_count_annual['Date'].max().date()))

stock_prices_2 = stock_prices[stock_prices['Date'] >= '2020-12-23'].reset_index(drop=True)

In [None]:
#rank
stock_prices_2['Rank'] = stock_prices_2.groupby('Date')['Target'].rank(ascending=False,method = 'first') - 1

stock_prices_2['Rank'] = stock_prices_2['Rank'].astype('int')

stock_prices_2.head(3)

In [None]:
score = calc_spread_return_sharpe(stock_prices_2, portfolio_size= 200, toprank_weight_ratio= 2)
print("The score using the condensed formula is {} ".format(score))

<div class="alert alert-block alert-warning">
    <h1> 
        <strong> Stock Market Predictions with LSTM </strong>
    </h1>
</div>

<div class="alert alert-block alert-info">
    <h5>
Here we will:
        <ul>
            <li> Split data into train-test data </li>
            <li> Normalize Data </li>
            <li> Apply averaging techniques for predictions </li>
            <li> Predict and visualize predictions against actual data </li>
        </ul> 
        <br>
 In <i>A Random Walk Down Wall Street</i> (1973), Princeton University economist Burton Malkiel argued that if the market is truly efficient and a share price reflects all factors as soon as they're made public, a blindfolded monkey throwing darts at a newspaper stock listing should do as well as any investment professional.
    </h5>
</div>



In [None]:
import datetime as dt
import os
import tensorflow as tf
import sklearn
from sklearn.preprocessing import MinMaxScaler

# Reload the prices and keep a single stock (SecuritiesCode 1332);
# .copy() gives an independent frame so new columns can be added safely
stock_prices = pd.read_csv("../input/jpx-tokyo-stock-exchange-prediction/train_files/stock_prices.csv")
stock_prices['Date'] = pd.to_datetime(stock_prices['Date']) 
stock_prices = stock_prices[stock_prices['SecuritiesCode'] == 1332].copy()

# Load the stock list, which maps each securities code to its name
stock_names = pd.read_csv("../input/jpx-tokyo-stock-exchange-prediction/stock_list.csv")
stock_names.head(3)

# Join the securities codes with their names
stock_prices['SecuritiesName'] = stock_prices['SecuritiesCode']
stock_prices = pd.merge(stock_prices, stock_names[['Name','SecuritiesCode']], on='SecuritiesCode', how='left')
                        
df = stock_prices
df.head(3)

<div class="alert alert-block alert-warning">
    <h2>
Exploratory Data Analysis
    </h2>
</div>

<div class="alert alert-block alert-info">
    <h5>
Let's visualize the closing and opening prices over time to see how they evolve.
    </h5>
</div>


In [None]:
from matplotlib.dates import DateFormatter
import seaborn as sns
from matplotlib.pyplot import GridSpec

In [None]:
sns.set_style("dark")
sns.set_context("notebook", font_scale=1.8, rc={"lines.linewidth": 1.0})

# Two stacked panels sharing the date axis: closing prices on top, opening prices below
fig, (ax1, ax2) = plt.subplots(nrows=2, ncols=1, figsize=(18, 9), sharex=True)

# Top panel: closing prices, with ticks and tick labels hidden
sns.lineplot(y=df['Close'], x=df['Date'], color='red', ax=ax1)
ax1.tick_params(bottom=False, labelbottom=False, left=False, labelleft=False)
ax1.set(xlabel=None, ylabel="Closing Prices")
ax1.patch.set_facecolor('PowderBlue')

# Bottom panel: opening prices, keeping the date ticks
sns.lineplot(y=df['Open'], x=df['Date'], ax=ax2)
ax2.tick_params(bottom=True, labelbottom=True, left=False, labelleft=False)
ax2.set(ylabel="Open Prices")
ax2.patch.set_facecolor('PowderBlue')





<div class="alert alert-block alert-warning">
    <h2>
Filter Data, Convert to array and Normalize
    </h2>
</div>

<div class="alert alert-block alert-info">
    <h5>
We shall normalize both the train and test data to the feature range 0 to 1.
        <br>
        We shall then split the data into five equal windows so that each prediction is for the day after a window, i.e. day n+1 for a window of n days.
        For example, if each window covers 50 days, the prediction is for the 51st day.
    </h5>
</div>    
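
The normalization and windowing described above can be sketched as follows. This is one possible arrangement under stated assumptions, not the notebook's final pipeline: the `prices` series is synthetic, `make_windows` is a hypothetical helper, the window length of 5 is arbitrary, and the scaler is fitted on the training part only (a common precaution against leaking test-period information, swapped in here for fitting on the full series).

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical price series standing in for one stock's closing prices.
prices = np.arange(1.0, 21.0).reshape(-1, 1)

# Chronological 70/30 split, then fit the scaler on the training part only
# so that test-period prices do not influence the scaling.
train_len = int(np.ceil(len(prices) * 0.7))
scaler = MinMaxScaler(feature_range=(0, 1))
train_scaled = scaler.fit_transform(prices[:train_len])
test_scaled = scaler.transform(prices[train_len:])

def make_windows(series: np.ndarray, window: int):
    """Hypothetical helper: each row of x holds `window` values, y is the value that follows."""
    x, y = [], []
    for i in range(window, len(series)):
        x.append(series[i - window:i, 0])
        y.append(series[i, 0])
    return np.array(x), np.array(y)

window = 5  # assumed window length, for illustration only
train_x, train_y = make_windows(train_scaled, window)

print(train_x.shape, train_y.shape)
```

Because the scaler only sees the training prices, scaled test values can fall outside the 0 to 1 range when later prices exceed the training maximum, as they do in this rising synthetic series.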

In [None]:
df = df[['Date','Open','High','Low','Close','Volume']]
#df2 = df.set_index('Date')
df_close = df.filter(['Close'])

#convert to array
df_close = df_close.values

#Normalize to make values between 0 and 1
scaler = MinMaxScaler(feature_range=(0,1))
df_scaled = scaler.fit_transform(df_close)

print("Below is a sample of the normalized data: {}".format(df_scaled))

#separate train_data
import math
train_len = math.ceil(len(df_scaled)*0.7)
df_train = df_scaled[0:train_len,:]
print('converted training data has {} rows ie {}% of the original data'.format(df_train.shape[0],round(df_train.shape[0]/df.shape[0]*100,0)))

#separate data into x and y

In [None]:
df_train_x = []
df_train_y = []




In [None]:
high_prices = df.loc[:,'High']
low_prices = df.loc[:,'Low']
mid_prices = (high_prices+low_prices)/2

df_train = mid_prices[:int(df.shape[0]*0.7)]
df_test = mid_prices[int(df.shape[0]*0.7):]

print('converted training data has {} rows ie {}%'.format(df_train.shape[0],round(df_train.shape[0]/df.shape[0]*100,0)))
print('converted test data has {} rows ie {}%'.format(df_test.shape[0],round(df_test.shape[0]/df.shape[0]*100,0)))

# Convert to plain NumPy arrays (np.matrix is discouraged in current NumPy)
df_train = df_train.to_numpy()
df_test = df_test.to_numpy()

In [None]:
scaler = MinMaxScaler()
df_train = df_train.reshape(-1,1)
df_test = df_test.reshape(-1,1)