## Tokyo Stock Market Prediction with CatBoost v2

Thanks to Swimmy's great notebook [LGBM Opt Model JPX](https://www.kaggle.com/code/swimmy/lgbm-opt-model-jpx), which shows one possible way to earn a good profit in the stock market. Here I build a Tokyo Stock Exchange prediction model using CatBoost with the same data preprocessing method.

I plan to update the model regularly. The current version draws on the following notebooks:

* https://www.kaggle.com/code/swimmy/lgbm-opt-model-jpx
* https://www.kaggle.com/code/realneuralnetwork/jpx-lgbm-model-overfitting-high-score
* https://www.kaggle.com/code/paulorzp/median-model-jpx

# If you find it useful, please upvote.

# The model is probably overfitted, so overfitting still needs to be reduced.

In [None]:
import numpy as np
import pandas as pd
import jpx_tokyo_market_prediction  # competition API, available only inside the Kaggle environment
from catboost import CatBoostRegressor
import warnings
warnings.filterwarnings("ignore")

## Loading data

In [None]:
prices = pd.read_csv("../input/jpx-tokyo-stock-exchange-prediction/supplemental_files/stock_prices.csv")

In [None]:
# Keep roughly the last NDAYS trading days (about 2,000 securities are listed per day)
NDAYS = 34
lastdays = prices[prices["Date"] >= prices.Date.iat[-2000*NDAYS]].reset_index(drop=True)
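The row-count cutoff `prices.Date.iat[-2000*NDAYS]` assumes that exactly 2,000 securities trade on every day. A minimal sketch of a date-based alternative, assuming `Date` holds ISO `YYYY-MM-DD` strings as in `stock_prices.csv`; the helper `last_n_trading_days` is hypothetical and not part of the original notebook:

```python
import pandas as pd


def last_n_trading_days(df: pd.DataFrame, n_days: int) -> pd.DataFrame:
    """Return the rows whose Date falls within the last `n_days` distinct dates.

    Hypothetical helper: it selects by distinct dates instead of by row count,
    so it does not depend on how many securities trade on each day.
    """
    # Distinct dates in chronological order (ISO "YYYY-MM-DD" strings sort correctly)
    dates = pd.Series(df["Date"].unique()).sort_values()
    cutoff = dates.iloc[-n_days:].min()
    return df[df["Date"] >= cutoff].reset_index(drop=True)


# Toy example: 3 days with a different number of securities on each day
toy = pd.DataFrame({
    "Date": ["2021-12-01", "2021-12-01", "2021-12-02", "2021-12-03", "2021-12-03"],
    "SecuritiesCode": [1301, 1332, 1301, 1301, 1332],
    "Target": [0.01, -0.02, 0.03, 0.00, 0.01],
})
recent = last_n_trading_days(toy, 2)
print(sorted(recent["Date"].unique()))  # ['2021-12-02', '2021-12-03']
```

With `NDAYS = 34`, `last_n_trading_days(prices, NDAYS)` would play the role of `lastdays` above.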

In [None]:
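# Mean Target per security over the recent window selected in the previous cell
lastdays = pd.DataFrame(lastdays.groupby("SecuritiesCode").Target.mean())
def get_avg(_id_):
    # Return a scalar so that .apply() yields a Series; codes absent from the window get NaN
    return lastdays["Target"].get(_id_, np.nan)
prices["Avg"] = prices["SecuritiesCode"].apply(get_avg)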
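`Series.apply` with a per-row lookup is slow on millions of rows. A minimal vectorized sketch of the same per-code average using `groupby` plus `Series.map`, assuming the same `SecuritiesCode`/`Target` columns; filling unseen codes with `0.0` is an illustrative assumption, not something the original notebook specifies:

```python
import pandas as pd

# Toy training rows (stand-in for the recent-days window)
train = pd.DataFrame({
    "SecuritiesCode": [1301, 1301, 1332, 1332],
    "Target": [0.02, 0.00, -0.01, -0.03],
})

# Mean Target per security, as a Series indexed by SecuritiesCode
avg_target = train.groupby("SecuritiesCode")["Target"].mean()

# New rows, including a code (9999) that never appeared in training
new = pd.DataFrame({"SecuritiesCode": [1301, 1332, 9999]})

# map() looks up every code in one pass; unseen codes become NaN and are
# filled with 0.0 here (an assumed fallback, chosen only for illustration)
new["Avg"] = new["SecuritiesCode"].map(avg_target).fillna(0.0)
print(new["Avg"].round(4).tolist())  # [0.01, -0.02, 0.0]
```

The same `map` call would work on `sample_prediction["SecuritiesCode"]` in the submission loop.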

In [None]:
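# Drop rows without a label (CatBoost cannot train on a missing Target)
prices = prices.dropna(subset=["Target"])
# Encode Date as an integer YYYYMMDD so it can be used as a numeric feature
prices["Date"] = pd.to_datetime(prices["Date"]).dt.strftime("%Y%m%d").astype(int)
X = prices[["Date", "SecuritiesCode", "Avg"]]
y = prices["Target"]
codes = X.SecuritiesCode.unique()
X.head()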

## Modeling

In [None]:
import optuna

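def objectives(trial):
    # Hyperparameter search space for Optuna
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 100, 1000),
        'max_bin': trial.suggest_int('max_bin', 2, 100),
        # learning_rate must be > 0; search it on a log scale
        'learning_rate': trial.suggest_float('learning_rate', 1e-3, 1.0, log=True),
        'verbose': 1000
    }

    # Time-based hold-out: train on earlier dates, validate on the last 20% of dates,
    # so the score is not measured on the training data itself
    cutoff = X["Date"].quantile(0.8)
    train_idx = X["Date"] <= cutoff

    model = CatBoostRegressor(**params)
    model.fit(X[train_idx], y[train_idx])
    pred = model.predict(X[~train_idx])
    # Validation RMSE: lower is better, matching direction='minimize' below
    rmse = float(np.sqrt(np.mean((y[~train_idx].to_numpy().ravel() - pred) ** 2)))
    print("Validation RMSE:", rmse)
    return rmse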
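A single hold-out period can still reward a lucky stretch of the market. A minimal sketch of expanding-window (walk-forward) splits over distinct dates, which could replace the single split when scoring a trial; `walk_forward_splits` and the default fold count are hypothetical choices, not part of the original notebook:

```python
import numpy as np
import pandas as pd


def walk_forward_splits(dates: pd.Series, n_folds: int = 3):
    """Yield (train_mask, valid_mask) boolean arrays for expanding-window folds.

    Hypothetical helper: the distinct dates are cut into n_folds + 1 contiguous
    blocks; fold k trains on blocks 0..k and validates on block k + 1, so the
    validation period always comes after the training period.
    """
    unique_dates = np.sort(dates.unique())
    blocks = np.array_split(unique_dates, n_folds + 1)
    values = dates.to_numpy()
    for k in range(n_folds):
        train_end = blocks[k][-1]
        train_mask = values <= train_end
        valid_mask = np.isin(values, blocks[k + 1])
        yield train_mask, valid_mask


# Toy example: 8 distinct integer dates (YYYYMMDD), 2 rows per date
dates = pd.Series(np.repeat([20211201, 20211202, 20211203, 20211206,
                             20211207, 20211208, 20211209, 20211210], 2))
for train_mask, valid_mask in walk_forward_splits(dates, n_folds=3):
    print(train_mask.sum(), valid_mask.sum())  # prints 4 4, then 8 4, then 12 4
```

Inside `objectives`, one option would be to fit on `X[train_mask]`, score on `X[valid_mask]` for each fold, and return the mean validation RMSE; this costs `n_folds` fits per trial.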

In [None]:
opt = optuna.create_study(direction='minimize',sampler=optuna.samplers.RandomSampler(seed=0))
opt.optimize(objectives, n_trials=20)
trial = opt.best_trial

In [None]:
# Refit on all training data with the best hyperparameters found by Optuna
best_param = dict(trial.params)
model = CatBoostRegressor(**best_param, verbose=1000)
model.fit(X, y)

## Submission

In [None]:
env = jpx_tokyo_market_prediction.make_env()
iter_test = env.iter_test()

for (prices, options, financials, trades, secondary_prices, sample_prediction) in iter_test:
    # Build the same features that were used in training
    sample_prediction["Avg"] = sample_prediction["SecuritiesCode"].apply(get_avg)
    df = sample_prediction[["Date", "SecuritiesCode", "Avg"]].copy()
    df["Date"] = pd.to_datetime(df["Date"]).dt.strftime("%Y%m%d").astype(int)
    sample_prediction["Prediction"] = model.predict(df)
    # Rank 0 = highest predicted return
    sample_prediction = sample_prediction.sort_values(by="Prediction", ascending=False)
    sample_prediction["Rank"] = np.arange(len(sample_prediction))
    sample_prediction = sample_prediction.sort_values(by="SecuritiesCode", ascending=True)
    submission = sample_prediction[["Date", "SecuritiesCode", "Rank"]]
    env.predict(submission)
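To check a ranking locally before submitting, a sketch of a daily spread-return Sharpe ratio, written from my reading of the competition's evaluation description (go long the top-ranked stocks and short the bottom-ranked ones, 200 on each side, with weights falling linearly from 2 to 1, then take mean / std of the daily spreads). Treat it as an approximation and compare it with the official evaluation code before relying on it; `add_rank`, `daily_spread_returns`, and `spread_return_sharpe` are hypothetical names:

```python
import numpy as np
import pandas as pd


def add_rank(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: rank securities within each Date, 0 = highest Prediction."""
    out = df.copy()
    ranks = out.groupby("Date")["Prediction"].rank(method="first", ascending=False)
    out["Rank"] = ranks.astype(int) - 1
    return out


def daily_spread_returns(df: pd.DataFrame, portfolio_size: int = 200,
                         toprank_weight_ratio: float = 2.0) -> pd.Series:
    """Hypothetical helper: weighted long return of the top ranks minus weighted
    short return of the bottom ranks, computed separately for each Date."""
    weights = np.linspace(toprank_weight_ratio, 1.0, portfolio_size)
    spreads = {}
    for date, day in df.groupby("Date"):
        targets = day.sort_values("Rank")["Target"].to_numpy()
        if len(targets) < portfolio_size:
            raise ValueError(f"{date}: fewer than {portfolio_size} securities")
        long_ret = (targets[:portfolio_size] * weights).sum() / weights.mean()
        short_ret = (targets[::-1][:portfolio_size] * weights).sum() / weights.mean()
        spreads[date] = long_ret - short_ret
    return pd.Series(spreads)


def spread_return_sharpe(df: pd.DataFrame, **kwargs) -> float:
    """Hypothetical helper: mean / standard deviation of the daily spread returns."""
    daily = daily_spread_returns(df, **kwargs)
    return float(daily.mean() / daily.std())


# Toy example: 3 dates x 4 securities, scored with a 1-stock portfolio on each side
toy = pd.DataFrame({
    "Date": [20211201] * 4 + [20211202] * 4 + [20211203] * 4,
    "SecuritiesCode": [1301, 1332, 1333, 1375] * 3,
    "Prediction": [0.4, 0.3, 0.2, 0.1, 0.1, 0.2, 0.3, 0.4, 0.4, 0.3, 0.2, 0.1],
    "Target": [0.05, 0.01, -0.01, -0.03, 0.02, 0.00, 0.01, 0.03, 0.00, 0.00, 0.00, -0.02],
})
ranked = add_rank(toy)
print(daily_spread_returns(ranked, portfolio_size=1).round(4).tolist())  # [0.08, 0.01, 0.02]
print(spread_return_sharpe(ranked, portfolio_size=1))
```

Applied to a held-out slice of `stock_prices.csv` with the model's output stored in a `Prediction` column, `spread_return_sharpe(add_rank(df))` would give a score closer to what the leaderboard rewards than RMSE does.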