<h2>Hello. This time, I tried Optuna.
I used <a href="https://www.kaggle.com/code/paulorzp/jpx-simple-overfitting-model">this notebook</a> as a baseline.
I hope you enjoy reading this notebook. Please leave a comment.</h2>

<h2>I updated this notebook with feature engineering based on <a href="https://www.kaggle.com/code/lucasmorin/jpx-online-feature-engineering-prices">this notebook</a>, which improved the public score!</h2>

# Import

In [None]:
import numpy as np
import pandas as pd
import jpx_tokyo_market_prediction

In [None]:
path = "../input/jpx-tokyo-stock-exchange-prediction/"
prices = pd.read_csv(f"{path}supplemental_files/stock_prices.csv")

In [None]:
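def prep_prices(prices):
    # Convert dates to integer nanoseconds since the epoch
    prices["Date"] = pd.to_datetime(prices["Date"]).astype("int64")
    # Missing volume becomes 1; every other missing value becomes 0
    prices["Volume"] = prices["Volume"].fillna(1)
    prices = prices.fillna(0)
    return prices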

In [None]:
prices = prep_prices(prices)

In [None]:
prices.head()

# Split data and feature engineering

In [None]:
Base_Features  = ['Side','ret_H','ret_L','ret','ret_Div','log_Dollars','GK_sqrt_vol','RS_sqrt_vol']

In [None]:
prices.describe()

In [None]:
diff = prices["High"]-prices["Low"]

In [None]:
diff.min()

In [None]:
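def Base_FE(df):
    # The +1 in each denominator avoids division by zero
    # (for example when High equals Low, or a price was filled with 0).

    # Position of the open/close midpoint relative to the high/low midpoint
    df['Avg_Price'] = (df['Close']+df['Open'])/2
    df['Avg_Price_HL'] = (df['High']+df['Low'])/2
    df['Side'] = 2*(df['Avg_Price']-df['Avg_Price_HL'])/(df['High']-df['Low']+1)

    # High, low, close and expected dividend relative to the open
    df['ret_H'] = df['High']/(df['Open']+1)
    df['ret_L'] = df['Low']/(df['Open']+1)
    df['ret'] = df['Close']/(df['Open']+1)
    df['ret_Div'] = df['ExpectedDividend']/(df['Open']+1)

    # Log of the approximate traded value
    df['log_Dollars'] = np.log(df['Avg_Price']*df['Volume'])

    # Garman-Klass and Rogers-Satchell style volatility (NaN where the variance estimate is negative)
    df['GK_sqrt_vol'] = np.sqrt((1 / 2 * np.log(df['High']/(df['Low']+1)) ** 2 - (2 * np.log(2) - 1) * np.log(df['Close']/(df['Open']+1)) ** 2))
    df['RS_sqrt_vol'] = np.sqrt(np.log(df['High']/(df['Close']+1))*np.log(df['High']/(df['Open']+1)) + np.log(df['Low']/(df['Close']+1))*np.log(df['Low']/(df['Open']+1)))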
    
    df[Base_Features] = df[Base_Features].astype('float32')
    
    return df
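
For reference, `GK_sqrt_vol` and `RS_sqrt_vol` appear to be modelled on the standard Garman-Klass and Rogers-Satchell single-day variance estimators. With $O$, $H$, $L$, $C$ the open, high, low and close, the textbook forms are shown below. Note that `Base_FE` adds 1 to the denominators, so its values only approximate these expressions.

$$\sigma_{GK}^2 = \tfrac{1}{2}\left(\ln\frac{H}{L}\right)^2 - (2\ln 2 - 1)\left(\ln\frac{C}{O}\right)^2$$

$$\sigma_{RS}^2 = \ln\frac{H}{C}\,\ln\frac{H}{O} + \ln\frac{L}{C}\,\ln\frac{L}{O}$$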

In [None]:
prices = Base_FE(prices)

In [None]:
X = prices[Base_Features]
y = prices["Target"]
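
Because `prep_prices` fills missing prices with 0, the `np.log` calls in `Base_FE` can return `-inf` and `np.sqrt` can return NaN for some rows, and scikit-learn estimators reject infinite inputs. Below is a minimal sketch of one way to clean the feature matrix before fitting. `clean_features` is a hypothetical helper (it is not part of the original notebook), and filling with 0 is just one possible choice.

```python
import numpy as np
import pandas as pd


def clean_features(X: pd.DataFrame) -> pd.DataFrame:
    """Replace +/-inf with NaN, then fill every NaN with 0 (an assumed default)."""
    return X.replace([np.inf, -np.inf], np.nan).fillna(0.0)


# Tiny demo frame that reproduces the two failure modes
with np.errstate(divide="ignore", invalid="ignore"):
    demo = pd.DataFrame({
        "log_Dollars": np.log(np.array([0.0, 1500.0])),   # log(0) gives -inf
        "GK_sqrt_vol": np.sqrt(np.array([-0.01, 0.04])),  # sqrt of a negative gives NaN
    })

cleaned = clean_features(demo)
print(cleaned)
```

In this notebook it could be applied as `X = clean_features(prices[Base_Features])`, and again on the features inside the prediction loop.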

In [None]:
prices.head()

# Try Optuna

<h4>Because the search took too long, I commented out Optuna in this version.</h4>

In [None]:
from sklearn.ensemble import RandomForestRegressor
# Uncomment these two imports (and the study further down) to run the Optuna search
#from sklearn.model_selection import cross_val_score
#import optuna

In [None]:
def objective(trial):
    max_depth = trial.suggest_int('max_depth', 1, 1000)
    
    regr = RandomForestRegressor(max_depth = max_depth, n_jobs=2)
    score = cross_val_score(regr, X, y, cv=5, scoring="r2")
    r2_mean = score.mean()
    return r2_mean

**I only optimized `max_depth`, but you may improve your score by optimizing more hyperparameters, though it will take more time.**
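
As a rough sketch of what a wider search space could look like: the ranges below are illustrative assumptions, not tuned values, and `suggest_rf_params` and `LowerBoundTrial` are hypothetical names. `LowerBoundTrial` is only a stand-in for an Optuna trial so that the cell runs without Optuna installed.

```python
def suggest_rf_params(trial):
    """Sample RandomForestRegressor keyword arguments from an Optuna-style trial."""
    return {
        "n_estimators": trial.suggest_int("n_estimators", 50, 300),
        "max_depth": trial.suggest_int("max_depth", 2, 64, log=True),
        "min_samples_leaf": trial.suggest_int("min_samples_leaf", 1, 50),
        "max_features": trial.suggest_float("max_features", 0.2, 1.0),
    }


class LowerBoundTrial:
    """Hypothetical stand-in for an Optuna trial: always returns the lower bound."""

    def suggest_int(self, name, low, high, log=False):
        return low

    def suggest_float(self, name, low, high, log=False):
        return low


params = suggest_rf_params(LowerBoundTrial())
print(params)
```

Inside `objective`, the model could then be built as `RandomForestRegressor(**suggest_rf_params(trial), n_jobs=2)`.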

In [None]:
#study = optuna.create_study(direction='maximize')
#study.optimize(objective, timeout = 60)

#model = RandomForestRegressor(max_depth = study.best_params['max_depth'], n_jobs=2)
model = RandomForestRegressor(max_depth = 200, n_jobs=2)
model.fit(X,y)

**I set the search timeout to 60 seconds. A longer timeout may also improve your score.**

In [None]:
env = jpx_tokyo_market_prediction.make_env()
iter_test = env.iter_test()

for (prices, options, financials, trades, secondary_prices, sample_prediction) in iter_test:
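    # Apply the same preprocessing and feature engineering that was used for training
    prices = Base_FE(prep_prices(prices))
    sample_prediction["Prediction"] = model.predict(prices[Base_Features])
    sample_prediction["rate"] = sample_prediction["Prediction"]/prices["Volume"]
    sample_prediction.sort_values(by = "rate", ascending=False, inplace=True)
    # Rank 0 goes to the highest rate; use the actual row count instead of a fixed 2000
    sample_prediction["Rank"] = np.arange(len(sample_prediction))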
    sample_prediction.sort_values(by = "SecuritiesCode", ascending=True, inplace=True)
    submission = sample_prediction[["Date","SecuritiesCode","Rank"]]
    env.predict(submission)
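
The sort, number, and sort-back steps in the loop above can also be written as a single ranking call. This is a minimal sketch on made-up data; `add_rank` is a hypothetical helper, and sending NaN scores to the bottom is an assumed choice.

```python
import pandas as pd


def add_rank(pred: pd.DataFrame, score_col: str = "rate") -> pd.DataFrame:
    """Return a copy with a 0-based Rank column, where 0 is the highest score."""
    out = pred.copy()
    ranks = out[score_col].rank(method="first", ascending=False, na_option="bottom")
    out["Rank"] = ranks.astype(int) - 1
    return out


# Made-up scores for four securities
demo = pd.DataFrame({
    "SecuritiesCode": [1301, 1332, 1333, 1375],
    "rate": [0.2, -0.1, 0.5, 0.0],
})
ranked = add_rank(demo)
print(ranked)
```

With `method="first"`, ties are broken by row order, so every rank from 0 to n-1 is used exactly once and the row order is left unchanged.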

**<h3>That's all. Thank you for reading! I would be glad if you upvoted this notebook.</h3>**