## Building a CAE Surrogate Model with LightGBM
This notebook contains the code used in [【実践python】AIによるCAEサロゲートモデルの作り方](https://hasimoto-soken.com/%e3%80%90%e5%ae%9f%e8%b7%b5python%e3%80%91ai%e3%81%ab%e3%82%88%e3%82%8bcae%e3%82%b5%e3%83%ad%e3%82%b2%e3%83%bc%e3%83%88%e3%83%a2%e3%83%87%e3%83%ab%e3%81%ae%e4%bd%9c%e3%82%8a%e6%96%b9/) ("Practical Python: How to Build a CAE Surrogate Model with AI").<br>
I tried to keep the LightGBM implementation as simple as possible. If you see anything that could be improved, I would appreciate a comment.<br>

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns

In [None]:
import lightgbm as lgb

In [None]:
data_all = pd.read_csv("../input/210718-cae-resultcsv/210718_cae_result.csv")
data_all

In [None]:
data_all.describe()

In [None]:
train = data_all[:80]
test = data_all[80:]
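
The slice above takes the first 80 rows as training data in file order. If the CSV rows happen to be ordered by a design parameter (an assumption, since the file layout is not described here), a shuffled split may give a more representative test set. A minimal sketch, using a hypothetical 100-row stand-in for `data_all` and a hypothetical helper named `shuffled_split`:

```python
import numpy as np
import pandas as pd


def shuffled_split(df, n_train, seed=42):
    """Shuffle the rows reproducibly, then split them into train and test."""
    shuffled = df.sample(frac=1.0, random_state=seed)
    return shuffled.iloc[:n_train], shuffled.iloc[n_train:]


# Hypothetical stand-in for data_all: random values under the notebook's column names
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.random((100, 5)), columns=["fai", "L", "d", "x", "P"])

train_demo, test_demo = shuffled_split(demo, n_train=80)
print(len(train_demo), len(test_demo))
```

With the real data, `train, test = shuffled_split(data_all, n_train=80)` would replace the two slices; the fixed seed keeps the split reproducible between runs.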

In [None]:
# Training data
train_x = train[["fai","L","d","x"]]
train_y = train[["P"]]

# Test data
test_x = test[["fai","L","d","x"]]
test_y = test[["P"]]

In [None]:
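%%time
params = {
    "objective": "regression",
    "metric": "rmse",
    "num_leaves": 40,
    "learning_rate": 0.01,
    "bagging_fraction": 0.8,
    "feature_fraction": 0.4,
    # The LightGBM parameter is "bagging_freq"; "bagging_frequency" is not a recognized name
    "bagging_freq": 6,
    "bagging_seed": 42,
    "verbosity": -1,
    "seed": 42,
}

train_lgb = lgb.Dataset(train_x, label=train_y)
# Reference the training set so the validation data uses the same feature binning
test_lgb = lgb.Dataset(test_x, label=test_y, reference=train_lgb)
evals_result = {}
# Early stopping, logging, and metric recording are passed as callbacks;
# the equivalent keyword arguments were removed from lgb.train in LightGBM 4.0
model_lgb = lgb.train(
    params,
    train_lgb,
    num_boost_round=10000,
    valid_sets=[train_lgb, test_lgb],
    callbacks=[
        lgb.early_stopping(stopping_rounds=200),
        lgb.log_evaluation(period=1000),
        lgb.record_evaluation(evals_result),
    ],
)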
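
The `evals_result` dictionary filled during training maps each dataset name to a metric name and then to a list with one score per boosting round. The sketch below shows one way to read the best round out of such a history. The history here is hypothetical, and the key names `"training"` and `"valid_1"` are an assumption about the default naming for this `valid_sets` order, so it is worth checking `evals_result.keys()` first. `best_round` is an illustrative helper, not part of LightGBM.

```python
import numpy as np

# Hypothetical history in the nested shape described above (values are made up)
evals_result_demo = {
    "training": {"rmse": [5.0, 3.0, 2.0, 1.5, 1.2]},
    "valid_1": {"rmse": [5.5, 3.8, 3.1, 3.3, 3.6]},
}


def best_round(history, dataset="valid_1", metric="rmse"):
    """Return the 1-based round with the lowest score, and that score."""
    scores = np.asarray(history[dataset][metric], dtype=float)
    best_index = int(np.argmin(scores))
    return best_index + 1, float(scores[best_index])


round_demo, score_demo = best_round(evals_result_demo)
print(round_demo, score_demo)
```

When early stopping triggers, the trained booster also exposes the selected round as `model_lgb.best_iteration`, which should agree with the value read from the history.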

In [None]:
pred_lgb = model_lgb.predict(test_x)
pred_lgb
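
A single number often makes the raw predictions above easier to judge. A minimal sketch that computes RMSE and the coefficient of determination with NumPy only; `regression_metrics` is a hypothetical helper and the input values are made up for illustration:

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return (RMSE, R^2) for two array-likes of equal length."""
    # ravel() flattens a one-column DataFrame such as test_y to 1-D
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    residual = y_true - y_pred
    rmse = float(np.sqrt(np.mean(residual ** 2)))
    ss_res = float(np.sum(residual ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    return rmse, r2


# Made-up values: three exact predictions and one that is off by 2
rmse_demo, r2_demo = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
print(rmse_demo, r2_demo)
```

In this notebook the call would be `regression_metrics(test_y, pred_lgb)`. Because the same test rows also drive early stopping, the resulting score is likely to be somewhat optimistic.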

In [None]:
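# Predicted vs. actual P on the test set
pred_d = pd.DataFrame({"pred_y": pred_lgb})
test_y_reindex = test_y.reset_index(drop=True)

ax = sns.scatterplot(x=test_y_reindex["P"], y=pred_d["pred_y"])
# Reference line y = x: points on this line are predicted perfectly
p_min, p_max = test_y_reindex["P"].min(), test_y_reindex["P"].max()
ax.plot([p_min, p_max], [p_min, p_max], color="gray", linestyle="--")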

In [None]:
# Compute feature importance and collect it in a DataFrame

# Feature names (every column except the target P)
cols = list(train_x.columns)

# importance_type="gain" (recommended): total reduction in training loss from splits on each feature
f_importance = np.array(model_lgb.feature_importance(importance_type="gain"))
# Normalize so the importances sum to 1 (comment out if not needed)
f_importance = f_importance / np.sum(f_importance)
df_importance = pd.DataFrame({"feature": cols, "importance": f_importance})
# Sort in descending order
df_importance = df_importance.sort_values("importance", ascending=False)
df_importance

In [None]:
# Enter the design values to predict
list_new = [[103, 510, 24, 12], [101, 505, 22, 11]]
test_new = pd.DataFrame(list_new, columns=["fai", "L", "d", "x"])
test_new

In [None]:
pred_lgb_new = model_lgb.predict(test_new)
test_new["pred"] = pred_lgb_new
test_new
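
A surrogate model is cheap to evaluate, so instead of typing candidate designs by hand it can be queried over a whole grid. The sketch below builds every combination of a few levels per feature. `make_design_grid` is a hypothetical helper, and the levels are taken from the two example rows above purely for illustration.

```python
from itertools import product

import pandas as pd

# Feature order must match the columns used for training
FEATURES = ["fai", "L", "d", "x"]


def make_design_grid(levels):
    """Return one row per combination of the given per-feature levels."""
    rows = list(product(*(levels[name] for name in FEATURES)))
    return pd.DataFrame(rows, columns=FEATURES)


# Illustrative levels only; real candidates depend on the design problem
levels_demo = {"fai": [101, 103], "L": [505, 510], "d": [22, 24], "x": [11, 12]}
grid_demo = make_design_grid(levels_demo)
print(grid_demo.shape)

# With the trained model from this notebook:
# grid_demo["pred"] = model_lgb.predict(grid_demo[FEATURES])
```

Tree-based models such as LightGBM do not extrapolate beyond the training data, so it is safer to keep the grid levels within the ranges reported by `data_all.describe()`.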