## Submission Code (Submission Notebook)

### Overview
- **Objective**: Load a trained model from the published Dataset (`/kaggle/input/mitsui-lgbm-models-v1/models`), make predictions on test data, and create a submission file.
- **Outcome**: `/kaggle/working/submission.parquet` (automatically used by the Kaggle API for submission).

### Steps
1. **Data Load**
- Test data from `test.csv` arrives in batches through the evaluation gateway; apply the same preprocessing as during training (`preprocess_for_lgbm`).

2. **Load Meta Information**
- Load `meta.json` and recreate `feat_cols` and `label_cols`.

3. **Model Load & Inference**
- Loop through `label_cols` and load the saved LightGBM model (`<target>.pkl`) for each target.
- Predict each target with its model; a target without a saved model is predicted as `0.0`.

4. **Constructing submission data**
- `submission = pd.DataFrame({"date_id": test["date_id"], ...})`
- The output format is parquet (`submission.parquet`).

5. **Confirmation code**
- Just before submission, print the shape of the submission and inspect the first rows with `submission.head()`.
- **CSV output is not required** (Kaggle submissions automatically use parquet).

Training notebook: https://www.kaggle.com/code/shunyafukuda/baseline-lgbm-train/
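
Steps 2 and 3 depend on the files found in the model directory. As a minimal sketch, assuming the layout that the loading code implies (a `meta.json` with `feat_cols` and `label_cols`, plus one `<target>.pkl` or `<target>.skip` file per target), the hypothetical helper `scan_artifacts` reports which targets have a model, which were skipped deliberately, and which have no file at all. It is not part of the original notebook and only checks that files exist.

```python
import json
import os
import tempfile


def scan_artifacts(model_dir: str) -> dict:
    """Classify each target listed in meta.json by the file found for it."""
    with open(os.path.join(model_dir, "meta.json"), "r") as f:
        meta = json.load(f)
    status = {}
    for tgt in meta["label_cols"]:
        if os.path.exists(os.path.join(model_dir, f"{tgt}.pkl")):
            status[tgt] = "model"
        elif os.path.exists(os.path.join(model_dir, f"{tgt}.skip")):
            status[tgt] = "skipped"
        else:
            status[tgt] = "missing"
    return status


# Demo on a throwaway directory; the files are empty placeholders, not real models.
with tempfile.TemporaryDirectory() as tmp:
    meta = {"feat_cols": ["f0", "f1"],
            "label_cols": ["target_0", "target_1", "target_2"]}
    with open(os.path.join(tmp, "meta.json"), "w") as f:
        json.dump(meta, f)
    open(os.path.join(tmp, "target_0.pkl"), "wb").close()
    open(os.path.join(tmp, "target_1.skip"), "w").close()
    demo_status = scan_artifacts(tmp)

print(demo_status)
```

Running a check like this against the attached Dataset before inference makes it visible when a target silently falls back to `0.0` because its file is missing rather than intentionally skipped.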

In [1]:
import os, sys, json, warnings, joblib
warnings.filterwarnings("ignore")
import numpy as np
import pandas as pd
import polars as pl
import kaggle_evaluation.mitsui_inference_server

In [2]:
DATA_PATH = "/kaggle/input/mitsui-commodity-prediction-challenge"
MODEL_INPUT_DIR = "/kaggle/input/mitsui-lgbm-models-v1/models" # Replace with the path of the Dataset you attached via "Add data"

def preprocess_for_lgbm(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    obj = df.select_dtypes(include=["object"]).columns
    if len(obj) > 0:
        df[obj] = df[obj].apply(pd.to_numeric, errors="coerce")
    for c in df.select_dtypes(include=["category"]).columns:
        df[c] = df[c].cat.codes
    return df

# Load meta and the models
with open(os.path.join(MODEL_INPUT_DIR, "meta.json"), "r") as f:
    meta = json.load(f)
feat_cols  = meta["feat_cols"]
label_cols = meta["label_cols"]

trained_models = {}
for tgt in label_cols:
    pkl = os.path.join(MODEL_INPUT_DIR, f"{tgt}.pkl")
    # A target without a saved model (marked by a .skip file, or with no file
    # at all) is stored as None; predict() then outputs 0.0 for it.
    trained_models[tgt] = joblib.load(pkl) if os.path.exists(pkl) else None

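def predict(test_batch: pl.DataFrame | pd.DataFrame, lag1, lag2, lag3, lag4) -> pd.DataFrame:
    # Called once per batch by the inference server. lag1..lag4 are accepted
    # for the call signature but are not used by this baseline.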
    if isinstance(test_batch, pl.DataFrame):
        Xb_raw = test_batch.to_pandas()
    else:
        Xb_raw = test_batch
    Xb_raw = preprocess_for_lgbm(Xb_raw)
    Xb = Xb_raw[feat_cols]

    out = {}
    for tgt in label_cols:
        mdl = trained_models.get(tgt)
        if mdl is None:
            out[tgt] = 0.0
        else:
            yhat = mdl.predict(Xb, num_iteration=getattr(mdl, "best_iteration_", None))
            out[tgt] = float(np.asarray(yhat).mean())
    return pd.DataFrame([out], columns=label_cols)

# ===== Generate submission via gateway =====
server = kaggle_evaluation.mitsui_inference_server.MitsuiInferenceServer(predict)
if os.getenv('KAGGLE_IS_COMPETITION_RERUN'):
    server.serve()  # Actual competition submission (rerun)
else:
    server.run_local_gateway((DATA_PATH,))  # Local verification
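
To see what `preprocess_for_lgbm` does to a batch, the sketch below repeats the helper and applies it to a small toy frame. The column names and values (`price`, `bucket`, `volume`) are made up for illustration and do not come from the competition data.

```python
import pandas as pd


def preprocess_for_lgbm(df: pd.DataFrame) -> pd.DataFrame:
    # Same logic as the notebook's helper, repeated so the sketch runs alone.
    df = df.copy()
    obj = df.select_dtypes(include=["object"]).columns
    if len(obj) > 0:
        df[obj] = df[obj].apply(pd.to_numeric, errors="coerce")
    for c in df.select_dtypes(include=["category"]).columns:
        df[c] = df[c].cat.codes
    return df


# Hypothetical toy batch: numbers stored as text, one categorical column,
# and one column that is already numeric.
toy = pd.DataFrame({
    "price": pd.Series(["1.5", "n/a", "3"], dtype=object),
    "bucket": pd.Series(["b", "a", "b"], dtype="category"),
    "volume": [10.0, 20.0, 30.0],
})
clean = preprocess_for_lgbm(toy)
print(clean)
print(clean.dtypes)
```

Text that cannot be parsed as a number becomes `NaN`, and the input frame is left unchanged because the helper works on a copy. Note that `.cat.codes` are assigned from the categories present in the frame being processed, so they match the training encoding only if the categorical dtype carries the same category list at inference time.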

In [3]:
# # ===== Inspect submission artifact (Parquet -> CSV for inspection) =====
# # Run if necessary to check
# pq_path  = "/kaggle/working/submission.parquet"
# csv_out  = "/kaggle/working/submission_from_parquet.csv"

# sub_pl = pl.read_parquet(pq_path)
# print(f"[OK] Loaded Parquet: {pq_path} shape={sub_pl.shape}")

# # Export CSV for confirmation (not for submission)
# sub_pl.write_csv(csv_out)
# print(f"[OK] Wrote CSV for inspection: {csv_out}")

# sub = sub_pl.to_pandas()
# pd.set_option("display.max_columns", 30)
# sub.head()

[OK] Loaded Parquet: /kaggle/working/submission.parquet shape=(90, 425)
[OK] Wrote CSV for inspection: /kaggle/working/submission_from_parquet.csv


Unnamed: 0,date_id,target_0,target_1,target_2,target_3,target_4,target_5,target_6,target_7,target_8,target_9,target_10,target_11,target_12,target_13,...,target_409,target_410,target_411,target_412,target_413,target_414,target_415,target_416,target_417,target_418,target_419,target_420,target_421,target_422,target_423
0,1827,0.002249,-0.004962,0.003065,-0.00035,-0.002507,-0.002997,0.001531,-0.003299,0.000889,-0.002137,-0.005831,0.002035,0.000513,-0.002803,...,0.002775,0.002001,6.6e-05,0.00335,-0.003952,0.002364,0.004139,-0.017052,-0.006036,0.02103,0.004009,0.001242,0.008168,-0.003741,-0.001046
1,1828,-0.000761,-0.003804,0.000616,0.002976,-0.000451,-0.005634,-0.002855,-0.002105,-0.004473,0.002611,-0.001057,0.003303,0.000526,-0.003612,...,-0.000274,0.002027,-0.000608,0.002704,-0.000899,0.001573,0.004079,-0.006444,-0.002731,0.011609,0.003021,0.000611,-0.001916,0.001838,-0.001389
2,1829,0.002311,-0.006806,0.001511,0.001887,-0.003989,-0.001172,6e-06,-0.00317,-0.000277,2.8e-05,-0.005289,0.001196,0.000513,-0.004471,...,0.004146,0.00121,6.6e-05,0.001988,0.007714,0.002364,0.002013,-0.02392,-0.0017,0.018894,0.004461,0.000199,-0.005802,-0.001915,0.002561
3,1830,0.00026,-0.003855,0.000387,0.001725,0.003172,-0.005349,-0.002258,-0.001362,0.002132,0.002308,0.001227,0.000251,0.000513,-0.003952,...,0.003963,2.9e-05,6.6e-05,0.002959,0.004526,0.002364,-0.00106,-0.019551,0.000603,0.016095,0.002952,0.001242,-0.005879,0.006781,2.2e-05
4,1831,-0.003153,0.007162,-0.003021,-0.003253,-0.002157,-0.00146,0.002266,-0.000351,-0.002743,-0.005616,-0.002417,0.005981,9e-06,0.008758,...,0.003927,0.000772,6.6e-05,0.001584,0.000478,0.002364,-0.003112,-0.020931,0.000588,0.008073,0.001251,0.001242,-0.004273,0.009299,0.001098
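
The printed shape `(90, 425)` is consistent with one `date_id` column plus 424 targets (`target_0` through `target_423`). A minimal sketch of a column check along those lines follows; `check_submission_columns` is a hypothetical helper, and treating this exact column order as required is an assumption rather than something the source states.

```python
def check_submission_columns(columns, n_targets):
    """Return a list of problems found in a submission's column names."""
    columns = list(columns)
    expected = ["date_id"] + [f"target_{i}" for i in range(n_targets)]
    problems = []
    missing = [c for c in expected if c not in columns]
    extra = [c for c in columns if c not in expected]
    if missing:
        problems.append(f"missing columns: {missing}")
    if extra:
        problems.append(f"unexpected columns: {extra}")
    # Only report ordering when the set of names itself is correct.
    if not problems and columns != expected:
        problems.append("columns are out of order")
    return problems


good = ["date_id"] + [f"target_{i}" for i in range(424)]
bad = [c for c in good if c != "target_7"]

print(len(good), check_submission_columns(good, 424))
print(check_submission_columns(bad, 424))
```

With a loaded frame, the same check could be called as `check_submission_columns(sub.columns, 424)`.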
