# FDS Challenge: Starter Notebook

This notebook will guide you through the first steps of the competition. Our goal here is to show you how to:

1.  Load the `train.jsonl` and `test.jsonl` files from the competition data.
2.  Create a very simple set of features from the data.
3.  Train a basic model.
4.  Generate a `submission.csv` file in the correct format.
5.  Submit your results.

Let's get started!

In [None]:
from typing import Any
import json
import os
from pprint import pprint
import numpy as np
import pandas as pd

from sklearn.preprocessing import StandardScaler
from xgboost import XGBClassifier
from catboost import CatBoostClassifier
from lightgbm import LGBMClassifier

from sklearn.metrics import make_scorer, accuracy_score, precision_score, recall_score, f1_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_validate

pd.set_option("display.max_columns", 0)

### 1. Loading and Inspecting the Data

When you create a notebook within a Kaggle competition, the competition's data is automatically attached and available in the `../input/` directory.

The dataset is in a `.jsonl` format, which means each line is a separate JSON object. This is great because we can process it one line at a time without needing to load the entire large file into memory.

Let's write a simple loop to load the training data and inspect the first battle.

In [38]:
COMPETITION_NAME = 'fds-pokemon-battles-prediction-2025'
DATA_PATH = os.getcwd() #os.path.join('../input', COMPETITION_NAME)
UNTOUCHED = {'battle_id', 'player_won'}
INFORMATIVE = {
    "p1_unique_pokemon",
    "p2_unique_pokemon",
    "final_p1_hp",  
    "p1_fainted_count", 
    "p1_turns_statused", 
    "p1_missed_turns", 
    "p2_turns_statused",
    "p2_missed_turns", 
    "battle_id",
    "player_won"
}

train_file_path = os.path.join(DATA_PATH, 'train.jsonl')
test_file_path = os.path.join(DATA_PATH, 'test.jsonl')

print(f"Loading data from '{train_file_path}'...")
try:
    with open(train_file_path, 'r', encoding="utf-8") as f:
        train_data = [json.loads(line) for line in f]

    print(f"Successfully loaded {len(train_data)} battles.")

    #print("\n--- Structure of the first train battle: ---")
    if train_data:
        first_battle = train_data[0]
        
        battle_for_display = first_battle.copy()
        battle_for_display['battle_timeline'] = battle_for_display.get('battle_timeline', [])[:2] # Show first 2 turns
        
        #pprint(battle_for_display)
        if len(first_battle.get('battle_timeline', [])) > 3:
            print("    ...")
            print("    (battle_timeline has been truncated for display)")

except FileNotFoundError:
    print(f"ERROR: Could not find the training file at '{train_file_path}'.")
    print("Please make sure you have added the competition data to this notebook.")

Loading data from 'c:\Users\stefa\PycharmProjects\pokemon-challenge\train.jsonl'...
Successfully loaded 10000 battles.
    ...
    (battle_timeline has been truncated for display)


### 2. Basic Feature Engineering

A successful model will likely require creating many complex features. For this starter notebook, however, we will create a very simple feature set based **only on the initial team stats**. This will be enough to train a model and generate a submission file.

It's up to you to engineer more powerful features!

In [39]:
def features_check(data: dict) -> None:
    print("All battles have at least one turn: ", all(all(turn for turn in battle.get('battle_timeline', False)) for battle in data))
    print("All battles' turns have at least one P1 move: ", 
        all((
            any((turn.get("p1_move_details", False) for turn in battle.get('battle_timeline', False))) for battle in data
        ))
    )
    print("All battles' turns have at least one P2 move: ", 
        all((
            any((turn.get("p2_move_details", False) for turn in battle.get('battle_timeline', False))) for battle in data
        ))
    )
    print("player_won feature always exists: ", all(('player_won' in battle for battle in data)))
    print("P1 Team always exists: ", all(battle.get('p1_team_details', False) for battle in data))
    print("P2 Team always exists: ", all(battle.get('p2_team_details', False) for battle in data))
    
    return None

In [40]:
def agg_pokemons_stats(prefix: str, stats: dict[str, Any]):
    return {
        f"{prefix}_mean_power": np.mean(stats["powers"]) if stats["powers"] else 0,
        f"{prefix}_mean_accuracy": np.mean(stats["accuracy"]) if stats["accuracy"] else 0,
        f"{prefix}_lost_hp": stats["lost_hp"],
        f"{prefix}_turns_statused": stats["turns_statused"],
        f"{prefix}_missed_turns": stats["missed_turns"],
        f"{prefix}_switches": stats["switches"],
        f"{prefix}_net_boost": stats["net_boost"],
    }

In [41]:
def create_features(data: list[dict]) -> pd.DataFrame:
    feature_list = []
    
    features_check(data)

    for battle in data:
        features = {}

        p1_stats = {
            "powers": [], "accuracy": [], "hp_t0": {}, "lost_hp": 0, "turns_statused": 0,
            "missed_turns": 0, "priority": 0, "switches": 0, "net_boost": 0,
            "base_boosts": {"atk": 0, "def": 0, "spa": 0, "spd": 0, "spe": 0}
        }
        
        p2_stats = {
            "powers": [], "accuracy": [], "hp_t0": {}, "lost_hp": 0, "turns_statused": 0,
            "missed_turns": 0, "priority": 0, "switches": 0, "net_boost": 0,
            "base_boosts": {"atk": 0, "def": 0, "spa": 0, "spd": 0, "spe": 0}
        }

        # --- Initial Pokémon states ---
        first_turn = battle["battle_timeline"][0]
        p1_lead = first_turn.get("p1_pokemon_state", {}).get("name", "")
        p2_lead = battle.get("p2_lead_details", {}).get("name", "")
        p1_prev_hp = first_turn.get("p1_pokemon_state", {}).get("hp_pct", 1.0)
        p2_prev_hp = first_turn.get("p2_pokemon_state", {}).get("hp_pct", 1.0)
        
        # --- Player 1 Team Features ---
        p1_team = battle.get('p1_team_details', None)
        features['p1_mean_hp'] = np.mean([p.get('base_hp') for p in p1_team])
        features['p1_mean_spe'] = np.mean([p.get('base_spe') for p in p1_team])
        features['p1_mean_atk'] = np.mean([p.get('base_atk') for p in p1_team])
        features['p1_mean_def'] = np.mean([p.get('base_def') for p in p1_team])

        # --- Player 2 Lead Features ---
        if p2_lead := battle.get('p2_lead_details'):
            # Player 2's lead Pokémon's stats
            features['p2_lead_hp'] = p2_lead.get('base_hp')
            features['p2_lead_spe'] = p2_lead.get('base_spe')
            features['p2_lead_atk'] = p2_lead.get('base_atk')
            features['p2_lead_def'] = p2_lead.get('base_def')
        
        # --- Battle Timeline Features ---
        if timeline := battle.get('battle_timeline', []):
            p1_names = [t['p1_pokemon_state']['name'] for t in timeline if t.get('p1_pokemon_state')]
            p1_moves = [t['p1_move_details']['name'] for t in timeline if t.get('p1_move_details')]
            p2_names = [t['p2_pokemon_state']['name'] for t in timeline if t.get('p2_pokemon_state')]

            # Unique Pokémons
            features['p1_unique_pokemon'] = len(set(p1_names))
            #features['p1_unique_moves'] = len(set(p1_moves))
            features['p2_unique_pokemon'] = len(set(p2_names))

            # Compute damage dealt (approximate)
            # delta HP of opponent between turns
            p2_hp_deltas = []
            for t, t_stats in enumerate(timeline):
                p1_state = t_stats.get("p1_pokemon_state", {})
                p2_state = t_stats.get("p2_pokemon_state", {})

                # --- Moves and accuracy ---
                for player, stats, move_key in [
                    ("p1", p1_stats, "p1_move_details"),
                    ("p2", p2_stats, "p2_move_details")
                ]:
                    move = t_stats.get(move_key)
                    if move:
                        stats["powers"].append(move.get("base_power", 0))
                        stats["accuracy"].append(move.get("accuracy", 0))
                    else:
                        stats["missed_turns"] += 1

                # --- Status tracking ---
                if p1_state.get("status") != "nostatus":
                    p1_stats["turns_statused"] += 1
                if p2_state.get("status") != "nostatus":
                    p2_stats["turns_statused"] += 1
                
                # --- HP and damage tracking ---
                p1_name, p2_name = p1_state.get("name", ""), p2_state.get("name", "")
                p1_hp, p2_hp = p1_state.get("hp_pct", 1.0), p2_state.get("hp_pct", 1.0)

                # --- Switches ---
                if p1_name != p1_lead and p1_prev_hp > 0:
                    p1_stats["switches"] += 1

                if p2_name != p2_lead and p2_prev_hp > 0:
                    p2_stats["switches"] += 1

                # --- Boost tracking ---
                if p1_name != p1_lead:
                    p1_stats["base_boosts"] = {k: 0 for k in p1_stats["base_boosts"]}
                if p2_name != p2_lead:
                    p2_stats["base_boosts"] = {k: 0 for k in p2_stats["base_boosts"]}

                p1_boosts = p1_state.get("boosts", {})
                p2_boosts = p2_state.get("boosts", {})

                for stat in ["atk", "def", "spa", "spd", "spe"]:
                    p1_stats["net_boost"] += (p1_boosts.get(stat, 0) - p1_stats["base_boosts"].get(stat, 0))
                    p2_stats["net_boost"] += (p2_boosts.get(stat, 0) - p2_stats["base_boosts"].get(stat, 0))
                
                prev_hp = timeline[t-1]['p2_pokemon_state']['hp_pct']
                curr_hp = timeline[t]['p2_pokemon_state']['hp_pct']
                p2_hp_deltas.append(prev_hp - curr_hp)
            features['mean_damage_dealt'] = np.mean([d for d in p2_hp_deltas if d > 0]) if p2_hp_deltas else None

            # Final HP and KO counts
            last_state = timeline[-1]['p1_pokemon_state']
            features['final_p1_hp'] = last_state.get('hp_pct', None)
            features['p1_fainted_count'] = sum(t['p1_pokemon_state']['status'] == 'fnt' for t in timeline)
            features['p2_fainted_count'] = sum(t['p2_pokemon_state']['status'] == 'fnt' for t in timeline)
            
        else:
            features.update({
                'p1_unique_pokemon': None,
                'p1_unique_moves': None,
                'p2_unique_pokemon': None,
                'mean_damage_dealt': None,
                'final_p1_hp': None,
                'p1_fainted_count': None,
                'p2_fainted_count': None,
            })

        features.update(agg_pokemons_stats("p1", p1_stats))
        features.update(agg_pokemons_stats("p2", p2_stats))


        features['battle_id'] = battle.get('battle_id')
        if 'player_won' in battle:
            features['player_won'] = int(battle['player_won'])
            
        feature_list.append(features)
        
    return pd.DataFrame(feature_list) #.fillna(0) #NOTE accuracy improvement of 0.01

In [42]:
print("Processing training data...")
train_df = create_features(train_data)

print("\nProcessing test data...")
with open(test_file_path, 'r', encoding="utf-8") as f:
    test_data = [json.loads(line) for line in f]
        
test_df = create_features(test_data)

Processing training data...
All battles have at least one turn:  True
All battles' turns have at least one P1 move:  False
All battles' turns have at least one P2 move:  False
player_won feature always exists:  True
P1 Team always exists:  True
P2 Team always exists:  False


  return _methods._mean(a, axis=axis, dtype=dtype,
  ret = ret.dtype.type(ret / rcount)



Processing test data...
All battles have at least one turn:  True
All battles' turns have at least one P1 move:  False
All battles' turns have at least one P2 move:  False
player_won feature always exists:  False
P1 Team always exists:  True
P2 Team always exists:  False


In [43]:

#train_df = train_df.drop(columns=)
#test_df = test_df.drop(columns=)
#train_df.corr()

In [44]:
keepers = list(set(train_df.columns).difference(UNTOUCHED))

In [45]:
scaler = StandardScaler(with_mean=True, with_std=True)

train_df[keepers] = scaler.fit_transform(train_df[keepers])
test_df[keepers] = scaler.fit_transform(test_df[keepers])

In [46]:
print("\nTraining dataset preview:")
display(train_df.head())
display(train_df.describe())
display(train_df.dtypes)

print("\nTesting dataset preview:")
display(test_df.head())
display(test_df.describe())
display(test_df.dtypes)


Training dataset preview:


Unnamed: 0,p1_mean_hp,p1_mean_spe,p1_mean_atk,p1_mean_def,p2_lead_hp,p2_lead_spe,p2_lead_atk,p2_lead_def,p1_unique_pokemon,p2_unique_pokemon,mean_damage_dealt,final_p1_hp,p1_fainted_count,p2_fainted_count,p1_mean_power,p1_mean_accuracy,p1_lost_hp,p1_turns_statused,p1_missed_turns,p1_switches,p1_net_boost,p2_mean_power,p2_mean_accuracy,p2_lost_hp,p2_turns_statused,p2_missed_turns,p2_switches,p2_net_boost,battle_id,player_won
0,0.202093,0.520813,-0.732064,-0.745443,-0.279683,0.490498,0.642099,1.185993,-1.404495,-1.33184,-0.308829,-0.821146,-0.956146,1.113808,-0.118412,-0.823055,0.0,-1.089872,-1.109216,-0.142418,-0.135324,0.446467,0.785598,0.0,1.042992,2.443488,0.093138,-0.31842,0,1
1,0.761596,-1.738011,-0.732064,-0.492591,-0.450188,0.723525,-0.711192,-0.611967,0.91852,0.950271,-1.393217,-0.382753,0.405885,-0.60027,1.450473,0.172684,0.0,-0.341616,0.192108,0.039076,-0.135324,0.170319,0.322559,0.0,-1.090316,0.158909,0.093138,-0.31842,1,1
2,0.823762,-1.224642,0.906915,0.097399,6.199509,-2.538851,-3.147117,-2.409928,-2.566002,-1.33184,-0.661738,-0.189724,-0.956146,-0.60027,-1.285978,-0.326265,0.0,0.40664,-1.109216,-1.594365,0.308258,-0.30951,-0.358604,0.0,0.509665,0.485277,0.093138,-0.395747,2,1
3,0.637262,0.007444,-0.029644,-0.492591,0.231832,0.257471,1.99539,1.635483,-0.242987,-1.33184,0.158946,-1.513355,0.405885,-0.60027,0.404353,-0.060591,0.0,1.34196,0.192108,0.765049,-0.135324,1.549122,-0.440752,0.0,-1.090316,-0.493828,0.093138,-0.163767,3,1
4,0.07776,-0.403251,-0.263784,0.855957,-0.279683,0.490498,0.642099,1.185993,-0.242987,-0.190785,0.321136,1.133907,-0.956146,-0.60027,-1.299584,0.906155,0.0,-0.715744,-0.783885,-1.049885,-0.312756,-0.686812,0.363574,0.0,1.93187,-0.820197,0.093138,-0.163767,4,1


Unnamed: 0,p1_mean_hp,p1_mean_spe,p1_mean_atk,p1_mean_def,p2_lead_hp,p2_lead_spe,p2_lead_atk,p2_lead_def,p1_unique_pokemon,p2_unique_pokemon,mean_damage_dealt,final_p1_hp,p1_fainted_count,p2_fainted_count,p1_mean_power,p1_mean_accuracy,p1_lost_hp,p1_turns_statused,p1_missed_turns,p1_switches,p1_net_boost,p2_mean_power,p2_mean_accuracy,p2_lost_hp,p2_turns_statused,p2_missed_turns,p2_switches,p2_net_boost,battle_id,player_won
count,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,9988.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0
mean,-4.185097e-16,-9.471535e-16,7.389644e-17,3.140599e-16,5.115908e-17,1.605827e-16,1.012523e-16,7.105426999999999e-19,-1.136868e-16,-2.842171e-16,-3.083903e-16,-2.046363e-16,1.136868e-17,-1.527667e-17,-1.548983e-16,1.150369e-15,0.0,1.136868e-17,-1.364242e-16,-1.747935e-16,1.49214e-17,1.620037e-16,-1.056577e-15,0.0,-9.094947000000001e-17,1.364242e-16,-3.069545e-16,7.105427e-18,4999.5,0.5
std,1.00005,1.00005,1.00005,1.00005,1.00005,1.00005,1.00005,1.00005,1.00005,1.00005,1.00005,1.00005,1.00005,1.00005,1.00005,1.00005,0.0,1.00005,1.00005,1.00005,1.00005,1.00005,1.00005,0.0,1.00005,1.00005,1.00005,1.00005,2886.89568,0.500025
min,-3.714422,-3.58614,-3.073463,-2.262559,-0.620693,-3.470958,-3.147117,-2.409928,-4.889016,-4.755008,-3.123789,-1.623657,-1.637162,-0.6002704,-3.124315,-25.66253,0.0,-2.39932,-2.085208,-4.135274,-2.708099,-3.154841,-24.70957,0.0,-1.979194,-2.12567,-10.73681,-2.483573,0.0,0.0
25%,-0.2952418,-0.8139467,-0.7320642,-0.7454431,-0.4501879,-0.4416091,-0.7111924,-0.6119674,-0.2429873,-0.1907845,-0.6751185,-0.8488874,-0.956146,-0.6002704,-0.6608279,-0.4256231,0.0,-0.7157443,-0.7838847,-0.3239114,-0.2240399,-0.6610847,-0.4070766,0.0,-0.7347645,-0.8201966,0.09313754,-0.2410936,2499.75,0.0
50%,0.2642603,0.007443852,-0.2637844,-0.1554538,-0.2796828,0.4904982,0.1007823,0.06226787,-0.2429873,-0.1907845,-0.07287296,0.196335,-0.2751303,-0.6002704,0.04948732,0.2139934,0.0,0.03251172,-0.133223,0.220569,-0.1353235,0.06870744,0.1939377,0.0,-0.02366194,-0.1674597,0.09313754,-0.1637667,4999.5,0.5
75%,0.6372618,0.5208129,0.5557053,0.5188198,-0.1091778,0.723525,0.6420988,1.185993,0.91852,0.9502713,0.5653945,1.133907,0.4058853,1.113808,0.7220467,0.6812239,0.0,0.7807677,0.5174387,0.5835559,-0.1353235,0.7192934,0.6389051,0.0,0.6874406,0.4852773,0.09313754,-0.1637667,7499.25,1.0
max,1.694099,4.319744,4.62974,4.227324,6.199509,1.189579,3.835866,5.45615,0.91852,0.9502713,7.259643,1.133907,2.448932,6.256045,3.901601,1.164103,0.0,3.2126,7.674717,1.128036,14.23673,4.33588,1.108321,0.0,2.998524,7.665383,0.09313754,13.13646,9999.0,1.0


p1_mean_hp           float64
p1_mean_spe          float64
p1_mean_atk          float64
p1_mean_def          float64
p2_lead_hp           float64
p2_lead_spe          float64
p2_lead_atk          float64
p2_lead_def          float64
p1_unique_pokemon    float64
p2_unique_pokemon    float64
mean_damage_dealt    float64
final_p1_hp          float64
p1_fainted_count     float64
p2_fainted_count     float64
p1_mean_power        float64
p1_mean_accuracy     float64
p1_lost_hp           float64
p1_turns_statused    float64
p1_missed_turns      float64
p1_switches          float64
p1_net_boost         float64
p2_mean_power        float64
p2_mean_accuracy     float64
p2_lost_hp           float64
p2_turns_statused    float64
p2_missed_turns      float64
p2_switches          float64
p2_net_boost         float64
battle_id              int64
player_won             int64
dtype: object


Testing dataset preview:


Unnamed: 0,p1_mean_hp,p1_mean_spe,p1_mean_atk,p1_mean_def,p2_lead_hp,p2_lead_spe,p2_lead_atk,p2_lead_def,p1_unique_pokemon,p2_unique_pokemon,mean_damage_dealt,final_p1_hp,p1_fainted_count,p2_fainted_count,p1_mean_power,p1_mean_accuracy,p1_lost_hp,p1_turns_statused,p1_missed_turns,p1_switches,p1_net_boost,p2_mean_power,p2_mean_accuracy,p2_lost_hp,p2_turns_statused,p2_missed_turns,p2_switches,p2_net_boost,battle_id
0,0.347635,0.297212,-0.497947,-0.937347,-0.111797,1.206605,0.120196,0.067841,-0.25292,-0.210231,0.358433,1.132883,1.082543,-0.602802,-0.069245,0.803228,0.0,1.15507,1.16616,0.216933,-0.147568,0.672315,-0.350403,0.0,1.0109,-0.172866,0.097413,1.52641,0
1,-3.135315,2.448247,2.49983,2.635469,-0.437793,0.730007,-0.697689,-0.607419,-1.414171,0.926151,-1.763069,1.132883,-0.957938,-0.602802,-0.312162,-2.03769,0.0,-1.600432,-1.116337,0.952675,3.230791,1.285556,1.15332,0.0,0.313102,3.749966,0.097413,-0.147762,1
2,0.531594,-1.751393,1.825911,1.784799,-0.437793,0.730007,-0.697689,-0.607419,-0.25292,0.926151,-2.366457,1.132883,-0.957938,-0.602802,-1.296566,-3.555448,0.0,-2.151533,-0.464195,0.400869,-0.147568,-0.48058,-0.290254,0.0,1.0109,-0.499769,0.097413,0.354489,2
3,0.102357,-0.52223,-1.078912,0.083458,2.985166,-3.559383,2.573852,0.292928,-2.575421,-0.210231,0.457273,-0.736347,-1.638098,-0.602802,-1.357988,0.803228,0.0,1.15507,-1.442408,-0.150937,-0.312366,0.683265,0.707772,0.0,1.0109,-0.172866,0.097413,-0.31518,3
4,0.286316,0.297212,-0.381754,-0.512012,-0.274795,0.253408,0.120196,0.067841,-0.25292,0.926151,1.211936,-1.094245,0.402383,1.163909,0.755483,0.528891,0.0,-0.314531,-0.138124,0.032998,-0.229967,0.208003,-0.684564,0.0,-0.384696,-0.172866,0.097413,-0.231471,4


Unnamed: 0,p1_mean_hp,p1_mean_spe,p1_mean_atk,p1_mean_def,p2_lead_hp,p2_lead_spe,p2_lead_atk,p2_lead_def,p1_unique_pokemon,p2_unique_pokemon,mean_damage_dealt,final_p1_hp,p1_fainted_count,p2_fainted_count,p1_mean_power,p1_mean_accuracy,p1_lost_hp,p1_turns_statused,p1_missed_turns,p1_switches,p1_net_boost,p2_mean_power,p2_mean_accuracy,p2_lost_hp,p2_turns_statused,p2_missed_turns,p2_switches,p2_net_boost,battle_id
count,5000.0,5000.0,5000.0,5000.0,5000.0,5000.0,5000.0,5000.0,5000.0,5000.0,4998.0,5000.0,5000.0,5000.0,5000.0,5000.0,5000.0,5000.0,5000.0,5000.0,5000.0,5000.0,5000.0,5000.0,5000.0,5000.0,5000.0,5000.0,5000.0
mean,-9.123369e-16,1.355005e-15,3.637979e-16,-7.560175e-16,-4.8316910000000003e-17,-3.467449e-16,-2.2737370000000003e-17,-9.166001e-17,-5.002221e-16,4.774847e-16,-1.734418e-16,1.136868e-17,0.0,2.557954e-17,3.836931e-17,1.720935e-15,0.0,1.364242e-16,-1.818989e-16,1.847411e-16,5.684342e-18,3.581135e-16,-1.016076e-15,0.0,4.5474740000000006e-17,-1.136868e-16,-3.1263880000000006e-17,-5.684342e-18,2499.5
std,1.0001,1.0001,1.0001,1.0001,1.0001,1.0001,1.0001,1.0001,1.0001,1.0001,1.0001,1.0001,1.0001,1.0001,1.0001,1.0001,0.0,1.0001,1.0001,1.0001,1.0001,1.0001,1.0001,0.0,1.0001,1.0001,1.0001,1.0001,1443.520003
min,-3.638136,-3.595138,-2.821806,-2.213353,-0.6007913,-3.559383,-3.151344,-2.408115,-4.897923,-4.755759,-2.829711,-1.615985,-1.638098,-0.6028019,-3.120609,-25.20398,0.0,-2.335233,-2.09455,-4.197516,-2.207544,-3.182349,-25.57953,0.0,-1.954741,-2.134282,-10.26562,-5.002863,0.0
25%,-0.3268798,-0.8295211,-0.7303329,-0.7672129,-0.4377932,-0.4614907,-0.6976887,-0.6074195,-0.2529204,-0.2102307,-0.6672999,-0.8463018,-0.957938,-0.6028019,-0.6716449,-0.417459,0.0,-0.6819316,-0.7902658,-0.3348728,-0.2299674,-0.6592964,-0.4263489,0.0,-0.7335948,-0.4997688,0.09741252,-0.2314711,1249.75
50%,0.2863157,0.09235111,-0.2655612,-0.1717436,-0.2747952,0.4917071,0.1201964,0.06784119,-0.2529204,-0.2102307,-0.0948633,0.1982678,-0.277778,-0.6028019,0.04031045,0.2106589,0.0,0.0528689,-0.1381237,0.2169334,-0.1475684,0.08097413,0.1909369,0.0,-0.03579702,-0.1728661,0.09741252,-0.1477625,2499.5
75%,0.654233,0.5020721,0.5477892,0.5087928,-0.1117971,0.7300065,0.6654531,1.193276,0.9083302,0.9261513,0.5626491,1.132883,0.402383,1.163909,0.7201829,0.6744088,0.0,0.7876694,0.5140184,0.5848042,-0.1475684,0.7336387,0.6672677,0.0,0.6620007,0.4809392,0.09741252,-0.1477625,3749.25
max,1.696665,5.009003,3.894145,4.251743,5.919131,1.206605,3.882468,5.469927,0.9083302,0.9261513,6.024642,1.132883,2.442864,6.464043,3.567549,1.132433,0.0,2.808371,7.687581,1.13661,14.02506,3.105814,1.15332,0.0,3.104293,7.672798,0.09741252,14.25012,4999.0


p1_mean_hp           float64
p1_mean_spe          float64
p1_mean_atk          float64
p1_mean_def          float64
p2_lead_hp           float64
p2_lead_spe          float64
p2_lead_atk          float64
p2_lead_def          float64
p1_unique_pokemon    float64
p2_unique_pokemon    float64
mean_damage_dealt    float64
final_p1_hp          float64
p1_fainted_count     float64
p2_fainted_count     float64
p1_mean_power        float64
p1_mean_accuracy     float64
p1_lost_hp           float64
p1_turns_statused    float64
p1_missed_turns      float64
p1_switches          float64
p1_net_boost         float64
p2_mean_power        float64
p2_mean_accuracy     float64
p2_lost_hp           float64
p2_turns_statused    float64
p2_missed_turns      float64
p2_switches          float64
p2_net_boost         float64
battle_id              int64
dtype: object

### 3. Training Models

In [None]:
# Define predictor features (X) and target (y)
X_train = train_df[keepers]
print(train_df.columns)
y_train = train_df['player_won']

X_test = test_df[keepers]

print("Training...")
model = XGBClassifier(
    random_state=100,
    n_estimators=200,
    learning_rate=0.05,
    max_depth=3,
    eval_metric='logloss',
    n_jobs=-1
)
model = CatBoostClassifier(iterations=100, learning_rate=0.1, depth=6, verbose=0)
model.fit(X_train, y_train)
print("Model training complete.")

Index(['p1_mean_hp', 'p1_mean_spe', 'p1_mean_atk', 'p1_mean_def', 'p2_lead_hp',
       'p2_lead_spe', 'p2_lead_atk', 'p2_lead_def', 'p1_unique_pokemon',
       'p2_unique_pokemon', 'mean_damage_dealt', 'final_p1_hp',
       'p1_fainted_count', 'p2_fainted_count', 'p1_mean_power',
       'p1_mean_accuracy', 'p1_lost_hp', 'p1_turns_statused',
       'p1_missed_turns', 'p1_switches', 'p1_net_boost', 'p2_mean_power',
       'p2_mean_accuracy', 'p2_lost_hp', 'p2_turns_statused',
       'p2_missed_turns', 'p2_switches', 'p2_net_boost', 'battle_id',
       'player_won'],
      dtype='object')
Training...
Model training complete.


In [49]:
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=100)
cv_results = cross_validate(
    model,
    X_train,
    y_train,
    cv=cv,
    scoring={
        "accuracy_score": make_scorer(accuracy_score),
        "precision_score": make_scorer(precision_score),
        "recall_score": make_scorer(recall_score),
        "f1_score": make_scorer(f1_score),
        "roc_auc_score": make_scorer(roc_auc_score)
    },
    return_train_score=True,
    n_jobs=-1
)

results_df = pd.DataFrame(cv_results)
display(results_df)

results_df = pd.DataFrame(cv_results)
summary = results_df.filter(regex='(train_|test_)').describe().loc[['mean', 'std']].T
summary.rename(columns={'mean': 'Mean', 'std': 'Std'}, inplace=True)
display(summary)

Unnamed: 0,fit_time,score_time,test_accuracy_score,train_accuracy_score,test_precision_score,train_precision_score,test_recall_score,train_recall_score,test_f1_score,train_f1_score,test_roc_auc_score,train_roc_auc_score
0,1.941696,0.022309,0.804,0.851667,0.804,0.852215,0.804,0.850889,0.804,0.851551,0.804,0.851667
1,1.90776,0.030193,0.816,0.851889,0.825103,0.851031,0.802,0.853111,0.813387,0.85207,0.816,0.851889
2,2.338782,0.022896,0.822,0.849778,0.819444,0.848384,0.826,0.851778,0.822709,0.850078,0.822,0.849778
3,2.352577,0.02351,0.807,0.850333,0.80396,0.848398,0.812,0.853111,0.80796,0.850748,0.807,0.850333
4,2.10252,0.024005,0.819,0.849778,0.812133,0.848848,0.83,0.851111,0.820969,0.849978,0.819,0.849778
5,1.900064,0.022218,0.83,0.849,0.838115,0.848767,0.818,0.849333,0.827935,0.84905,0.83,0.849
6,2.141538,0.026516,0.82,0.849778,0.827869,0.847768,0.808,0.852667,0.817814,0.850211,0.82,0.849778
7,2.268579,0.02358,0.835,0.848778,0.839757,0.848236,0.828,0.849556,0.833837,0.848895,0.835,0.848778
8,0.812482,0.014745,0.831,0.848778,0.821359,0.849321,0.846,0.848,0.833498,0.84866,0.831,0.848778
9,0.737574,0.0206,0.834,0.848778,0.827451,0.849166,0.844,0.848222,0.835644,0.848694,0.834,0.848778


Unnamed: 0,Mean,Std
test_accuracy_score,0.8218,0.01083
train_accuracy_score,0.849856,0.001149
test_precision_score,0.821919,0.012483
train_precision_score,0.849213,0.001376
test_recall_score,0.8218,0.015676
train_recall_score,0.850778,0.001927
test_f1_score,0.821775,0.011093
train_f1_score,0.849993,0.001198
test_roc_auc_score,0.8218,0.01083
train_roc_auc_score,0.849856,0.001149


### 4. Creating the Submission File

The competition requires a `.csv` file with two columns: `battle_id` and `player_won`. Let's use our trained model to make predictions on the test set and format them correctly.

In [None]:
print("Generating predictions on the test set...")
submission_df = pd.DataFrame({
    'battle_id': test_df['battle_id'],
    'player_won': model.predict(X_test)
})

submission_df.to_csv('submission.csv', index=False)

print("\n'submission.csv' file created successfully!")
display(submission_df.head())

Generating predictions on the test set...

'submission.csv' file created successfully!


Unnamed: 0,battle_id,player_won
0,0,0
1,1,1
2,2,1
3,3,1
4,4,1


### 5. Submitting Your Results

Once you have generated your `submission.csv` file, there are two primary ways to submit it to the competition.

---

#### Method A: Submitting Directly from the Notebook

This is the standard method for code competitions. It ensures that your submission is linked to the code that produced it, which is crucial for reproducibility.

1.  **Save Your Work:** Click the **"Save Version"** button in the top-right corner of the notebook editor.
2.  **Run the Notebook:** In the pop-up window, select **"Save & Run All (Commit)"** and then click the **"Save"** button. This will run your entire notebook from top to bottom and save the output, including your `submission.csv` file.
3.  **Go to the Viewer:** Once the save process is complete, navigate to the notebook viewer page. 
4.  **Submit to Competition:** In the viewer, find the **"Submit to Competition"** section. This is usually located in the header of the output section or in the vertical "..." menu on the right side of the page. Clicking the **Submit** button this will submit your generated `submission.csv` file.

After submitting, you will see your score in the **"Submit to Competition"** section or in the [Public Leaderboard](https://www.kaggle.com/competitions/fds-pokemon-battles-prediction-2025/leaderboard?).

---

#### Method B: Manual Upload

You can also generate your predictions and submission file using any environment you prefer (this notebook, Google Colab, or your local machine).

1.  **Generate the `submission.csv` file** using your model.
2.  **Download the file** to your computer.
3.  **Navigate to the [Leaderboard Page](https://www.kaggle.com/competitions/fds-pokemon-battles-prediction-2025/leaderboard?)** and click on the **"Submit Predictions"** button.
4.  **Upload Your File:** Drag and drop or select your `submission.csv` file to upload it.

This method is quick, but keep in mind that for the final evaluation, you might be required to provide the code that generated your submission.

Good luck!