# FDS Challenge: Starter Notebook

This notebook will guide you through the first steps of the competition. Our goal here is to show you how to:

1.  Load the `train.jsonl` and `test.jsonl` files from the competition data.
2.  Create a very simple set of features from the data.
3.  Train a basic model.
4.  Generate a `submission.csv` file in the correct format.
5.  Submit your results.

Let's get started!

In [None]:
from typing import Any
import json
import os
from pprint import pprint
import numpy as np
import pandas as pd

from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.svm import SVC

from xgboost import XGBClassifier

from sklearn.metrics import make_scorer, accuracy_score, precision_score, recall_score, f1_score, roc_auc_score
from sklearn.model_selection import KFold, StratifiedKFold, cross_validate

### 1. Loading and Inspecting the Data

When you create a notebook within a Kaggle competition, the competition's data is automatically attached and available in the `../input/` directory.

The dataset is in a `.jsonl` format, which means each line is a separate JSON object. This is great because we can process it one line at a time without needing to load the entire large file into memory.
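
As a minimal sketch of that line-by-line processing, the generator below yields one battle at a time instead of building a full list. The helper name `iter_jsonl` and the two sample records are illustrative assumptions, not part of the competition code.

```python
import io
import json
from typing import Any, Iterator, TextIO


def iter_jsonl(stream: TextIO) -> Iterator[dict[str, Any]]:
    """Yield one parsed JSON object per non-empty line of a JSONL stream."""
    for line in stream:
        line = line.strip()
        if line:  # skip blank lines, e.g. a trailing newline at end of file
            yield json.loads(line)


# Hypothetical two-record sample standing in for train.jsonl.
sample = io.StringIO(
    '{"battle_id": 0, "player_won": true}\n'
    '{"battle_id": 1, "player_won": false}\n'
)

# Count wins without holding every battle in memory at once.
n_battles = 0
n_wins = 0
for battle in iter_jsonl(sample):
    n_battles += 1
    n_wins += int(battle["player_won"])

print(n_battles, n_wins)
```

With a real file, the same generator can be fed the handle returned by `open(train_file_path, encoding="utf-8")`.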

Let's write a simple loop to load the training data and inspect the first battle.

In [2]:
COMPETITION_NAME = 'fds-pokemon-battles-prediction-2025'
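# Local run: read the data from the current working directory.
# On Kaggle, use instead: DATA_PATH = os.path.join('../input', COMPETITION_NAME)
DATA_PATH = os.getcwd()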

train_file_path = os.path.join(DATA_PATH, 'train.jsonl')
test_file_path = os.path.join(DATA_PATH, 'test.jsonl')

print(f"Loading data from '{train_file_path}'...")
try:
    with open(train_file_path, 'r', encoding="utf-8") as f:
        train_data = [json.loads(line) for line in f]

    print(f"Successfully loaded {len(train_data)} battles.")

    print("\n--- Structure of the first train battle: ---")
    if train_data:
        first_battle = train_data[0]
        
        battle_for_display = first_battle.copy()
        battle_for_display['battle_timeline'] = battle_for_display.get('battle_timeline', [])[:2] # Show first 2 turns
        
        pprint(battle_for_display)
        if len(first_battle.get('battle_timeline', [])) > 2:  # only 2 turns are displayed
            print("    ...")
            print("    (battle_timeline has been truncated for display)")

except FileNotFoundError:
    print(f"ERROR: Could not find the training file at '{train_file_path}'.")
    print("Please make sure you have added the competition data to this notebook.")

Loading data from 'c:\Users\stefa\PycharmProjects\pokemon-challenge\train.jsonl'...
Successfully loaded 10000 battles.

--- Structure of the first train battle: ---
{'battle_id': 0,
 'battle_timeline': [{'p1_move_details': {'accuracy': 1.0,
                                          'base_power': 95,
                                          'category': 'SPECIAL',
                                          'name': 'icebeam',
                                          'priority': 0,
                                          'type': 'ICE'},
                      'p1_pokemon_state': {'boosts': {'atk': 0,
                                                      'def': 0,
                                                      'spa': 0,
                                                      'spd': 0,
                                                      'spe': 0},
                                           'effects': ['noeffect'],
                                           'hp_pct': 1.0,
           

### 2. Basic Feature Engineering

A successful model will likely require creating many complex features. For this starter notebook, however, we will create a simple feature set based on the **initial team stats** plus a few aggregates computed from the battle timeline. This will be enough to train a model and generate a submission file.

It's up to you to engineer more powerful features!
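
As one possible direction, the sketch below derives matchup features by comparing Player 1's team averages with Player 2's lead. The function name `matchup_features` and the output feature names are hypothetical; the input keys (`p1_team_details`, `p2_lead_details`, `base_spe`, `base_atk`) are the ones this notebook reads. The example record is made up.

```python
from statistics import mean
from typing import Any


def matchup_features(battle: dict[str, Any]) -> dict[str, float]:
    """Compare P1's team averages against P2's lead (hypothetical feature names)."""
    team = battle.get("p1_team_details") or []
    lead = battle.get("p2_lead_details") or {}
    features: dict[str, float] = {}

    for stat in ("base_spe", "base_atk"):
        team_values = [p[stat] for p in team if p.get(stat) is not None]
        lead_value = lead.get(stat)
        if team_values and lead_value is not None:
            # Positive values mean P1's team average exceeds P2's lead.
            features[f"p1_minus_p2_lead_{stat}"] = mean(team_values) - lead_value
        else:
            # Fall back to 0.0 when either side is missing.
            features[f"p1_minus_p2_lead_{stat}"] = 0.0

    # Fraction of P1's team with a higher base speed than P2's lead.
    speeds = [p["base_spe"] for p in team if p.get("base_spe") is not None]
    lead_spe = lead.get("base_spe")
    if speeds and lead_spe is not None:
        faster = sum(s > lead_spe for s in speeds)
        features["p1_frac_faster_than_p2_lead"] = faster / len(speeds)
    else:
        features["p1_frac_faster_than_p2_lead"] = 0.0
    return features


# Made-up battle record for illustration only.
example = {
    "p1_team_details": [
        {"base_spe": 100, "base_atk": 80},
        {"base_spe": 60, "base_atk": 120},
    ],
    "p2_lead_details": {"base_spe": 90, "base_atk": 70},
}
result = matchup_features(example)
print(result)
```

Whether such differences actually help is an empirical question; they are cheap to compute and easy to check with cross-validation.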

In [3]:
def features_check(data: list[dict]) -> None:
    """Print sanity checks on the keys that create_features relies on."""
    # Default to [] so that a missing timeline cannot break iteration.
    print("All battles have at least one turn: ", all(len(battle.get('battle_timeline', [])) > 0 for battle in data))
    print("All battles' turns have at least one P1 move: ",
        all(
            any(turn.get("p1_move_details") for turn in battle.get('battle_timeline', [])) for battle in data
        )
    )
    print("All battles' turns have at least one P2 move: ",
        all(
            any(turn.get("p2_move_details") for turn in battle.get('battle_timeline', [])) for battle in data
        )
    )
    print("player_won feature always exists: ", all('player_won' in battle for battle in data))
    print("P1 Team always exists: ", all(battle.get('p1_team_details') for battle in data))
    # Note: create_features reads 'p2_lead_details', not 'p2_team_details'.
    print("P2 Team always exists: ", all(battle.get('p2_team_details') for battle in data))

In [4]:
def agg_pokemons_stats(prefix: str, stats: dict[str, Any]):
    return {
        f"{prefix}_mean_power": np.mean(stats["powers"]) if stats["powers"] else 0,
        f"{prefix}_mean_accuracy": np.mean(stats["accuracy"]) if stats["accuracy"] else 0,
        f"{prefix}_lost_hp": stats["lost_hp"],
        f"{prefix}_turns_statused": stats["turns_statused"],
        f"{prefix}_missed_turns": stats["missed_turns"],
        f"{prefix}_priority": stats["priority"],
        f"{prefix}_switches": stats["switches"],
        f"{prefix}_KOs": stats["KOs"],
        f"{prefix}_net_boost": stats["net_boost"],
    }

In [5]:
def create_features(data: list[dict]) -> pd.DataFrame:
    feature_list = []
    
    features_check(data)

    for battle in data:
        features = {}
        
        # --- Player 1 Team Features ---
        p1_team = battle.get('p1_team_details', None)
        features['p1_mean_hp'] = np.nanmean([p.get('base_hp') for p in p1_team])
        features['p1_mean_spe'] = np.nanmean([p.get('base_spe') for p in p1_team])
        features['p1_mean_atk'] = np.nanmean([p.get('base_atk') for p in p1_team])
        features['p1_mean_def'] = np.nanmean([p.get('base_def') for p in p1_team])

        # --- Player 2 Lead Features ---
        if p2_lead := battle.get('p2_lead_details'):
            # Player 2's lead Pokémon's stats
            features['p2_lead_hp'] = p2_lead.get('base_hp')
            features['p2_lead_spe'] = p2_lead.get('base_spe')
            features['p2_lead_atk'] = p2_lead.get('base_atk')
            features['p2_lead_def'] = p2_lead.get('base_def')
        
        # --- Battle Timeline Features ---
        if timeline := battle.get('battle_timeline', []):
            turns = len(timeline)
            p1_names = [t['p1_pokemon_state']['name'] for t in timeline if t.get('p1_pokemon_state')]
            p1_moves = [t['p1_move_details']['name'] for t in timeline if t.get('p1_move_details')]
            p2_names = [t['p2_pokemon_state']['name'] for t in timeline if t.get('p2_pokemon_state')]

            # Number of turns and unique Pokémon
            features['n_turns'] = turns
            features['p1_unique_pokemon'] = len(set(p1_names))
            #features['p1_unique_moves'] = len(set(p1_moves))
            features['p2_unique_pokemon'] = len(set(p2_names))

            # Compute damage dealt (approximate)
            # delta HP of opponent between turns
            p2_hp_deltas = []
            for i in range(1, turns):
                prev_hp = timeline[i-1]['p2_pokemon_state']['hp_pct']
                curr_hp = timeline[i]['p2_pokemon_state']['hp_pct']
                p2_hp_deltas.append(prev_hp - curr_hp)
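            # Average only the turns where P2 lost HP; guard against an empty list,
            # because np.mean([]) returns nan and emits a RuntimeWarning.
            positive_deltas = [d for d in p2_hp_deltas if d > 0]
            features['mean_damage_dealt'] = np.mean(positive_deltas) if positive_deltas else None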

            # Final HP and KO counts
            last_state = timeline[-1]['p1_pokemon_state']
            features['final_p1_hp'] = last_state.get('hp_pct', None)
            features['p1_fainted_count'] = sum(t['p1_pokemon_state']['status'] == 'fnt' for t in timeline)
            features['p2_fainted_count'] = sum(t['p2_pokemon_state']['status'] == 'fnt' for t in timeline)
            
        else:
            features.update({
                'n_turns': None,
                'p1_unique_pokemon': None,
                #'p1_unique_moves': None,  # disabled above, so keep the columns consistent
                'p2_unique_pokemon': None,
                'mean_damage_dealt': None,
                'final_p1_hp': None,
                'p1_fainted_count': None,
                'p2_fainted_count': None,
            })

        features['battle_id'] = battle.get('battle_id')
        if 'player_won' in battle:
            features['player_won'] = int(battle['player_won'])
            
        feature_list.append(features)
        
    return pd.DataFrame(feature_list).fillna(0)

print("Processing training data...")
train_df = create_features(train_data)

print("\nProcessing test data...")
with open(test_file_path, 'r', encoding="utf-8") as f:
    test_data = [json.loads(line) for line in f]
        
test_df = create_features(test_data)

print("\nTraining dataset preview:")
display(train_df.head())

print("\nTesting dataset preview:")
display(test_df.head())

Processing training data...
All battles have at least one turn:  True
All battles' turns have at least one P1 move:  False
All battles' turns have at least one P2 move:  False
player_won feature always exists:  True
P1 Team always exists:  True
P2 Team always exists:  False





Processing test data...
All battles have at least one turn:  True
All battles' turns have at least one P1 move:  False
All battles' turns have at least one P2 move:  False
player_won feature always exists:  False
P1 Team always exists:  True
P2 Team always exists:  False

Training dataset preview:


| | p1_mean_hp | p1_mean_spe | p1_mean_atk | p1_mean_def | p2_lead_hp | p2_lead_spe | p2_lead_atk | p2_lead_def | n_turns | p1_unique_pokemon | p2_unique_pokemon | mean_damage_dealt | final_p1_hp | p1_fainted_count | p2_fainted_count | battle_id | player_won |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 115.833333 | 80.0 | 72.5 | 63.333333 | 60 | 115 | 75 | 85 | 30 | 4 | 4 | 0.292968 | 0.291022 | 1 | 1 | 0 | 1 |
| 1 | 123.333333 | 61.666667 | 72.5 | 65.833333 | 55 | 120 | 50 | 45 | 30 | 6 | 6 | 0.191667 | 0.45 | 3 | 0 | 1 | 1 |
| 2 | 124.166667 | 65.833333 | 84.166667 | 71.666667 | 250 | 50 | 5 | 5 | 30 | 3 | 4 | 0.26 | 0.52 | 1 | 0 | 2 | 1 |
| 3 | 121.666667 | 75.833333 | 77.5 | 65.833333 | 75 | 110 | 100 | 95 | 30 | 5 | 4 | 0.336667 | 0.04 | 3 | 0 | 3 | 1 |
| 4 | 114.166667 | 72.5 | 75.833333 | 79.166667 | 60 | 115 | 75 | 85 | 30 | 5 | 5 | 0.351818 | 1.0 | 1 | 0 | 4 | 1 |



Testing dataset preview:


| | p1_mean_hp | p1_mean_spe | p1_mean_atk | p1_mean_def | p2_lead_hp | p2_lead_spe | p2_lead_atk | p2_lead_def | n_turns | p1_unique_pokemon | p2_unique_pokemon | mean_damage_dealt | final_p1_hp | p1_fainted_count | p2_fainted_count | battle_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 117.5 | 78.333333 | 74.166667 | 61.666667 | 65 | 130 | 65 | 60 | 30 | 5 | 5 | 0.352222 | 1.0 | 4 | 0 | 0 |
| 1 | 70.166667 | 95.833333 | 95.666667 | 96.666667 | 55 | 120 | 50 | 45 | 30 | 4 | 6 | 0.154615 | 1.0 | 1 | 0 | 1 |
| 2 | 120.0 | 61.666667 | 90.833333 | 88.333333 | 55 | 120 | 50 | 45 | 30 | 5 | 6 | 0.098413 | 1.0 | 1 | 0 | 2 |
| 3 | 114.166667 | 71.666667 | 70.0 | 71.666667 | 160 | 30 | 110 | 65 | 30 | 3 | 5 | 0.361429 | 0.32 | 0 | 0 | 3 |
| 4 | 116.666667 | 78.333333 | 75.0 | 65.833333 | 60 | 110 | 65 | 60 | 30 | 5 | 6 | 0.431722 | 0.189802 | 3 | 1 | 4 |


### 3. Training Models

In [None]:
# Define predictor features (X) and target (y)
features = [col for col in train_df.columns if col not in ['battle_id', 'player_won']]
X_train = train_df[features]
y_train = train_df['player_won']

X_test = test_df[features]

print("Training...")
model = XGBClassifier(
    random_state=100,
    n_estimators=300,
    learning_rate=0.05,
    max_depth=5,
    subsample=0.8,
    colsample_bytree=0.8,
    eval_metric='logloss',
    n_jobs=-1
)
model.fit(X_train, y_train)
print("Model training complete.")

Training...
Model training complete.


In [None]:
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=100)

# cross_validate accepts several metrics as a dict (or list) of scorer names.
# make_scorer wraps a single callable, so passing it a dict raises InvalidParameterError.
# The built-in "roc_auc" scorer ranks by predicted scores rather than hard labels.
cv_results = cross_validate(
    model,
    X_train,
    y_train,
    cv=cv,
    scoring={
        "accuracy_score": "accuracy",
        "precision_score": "precision",
        "recall_score": "recall",
        "f1_score": "f1",
        "roc_auc_score": "roc_auc",
    },
    return_train_score=True,
    n_jobs=-1
)
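
When `cross_validate` is given several metrics together with `return_train_score=True`, it returns a dict holding one `train_<name>` and one `test_<name>` array per metric, alongside timing entries such as `fit_time`. The sketch below summarizes a dict of that shape using only the standard library. The helper name `summarize_cv` is hypothetical, and the numbers in `fake_results` are made up for illustration; they are not results from this notebook.

```python
from statistics import mean, stdev
from typing import Mapping, Sequence


def summarize_cv(results: Mapping[str, Sequence[float]]) -> dict[str, tuple[float, float]]:
    """Map each train_/test_ key to (mean, sample standard deviation) across folds."""
    summary: dict[str, tuple[float, float]] = {}
    for key, scores in results.items():
        # Skip timing entries such as fit_time and score_time.
        if key.startswith(("train_", "test_")):
            values = [float(s) for s in scores]
            spread = stdev(values) if len(values) > 1 else 0.0
            summary[key] = (mean(values), spread)
    return summary


# Made-up fold scores shaped like cross_validate output (not real results).
fake_results = {
    "fit_time": [0.5, 0.5, 0.5],
    "train_accuracy_score": [0.75, 0.75, 0.75],
    "test_accuracy_score": [0.5, 0.75, 1.0],
}

summary = summarize_cv(fake_results)
for name, (avg, spread) in summary.items():
    print(f"{name}: {avg:.3f} +/- {spread:.3f}")
```

A large gap between the `train_` and `test_` means for the same metric would suggest overfitting.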

### 4. Creating the Submission File

The competition requires a `.csv` file with two columns: `battle_id` and `player_won`. Let's use our trained model to make predictions on the test set and format them correctly.
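
Before uploading, a quick format check can catch mistakes early. The sketch below uses only the standard library. The helper name `check_submission` is hypothetical, and it assumes that `player_won` must be 0 or 1 and that each test `battle_id` appears exactly once; confirm both against the competition rules.

```python
import csv
import os
import tempfile


def check_submission(path: str, expected_ids: set[int]) -> list[str]:
    """Return a list of problems found in a submission CSV (empty if none)."""
    problems: list[str] = []
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames != ["battle_id", "player_won"]:
            return [f"unexpected header: {reader.fieldnames}"]
        rows = list(reader)

    ids = [int(row["battle_id"]) for row in rows]
    if len(ids) != len(set(ids)):
        problems.append("duplicate battle_id values")
    if set(ids) != expected_ids:
        problems.append("battle_id values do not match the test set")
    # Assumption: predictions are hard 0/1 labels.
    if any(row["player_won"] not in {"0", "1"} for row in rows):
        problems.append("player_won contains values other than 0 or 1")
    return problems


# Write a tiny made-up submission to a temporary file and check it.
tmp_dir = tempfile.mkdtemp()
tmp_path = os.path.join(tmp_dir, "submission.csv")
with open(tmp_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["battle_id", "player_won"])
    writer.writerows([[0, 0], [1, 1], [2, 0]])

problems = check_submission(tmp_path, expected_ids={0, 1, 2})
print(problems)
```

In this notebook, the expected IDs could be taken from `set(test_df['battle_id'])`.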

In [None]:
print("Generating predictions on the test set...")
submission_df = pd.DataFrame({
    'battle_id': test_df['battle_id'],
    'player_won': model.predict(X_test)
})

submission_df.to_csv('submission.csv', index=False)

print("\n'submission.csv' file created successfully!")
display(submission_df.head())

Generating predictions on the test set...

'submission.csv' file created successfully!


| | battle_id | player_won |
|---|---|---|
| 0 | 0 | 0 |
| 1 | 1 | 0 |
| 2 | 2 | 0 |
| 3 | 3 | 1 |
| 4 | 4 | 0 |


### 5. Submitting Your Results

Once you have generated your `submission.csv` file, there are two primary ways to submit it to the competition.

---

#### Method A: Submitting Directly from the Notebook

This is the standard method for code competitions. It ensures that your submission is linked to the code that produced it, which is crucial for reproducibility.

1.  **Save Your Work:** Click the **"Save Version"** button in the top-right corner of the notebook editor.
2.  **Run the Notebook:** In the pop-up window, select **"Save & Run All (Commit)"** and then click the **"Save"** button. This will run your entire notebook from top to bottom and save the output, including your `submission.csv` file.
3.  **Go to the Viewer:** Once the save process is complete, navigate to the notebook viewer page. 
4.  **Submit to Competition:** In the viewer, find the **"Submit to Competition"** section. This is usually located in the header of the output section or in the vertical "..." menu on the right side of the page. Clicking the **Submit** button will submit your generated `submission.csv` file.

After submitting, you will see your score in the **"Submit to Competition"** section or on the [Public Leaderboard](https://www.kaggle.com/competitions/fds-pokemon-battles-prediction-2025/leaderboard?).

---

#### Method B: Manual Upload

You can also generate your predictions and submission file using any environment you prefer (this notebook, Google Colab, or your local machine).

1.  **Generate the `submission.csv` file** using your model.
2.  **Download the file** to your computer.
3.  **Navigate to the [Leaderboard Page](https://www.kaggle.com/competitions/fds-pokemon-battles-prediction-2025/leaderboard?)** and click on the **"Submit Predictions"** button.
4.  **Upload Your File:** Drag and drop or select your `submission.csv` file to upload it.

This method is quick, but keep in mind that for the final evaluation, you might be required to provide the code that generated your submission.

Good luck!