# Submission Server Example

This notebook shows how to use the submission server for a Kaggle competition.

## How the Submission Server Works

The NFL Big Data Bowl uses a **server-based evaluation API**:
1. Your notebook runs as a server that receives test data in batches
2. Each batch must return predictions within 5 minutes
3. Kaggle's evaluation system calls your server repeatedly until all data is processed
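
The request loop above can be pictured with a small stand-in for the gateway. This is only a sketch of the calling pattern, not Kaggle's actual evaluation code: `run_fake_gateway`, `BATCH_TIME_LIMIT_S`, and the list-based batches are hypothetical names chosen for illustration.

```python
import time

BATCH_TIME_LIMIT_S = 5 * 60  # the 5-minute per-batch budget described above


def run_fake_gateway(predict, batches, time_limit_s=BATCH_TIME_LIMIT_S):
    """Call `predict` once per batch and collect the results.

    Hypothetical stand-in for the evaluation system: it raises if a single
    call exceeds the time limit and stops once every batch is processed.
    """
    results = []
    for index, batch in enumerate(batches):
        start = time.monotonic()
        prediction = predict(batch)
        elapsed = time.monotonic() - start
        if elapsed > time_limit_s:
            raise TimeoutError(f"batch {index} took {elapsed:.1f}s")
        results.append(prediction)
    return results


def zero_predict(batch):
    """Toy predictor: one (x, y) pair of zeros per input row."""
    return [(0.0, 0.0) for _ in batch]


outputs = run_fake_gateway(zero_predict, [[{"id": 1}, {"id": 2}], [{"id": 3}]])
print([len(o) for o in outputs])  # → [2, 1]
```

The point of the sketch is that the time budget applies to each call separately, so expensive one-time work such as model loading belongs outside the predict function.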

## Setup

Your predict function must:
- Accept: `test` (Polars DataFrame) and `test_input` (Polars DataFrame)
- Return: a Polars DataFrame with `x` and `y` columns
- Respond within 5 minutes per batch
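
During local testing it can help to enforce these requirements automatically. The wrapper below is a minimal sketch, not part of `pipeline.submission_server`: `check_contract`, `REQUIRED_COLUMNS`, and `TIME_BUDGET_S` are hypothetical names, and the check relies only on the result exposing a `.columns` attribute, which Polars DataFrames provide.

```python
import time
from functools import wraps

REQUIRED_COLUMNS = ("x", "y")
TIME_BUDGET_S = 5 * 60  # 5 minutes per batch


def check_contract(predict, budget_s=TIME_BUDGET_S):
    """Wrap a predict function and verify its output on every call."""

    @wraps(predict)
    def wrapper(test, test_input):
        start = time.monotonic()
        result = predict(test, test_input)
        elapsed = time.monotonic() - start

        # The returned frame must contain both coordinate columns.
        missing = [c for c in REQUIRED_COLUMNS if c not in result.columns]
        if missing:
            raise ValueError(f"prediction is missing columns: {missing}")
        # A call that overruns the budget would fail in the competition.
        if elapsed > budget_s:
            raise TimeoutError(f"predict took {elapsed:.1f}s, budget is {budget_s}s")
        return result

    return wrapper
```

Wrapping a predict function with `check_contract(...)` before a local run surfaces a missing column or a slow batch early, rather than after a failed submission.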


In [None]:
import os  # used by the os.getenv check in the final cell

import polars as pl
import torch
from pipeline.submission_server import create_submission_server, run_submission_server
from pipeline.save_model import load_model_ensemble
from pipeline.models import SeqModel, prepare_targets
from pipeline.config import Config

# Initialize config
config = Config()


## Define Prediction Function

The predict function is called by the server for each batch of test data.


In [None]:
# Load models (do this once at the start, before predict function)
# Models will be reused across all predict calls
saved_model = load_model_ensemble('models/nn_baseline.0')

# NOTE: YOUR_FEATURE_DIM is a placeholder; set it to the input feature count used in training
def load_fold_model(weights_file):
    """Rebuild a SeqModel, load one fold's weights, and put it in eval mode."""
    model = SeqModel(input_dim=YOUR_FEATURE_DIM, horizon=config.MAX_FUTURE_HORIZON)
    model.load_state_dict(torch.load(weights_file, map_location=config.DEVICE))
    model.to(config.DEVICE)  # map_location only places the loaded tensors; move the module too
    model.eval()
    return model

models_x, models_y = [], []
for i in range(saved_model['num_folds']):
    models_x.append(load_fold_model(saved_model['models_x_files'][i]))  # X-axis model
    models_y.append(load_fold_model(saved_model['models_y_files'][i]))  # Y-axis model

scaler = saved_model['scalers']
print(f"Loaded {len(models_x)} models")

# Placeholder: Replace with your actual prediction logic
def predict_fn(test: pl.DataFrame, test_input: pl.DataFrame) -> pl.DataFrame:
    """
    Prediction function for Kaggle competition.
    
    Called by the server for each batch of test data.
    
    Args:
        test: Test data as Polars DataFrame
        test_input: Additional input data as Polars DataFrame
    
    Returns:
        Polars DataFrame with 'x' and 'y' columns
    """
    # TODO: Implement prediction logic
    # 1. Feature engineering on test/test_input data
    # 2. Sequence preparation (using saved config)
    # 3. Scaling (using saved scaler)
    # 4. Model inference (ensembling all folds)
    # 5. Post-processing (cumulative sum, clipping to field bounds)
    # 6. Format predictions
    
    # Placeholder
    predictions = pl.DataFrame({
        'x': [0.0] * len(test),
        'y': [0.0] * len(test)
    })
    
    return predictions
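

Steps 4 and 5 of the TODO list in `predict_fn` could look roughly like the following. This is a minimal sketch in plain Python under assumptions the notebook does not confirm: each fold model outputs per-frame displacements, the ensemble is a simple unweighted mean, and coordinates span a 120 by 53.3 yard field. `ensemble_mean`, `to_positions`, and the field constants are illustrative names.

```python
from itertools import accumulate

FIELD_LENGTH = 120.0  # yards, end zones included (assumed range for x)
FIELD_WIDTH = 53.3    # yards (assumed range for y)


def ensemble_mean(fold_outputs):
    """Average per-step predictions across folds (unweighted mean)."""
    n_folds = len(fold_outputs)
    return [sum(step) / n_folds for step in zip(*fold_outputs)]


def to_positions(last_position, deltas, upper_bound):
    """Cumulatively sum displacements from the last known position,
    then clip each resulting position to [0, upper_bound]."""
    positions = list(accumulate(deltas, initial=last_position))[1:]
    return [min(max(p, 0.0), upper_bound) for p in positions]


# Two folds, three future frames of x-displacement each.
fold_dx = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
mean_dx = ensemble_mean(fold_dx)
xs = to_positions(117.0, mean_dx, FIELD_LENGTH)
print(mean_dx)  # → [2.0, 2.0, 2.0]
print(xs)       # → [119.0, 120.0, 120.0]
```

The same two helpers would apply to the y-axis with `FIELD_WIDTH` as the upper bound. With real models, the per-fold lists would come from running each entry of `models_x` and `models_y` under `torch.no_grad()`.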


## Start the Server

The server automatically detects whether you're running in competition mode or local testing mode.
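
One way such detection can work is to check the `KAGGLE_IS_COMPETITION_RERUN` environment variable. The sketch below is an assumption about the mechanism, not the code inside `pipeline.submission_server`; `detect_mode` is a hypothetical helper.

```python
import os


def detect_mode(environ=None):
    """Return "competition" when the rerun flag is set, else "local"."""
    environ = os.environ if environ is None else environ
    return "competition" if environ.get("KAGGLE_IS_COMPETITION_RERUN") else "local"


print(detect_mode({}))                                    # → local
print(detect_mode({"KAGGLE_IS_COMPETITION_RERUN": "1"}))  # → competition
```

Passing the environment as an argument keeps the helper easy to test without modifying `os.environ`.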


In [None]:
# For local testing (use gateway to simulate competition environment)
# server = run_submission_server(predict_fn, gateway_path=('/path/to/test/data/',))

# For competition submission (auto-detects KAGGLE_IS_COMPETITION_RERUN environment variable)
# server = run_submission_server(predict_fn)

# Or create server manually for more control:
# server = create_submission_server(predict_fn)
# if os.getenv('KAGGLE_IS_COMPETITION_RERUN'):
#     server.serve()  # Start server for competition
# else:
#     server.run_local_gateway(('/path/to/test/data/',))  # Local testing

print("Server setup complete. Uncomment lines above to start the server.")
