# Deployment & Submission: Titanic Machine Learning from Disaster

## Overview

This notebook documents the **Deployment** phase (CRISP-DM Phase 6) for the Titanic Kaggle competition. It covers generating, validating, and submitting predictions, archiving artifacts, and logging results for reproducibility and business reporting.

---
**CRISP-DM Phase 6 of 6** | **Previous:** [Evaluation](05_evaluation.ipynb)

## 1. Submission Workflow & Checklist

**Deployment Steps:**
1. Generate predictions for the test set using the final model pipeline.
2. Format the submission file:
   - Exactly 2 columns: `PassengerId`, `Survived`
   - 418 rows (matches `test.csv`)
   - No extra columns or index; `Survived` as integer {0,1}
3. Save submission as `submission/submission_YYYYMMDD_modelname.csv`.
4. Validate file format and completeness.
5. Submit to Kaggle and log leaderboard score.
6. Archive model, pipeline, and notebook for reproducibility.

**Reference:** See `planning.md` for the full checklist and code snippets.

In [2]:
# Generate predictions for test set using final model pipeline
import pandas as pd
from joblib import load
from datetime import datetime
import os

# Paths
test_path = '../data/raw/test.csv'
model_path = '../models/final_model.pkl'

# Check for required files
if not os.path.exists(test_path):
    print(f'ERROR: Test data not found at {test_path}. Please add test.csv to this location.')
elif not os.path.exists(model_path):
    print(f'ERROR: Model pipeline not found at {model_path}. Please train and save your model pipeline.')
else:
    # Load test data and model pipeline
    test = pd.read_csv(test_path)
    pipeline = load(model_path)

    # Predict
    preds = pipeline.predict(test)

    # Format submission file
    submission = pd.DataFrame({
        'PassengerId': test['PassengerId'],
        'Survived': preds.astype(int)
    })

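    # Save with timestamp and model name
    today = datetime.today().strftime('%Y%m%d')
    model_name = 'gbdt'  # Update if needed
    submission_path = f'submission/submission_{today}_{model_name}.csv'
    # to_csv does not create missing directories, so make sure the folder exists
    os.makedirs(os.path.dirname(submission_path), exist_ok=True)
    submission.to_csv(submission_path, index=False)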

    print(f'Submission file saved to {submission_path}')



ValueError: could not convert string to float: 'Kelly, Mr. James'
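
The `ValueError` above shows a raw `Name` value reaching a numeric estimator, which suggests that the object saved at `final_model.pkl` does not itself drop or encode the text columns in `test.csv`. One way to avoid this is to put column selection and encoding inside the saved pipeline, so that `pipeline.predict(test)` can take the raw frame. The sketch below illustrates the idea under stated assumptions: the column lists, the toy data, and the choice of `GradientBoostingClassifier` are hypothetical and are not the notebook's actual feature set or model.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical feature lists; the notebook's real feature set is not shown.
numeric_cols = ['Pclass', 'Age', 'Fare']
categorical_cols = ['Sex', 'Embarked']

preprocess = ColumnTransformer(
    transformers=[
        ('num', SimpleImputer(strategy='median'), numeric_cols),
        ('cat', Pipeline([
            ('impute', SimpleImputer(strategy='most_frequent')),
            ('onehot', OneHotEncoder(handle_unknown='ignore')),
        ]), categorical_cols),
    ],
    remainder='drop',  # columns not listed above (Name, PassengerId, ...) never reach the model
)

pipeline = Pipeline([
    ('preprocess', preprocess),
    ('model', GradientBoostingClassifier(random_state=0)),
])

# Tiny made-up frame with raw Titanic-style columns, for illustration only.
train = pd.DataFrame({
    'PassengerId': [1, 2, 3, 4, 5, 6],
    'Name': ['A, Mr. X', 'B, Mrs. Y', 'C, Miss. Z', 'D, Mr. W', 'E, Mrs. V', 'F, Mr. U'],
    'Pclass': [3, 1, 3, 1, 2, 3],
    'Sex': ['male', 'female', 'female', 'male', 'female', 'male'],
    'Age': [22.0, 38.0, float('nan'), 54.0, 27.0, 30.0],
    'Fare': [7.25, 71.28, 7.92, 51.86, 11.13, 8.05],
    'Embarked': ['S', 'C', 'S', 'S', 'C', 'Q'],
    'Survived': [0, 1, 1, 0, 1, 0],
})

pipeline.fit(train.drop(columns='Survived'), train['Survived'])

# The raw frame, including the Name column, can be passed straight to predict.
raw_test = train.drop(columns='Survived').head(3)
preds = pipeline.predict(raw_test)
print(len(preds))
```

With a pipeline built this way, the object saved via `joblib` carries its own preprocessing, so the deployment cell does not need to repeat any feature engineering before calling `predict`.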

In [2]:
# Validate submission file format
try:
    submission
except NameError:
    print('ERROR: Submission file not created. Please run the previous cell and ensure required files are present.')
else:
    assert submission.shape == (418, 2), 'Submission must have 418 rows and 2 columns.'
    assert set(submission.columns) == {'PassengerId', 'Survived'}, 'Columns must be PassengerId and Survived.'
    assert submission['Survived'].isin([0,1]).all(), 'Survived must be 0 or 1.'
    print('Submission file format validated.')

    # Log leaderboard score (manual step after Kaggle submission)
    lb_score = None  # Fill in after submission
    # Compare against None so that a legitimate score of 0.0 is still printed
    print(f'Kaggle LB score: {lb_score if lb_score is not None else "<to be filled after submission>"}')

NameError: name 'submission' is not defined
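
The assertions above inspect the in-memory `submission` DataFrame. Since the CSV on disk is what actually gets uploaded, it can help to re-read the saved file and repeat the checks against the test IDs. A minimal sketch, assuming a hypothetical helper `validate_submission_file` and made-up demo data written to a temporary directory:

```python
import os
import tempfile

import pandas as pd


def validate_submission_file(path, expected_ids):
    """Re-read a saved submission CSV and return a list of problems (empty if none)."""
    df = pd.read_csv(path)
    problems = []
    if list(df.columns) != ['PassengerId', 'Survived']:
        problems.append(f'unexpected columns: {list(df.columns)}')
        return problems  # the remaining checks rely on these two columns
    if len(df) != len(expected_ids):
        problems.append(f'expected {len(expected_ids)} rows, found {len(df)}')
    if df['PassengerId'].duplicated().any():
        problems.append('duplicate PassengerId values')
    if set(df['PassengerId']) != set(expected_ids):
        problems.append('PassengerId values do not match the test set')
    if not df['Survived'].isin([0, 1]).all():
        problems.append('Survived contains values other than 0 or 1')
    return problems


# Demo with made-up IDs; in the notebook, expected_ids would be test['PassengerId'].
with tempfile.TemporaryDirectory() as tmp:
    good_path = os.path.join(tmp, 'good.csv')
    pd.DataFrame({'PassengerId': [892, 893, 894],
                  'Survived': [0, 1, 0]}).to_csv(good_path, index=False)
    good_problems = validate_submission_file(good_path, [892, 893, 894])

    bad_path = os.path.join(tmp, 'bad.csv')
    pd.DataFrame({'PassengerId': [892, 893, 893],
                  'Survived': [0, 1, 2]}).to_csv(bad_path, index=False)
    bad_problems = validate_submission_file(bad_path, [892, 893, 894])

print(good_problems)
print(bad_problems)
```

Returning a list of problems rather than asserting on the first failure makes it easier to see every issue with a file in a single run.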

## 2. Archiving & Reproducibility

- Archive the final model pipeline, feature columns, and submission file.
- Save the deployment notebook and code artifacts for future reference.
- Document any changes to features, model parameters, or validation strategy.
- Maintain a log of leaderboard scores and notes on each submission.
- Ensure all steps are reproducible from raw data to submission.
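
The archiving and logging points above can be made concrete by recording a content hash for each artifact and appending one row per submission to a log file. The sketch below uses only the standard library. The helper names (`sha256_of`, `log_submission`), the log columns, and the demo file are assumptions for illustration, and the score in the demo is a placeholder rather than a real leaderboard result.

```python
import csv
import hashlib
import os
import tempfile
from datetime import datetime


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def log_submission(log_path, artifact_paths, lb_score=None, notes=''):
    """Append one row per artifact to a CSV log, writing the header on first use."""
    fieldnames = ['logged_at', 'artifact', 'sha256', 'lb_score', 'notes']
    write_header = not os.path.exists(log_path)
    with open(log_path, 'a', newline='') as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        for path in artifact_paths:
            writer.writerow({
                'logged_at': datetime.now().isoformat(timespec='seconds'),
                'artifact': os.path.basename(path),
                'sha256': sha256_of(path),
                'lb_score': '' if lb_score is None else lb_score,
                'notes': notes,
            })


# Demo in a temporary directory with a stand-in submission file.
with tempfile.TemporaryDirectory() as tmp:
    artifact = os.path.join(tmp, 'submission_demo.csv')
    with open(artifact, 'w', newline='') as fh:
        fh.write('PassengerId,Survived\n892,0\n')

    log_path = os.path.join(tmp, 'submission_log.csv')
    log_submission(log_path, [artifact], notes='logged before upload')
    # 0.5 is a placeholder value, not a real leaderboard score.
    log_submission(log_path, [artifact], lb_score=0.5, notes='score filled in later')

    with open(log_path, newline='') as fh:
        rows = list(csv.DictReader(fh))

print(len(rows))
```

Because the hash depends only on file contents, two log rows with the same `sha256` refer to byte-identical artifacts, which makes it straightforward to tie a leaderboard score back to the exact file that produced it.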

**Professional Takeaway:**
This deployment workflow is designed to keep the Titanic Kaggle submission transparent and reproducible, in line with CRISP-DM and business requirements.