# 🛳️ Final Submission

This notebook wraps up the Titanic survival prediction project.  
After evaluating all candidate models, we select the best-performing one, generate the final `.csv` file for submission, and summarize key takeaways from the project.

---

## 🎯 Purpose

To finalize the project by:
- Selecting the most suitable model based on evaluation metrics  
- Generating the submission file in the required format  
- Reflecting briefly on the overall process

## 📦 Dataset

Same processed dataset used in earlier notebooks:  
[Titanic - Machine Learning from Disaster](https://www.kaggle.com/c/titanic)  
via public repository: [Data Science Dojo GitHub](https://github.com/datasciencedojo/datasets)


### 📦 1. Load the Dataset & Predictions

We load the same processed dataset and reuse the train/test split  
to ensure consistency with the previous evaluation.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Load processed dataset
df = pd.read_csv("feature_engineered_titanic.csv")

# Target and features
y = df['Survived']
X_full = df.drop(columns=['Survived', 'Name', 'Ticket', 'PassengerId'])
X_safe = X_full.drop(columns=['Cabin', 'Title'])

# Shared train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X_safe, y, test_size=0.2, random_state=42
)
```
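The split stays consistent across notebooks because `train_test_split` is deterministic once `random_state` is fixed. A minimal sketch of that property, using a toy frame as a stand-in for the processed dataset (the frame and its `feature` column are assumptions for illustration only):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy frame standing in for the processed dataset (assumption, not the real data).
toy = pd.DataFrame({"feature": range(20), "Survived": [0, 1] * 10})
X_toy, y_toy = toy[["feature"]], toy["Survived"]

# Two independent calls with identical arguments and the same random_state.
_, X_test_a, _, _ = train_test_split(X_toy, y_toy, test_size=0.2, random_state=42)
_, X_test_b, _, _ = train_test_split(X_toy, y_toy, test_size=0.2, random_state=42)

# The held-out rows are the same, in the same order, in both calls.
same_rows = X_test_a.index.equals(X_test_b.index)
print(same_rows)
```

This only holds while the input rows, `test_size`, and `random_state` are all unchanged; reordering or filtering the CSV would change which passengers land in each split.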

### 🤖 2. Train the Final Model

Based on the evaluation results,  
we select `GradientBoostingClassifier` as the final model because it performed consistently well across all metrics, including F1 score and ROC AUC.

We now retrain it on the training split and generate predictions for the held-out split.

```python
from sklearn.ensemble import GradientBoostingClassifier

# Initialize and train the final model
final_model = GradientBoostingClassifier(random_state=42)
final_model.fit(X_train, y_train)

# Predict on the held-out split from the shared train/test split
final_preds = final_model.predict(X_test)
```
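Since the model was chosen on F1 score and ROC AUC, it can be worth recomputing both after retraining to confirm nothing drifted. The sketch below is one way to do that; `evaluate_holdout` is a hypothetical helper, and the synthetic data from `make_classification` is only a stand-in so the example runs on its own. In the notebook, `final_model`, `X_test`, and `y_test` would be passed instead.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split


def evaluate_holdout(model, X_holdout, y_holdout):
    """Return F1 and ROC AUC for a fitted binary classifier on held-out data."""
    preds = model.predict(X_holdout)
    # ROC AUC is computed from scores, so use the positive-class probability
    # rather than the hard 0/1 labels.
    proba = model.predict_proba(X_holdout)[:, 1]
    return {
        "f1": f1_score(y_holdout, preds),
        "roc_auc": roc_auc_score(y_holdout, proba),
    }


# Synthetic stand-in data (assumption); not the Titanic features.
X_demo, y_demo = make_classification(n_samples=500, n_features=8, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

demo_model = GradientBoostingClassifier(random_state=42).fit(X_tr, y_tr)
scores = evaluate_holdout(demo_model, X_te, y_te)
print(scores)
```

Passing probabilities to `roc_auc_score` matters: feeding it hard predictions collapses the ROC curve to a single threshold and usually understates the AUC.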

### 📄 3. Generate the Submission File

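To prepare the submission, we pair the final model's predictions with the matching passenger IDs.  
The output file follows the required Kaggle format with two columns:

- `PassengerId`: ID from the original dataset  
- `Survived`: Predicted survival (0 or 1)

The file is saved as `submission.csv`.  
Note that `X_test` here is the held-out 20% of the labeled data, so this file demonstrates the format; a leaderboard submission needs predictions for the passengers in Kaggle's unlabeled `test.csv`.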

```python
# Reload original dataframe to retrieve PassengerId
df_original = pd.read_csv("feature_engineered_titanic.csv")

# X_test keeps the original row labels, so .loc returns the IDs
# in the same order as final_preds
passenger_ids = df_original.loc[X_test.index, 'PassengerId']

# Create submission DataFrame
submission = pd.DataFrame({
    "PassengerId": passenger_ids,
    "Survived": final_preds
})

# Save to CSV
submission.to_csv("submission.csv", index=False)
```
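Before uploading, a quick format check can catch common mistakes such as extra columns or non-binary labels. The following is a minimal sketch; `check_submission` is a hypothetical helper, and the rules it enforces are only the two-column layout described above plus basic sanity checks, not an official Kaggle validator.

```python
import pandas as pd


def check_submission(submission: pd.DataFrame) -> list:
    """Return a list of format problems; an empty list means none were found."""
    problems = []
    if list(submission.columns) != ["PassengerId", "Survived"]:
        # Without the expected columns the remaining checks cannot run.
        problems.append(f"unexpected columns: {list(submission.columns)}")
        return problems
    if submission["PassengerId"].duplicated().any():
        problems.append("duplicate PassengerId values")
    if submission.isna().any().any():
        problems.append("missing values present")
    if not submission["Survived"].isin([0, 1]).all():
        problems.append("Survived contains values other than 0 or 1")
    return problems


# Small illustrative frame (made-up IDs and labels).
demo = pd.DataFrame({"PassengerId": [892, 893, 894], "Survived": [0, 1, 0]})
print(check_submission(demo))
```

In the notebook, the same check could be run as `check_submission(pd.read_csv("submission.csv"))` right after the file is written.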

### 🧠 Summary
In this notebook, I finalized the Titanic survival prediction project by:

- Selecting `GradientBoostingClassifier` as the final model based on F1 score and ROC AUC  
- Retraining the model on the training set  
- Generating predictions and saving the result as `submission.csv` in Kaggle format

This marks the completion of the core workflow.  
Future improvements could include hyperparameter tuning, model stacking, or feature expansion.
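As a starting point for the hyperparameter tuning mentioned above, a small grid search might look like the sketch below. The grid values, the `f1` scoring choice, and the synthetic data are all assumptions chosen to keep the example fast and self-contained; they are not tuned settings from this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption); in the project, use X_train and y_train.
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=42)

# A deliberately small, illustrative grid.
param_grid = {"n_estimators": [50, 100], "max_depth": [2, 3]}

search = GridSearchCV(
    GradientBoostingClassifier(random_state=42),
    param_grid,
    scoring="f1",  # match one of the metrics used for model selection
    cv=3,
)
search.fit(X_demo, y_demo)

print(search.best_params_, round(search.best_score_, 3))
```

Running the search on the training split only keeps the held-out split untouched for a final, unbiased check.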

## 📦 Overall Summary

This notebook concludes a 7-part beginner-friendly Titanic survival prediction project.

Throughout the series, I explored:
- Data cleaning and visualization (01–02)
- Feature engineering (03–04)
- Model building and evaluation (05–06)
- Final model selection and submission preparation (07)

By completing this project, I practiced core data science skills such as:
- Exploratory data analysis (EDA)
- Feature preprocessing and transformation
- Model selection and evaluation using multiple metrics
- Real-world prediction workflow aligned with Kaggle submission format

This was my first fully documented project on GitHub,  
and it helped me build both skills and confidence in the data science workflow.
