# Titanic Model v2 - Feature Engineering + Modeling
This notebook builds an improved Titanic survival prediction model by combining:
- Insights from previous exploratory data analysis (EDA)  
  (`eda_titanic_based_on_raw_data_2025-07-21.ipynb`)
- The baseline v1 model structure (TFDF ensemble, Score: 0.80622)  
  (`titanic_model_v1_2025-07-21.ipynb`)
- New feature engineering steps
- Comparative model testing (e.g., TFDF, Random Forest)

**Goal:** Improve model interpretability and performance by adding meaningful features and testing multiple algorithms.

## 1. Import Dependencies & Load Dataset
### (Load the data and take a first look)

In [2]:
import pandas as pd
import numpy as np

# Load the raw data
train = pd.read_csv("../data/train.csv")
test = pd.read_csv("../data/test.csv")

# Check the basic structure of the data
print(train.shape)   # (number of rows, number of columns)
train.head()

(891, 12)


|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S |


## 2. EDA Insight → Feature Engineering Objectives
### (Consolidate EDA findings → list feature engineering objectives)

Based on the previous EDA (`eda_titanic_on_raw_data_2025-07-21.ipynb`), I identified several promising variables:  

**High-impact features from v1 EDA:**  
- `Sex`
- `Pclass`
- `Fare`

**New features to engineer:**
- `Title` (from `Name`)
- `FamilySize` = `SibSp` + `Parch` + 1
- `IsAlone` = 1 if `FamilySize` == 1, else 0
- `FarePerPerson` = Fare / FamilySize
- `Deck` (from Cabin) - optional
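
The list above can be sketched in pandas as follows. This is a minimal illustration, not necessarily the implementation used later in the notebook: the helper name `add_family_features` and the `"U"` placeholder for a missing `Cabin` are assumptions made for the example, and the sample rows reuse values from the `train.head()` output.

```python
import numpy as np
import pandas as pd


def add_family_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with FamilySize, IsAlone, FarePerPerson and Deck added."""
    out = df.copy()
    # The passenger counts as one member of their own family group
    out["FamilySize"] = out["SibSp"] + out["Parch"] + 1
    out["IsAlone"] = (out["FamilySize"] == 1).astype(int)
    # FamilySize is always >= 1, so this division cannot hit zero
    out["FarePerPerson"] = out["Fare"] / out["FamilySize"]
    # Deck = first letter of Cabin; "U" (unknown) is an assumed placeholder for missing cabins
    out["Deck"] = out["Cabin"].str[0].fillna("U")
    return out


# Three rows with values taken from the head() output above
sample = pd.DataFrame({
    "SibSp": [1, 1, 0],
    "Parch": [0, 0, 0],
    "Fare": [7.25, 71.2833, 7.925],
    "Cabin": [np.nan, "C85", np.nan],
})
features = add_family_features(sample)
print(features[["FamilySize", "IsAlone", "FarePerPerson", "Deck"]])
```

Wrapping the steps in a function means the same transformation can be applied to both `train` and `test`, which keeps their columns aligned for modeling.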

## 3. Feature Engineering

In [4]:
# Extract Title: the text between the comma and the first period in Name
def extract_title(name):
    return name.split(',')[1].split('.')[0].strip()

train['Title'] = train['Name'].apply(extract_title)

# Take a first look at which Title values exist
print(train['Title'].value_counts())

Title
Mr              517
Miss            182
Mrs             125
Master           40
Dr                7
Rev               6
Mlle              2
Major             2
Col               2
the Countess      1
Capt              1
Ms                1
Sir               1
Lady              1
Mme               1
Don               1
Jonkheer          1
Name: count, dtype: int64
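
As a possible alternative to `apply`, the same extraction can be written with pandas' vectorized string methods. The sketch below is an assumption-laden illustration rather than the notebook's method: the helper name `add_title` and the regular expression are introduced here, and the sample names are copied from the `train.head()` output.

```python
import pandas as pd


def add_title(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a Title column parsed from Name."""
    out = df.copy()
    # Capture everything between the comma and the first period, e.g. "Mr" or "the Countess"
    out["Title"] = (
        out["Name"]
        .str.extract(r",\s*([^.]+)\.", expand=False)
        .str.strip()
    )
    return out


# Names taken from the head() output above
sample = pd.DataFrame({
    "Name": [
        "Braund, Mr. Owen Harris",
        "Heikkinen, Miss. Laina",
        "Futrelle, Mrs. Jacques Heath (Lily May Peel)",
        "Allen, Mr. William Henry",
    ]
})
titled = add_title(sample)
print(titled["Title"].value_counts())
```

Unlike the `split`-based function, `str.extract` yields `NaN` instead of raising an error when a name does not match the expected `Last, Title. First` pattern. Calling the same helper on `test` would give both frames a `Title` column produced by identical logic.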
