# Machine Learning Final Exam (UAS) - Titanic Survival Prediction
**Name:** [Mohamad Taufik Wibowo]<br>
**Student ID (NIM):** [231011400164]<br>
**Class:** [05 TPLE 004]

This notebook walks through the steps of the Machine Learning final exam (UAS): data exploration, preprocessing, modeling with a Decision Tree, and evaluation.

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree, export_text
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.preprocessing import LabelEncoder

%matplotlib inline

## 1. Load Data & Brief EDA

In [None]:
# Load the Titanic dataset that ships with Seaborn
df = sns.load_dataset('titanic')

print("Shape:", df.shape)
display(df.head())

# Save to CSV so a physical copy of the data exists
df.to_csv('titanic.csv', index=False)

# Check for missing values
print("\nMissing value counts:")
print(df.isnull().sum())
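
Raw missing-value counts are easier to act on when shown as a share of all rows, since that share is what usually decides whether a column is dropped or imputed. The following is a minimal sketch on a small hand-made frame; `missing_summary` and `demo_missing` are hypothetical names used only for illustration, and the values are not taken from the Titanic data.

```python
import numpy as np
import pandas as pd


def missing_summary(frame):
    """Return count and percentage of missing values for columns that have any."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(1),
    })
    # Keep only columns with at least one missing value, worst first
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Hypothetical toy data, for illustration only
demo_missing = pd.DataFrame({
    "age": [22.0, np.nan, 26.0, np.nan],
    "deck": [np.nan, "C", np.nan, np.nan],
    "fare": [7.25, 71.28, 7.92, 53.10],
})

demo_summary = missing_summary(demo_missing)
print(demo_summary)
```

The same helper could be called as `missing_summary(df)` on the loaded dataset.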

## 2. Data Preprocessing

In [None]:
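# Drop columns that are redundant or have too many missing values:
# 'deck' is mostly missing; 'embark_town', 'class', 'who', and 'adult_male'
# repeat information held in other columns; 'alive' is the target as text.
cols_to_drop = ['deck', 'embark_town', 'alive', 'who', 'adult_male', 'class']
df_clean = df.drop(columns=cols_to_drop)

# Fill missing values:
# age with the median, embarked with the mode
df_clean['age'] = df_clean['age'].fillna(df_clean['age'].median())
df_clean['embarked'] = df_clean['embarked'].fillna(df_clean['embarked'].mode()[0])

# Encoding (convert text categories to integers).
# One encoder per column, so each fitted mapping stays available afterwards.
sex_encoder = LabelEncoder()
embarked_encoder = LabelEncoder()
df_clean['sex'] = sex_encoder.fit_transform(df_clean['sex'])
df_clean['embarked'] = embarked_encoder.fit_transform(df_clean['embarked'])

print("Data after cleaning:")
display(df_clean.head())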
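
After encoding, the integers in `sex` and `embarked` are only meaningful if the mapping back to the original labels is known. `LabelEncoder` assigns codes in sorted label order and exposes the labels through `classes_`. The sketch below shows one way to recover that mapping on a small hand-made frame; `demo_encoded`, `demo_encoders`, and `demo_mappings` are hypothetical names, not part of the notebook.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data with the same two text columns as the notebook
demo_encoded = pd.DataFrame({
    "sex": ["male", "female", "female"],
    "embarked": ["S", "C", "Q"],
})

# Fit a separate encoder per column so every mapping is preserved
demo_encoders = {}
for col in ["sex", "embarked"]:
    encoder = LabelEncoder()
    demo_encoded[col] = encoder.fit_transform(demo_encoded[col])
    demo_encoders[col] = encoder

# classes_ is sorted; the position of each label is its integer code
demo_mappings = {
    col: {str(label): code for code, label in enumerate(encoder.classes_)}
    for col, encoder in demo_encoders.items()
}
print(demo_mappings)
```

Knowing the mapping makes the tree's split rules readable: a threshold on `sex` separates the label coded 0 from the label coded 1.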

## 3. Modeling (Decision Tree)

In [None]:
# Split the data into features and target
X = df_clean.drop('survived', axis=1)
y = df_clean['survived']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Save the training and test sets to files (optional, as evidence)
train_data = pd.concat([X_train, y_train], axis=1)
test_data = pd.concat([X_test, y_test], axis=1)
train_data.to_csv('titanic_train.csv', index=False)
test_data.to_csv('titanic_test.csv', index=False)
print("Files 'titanic_train.csv' and 'titanic_test.csv' were created!")

# Train the model
# max_depth=3 keeps the tree small enough to read
dt_model = DecisionTreeClassifier(criterion='gini', max_depth=3, random_state=42)
dt_model.fit(X_train, y_train)

print("Model trained successfully!")
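
The notebook fixes `max_depth=3` for readability. One way to check how much accuracy that choice costs is to compare a few depths with cross-validation. The sketch below does this on synthetic data from `make_classification`, used here only as a stand-in so the example runs without the Titanic frame; `X_demo`, `y_demo`, and `depth_scores` are hypothetical names, and the resulting scores say nothing about the Titanic model.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: not the Titanic dataset)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=42
)

# Mean 5-fold accuracy for several tree depths; None means unlimited depth
depth_scores = {}
for depth in [1, 3, 5, None]:
    model = DecisionTreeClassifier(criterion="gini", max_depth=depth, random_state=42)
    scores = cross_val_score(model, X_demo, y_demo, cv=5, scoring="accuracy")
    depth_scores[depth] = scores.mean()

for depth, score in depth_scores.items():
    print(f"max_depth={depth}: mean accuracy = {score:.3f}")
```

On the real data the same loop could be run with `X_train` and `y_train` in place of the synthetic arrays, leaving the test set untouched.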

## 4. Evaluation & Visualization

In [None]:
y_pred = dt_model.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nConfusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))
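
The accuracy and the per-class figures in the classification report all derive from the four cells of the confusion matrix. The sketch below works this out by hand on a small set of made-up labels (`demo_true` and `demo_pred` are hypothetical and unrelated to the model's predictions) and compares the results with scikit-learn's own metric functions.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Hypothetical labels, for illustration only (1 = survived, 0 = not survived)
demo_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
demo_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

# For binary labels, ravel() yields the cells in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(demo_true, demo_pred).ravel()

manual_accuracy = (tp + tn) / (tp + tn + fp + fn)
manual_precision = tp / (tp + fp)  # of predicted survivors, how many survived
manual_recall = tp / (tp + fn)     # of actual survivors, how many were found

print("accuracy :", manual_accuracy, accuracy_score(demo_true, demo_pred))
print("precision:", manual_precision, precision_score(demo_true, demo_pred))
print("recall   :", manual_recall, recall_score(demo_true, demo_pred))
```

In the matrix printed by `confusion_matrix`, rows are the true classes and columns are the predicted classes.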

In [None]:
# Visualize the decision tree
# feature_names is passed as a plain list, which every scikit-learn version accepts
plt.figure(figsize=(20,10))
plot_tree(dt_model, feature_names=list(X.columns), class_names=['Not Survived', 'Survived'], filled=True, rounded=True)
plt.show()
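
The notebook imports `export_text` but does not call it. It prints the same tree as indented text rules, which can be easier to quote in a report than a figure. The sketch below fits a tiny tree on a hand-made frame so that it runs on its own; `demo_tree_data` and `demo_rules` are hypothetical names and the rows are invented for illustration.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical toy data: survival is fully determined by the encoded 'sex' column
demo_tree_data = pd.DataFrame({
    "sex": [0, 0, 0, 0, 1, 1, 1, 1],
    "pclass": [1, 2, 3, 3, 1, 2, 3, 3],
    "survived": [1, 1, 1, 1, 0, 0, 0, 0],
})

demo_features = demo_tree_data[["sex", "pclass"]]
demo_target = demo_tree_data["survived"]

demo_tree = DecisionTreeClassifier(max_depth=2, random_state=42)
demo_tree.fit(demo_features, demo_target)

# Text rendering of the fitted tree, one line per split or leaf
demo_rules = export_text(demo_tree, feature_names=list(demo_features.columns))
print(demo_rules)
```

For the notebook's model, the equivalent call would be `export_text(dt_model, feature_names=list(X.columns))`.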