# Titanic Survival Prediction

This notebook demonstrates how to predict survival on the Titanic using a Random Forest Classifier. We will load the dataset, clean and preprocess the data, encode categorical variables, select features, train a model, and evaluate its performance.

## 1. Import Libraries and Load Dataset

Import pandas and the required scikit-learn modules, then load the Titanic dataset from a CSV file.

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.preprocessing import LabelEncoder

# Load the dataset
df = pd.read_csv('Titanic-Dataset.csv')
df.head()

## 2. Data Cleaning and Preprocessing

Fill missing values in 'Age' and 'Fare' with the column median, and in 'Embarked' with the mode (the most frequent value).

In [None]:
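# Fill missing Age values with the median.
# Assigning the result back avoids the chained inplace=True pattern,
# which is deprecated in recent pandas and may not modify df.
df['Age'] = df['Age'].fillna(df['Age'].median())
# Fill missing Embarked values with the mode (most frequent port)
df['Embarked'] = df['Embarked'].fillna(df['Embarked'].mode()[0])
# Fill missing Fare values with the median, if the column is present
if 'Fare' in df.columns:
    df['Fare'] = df['Fare'].fillna(df['Fare'].median())
# Count the missing values that remain in each column
df.isnull().sum()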
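
To make the imputation step concrete, here is a minimal sketch on a tiny hand-made frame. The `toy` DataFrame and its values are invented for illustration and are not taken from the Titanic CSV. It shows that `median()` and `mode()` skip missing values, and it assigns the filled column back to the frame.

```python
import numpy as np
import pandas as pd

# Toy data, made up for illustration (not from the Titanic CSV)
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 35.0, 80.0],
    "Embarked": ["S", "C", None, "S"],
})

# median() and mode() ignore missing values by default
age_median = toy["Age"].median()            # median of 22, 35, 80
embarked_mode = toy["Embarked"].mode()[0]   # most frequent non-missing value

# Assign the filled column back to the frame
toy["Age"] = toy["Age"].fillna(age_median)
toy["Embarked"] = toy["Embarked"].fillna(embarked_mode)

print(toy)
print("Remaining missing values:", int(toy.isnull().sum().sum()))
```

The median is less sensitive to outliers than the mean, which is one common reason to prefer it for skewed columns such as age or fare.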

## 3. Encode Categorical Variables

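Use LabelEncoder to convert the 'Sex' and 'Embarked' columns to integer codes. Each fitted encoder is stored in a dictionary so the codes can be mapped back to the original labels later.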

In [None]:
# LabelEncoder was already imported in section 1.
# Keep each fitted encoder so the integer codes can be mapped back to labels.
label_encoders = {}
for col in ['Sex', 'Embarked']:
    le = LabelEncoder()
    df[col] = le.fit_transform(df[col])
    label_encoders[col] = le
df[['Sex', 'Embarked']].head()
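
The integer assigned to each category is not obvious from the encoded column alone. The sketch below, which uses a small made-up `sex` Series rather than the real dataset, shows how to read the mapping from the fitted encoder's `classes_` attribute and how to reverse it with `inverse_transform`.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy values, made up for illustration
sex = pd.Series(["male", "female", "female", "male"])

le = LabelEncoder()
codes = le.fit_transform(sex)

# classes_ is sorted; the position of each class is its integer code
mapping = {label: int(code) for code, label in enumerate(le.classes_)}
print(mapping)  # {'female': 0, 'male': 1}

# inverse_transform recovers the original labels from the codes
restored = le.inverse_transform(codes)
print(list(restored))
```

Note that these codes impose an arbitrary order on the categories. Tree-based models such as random forests tolerate this reasonably well; for linear models, one-hot encoding is usually the safer choice.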

## 4. Feature Selection

Select relevant features for model training and define the target variable.

In [None]:
features = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Embarked']
target = 'Survived'

X = df[features]
y = df[target]
X.head()

## 5. Split Data into Training and Test Sets

Split the dataset into training (80%) and test (20%) sets using train_test_split, with a fixed random_state so the split is reproducible.

In [None]:
# train_test_split was already imported in section 1.
# Hold out 20% of the rows for testing; random_state fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print('Train shape:', X_train.shape)
print('Test shape:', X_test.shape)
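
When the classes are imbalanced, a plain random split can leave the training and test sets with different class proportions. One optional variation, not used in this notebook, is to pass `stratify=y` to `train_test_split`. The sketch below uses synthetic data (`X_toy`, `y_toy` are made up) to show that the positive rate is preserved in both parts.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic data for illustration: 100 rows, 30% positives
X_toy = pd.DataFrame({"feature": range(100)})
y_toy = pd.Series([1] * 30 + [0] * 70)

# stratify=y_toy keeps the class ratio the same in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

print("Train positive rate:", y_tr.mean())
print("Test positive rate:", y_te.mean())
```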

## 6. Train Random Forest Classifier

Initialize and train a RandomForestClassifier on the training data.

In [None]:
# RandomForestClassifier was already imported in section 1.
# Build a forest of 100 trees; random_state makes the fit reproducible.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
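
A fitted random forest exposes `feature_importances_`, which can help check which inputs the model relies on. The following is a minimal sketch on synthetic data: the columns `signal` and `noise` are hypothetical and constructed so that only `signal` determines the label. The same pattern should apply to a model fitted on the Titanic features.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic data for illustration: the label depends only on "signal"
rng = np.random.default_rng(0)
X_toy = pd.DataFrame({
    "signal": rng.normal(size=200),
    "noise": rng.normal(size=200),
})
y_toy = (X_toy["signal"] > 0).astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_toy, y_toy)

# Importances are impurity-based and sum to 1 across features
importances = pd.Series(clf.feature_importances_, index=X_toy.columns)
print(importances.sort_values(ascending=False))
```

Impurity-based importances can favor high-cardinality features, so they are best read as a rough guide rather than a definitive ranking.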

## 7. Model Evaluation

Predict on the test set and evaluate the model using accuracy_score and classification_report.

In [None]:
# accuracy_score and classification_report were already imported in section 1.
# Predict survival for the held-out passengers.
y_pred = model.predict(X_test)
print('Accuracy:', accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
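
Accuracy alone does not show which kind of error the model makes. As an optional complement, a confusion matrix breaks predictions down into true and false positives and negatives. The sketch below uses short hand-written label lists (`y_true` and `y_hat` are made up, not model output) to show how to read it.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hand-written labels for illustration (not real model output)
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_hat = [0, 0, 1, 1, 1, 0, 1, 0]

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_hat)
tn, fp, fn, tp = (int(v) for v in cm.ravel())

print(cm)
print("TN:", tn, "FP:", fp, "FN:", fn, "TP:", tp)
print("Accuracy:", accuracy_score(y_true, y_hat))
```

In the notebook, the same call would take `y_test` and `y_pred` as arguments.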