**Write a program that takes Titanic data and uses machine-learning algorithms to predict whether or not a person will survive.**

**The program should be able to handle preprocessing of the data, such as cleaning up missing values and creating new features if required.**

**Your task is to choose multiple machine learning algorithms and compare their accuracy in predicting survival rates. You can use metrics such as accuracy, precision, recall, or F1 score to evaluate the performance of each model.**
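
The four metrics named above all come from the same counts of true and false positives and negatives. The sketch below uses a small set of invented labels (not Titanic data) to show how each metric is computed by hand and that the result matches the scikit-learn functions.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Toy labels invented for illustration only (1 = survived, 0 = did not survive)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Count the four outcome types
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # true positives
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))  # true negatives
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false positives
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # false negatives

accuracy_manual = (tp + tn) / len(y_true)   # share of all predictions that are correct
precision_manual = tp / (tp + fp)           # share of predicted survivors who survived
recall_manual = tp / (tp + fn)              # share of actual survivors who were found
f1_manual = 2 * precision_manual * recall_manual / (precision_manual + recall_manual)

print(accuracy_manual, accuracy_score(y_true, y_pred))
print(precision_manual, precision_score(y_true, y_pred))
print(recall_manual, recall_score(y_true, y_pred))
print(f1_manual, f1_score(y_true, y_pred))
```

Precision and recall both focus on the positive class (survivors), so they can diverge from accuracy when the classes are imbalanced.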

In [None]:
# Loading the required libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

In [None]:
# Mounting Google Drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
# Loading the dataset
df = pd.read_csv('/content/drive/My Drive/Colab Notebooks/titanic.csv')

In [None]:
# Preview the first five rows
df.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


In [None]:
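# Data preprocessing
# Handling missing values: fill gaps in Age and Fare with each column's median.
# Note: the imputer is fitted on the full dataset here, before the train/test split,
# so the test rows contribute to the medians.
imputer = SimpleImputer(strategy='median')
df[['Age', 'Fare']] = imputer.fit_transform(df[['Age', 'Fare']])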
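
The cell above fits the imputer on every row, so rows that later land in the test set influence the medians. A stricter variant, sketched here on a small invented frame (the `toy` data and the `toy_train`/`toy_test` names are assumptions, not part of the notebook), first counts the missing values, then fits the imputer on the training rows only and reuses those medians for the test rows.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split

# Invented stand-in for the Age and Fare columns, with a few gaps
toy = pd.DataFrame({
    'Age': [22.0, np.nan, 26.0, 35.0, np.nan, 54.0, 2.0, 27.0],
    'Fare': [7.25, 71.28, np.nan, 53.10, 8.05, 51.86, 21.08, 11.13],
})

# Count missing values per column before deciding how to fill them
print(toy.isnull().sum())

# Split first, then learn the medians from the training rows only
toy_train, toy_test = train_test_split(toy, test_size=0.25, random_state=42)
toy_train, toy_test = toy_train.copy(), toy_test.copy()

toy_imputer = SimpleImputer(strategy='median')
toy_train[['Age', 'Fare']] = toy_imputer.fit_transform(toy_train[['Age', 'Fare']])
toy_test[['Age', 'Fare']] = toy_imputer.transform(toy_test[['Age', 'Fare']])

print(toy_imputer.statistics_)  # medians learned from the training rows
```

On a dataset of this size the difference in the medians is usually small, but keeping the test rows out of every fitted step makes the reported scores easier to trust.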

In [None]:
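# Creating new features
# FamilySize = siblings/spouses aboard (SibSp) + parents/children aboard (Parch);
# the passenger is not counted, so 0 means travelling alone.
df['FamilySize'] = df['SibSp'] + df['Parch']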
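
Because `FamilySize` here excludes the passenger, a value of 0 identifies someone travelling alone. As an optional extension, the sketch below derives a hypothetical `IsAlone` flag on a small invented frame; this feature is an illustration only and is not used by the models in this notebook.

```python
import pandas as pd

# Invented rows standing in for the SibSp and Parch columns
toy = pd.DataFrame({'SibSp': [1, 0, 3, 0], 'Parch': [0, 0, 2, 1]})

# Same definition as in the notebook: relatives aboard, passenger excluded
toy['FamilySize'] = toy['SibSp'] + toy['Parch']

# Hypothetical extra feature: 1 if the passenger has no relatives aboard
toy['IsAlone'] = (toy['FamilySize'] == 0).astype(int)

print(toy)
```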

In [None]:
# Feature selection
features = ['Pclass', 'Sex', 'Age', 'Fare', 'FamilySize']
X = df[features]
y = df['Survived']

In [None]:
# One-hot encoding for categorical variables
X = pd.get_dummies(X, columns=['Sex'], drop_first=True)
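
To see what this encoding produces, the sketch below applies the same call to a four-row invented frame. With `drop_first=True`, the first category in sorted order (`female`) is dropped and a single `Sex_male` indicator column remains, which is enough to represent a two-valued feature.

```python
import pandas as pd

# Invented rows with the same column names as the notebook's feature matrix
toy_X = pd.DataFrame({
    'Pclass': [3, 1, 3, 1],
    'Sex': ['male', 'female', 'female', 'male'],
})

# Same call as above: one indicator column per category, minus the first
encoded = pd.get_dummies(toy_X, columns=['Sex'], drop_first=True)

print(encoded.columns.tolist())
print(encoded)
```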

In [None]:
# Splitting the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
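
The split above is purely random, so the share of survivors in the training and test sets can drift apart. `train_test_split` accepts a `stratify` argument that keeps the class proportions the same in both parts. A minimal sketch on invented labels (40% positive, chosen only for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Invented data: 100 rows, 40 positives and 60 negatives
X_toy = np.arange(100).reshape(-1, 1)
y_toy = np.array([1] * 40 + [0] * 60)

# stratify=y_toy keeps the 40/60 class ratio in both the training and test parts
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

print(f"Positive share in train: {y_tr.mean():.2f}, in test: {y_te.mean():.2f}")
```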

In [None]:
# Feature scaling: fit the scaler on the training set only, then apply it to the test set
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
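
An alternative to managing the scaler by hand is scikit-learn's `Pipeline`, which fits each preprocessing step on the training data and applies it automatically at prediction time. The sketch below uses synthetic data from `make_classification` as a stand-in for the Titanic features; the variable names are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 200 rows, 5 numeric features, binary target
X_syn, y_syn = make_classification(n_samples=200, n_features=5, random_state=42)
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X_syn, y_syn, test_size=0.2, random_state=42
)

# The scaler is fitted inside fit() on the training rows only
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('model', LogisticRegression(random_state=42)),
])
pipe.fit(Xs_train, ys_train)

# score() scales the test rows with the training statistics, then predicts
pipe_accuracy = pipe.score(Xs_test, ys_test)
print(f"Accuracy: {pipe_accuracy:.4f}")
```

Bundling the steps this way also means the whole object can be passed to cross-validation utilities without leaking test-fold statistics into the scaler.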

In [None]:
# Initialize the models to compare (training happens in the next cell)
models = {
    'Decision Tree': DecisionTreeClassifier(random_state=42),
    'Random Forest': RandomForestClassifier(random_state=42),
    'SVM': SVC(random_state=42),
    'Logistic Regression': LogisticRegression(random_state=42)
}

In [None]:
# Train each model on the scaled training data and evaluate it on the test set
for name, model in models.items():
    model.fit(X_train_scaled, y_train)
    y_pred = model.predict(X_test_scaled)

    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred)
    recall = recall_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred)
    print(f"Model: {name}")
    print(f"Accuracy: {accuracy:.4f}, Precision: {precision:.4f}, Recall: {recall:.4f}, F1 Score: {f1:.4f}\n")

Model: Decision Tree
Accuracy: 0.7486, Precision: 0.6790, Recall: 0.7432, F1 Score: 0.7097

Model: Random Forest
Accuracy: 0.8045, Precision: 0.7746, Recall: 0.7432, F1 Score: 0.7586

Model: SVM
Accuracy: 0.8101, Precision: 0.8030, Recall: 0.7162, F1 Score: 0.7571

Model: Logistic Regression
Accuracy: 0.8045, Precision: 0.8000, Recall: 0.7027, F1 Score: 0.7482
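
The accuracies of Random Forest, SVM, and Logistic Regression above lie within one percentage point of each other, and on a single 20% test split a gap that small may reflect only a few rows. One way to get a steadier comparison is k-fold cross-validation. The sketch below runs 5-fold cross-validation on synthetic data from `make_classification` (a stand-in, not the Titanic data), so the printed scores say nothing about the models' performance on the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: 300 rows, 5 numeric features, binary target
X_syn, y_syn = make_classification(n_samples=300, n_features=5, random_state=42)

candidates = {
    'Decision Tree': DecisionTreeClassifier(random_state=42),
    'Random Forest': RandomForestClassifier(random_state=42),
    'SVM': SVC(random_state=42),
    'Logistic Regression': LogisticRegression(random_state=42),
}

cv_results = {}
for name, model in candidates.items():
    # Scaling lives inside the pipeline, so it is refitted on each training fold
    pipe = make_pipeline(StandardScaler(), model)
    scores = cross_val_score(pipe, X_syn, y_syn, cv=5, scoring='f1')
    cv_results[name] = (scores.mean(), scores.std())
    print(f"{name}: F1 = {scores.mean():.4f} (+/- {scores.std():.4f})")
```

Reporting the standard deviation alongside the mean gives a rough sense of whether a difference between two models is larger than the fold-to-fold variation.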

