# Titanic Survival Prediction Project


### Overview

- Use the Titanic dataset to build a model that predicts whether a passenger survived the disaster.
- This classic project explores how survival relates to passenger features such as age, gender, and ticket class.
- It is well suited to introductory learning in data analysis and machine learning. 📊💻

**Titanic Survival Prediction Dataset Column Descriptions:**

1. **PassengerId**: Unique identifier for each passenger.
2. **Survived**: Survival status (0 = No, 1 = Yes).
3. **Pclass**: Passenger class (1 = 1st, 2 = 2nd, 3 = 3rd).
4. **Name**: Full name of the passenger.
5. **Sex**: Gender of the passenger.
6. **Age**: Age of the passenger.
7. **SibSp**: Number of siblings or spouses aboard the Titanic.
8. **Parch**: Number of parents or children aboard the Titanic.
9. **Ticket**: Ticket number.
10. **Fare**: Fare paid by the passenger.
11. **Cabin**: Cabin number.
12. **Embarked**: Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton).

### 1. Import Libraries

In [16]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix


### 2. Load the Dataset

In [17]:
# Load the dataset
df = pd.read_csv('Titanic dataset.csv')

# Display the first few rows of the dataframe
print(df.head())


   PassengerId  Survived  Pclass  \
0          892         0       3   
1          893         1       3   
2          894         0       2   
3          895         0       3   
4          896         1       3   

                                           Name     Sex   Age  SibSp  Parch  \
0                              Kelly, Mr. James    male  34.5      0      0   
1              Wilkes, Mrs. James (Ellen Needs)  female  47.0      1      0   
2                     Myles, Mr. Thomas Francis    male  62.0      0      0   
3                              Wirz, Mr. Albert    male  27.0      0      0   
4  Hirvonen, Mrs. Alexander (Helga E Lindqvist)  female  22.0      1      1   

    Ticket     Fare Cabin Embarked  
0   330911   7.8292   NaN        Q  
1   363272   7.0000   NaN        S  
2   240276   9.6875   NaN        Q  
3   315154   8.6625   NaN        S  
4  3101298  12.2875   NaN        S  
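
Before preprocessing, it can help to see which columns actually contain missing values. The sketch below is a minimal illustration on a hypothetical stand-in frame (`sample`) built from the five rows printed above, with a few columns omitted for brevity; on the real data the same two calls would be applied to `df`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the loaded file: a subset of the five rows shown above.
sample = pd.DataFrame({
    "PassengerId": [892, 893, 894, 895, 896],
    "Survived": [0, 1, 0, 0, 1],
    "Pclass": [3, 3, 2, 3, 3],
    "Sex": ["male", "female", "male", "male", "female"],
    "Age": [34.5, 47.0, 62.0, 27.0, 22.0],
    "Fare": [7.8292, 7.0, 9.6875, 8.6625, 12.2875],
    "Cabin": [np.nan] * 5,
    "Embarked": ["Q", "S", "Q", "S", "S"],
})

# Number of missing values per column, and the share of rows that are missing.
missing_counts = sample.isna().sum()
missing_share = sample.isna().mean()

print(sample.shape)
print(missing_counts[missing_counts > 0])
```

In these five rows only `Cabin` is missing, which is consistent with dropping it in the next step. Whether `Age`, `Fare`, or `Embarked` have gaps in the full file is something the same check on `df` would show.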


### 3. Preprocess the Data

In [18]:
# Drop columns that are not needed
df = df.drop(columns=['PassengerId', 'Name', 'Ticket', 'Cabin'])

# Handle missing values for 'Age' and 'Fare'
imputer = SimpleImputer(strategy='mean')
df[['Age', 'Fare']] = imputer.fit_transform(df[['Age', 'Fare']])

# Fill missing values for 'Embarked' with the most frequent value
df['Embarked'] = df['Embarked'].fillna(df['Embarked'].mode()[0])

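# Encode categorical variables as integers.
# LabelEncoder assigns codes in sorted order (female = 0, male = 1; C = 0, Q = 1, S = 2).
# The same encoder is refit for each column, so afterwards it only retains the 'Embarked' mapping.
label_encoder = LabelEncoder()
df['Sex'] = label_encoder.fit_transform(df['Sex'])
df['Embarked'] = label_encoder.fit_transform(df['Embarked'])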
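
To make the encoding step concrete, the following sketch shows which integer each category receives. It uses small hypothetical series rather than the real columns, and it keeps one encoder per column (`sex_encoder`, `embarked_encoder`, both names chosen for this illustration) so that each mapping can still be inverted later.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample values; "C" is appended so all three ports appear.
sex = pd.Series(["male", "female", "male", "male", "female"])
embarked = pd.Series(["Q", "S", "Q", "S", "S", "C"])

# One encoder per column keeps both mappings available after fitting.
sex_encoder = LabelEncoder()
embarked_encoder = LabelEncoder()
sex_codes = sex_encoder.fit_transform(sex)
embarked_codes = embarked_encoder.fit_transform(embarked)

# classes_ is sorted, so the position of each label is its integer code.
sex_mapping = {str(label): code for code, label in enumerate(sex_encoder.classes_)}
embarked_mapping = {str(label): code for code, label in enumerate(embarked_encoder.classes_)}

print(sex_mapping)       # {'female': 0, 'male': 1}
print(embarked_mapping)  # {'C': 0, 'Q': 1, 'S': 2}
```

Note that the integer codes for `Embarked` imply an ordering (C < Q < S) that has no real meaning. Tree-based models usually cope with this, but one-hot encoding is a common alternative for other model types.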


### 4. Split the Data

In [4]:
# Define features and target variable
X = df.drop(columns=['Survived'])
y = df['Survived']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
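
An optional variation on the split above is to pass `stratify=y`, which keeps the class proportions of the target the same in the training and test sets. The sketch below illustrates this on synthetic data (`X_demo`, `y_demo` are hypothetical and not drawn from the Titanic file); it is not what the notebook ran.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 420 rows, 250 of class 0 and 170 of class 1.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(420, 3))
y_demo = np.array([0] * 250 + [1] * 170)

# stratify=y_demo preserves the class balance in both partitions.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(len(y_tr), len(y_te))
print(y_demo.mean(), y_tr.mean(), y_te.mean())
```

With stratification, the share of positive labels in the test set stays close to the share in the full data, which makes metrics from a small test set a little more stable.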


### 5. Standardize the Features

In [19]:
# Standardize the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
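
The pattern above (fit on the training set, then only transform the test set) means the test data is scaled with the training set's mean and standard deviation. A small sketch on synthetic arrays (`train` and `test` are hypothetical) shows the effect:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in features, e.g. two numeric columns.
rng = np.random.default_rng(0)
train = rng.normal(loc=30.0, scale=12.0, size=(200, 2))
test = rng.normal(loc=30.0, scale=12.0, size=(50, 2))

scaler_demo = StandardScaler()
train_scaled = scaler_demo.fit_transform(train)  # learns mean_ and scale_ from train only
test_scaled = scaler_demo.transform(test)        # reuses the training statistics

# Training columns end up with mean 0 and standard deviation 1.
print(train_scaled.mean(axis=0).round(6), train_scaled.std(axis=0).round(6))
# Test columns are close to, but not exactly, 0 and 1.
print(test_scaled.mean(axis=0).round(3), test_scaled.std(axis=0).round(3))
```

Random forests split on feature thresholds, so they are generally insensitive to this kind of rescaling. The step does no harm here and becomes important if the model is swapped for a scale-sensitive one such as logistic regression or k-nearest neighbours.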


### 6. Train the Model

In [20]:
# Initialize the Random Forest Classifier
model = RandomForestClassifier(n_estimators=100, random_state=42)

# Train the model
model.fit(X_train, y_train)
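
Once a forest is fitted, its `feature_importances_` attribute gives a rough view of which features drive the predictions. The sketch below uses synthetic data in which the label is fully determined by a hypothetical `Sex` column, purely to show how the attribute is read. In the notebook itself, `X_train` is a NumPy array after scaling, so the column names would have to come from `X.columns`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in features; the target depends only on "Sex".
rng = np.random.default_rng(42)
n = 300
features = pd.DataFrame({
    "Sex": rng.integers(0, 2, size=n),
    "Age": rng.uniform(1, 80, size=n),
    "Fare": rng.uniform(5, 100, size=n),
})
target = (features["Sex"] == 0).astype(int)

forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(features, target)

# Impurity-based importances, one per feature, normalised to sum to 1.
importances = pd.Series(
    forest.feature_importances_, index=features.columns
).sort_values(ascending=False)
print(importances)
```

A single feature with an importance close to 1 is a hint that the label can be reconstructed almost entirely from that feature, which is worth knowing before interpreting accuracy figures.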


### 7. Evaluate the Model

In [21]:
# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')

# Print classification report
report = classification_report(y_test, y_pred)
print(f'Classification Report:\n{report}')

# Print confusion matrix
conf_matrix = confusion_matrix(y_test, y_pred)
print(f'Confusion Matrix:\n{conf_matrix}')


Accuracy: 1.0
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        50
           1       1.00      1.00      1.00        34

    accuracy                           1.00        84
   macro avg       1.00      1.00      1.00        84
weighted avg       1.00      1.00      1.00        84

Confusion Matrix:
[[50  0]
 [ 0 34]]
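
A perfect score on every metric is worth a sanity check. In the five rows printed in section 2, `Survived` is 1 for each female passenger and 0 for each male passenger. The sketch below cross-tabulates the two columns for those five rows only and scores a rule that predicts survival from sex alone. Running the same check on the full `df` (before `Sex` is label-encoded) would show whether the pattern holds for the whole file, which is not something the five rows can establish.

```python
import pandas as pd

# The Sex and Survived values from the five rows shown in section 2.
rows = pd.DataFrame({
    "Sex": ["male", "female", "male", "male", "female"],
    "Survived": [0, 1, 0, 0, 1],
})

# Counts of each (Sex, Survived) combination.
table = pd.crosstab(rows["Sex"], rows["Survived"])
print(table)

# Baseline rule: predict survival if and only if the passenger is female.
sex_only_pred = (rows["Sex"] == "female").astype(int)
sex_only_accuracy = (sex_only_pred == rows["Survived"]).mean()
print(sex_only_accuracy)  # 1.0 on these five rows
```

If the off-diagonal cells are also zero on the full data, the model is effectively learning the `Sex` column, and the accuracy says little about the other features.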


### Conclusion 🚢✨

Using the Titanic dataset, we trained a Random Forest classifier that predicts passenger survival from class, sex, age, family counts, fare, and port of embarkation. On the 84-passenger test split it reached an accuracy of 1.0, with precision, recall, and F1-score of 1.00 for both classes. A perfect score is unusual for this problem and should be read with care: in the five sample rows printed in section 2, `Survived` is 1 for every female passenger and 0 for every male passenger, so the labels in this file may be derived from `Sex` rather than from recorded outcomes. With that caveat, the project remains a useful introductory exercise in loading, cleaning, splitting, and modelling tabular data.