Titanic Survival Prediction Model

Objective

The objective of this project was to build a model that predicts, from the Titanic dataset, whether a given passenger survived. This dataset is a common starting point for data science and machine learning projects due to its simplicity and the availability of relevant features.

Dataset Description

The Titanic dataset contains information about individual passengers, including the following features:

Pclass: Passenger class (1 = 1st, 2 = 2nd, 3 = 3rd)
Sex: Gender of the passenger
Age: Age of the passenger
SibSp: Number of siblings or spouses aboard the Titanic
Parch: Number of parents or children aboard the Titanic
Ticket: Ticket number
Fare: Passenger fare
Cabin: Cabin number
Embarked: Port of embarkation (C = Cherbourg; Q = Queenstown; S = Southampton)
Survived: Survival status (0 = No, 1 = Yes) [Target variable]

Data Preprocessing

Before building the models, the following preprocessing steps were undertaken:

Handling Missing Values: Missing values in the Age, Cabin, and Embarked columns were addressed.
- Age was imputed using the median age.
- Cabin information was dropped due to a large number of missing values.
- Missing values in Embarked were filled with the most common port (S).
Feature Encoding: Categorical variables (Sex, Embarked) were converted into numerical values using one-hot encoding.
Feature Scaling: Continuous variables (Age, Fare) were standardized to have a mean of 0 and a standard deviation of 1.
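
The three preprocessing steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's actual code: the toy rows and the `preprocess` helper are hypothetical, and the standard library is used only so that each step stays visible. Standardization here divides by the population standard deviation.

```python
from statistics import mean, median, mode, pstdev

# Hypothetical toy rows using the dataset's column names; None marks a missing value.
rows = [
    {"Sex": "male", "Age": 22.0, "Fare": 7.25, "Cabin": None, "Embarked": "S"},
    {"Sex": "female", "Age": 38.0, "Fare": 71.28, "Cabin": "C85", "Embarked": "C"},
    {"Sex": "female", "Age": None, "Fare": 7.92, "Cabin": None, "Embarked": "S"},
    {"Sex": "male", "Age": 35.0, "Fare": 8.05, "Cabin": None, "Embarked": None},
]


def preprocess(rows):
    """Impute, one-hot encode, and standardize a list of passenger dicts."""
    # Step 1: fill missing Age with the median and missing Embarked with the
    # most common port. Cabin is dropped by simply not copying it.
    ages = [r["Age"] for r in rows if r["Age"] is not None]
    ports = [r["Embarked"] for r in rows if r["Embarked"] is not None]
    age_median, top_port = median(ages), mode(ports)
    filled = [
        {
            "Sex": r["Sex"],
            "Fare": r["Fare"],
            "Age": age_median if r["Age"] is None else r["Age"],
            "Embarked": top_port if r["Embarked"] is None else r["Embarked"],
        }
        for r in rows
    ]

    # Step 3 statistics: mean and population standard deviation per column.
    scale = {}
    for col in ("Age", "Fare"):
        values = [r[col] for r in filled]
        scale[col] = (mean(values), pstdev(values))

    processed = []
    for r in filled:
        # Standardize the continuous columns to mean 0 and standard deviation 1.
        row = {col: (r[col] - mu) / sigma for col, (mu, sigma) in scale.items()}
        # Step 2: one-hot encode Sex and Embarked with fixed category lists.
        for sex in ("female", "male"):
            row[f"Sex_{sex}"] = int(r["Sex"] == sex)
        for port in ("C", "Q", "S"):
            row[f"Embarked_{port}"] = int(r["Embarked"] == port)
        processed.append(row)
    return processed


processed = preprocess(rows)
```

In a real pipeline the imputation and scaling statistics would normally be computed on the training split only and then reused on the test split, so that no information leaks from the held-out data.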

Model Building

Two machine learning models were trained and evaluated: Logistic Regression and Random Forest. Additionally, Randomized Search Cross-Validation was used to tune the hyperparameters of the Random Forest model.
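
As a rough sketch of this setup, the snippet below trains both model types and tunes the Random Forest with scikit-learn's `RandomizedSearchCV`. Everything specific in it is an assumption for illustration: the synthetic data stands in for the preprocessed Titanic features, and the split ratio, search space, `n_iter`, and `cv` values are not taken from the project.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic stand-in for the preprocessed feature matrix and Survived labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=300) > 0).astype(int)

# Hold out a test set; the 80/20 split is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Baseline: Logistic Regression.
log_reg = LogisticRegression(max_iter=1000).fit(X_train, y_train)
log_reg_acc = accuracy_score(y_test, log_reg.predict(X_test))

# Random Forest tuned by randomized search over a hypothetical search space.
param_distributions = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 4, 8],
    "min_samples_split": [2, 5, 10],
}
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions=param_distributions,
    n_iter=5,            # number of sampled hyperparameter combinations
    cv=3,                # 3-fold cross-validation on the training split
    scoring="accuracy",
    random_state=42,
)
search.fit(X_train, y_train)
rf_acc = accuracy_score(y_test, search.best_estimator_.predict(X_test))

print("Logistic Regression accuracy:", log_reg_acc)
print("Tuned Random Forest accuracy:", rf_acc)
print("Best parameters:", search.best_params_)
```

Randomized search samples a fixed number of combinations instead of trying every one, which keeps tuning cheap when the search space is large.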

Logistic Regression

Accuracy: 0.80
Precision, Recall, and F1-Score:

Random Forest

Accuracy: 0.82
Precision, Recall, and F1-Score:

Data Visualisations

Best Model: Random Forest (with Hyperparameter Tuning)

Accuracy: 0.82
Precision, Recall, and F1-Score:

Model Conclusion

The Random Forest model with hyperparameter tuning performed slightly better than the Logistic Regression model, reaching an accuracy of 0.82 against 0.80. Its precision, recall, and F1-score indicate that it predicts survival reasonably well, with higher precision for non-survival (class 0) and a balanced performance for survival (class 1).
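
For readers unfamiliar with the per-class metrics mentioned here, the sketch below shows how precision, recall, and F1-score are computed for one class. The `class_metrics` helper and the label lists are hypothetical and do not reproduce the project's results.

```python
def class_metrics(y_true, y_pred, positive):
    """Return (precision, recall, f1) treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # correct hits
    fp = sum(t != positive and p == positive for t, p in pairs)  # false alarms
    fn = sum(t == positive and p != positive for t, p in pairs)  # misses
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


# Made-up labels: 0 = did not survive, 1 = survived.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
non_survival = class_metrics(y_true, y_pred, positive=0)
survival = class_metrics(y_true, y_pred, positive=1)
```

Reporting the metrics per class matters because the two classes are not equally frequent, so accuracy alone can hide weak performance on the smaller class.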

Final Thoughts

The predictive models built using the Titanic dataset offer valuable insights into the factors that influenced survival on the Titanic. Analysis of the dataset revealed the following key points about survival:

Gender: Women had a significantly higher survival rate compared to men. This is reflected in the model's feature importance, where gender (Sex) was one of the most influential factors.
Passenger Class: Passengers in first class (Pclass = 1) had a higher survival rate compared to those in second and third classes. This indicates that socio-economic status played a crucial role in survival chances.
Age: Younger passengers had a better chance of survival compared to older passengers. Children, in particular, had higher survival rates.
Family Size: Passengers with fewer family members aboard (SibSp and Parch) tended to survive more often than those with larger families.
Embarkation Point: Passengers who embarked from Cherbourg (Embarked = C) had a slightly higher survival rate compared to those who boarded at Queenstown or Southampton.
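
Comparisons like the ones above come down to survival rates per group. A minimal sketch of that calculation follows; the `survival_rate_by` helper and the six passenger rows are hypothetical, so the resulting numbers are not the dataset's actual rates.

```python
from collections import defaultdict


def survival_rate_by(rows, key):
    """Return {group value: fraction of passengers in that group who survived}."""
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for row in rows:
        totals[row[key]] += 1
        survivors[row[key]] += row["Survived"]  # Survived is 0 or 1
    return {group: survivors[group] / totals[group] for group in totals}


# Made-up rows using the dataset's column names.
passengers = [
    {"Sex": "female", "Pclass": 1, "Survived": 1},
    {"Sex": "female", "Pclass": 3, "Survived": 1},
    {"Sex": "female", "Pclass": 3, "Survived": 0},
    {"Sex": "male", "Pclass": 1, "Survived": 1},
    {"Sex": "male", "Pclass": 2, "Survived": 0},
    {"Sex": "male", "Pclass": 3, "Survived": 0},
]

rate_by_sex = survival_rate_by(passengers, "Sex")
rate_by_class = survival_rate_by(passengers, "Pclass")
```

The same function works for any categorical column, for example `Embarked`, by changing the `key` argument.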

In conclusion, the models point to gender, socio-economic status, age, and family size as the key determinants of survival. These findings align with historical accounts of the Titanic disaster. With accuracies of 0.80 to 0.82, the models developed in this project predict survival reasonably well and demonstrate the potential of machine learning in analyzing historical datasets.
