Skip to content

A python script to classify Titanic dataset (Survived and not Survived) by applying different Machine Learning Algorithms have been used such as Logistic Regression, SVM, KNN, Decision Tree and Random Forest.

Notifications You must be signed in to change notification settings

mohammedlajam/Titanic-ML

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

19 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Titanic-ML:

Introduction:

The sinking of the Titanic is one of the most infamous shipwrecks in history. The aim of the project is to predict whether a person can survive or not.

Dataset:

The dataset is downloaded from Kaggle. The downloaded data consist of 2 files (train and test sets). Train dataset consists of many features such as (PassengerId, Age, Sex elc) and one label, which represents if a person is survived or not. Test dataset consists of only the features and it is not labeled. In this project, only the train dataset has been used by splitting the whole rows into train and test dataset, so that the test dataset has labels.

The code:

In this Repository, there are one code file 'code.ipynb' has been scripted using Jupyter Notebook.

Project steps:

1. EDA:

In this section, different EDA methods has been used to observe visualize the nature of the data such as (distrobution, Box plots, missing values outliers etc).

3. Data Preprocessing:

Now we have a clear image of what is required to preprocess the data. by removing the outliers, adjusting the missing data, normalize the distribution of some features as well as creating new features based on existing features. As the dataset were either numerical or categorial, an ecoding step were required to transform the categorial into numerical data, as most of Machine Learning algorithms use only numerical data such as Logistic Regression, SVM and KNN.

4. Building and training the model:

Different models were build to classify the output, which are Logistic Regression, SVM, KNN, Decision Tree, Random Forest and ANN. Next, Prediction method has been applied based on the test set, so that we can compare the accuracy between the different applied algorithms as well as a Cross-Validation technique.

5. XAI (Shap Value):

Until this stage, we have measured the accuracy of each algorithm, but we have no idea what features contributed in the decision process of the algorithm to take such a decision. Here, measuring shap value is applied to understand the black box model of the algorithm and what features contributed the most compared to other features.

About

A python script to classify Titanic dataset (Survived and not Survived) by applying different Machine Learning Algorithms have been used such as Logistic Regression, SVM, KNN, Decision Tree and Random Forest.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published