Exploratory data analysis and feature engineering using Python and Jupyter Notebook.

nihar-phadnis/Titanic-Machine-learning-from-Disaster

Titanic-Machine-learning-from-Disaster

Ahoy, welcome to this repository: machine learning from disaster using Python.

This is a tutorial, written as a Jupyter (Python) notebook, for the Kaggle competition Titanic: Machine Learning from Disaster. The goal of this repository is to practice exploring and visualising the data with matplotlib and seaborn, cleaning it and imputing missing values, and using Python for Kaggle's data science competitions.

Throughout the notebook, the following libraries are used:

  1. pandas.
  2. NumPy.
  3. scikit-learn.
  4. Matplotlib.
  5. seaborn.
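
As a rough idea of what the exploration step looks like with pandas, here is a minimal sketch. It is not taken from the notebook: the small data frame is invented for illustration (only the column names follow Kaggle's `train.csv`), and the helper names `missing_report` and `survival_rate` are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy stand-in for Kaggle's train.csv; the values are invented for illustration.
# With the real data you would load it instead: df = pd.read_csv("train.csv")
df = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0, 0, 0, 1],
    "Pclass":   [3, 1, 3, 1, 3, 3, 1, 2],
    "Sex":      ["male", "female", "female", "female",
                 "male", "male", "male", "female"],
    "Age":      [22.0, 38.0, 26.0, 35.0, np.nan, 54.0, np.nan, 14.0],
    "Embarked": ["S", "C", "S", "S", "Q", "S", np.nan, "C"],
})


def missing_report(frame):
    """Count and share of missing values per column, largest count first."""
    counts = frame.isna().sum()
    report = pd.DataFrame({"missing": counts, "share": counts / len(frame)})
    return report.sort_values("missing", ascending=False)


def survival_rate(frame, by):
    """Mean of the 0/1 Survived column within each group of `by`."""
    return frame.groupby(by)["Survived"].mean()


print(missing_report(df))
print(survival_rate(df, "Sex"))
print(survival_rate(df, "Pclass"))
```

The per-group survival rates are the kind of table that can then be passed to a matplotlib or seaborn bar plot.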

Problem statement, from the Kaggle competition homepage:

The sinking of the RMS Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her maiden voyage, the Titanic sank after colliding with an iceberg, killing 1502 out of 2224 passengers and crew. This sensational tragedy shocked the international community and led to better safety regulations for ships.

One of the reasons that the shipwreck led to such loss of life was that there were not enough lifeboats for the passengers and crew. Although there was some element of luck involved in surviving the sinking, some groups of people were more likely to survive than others, such as women, children, and the upper-class.

In this contest, we ask you to complete the analysis of what sorts of people were likely to survive. In particular, we ask you to apply the tools of machine learning to predict which passengers survived the tragedy.

This Kaggle Getting Started Competition provides an ideal starting place for people who may not have a lot of experience in data science and machine learning.

My goal for this little exercise:

Explore the dataset and give an extensive quantitative report. Impute the missing values and use an appropriate model to predict whether each passenger survived.
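
The impute-then-predict step could be sketched with scikit-learn roughly as follows. This is an illustration under assumptions, not the notebook's actual method: median and most-frequent imputation and logistic regression are choices made here for brevity, and the tiny `train` and `test` frames are invented (only the column names follow Kaggle's data).

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Invented toy rows; with the real data, read Kaggle's train.csv and test.csv.
train = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0, 0, 0, 1, 1, 0],
    "Pclass":   [3, 1, 3, 1, 3, 3, 1, 2, 2, 3],
    "Sex":      ["male", "female", "female", "female", "male",
                 "male", "male", "female", "female", "male"],
    "Age":      [22.0, 38.0, 26.0, 35.0, np.nan, 54.0, np.nan, 14.0, 27.0, 20.0],
    "Fare":     [7.25, 71.28, 7.92, 53.1, 8.46, 8.05, 51.86, 30.07, 11.13, 8.05],
    "Embarked": ["S", "C", "S", "S", "Q", "S", np.nan, "C", "S", "S"],
})
test = pd.DataFrame({
    "Pclass":   [3, 1, 2],
    "Sex":      ["male", "female", "male"],
    "Age":      [np.nan, 47.0, 62.0],
    "Fare":     [7.83, np.nan, 9.69],
    "Embarked": ["Q", "S", np.nan],
})

numeric = ["Pclass", "Age", "Fare"]
categorical = ["Sex", "Embarked"]
features = numeric + categorical

# Numeric gaps are filled with the column median; categorical gaps with the
# most frequent value, followed by one-hot encoding.
preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), numeric),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical),
])

# Logistic regression is an assumed baseline, not necessarily the notebook's model.
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

model.fit(train[features], train["Survived"])
predictions = model.predict(test[features])
print(predictions)
```

Keeping the imputers inside the pipeline means the fill values are learned from the training rows only and then reused on the test rows, which avoids leaking test information into the fit.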

Hopefully you will read the report too, which includes my observations.