# Titanic - Machine Learning from Disaster 

In this project, we explore the Titanic dataset provided by Kaggle for the "Titanic - Machine Learning from Disaster" competition. The goal is to build a model that predicts whether a passenger survived, using features such as age, sex, ticket class, and fare.

The Titanic sank on April 15, 1912, after striking an iceberg during its maiden voyage from Southampton to New York City. Of the estimated 2,224 passengers and crew on board, more than 1,500 died, making it one of the deadliest peacetime maritime disasters in history. The dataset describes a subset of the passengers, including demographics and indicators of socio-economic status, which we will use to train our machine learning models.

In this notebook, we will follow an end-to-end data science workflow that includes:

1. **Data Exploration**: Analyzing the dataset to understand its structure, identify missing values, and visualize key features.
2. **Data Preprocessing**: Cleaning the data by handling missing values, encoding categorical variables, and scaling numerical features.
3. **Model Building**: Creating and training machine learning models to predict survival.
4. **Model Evaluation**: Evaluating the performance of the models using appropriate metrics and techniques.
5. **Submission Preparation**: Preparing the final predictions for submission to the Kaggle competition.
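
As a rough preview, steps 2 to 4 above can be sketched with a scikit-learn pipeline. This is a minimal illustration under stated assumptions, not the approach this notebook necessarily ends up with: the `toy` DataFrame is a handful of made-up rows that only borrow the Kaggle column names, and the choices of median imputation, one-hot encoding, and logistic regression are placeholders.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy rows using the Kaggle column names; NOT real passenger data.
toy = pd.DataFrame({
    "Pclass":   [3, 1, 3, 1, 3, 2, 1, 3, 2, 3],
    "Sex":      ["male", "female", "female", "female", "male",
                 "male", "male", "female", "female", "male"],
    "Age":      [22.0, 38.0, 26.0, 35.0, np.nan, 54.0, 40.0, 27.0, 14.0, np.nan],
    "Fare":     [7.25, 71.28, 7.92, 53.10, 8.05, 51.86, 30.00, 11.13, 30.07, 8.46],
    "Embarked": ["S", "C", "S", "S", "S", np.nan, "S", "S", "C", "Q"],
    "Survived": [0, 1, 1, 1, 0, 0, 0, 1, 1, 0],
})

numeric_features = ["Pclass", "Age", "Fare"]
categorical_features = ["Sex", "Embarked"]

# Step 2: fill missing values, scale numeric columns, one-hot encode categoricals.
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric_features),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical_features),
])

# Step 3: chain preprocessing and a classifier so both are fit on the same rows.
model = Pipeline([
    ("preprocess", preprocess),
    ("classify", LogisticRegression(max_iter=1000)),
])

X = toy.drop(columns="Survived")
y = toy["Survived"]

# Step 4: estimate accuracy with 2-fold cross-validation (tiny, because the toy data is tiny).
scores = cross_val_score(model, X, y, cv=2, scoring="accuracy")

# Fit on all rows and predict; on the real data, predictions would be made on the test set.
model.fit(X, y)
predictions = model.predict(X)
print("Mean CV accuracy:", scores.mean())
```

Wrapping the preprocessing inside the pipeline means the imputation and scaling statistics are learned only from the training folds during cross-validation, which avoids leaking information from the held-out rows.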

Let's get started by loading the necessary libraries and the Titanic dataset!


In [1]:
# data analysis and wrangling
import pandas as pd
import numpy as np
import random as rnd

# visualization
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

# machine learning
from sklearn.linear_model import LogisticRegression, Perceptron, SGDClassifier
from sklearn.svm import SVC, LinearSVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

ModuleNotFoundError: No module named 'seaborn'
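
The error above means the import name `seaborn` could not be found in the kernel's environment. One way to check for this before running the imports is sketched below; `missing_modules` is a hypothetical helper written for this note, not part of any library, and the `required` list is an assumption based on the imports in the first cell.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


# Import names used by the first cell. scikit-learn is imported as "sklearn",
# so the import name can differ from the name passed to pip.
required = ["pandas", "numpy", "seaborn", "matplotlib", "sklearn"]
print("Missing modules:", missing_modules(required))
```

Any name reported as missing can then be installed from within the notebook before re-running the import cell.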

In [2]:
# %pip installs into the environment of the running kernel
%pip install kaggle seaborn

Collecting kaggle
  Downloading kaggle-1.6.17.tar.gz (82 kB)
  Preparing metadata (setup.py) ... done
Collecting seaborn
  Downloading seaborn-0.13.2-py3-none-any.whl.metadata (5.4 kB)
Collecting python-slugify (from kaggle)
  Downloading python_slugify-8.0.4-py2.py3-none-any.whl.metadata (8.5 kB)
Collecting text-unidecode>=1.3 (from python-slugify->kaggle)
  Downloading text_unidecode-1.3-py2.py3-none-any.whl.metadata (2.4 kB)
Downloading seaborn-0.13.2-py3-none-any.whl (294 kB)
Downloading python_slugify-8.0.4-py2.py3-none-any.whl (10 kB)
Downloading text_unidecode-1.3-py2.py3-none-any.whl (78 kB)