<center><img src="https://upload.wikimedia.org/wikipedia/commons/thumb/f/fd/RMS_Titanic_3.jpg/1280px-RMS_Titanic_3.jpg" alt="Titanic" width = "500"></center>

# Introduction
***
<font size="5"><center><em> "The Titanic case has been used too often and feels dull" </em></center></font>

But overthinking only kept my data project stuck at zero. In the end, I decided to set that notion aside and push forward. Why not kickstart my data science project with the Titanic case?
That decision gave me fresh inspiration. I laid out the steps, prepared my Python notebook, and began digging into the Titanic dataset. Despite its reputation for being boring, the dataset yielded intriguing findings that sharpened my data science skills.<br>
So, let's leave the common mindset behind and take on the challenge. This is my first step towards building an extraordinary data science portfolio.

## About the Titanic Disaster
The sinking of the Titanic is one of the most infamous shipwrecks in history.
On April 15, 1912, during her maiden voyage, the widely considered “unsinkable” RMS Titanic sank after colliding with an iceberg. Unfortunately, there weren’t enough lifeboats for everyone onboard, resulting in the death of 1502 out of 2224 passengers and crew.<br>
While there was some element of luck involved in surviving, it seems some groups of people were more likely to survive than others.

## Objectives
- Do an **Exploratory Data Analysis (EDA)** of the Titanic disaster
- Build a **predictive model** to estimate each passenger's chance of survival
- **KICKSTART FOR MY DATA SCIENCE PROJECT 🔥🔥🔥**

# 1. Importing Necessary Libraries and Dataset 📊
***
To start this project, we first need to prepare the required libraries and datasets.

## 1a. Importing Libraries
Want to explore and analyze data more easily❔❔❔<br>
The answer lies in using **libraries**❗❗❗

In this section we will import the libraries that help us analyze the Titanic data.

In [1]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
plt.style.use('fivethirtyeight')
import seaborn as sns

import warnings
warnings.filterwarnings('ignore')  # Suppress warning messages

# Render figures at higher resolution on high-DPI ("retina") displays
%config InlineBackend.figure_format = 'retina'
# %matplotlib inline
# When enabled, %matplotlib inline displays plot output directly below the code cell
# that produced it and stores the resulting plots in the notebook document.

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

/kaggle/input/titanic/train.csv
/kaggle/input/titanic/test.csv
/kaggle/input/titanic/gender_submission.csv


## 1b. Loading Dataset
There are three CSV files in this project, but we will only use two of them (train.csv & test.csv).

**Training Data** (train.csv): The training data is used to train the machine learning model. It feeds the model labeled examples so that it can capture patterns and relationships between the input features and the corresponding labels.

**Test Data** (test.csv): The test data is a separate set of passengers used to check how well the trained model generalizes to unseen data. Unlike the training data, it does not include the true labels, so the model's predictions for it are scored after submission to Kaggle rather than inside this notebook.

In [2]:
train = pd.read_csv('/kaggle/input/titanic/train.csv')
test = pd.read_csv('/kaggle/input/titanic/test.csv')
# Drop the label column so train and test share the same columns before stacking them
combine = pd.concat([train.drop(columns='Survived'), test])
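
To see what the `combine` step does without needing the Kaggle files, here is a minimal sketch on made-up data. The frames `toy_train` and `toy_test` and their values are hypothetical stand-ins, not the real dataset.

```python
import pandas as pd

# Toy stand-ins for train.csv and test.csv (values are made up for illustration)
toy_train = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Survived": [0, 1, 1],
    "Pclass": [3, 1, 3],
})
toy_test = pd.DataFrame({
    "PassengerId": [4, 5],
    "Pclass": [2, 3],
})

# Drop the label so both frames share the same columns, then stack them
toy_combine = pd.concat([toy_train.drop(columns="Survived"), toy_test])

print(toy_combine.shape)           # (5, 2): rows from both frames, label column gone
print(toy_combine.index.tolist())  # [0, 1, 2, 0, 1]: original row labels are kept
```

Note that the row labels repeat after stacking. If a clean 0..n-1 index is needed, passing `ignore_index=True` to `pd.concat` should provide it.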

# 2. Overview Data 🔎
***
Let's look at the data in general, starting with its shape, the available columns, summary statistics, and so on.

## 2a. An Overview of The Dataset

### Training Dataset (train.csv)

In [3]:
train.head()

| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S |


In [4]:
train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PassengerId  891 non-null    int64  
 1   Survived     891 non-null    int64  
 2   Pclass       891 non-null    int64  
 3   Name         891 non-null    object 
 4   Sex          891 non-null    object 
 5   Age          714 non-null    float64
 6   SibSp        891 non-null    int64  
 7   Parch        891 non-null    int64  
 8   Ticket       891 non-null    object 
 9   Fare         891 non-null    float64
 10  Cabin        204 non-null    object 
 11  Embarked     889 non-null    object 
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB


In [5]:
train.describe()

| | PassengerId | Survived | Pclass | Age | SibSp | Parch | Fare |
|---|---|---|---|---|---|---|---|
| count | 891.0 | 891.0 | 891.0 | 714.0 | 891.0 | 891.0 | 891.0 |
| mean | 446.0 | 0.383838 | 2.308642 | 29.699118 | 0.523008 | 0.381594 | 32.204208 |
| std | 257.353842 | 0.486592 | 0.836071 | 14.526497 | 1.102743 | 0.806057 | 49.693429 |
| min | 1.0 | 0.0 | 1.0 | 0.42 | 0.0 | 0.0 | 0.0 |
| 25% | 223.5 | 0.0 | 2.0 | 20.125 | 0.0 | 0.0 | 7.9104 |
| 50% | 446.0 | 0.0 | 3.0 | 28.0 | 0.0 | 0.0 | 14.4542 |
| 75% | 668.5 | 1.0 | 3.0 | 38.0 | 1.0 | 0.0 | 31.0 |
| max | 891.0 | 1.0 | 3.0 | 80.0 | 8.0 | 6.0 | 512.3292 |
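
One thing worth reading off this summary: because `Survived` is coded as 0/1, its mean (0.383838) is simply the share of passengers in the training data who survived. The sketch below illustrates the idea on a made-up series (`survived` is a hypothetical toy column), then applies the same arithmetic to the count and mean reported above.

```python
import pandas as pd

# Toy 0/1 label column (made-up values, not the real data)
survived = pd.Series([0, 1, 1, 0, 0], name="Survived")

# For a 0/1 column, the mean equals the share of ones
rate = survived.mean()
share_of_ones = survived.sum() / len(survived)
print(rate, share_of_ones)

# Same idea applied to the figures reported by describe() above
n_rows = 891
reported_mean = 0.383838
approx_survivors = round(reported_mean * n_rows)
print(approx_survivors)  # 342
```

So roughly 38% of the passengers in train.csv survived, which is a useful baseline to keep in mind before modeling.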


### Test Dataset (test.csv)

In [6]:
test.head()

| | PassengerId | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 892 | 3 | Kelly, Mr. James | male | 34.5 | 0 | 0 | 330911 | 7.8292 | NaN | Q |
| 1 | 893 | 3 | Wilkes, Mrs. James (Ellen Needs) | female | 47.0 | 1 | 0 | 363272 | 7.0 | NaN | S |
| 2 | 894 | 2 | Myles, Mr. Thomas Francis | male | 62.0 | 0 | 0 | 240276 | 9.6875 | NaN | Q |
| 3 | 895 | 3 | Wirz, Mr. Albert | male | 27.0 | 0 | 0 | 315154 | 8.6625 | NaN | S |
| 4 | 896 | 3 | Hirvonen, Mrs. Alexander (Helga E Lindqvist) | female | 22.0 | 1 | 1 | 3101298 | 12.2875 | NaN | S |


In [7]:
test.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 418 entries, 0 to 417
Data columns (total 11 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PassengerId  418 non-null    int64  
 1   Pclass       418 non-null    int64  
 2   Name         418 non-null    object 
 3   Sex          418 non-null    object 
 4   Age          332 non-null    float64
 5   SibSp        418 non-null    int64  
 6   Parch        418 non-null    int64  
 7   Ticket       418 non-null    object 
 8   Fare         417 non-null    float64
 9   Cabin        91 non-null     object 
 10  Embarked     418 non-null    object 
dtypes: float64(2), int64(4), object(5)
memory usage: 36.0+ KB


In [8]:
test.describe()

| | PassengerId | Pclass | Age | SibSp | Parch | Fare |
|---|---|---|---|---|---|---|
| count | 418.0 | 418.0 | 332.0 | 418.0 | 418.0 | 417.0 |
| mean | 1100.5 | 2.26555 | 30.27259 | 0.447368 | 0.392344 | 35.627188 |
| std | 120.810458 | 0.841838 | 14.181209 | 0.89676 | 0.981429 | 55.907576 |
| min | 892.0 | 1.0 | 0.17 | 0.0 | 0.0 | 0.0 |
| 25% | 996.25 | 1.0 | 21.0 | 0.0 | 0.0 | 7.8958 |
| 50% | 1100.5 | 3.0 | 27.0 | 0.0 | 0.0 | 14.4542 |
| 75% | 1204.75 | 3.0 | 39.0 | 1.0 | 0.0 | 31.5 |
| max | 1309.0 | 3.0 | 76.0 | 8.0 | 9.0 | 512.3292 |


In general, there is no big difference between the **training data** and the **test data**.<br>However, there is one very clear difference: the test data does not have a `Survived` column. This is not a problem, because we will fill in that column later with our model's predictions.
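
The comparison can also be done in code rather than by eye. The `info()` outputs above already hint at where the gaps are (for example, `Age` has 714 non-null values out of 891 in train and 332 out of 418 in test). Below is a minimal sketch on made-up frames; `toy_train`, `toy_test`, and their values are hypothetical, and the same two steps should carry over to the real `train` and `test`.

```python
import pandas as pd

# Toy stand-ins with a few deliberately missing values (made up for illustration)
toy_train = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Survived": [0, 1, 1],
    "Age": [22.0, None, 26.0],
    "Fare": [7.25, 71.28, 7.93],
})
toy_test = pd.DataFrame({
    "PassengerId": [4, 5],
    "Age": [34.5, None],
    "Fare": [None, 7.0],
})

# Columns present in train but absent from test
only_in_train = sorted(set(toy_train.columns) - set(toy_test.columns))
print(only_in_train)

# Missing-value counts per column, side by side;
# a column that exists in only one frame shows NaN for the other
missing = pd.DataFrame({
    "train": toy_train.isna().sum(),
    "test": toy_test.isna().sum(),
})
print(missing)
```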

## 2b. About Dataset

Now let's look at the features of the dataset and their types:

### Categorical
* **Nominal** (categories or groups that have no intrinsic order or rank)
  * **Cabin** (Cabin Number)
  * **Embarked** (Port of Embarkation)
  * **Sex**

* **Ordinal** (categories or groups that have an intrinsic order or rank)
  * **Pclass** (Ticket Class)
    * 1 = 1st / Upper
    * 2 = 2nd / Middle
    * 3 = 3rd / Lower

### Numeric
* **Discrete**
  * **PassengerId** (Unique ID for each passenger)
  * **SibSp** (# of siblings / spouses aboard the Titanic)
  * **Parch** (# of parents / children aboard the Titanic)
  * **Survived** (Survival)
    * **0** = Did Not Survive
    * **1** = Survived

* **Continuous**
  * **Age** (Age in years)
  * **Fare** (Passenger fare)

### Text
* **Ticket** (Ticket Number)
* **Name** (Passenger Name)
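
These feature types can be made explicit in pandas, which by default reads `Pclass` as a plain integer and `Sex`/`Embarked` as generic objects. The following is a minimal sketch on made-up rows; the frame `toy` and the name `pclass_order` are hypothetical, and ranking 1st class as the "highest" category is my own assumption about how to order the classes.

```python
import pandas as pd

# Toy rows (made up for illustration)
toy = pd.DataFrame({
    "Pclass": [3, 1, 2, 3],
    "Sex": ["male", "female", "female", "male"],
    "Embarked": ["S", "C", "S", "Q"],
})

# Nominal features: unordered categories
toy["Sex"] = toy["Sex"].astype("category")
toy["Embarked"] = toy["Embarked"].astype("category")

# Ordinal feature: ordered categories, listed from lowest to highest rank
# (assumption: 3rd class is the lowest rank, 1st class the highest)
pclass_order = pd.CategoricalDtype(categories=[3, 2, 1], ordered=True)
toy["Pclass"] = toy["Pclass"].astype(pclass_order)

print(toy.dtypes)
print(toy["Pclass"].cat.ordered)  # True: comparisons respect the rank order
print(toy["Sex"].cat.ordered)     # False: no order between categories
```

With the ordered dtype in place, comparisons and `min()`/`max()` on `Pclass` follow the declared rank rather than the raw integer values.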

## 2c. Tableau Visualization
Before making this notebook, I explored the same data in Tableau to get another perspective on it. Tableau also makes it easy to build interactive dashboards.

In [9]:
%%HTML
<div class='tableauPlaceholder' id='viz1688721483092' style='position: relative'><noscript><a href='#'><img alt='Dashboard ' src='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;Ov&#47;OverviewTitanicTrainingDataset&#47;Dashboard&#47;1_rss.png' style='border: none' /></a></noscript><object class='tableauViz'  style='display:none;'><param name='host_url' value='https%3A%2F%2Fpublic.tableau.com%2F' /> <param name='embed_code_version' value='3' /> <param name='site_root' value='' /><param name='name' value='OverviewTitanicTrainingDataset&#47;Dashboard' /><param name='tabs' value='no' /><param name='toolbar' value='yes' /><param name='static_image' value='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;Ov&#47;OverviewTitanicTrainingDataset&#47;Dashboard&#47;1.png' /> <param name='animate_transition' value='yes' /><param name='display_static_image' value='yes' /><param name='display_spinner' value='yes' /><param name='display_overlay' value='yes' /><param name='display_count' value='yes' /><param name='language' value='en-US' /></object></div>                <script type='text/javascript'>                    var divElement = document.getElementById('viz1688721483092');                    var vizElement = divElement.getElementsByTagName('object')[0];                    vizElement.style.width='1600px';vizElement.style.height='927px';                    var scriptElement = document.createElement('script');                    scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js';                    vizElement.parentNode.insertBefore(scriptElement, vizElement);                </script>