 <h1 align="center"> <font color="lightgreen"> Case study: Titanic data analysis using NumPy </font> </h1> 

> This is a dummy dataset, meant for learning purposes only.

Here we analyze the Titanic survivor dataset. It contains the following columns:

> $\color{lightgreen}number, passenger\_id, survived, gender, ticket\_class$

> <picture>
>   <source media="(prefers-color-scheme: light)" srcset="https://raw.githubusercontent.com/Mqxx/GitHub-Markdown/main/blockquotes/badge/light-theme/info.svg">
>   <img alt="Info" src="https://raw.githubusercontent.com/Mqxx/GitHub-Markdown/main/blockquotes/badge/dark-theme/info.svg">
> </picture><br>
>
> Although this is a case study, we won't focus heavily on the dataset itself, but rather on how effectively `numpy` handles data and supports analysis.

In [1]:
import numpy as np

In [2]:
# Load the data file (np.loadtxt splits on whitespace and parses values as float64 by default)
data = np.loadtxt('./data/data.txt')

In [3]:
print("Dimension: ", data.ndim)
print("Shape: ", data.shape)

Dimension:  2
Shape:  (400, 5)


In [4]:
# Transpose so each row holds one column of the file, then skip the leading row-number column
passengerId, survived, gender, ticket_class = data.T[1:]

In [5]:
print(passengerId.dtype, survived.dtype, gender.dtype, ticket_class.dtype)

float64 float64 float64 float64
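
Every column reports `float64` because `np.loadtxt` parses values as floats unless told otherwise. Since these columns hold IDs and category codes, one option is to cast them to integers after loading. A minimal sketch, using a small made-up array (`sample`, not the real file) and hypothetical variable names with an `_s` suffix:

```python
import numpy as np

# Hypothetical stand-in for the loaded file, with columns:
# number, passenger_id, survived, gender, ticket_class
sample = np.array([
    [0.0, 101.0, 1.0, 0.0, 3.0],
    [1.0, 102.0, 0.0, 1.0, 1.0],
    [2.0, 103.0, 1.0, 1.0, 2.0],
])

# Cast once, then unpack the columns (skipping the leading row number)
sample_int = sample.astype(np.int64)
passenger_id_s, survived_s, gender_s, ticket_class_s = sample_int.T[1:]

print(survived_s.dtype)  # int64
print(survived_s)        # [1 0 1]
```

The comparisons used in the analysis (`survived == 1`, `ticket_class == i`) behave the same either way here; integer columns mainly make printed values and dictionary keys cleaner.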


### Analysis

In [7]:
# 1. How many of the total survived
print(f'{len(survived[survived == 1])} people survived out of {len(survived)}')

144 people survived out of 400
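
The count above filters the array and takes its length. An alternative is to count the `True` entries of the boolean mask directly, which avoids building the filtered copy. A small sketch on made-up flags (`flags` is hypothetical, not the real column):

```python
import numpy as np

# Hypothetical survival flags (1 = survived, 0 = did not survive)
flags = np.array([1.0, 0.0, 0.0, 1.0, 1.0])

n_survived = np.count_nonzero(flags == 1)  # number of True values in the mask
rate = (flags == 1).mean()                 # fraction of True values

print(n_survived, flags.size)  # 3 5
print(f"{rate:.0%}")           # 60%
```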


In [8]:
# 2. Gender distribution {0: Male, 1: Female}
# np.unique(..., return_counts=True) returns (sorted values, counts); index 1 holds the counts
m, f = np.unique(gender, return_counts=True)[1]
print(f'Male: {m}, Female: {f}')
print(f'Ratio of male:Female: {(m/400)*100:.0f}:{(f/400)*100:.0f}')

# 3. Ticket class distribution {1: 1st, 2: 2nd, 3: 3rd}
c1, c2, c3 = np.unique(ticket_class, return_counts=True)[1]
print(f'1st class: {c1}, 2nd class: {c2}, 3rd class: {c3}')
print(f'Ratio of 1st:2nd:3rd: {(c1/400)*100:.0f}:{(c2/400)*100:.0f}:{(c3/400)*100:.0f}')


Male: 231, Female: 169
Ratio of male:Female: 58:42
1st class: 41, 2nd class: 153, 3rd class: 206
Ratio of 1st:2nd:3rd: 10:38:52
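
Unpacking only the counts assumes that every category is present and that the order of the codes is known. `np.unique` returns the sorted unique values alongside the counts, so keeping both avoids that assumption, and dividing by `counts.sum()` avoids hard-coding the total. A minimal sketch on made-up ticket classes (`codes` is hypothetical):

```python
import numpy as np

# Hypothetical ticket classes, not the real column
codes = np.array([3.0, 1.0, 3.0, 2.0, 3.0, 2.0])

# values are the sorted unique codes; counts[i] is how often values[i] occurs
values, counts = np.unique(codes, return_counts=True)
shares = counts / counts.sum() * 100  # percentage share of each class

for v, c, s in zip(values, counts, shares):
    print(f"class {int(v)}: {c} passengers ({s:.0f}%)")
```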


In [9]:
# 4. Gender distribution among survivors
maleCount = np.sum((survived == 1) & (gender == 0))
femaleCount = np.sum((survived == 1) & (gender == 1))
total_survived = maleCount + femaleCount

print(f"Males survived: {maleCount}, Females survived: {femaleCount}")
print(f'Ratio: {(maleCount/total_survived)*100:.0f}:{(femaleCount/total_survived)*100:.0f}')

Males survived: 85, Females survived: 59
Ratio: 59:41
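
The counts above rely on combining two boolean masks element by element with `&`. Each comparison needs its own parentheses, because `&` binds more tightly than `==` in Python. A small sketch on made-up arrays (`surv` and `gen` are hypothetical):

```python
import numpy as np

surv = np.array([1, 0, 1, 1, 0])  # hypothetical flags, 1 = survived
gen = np.array([0, 0, 1, 0, 1])   # hypothetical codes, 0: male, 1: female

# True only where both conditions hold for the same passenger
mask = (surv == 1) & (gen == 0)

print(mask.tolist())  # [True, False, False, True, False]
print(mask.sum())     # 2
```

Summing a boolean mask counts its `True` entries, which is why `np.sum` on the combined condition gives the number of matching passengers.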


In [11]:
# 5. How many of the total survived in each class
for i in range(1, 4):
    print(f'{len(survived[(survived == 1) & (ticket_class == i)])} people survived in class {i}')

9 people survived in class 1
53 people survived in class 2
82 people survived in class 3


In [12]:
# Percentage of men and women who survived
print(f'Out of {m} Men {(maleCount/m)*100:.1f}% or {maleCount} men survived.')
print(f'Out of {f} Females {(femaleCount/f)*100:.1f}% or {femaleCount} females survived.')

Out of 231 Men 36.8% or 85 men survived.
Out of 169 Females 34.9% or 59 females survived.


In [13]:
# Chance of survival in each class
for i in range(1, 4):
    in_class = ticket_class == i                            # passengers in class i
    survived_in_class = np.sum((survived == 1) & in_class)  # survivors in class i
    print(f'Chance of survival in class {i}: {(survived_in_class/np.sum(in_class))*100:.1f}%')

Chance of survival in class 1: 22.0%
Chance of survival in class 2: 34.6%
Chance of survival in class 3: 39.8%
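
The per-gender and per-class rates follow the same pattern: select a group, then divide the survivors by the group size. Because the survival flags are 0 or 1, the mean of the flags within a group is that same fraction. One way to capture the pattern is a small helper. The name `survival_rate_by_group` and the sample arrays below are hypothetical, for illustration only:

```python
import numpy as np

def survival_rate_by_group(survived, groups):
    """Return {group code: percentage survived} (hypothetical helper)."""
    rates = {}
    for g in np.unique(groups):
        in_group = groups == g
        # Mean of 0/1 flags within the group = fraction that survived
        rates[int(g)] = float(survived[in_group].mean() * 100)
    return rates

# Made-up data, not the real dataset
surv = np.array([1, 0, 1, 1, 0, 0])
cls = np.array([1, 1, 2, 3, 3, 3])

rates = survival_rate_by_group(surv, cls)
for code, pct in rates.items():
    print(f"Chance of survival in group {code}: {pct:.1f}%")
```

The same helper would work for the gender codes as well, since it only assumes 0/1 survival flags and an array of group codes of the same length.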
