# <span style="color:#7A5197;">***Spaceship Titanic***</span>


### <span style="color:#7A5197;">*Table of contents*</span>
<a id="table-of-contents"></a>
- [1. Introduction](#1)
    - [1.1 Storytelling](#1.1)
    - [1.2 Evaluation](#1.2)
- [2. Preparations](#2)
- [3. Dataset Overview](#3)
    - [3.1 Train Dataset](#3.1)
        - [3.1.1 Quick view](#3.1.1)
        - [3.1.2 Data types](#3.1.2)
        - [3.1.3 Basic Statistics](#3.1.3)
        - [3.1.4 Target Column](#3.1.4)
    - [3.2 Test Dataset](#3.2)
        - [3.2.1 Quick view](#3.2.1)
        - [3.2.2 Data types](#3.2.2)
        - [3.2.3 Basic Statistics](#3.2.3)
    - [3.3 Submission](#3.3)
- [4. Exploratory Data Analysis](#4)
    - [4.1 Transported](#4.1)
    - [4.2 HomePlanet](#4.2)
    - [4.3 CryoSleep](#4.3)
    - [4.4 Cabins](#4.4)
    - [4.5 Destination](#4.5)
    - [4.6 Age](#4.6)
    - [4.7 VIP](#4.7)
    - [4.8 RoomService](#4.8)
    - [4.9 FoodCourt](#4.9)
    - [4.10 ShoppingMall](#4.10)
    - [4.11 Spa](#4.11)
    - [4.12 VRDeck](#4.12)
- [5. References](#5)

[back to top](#table-of-contents)
<a id="1"></a>
# **<span style="color:#7A5197;">1. Introduction</span>**
<a id="1.1"></a>
### **<span style="color:#7A5197;"> 1.1 Storytelling </span>**


> _"There are no barriers to human… "_ - (c) Sergei Pavlovich Korolev
 
 
Welcome to the year 2912, where your data science skills are needed to solve a cosmic mystery. We've received a transmission from four lightyears away and things aren't looking good.

The Spaceship Titanic was an interstellar passenger liner launched a month ago. With almost 13,000 passengers on board, the vessel set out on its maiden voyage transporting emigrants from our solar system to three newly habitable exoplanets orbiting nearby stars.

While rounding Alpha Centauri en route to its first destination—the torrid 55 Cancri E—the unwary Spaceship Titanic collided with a spacetime anomaly hidden within a dust cloud. Sadly, it met a similar fate as its namesake from 1000 years before. Though the ship stayed intact, almost half of the passengers were transported to an alternate dimension!

[back to top](#table-of-contents)
<a id="1.2"></a>
### **<span style="color:#7A5197;"> 1.2 Evaluation </span>**

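Submissions are evaluated on their classification accuracy, i.e. the percentage of predicted labels that are correct. With TP, TN, FP and FN denoting the numbers of true positives, true negatives, false positives and false negatives, accuracy is: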


$$\text{Accuracy} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{TN} + \text{FP} + \text{FN}}$$
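
As a quick check of the formula, the sketch below counts TP, TN, FP and FN for a handful of made-up labels. It confirms that the result equals the plain share of correct predictions. The labels and the helper name `accuracy_from_counts` are illustrative only.

```python
import numpy as np

def accuracy_from_counts(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)

# made-up labels: True = transported
y_true = np.array([True, True, False, False, True, False])
y_pred = np.array([True, False, False, True, True, False])

tp = np.sum(y_true & y_pred)     # predicted transported, actually transported
tn = np.sum(~y_true & ~y_pred)   # predicted not transported, actually not transported
fp = np.sum(~y_true & y_pred)    # predicted transported, actually not transported
fn = np.sum(y_true & ~y_pred)    # predicted not transported, actually transported

acc = accuracy_from_counts(tp, tn, fp, fn)
# same value as the share of predictions that match the true label
print(acc, np.mean(y_true == y_pred))
```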

[back to top](#table-of-contents)
<a id="2"></a>
# **<span style="color:#7A5197;">2. Preparations</span>**

This section loads the packages and data used in the analysis. The packages are mainly for data manipulation, data visualization and modeling. Two datasets are used: the train dataset, which is used to train models, and the test dataset, on which those models make predictions. The sample submission file shows participants the expected submission format for the competition.


In [None]:
# import packages
import warnings

import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on matplotlib < 3.2; harmless on newer versions
from IPython.display import HTML

# setting up display options
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
warnings.filterwarnings('ignore')


# read datasets
train_df = pd.read_csv('../input/spaceship-titanic/train.csv')
test_df = pd.read_csv('../input/spaceship-titanic/test.csv')
ssub = pd.read_csv('../input/spaceship-titanic/sample_submission.csv')

features = [col for col in test_df if col not in ['PassengerId']]

def multi_table(table_list):
    return HTML(
        f"<table><tr> {''.join(['<td>' + table._repr_html_() + '</td>' for table in table_list])} </tr></table>")

def formatter(v):
    if type(v) is str:
        return v
    if pd.isna(v) or v <= 0:
        return ''
    if v == int(v):
        return f'{v:.0f}'
    return f'{v:.1f}'

[back to top](#table-of-contents)
<a id="3"></a>
# **<span style="color:#7A5197;">3. Dataset Overview</span>**

The intent of this overview is to get a feel for the data and its structure in the train, test and submission files. The overview of the train and test datasets includes a quick look at missing values and basic statistics, while the sample submission is loaded to see the expected submission format.

|Variable|Definition|
|---|---|
|PassengerId|A unique Id for each passenger. Each Id takes the form `gggg_pp`, where `gggg` indicates a group the passenger is travelling with and `pp` is their number within the group. People in a group are often family members, but not always.|
|HomePlanet|The planet the passenger departed from, typically their planet of permanent residence.|
|CryoSleep|Indicates whether the passenger elected to be put into suspended animation for the duration of the voyage. Passengers in cryosleep are confined to their cabins.|
|Cabin|The cabin number where the passenger is staying. Takes the form `deck/num/side`, where `side` can be either `P` for Port or `S` for Starboard.|
|Destination|The planet the passenger will be debarking to.|
|Age|The age of the passenger.|
|VIP|Whether the passenger has paid for special VIP service during the voyage.|
|RoomService, FoodCourt, ShoppingMall, Spa, VRDeck|Amount the passenger has billed at each of the Spaceship Titanic's many luxury amenities.|
|Name|The first and last names of the passenger.|
|Transported|Whether the passenger was transported to another dimension. This is the target, the column you are trying to predict.|
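
Since `PassengerId` follows the `gggg_pp` pattern, the group and the position within the group can be pulled out with pandas string methods. This is a minimal sketch on made-up Ids. The columns `Group`, `GroupPos` and `GroupSize` are hypothetical and are not used elsewhere in this notebook.

```python
import pandas as pd

# made-up Ids following the gggg_pp pattern
df = pd.DataFrame({'PassengerId': ['0001_01', '0002_01', '0003_01', '0003_02', '0003_03']})

# split 'gggg_pp' into the group number and the passenger's number within the group
parts = df['PassengerId'].str.split('_', expand=True)
df['Group'] = parts[0].astype(int)
df['GroupPos'] = parts[1].astype(int)

# hypothetical derived feature: how many passengers share the same group
df['GroupSize'] = df.groupby('Group')['PassengerId'].transform('count')
print(df)
```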

[back to top](#table-of-contents)
<a id="3.1"></a>
### **<span style="color:#7A5197;">3.1 Train Dataset</span>**

As stated before, the train dataset is mainly used to train a predictive model, since it contains the target variable. It is also used to explore the data itself, including the relationship between each predictor and the target variable.

__Observations:__
* There are two classes to predict (`True` or `False`).
* All features (excluding `PassengerId` and the target) have missing values (`NaN`); the spending columns also contain many zeros.
* `CryoSleep` and `VIP` need to be converted from the `object` to the `bool` data type.
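
Below is one way to do this conversion, sketched on a toy frame. It assumes the columns arrive as `object` because `read_csv` mixes Python booleans with `NaN`. pandas' nullable `'boolean'` dtype keeps missing entries as `<NA>`, whereas a plain `astype(bool)` would turn every `NaN` into `True`. The toy values are made up.

```python
import numpy as np
import pandas as pd

# toy frame mimicking how CryoSleep/VIP are loaded: booleans mixed with NaN -> object dtype
df = pd.DataFrame({
    'CryoSleep': [True, False, np.nan, True],
    'VIP': [False, np.nan, False, True],
})
before = df.dtypes.astype(str).tolist()
print(before)  # both columns are object

# nullable 'boolean' dtype: True/False stay as they are, NaN becomes <NA>
df = df.astype({'CryoSleep': 'boolean', 'VIP': 'boolean'})
print(df.dtypes)
```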

[back to top](#table-of-contents)
<a id="3.1.1"></a>
#### **<span style="color:#7A5197;">3.1.1 Quick view</span>**


In [None]:
train_df.head()

In [None]:
print(f'Number of rows: {train_df.shape[0]}; Number of columns: {train_df.shape[1]}; Number of missing values: {train_df.isna().sum().sum()}')
train_df.isna().sum()
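
The counts above can also be expressed as percentages, which makes columns easier to compare. The share of rows with at least one gap is worth knowing too, because section 4.4 later calls `dropna()` on the whole frame. Here is a sketch on a small made-up frame standing in for `train_df`:

```python
import numpy as np
import pandas as pd

# made-up stand-in for train_df
df = pd.DataFrame({
    'HomePlanet': ['Earth', np.nan, 'Europa', 'Mars'],
    'Age': [39.0, 24.0, np.nan, np.nan],
    'Transported': [False, True, False, True],
})

# share of missing values per column, in percent, largest first
missing_pct = df.isna().mean().mul(100).sort_values(ascending=False)
print(missing_pct)

# share of rows with at least one missing value (the rows dropna() would remove)
rows_with_nan_pct = df.isna().any(axis=1).mean() * 100
print(f'{rows_with_nan_pct:.1f}% of rows have at least one missing value')
```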

[back to top](#table-of-contents)
<a id="3.1.2"></a>
#### **<span style="color:#7A5197;">3.1.2 Data types</span>**


In [None]:
train_df.dtypes

[back to top](#table-of-contents)
<a id="3.1.3"></a>
#### **<span style="color:#7A5197;">3.1.3 Basic Statistics</span>**
Below are the basic statistics for each numeric variable: count, mean, standard deviation, minimum, first quartile, median, third quartile and maximum.

In [None]:
train_df.describe()

[back to top](#table-of-contents)
<a id="3.1.4"></a>
#### **<span style="color:#7A5197;">3.1.4 Target Column</span>**


In [None]:
print('Target column basic statistics:')
train_df['Transported'].describe()

In [None]:
print('Frequency of each target classes:')
train_df['Transported'].value_counts()
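
For a quick balance check, proportions are often handier than raw counts. This is a minimal sketch on a made-up target column. With a boolean target, the mean is simply the share of `True` values.

```python
import pandas as pd

# made-up stand-in for train_df['Transported']
y = pd.Series([True, False, True, True, False, False, True, False], name='Transported')

# relative frequency of each class instead of raw counts
class_share = y.value_counts(normalize=True)
print(class_share)

# for a boolean column, the mean equals the share of True values
share_transported = y.mean()
print(f'{share_transported:.2%} transported')
```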

[back to top](#table-of-contents)
<a id="3.2"></a>
### **<span style="color:#7A5197;">3.2 Test Dataset</span>**
The test dataset is used to make predictions with the previously trained model. Exploring it is also needed to see how the data is structured and, in particular, how similar it is to the train dataset.

__Observations:__
* The observations made on the `train` dataset also hold for the `test` dataset, which has no `Transported` column.
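
A quick programmatic version of this comparison is to list the columns that exist only in train and to compare dtypes on the shared columns. The toy frames below are made up. Note how a single `NaN` turns an otherwise boolean column into `object`, which is why dtype differences between the two files are worth checking.

```python
import numpy as np
import pandas as pd

# made-up stand-ins for train_df and test_df
train = pd.DataFrame({'PassengerId': ['0001_01', '0002_01'], 'Age': [39.0, np.nan],
                      'VIP': [False, True], 'Transported': [False, True]})
test = pd.DataFrame({'PassengerId': ['0013_01', '0018_01'], 'Age': [27.0, 19.0],
                     'VIP': [False, np.nan]})

# columns present only in train (expected: just the target)
extra_cols = sorted(set(train.columns) - set(test.columns))
print('Only in train:', extra_cols)

# dtype comparison on the shared columns
shared = [c for c in train.columns if c in test.columns]
dtype_check = pd.DataFrame({'train': train[shared].dtypes, 'test': test[shared].dtypes})
dtype_check['same'] = dtype_check['train'] == dtype_check['test']
print(dtype_check)
```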



[back to top](#table-of-contents)
<a id="3.2.1"></a>
#### **<span style="color:#7A5197;">3.2.1 Quick view</span>**


In [None]:
test_df.head()

[back to top](#table-of-contents)
<a id="3.2.2"></a>
#### **<span style="color:#7A5197;">3.2.2 Data types</span>**

In [None]:
test_df.dtypes

[back to top](#table-of-contents)
<a id="3.2.3"></a>
#### **<span style="color:#7A5197;">3.2.3 Basic Statistics</span>**
Below are the basic statistics for each numeric variable: count, mean, standard deviation, minimum, first quartile, median, third quartile and maximum.

In [None]:
test_df.describe()

[back to top](#table-of-contents)
<a id="3.3"></a>
### **<span style="color:#7A5197;">3.3 Submission</span>**
Below are the first 5 rows of the submission file:

In [None]:
ssub.head()
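
A submission therefore needs one row per test passenger, with a `PassengerId` and a boolean `Transported` column, in the layout of `sample_submission.csv`. The sketch below builds such a frame from made-up Ids and 0/1 predictions. In practice, the Ids would come from `test_df` and the predictions from a trained model, neither of which exists yet at this point in the notebook.

```python
import numpy as np
import pandas as pd

# made-up stand-ins: test Ids and 0/1 predictions from some classifier
test_ids = pd.Series(['0013_01', '0018_01', '0019_01'], name='PassengerId')
predictions = np.array([1, 0, 1])

# same two columns as the sample submission, with Transported as booleans
submission = pd.DataFrame({'PassengerId': test_ids, 'Transported': predictions.astype(bool)})
print(submission.head())
# submission.to_csv('submission.csv', index=False)
```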

[back to top](#table-of-contents)
<a id="4"></a>
# **<span style="color:#7A5197;">4. Exploratory Data Analysis</span>**

<a id="4.1"></a>
### **<span style="color:#7A5197;">4.1 Transported</span>**

The target is binary and almost perfectly balanced: the two classes split roughly 50/50 (49.64% vs. 50.36%), so about half of the passengers were transported to an alternate dimension!

In [None]:
plt.subplots(figsize=(25, 10), facecolor='#f6f5f5')
plt.pie(train_df.Transported.value_counts(), startangle=90, wedgeprops={'width':0.3}, colors=['#F5C63C', '#7A5197'] )
plt.title('Target Balance Pie Chart', loc='center', fontsize=24, color='#7A5197', fontweight='bold')
plt.text(0, 0, f"{train_df.Transported.mean() * 100:.2f}%", ha='center', va='center', fontweight='bold', fontsize=42, color='#7A5197')  # share of transported passengers
plt.legend(train_df.Transported.value_counts().index, ncol=2, facecolor='#f6f5f5', edgecolor='#f6f5f5', loc='lower center', fontsize=16)
plt.show()

[back to top](#table-of-contents)
<a id="4.2"></a>
### **<span style="color:#7A5197;">4.2 HomePlanet</span>**

Most passengers come from Earth. As the charts below show, residents of Europa were transported more often than passengers from the other planets.
* 57.6% of Earth residents were not transported -> possibly Earth residents could not get the safer cabins.
* 34.1% of Europa residents were not transported -> the reverse effect.
___
* Earth residents who were not transported make up 31.2% of all passengers.
___
* 54.2% of passengers are from Earth.
* 25.1% of passengers are from Europa.
* 20.7% of passengers are from Mars.
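
Both kinds of percentages above (within a home planet and relative to all passengers) map onto the `normalize` argument of `pd.crosstab`. `normalize='index'` divides each row by its total, and `normalize='all'` divides by the grand total. This is sketched on a made-up frame. Rows with a missing `HomePlanet` are left out of the table, so "all" here means all passengers with a known home planet.

```python
import numpy as np
import pandas as pd

# made-up stand-in for train_df[['HomePlanet', 'Transported']]
df = pd.DataFrame({
    'HomePlanet': ['Earth', 'Earth', 'Earth', 'Europa', 'Europa', 'Mars', np.nan],
    'Transported': [False, False, True, True, False, True, True],
})
names = {False: 'not_transported', True: 'transported'}

# normalize='index': each row sums to 100, i.e. shares within each home planet
within_planet = pd.crosstab(df['HomePlanet'], df['Transported'], normalize='index').mul(100).rename(columns=names)

# normalize='all': each cell is a share of all passengers with a known HomePlanet
of_all = pd.crosstab(df['HomePlanet'], df['Transported'], normalize='all').mul(100).rename(columns=names)

print(within_planet.round(1))
print(of_all.round(1))
```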

In [None]:
facecolor = '#f6f5f5'
fig = plt.figure(figsize=(9, 5), facecolor=facecolor)
colors = ['#7A5197', '#BB5098', '#5344A9', '#F5C63C', '#F47F6B']

for i, t_val in enumerate(train_df.Transported.unique()):
    ax = fig.add_subplot(2, 2, i+1, facecolor=facecolor)
    _pivot_df = pd.pivot_table(train_df[train_df['Transported'] == t_val], index='HomePlanet', aggfunc='count', values='Transported')
    ax.barh(_pivot_df.index, _pivot_df['Transported'], color=colors[i])
    _target_col = {False: 'Not transported', True: 'Transported'}
    ax.set_title(_target_col[t_val], fontsize=12, fontweight="medium")
    ax.tick_params(left=False, bottom=False, right=False, top=False, labelleft=(i<=1))
    ax.xaxis.set_ticks([])
    
    for _, spine in ax.spines.items():
        spine.set_visible(False)
        
    if i != 0:
        ax.yaxis.set_ticks([])
    else:
        pass
    for ind, val in enumerate(_pivot_df['Transported']):
        #print(val)
        ax.text(0, ind, formatter(val), fontweight="bold", color=facecolor, fontsize=14)
    
fig.text(0.1, 1.08,'Bar Plot Count of Passengers',fontweight='bold', fontsize=24, color='#7A5197');
fig.text(0.1, 1,'From Home Planets ',fontweight='bold', fontsize=24, color='#7A5197');


In [None]:
print('Percentage of passengers not transported, relative to all passengers with a known HomePlanet')
# denominator: every passenger with a known HomePlanet
_ = pd.pivot_table(train_df[train_df['Transported'] == False], index='HomePlanet', aggfunc='count', values='Transported')['Transported'] / train_df['HomePlanet'].count() * 100
_pivot_1 = pd.DataFrame(_).style.background_gradient(cmap='BuPu')

print('Percentage of passengers not transported, relative to passengers from the same HomePlanet')
_ = pd.pivot_table(train_df[train_df['Transported'] == False], index='HomePlanet', aggfunc='count', values='Transported')['Transported'] / pd.pivot_table(train_df, index='HomePlanet', aggfunc='count', values='Transported')['Transported'] * 100
_pivot_2 = pd.DataFrame(_).style.background_gradient(cmap='BuPu')

multi_table([_pivot_1, _pivot_2])

[back to top](#table-of-contents)
<a id="4.3"></a>
### **<span style="color:#7A5197;">4.3 CryoSleep</span>**


* 30.7% of Earth residents chose CryoSleep -> this looks consistent with the hypothesis that Earth residents are less well-off than passengers from Mars and Europa.
* 43.9% of Europa residents chose CryoSleep.
___
* Earth residents in CryoSleep make up 16.3% of all passengers with a known CryoSleep status.
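
The within-planet CryoSleep share above can be computed as the mean of a boolean column per group, after dropping passengers whose CryoSleep status is unknown. Below is a sketch on made-up data. It assumes `CryoSleep` is loaded as an `object` column of booleans and `NaN`.

```python
import numpy as np
import pandas as pd

# made-up stand-in: CryoSleep mixes booleans and NaN, as in the raw CSV
df = pd.DataFrame({
    'HomePlanet': ['Earth', 'Earth', 'Earth', 'Earth', 'Europa', 'Europa', 'Mars'],
    'CryoSleep': [True, False, False, np.nan, True, True, False],
})

# keep passengers with a known CryoSleep status, then cast to plain bool
known = df.dropna(subset=['CryoSleep']).astype({'CryoSleep': bool})

# mean of a boolean column = share of True, here per home planet, in percent
share_within_planet = known.groupby('HomePlanet')['CryoSleep'].mean() * 100
print(share_within_planet.round(1))
```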

In [None]:
facecolor = '#f6f5f5'
fig = plt.figure(figsize=(9, 5), facecolor=facecolor)
colors = ['#7A5197', '#BB5098', '#5344A9', '#F5C63C', '#F47F6B']

for i, t_val in enumerate([False, True]):  # CryoSleep values; passengers with unknown status are skipped
    ax = fig.add_subplot(2, 2, i+1, facecolor=facecolor)
    _pivot_df = pd.pivot_table(train_df[train_df['CryoSleep'] == t_val], index='HomePlanet', aggfunc='count', values='CryoSleep')
    ax.barh(_pivot_df.index, _pivot_df['CryoSleep'], color=colors[i])
    _target_col = {False: 'Without CryoSleep', True: 'CryoSleep'}
    ax.set_title(_target_col[t_val], fontsize=12, fontweight="medium")
    ax.tick_params(left=False, bottom=False, right=False, top=False, labelleft=(i<=1))
    ax.xaxis.set_ticks([])
    
    for _, spine in ax.spines.items():
        spine.set_visible(False)
        
    if i != 0:
        ax.yaxis.set_ticks([])
    else:
        pass
    for ind, val in enumerate(_pivot_df['CryoSleep']):
        #print(val)
        ax.text(0, ind, formatter(val), fontweight="bold", color=facecolor, fontsize=14)
    
fig.text(0.1, 1.08,'Bar Plot Count of Passengers',fontweight='bold', fontsize=24, color='#7A5197');
fig.text(0.1, 1,'With Option CryoSleep ',fontweight='bold', fontsize=24, color='#7A5197');


In [None]:
print('Percentage of passengers in CryoSleep, relative to all passengers with a known CryoSleep status')
_ = pd.pivot_table(train_df[train_df['CryoSleep'] == True], index='HomePlanet', aggfunc='count', values='CryoSleep')['CryoSleep'] / train_df['CryoSleep'].count() * 100
_pivot_1 = pd.DataFrame(_).style.background_gradient(cmap='BuPu')

print('Percentage of passengers in CryoSleep, relative to passengers from the same HomePlanet')
_ = pd.pivot_table(train_df[train_df['CryoSleep'] == True], index='HomePlanet', aggfunc='count', values='CryoSleep')['CryoSleep'] / pd.pivot_table(train_df, index='HomePlanet', aggfunc='count', values='CryoSleep')['CryoSleep'] * 100
_pivot_2 = pd.DataFrame(_).style.background_gradient(cmap='BuPu')

multi_table([_pivot_1, _pivot_2])

[back to top](#table-of-contents)
<a id="4.4"></a>
### **<span style="color:#7A5197;">4.4 Cabins</span>**

* Passengers in CryoSleep are more likely to be transported.
* Earth residents have the least favourable decks and the lowest probability of being transported.
* For Mars residents, the probability of being transported varies widely across decks.
___
I computed the difference in transport probability between passengers with and without CryoSleep for every combination of HomePlanet and CabinDeck. The larger this difference, the more CryoSleep raises the chance of being transported on that deck for that home planet.

* With CryoSleep, decks `D`, `E` and `F` give the highest chance of being transported.
* Deck `B` is the most favourable overall.
* Decks `D` and `F` are the most favourable for Mars residents.
* The starboard side (`S`) is more favourable for being transported than the port side (`P`).
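
The difference described above is easiest to compute when both CryoSleep groups sit side by side. Pivot with `CryoSleep` as columns, then subtract, so that pandas aligns the rows on (`HomePlanet`, `CabinDeck`) automatically. Here is a minimal sketch on made-up passengers:

```python
import pandas as pd

# made-up stand-in for train_df after the Cabin split
df = pd.DataFrame({
    'HomePlanet': ['Earth'] * 6 + ['Mars'] * 4,
    'CabinDeck':  ['E', 'E', 'E', 'E', 'F', 'F', 'F', 'F', 'F', 'F'],
    'CryoSleep':  [True, True, False, False, True, False, True, False, False, False],
    'Transported': [True, True, False, True, True, False, True, True, False, False],
})

# transport rate per (HomePlanet, CabinDeck), one column per CryoSleep value
rates = df.pivot_table(index=['HomePlanet', 'CabinDeck'], columns='CryoSleep',
                       values='Transported', aggfunc='mean')

# rate with CryoSleep minus rate without; pairs missing from either group are dropped
diff = (rates[True] - rates[False]).dropna().sort_values(ascending=False)
print(diff)
```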

In [None]:
# note: dropna() drops every row with any missing value, so all later sections use complete rows only
train_df = train_df.dropna()
# Cabin has the form deck/num/side
train_df[['CabinDeck', 'CabinNum', 'CabinSide']] = train_df['Cabin'].str.split('/', expand=True)

In [None]:
facecolor = '#f6f5f5'
fig = plt.figure(figsize=(14, 2), facecolor=facecolor)
colors_deck = ['#2C0735', '#7A5197', '#BB5098', '#4E148C', '#613DC1', '#858AE3','#56CBF9', '#97DFFC', '#F7F4EA']

for i, t_val in enumerate(train_df.CabinDeck.unique()):
    ax = fig.add_subplot(1, 9, i+1, facecolor=facecolor)
    _pivot_df = pd.pivot_table(train_df[train_df['CabinDeck'] == t_val], index='CabinSide', aggfunc='count', values='CryoSleep')
    ax.barh(_pivot_df.index, _pivot_df['CryoSleep'], color=colors_deck[i])
    #_target_col = {False: 'Without CryoSleep', True: 'CryoSleep'}
    #ax.set_title(_target_col[t_val], fontsize=12, fontweight="medium")
    ax.tick_params(left=False, bottom=False, right=False, top=False, labelleft=(i<=1))
    ax.xaxis.set_ticks([])
    
    for _, spine in ax.spines.items():
        spine.set_visible(False)
        
    if i != 0:
        ax.yaxis.set_ticks([])
    else:
        pass
    for ind, val in enumerate(_pivot_df['CryoSleep']):
        #print(val)
        ax.text(0, ind, formatter(val), fontweight="bold", color=facecolor, fontsize=14)
    
fig.text(0.13, 1.20,'Bar Plot Count of Passengers',fontweight='bold', fontsize=24, color='#7A5197');
fig.text(0.13, 1,'by CabinDeck & CabinSide',fontweight='bold', fontsize=24, color='#7A5197');
fig.legend(train_df.CabinDeck.unique(), ncol=8, facecolor='#f6f5f5', edgecolor='#f6f5f5',loc='lower center', fontsize=8, framealpha=1)


In [None]:
#pd.pivot_table(index='CabinSide', data=train_df, values='Transported', aggfunc=['count', 'sum', 'mean'])
_pivot_1 = pd.pivot_table(index=['CryoSleep', 'HomePlanet', 'CabinDeck'], data=train_df, values='Transported', aggfunc=['mean'])['mean'].style.background_gradient(cmap='BuPu')
_pivot_2 = pd.pivot_table(index=['CabinDeck'], data=train_df, values='Transported', aggfunc=['mean'])['mean'].style.background_gradient(cmap='BuPu')
_pivot_3 = pd.pivot_table(index=['CabinSide', 'CabinDeck'], data=train_df, values='Transported', aggfunc=['mean'])['mean'].style.background_gradient(cmap='BuPu')
_pivot_4 = pd.pivot_table(index=['CabinSide', 'HomePlanet'], data=train_df, values='Transported', aggfunc=['mean'])['mean'].style.background_gradient(cmap='BuPu')
_pivot_5 = pd.pivot_table(index=['HomePlanet', 'CabinDeck'], data=train_df, values='Transported', aggfunc=['mean'])['mean'].style.background_gradient(cmap='BuPu')

# transport rate for each (HomePlanet, CabinDeck) pair, one column per CryoSleep value
_rates = pd.pivot_table(index=['HomePlanet', 'CabinDeck'], columns='CryoSleep', data=train_df, values='Transported', aggfunc='mean')

# difference in transport rate with vs. without CryoSleep, aligned on (HomePlanet, CabinDeck);
# pairs missing from either group are dropped
_df = (_rates[True] - _rates[False]).dropna().rename('Diff').reset_index()
_df = _df.sort_values(by='Diff', ascending=False).style.background_gradient(cmap='BuPu')

multi_table([_pivot_1,_pivot_2,_pivot_3,_pivot_4,_pivot_5, _df])

[back to top](#table-of-contents)
<a id="4.5"></a>
### **<span style="color:#7A5197;">4.5 Destination</span>**
* Passengers travelling to `55 Cancri e` have a higher probability of being transported than those bound for other destinations.
* Earth residents travelling to `TRAPPIST-1e` have a 0.38 probability of being transported.
* Choosing `CryoSleep` further increases the probability of being transported.

In [None]:
_pivot_1 = pd.pivot_table(index=['Destination'], data=train_df, values='Transported', aggfunc=['mean'])['mean'].style.background_gradient(cmap='BuPu')
_pivot_2 = pd.pivot_table(index=['Destination'], columns=['HomePlanet'], data=train_df, values='Transported', aggfunc=['mean'])['mean'].style.background_gradient(cmap='BuPu', axis=0)
_pivot_3 = pd.pivot_table(index=['Destination', 'CryoSleep'], columns=['HomePlanet'], data=train_df, values='Transported', aggfunc=['mean'])['mean'].style.background_gradient(cmap='BuPu')
_pivot_4 = pd.pivot_table(index=['Destination', 'CabinDeck'], columns=['HomePlanet'], data=train_df, values='Transported', aggfunc=['mean'])['mean'].style.background_gradient(cmap='BuPu')

multi_table([_pivot_1, _pivot_2, _pivot_3, _pivot_4])

[back to top](#table-of-contents)
<a id="4.6"></a>
### **<span style="color:#7A5197;">4.6 Age</span>**

* Passengers under 10 have a `0.72` probability of being transported.
* Earth residents over 20 years old have the lowest probability of being transported.
* Passengers over 20 years old in decks `E` and `F` have the lowest probability of being transported.
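
The age groups come from `pd.cut`. With `bins=8`, pandas splits the observed age range into eight equal-width intervals. The edges therefore only coincide with decades when the ages happen to run from 0 to 79. Passing explicit edges makes the grouping independent of the data. Below is a sketch with made-up ages. With `right=False`, an age of 80 or more would fall outside the bins and become `NaN`.

```python
import pandas as pd

# made-up ages
ages = pd.Series([0.0, 9.0, 10.0, 24.0, 39.0, 58.0, 79.0])

# explicit decade edges: [0, 10), [10, 20), ..., [70, 80)
edges = list(range(0, 90, 10))
labels = [f'{lo}-{lo + 10}' for lo in edges[:-1]]   # '0-10', '10-20', ..., '70-80'
age_bins = pd.cut(ages, bins=edges, labels=labels, right=False)
print(pd.DataFrame({'Age': ages, 'AgeBin': age_bins}))
```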


In [None]:
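# explicit decade edges [0, 10), [10, 20), ..., [70, 80) so that the labels match the bins; ages of 80+ would become NaN
train_df['AgeBins'] = pd.cut(train_df['Age'], bins=list(range(0, 90, 10)), right=False, labels=['0-10', '10-20', '20-30', '30-40', '40-50', '50-60', '60-70', '70-80'])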

facecolor = '#f6f5f5'
fig = plt.figure(figsize=(16, 3), facecolor=facecolor)
colors_age = ['#2C0735', '#7A5197', '#BB5098', '#4E148C', '#613DC1', '#858AE3','#56CBF9', '#97DFFC', '#F7F4EA']

for i, t_val in enumerate(train_df.AgeBins.unique()):
    ax = fig.add_subplot(1, 9, i+1, facecolor=facecolor)
    _pivot_df = pd.pivot_table(train_df[train_df['AgeBins'] == t_val], index='HomePlanet', aggfunc='count', values='CryoSleep')
    ax.barh(_pivot_df.index, _pivot_df['CryoSleep'], color=colors_age[:9][i])
    #_target_col = {False: 'Without CryoSleep', True: 'CryoSleep'}
    ax.set_title(t_val, fontsize=12, fontweight="medium")
    ax.tick_params(left=False, bottom=False, right=False, top=False, labelleft=(i<=1))
    ax.xaxis.set_ticks([])

    for _, spine in ax.spines.items():
        spine.set_visible(False)
        
    if i != 0:
        ax.yaxis.set_ticks([])
    else:
        pass
    for ind, val in enumerate(_pivot_df['CryoSleep']):
        #print(val)
        ax.text(0, ind, formatter(val), fontweight="bold", color=facecolor, fontsize=14)
    

fig.text(0.1, 1.2,'Bar Plot Count of Passengers',fontweight='bold', fontsize=24, color='#7A5197');
fig.text(0.1, 1.08,'by Age',fontweight='bold', fontsize=24, color='#7A5197');
fig.legend(train_df.AgeBins.unique(), ncol=8, facecolor='#f6f5f5', edgecolor='#f6f5f5',loc='lower center', fontsize=8, framealpha=1)


In [None]:
_pivot_1 = pd.pivot_table(index=['AgeBins'], data=train_df, values='Transported', aggfunc=['mean'])['mean'].style.background_gradient(cmap='BuPu')
_pivot_2 = pd.pivot_table(index=['HomePlanet', 'AgeBins'], data=train_df, values='Transported', aggfunc=['mean'])['mean'].style.background_gradient(cmap='BuPu')
_pivot_3 = pd.pivot_table(index=['AgeBins', 'HomePlanet'], columns=['Destination'], data=train_df, values='Transported', aggfunc=['mean'])['mean'].style.background_gradient(cmap='BuPu', axis=0)
_pivot_4 = pd.pivot_table(index=['AgeBins'], columns=['CabinDeck'], data=train_df, values='Transported', aggfunc=['mean'])['mean'].style.background_gradient(cmap='BuPu', axis=0)

multi_table([_pivot_1, _pivot_2, _pivot_3, _pivot_4])


In [None]:
facecolor = '#f6f5f5'
fig = plt.figure(figsize=(9, 5), facecolor=facecolor)
colors = ['#5344A9', '#7A5197', '#BB5098', '#F5C63C', '#F47F6B']
colors_dark = ['#382F6C', '#563C68', '#864670', '#F5C63C', '#F47F6B']



for i, t_val in enumerate(train_df.HomePlanet.unique()):
    ax = fig.add_subplot(1, 3, i+1, facecolor=facecolor)
    _pivot_df = pd.pivot_table(train_df[train_df['HomePlanet'] == t_val], columns=['CryoSleep'], index='AgeBins', aggfunc='mean', values='Transported')

    ax.barh(_pivot_df.index, _pivot_df[True], color=colors[i])
    ax.barh(_pivot_df.index, _pivot_df[False], color=colors_dark[i])

    _target_col = {False: 'Without CryoSleep', True: 'CryoSleep'}
    ax.set_title(f'{t_val}', fontsize=12, fontweight="medium")
    ax.tick_params(left=False, bottom=False, right=False, top=False, labelleft=(i<=1))
    ax.xaxis.set_ticks([])
    plt.legend(['CryoSleep', 'Without CryoSleep'], ncol=1, facecolor='#f6f5f5', edgecolor='#f6f5f5', loc='lower center', fontsize=8, framealpha=1)

    for _, spine in ax.spines.items():
        spine.set_visible(False)
        
    if i != 0:
        ax.yaxis.set_ticks([])
    else:
        pass
    for ind, val in enumerate(_pivot_df[False]):
        #print(val)
        ax.text(0, ind, formatter(val), fontweight="bold", color=facecolor, fontsize=14)
        
fig.text(0.1, 1.16,'Bar Plot Probability of Being Transported',fontweight='bold', fontsize=24, color='#7A5197');
fig.text(0.1, 1.08,'by HomePlanet & Age',fontweight='bold', fontsize=24, color='#7A5197');
fig.text(0.1, 1,'With Option CryoSleep ',fontweight='bold', fontsize=24, color='#7A5197');


In [None]:
facecolor = '#f6f5f5'
fig = plt.figure(figsize=(9, 5), facecolor=facecolor)
colors = ['#5344A9', '#7A5197', '#BB5098', '#F47F6B', '#F5C63C']
colors_dark = ['#382F6C', '#563C68', '#864670', '#C46D5E','#B89631']



for i, t_val in enumerate(train_df.Destination.unique()):
    ax = fig.add_subplot(1, 3, i+1, facecolor=facecolor)
    _pivot_df = pd.pivot_table(train_df[train_df['Destination'] == t_val], columns=['CryoSleep'], index='AgeBins', aggfunc='mean', values='Transported')

    ax.barh(_pivot_df.index, _pivot_df[True], color=colors[::-1][i])
    ax.barh(_pivot_df.index, _pivot_df[False], color=colors_dark[::-1][i])

    _target_col = {False: 'Without CryoSleep', True: 'CryoSleep'}
    ax.set_title(f'{t_val}', fontsize=12, fontweight="medium")
    ax.tick_params(left=False, bottom=False, right=False, top=False, labelleft=(i<=1))
    ax.xaxis.set_ticks([])
    plt.legend(['CryoSleep', 'Without CryoSleep'], ncol=1, facecolor='#f6f5f5', edgecolor='#f6f5f5', loc='lower center', fontsize=8, framealpha=1)

    for _, spine in ax.spines.items():
        spine.set_visible(False)
        
    if i != 0:
        ax.yaxis.set_ticks([])
    else:
        pass
    for ind, val in enumerate(_pivot_df[False]):
        #print(val)
        ax.text(0, ind, formatter(val), fontweight="bold", color=facecolor, fontsize=14)
        
fig.text(0.1, 1.16,'Bar Plot Probability of Being Transported',fontweight='bold', fontsize=24, color='#7A5197');
fig.text(0.1, 1.08,'by Destination & Age',fontweight='bold', fontsize=24, color='#7A5197');
fig.text(0.1, 1,'With Option CryoSleep ',fontweight='bold', fontsize=24, color='#7A5197');


[back to top](#table-of-contents)
<a id="4.7"></a>
### **<span style="color:#7A5197;">4.7 VIP</span>**

* `VIP` status does not appear to have a noticeable effect on the chance of being transported.

In [None]:
plt.subplots(figsize=(25, 10), facecolor='#f6f5f5')
plt.pie(train_df.VIP.value_counts(), startangle=90, wedgeprops={'width':0.3}, colors=[colors[3], '#f6f5f5'] )
plt.title('VIP Balance Pie Chart', loc='center', fontsize=24, color=colors[3], fontweight='bold')
plt.text(0, 0, f"{train_df.VIP.value_counts()[0] / train_df.VIP.count() * 100:.2f}%", ha='center', va='center', fontweight='bold',  fontsize=42, color=colors[3])
#plt.legend(['True', ''], ncol=2, facecolor='#f6f5f5', edgecolor='#f6f5f5', loc='lower center', fontsize=16)
plt.show()

In [None]:
_pivot_1 = pd.pivot_table(index=['VIP'], data=train_df, values='Transported', aggfunc=['mean'])['mean'].style.background_gradient(cmap='BuPu')
_pivot_2 = pd.pivot_table(index=['VIP', 'HomePlanet'], data=train_df, values='Transported', aggfunc=['mean'])['mean'].style.background_gradient(cmap='BuPu')
_pivot_3 = pd.pivot_table(index=['VIP', 'HomePlanet'], columns=['Destination'], data=train_df, values='Transported', aggfunc=['mean'])['mean'].style.background_gradient(cmap='BuPu', axis=0)
_pivot_4 = pd.pivot_table(index=['VIP'], columns=['CabinDeck'], data=train_df, values='Transported', aggfunc=['mean'])['mean'].style.background_gradient(cmap='BuPu', axis=1)

multi_table([_pivot_1, _pivot_2, _pivot_3, _pivot_4])

[back to top](#table-of-contents)
<a id="4.8"></a>
### **<span style="color:#7A5197;">4.8 RoomService</span>**
* RoomService is popular with residents of `Mars` and `Europa`, and with passengers travelling to `TRAPPIST-1e` and `55 Cancri e`.
* `VIP` passengers also use RoomService more than other passengers.
* Younger passengers use RoomService more than older ones.
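
Spending columns are heavily skewed, with many zeros and a few very large bills. Statements such as "used more" can therefore be checked numerically by comparing the share of passengers who used the amenity with the mean and median bill per group. Below is a sketch on made-up RoomService values. The column names follow the dataset, but the numbers do not.

```python
import pandas as pd

# made-up stand-in for train_df[['HomePlanet', 'VIP', 'RoomService']]
df = pd.DataFrame({
    'HomePlanet': ['Earth', 'Earth', 'Mars', 'Mars', 'Europa', 'Europa'],
    'VIP': [False, False, False, True, False, True],
    'RoomService': [0.0, 10.0, 500.0, 1500.0, 0.0, 300.0],
})

# share of passengers with a non-zero bill, plus mean and median bill, per home planet
summary = df.groupby('HomePlanet')['RoomService'].agg(
    users_pct=lambda s: (s > 0).mean() * 100,
    mean='mean',
    median='median',
)
print(summary)

# mean RoomService bill for VIP vs. non-VIP passengers
vip_mean = df.groupby('VIP')['RoomService'].mean()
print(vip_mean)
```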

In [None]:
facecolor = '#f6f5f5'
fig = plt.figure(figsize=(14, 8), facecolor=facecolor)
gs = fig.add_gridspec(2, 1)

gs.update(wspace=0.2, hspace=0.7)

ax0 = fig.add_subplot(gs[0, 0])
ax1 = fig.add_subplot(gs[1, 0])

ax0.set_facecolor(facecolor)
for s in ["top","right", "left", "bottom"]:
    ax0.spines[s].set_visible(False)

ax1.set_facecolor(facecolor)
for s in ["top","right","left", "bottom"]:
    ax1.spines[s].set_visible(False)
    
sns.violinplot(x="HomePlanet", y="RoomService", data=train_df, hue='Destination', ax=ax0, color=colors[3])
sns.violinplot(x="AgeBins", y="RoomService", data=train_df, hue='VIP', split=True, ax=ax1, color=colors[1])

ax0.legend(ncol=3, facecolor=facecolor, edgecolor=facecolor, loc='upper center')
ax0.text(-0.6, 15000, 'RoomService Distributions', fontweight='bold', fontsize=24, color=colors[3]);
ax0.text(-0.6, 13000, 'by Destination & HomePlanet', fontweight='bold', fontsize=24, color=colors[3]);

ax1.legend(ncol=2, facecolor=facecolor, edgecolor=facecolor, loc='upper center');
ax1.text(-0.7, 14000, 'RoomService Distributions', fontweight='bold', fontsize=24, color=colors[1]);
ax1.text(-0.7, 12000, 'by VIP & Age', fontweight='bold', fontsize=24, color=colors[1]);


[back to top](#table-of-contents)
<a id="4.9"></a>
### **<span style="color:#7A5197;">4.9 FoodCourt</span>**
* FoodCourt is preferred by residents of `Europa`.
* `VIP` passengers also prefer the FoodCourt.
* Passengers aged `30-50` spend more at the FoodCourt than other age groups.

In [None]:
facecolor = '#f6f5f5'
fig = plt.figure(figsize=(14, 8), facecolor=facecolor)
gs = fig.add_gridspec(2, 1)

gs.update(wspace=0.2, hspace=0.7)

ax0 = fig.add_subplot(gs[0, 0])
ax1 = fig.add_subplot(gs[1, 0])

ax0.set_facecolor(facecolor)
for s in ["top","right", "left", "bottom"]:
    ax0.spines[s].set_visible(False)

ax1.set_facecolor(facecolor)
for s in ["top","right","left", "bottom"]:
    ax1.spines[s].set_visible(False)
    
sns.violinplot(x="HomePlanet", y="FoodCourt", data=train_df, hue='Destination', ax=ax0, color=colors[3])
sns.violinplot(x="AgeBins", y="FoodCourt", data=train_df, hue='VIP', split=True, ax=ax1, color=colors[1])

ax0.legend(ncol=3, facecolor=facecolor, edgecolor=facecolor, loc='upper center')
ax0.text(-0.6, 50000, 'FoodCourt Distributions', fontweight='bold', fontsize=24, color=colors[3]);
ax0.text(-0.6, 44000, 'by Destination & HomePlanet', fontweight='bold', fontsize=24, color=colors[3]);

ax1.legend(ncol=2, facecolor=facecolor, edgecolor=facecolor, loc='upper center');
ax1.text(-0.6, 54000, 'FoodCourt Distributions', fontweight='bold', fontsize=24, color=colors[1]);
ax1.text(-0.6, 46000, 'by VIP & Age', fontweight='bold', fontsize=24, color=colors[1]);


[back to top](#table-of-contents)
<a id="4.10"></a>
### **<span style="color:#7A5197;">4.10 ShoppingMall</span>**
* ShoppingMall is preferred by residents of `Europa` and by passengers travelling to `TRAPPIST-1e` and `55 Cancri e`.
* Passengers aged `10-50` spend more at the ShoppingMall than other age groups.

In [None]:
facecolor = '#f6f5f5'
fig = plt.figure(figsize=(14, 8), facecolor=facecolor)
gs = fig.add_gridspec(2, 1)

gs.update(wspace=0.2, hspace=0.7)

ax0 = fig.add_subplot(gs[0, 0])
ax1 = fig.add_subplot(gs[1, 0])

ax0.set_facecolor(facecolor)
for s in ["top","right", "left", "bottom"]:
    ax0.spines[s].set_visible(False)

ax1.set_facecolor(facecolor)
for s in ["top","right","left", "bottom"]:
    ax1.spines[s].set_visible(False)
    
sns.violinplot(x="HomePlanet", y="ShoppingMall", data=train_df, hue='Destination', ax=ax0, color=colors[3])
sns.violinplot(x="AgeBins", y="ShoppingMall", data=train_df, hue='VIP', split=True, ax=ax1, color=colors[1])

ax0.legend(ncol=3, facecolor=facecolor, edgecolor=facecolor, loc='upper center')
ax0.text(-0.6, 18000, 'ShoppingMall Distributions', fontweight='bold', fontsize=24, color=colors[3]);
ax0.text(-0.6, 15000, 'by Destination & HomePlanet', fontweight='bold', fontsize=24, color=colors[3]);

ax1.legend(ncol=2, facecolor=facecolor, edgecolor=facecolor, loc='upper center');
ax1.text(-0.6, 18000, 'ShoppingMall Distributions', fontweight='bold', fontsize=24, color=colors[1]);
ax1.text(-0.6, 15000, 'by VIP & Age', fontweight='bold', fontsize=24, color=colors[1]);


[back to top](#table-of-contents)
<a id="4.11"></a>
### **<span style="color:#7A5197;">4.11 Spa</span>**
* The Spa is preferred by residents of `Europa`.
* It is also preferred by older passengers, aged over `30`.


In [None]:
facecolor = '#f6f5f5'
fig = plt.figure(figsize=(14, 8), facecolor=facecolor)
gs = fig.add_gridspec(2, 1)

gs.update(wspace=0.2, hspace=0.7)

ax0 = fig.add_subplot(gs[0, 0])
ax1 = fig.add_subplot(gs[1, 0])

ax0.set_facecolor(facecolor)
for s in ["top","right", "left", "bottom"]:
    ax0.spines[s].set_visible(False)

ax1.set_facecolor(facecolor)
for s in ["top","right","left", "bottom"]:
    ax1.spines[s].set_visible(False)
    
sns.violinplot(x="HomePlanet", y="Spa", data=train_df, hue='Destination', ax=ax0, color=colors[3])
sns.violinplot(x="AgeBins", y="Spa", data=train_df, hue='VIP', split=True, ax=ax1, color=colors[1])

ax0.legend(ncol=3, facecolor=facecolor, edgecolor=facecolor, loc='upper center')
ax0.text(-0.6, 33000, 'Spa Distributions', fontweight='bold', fontsize=24, color=colors[3]);
ax0.text(-0.6, 28000, 'by Destination & HomePlanet', fontweight='bold', fontsize=24, color=colors[3]);

ax1.legend(ncol=2, facecolor=facecolor, edgecolor=facecolor, loc='upper center');
ax1.text(-0.6, 33000, 'Spa Distributions', fontweight='bold', fontsize=24, color=colors[1]);
ax1.text(-0.6, 28000, 'by VIP & Age', fontweight='bold', fontsize=24, color=colors[1]);


[back to top](#table-of-contents)
<a id="4.12"></a>
### **<span style="color:#7A5197;">4.12 VRDeck</span>**
* VRDeck is preferred by `young` passengers without `VIP` status.
* Among `VIP` passengers, VRDeck is preferred by those aged over `30`.
* The main audience appears to be `VIP` passengers aged around 45.


In [None]:
facecolor = '#f6f5f5'
fig = plt.figure(figsize=(14, 8), facecolor=facecolor)
gs = fig.add_gridspec(2, 1)

gs.update(wspace=0.2, hspace=0.7)

ax0 = fig.add_subplot(gs[0, 0])
ax1 = fig.add_subplot(gs[1, 0])

ax0.set_facecolor(facecolor)
for s in ["top","right", "left", "bottom"]:
    ax0.spines[s].set_visible(False)

ax1.set_facecolor(facecolor)
for s in ["top","right","left", "bottom"]:
    ax1.spines[s].set_visible(False)
    
sns.violinplot(x="HomePlanet", y="VRDeck", data=train_df, hue='Destination', ax=ax0, color=colors[3])
sns.violinplot(x="AgeBins", y="VRDeck", data=train_df, hue='VIP', split=True, ax=ax1, color=colors[1])

ax0.legend(ncol=3, facecolor=facecolor, edgecolor=facecolor, loc='upper center')
ax0.text(-0.6, 31000, 'VRDeck Distributions', fontweight='bold', fontsize=24, color=colors[3]);
ax0.text(-0.6, 26000, 'by Destination & HomePlanet', fontweight='bold', fontsize=24, color=colors[3]);

ax1.legend(ncol=2, facecolor=facecolor, edgecolor=facecolor, loc='upper center');
ax1.text(-0.7, 31000, 'VRDeck Distributions', fontweight='bold', fontsize=24, color=colors[1]);
ax1.text(-0.7, 26000, 'by VIP & Age', fontweight='bold', fontsize=24, color=colors[1]);


In [None]:
fig = plt.figure(figsize=(10, 13), facecolor=facecolor)
ax = fig.add_subplot(111, projection='3d')
ax.scatter(train_df['RoomService'], train_df['FoodCourt'], train_df['ShoppingMall'], color=colors[0])
ax.view_init(25, 45)
ax.set_facecolor(facecolor)
ax.set_xlabel('RoomService')
ax.set_ylabel('FoodCourt')
ax.xaxis.set_pane_color((1.0, 1.0, 1.0, 0.0))
ax.yaxis.set_pane_color((1.0, 1.0, 1.0, 0.0))
ax.zaxis.set_pane_color((1.0, 1.0, 1.0, 0.0))
ax.xaxis._axinfo['grid']['color'] = (1, 1, 1, 0)
ax.yaxis._axinfo['grid']['color'] = (1, 1, 1, 0)
ax.zaxis._axinfo['grid']['color'] = (1, 1, 1, 0)
ax.zaxis.line.set_lw(0.)  # w_zaxis was deprecated and later removed from matplotlib
ax.set_zticks([]);
fig.text(0.15, 0.8, '3D Scatter of Amenity Spending', fontweight='bold', fontsize=24, color=colors[1]);
fig.text(0.15, 0.77, 'RoomService & FoodCourt & ShoppingMall', fontweight='bold', fontsize=18, color=colors[1]);


<a id="5"></a>
# **<span style="color:#7A5197;">5. References</span>**

* [Multi-table display](https://www.kaggle.com/code/arootda/pycaret-visualization-optimization-0-81#%F0%9F%93%8C-Import-Modules)
* [Visualization](https://www.kaggle.com/code/jtrotman/f1-race-traces-2007#More-F1-Race-Traces)
* [Visualization](https://www.kaggle.com/code/dwin183287/tps-june-2021-eda)