## Overview

Welcome to the year 2912, where your data science skills are needed to solve a cosmic mystery. We've received a transmission from four lightyears away and things aren't looking good.

The Spaceship Titanic was an interstellar passenger liner launched a month ago. With almost 13,000 passengers on board, the vessel set out on its maiden voyage transporting emigrants from our solar system to three newly habitable exoplanets orbiting nearby stars.

While rounding Alpha Centauri en route to its first destination—the torrid 55 Cancri E—the unwary Spaceship Titanic collided with a spacetime anomaly hidden within a dust cloud. Sadly, it met a similar fate as its namesake from 1000 years before. Though the ship stayed intact, almost half of the passengers were transported to an alternate dimension!

To help rescue crews and retrieve the lost passengers, you are challenged to predict which passengers were transported by the anomaly using records recovered from the spaceship’s damaged computer system.

## Imports

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns

## Data Loading and Data Preparation

In [2]:
train = pd.read_csv('Dataset/test.csv')
test = pd.read_csv('Dataset/test.csv')
submission = pd.read_csv('Dataset/sample_submission.csv')

In [3]:
train.head()

Unnamed: 0,PassengerId,HomePlanet,CryoSleep,Cabin,Destination,Age,VIP,RoomService,FoodCourt,ShoppingMall,Spa,VRDeck,Name
0,0013_01,Earth,True,G/3/S,TRAPPIST-1e,27.0,False,0.0,0.0,0.0,0.0,0.0,Nelly Carsoning
1,0018_01,Earth,False,F/4/S,TRAPPIST-1e,19.0,False,0.0,9.0,0.0,2823.0,0.0,Lerome Peckers
2,0019_01,Europa,True,C/0/S,55 Cancri e,31.0,False,0.0,0.0,0.0,0.0,0.0,Sabih Unhearfus
3,0021_01,Europa,False,C/1/S,TRAPPIST-1e,38.0,False,0.0,6652.0,0.0,181.0,585.0,Meratz Caltilter
4,0023_01,Earth,False,F/5/S,TRAPPIST-1e,20.0,False,10.0,0.0,635.0,0.0,0.0,Brence Harperez


In [4]:
test.head()

Unnamed: 0,PassengerId,HomePlanet,CryoSleep,Cabin,Destination,Age,VIP,RoomService,FoodCourt,ShoppingMall,Spa,VRDeck,Name
0,0013_01,Earth,True,G/3/S,TRAPPIST-1e,27.0,False,0.0,0.0,0.0,0.0,0.0,Nelly Carsoning
1,0018_01,Earth,False,F/4/S,TRAPPIST-1e,19.0,False,0.0,9.0,0.0,2823.0,0.0,Lerome Peckers
2,0019_01,Europa,True,C/0/S,55 Cancri e,31.0,False,0.0,0.0,0.0,0.0,0.0,Sabih Unhearfus
3,0021_01,Europa,False,C/1/S,TRAPPIST-1e,38.0,False,0.0,6652.0,0.0,181.0,585.0,Meratz Caltilter
4,0023_01,Earth,False,F/5/S,TRAPPIST-1e,20.0,False,10.0,0.0,635.0,0.0,0.0,Brence Harperez


#### Exploring each column:

- `PassengerId`: represents the ID for each passenger. <b>Numerical</b>.

- `HomePlanet`: represents the home planet where a passenger comes from. <b>Categorical</b>.

- `CryoSleep`: reprsents whether the passenger has been in cryosleep or not. <b>Categorical</b>. (True or False)

- `Cabin`: is in the form of A/B/C, where A can be any letter, B can be any number, and C is either S or P. <b>Categorical</b>/<b>Categorical</b>/<b>Categorical</b>.

- `Destination`: represents the destination for each passenger. <b>Categorical</b>.

- `Age`: represents the passenger's age. <b>Numerical</b>.

- `VIP`: shows whether the passenger is VIP or not. <b>Categorical</b>.

- `RoomService`, `FoodCourt`, `ShoppingMall`, `Spa`, `VRDeck`: shows how much the passenger was billed for different services. <b>Numerical</b>.

- `Name`: the passenger's full name. <b>Categorical</b>.

In [5]:
train_shape = train.shape
test_shape = test.shape

print(
    'there are a total of', train_shape[1], 'columns. the train dataset has', train_shape[0], 'rows and the test dataset has', test_shape[0]
)
print(
    'the test dataset contains', str(train_shape[1] / (test_shape[1] + train_shape[1]) * 100), '% of the overall data.'
)

there are a total of 13 columns. the train dataset has 4277 rows and the test dataset has 4277
the test dataset contains 50.0 % of the overall data.
