#### 1. Problem description

The __[Kaggle](https://www.kaggle.com/competitions/spaceship-titanic/data)__ platform provides two datasets (_train.csv_ and _test.csv_) describing passengers aboard the Spaceship Titanic. The columns are:

- _PassengerId_ : A unique Id for each passenger. Each Id takes the form gggg_pp, where gggg indicates the group the passenger is travelling with and pp is their number within the group. People in a group are often family members, but not always.

- _HomePlanet_ : The planet the passenger departed from, typically their planet of permanent residence.

- _CryoSleep_ : Indicates whether the passenger elected to be put into suspended animation for the duration of the voyage. Passengers in cryosleep are confined to their cabins.

- _Cabin_ : The cabin number where the passenger is staying. Takes the form deck/num/side, where side can be either P for Port or S for Starboard.

- _Destination_ : The planet where the passenger will disembark.

- _Age_ : The age of the passenger.

- _VIP_ : Whether the passenger has paid for special VIP service during the voyage.

- _RoomService_, _FoodCourt_, _ShoppingMall_, _Spa_, _VRDeck_ : Amount the passenger has billed at each of the Spaceship Titanic's many luxury amenities.

- _Name_ : The first and last names of the passenger.

- _Transported_ : Whether the passenger was transported to another dimension (only present in _train.csv_). This is the target, the column we are trying to predict for _test.csv_.
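The composite formats described above (gggg_pp for _PassengerId_ and deck/num/side for _Cabin_) can be split into separate columns. The following is a minimal sketch on a toy DataFrame. The new column names (`Group`, `NumberInGroup`, `Deck`, `CabinNum`, `Side`) are illustrative assumptions, not part of the dataset, and the missing cabin value is included only to show how it propagates.

```python
import pandas as pd

# Toy rows that mimic the documented formats; the missing cabin is illustrative.
df = pd.DataFrame({
    "PassengerId": ["0933_02", "6363_07", "8020_01"],
    "Cabin": ["C/35/P", "G/1026/P", None],
})

# PassengerId has the form gggg_pp: group id, then number within the group.
# (Group and NumberInGroup are hypothetical column names.)
df[["Group", "NumberInGroup"]] = df["PassengerId"].str.split("_", expand=True)

# Cabin has the form deck/num/side; a missing cabin stays missing in all parts.
# (Deck, CabinNum and Side are hypothetical column names.)
df[["Deck", "CabinNum", "Side"]] = df["Cabin"].str.split("/", expand=True)

print(df)
```

The parts are kept as strings here, so leading zeros in the group id are preserved.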

#### 2. Import required libraries

In [12]:
import gc
from pathlib import Path

import numpy as np
import pandas as pd

#### 3. Set up correct path

In [13]:
# Path() is the current working directory; the "data" folder is expected next to it
windowspath__scripts = Path().resolve()
windowspath__data = windowspath__scripts.parent / "data"
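Because `Path()` resolves to wherever the kernel was started, it can help to check that the data folder actually exists before reading from it. The following is a small sketch of that check. The helper name `data_dir_from` is hypothetical, and the folder layout (a `data` folder next to the scripts folder) is taken from the cell above.

```python
from pathlib import Path


def data_dir_from(scripts_dir: Path) -> Path:
    """Return the "data" folder that sits next to the given scripts folder."""
    return scripts_dir.resolve().parent / "data"


# Path() is the current working directory of the running kernel.
scripts_dir = Path().resolve()
data_dir = data_dir_from(scripts_dir)

# Report the problem early instead of failing later inside pd.read_csv.
if not data_dir.is_dir():
    print(f"Expected data folder not found: {data_dir}")
```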

#### 4. Import train.csv and test.csv files

In [14]:
# Read every column as a string so that no value is coerced on load
df__train = pd.read_csv(filepath_or_buffer=windowspath__data / "train" / "train.csv", dtype=str)
df__test = pd.read_csv(filepath_or_buffer=windowspath__data / "test" / "test.csv", dtype=str)
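Loading with `dtype=str` means numeric and boolean columns arrive as text and need converting before any modelling. The following is one possible sketch of that conversion, using a tiny in-memory CSV built from a few columns of the sample rows shown in this notebook. Which columns to treat as numeric or boolean is an assumption based on the column descriptions in section 1.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for train.csv (a subset of columns, three rows).
csv_text = """PassengerId,CryoSleep,Age,VIP,FoodCourt,Transported
0933_02,False,29.0,False,1127.0,False
6363_07,True,,False,,False
8020_01,False,22.0,False,79.0,False
"""

df = pd.read_csv(io.StringIO(csv_text), dtype=str)

# Numeric columns: strings become floats, empty fields become NaN.
for col in ["Age", "FoodCourt"]:
    df[col] = pd.to_numeric(df[col])

# Boolean columns: map the literal strings to booleans; the nullable
# "boolean" dtype can also hold missing values.
for col in ["CryoSleep", "VIP", "Transported"]:
    df[col] = df[col].map({"True": True, "False": False}).astype("boolean")

print(df.dtypes)
```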

In [15]:
# Take a look at df__train
df__train.sample(n=3)

| | PassengerId | HomePlanet | CryoSleep | Cabin | Destination | Age | VIP | RoomService | FoodCourt | ShoppingMall | Spa | VRDeck | Name | Transported |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 872 | 0933_02 | Europa | False | C/35/P | TRAPPIST-1e | 29.0 | False | 0.0 | 1127.0 | 0.0 | 1867.0 | 1070.0 | Astorux Fawnsive | False |
| 6016 | 6363_07 | Earth | True | G/1026/P | TRAPPIST-1e | NaN | False | 0.0 | NaN | 0.0 | 0.0 | 0.0 | Tiney Gouldensen | False |
| 7497 | 8020_01 | Earth | False | F/1661/P | TRAPPIST-1e | 22.0 | False | 668.0 | 79.0 | 0.0 | 0.0 | 1.0 | Ianya Callowery | False |


In [16]:
# Get a list of df__train "PassengerId"
list__trainids = df__train["PassengerId"].tolist()

In [17]:
# Take a look at df__test
df__test.sample(n=3)

| | PassengerId | HomePlanet | CryoSleep | Cabin | Destination | Age | VIP | RoomService | FoodCourt | ShoppingMall | Spa | VRDeck | Name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1348 | 2898_01 | Europa | False | B/90/P | TRAPPIST-1e | 22.0 | False | 0.0 | 4313.0 | 0.0 | 0.0 | 25.0 | Magnon Healted |
| 1771 | 3764_01 | Earth | True | G/614/P | TRAPPIST-1e | 28.0 | False | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | Leenny Moodmandez |
| 393 | 0831_01 | Earth | False | F/172/P | PSO J318.5-22 | 39.0 | False | 182.0 | 148.0 | 35.0 | 23.0 | 277.0 | Henryn Pagedy |


In [18]:
# Get a list of df__test "PassengerId"
list__testids = df__test["PassengerId"].tolist()
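With both Id lists in hand, a quick sanity check is to confirm that Ids are unique within each file and that no Id appears in both. The following is a minimal sketch using short stand-in lists. The names `train_ids` and `test_ids` are placeholders for `list__trainids` and `list__testids`, and the values are the sample Ids shown above.

```python
# Stand-ins for list__trainids and list__testids (sample Ids only).
train_ids = ["0933_02", "6363_07", "8020_01"]
test_ids = ["2898_01", "3764_01", "0831_01"]

# Ids shared by both files; an empty set means the two files are disjoint.
overlap = set(train_ids) & set(test_ids)

# A list with repeated Ids is longer than the set built from it.
train_has_duplicates = len(set(train_ids)) != len(train_ids)
test_has_duplicates = len(set(test_ids)) != len(test_ids)

print(f"Ids in both files: {len(overlap)}")
print(f"Duplicates in train: {train_has_duplicates}, in test: {test_has_duplicates}")
```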

In [22]:
# Run a garbage-collection pass, then clear all user-defined names without asking for confirmation
gc.collect()
%reset -f