# Spaceship Titanic

Train dataframe description:
- **PassengerId** - A unique Id for each passenger. Each Id takes the form gggg_pp where gggg indicates a group the passenger is travelling with and pp is their number within the group. People in a group are often family members, but not always.
- **HomePlanet** - The planet the passenger departed from, typically their planet of permanent residence.
- **CryoSleep** - Indicates whether the passenger elected to be put into suspended animation for the duration of the voyage. Passengers in cryosleep are confined to their cabins.
- **Cabin** - The cabin number where the passenger is staying. Takes the form deck/num/side, where side can be either P for Port or S for Starboard.
- **Destination** - The planet the passenger will be debarking to.
- **Age** - The age of the passenger.
- **VIP** - Whether the passenger has paid for special VIP service during the voyage.
- **RoomService, FoodCourt, ShoppingMall, Spa, VRDeck** - Amount the passenger has billed at each of the Spaceship Titanic's many luxury amenities.
- **Name** - The first and last names of the passenger.
- **Transported** - Whether the passenger was transported to another dimension. This is the target, the column you are trying to predict.
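
The `gggg_pp` and `deck/num/side` formats described above can be split into separate columns with pandas string methods. The following is a minimal sketch on made-up rows; the sample values and the column name `NumInGroup` are illustrative assumptions, not part of the dataset description.

```python
import pandas as pd

# Toy rows in the documented formats (values are illustrative, not from the dataset).
sample = pd.DataFrame({
    "PassengerId": ["0001_01", "0002_01", "0002_02"],
    "Cabin": ["B/0/P", "F/1/S", None],
})

# gggg_pp -> group id and position within the group.
sample[["Group", "NumInGroup"]] = sample["PassengerId"].str.split("_", expand=True)

# deck/num/side -> three columns; a missing Cabin yields missing parts.
sample[["Deck", "CabinNum", "CabinSide"]] = sample["Cabin"].str.split("/", expand=True)

print(sample)
```

All extracted parts are strings at this point, so a numeric column such as `CabinNum` would still need an explicit conversion.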

In [1]:
# Some modules that we are going to use.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import OrdinalEncoder, StandardScaler, LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV, LearningCurveDisplay
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.tree import DecisionTreeClassifier
import torch
import torch.nn as nn

## Define Train and Test dataframe

In [2]:
# Define train and test dataframe.
train_df = pd.read_csv("../inputdata/train.csv", sep=",")
test_df = pd.read_csv("../inputdata/test.csv", sep=",")

print("Train_df shape:", train_df.shape)
print(train_df.describe())

print("\n\nFirst 10 samples:")
print(train_df.head(10))

Train_df shape: (8693, 14)
               Age   RoomService     FoodCourt  ShoppingMall           Spa  \
count  8514.000000   8512.000000   8510.000000   8485.000000   8510.000000   
mean     28.827930    224.687617    458.077203    173.729169    311.138778   
std      14.489021    666.717663   1611.489240    604.696458   1136.705535   
min       0.000000      0.000000      0.000000      0.000000      0.000000   
25%      19.000000      0.000000      0.000000      0.000000      0.000000   
50%      27.000000      0.000000      0.000000      0.000000      0.000000   
75%      38.000000     47.000000     76.000000     27.000000     59.000000   
max      79.000000  14327.000000  29813.000000  23492.000000  22408.000000   

             VRDeck  
count   8505.000000  
mean     304.854791  
std     1145.717189  
min        0.000000  
25%        0.000000  
50%        0.000000  
75%       46.000000  
max    24133.000000  


First 10 samples:
  PassengerId HomePlanet CryoSleep  Cabin    Destina

## Explore the dataframes

### Missing values

In [3]:
print("Train missing values:")
print(train_df.isna().sum())

print("\n\nTest missing values:")
print(test_df.isna().sum())

Train missing values:
PassengerId       0
HomePlanet      201
CryoSleep       217
Cabin           199
Destination     182
Age             179
VIP             203
RoomService     181
FoodCourt       183
ShoppingMall    208
Spa             183
VRDeck          188
Name            200
Transported       0
dtype: int64


Test missing values:
PassengerId       0
HomePlanet       87
CryoSleep        93
Cabin           100
Destination      92
Age              91
VIP              93
RoomService      82
FoodCourt       106
ShoppingMall     98
Spa             101
VRDeck           80
Name             94
dtype: int64


### New feature: TotBill = RoomService + FoodCourt + ShoppingMall + Spa + VRDeck

In [4]:
# Create a new column that contains the sum of the bills.
train_df["TotBill"] = train_df["RoomService"] + train_df["FoodCourt"] + train_df["ShoppingMall"] + train_df["Spa"] + train_df["VRDeck"]

In [5]:
# Let's see how the distribution changes if we fill the missing bills with zeros.
zero_fillna = (
    train_df["RoomService"].fillna(0) + 
    train_df["FoodCourt"].fillna(0) + 
    train_df["ShoppingMall"].fillna(0) + 
    train_df["Spa"].fillna(0) + 
    train_df["VRDeck"].fillna(0)
)

In [6]:
# Let's replace the TotBill column with the zero_fillna.

# Train dataframe.
train_df["TotBill"] = zero_fillna

# Test dataframe.
test_df["TotBill"] = (
    test_df["RoomService"].fillna(0) + 
    test_df["FoodCourt"].fillna(0) + 
    test_df["ShoppingMall"].fillna(0) + 
    test_df["Spa"].fillna(0) + 
    test_df["VRDeck"].fillna(0)
)
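
The zero-filled total can also be written as a row-wise sum, since `DataFrame.sum(axis=1)` skips missing values by default. A minimal sketch on toy data (the values are illustrative assumptions):

```python
import numpy as np
import pandas as pd

bill_cols = ["RoomService", "FoodCourt", "ShoppingMall", "Spa", "VRDeck"]

# Toy data with one missing bill (values are illustrative).
toy = pd.DataFrame({
    "RoomService": [10.0, np.nan],
    "FoodCourt": [0.0, 5.0],
    "ShoppingMall": [2.5, 0.0],
    "Spa": [0.0, 0.0],
    "VRDeck": [1.0, 4.0],
})

# sum(axis=1) skips NaN by default, so a missing bill counts as 0.
tot_bill = toy[bill_cols].sum(axis=1)

# Same result as filling each column with 0 and adding them up.
manual = sum(toy[c].fillna(0) for c in bill_cols)

print(tot_bill.tolist())  # [13.5, 9.0]
```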

### RoomService, FoodCourt, ShoppingMall, Spa, VRDeck: manage missing values

In [7]:
# Train dataframe.
for name in ["RoomService", "FoodCourt", "ShoppingMall", "Spa", "VRDeck"]:
    train_df[name] = train_df[name].fillna(0)

# Test dataframe.
for name in ["RoomService", "FoodCourt", "ShoppingMall", "Spa", "VRDeck"]:
    test_df[name] = test_df[name].fillna(0)

### Categorize TotBill feature

In [8]:
def categ_totbill(totbill):
    if totbill <= 500:
        return "Low"
    elif totbill <= 1500:
        return "Mid"
    elif totbill <= 5000:
        return "High"
    else:
        return "Very high"

# Train dataframe.
status_col = train_df["TotBill"].apply(categ_totbill)
print(status_col.value_counts())
train_df["BillCateg"] = status_col

# Test dataframe.
test_df["BillCateg"] = test_df["TotBill"].apply(categ_totbill)

TotBill
Low          3758
Mid          2819
High         1479
Very high     637
Name: count, dtype: int64
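
The same thresholds can be expressed with `pd.cut`, which bins a whole column at once. This is only a sketch of an alternative, checked on toy values placed at the bin edges; note that `pd.cut` returns a categorical column rather than plain strings.

```python
import numpy as np
import pandas as pd

# Toy totals around the bin edges (values are illustrative).
tot = pd.Series([0, 500, 501, 1500, 5000, 5001])

# Right-closed bins: (-inf, 500], (500, 1500], (1500, 5000], (5000, inf)
bill_categ = pd.cut(
    tot,
    bins=[-np.inf, 500, 1500, 5000, np.inf],
    labels=["Low", "Mid", "High", "Very high"],
)

print(bill_categ.tolist())
# ['Low', 'Low', 'Mid', 'Mid', 'High', 'Very high']
```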


### New feature: Group

Extract the Group from the PassengerId column.

In [9]:
def get_group(Id):
    Id = Id.split("_")
    return Id[0]

# Train dataframe.
group_col = train_df["PassengerId"].apply(get_group)
print(group_col.value_counts())
train_df["Group"] = group_col

# Test dataframe.
test_df["Group"] = test_df["PassengerId"].apply(get_group)

PassengerId
4256    8
5133    8
8956    8
8168    8
0984    8
       ..
3469    1
3468    1
3467    1
3465    1
3491    1
Name: count, Length: 6217, dtype: int64


### New feature: Deck

In [10]:
def get_deck(cabin):
    # pd.isna catches every missing-value marker, unlike an identity check against np.nan.
    if pd.isna(cabin):
        return np.nan
    return cabin.split("/")[0]

# Train dataset.
deck_col = train_df["Cabin"].apply(get_deck)
print(deck_col.value_counts())
train_df["Deck"] = deck_col

# Test dataset.
test_df["Deck"] = test_df["Cabin"].apply(get_deck)

Cabin
F    2794
G    2559
E     876
B     779
C     747
D     478
A     256
T       5
Name: count, dtype: int64


In [11]:
# Train dataframe.
train_df.loc[train_df["Deck"] == "T", "Deck"] = "F"
# Test dataframe.
test_df.loc[test_df["Deck"] == "T", "Deck"] = "F"

In [12]:
print("Tot passengers with nan Deck:", train_df["Deck"].isna().sum())
print("Tot passengers with nan Deck that are part of a group of size > 1:", train_df.groupby(["Group"]).filter(lambda x: len(x) > 1)["Deck"].isna().sum())
print()
print("Tot groups with more than one passenger:", len(train_df.groupby(["Group"]).filter(lambda x: len(x) > 1).groupby("Group")))
print("Tot groups in which at least two passengers share a Deck:", len(train_df.groupby(["Group", "Deck"]).filter(lambda x: len(x) > 1).groupby("Group").size()))

Tot passengers with nan Deck: 199
Tot passengers with nan Deck that are part of a group of size > 1: 100

Tot groups with more than one passenger: 1412
Tot groups in which at least two passengers share a Deck: 1107


In [13]:
group_sizes = train_df["Group"].map(train_df["Group"].value_counts())

def fill_deck(group):
    if group["Deck"].isnull().any():
        non_null = group["Deck"].dropna()
        if not non_null.empty:
            group["Deck"] = group["Deck"].fillna(non_null.iloc[0])
    return group

# Train dataframe.
train_df = train_df.groupby("Group", group_keys=False).apply(fill_deck)

# Test dataframe.
test_df = test_df.groupby("Group", group_keys=False).apply(fill_deck)

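
A possible alternative to the `groupby(...).apply(fill_deck)` step is `transform("first")`, which returns the first non-missing value of each group aligned to the original rows. The sketch below uses toy data (illustrative values) and is not the method used in the cell above.

```python
import numpy as np
import pandas as pd

# Toy data: group "0002" has one known and one missing Deck; group "0003" has none.
toy = pd.DataFrame({
    "Group": ["0001", "0002", "0002", "0003"],
    "Deck": ["B", np.nan, "F", np.nan],
})

# First non-missing Deck of each group, broadcast back to every row.
first_deck = toy.groupby("Group")["Deck"].transform("first")

# Only rows with a missing Deck are filled; known values stay untouched.
toy["Deck"] = toy["Deck"].fillna(first_deck)

print(toy["Deck"].tolist())
```

Groups without any known Deck stay missing, so a second fill step is still needed afterwards.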


In [14]:
def fill_deck_2(row):
    if pd.notna(row["Deck"]):
        return row
    elif row["HomePlanet"] == "Europa":
        row["Deck"] = np.random.choice(["B", "A", "C"])
        return row
    elif row["HomePlanet"] == "Earth":
        row["Deck"] = np.random.choice(["F", "G", "E"])
        return row
    elif row["HomePlanet"] == "Mars":
        row["Deck"] = np.random.choice(["F", "D", "E"])
        return row
    else:
        row["Deck"] = np.random.choice(["F", "G"])
        return row

# Train dataframe.
train_df = train_df.apply(fill_deck_2, axis=1)

# Test dataframe (same rules as for the train dataframe).
test_df = test_df.apply(fill_deck_2, axis=1)
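
The cell above draws from `np.random` before any seed has been set (the first `np.random.seed` call appears in a later cell), so the filled decks can differ between runs. A minimal sketch of a repeatable variant follows; the helper `pick_deck` and the seed value are hypothetical and only illustrate the idea.

```python
import numpy as np

# Hypothetical helper: a seeded generator makes the random fill repeatable.
def pick_deck(home_planet, rng):
    options = {
        "Europa": ["B", "A", "C"],
        "Earth": ["F", "G", "E"],
        "Mars": ["F", "D", "E"],
    }
    # Unknown or missing HomePlanet falls back to the two most common decks.
    return rng.choice(options.get(home_planet, ["F", "G"]))

rng_a = np.random.default_rng(12)
rng_b = np.random.default_rng(12)
planets = ["Europa", "Earth", "Mars", None]

draws_a = [pick_deck(p, rng_a) for p in planets]
draws_b = [pick_deck(p, rng_b) for p in planets]

print(draws_a == draws_b)  # True: same seed, same draws
```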

### New feature: CabinNum

In [15]:
def get_cabinnum(cabin):
    if cabin == "FakeCabin":
        return -1
    return int(cabin.split("/")[1])

# Train dataframe.
cabinnum_col = train_df["Cabin"].fillna("FakeCabin").apply(get_cabinnum)
print(cabinnum_col.value_counts())
train_df["CabinNum"] = cabinnum_col

# Test dataframe.
test_df["CabinNum"] = test_df["Cabin"].fillna("FakeCabin").apply(get_cabinnum)

Cabin
-1       199
 82       28
 19       22
 86       22
 176      21
        ... 
 1863      1
 1864      1
 1865      1
 1761      1
 1866      1
Name: count, Length: 1818, dtype: int64


### New feature: CabinSide

In [16]:
def get_cabinside(cabin):
    if cabin == "FakeCabin":
        return np.random.choice(["S", "P"])
    return cabin.split("/")[2]

# Train dataframe.
cabinside_col = train_df["Cabin"].fillna("FakeCabin").apply(get_cabinside)
print(cabinside_col.value_counts())
train_df["CabinSide"] = cabinside_col

# Test dataframe.
test_df["CabinSide"] = test_df["Cabin"].fillna("FakeCabin").apply(get_cabinside)

Cabin
S    4394
P    4299
Name: count, dtype: int64


### HomePlanet, manage missing values

In [17]:
print("Tot passengers with nan HomePlanet that are part of a group of size > 1:", train_df.groupby(["Group"]).filter(lambda x: len(x) > 1)["HomePlanet"].isna().sum())
print("Tot passengers with nan HomePlanet that travel alone:", train_df.groupby(["Group"]).filter(lambda x: len(x) == 1)["HomePlanet"].isna().sum())
print()
print("Tot groups with more than one passenger:", len(train_df.groupby(["Group"]).filter(lambda x: len(x) > 1).groupby("Group")))
print("Tot groups in which at least two passengers share a HomePlanet:", len(train_df.groupby(["Group", "HomePlanet"]).filter(lambda x: len(x) > 1).groupby("Group").size()))

Tot passengers with nan HomePlanet that are part of a group of size > 1: 92
Tot passengers with nan HomePlanet that travel alone: 109

Tot groups with more than one passenger: 1412
Tot groups in which at least two passengers share a HomePlanet: 1370


In [18]:
# For passengers with a missing HomePlanet who travel in a group of more than one,
# copy the HomePlanet of another member of the same group.

group_sizes = train_df["Group"].map(train_df["Group"].value_counts())

def fill_home_planet(group):
    if group["HomePlanet"].isnull().any():
        non_null_planets = group["HomePlanet"].dropna()
        if not non_null_planets.empty:
            group["HomePlanet"] = group["HomePlanet"].fillna(non_null_planets.iloc[0])
    return group

# Train dataframe.
old_homeplanet_col = train_df["HomePlanet"].dropna()
train_df = train_df.groupby("Group", group_keys=False).apply(fill_home_planet)

# Test dataframe.
test_df = test_df.groupby("Group", group_keys=False).apply(fill_home_planet)



In [19]:
# Some HomePlanet values are still missing. Fill them using the Deck-HomePlanet relation.
np.random.seed(12)

def fill_home_planet_2(row):
    if pd.notna(row["HomePlanet"]):
        return row
    elif row["Deck"] in ["A", "B", "C"]:
        row["HomePlanet"] = "Europa"
        return row
    elif row["Deck"] == "G":
        row["HomePlanet"] = "Earth"
        return row
    elif row["Deck"] == "F":
        row["HomePlanet"] = np.random.choice(["Earth", "Mars"])
        return row
    elif row["Deck"] == "D":
        row["HomePlanet"] = np.random.choice(["Europa", "Mars"])
        return row
    elif row["Deck"] == "E":
        row["HomePlanet"] = np.random.choice(["Europa", "Mars", "Earth"])
        return row
    elif row["Deck"] == "FakeDeck":
        row["HomePlanet"] = np.random.choice(["Europa", "Mars", "Earth"])
        return row
    else:
        row["HomePlanet"] = "Earth"
        return row

# Train dataframe.
train_df = train_df.apply(fill_home_planet_2, axis=1)

# Test dataframe (same rules as for the train dataframe).
test_df = test_df.apply(fill_home_planet_2, axis=1)

### Destination, manage missing values

In [20]:
# Train dataframe.
train_df["Destination"] = train_df["Destination"].fillna("TRAPPIST-1e")
# Test dataframe.
test_df["Destination"] = test_df["Destination"].fillna("TRAPPIST-1e")

### VIP, manage missing values

Just fill the missing values with False since there are very few VIPs.

In [21]:
# Train dataframe.
train_df["VIP"] = train_df["VIP"].fillna(False)
# Test dataframe.
test_df["VIP"] = test_df["VIP"].fillna(False)

### Age, manage missing values
At this point, two columns still have missing values: Age and CryoSleep.

In [22]:
# Try with the median.
median_age_col = train_df['Age'].fillna(train_df['Age'].median())

In [23]:
# Train dataframe.
train_df["Age"] = median_age_col
# Test dataframe.
test_df["Age"] = test_df["Age"].fillna(test_df["Age"].median())

### Encode Age feature

In [24]:
def categ_age(age):
    if age <= 12:
        return "Child"
    elif age <= 20:
        return "Teen"
    elif age <= 30:
        return "Young adult"
    elif age <= 55:
        return "Adult"
    else:
        return "Elderly"

# Train dataframe.
new_age_col = train_df["Age"].apply(categ_age)
print(new_age_col.value_counts())
train_df["Age"] = new_age_col

# Test dataframe.
test_df["Age"] = test_df["Age"].apply(categ_age)

Age
Adult          2981
Young adult    2847
Teen           1629
Child           806
Elderly         430
Name: count, dtype: int64


### CryoSleep, manage missing values

Since passengers in CryoSleep are confined to their cabins, they must have TotBill = 0.

In [25]:
# How many passengers in CryoSleep have a TotBill equal to zero?
print("Tot passengers in CryoSleep:", len(train_df[train_df["CryoSleep"] == 1]))
print("Tot passengers in CryoSleep with TotBill = 0:", len(train_df[(train_df["TotBill"] == 0) & (train_df["CryoSleep"] == 1)]))
print("Tot passengers with TotBill = 0:", len(train_df[train_df["TotBill"] == 0]))

Tot passengers in CryoSleep: 3037
Tot passengers in CryoSleep with TotBill = 0: 3037
Tot passengers with TotBill = 0: 3653


$3037 / 3653 \approx 83\%$ of the passengers with TotBill = 0 are in CryoSleep.
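
The ratio can be checked directly from the printed counts, and the same conditional share can be computed on a dataframe by restricting to the zero-bill rows. The toy frame below is an illustrative assumption, not the real data.

```python
import pandas as pd

# Counts printed by the cell above.
cryo_and_zero_bill = 3037
zero_bill = 3653
share = cryo_and_zero_bill / zero_bill
print(f"{share:.1%}")  # 83.1%

# The same kind of ratio on a toy frame (values are illustrative).
toy = pd.DataFrame({
    "CryoSleep": [True, True, False, False, True],
    "TotBill": [0.0, 0.0, 0.0, 120.0, 0.0],
})
zero = toy["TotBill"] == 0

# Share of zero-bill passengers who are in CryoSleep.
toy_share = toy.loc[zero, "CryoSleep"].mean()
print(toy_share)  # 0.75
```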

In [26]:
# Train dataframe.
train_df.loc[train_df["CryoSleep"].isna(), "CryoSleep"] = (train_df["TotBill"] == 0).astype(int)

# Test dataframe.
test_df.loc[test_df["CryoSleep"].isna(), "CryoSleep"] = (test_df["TotBill"] == 0).astype(int)

### Drop some features

In [27]:
train_df = train_df.drop(columns=["PassengerId", "Cabin", "Name", "TotBill", "Group", "CabinNum"])
print(train_df.isna().sum())
print("\n\n")
test_df = test_df.drop(columns=["Cabin", "Name", "TotBill", "Group", "CabinNum"])
print(test_df.isna().sum())

HomePlanet      0
CryoSleep       0
Destination     0
Age             0
VIP             0
RoomService     0
FoodCourt       0
ShoppingMall    0
Spa             0
VRDeck          0
Transported     0
BillCateg       0
Deck            0
CabinSide       0
dtype: int64



PassengerId     0
HomePlanet      0
CryoSleep       0
Destination     0
Age             0
VIP             0
RoomService     0
FoodCourt       0
ShoppingMall    0
Spa             0
VRDeck          0
BillCateg       0
Deck            0
CabinSide       0
dtype: int64


## Encode categorical features

In [28]:
X_train = train_df.dropna().drop(columns="Transported")
X_test = test_df.dropna()
Y_train = train_df.dropna()["Transported"]

print("Tot number of samples in train set:", X_train.shape[0])
print("Tot number of samples in test set:", X_test.shape[0])
features_names = list(X_train.columns)
print("Features names:", features_names)

X_train = X_train.values
X_test = X_test.values
Y_train = Y_train.values

Tot number of samples in train set: 8693
Tot number of samples in test set: 4277
Features names: ['HomePlanet', 'CryoSleep', 'Destination', 'Age', 'VIP', 'RoomService', 'FoodCourt', 'ShoppingMall', 'Spa', 'VRDeck', 'BillCateg', 'Deck', 'CabinSide']


In [29]:
# Print the data type of each column.
for index_col, name_col in zip(range(X_train.shape[1]), features_names):
    print(f"Column {name_col} (index: {index_col}) -- data type: {type(X_train[0, index_col])}")

Column HomePlanet (index: 0) -- data type: <class 'str'>
Column CryoSleep (index: 1) -- data type: <class 'bool'>
Column Destination (index: 2) -- data type: <class 'str'>
Column Age (index: 3) -- data type: <class 'str'>
Column VIP (index: 4) -- data type: <class 'bool'>
Column RoomService (index: 5) -- data type: <class 'float'>
Column FoodCourt (index: 6) -- data type: <class 'float'>
Column ShoppingMall (index: 7) -- data type: <class 'float'>
Column Spa (index: 8) -- data type: <class 'float'>
Column VRDeck (index: 9) -- data type: <class 'float'>
Column BillCateg (index: 10) -- data type: <class 'str'>
Column Deck (index: 11) -- data type: <class 'str'>
Column CabinSide (index: 12) -- data type: <class 'str'>


In [30]:
# Encode the categorical variables.
to_encode = np.array([0, 2, 3, 10, 11, 12])

enc = OrdinalEncoder(dtype=int)

enc.fit(X_train[:, to_encode])
X_train[:, to_encode] = enc.transform(X_train[:, to_encode])
X_test[:, to_encode+1] = enc.transform(X_test[:, to_encode+1])

X_train = X_train.astype(int)
X_test = X_test.astype(int)

# Print the data type of each column.
for index_col, name_col in zip(range(X_train.shape[1]), features_names):
    print(f"Column {name_col} (index: {index_col}) -- data type: {type(X_train[0, index_col])}")

Column HomePlanet (index: 0) -- data type: <class 'numpy.int64'>
Column CryoSleep (index: 1) -- data type: <class 'numpy.int64'>
Column Destination (index: 2) -- data type: <class 'numpy.int64'>
Column Age (index: 3) -- data type: <class 'numpy.int64'>
Column VIP (index: 4) -- data type: <class 'numpy.int64'>
Column RoomService (index: 5) -- data type: <class 'numpy.int64'>
Column FoodCourt (index: 6) -- data type: <class 'numpy.int64'>
Column ShoppingMall (index: 7) -- data type: <class 'numpy.int64'>
Column Spa (index: 8) -- data type: <class 'numpy.int64'>
Column VRDeck (index: 9) -- data type: <class 'numpy.int64'>
Column BillCateg (index: 10) -- data type: <class 'numpy.int64'>
Column Deck (index: 11) -- data type: <class 'numpy.int64'>
Column CabinSide (index: 12) -- data type: <class 'numpy.int64'>


In [31]:
m = X_train.shape[0]
m_train = int(4./5. * m)
m_val = m - m_train

print("Amount of data for training and deciding parameters:", m_train)
print("Amount of data for validation:", m_val)

Amount of data for training and deciding parameters: 6954
Amount of data for validation: 1739


In [32]:
X_train, X_val, Y_train, Y_val = train_test_split(X_train, Y_train, test_size=m_val/m, random_state=12)
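
The split sizes follow from simple arithmetic on the 8693 training rows reported earlier. A short worked check:

```python
m = 8693                      # training rows reported earlier
m_train = int(4.0 / 5.0 * m)  # int() drops the fractional part of 6954.4
m_val = m - m_train

print(m_train, m_val)         # 6954 1739
print(round(m_val / m, 4))    # 0.2
```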

## Standardize data

In [33]:
# Standardize every feature column. The scaler is fitted on the training split only.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_val_scaled = scaler.transform(X_val)

# Column 0 of X_test still holds PassengerId, so only columns 1: are scaled.
# Work on a float copy: writing scaled values into the integer array would truncate them.
X_test_scaled = X_test.astype(float)
X_test_scaled[:, 1:] = scaler.transform(X_test[:, 1:])
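
One pitfall when writing scaled values into a copy of an integer array is that NumPy casts them back to integers without warning. The sketch below demonstrates this on toy arrays (the numbers are illustrative and do not come from a real scaler).

```python
import numpy as np

ints = np.array([[1, 20], [2, 40]])            # integer dtype, like an encoded feature matrix
scaled = np.array([[-0.9, 0.7], [0.9, 1.6]])   # stand-in for a scaler's output

# Writing floats into an integer array silently drops the fractional part.
int_target = np.copy(ints)
int_target[:, :] = scaled

# Casting the copy to float first keeps the scaled values intact.
float_target = ints.astype(float)
float_target[:, :] = scaled

print(int_target.tolist())    # [[0, 0], [0, 1]]
print(float_target.tolist())  # [[-0.9, 0.7], [0.9, 1.6]]
```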

## Neural Network

In [34]:
class NN(nn.Module):

    def __init__(self, layers):
        super(NN, self).__init__()
        torch.manual_seed(12)

        # Define the structure.
        self.layers = nn.ModuleList()
        for i in range(1, len(layers)):
            self.layers.append(nn.Linear(layers[i-1], layers[i]))
        self.layers.append(nn.Linear(layers[-1], 1)) # Last layer
        
        # Activation function.
        self.actfunc = nn.ReLU() 

    def forward(self, x):
        out = x
        for i, layer in enumerate(self.layers):
            out = layer(out)
            if i < len(self.layers) - 1:
                out = self.actfunc(out)
        out = torch.sigmoid(out)
        return out
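
To make the forward pass concrete, here is a NumPy sketch of the same computation for a 13-5-5-1 network: linear layers with ReLU on the hidden layers and a sigmoid on the output. The random weights are placeholders, not trained parameters, and the function name `forward` is an assumption of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [13, 5, 5, 1]  # input, two hidden layers, one output unit

# Random weights and zero biases stand in for trained parameters.
params = [(rng.normal(size=(n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(x, params):
    out = x
    for i, (w, b) in enumerate(params):
        out = out @ w + b
        if i < len(params) - 1:
            out = np.maximum(out, 0.0)   # ReLU on hidden layers only
    return 1.0 / (1.0 + np.exp(-out))    # sigmoid on the output

batch = rng.normal(size=(4, 13))
probs = forward(batch, params)
n_params = sum(w.size + b.size for w, b in params)

print(probs.shape, n_params)  # (4, 1) 106
```

The parameter count is (13·5 + 5) + (5·5 + 5) + (5·1 + 1) = 106.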

In [35]:
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)

        # Compute prediction error
        pred = model(X).squeeze(1)
        loss = loss_fn(pred, y)

        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        if batch % 100 == 0:
            loss, current = loss.item(), (batch + 1) * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")

In [36]:
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X).squeeze(1)
            test_loss += loss_fn(pred, y).item()
            pred = (pred >= 0.5).float()
            correct += (pred == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")

In [37]:
# Set device.
device = torch.accelerator.current_accelerator().type if torch.accelerator.is_available() else "cpu"
print(f"Using {device} device")

# Define the Neural Network
# 2 hidden layers with 5 nodes each.
input_size = len(features_names)
layers = {"layers": [input_size, 5, 5]}
model = NN(**layers).to(device)
print(model)

# Define the loss function.
loss_fn = nn.BCELoss()

# Define the optimizer.
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Train dataset and dataloader.
train_ds = torch.utils.data.TensorDataset(torch.Tensor(X_train_scaled), torch.Tensor(Y_train))
train_loader = torch.utils.data.DataLoader(dataset=train_ds, batch_size=64, shuffle=True)

# Val dataset and dataloader.
val_ds = torch.utils.data.TensorDataset(torch.Tensor(X_val_scaled), torch.Tensor(Y_val))
val_loader = torch.utils.data.DataLoader(dataset=val_ds, batch_size=64, shuffle=True)

Using cuda device
NN(
  (layers): ModuleList(
    (0): Linear(in_features=13, out_features=5, bias=True)
    (1): Linear(in_features=5, out_features=5, bias=True)
    (2): Linear(in_features=5, out_features=1, bias=True)
  )
  (actfunc): ReLU()
)


In [38]:
epochs = 100
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_loader, model, loss_fn, optimizer)
    test(val_loader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 0.699854  [   64/ 6954]
loss: 0.677670  [ 6464/ 6954]
Test Error: 
 Accuracy: 52.0%, Avg loss: 0.687897 

Epoch 2
-------------------------------
loss: 0.687113  [   64/ 6954]
loss: 0.595921  [ 6464/ 6954]
Test Error: 
 Accuracy: 75.0%, Avg loss: 0.601150 

Epoch 3
-------------------------------
loss: 0.622571  [   64/ 6954]
loss: 0.533244  [ 6464/ 6954]
Test Error: 
 Accuracy: 75.7%, Avg loss: 0.515638 

Epoch 4
-------------------------------
loss: 0.509400  [   64/ 6954]
loss: 0.415478  [ 6464/ 6954]
Test Error: 
 Accuracy: 75.7%, Avg loss: 0.481796 

Epoch 5
-------------------------------
loss: 0.481192  [   64/ 6954]
loss: 0.330118  [ 6464/ 6954]
Test Error: 
 Accuracy: 76.0%, Avg loss: 0.465062 

Epoch 6
-------------------------------
loss: 0.467720  [   64/ 6954]
loss: 0.460935  [ 6464/ 6954]
Test Error: 
 Accuracy: 76.0%, Avg loss: 0.465429 

Epoch 7
-------------------------------
loss: 0.523608  [   64/ 6954]
loss: 0.380048  [ 

## Create submission

In [39]:
# Compute predictions (column 0 of X_test_scaled holds PassengerId, so it is skipped).
model.eval()
predictions = [0] * X_test_scaled.shape[0]
X_test_tensor = torch.Tensor(X_test_scaled[:, 1:])
for i, x in enumerate(X_test_tensor):
    with torch.no_grad():
        x = x.to(device)
        pred = model(x)
        pred = (pred >= 0.5).int()
        predictions[i] = bool(pred.to("cpu"))

#print(predictions)

# Save predictions.
output = pd.DataFrame({'PassengerId': test_df["PassengerId"], 'Transported': predictions}) 
output.to_csv('../outputdata/submission.csv', index=False)
print("Your submission was successfully saved!")

Your submission was successfully saved!
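
The thresholding step can also be applied to all predictions at once rather than one row at a time. A minimal sketch, assuming the model outputs are already available as an array of probabilities (the ids and probabilities below are made up):

```python
import numpy as np
import pandas as pd

# Hypothetical model outputs: one probability per test passenger.
passenger_ids = pd.Series(["0013_01", "0018_01", "0019_01"])
probs = np.array([0.91, 0.08, 0.50])

# Threshold all rows at once; >= 0.5 matches the rule used in the loop above.
submission = pd.DataFrame({
    "PassengerId": passenger_ids,
    "Transported": probs >= 0.5,
})

print(submission)
```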
