The Titanic passenger list is a classic machine learning dataset, so it seemed a fitting way to start this self-documentation of AI experiments.

It's a bit morbid (it includes passenger names!), but the Kaggle data science competition platform made it a standard starting point, so we'll start with that.

## Import and explore
First, we download the CSV, import pandas and the other libraries we need, and take a look at the data.

In [26]:
!wget -q https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv

In [27]:
import pandas as pd
import tensorflow as tf
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

data = pd.read_csv('titanic.csv')
data.head()

| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S |


We are only going to use survival, passenger class, sex, age, the number of siblings or spouses on board (`SibSp`) and the number of parents or children on board (`Parch`). We will ignore the name, ticket, fare, cabin number and port of embarkation.

In [28]:
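# .copy() gives an independent DataFrame, so later column assignments don't raise SettingWithCopyWarning
data = data[['Survived', 'Pclass', 'Sex', 'Age', 'SibSp', 'Parch']].copy()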
data

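| | Survived | Pclass | Sex | Age | SibSp | Parch |
|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | 22.0 | 1 | 0 |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 |
| 2 | 1 | 3 | female | 26.0 | 0 | 0 |
| 3 | 1 | 1 | female | 35.0 | 1 | 0 |
| 4 | 0 | 3 | male | 35.0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... |
| 886 | 0 | 2 | male | 27.0 | 0 | 0 |
| 887 | 1 | 1 | female | 19.0 | 0 | 0 |
| 888 | 0 | 3 | female | NaN | 1 | 2 |
| 889 | 1 | 1 | male | 26.0 | 0 | 0 |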
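Row 888 in the output above has no age, which is why the next cell imputes one. Before imputing, it can help to count how many values are missing per column. Here is a minimal sketch of that check; the `toy` frame and its rows are made up for illustration and are not the real passenger data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Titanic frame; these rows are invented for illustration.
toy = pd.DataFrame({
    "Survived": [0, 1, 0, 1],
    "Pclass": [3, 1, 3, 2],
    "Age": [22.0, np.nan, 35.0, np.nan],
})

# isna() marks missing cells; summing the booleans counts them per column.
missing_per_column = toy.isna().sum()
print(missing_per_column)

# median() skips NaN by default, so it can be used directly to fill the gaps.
median_age = toy["Age"].median()
filled_age = toy["Age"].fillna(median_age)
print(filled_age.tolist())
```

On the real data the same two calls would be `data.isna().sum()` and `data['Age'].median()`.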


In [29]:
# Impute missing ages with the median age
data['Age'] = data['Age'].fillna(data['Age'].median())
# Scale age to the [0, 1] range
scaler = MinMaxScaler()
data['Age'] = scaler.fit_transform(data[['Age']])
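For reference, with its default settings `MinMaxScaler` rescales a column linearly so that its smallest value becomes 0 and its largest becomes 1, i.e. `(x - min) / (max - min)`. The sketch below does the same thing by hand with NumPy; `min_max_scale` is a hypothetical helper and the ages are made up.

```python
import numpy as np

def min_max_scale(values):
    """Rescale a 1-D array linearly so its minimum maps to 0 and its maximum to 1."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    # Note: this divides by zero if all values are equal; a sketch, not a full replacement.
    return (values - lo) / (hi - lo)

ages = np.array([2.0, 22.0, 38.0, 80.0])  # made-up ages
scaled = min_max_scale(ages)
print(scaled)
```

The youngest passenger ends up at 0, the oldest at 1, and everyone else in between in proportion to their age.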


In [30]:
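# One-hot encode Sex into Sex_female and Sex_male.
# dtype=float keeps the frame all-numeric; since pandas 2.0 the default dummy dtype is bool.
data = pd.get_dummies(data, columns=['Sex'], dtype=float)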
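To make it clearer what `pd.get_dummies` does to the `Sex` column, here is a small sketch on a made-up three-row frame. Passing `dtype=float` is an optional choice here that gives 0.0/1.0 columns rather than booleans.

```python
import pandas as pd

# Invented rows, just to show the shape of the result.
toy = pd.DataFrame({"Pclass": [3, 1, 3], "Sex": ["male", "female", "female"]})

# Each distinct value of Sex becomes its own indicator column, named Sex_<value>.
encoded = pd.get_dummies(toy, columns=["Sex"], dtype=float)
print(list(encoded.columns))
print(encoded)
```

The original `Sex` column is replaced by `Sex_female` and `Sex_male`, which is how four remaining features plus two indicator columns add up to the six model inputs used later.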

In [31]:
# Split the dataset into train and test sets
train_df, test_df = train_test_split(data, test_size=0.2, random_state=42)

# Further split the training set into train and validation sets
train_df, val_df = train_test_split(train_df, test_size=0.25, random_state=42) # 0.25 x 0.8 = 0.2


In [32]:
X_train = train_df.drop("Survived", axis=1)
Y_train = train_df['Survived']
print(X_train.shape)
print(Y_train.shape)

(534, 6)
(534,)
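The validation and test frames need the same feature/target split before they can be used for evaluation. One way to avoid repeating the two lines above is a small helper; `split_features_target` is a hypothetical name and the `toy_val` frame is made up, so treat this as a sketch rather than part of the original notebook.

```python
import pandas as pd

def split_features_target(df, target="Survived"):
    """Return (features, target) for a frame that contains the target column."""
    return df.drop(columns=[target]), df[target]

# Invented rows standing in for val_df or test_df.
toy_val = pd.DataFrame({
    "Survived": [0, 1, 1],
    "Pclass": [3, 1, 2],
    "Age": [0.27, 0.47, 0.35],
})

X_val, Y_val = split_features_target(toy_val)
print(X_val.shape, Y_val.shape)
```

With the real frames this would be `split_features_target(val_df)` and `split_features_target(test_df)`.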


In [40]:
from keras.layers import Dense
from keras.models import Sequential

# Three hidden layers of 100 ReLU units, then one sigmoid unit that outputs a survival probability
model = Sequential()

model.add(Dense(units=100, input_shape=(6,), activation='relu'))
model.add(Dense(units=100, activation='relu'))
model.add(Dense(units=100, activation='relu'))
model.add(Dense(units=1, activation='sigmoid'))

In [41]:
model.summary()

Model: "sequential_6"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_20 (Dense)            (None, 100)               700       
                                                                 
 dense_21 (Dense)            (None, 100)               10100     
                                                                 
 dense_22 (Dense)            (None, 100)               10100     
                                                                 
 dense_23 (Dense)            (None, 1)                 101       
                                                                 
Total params: 21001 (82.04 KB)
Trainable params: 21001 (82.04 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
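The parameter counts in the summary can be reproduced by hand: a dense layer has one weight per input-unit pair plus one bias per unit. A quick sketch of that arithmetic, where `dense_params` is a hypothetical helper:

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# 6 input features, three hidden layers of 100 units, 1 output unit.
layer_sizes = [6, 100, 100, 100, 1]

per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)
print(per_layer, total)
```

This gives 700, 10100, 10100 and 101, for a total of 21001, matching the summary.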


In [43]:
model.compile(
    loss=tf.keras.losses.binary_crossentropy,  # standard loss for a 0/1 target
    optimizer=tf.keras.optimizers.Adam(),      # Adam with its default learning rate
    metrics=['acc']                            # report accuracy alongside the loss
)
model.fit(X_train, Y_train, verbose=2, epochs=8)

Epoch 1/8
17/17 - 1s - loss: 0.4253 - acc: 0.8071 - 1s/epoch - 60ms/step
Epoch 2/8
17/17 - 0s - loss: 0.4211 - acc: 0.8090 - 66ms/epoch - 4ms/step
Epoch 3/8
17/17 - 0s - loss: 0.4075 - acc: 0.8146 - 59ms/epoch - 3ms/step
Epoch 4/8
17/17 - 0s - loss: 0.4115 - acc: 0.8165 - 54ms/epoch - 3ms/step
Epoch 5/8
17/17 - 0s - loss: 0.4138 - acc: 0.8202 - 69ms/epoch - 4ms/step
Epoch 6/8
17/17 - 0s - loss: 0.4112 - acc: 0.8202 - 53ms/epoch - 3ms/step
Epoch 7/8
17/17 - 0s - loss: 0.4087 - acc: 0.8165 - 51ms/epoch - 3ms/step
Epoch 8/8
17/17 - 0s - loss: 0.4011 - acc: 0.8240 - 56ms/epoch - 3ms/step


<keras.src.callbacks.History at 0x7bc217538e80>
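The two numbers in the training log are the binary cross-entropy loss and the accuracy. As a rough sketch of what they measure, here they are computed by hand with NumPy on made-up labels and probabilities. The function names are hypothetical, and the 0.5 threshold is an assumption that matches my understanding of Keras's default for binary accuracy.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)], with p clipped away from 0 and 1."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.mean(losses))

def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the label."""
    y_pred = (np.asarray(y_prob) > threshold).astype(int)
    return float(np.mean(y_pred == np.asarray(y_true)))

y_true = [1, 0, 1, 0]          # made-up labels
y_prob = [0.9, 0.2, 0.4, 0.6]  # made-up sigmoid outputs

loss = binary_crossentropy(y_true, y_prob)
acc = binary_accuracy(y_true, y_prob)
print(loss, acc)
```

Here two of the four predictions land on the right side of 0.5, so the accuracy is 0.5, while the loss also penalises how confident the wrong predictions were.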