Overview

Introduction

On April 15, 1912, the Titanic sank after colliding with an iceberg, killing 1,502 of its 2,224 passengers and crew. Although luck played a part in surviving the sinking, some groups of people, such as women, children, and the upper class, were more likely to survive than others. In this tutorial, we carry out an analysis to find out who these people were.

The Dataset

```
VARIABLE DESCRIPTIONS:
survived        Survived
(0 = No; 1 = Yes)
pclass          Passenger Class
(1 = 1st; 2 = 2nd; 3 = 3rd)
name            Name
sex             Sex
age             Age
sibsp           Number of Siblings/Spouses Aboard
parch           Number of Parents/Children Aboard
ticket          Ticket Number
fare            Passenger Fare
```

```
survived  pclass  name                            sex     age   sibsp  parch  ticket    fare
1         1       Aubart, Mme. Leontine Pauline   female  24    0      0      PC 17477  69.3000
0         2       Bowenur, Mr. Solomon            male    42    0      0      211535    13.0000
1         3       Baclini, Miss. Marie Catherine  female  5     2      1      2666      19.2583
0         3       Youseff, Mr. Gerious            male    45.5  0      0      2628      7.2250
```
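The raw file is a plain CSV in the column order shown above, so it can be inspected with Python's standard `csv` module. A minimal sketch, using an inline sample instead of the downloaded file:

```python
import csv
import io

# Inline sample in the same column order as the dataset above:
# survived, pclass, name, sex, age, sibsp, parch, ticket, fare
sample = io.StringIO(
    '1,1,"Aubart, Mme. Leontine Pauline",female,24,0,0,PC 17477,69.3000\n'
    '0,2,"Bowenur, Mr. Solomon",male,42,0,0,211535,13.0000\n'
)
rows = list(csv.reader(sample))
for row in rows:
    survived, pclass, name, sex = row[0], row[1], row[2], row[3]
    print(survived, pclass, name, sex)
```

Note that `name` contains commas and is quoted, which is why a real CSV parser is used rather than `str.split(',')`.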

Build the Classifier

Load the Data

```
import numpy as np
import tflearn

# Download the Titanic dataset
from tflearn.datasets import titanic
titanic.download_dataset('titanic_dataset.csv')

# Load CSV file, indicate that the first column represents labels
from tflearn.data_utils import load_csv
data, labels = load_csv('titanic_dataset.csv', target_column=0,
                        categorical_labels=True, n_classes=2)
```

Preprocess the Data

```
# Preprocessing function
def preprocess(passengers, columns_to_delete):
    # Sort by descending id and delete columns
    for column_to_delete in sorted(columns_to_delete, reverse=True):
        [passenger.pop(column_to_delete) for passenger in passengers]
    for i in range(len(passengers)):
        # Converting 'sex' field to float (id is 1 after removing labels column)
        passengers[i][1] = 1. if passengers[i][1] == 'female' else 0.
    return np.array(passengers, dtype=np.float32)

# Ignore 'name' and 'ticket' columns (id 1 & 6 of data array)
to_ignore = [1, 6]

# Preprocess data
data = preprocess(data, to_ignore)
```
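To see what this transformation does, here are the same steps applied by hand to two hypothetical rows (labels column already stripped, so `name` and `ticket` sit at indices 1 and 6, and `sex` lands at index 1 after the deletions):

```python
import numpy as np

passengers = [
    [3, 'Jack Dawson', 'male', 19, 0, 0, 'N/A', 5.0],
    [1, 'Rose DeWitt Bukater', 'female', 17, 1, 2, 'N/A', 100.0],
]
to_ignore = [1, 6]

# Same steps as preprocess(): drop ignored columns (highest index first,
# so earlier indices stay valid), then encode 'sex' as a float
for column in sorted(to_ignore, reverse=True):
    for p in passengers:
        p.pop(column)
for p in passengers:
    p[1] = 1. if p[1] == 'female' else 0.

arr = np.array(passengers, dtype=np.float32)
print(arr.shape)  # (2, 6): six numeric features per passenger
```

Deleting columns in descending order matters: removing index 1 first would shift the ticket column away from index 6.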

Build a Deep Neural Network

```
# Build neural network
net = tflearn.input_data(shape=[None, 6])
net = tflearn.fully_connected(net, 32)
net = tflearn.fully_connected(net, 32)
net = tflearn.fully_connected(net, 2, activation='softmax')
net = tflearn.regression(net)
```
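The stack above is just two 32-unit hidden layers followed by a 2-unit softmax output. A minimal NumPy sketch of the forward pass (random weights stand in for the learned parameters, so the probabilities are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, n_out):
    # Random stand-in weights; a trained model would use learned values
    w = rng.normal(size=(x.shape[-1], n_out)).astype(np.float32)
    b = np.zeros(n_out, dtype=np.float32)
    return x @ w + b

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

x = rng.normal(size=(1, 6)).astype(np.float32)  # one passenger, 6 features
h = dense(dense(x, 32), 32)                     # two hidden layers
probs = softmax(dense(h, 2))                    # [P(died), P(survived)]
print(probs)
```

The softmax output gives one probability per class, summing to 1, which is what `model.predict` returns later in the tutorial.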

Training

TFLearn provides a model wrapper, DNN, that automates a neural network classifier's tasks, such as training, prediction, and saving/restoring. We will run 10 epochs (the network will see all of the data 10 times) with a batch size of 16.

```
# Define model
model = tflearn.DNN(net)
# Start training (apply gradient descent algorithm)
model.fit(data, labels, n_epoch=10, batch_size=16, show_metric=True)
```
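A quick sanity check on the log below: with 1,309 training samples and a batch size of 16, each epoch takes ceil(1309 / 16) gradient steps, which matches the per-epoch step counts (82, 164, ..., 820):

```python
import math

samples, batch_size, epochs = 1309, 16, 10
steps_per_epoch = math.ceil(samples / batch_size)
print(steps_per_epoch)           # 82 steps per epoch
print(steps_per_epoch * epochs)  # 820, the final training step
```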

```
---------------------------------
Run id: MG9PV8
Log directory: /tmp/tflearn_logs/
---------------------------------
Training samples: 1309
Validation samples: 0
--
Training Step: 82  | total loss: 0.64003
| Adam | epoch: 001 | loss: 0.64003 - acc: 0.6620 -- iter: 1309/1309
--
Training Step: 164  | total loss: 0.61915
| Adam | epoch: 002 | loss: 0.61915 - acc: 0.6614 -- iter: 1309/1309
--
Training Step: 246  | total loss: 0.56067
| Adam | epoch: 003 | loss: 0.56067 - acc: 0.7171 -- iter: 1309/1309
--
Training Step: 328  | total loss: 0.51807
| Adam | epoch: 004 | loss: 0.51807 - acc: 0.7799 -- iter: 1309/1309
--
Training Step: 410  | total loss: 0.47475
| Adam | epoch: 005 | loss: 0.47475 - acc: 0.7962 -- iter: 1309/1309
--
Training Step: 492  | total loss: 0.51677
| Adam | epoch: 006 | loss: 0.51677 - acc: 0.7701 -- iter: 1309/1309
--
Training Step: 574  | total loss: 0.48988
| Adam | epoch: 007 | loss: 0.48988 - acc: 0.7891 -- iter: 1309/1309
--
Training Step: 656  | total loss: 0.55073
| Adam | epoch: 008 | loss: 0.55073 - acc: 0.7427 -- iter: 1309/1309
--
Training Step: 738  | total loss: 0.50242
| Adam | epoch: 009 | loss: 0.50242 - acc: 0.7854 -- iter: 1309/1309
--
Training Step: 820  | total loss: 0.41557
| Adam | epoch: 010 | loss: 0.41557 - acc: 0.8110 -- iter: 1309/1309
--
```

Test the Model

```
# Let's create some data for DiCaprio and Winslet
dicaprio = [3, 'Jack Dawson', 'male', 19, 0, 0, 'N/A', 5.0000]
winslet = [1, 'Rose DeWitt Bukater', 'female', 17, 1, 2, 'N/A', 100.0000]
# Preprocess data
dicaprio, winslet = preprocess([dicaprio, winslet], to_ignore)
# Predict surviving chances (class 1 results)
pred = model.predict([dicaprio, winslet])
print("DiCaprio Surviving Rate:", pred[0][1])
print("Winslet Surviving Rate:", pred[1][1])
```

```
DiCaprio Surviving Rate: 0.13849584758281708
Winslet Surviving Rate: 0.92201167345047
```
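`model.predict` returns one `[P(died), P(survived)]` pair per passenger; the code above prints only the class-1 entry. To turn such probabilities into hard class labels, take the argmax over the class axis. A sketch using the printed values above as example inputs:

```python
import numpy as np

# Probability pairs as returned by predict: [P(class 0), P(class 1)]
pred = np.array([
    [1 - 0.13849584758281708, 0.13849584758281708],  # DiCaprio
    [1 - 0.92201167345047,    0.92201167345047],     # Winslet
])
classes = pred.argmax(axis=1)  # 0 = did not survive, 1 = survived
print(classes)  # [0 1]
```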

Source Code

```
from __future__ import print_function

import numpy as np
import tflearn

# Download the Titanic dataset
from tflearn.datasets import titanic
titanic.download_dataset('titanic_dataset.csv')

# Load CSV file, indicate that the first column represents labels
from tflearn.data_utils import load_csv
data, labels = load_csv('titanic_dataset.csv', target_column=0,
                        categorical_labels=True, n_classes=2)

# Preprocessing function
def preprocess(passengers, columns_to_delete):
    # Sort by descending id and delete columns
    for column_to_delete in sorted(columns_to_delete, reverse=True):
        [passenger.pop(column_to_delete) for passenger in passengers]
    for i in range(len(passengers)):
        # Converting 'sex' field to float (id is 1 after removing labels column)
        passengers[i][1] = 1. if passengers[i][1] == 'female' else 0.
    return np.array(passengers, dtype=np.float32)

# Ignore 'name' and 'ticket' columns (id 1 & 6 of data array)
to_ignore = [1, 6]

# Preprocess data
data = preprocess(data, to_ignore)

# Build neural network
net = tflearn.input_data(shape=[None, 6])
net = tflearn.fully_connected(net, 32)
net = tflearn.fully_connected(net, 32)
net = tflearn.fully_connected(net, 2, activation='softmax')
net = tflearn.regression(net)

# Define model
model = tflearn.DNN(net)
# Start training (apply gradient descent algorithm)
model.fit(data, labels, n_epoch=10, batch_size=16, show_metric=True)

# Let's create some data for DiCaprio and Winslet
dicaprio = [3, 'Jack Dawson', 'male', 19, 0, 0, 'N/A', 5.0000]
winslet = [1, 'Rose DeWitt Bukater', 'female', 17, 1, 2, 'N/A', 100.0000]
# Preprocess data
dicaprio, winslet = preprocess([dicaprio, winslet], to_ignore)
# Predict surviving chances (class 1 results)
pred = model.predict([dicaprio, winslet])
print("DiCaprio Surviving Rate:", pred[0][1])
print("Winslet Surviving Rate:", pred[1][1])
```