# Table of Contents
### 1. Import modules
### 2. Load the Titanic data
### 3. Define the Estimator's input feature_columns
  - `tf.feature_column.indicator_column`
  - `tf.feature_column.categorical_column_with_vocabulary_list`
  - `tf.feature_column.numeric_column`

### 4. Build the input pipeline with tf.data.Dataset
  - `tf.data.Dataset.from_tensor_slices`
  - `shuffle`
  - `repeat`
  - `batch`

### 5. The estimator.LinearClassifier model
  - Defining estimator.LinearClassifier
  - Training estimator.LinearClassifier
  - Evaluating estimator.LinearClassifier

### 6. The estimator.DNNClassifier model
  - Defining estimator.DNNClassifier
  - Training estimator.DNNClassifier
  - Evaluating estimator.DNNClassifier

## 1. Import modules

In [1]:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import sklearn

from tensorflow import keras
import tensorflow as tf
import sys
import os
import time
import datetime

for module in [np, pd, mpl, sklearn, keras, tf]:
    print(module.__name__, module.__version__)

numpy 1.18.1
pandas 0.25.3
matplotlib 3.1.2
sklearn 0.22.1
tensorflow_core.python.keras.api._v2.keras 2.2.4-tf
tensorflow 2.1.0


## 2. Load the Titanic data

In [2]:
train_file = "./data/titanic/train.csv"
eval_file = "./data/titanic/eval.csv"

train_df = pd.read_csv(train_file)
eval_df = pd.read_csv(eval_file)

# y_train and y_eval are pd.Series objects
y_train = train_df.pop("survived")  # removes the "survived" column from train_df and returns it as y_train
y_eval = eval_df.pop("survived")

x_train = train_df.copy()
x_eval = eval_df.copy()

x_train.head()

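|   | sex | age | n_siblings_spouses | parch | fare | class | deck | embark_town | alone |
|---|-----|-----|--------------------|-------|------|-------|------|-------------|-------|
| 0 | male | 22.0 | 1 | 0 | 7.25 | Third | unknown | Southampton | n |
| 1 | female | 38.0 | 1 | 0 | 71.2833 | First | C | Cherbourg | n |
| 2 | female | 26.0 | 0 | 0 | 7.925 | Third | unknown | Southampton | y |
| 3 | female | 35.0 | 1 | 0 | 53.1 | First | C | Southampton | n |
| 4 | male | 28.0 | 0 | 0 | 8.4583 | Third | unknown | Queenstown | y |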


## 3. Define the Estimator's input feature_columns

In [3]:
numerical_columns = ["age", "fare"]
categorical_columns = list(set(x_train.columns.tolist()).difference(set(numerical_columns)))

feature_columns = []

# Inputs for the categorical (discrete) features
for categorical_column in categorical_columns:
    vocab = x_train[categorical_column].unique()
    
    print(categorical_column, "---> ",vocab)
    feature_columns.append(
        tf.feature_column.indicator_column(
            tf.feature_column.categorical_column_with_vocabulary_list(categorical_column, vocab)  # column name -> list of its categories
        )
    )

# Inputs for the continuous (numeric) features
for numerical_column in numerical_columns:
    feature_columns.append(
        tf.feature_column.numeric_column(
            numerical_column, dtype=tf.float32   # column name -> a single continuous value (default shape=(1,))
        )
    )

embark_town --->  ['Southampton' 'Cherbourg' 'Queenstown' 'unknown']
class --->  ['Third' 'First' 'Second']
n_siblings_spouses --->  [1 0 3 4 2 5 8]
deck --->  ['unknown' 'C' 'G' 'A' 'B' 'D' 'F' 'E']
parch --->  [0 1 2 5 3 4]
alone --->  ['n' 'y']
sex --->  ['male' 'female']
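As a rough illustration of what `indicator_column(categorical_column_with_vocabulary_list(...))` and `numeric_column` hand to the model, the plain-Python sketch below one-hot encodes each categorical value against its vocabulary and appends the numeric values unchanged. It does not call TensorFlow: the helpers `one_hot` and `encode_row` are hypothetical, the column order is simply the order of the dict and list (TensorFlow may order columns differently internally), and mapping an unseen value to an all-zero vector is an assumption meant to mirror having no out-of-vocabulary buckets.

```python
def one_hot(value, vocab):
    """One-hot list for value; a value not in vocab gives all zeros (assumption)."""
    return [1.0 if value == v else 0.0 for v in vocab]


def encode_row(row, vocabs, numerical_columns):
    """Concatenate one-hot categorical features with the raw numeric features."""
    vector = []
    for name, vocab in vocabs.items():
        vector.extend(one_hot(row[name], vocab))
    for name in numerical_columns:
        vector.append(float(row[name]))
    return vector


# Vocabularies copied from the printed output above (only a subset of the columns)
vocabs = {
    "sex": ["male", "female"],
    "class": ["Third", "First", "Second"],
    "alone": ["n", "y"],
}
numerical_columns = ["age", "fare"]

# First row of x_train, restricted to the columns used here
first_row = {"sex": "male", "class": "Third", "alone": "n", "age": 22.0, "fare": 7.25}
print(encode_row(first_row, vocabs, numerical_columns))
# [1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 22.0, 7.25]
```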


## 4. Build the input pipeline with tf.data.Dataset

In [4]:
def make_dataset(train_df, label_df, shuffle=False, epochs=10, batch_size=32):
    '''
    train_df: the features, a pandas DataFrame
    label_df: the labels, a pandas Series
    '''
    # Yields (features, labels) pairs; features must be a dict so its keys match the feature_columns names
    dataset = tf.data.Dataset.from_tensor_slices((dict(train_df), label_df))
    if shuffle:
        dataset = dataset.shuffle(10000)
    dataset = dataset.repeat(epochs).batch(batch_size)
    return dataset
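The order of the calls in `make_dataset` matters. The plain-Python sketch below mimics `shuffle(buffer_size).repeat(epochs).batch(batch_size)`, assuming tf.data's buffer-based shuffle (fill a buffer of `buffer_size` elements, emit a random one, refill from the input). Because `shuffle` comes before `repeat`, every epoch gets a fresh order and epochs do not mix; because `repeat` comes before `batch`, a batch can span two epochs, so only the very last batch can be smaller than `batch_size`. The function names are hypothetical and nothing here calls TensorFlow.

```python
import random


def buffered_shuffle(items, buffer_size, rng):
    """Yield items in random order using a fixed-size buffer."""
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)
            continue
        i = rng.randrange(buffer_size)  # emit a random buffered element...
        yield buffer[i]
        buffer[i] = item                # ...and put the new element in its slot
    rng.shuffle(buffer)                 # drain what is left in random order
    yield from buffer


def make_batches(items, shuffle=False, epochs=10, batch_size=32, buffer_size=10000, seed=0):
    """Mimic dataset.shuffle(buffer_size).repeat(epochs).batch(batch_size)."""
    rng = random.Random(seed)

    def repeated():
        for _ in range(epochs):
            # shuffle() precedes repeat(), so each epoch is shuffled independently
            if shuffle:
                yield from buffered_shuffle(items, buffer_size, rng)
            else:
                yield from items

    batch = []
    for item in repeated():
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # the final partial batch is kept (like drop_remainder=False)
        yield batch


batches = list(make_batches(list(range(10)), shuffle=True, epochs=3, batch_size=4))
print([len(b) for b in batches])  # [4, 4, 4, 4, 4, 4, 4, 2]
```

Without shuffling, the third batch is `[8, 9, 0, 1]`: it starts at the end of the first epoch and continues into the second.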

## 5. The estimator.LinearClassifier model

### 5.1. Define the estimator.LinearClassifier model

In [5]:
linear_output_dir = "linear_estimator_model"
if not os.path.exists(linear_output_dir):
    os.makedirs(linear_output_dir)
    
linear_estimator = tf.estimator.LinearClassifier(
    model_dir=linear_output_dir,
    feature_columns=feature_columns,
    n_classes=2
)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': 'linear_estimator_model', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
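For `n_classes=2`, a linear classifier produces one logit per example (a weighted sum of the input features plus a bias), turns it into a probability with the sigmoid, and is trained on the log loss (sigmoid cross-entropy). The plain-Python sketch below shows only that forward computation for one encoded example; the weights and bias are made-up placeholder values, not anything the notebook's model learned, and the Estimator's optimizer is left out.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, weights, bias):
    """P(survived = 1 | x) for a logistic (linear) model."""
    logit = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(logit)


def log_loss(p, label, eps=1e-12):
    """Binary cross-entropy for a single example."""
    p = min(max(p, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


# Hypothetical weights for a 3-feature input, purely for illustration
x = [1.0, 0.0, 22.0]
weights = [-1.0, 1.5, -0.01]
bias = 0.5
p = predict_proba(x, weights, bias)  # logit = -1.0 + 0.0 - 0.22 + 0.5 = -0.72
print(p, log_loss(p, label=0))
```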


### 5.2. Train estimator.LinearClassifier

In [6]:
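# input_fn:
# 1. must be a callable (here a zero-argument lambda);
# 2. it builds and returns the input pipeline, here a tf.data.Dataset;
# 3. the dataset yields (features, labels) pairs, and features must be a dict so its keys match the feature_columns names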
linear_estimator.train(input_fn = lambda: make_dataset(x_train, y_train, shuffle=True, epochs=100, batch_size=32))

Instructions for updating:
If using Keras pass *_constraint arguments to layers.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
INFO:tensorflow:Calling model_fn.


To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.

Instructions for updating:
Please use `layer.add_weight` method instead.
Instructions for updating:
The old _FeatureColumn APIs are being deprecated. Please use the new FeatureColumn APIs instead.
Instructions for updating:
The old _FeatureColumn APIs are being deprecated. Please use the new FeatureColumn APIs instead.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constr

<tensorflow_estimator.python.estimator.canned.linear.LinearClassifierV2 at 0x7f7054009550>
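Because `input_fn` repeats the data `epochs=100` times and then batches it, one `train` call runs ceil(N * epochs / batch_size) steps, where N is the number of training rows. The logs are consistent with N = 627 (an inference from the step counts, not something the notebook prints): one call gives 1960 steps, and the evaluation below restores `model.ckpt-3920`, which suggests `train` was run twice. An Estimator resumes from the latest checkpoint in `model_dir`, which is also what the DNN log shows when it restores `model.ckpt-1960` before continuing from step 1960. A quick check of the arithmetic:

```python
import math


def steps_per_train_call(num_rows, epochs, batch_size):
    """Number of batches produced by dataset.repeat(epochs).batch(batch_size)."""
    return math.ceil(num_rows * epochs / batch_size)


steps = steps_per_train_call(627, epochs=100, batch_size=32)  # 627 rows is an inferred assumption
print(steps)      # 1960
print(2 * steps)  # 3920
```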

### 5.3. Evaluate estimator.LinearClassifier

In [7]:
linear_estimator.evaluate(input_fn=lambda: make_dataset(x_eval, y_eval, shuffle=False, epochs=1, batch_size=32))

INFO:tensorflow:Calling model_fn.


To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.

INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2020-01-22T23:17:03Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from linear_estimator_model/model.ckpt-3920
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Inference Time : 0.67623s
INFO:tensorflow:Finished evaluation at 2020-01-22-23:17:03
INFO:tensorflow:Saving dict for global step 3920: accuracy = 0.7878788, accuracy_baseline = 0.625, auc = 0.83719015, auc_precision_recall = 0.7856679, average_loss = 0.48789534, global_step = 3920, label/mean = 0.375, loss = 0.46902195, precision = 0.6902655, pred

{'accuracy': 0.7878788,
 'accuracy_baseline': 0.625,
 'auc': 0.83719015,
 'auc_precision_recall': 0.7856679,
 'average_loss': 0.48789534,
 'label/mean': 0.375,
 'loss': 0.46902195,
 'precision': 0.6902655,
 'prediction/mean': 0.4391188,
 'recall': 0.7878788,
 'global_step': 3920}
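Most of these metrics can be reproduced by hand from the labels and thresholded predictions. `label/mean` is the fraction of positive labels, and `accuracy_baseline` is the accuracy of always predicting the majority class, max(label_mean, 1 - label_mean), which gives 0.625 for the reported label mean of 0.375. The plain-Python sketch below assumes a 0.5 decision threshold on the predicted probability; the labels and probabilities are made-up toy data, not the notebook's evaluation set.

```python
def binary_metrics(labels, probs, threshold=0.5):
    """Accuracy, baseline, precision, recall and means for a binary classifier."""
    preds = [1 if p > threshold else 0 for p in probs]
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, y_hat in pairs if y == 1 and y_hat == 1)
    fp = sum(1 for y, y_hat in pairs if y == 0 and y_hat == 1)
    fn = sum(1 for y, y_hat in pairs if y == 1 and y_hat == 0)
    correct = sum(1 for y, y_hat in pairs if y == y_hat)
    label_mean = sum(labels) / len(labels)
    return {
        "accuracy": correct / len(labels),
        "accuracy_baseline": max(label_mean, 1 - label_mean),
        "label/mean": label_mean,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "prediction/mean": sum(probs) / len(probs),
    }


# Made-up toy data (not the notebook's evaluation set)
labels = [1, 0, 0, 1, 0, 0, 1, 0]
probs = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3, 0.8, 0.45]
print(binary_metrics(labels, probs))
```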

## 6. The estimator.DNNClassifier model

### 6.1. Define the estimator.DNNClassifier model

In [8]:
dnn_output_dir = "dnn_estimator_model"
if not os.path.exists(dnn_output_dir):
    os.makedirs(dnn_output_dir)
    
dnn_estimator = tf.estimator.DNNClassifier(
    model_dir=dnn_output_dir,
    feature_columns=feature_columns,
    n_classes=2,
    hidden_units=[100, 100],
    activation_fn=tf.nn.relu,
    optimizer="Adam"
)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': 'dnn_estimator_model', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
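With `hidden_units=[100, 100]` and `activation_fn=tf.nn.relu`, the DNN classifier stacks two fully connected ReLU layers of 100 units on the same input vector, followed by a single output logit for the two-class case. The plain-Python sketch below is a forward pass only: it uses smaller layers so it runs quickly, a hypothetical uniform random initialisation (not TensorFlow's), and no training, so the Adam optimizer is not shown. The helpers `build_dnn` and `dnn_predict_proba` are illustrative names.

```python
import math
import random


def relu(z):
    return max(0.0, z)


def dense(x, weights, biases, activation=None):
    """Fully connected layer: out[j] = act(sum_i x[i] * W[i][j] + b[j])."""
    out = []
    for j in range(len(biases)):
        z = sum(x[i] * weights[i][j] for i in range(len(x))) + biases[j]
        out.append(activation(z) if activation else z)
    return out


def build_dnn(n_inputs, hidden_units, seed=0):
    """Random weights for the hidden layers plus a 1-unit output layer (hypothetical init)."""
    rng = random.Random(seed)
    layers, n_in = [], n_inputs
    for units in list(hidden_units) + [1]:  # last layer: one logit for a binary classifier
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(units)] for _ in range(n_in)]
        layers.append((weights, [0.0] * units))
        n_in = units
    return layers


def dnn_predict_proba(x, layers):
    h = x
    for k, (weights, biases) in enumerate(layers):
        is_output = k == len(layers) - 1
        h = dense(h, weights, biases, None if is_output else relu)  # no ReLU on the logit
    return 1.0 / (1.0 + math.exp(-h[0]))


x = [1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 22.0, 7.25]  # an encoded example, as in section 3
layers = build_dnn(len(x), hidden_units=[8, 8])
print(dnn_predict_proba(x, layers))
```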


### 6.2. Train estimator.DNNClassifier

In [9]:
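# input_fn:
# 1. must be a callable (here a zero-argument lambda);
# 2. it builds and returns the input pipeline, here a tf.data.Dataset;
# 3. the dataset yields (features, labels) pairs, and features must be a dict so its keys match the feature_columns names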
dnn_estimator.train(input_fn = lambda: make_dataset(x_train, y_train, shuffle=True, epochs=100, batch_size=32))

INFO:tensorflow:Calling model_fn.


To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.

INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from dnn_estimator_model/model.ckpt-1960
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 1960 into dnn_estimator_model/model.ckpt.
INFO:tensorflow:loss = 0.45359203, step = 1960
INFO:tensorflow:global_step/sec: 247.76
INFO:tensorflow:loss = 0.29971343, step = 2060 (0.405 sec)
INFO:tensorflow:global_step/sec: 377.873
INFO:tensorflow:loss = 0.3535098, step = 2160 (0.264 sec)
INFO:tensorflow:global_step/sec: 379.621
INFO:tensorflow:l

<tensorflow_estimator.python.estimator.canned.dnn.DNNClassifierV2 at 0x7f6ed43b7ac8>

### 6.3. Evaluate estimator.DNNClassifier

In [10]:
dnn_estimator.evaluate(input_fn=lambda: make_dataset(x_eval, y_eval, shuffle=False, epochs=1, batch_size=32))

INFO:tensorflow:Calling model_fn.


To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.

INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2020-01-22T23:17:11Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from dnn_estimator_model/model.ckpt-3920
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Inference Time : 0.61394s
INFO:tensorflow:Finished evaluation at 2020-01-22-23:17:12
INFO:tensorflow:Saving dict for global step 3920: accuracy = 0.81060606, accuracy_baseline = 0.625, auc = 0.83535355, auc_precision_recall = 0.7717025, average_loss = 0.5599526, global_step = 3920, label/mean = 0.375, loss = 0.53544617, precision = 0.7425743, predict

{'accuracy': 0.81060606,
 'accuracy_baseline': 0.625,
 'auc': 0.83535355,
 'auc_precision_recall': 0.7717025,
 'average_loss': 0.5599526,
 'label/mean': 0.375,
 'loss': 0.53544617,
 'precision': 0.7425743,
 'prediction/mean': 0.39552742,
 'recall': 0.75757575,
 'global_step': 3920}