### Deep Optimal Stopping - Implementation with the TensorFlow Estimator API

#### General setting - demonstrated through the dice example

In [1]:
import tensorflow as tf
import numpy as np 
import scipy
tf.__version__

'1.13.1'

#### Creating samples in the format of an $M \times n$ matrix, where 
* $n$, the length of each row, is the number of outcomes (tosses), i.e. the number of *time steps* in a time series;
* $M$, the number of rows, is the sample size. 

#### In the example below, we demonstrate the algorithm in two cases, $n=3$ and $n=5$.

In [2]:
# define sample size for training 
M = 30000

# Two examples: M x 3, and M x 5
dice_3 = np.random.randint(low=1, high=7, size=(M, 3))
dice_5 = np.random.randint(low=1, high=7, size=(M, 5))

In [3]:
# see the first 10 paths from the samples generated above
print("The first 10 samples of the M x 3 case: ")
print(dice_3[:10, :])
print("The first 10 samples of the M x 5 case: ")
print(dice_5[:10, :])

The first 10 samples of the M x 3 case: 
[[4 6 1]
 [2 3 3]
 [5 4 3]
 [6 4 1]
 [3 3 3]
 [2 3 4]
 [3 4 6]
 [2 3 5]
 [1 2 5]
 [6 4 4]]
The first 10 samples of the M x 5 case: 
[[4 1 5 6 5]
 [5 4 2 4 1]
 [4 2 3 1 1]
 [4 5 4 5 2]
 [5 1 6 2 1]
 [2 2 4 4 4]
 [1 6 6 2 1]
 [6 4 4 3 2]
 [6 1 2 2 6]
 [1 3 4 4 4]]


#### TensorFlow estimators require input functions: one for training and one for evaluation. Both are defined below.

In [4]:
# define input function for training with estimator, and use the dice np dataset 
def numpy_train_input_fn(samples): 
    # get the number of time steps and number of samples
    n_timeSteps = np.shape(samples)[-1]
    n_samples = np.shape(samples)[0]
    return tf.estimator.inputs.numpy_input_fn(
        x = dict(zip(np.arange(n_timeSteps), np.reshape(samples.T.astype(float), 
                                                                             [n_timeSteps, n_samples, 1]))),
        
        batch_size = 64, 
        num_epochs = 25, 
        shuffle = True, 
        queue_capacity = 1000
    )

# define input function for evaluation
def numpy_eval_input_fn(samples):
    # get the number of time steps and number of samples
    n_timeSteps = np.shape(samples)[-1]
    n_samples = np.shape(samples)[0]
    return tf.estimator.inputs.numpy_input_fn(
       x = dict(zip(np.arange(n_timeSteps), np.reshape(samples.T.astype(float), 
                                                                             [n_timeSteps, n_samples, 1]))),
        
        num_epochs = 1, 
        shuffle = False
    )
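
To make the feature layout concrete, here is a minimal sketch of the dictionary that both input functions pass to `numpy_input_fn`. The toy `samples` array is made up for illustration, and the keys are plain Python ints here rather than the `np.arange` values used above.

```python
import numpy as np

# Hypothetical toy sample: 4 paths, 3 time steps (values chosen by hand).
samples = np.array([[4, 6, 1],
                    [2, 3, 3],
                    [5, 4, 3],
                    [6, 4, 1]])

n_time_steps = samples.shape[-1]
n_samples = samples.shape[0]

# Same reshape as in the input functions: one (n_samples, 1) column per time step.
columns = np.reshape(samples.T.astype(float), [n_time_steps, n_samples, 1])
features = dict(zip(range(n_time_steps), columns))

# Each key is a time step; each value holds that time step's outcome for every path.
for t, column in features.items():
    print(t, column.shape, column.ravel())
```

Each entry `features[t]` is therefore the $t$-th column of the sample matrix, which is what `my_model_fn` later reads as `features[t]`.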

#### A key feature of the model architecture is that a simple *neural network* sits at each time point. Hence, if there are $n$ time steps, there will be $n-1$ neural nets: no network is needed at the terminal time, where stopping is forced. The networks all share the same simple architecture, with two hidden layers followed by a single output unit. We call such a network a *gridNet*; it is defined next.  

#### The *gridNet* function has the following inputs:
- time point (used to name the reward tensor);
- inputs (the observation that the network consumes; in the dice example it is also the payoff for stopping at this time point);
- nextInputs (the payoff obtained by continuing, needed to define the cost function; it is built from the stopping decisions of the networks at later time points);
- name (used to define the variable scope, which becomes important for training, as the gridNets should be trained separately and sequentially);

#### and outputs: 

- F_theta (the sigmoid output, a soft stopping decision in $(0,1)$, needed for the cost function and training);
- f_theta (the hard 0/1 stopping decision, needed for the calculation of the stopping time);
- gridCost (the cost function to be optimised at the given time point).

#### Note that the optimisers of these nets are defined separately, at the model/graph building stage.

In [5]:
# -------------------------------------------
# Define gridNet that sits on the time points
# -------------------------------------------

def gridNet(time_point, inputs, nextInputs, name):
    one = tf.constant(1, dtype=tf.float64)
    
    with tf.variable_scope(name): #, reuse=tf.AUTO_REUSE):
        
        # architecture 
        first_layer = tf.layers.dense(inputs, 51, activation=tf.nn.relu,
                                      kernel_initializer=tf.random_normal_initializer(mean=0.0, stddev=0.01),
                                      #kernel_initializer=tf.glorot_normal_initializer(),
                                      name="first_layer")
        second_layer = tf.layers.dense(first_layer, 51, activation=tf.nn.relu, 
                                       kernel_initializer=tf.random_normal_initializer(mean=0.0, stddev=0.01),     
                                       #kernel_initializer=tf.glorot_normal_initializer(),
                                       name="second_layer")
        logits = tf.layers.dense(second_layer, 1, activation=None, 
                                         kernel_initializer=tf.random_normal_initializer(mean=0.0, stddev=0.01),
                                        #kernel_initializer=tf.glorot_normal_initializer(),
                                          name="logits") 
        F_theta = tf.nn.sigmoid(logits, name="F_theta")
        f_theta = tf.cast(tf.clip_by_value(tf.sign(logits), 0, 2), dtype=tf.int32, name="f_theta")
        
        # cost 
        gridReward = tf.add(tf.multiply(F_theta, inputs), 
                        tf.multiply((one-F_theta), nextInputs), 
                       name = "reward_"+str(time_point))
     
        gridCost = tf.scalar_mul(-1,tf.reduce_mean(gridReward))
    
        
    return [F_theta, f_theta, gridCost]
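
The relationship between the soft decision, the hard decision and the cost can be illustrated without TensorFlow. The following is a plain-Python sketch, not the notebook's implementation: the function names `soft_and_hard_decision` and `grid_cost` and the toy numbers are assumptions made for illustration.

```python
import math

def soft_and_hard_decision(logit):
    """Return (F, f): the sigmoid stopping probability and the 0/1 stopping decision."""
    F = 1.0 / (1.0 + math.exp(-logit))
    f = 1 if logit > 0 else 0  # same result as clipping sign(logit) to [0, 2] and casting to int
    return F, f

def grid_cost(logits, payoffs_now, payoffs_next):
    """Negative batch mean of F * g_now + (1 - F) * g_next."""
    rewards = []
    for logit, g_now, g_next in zip(logits, payoffs_now, payoffs_next):
        F, _ = soft_and_hard_decision(logit)
        rewards.append(F * g_now + (1.0 - F) * g_next)
    return -sum(rewards) / len(rewards)

# Toy batch: stop confidently on a 6, continue confidently on a 1.
cost = grid_cost(logits=[4.0, -4.0], payoffs_now=[6.0, 1.0], payoffs_next=[2.0, 5.0])
print(cost)
```

Minimising this cost pushes the logit up where stopping pays more than continuing and down otherwise, while the hard decision `f` is what enters the stopping time.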

#### Build the graph, which will serve as a model input for tensorflow estimator 

In [6]:
# ---------------
# Build the model 
# ---------------

def my_model_fn(features, mode, params):  
    
    """Defining the custom architecture"""
    
    N = len(features) # length of the time grid
    
    # creating the input set from the dictionary for practical considerations later
    input_set = tf.concat([features[0], features[1]], 1, name="input_set")    
    for i in range(N-2):
        input_set = tf.concat([input_set, features[2+i]], 1)
    
    # create dictionaries that store, for each time point:
    taus = {}  # stopping times
    nextInputs = {} # that go into the gridNets
    ops={}  # optimizers
    train_ops = {}  # training
    NN = {}  # gridNets 
    
    one = tf.constant(1, dtype=tf.float64)
    
    # define the networks and flows recursively, starting from N-2 and going to 0 
    for t in range(N-2, -1, -1):
        
        if t==N-2:
            nextInputs[t] = features[t+1]
            NN[t] = gridNet(t, features[t], nextInputs[t], 'grid_'+str(t))        
            ops[t] = tf.train.AdamOptimizer(learning_rate=0.001)         
            train_ops[t] = ops[t].minimize(NN[t][2], 
                                var_list=tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='grid_'+str(t)))
    
            taus[t] = (N-2)*NN[t][1] + (N-1)*(1-NN[t][1])
            
        else:
            nextInputs[t] = tf.gather_nd(input_set, 
                       indices=tf.concat([tf.reshape(tf.range(tf.shape(taus[t+1])[0]), shape=tf.shape(taus[t+1])), taus[t+1]], 1),
                       name="g"+str(t))
            NN[t] = gridNet(t, features[t], nextInputs[t], 'grid_'+str(t))
            ops[t] = tf.train.AdamOptimizer(learning_rate=0.001)         
            train_ops[t] = ops[t].minimize(NN[t][2], 
                                    var_list=tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='grid_'+str(t)))
    
            taus[t] = tf.math.reduce_sum([t*NN[t][1]]+
                    [i*NN[i][1]*tf.math.reduce_prod([(1-NN[j][1]) for j in range(t, i)], axis=0) for i in range((t+1),(N-1))]+
                     [(N-1)*tf.math.reduce_prod([(1-NN[k][1]) for k in range(t, (N-1))], axis=0)]           
                                , axis=0) 
        
    # as a final step, pick the right element from the input set according to the stopping time taus[0]
    gg_0 = tf.gather_nd(input_set, 
                       indices=tf.concat([tf.reshape(tf.range(tf.shape(taus[0])[0]), shape=tf.shape(taus[0])), 
                                          taus[0]], 1), 
                       name="gg_0")
    
    # this will give the price -- the quantity we are looking for 
    price = tf.reduce_mean(gg_0, name="price")
    
    # training params
    global_step = tf.train.get_global_step()
    update_global_step = tf.assign(global_step, global_step + 1, name = 'update_global_step')
    
    train_op =tf.group([train_ops[_] for _ in train_ops.keys()])
    cost = tf.math.reduce_sum([NN[kk][2] for kk in train_ops.keys()]) #cost_1st+cost_2nd 
    
    # estimator specs for EVAL 
    if mode == tf.estimator.ModeKeys.EVAL:
        return tf.estimator.EstimatorSpec(
            mode=mode,
            loss=price,
            evaluation_hooks=None)
       
    # estimator specs for PREDICT
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(
                mode=mode,
                predictions={"price": price})
    
    # estimator specs for TRAINING
    # note: the reported loss is the price estimate on the current batch;
    # the quantities actually minimised are the gridCosts grouped in train_op
    if mode == tf.estimator.ModeKeys.TRAIN:
        return tf.estimator.EstimatorSpec(
            mode=mode, 
            loss=price,
            train_op=tf.group(train_op, update_global_step),
            training_hooks=None)
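
The expression for `taus[t]` in the model function encodes "the first time at or after $t$ at which a gridNet says stop, or the terminal time if none does". A small plain-Python sketch of the same formula follows; the helper name `stopping_time` is an assumption for illustration and does not appear in the notebook.

```python
def stopping_time(decisions, t):
    """First time s >= t with decisions[s] == 1, else the terminal index N - 1.

    `decisions` holds the hard 0/1 outputs f_0, ..., f_{N-2};
    stopping is forced at the terminal index N - 1.
    """
    N = len(decisions) + 1
    tau = 0
    still_running = 1  # product of (1 - f_j) for j in [t, s)
    for s in range(t, N - 1):
        tau += s * decisions[s] * still_running
        still_running *= 1 - decisions[s]
    tau += (N - 1) * still_running  # nobody stopped: take the terminal time
    return tau

# Five time steps, four gridNets: stop signals at times 1 and 3.
print(stopping_time([0, 1, 0, 1], t=0))
print(stopping_time([0, 1, 0, 1], t=2))
print(stopping_time([0, 0, 0, 0], t=0))
```

Because exactly one term of the sum is non-zero, the result is always a valid column index, which is why it can be fed directly to `tf.gather_nd` to pick the payoff.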

#### Setting up the estimator

In [7]:
#MODEL_DIR = '/Users/Cellini/Desktop/Quant/DL Udacity/DLexs/Estimator'

In [8]:
nn = tf.estimator.Estimator(model_fn=my_model_fn) #, model_dir=MODEL_DIR)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': '/var/folders/16/6vfzqvh50sv0n670v2qmktsw0000gn/T/tmpuamqjgqs', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x6365f5c18>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


In [9]:
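# training: `steps` is an upper bound; training ends earlier if the input function
# runs out of data (25 epochs in batches of 64), as the checkpoint step in the logs shows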
nn.train(input_fn=numpy_train_input_fn(dice_3), steps=20000)

Instructions for updating:
Colocations handled automatically by placer.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
INFO:tensorflow:Calling model_fn.
Instructions for updating:
Use keras.layers.dense instead.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
INFO:tensorflow:Saving checkpoints for 0 into /var/folders/16/6vfzqvh50sv0n670v2qmktsw0000gn/T/tmpuamqjgqs/model.ckpt.
INFO:tensorflow:loss = 3.34375, step = 1
INFO:tensorflow:global_step/sec: 43.0429
INFO:tensorflow:loss = 3.890625, step = 101 (2.324 sec)
INFO:tensorflow:global_step/sec: 232.612
INFO:tensorflow:loss = 4.5, step = 201 (0.430 sec)
INFO:tensorflow:global_step/sec: 1

INFO:tensorflow:global_step/sec: 214.125
INFO:tensorflow:loss = 4.703125, step = 6501 (0.467 sec)
INFO:tensorflow:global_step/sec: 168.875
INFO:tensorflow:loss = 4.765625, step = 6601 (0.592 sec)
INFO:tensorflow:global_step/sec: 217.376
INFO:tensorflow:loss = 4.796875, step = 6701 (0.460 sec)
INFO:tensorflow:global_step/sec: 221.758
INFO:tensorflow:loss = 4.953125, step = 6801 (0.451 sec)
INFO:tensorflow:global_step/sec: 162.37
INFO:tensorflow:loss = 4.78125, step = 6901 (0.616 sec)
INFO:tensorflow:global_step/sec: 160.999
INFO:tensorflow:loss = 4.78125, step = 7001 (0.621 sec)
INFO:tensorflow:global_step/sec: 223.171
INFO:tensorflow:loss = 4.921875, step = 7101 (0.448 sec)
INFO:tensorflow:global_step/sec: 222.77
INFO:tensorflow:loss = 4.890625, step = 7201 (0.449 sec)
INFO:tensorflow:global_step/sec: 218.547
INFO:tensorflow:loss = 4.40625, step = 7301 (0.458 sec)
INFO:tensorflow:global_step/sec: 173.464
INFO:tensorflow:loss = 4.671875, step = 7401 (0.576 sec)
INFO:tensorflow:global_st

<tensorflow_estimator.python.estimator.estimator.Estimator at 0x6365f5a20>

In [10]:
# evaluation
ev = nn.evaluate(input_fn=numpy_eval_input_fn(dice_3))
print("Price: %s" % ev["loss"])


INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
Instructions for updating:
Use tf.cast instead.
INFO:tensorflow:Starting evaluation at 2019-10-06T17:47:03Z
INFO:tensorflow:Graph was finalized.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from /var/folders/16/6vfzqvh50sv0n670v2qmktsw0000gn/T/tmpuamqjgqs/model.ckpt-11720
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2019-10-06-17:47:05
INFO:tensorflow:Saving dict for global step 11720: global_step = 11720, loss = 4.6749
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 11720: /var/folders/16/6vfzqvh50sv0n670v2qmktsw0000gn/T/tmpuamqjgqs/model.ckpt-11720
Price: 4.6749


#### Result for the $M \times 3$ case
The solver should produce a result around $4.6\dots$, which is close to the analytical solution $28/6 = 14/3 \approx 4.667$. Evaluated on a separate, similarly sized sample, the trained model should give a value reasonably close to the one obtained on the training sample, again $4.6\dots$.
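
The analytical value can be reproduced by backward induction: with one roll left the expected payoff is $7/2$, and with more rolls left one keeps the current face only if it exceeds the value of continuing. The sketch below uses exact fractions; the function name `dice_value` is an assumption for illustration.

```python
from fractions import Fraction

def dice_value(n_rolls):
    """Exact value of the game with up to n_rolls rolls of a fair die,
    where the payoff is the face shown at the roll where we stop."""
    value = Fraction(7, 2)  # one roll left: it must be accepted
    for _ in range(n_rolls - 1):
        # keep the face if it beats the value of continuing, otherwise continue
        value = sum(max(Fraction(face), value) for face in range(1, 7)) / 6
    return value

print(dice_value(3), float(dice_value(3)))
print(dice_value(5), float(dice_value(5)))
```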

In [11]:
# Create a separate sample for evaluation
dice_3_eval = np.random.randint(low=1, high=7, size=(M, 3))

In [12]:
# Use the evaluation sample to get the price
ev_ck = nn.evaluate(input_fn=numpy_eval_input_fn(dice_3_eval))
print("Price: %s" % ev_ck["loss"])


INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2019-10-06T17:47:11Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /var/folders/16/6vfzqvh50sv0n670v2qmktsw0000gn/T/tmpuamqjgqs/model.ckpt-11720
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2019-10-06-17:47:13
INFO:tensorflow:Saving dict for global step 11720: global_step = 11720, loss = 4.657015
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 11720: /var/folders/16/6vfzqvh50sv0n670v2qmktsw0000gn/T/tmpuamqjgqs/model.ckpt-11720
Price: 4.657015


#### Do the same for the $M \times 5$ case

In [13]:
nn5 = tf.estimator.Estimator(model_fn=my_model_fn)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': '/var/folders/16/6vfzqvh50sv0n670v2qmktsw0000gn/T/tmp8ggnrjxu', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x638efea58>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


In [14]:
nn5.train(input_fn=numpy_train_input_fn(dice_5), steps=20000)

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into /var/folders/16/6vfzqvh50sv0n670v2qmktsw0000gn/T/tmp8ggnrjxu/model.ckpt.
INFO:tensorflow:loss = 3.65625, step = 1
INFO:tensorflow:global_step/sec: 50.9263
INFO:tensorflow:loss = 4.0625, step = 101 (1.965 sec)
INFO:tensorflow:global_step/sec: 151.818
INFO:tensorflow:loss = 5.078125, step = 201 (0.659 sec)
INFO:tensorflow:global_step/sec: 154.014
INFO:tensorflow:loss = 5.0, step = 301 (0.649 sec)
INFO:tensorflow:global_step/sec: 106.546
INFO:tensorflow:loss = 5.203125, step = 401 (0.939 sec)
INFO:tensorflow:global_step/sec: 150.045
INFO:tensorflow:loss = 5.09375, step = 501 (0.666 sec)
INFO:tensorflow:global_step/sec: 100.428
INFO:tensorflow:loss = 5.1875, step = 601 (0.996 sec)
INFO:tensorflow:global_step/sec

INFO:tensorflow:global_step/sec: 138.874
INFO:tensorflow:loss = 4.96875, step = 8201 (0.720 sec)
INFO:tensorflow:global_step/sec: 114.969
INFO:tensorflow:loss = 5.125, step = 8301 (0.870 sec)
INFO:tensorflow:global_step/sec: 139.76
INFO:tensorflow:loss = 5.296875, step = 8401 (0.716 sec)
INFO:tensorflow:global_step/sec: 143.452
INFO:tensorflow:loss = 5.03125, step = 8501 (0.697 sec)
INFO:tensorflow:global_step/sec: 113.623
INFO:tensorflow:loss = 5.15625, step = 8601 (0.880 sec)
INFO:tensorflow:global_step/sec: 145.364
INFO:tensorflow:loss = 5.09375, step = 8701 (0.688 sec)
INFO:tensorflow:global_step/sec: 142.362
INFO:tensorflow:loss = 4.96875, step = 8801 (0.703 sec)
INFO:tensorflow:global_step/sec: 116.1
INFO:tensorflow:loss = 5.109375, step = 8901 (0.861 sec)
INFO:tensorflow:global_step/sec: 146.087
INFO:tensorflow:loss = 4.953125, step = 9001 (0.685 sec)
INFO:tensorflow:global_step/sec: 118.895
INFO:tensorflow:loss = 5.15625, step = 9101 (0.841 sec)
INFO:tensorflow:global_step/sec:

<tensorflow_estimator.python.estimator.estimator.Estimator at 0x638fd87b8>

In [15]:
ev5 = nn5.evaluate(input_fn=numpy_eval_input_fn(dice_5))
print("Price: %s" % ev5["loss"])


INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2019-10-06T17:49:09Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /var/folders/16/6vfzqvh50sv0n670v2qmktsw0000gn/T/tmp8ggnrjxu/model.ckpt-11720
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2019-10-06-17:49:12
INFO:tensorflow:Saving dict for global step 11720: global_step = 11720, loss = 5.1303525
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 11720: /var/folders/16/6vfzqvh50sv0n670v2qmktsw0000gn/T/tmp8ggnrjxu/model.ckpt-11720
Price: 5.1303525


#### Result for the $M \times 5$ case
The solver should produce a result around $5.1\dots$, which is close to the analytical solution $277/54 \approx 5.130$. Evaluated on a separate, similarly sized sample, the trained model should give a value reasonably close to the one obtained on the training sample, around $5.12\dots$.
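
To get a feel for how much two samples of this size can differ, one can simulate the exact optimal threshold rule (not the trained networks) and look at the standard error of the resulting price estimate. This is a rough plain-Python sketch; `continuation_values`, `simulate_price` and the seed are assumptions for illustration.

```python
import random

def continuation_values(n_rolls):
    """cont[t] = expected payoff of continuing optimally after roll t (0-based)."""
    cont = [0.0] * n_rolls  # cont[n_rolls - 1] = 0.0 forces a stop at the last roll
    for t in range(n_rolls - 2, -1, -1):
        cont[t] = sum(max(face, cont[t + 1]) for face in range(1, 7)) / 6
    return cont

def simulate_price(n_rolls, n_paths, seed=0):
    """Monte Carlo price under the optimal rule, with its standard error."""
    rng = random.Random(seed)
    cont = continuation_values(n_rolls)
    payoffs = []
    for _ in range(n_paths):
        for t in range(n_rolls):
            roll = rng.randint(1, 6)
            if roll > cont[t]:  # stop as soon as the face beats continuing
                payoffs.append(roll)
                break
    mean = sum(payoffs) / n_paths
    var = sum((p - mean) ** 2 for p in payoffs) / (n_paths - 1)
    return mean, (var / n_paths) ** 0.5

mean, std_err = simulate_price(n_rolls=5, n_paths=30000)
print(mean, std_err)
```

With $M = 30000$ paths the standard error is on the order of $0.006$, so differences in the second decimal place between the training and evaluation samples are within sampling noise.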

In [16]:
# Create a separate sample for evaluation
dice_5_eval = np.random.randint(low=1, high=7, size=(M, 5))

In [17]:
# Use the evaluation sample to get the price
ev5_ck = nn5.evaluate(input_fn=numpy_eval_input_fn(dice_5_eval))
print("Price: %s" % ev5_ck["loss"])

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2019-10-06T17:49:25Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /var/folders/16/6vfzqvh50sv0n670v2qmktsw0000gn/T/tmp8ggnrjxu/model.ckpt-11720
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2019-10-06-17:49:28
INFO:tensorflow:Saving dict for global step 11720: global_step = 11720, loss = 5.1213984
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 11720: /var/folders/16/6vfzqvh50sv0n670v2qmktsw0000gn/T/tmp8ggnrjxu/model.ckpt-11720
Price: 5.1213984
