### Deep Optimal Stopping - Implementation with the TensorFlow Estimator API

#### General setting - demonstrated through the dice example
The model implemented below is based on the paper "Deep Optimal Stopping" by Sebastian Becker, Patrick Cheridito & Arnulf Jentzen, see [link](https://arxiv.org/pdf/1804.05394.pdf).

In [1]:
import tensorflow as tf
import numpy as np 
import scipy
import time
tf.__version__

'1.14.0'

#### TensorFlow estimators require input functions: one for training and one for evaluation. We define these input functions below.

In [2]:
# define input function for training with estimator, and use the dice np dataset 
def numpy_train_input_fn(samples): 
    # get the number of time steps and number of samples
    n_timeSteps = np.shape(samples)[-1]
    n_samples = np.shape(samples)[0]
    return tf.estimator.inputs.numpy_input_fn(
        x = dict(zip(np.arange(n_timeSteps), np.reshape(samples.T.astype(float), 
                                                                             [n_timeSteps, n_samples, 1]))),
        
        batch_size = 64, 
        num_epochs = 30, 
        shuffle = True, 
        queue_capacity = 1000
    )

# define input function for evaluation
def numpy_eval_input_fn(samples):
    # get the number of time steps and number of samples
    n_timeSteps = np.shape(samples)[-1]
    n_samples = np.shape(samples)[0]
    return tf.estimator.inputs.numpy_input_fn(
       x = dict(zip(np.arange(n_timeSteps), np.reshape(samples.T.astype(float), 
                                                                             [n_timeSteps, n_samples, 1]))),
        
        num_epochs = 1, 
        shuffle = False
    )
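
The reshaping inside both input functions can be checked on a toy array. The sketch below uses NumPy only (no TensorFlow) and a made-up 4 x 3 sample matrix; it illustrates that the feature dictionary is keyed by the time index and that each value is a column vector of shape `(n_samples, 1)` holding one time step for all paths.

```python
import numpy as np

# Toy sample matrix (an assumption for illustration): 4 paths (rows), 3 time steps (columns).
samples = np.array([[1, 2, 3],
                    [4, 5, 6],
                    [6, 5, 4],
                    [3, 2, 1]])

n_samples, n_timeSteps = samples.shape

# Same transformation as in the input functions: one column vector per time step.
per_step = np.reshape(samples.T.astype(float), [n_timeSteps, n_samples, 1])
features = dict(zip(np.arange(n_timeSteps), per_step))

for t in range(n_timeSteps):
    # features[t] has shape (n_samples, 1) and equals column t of `samples`
    print(t, features[t].shape, features[t].ravel())
```

This is why `my_model_fn` can address the inputs of the network at time point `t` simply as `features[t]`.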

#### A key feature of the model architecture is that a simple *neural network* sits at each time point. Hence, if there are $n$ time steps, there will be $n-1$ neural nets; $n-1$, because no neural net is needed at the terminal time. These networks are relatively simple, with two hidden layers, and all share the same architecture. We call such a network a *gridNet*; it is defined next.

#### The *gridNet* function has the following inputs:
- time_point (the index of the time point at which the network sits);
- inputs (the values that the network consumes at that time point);
- nextInputs (needed to define the cost function; it is built from the outputs of the networks at later time points, which are defined first);
- name (used to define the variable scope, which becomes important at training level, as the gridNets should be trained separately and sequentially);

#### and outputs: 

- F_theta (needed for the cost function and training);
- f_theta (needed for the calculation of stopping time);
- gridCost (this is the cost function that needs to be optimised at a given time point).

#### Note that the optimisers of these nets are defined separately, at the model/graph building stage.

In [3]:
# define the parameters of the individual networks
# need: - the standard deviation of the initializers 
#       - learning rate of the individual optimizers; the learning rate will be given at a later stage

stddev = 0.0005
#alpha = 0.0008

#### Define the payoff function $g(\cdot)$ for the general case

In [4]:
# define the payOff function func_g as a tensorflow function 
# for the dice example this function is the identity 
testId = lambda x: x 

def func_g(tens):
    return tf.map_fn(testId, tens)

In [5]:
# -------------------------------------------
# Define gridNet that sits on the time points
# -------------------------------------------

def gridNet(time_point, inputs, nextInputs, name):
    one = tf.constant(1, dtype=tf.float64)
    
    with tf.variable_scope(name, reuse=tf.AUTO_REUSE):
        
        # architecture 
        first_layer = tf.layers.dense(inputs, 51, activation=tf.nn.relu,
                                            #kernel_initializer=tf.glorot_normal_initializer(),
                                            #kernel_initializer=tf.keras.initializers.he_normal(), 
                                            kernel_initializer=tf.random_normal_initializer(mean=0.0, stddev=stddev),
                                            name="first_layer")
        second_layer = tf.layers.dense(first_layer, 51, activation=tf.nn.relu, 
                                            #kernel_initializer=tf.glorot_normal_initializer(),
                                            #kernel_initializer=tf.keras.initializers.he_normal(), 
                                            kernel_initializer=tf.random_normal_initializer(mean=0.0, stddev=stddev),
                                            name="second_layer")
        logits = tf.layers.dense(second_layer, 1, activation=None, 
                                          #kernel_initializer=tf.glorot_normal_initializer(),
                                          #kernel_initializer=tf.keras.initializers.he_normal(),
                                          kernel_initializer=tf.random_normal_initializer(mean=0.0, stddev=stddev),
                                          name="logits") 
        F_theta = tf.nn.sigmoid(logits, name="F_theta")
    
        # cost
        
        # ----------------------------------------------------------
        # uncomment the lines below for the case with generic payoff
        # ----------------------------------------------------------
        #gridReward = tf.add(tf.multiply(F_theta, func_g(inputs)), 
        #                tf.multiply((one-F_theta), func_g(nextInputs)), 
        #               name = "reward_"+str(time_point))
        
        gridReward = tf.add(tf.multiply(F_theta, inputs), 
                        tf.multiply((one-F_theta), nextInputs), 
                       name = "reward_"+str(time_point))
     
        
        gridCost = tf.scalar_mul(-1,tf.reduce_mean(gridReward))
        
    f_theta = tf.cast(tf.clip_by_value(tf.sign(logits), 0, 2), dtype=tf.int32, name="f_theta")
          
    return [F_theta, f_theta, gridCost]
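
To make the three outputs of `gridNet` concrete, here is a minimal NumPy sketch that mirrors them for given logits. It is not the TensorFlow graph itself: the function name `grid_outputs` and the numbers are assumptions made for illustration, and the dense layers are replaced by hand-picked logits.

```python
import numpy as np

def grid_outputs(logits, inputs, next_inputs):
    """NumPy mirror of the quantities returned by gridNet, for given logits."""
    F_theta = 1.0 / (1.0 + np.exp(-logits))      # soft stopping probability in (0, 1)
    f_theta = (logits > 0).astype(np.int32)      # hard decision: 1 = stop, 0 = continue
    # soft reward: stop with weight F_theta, continue with weight 1 - F_theta
    reward = F_theta * inputs + (1.0 - F_theta) * next_inputs
    grid_cost = -np.mean(reward)                 # minimising this maximises the mean reward
    return F_theta, f_theta, grid_cost

# Hypothetical logits and payoffs for three paths.
logits = np.array([[-4.0], [0.0], [4.0]])
inputs = np.array([[2.0], [4.0], [6.0]])
next_inputs = np.array([[5.0], [3.0], [1.0]])

F, f, cost = grid_outputs(logits, inputs, next_inputs)
print(F.ravel(), f.ravel(), cost)
```

The hard decision uses `logits > 0`, which agrees with `clip_by_value(sign(logits), 0, 2)` in the cell above: negative and zero logits map to 0, positive logits map to 1.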

#### Build the graph, which serves as the model function (`model_fn`) passed to the TensorFlow estimator

In [6]:
# ---------------
# Build the model 
# ---------------

def my_model_fn(features, mode, params):  
    
    """Defining the custom architecture"""
    
    N = len(features) # length of the time grid
    
    # creating the input set from the dictionary for practical considerations later
    input_set = tf.concat([features[0], features[1]], 1, name="input_set")    
    for i in range(N-2):
        input_set = tf.concat([input_set, features[2+i]], 1)
    
    # create dictionaries that, for each time point, store 
    taus = {}  # stopping times
    nextInputs = {} # that go into the gridNets
    ops={}  # optimizers
    train_ops = {}  # training
    NN = {}  # gridNets 
    
    one = tf.constant(1, dtype=tf.float64)
    
    # define the networks and flows recursively, starting from N-2 and going to 0 
    for t in range(N-2, -1, -1):
        
        if t==N-2:
            nextInputs[t] = features[t+1]
            NN[t] = gridNet(t, features[t], nextInputs[t], 'grid_'+str(t))        
            ops[t] = tf.train.AdamOptimizer(learning_rate=params['alpha'][t])         
            train_ops[t] = ops[t].minimize(NN[t][2], 
                                var_list=tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='grid_'+str(t))) 
                                          #global_step=tf.train.get_global_step())
    
            taus[t] = (N-2)*NN[t][1] + (N-1)*(1-NN[t][1])
            
        else:
            nextInputs[t] = tf.gather_nd(input_set, 
                       indices=tf.concat([tf.reshape(tf.range(tf.shape(taus[t+1])[0]), shape=tf.shape(taus[t+1])), taus[t+1]], 1),
                       name="g"+str(t))
            NN[t] = gridNet(t, features[t], nextInputs[t], 'grid_'+str(t))
            ops[t] = tf.train.AdamOptimizer(learning_rate=params['alpha'][t])         
            train_ops[t] = ops[t].minimize(NN[t][2], 
                                    var_list=tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='grid_'+str(t))) 
                                          #global_step=tf.train.get_global_step())
    
            taus[t] = tf.math.reduce_sum([t*NN[t][1]]+
                    [i*NN[i][1]*tf.math.reduce_prod([(1-NN[j][1]) for j in range(t, i)], axis=0) for i in range((t+1),(N-1))]+
                     [(N-1)*tf.math.reduce_prod([(1-NN[k][1]) for k in range(t, (N-1))], axis=0)]           
                                , axis=0) 
        
    # as a final step, pick the right element from the input set according to the stopping time taus[0]
    
    # ----------------------------------------------------------
    # uncomment the lines below for the case with generic payoff
    # ----------------------------------------------------------
    #args_tau = tf.gather_nd(input_set, 
    #                   indices=tf.concat([tf.reshape(tf.range(tf.shape(taus[0])[0]), shape=tf.shape(taus[0])), 
    #                                      taus[0]], 1), 
    #                   name="args_tau")
    
    #gg_0 = func_g(args_tau)
    
    gg_0 = tf.gather_nd(input_set, 
                       indices=tf.concat([tf.reshape(tf.range(tf.shape(taus[0])[0]), shape=tf.shape(taus[0])), 
                                          taus[0]], 1), 
                       name="gg_0")
    
    
    # this will give the price -- the quantity we are looking for 
    price = tf.reduce_mean(gg_0, name="price")
    
    # training params
    global_step = tf.train.get_global_step()
    update_global_step = tf.assign(global_step, global_step + 1, name = 'update_global_step')
    
    train_op =tf.group([train_ops[_] for _ in train_ops.keys()])
    cost = tf.math.reduce_sum([NN[kk][2] for kk in train_ops.keys()]) #cost_1st+cost_2nd 
    
    # estimator specs for EVAL 
    if mode == tf.estimator.ModeKeys.EVAL:
        return tf.estimator.EstimatorSpec(
            mode=mode,
            loss=price,
            evaluation_hooks=None)
       
    # estimator specs for PREDICT
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(
                mode=mode,
                predictions={"price": price})
    
    # estimator specs for TRAINING
    if mode == tf.estimator.ModeKeys.TRAIN:
        return tf.estimator.EstimatorSpec(
            mode=mode, 
            loss= price,
            train_op= tf.group(train_op, update_global_step),
            training_hooks=None)
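
The stopping times `taus[t]` in the model function are the first time at or after `t` at which a hard decision equals 1, with stopping forced at the terminal time $N-1$. The NumPy sketch below computes the same quantity through the equivalent backward recursion $\tau_t = t f_t + \tau_{t+1}(1 - f_t)$ rather than the explicit sum of products used in the graph; the function name `stopping_times` and the toy decisions are assumptions for illustration.

```python
import numpy as np

def stopping_times(f):
    """f: (M, N-1) array of 0/1 decisions for time points 0..N-2.

    Returns an (M, N-1) integer array whose column t holds tau_t, the first
    time >= t at which the rule stops (N-1 if it never stops earlier).
    """
    M, n_nets = f.shape
    N = n_nets + 1
    taus = np.empty((M, n_nets), dtype=int)
    tau_next = np.full(M, N - 1)               # stopping is forced at the terminal time
    for t in range(N - 2, -1, -1):
        # stop now (tau = t) if f_t = 1, otherwise inherit the later stopping time
        tau_next = t * f[:, t] + tau_next * (1 - f[:, t])
        taus[:, t] = tau_next
    return taus

# Toy example: 3 paths, N = 4 time points, hence 3 decision columns.
f = np.array([[0, 1, 0],
              [0, 0, 0],
              [1, 1, 1]])
samples = np.array([[1, 6, 2, 3],
                    [2, 3, 1, 5],
                    [6, 1, 1, 1]])

taus = stopping_times(f)
# pick, for each path, the sample at its stopping time tau_0 (the role of gather_nd above)
payoff = samples[np.arange(samples.shape[0]), taus[:, 0]]
print(taus[:, 0], payoff, payoff.mean())
```

Averaging `payoff` over the paths corresponds to the `price` tensor in the model function.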

#### Calculate the analytical prices

In [7]:
#MODEL_DIR = '/Users/Cellini/Desktop/Quant/DL Udacity/DLexs/Estimator'

In [8]:
# function that computes the analytical solution for given n steps 
def dicePrice(n):
    prices = {}
    
    prices[1] = 3.5
    if n==1: 
        return prices[1]
    else:
        for k in range(2,n+1):
            prices[k] = (1./6.)*np.sum([max(prices[k-1], x) for x in [1, 2, 3, 4, 5, 6]])
        return prices[n]

In [9]:
# print the analytical solutions
for _ in range(3, 11):
    print((_, dicePrice(_)))

(3, 4.666666666666666)
(4, 4.944444444444444)
(5, 5.129629629629629)
(6, 5.274691358024691)
(7, 5.395576131687243)
(8, 5.496313443072702)
(9, 5.580261202560585)
(10, 5.6502176688004875)
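
The recursion behind `dicePrice` is $V_1 = 3.5$ and $V_k = \frac{1}{6}\sum_{x=1}^{6} \max(V_{k-1}, x)$: with $k$ tosses left, one keeps the current face only if it beats the value of the remaining $k-1$ tosses. As a cross-check of the floating-point values above, the same recursion can be run in exact rational arithmetic; the helper name `dice_price_exact` is an assumption introduced here.

```python
from fractions import Fraction

def dice_price_exact(n):
    """Exact value of the n-toss dice game, using the same recursion as dicePrice."""
    price = Fraction(7, 2)                      # one toss: the mean of 1..6
    for _ in range(2, n + 1):
        price = sum(max(price, Fraction(x)) for x in range(1, 7)) / 6
    return price

for n in range(3, 6):
    exact = dice_price_exact(n)
    print(n, exact, float(exact))
```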


#### Creating samples in the format of an $M \times n$ matrix, where 
* $n$, the length of the row, represents the number of outcomes (or tosses), or the *time steps* in a time series;
* $M$ represents the sample size. 

#### In the example below, we demonstrate the algorithm for $n=3, \dots, 10$. 

In [10]:
# set the run config
config = tf.estimator.RunConfig(log_step_count_steps=2000)

In [11]:
M = 100000 # number of sample paths
numberOfSteps = 2500 # number of training steps
params = {} # dictionary (of parameters) that holds the learning rates 

lower = 3
upper = 11

# the results (the price calculated by the model and the training time) are stored in
# a dictionary whose key is the number of time steps
DOS = {}

for i in range(lower, upper):
    if i == 3:
        params['alpha'] = [0.0008, 0.0008]
    else:
        params['alpha'] = np.linspace(0.0012, 0.0002, num=(i-1))
    
    dice = np.random.randint(low=1, high=7, size=(M, i)) # create samples for training
    dice_eval = np.random.randint(low=1, high=7, size=(M, i)) # a separate sample for evaluation 
        
    nn = tf.estimator.Estimator(#model_dir=MODEL_DIR, 
        model_fn=my_model_fn, params=params, config=config)
    start = time.time()
    nn.train(input_fn=numpy_train_input_fn(dice), steps=numberOfSteps)
    end = time.time()
    ev = nn.evaluate(input_fn=numpy_eval_input_fn(dice_eval))
    
    DOS[i] = [ev["loss"], end-start]
    
    del dice, dice_eval, nn, ev



In [12]:
# compare the results to the analytic solutions
for i in range(lower, upper):
    print("Steps: %d --" %i, "Analytical Solution: %.5f --" %dicePrice(i), "ML Solution: %.5f --" %DOS[i][0], "Difference: %.5f --" %abs(dicePrice(i) - DOS[i][0]), "Training time: %.4f" %DOS[i][1])

Steps: 3 -- Analytical Solution: 4.66667 -- ML Solution: 4.66699 -- Difference: 0.00032 -- Training time: 16.2470
Steps: 4 -- Analytical Solution: 4.94444 -- ML Solution: 4.94673 -- Difference: 0.00229 -- Training time: 20.6288
Steps: 5 -- Analytical Solution: 5.12963 -- ML Solution: 5.12955 -- Difference: 0.00008 -- Training time: 27.9400
Steps: 6 -- Analytical Solution: 5.27469 -- ML Solution: 5.27621 -- Difference: 0.00151 -- Training time: 32.0522
Steps: 7 -- Analytical Solution: 5.39558 -- ML Solution: 5.40224 -- Difference: 0.00667 -- Training time: 36.8816
Steps: 8 -- Analytical Solution: 5.49631 -- ML Solution: 5.49757 -- Difference: 0.00126 -- Training time: 45.9945
Steps: 9 -- Analytical Solution: 5.58026 -- ML Solution: 5.58343 -- Difference: 0.00317 -- Training time: 46.8430
Steps: 10 -- Analytical Solution: 5.65022 -- ML Solution: 5.64687 -- Difference: 0.00335 -- Training time: 48.3600
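
One way to put the differences above in context is the Monte Carlo noise of the evaluation sample itself. The sketch below is not part of the original experiment: it applies the analytically optimal threshold rule (stop when the current face exceeds the value of the remaining tosses) to freshly simulated dice and reports the sample mean and its standard error. The helper names, the seed and the choice $n = 3$ are assumptions for illustration.

```python
import numpy as np

def dice_values(n):
    """Values V_1..V_n of the dice game (same recursion as dicePrice)."""
    v = [3.5]
    for _ in range(2, n + 1):
        v.append(float(np.mean(np.maximum(v[-1], np.arange(1, 7)))))
    return v

def threshold_rule_payoffs(samples):
    """Payoff per path when stopping at the first toss that beats the continuation value."""
    M, n = samples.shape
    v = dice_values(n)
    payoff = samples[:, -1].astype(float)       # stopping is forced at the last toss
    for t in range(n - 2, -1, -1):
        cont = v[n - 2 - t]                     # value of the n-1-t tosses still to come
        stop = samples[:, t] > cont
        payoff = np.where(stop, samples[:, t], payoff)   # earlier stops overwrite later ones
    return payoff

rng = np.random.default_rng(0)
M, n = 100000, 3
dice = rng.integers(low=1, high=7, size=(M, n))
payoffs = threshold_rule_payoffs(dice)

estimate = payoffs.mean()
std_err = payoffs.std(ddof=1) / np.sqrt(M)
print(estimate, std_err)
```

If the standard error printed here is of the same order as the differences in the table, part of the gap to the analytical solution may be sampling noise rather than a suboptimal stopping rule.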


#### Concluding remarks
- The solver produces reasonably accurate results, with an error of around $10^{-3}$, depending on the number of time steps. In the above setting, the first two decimals are expected to be correct. Occasional slip-ups, that is, larger errors, may appear for higher numbers of time steps, e.g. 8 or above, although even that is not expected often in the current setting;
- The largest number of time steps that was tested was $10$;
- The standard deviation of the weight initializers in each individual network was set to the relatively low value of $\sigma = 0.0005$;
- Somewhat surprisingly, we found that using a different learning rate for the network at each time step gives more accurate and more reliable training results, and fewer training steps are needed. In the above setting, linearly decreasing learning rates work relatively well: the first network has the largest learning rate ($0.0012$) and the last network has the smallest ($0.0002$). The rates are linearly spaced, so the increment depends on the number of time steps; e.g. for the $10$ time steps case, the learning rates are $[0.0012, 0.001075, 0.00095, 0.000825, 0.0007, 0.000575, 0.00045, 0.000325, 0.0002]$. In the case of $3$ time steps there are only two networks, and both learning rates are set to the same value of $0.0008$. Using the same learning rate for all networks can work reasonably well for up to 8 time steps. We found that such a setting breaks down, in terms of accuracy, for 9 or more time steps, where finding a good balance between the number of training steps and the learning rate becomes harder;
- Implementing the payoff function $g(\cdot)$ in a way that allows more general cases slowed down training considerably. This is likely due to the following two factors: 1) repeated function calls; 2) the payoff function appears in the optimisation process too. That general setting is currently commented out.
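
The learning-rate schedule described in the remarks can be reproduced in isolation. The helper name `learning_rates` is an assumption introduced here; its body follows the `if i == 3 ... else np.linspace(...)` logic of the training loop.

```python
import numpy as np

def learning_rates(n_steps, first=0.0012, last=0.0002):
    """Learning rates for the n_steps - 1 networks, as set in the training loop."""
    if n_steps == 3:
        return np.array([0.0008, 0.0008])       # two networks, same rate
    # linearly spaced from the first network (largest) to the last (smallest)
    return np.linspace(first, last, num=n_steps - 1)

print(learning_rates(10))
```

Entry `t` of the returned array is the rate passed to the Adam optimizer of the network at time point `t` through `params['alpha'][t]`.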

#### Potential directions for future work 
- It would be interesting to compare the performance of the solver with a more classical one, e.g. the Longstaff-Schwartz American option pricer, which is also based on MC methods. One could check whether the same samples lead to similar results with similar variances, although other factors also come into play, e.g. the choice of basis functions for the regression in the Longstaff-Schwartz method;
- Analysing the performance bottleneck of the above solver and finding a more efficient implementation and/or training setting.