## Example 8 (Finding maximum likelihood with the L-BFGS optimizer)

In this example we’ll **use the L-BFGS optimizer implemented in tfp** (TensorFlow Probability).
In summary, L-BFGS is a quasi-Newton method: rather than computing the Hessian matrix exactly, it
builds an approximation to it (strictly, to its inverse) from a limited history of recent position and gradient changes.
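
To make the "approximation to the Hessian" idea concrete, here is a minimal NumPy sketch of textbook L-BFGS. It is not tfp's implementation. It stores the last few pairs `s = x_new - x_old` and `y = g_new - g_old`, turns the current gradient into a search direction with the standard two-loop recursion, and picks a step length by simple backtracking. The function names `two_loop_direction` and `lbfgs`, the memory size of 5 and the Armijo backtracking rule are illustrative assumptions.

```python
import numpy as np


def two_loop_direction(grad, s_list, y_list):
    """Return -H @ grad, where H is the L-BFGS inverse-Hessian approximation
    built from the stored pairs s = x_new - x_old, y = g_new - g_old (oldest first)."""
    q = grad.copy()
    alphas = []
    # First loop: walk the stored pairs from newest to oldest
    for s, y in zip(reversed(s_list), reversed(y_list)):
        rho = 1.0 / y.dot(s)
        alpha = rho * s.dot(q)
        q -= alpha * y
        alphas.append(alpha)
    # Initial approximation H0 = gamma * I, scaled using the newest pair
    gamma = s_list[-1].dot(y_list[-1]) / y_list[-1].dot(y_list[-1]) if s_list else 1.0
    r = gamma * q
    # Second loop: walk the pairs from oldest to newest
    for s, y, alpha in zip(s_list, y_list, reversed(alphas)):
        rho = 1.0 / y.dot(s)
        beta = rho * y.dot(r)
        r += (alpha - beta) * s
    return -r


def lbfgs(f, grad_f, x0, memory=5, tol=1e-8, max_iter=200):
    """Minimize f from x0 using L-BFGS with a backtracking (Armijo) line search."""
    x = np.asarray(x0, dtype=float)
    g = grad_f(x)
    s_list, y_list = [], []
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        d = two_loop_direction(g, s_list, y_list)
        # Halve the step until it gives a sufficient decrease in f
        step, fx = 1.0, f(x)
        while f(x + step * d) > fx + 1e-4 * step * g.dot(d):
            step *= 0.5
        x_new = x + step * d
        g_new = grad_f(x_new)
        s, y = x_new - x, g_new - g
        # Keep only pairs with positive curvature, and at most `memory` of them
        if s.dot(y) > 1e-12:
            s_list.append(s)
            y_list.append(y)
            if len(s_list) > memory:
                s_list.pop(0)
                y_list.pop(0)
        x, g = x_new, g_new
    return x


# Usage: a 2-D quadratic f(x) = 0.5 x^T A x - b^T x, whose exact minimizer is A^{-1} b = [0.2, 0.4]
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_min = lbfgs(lambda x: 0.5 * x @ A @ x - b @ x, lambda x: A @ x - b, np.zeros(2))
print(x_min)
```

Only gradients are ever evaluated. The curvature information comes entirely from the stored `(s, y)` pairs, which is what lets L-BFGS scale to problems where forming the full Hessian would be too expensive.
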
We’ll use the same coin-flip example as in the previous example (2 heads in 10 flips), and as before we begin
by creating the loss function to minimize: the negative log-likelihood. This time we pass the parameter through
a sigmoid transform so that p always stays between 0 and 1. This is necessary because L-BFGS can take large
steps, which could otherwise push p outside the valid range.
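
Concretely, writing the unconstrained parameter as $\theta$ (the `p_input` argument of the loss function) and $\sigma$ for the sigmoid, the quantity being minimized is the negative binomial log-likelihood of $h$ heads in $N$ flips:

$$
p = \sigma(\theta) = \frac{1}{1 + e^{-\theta}} \in (0, 1), \qquad
\mathcal{L}(\theta) = -\log\binom{N}{h} - h\log\sigma(\theta) - (N - h)\log\bigl(1 - \sigma(\theta)\bigr)
$$

Because $\sigma$ maps every real $\theta$ into $(0, 1)$, any step the optimizer takes in $\theta$ still corresponds to a valid probability. The code uses $N = 10$ and $h = 2$.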

```python
import tensorflow_probability as tfp
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
tfd = tfp.distributions
```

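```python
@tf.function
def loss(p_input):
    N = 10  # number of coin flips
    h = 2   # number of heads observed
    # Map the unconstrained input onto (0, 1) so it is a valid probability
    p = tf.nn.sigmoid(p_input)
    # Log-likelihood of h heads in N flips under Binomial(N, p)
    log_likelihood = tfd.Binomial(total_count=N, probs=p).log_prob(h)
    # Return the negative log-likelihood as a scalar for the optimizer to minimize
    return tf.squeeze(-log_likelihood)
```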

Unlike TensorFlow’s standard optimizers, the tfp optimizers require a function that returns both the value of
the loss to be minimized and its gradient with respect to the parameters. tfp provides a built-in helper for
this, `tfp.math.value_and_gradient`.

```python
def loss_and_gradient(p):
    # Return a (loss value, gradient of the loss with respect to p) tuple
    return tfp.math.value_and_gradient(loss, p)
```
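
For intuition about the `(value, gradient)` contract, here is a framework-free NumPy sketch of the same loss with its gradient derived by hand instead of by automatic differentiation, checked against a central finite difference. The names `sigmoid` and `nll_and_gradient` are hypothetical, and this function is only meant to mirror what `loss_and_gradient` computes; it is not a drop-in replacement for the tfp optimizer, which works with tensors.

```python
import math

import numpy as np

N, h = 10, 2  # same data as the tfp loss: 2 heads in 10 coin flips


def sigmoid(theta):
    return 1.0 / (1.0 + np.exp(-theta))


def nll_and_gradient(theta):
    """Return (negative log-likelihood, its derivative w.r.t. theta) with p = sigmoid(theta)."""
    p = sigmoid(theta)
    log_binom_coef = math.lgamma(N + 1) - math.lgamma(h + 1) - math.lgamma(N - h + 1)
    value = -(log_binom_coef + h * np.log(p) + (N - h) * np.log1p(-p))
    grad = N * p - h  # derivative of the sigmoid-parameterized binomial NLL
    return value, grad


# Usage: evaluate at the optimizer's starting point and compare with a central difference
value, grad = nll_and_gradient(0.5)
eps = 1e-6
fd = (nll_and_gradient(0.5 + eps)[0] - nll_and_gradient(0.5 - eps)[0]) / (2 * eps)
print(value, grad, fd)
```

The analytic gradient `N * p - h` and the finite difference should agree closely; automatic differentiation in `tfp.math.value_and_gradient` saves you from deriving that expression by hand.
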

```python
# Set the initial value of the unconstrained parameter (p = sigmoid(0.5), about 0.62)
start = np.array([0.5])
# Run the optimizer
results = tfp.optimizer.lbfgs_minimize(
    loss_and_gradient,
    initial_position=start,
    tolerance=1e-8)
```

```python
# Unlike TensorFlow's standard optimizers, the result reports whether the optimizer converged
print('Optimizer converged: ', results.converged.numpy())
```

Optimizer converged:  True


```python
# Map the optimal unconstrained value back to a probability with the sigmoid
print(tf.nn.sigmoid(results.position).numpy())
```

[0.2]
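
As a sanity check, this matches the closed-form maximum likelihood estimate. Differentiating the negative log-likelihood $\mathcal{L}(\theta) = -\log\binom{N}{h} - h\log\sigma(\theta) - (N - h)\log\bigl(1 - \sigma(\theta)\bigr)$ using $\sigma'(\theta) = \sigma(\theta)\bigl(1 - \sigma(\theta)\bigr)$ and setting the result to zero gives

$$
\frac{d\mathcal{L}}{d\theta} = -h\bigl(1 - \sigma(\theta)\bigr) + (N - h)\,\sigma(\theta) = N\sigma(\theta) - h = 0
\quad\Longrightarrow\quad
\hat{p} = \sigma(\hat{\theta}) = \frac{h}{N} = \frac{2}{10} = 0.2,
\qquad
\hat{\theta} = \log\frac{\hat{p}}{1 - \hat{p}} = \log 0.25 \approx -1.386
$$

so `results.position` itself, before the sigmoid is applied, should be close to $-1.386$.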
