# Simple network to minimize CRPS


The neural-network analog of EMOS (ensemble model output statistics) is a simple network like this:

![title](EMOS_network.png)

In this notebook we will build this simple network in Theano and use the continuous ranked probability score (CRPS) as the cost function.

In [79]:
# First, let's import the libraries we need
import theano
import theano.tensor as T
import numpy as np

# Let's make this notebook reproducible by seeding NumPy's global random generator
np.random.seed(42)   # I don't even like the hitchhiker...
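
Seeding matters below because the initial weights are drawn with `np.random.randn()`. As a minimal sketch of how NumPy's legacy seeding behaves: creating a `np.random.RandomState(42)` object without keeping it leaves the global generator untouched, while `np.random.seed(42)` (or drawing from a kept `RandomState` object) gives repeatable numbers. The variable names are only for illustration.

```python
import numpy as np

# Creating a RandomState and discarding it does not seed the global generator,
# so two such "seeded" draws from np.random differ.
np.random.RandomState(42)
draw_a = np.random.randn(3)
np.random.RandomState(42)
draw_b = np.random.randn(3)
print(np.array_equal(draw_a, draw_b))

# Seeding the global generator makes the draws repeatable.
np.random.seed(42)
draw_c = np.random.randn(3)
np.random.seed(42)
draw_d = np.random.randn(3)
print(np.array_equal(draw_c, draw_d))

# Alternatively, keep the RandomState object and draw from it directly.
rng = np.random.RandomState(42)
draw_e = rng.randn(3)
print(np.array_equal(draw_c, draw_e))
```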

I followed this tutorial to figure out the Theano basics: http://www.marekrei.com/blog/theano-tutorial/

We will now attempt to build a simple class for our model following: https://github.com/marekrei/theano-tutorial/blob/master/classifier.py

So the first step is to create a class and initialize the network architecture. Theano builds a symbolic computation graph: we first lay out the computations, and they are only executed later, once the graph is compiled into a function and called with data.

In [85]:
class EMOS_Network(object):
    def __init__(self):
        """
        This function is called once an object of this class is created.
        """
        # Before we start with the network, let's define
        # the learning rate as an input so we can vary it
        lr = T.fscalar('lr')
        
        # First let's define the input to the network
        # This is the ensemble mean (meanx), 
        # the ensemble standard deviation (stdx) and
        # the corresponding observation (target)
        # In theano we use tensors to describe these variables.
        # T.fvector allocates a float32 1D vector
        meanx = T.fvector('meanx')   # The name helps with debugging
        stdx = T.fvector('stdx')
        target = T.fvector('target')
        
        # Next we allocate the weights (a, b, c, d) as shared
        # variables and initialize some value for them.
        # For now we will just draw a random variable from N(0, 1)
        a = theano.shared(np.random.randn(), 'a')
        b = theano.shared(np.random.randn(), 'b')
        c = theano.shared(np.random.randn(), 'c')
        d = theano.shared(np.random.randn(), 'd')
        
        # Now that we have the input and the weights, 
        # we can set up the network.
        mu = a + meanx * b
        sigma = c + stdx * d
        
        # Now comes the cost function.
        # sigma = c + stdx * d can become negative. To keep the scale
        # used in the CRPS non-negative, we square sigma to get the
        # variance and take the square root again, i.e. we use |sigma|.
        # (I learned from experience...)
        # This part of the code is inspired by Kai Polsterer's code!
        var = T.sqr(sigma)
        # The following three variables are just for convenience
        loc = (target - mu) / T.sqrt(var)
        phi = 1.0 / np.sqrt(2.0 * np.pi) * T.exp(-T.square(loc) / 2.0)
        Phi = 0.5 * (1.0 + T.erf(loc / np.sqrt(2.0)))
        # First we will compute the crps for each input/target pair
        crps = T.sqrt(var) * (loc * (2. * Phi - 1.) + 2. * phi - 1. / np.sqrt(np.pi))
        # Then we take the mean. The cost is now a scalar
        mean_crps = T.mean(crps)
        
        # Now compute the gradients of the cost function 
        # with respect to the four weights/parameters
        params = [a, b, c, d]   # Let's put them in a list for convenience
        gradients = theano.tensor.grad(mean_crps, params)
        
        # For gradient descent we now need to subtract the gradients
        # from our parameters to minimize the cost function
        # In theano we want to define a list of tuples containing
        # the old parameter and the updated parameter.
        updates = [(p, p - lr * g) for p, g in zip(params, gradients)]
        
        # So far no actual computations have been done. Now we will
        # define a Theano function, which takes input, does some 
        # calculations and returns some output. In our case, we use 
        # meanx, stdx and the target as an input plus the required 
        # learning rate and return the mean_crps
        # as an output. Then we tell the function to apply the updates
        # every time it is called. Each call is one training step.
        self.train = theano.function([meanx, stdx, target, lr], 
                                     mean_crps, updates=updates)
        # Furthermore, we define a method for simply making a prediction
        # and returning the predicted values of mu and sigma
        # along with the mean_crps without updating the parameters
        self.predict = theano.function([meanx, stdx, target],
                                       [mu, sigma, mean_crps])
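
Written out, the graph built in `__init__` corresponds to the equations below. The symbols are our own shorthand for the variables in the code: $\bar{x}$ and $s$ stand for `meanx` and `stdx`, $y$ for `target`, $z$ for `loc`, $\eta$ for `lr`, and $\varphi$ and $\Phi$ are the standard normal pdf and cdf.

$$
\mu = a + b\,\bar{x}, \qquad \sigma = c + d\,s, \qquad z = \frac{y - \mu}{\sqrt{\sigma^2}}
$$

$$
\mathrm{CRPS}(\mu, \sigma; y) = \sqrt{\sigma^2}\,\left[ z\,\bigl(2\,\Phi(z) - 1\bigr) + 2\,\varphi(z) - \frac{1}{\sqrt{\pi}} \right]
$$

$$
p \leftarrow p - \eta\,\frac{\partial\,\overline{\mathrm{CRPS}}}{\partial p}, \qquad p \in \{a, b, c, d\}
$$

Here $\overline{\mathrm{CRPS}}$ is the mean over all input/target pairs (`mean_crps`), and $\sqrt{\sigma^2} = |\sigma|$ is the scale that actually enters the cost.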

In [83]:
# Let's define some input arrays
nb_data = 100  # Number of data
# It is important that the input has the same type (float32) as the Theano tensors
in_meanx = np.asarray(np.random.randn(nb_data) + 3, dtype='float32')   # Random with mean 3 and std 1
# Note: with mean 1 and std 2, some of these "standard deviations" are negative
in_stdx = np.asarray(2 * np.random.randn(nb_data) + 1, dtype='float32')
in_target = np.asarray(1.5 * np.random.randn(nb_data) + 2, dtype='float32')

In [106]:
# Let's create a trivial training set with constant inputs and targets,
# which the network should be able to fit perfectly (mean_crps -> 0)
in_meanx = np.ones(nb_data, dtype='float32') * 3
in_stdx = np.ones(nb_data, dtype='float32') * 3
in_target = np.ones(nb_data, dtype='float32') * 2
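
With these constant inputs the network can match the target exactly: any weights with `a + 3 * b = 2` give `mu = 2`, and the CRPS then shrinks linearly with `|sigma|`. The NumPy sketch below re-implements the closed-form Gaussian CRPS from the class to make that concrete. The function name `gaussian_crps` is our own helper for illustration, not part of Theano or the original code.

```python
import math
import numpy as np

def gaussian_crps(mu, sigma, target):
    """Closed-form CRPS of N(mu, sigma^2) against target (illustrative helper)."""
    sigma = np.abs(sigma)   # same effect as sqrt(sqr(sigma)) in the class
    loc = (target - mu) / sigma
    # Standard normal pdf and cdf evaluated at loc
    phi = np.exp(-loc ** 2 / 2.0) / np.sqrt(2.0 * np.pi)
    Phi = 0.5 * (1.0 + np.vectorize(math.erf)(loc / np.sqrt(2.0)))
    return sigma * (loc * (2.0 * Phi - 1.0) + 2.0 * phi - 1.0 / np.sqrt(np.pi))

# A forecast centred on the target: the score depends only on the spread
target = np.full(5, 2.0)
for sigma in (1.0, 0.1, 0.01):
    crps = gaussian_crps(mu=2.0, sigma=sigma, target=target).mean()
    print('sigma = %.2f; mean_crps = %.4f' % (sigma, crps))
```

For `mu == target` the bracket reduces to `(sqrt(2) - 1) / sqrt(pi)`, roughly 0.234, so the mean CRPS is about `0.234 * |sigma|` and only reaches zero as `sigma` goes to zero.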

Now we have set up our model and created some simple test data. Let's now initialize the network and train it!

In [107]:
# Initialize the network
model = EMOS_Network()

In [112]:
# Let's run over the data a few times and print out the crps every few steps
# Note that this is plain (full-batch) gradient descent, not stochastic,
# since we are giving the algorithm all the data for each update.
lr = np.asarray(0.001, dtype='float32')
for i in range(500):
    cost = model.train(in_meanx, in_stdx, in_target, lr)
    if i % 10 == 0:
        print('Step %i; mean_crps = %.3f' % (i, cost))

Step 0; mean_crps = 0.001
Step 10; mean_crps = 0.002
Step 20; mean_crps = 0.001
Step 30; mean_crps = 0.007
Step 40; mean_crps = 0.001
Step 50; mean_crps = 0.000
Step 60; mean_crps = 0.003
Step 70; mean_crps = 0.001
Step 80; mean_crps = 0.000
Step 90; mean_crps = 0.001
Step 100; mean_crps = 0.004
Step 110; mean_crps = 0.001
Step 120; mean_crps = 0.006
Step 130; mean_crps = 0.001
Step 140; mean_crps = 0.000
Step 150; mean_crps = 0.000
Step 160; mean_crps = 0.004
Step 170; mean_crps = 0.000
Step 180; mean_crps = 0.004
Step 190; mean_crps = 0.001
Step 200; mean_crps = 0.000
Step 210; mean_crps = 0.002
Step 220; mean_crps = 0.001
Step 230; mean_crps = 0.002
Step 240; mean_crps = 0.004
Step 250; mean_crps = 0.001
Step 260; mean_crps = 0.000
Step 270; mean_crps = 0.001
Step 280; mean_crps = 0.000
Step 290; mean_crps = 0.001
Step 300; mean_crps = 0.001
Step 310; mean_crps = 0.001
Step 320; mean_crps = 0.002
Step 330; mean_crps = 0.003
Step 340; mean_crps = 0.000
Step 350; mean_crps = 0.004
Ste
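
To see the same full-batch gradient descent loop without Theano, here is a minimal NumPy sketch. It is not the notebook's implementation: the gradients come from central finite differences instead of `T.grad`, and the helper names (`mean_crps`, `numerical_grad`), the synthetic data and the starting weights are our own illustrative assumptions.

```python
import math
import numpy as np

def mean_crps(params, meanx, stdx, target):
    """Mean Gaussian CRPS of the EMOS network for weights (a, b, c, d)."""
    a, b, c, d = params
    mu = a + meanx * b
    sigma = np.abs(c + stdx * d)   # |sigma|, as sqrt(sqr(sigma)) in the class
    loc = (target - mu) / sigma
    phi = np.exp(-loc ** 2 / 2.0) / np.sqrt(2.0 * np.pi)
    Phi = 0.5 * (1.0 + np.vectorize(math.erf)(loc / np.sqrt(2.0)))
    crps = sigma * (loc * (2.0 * Phi - 1.0) + 2.0 * phi - 1.0 / np.sqrt(np.pi))
    return crps.mean()

def numerical_grad(f, params, eps=1e-6):
    """Central finite-difference gradient of f at params (stand-in for T.grad)."""
    grad = np.zeros_like(params)
    for k in range(params.size):
        step = np.zeros_like(params)
        step[k] = eps
        grad[k] = (f(params + step) - f(params - step)) / (2.0 * eps)
    return grad

# Synthetic data (assumed for illustration only)
rng = np.random.RandomState(0)
meanx = rng.randn(100) + 3.0
stdx = np.abs(rng.randn(100)) + 1.0
target = 1.5 * rng.randn(100) + 2.0

params = np.array([0.0, 1.0, 1.0, 0.0])   # illustrative starting weights a, b, c, d
cost = lambda p: mean_crps(p, meanx, stdx, target)
initial_cost = cost(params)

lr = 0.01
for i in range(200):
    # Same update rule as in the class: p <- p - lr * gradient
    params = params - lr * numerical_grad(cost, params)

final_cost = cost(params)
print('mean_crps: %.3f -> %.3f' % (initial_cost, final_cost))
```

Finite differences need two cost evaluations per weight, which is fine for four parameters but is exactly the work that Theano's symbolic gradients avoid for larger networks.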

In [113]:
preds = model.predict(in_meanx, in_stdx, in_target)

In [115]:
preds[0][:5]

array([ 1.99962553,  1.99962553,  1.99962553,  1.99962553,  1.99962553])

Good so far: the mean CRPS stays close to zero and the predicted mean (about 1.9996) essentially matches the target of 2. So now we can actually start thinking about real data!