Docs on how to use the emulator on PROSAIL

jgomezdans · Oct 31, 2018 · 2680433 · 2680433
1 parent 04743e7
commit 2680433
Show file tree

Hide file tree

Showing 6 changed files with 706 additions and 126 deletions.
diff --git a/docs/emulating_rt_model.rst b/docs/emulating_rt_model.rst
@@ -0,0 +1,141 @@
+Emulating a typical radiative transfer model
+===============================================
+
+This package was designed to emulate radiatie transfer models. The process entails the following steps:
+
+1. Decide on what input parameters are required
+2. Decide their ranges
+3. Generate a training input parameter set
+4. Run the model for each element of the input training set and store the outputs
+5. Pass  the input and output training pairs to the library, and let it fit the hyperparameters
+
+We can show how this works with an example of the PROSPECT+SAIL model. 
+
+Setting the input parameter ranges
+-----------------------------------
+
+We can set the parameter names and their ranges simply by having lists with minimum and maximum values. This assumes a uniformly-distributed parameter distribution between those two boundaries, but other distributions are possible (we never had any reason to try them though!). We additionally set up the SRFs and other variables that need to be defined here... We train the model on 250 samples and test on (say) 100. 100 validation samples is probably too few, but for the sake of not waiting too much... ;-)
+
+.. code-block:: python
+
+    from functools import partial
+
+    import numpy as np
+
+    import gp_emulator
+
+    import prosail
+
+    # Spectral band definition. Just a top hat, with start and 
+    # end wavlengths as an example
+    b_min = np.array( [ 620., 841, 459, 545, 1230, 1628, 2105] )
+    b_max = np.array( [ 670., 876, 479, 565, 1250, 1652, 2155] )
+    wv = np.arange ( 400, 2501 )
+    passband = []
+
+    # Number of training and validation samples
+    n_train = 250
+    n_validate = 100
+    # Validation number is small, increase to a more realistic value
+    # if you want
+
+    # Define the parameter names and their ranges
+    # Note that we are working here in transformed coordinates...
+
+    # Define geometry. Each emulator is for one geometry
+    sza = 30.
+    vza = 0.
+    raa = 0. # in degrees
+
+    parameters = [ 'n', 'cab', 'car', 'cbrown', 'cw', 'cm', 'lai', 'ala', 'bsoil', 'psoil']
+    min_vals = [ 0.8       ,  0.46301307,  0.95122942,  0.        ,  0.02829699,
+            0.03651617,  0.04978707,  0.44444444,  0.        ,  0.]
+    max_vals = [ 2.5       ,  0.998002  ,  1.        ,  1.        ,  0.80654144,
+            0.84366482,  0.99501248,  0.55555556,  2.   , 1     ]
+
+
+We then require a function for calling the RT model. In the case of PROSAIL, we can do that easily from Python, in other models available in e.g. Fortran, you could have a function that calls the external model
+
+
+.. code-block:: python
+
+    def inverse_transform ( x ):
+        """Inverse transform the PROSAIL parameters"""
+        x_out = x*1.
+        # Cab, posn 1
+        x_out[1] = -100.*np.log ( x[1] )
+        # Cab, posn 2
+        x_out[2] = -100.*np.log ( x[2] )
+        # Cw, posn 4
+        x_out[4] = (-1./50.)*np.log ( x[4] )
+        #Cm, posn 5
+        x_out[5] = (-1./100.)*np.log ( x[5] )
+        # LAI, posn 6
+        x_out[6] = -2.*np.log ( x[6] )
+        # ALA, posn 7
+        x_out[7] = 90.*x[7]
+        return x_out
+
+
+
+    def rt_model ( x, passband=None, do_trans=True ):
+        """A coupled land surface/atmospheric model, predicting refl from
+        land surface parameters. Thisfunction provides estimates of refl for 
+        a particular illumination geometry.
+        
+        The underlying land surface reflectance spectra is simulated using
+        PROSAIL. The input parameter ``x`` is a vector with the following components:
+            
+            * ``n``
+            * ``cab``
+            * ``car``
+            * ``cbrown``
+            * ``cw``
+            * ``cm``
+            * ``lai``
+            * ``ala``
+            * ``bsoil``
+            * ``psoil``
+
+        """
+        x, sza, vza, raa = x
+
+        # Invert parameter LAI
+        if do_trans:
+            x = inverse_transform ( x )
+        ################# surface refl with prosail #####################
+    
+        surf_refl = prosail.run_prosail(x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7], 0.01, sza, vza, raa,
+                        rsoil=x[8], psoil=x[9])
+        if passband is None:
+            return surf_refl
+        else:
+            return surf_refl[passband].mean()
+
+
+Now we loop over all the bands, and prepare the emulators. we do this by using the `create_emulator_validation` function, that does everything you'd want to do... We just stuff the emulator, training and validation sets in one list for convenience. 
+
+.. code-block:: python
+
+    retval = []
+    for iband,bmin in enumerate ( b_min ):
+        # Looping over the bands....
+        print("Doing band %d" % (iband+1))
+        passband = np.nonzero( np.logical_and ( wv >= bmin, wv <= b_max[iband] ) )
+        # Define the SRF for the current band
+        # Define the simulator for convenience
+        simulator = partial (rt_model, passband=passband)
+        # Actually create the training and validation parameter sets, train the emulators
+        # and return all that 
+        x = gp_emulator.create_emulator_validation (simulator, parameters, min_vals, max_vals, 
+                                    n_train, n_validate, do_gradient=True, 
+                                    n_tries=15, args=(30, 0, 0) )
+        retval.append (x)
+
+        
+A simple validation visualisation looks like this
+
+.. figure:: prosail_emulator_new_300.png
+   :figwidth: 90%
+
+   Comparison between the simulated output and the corresponding emulator output for the validation dataset. Correlations (R2) are in all cases better than 0.99. Slope was between 0.97 and 1., whereas the bias term was smaller than 0.002.
diff --git a/docs/index.rst b/docs/index.rst
@@ -11,6 +11,8 @@ Welcome to gp_emulator's documentation!
    :caption: Contents:
 
    introduction.rst
+   quickstart.rst
+   emulating_rt_model.rst
 
 
 Indices and tables

diff --git a/docs/introduction.rst b/docs/introduction.rst
@@ -28,123 +28,3 @@ or just clone or download the source repository and invoke `setup.py` script: ::
 
     python setup.py install
     
-Quickstart
-===========
-
-Single output model emulation
-----------------------------------
-
-Assume that we have two arrays, ``X`` and ``y``. ``y`` is of size ``N``, and it stores the ``N`` expensive model outputs that have been produced by running the model on the ``N`` input sets of ``M`` input parameters in ``X``. We will try to emulate the model by learning from these two training sets: ::
-
-    gp = gp_emulator.GaussianProcess(inputs=X, targets=y)
-    
-Now, we need to actually do the training... ::
-
-    gp.learn_hyperparameters()
-
-Once this process has been done, you're free to use the emulator to predict the model output for an arbitrary test vector ``x_test`` (size ``M``): ::
-
-    y_pred, y_sigma, y_grad = gp.predict (x_test, do_unc=True,
-                                           do_grad=True)
-    
-In this case, ``y_pred`` is the model prediction, ``y_sigma`` is the variance associated with the prediction (the uncertainty) and ``y_grad`` is an approximation to the Jacobian of the model around ``x_test``. 
-
-Let's see a more concrete example. We create a damped sine, add a bit of Gaussian noise, and then subsample a few points (10 in this case), fit the GP, and predict the function over the entire range. We also plot the uncertainty from this prediction.
-
-
-.. plot:: 
-    :include-source:
-
-    import random
-    import numpy as np
-    import matplotlib.pyplot as plt
-
-    import gp_emulator
-
-    random.seed(111)
-    n_samples = 2000
-    x = np.linspace(0, 2, n_samples)
-    y = np.exp(-0.7*x)*np.sin(2*np.pi*x/0.9)
-    y += np.random.randn(n_samples)*0.02
-    plt.plot(x, y, '-', label="Original")
-    # Select a few random samples from x and y
-    isel = random.choices(range(n_samples), k=10)
-    x_train = np.atleast_2d(x[isel]).T
-    y_train = y[isel] 
-    plt.plot(x_train[:,0], y_train, 'o', label="Samples")
-
-    gp = gp_emulator.GaussianProcess(x_train, y_train)
-    gp.learn_hyperparameters(n_tries=25)
-
-    y_pred, y_unc, _ = gp.predict(np.atleast_2d(x).T,
-                                    do_unc=True, do_deriv=False)
-    plt.plot(x, y_pred, '-', lw=2., label="Predicted")
-    plt.plot(x, np.exp(-0.7*x)*np.sin(2*np.pi*x/0.9), '-', label="True")
-    plt.fill_between(x, y_pred-1.96*y_unc,
-                        y_pred+1.96*y_unc, color="0.8")
-    plt.legend(loc="best")
-
-
-We can see that the GP is doing an excellent job in predicting the function, even in the presence of noise, and with a handful of sample points. In situations where there is extrapolation, this is indicated by an increase in the predictive uncertainty.
-
-
-
-Multiple output emulators
---------------------------
-
-In some cases, we can emulate multiple outputs from a model. For example, hyperspectral data used in EO can be emulated by employing the SVD trick and emulating the individual principal component weights. Again,  we use ``X`` and ``y``. ``y`` is now of size ``N, P``, and it stores the ``N`` expensive model outputs (size ``P``) that have been produced by running the model on the ``N`` input sets of ``M`` input parameters in ``X``. We will try to emulate the model by learning from these two training sets, but we need to select a variance level for the initial PCA (in this case, 99%) ::
-
-    gp = gp_emulator.MultivariateEmulator (X=y, y=X, thresh=0.99)
-    
-Now, we're ready to use on a new point ``x_test`` as above: ::
-
-    y_pred, y_sigma, y_grad = gp.predict (x_test, do_unc=True, 
-                                            do_grad=True)
-    
-
-
-A more concrete example: let's produce a signal that can be decomposed as a sum of scaled orthogonal basis functions...
-
-
-.. plot::
-    :include-source:
-
-    import random
-    import numpy as np
-
-    from scipy.fftpack import dct
-
-    import matplotlib.pyplot as plt
-    import gp_emulator
-
-    random.seed(111)
-
-    n_validate = 250
-    n_train = 100
-    basis_functions = dct(np.eye(128), norm="ortho")[:, 1:4]
-
-    params=["w1", "w2", "w3"]  
-    mins = [-1, -1, -1]
-    maxs = [1, 1, 1]
-
-
-    train_weights, dists = gp_emulator.create_training_set(params, mins, maxs,
-                                                            n_train=n_train)
-    validation_weights = gp_emulator.create_validation_set(dists,
-                                                        n_validate=n_validate)
-
-    training_set = (train_weights@basis_functions.T).T 
-
-    training_set += np.random.randn(*training_set.shape)*0.0005
-    validation_set = (validation_weights@basis_functions.T).T
-
-    gp = gp_emulator.MultivariateEmulator (y=train_weights, X=training_set.T,
-                                            thresh=0.973, n_tries=25)
-    y_pred = np.array([gp.predict(validation_weights[i])[0] 
-                            for i in range(n_validate)])
-
-    fig, axs = plt.subplots(nrows=1, ncols=2,sharey=True,figsize=(12, 4))
-    axs[0].plot(validation_set[:, ::25])
-    axs[1].plot(10.*(y_pred.T - validation_set))
-    axs[0].set_title("Samples from validation dataset")
-    axs[1].set_title("10*Mismatch between validation simulator and emulator")