# Introduction

In this notebook, we'll learn how EMERGENT models parameter spaces and finds extrema. First, we'll study how EMERGENT chooses experimental data points to acquire, governed by the sampling method chosen by you, the user. Let's look at the base Sampling class:

In [None]:
from emergent.samplers.sampling import Sampling
Sampling??

In order to define a sampling method, you should override the \_run() method, as we will see shortly. For now, understand that this method simply runs a series of experiments and returns the points (parameter space coordinates) and costs (experimental results). After the data is acquired, the \_finish() method allows a model to be fitted to the data, then moves to either the first, last, or best point sampled. The Sampling class also implements a standardized plot() method, which returns a handle to a 2D plot of the parameter space (currently only 2D spaces are supported).
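The contract described above can be sketched as a small abstract base class. This is a simplified, hypothetical stand-in (SamplingSketch and ListSampler are not EMERGENT classes): it omits the model fitting in \_finish(), and the 'First point' / 'Last point' option names are assumptions, since only 'Best point' appears in this notebook.

```python
from abc import ABC, abstractmethod


class SamplingSketch(ABC):
    """Hypothetical, simplified stand-in for the Sampling base class."""

    def __init__(self, cost_fn, end_at='Best point'):
        self.cost_fn = cost_fn  # runs one experiment at a point and returns its cost
        self.end_at = end_at

    @abstractmethod
    def _run(self):
        """Acquire data and return (points, costs)."""

    def _finish(self, points, costs):
        """Choose the point to leave the experiment at (model fitting omitted)."""
        if self.end_at == 'First point':
            index = 0
        elif self.end_at == 'Last point':
            index = len(points) - 1
        else:  # 'Best point': the lowest cost seen
            index = min(range(len(costs)), key=costs.__getitem__)
        return points[index]

    def run(self):
        points, costs = self._run()
        return points, costs, self._finish(points, costs)


class ListSampler(SamplingSketch):
    """Example subclass: evaluate a fixed list of normalized points."""

    def __init__(self, cost_fn, points, **kwargs):
        super().__init__(cost_fn, **kwargs)
        self.points = list(points)

    def _run(self):
        costs = [self.cost_fn(p) for p in self.points]
        return self.points, costs
```

A subclass only has to supply \_run(); choosing where to end up is shared by every sampler.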

Now let's look at how to define a sampling method. Consider the Grid sampling class:

In [None]:
from emergent.samplers import Grid
Grid??

As we can see, defining new sampling methods is easy - just override the \_run() method, then define any parameters you don't want to hard-code; these parameters will automatically appear in the GUI to be set before starting a run. In this case, the \_run() method creates a uniform grid between 0 and 1 in _N_ dimensions, runs the experiment at each point, and returns the results.
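A minimal sketch of what a grid \_run() amounts to, written as standalone functions rather than as a Sampling subclass. The names grid_points and run_grid are hypothetical, and we assume the grid includes both endpoints 0 and 1 on every axis; EMERGENT's Grid class may space its points differently.

```python
import itertools


def grid_points(steps, dims):
    """Every point of a uniform grid on [0, 1]^dims, `steps` values per axis."""
    if steps < 2:
        raise ValueError("need at least 2 steps per axis")
    axis = [i / (steps - 1) for i in range(steps)]  # 0, ..., 1 inclusive
    return list(itertools.product(axis, repeat=dims))


def run_grid(cost_fn, steps, dims):
    """Run one experiment per grid point and return (points, costs)."""
    points = grid_points(steps, dims)
    costs = [cost_fn(p) for p in points]
    return points, costs
```

With 20 steps in 2 dimensions this gives 20\*\*2 = 400 experiments; with 5 knobs the same resolution would already need 20\*\*5 = 3,200,000, which is the exponential growth discussed below.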

<div class="alert alert-block alert-warning">
Don't worry about the actual ranges - the sampling takes place in a normalized subspace from 0 to 1 along all coordinates, and EMERGENT automatically rescales to the ranges defined by the Hub.range dictionary, which are also displayed in the GUI's network tree.
</div>
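The rescaling is a linear map per knob. A minimal sketch, assuming each range entry looks like {'min': ..., 'max': ...} and that the state is a flat knob-to-value dictionary; the real Hub.range and hub state may be structured differently (the hub state printed later in this notebook is nested by thing).

```python
def denormalize(state, ranges):
    """Map knob values from the unit interval onto their real ranges."""
    return {knob: ranges[knob]['min'] + value * (ranges[knob]['max'] - ranges[knob]['min'])
            for knob, value in state.items()}


def normalize(state, ranges):
    """Inverse map: real knob values back onto [0, 1]."""
    return {knob: (value - ranges[knob]['min']) / (ranges[knob]['max'] - ranges[knob]['min'])
            for knob, value in state.items()}
```

For example, with X ranging from -5 to 5, a normalized X of 0.5 maps to the real value 0.0.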


Grid sampling can be useful for exploring experiments with 1 or 2 knobs, but the number of iterations required grows exponentially in the number of dimensions. For experiments with many free parameters, we can sample more intelligently with the _online_ sampling method:

In [None]:
from emergent.samplers import Online
Online??

The basic idea is that we attach a model to the sampling process - this could be based on physical intuition or just a general modeler like a Gaussian process. The sampling proceeds as follows:

1. Pre-sample some number of randomly generated points.
2. Fit the model to all acquired data.
3. Numerically minimize the _effective cost_ over the modeled surface to choose the next point.
4. Repeat step 3 for each point in a batch, then return to step 2, and repeat.
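The four steps above can be sketched in plain Python. This is not EMERGENT's Online sampler: the surrogate is a toy inverse-distance-weighted model (using the distance to the nearest sample as its uncertainty), step 3 uses random search over candidate points instead of a numerical minimizer, the Tolerance setting is ignored, and sweeping the tradeoff b across each batch is an assumption about how the points in a batch are made to differ.

```python
import math
import random


def idw_predict(x, points, costs, power=2.0):
    """Toy surrogate: inverse-distance-weighted cost, uncertainty = distance to nearest sample."""
    weighted, total, nearest = 0.0, 0.0, math.inf
    for p, c in zip(points, costs):
        d = math.dist(x, p)
        if d == 0.0:
            return c, 0.0
        w = d ** -power
        weighted += w * c
        total += w
        nearest = min(nearest, d)
    return weighted / total, nearest


def online_sample(cost_fn, dims=2, presampled=10, iterations=5, batch_size=5,
                  candidates=500, seed=0):
    rng = random.Random(seed)

    def random_point():
        return tuple(rng.random() for _ in range(dims))

    # 1. Pre-sample some randomly generated points.
    points = [random_point() for _ in range(presampled)]
    costs = [cost_fn(p) for p in points]

    for _ in range(iterations):
        # 2. "Fit" the model: IDW needs no training, so just freeze the data so far.
        fitted_points, fitted_costs = list(points), list(costs)
        batch = []
        for k in range(batch_size):
            # Sweep b from pure explorer (b=0) to pure optimizer (b=1) across the batch.
            b = k / (batch_size - 1) if batch_size > 1 else 1.0

            def effective_cost(x, b=b):
                mean, std = idw_predict(x, fitted_points, fitted_costs)
                return b * mean - (1 - b) * std

            # 3. Minimize the effective cost, here by brute-force random search.
            batch.append(min((random_point() for _ in range(candidates)), key=effective_cost))
        # 4. Run the whole batch, then go back to step 2.
        points.extend(batch)
        costs.extend(cost_fn(p) for p in batch)

    best = min(range(len(costs)), key=costs.__getitem__)
    return points, costs, points[best]
```

The defaults mirror the Online settings used later (10 presampled points, 5 iterations, batches of 5), so one call runs 10 + 5 × 5 = 35 experiments.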

The _effective cost_ is the magic bullet that makes this work so well. Instead of sampling uniformly, we sample intelligently, using all of the knowledge gathered about the experiment to suggest the next point. But what function should we use for the effective cost? A good first guess is the amplitude of the parameter surface itself, i.e. seeking extrema where the signal-to-noise ratio is high. However, if your model can also generate an uncertainty estimate for a point, then it works well to optimize a linear combination of the amplitude and the uncertainty. By iterating through various combinations, we can move between optimizer mode (seeking low amplitude) and explorer mode (seeking high uncertainty), refining the model very quickly.
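One natural way to write such a linear combination, assuming a model that returns a predicted cost $\mu(\mathbf{x})$ and an uncertainty $\sigma(\mathbf{x})$ at a normalized point $\mathbf{x}$, and a tradeoff weight $b \in [0, 1]$. This is a sketch of the idea; EMERGENT's own definition and sign convention for $b$ may differ.

$$
c_{\mathrm{eff}}(\mathbf{x}) = b\,\mu(\mathbf{x}) - (1 - b)\,\sigma(\mathbf{x})
$$

Minimizing $c_{\mathrm{eff}}$ with $b = 1$ is pure optimization (seek the lowest predicted cost), $b = 0$ is pure exploration (seek the largest uncertainty), and intermediate values trade one against the other.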

Let's now examine how models are implemented in EMERGENT, starting with the base Model class:

In [None]:
from emergent.models.model import Model
Model??

When defining a model, you should override the fit() and predict() methods. The former should fit the model to all data acquired so far, while the latter should predict the experimental result (and possibly its uncertainty) for a given state dictionary. The Model class implements a number of other functions too, including generating the next_sample() in terms of the explorer/optimizer tradeoff _b_ and plotting the modeled parameter surface in 2D with plot().
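The interface can be sketched as an abstract base class whose next_sample() minimizes the effective cost over random candidate points. ModelSketch and NearestNeighborModel are hypothetical illustrations, not EMERGENT's classes, and predict() here takes a normalized coordinate tuple rather than a state dictionary.

```python
import math
import random
from abc import ABC, abstractmethod


class ModelSketch(ABC):
    """Hypothetical, simplified stand-in for the Model base class."""

    @abstractmethod
    def fit(self, points, costs):
        """Fit the model to all data acquired so far."""

    @abstractmethod
    def predict(self, point):
        """Return (predicted cost, uncertainty) at a normalized point."""

    def next_sample(self, b, dims=2, candidates=1000, seed=None):
        """Return the candidate minimizing b*cost - (1-b)*uncertainty (random search)."""
        rng = random.Random(seed)

        def effective_cost(x):
            mean, std = self.predict(x)
            return b * mean - (1 - b) * std

        pool = [tuple(rng.random() for _ in range(dims)) for _ in range(candidates)]
        return min(pool, key=effective_cost)


class NearestNeighborModel(ModelSketch):
    """Toy model: cost of the nearest sample, uncertainty = distance to it."""

    def fit(self, points, costs):
        self.points, self.costs = list(points), list(costs)

    def predict(self, point):
        distance, cost = min((math.dist(point, p), c) for p, c in zip(self.points, self.costs))
        return cost, distance
```

With b = 1 the suggestion lands next to the best sample so far; with b = 0 it lands as far as possible from every sample.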

Now let's look at a specific implementation of a model, Gaussian process regression:

In [None]:
from emergent.models import GaussianProcess
GaussianProcess??

Just as with defining a Sampling subclass, it is very simple to define a new model. In this case, we initialize a regressor in the \_\_init\_\_() method, then override the Model.fit() and Model.predict() methods with the calls required by the GaussianProcessRegressor class imported from scikit-learn.

# Running an optimization
Now let's try actually running an optimization! First, initialize a demo network by running the Getting Started notebook:

In [1]:
%run "./Getting started.ipynb"

C:\emergent\emergent
Overwriting networks/test/things/test_thing.py
Overwriting networks/test/hubs/test_hub.py
Overwriting networks/test/network.py
Actuating to {'X': 1, 'Y': 2}
DataDict([('thing', {'X': 1, 'Y': 2})])
Actuating to {'X': 1, 'Y': 2}
New state: DataDict([('thing', {'X': 1, 'Y': 2})])
 * Serving Flask app "emergent.modules.webAPI" (lazy loading)
 * Environment: production
   Use a production WSGI server instead.
 * Debug mode: off


INFO:werkzeug: * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)


In order to start a process, we POST a settings dictionary containing all of the information needed to define the process to the /run endpoint. The GUI handles all of this much more simply, but this tutorial demonstrates the exact steps required to launch an optimization manually through the API. Let's construct a command for a grid search optimization of the "gaussian" experiment attached to our hub:

In [2]:
hub = network.hubs['hub']
settings = {'hub': hub.name, 'state': hub.state}
settings['experiment'] = {'name': 'gaussian',
                          'params': {'sigma_x': 0.3, 
                                     'sigma_y': 0.8, 
                                     'x0': 0.3, 
                                     'y0': 0.6, 
                                     'noise':0, 
                                     'delay': 0.5} }
settings['sampler'] = {'name': 'Grid',
                       'params': {'Steps': 20, 'Sweeps': 1} }

settings['process'] = {'type': 'model', 'end at': 'Best point', 'callback': None}
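The settings dictionary always has the same shape: the hub name and state, an experiment, a sampler, an optional model, and the process options. A small helper like the hypothetical make_settings below (not part of EMERGENT) can cut down on repetition; it only reproduces the keys used in this notebook.

```python
def make_settings(hub, experiment, experiment_params, sampler, sampler_params,
                  model=None, model_params=None, end_at='Best point'):
    """Assemble a /run settings dictionary with the keys used in this notebook."""
    settings = {'hub': hub.name,
                'state': hub.state,
                'experiment': {'name': experiment, 'params': dict(experiment_params)},
                'sampler': {'name': sampler, 'params': dict(sampler_params)},
                'process': {'type': 'model', 'end at': end_at, 'callback': None}}
    if model is not None:  # the Grid run above has no model entry
        settings['model'] = {'name': model, 'params': dict(model_params or {})}
    return settings
```

For example, make_settings(hub, 'gaussian', {...}, 'Grid', {'Steps': 20, 'Sweeps': 1}) rebuilds the dictionary above, with the same experiment parameters passed in place of `{...}`.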

Now let's submit this process via the API:

In [None]:
import requests
requests.post('http://127.0.0.1:5000/run', json=settings)

Because we specified the Grid sampler with 20 steps, the optimization was a crude brute-force search on a 20x20 grid, converging to near the peak after 400 experimental cycles. We could solve this problem more efficiently with online Gaussian process optimization:

In [7]:
hub = network.hubs['hub']
settings = {'hub': hub.name, 
            'state': hub.state}
settings['experiment'] = {'name': 'gaussian',
                          'params': {'sigma_x': 0.3, 
                                     'sigma_y': 0.8, 
                                     'x0': 0.3, 
                                     'y0': 0.6, 
                                     'noise':0, 
                                     'delay': 0.5} }
settings['sampler'] = {'name': 'Online',
                       'params': {'Presampled points': 10, 
                                  'Iterations': 5,
                                  'Batch size': 5,
                                  'Tolerance': 0.01} }
settings['model'] = {'name': 'GaussianProcess',
                     'params': {'Amplitude': 1,
                                'Length scale': 1,
                                'Noise': 0} }

settings['process'] = {'type': 'model', 'end at': 'Best point', 'callback': None}

In [8]:
import requests
requests.post('http://127.0.0.1:5000/run', json=settings)

INFO:werkzeug:127.0.0.1 - - [14/Mar/2019 17:48:31] "POST /run HTTP/1.1" 200 -


Actuating to {'X': 0.29999999697376073, 'Y': 0.6000000044122996}
Actuating to {'X': 0.8956240624937795, 'Y': 0.24067041595477456}
Actuating to {'X': 0.3356792696306593, 'Y': 0.3655153794754148}
Actuating to {'X': 0.5476707666861723, 'Y': 0.20974577427488683}
Actuating to {'X': 0.07613891050639332, 'Y': 0.5255688268215828}
Actuating to {'X': 0.003710511532121763, 'Y': 0.14677280370169055}
Actuating to {'X': 0.3659341444772364, 'Y': 0.4686807282865476}
<Response [200]>

Actuating to {'X': 0.6133663819039652, 'Y': 0.7009896339322805}
Actuating to {'X': 0.3428477963174096, 'Y': 0.8480525525734}
Actuating to {'X': 0.32956462162863054, 'Y': 0.809251407882458}
Actuating to {'X': 0.15980863136744217, 'Y': 0.5061549435078785}
Actuating to {'X': 0.04999999697376073, 'Y': 0.8500000044122996}
Actuating to {'X': 0.04999999697376073, 'Y': 0.8500000044122996}
Actuating to {'X': 0.04999999697376073, 'Y': 0.8500000044122996}
Actuating to {'X': 0.04999999697376073, 'Y': 0.8500000044122996}
Actuating to {'X': 0.04999999697376073, 'Y': 0.8500000044122996}
Actuating to {'X': 0.3015065077408921, 'Y': 0.6189902015944494}
Actuating to {'X': 0.5499999969737608, 'Y': 0.8500000044122996}
Actuating to {'X': 0.5499999969737608, 'Y': 0.8500000044122996}
Actuating to {'X': 0.22786359857835478, 'Y': 0.7153716928146138}
Actuating to {'X': 0.2556063827379436, 'Y': 0.6847744522214495}
Actuating to {'X': 0.3338082292259617, 'Y': 0.6385469100450547}
Actuating to {'X': 0.3054322944395359, 'Y': 0.61

INFO:root:Optimization complete!


Actuating to {'X': 0.301233320439634, 'Y': 0.6064074099435781}


Now the model converged within 1% tolerance much more quickly! However, we can do better than this: while Gaussian processes can model virtually any complex experimental parameter space, training them requires fitting many degrees of freedom. If we can restrict the degrees of freedom by suggesting a model that's close to the experimental landscape, we can vastly reduce the training time:
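To see why a targeted model needs so little data, note that a 2D Gaussian has only five parameters. The sketch below fits one to sampled costs with SciPy's curve_fit. The internals of the Nonlinear model and its Leash parameter are not covered here, so this only illustrates the idea under assumptions: the cost is taken to be a negated Gaussian (so the peak is the minimum), and gaussian_cost and fit_gaussian are hypothetical names.

```python
import numpy as np
from scipy.optimize import curve_fit


def gaussian_cost(xy, amplitude, x0, y0, sigma_x, sigma_y):
    """Five-parameter 2D Gaussian, negated so the peak is the cost minimum."""
    x, y = xy
    return -amplitude * np.exp(-(x - x0) ** 2 / (2 * sigma_x ** 2)
                               - (y - y0) ** 2 / (2 * sigma_y ** 2))


def fit_gaussian(points, costs):
    """Least-squares fit of gaussian_cost to sampled (point, cost) pairs."""
    xy = np.asarray(points, dtype=float).T  # shape (2, n_samples)
    p0 = [1.0, 0.5, 0.5, 0.5, 0.5]          # start at the center of the unit square
    params, _ = curve_fit(gaussian_cost, xy, np.asarray(costs, dtype=float), p0=p0)
    return dict(zip(['amplitude', 'x0', 'y0', 'sigma_x', 'sigma_y'], params))
```

With noise-free samples on a 5x5 grid of the experiment above (x0 = 0.3, y0 = 0.6), the fitted x0 and y0 should land on the true peak.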

In [10]:
hub = network.hubs['hub']
settings = {'hub': hub.name, 
            'state': hub.state}
settings['experiment'] = {'name': 'gaussian',
                          'params': {'sigma_x': 0.3, 
                                     'sigma_y': 0.8, 
                                     'x0': 0.3, 
                                     'y0': 0.6, 
                                     'noise':0, 
                                     'delay': 0.5} }
settings['sampler'] = {'name': 'Online',
                       'params': {'Presampled points': 20, 
                                  'Iterations': 5,
                                  'Batch size': 5,
                                  'Tolerance': 0.01} }
settings['model'] = {'name': 'Nonlinear',
                     'params': {'Leash': 0.25} }

settings['process'] = {'type': 'model', 'end at': 'Best point', 'callback': None}

In [11]:
import requests
requests.post('http://127.0.0.1:5000/run', json=settings)

INFO:werkzeug:127.0.0.1 - - [14/Mar/2019 17:48:55] "POST /run HTTP/1.1" 200 -


Actuating to {'X': 0.301233320439634, 'Y': 0.6064074099435781}
Actuating to {'X': 0.9041603247758776, 'Y': 0.4073759439553384}
Actuating to {'X': 0.36116187193046667, 'Y': 0.9571807173191608}
Actuating to {'X': 0.1779535247626478, 'Y': 0.4806483601322037}
<Response [200]>

Actuating to {'X': 0.1194912005164882, 'Y': 0.22683339309850337}
Actuating to {'X': 0.3507746738990313, 'Y': 0.27430274869929083}
Actuating to {'X': 0.4776350437235598, 'Y': 0.21263419072562573}
Actuating to {'X': 0.40302530204921017, 'Y': 0.8421382181951389}
Actuating to {'X': 0.6795838213208593, 'Y': 0.5418248920925846}
Actuating to {'X': 0.12862842226993076, 'Y': 0.65618612770829}
Actuating to {'X': 0.906361213301861, 'Y': 0.3359821457267923}
Actuating to {'X': 0.8223258600264983, 'Y': 0.1317107195667665}
Actuating to {'X': 0.36449963362952664, 'Y': 0.644575048400043}
Actuating to {'X': 0.15664216486104976, 'Y': 0.855396043559703}
Actuating to {'X': 0.6126022122085847, 'Y': 0.1169388122333822}
Actuating to {'X': 0.851636588877546, 'Y': 0.9837574420472629}
Actuating to {'X': 0.2133660911764962, 'Y': 0.692982868789233}
Actuating to {'X': 0.8395109327063336, 'Y': 0.9998511971998124}
Actuating to {'X': 0.9990189786050344, 'Y': 0.3767786315855468}
Actuating to {'X': 0.6681132197432502, 'Y': 0.097664

INFO:root:Optimization complete!


Actuating to {'X': 0.30000000419047324, 'Y': 0.5999999985073314}


Our targeted Gaussian model converged with fewer iterations and less computation time per training cycle than the more general Gaussian process model!

# Visualizing results
The results of the last two experiments can be accessed through the API. First, we find the process ID corresponding to the run we want to look at; these are listed in chronological order at the following endpoint:

In [None]:
import requests
samplers = requests.get('http://127.0.0.1:5000/hubs/hub/samplers').json()

Now let's plot the data acquired by the first sampler:

In [None]:
requests.get('http://127.0.0.1:5000/hubs/hub/samplers/%s/plot/data'%samplers[0])

Since we specified a model to fit in the second process, we can plot not only the raw data but also the modeled surface through the following endpoint:

In [None]:
requests.get('http://127.0.0.1:5000/hubs/hub/samplers/%s/plot/model'%samplers[1])
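Both plot requests follow the same URL pattern. A small hypothetical helper (not part of EMERGENT) that builds these URLs, quoting the hub name and sampler ID in case they contain characters that are unsafe in a URL path:

```python
from urllib.parse import quote

API = 'http://127.0.0.1:5000'


def plot_url(sampler_id, kind='data', hub='hub', api=API):
    """URL of a sampler's data or model plot (same pattern as the requests above)."""
    if kind not in ('data', 'model'):
        raise ValueError("kind must be 'data' or 'model'")
    return '%s/hubs/%s/samplers/%s/plot/%s' % (
        api, quote(hub, safe=''), quote(str(sampler_id), safe=''), kind)
```

With it, the request above becomes requests.get(plot_url(samplers[1], 'model')).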

Notice that the parameter space is resolved with decent accuracy and high resolution in the second case, despite the fact that we didn't acquire as much experimental data! If you know the functional form of your experiment's parameter space, you should write a Model and use it to generate points with the Online sampler; if not, you can always use a black-box modeler like GaussianProcess.