# A verbose walk-through of Gaussian process regression emulation for global optimization

In this notebook, we go to great lengths to demonstrate how GPR can be used for so-called "Bayesian optimization" (BO) with a very naive in-filling strategy.  Many of the steps and operations in this notebook are included only to help explain how GPR emulation works in the context of global evolutionary optimization.

We will be using the well-known Hosaki function for this analysis.

In [None]:
import os
import shutil
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import pyemu

First we need to prepare for the GPR experiments.  Let's get the original model directory and copy it:

In [None]:
org_d = os.path.join("..","autotest","utils","hosaki_template")
assert os.path.exists(org_d)

In [None]:
t_d = "hosaki_template"
if os.path.exists(t_d):
    shutil.rmtree(t_d)
shutil.copytree(org_d,t_d)

In [None]:
pst = pyemu.Pst(os.path.join(t_d,"pest.pst"))

In [None]:
par = pst.parameter_data
par

The Hosaki function has only two decision variables, so it is very simple.  In general, GPR-based emulation does not scale to extreme dimensions (like those seen in ensemble-based DA), but it sits well within the space of global (multi-objective) optimization, so it's an obvious pairing with pestpp-mou...

The Hosaki function has only one objective and no constraints, and our goal is to minimize this single objective:

In [None]:
pst.observation_data

First, let's sweep over decision variable space so that we can map the "true" objective function surface. You would never do this in practice, but it helps us understand the GPR emulation process:

In [None]:
pvals = []
#feel free to change this to a larger number (like 20-30) to get a higher resolution map
# we keep it small so that the pyemu testing on github is faster...
sweep_steps = 15
for p1 in np.linspace(par.parlbnd.iloc[0],par.parubnd.iloc[0],sweep_steps):
    for p2 in np.linspace(par.parlbnd.iloc[1],par.parubnd.iloc[1],sweep_steps):
        pvals.append([p1,p2])
pvals = pd.DataFrame(pvals,columns=pst.par_names)
pvals.to_csv(os.path.join(t_d,"sweep_in.csv"))
pst.pestpp_options["ies_par_en"] = "sweep_in.csv"
pst.control_data.noptmax = -1
sweep_d = os.path.join("hosaki_sweep")
pst.pestpp_options["ies_include_base"] = False

pst.write(os.path.join(t_d,"pest.pst"))
port = 5544
num_workers = 30

pyemu.os_utils.start_workers(t_d,"pestpp-ies","pest.pst",
                             num_workers=num_workers,
                             master_dir=sweep_d,worker_root='.',
                             port=port)


And lets load up the sweep results so we can plot:

In [None]:
sweep_pe = pd.read_csv(os.path.join(sweep_d,"pest.0.par.csv"),index_col=0)
sweep_oe = pd.read_csv(os.path.join(sweep_d,"pest.0.obs.csv"),index_col=0)
sweep_x = sweep_pe.loc[:,pst.par_names[0]].values.reshape(sweep_steps,sweep_steps)
sweep_y = sweep_pe.loc[:,pst.par_names[1]].values.reshape(sweep_steps,sweep_steps)
sweep_z = sweep_oe.loc[:,pst.obs_names[0]].values.reshape(sweep_steps,sweep_steps)
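
The `reshape` calls above only work because of how the sweep inputs were ordered. As a minimal sketch (with made-up bounds and a tiny resolution, and assuming the sweep output files keep the rows in input order), the nested-loop ordering can be checked against `np.meshgrid`:

```python
import numpy as np

# hypothetical bounds and resolution, chosen only for illustration
lb, ub, steps = 0.0, 5.0, 4

# same nested-loop ordering as the sweep: the outer loop is the first variable
pairs = [[p1, p2] for p1 in np.linspace(lb, ub, steps)
                  for p2 in np.linspace(lb, ub, steps)]
pairs = np.array(pairs)

# a row-major reshape recovers the 2D grids
grid_x = pairs[:, 0].reshape(steps, steps)
grid_y = pairs[:, 1].reshape(steps, steps)

# np.meshgrid with indexing="ij" builds the same grids directly
mesh_x, mesh_y = np.meshgrid(np.linspace(lb, ub, steps),
                             np.linspace(lb, ub, steps), indexing="ij")
same = np.allclose(grid_x, mesh_x) and np.allclose(grid_y, mesh_y)
print(same)
```

So the first decision variable is constant along each row of the reshaped arrays and the second is constant along each column, which is the layout `pcolormesh` and `contour` receive in the plotting helper.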

In [None]:
def get_obj_map(ax,_sweep_x,_sweep_y,_sweep_z,label="objective function",
                levels=[-2,-1,0,0.5],vmin=-2,vmax=0.5,cmap="magma"): 
    """a simple function to plot an objective function surface"""
    cb = ax.pcolormesh(_sweep_x,_sweep_y,_sweep_z,vmin=vmin,vmax=vmax,cmap=cmap)
    plt.colorbar(cb,ax=ax,label=label)
    ax.contour(_sweep_x,_sweep_y,_sweep_z,levels=levels,colors="w")
    
    ax.set_aspect("equal")
    return ax

In [None]:
fig,ax = plt.subplots(1,1,figsize=(6,5))
_ = get_obj_map(ax,sweep_x,sweep_y,sweep_z)
ax.set_title("truth",loc="left")
plt.show()
plt.close(fig)

So that is the true objective function surface with the global minimum at (4,2) and a local minimum at (1,2).
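
For reference, here is a small sketch of the Hosaki function in its commonly published form.  The forward run inside the template directory is not shown in this notebook, so treating it as this exact expression is an assumption; the function name `hosaki` is ours.

```python
import math

def hosaki(x1, x2):
    """Hosaki test function in its commonly published form."""
    poly = 1.0 - 8.0 * x1 + 7.0 * x1**2 - (7.0 / 3.0) * x1**3 + 0.25 * x1**4
    return poly * x2**2 * math.exp(-x2)

# objective values at the two minima quoted above
global_min = hosaki(4.0, 2.0)
local_min = hosaki(1.0, 2.0)
print(round(global_min, 4), round(local_min, 4))
```

Under that assumption, the objective is roughly -2.35 at (4,2) and roughly -1.13 at (1,2), which is consistent with the default color and contour levels used in the plotting helper.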

The GPR BO workflow starts by first evaluating an initial population with the complex and expensive-to-run "model", which here is just the simple Hosaki function.  In practice, this will be an expensive and complex process-based model.  To make this demo more interesting, we will limit the decision variable search space so that it does not include the global minimum (and really, 10 members in the initial population is pretty crazy small - just for demo!).  If we don't limit the search space and use a larger initial population, the GPR process nails it in the first go...

In [None]:
m_d = "hosaki_model_master"
if os.path.exists(m_d):
    shutil.rmtree(m_d)
pst.pestpp_options["mou_population_size"] = 10
pst.control_data.noptmax = -1
par = pst.parameter_data
par.loc[pst.par_names[0],"parubnd"] = 3
par.loc[pst.par_names[0],"parval1"] = 1.5
pst.write(os.path.join(t_d,"pest.pst"))
num_workers = 10
pyemu.os_utils.start_workers(t_d,"pestpp-mou","pest.pst",
                                 num_workers=num_workers,
                                 master_dir=m_d,worker_root='.',
                                 port=port)


Now load the initial training input-output pairs:

In [None]:
training_dvpop_fname = os.path.join(m_d,"pest.0.dv_pop.csv")
training_opop_fname = os.path.join(m_d,"pest.0.obs_pop.csv")

And we can plot them to visualize where we have training points on the true surface:

In [None]:
training_dvpop = pd.read_csv(training_dvpop_fname,index_col=0)
training_opop = pd.read_csv(training_opop_fname,index_col=0)
fig,ax = plt.subplots(1,1,figsize=(6,5))
get_obj_map(ax,sweep_x,sweep_y,sweep_z)
ax.scatter(training_dvpop.loc[:,pst.par_names[0]],training_dvpop.loc[:,pst.par_names[1]],marker='^',c='w',s=20)
ax.set_title("true objective function surface with training points")
plt.show()
plt.close(fig)
           

Those white triangles are the places where we have actually run the model - in practice we will never have the colormap or the contours - those are just to help us understand what is happening...and remember: that is a purposefully sparse training dataset!

This function is where the magic happens: we use the decision variable population and the resulting output/observation population to set up the GPR emulation process.  Note that the `pyemu.helpers.prep_for_gpr()` function sets up and trains a GPR emulator for each non-zero weighted observation in the control file...

In [None]:
def prep_for_gpr(dvpops,opops,gpr_t_d,noptmax=-1):
    # this helper does the heavy lifting.  We are including the optional GPR emulated standard deviations
    # as "observations".  These quantities are the standard deviation of the emulated objective function value
    # with respect to the GPR process - this is a unique and very powerful aspect of GPR vs other
    #data-driven techniques.  We will use these obs to visualize the uncertainty in the 
    # GPR objective function surface later...
    pyemu.helpers.prep_for_gpr(os.path.join(t_d,"pest.pst"),dvpops,opops,gpr_t_d=gpr_t_d,plot_fits=True,
                               include_emulated_std_obs=True)
    # now load the newly created gpr-based pst file:
    gpst = pyemu.Pst(os.path.join(gpr_t_d,"pest.pst"))
    #and copy the sweep input file that lets us run the full response surface
    # in practice, you don't need this either...
    shutil.copy2(os.path.join(sweep_d,"sweep_in.csv"),os.path.join(gpr_t_d,"sweep_in.csv"))
    # some bits and bobs:
    gpst.control_data.noptmax = noptmax
    gpst.pestpp_options["ies_include_base"] = False
    gpst.pestpp_options["mou_save_population_every"] = 1
    gpst.pestpp_options.pop("mou_dv_population_file",None)
    par = gpst.parameter_data
    par.loc[:,"parlbnd"] = 0.0
    par.loc[:,"parubnd"] = 5.0
    par.loc[:,"parval1"] = 2.5    
    
    gpst.write(os.path.join(gpr_t_d,"pest.pst"),version=2)
    return gpst

One important consideration here: in this simple demo, we only have one objective.  That means we have to be careful not to drive the emulator to a single best-fit solution (which we avoid by using only a few generations when we run mou), and we also need to be sure that we start the emulator-driven mou runs with a fully dispersed initial population.  But!  In a multi-objective run, the final generation for mou is naturally dispersed because of the Pareto frontier search, so in the multi-objective setting, we probably want to start the emulated mou run with the existing population to keep us from having to re-search all of decision variable space for the nondominated solutions.  This isn't a hard-and-fast rule, but it seems to be generally applicable.

The GPR helper in pyemu accepts a list of input and output filenames to make it easier to repeatedly retrain the GPR emulators:

In [None]:
training_dvpop_fnames = [training_dvpop_fname]
training_opop_fnames = [training_opop_fname]

In [None]:
gpr_t_d = t_d + "_gpr"
gpst = prep_for_gpr(training_dvpop_fnames,training_opop_fnames,gpr_t_d)

In [None]:
gpst.observation_data

We see the original "sim" observation (which is the objective function we are trying to minimize), but we also see the optional GPR-estimated standard deviation.

Let's see what was created in the new GPR-based template directory:

In [None]:
os.listdir(gpr_t_d)

Two very important kinds of files are in that directory: the new forward run python script and a series of pickle files, one per objective and one per active constraint in the optimization problem.  Essentially, we need to build a GPR-based emulator for each output that is relevant to the optimization problem.  We also don't want to rebuild these emulators every time we run the model, so we store the trained GPR emulators in pickle files and load them up as needed:

In [None]:
with open(os.path.join(gpr_t_d,"forward_run.py"),'r') as f:
    for line in f:
        print(line,end="")

So we simply loop over all relevant model outputs that have a GPR emulator and "emulate" the value of the model output given the current decision variable values.  Easy as!

Now, just for learning, let's sweep over decision variable space but evaluate the GPR emulated objective function value.  You would never do this in practice, but it can be informative:

In [None]:
gpr_sweep_d = sweep_d+"_gpr"
num_workers = 30
pyemu.os_utils.start_workers(gpr_t_d,"pestpp-ies","pest.pst",
                             num_workers=num_workers,
                             master_dir=gpr_sweep_d,worker_root='.',
                             port=port)


Just a helper function to visualize the results of the GPR (re-)training:

In [None]:
def plot_gpr_sweep_results(_gpr_sweep_d,_training_dvpop):
    #load the gpr sweep results to viz the emulated objective function surface
    sweep_gpr_pe = pd.read_csv(os.path.join(_gpr_sweep_d,"pest.0.par.csv"),index_col=0)
    sweep_gpr_oe = pd.read_csv(os.path.join(_gpr_sweep_d,"pest.0.obs.csv"),index_col=0)
    gpr_sweep_x = sweep_gpr_pe.loc[:,gpst.par_names[0]].values.reshape(sweep_steps,sweep_steps)
    gpr_sweep_y = sweep_gpr_pe.loc[:,gpst.par_names[1]].values.reshape(sweep_steps,sweep_steps)
    gpr_sweep_z = sweep_gpr_oe.loc[:,gpst.obs_names[0]].values.reshape(sweep_steps,sweep_steps)
    gpr_sweep_stdev_z = sweep_gpr_oe.loc[:,gpst.obs_names[1]].values.reshape(sweep_steps,sweep_steps)
    
    # plot it up:
    fig, axes = plt.subplots(2,2,figsize=(10,8))
    axes = axes.flatten()
    get_obj_map(axes[0],sweep_x,sweep_y,sweep_z)
    get_obj_map(axes[1],gpr_sweep_x,gpr_sweep_y,gpr_sweep_z)
    axes[1].scatter(_training_dvpop.loc[:,pst.par_names[0]],_training_dvpop.loc[:,pst.par_names[1]],marker='^',c='w',s=20)
    diff = sweep_z-gpr_sweep_z
    amax = np.abs(diff).max()
    get_obj_map(axes[2],gpr_sweep_x,gpr_sweep_y,diff,label="truth minus emulated",levels=None,vmin=-amax,vmax=amax,cmap="bwr")
    get_obj_map(axes[3],gpr_sweep_x,gpr_sweep_y,gpr_sweep_stdev_z,label="GPR stdev",levels=None,
                vmin=gpr_sweep_stdev_z.min(),vmax=gpr_sweep_stdev_z.max(),cmap="jet")
    # use the training points passed to this function rather than the notebook-level variable
    axes[2].scatter(_training_dvpop.loc[:,pst.par_names[0]],_training_dvpop.loc[:,pst.par_names[1]],marker='^',c='w',s=20)
    
    axes[0].set_title("truth",loc="left")
    axes[1].set_title("emulated with training points",loc="left")
    axes[2].set_title("difference with training points",loc="left")
    axes[3].set_title("GPR standard deviation",loc="left")
    plt.show()

In [None]:
plot_gpr_sweep_results(gpr_sweep_d,training_dvpop)

OK!  Now we can see what's happening: the emulated objective function surface is strongly controlled by the location of the training data points and, in this case, it's not a good representation of the truth surface...yet.  It should also be clear that the uncertainty in the GPR emulation is lowest near the training points and grows as we move away from them - just like geostatistics!
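
To make the behavior of the standard deviation panel concrete, here is a minimal numpy sketch of a zero-mean Gaussian process with a fixed squared-exponential kernel.  This is not how pyemu builds or trains its emulators; the kernel, the length scale, the training data, and the function names `rbf_kernel` and `gp_predict` are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential covariance between two sets of points."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-0.5 * sq_dist / length_scale**2)

def gp_predict(x_train, y_train, x_new, length_scale=1.0, jitter=1e-8):
    """Posterior mean and standard deviation of a zero-mean GP."""
    k_tt = rbf_kernel(x_train, x_train, length_scale) + jitter * np.eye(len(x_train))
    k_nt = rbf_kernel(x_new, x_train, length_scale)
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_nt @ alpha
    v = np.linalg.solve(chol, k_nt.T)
    var = 1.0 - (v**2).sum(axis=0)  # the prior variance is 1 for this kernel
    return mean, np.sqrt(np.clip(var, 0.0, None))

# made-up training points in a 2D decision variable space
x_train = np.array([[1.0, 1.0], [2.0, 3.0], [3.0, 1.5]])
y_train = np.array([-0.5, -1.0, 0.2])

x_new = np.array([[1.0, 1.0],    # exactly on a training point
                  [1.2, 1.1],    # close to a training point
                  [4.8, 4.8]])   # far from all training points
mean, std = gp_predict(x_train, y_train, x_new)
print(std)
```

The standard deviation is essentially zero on the training point, small right next to it, and close to the prior value of 1 far away from all training data, which is the same pattern seen in the GPR standard deviation map.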

But now let's run pestpp-mou on the GPR emulated model.  This is usually quite fast, especially if the process model that is being emulated takes more than a few minutes to run...

In [None]:
gpst.control_data.noptmax = 8
gpst.pestpp_options["mou_population_size"] = 20
gpst.write(os.path.join(gpr_t_d,"pest.pst"))
gpr_m_d = gpr_t_d.replace("template","master")
num_workers = 20
pyemu.os_utils.start_workers(gpr_t_d,"pestpp-mou","pest.pst",
                             num_workers=num_workers,
                             master_dir=gpr_m_d,worker_root='.',
                             port=port)

Now let's plot how the pestpp-mou population evolves across the emulated surface, but also plot it on the true surface just to help us understand what is happening:

In [None]:
def plot_gpr_mou_results(_gpr_m_d,_gpr_sweep_d):
    sweep_gpr_pe = pd.read_csv(os.path.join(_gpr_sweep_d,"pest.0.par.csv"),index_col=0)
    sweep_gpr_oe = pd.read_csv(os.path.join(_gpr_sweep_d,"pest.0.obs.csv"),index_col=0)
    gpr_sweep_x = sweep_gpr_pe.loc[:,gpst.par_names[0]].values.reshape(sweep_steps,sweep_steps)
    gpr_sweep_y = sweep_gpr_pe.loc[:,gpst.par_names[1]].values.reshape(sweep_steps,sweep_steps)
    gpr_sweep_z = sweep_gpr_oe.loc[:,gpst.obs_names[0]].values.reshape(sweep_steps,sweep_steps)
    gpr_sweep_stdev_z = sweep_gpr_oe.loc[:,gpst.obs_names[1]].values.reshape(sweep_steps,sweep_steps)
    
    gpr_dvpops = [os.path.join(_gpr_m_d,f) for f in os.listdir(_gpr_m_d) if len(f.split('.')) == 4 and f.endswith("dv_pop.csv") and "archive" not in f]
    gpr_dvpops_itr = [int(f.split(".")[1]) for f in gpr_dvpops]
    gpr_dvpops = {itr:pd.read_csv(f,index_col=0) for itr,f in zip(gpr_dvpops_itr,gpr_dvpops)}
    # loop over every saved generation, including the final one
    for itr in sorted(gpr_dvpops):
        fig,axes = plt.subplots(1,2,figsize=(10,4))
        ax = axes[0]
        get_obj_map(ax,sweep_x,sweep_y,sweep_z)
        ax.scatter(gpr_dvpops[itr].loc[:,gpst.par_names[0]],gpr_dvpops[itr].loc[:,gpst.par_names[1]],marker='.',c='w',s=10)
        ax.set_title("truth generation {0}".format(itr),loc="left")
        ax = axes[1]
        get_obj_map(ax,gpr_sweep_x,gpr_sweep_y,gpr_sweep_z)
        ax.scatter(gpr_dvpops[itr].loc[:,gpst.par_names[0]],gpr_dvpops[itr].loc[:,gpst.par_names[1]],marker='.',c='w',s=10)
        ax.set_title("emulated generation {0}".format(itr),loc="left")
        plt.show()
        plt.close(fig)

In [None]:
plot_gpr_mou_results(gpr_m_d,gpr_sweep_d)

So just what you expected?  Essentially, pestpp-mou converged to the minimum of the objective function we gave it, which is the emulated objective function...at this stage, the emulated objective function is a poor representation of the truth objective function...

Now this is where some more cleverness happens: let's take that last emulated decision variable population and actually run it through the complex "model" (which in this case is just the Hosaki function...).  This is so that we can "in-fill" our GPR emulator with these new points in decision variable space.  In practice, a lot more cleverness needs to happen to actually decide which points to use, but for this lil demo, it works...
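
As one hedged illustration of what that extra cleverness could look like, the sketch below ranks candidate in-fill points by expected improvement, a standard acquisition function in Bayesian optimization that uses both the emulated value and the emulated standard deviation.  This notebook does not use it (it simply reuses the final emulated population), and the candidate values and names here are made up.

```python
import math

def expected_improvement(mean, std, best_so_far):
    """Expected improvement for a minimization problem."""
    improvement = best_so_far - mean
    if std <= 0.0:
        # no uncertainty: the improvement is known exactly
        return max(improvement, 0.0)
    z = improvement / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return improvement * cdf + std * pdf

# hypothetical candidates: (emulated objective, emulated standard deviation)
candidates = {
    "near_training_point": (-1.10, 0.01),
    "unexplored_region": (-0.90, 0.60),
}
best_so_far = -1.12  # lowest objective already evaluated with the real model

scores = {name: expected_improvement(m, s, best_so_far)
          for name, (m, s) in candidates.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

With these made-up numbers, the uncertain point in the unexplored region ranks ahead of the point that sits next to existing training data, even though its emulated objective is worse.  That trade-off between exploiting the emulator and exploring where it is uncertain is what a purely value-based in-fill ignores.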

In [None]:
gpr_dvpops = [os.path.join(gpr_m_d,f) for f in os.listdir(gpr_m_d) if len(f.split('.')) == 4 and f.endswith("dv_pop.csv") and "archive" not in f]
gpr_dvpops_itr = [int(f.split(".")[1]) for f in gpr_dvpops]
gpr_dvpops = {itr:pd.read_csv(f,index_col=0) for itr,f in zip(gpr_dvpops_itr,gpr_dvpops)}
gpr_dvpops[max(gpr_dvpops_itr)].to_csv(os.path.join(t_d,"retrain_1_dvpop.csv"))
pst.pestpp_options["mou_dv_population_file"] = "retrain_1_dvpop.csv"
pst.control_data.noptmax = -1
pst.write(os.path.join(t_d,"pest.pst"))


In [None]:
m_d += "_1"
num_workers = 10
pyemu.os_utils.start_workers(t_d,"pestpp-mou","pest.pst",
                                 num_workers=num_workers,
                                 master_dir=m_d,worker_root='.',
                                 port=port)

Now load the newly evaluated training points:

In [None]:
training_dvpop_fname1 = os.path.join(m_d,"pest.0.dv_pop.csv")
training_opop_fname1 = os.path.join(m_d,"pest.0.obs_pop.csv")

And combine these new points with the points we already had (the original few places where we ran the model at the beginning):

In [None]:
training_dvpop1 = pd.read_csv(training_dvpop_fname1,index_col=0)
training_opop1 = pd.read_csv(training_opop_fname1,index_col=0)
training_dvpop = pd.concat([training_dvpop,training_dvpop1])
training_opop = pd.concat([training_opop,training_opop1])

fig,ax = plt.subplots(1,1,figsize=(6,5))
get_obj_map(ax,sweep_x,sweep_y,sweep_z)
ax.scatter(training_dvpop.loc[:,pst.par_names[0]],training_dvpop.loc[:,pst.par_names[1]],marker='^',c='w',s=20)
plt.show()
plt.close(fig)

See how we have combined the original points and the new points?

Now let's re-do the GPR training process with this combined decision variable and output/observation population:

In [None]:
training_dvpop_fnames.append(training_dvpop_fname1)
training_opop_fnames.append(training_opop_fname1)
gpr_t_d = t_d + "_gpr1"
gpst = prep_for_gpr(training_dvpop_fnames,training_opop_fnames,gpr_t_d)

Ok, now let's also re-sweep over the emulated surface to see how much better our approximation to the true objective function surface is (remember, don't do this in practice!  We are just doing it here to help build understanding):

In [None]:
gpr_sweep_d = sweep_d+"_gpr1"
num_workers = 30
pyemu.os_utils.start_workers(gpr_t_d,"pestpp-ies","pest.pst",
                             num_workers=num_workers,
                             master_dir=gpr_sweep_d,worker_root='.',
                             port=port)

And now visualize these new surfaces with the current training dataset:

In [None]:
plot_gpr_sweep_results(gpr_sweep_d,training_dvpop)

Comparing this to the original emulated surface, we can see that our emulated objective function surface has improved dramatically!

Now let's run pestpp-mou on this improved emulated model and visualize how pestpp-mou navigates this new emulated surface:

In [None]:
gpst.control_data.noptmax = 8
gpst.pestpp_options["mou_population_size"] = 20
gpst.write(os.path.join(gpr_t_d,"pest.pst"))
gpr_m_d = gpr_t_d.replace("template","master")
num_workers = 20
pyemu.os_utils.start_workers(gpr_t_d,"pestpp-mou","pest.pst",
                             num_workers=num_workers,
                             master_dir=gpr_m_d,worker_root='.',
                             port=port)
plot_gpr_mou_results(gpr_m_d,gpr_sweep_d)
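
The evaluate, retrain, optimize, in-fill cycle carried out so far can be condensed into a toy loop.  The sketch below is only an analogy and rests on several assumptions: it uses the standard form of the Hosaki function, a fixed-kernel GP mean instead of the emulators trained by pyemu, and a random candidate search as a stand-in for pestpp-mou.  All function names are ours.

```python
import numpy as np

def hosaki(x):
    """Hosaki function (standard form) for an array of (x1, x2) rows."""
    x1, x2 = x[:, 0], x[:, 1]
    poly = 1 - 8 * x1 + 7 * x1**2 - (7 / 3) * x1**3 + 0.25 * x1**4
    return poly * x2**2 * np.exp(-x2)

def gp_mean(x_train, y_train, x_new, length_scale=1.0, jitter=1e-4):
    """Posterior mean of a GP with a fixed squared-exponential kernel."""
    def kern(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-0.5 * d2 / length_scale**2)
    k_tt = kern(x_train, x_train) + jitter * np.eye(len(x_train))
    offset = y_train.mean()
    weights = np.linalg.solve(k_tt, y_train - offset)
    return kern(x_new, x_train) @ weights + offset

rng = np.random.default_rng(0)
lower, upper = np.array([0.0, 0.0]), np.array([5.0, 5.0])

# initial "expensive" evaluations: 10 members with x1 limited to [0, 3]
x_train = rng.uniform(lower, np.array([3.0, 5.0]), size=(10, 2))
y_train = hosaki(x_train)
history = [y_train.min()]

for cycle in range(4):
    # stand-in for the emulator-based optimizer: score random candidates
    candidates = rng.uniform(lower, upper, size=(2000, 2))
    emulated = gp_mean(x_train, y_train, candidates)
    # naive in-fill: keep the 20 candidates with the lowest emulated objective
    infill = candidates[np.argsort(emulated)[:20]]
    # evaluate them with the real function and append to the training set
    x_train = np.vstack([x_train, infill])
    y_train = np.concatenate([y_train, hosaki(infill)])
    history.append(y_train.min())

print(len(x_train), history)
```

The best true objective value in `history` can only stay the same or improve from one cycle to the next, because every in-fill point is kept in the training set.  Whether the toy loop actually reaches the global minimum depends on the seed and the fixed kernel settings.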

Now let's do this whole process a few times:

In [None]:
i = 2
gpr_t_d = t_d + "_gpr{0}".format(i-1)
gpr_m_d = gpr_t_d.replace("template","master")
gpr_dvpops = [os.path.join(gpr_m_d,f) for f in os.listdir(gpr_m_d) if len(f.split('.')) == 4 and f.endswith("dv_pop.csv") and "archive" not in f]
gpr_dvpops_itr = [int(f.split(".")[1]) for f in gpr_dvpops]
gpr_dvpops = {itr:pd.read_csv(f,index_col=0) for itr,f in zip(gpr_dvpops_itr,gpr_dvpops)}
gpr_dvpops[max(gpr_dvpops_itr)].to_csv(os.path.join(t_d,"retrain_{0}_dvpop.csv".format(i)))
pst.pestpp_options["mou_dv_population_file"] = "retrain_{0}_dvpop.csv".format(i)
pst.control_data.noptmax = -1
pst.write(os.path.join(t_d,"pest.pst"))
m_d = "hosaki_model_{0}".format(i)
num_workers = 10
pyemu.os_utils.start_workers(t_d,"pestpp-mou","pest.pst",
                                 num_workers=num_workers,
                                 master_dir=m_d,worker_root='.',
                                 port=port)

training_dvpop_fnamei = os.path.join(m_d,"pest.0.dv_pop.csv")
training_opop_fnamei = os.path.join(m_d,"pest.0.obs_pop.csv")

training_dvpopi = pd.read_csv(training_dvpop_fnamei,index_col=0)
training_opopi = pd.read_csv(training_opop_fnamei,index_col=0)
training_dvpop = pd.concat([training_dvpop,training_dvpopi])
training_opop = pd.concat([training_opop,training_opopi])

fig,ax = plt.subplots(1,1,figsize=(6,5))
get_obj_map(ax,sweep_x,sweep_y,sweep_z)
ax.scatter(training_dvpop.loc[:,pst.par_names[0]],training_dvpop.loc[:,pst.par_names[1]],marker='^',c='w',s=20)
plt.show()
plt.close(fig)

training_dvpop_fnames.append(training_dvpop_fnamei)
training_opop_fnames.append(training_opop_fnamei)

gpr_t_d = t_d + "_gpr{0}".format(i)
gpr_m_d = gpr_t_d.replace("template","master")
gpst = prep_for_gpr(training_dvpop_fnames,training_opop_fnames,gpr_t_d)
gpr_sweep_d = sweep_d+"_gpr{0}".format(i)
num_workers = 30
pyemu.os_utils.start_workers(gpr_t_d,"pestpp-ies","pest.pst",
                             num_workers=num_workers,
                             master_dir=gpr_sweep_d,worker_root='.',
                             port=port)

plot_gpr_sweep_results(gpr_sweep_d,training_dvpop)



In [None]:
gpst.control_data.noptmax = 8
gpst.pestpp_options["mou_population_size"] = 20
gpst.write(os.path.join(gpr_t_d,"pest.pst"))

num_workers = 20
pyemu.os_utils.start_workers(gpr_t_d,"pestpp-mou","pest.pst",
                             num_workers=num_workers,
                             master_dir=gpr_m_d,worker_root='.',
                             port=port)



plot_gpr_mou_results(gpr_m_d,gpr_sweep_d)

In [None]:
i = 3
gpr_t_d = t_d + "_gpr{0}".format(i-1)
gpr_m_d = gpr_t_d.replace("template","master")
gpr_dvpops = [os.path.join(gpr_m_d,f) for f in os.listdir(gpr_m_d) if len(f.split('.')) == 4 and f.endswith("dv_pop.csv") and "archive" not in f]
gpr_dvpops_itr = [int(f.split(".")[1]) for f in gpr_dvpops]
gpr_dvpops = {itr:pd.read_csv(f,index_col=0) for itr,f in zip(gpr_dvpops_itr,gpr_dvpops)}
gpr_dvpops[max(gpr_dvpops_itr)].to_csv(os.path.join(t_d,"retrain_{0}_dvpop.csv".format(i)))
pst.pestpp_options["mou_dv_population_file"] = "retrain_{0}_dvpop.csv".format(i)
pst.control_data.noptmax = -1
pst.write(os.path.join(t_d,"pest.pst"))
m_d = "hosaki_model_{0}".format(i)
num_workers = 10
pyemu.os_utils.start_workers(t_d,"pestpp-mou","pest.pst",
                                 num_workers=num_workers,
                                 master_dir=m_d,worker_root='.',
                                 port=port)

training_dvpop_fnamei = os.path.join(m_d,"pest.0.dv_pop.csv")
training_opop_fnamei = os.path.join(m_d,"pest.0.obs_pop.csv")

training_dvpopi = pd.read_csv(training_dvpop_fnamei,index_col=0)
training_opopi = pd.read_csv(training_opop_fnamei,index_col=0)
training_dvpop = pd.concat([training_dvpop,training_dvpopi])
training_opop = pd.concat([training_opop,training_opopi])

fig,ax = plt.subplots(1,1,figsize=(6,5))
get_obj_map(ax,sweep_x,sweep_y,sweep_z)
ax.scatter(training_dvpop.loc[:,pst.par_names[0]],training_dvpop.loc[:,pst.par_names[1]],marker='^',c='w',s=20)
plt.show()
plt.close(fig)

training_dvpop_fnames.append(training_dvpop_fnamei)
training_opop_fnames.append(training_opop_fnamei)
gpr_t_d = t_d + "_gpr{0}".format(i)
gpr_m_d = gpr_t_d.replace("template","master")

gpst = prep_for_gpr(training_dvpop_fnames,training_opop_fnames,gpr_t_d)
gpr_sweep_d = sweep_d+"_gpr{0}".format(i)
num_workers = 30
pyemu.os_utils.start_workers(gpr_t_d,"pestpp-ies","pest.pst",
                             num_workers=num_workers,
                             master_dir=gpr_sweep_d,worker_root='.',
                             port=port)

plot_gpr_sweep_results(gpr_sweep_d,training_dvpop)



In [None]:
gpst.control_data.noptmax = 8
gpst.pestpp_options["mou_population_size"] = 20
gpst.write(os.path.join(gpr_t_d,"pest.pst"))

num_workers = 20
pyemu.os_utils.start_workers(gpr_t_d,"pestpp-mou","pest.pst",
                             num_workers=num_workers,
                             master_dir=gpr_m_d,worker_root='.',
                             port=port)



plot_gpr_mou_results(gpr_m_d,gpr_sweep_d)

And one last time...and this time with more emulator-based pestpp-mou generations to polish things off, since we no longer want to keep a dispersed population (we aren't doing any more in-filling and retraining):

In [None]:
i = 4
gpr_t_d = t_d + "_gpr{0}".format(i-1)
gpr_m_d = gpr_t_d.replace("template","master")
gpr_dvpops = [os.path.join(gpr_m_d,f) for f in os.listdir(gpr_m_d) if len(f.split('.')) == 4 and f.endswith("dv_pop.csv") and "archive" not in f]
gpr_dvpops_itr = [int(f.split(".")[1]) for f in gpr_dvpops]
gpr_dvpops = {itr:pd.read_csv(f,index_col=0) for itr,f in zip(gpr_dvpops_itr,gpr_dvpops)}
gpr_dvpops[max(gpr_dvpops_itr)].to_csv(os.path.join(t_d,"retrain_{0}_dvpop.csv".format(i)))
pst.pestpp_options["mou_dv_population_file"] = "retrain_{0}_dvpop.csv".format(i)
pst.control_data.noptmax = -1
pst.write(os.path.join(t_d,"pest.pst"))
m_d = "hosaki_model_{0}".format(i)
num_workers = 10
pyemu.os_utils.start_workers(t_d,"pestpp-mou","pest.pst",
                                 num_workers=num_workers,
                                 master_dir=m_d,worker_root='.',
                                 port=port)

training_dvpop_fnamei = os.path.join(m_d,"pest.0.dv_pop.csv")
training_opop_fnamei = os.path.join(m_d,"pest.0.obs_pop.csv")

training_dvpopi = pd.read_csv(training_dvpop_fnamei,index_col=0)
training_opopi = pd.read_csv(training_opop_fnamei,index_col=0)
training_dvpop = pd.concat([training_dvpop,training_dvpopi])
training_opop = pd.concat([training_opop,training_opopi])

fig,ax = plt.subplots(1,1,figsize=(6,5))
get_obj_map(ax,sweep_x,sweep_y,sweep_z)
ax.scatter(training_dvpop.loc[:,pst.par_names[0]],training_dvpop.loc[:,pst.par_names[1]],marker='^',c='w',s=20)
plt.show()
plt.close(fig)

training_dvpop_fnames.append(training_dvpop_fnamei)
training_opop_fnames.append(training_opop_fnamei)
gpr_t_d = t_d + "_gpr{0}".format(i)
gpr_m_d = gpr_t_d.replace("template","master")
gpst = prep_for_gpr(training_dvpop_fnames,training_opop_fnames,gpr_t_d)
gpr_sweep_d = sweep_d+"_gpr{0}".format(i)
num_workers = 30
pyemu.os_utils.start_workers(gpr_t_d,"pestpp-ies","pest.pst",
                             num_workers=num_workers,
                             master_dir=gpr_sweep_d,worker_root='.',
                             port=port)

plot_gpr_sweep_results(gpr_sweep_d,training_dvpop)



In [None]:
gpst.control_data.noptmax = 15
gpst.pestpp_options["mou_population_size"] = 20
gpst.write(os.path.join(gpr_t_d,"pest.pst"))

num_workers = 20
pyemu.os_utils.start_workers(gpr_t_d,"pestpp-mou","pest.pst",
                             num_workers=num_workers,
                             master_dir=gpr_m_d,worker_root='.',
                             port=port)

plot_gpr_mou_results(gpr_m_d,gpr_sweep_d)

There you have it - after evaluating only a few populations with the complex process-based model (here, that was only the Hosaki function), we can effectively optimize a function using the otherwise very expensive global solvers in pestpp-mou.  In general, this approach can reduce the computational demand of mou by 10X to 100X, which is pretty amazing...
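
As a rough back-of-the-envelope check for this notebook, we can count the runs using the settings from the cells above.  Two assumptions are baked in: each in-fill evaluation runs the 20 members read from the population file, and each emulated pestpp-mou run costs one population-sized batch for the initial population plus one per generation.  The diagnostic sweeps are left out because they would not exist in practice.

```python
# settings read off the cells in this notebook
initial_population = 10          # first expensive evaluation
infill_population = 20           # each of the four in-fill evaluations
n_retrain = 4
emulated_generations = [8, 8, 8, 8, 15]
emulated_population = 20

expensive_runs = initial_population + n_retrain * infill_population

# assumption: every emulated run evaluates the initial population plus one
# population-sized batch of offspring per generation
emulated_runs = sum(emulated_population * (1 + g) for g in emulated_generations)

print(expensive_runs, emulated_runs, round(emulated_runs / expensive_runs, 1))
# prints: 90 1040 11.6
```

Under those assumptions, the emulator absorbed about 11.6 times as many runs as the "real" model had to perform, which sits at the lower end of the 10X to 100X range quoted above.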