# Find the starting lattice parameters using MILK

Until now in this tutorial we have focused on a general problem.
Now we will apply what we have done to a Rietveld analysis.

The key concepts you should gain from this page:

 * How to create a surrogate model of an expensive cost function and use the surrogate model to guide the optimization.
 
**Note, you are going to need the data and ``template.par`` from the [MILK XRD tutorial](https://github.com/lanl/MILK/wiki/XRD-Tutorial).
If running from the notebook, you will need this data directory and parameter file in the same location as you run the notebook.**

## Define cost function using MILK

In our prior problems, we defined a cost function to optimize which was a simple Python function.
For using a program like MAUD which writes files to disk we need to make sure each optimizer has its own directory set up for itself.
Otherwise different optimizers would be overwriting each other in the same directory.

Therefore, we are going to create a slightly more complicated cost function using the ``AbstractFunction`` class from Mystic.
The analagous function to our prior cost functions is ``CostFunction.function`` below.
Our ``CostFunction.function`` calls ``CostFunction.initialize`` and ``CostFunction.cost_function``.

MILK will want to setup a directory with files that are read by MAUD.
This is only called once at the beginning.
Therefore, there is an ``CostFunciton.initialize`` function which is called once at the very beginning of an optimization.
This ``initialize`` simply sets up the directory (e.g. copy needed files).
We name the directory after the processor's name.

Then there is the ``CostFunction.cost_function`` which sets the lattice parameters using MILK, does a refinement step, and returns the R-factor.
This is called for every parameter set in the optimization.

In [1]:
import MILK
import multiprocess
import os
import shutil
import time
from mystic import models
from spotlight import filesystem

class CostFunction(models.AbstractFunction):

    def __init__(self, ndim=None):
        super().__init__(ndim=ndim)
        self.initialized = False

    def reset(self):
        self.initialized = False

    def function(self, p):
        if not self.initialized:
            self.initialize()
            self.initialized = True
        return self.cost_function(p)

    def initialize(self):

        # get editor and maudText
        self.editor = MILK.parameterEditor.editor()
        self.editor.parseConfig(config)
        self.maudText = MILK.maud.maudText()
        self.maudText.parseConfig(config)

        # set data file
        data_files = [["PWDR_ff_004768_sum_norm_dc_0_tif_A352_fxye_Bank_1.chi"]]

        # set run dir based on process name
        self.editor.run_dirs = f"tmp_{multiprocess.current_process().name}"
        self.maudText.run_dirs = self.editor.run_dirs

        # create run dir
        if os.path.exists(self.editor.run_dirs):
            shutil.rmtree(self.editor.run_dirs)
        filesystem.mkdir(self.editor.run_dirs)
        filesystem.cp([self.editor.ifile], dest=self.editor.run_dirs)

        # set initial phase fractions
        self.editor.set_val(key='_pd_phase_atom_', value='0.33')
        self.editor.free(key='_pd_phase_atom_', wild=[0])

    def cost_function(self, p):

        t0 = time.time()

        # set lattice parameters as values from optimization
        self.editor.set_val(key='cell_length_a', sobj="alpha", value=str(p[0]))
        self.editor.set_val(key='cell_length_c', sobj="alpha", value=str(p[1]))
        self.editor.set_val(key='cell_length_a', sobj="steel", value=str(p[2]))
        self.editor.set_val(key='cell_length_c', sobj="beta", value=str(p[3]))

        # refine
        self.maudText.refinement(itr='1', ifile=self.editor.ifile, ofile=self.editor.ofile, simple_call=True)

        # get the statistic to return to the optimizer
        self.editor.get_val(key="_refine_ls_wR_factor_all")
        stat = float(self.editor.value[0])

        print(f"Our R-factor is {stat} and it took {time.time() - t0}s to compute")

        return stat

## Configure MILK

We will also need to configure MILK for our refinement.
The following is an example of setting MILK to use the [MILK XRD Tutorial dataset](https://github.com/lanl/MILK/wiki/XRD-Tutorial).
For more documentation on the settings of these parameters, see the MILK tutorial.

In [2]:
# set dataset configuration
dataset_config = {"2theta": [3.4, 11.1],
                  "data_dir": "data/",
                  "data_ext": ".chi",
                  "data_group_size": 1,
                  "template_name": "template.par"}

# set MILK configuration
config = {"folders": {"work_dir": "",
                      "run_dirs": "ack(wild)",
                      "sub_dir": "",
                      "wild": [0],
                      "wild_range": [[]],
          },
          "compute": {"maud_path": "",
                      "n_maud": 1,
                      "java_opt": "Xmx8G",
                      "clean_old_step_data": False,
                      "cur_step": 1,
                      "log_consol": False,
          },
          "ins": {"riet_analysis_file": "template.par",
                  "riet_analysis_fileToSave": "output.par",
                  "section_title": "Ti64_test_data",
                  "analysis_iteration_number": 4,
                  "LCLS2_detector_config_file": "",
                  "LCLS2_Cspad0_original_image": "",
                  "LCLS2_Cspad0_dark_image": "",
                  "output_plot2D_filename": "plot_",
                  "output_summed_data_filename": "all_spectra",
                  "maud_output_plot_filename": "plot1d_",
                  "output_PF_filename": "PF_",
                  "output_PF": "",
                  "append_simple_result_to": "tmp_simple_results.txt",
                  "append_result_to": "tmp_results.txt",
                  "import_phase": [],
                  "ins_file_name": "MAUDText.ins",
                  "maud_remove_all_datafiles": True,
                  "verbose": 0,
          },
          "interface": {"verbose": 0,
          },
}

## Finding the starting lattice parameters of a Rietveld analysis by optimizing a surrogate model

In the prior page of the tutorial, we went over how to find the global minimum of using a surrogate model.
Recall, that we constructed an interpolated model using the ``InterpModel`` and then minimized until our surrogate and cost function reached <0.001 difference.

Applying this to Rietveld analysis, the global minimum is the best fit parameters for our data.
Below, we follow the same pattern using ``InterpModel`` to create an interpolated surrogate model.
In the ``while`` loop we continuously update the surrogate model, find the minimum of the surrogate model, then evaluate that minimum using our actual cost function (``CostFunction.function`` above) using MILK, and continue until we are below a threshold of agreement between the surrogate model and the actual cost function.

Recall, we introduced this approach because we were able to find a minimum with fewer calls to the actual cost function.
In this case, a call to MILK may take several seconds and in an optimization algorithm calling that hundreds or thousands of times, the time can be hours.

**Note, in the script below, update the ``template.par`` path to where you have created it in the MILK tutorial.**

In [3]:
from mystic import tools
from mystic.solvers import diffev2
from mystic.math.legacydata import dataset, datapoint
from spotlight.bridge.ouq_models import WrapModel
from spotlight.bridge.ouq_models import InterpModel

# set random seed so we can reproduce results
tools.random_seed(0)

# set bounds for parameters to be +/-5%
target = [2.9306538, 4.6817646, 3.6026807, 3.233392]
lower_bounds = [x * 0.95 for x in target]
upper_bounds = [x * 1.05 for x in target]

# remove prior cached results
if os.path.exists("tmp"):
    shutil.rmtree("tmp")

# copy template file to current working dir
# note copy the template.par file from whatever you have created it
filesystem.cp(["/Users/cmbiwer/src/spotlight/docs/notebooks/template.par"], dest=".")
        
# generate a sampled dataset for the model
truth = WrapModel("tmp", CostFunction(4), nx=4, ny=None, cached=False)
bounds = list(zip(lower_bounds, upper_bounds))
data = truth.sample(bounds, pts=[2, 1, 1, 1])

# create surrogate model
surrogate = InterpModel("surrogate", nx=4, ny=None, data=truth, smooth=0.0, noise=0.0,
                        method="thin_plate", extrap=False)

# go until error < 1e-3
error = float("inf")
sign = 1.0
while error > 1e-3:

    # fit surrogate data
    surrogate.fit(data=data)

    # find minimum/maximum of surrogate
    results = diffev2(lambda x: sign * surrogate(x), bounds, npop=20,
                      bounds=bounds, gtol=500, full_output=True)

    # get minimum/maximum of actual expensive model
    xnew = results[0].tolist()
    ynew = truth(xnew)

    # compute error which is actual model value - surrogate model value
    ysur = results[1]
    error = abs(ynew - ysur)

    # print statements
    print("truth", xnew, ynew)
    print("surrogate", xnew, ysur)
    print("error", ynew - ysur, error)
    print("data", len(data))

    # add latest evaulated point with actual expensive model to be used by surrogate in fitting
    pt = datapoint(xnew, value=ynew)
    data.append(pt)

# print the best parameters
print(f"The best solution is {xnew} with Rwp {ynew}")
print(f"The reference solutions is {target}")
ratios = [x / y for x, y in zip(target, xnew)]
print(f"The ratios of to the reference values are {ratios}")

File /Users/cmbiwer/src/spotlight/docs/notebooks/template.par already exists!
Our R-factor is 0.5236905 and it took 6.017719984054565s to compute
Our R-factor is 0.5224805 and it took 5.949759006500244s to compute
Optimization terminated successfully.
         Current function value: 0.355938
         Iterations: 519
         Function evaluations: 10400
Our R-factor is 0.13660918 and it took 5.832312107086182s to compute
truth [2.9308146307795577, 4.681764598938253, 3.6026807000821934, 3.233392000478879] 0.13660918
surrogate [2.9308146307795577, 4.681764598938253, 3.6026807000821934, 3.233392000478879] 0.35593774110718646
error -0.21932856110718646 0.21932856110718646
data 2
Optimization terminated successfully.
         Current function value: 0.136609
         Iterations: 516
         Function evaluations: 10340
Our R-factor is 0.13658616 and it took 6.940546989440918s to compute
truth [2.930790877829773, 4.681764598967135, 3.6026807000709002, 3.2333920002603103] 0.13658616
surrogate

## An ensemble using only the cost function

As state above, the call to MILK may take several seconds.
Below, we present an example of using an ensemble of optimizers in parallel with MILK to find the global minimum.
**Note, this will take awhile. It depends on the number of processors available on your machine.**

In [4]:
from mystic.solvers import LatticeSolver
from mystic.solvers import NelderMeadSimplexSolver
from mystic.termination import VTR
from pathos.pools import ProcessPool as Pool

# set the ranges
target = [2.9306538, 4.6817646, 3.6026807, 3.233392]
lower_bounds = [x * 0.95 for x in target]
upper_bounds = [x * 1.05 for x in target]

# set random seed so we can reproduce results
tools.random_seed(0)

# create a solver
solver = LatticeSolver(4, 8)

# set multi-processing pool
solver.SetMapper(Pool().map)

# since we have an search solver
# we specify what optimization algorithm to use within the search
# we tell the optimizer to not go more than 50 evaluations of our cost function
subsolver = NelderMeadSimplexSolver(4)
subsolver.SetEvaluationLimits(50, 50)
solver.SetNestedSolver(subsolver)

# set the range to search for all parameters
solver.SetStrictRanges(lower_bounds, upper_bounds)

# find the minimum
solver.Solve(CostFunction(4), VTR())

# print the best parameters
print(f"The best solution is {solver.bestSolution} with Rwp {solver.bestEnergy}")
print(f"The reference solutions is {target}")
ratios = [x / y for x, y in zip(target, solver.bestSolution)]
print(f"The ratios of to the reference values are {ratios}")

Our R-factor is 0.62775004 and it took 25.128027200698853s to compute
Our R-factor is 0.62282383 and it took 25.271618843078613s to compute
Our R-factor is 0.6246426 and it took 25.542072057724s to compute
Our R-factor is 0.62601537 and it took 25.940300941467285s to compute
Our R-factor is 0.62909144 and it took 25.969143867492676s to compute
Our R-factor is 0.62771976 and it took 25.905327081680298s to compute
Our R-factor is 0.6299428 and it took 26.132752895355225s to compute
Our R-factor is 0.62489647 and it took 26.110680103302002s to compute
Our R-factor is 0.6315842 and it took 28.788614988327026s to compute
Our R-factor is 0.6268012 and it took 28.909955739974976s to compute
Our R-factor is 0.62726223 and it took 28.60523295402527s to compute
Our R-factor is 0.62239444 and it took 29.202092170715332s to compute
Our R-factor is 0.6244521 and it took 29.22259497642517s to compute
Our R-factor is 0.62854445 and it took 28.962391138076782s to compute
Our R-factor is 0.62731254 and

Our R-factor is 0.5943767 and it took 29.738141298294067s to compute
Our R-factor is 0.6006289 and it took 30.29346990585327s to compute
Our R-factor is 0.6260417 and it took 28.366052865982056s to compute
Our R-factor is 0.6038816 and it took 28.739644765853882s to compute
Our R-factor is 0.592987 and it took 27.977646827697754s to compute
Our R-factor is 0.6127648 and it took 28.581695079803467s to compute
Our R-factor is 0.6306583 and it took 28.793065071105957s to compute
Our R-factor is 0.5841003 and it took 27.96793484687805s to compute
Our R-factor is 0.43987775 and it took 28.77456521987915s to compute
Our R-factor is 0.5238677 and it took 28.796834707260132s to compute
Our R-factor is 0.57103854 and it took 28.801408052444458s to compute
Our R-factor is 0.529194 and it took 29.039142847061157s to compute
Our R-factor is 0.6119143 and it took 28.593602180480957s to compute
Our R-factor is 0.4529906 and it took 28.55851101875305s to compute
Our R-factor is 0.58588296 and it took

Our R-factor is 0.48298502 and it took 29.03914189338684s to compute
Our R-factor is 0.5058743 and it took 28.561774015426636s to compute
Our R-factor is 0.4797361 and it took 28.442381143569946s to compute
Our R-factor is 0.27507955 and it took 29.829578161239624s to compute
Our R-factor is 0.44775927 and it took 30.263264179229736s to compute
Our R-factor is 0.5237698 and it took 29.983269214630127s to compute
Our R-factor is 0.3520383 and it took 29.902167797088623s to compute
Our R-factor is 0.56512624 and it took 29.882758855819702s to compute
Our R-factor is 0.5080494 and it took 29.926796197891235s to compute
Our R-factor is 0.3996992 and it took 29.278870105743408s to compute
Our R-factor is 0.5810925 and it took 30.11130428314209s to compute
Our R-factor is 0.44461071 and it took 28.778246879577637s to compute
Our R-factor is 0.5273767 and it took 29.023586988449097s to compute
Our R-factor is 0.32259303 and it took 28.993775129318237s to compute
Our R-factor is 0.42134672 and

Our R-factor is 0.15503924 and it took 6.111812114715576s to compute
The best solution is [2.93097711 4.68672942 3.60531948 3.30091662] with Rwp 0.15503924
The reference solutions is [2.9306538, 4.6817646, 3.6026807, 3.233392]
The ratios of to the reference values are [0.9998896937429909, 0.9989406648886187, 0.9992680865168723, 0.979543676635183]


This concludes the whirlwind tutorial of applying ensembles of optimizers to Rietveld analysis.