# Find the starting lattice parameters using MILK

Until now in this tutorial we have focused on a general problem.
Now we will apply what we have done to a Rietveld analysis.

The key concepts you should gain from this page:

 * How to create a surrogate model of an expensive cost function and use the surrogate model to guide the optimization.

## Define cost function using MILK

In our prior problems, we defined a cost function to optimize.
Here, we will do the same, using MILK to define a cost function that sets the lattice parameters and returns the R-factor.

This cost function is set up a little differently than before.
Our ``CostFunction.function`` calls ``CostFunction.initialize`` on its first evaluation and ``CostFunction.cost_function`` on every evaluation.

MILK needs to set up a directory with the files that are read by MAUD.
Therefore, there is a ``CostFunction.initialize`` function, which is called once at the very beginning of an optimization.
This ``initialize`` creates the MILK editor and MAUD interface, sets up the run directory (e.g. copies the needed files), and sets the initial phase fractions.

Then there is the ``CostFunction.cost_function`` which sets the lattice parameters using MILK, does a refinement step, and returns the R-factor.

In [26]:
import os
import shutil
import time

import multiprocess
import MILK
from mystic import models
from spotlight import filesystem

class CostFunction(models.AbstractFunction):

    def __init__(self, ndim=None):
        super().__init__(ndim=ndim)
        self.initialized = False

    def reset(self):
        self.initialized = False

    def function(self, p):
        if not self.initialized:
            self.initialize()
            self.initialized = True
        return self.cost_function(p)

    def initialize(self):

        # get editor and maudText
        self.editor = MILK.parameterEditor.editor()
        self.editor.parseConfig(config)
        self.maudText = MILK.maud.maudText()
        self.maudText.parseConfig(config)

        # set data file
        data_files = [["PWDR_ff_004768_sum_norm_dc_0_tif_A352_fxye_Bank_1.chi"]]

        # set run dir based on process name
        self.editor.run_dirs = f"opt_{multiprocess.current_process().name}"
        self.maudText.run_dirs = self.editor.run_dirs

        # create run dir
        if os.path.exists(self.editor.run_dirs):
            shutil.rmtree(self.editor.run_dirs)
        filesystem.mkdir(self.editor.run_dirs)
        filesystem.cp([self.editor.ifile], dest=self.editor.run_dirs)

        # set initial phase fractions
        self.editor.set_val(key='_pd_phase_atom_', value='0.33')
        self.editor.free(key='_pd_phase_atom_', wild=[0])

    def cost_function(self, p):

        t0 = time.time()

        # set lattice parameters as values from optimization
        self.editor.set_val(key='cell_length_a', sobj="alpha", value=str(p[0]))
        self.editor.set_val(key='cell_length_c', sobj="alpha", value=str(p[1]))
        self.editor.set_val(key='cell_length_a', sobj="steel", value=str(p[2]))
        self.editor.set_val(key='cell_length_c', sobj="beta", value=str(p[3]))

        # refine
        self.maudText.refinement(itr='1', ifile=self.editor.ifile, ofile=self.editor.ofile)

        # get the statistic to return to the optimizer
        self.editor.get_val(key="_refine_ls_wR_factor_all")
        stat = float(self.editor.value[0])

        print("STAT", stat, time.time() - t0)

        return stat
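
The run-once pattern used by ``CostFunction`` above can be tried without MILK installed. The following is a minimal sketch under stated assumptions: ``ToyCostFunction``, its ``n_initialize`` counter, and the quadratic placeholder cost are hypothetical names introduced only for illustration, and the standard-library ``multiprocessing`` module stands in for the ``multiprocess`` package used in the tutorial.

```python
import multiprocessing
import os
import tempfile


class ToyCostFunction:
    """Hypothetical stand-in for CostFunction; it does not use MILK or mystic."""

    def __init__(self, work_dir):
        self.work_dir = work_dir
        self.initialized = False
        self.n_initialize = 0  # counts how often the setup ran

    def reset(self):
        # force the next call to function() to run the setup again
        self.initialized = False

    def function(self, p):
        # run the one-time setup lazily, on the first evaluation only
        if not self.initialized:
            self.initialize()
            self.initialized = True
        return self.cost_function(p)

    def initialize(self):
        # one run directory per process, named "opt_<process name>" as in the tutorial
        name = multiprocessing.current_process().name
        self.run_dir = os.path.join(self.work_dir, f"opt_{name}")
        os.makedirs(self.run_dir, exist_ok=True)
        self.n_initialize += 1

    def cost_function(self, p):
        # cheap placeholder for "set parameters, refine, return the R-factor"
        return sum((x - 1.0) ** 2 for x in p)


with tempfile.TemporaryDirectory() as tmp:
    cost = ToyCostFunction(tmp)
    values = [cost.function([1.0, 2.0]), cost.function([1.0, 1.0])]
    runs_before_reset = cost.n_initialize
    run_dir_name = os.path.basename(cost.run_dir)
    cost.reset()
    cost.function([0.0, 0.0])
    runs_after_reset = cost.n_initialize

print(values, runs_before_reset, runs_after_reset, run_dir_name)
```

Naming the run directory after the process keeps parallel workers from writing into the same directory, which is presumably why the tutorial code does the same.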

## Configure MILK

We will also need to configure MILK for our refinement.
The following is an example of setting MILK to use the [MILK XRD Tutorial dataset](https://github.com/lanl/MILK/wiki/XRD-Tutorial).
For more documentation on the settings of these parameters, see the MILK tutorial.

In [23]:
# set dataset configuration
dataset_config = {"2theta": [3.4, 11.1],
                  "data_dir": "data/",
                  "data_ext": ".chi",
                  "data_group_size": 1,
                  "template_name": "template.par"}

# set MILK configuration
config = {"folders": {"work_dir": "",
                      "run_dirs": "ack(wild)",
                      "sub_dir": "",
                      "wild": [0],
                      "wild_range": [[]],
          },
          "compute": {"maud_path": "",
                      "n_maud": 1,
                      "java_opt": "Xmx8G",
                      "clean_old_step_data": False,
                      "cur_step": 1,
                      "log_consol": False,
          },
          "ins": {"riet_analysis_file": "template.par",
                  "riet_analysis_fileToSave": "output.par",
                  "section_title": "Ti64_test_data",
                  "analysis_iteration_number": 4,
                  "LCLS2_detector_config_file": "",
                  "LCLS2_Cspad0_original_image": "",
                  "LCLS2_Cspad0_dark_image": "",
                  "output_plot2D_filename": "plot_",
                  "output_summed_data_filename": "all_spectra",
                  "maud_output_plot_filename": "plot1d_",
                  "output_PF_filename": "PF_",
                  "output_PF": "",
                  "append_simple_result_to": "tmp_simple_results.txt",
                  "append_result_to": "tmp_results.txt",
                  "import_phase": [],
                  "ins_file_name": "MAUDText.ins",
                  "maud_remove_all_datafiles": True,
                  "verbose": 0,
          },
          "interface": {"verbose": 0,
          },
}

## Finding the starting lattice parameters of a Rietveld analysis by optimizing a surrogate model

In the prior page of the tutorial, we went over how to find the global minimum of a cost function using a surrogate model.
Recall that we constructed an interpolated model using ``InterpModel`` and then repeated the minimization until the surrogate and the cost function differed by less than 0.001.

Applied to Rietveld analysis, the global minimum corresponds to the best-fit parameters for our data.
Below, we follow the same pattern, using ``InterpModel`` to create an interpolated surrogate model.
In the ``while`` loop we repeatedly fit the surrogate model, find the minimum of the surrogate model, and evaluate that minimum with the actual cost function (``CostFunction.function`` above), which calls MILK.
The loop continues until the surrogate model and the actual cost function agree to within a threshold at that minimum.
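
The structure of this loop can be sketched in one dimension with only the standard library. This is an illustration under loud assumptions, not the method used on this page: a parabola through the three lowest points stands in for the thin-plate ``InterpModel`` surrogate, its vertex stands in for the ``diffev2`` search, and ``expensive`` is a hypothetical cheap function standing in for the MILK refinement.

```python
import math


def expensive(x):
    # hypothetical stand-in for the MILK refinement; its minimum is at x = 3
    return math.cosh(x - 3.0)


def fit_parabola(points):
    """Return (evaluate, vertex) for the parabola through three (x, y) points."""
    (x1, y1), (x2, y2), (x3, y3) = points
    d1 = (y2 - y1) / (x2 - x1)
    d2 = (y3 - y2) / (x3 - x2)
    c2 = (d2 - d1) / (x3 - x1)  # curvature term of the Newton form

    def evaluate(x):
        return y1 + d1 * (x - x1) + c2 * (x - x1) * (x - x2)

    # the derivative d1 + c2 * (2x - x1 - x2) vanishes at the vertex
    vertex = 0.5 * (x1 + x2) - d1 / (2.0 * c2)
    return evaluate, vertex


lower, upper = 2.0, 4.0
data = [(x, expensive(x)) for x in (2.0, 2.8, 4.0)]  # initial samples

error = float("inf")
n_iter = 0
while error > 1e-3 and n_iter < 20:
    # fit the surrogate to the three lowest points found so far
    best = sorted(data, key=lambda pt: pt[1])[:3]
    surrogate, xnew = fit_parabola(best)
    xnew = min(max(xnew, lower), upper)  # keep the candidate inside the bounds

    # evaluate the candidate with the expensive function and compare
    ynew = expensive(xnew)
    ysur = surrogate(xnew)
    error = abs(ynew - ysur)

    # add the new point so the next surrogate fit can use it
    data.append((xnew, ynew))
    n_iter += 1

print(xnew, ynew, error, len(data))
```

Each pass costs exactly one call to ``expensive``, so the number of expensive evaluations is the initial sample size plus the number of iterations.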

Recall that we introduced this approach because it finds a minimum with fewer calls to the actual cost function.
Here, a single call to MILK may take several seconds, so an optimization algorithm that calls it hundreds or thousands of times can run for hours.
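
As a rough worked example of that cost, the progress bars in the output on this page show about 8.5 seconds per refinement. The sketch below assumes that figure and ignores optimizer overhead; ``total_hours`` is a hypothetical helper introduced only for this arithmetic.

```python
def total_hours(n_calls, seconds_per_call):
    # wall time in hours for n_calls sequential evaluations
    return n_calls * seconds_per_call / 3600.0


seconds_per_call = 8.5  # approximate value read off the progress bars on this page

for n_calls in (100, 1000):
    print(f"{n_calls} calls: about {total_hours(n_calls, seconds_per_call):.1f} hours")
```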

In [30]:
import os
import shutil

# multiprocess, shutil and filesystem are also used by CostFunction above
import multiprocess
from mystic.math.legacydata import datapoint
from mystic.solvers import diffev2
from spotlight import filesystem
from spotlight.bridge.ouq_models import InterpModel
from spotlight.bridge.ouq_models import WrapModel

# set bounds for parameters to be +/-5%
target = [2.9306538, 4.6817646, 3.6026807, 3.233392]
lower_bounds = [x * 0.95 for x in target]
upper_bounds = [x * 1.05 for x in target]

# remove prior cached results
if os.path.exists("tmp"):
    shutil.rmtree("tmp")

# copy template file to current working dir
# (this path is specific to the author's machine; point it at your own copy of template.par)
filesystem.cp(["/Users/cmbiwer/src/spotlight/tmp/template.par"], dest=".")

# generate a sampled dataset for the model
truth = WrapModel("tmp", CostFunction(4), nx=4, ny=None, cached=False)
bounds = list(zip(lower_bounds, upper_bounds))
data = truth.sample(bounds, pts=[2, 1, 1, 1])

# create surrogate model
kwds = dict(smooth=0.0, noise=0.0, method="thin_plate", extrap=False)
surrogate = InterpModel("surrogate", nx=4, ny=None, data=truth, **kwds)

# go until error < 1e-3
error = float("inf")
sign = 1.0
while error > 1e-3:

    # fit surrogate data
    surrogate.fit(data=data)

    # find minimum/maximum of surrogate
    args = dict(bounds=bounds, gtol=500, full_output=True)
    results = diffev2(lambda x: sign * surrogate(x), bounds, npop=20, **args)

    # get minimum/maximum of actual expensive model
    xnew = results[0].tolist()
    ynew = truth(xnew)

    # compute error which is actual model value - surrogate model value
    ysur = results[1]
    error = abs(ynew - ysur)

    # print statements
    print("truth", xnew, ynew)
    print("surrogate", xnew, ysur)
    print("error", ynew - ysur, error)
    print("data", len(data))

    # add the latest point evaluated with the actual expensive model to the data used to fit the surrogate
    pt = datapoint(xnew, value=ynew)
    data.append(pt)

print("Done!")


Starting MAUD refinement for step 1

100%|██████████| 1/1 [00:08<00:00,  8.48s/it]

Archiving step data
+----------------------------------+-----------+--+
|              Title               |   Rwp(%)  |  |
+----------------------------------+-----------+--+
| opt_MainProcess/Ti64_test_data01 | 52.369053 |  |
+----------------------------------+-----------+--+

100%|██████████| 1/1 [00:08<00:00,  8.45s/it]

Archiving step data
+----------------------------------+----------+--+
|              Title               |  Rwp(%)  |  |
+----------------------------------+----------+--+
| opt_MainProcess/Ti64_test_data01 | 52.24805 |  |
+----------------------------------+----------+--+

100%|██████████| 1/1 [00:08<00:00,  8.43s/it]

Archiving step data
+----------------------------------+-----------+--+
|              Title               |   Rwp(%)  |  |
+----------------------------------+-----------+--+
| opt_MainProcess/Ti64_test_data01 | 13.660918 |  |
+----------------------------------+-----------+--+

100%|██████████| 1/1 [00:08<00:00,  8.53s/it]

Archiving step data
+----------------------------------+-----------+--+
|              Title               |   Rwp(%)  |  |
+----------------------------------+-----------+--+
| opt_MainProcess/Ti64_test_data02 | 13.658616 |  |
+----------------------------------+-----------+--+

