Summit has several machine learning strategies available for optimisation, as well as some more naive ones.
All strategies share a similar API: they are instantiated by passing in a ~summit.domain.Domain, and new reaction conditions are requested using the suggest_experiments method, which can optionally take results from previous reactions.
Bayesian optimisation (BO) is an efficient way to optimise a wide variety of functions, including chemical reactions. In BO, you begin by specifying some prior beliefs about your functions. In many cases, we start with an assumption that we know very little. Then, we create a probabilistic model that incorporates this prior belief and some data (i.e., reactions at different conditions), called a posterior. In reaction optimisation, this model will predict the value of an objective (e.g., yield) at particular reaction conditions. One key point is that these models are probabilistic: instead of a single precise prediction, they give a distribution that can be sampled.
With the updated model, we use one of two classes of techniques to select our next experiments. Some BO strategies optimise an acquisition function, which takes in the model parameters and a suggested next experiment and predicts the quality of that experiment. Alternatively, a deterministic function can be sampled from the model, which is then optimised.
Illustration of how acquisition functions enable BO strategies to reduce uncertainty and maximise the objective simultaneously. The dotted line is the actual objective and the solid line is the posterior of the surrogate model. The acquisition function is high where the objective is predicted to be optimal (exploitation) and where there is high uncertainty (exploration). Adapted from Shahriari et al.

To learn more about BO, we suggest reading the review by Shahriari et al.
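To make the acquisition-function idea concrete, here is a self-contained toy example (not Summit's internals): a one-dimensional Gaussian-process surrogate with an RBF kernel, scored by the upper confidence bound (UCB), which adds the posterior mean (exploitation) to the posterior standard deviation (exploration):

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between two sets of 1-D points."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP."""
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_tq = rbf_kernel(x_train, x_query)
    k_qq = rbf_kernel(x_query, x_query)
    solve = np.linalg.solve(k_tt, k_tq)
    mean = solve.T @ y_train
    cov = k_qq - k_tq.T @ solve
    std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
    return mean, std

def objective(x):
    """Hidden objective standing in for, e.g., reaction yield."""
    return np.sin(3 * x) + 0.5 * x

# Three "reactions" already run at different conditions
x_train = np.array([0.1, 0.4, 0.9])
y_train = objective(x_train)
x_query = np.linspace(0, 1, 200)

mean, std = gp_posterior(x_train, y_train, x_query)
ucb = mean + 2.0 * std             # exploitation (mean) + exploration (std)
x_next = x_query[np.argmax(ucb)]   # condition for the next experiment
```

The surrogate pins down the objective near observed points (low std), so the UCB pushes the next experiment towards regions that are either promising or unexplored.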
The BO strategies available in Summit are:
summit.strategies.tsemo.TSEMO
summit.strategies.sobo.SOBO
summit.strategies.MTBO
summit.strategies.ENTMOOT
Reinforcement learning (RL) is distinct because it focuses on creating a custom policy for a particular problem instead of a model of the problem. In the case of reaction optimisation, the policy directly predicts what the next experiment(s) should be given a history of past experiments. Policies are trained to maximise some sort of reward, such as achieving the maximum yield in as few experiments as possible.
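The interface is worth spelling out: a policy is a function from the experiment history to the next conditions. In Summit's DRO the policy is a trained neural network; in the toy sketch below a hand-written heuristic stands in for it, and the "reaction" is a made-up yield curve, just to illustrate the history-in, experiment-out loop and the reward:

```python
import random

def policy(history, low=30.0, high=100.0):
    """Suggest the next temperature given past (temperature, yield) pairs."""
    if not history:
        return random.uniform(low, high)          # no data yet: explore
    best_t, _ = max(history, key=lambda h: h[1])
    # perturb around the best conditions seen so far
    return min(high, max(low, best_t + random.gauss(0, 5.0)))

def hidden_yield(t):
    """Stand-in for the real reaction; peak yield near 70 deg C."""
    return max(0.0, 100.0 - (t - 70.0) ** 2 / 10.0)

random.seed(0)
history = []
for _ in range(10):                               # budget of ten experiments
    t = policy(history)
    history.append((t, hidden_yield(t)))

best = max(y for _, y in history)                 # reward: best yield found
```

A trained policy would replace the heuristic with a learned mapping, but the loop structure, and the reward that judges it, are the same.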
For more information about RL, see the book by Sutton and Barto or David Silver's course.
summit.strategies.deep_reaction_optimizer.DRO
summit.strategies.neldermead.NelderMead
summit.strategies.random.Random
summit.strategies.random.LHS
summit.strategies.snobfit.SNOBFIT
summit.strategies.factorial_doe.FullFactorial