Tutorial

Kobi Felton edited this page Jul 2, 2020 · 13 revisions

In this tutorial, we will use summit for a multiobjective optimization problem.

Setup

Make sure you have summit installed:

pip install git+https://github.com/sustainable-processes/summit_private.git@0.4.0#egg=summit

Additionally, you will need numpy, pandas and matplotlib for this tutorial:

pip install --upgrade numpy pandas matplotlib

Once you have those packages installed, start a new .py file and add the following lines:

from summit.utils.dataset import DataSet
from summit.domain import ContinuousVariable, Constraint, Domain
from summit.strategies import TSEMO2

import pandas as pd

In the first block, we import classes from summit. In the second block, we import pandas, which is used below to read the CSV file.

Finally, download this CSV file. That link opens a page that doesn't have a download button. An easy way to download the file is to click on the "Raw" button and copy the URL of the page that comes up. You can then run the following command in your terminal to download the file:

curl URL -o silica_experiments_data.csv

where URL is the URL of the page that comes up when you click the "Raw" button. This saves silica_experiments_data.csv in the current directory of your terminal. Note that the URL will expire after a couple of hours.
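Because the raw URL expires, it can be worth confirming that the downloaded file really is the CSV before continuing. The following is a minimal sketch using only the Python standard library. The helper name missing_columns is hypothetical, and the sample strings are made up for illustration; they are not real experimental data. The column names are the ones selected later in this tutorial.

```python
import csv
import io

# Column names that the rest of this tutorial selects from the CSV file.
REQUIRED_COLUMNS = ["TEOS", "NH3", "H2O", "EtOH", "PSD", "STD", "Batch"]


def missing_columns(csv_text, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty list if the text has no rows at all
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


# Made-up inputs for illustration only.
sample_ok = "Batch,TEOS,NH3,H2O,EtOH,PSD,STD\n1,10,20,30,40,120,15\n"
sample_bad = "<!DOCTYPE html>\n<html><body>Not Found</body></html>\n"

print(missing_columns(sample_ok))   # [] because every required column is present
print(missing_columns(sample_bad))  # all seven names, since this is not the CSV
```

To check the real download, pass the file contents instead of the sample string, for example missing_columns(open('silica_experiments_data.csv').read()).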

Import Data

We'll begin by importing the existing experimental data and converting it into a DataSet.

# Read in CSV file
data_pd = pd.read_csv('silica_experiments_data.csv')

# Select relevant columns
input_columns = [ 'TEOS', 'NH3', 'H2O', 'EtOH']
output_columns = ['PSD', 'STD']
metadata_columns=['Batch']
data_pd = data_pd[input_columns + output_columns + metadata_columns]

# Transform PSD
data_pd['PSD_distance_100'] = ((data_pd['PSD']-100)**2)**0.5

# Convert to Summit DataSet
data = DataSet.from_df(data_pd, metadata_columns=metadata_columns)

The first line reads in the CSV file as a pandas dataframe. Pandas dataframes are like spreadsheets in Python; each dataframe has columns and an index specifying the rows. We select the columns corresponding to our inputs (TEOS, NH3, H2O and EtOH); these are the variables we can change in the experiments. We also select the outputs (PSD and STD), which are the objectives that we want to optimize, and the Batch column, which is kept as metadata.

We transform the output column PSD into its absolute difference from 100 (the square root of the squared difference) and store the result in a new column, PSD_distance_100. Minimizing this column lets us target a value of 100.
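The expression used in the code above, ((PSD-100)**2)**0.5, is the same as abs(PSD - 100). Here is a minimal sketch in plain Python; the function name distance_from_target and the sample values are made up for illustration.

```python
def distance_from_target(psd, target=100.0):
    """Same arithmetic as the tutorial's transform: sqrt((psd - target)^2)."""
    return ((psd - target) ** 2) ** 0.5


# Made-up particle sizes for illustration only.
values = [80.0, 100.0, 130.0]
distances = [distance_from_target(v) for v in values]

for v, d in zip(values, distances):
    # The transform matches the absolute difference from the target.
    print(v, d, abs(v - 100.0))
```

Values below and above the target give the same distance when they are equally far from 100, so minimizing this column pulls the particle size toward 100 from either side.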

Finally, we convert into Summit's special form of a DataFrame called a DataSet. A DataSet can include certain columns as metadata, so those columns will not be used in the optimization.

Domain

Now, we specify the optimization domain, i.e., the variables that summit should vary to optimize the objectives, together with the objectives themselves and any constraints.

#Set up the optimization problem domain
domain = Domain()

#Decision variables
domain += ContinuousVariable('TEOS', 
                             description = '',
                             bounds=[1, 35])
domain += ContinuousVariable('NH3', 
                             description = '',
                             bounds=[0, 100])
domain += ContinuousVariable('H2O', 
                             description = '',
                             bounds=[0, 100])

#Objectives
domain += ContinuousVariable('PSD_distance_100', 
                             description = 'Distance of average particle size from target of 100',
                             bounds=[0, 1000],
                             is_objective=True,
                             maximize=False)
domain += ContinuousVariable('STD', 
                             description = 'Standard deviation of particle size distribution',
                             bounds=[0, 1000],
                             is_objective=True,
                             maximize=False)

#Constraints
domain += Constraint(lhs='TEOS+NH3+H2O-100',
                     constraint_type='<')
domain += Constraint(lhs='(H2O+0.75*0.91*NH3)/18-2*TEOS/208.33',
                     constraint_type='>')

Note that EtOH is not included in the inputs because it makes up the remainder of the mixture: the first constraint keeps TEOS + NH3 + H2O below 100, and EtOH is 100 minus that sum. Each constraint gives the left-hand side of an inequality whose right-hand side is zero.
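To see what the bounds and constraints mean for a single candidate, they can be evaluated by hand. The sketch below uses plain Python and does not call summit; the function name check_candidate and the example compositions are hypothetical, while the bounds and expressions are copied from the domain definition above.

```python
def check_candidate(teos, nh3, h2o):
    """Evaluate the variable bounds and both constraints for one composition."""
    in_bounds = 1 <= teos <= 35 and 0 <= nh3 <= 100 and 0 <= h2o <= 100

    # First constraint: TEOS+NH3+H2O-100 < 0
    c1 = teos + nh3 + h2o - 100
    # Second constraint: (H2O+0.75*0.91*NH3)/18-2*TEOS/208.33 > 0
    c2 = (h2o + 0.75 * 0.91 * nh3) / 18 - 2 * teos / 208.33

    return {
        "in_bounds": in_bounds,
        "constraint_1": c1 < 0,
        "constraint_2": c2 > 0,
        # EtOH is whatever remains of the mixture.
        "EtOH": 100 - (teos + nh3 + h2o),
    }


# Made-up compositions for illustration only.
print(check_candidate(10, 20, 30))  # satisfies the bounds and both constraints
print(check_candidate(35, 40, 30))  # sum exceeds 100, so constraint_1 fails
print(check_candidate(35, 0, 0))    # second expression is negative, so constraint_2 fails
```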

Strategy

We now use TSEMO2 to request new experiments. This outputs a CSV file with the conditions for the next experiments.

# Get experiment suggestions
num_experiments=4
# NOTE: `models` is not defined anywhere above; it must be created before this line
tsemo = TSEMO2(domain, models)
experiments = tsemo.suggest_experiments(num_experiments, data)

# Wrangle data
experiments['EtOH'] = 100-(experiments['TEOS'] + experiments['NH3'] + experiments['H2O'])
experiments = experiments.round()
experiments.to_csv('next_experiments.csv')
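One thing to be aware of in the wrangling step: EtOH is computed first and every column is rounded afterwards, so the four rounded values do not always add up to exactly 100. The sketch below illustrates this in plain Python with a made-up suggestion, and shows one possible variation (rounding the three decision variables first, then deriving EtOH). The function name finalize is hypothetical, and this is not part of summit.

```python
def finalize(teos, nh3, h2o):
    """Round the three decision variables, then derive EtOH from the rounded values."""
    teos_r, nh3_r, h2o_r = round(teos), round(nh3), round(h2o)
    etoh = 100 - (teos_r + nh3_r + h2o_r)
    return {"TEOS": teos_r, "NH3": nh3_r, "H2O": h2o_r, "EtOH": etoh}


# Made-up suggestion for illustration only.
raw = (10.4, 20.4, 30.4)

# Order used in the tutorial: compute EtOH, then round every column.
etoh_raw = 100 - sum(raw)
rounded_after = [round(v) for v in raw] + [round(etoh_raw)]
print(sum(rounded_after))  # 99: the rounded composition no longer totals 100

# Variation: round first, then compute EtOH as the remainder.
row = finalize(*raw)
print(sum(row.values()))  # 100
```

Whether the difference matters depends on how precisely the experiments can be prepared; the original order is fine when a unit of rounding error in the total is acceptable.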