# CHEM 1000 - Spring 2023
Prof. Geoffrey Hutchison, University of Pittsburgh

## Statistical Design of Experiments

These lecture notes on statistics include substantial material not found in our text.

By the end of this session, you should be able to:
- Understand "orthogonal" and factorial design
- Understand key issues in statistical design of experiments

This is meant only as a quick introduction to Design of Experiments and why we should care, not a full tutorial.

Modern design of experiments is often a full optimization problem (fun!) and can involve computational design, machine learning, or Bayesian methods to minimize errors and maximize the chance of finding the best solution in the fewest experiments or measurements.

### Why Bother?

We often are introduced to science using the concept of *one variable at a time* experiments (OVAT).

* Change one variable
* Keep everything else constant to the best of your ability
* See how the results change

Let's propose that we want to optimize the yield of a particular synthetic reaction.

We should consider possible *factors* that influence yield, like:

* Concentration of Reagent 1
* Concentration of Catalyst
* Concentration of Catalyst Ligand
* Solvent
* Temperature
* Reaction Time

Phew, that's a lot of possible things to change. Let's imagine just changing catalyst loading and temperature:

<img src="../images/DoE.png" width="300" />

1. We start at our initial concentration of 0.5 mM and scan temperature (10, 20, 30, 40, 50, 60°C)

<img src="../images/Temperature.png" width="300" />

2. We now scan concentration at our initial temperature of 30°C:

<img src="../images/Concentration.png" width="300" />

Great, it looks like our best conditions are 40°C and 2 mM of catalyst! Much better yield (118.g vs. 78.6g initial)

But in this example, the true best conditions are at 50°C and 2.5 mM of catalyst, yielding 123.g of product!

<img src="../images/Response.png" width="300" />

**Why did we miss our best conditions?**

There are often interactions between otherwise independent factors in our experiments. Maybe temperature and concentration *interact*, so the best temperature depends on which concentration we use?
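To see how an interaction can hide the best conditions, here is a minimal sketch. The yield surface below is entirely made up (it is not the data behind the figures); it just places the optimum at 50°C and 2.5 mM and includes a temperature × concentration cross term. The OVAT procedure shown is one common variant: scan temperature first, then scan concentration at the best temperature found.

```python
from itertools import product

def yield_g(temp_c, conc_mm):
    """Hypothetical yield surface (in g) with a temperature-concentration interaction."""
    u = (temp_c - 50) / 10      # scaled distance from the best temperature
    v = (conc_mm - 2.5) / 0.5   # scaled distance from the best concentration
    return 123 - 2 * u**2 - 2 * v**2 + 3 * u * v  # the u*v term is the interaction

temps = [10, 20, 30, 40, 50, 60]          # °C
concs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]    # mM

# OVAT: scan temperature at the starting concentration (0.5 mM),
# then scan concentration at the best temperature found.
best_t = max(temps, key=lambda t: yield_g(t, 0.5))
best_c = max(concs, key=lambda c: yield_g(best_t, c))
ovat = (best_t, best_c, yield_g(best_t, best_c))

# Full grid: try every combination of temperature and concentration.
grid_t, grid_c = max(product(temps, concs), key=lambda tc: yield_g(*tc))
grid_best = (grid_t, grid_c, yield_g(grid_t, grid_c))

print("OVAT best:", ovat)
print("Grid best:", grid_best)
```

On this made-up surface, OVAT settles on a point well away from the true optimum, because the best temperature at low concentration is not the best temperature at high concentration.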

Instead of varying one variable at a time, we should use Design of Experiments (DoE).

<img src="../images/grid.png" width="300" />

Note if we pick the four "corner" points, we'll get a much better idea if there's an interaction between the two variables. One of them is likely to be close to the best point.

Even better, we could add a center point.

Think about a car's fuel efficiency. We need to try a small engine in a big car and a big engine in a small car .. not just a Prius and a Hummer SUV.

**Iterative Design of Experiments**

We don't want to assume that we can answer all the questions in one huge experiment. Particularly when we start out, we may not know which factors contribute. Maybe the critical step in the reaction is really fast, so reaction time doesn't matter much.

Even in my example here, picking the four "corner points" won't exactly find the best point. But we'll learn that we should investigate more carefully the upper left region (higher temperature, higher concentration) in subsequent studies.

When we perform this *iterative* process, we will often want to balance **exploration** and **exploitation**:
- **Exploration** - sampling new regions / conditions which have high uncertainty
- **Exploitation** - trying to find the best near the points we have tried already

After all, performing experiments of any kind is often time-consuming.

**What can we learn?**

- Screening studies: vary a few things to determine which factors are important (e.g., in combination with ANOVA)
    - Consider the efficiency of a rechargeable battery. The redox levels of the anode and cathode matter (voltage). But you also care about the mass, volume, speed of recharging … etc.
- Modeling a process: similar - get a better understanding of a system
    - Maybe a process has interactions or nonlinear effects?
- Optimization: finding the best yield, best coffee, etc.

Mathematically, we usually treat this as an example of **multiple regression**:

$$
Y=\beta_{0}+\beta_{1} X_{1}+\beta_{2} X_{2}+\beta_{12} X_{1} X_{2}+\text { experimental error }
$$

In other words, we'll add an *interaction* term $\beta_{12} X_{1} X_{2}$ that captures how the effect of one variable depends on the level of the other.
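As a rough worked example, suppose we run the four "corner" experiments and code each factor as −1 (low) or +1 (high). The yields below are made up for illustration. With this coding the design is balanced, so each coefficient is just a signed average of the responses:

```python
# Four corner runs in coded units: (X1, X2), with -1 = low and +1 = high
runs = [(-1, -1), (+1, -1), (-1, +1), (+1, +1)]
yields = [60.0, 70.0, 80.0, 110.0]   # hypothetical yields, one per run

n = len(runs)

# For a balanced +/-1 design, each coefficient is sum(column * y) / n
b0 = sum(yields) / n
b1 = sum(x1 * y for (x1, x2), y in zip(runs, yields)) / n
b2 = sum(x2 * y for (x1, x2), y in zip(runs, yields)) / n
b12 = sum(x1 * x2 * y for (x1, x2), y in zip(runs, yields)) / n

print(f"b0={b0}, b1={b1}, b2={b2}, b12={b12}")

# Sanity check: the fitted model reproduces each of the four runs
for (x1, x2), y in zip(runs, yields):
    assert b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2 == y
```

A nonzero `b12` says the effect of `X1` is different at low and high `X2`, which is exactly what a one-variable-at-a-time scan cannot see. Note that with four runs and four coefficients there is nothing left over to estimate the experimental error.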

We can also consider quadratic or nonlinear terms:

$$
Y=\beta_{0}+\beta_{1} X_{1}+\beta_{2} X_{2}+\beta_{11} X_{1}^{2}+\beta_{22} X_{2}^{2}+\text { experimental error }
$$

In general, we might have a *lot* of factors and interactions.

Second-order interactions are pretty common. Maybe a catalyst doesn't work as well at higher temperature (e.g., it decomposes). Or light roast coffee requires longer brew times?

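In general, third-order interactions and higher are much less common. This is good because, if we can neglect them, the runs we would have spent estimating them act as extra replicates for the main effects and second-order interactions. We essentially get replication "for free."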

## Full Factorial Design

The basic idea is this: for each factor that might matter, we should pick initial points that span the widest range. (We can come back later for further optimizations in smaller ranges if needed.)

If we want to capture nonlinear / quadratic behavior, it's often useful to add a few center points.

<img src="../images/factorial.png" width="250" />

Many times, this full factorial design is also called a $2^k$ design: with 2 levels for each of $k$ factors, the number of experiments grows exponentially.

It's useful though.

Consider the *statistical power* of our design.

Can we detect an improvement of 0.5?

<img src="../images/power-small.png" width="250" />

Probably not. What if the improvement effect was larger .. maybe a difference of 2.0? (This is similar to our t-test and ANOVA questions.. how much separation do we need to detect a difference)

<img src="../images/power-large.png" width="250" />

If we want to detect smaller differences, maybe we need better equipment (or expert taste testers) with less variation:

<img src="../images/power-expert.png" width="250" />

Or if we don't want to do that, we can repeat the measurements, because the noise in an average (i.e., the standard error) goes down as $1/\sqrt{n}$ when you repeat $n$ times:

<img src="../images/power-repeat.png" width="250" />
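The benefit of repeating measurements can be put in numbers: the standard error of the mean of $n$ independent measurements is $\sigma / \sqrt{n}$. Here is a small sketch, assuming a made-up measurement noise of $\sigma = 1.0$ (the helper names are just for illustration, and this is not a full power calculation):

```python
from math import ceil, sqrt

def standard_error(sigma, n):
    """Standard error of the mean of n independent measurements."""
    return sigma / sqrt(n)

def repeats_needed(sigma, target_se):
    """Smallest n for which sigma / sqrt(n) is at or below target_se."""
    return ceil((sigma / target_se) ** 2)

sigma = 1.0  # hypothetical standard deviation of a single measurement
for n in (1, 4, 16):
    print(n, standard_error(sigma, n))

print(repeats_needed(sigma, 0.25))
```

Halving the standard error costs four times as many measurements, which is why better equipment (smaller $\sigma$) is sometimes the cheaper route.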

### Replications in Full Factorial Design

Let's say we want to test two levels of two different variables, like in our catalyst concentration and temperature example.

|Concentration (mM)|Temperature (°C)|
| --- | --- |
| 0.5|30|
| 0.5|60|
| 3.0|30|
| 3.0|60|

Our multiple-variable regression would look something like this:

$$\text{yield} = \beta_0 + \beta_1 \text{concentration} + \beta_2 \text{Temperature}$$

We do 4 experiments, and the regression has 3 degrees of freedom (one for each of the $\beta$ coefficients).

If we did a three-factor study, we'd do 8 experiments:

|Concentration (mM)|Temperature (°C)|Time (hours)|
| --- | --- | --- |
| 0.5|30|8|
| 0.5|30|24|
| 0.5|60|8|
| 0.5|60|24|
| 3.0|30|8|
| 3.0|30|24|
| 3.0|60|8|
| 3.0|60|24|

Again, if we only consider the first-order effects:

$$\text{yield} = \beta_0 + \beta_1 \text{concentration} + \beta_2 \text{Temperature} + \beta_3 \text{Time}$$

We effectively get increased repetition because we're doing a bigger study. We can more effectively deduce the effect of each factor (and minimize the effects of random noise).
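One way to see this "hidden" replication: if we code every factor as −1 / +1, the columns of a full factorial design are *orthogonal* (their dot products are zero). For such a design, ordinary least squares gives each coefficient a standard error of $\sigma / \sqrt{N}$, where $N$ is the total number of runs, so every run helps pin down every factor. A sketch, assuming a made-up noise level of $\sigma = 2.0$ and illustrative helper names:

```python
from itertools import product
from math import sqrt

def coded_design(k):
    """Full factorial design for k two-level factors, coded as -1 / +1."""
    return list(product((-1, 1), repeat=k))

def is_orthogonal(design):
    """True if every pair of factor columns has a zero dot product."""
    k = len(design[0])
    return all(
        sum(run[i] * run[j] for run in design) == 0
        for i in range(k)
        for j in range(i + 1, k)
    )

def coef_standard_error(sigma, design):
    """Standard error of each coefficient for an orthogonal +/-1 design."""
    return sigma / sqrt(len(design))

sigma = 2.0  # hypothetical standard deviation of a single yield measurement
for k in (2, 3):
    design = coded_design(k)
    print(k, len(design), is_orthogonal(design), coef_standard_error(sigma, design))
```

Going from two factors (4 runs) to three factors (8 runs) shrinks the uncertainty on the concentration and temperature coefficients, even though no single condition was literally repeated.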

## Better Office Coffee

This example is borrowed from [`dexpy`](https://hpanderson.github.io/dexpy-pymntos/#/5), a Python module for design of experiments.

Incidentally, Prof. Chris Hendon, a computational chemist at U. Oregon, has worked hard to make better coffee and has won barista awards:
- [Dr. Coffee](https://around.uoregon.edu/drcoffee)
- [Brewing a Great Cup](https://theconversation.com/brewing-a-great-cup-of-coffee-depends-on-chemistry-and-physics-84473)
- [Systematically Improving Espresso: Insights from Mathematical Modeling and Experiment](https://www.sciencedirect.com/science/article/pii/S2590238519304102)
- [Using Chemistry To Get The Perfect Cup Of Coffee](https://www.sciencefriday.com/segments/coffee-chemistry/)

**Why?**

Current office coffee is 👎 "disgusting and unacceptable" 

* What coffee beans to use? (Light vs. Dark roast)
* How much coffee to use?
* How to grind the coffee? (Burr vs. Blade grind, Grind size)
* How long to brew?

So that's *five* factors, or a $2^5$ design, potentially with some center points.

* Amount of Coffee (2.5 to 4.0 oz.) - continuous
* Grind size (8-10 mm) - continuous
* Brew time (3.5 to 4.5 minutes) - continuous
* Grind Type (burr vs blade)
* Coffee beans (light vs dark)

That's a lot of pots of coffee to taste. Even if we have 3 cups per day(!) if it's for the office, we're limited to weekdays (i.e., everyone gets to taste).

So maybe 6 weeks of taste tests?

We can instead use fractional factorial design .. we'll miss out on third order effects (light roast + long brew + a lot of coffee) but it seems okay to ignore that for now.

<img src="../images/fractional-factorial.png" width="150" />

Basically, we'll use **half** the points, so $2^{k-1}$ .. and maybe add a few center points (e.g., 3.25 oz, 9 mm, 4.0 min) to make sure we capture any nonlinearity.

For full factorial design, it's fairly easy to generate the table of things to try .. every combination.
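For instance, the full table for the five coffee factors can be sketched with Python's standard `itertools.product`. The dictionary keys are just illustrative names; the levels are the low / high values listed above:

```python
from itertools import product

# Low and high level for each factor (names are illustrative)
factors = {
    "amount_oz": (2.5, 4.0),
    "grind_size_mm": (8, 10),
    "brew_time_min": (3.5, 4.5),
    "grind_type": ("burr", "blade"),
    "beans": ("light", "dark"),
}

# Every combination of levels: one dict per pot of coffee to brew
full_design = [
    dict(zip(factors, levels))
    for levels in product(*factors.values())
]

print(len(full_design))   # 2**5 = 32 runs
for run in full_design[:3]:
    print(run)
```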

For fractional factorial design, it's often best to either consult pre-built tables or use software that will generate the combinations.
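To give a feel for what that software does, here is a minimal sketch of one standard half-fraction for five factors. It assumes coded −1 / +1 levels and the generator E = ABCD (the fifth column is the product of the first four); this is a common textbook choice, not necessarily what `dexpy` or any particular table uses. With this generator, main effects stay clear of two-factor interactions, while two-factor interactions are confounded with three-factor ones.

```python
from itertools import product
from math import prod

names = ["amount", "grind_size", "brew_time", "grind_type", "beans"]

# Full 2^4 design in the first four factors, coded -1 (low) / +1 (high).
# The fifth column is set by the assumed generator E = ABCD.
half_fraction = [
    dict(zip(names, base + (prod(base),)))
    for base in product((-1, 1), repeat=4)
]

print(len(half_fraction))   # 16 runs instead of 32
for run in half_fraction[:4]:
    print(run)

# Each factor still appears at its low and high level equally often
for name in names:
    assert sum(run[name] for run in half_fraction) == 0
```

Center points for the continuous factors would be added on top of these 16 runs.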

Then, ideally randomize the list (e.g., maybe your first pot of coffee in the morning tastes better because you're tired and craving caffeine?)

### Example Results

```
                   Results: Ordinary least squares
=====================================================================
Model:                OLS               Adj. R-squared:      0.746   
Dependent Variable:   taste_rating      AIC:                 79.5691 
Date:                 2016-11-10 19:52  BIC:                 90.1715 
No. Observations:     24                Log-Likelihood:      -30.785 
Df Model:             8                 F-statistic:         9.438   
Df Residuals:         15                Prob (F-statistic):  0.000123
R-squared:            0.834             Scale:               1.2184  
---------------------------------------------------------------------
                       Coef.  Std.Err.    t    P>|t|   [0.025  0.975]
---------------------------------------------------------------------
Intercept              5.0318   0.2253 22.3328 0.0000  4.5516  5.5121
amount                 0.9731   0.2759  3.5266 0.0031  0.3850  1.5613
grind_size             0.0022   0.2759  0.0078 0.9939 -0.5860  0.5903
brew_time              1.2061   0.2759  4.3709 0.0005  0.6180  1.7943
grind_type            -0.0974   0.2253 -0.4324 0.6716 -0.5777  0.3828
beans                  0.5774   0.2253  2.5628 0.0216  0.0972  1.0577
amount:beans          -1.4820   0.2759 -5.3707 0.0001 -2.0702 -0.8939
grind_size:brew_time   0.3961   0.2759  1.4354 0.1717 -0.1921  0.9843
grind_size:grind_type -0.6927   0.2759 -2.5103 0.0240 -1.2809 -0.1046
---------------------------------------------------------------------
Omnibus:               4.208          Durbin-Watson:            2.190
Prob(Omnibus):         0.122          Jarque-Bera (JB):         1.550
Skew:                  -0.116         Prob(JB):                 0.461
Kurtosis:              1.777          Condition No.:            1    
=====================================================================
```

Notice that grind size basically has no effect on its own in this study. Nor does grind type, although the `grind_size:grind_type` interaction is significant (p = 0.024). (I am skeptical.. I prefer a burr grinder because it produces more even grounds.)

A lot of other effects seem important:
- Increased amount generally improves quality (e.g., too weak)
- Brew time was important (e.g., people rushed to get their coffee = too weak)

What do interactions look like?

<img src="../images/bean-interaction.png" width="450" />

Evidently, it seems like a lot of dark roast comes out too bitter, but a small amount of light roast generates weak taste?
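We can check that reading against the fitted coefficients. The sketch below plugs the `Intercept`, `amount`, `beans`, and `amount:beans` values from the table into the model. It assumes the factors were coded −1 (low) / +1 (high), that `beans = +1` means dark roast, and that all other factors sit at their center (0); none of these codings is stated in the output, so treat them as assumptions.

```python
from itertools import product

# Coefficients copied from the regression table above
coef = {
    "Intercept": 5.0318,
    "amount": 0.9731,
    "beans": 0.5774,
    "amount:beans": -1.4820,
}

def predicted_taste(amount, beans):
    """Predicted rating in coded units (other factors held at 0)."""
    return (
        coef["Intercept"]
        + coef["amount"] * amount
        + coef["beans"] * beans
        + coef["amount:beans"] * amount * beans
    )

# Assumed coding: amount -1 = 2.5 oz, +1 = 4.0 oz; beans -1 = light, +1 = dark
labels = {-1: ("less coffee", "light"), +1: ("more coffee", "dark")}
for amount, beans in product((-1, 1), repeat=2):
    print(labels[amount][0], "+", labels[beans][1], "roast:",
          round(predicted_taste(amount, beans), 2))
```

Under these assumptions, more coffee helps a lot with light roast but slightly hurts with dark roast, and a small amount of light roast gives the lowest predicted rating.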

Maybe a follow-up experiment can use medium roast...

**The key point is that by systematic design of experiments, we could improve our coffee**

## How is Design of Experiments Used?

It's used *everywhere* now.

* Credit card companies study what makes people most likely to take offers
  * Points?
  * Lower rate?
  * Promotional rate?
  * etc.
* Airlines study pricing (e.g., paying for checked bags)
* Websites do "A / B testing" to see which words or which photos work better
* Test kitchens try different tweaks to recipes to make better coffee, etc.
* etc.

Put simply, performing one-variable-at-a-time studies is slower and likely to miss interactions between factors.

## Replication

There are multiple sources of variation / error. I'm going to stick to (nano)materials, since that's my area of expertise, but similar effects show up in simulations, synthesis, analytical chemistry, etc.

* Variation across multiple regions of a sample
* Variation between multiple samples on a single day
* Variation between multiple days
* Variation between multiple students
* etc.

Ideally we'd want to duplicate every experiment.

But instead, we can track sample-to-sample variation (e.g., on a few points) and make sure the effects we see are larger than the variation.

As mentioned above, we get some level of replication through a factorial design because we're studying .. 5 factors but using 16 or 32 experiments.

## Other Things to Consider

When creating a design, it's also worth considering other "hidden" factors that might matter:

- Does the order you do the experiment matter? (e.g., maybe an instrument "warms up" or an AFM tip wears down over time)
- Does the time of day matter? (e.g., maybe you do better work just after lunch or just after drinking coffee)
- Does the day of the week matter?
- Does the humidity matter? (e.g., some experiments are harder in the winter when it's dry)
- Does the temperature matter? (e.g., solvent vapor pressure higher or lower)
- etc.

For this reason, it's often useful to track *everything* and use **randomization** to minimize effects of order. You can also track the order of experiments and add it as a factor to your ANOVA or regression analysis.
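A minimal sketch of randomizing and recording the run order, using the four concentration / temperature runs from earlier. The seed value and the field names are arbitrary choices for illustration:

```python
import random

# The four corner runs: (concentration in mM, temperature in °C)
runs = [(conc, temp) for conc in (0.5, 3.0) for temp in (30, 60)]

# A fixed seed means the same randomized plan can be regenerated later
rng = random.Random(1000)
run_plan = runs.copy()
rng.shuffle(run_plan)

# Record the order so it can be added as a factor in the analysis
schedule = [
    {"run_order": i, "conc_mM": conc, "temp_C": temp}
    for i, (conc, temp) in enumerate(run_plan, start=1)
]

for row in schedule:
    print(row)
```

Keeping `run_order` alongside the results makes it possible to check afterwards whether drift (instrument warm-up, tip wear, a tired experimenter) shows up as an effect.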

-------
This notebook is from Prof. Geoffrey Hutchison, University of Pittsburgh
https://github.com/ghutchis/chem1000

<a rel="license" href="http://creativecommons.org/licenses/by/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by/4.0/88x31.png" /></a>