<a href="https://colab.research.google.com/github/peterbmob/DHMVADoE/blob/main/Excercises/Doe_red_ex.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Screening designs

In this notebook we will create and analyse some two-level screening designs applied to an analytical method, with the aim of evaluating the influence of several instrumental factors on the analytical result. The statistical tools, however, are applicable to any production process; a measurement procedure may be seen as a process with measurement values as the final product and with very much the same quality aspects.


## A ruggedness (robustness) test

We will start with an example of *highly reduced screening designs*, suitable for testing the ruggedness (or robustness) of experimental procedures (in this case an analytical method). The purpose of the test is to investigate a rather large number of possibly influential factors in order to find out how they may affect the result. Sometimes the goal is merely to ensure highly reproducible results; in other cases the test is used for troubleshooting or as a preliminary step for optimization. (An example of troubleshooting in an industrial environment is given in Box, Hunter & Hunter, Section 13.3.)

Our example concerns the production of a hand cream (Calmuril) by a Swedish pharmaceutical company. The active ingredient in the cream is lactic acid, and an analytical method was developed to monitor the amount of lactic acid in the cream. As a part of the method validation, a robustness test was performed to elucidate the effect of some selected experimental factors. Based on the results from the real study, we will now perform a simulated robustness test for the analytical method. The main purpose is to perform a limited number of runs to evaluate the main effects of the selected factors. Another issue to consider is that the analysis of a reference sample (with known concentration of lactic acid) has indicated that the method tends to give somewhat too low values.

The analytical procedure (here somewhat simplified) will be presented in some detail in order to give a feeling for the nature of the experimental factors.
The analytical procedure comprises the following steps:

1. A portion of a homogenized sample of the cream is weighed and diluted into an aqueous test solution.

2. A small volume of the test solution is injected as a plug in the flow of a mobile phase through an analytical separation column.

3. As an option there may be a pre-column prior to the separation column in order to filter out organic impurities from the mobile phase.

4. The chemical compounds in the injected plug of the sample will leave the column at different points of times (retention times) and are recorded as separated peaks by a UV absorbance detector with adjustable wavelength.

5. The area of the lactic acid peak is converted into the amount of lactic acid (g/100 g) by a calibration curve obtained from a set of standard solutions with lactic acid (or the equivalent amount of its salt sodium lactate).

In the study the concentration of lactic acid was determined for the same sample under varying experimental conditions, although within the limits of normal operation for the chromatographic system. Seven factors were selected for the investigation; four quantitative factors (which can be varied continuously) and three qualitative factors (with distinct alternatives). The factors and the two levels selected for the experiment are given in the table below.

|Factor | Low level (-) | High level (+) | Type of factor |
|---| ---| ----| ---|  
|A: Acid concentration in mobile phase |  0.001 M |  0.005 M |  quantitative |
|B: Flow rate |0.6 ml/min | 0.8 ml/min | quantitative |
|C: Injection volume | 25 $\mu$l |50 $\mu$l | quantitative |
|D: Detection wavelength | 206 nm | 214 nm | quantitative |
|E: Pre-column |Not connected | Connected |qualitative |
|F: Calibration substance | Lactic acid | Sodium lactate | qualitative |
|G: Dilution of standard/sample | Water | Mobile phase | qualitative   |



With seven factors it is possible to evaluate all main effects with only eight runs (a saturated design). Even though the analytical procedure is rather time-consuming (almost one hour per run), the whole experiment could be completed in one working day. This seems tempting, especially since we have to clean the column and prepare a fresh mobile phase solution each day. But without prior knowledge of the standard deviation of the experimental errors, it is difficult to draw valid conclusions from such a minimal experimental design. You could use a normal probability plot of the effects to assess the influence of mere chance, but this works only if you have more "random effects" than real effects. It is *always wise to add replicates*, and the easy way is to replicate the whole design. Since we have time for only eight experiments per day, we have to "bite the bullet" and spend two working days. The straightforward way is to repeat the eight runs of the experiment on another day (randomized within each day), but then we should treat the day as a block factor (we have, for example, a new preparation of mobile phase). Thus, let us make a saturated $2^{7-4}$ experiment with replicates in another block.
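The design described above can be sketched as follows. This is only an illustration, not necessarily how the design is built later in the notebook: the generators D = AB, E = AC, F = BC and G = ABC are an assumed (common textbook) choice, and the names `design`, `plan` and `Day` are hypothetical. The eight runs are replicated once per day and the run order is randomized within each day.

```python
import itertools

import numpy as np
import pandas as pd

# Base 2^3 full factorial in A, B and C (eight runs, coded -1/+1).
base = pd.DataFrame(list(itertools.product([-1, 1], repeat=3)),
                    columns=['A', 'B', 'C'])

# Assumed generators for the remaining four factors (hypothetical choice):
# D = AB, E = AC, F = BC, G = ABC.
design = base.copy()
design['D'] = design['A'] * design['B']
design['E'] = design['A'] * design['C']
design['F'] = design['B'] * design['C']
design['G'] = design['A'] * design['B'] * design['C']

# Replicate the eight runs on two days (blocks) and randomize the
# run order separately within each day.
rng = np.random.default_rng(0)  # fixed seed only to make the sketch reproducible
blocks = []
for day in (1, 2):
    order = rng.permutation(len(design))
    block = design.iloc[order].reset_index(drop=True)
    block.insert(0, 'Day', day)
    blocks.append(block)

plan = pd.concat(blocks, ignore_index=True)
print(plan)
```

Because all seven columns are mutually orthogonal, each main effect can be estimated independently of the others, but in a saturated design every main effect is confounded with two-factor interactions.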

In [1]:
import pandas as pd
import numpy as np
from numpy.random import rand

In [13]:
inputs_labels = {'A' : 'Acid',
                 'B' : 'Flow',
                 'C' : 'Inj. vol.',
                 'D' : 'Det. w.l.',
                 'E' : 'Pre-col',
                 'F' : 'Std subst.',
                 'G' : 'Dilution'}

dat = [('A',1,5),
       ('B',0.6,0.8),
       ('C',25,50),
       ('D',206,214),
       ('E','without','with'),
       ('F','Lactic Acid','Na Lactate'),
       ('G','Water', 'Mobile Phase')]

inputs_df = pd.DataFrame(dat,columns=['index','low','high'])
inputs_df = inputs_df.set_index(['index'])
inputs_df['label'] = inputs_df.index.map( lambda z : inputs_labels[z] )

inputs_df

| index | low | high | label |
|---|---|---|---|
| A | 1 | 5 | Acid |
| B | 0.6 | 0.8 | Flow |
| C | 25 | 50 | Inj. vol. |
| D | 206 | 214 | Det. w.l. |
| E | without | with | Pre-col |
| F | Lactic Acid | Na Lactate | Std subst. |
| G | Water | Mobile Phase | Dilution |


In [16]:
# Mid-point and half-range of each quantitative factor (rows A to D)
inputs_df['average'] = inputs_df['A':'D'].apply( lambda z : ( z['high'] + z['low'])/2 , axis=1)
inputs_df['span'] = inputs_df['A':'D'].apply( lambda z : ( z['high'] - z['low'])/2 , axis=1)

# Coded levels: low -> -1, high -> +1
inputs_df['encoded_low'] = inputs_df['A':'D'].apply( lambda z : ( z['low']  - z['average'] )/( z['span'] ), axis=1)
inputs_df['encoded_high'] = inputs_df['A':'D'].apply( lambda z : ( z['high'] - z['average'] )/( z['span'] ), axis=1)

# The helper columns are no longer needed
inputs_df = inputs_df.drop(['average','span'],axis=1)

# Qualitative factors (rows E to G) have no numeric levels; assign -1/+1 directly
inputs_df['encoded_low']=inputs_df['encoded_low'].fillna(-1)
inputs_df['encoded_high']=inputs_df['encoded_high'].fillna(+1)

inputs_df

| index | low | high | label | encoded_low | encoded_high |
|---|---|---|---|---|---|
| A | 1 | 5 | Acid | -1.0 | 1.0 |
| B | 0.6 | 0.8 | Flow | -1.0 | 1.0 |
| C | 25 | 50 | Inj. vol. | -1.0 | 1.0 |
| D | 206 | 214 | Det. w.l. | -1.0 | 1.0 |
| E | without | with | Pre-col | -1.0 | 1.0 |
| F | Lactic Acid | Na Lactate | Std subst. | -1.0 | 1.0 |
| G | Water | Mobile Phase | Dilution | -1.0 | 1.0 |
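The coding used for the quantitative factors is $x = (\text{value} - \text{average}) / \text{span}$, where the span is half the distance between the high and low levels. As a minimal sketch, the same relation can be wrapped in two small helpers that convert between real settings and coded units. The names `levels`, `encode` and `decode` are hypothetical and not part of the notebook; the levels are copied from the factor table (acid concentration in mM).

```python
import pandas as pd

# Low and high levels of the four quantitative factors (from the factor table).
levels = pd.DataFrame(
    {'low': [1, 0.6, 25, 206], 'high': [5, 0.8, 50, 214]},
    index=['A', 'B', 'C', 'D'],
)

center = (levels['high'] + levels['low']) / 2      # mid-point of each factor
half_range = (levels['high'] - levels['low']) / 2  # the "span" used for coding


def encode(factor, value):
    """Convert a real setting to coded units (-1 at low, +1 at high)."""
    return (value - center[factor]) / half_range[factor]


def decode(factor, coded):
    """Convert a coded level back to the real setting."""
    return center[factor] + coded * half_range[factor]


print(encode('B', 0.6))  # low flow rate in coded units
print(decode('D', 1))    # high detection wavelength in nm
print(decode('C', 0))    # centre of the injection-volume range in µl
```

For the flow rate, for example, the average is 0.7 ml/min and the span is 0.1 ml/min, so the low level 0.6 ml/min is coded as $(0.6 - 0.7)/0.1 = -1$.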


In [18]:
import itertools
# Full 2^7 factorial: all 128 combinations of the seven two-level factors
encoded_inputs = list(itertools.product([-1, 1], repeat=7))
encoded_inputs


[(-1, -1, -1, -1, -1, -1, -1),
 (-1, -1, -1, -1, -1, -1, 1),
 (-1, -1, -1, -1, -1, 1, -1),
 (-1, -1, -1, -1, -1, 1, 1),
 (-1, -1, -1, -1, 1, -1, -1),
 (-1, -1, -1, -1, 1, -1, 1),
 (-1, -1, -1, -1, 1, 1, -1),
 (-1, -1, -1, -1, 1, 1, 1),
 (-1, -1, -1, 1, -1, -1, -1),
 (-1, -1, -1, 1, -1, -1, 1),
 (-1, -1, -1, 1, -1, 1, -1),
 (-1, -1, -1, 1, -1, 1, 1),
 (-1, -1, -1, 1, 1, -1, -1),
 (-1, -1, -1, 1, 1, -1, 1),
 (-1, -1, -1, 1, 1, 1, -1),
 (-1, -1, -1, 1, 1, 1, 1),
 (-1, -1, 1, -1, -1, -1, -1),
 (-1, -1, 1, -1, -1, -1, 1),
 (-1, -1, 1, -1, -1, 1, -1),
 (-1, -1, 1, -1, -1, 1, 1),
 (-1, -1, 1, -1, 1, -1, -1),
 (-1, -1, 1, -1, 1, -1, 1),
 (-1, -1, 1, -1, 1, 1, -1),
 (-1, -1, 1, -1, 1, 1, 1),
 (-1, -1, 1, 1, -1, -1, -1),
 (-1, -1, 1, 1, -1, -1, 1),
 (-1, -1, 1, 1, -1, 1, -1),
 (-1, -1, 1, 1, -1, 1, 1),
 (-1, -1, 1, 1, 1, -1, -1),
 (-1, -1, 1, 1, 1, -1, 1),
 (-1, -1, 1, 1, 1, 1, -1),
 (-1, -1, 1, 1, 1, 1, 1),
 (-1, 1, -1, -1, -1, -1, -1),
 (-1, 1, -1, -1, -1, -1, 1),
 (-1, 1, -1, -1, -1, 1, -1),
