## Case Studies
This notebook contains case studies illustrating the `SyntheticData` class in `synthetic.py`.

JDL / Data Delve LLC, August 2023

In [22]:
import pandas as pd
import numpy as np
from synthetic import SyntheticData

### Case Study 1 - Readme.md Example
A designed experiment will consist of making five batches. Each batch will be sampled at n=3 within-batch locations, referred to as "Begin", "Middle" and "End", based on sampling during batch pumpout. Each sample will be measured for viscosity n=2 times.

To simulate the experiment prior to running it, the grand mean viscosity is assumed to be 400 centipoise (cps). Random batch-to-batch variation (due to unknown causes) is expected to be about 2% of the mean, based on prior industrial experience with similar products. Similarly, within-batch variability is expected to be about 3% of the mean. The viscosity lab method is relatively noisy for this product type, at 5% of the mean.
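The percentages above can be converted to absolute units with a few lines of arithmetic. The sketch below assumes that each percentage is a standard deviation expressed relative to the grand mean and that the three sources are independent, so their variances add. Neither assumption is stated in the text, and the variable names are illustrative only.

```python
import math

xbarbar = 400  # grand mean viscosity in cps, from the text above

# Variability of each source as a fraction of the grand mean (from the text above)
var_fracs = {'batch': 0.02, 'within batch': 0.03, 'lab': 0.05}

# Assumption: each fraction is a standard deviation relative to the grand mean
sd_cps = {name: frac * xbarbar for name, frac in var_fracs.items()}

# Assumption: the sources are independent, so their variances add
total_sd_cps = math.sqrt(sum(sd ** 2 for sd in sd_cps.values()))

for name, sd in sd_cps.items():
    print(f"{name}: {sd:.1f} cps")
print(f"combined: {total_sd_cps:.1f} cps")
```

Under these assumptions the lab method (about 20 cps) contributes more variance than the batch (about 8 cps) and within-batch (about 12 cps) sources combined, which is one reason to consider repeat measurements per sample.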

The five batches will have incremental formula variations of 2.3% to 2.7% of ingredient x, which was modeled in the lab to cause a linear increase in viscosity from -20% of the grand mean (320 cps) to +20% of the grand mean (480 cps).

Additionally, viscosity is known to drift by 10% from Begin to End of batch pumpout. This is modeled below as a decline from +5% of the grand mean at Begin to -5% at End.
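Before adding any random variation, it can help to tabulate the expected viscosity for each batch and sampling location. The sketch below assumes the batch and location effects combine additively as fractions of the grand mean. That is an assumption made for illustration and may not match how `SyntheticData` combines them. The effect values are the ones used in the configuration cells of this case study.

```python
import pandas as pd

xbarbar = 400  # grand mean viscosity (cps)

# Fixed effects as fractions of the grand mean
batch_effects = {'Batch A': -0.20, 'Batch B': -0.10, 'Batch C': 0.00,
                 'Batch D': 0.10, 'Batch E': 0.20}
location_effects = {'Begin': 0.05, 'Middle': 0.00, 'End': -0.05}

# Assumption: effects are additive, so expected value = mean * (1 + batch + location)
expected = pd.DataFrame(
    {loc: {batch: xbarbar * (1 + b_eff + l_eff)
           for batch, b_eff in batch_effects.items()}
     for loc, l_eff in location_effects.items()}
)

print(expected.round(1))
```

Rows are batches and columns are sampling locations. Simulated values can then be compared against this grid to see how much of the spread comes from the random sources.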

In [23]:
# Specify measurement grand mean (viscosity in centipoise)
xbarbar = 400
sig_digits = 1            # passed to SyntheticData as `digits`
meas = 'viscosity_cps'    # name of the measurement column

# Define degrees of freedom, their number of levels and their variability as fraction of mean
nms_dof = ['batch', 'within batch', 'lab']
n_levels_dof = dict(zip(nms_dof, [5, 3, 2]))
var_fracs_dof = dict(zip(nms_dof, [0.02, 0.03, 0.05]))

In [24]:
# User-specified level names, and level effects calculated from the formulation model (as fraction of mean)
lvl_vals = {'batch': ['Batch A', 'Batch B', 'Batch C', 'Batch D', 'Batch E'],
            'within batch': ['Begin', 'Middle', 'End']}

# Alternative with the batch effect only:
# lvl_effects = {'batch': [-0.20, -0.10, 0.00, 0.10, 0.20]}
lvl_effects = {'batch': [-0.20, -0.10, 0.00, 0.10, 0.20],
               'within batch': [0.05, 0.0, -0.05]}

nms_dof, n_levels_dof, var_fracs_dof, lvl_vals, lvl_effects

(['batch', 'within batch', 'lab'],
 {'batch': 5, 'within batch': 3, 'lab': 2},
 {'batch': 0.02, 'within batch': 0.03, 'lab': 0.05},
 {'batch': ['Batch A', 'Batch B', 'Batch C', 'Batch D', 'Batch E'],
  'within batch': ['Begin', 'Middle', 'End']},
 {'batch': [-0.2, -0.1, 0.0, 0.1, 0.2], 'within batch': [0.05, 0.0, -0.05]})
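Because the design is spread across several dictionaries, a quick consistency check can catch a mismatch between the number of levels and the number of names or effects supplied. The helper below, `check_design`, is hypothetical and not part of `synthetic.py`. It is a minimal sketch that only compares list lengths and keys.

```python
def check_design(names, n_levels, var_fracs, lvl_vals, lvl_effects):
    """Return a list of inconsistencies found in the design dictionaries."""
    problems = []
    # Every named source needs a level count and a variability fraction
    for name in names:
        if name not in n_levels:
            problems.append(f"no level count for '{name}'")
        if name not in var_fracs:
            problems.append(f"no variability fraction for '{name}'")
    # Level names and effects, where given, must match the level count
    for label, mapping in (('lvl_vals', lvl_vals), ('lvl_effects', lvl_effects)):
        for name, values in mapping.items():
            if name not in n_levels:
                problems.append(f"{label} has unknown source '{name}'")
            elif len(values) != n_levels[name]:
                problems.append(
                    f"{label}['{name}'] has {len(values)} entries, "
                    f"expected {n_levels[name]}")
    return problems


nms_dof = ['batch', 'within batch', 'lab']
n_levels_dof = dict(zip(nms_dof, [5, 3, 2]))
var_fracs_dof = dict(zip(nms_dof, [0.02, 0.03, 0.05]))
lvl_vals = {'batch': ['Batch A', 'Batch B', 'Batch C', 'Batch D', 'Batch E'],
            'within batch': ['Begin', 'Middle', 'End']}
lvl_effects = {'batch': [-0.20, -0.10, 0.00, 0.10, 0.20],
               'within batch': [0.05, 0.0, -0.05]}

problems = check_design(nms_dof, n_levels_dof, var_fracs_dof, lvl_vals, lvl_effects)
print(problems or "design dictionaries are consistent")
```

An empty list means the dictionaries agree with each other. It says nothing about whether `SyntheticData` accepts them.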

In [25]:
expt = SyntheticData(10000, 
                     xbarbar=xbarbar, 
                     names=nms_dof, 
                     n_levels=n_levels_dof,
                     var_fracs=var_fracs_dof,
                     lvl_val_names=lvl_vals,
                     lvl_val_effects=lvl_effects,
                     digits=sig_digits,
                     meas_nm=meas)
expt.create_experiment_procedure()
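For readers who want intuition for what a nested simulation of this kind involves, the sketch below builds a comparable data set with NumPy. It is not the `SyntheticData` implementation, which is not shown in this notebook. The function `simulate_nested`, its `seed` argument, the normal distributions and the additive combination of effects are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def simulate_nested(xbarbar, n_levels, var_fracs, lvl_effects, seed=0):
    """Simulate batch > within-batch > lab measurements (illustrative only)."""
    rng = np.random.default_rng(seed)
    n_b = n_levels['batch']
    n_w = n_levels['within batch']
    n_l = n_levels['lab']

    # Fixed effects as fractions of the mean (zero when none are given)
    fixed_b = np.asarray(lvl_effects.get('batch', np.zeros(n_b)))
    fixed_w = np.asarray(lvl_effects.get('within batch', np.zeros(n_w)))

    # Assumption: random effects are normal with SD equal to the given fraction
    rand_b = rng.normal(0, var_fracs['batch'], size=n_b)
    rand_w = rng.normal(0, var_fracs['within batch'], size=(n_b, n_w))
    rand_l = rng.normal(0, var_fracs['lab'], size=(n_b, n_w, n_l))

    # Assumption: all effects add; broadcast to shape (batch, location, lab)
    frac = ((fixed_b + rand_b)[:, None, None]
            + (fixed_w[None, :] + rand_w)[:, :, None]
            + rand_l)
    values = xbarbar * (1 + frac)

    # Flatten to one row per measurement, with zero-based batch/location indices
    b_idx, w_idx, l_idx = np.indices(values.shape)
    return pd.DataFrame({'batch': b_idx.ravel(),
                         'within batch': w_idx.ravel(),
                         'level_lab': l_idx.ravel() + 1,
                         'value': values.ravel().round(1)})


sim = simulate_nested(
    400,
    {'batch': 5, 'within batch': 3, 'lab': 2},
    {'batch': 0.02, 'within batch': 0.03, 'lab': 0.05},
    {'batch': [-0.20, -0.10, 0.00, 0.10, 0.20],
     'within batch': [0.05, 0.0, -0.05]},
)
print(sim.head())
```

Each batch draws one random offset shared by all of its samples, each location draws one offset shared by its lab measurements, and each measurement adds its own lab noise. That sharing is what makes the design nested rather than a set of independent draws.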

In [26]:
keep = ['batch', 'within batch', 'level_lab', meas]
expt.df_expt[keep]

| | batch | within batch | level_lab | viscosity_cps |
|---|---|---|---|---|
| 0 | Batch A | Begin | 1 | 327.9 |
| 1 | Batch A | Begin | 2 | 311.6 |
| 2 | Batch A | Begin | 3 | 322.3 |
| 3 | Batch A | Begin | 4 | 334.0 |
| 4 | Batch A | Middle | 1 | 303.3 |
| 5 | Batch A | Middle | 2 | 306.7 |
| 6 | Batch A | Middle | 3 | 304.0 |
| 7 | Batch A | Middle | 4 | 326.6 |
| 8 | Batch A | End | 1 | 261.3 |
| 9 | Batch A | End | 2 | 280.4 |
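A first look at results like these is usually a summary by batch and sampling location. The sketch below rebuilds the ten rows displayed above as a small DataFrame and computes the count and mean per location with a pandas `groupby`. In the notebook itself the same call could be applied to `expt.df_expt[keep]` directly.

```python
import pandas as pd

# The ten rows displayed above, re-entered by hand
rows = [
    ('Batch A', 'Begin', 1, 327.9),
    ('Batch A', 'Begin', 2, 311.6),
    ('Batch A', 'Begin', 3, 322.3),
    ('Batch A', 'Begin', 4, 334.0),
    ('Batch A', 'Middle', 1, 303.3),
    ('Batch A', 'Middle', 2, 306.7),
    ('Batch A', 'Middle', 3, 304.0),
    ('Batch A', 'Middle', 4, 326.6),
    ('Batch A', 'End', 1, 261.3),
    ('Batch A', 'End', 2, 280.4),
]
shown = pd.DataFrame(
    rows, columns=['batch', 'within batch', 'level_lab', 'viscosity_cps'])

# sort=False keeps the Begin, Middle, End order instead of sorting alphabetically
loc_means = (shown
             .groupby(['batch', 'within batch'], sort=False)['viscosity_cps']
             .agg(['count', 'mean']))
print(loc_means.round(2))
```

The End location has only two rows here because the displayed output stops after ten rows, so its mean is based on fewer measurements than the other two.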


In [27]:
# Output to Excel -- add a simulation run label to track results from multiple sims
df = expt.df_expt[keep].copy()
df['sim_run'] = '5-3-2 Sampling'
df.to_excel('case_study1.xlsx', index=False)
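The `sim_run` label becomes useful once several simulations are stacked into one table. The sketch below shows one way to combine labeled runs and summarize them per run. The helper `stand_in_run`, the second label '5-3-4 Sampling' and the random values are placeholders invented for illustration. They are not output from `SyntheticData`; real frames would come from `expt.df_expt`.

```python
import numpy as np
import pandas as pd


def stand_in_run(n_rows, label, seed):
    """Build a placeholder results frame with a sim_run label (illustrative only)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame(
        {'viscosity_cps': rng.normal(400, 25, size=n_rows).round(1)})
    df['sim_run'] = label
    return df


# Two hypothetical runs with different numbers of measurements
runs = [stand_in_run(30, '5-3-2 Sampling', seed=1),
        stand_in_run(60, '5-3-4 Sampling', seed=2)]

# Stack the runs and summarize each one by its label
combined = pd.concat(runs, ignore_index=True)
summary = combined.groupby('sim_run')['viscosity_cps'].agg(['count', 'mean', 'std'])
print(summary)
```

Keeping the label as a column rather than in the file name means all runs can live in one sheet and be filtered or pivoted later.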