Tablet formulation development is expensive because each wet-lab experiment consumes time and material.
With interacting excipients and hard constraints (mass balance plus bounds), naive screening methods such as one-factor-at-a-time (OFAT) experimentation or brute-force grids waste runs and miss high-performing regions.
This project answers one core question: How do we find good formulations faster, with fewer experiments, under realistic formulation constraints?
Full video (MP4): bo_explainer.mp4
Q45 is the percentage of drug dissolved at 45 minutes (0-100%).
In general form:

$$
Q_{45} = 100 \times \frac{\text{drug dissolved at 45 min}}{\text{labeled drug amount}}
$$
In this repo, each experiment returns a Q45 value (from the synthetic simulator), and optimization aims to maximize that value under formulation constraints.
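The Q45 definition can be written as a one-line helper. This is a minimal sketch; the function name `q45_percent` and the 300 mg label claim in the example are hypothetical and not taken from the repo.

```python
def q45_percent(dissolved_mg: float, labeled_mg: float) -> float:
    """Q45 = 100 * (drug dissolved at 45 min) / (labeled drug amount)."""
    if labeled_mg <= 0:
        raise ValueError("labeled drug amount must be positive")
    return 100.0 * dissolved_mg / labeled_mg


# Hypothetical example: 255 mg dissolved at 45 min from a 300 mg label claim
print(q45_percent(255.0, 300.0))  # 85.0
```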
This repository implements a model-based DoE and Bayesian optimization pipeline for pharmaceutical formulation design.
Gaussian Process (GP) surrogate models guide sample-efficient exploration of the constrained excipient space.
- Single-objective BO: optimize `Q45` using a GP + EI/LogEI-style acquisition (notebooks + Streamlit app)
- Multi-objective BO: optimize trade-offs among `Q45`, `hardness`, and `friability` using `qNEHVI` (Notebook 4)
- Interactive Streamlit dashboard for design-space exploration, experiment logging, and convergence/strategy comparison
We use a model-based DoE loop with Bayesian Optimization (BO):
- Fit a probabilistic surrogate (Gaussian Process) on observed formulation outcomes.
- Quantify uncertainty over unexplored regions.
- Select the next experiment using an acquisition function (Expected Improvement).
- Update the model with the new result and repeat.
In short: we replace blind search with uncertainty-aware sequential experimentation.
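The four-step loop above can be sketched with NumPy alone. This is an illustrative sketch, not the repo's implementation: it assumes a hypothetical 1-D objective (`toy_objective`) standing in for the synthetic simulator, an RBF-kernel GP with fixed (not fitted) hyperparameters, and closed-form Expected Improvement for maximization evaluated on a discrete candidate grid.

```python
import math

import numpy as np


def rbf_kernel(a, b, length_scale=0.2, variance=1.0):
    # Squared-exponential kernel between row sets a (n, d) and b (m, d)
    sq = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2.0 * a @ b.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)


def gp_posterior(x_train, y_train, x_test, noise=1e-4):
    # Exact GP regression (zero prior mean) on standardized targets
    mu_y, sd_y = y_train.mean(), y_train.std() + 1e-12
    y = (y_train - mu_y) / sd_y
    k = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    l = np.linalg.cholesky(k)
    alpha = np.linalg.solve(l.T, np.linalg.solve(l, y))
    k_s = rbf_kernel(x_train, x_test)
    mean = k_s.T @ alpha
    v = np.linalg.solve(l, k_s)
    var = np.maximum(1.0 - np.sum(v**2, 0), 1e-12)  # prior variance is 1.0
    return mean * sd_y + mu_y, np.sqrt(var) * sd_y


def expected_improvement(mean, std, best):
    # Closed-form EI for maximization: (mu - best) * Phi(z) + sigma * phi(z)
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf


def toy_objective(x):
    # Hypothetical stand-in for one (noise-free) experiment
    return float(np.sin(6.0 * x) + 0.5 * x)


rng = np.random.default_rng(0)
x_obs = rng.uniform(0.0, 1.0, size=(4, 1))        # initial design
y_obs = np.array([toy_objective(x[0]) for x in x_obs])
candidates = np.linspace(0.0, 1.0, 201)[:, None]   # discretized search space

for _ in range(10):
    mean, std = gp_posterior(x_obs, y_obs, candidates)   # fit surrogate, quantify uncertainty
    ei = expected_improvement(mean, std, y_obs.max())    # score every candidate
    x_next = candidates[np.argmax(ei)]                   # select the next experiment
    y_next = toy_objective(x_next[0])                    # "run" it
    x_obs = np.vstack([x_obs, x_next])                   # update the data and repeat
    y_obs = np.append(y_obs, y_next)

print(f"best x={x_obs[np.argmax(y_obs), 0]:.3f}, best y={y_obs.max():.3f}")
```

In practice a GP library would also fit the kernel length scale and noise level at each iteration; they are fixed here only to keep the sketch short.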
- Fixed API: 30% w/w
- Excipients sum to 70% w/w, with bounds:
  - HPMC: 0-20%
  - MCC: 20-60%
  - CCS: 1-8%
  - MgSt: 0.25-2%
  - PVP K30: derived by mass balance, constrained to 0-10%
- Primary objective: maximize `Q45` (dissolution at 45 min)
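The constraint set above can be encoded as a feasibility check in which PVP K30 is computed from the other four excipients. The sketch below also enumerates a 2-level full factorial at the bound corners of HPMC, MCC, CCS, and MgSt to show how few corners survive the mass-balance constraint. The helper names are hypothetical, and the corner design is an assumption; the baseline notebook may construct its factorial differently.

```python
from itertools import product

API = 30.0              # fixed API, % w/w
EXCIPIENT_TOTAL = 70.0  # excipients must sum to 70% w/w
BOUNDS = {              # % w/w bounds for the four free excipients
    "HPMC": (0.0, 20.0),
    "MCC": (20.0, 60.0),
    "CCS": (1.0, 8.0),
    "MgSt": (0.25, 2.0),
}
PVP_BOUNDS = (0.0, 10.0)


def complete_formulation(hpmc, mcc, ccs, mgst):
    """Derive PVP K30 by mass balance and return the full composition (% w/w)."""
    pvp = EXCIPIENT_TOTAL - (hpmc + mcc + ccs + mgst)
    return {"API": API, "HPMC": hpmc, "MCC": mcc, "CCS": ccs, "MgSt": mgst, "PVP K30": pvp}


def is_feasible(hpmc, mcc, ccs, mgst, tol=1e-9):
    """Check the per-excipient bounds and the derived PVP K30 bound."""
    values = {"HPMC": hpmc, "MCC": mcc, "CCS": ccs, "MgSt": mgst}
    in_bounds = all(lo - tol <= values[k] <= hi + tol for k, (lo, hi) in BOUNDS.items())
    pvp = complete_formulation(hpmc, mcc, ccs, mgst)["PVP K30"]
    return in_bounds and PVP_BOUNDS[0] - tol <= pvp <= PVP_BOUNDS[1] + tol


# 2-level full factorial over the bound corners of the four free excipients
corners = list(product(*BOUNDS.values()))
feasible = [c for c in corners if is_feasible(*c)]
print(f"{len(feasible)} of {len(corners)} factorial corners are feasible")  # 4 of 16
```

Only the corners with HPMC at 0% and MCC at 60% leave a PVP K30 fraction inside 0-10%, which is why constrained designs (such as a D-optimal proxy) are needed instead of a plain factorial.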
- `notebooks/01_doe_baseline.ipynb`: shows the feasibility collapse of a factorial design under the constraints, then a constrained D-optimal proxy and a CCD-style baseline, then a quadratic RSM.
- `notebooks/02_bayesian_optimization.ipynb`: runs BO end to end (initial design + sequential suggestions), with convergence, posterior, and acquisition plots.
- `notebooks/03_comparison.ipynb`: compares BO against random, grid, and RSM-guided search under an equal evaluation budget.
- `notebooks/04_multiobj_bo.ipynb`: uses qNEHVI to jointly optimize dissolution (`Q45`), hardness, and friability via Pareto trade-offs.
- `app/streamlit_app.py`: explore the design space, run one-step BO suggestions, log experiments, view convergence, and compare strategies.
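Convergence plots and equal-budget strategy comparisons usually rest on the best-so-far trace (running maximum of the observed objective). The sketch below computes that trace and the number of runs needed to reach a target Q45. The helper names and the two Q45 sequences are made up for illustration; they are not results from the repo.

```python
from itertools import accumulate


def best_so_far(q45_values):
    """Running maximum of observed Q45: the usual convergence trace."""
    return list(accumulate(q45_values, max))


def runs_to_target(q45_values, target):
    """1-based index of the first run whose best-so-far reaches `target`, else None."""
    for i, best in enumerate(best_so_far(q45_values), start=1):
        if best >= target:
            return i
    return None


# Hypothetical Q45 sequences from two strategies under the same 6-run budget
strategy_a = [62.0, 71.5, 70.2, 84.1, 86.3, 85.9]
strategy_b = [65.3, 60.8, 74.0, 72.2, 78.9, 80.4]
print(best_so_far(strategy_a))  # [62.0, 71.5, 71.5, 84.1, 86.3, 86.3]
print(runs_to_target(strategy_a, 85.0), runs_to_target(strategy_b, 85.0))  # 5 None
```

Because experiments are noisy, comparisons of this kind are typically averaged over several random seeds per strategy rather than read from a single run.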
```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

Run notebooks:

```bash
jupyter lab
```

Run app:

```bash
streamlit run app/streamlit_app.py
```

- Maps directly to model-based DoE under constraints.
- Uses GP uncertainty for design space characterization.
- Prioritizes runs via acquisition for systematic experiment planning.
- Supports multi-CQA trade-offs via Pareto fronts.
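A Pareto front over several CQAs keeps only the non-dominated outcomes. The sketch below filters hypothetical (Q45, hardness, friability) tuples, treating Q45 and hardness as maximized and friability as minimized (that direction is an assumption). It shows only the dominance filter; qNEHVI itself, which selects new runs by expected hypervolume improvement over such a front, is not reimplemented here, and the helper names are hypothetical.

```python
def dominates(a, b):
    """True if a Pareto-dominates b for (Q45 max, hardness max, friability min)."""
    a_vec = (a[0], a[1], -a[2])  # negate friability so every objective is maximized
    b_vec = (b[0], b[1], -b[2])
    no_worse = all(x >= y for x, y in zip(a_vec, b_vec))
    strictly_better = any(x > y for x, y in zip(a_vec, b_vec))
    return no_worse and strictly_better


def pareto_front(points):
    """Return the non-dominated subset of (Q45, hardness, friability) tuples."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical outcomes: (Q45, hardness, friability)
outcomes = [
    (88.0, 7.5, 0.6),
    (82.0, 9.0, 0.4),
    (80.0, 7.0, 0.7),  # dominated by the first point
    (91.0, 6.0, 0.9),
]
print(pareto_front(outcomes))
```

A point never dominates itself (the strict-improvement check fails), so no self-exclusion is needed inside `pareto_front`.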
- Uses a physics-informed synthetic simulator (not proprietary lab data).
- Adds Gaussian observation noise to mimic experimental variability.
- Demonstrates the method on a 5-variable space; real programs may include more CQAs, process factors, or categorical variables.
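Gaussian observation noise can be added on top of any noise-free response, which is also why the GP surrogate needs a noise term. This is a minimal sketch under stated assumptions: the helper `observe`, the default noise standard deviation of 2.0 Q45 points, and clipping to the 0-100% range are all hypothetical choices, not the repo's simulator settings.

```python
import numpy as np


def observe(true_q45, noise_sd=2.0, rng=None):
    """Add Gaussian observation noise to a noise-free Q45, then clip to [0, 100]."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = true_q45 + rng.normal(0.0, noise_sd)
    return float(np.clip(noisy, 0.0, 100.0))


# Five hypothetical replicate measurements of a formulation whose true Q45 is 80%
rng = np.random.default_rng(42)
replicates = [observe(80.0, noise_sd=2.0, rng=rng) for _ in range(5)]
print([round(r, 1) for r in replicates])
```

Passing an explicit `Generator` keeps simulated campaigns reproducible, which matters when comparing strategies under the same budget.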


