Bayesian Optimization for pharmaceutical tablet formulation

Problem statement

Tablet formulation development is expensive because each wet-lab experiment consumes time and material.
With interacting excipients and hard constraints (mass balance + bounds), naive screening methods like OFAT or brute-force grids waste runs and miss high-performing regions.

This project answers one core question: How do we find good formulations faster, with fewer experiments, under realistic formulation constraints?

Explainer (60-second animated overview)

Full video (MP4): bo_explainer.mp4

What is Q45?

Q45 is the percentage of drug dissolved at 45 minutes (0-100%).
In general form:

[ Q45 = 100 \times \frac{\text{drug dissolved at 45 min}}{\text{labeled drug amount}} ]

In this repo, each experiment returns a Q45 value (from the synthetic simulator), and optimization aims to maximize that value under formulation constraints.

Project description

This repository implements a model-based DoE and Bayesian optimization pipeline for pharmaceutical formulation design.
Gaussian Process surrogate models guide sample-efficient exploration of constrained excipient space.

Single-objective BO: optimize Q45 using GP + EI/LogEI style acquisition (notebooks + Streamlit app)
Multi-objective BO: optimize trade-offs among Q45, hardness, and friability using qNEHVI (Notebook 4)
Interactive Streamlit dashboard for design-space exploration, experiment logging, and convergence/strategy comparison

Visual overview

Optimization workflow

Headline benchmark result

Our solution

We use a model-based DoE loop with Bayesian Optimization (BO):

Fit a probabilistic surrogate (Gaussian Process) on observed formulation outcomes.
Quantify uncertainty over unexplored regions.
Select the next experiment using an acquisition function (Expected Improvement).
Update the model with the new result and repeat.

In short: we replace blind search with uncertainty-aware sequential experimentation.

Formulation context

Fixed API: 30% w/w
Excipients sum to 70% w/w with bounds:
- HPMC: 0-20%
- MCC: 20-60%
- CCS: 1-8%
- MgSt: 0.25-2%
- PVP K30: derived by mass balance, constrained to 0-10%
Primary objective: maximize Q45 (dissolution at 45 min)

How we solve it in this repo

1) Baseline DoE (why naive design is insufficient)

notebooks/01_doe_baseline.ipynb
Shows factorial feasibility collapse under constraints, then constrained D-opt proxy + CCD-style baseline, then quadratic RSM.

2) Core BO loop

notebooks/02_bayesian_optimization.ipynb
Runs BO end-to-end (init design + sequential suggestions), with convergence, posterior, and acquisition plots.

3) Benchmark vs alternatives

notebooks/03_comparison.ipynb
Compares BO against random/grid/RSM-guided search under equal evaluation budget.

4) Multi-objective BO extension

notebooks/04_multiobj_bo.ipynb
Uses qNEHVI to jointly optimize dissolution (Q45), hardness, and friability via Pareto trade-offs.

5) Interactive app for scientists

app/streamlit_app.py
Explore design space, run one-step BO suggestions, log experiments, view convergence, and compare strategies.

Quick start

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Run notebooks:

jupyter lab

Run app:

streamlit run app/streamlit_app.py

Why this matters (CMC/QbD relevance)

Maps directly to model-based DoE under constraints.
Uses GP uncertainty for design space characterization.
Prioritizes runs via acquisition for systematic experiment planning.
Supports multi-CQA trade-offs via Pareto fronts.

Assumptions and limitations

Uses a physics-informed synthetic simulator (not proprietary lab data).
Adds Gaussian observation noise to mimic experimental variability.
Demonstrates method on a 5-variable space; real programs may include more CQAs/process factors/categorical variables.

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
app		app
docs/assets		docs/assets
notebooks		notebooks
results/runs		results/runs
.gitignore		.gitignore
README.md		README.md
animate_explainer.py		animate_explainer.py
bayesian_optimization_notes.html		bayesian_optimization_notes.html
bo_animation.html		bo_animation.html
bo_explainer.mp4		bo_explainer.mp4
bo_teaser.gif		bo_teaser.gif
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Bayesian Optimization for pharmaceutical tablet formulation

Problem statement

Explainer (60-second animated overview)

What is Q45?

Project description

Visual overview

Optimization workflow

Headline benchmark result

Our solution

Formulation context

How we solve it in this repo

1) Baseline DoE (why naive design is insufficient)

2) Core BO loop

3) Benchmark vs alternatives

4) Multi-objective BO extension

5) Interactive app for scientists

Quick start

Why this matters (CMC/QbD relevance)

Assumptions and limitations

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Bayesian Optimization for pharmaceutical tablet formulation

Problem statement

Explainer (60-second animated overview)

What is Q45?

Project description

Visual overview

Optimization workflow

Headline benchmark result

Our solution

Formulation context

How we solve it in this repo

1) Baseline DoE (why naive design is insufficient)

2) Core BO loop

3) Benchmark vs alternatives

4) Multi-objective BO extension

5) Interactive app for scientists

Quick start

Why this matters (CMC/QbD relevance)

Assumptions and limitations

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages