Efficient, adaptive, and fair sequential testing for large-scale digital experiments with the crowd.


A familiar, full-stack (yet lightweight) framework for digital experimentation, human orchestration, and interactive machine learning applications.

Machina combines an intuitive experimentation framework (think A/B tests) with flexible human-in-the-loop orchestration to support and accelerate interactive ML research, with a focus on consistency and reproducibility. We make it so easy to evaluate your new algorithm with real humans... you'd be foolish not to!

Machina automates the boring stuff and stays out of your way when it comes to the fun/interesting stuff. The framework combines an intuitive Flask-like web framework, built-in deployment and devops functionality, and Mechanical Turk management/orchestration. Specifically, it provides utilities for adaptive experimentation, metrics and measures, templates and UI, data storage, participant management and authorization, experiment serialization and replication, application deployment, MTurk orchestration, a monitoring/admin dashboard, and statistical analysis helpers.
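As a rough sketch of the Flask-like shape of the web framework, the toy stand-in below mimics the route-decorator pattern in plain Python; it is illustrative only and is not machina's actual API.

# Plain-Python sketch of the Flask-like routing pattern described above.
# This is a toy stand-in to show the shape of the API, not machina itself.
class App:
    def __init__(self, name):
        self.name, self.routes = name, {}

    def route(self, path):
        # Register a view function for a URL path, Flask-style.
        def decorator(fn):
            self.routes[path] = fn
            return fn
        return decorator

app = App("my-experiment")

@app.route("/")
def index():
    return "<h1>Welcome to the experiment</h1>"

print(app.routes["/"]())  # dispatching "/" renders the task page HTML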

Use machina if you want:

  • Amazon Mechanical Turk automation and human orchestration for intelligent human-in-the-loop systems
  • Adaptive experimentation powered by optimal Bayesian experimental design to efficiently evaluate your ML applications
  • Modular Python components for building interactive ML interfaces (which render as HTML and serialize as JSON)
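To make the last point concrete, here is a plain-Python stand-in (not machina's actual component API) showing how a single component can both render as HTML and serialize as JSON:

import json

# Illustrative sketch of the render-as-HTML / serialize-as-JSON duality
# described above. A plain-Python stand-in, not machina's API.
class SliderComponent:
    def __init__(self, name, low=0, high=100, value=50):
        self.name, self.low, self.high, self.value = name, low, high, value

    def render(self):
        # Components render as HTML for the participant-facing interface...
        return (f'<input type="range" name="{self.name}" '
                f'min="{self.low}" max="{self.high}" value="{self.value}">')

    def to_json(self):
        # ...and serialize as JSON for experiment snapshots and replication.
        return json.dumps(vars(self))

confidence = SliderComponent("confidence")
print(confidence.render())
print(confidence.to_json())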

machina can help if you are a:

  • Researcher studying interpretable ML algorithms who wants to benchmark against existing methods.
  • Social scientist or HCI researcher evaluating XAI systems with user experiments.
  • Developer building interactive machine learning interfaces and systems.
  • Data scientist needing to explain, debug, and understand ML models.


  • Effort: Zero-configuration deployment to the cloud... and crowd!
  • Cost: Zero-cost experiment hosting... so more of your budget goes to your participants
  • Science: Rerun existing experiments with one command to replicate, remix, or extend!

To get started:

If you want to learn about our research on/with machina, all the gory details are included in the following paper: [probably an arXiv link]...


Beyond the development gains from using the library, we feel that the abstractions the framework presents are a beneficial way to think about human subjects experiments involving machine learning (or any complex stochastic system).

  • Interdisciplinary collaboration for explainable AI research through model/method, data, and experiment sharing and remixing.
  • Reproducible research through serialized experiments with standardized tasks, datasets, interfaces, and measures.
  • Portability enabling anyone anywhere to rerun your experiment on any platform with any participants.
  • Consistency by encoding best practices for running ML human subjects experiments in the framework itself, so you don't have to worry whether you checked all the boxes (and helpful for onboarding new experimenters!)
  • Automate the boring stuff to lower the floor and raise the ceiling of explainable AI research... allowing you to focus on what's fun and impactful.
  • Increase participation in who is building and studying ML systems to create more accessible and equitable applications.
  • Modular design so you can take only what you need and leave the rest.
    • Building an interactive ML application? Use the web framework and UI components (and get serialization for free).
    • Running a user experiment with an existing application? Use the experimentation utilities to design the study and assign users to variants (and get statistical bookkeeping and analysis for free); a sketch of variant assignment follows this list.
  • Always open, always free
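As promised above, here is a sketch of deterministic variant assignment using hash-based bucketing, a common technique for stable randomization; it illustrates the idea rather than machina's internals:

import hashlib

# Illustrative sketch of assigning participants to experiment variants,
# the kind of bookkeeping the experimentation utilities above handle.
# Hash-based bucketing is a common technique, not machina's internals.
VARIANTS = ["control", "saliency-maps", "counterfactuals"]

def assign_variant(participant_id: str) -> str:
    # Hashing gives a stable, uniform assignment: the same participant
    # always lands in the same variant across sessions.
    digest = hashlib.sha256(participant_id.encode()).hexdigest()
    return VARIANTS[int(digest, 16) % len(VARIANTS)]

print(assign_variant("participant-42"))  # deterministic across reruns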


  • Reference implementations of common XAI explainers and state-of-the-art methods to compare against.
  • Common interpretable benchmark datasets and synthetic data generators across modalities (tabular, text, image, etc.) in a consistent data loading format.
  • Standardized measures to record how humans interact with your experiment and built-in logging.
  • Consistent class interfaces to enable seamless iteration over datasets, models, and explainers, making comprehensive evaluation a breeze.
  • MTurk automation, experiment data logging, and hosting of results.
  • Experiment serialization and versioning to snapshot experiment parameters, interfaces, and model state to facilitate reproduction or remixing.
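To illustrate the serialization idea in the last item, the sketch below snapshots an experiment specification and derives a content-addressed version id; the schema and field names are assumptions, not machina's actual format:

import json, hashlib, time

# Illustrative sketch of experiment serialization: snapshot the
# parameters that define a run so it can be reproduced or remixed.
# The schema and identifiers here are assumptions for illustration.
experiment = {
    "dataset": "adult-income",
    "model": "logistic-regression",
    "explainer": "lime",
    "measures": ["accuracy", "trust-rating"],
    "created": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
}

snapshot = json.dumps(experiment, sort_keys=True)
# A content hash doubles as a version id: identical specs hash identically,
# so a collaborator can verify they are rerunning the exact experiment.
version = hashlib.sha256(snapshot.encode()).hexdigest()[:12]
print(f"experiment snapshot {version}: {snapshot}")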

One explainy boi

See the full annotated and runnable example code here.


⚠️ Compatible with Python 3.x


pip install machina


git clone
cd machina && pip install .

What's in here?!

The main functionality the library provides is a standardized set of datasets, tasks, models, and measures for ML user experiments, all accessible through a common API to enable rapid and scalable evaluation of various interpretable ML techniques. By standardizing an environment in which to conduct interpretability and FAT research, we hope to accelerate the pace and replicability of ML research that depends on human interaction and feedback.

At a high level, machina can be thought of as a compiler for interpretable machine learning human subjects experiments, but instead of generating code this compiler translates your experiment specification (datasets, models, and explainers) into the necessary variants to evaluate. The framework outputs HTML interfaces and instruments them with measures to run your experiment with human subjects (optionally deploying these experiments to crowdworker platforms like Amazon's Mechanical Turk).
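To illustrate the compiler analogy, the sketch below expands a declarative specification into the cross product of variants to evaluate; the spec format and field names are assumptions for illustration, not machina's actual schema:

from itertools import product

# Illustrative "compiler" pass in the sense described above: expand a
# declarative experiment specification into the variants to evaluate.
spec = {
    "datasets": ["adult-income"],
    "models": ["logistic-regression", "random-forest"],
    "explainers": ["lime", "shap", "none"],  # "none" as a control condition
}

variants = [
    {"dataset": d, "model": m, "explainer": e}
    for d, m, e in product(spec["datasets"], spec["models"], spec["explainers"])
]

# Each variant would then be rendered as an instrumented HTML interface
# and (optionally) deployed to a crowdworker platform such as MTurk.
for i, variant in enumerate(variants):
    print(f"variant {i}: {variant}")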

Standing on the shoulders of giants...

See Also


📧 @jonathandinu



@misc{machina,
  author = {Jonathan Dinu},
  title = {Machina: A declarative framework for efficient lifelong experiments
           and budgeted designs for adaptive experimentation on crowd marketplaces},
  url = {},
  version = {0.0.1},
  month = {Oct},
  year = {2019}
}