# FairChem v2 with Docker for Local Development and Cloud Scaling

---

This notebook introduces **FairChem v2**, a lightweight and production-friendly cheminformatics framework designed for scalable property prediction and molecule representation learning. This walkthrough covers:

1. **Introduction to FairChem v2**: Historical context and motivations.
2. **Setting Up the Development Environment**: Docker-based instructions.
3. **Building and Running the Docker Container**: Easily containerize FairChem v2.
4. **Getting Started with Predictions**: Quick CLI and API usage.
5. **Extending to Cloud**: How to scale FairChem v2 using cloud infrastructure.

## Introduction to FairChem v2 and the Evolution of Molecular Machine Learning

---

The quest to computationally predict molecular properties began with quantum chemistry methods like **Hartree–Fock** and **Density Functional Theory (DFT)**—powerful but computationally expensive techniques that formed the backbone of early molecular modeling.

In the 2000s, the field shifted toward **machine learning models** like kernel ridge regression and random forests trained on hand-crafted descriptors (e.g., MACCS, ECFP), enabling fast screening workflows.

A paradigm shift occurred in the 2010s with the rise of **graph neural networks (GNNs)**, treating molecules as graphs with atoms as nodes and bonds as edges. Pioneering models such as **Message Passing Neural Networks (MPNNs)**, **WeaveNet**, and **GraphConv** learned molecular representations end-to-end.

By 2020, architectures like [SchNet](#refs‑sch-net), [DimeNet](#refs‑dimenet), and [PhysNet](#refs-psynet) incorporated geometric and quantum structure, achieving higher generalizability across different molecular tasks. Self-supervised pretraining (e.g., contrastive learning, masking) became common by 2023, fostering **foundation models for chemistry**.

**FairChem v2** builds on this rich legacy:

- Packs pretrained GNNs and Transformer-style architectures into one cohesive `fairchem-core` package (no need for PyG, torch-scatter, etc.) [[10]](#refs‑install)
- Offers both **2D and 3D input support**, enabling versatile use cases
- Provides CLI and Python API for flexible workflows
- Designed for scalable deployment—locally via Docker and on the cloud

Now maintained under Meta FAIR Chemistry (formerly the Open Catalyst Project) [[4]](#refs‑meta), FairChem v2 is production-ready, with models like UMA and OMAT integrated into ASE and Hugging Face pipelines.

Whether in drug discovery, catalysis, materials science, or reaction optimization, FairChem v2 provides a comprehensive, efficient platform for modern molecular machine learning.


## Notebook Roadmap

---

### Sections
- [Building and Running the Docker Container](#building-and-running-the-docker-container)
- [Using FairChem v2](#using-fairchem-v2)
- [Running a Property Prediction Example](#property-prediction-example)
- [Deploying to the Cloud](#deploying-to-the-cloud)

### Prerequisites

To follow this tutorial, ensure the following are installed on your system:

- [Docker](https://docs.docker.com/get-docker/)
- A GPU-compatible system (recommended)
- NVIDIA Container Toolkit if using GPU
- [VSCode](https://code.visualstudio.com/) (optional for container access)


## Building and Running the Docker Container

---

1. **Clone the Repository**: The FairChem v2 Dockerfile used in this tutorial lives in the MLContainerLab repository.

```bash
git clone https://github.com/gabenavarro/MLContainerLab.git
cd MLContainerLab
```

2. **Build the Docker Image**: Use the provided Dockerfile to build the Docker image.

Before you build the Docker image, make sure you have been granted access to the [UMA model repository][#refs-uma-models] and have created a [Hugging Face access token][#ref-huggingface-cli-token]. Save the token string to the git-ignored file `./assets/secrets/huggingface.token` so that the Dockerfile can place it at the appropriate path in the image. Alternatively, run `huggingface-cli login` inside the running container to add the token later.
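
A missing or empty token file is easy to overlook until model download fails, so it can help to check the file before building. The following is a minimal sketch, assuming the token path given above; `read_hf_token` is a hypothetical helper written for illustration and is not part of FairChem or MLContainerLab. The demonstration runs against a throwaway file so the real secret is never touched.

```python
from pathlib import Path
import tempfile

# Path taken from the build instructions above; adjust if your layout differs.
TOKEN_PATH = Path("./assets/secrets/huggingface.token")


def read_hf_token(path: Path = TOKEN_PATH) -> str:
    """Return the token stored at `path`, stripped of surrounding whitespace.

    Hypothetical helper for illustration only.
    """
    if not path.is_file():
        raise FileNotFoundError(f"No token file at {path}")
    token = path.read_text(encoding="utf-8").strip()
    if not token:
        raise ValueError(f"Token file {path} is empty")
    return token


# Demonstration with a temporary file and a made-up token string.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "huggingface.token"
    demo_path.write_text("  hf_example_token\n", encoding="utf-8")
    demo_token = read_hf_token(demo_path)

print(demo_token)  # prints "hf_example_token"
```

Calling `read_hf_token()` with no argument from the repository root checks the real file and raises a descriptive error if it is absent or blank.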

```bash
# You can choose any tag you want for the image
# You can also change the base image, as long as the host supports the same or a higher CUDA version
docker build -f ./assets/build/Dockerfile.fairchem2.cu126cp310 -t fairchem2:126-310 .
```
3. **Run the Docker Container**: Start the container with the configuration below. This first example runs the container locally with GPU support, which is the recommended setup during development. For scaling up, a second example runs the container in the cloud.

```bash
# Run the container with GPU support
docker run -dt \
   --gpus all \
   --shm-size=64g \
   -v "$(pwd):/workspace" \
   --name fairchem2 \
   --env NVIDIA_VISIBLE_DEVICES=all \
   --env GOOGLE_APPLICATION_CREDENTIALS=/workspace/assets/secrets/gcp-key.json \
   fairchem2:126-310
```
> Note: The `-v "$(pwd):/workspace"` option mounts the current directory to `/workspace` in the container, allowing you to access your local files from within the container. The `--env` options set environment variables for GPU visibility and Google Cloud credentials.<br>
> Note: The `--gpus all` option allows the container to use all available GPUs. <br>
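
When the container is launched from a script rather than typed by hand, it can be convenient to assemble the same command as an argument list. The sketch below mirrors the flags of the `docker run` command above; `docker_run_args` is a hypothetical helper name, and the block only prints the command instead of executing it.

```python
import shlex
from pathlib import Path


def docker_run_args(image: str, name: str, host_dir: str, shm_size: str = "64g") -> list[str]:
    """Build the argument list for the `docker run` command shown above.

    Hypothetical helper for illustration; the flags are copied from the shell example.
    """
    return [
        "docker", "run", "-dt",
        "--gpus", "all",
        f"--shm-size={shm_size}",
        "-v", f"{host_dir}:/workspace",
        "--name", name,
        "--env", "NVIDIA_VISIBLE_DEVICES=all",
        "--env", "GOOGLE_APPLICATION_CREDENTIALS=/workspace/assets/secrets/gcp-key.json",
        image,
    ]


host_dir = str(Path.cwd())
args = docker_run_args("fairchem2:126-310", "fairchem2", host_dir)

# Print a shell-quoted version of the command for inspection.
print(shlex.join(args))
# To actually start the container, pass the list to subprocess.run(args, check=True).
```

Passing a list rather than a single string avoids shell-quoting problems when the host path contains spaces.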

4. **Access the Container with IDE**: In this example, we will use Visual Studio Code to access the container. You can use any IDE of your choice.

```bash
# Attach VS Code to the running container from the command line
CONTAINER_NAME=fairchem2
FOLDER=/workspace
# The container config is passed as a hex-encoded JSON string
HEX_CONFIG=$(printf '%s' "{\"containerName\":\"/$CONTAINER_NAME\"}" | od -A n -t x1 | tr -d '\n\t ')
code --folder-uri "vscode-remote://attached-container+$HEX_CONFIG$FOLDER"
```

> Note: The `code` command opens Visual Studio Code from the terminal. Attaching to a container requires the Dev Containers extension (formerly Remote - Containers) to be installed in VS Code.<br>
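
The hex string in the shell snippet above is just the UTF-8 bytes of a small JSON object written as hexadecimal. The following sketch reproduces the same URI in Python; `attached_container_uri` is a hypothetical helper name, and the URI layout is taken directly from the shell command.

```python
import json


def attached_container_uri(container_name: str, folder: str) -> str:
    """Build the folder URI used by `code --folder-uri` in the shell example.

    Hypothetical helper for illustration only.
    """
    # Compact separators reproduce the exact JSON string used in the shell version.
    config = json.dumps({"containerName": f"/{container_name}"}, separators=(",", ":"))
    # Hex-encode the UTF-8 bytes, matching the `od -A n -t x1 | tr -d ...` pipeline.
    hex_config = config.encode("utf-8").hex()
    return f"vscode-remote://attached-container+{hex_config}{folder}"


uri = attached_container_uri("fairchem2", "/workspace")
print(uri)
```

Decoding the hex portion with `bytes.fromhex(...).decode()` recovers the original JSON, which is a quick way to debug a URI that VS Code refuses to open.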


[#ref-huggingface-cli-token]: https://huggingface.co/settings/tokens "Access tokens authenticate your identity to the Hugging Face Hub and allow applications to perform actions based on token permissions."
[#refs-psynet]: https://github.com/MMunibas/PhysNet "PhysNet: A Neural Network for Predicting Energies, Forces, Dipole Moments and Partial Charges"
[#refs-uma-models]: https://huggingface.co/facebook/UMA "UMA model agreement"
[#refs‑sch-net]: https://arxiv.org/abs/1712.06113 "SchNet: a deep learning architecture for molecules and materials"
[#refs‑dimenet]: https://arxiv.org/abs/2003.03123 "DimeNet: Directional Message Passing Neural Network"
[#refs‑install]: https://fair-chem.github.io/core/install.html "FairChem‑core installation notes"
[#refs‑meta]: https://github.com/FAIRChem "FAIR Chemistry @ Meta (was Open Catalyst Project)"

In [1]:
from ase import units
from ase.io import Trajectory
from ase.md.langevin import Langevin
from ase.build import molecule
from fairchem.core import pretrained_mlip, FAIRChemCalculator
from fairchem.core.datasets import atomic_data

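# Load the pretrained "uma-s-1" checkpoint onto the GPU
predictor = pretrained_mlip.get_predict_unit("uma-s-1", device="cuda")
# Wrap the predictor as an ASE calculator for the "omol" task
calc = FAIRChemCalculator(predictor, task_name="omol")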

In [36]:
from __future__ import annotations

from fairchem.core import FAIRChemCalculator, pretrained_mlip
from ase.build import bulk, molecule
from fairchem.core.datasets.atomic_data import AtomicData, atomicdata_list_to_batch
from fairchem.core.datasets.embeddings import khot_embeddings
from ase.atoms import Atoms

predictor = pretrained_mlip.get_predict_unit("uma-s-1", device="cuda")
calc = FAIRChemCalculator(predictor, task_name="omol")

In [24]:

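# Convert ASE Atoms objects into AtomicData objects for the "omol" task
atomic_data_list = [
    AtomicData.from_ase(molecule("H2O"), task_name="omol"),
    # AtomicData.from_ase(molecule("C7NH5"), task_name="omol").to("cuda")
]
# Collate the list into a single batch and run inference on it
batch = atomicdata_list_to_batch(atomic_data_list)
predictor.predict(batch)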

{'energy': tensor([-2079.4659], device='cuda:0', dtype=torch.float64,
        grad_fn=<CatBackward0>),
 'forces': tensor([[ 3.2495e-05,  4.0628e-05, -9.3457e-01],
         [-1.8890e-03, -2.5103e-01,  4.6724e-01],
         [ 1.8565e-03,  2.5099e-01,  4.6732e-01]], device='cuda:0'),
 'stress': tensor([[ 0.0000e+00,  1.4294e-03, -9.6884e-06,  1.4294e-03,  3.8317e-01,
           1.8400e-05, -9.6884e-06,  1.8400e-05,  5.5729e-01]], device='cuda:0',
        grad_fn=<CatBackward0>)}
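
Two quick sanity checks can be run on the numbers printed above using plain Python. The sketch below copies the printed values by hand (so they carry only the displayed precision), sums the forces over atoms, and reshapes the nine stress entries into a 3x3 matrix. The row-major ordering of the stress entries is an assumption, not something the output states.

```python
# Values copied from the prediction output above (H2O, task "omol").
forces = [
    [3.2495e-05, 4.0628e-05, -9.3457e-01],
    [-1.8890e-03, -2.5103e-01, 4.6724e-01],
    [1.8565e-03, 2.5099e-01, 4.6732e-01],
]
stress_flat = [
    0.0, 1.4294e-03, -9.6884e-06,
    1.4294e-03, 3.8317e-01, 1.8400e-05,
    -9.6884e-06, 1.8400e-05, 5.5729e-01,
]

# Net force: sum each Cartesian component over the three atoms.
# For an isolated molecule this should be close to zero.
net_force = [sum(atom[i] for atom in forces) for i in range(3)]

# Assumption: the 9 entries are a row-major flattening of a 3x3 tensor.
stress = [stress_flat[3 * row: 3 * row + 3] for row in range(3)]

# Check that the reshaped matrix equals its transpose.
is_symmetric = all(stress[i][j] == stress[j][i] for i in range(3) for j in range(3))

print(net_force)
print(stress)
print(is_symmetric)  # True
```

With the printed values, every component of the net force is below 1e-4 in magnitude, and the reshaped stress matrix is symmetric.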

In [32]:
predictor.model(batch)

{'oc20_energy': {'energy': tensor([0.], device='cuda:0')},
 'oc20_forces': {'forces': tensor([[0., 0., 0.],
          [0., 0., 0.],
          [0., 0., 0.]], device='cuda:0')},
 'oc20_stress': {'stress': tensor([[0., 0., 0., 0., 0., 0., 0., 0., 0.]], device='cuda:0')},
 'odac_energy': {'energy': tensor([0.], device='cuda:0')},
 'odac_forces': {'forces': tensor([[0., 0., 0.],
          [0., 0., 0.],
          [0., 0., 0.]], device='cuda:0')},
 'odac_stress': {'stress': tensor([[0., 0., 0., 0., 0., 0., 0., 0., 0.]], device='cuda:0')},
 'omat_energy': {'energy': tensor([0.], device='cuda:0')},
 'omat_forces': {'forces': tensor([[0., 0., 0.],
          [0., 0., 0.],
          [0., 0., 0.]], device='cuda:0')},
 'omat_stress': {'stress': tensor([[0., 0., 0., 0., 0., 0., 0., 0., 0.]], device='cuda:0')},
 'omc_energy': {'energy': tensor([0.], device='cuda:0')},
 'omc_forces': {'forces': tensor([[0., 0., 0.],
          [0., 0., 0.],
          [0., 0., 0.]], device='cuda:0')},
 'omc_stress': {'st
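
The raw model output above is keyed by strings such as `oc20_energy` and `omat_forces`. A small sketch of how those keys could be grouped for inspection follows. Reading the prefix as a task name and the suffix as a property name is an assumption based only on the key pattern visible in the output, and the key list is copied by hand from the truncated printout.

```python
from collections import defaultdict

# Keys copied from the (truncated) raw model output above.
raw_keys = [
    "oc20_energy", "oc20_forces", "oc20_stress",
    "odac_energy", "odac_forces", "odac_stress",
    "omat_energy", "omat_forces", "omat_stress",
    "omc_energy", "omc_forces", "omc_stress",
]

# Split each key at its last underscore into (prefix, property).
by_task = defaultdict(list)
for key in raw_keys:
    task, prop = key.rsplit("_", 1)
    by_task[task].append(prop)

for task, props in by_task.items():
    print(task, props)
```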

In [35]:
predictor.inference_mode.__reduce__()

(<function copyreg._reconstructor(cls, base, state)>,
 (fairchem.core.units.mlip_unit.api.inference.InferenceSettings, object, None),
 {'tf32': True,
  'activation_checkpointing': True,
  'merge_mole': True,
  'compile': False,
  'wigner_cuda': True,
  'external_graph_gen': False,
  'internal_graph_gen_version': 2})

In [19]:
predictor = pretrained_mlip.get_predict_unit("uma-s-1", device="cuda")
calc = FAIRChemCalculator(predictor, task_name="omol")

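# Build a water molecule and attach the calculator to it
atoms = molecule("H2O")
atoms.calc = calc

# Inspect the underlying arrays (atomic numbers, positions, cell, pbc)
atoms.todict()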

{'numbers': array([8, 1, 1]),
 'positions': array([[ 0.      ,  0.      ,  0.119262],
        [ 0.      ,  0.763239, -0.477047],
        [ 0.      , -0.763239, -0.477047]]),
 'cell': array([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]]),
 'pbc': array([False, False, False])}
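
The positions printed above fully determine the geometry of the molecule. As a small worked example, the sketch below computes the O-H bond length and the H-O-H angle from those coordinates using only the standard library. The coordinates are copied by hand from the output, and ASE's default length unit of angstrom is assumed.

```python
import math

# Positions copied from atoms.todict() above (atomic numbers 8, 1, 1).
positions = {
    "O": (0.0, 0.0, 0.119262),
    "H1": (0.0, 0.763239, -0.477047),
    "H2": (0.0, -0.763239, -0.477047),
}


def vector(a, b):
    """Return the vector pointing from point a to point b."""
    return tuple(bi - ai for ai, bi in zip(a, b))


def length(v):
    """Return the Euclidean norm of v."""
    return math.sqrt(sum(c * c for c in v))


oh1 = vector(positions["O"], positions["H1"])
oh2 = vector(positions["O"], positions["H2"])

bond_length = length(oh1)
# Angle between the two O-H vectors via the dot product.
cos_angle = sum(a * b for a, b in zip(oh1, oh2)) / (length(oh1) * length(oh2))
angle_deg = math.degrees(math.acos(cos_angle))

print(f"O-H bond length: {bond_length:.4f} angstrom")
print(f"H-O-H angle: {angle_deg:.1f} degrees")
```

The result is roughly 0.97 angstrom for the bond length and about 104 degrees for the angle, which is the expected shape for water.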