Always read files in double precision #294

Merged
merged 10 commits into from
Jul 15, 2024

Changes from 8 commits
4 changes: 2 additions & 2 deletions README.rst
@@ -18,7 +18,7 @@ architecture.

What is metatrain?
##################
``metatrain`` is a command line interface (cli) to `train` and `evaluate` atomistic
``metatrain`` is a command line interface (cli) to ``train`` and ``evaluate`` atomistic
models of various architectures. It features common ``yaml`` input options to
configure training and evaluation. Trained models are exported as standalone files that
can be used directly in various molecular dynamics (MD) engines (e.g. ``LAMMPS``,
@@ -30,7 +30,7 @@ that can be connected to an MD engine. Any custom architecture compatible with
TorchScript_ can be integrated in ``metatrain``, gaining automatic access to a training
and evaluation interface, as well as compatibility with various MD engines.

Note: ``metatrain`` does not provide mathematical functionalities `per se` but relies on
Note: ``metatrain`` does not provide mathematical functionalities *per se* but relies on
external models that implement the various architectures.

.. _TorchScript: https://pytorch.org/docs/stable/jit.html
10 changes: 5 additions & 5 deletions docs/src/architectures/pet.rst
@@ -152,8 +152,8 @@ training dataset. All default values are given by atomic versions for better
transferability across various datasets.

To increase the step size of the learning rate scheduler by, for example, 2 times, take
the default value for ``SCHEDULER_STEP_SIZE_ATOMIC`` from the default_hypers and specify
a value that's twice as large.
the default value for ``SCHEDULER_STEP_SIZE_ATOMIC`` from the default hypers and
specify a value that's twice as large.

It is worth noting that the stopping criterion of PET is either exceeding the maximum
number of epochs (specified by ``EPOCH_NUM`` or ``EPOCH_NUM_ATOMIC``) or exceeding the
@@ -168,11 +168,11 @@ probability of achieving the best accuracy on a typical moderate-sized dataset.
result, some default hyperparameters might be excessive, meaning they could be adjusted
to significantly increase the model's speed with minimal impact on accuracy. For
practical use, especially when conducting massive calculations where model speed is
crucial, it may be beneficial to set ``N_TRANS_LAYERS`` to `2` instead of the default
value of `3`. The ``N_TRANS_LAYERS`` hyperparameter controls the number of transformer
crucial, it may be beneficial to set ``N_TRANS_LAYERS`` to ``2`` instead of the default
value of ``3``. The ``N_TRANS_LAYERS`` hyperparameter controls the number of transformer
layers in each message-passing block (see more details in the `PET paper
<https://arxiv.org/abs/2305.19302>`_). This adjustment would result in a model that is
about `1.5` times more lightweight and faster, with an expected minimal deterioration in
about *1.5 times* more lightweight and faster, with an expected minimal deterioration in
accuracy.
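
For illustration, such an override could be expressed in ``options.yaml`` roughly as
follows (a sketch only: the numeric value and the exact nesting of the keys should be
taken from the architecture's default hypers file):

.. code-block:: yaml

    architecture:
      training:
        SCHEDULER_STEP_SIZE_ATOMIC: 10000  # e.g. twice the default value
      model:
        N_TRANS_LAYERS: 2  # default is 3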

Architecture Hyperparameters
6 changes: 3 additions & 3 deletions docs/src/dev-docs/architecture-life-cycle.rst
@@ -5,15 +5,15 @@ Life Cycle of an Architecture

.. TODO: Maybe add a flowchart later

Architectures in `metatrain` undergo different stages based on their
Architectures in ``metatrain`` undergo different stages based on their
development/functionality level and maintenance status. We distinguish three distinct
stages: **experimental**, **stable**, and **deprecated**. Typically, an architecture
starts as experimental, advances to stable, and eventually becomes deprecated before
removal if maintenance is no longer feasible.

.. note::
The development and maintenance of an architecture must be fully undertaken by the
architecture's authors or maintainers. The core developers of `metatrain`
architecture's authors or maintainers. The core developers of ``metatrain``
provide infrastructure and implementation support but are not responsible for the
architecture's internal functionality or any issues that may arise therein.

@@ -47,7 +47,7 @@ satisfied:
2. Comprehensive architecture documentation including a schema for verifying the
architecture's hyperparameters.
3. If an architecture has external dependencies, all must be publicly available on PyPI.
4. Adherence to the standard output infrastructure of `metatrain`, including
4. Adherence to the standard output infrastructure of ``metatrain``, including
logging and model save locations.

Deprecated Architectures
2 changes: 1 addition & 1 deletion docs/src/dev-docs/cli/index.rst
@@ -12,7 +12,7 @@ the ``eval`` and the ``export`` functions of ``metatrain``.
export

We provide a custom formatter class for formatting the help message of the
`argparse` package.
``argparse`` package.

.. toctree::
:maxdepth: 1
2 changes: 1 addition & 1 deletion docs/src/dev-docs/index.rst
@@ -1,7 +1,7 @@
Developer documentation
=======================

This is a collection of documentation for developers of the `metatrain` package.
This is a collection of documentation for developers of the ``metatrain`` package.
It includes documentation on how to add a new model, as well as the API of the utils
module.

33 changes: 18 additions & 15 deletions docs/src/dev-docs/new-architecture.rst
@@ -4,13 +4,14 @@ Adding a new architecture
=========================

This page describes the required classes and files necessary for adding a new
architecture to `metatrain` as experimental or stable architecture as described on the
architecture to ``metatrain`` as an experimental or stable architecture as described on
:ref:`architecture-life-cycle` page. For **examples** refer to the already existing
architectures inside the source tree.

To work with `metatrain` any architecture has to follow the same public API to be called
correctly within the :py:func:`metatrain.cli.train` function to process the user's
options. In brief, the core of the ``train`` function looks similar to these lines
To work with ``metatrain``, any architecture has to follow the same public API to be
called correctly within the :py:func:`metatrain.cli.train` function to process the
user's options. In brief, the core of the ``train`` function looks similar to these
lines

.. code-block:: python

@@ -30,6 +31,7 @@ options. In brief, the core of the ``train`` function looks similar to these lines

trainer.train(
model=model,
dtype=dtype,
devices=[],
train_datasets=[],
val_datasets=[],
@@ -75,8 +77,8 @@ requirements to be stable. The usual structure of architecture looks as

.. note::
A new architecture doesn't have to be registered somewhere in the file tree of
`metatrain`. Once a new architecture folder with the required files is created
`metatrain` will include the architecture automatically.
``metatrain``. Once a new architecture folder with the required files is created
``metatrain`` will include the architecture automatically.

Model class (``model.py``)
--------------------------
@@ -118,8 +120,8 @@ Note that the ``ModelInterface`` does not necessarily inherit from
:py:class:`torch.nn.Module` since training can be performed in any way.
``__supported_devices__`` and ``__supported_dtypes__`` can be defined to set the
capabilities of the model. These two lists should be sorted in order of preference since
`metatrain` will use these to determine, based on the user request and
machines' availability, the optimal `dtype` and `device` for training.
``metatrain`` will use these to determine, based on the user request and
machines' availability, the optimal ``dtype`` and ``device`` for training.
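
A minimal sketch of how these attributes might be declared (the specific entries are
examples, not requirements):

.. code-block:: python

    import torch


    class ModelInterface(torch.nn.Module):
        # both lists are ordered by preference: metatrain picks the first
        # entry compatible with the user's request and the available hardware
        __supported_devices__ = ["cuda", "cpu"]
        __supported_dtypes__ = [torch.float64, torch.float32]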

The ``export()`` method is required to transform a trained model into a standalone file
to be used in combination with molecular dynamic engines to run simulations. We provide
@@ -141,6 +143,7 @@ methods for ``train()``, ``save_checkpoint()`` and ``load_checkpoint()``.
def train(
self,
model: ModelInterface,
dtype: torch.dtype,
devices: List[torch.device],
train_datasets: List[Union[Dataset, torch.utils.data.Subset]],
val_datasets: List[Union[Dataset, torch.utils.data.Subset]],
@@ -155,7 +158,7 @@ methods for ``train()``, ``save_checkpoint()`` and ``load_checkpoint()``.
) -> "TrainerInterface":
pass

The format of checkpoints is not defined by `metatrain` and can be any format that
The format of checkpoints is not defined by ``metatrain`` and can be any format that
can be loaded by the trainer (to restart training) and by the model (to export the
checkpoint).
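
For example, a trainer is free to use plain ``torch.save``/``torch.load`` for its
checkpoints. A minimal sketch, with illustrative fields that are not prescribed by
``metatrain``:

.. code-block:: python

    import torch


    def save_checkpoint(model, optimizer, epoch, path):
        # any format works, as long as the trainer and the model can read it back
        torch.save(
            {
                "model_state": model.state_dict(),
                "optimizer_state": optimizer.state_dict(),
                "epoch": epoch,
            },
            path,
        )


    def load_checkpoint(path):
        return torch.load(path)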

@@ -164,15 +167,15 @@ Init file (``__init__.py``)
The names of the ``ModelInterface`` and the ``TrainerInterface`` are free to choose but
should be linked to constants in the ``__init__.py`` of each architecture. On top of
these two constants the ``__init__.py`` must contain constants for the original
`__authors__` and current `__maintainers__` of the architecture.
``__authors__`` and current ``__maintainers__`` of the architecture.

.. code-block:: python

from .model import CustomSOTAModel
from .trainer import Trainer
from .model import ModelInterface
from .trainer import TrainerInterface

__model__ = CustomSOTAModel
__trainer__ = Trainer
__model__ = ModelInterface
__trainer__ = TrainerInterface

__authors__ = [
("Jane Roe <jane.roe@myuniversity.org>", "@janeroe"),
@@ -207,7 +210,7 @@ required to improve usability. The default hypers must follow the structure
training:
...

`metatrain` will parse this file and overwrite these default hypers with the
``metatrain`` will parse this file and overwrite these default hypers with the
user-provided parameters and pass the merged ``model`` section as a Python dictionary to
the ``ModelInterface`` and the ``training`` section to the ``TrainerInterface``.
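
Conceptually, this override behaves like a recursive dictionary merge, where user
values win over defaults. A sketch of the idea, not ``metatrain``'s actual
implementation:

.. code-block:: python

    def merge_hypers(defaults: dict, user: dict) -> dict:
        """Recursively overwrite default hypers with user-provided options."""
        merged = dict(defaults)
        for key, value in user.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = merge_hypers(merged[key], value)
            else:
                merged[key] = value
        return merged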

2 changes: 1 addition & 1 deletion docs/src/dev-docs/utils/data/readers.rst
@@ -33,7 +33,7 @@ Target type specific readers
----------------------------

:func:`metatrain.utils.data.read_targets` uses sub-functions to parse supported
target properties like the `energy` or `forces`. Currently we support reading the
target properties like the ``energy`` or ``forces``. Currently we support reading the
following target properties via

.. autofunction:: metatrain.utils.data.read_energy
12 changes: 6 additions & 6 deletions docs/src/getting-started/custom_dataset_conf.rst
@@ -4,9 +4,9 @@ Customize a Dataset Configuration
=================================
Overview
--------
The main task in setting up a training procedure with `metatrain` is to provide
The main task in setting up a training procedure with ``metatrain`` is to provide
files for training, validation, and testing datasets. Our system allows flexibility in
parsing data for training. Mandatory sections in the `options.yaml` file include:
parsing data for training. Mandatory sections in the ``options.yaml`` file include:

- ``training_set``
- ``test_set``
@@ -78,16 +78,16 @@ A single string in this section automatically expands, using the string as the

.. note::

`metatrain` does not convert units during training or evaluation. Units are
``metatrain`` does not convert units during training or evaluation. Units are
only required if the model should be used to run MD simulations.

Targets Section
^^^^^^^^^^^^^^^
Allows defining multiple target sections, each with a unique name.

- Commonly, a section named ``energy`` should be defined, which is essential for running
molecular dynamics simulations. For the ``energy`` section gradients like `forces` and
`stress` are enabled by default.
molecular dynamics simulations. For the ``energy`` section gradients like ``forces``
and ``stress`` are enabled by default.
- Other target sections can also be defined, as long as they are prefixed by ``mtt::``.
For example, ``mtt::free_energy``. In general, all targets that are not standard
outputs of ``metatensor.torch.atomistic`` (see
@@ -137,7 +137,7 @@ without them.
Multiple Datasets
-----------------
For some applications, it is required to provide more than one dataset for model
training. `metatrain` supports stacking several datasets together using the
training. ``metatrain`` supports stacking several datasets together using the
``YAML`` list syntax, which consists of lines beginning at the same indentation level
starting with a ``"- "`` (a dash and a space).
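
A hypothetical fragment stacking two datasets could look like this (file names and
target entries are placeholders):

.. code-block:: yaml

    training_set:
      - systems:
          read_from: dataset_1.xyz
        targets:
          energy:
            key: energy
      - systems:
          read_from: dataset_2.xyz
        targets:
          energy:
            key: energy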

18 changes: 9 additions & 9 deletions docs/src/getting-started/usage.rst
@@ -11,20 +11,20 @@ registered via the abbreviation ``mtt`` to your command line. The general help of

mtt --help

We now demonstrate how to `train` and `evaluate` a model from the command line. For this
example we use the :ref:`architecture-soap-bpnn` architecture and a subset of the `QM9
dataset <https://paperswithcode.com/dataset/qm9>`_. You can obtain the reduced dataset
from our :download:`website <../../static/qm9/qm9_reduced_100.xyz>`.
We now demonstrate how to ``train`` and ``evaluate`` a model from the command line. For
this example we use the :ref:`architecture-soap-bpnn` architecture and a subset of the
`QM9 dataset <https://paperswithcode.com/dataset/qm9>`_. You can obtain the reduced
dataset from our :download:`website <../../static/qm9/qm9_reduced_100.xyz>`.

Training
########

To train models, `metatrain` uses a dynamic override strategy for your training
To train models, ``metatrain`` uses a dynamic override strategy for your training
options. We allow dynamic composition and overriding of the architecture's default
options with your custom ``options.yaml`` or even with command line override grammar. For
reference and reproducibility purposes `metatrain` always writes the fully
reference and reproducibility purposes ``metatrain`` always writes the fully
expanded options, including any overrides, to ``options_restart.yaml``. The restart
options file is written into a subfolder named with the current `date` and `time` inside
options file is written into a subfolder named with the current *date* and *time* inside
the ``output`` directory of your current training run.

The sub-command to start a model training is
@@ -45,7 +45,7 @@ training using the default hyperparameters of a SOAP BPNN model
:language: yaml

For each training run a new output directory in the format
``output/YYYY-MM-DD/HH-MM-SS`` based on the current `date` and `time` is created. We use
``output/YYYY-MM-DD/HH-MM-SS`` based on the current *date* and *time* is created. We use
this output directory to store checkpoints, the ``train.log`` log file, as well as the
restart ``options_restart.yaml`` file. To start the training create an ``options.yaml``
file in the current directory and type
@@ -64,7 +64,7 @@ The sub-command to evaluate an already trained model is

mtt eval

Besides the trained `model`, you will also have to provide a file containing the
Besides the trained ``model``, you will also have to provide a file containing the
system and possible target values for evaluation. The syntax of this ``eval.yaml``
is exactly the same as for a dataset in the ``options.yaml`` file.
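
A minimal ``eval.yaml`` could therefore look like this (file name and options are
illustrative):

.. code-block:: yaml

    systems:
      read_from: qm9_reduced_100.xyz
      length_unit: angstrom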

6 changes: 3 additions & 3 deletions examples/programmatic/llpr/llpr.py
@@ -51,7 +51,7 @@
from metatrain.utils.neighbor_lists import get_system_with_neighbor_lists # noqa: E402


qm9_systems = read_systems("qm9_reduced_100.xyz", dtype=torch.float64)
qm9_systems = read_systems("qm9_reduced_100.xyz")

target_config = {
"energy": {
@@ -65,7 +65,7 @@
"virial": False,
},
}
targets, _ = read_targets(target_config, dtype=torch.float64)
targets, _ = read_targets(target_config)

requested_neighbor_lists = model.requested_neighbor_lists()
qm9_systems = [
@@ -77,7 +77,7 @@
# We also load a single ethanol molecule on which we will compute properties.
# This system is loaded without targets, as we are only interested in the LPR
# values.
ethanol_system = read_systems("ethanol_reduced_100.xyz", dtype=torch.float64)[0]
ethanol_system = read_systems("ethanol_reduced_100.xyz")[0]
ethanol_system = get_system_with_neighbor_lists(
ethanol_system, requested_neighbor_lists
)
7 changes: 7 additions & 0 deletions pyproject.toml
@@ -96,6 +96,9 @@ source = [
".tox/*/lib/python*/site-packages/metatrain"
]

[tool.black]
exclude = 'docs/src/examples'

[tool.isort]
skip = "__init__.py"
profile = "black"
@@ -106,4 +109,8 @@ lines_after_imports = 2
known_first_party = "metatrain"

[tool.mypy]
exclude = [
"docs/src/examples"
]
follow_imports = 'skip'
ignore_missing_imports = true
18 changes: 8 additions & 10 deletions src/metatrain/cli/eval.py
@@ -167,8 +167,10 @@ def _eval_targets(
system = sample["system"]
get_system_with_neighbor_lists(system, model.requested_neighbor_lists())

# Infer the device from the model
device = next(itertools.chain(model.parameters(), model.buffers())).device
# Infer the device and dtype from the model
model_tensor = next(itertools.chain(model.parameters(), model.buffers()))
dtype = model_tensor.dtype
device = model_tensor.device

# Create a dataloader
dataloader = torch.utils.data.DataLoader(
@@ -188,9 +190,10 @@
# Evaluate the model
for batch in dataloader:
systems, batch_targets = batch
systems = [system.to(device=device) for system in systems]
systems = [system.to(dtype=dtype, device=device) for system in systems]
batch_targets = {
key: value.to(device=device) for key, value in batch_targets.items()
key: value.to(dtype=dtype, device=device)
for key, value in batch_targets.items()
}
batch_predictions = evaluate_model(model, systems, options, is_training=False)
batch_predictions = average_by_num_atoms(
@@ -238,10 +241,6 @@ def eval_model(
"""
logger.info("Setting up evaluation set.")

# TODO: once https://github.com/lab-cosmo/metatensor/pull/551 is merged and released
# use capabilities instead of this workaround
dtype = next(model.parameters()).dtype

if isinstance(output, str):
output = Path(output)

@@ -258,13 +257,12 @@
eval_systems = read_systems(
filename=options["systems"]["read_from"],
reader=options["systems"]["reader"],
dtype=dtype,
)

if hasattr(options, "targets"):
# in this case, we only evaluate the targets specified in the options
# and we calculate RMSEs
eval_targets, eval_info_dict = read_targets(options["targets"], dtype=dtype)
eval_targets, eval_info_dict = read_targets(options["targets"])
else:
# in this case, we have no targets: we evaluate everything
# (but we don't/can't calculate RMSEs)