
Atomistic models based on metatensor-torch #405

Merged: 17 commits into lab-cosmo:master, Dec 14, 2023

Conversation

Luthaf
Contributor

@Luthaf commented Oct 20, 2023

This PR contains all the code required to define, export, load and validate arbitrary atomistic models based on metatensor-torch. The user has to provide a TorchScript-compatible torch.nn.Module, with the following signature:

def forward(self, system: System, run_options: ModelRunOptions) -> Dict[str, TensorBlock]:
    ...

The System contains the positions, cell, and any neighbors lists requested by submodules. Submodules request them by defining a requested_neighbors_lists function (this allows e.g. rascaline to request a NL without the end module knowing about it):

def requested_neighbors_lists(self) -> List[NeighborsListOptions]:
    ...
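How such requests could be aggregated across submodules can be sketched as follows. This is a simplified stand-in, not metatensor-torch code: Module here mimics just enough of torch.nn.Module for tree traversal, and collect_requested_neighbors_lists is a hypothetical helper name.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class NeighborsListOptions:
    """Simplified stand-in for the metatensor-torch class of the same name."""
    cutoff: float
    full_list: bool

class Module:
    """Minimal stand-in for torch.nn.Module, just enough for tree traversal."""
    def children(self) -> List["Module"]:
        return []

def collect_requested_neighbors_lists(root: Module) -> List[NeighborsListOptions]:
    # Walk the module tree: any submodule that defines a
    # requested_neighbors_lists() method contributes its options
    # (deduplicated), without the parent module having to know about it.
    seen, result = set(), []
    stack = [root]
    while stack:
        module = stack.pop()
        if hasattr(module, "requested_neighbors_lists"):
            for options in module.requested_neighbors_lists():
                if options not in seen:
                    seen.add(options)
                    result.append(options)
        stack.extend(module.children())
    return result
```

With this pattern, a calculator buried deep inside the model can return its own NeighborsListOptions and the exporter picks them up automatically.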

The ModelRunOptions describes what the engine wants; in particular it contains a list of ModelOutput (the model's forward function can return multiple outputs, and the MD engine should request the ones it wants) and the set of selected_atoms on which to run the calculation.

When exporting a model, the user should use ModelCapabilities to declare what the model is able to do, as well as the units it uses for input and output. MetatensorAtomisticModule then performs unit conversion between what the engine provides and what the model wants on input, and between what the model provides and what the engine wants on output.
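The unit handling amounts to a scaling step on the way in and out. Here is a minimal sketch of that idea; the tables and the convert function are hand-written illustrations (values are standard physical constants, rounded), not metatensor's actual Quantity machinery.

```python
# Illustrative unit conversion of the kind described above: scale
# engine-provided inputs into model units, and model outputs back into
# engine units, through a common base unit (angstrom / eV).
LENGTH_TO_ANGSTROM = {"angstrom": 1.0, "bohr": 0.529177210903, "nm": 10.0}
ENERGY_TO_EV = {
    "ev": 1.0,
    "kj/mol": 0.0103642697,
    "kcal/mol": 0.0433641153,
    "hartree": 27.211386245988,
}

def convert(value: float, table: dict, from_unit: str, to_unit: str) -> float:
    """Convert `value` between two units through the table's base unit."""
    return value * table[from_unit.lower()] / table[to_unit.lower()]
```

This way an engine working in kJ/mol and nm can drive a model trained in eV and angstrom, with neither side knowing about the other's units.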


Still TBD:

  • Python side documentation
  • Commented example on how to use this API
  • Decide on the naming for the new metadata classes (ModelCapabilities, ModelOutput, ModelRunOptions)
  • Define a standard for metatensor metadata of some outputs: what should be the samples/components/properties names and values for the energy output, for the dipole output, …
  • Can a model provide both per-atom and per-structure versions of the same quantity? What would this look like?
  • Document the standard above
  • Record the torch extension and torch version used when saving the model
  • Add a way to profile model execution time: will be done in a later PR
  • Provide a function to connect neighbors lists distances with positions & cell in the computational graph
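The last item, connecting neighbor-list distances with positions and cell in the computational graph, rests on a simple relation that can be sketched in plain NumPy; a real implementation would use torch tensors so that autograd tracks the dependency, and the function name and array layout here are assumptions.

```python
import numpy as np

def distance_vectors(positions, cell, pairs, cell_shifts):
    """Recompute pair distance vectors from positions and cell.

    positions: (n_atoms, 3) array; cell: (3, 3) matrix with the cell
    vectors as rows; pairs: (n_pairs, 2) integer array of (i, j) atom
    indices; cell_shifts: (n_pairs, 3) integer periodic image shifts.

    Computing r_ij this way, instead of storing precomputed vectors, is
    what lets gradients of the distances flow back to both positions and
    cell in an autograd framework.
    """
    i, j = pairs[:, 0], pairs[:, 1]
    # r_ij = r_j - r_i + shifts @ cell
    return positions[j] - positions[i] + cell_shifts @ cell
```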

📚 Documentation preview 📚 https://metatensor--405.org.readthedocs.build/en/405/

@Luthaf force-pushed the atomistic-models branch 2 times, most recently from 4ae3959 to 07fea5e on October 20, 2023 16:07
Contributor

@PicoCentauri left a comment


I still have to wrap my head around this, both the syntax and the logic, but it could be very good.

I have some first specific comments on the units.

_requested_neighbors_lists: List[NeighborsListOptions]
_known_quantities: Dict[str, Quantity]

def __init__(self, module: torch.nn.Module, capabilities: ModelCapabilities):
Contributor


capabilities is something like the target? i.e. "forces", "dipole moments", "partial charges"?

Contributor Author


capabilities.outputs contains the different targets the model is able to produce. capabilities also contains other information about the model (the units used as input, the species it can handle).

Maybe we should rename this to ModelDefinition or something, and also store in there the model authors, papers to cite, date of training, etc.

Contributor


I also stumbled on this term. I thought about ModelConfig, but that term does not really capture that it includes the model outputs.

> Maybe we should rename this to ModelDefinition or something, and also store in there the model authors, papers to cite, date of training, etc.

I like ModelDefinition; this would then include a metainformation variable of type Dict[str, str]?

Contributor Author


I would prefer to give more structure to the data, and have multiple fields of type str instead of a dict. We could have an extra field if people want to store more data in there, but I would start without it for now.
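Such a structured layout could look like the sketch below; the class and field names are hypothetical, not the final metatensor API.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ModelMetadata:
    # Well-known, structured fields instead of a bare Dict[str, str]
    name: str = ""
    authors: List[str] = field(default_factory=list)
    references: List[str] = field(default_factory=list)  # papers to cite
    # escape hatch for anything not covered by the fields above
    other: Dict[str, str] = field(default_factory=dict)
```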

Contributor


Maybe we could consider the metadata that is included in models from PyTorch Hub or Hugging Face:
https://pytorch.org/docs/stable/hub.html#torch.hub.load
https://huggingface.co/docs/transformers/v4.34.1/en/model_doc/auto#transformers.AutoModel.from_pretrained

I identified these parameters:

  • provider: the hub platform (e.g. "Hugging Face")
  • repo_url (e.g. "https://camembert-model.fr")
  • model_name (e.g. "camembert")
  • model_checkpoint (e.g. "camembert-large")

Contributor


It might also be useful to consider the metadata contained in ONNX models (https://onnx.ai/onnx/intro/concepts.html#metadata). I identified these as possibly useful:

  • producer_version: The version of the generating tool,
    like the version of the model or a commit ID
  • model_version: The version of the model itself, encoded in an integer.
    I think this is basically what model_checkpoint above expresses, just encoded as a number instead of a string. If you look at models from the computer vision community (https://pytorch.org/vision/stable/models.html) it also makes more sense. For the first model there, model_name would be AlexNet and the version would be AlexNet_Weights.IMAGENET1K_V1. I think that would be good enough to track models. Sometimes they use the _VX suffix to indicate that the training procedure changed a bit, but sometimes they change the name at the beginning instead. It is a mess.
  • model_license: The well-known name or URL of the license under which the model is made available.
    Not sure if this makes sense, as I would say the license should be stored in the repository the model comes from.
  • doc_string: Human-readable documentation for this model

Contributor Author


I'm not sold on the Hugging Face style metadata; it feels more related to a full repository of models than to a single one. The ONNX metadata makes a lot more sense to me.

Contributor Author


We currently collect some minimal metadata, which should be expanded with stuff like this. If you agree @agoscinski I would leave the metadata definition to a later PR?

@frostedoyster
Contributor

frostedoyster commented Nov 22, 2023

Do I understand correctly that one "System" is one structure?
How much work would it be to change the signature to the following?

def forward(self, systems: List[System], run_options: ModelRunOptions) -> Dict[str, TensorBlock]:

This would allow a single model to be trained and exported, since forward needs to take multiple structures during training. At the moment, I can only see this working if, after having trained the model, you manually convert it to a different class whose forward has the system: System signature, unless I'm missing something.
For now, we could have the interface fail if len(systems) != 1; in the future this would facilitate PIMD and other techniques where the MD engine can ask for multiple evaluations at the same time.
EDIT: Otherwise, would the current interface work for

def forward(self, systems: Union[System, List[System]], run_options: ModelRunOptions) -> Dict[str, TensorBlock]:
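The accept-either fallback suggested above could be sketched with a small wrapper; this is a plain-Python stand-in with hypothetical names, where the real code would subclass torch.nn.Module and return metatensor TensorBlock/TensorMap values.

```python
from typing import Any, Dict, List, Union

class SingleSystemWrapper:
    """Transitional wrapper exposing a list-of-systems signature on top of
    a model trained with a single-System forward."""

    def __init__(self, model):
        self.model = model

    def forward(
        self, systems: Union[Any, List[Any]], run_options: Any = None
    ) -> Dict[str, Any]:
        # accept both a bare system and a list of systems
        if not isinstance(systems, list):
            systems = [systems]
        # for now, fail when more than one system is passed
        if len(systems) != 1:
            raise ValueError("this model only supports one system per call")
        return self.model.forward(systems[0], run_options)
```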

@Luthaf
Contributor Author

Luthaf commented Nov 22, 2023

We had a couple of discussions about taking a single system or multiple ones, but I can't remember the argument for going with a single one. I agree having more than one system is quite useful here; I'll give it a go and check that everything works the same.

@Luthaf
Contributor Author

Luthaf commented Dec 11, 2023

Regarding examples: I'd prefer to leave them to a separate PR. I already have a branch, but it's going to take a lot more work, so I would rather merge this without waiting for the examples to be done.

Contributor

@PicoCentauri left a comment


I am happy.

There is not much metadata to store for these, so turning them into pure
Tensor makes the code easier to use. It will also allow replacing
rascaline.torch.System with this class.
- take multiple systems instead of a single one
- return dict of TensorMap instead of TensorBlock (for equivariant targets)
- separate outputs and selected_atoms into two arguments
This allows us to use rewritten asserts in the tests, and
get nicer error messages on test failures
@Luthaf
Contributor Author

Luthaf commented Dec 13, 2023

I found another small bug, I'll wait for CI to pass & then this should be good to go!

Also make sure to call it for neighbors list inside the ASE calculator
@Luthaf
Contributor Author

Luthaf commented Dec 13, 2023

CI is hitting a bug in CMake: https://discourse.cmake.org/t/3-28-segmentation-fault-on-macos-11-runner/9588. I'll give them a day to fix it before trying to force a different CMake version.

Version 3.28.0 has a miscompilation issue and segfaults in some cases
@Luthaf merged commit e788e67 into lab-cosmo:master on Dec 14, 2023
27 checks passed
@Luthaf deleted the atomistic-models branch on December 14, 2023 11:17

4 participants