Atomistic models based on metatensor-torch #405
Conversation
Force-pushed from 4ae3959 to 07fea5e
I still have to wrap my head around this, both the syntax and the logic, but it could be very good. I have some first specific comments on the units.
```python
_requested_neighbors_lists: List[NeighborsListOptions]
_known_quantities: Dict[str, Quantity]

def __init__(self, module: torch.nn.Module, capabilities: ModelCapabilities):
```
`capabilities` is something like the target? i.e. "forces", "dipole moments", "partial charges"?
`capabilities.outputs` contains the different targets the model is able to produce. `capabilities` contains other information about the model (unit used as input, species it can handle).

Maybe we should rename this to `ModelDefinition` or something, and also store in there the model authors, papers to cite, date of training, etc.
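A rough sketch of what such a container could group together, based on the description above. All names and field types here are illustrative stand-ins, not the actual metatensor-torch API:

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Illustrative sketch only: field names are assumptions,
# not the real metatensor-torch ModelCapabilities API.
@dataclass
class ModelOutput:
    quantity: str = ""  # e.g. "energy"
    unit: str = ""      # e.g. "eV"

@dataclass
class ModelCapabilities:
    # targets the model is able to produce, keyed by output name
    outputs: Dict[str, ModelOutput] = field(default_factory=dict)
    # unit used for positions/cell given as input
    length_unit: str = ""
    # atomic species the model can handle
    species: List[int] = field(default_factory=list)

capabilities = ModelCapabilities(
    outputs={"energy": ModelOutput(quantity="energy", unit="eV")},
    length_unit="Angstrom",
    species=[1, 6, 8],
)
```

Grouping the outputs with the input unit and supported species in one object is what makes a name like `ModelDefinition` tempting: it describes the whole model, not just its capabilities.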
I also stumbled on this term. I thought about `ModelConfig`, but that term does not really capture that it includes the model outputs.

> Maybe we should rename this to `ModelDefinition` or something, and also store in there the model authors, papers to cite, date of training, etc.

I like `ModelDefinition`; would this then include a meta-information variable of type `Dict[str, str]`?
I would prefer to give more structure to the data, and have multiple fields of type `str` instead of a dict. We could have an `other` field if people want to store more data in there, but I would start without it for now.
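The structured-fields-plus-escape-hatch idea above could look like this; the field names are suggestions only, not an agreed-upon API:

```python
from dataclasses import dataclass, field
from typing import Dict

# Sketch of structured metadata fields instead of a free-form
# Dict[str, str]; field names are illustrative suggestions.
@dataclass
class ModelMetadata:
    authors: str = ""
    references: str = ""     # papers to cite
    training_date: str = ""
    # escape hatch for anything that does not fit the fields above
    other: Dict[str, str] = field(default_factory=dict)

metadata = ModelMetadata(
    authors="Jane Doe",
    references="doi:10.0000/example",  # placeholder reference
    training_date="2023-11-01",
    other={"commit": "abc1234"},
)
```

Named fields give tooling something to rely on (e.g. always being able to print the authors), while `other` keeps the flexibility of the dict approach.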
Maybe we could consider the metadata that is included in models from PyTorch Hub or Hugging Face:

https://pytorch.org/docs/stable/hub.html#torch.hub.load
https://huggingface.co/docs/transformers/v4.34.1/en/model_doc/auto#transformers.AutoModel.from_pretrained

I identify these parameters:

- `provider`: hub platform (e.g. "Hugging Face")
- `repo_url` (e.g. "https://camembert-model.fr")
- `model_name` (e.g. "camembert")
- `model_checkpoint` (e.g. "camembert-large")
Might also be useful to consider the metadata that is contained in ONNX models: https://onnx.ai/onnx/intro/concepts.html#metadata. I identified these as possibly useful:

- `producer_version`: the version of the generating tool, like the version of the model or a commit ID.
- `model_version`: the version of the model itself, encoded in an integer. I think this is basically what `model_checkpoint` above is expressing, just that they express it as a string instead of a number. If you look at models from the computer vision community (https://pytorch.org/vision/stable/models.html) it also makes more sense: for the first model there, `model_name` would be `AlexNet` and the version would be `AlexNet_Weights.IMAGENET1K_V1`. I think that would be good enough to track models. Sometimes they use the `_VX` at the end to describe that the training procedure changed a bit, but sometimes they change the name at the beginning to do that. It is a mess.
- `model_license`: the well-known name or URL of the license under which the model is made available. Not sure if this makes sense, as I would say the license should be stored in the repo where the model comes from.
- `doc_string`: human-readable documentation for this model.
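The ONNX-style fields listed above could be carried as a plain string mapping plus a tiny validator. The key names follow ONNX; the validator, the set of required keys, and the example values are all illustrative:

```python
from typing import Dict

# Keys follow the ONNX metadata names; which ones are required
# is an assumption made for this sketch.
REQUIRED_KEYS = {"producer_version", "model_version", "doc_string"}

def validate_metadata(meta: Dict[str, str]) -> None:
    """Raise if any of the required metadata keys is missing."""
    missing = REQUIRED_KEYS - meta.keys()
    if missing:
        raise ValueError(f"missing metadata keys: {sorted(missing)}")

meta = {
    "producer_version": "0.1.0",   # version of the generating tool
    "model_version": "1",          # ONNX encodes this as an integer
    "model_license": "BSD-3-Clause",
    "doc_string": "Human-readable description of the model.",
}
validate_metadata(meta)  # passes: all required keys present
```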
I'm not sold on the Hugging Face style metadata, it feels more related to a full repository of models instead of a single one. The ONNX metadata makes a lot more sense to me.
We currently collect some minimal metadata, which should be expanded with stuff like this. If you agree @agoscinski I would leave the metadata definition to a later PR?
Force-pushed from c5db0b1 to dd04da5
Force-pushed from 3424c94 to 4900992
Force-pushed from fbff672 to 7d095c9
Force-pushed from f292a97 to 0d2c141
Force-pushed from 2629d0b to 44758e4
Do I understand correctly that one "System" is one structure?

```python
def forward(self, systems: List[System], run_options: ModelRunOptions) -> Dict[str, TensorBlock]:
```

This would allow a single model to be trained and exported, since the forward needs to take in multiple structures during training. At the moment, I can only see this working if, after having trained the model, you manually convert it to a different class that has the signature

```python
def forward(self, systems: Union[System, List[System]], run_options: ModelRunOptions) -> Dict[str, TensorBlock]:
```
We had a couple of discussions on taking a single system or multiple, but I can't remember the argument for going with a single one. I agree having more than one system is quite useful here, I'll give it a go and check if everything works the same.
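A torch-free toy sketch of the multi-system `forward` discussed above; `System` and `ToyModel` here are stand-ins, not the real metatensor-torch classes, and the "energy" is just a placeholder computation:

```python
from dataclasses import dataclass
from typing import Dict, List

# Stand-in for the real System class: just positions, one [x, y, z]
# entry per atom.
@dataclass
class System:
    positions: List[List[float]]

class ToyModel:
    def forward(self, systems: List[System]) -> Dict[str, List[float]]:
        # One "energy" per system; the atom count is a placeholder
        # for a real per-structure prediction.
        return {"energy": [float(len(s.positions)) for s in systems]}

model = ToyModel()
batch = [
    System([[0.0, 0.0, 0.0]]),
    System([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]),
]
out = model.forward(batch)  # {"energy": [1.0, 2.0]}
```

With this shape, training on batches and running single-structure MD use the same code path: the MD engine simply passes a list of length one.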
Force-pushed from 1f2cb79 to cdbbd7c
Force-pushed from 0e3348e to 939275e
Force-pushed from cb06e4e to 9acf895
Regarding examples: I'd prefer to leave them to a separate PR. I already have a branch, but it's going to take a lot more work, so I would rather merge this without waiting for the examples to be done.
Force-pushed from dcbd458 to e6b8823
Force-pushed from e6b8823 to 88ae5aa
I am happy.
There is not much metadata to store for these, so turning them into pure Tensor makes the code easier to use. It will also allow replacing rascaline.torch.System with this class.

- take multiple systems instead of a single one
- return a dict of TensorMap instead of TensorBlock (for equivariant targets)
- separate outputs and selected_atoms into two arguments

This allows us to use rewritten asserts in the tests, and get nicer error messages on test failure.
Force-pushed from 88ae5aa to a8be7d6
I found another small bug, I'll wait for CI to pass & then this should be good to go!
Also make sure to call it for neighbors list inside the ASE calculator
Force-pushed from a8be7d6 to 1826005
CI is hitting a bug in CMake: https://discourse.cmake.org/t/3-28-segmentation-fault-on-macos-11-runner/9588. I'll give them a day to fix it before trying to force the use of a different CMake version.
Version 3.28.0 has a miscompilation issue and segfaults in some cases
Force-pushed from b186c08 to f0dd3fa
This PR contains all the code required to define, export, load and validate arbitrary atomistic models based on metatensor-torch. The user has to provide a TorchScript-compatible `torch.nn.Module` with the expected `forward` signature.

A `System` contains positions, cell and the neighbors lists requested by any submodule; submodules request these by defining a `requested_neighbors_lists` function (this allows e.g. rascaline to request a NL without the end module knowing about it).

The `ModelRunOptions` is what the engine wants; in particular it contains a list of `ModelOutput` (the model `forward` function can return multiple outputs, and the MD engine should request the ones it wants) and the set of `selected_atoms` on which to run the calculation.

When exporting a model, the user should use `ModelCapabilities` to declare what the model is able to do, as well as the units it uses as input & output. `MetatensorAtomisticModule` then does unit conversion between what the engine provides and what the model wants for input, and between what the engine wants and what the model provides on output.

Still TBD:

- … (`ModelCapabilities`, `ModelOutput`, `ModelRunOptions`)
- … for the `energy` output, for the `dipole` output, …
- Add a way to profile model execution time: will be done in a later PR

📚 Documentation preview 📚: https://metatensor--405.org.readthedocs.build/en/405/
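The unit conversion the description attributes to `MetatensorAtomisticModule` (engine units on input, model units for computation, engine units on output) can be sketched with plain conversion factors. The function names and table layout here are illustrative, not the library's API; the factor values are the usual Angstrom/Bohr and eV/Hartree constants, rounded:

```python
# Conversion tables expressed relative to a reference unit.
LENGTH_TO_ANGSTROM = {"Angstrom": 1.0, "Bohr": 0.529177}
ENERGY_TO_EV = {"eV": 1.0, "Hartree": 27.211386}

def convert_length(value: float, from_unit: str, to_unit: str) -> float:
    # Convert through the reference unit (Angstrom).
    return value * LENGTH_TO_ANGSTROM[from_unit] / LENGTH_TO_ANGSTROM[to_unit]

def convert_energy(value: float, from_unit: str, to_unit: str) -> float:
    # Convert through the reference unit (eV).
    return value * ENERGY_TO_EV[from_unit] / ENERGY_TO_EV[to_unit]

# Engine works in Bohr, model expects Angstrom: convert on input.
x_model = convert_length(1.0, "Bohr", "Angstrom")
# Model returns Hartree, engine wants eV: convert on output.
e_engine = convert_energy(1.0, "Hartree", "eV")
```

Wrapping the model once with declared units means neither the engine nor the model itself needs to know about the other side's unit system.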