Verify for unknown architecture fields #241
Branch updated from 59dec14 to bb3846c, then from bb3846c to 1a582d3.
Thanks a lot for the work.
With this config, we would like to see an error (the key is supposed to be …).
Interesting, let me check this again.
Branch updated from 1a582d3 to 71eec8b.
I fixed the check for the options you posted, @frostedoyster, and added them as a test.
We seem to have some issues:
I guess the issue is that we are trying to use the default hypers as a schema for all possible hypers. If we want to do proper validation, we should have an actual schema. JSON Schema seems to be the main industry standard: https://json-schema.org/. Without a schema, validation will always either miss things we cannot check or be too strict about things that are actually valid. However, this would mean more work for architecture contributors, so it might be worth discussing in more depth later. One way to reduce this work would be a tool that auto-generates the schema from an example YAML file. I'm sure something like this already exists.
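To make the difference concrete, here is a minimal, stdlib-only sketch of schema-driven validation. It understands only a tiny, assumed subset of JSON Schema (`type`, `enum`, `properties`, `additionalProperties: false`), not the full specification that a library such as `jsonschema` implements, and the hyper names `cutoff` and `max_radial` are hypothetical. The point is that the schema states explicitly which keys exist and what type each one has, so `additionalProperties: false` rejects a misspelled key without relying on whatever happens to be in the defaults file.

```python
from typing import Any, Dict, List

# JSON Schema type names supported by this toy validator, mapped to Python types.
_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "null": type(None),
}


def validate(instance: Any, schema: Dict[str, Any], path: str = "") -> List[str]:
    """Return error messages for `instance`; an empty list means it is valid.

    Toy subset of JSON Schema: only "type" (a single name), "enum",
    "properties" and "additionalProperties": False are understood.
    """
    where = path or "<root>"
    errors: List[str] = []
    expected = schema.get("type")
    if expected is not None:
        ok = isinstance(instance, _TYPES[expected])
        # bool is a subclass of int in Python, but not a JSON number.
        if expected in ("integer", "number") and isinstance(instance, bool):
            ok = False
        if not ok:
            return [f"{where}: expected {expected}"]
    if "enum" in schema and instance not in schema["enum"]:
        errors.append(f"{where}: {instance!r} is not one of {schema['enum']}")
    if isinstance(instance, dict):
        properties = schema.get("properties", {})
        for key, value in instance.items():
            child = f"{path}.{key}" if path else key
            if key in properties:
                errors.extend(validate(value, properties[key], child))
            elif schema.get("additionalProperties", True) is False:
                # This is the check that catches misspelled or unknown hypers.
                errors.append(f"{child}: unknown key")
    return errors


# Hypothetical hypers for a single architecture.
schema = {
    "type": "object",
    "properties": {
        "cutoff": {"type": "number"},
        "max_radial": {"type": "integer"},
    },
    "additionalProperties": False,
}

print(validate({"cutoff": 5.0, "max_radial": 6}, schema))  # []
print(validate({"cutof": 5.0}, schema))  # ['cutof: unknown key']
```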
Yes, I thought we could avoid using a JSON schema, but it seems like we can't. I will explore whether there is an easy way that is not too much work for developers.
Branch updated from 71eec8b to d1f8629.
"stress": {
    "$ref": "#/$defs/gradient_section"
},
"virial": {
    "$ref": "#/$defs/gradient_section"
}
Should we already enforce in the schema that `stress` and `virial` are mutually exclusive?
Also, the gradients are hardcoded. I think this is fine for now...
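If `stress` and `virial` should be mutually exclusive, JSON Schema can express this directly: `{"not": {"required": ["stress", "virial"]}}` fails exactly when both keys are present. The sketch below shows such a fragment as a Python dict, reusing the `$ref` from the snippet above; the surrounding object layout is an assumption. The small function only spells out what the `not`/`required` combination means, it is not a general validator.

```python
from typing import Any, Dict

# Hypothetical layout of the object holding the gradient sections; the
# "not"/"required" entry is the part that makes the two keys exclusive.
GRADIENTS_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "stress": {"$ref": "#/$defs/gradient_section"},
        "virial": {"$ref": "#/$defs/gradient_section"},
    },
    "additionalProperties": False,
    # Valid unless *both* "stress" and "virial" are present.
    "not": {"required": ["stress", "virial"]},
}


def satisfies_not_required(instance: Dict[str, Any], schema: Dict[str, Any]) -> bool:
    """Evaluate only the `not: {required: [...]}` rule of `schema` on `instance`.

    `required` holds when every listed key is present; `not` inverts it,
    so the rule fails exactly when all listed keys appear together.
    """
    required = schema.get("not", {}).get("required")
    if required is None:
        return True
    return not all(key in instance for key in required)
```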
Branch updated from d1f8629 to 1ad0ed0.
with open(PACKAGE_ROOT / "share/schema-base.json", "r") as f:
    schema_base = json.load(f)

jsonschema.validate(instance=OmegaConf.to_container(options), schema=schema_base)
We could wrap these two lines in a function `check_base_options`, but that might be overkill...
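If the two lines were wrapped as suggested, a sketch could look like this. It assumes the caller converts the OmegaConf object to a plain container first (as the snippet above does with `OmegaConf.to_container(options)`) and passes the schema path explicitly. The `validate` parameter is an addition of this sketch, so the helper can be exercised without `jsonschema` installed; it defaults to `jsonschema.validate`.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict, Optional


def check_base_options(
    options: Dict[str, Any],
    schema_path: Path,
    validate: Optional[Callable[..., None]] = None,
) -> None:
    """Validate plain-dict `options` against the JSON schema stored at `schema_path`.

    Raises whatever `validate` raises for invalid input (for
    `jsonschema.validate`, a `jsonschema.ValidationError`).
    """
    if validate is None:
        # Imported lazily so callers (e.g. tests) can inject a stub validator.
        import jsonschema

        validate = jsonschema.validate
    with open(schema_path, "r") as f:
        schema_base = json.load(f)
    validate(instance=options, schema=schema_base)
```

In the training code this would then be called as `check_base_options(OmegaConf.to_container(options), PACKAGE_ROOT / "share/schema-base.json")`.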
Branch updated from 2812c20 to 7aaa70a.
The code and checks look pretty good. I think we need more documentation about this in the contributor documentation.
Is there a tool that can create a (non-ideal) schema from an example YAML file? We could recommend it as a starting point.
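As an illustration of what such a "non-ideal" starting point can look like, a few lines of Python can turn an example (once the YAML has been loaded into a dict with a YAML library) into a strict schema. `infer_schema` is a hypothetical helper, not an existing tool. Its output shows why manual review is still needed: types come only from the example values, so a `null` default becomes `"type": "null"`, which would reject every real value.

```python
from typing import Any, Dict


def infer_schema(example: Any) -> Dict[str, Any]:
    """Build a naive JSON schema that accepts `example`.

    Every key found becomes a property, nested dicts become nested objects,
    and unknown keys are rejected. Optional keys, nullable defaults, numeric
    ranges and enums still have to be added by hand.
    """
    if isinstance(example, bool):  # check before int: bool subclasses int
        return {"type": "boolean"}
    if isinstance(example, int):
        return {"type": "integer"}
    if isinstance(example, float):
        return {"type": "number"}
    if isinstance(example, str):
        return {"type": "string"}
    if example is None:
        return {"type": "null"}
    if isinstance(example, list):
        # Use the first element as a template; an empty list gives no constraint.
        return {"type": "array", "items": infer_schema(example[0]) if example else {}}
    if isinstance(example, dict):
        return {
            "type": "object",
            "properties": {str(k): infer_schema(v) for k, v in example.items()},
            "additionalProperties": False,
        }
    raise TypeError(f"unsupported value: {example!r}")
```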
"model": {
    "type": "object",
    "properties": {
        "soap": {
It would be interesting to check whether we can automatically import rascaline's JSON schema file here.
That might be very cool. But then we would have to generate these schemas dynamically based on the location of rascaline, or what do you have in mind?
rascaline currently does not distribute its schemas (they are only used for generating documentation). I was thinking we could copy the file from rascaline next to this one and update it whenever we update the pinned rascaline version.
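A minimal sketch of the "copy the file next to this one" idea, assuming the vendored rascaline schema is a plain JSON object that can be inlined under `model.properties.soap`. The helper `embed_schema` and its arguments are hypothetical. One caveat to check before adopting this: if the vendored file uses internal references such as `"$ref": "#/..."`, those pointers resolve against the root document and would break once the file is nested, so they would need rewriting, or the file would have to stay separate and be resolved by reference.

```python
import copy
from typing import Any, Dict, List


def embed_schema(
    schema: Dict[str, Any], path: List[str], vendored: Dict[str, Any]
) -> Dict[str, Any]:
    """Return a copy of `schema` with `vendored` placed at the nested key `path`.

    The input `schema` is not modified. Internal "#/..." references inside
    `vendored` are NOT rewritten by this sketch.
    """
    result = copy.deepcopy(schema)
    node = result
    for key in path[:-1]:
        node = node[key]
    # Keep the dialect declaration ("$schema") only at the root of the combined schema.
    node[path[-1]] = {k: v for k, v in copy.deepcopy(vendored).items() if k != "$schema"}
    return result
```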
Yes, that sounds like a good idea. If there is a way to use a distributed schema, we can try it once we have changed the schema in rascaline.
Yes, sure, I can add more information to the section about adding a new architecture.
I checked a couple of tools at the beginning, but none of them were great. In the end, I used ChatGPT to generate the schemas, which worked much better than the other tools. I think it is okay to recommend this.
We can suggest some tools along with their limitations, and say that we also had good success using ChatGPT/LLMs for this!
Branch updated from 64103ba to 41497f4, then from 41497f4 to 1e027dd.
I updated the documentation page, including tools for generating the schemas. @Luthaf, regarding linking the rascaline schemas:
That works for me, but I'd like to see approval from the different architecture maintainers on the new JSON schema files.
If you can also open issues for the things left for future PR, that would be great!
@frostedoyster @abmazitov @DavideTisi @spozdn Can you look at the new JSON schema file for your architecture and let us know whether (a) you understand the file format and what it is doing, and (b) you agree with the types/constraints of everything?
It looks good to me, and the schemas are quite intuitive.
Very cool; just a bit pedantic, but I do not have a better idea.
"properties": {
    "name": {
        "type": "string",
        "enum": ["experimental.gap"]
Why `enum` and not `string`?
I think that if you want to match a string against one specific value with jsonschema, you have to use an `enum`.
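For context: `"type": "string"` on its own accepts any string, so something else has to pin the value. A one-element `enum` does that; JSON Schema draft 6 and later also provide `const`, which means the same thing for a single value. The plain-Python check below evaluates only these keywords, to show that the two spellings accept and reject the same inputs; it is an illustration, not a validator.

```python
from typing import Any, Dict

# Two equivalent ways to pin "name" to a single value.
WITH_ENUM: Dict[str, Any] = {"type": "string", "enum": ["experimental.gap"]}
WITH_CONST: Dict[str, Any] = {"type": "string", "const": "experimental.gap"}  # draft 6+


def matches_value_keywords(instance: Any, schema: Dict[str, Any]) -> bool:
    """Check only the "type": "string", "enum" and "const" keywords of `schema`."""
    if schema.get("type") == "string" and not isinstance(instance, str):
        return False
    if "enum" in schema and instance not in schema["enum"]:
        return False
    if "const" in schema and instance != schema["const"]:
        return False
    return True
```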
Partly closes #168, since for now I only added verification for the architecture hypers, using a function `check_architecture_options`. Checking the dataset is a bit more complicated because the YAML layout is very flexible. I think it is doable, but to keep this PR smaller I will do it in a follow-up.

While touching the code, I also moved `check_options_list` inside the dataset expansion, because these two functions are always called together. Also, there was no test checking that the written `option_restart.yaml` is actually a valid input to start another training run.
📚 Documentation preview 📚: https://metatrain--241.org.readthedocs.build/en/241/