Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

49 add support for h5md files #71

Merged
merged 52 commits into from Jan 17, 2024
Merged

Conversation

JFRudzinski
Copy link
Contributor

@ladinesa I am finally trying to get back to the development of this H5MD-schema-based parser of hdf5 files.

I was wondering if you could take a (brief) initial look at this code (relevant corresponding nomad-lab dev HERE) and also the initial docs that I have been working on H5MD docs just to get an initial sense of what I am trying to do.

From there I think it would be nice to meet with you to discuss further before moving forward. I spoke with Markus about this a bit at the retreat, and we discussed the possibility of this parser being folded into a more general hdf5 parser. In either case, I realize that there are many things that I have done here that are not ideal, since I was more concerned with using this as a platform to define the schema itself. Additionally, with upcoming changes, I would like to discuss the most efficient way to move forward in a timely fashion.

@JFRudzinski JFRudzinski linked an issue Oct 23, 2023 that may be closed by this pull request
@ladinesa
Copy link
Collaborator

I will have a closer look at this but just some initial thoughts:

  1. Maybe I would split the parser itself with the h5 file parser. This way, we can reuse the h5 file parser for another parser.
  2. Have you done some benchmarkings? I am not sure about the performance of accessing datasets one at a time but I think traversing the h5 group one time will be faster. In the latter case, it will be harder to separate the parser from the file parser but it is doable.
  3. This looks like an md parser so it would be nice the also use the md parser class or at least the parser will be made ready to easily transition to using it.

@JFRudzinski
Copy link
Contributor Author

I will have a closer look at this but just some initial thoughts:

  1. Maybe I would split the parser itself with the h5 file parser. This way, we can reuse the h5 file parser for another parser.
  2. Have you done some benchmarkings? I am not sure about the performance of accessing datasets one at a time but I think traversing the h5 group one time will be faster. In the latter case, it will be harder to separate the parser from the file parser but it is doable.
  3. This looks like an md parser so it would be nice the also use the md parser class or at least the parser will be made ready to easily transition to using it.
  1. Yes, I agree. Is there anything existing already? I implemented a couple things that are schema-specific within the parsing of the file, so we would need to figure out how to make this work generally.
  2. I have not at all. I knew that I would have to probably rewrite a lot, but it was easier conceptually for me to start this way. I probably need a bit of guidance here. I did sort of try to access the group only once by defining class properties that place the information within each group into a dict, but maybe I am not understanding correctly how this works.
  3. Yes, I will definitely replace what I can by the super md parser (I started this before that was written actually).

@ladinesa
Copy link
Collaborator

I am satisfied now with the current state. you can merge this at will. thanks!

@JFRudzinski JFRudzinski merged commit 39e5685 into develop Jan 17, 2024
1 of 2 checks passed
@JFRudzinski JFRudzinski deleted the 49-add-support-for-h5md-files branch January 18, 2024 09:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for H5MD files
2 participants