Set a reader library instead of file suffix #153

Open
PicoCentauri opened this issue Mar 15, 2024 · 3 comments
Labels
Infrastructure: Data Related to data handling like readers and datasets Priority: Medium Important issues to address after high priority.

Comments

@PicoCentauri
Contributor

It might make sense to invert the control flow here. Instead of guessing the reader from the file extension, do something like

structures:
    read_from: file.xyz
    reader: ase

or

structures:
    read_from: ase://file.xyz

This way, we don't try to use ASE on CP2K xyz files, which are not quite compatible.

Originally posted by @Luthaf in #84 (comment)

I would go for the first option and leave the reader as an optional field. If the user does not specify a reader library, we will infer it from the file suffix based on a list of known extensions (as we currently do).
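A minimal sketch of how the optional `reader` field with a suffix-based fallback could look. The function name `resolve_reader` and the mapping `DEFAULT_READER_BY_SUFFIX` are hypothetical and only illustrate the idea; the actual list of known extensions in the project may differ.

```python
from pathlib import Path
from typing import Optional

# Hypothetical mapping from file suffix to reader library. The real list of
# known extensions lives in the project and may look different.
DEFAULT_READER_BY_SUFFIX = {".xyz": "ase", ".extxyz": "ase"}


def resolve_reader(read_from: str, reader: Optional[str] = None) -> str:
    """Return the explicit reader if given, otherwise infer it from the suffix."""
    if reader is not None:
        # An explicit `reader` field always wins over the file suffix.
        return reader

    suffix = Path(read_from).suffix
    try:
        return DEFAULT_READER_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(
            f"cannot infer a reader library from suffix {suffix!r}; "
            "set the `reader` field explicitly"
        ) from None
```

With this, `resolve_reader("file.xyz")` falls back to the suffix table, while `resolve_reader("file.xyz", reader="custom")` returns the user's choice unchanged.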

@PicoCentauri PicoCentauri self-assigned this Mar 15, 2024
@Luthaf
Contributor

Luthaf commented Mar 15, 2024

The main advantage I see for the second option is that the reader being used is made explicit, so any error would be less surprising to a user. But it is a bit more clunky than just trying to guess the best option for a file from its extension.

I'd be fine with either solution here!
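For comparison, a rough sketch of how the second form (`ase://file.xyz`) could be split into a reader name and a path. The helper `split_reader_prefix` is hypothetical and not part of the project; it only shows that the prefix form is easy to parse with the standard library.

```python
from typing import Optional, Tuple


def split_reader_prefix(read_from: str) -> Tuple[Optional[str], str]:
    """Split 'reader://path' into (reader, path); return (None, path) if no prefix."""
    reader, sep, path = read_from.partition("://")
    if sep and reader:
        # A prefix such as 'ase://' was present.
        return reader, path
    # No prefix: the whole string is the path and the reader is unspecified.
    return None, read_from
```

Here `split_reader_prefix("ase://file.xyz")` yields the reader `"ase"` and the path `"file.xyz"`, while a plain `"file.xyz"` yields no reader, leaving room for the suffix-based guess.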

@DavideTisi
Contributor

DavideTisi commented Mar 15, 2024

I like the first option more; it seems more direct.

@PicoCentauri
Contributor Author

We can try to formulate a good error message like

"`dataset.foobar` could not be parsed successfully by the reader library `ase`. "
"Try changing the `reader` field in your `options.yaml`. "
"Possible reader libraries are ase, deepmp, ...."

or similar.
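One possible way to produce such a message is to wrap the reader call and re-raise with the hint attached. This is only a sketch: `read_with_reader` and the `readers` registry (a dict from library name to a callable taking a filename) are hypothetical names, not the project's API.

```python
from typing import Any, Callable, Dict


def read_with_reader(
    filename: str, reader: str, readers: Dict[str, Callable[[str], Any]]
) -> Any:
    """Call the chosen reader and turn any failure into a helpful error."""
    available = ", ".join(sorted(readers))
    if reader not in readers:
        raise ValueError(
            f"unknown reader library `{reader}`. "
            f"Possible reader libraries are {available}."
        )
    try:
        return readers[reader](filename)
    except Exception as err:
        # Keep the original exception as the cause so the traceback of the
        # underlying library is not lost.
        raise ValueError(
            f"`{filename}` could not be parsed successfully by the reader "
            f"library `{reader}`. "
            "Try changing the `reader` field in your `options.yaml`. "
            f"Possible reader libraries are {available}."
        ) from err
```

Chaining with `from err` keeps the library's own error visible below the hint, which should help when the file is actually malformed rather than read with the wrong library.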

@PicoCentauri PicoCentauri added Priority: Medium Important issues to address after high priority. Infrastructure: Data Related to data handling like readers and datasets and removed infrastructure labels Jun 3, 2024