Variable mapping table and the catalog file #233

Open
MaceKuailv opened this issue Jun 12, 2022 · 4 comments
Labels
enhancement New feature or request

Comments

@MaceKuailv
Collaborator

I found opening and working on new datasets that are not in the standard MITgcm output format with oceanspy somewhat difficult.

I am suggesting several functionalities that might come in handy:

  • Oceanspy should be able to guess the meaning of variables based on name and long_name.
  • The od object contains a dictionary called aliases that shows which variables were mapped to which standard variables. It would be nice to print out this table when calling od. Ideally, oceanspy should be able to generate this object on its own based on the guesses. And of course, users can change this table.
  • The catalog file should contain the aliasing information. I also think it would be nice to have an alternative to the yaml format. Say, a csv file that can be edited by Excel.
  • The documentation for set_aliases and manipulate_coords is a bit hard to find. I also think they should be called in the open_oceandataset function.
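
To make the second and third items concrete, here is a minimal sketch of a CSV alias table that can be edited in Excel, loaded into a dictionary, and printed as a table. Everything in it is an assumption for illustration: the column names, the helper names `read_alias_table` and `format_alias_table`, and the sample variable names are hypothetical, and the code does not use oceanspy itself. The dictionary is keyed by the standard name with the custom name as the value, which I believe matches what set_aliases expects, but that should be checked against the documentation.

```python
import csv
import io

# Hypothetical CSV layout: one row per variable (column names are assumptions).
CSV_TEXT = """\
standard_name,custom_name
XG,lon_u
YG,lat_v
Temp,temperature
"""


def read_alias_table(csv_file):
    """Return {standard_name: custom_name} read from a two-column CSV file."""
    reader = csv.DictReader(csv_file)
    return {row["standard_name"]: row["custom_name"] for row in reader}


def format_alias_table(aliases):
    """Render the mapping as a plain-text table, one alias per row."""
    header = ("standard", "custom")
    width = max([len(header[0])] + [len(name) for name in aliases])
    rows = [header] + sorted(aliases.items())
    return "\n".join(f"{std:<{width}}  {custom}" for std, custom in rows)


# io.StringIO stands in for an opened CSV file on disk.
aliases = read_alias_table(io.StringIO(CSV_TEXT))
table = format_alias_table(aliases)
print(table)
```

A table like this could be shown when printing od, and the same file could live next to (or replace part of) the catalog entry.
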
@ThomasHaine
Collaborator

Thanks for the suggestions! It seems that the 2nd item is easiest to implement. Please go ahead and provide a working example. The first and third items look more involved. How do you envisage accomplishing them? Are they needed now or for future releases with support for more models? For the final item: How does this change current functionality?

@MaceKuailv
Collaborator Author

The hardest one to implement is probably number 3. There really are a lot of configurations to fill in when creating a new dataset, and the easiest way to do it is debatable.

No.1 can be implemented easily with some word2vec packages (it's also not hard to implement without them).
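
As a rough sketch of the "without them" route, plain string similarity from the standard library can already rank candidates. This swaps word2vec for `difflib.SequenceMatcher`, so it compares characters, not meanings. The reference table, the function names, and the 0.6 cutoff are all hypothetical choices for illustration, not anything oceanspy provides.

```python
import difflib

# Hypothetical reference table: standard name -> descriptive phrase.
REFERENCE = {
    "XG": "longitude of U-velocity",
    "Temp": "potential temperature",
    "S": "salinity",
    "U": "zonal velocity",
}


def similarity(a, b):
    """Case-insensitive similarity ratio between two strings, in [0, 1]."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()


def guess_standard_name(name, long_name, reference=REFERENCE, cutoff=0.6):
    """Guess the standard name for a variable, or return None if unsure.

    The variable name is compared with each standard name and the long_name
    with each description; the better of the two scores is kept.
    """
    best_name, best_score = None, 0.0
    for std, description in reference.items():
        score = max(similarity(name, std), similarity(long_name, description))
        if score > best_score:
            best_name, best_score = std, score
    return best_name if best_score >= cutoff else None


print(guess_standard_name("temperature", "Potential temperature"))
```

Guesses below the cutoff come back as None, so the user can fill those in by hand instead of getting a wrong mapping silently.
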

As for No.4, I think an alternative to calling set_aliases and manipulate_coords automatically is to print out a reminder or raise a warning: "This dataset is missing XG (the longitude of U-velocity). Some methods may require these variable(s) to work. Call set_aliases or manipulate_coords to fill it in."
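
A minimal sketch of such a check, using only the standard `warnings` module. The `REQUIRED` table and the function name `warn_missing` are hypothetical; the YG description is an assumption by analogy with XG, and a real version would take the list of required variables from oceanspy itself.

```python
import warnings

# Illustrative subset of standard variables and what they describe.
REQUIRED = {
    "XG": "the longitude of U-velocity",
    "YG": "the latitude of V-velocity",  # assumed by analogy with XG
}


def warn_missing(present, required=REQUIRED):
    """Warn about required variables absent from `present`; return their names."""
    missing = [name for name in required if name not in present]
    if missing:
        listed = ", ".join(f"{name} ({required[name]})" for name in missing)
        warnings.warn(
            f"This dataset is missing {listed}. Some methods may require "
            "these variable(s) to work. Call set_aliases or "
            "manipulate_coords to fill them in.",
            UserWarning,
            stacklevel=2,
        )
    return missing


# Example: a dataset that has YG and Temp but no XG.
warn_missing({"YG", "Temp"})
```

Because this is a warning and not an error, datasets that never need the missing variables still open without extra steps.
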

I would suggest these changes be made before the next release. Right now, the dataset files that have a catalog file have all been renamed in the MITgcm fashion (I don't know how it was done, but I assume by hand). Although the changes are going to require somewhere between 100 and 500 lines of code (actually not that much), I think they will help bring the package to a much broader audience. I think it is pretty cost-effective.

@ThomasHaine ThomasHaine added the enhancement New feature or request label Jul 6, 2022
@asiddi24
Collaborator

Inspired by this issue and #274, we should be able to add a function to oceanspy that lists all available datasets on SciServer at some point.

@ThomasHaine
Collaborator

ThomasHaine commented Jan 17, 2023

See also @malmans2's second comment here: #224 (comment)
