Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support user-specified internal names of grid attributes #182

Closed
spencerahill opened this issue May 8, 2017 · 11 comments
Closed

Support user-specified internal names of grid attributes #182

spencerahill opened this issue May 8, 2017 · 11 comments

Comments

@spencerahill
Copy link
Owner

Users can currently specify the names of the data variables in their input data via the name and alt_names attributes of Var. But the same can't be done for grid attributes such as lat, land_mask, etc., because these are taken directly from the GRID_ATTRS dict in internal_names.py.

This effectively means that if their data's grid attributes don't match any of the names in GRID_ATTRS, they will be unable to use aospy unless they modify the aospy source code directly. This just came up for me in trying to use some NCAR CAM output, which has 'nbnd' as its 'bounds' name and 'LANDFRAC' as its 'land_mask' name.

One initial idea (though more though/discussion definitely warranted): create Var objects within aospy for each grid attribute, using the alt_names attribute to store what's currently in GRID_ATTRS. And then the user can define these Vars separately if they want, and aospy will look for them first in their object library before looking at the internal "master" list.

Or, perhaps less boilerplate, do the same but just look for a 'GRID_ATTRS' attribute in their object library. I think that might be better; it's definitely easier.

I'm a bit all over the place right now... @spencerkclark am I right about the overall premise of this issue, or have I forgotten a way to do this?

@spencerahill spencerahill added this to the v0.2 milestone May 8, 2017
@spencerkclark
Copy link
Collaborator

+1 for raising this issue; incidentally this would also help the WRF users (e.g. 'lat' and 'lon' are named 'XLAT' and 'XLONG' respectively).

Or, perhaps less boilerplate, do the same but just look for a 'GRID_ATTRS' attribute in their object library. I think that might be better; it's definitely easier.

I would lean towards this solution as well; it seems like it would be relatively clean to implement, and would be relatively simple from a user's perspective.

@spencerahill
Copy link
Owner Author

this would also help the WRF users (e.g. 'lat' and 'lon' are named 'XLAT' and 'XLONG' respectively).

Good to know -- let me know all other of WRF's grid attributes that are non-standard, and I'll add them in the same PR

I would lean towards this solution as wel

Great, we'll go this route.

@spencerkclark
Copy link
Collaborator

let me know all other of WRF's grid attributes that are non-standard, and I'll add them in the same PR

Do we need to hard-code the settable attributes or could we just have a user specify their own dictionary with only the attributes they want to add alternative names to, and then just iterate over the keys and values of that dictionary and update the internal GRID_ATTRS dictionary where specified?

@spencerahill
Copy link
Owner Author

Do we need to hard-code the settable attributes

I'm inclined to continue populating the builtin GRID_ATTRS with variants for each grid attribute as we come across them. This makes it more likely that a user will be able to use aospy without having to do additional configuring. Otherwise, why have GRID_ATTRS store alternate names at all?

Of course, this is predicated on attribute names being unique, i.e. that no name corresponds to different grid attributes in different models.

@spencerahill
Copy link
Owner Author

(aside)

This does touch on something that I'm not totally satisfied with. Internal naming conventions are generally consistent across all of a model's simulations (or all models in a MIP or all/nearly all models from the same institution). But we have no way, even with the Model class, to signify this.

E.g. it feels like we should be taking advantage of this, so that e.g. when the model being used is WRF, we look for grid attributes with the WRF names first, and potentially warn/raise if we can't find them with those names.

(Now that I'm typing this, it sounds familiar, as in maybe we discussed it before...)

@spencerkclark
Copy link
Collaborator

True, this does seem like something well suited for a Model object to at least store; the tricky thing (at least right now) is that this information needs to be accessible at the DataLoader level. I suppose one could pass the model-specific grid attribute names themselves (in the form of a dict) as an argument to data_loader.load_variable, and go from there.

In writing this down, it actually doesn't seem too disruptive. Do you think that would work?

@spencerahill
Copy link
Owner Author

the tricky thing (at least right now) is that this information needs to be accessible at the DataLoader level

Yes exactly.

I suppose one could pass the model-specific grid attribute names themselves (in the form of a dict) as an argument to data_loader.load_variable, and go from there.

Yes, this seems like the logical way to do it. But from the perspective of users interfacing with the package primarily through submit_mult_calcs, how to make this work?

One idea:

  1. Implement Model.grid_attrs as a dict with keys as the internal names and values as the model-specific name'
  2. When submit_mult_calcs gets called, then for each parameter combination, grab the Model.grid_attrs for that calculation's model, and pass it to the DataLoader.
  3. (largely what you said) In data_loader.py, modify _load_data_from_disk, grid_attrs_to_aospy_names, and DataLoader.load_variable to accept an optional grid_attrs argument.

What do you think?

@spencerkclark
Copy link
Collaborator

spencerkclark commented May 9, 2017

That's pretty much exactly what I had in mind.

  1. When submit_mult_calcs gets called, then for each parameter combination, grab the Model.grid_attrs for that calculation's model, and pass it

I think this is made pretty straightforward by the fact that a Calc object already has a Model attribute, so whenever you call load_variable in calc.py if you just pass self.model.grid_attrs as an argument things should behave nicely. This could be done by adding a grid_attrs argument to the data_loader_attrs dict within calc.py.

@spencerahill
Copy link
Owner Author

Awesome, let's plan on doing this then. This will be a nice feature, preventing users from getting "stuck" where they can't use their data with aospy without aospy source code modifications.

Making sure this doesn't get lost: I do still want to continue adding name variants to internal_names.GRID_ATTRS, unless there's a compelling reason not to.

@spencerkclark
Copy link
Collaborator

Making sure this doesn't get lost: I do still want to continue adding name variants to internal_names.GRID_ATTRS, unless there's a compelling reason not to.

On this front for WRF (aospy internal name -> WRF name):

  • 'time' -> 'XTIME'
  • 'lon' -> 'XLONG'
  • 'lat' -> 'XLAT'
  • 'land_mask' -> 'XLAND'
  • 'zsurf' -> 'HGT'

@spencerahill
Copy link
Owner Author

Perfect, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants