Add option for custom user preprocessing step to data_loader.load_variable? #177
Comments
Good catch. Are there any other places where we're implicitly assuming CF-compliant data?
I agree.
This definitely seems like the most straightforward solution. However, given that this stems from model-level settings (i.e. the output format of WRF simulations), I'm wondering if there's a way to do this more systematically. More specifically, is there a way to implement this at the Model level? I don't think so right now, so maybe that's beyond the scope of this particular issue. And it doesn't diminish the usefulness of your recommended solution, since that's more general.
What motivates making it |
Just occurred to me: this is exactly what different DataLoaders are for. I.e. we should add a Edit: In fact, if a python wrapper exists to the wrfout_to_cf.ncl utility you linked to, we could use that directly, rather than rolling our own. (I still think your |
I think we are only strict about times, since xarray requires a CF-compliant units attribute to decode times (and we rely on times being decoded within aospy).
That does sound like a nice solution! That way you wouldn't have to remember to pass the preprocessing function to the DataLoader each time (you'd just have to make sure you used the correct one).
I agree; it would make implementing different DataLoaders that require their own preprocessing steps for cleaning datasets before passing them to aospy more straightforward.
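The idea of a DataLoader that carries its own preprocessing step could look roughly like the sketch below. It is a standalone illustration, not aospy code: `SketchDataLoader`, `SketchWRFDataLoader`, the `_preprocess` hook, and the plain-dict "dataset" are all hypothetical, and the WRF time-string layout is an assumption.

```python
from datetime import datetime


class SketchDataLoader:
    """Hypothetical base class; stands in for an aospy DataLoader."""

    def _preprocess(self, ds):
        # Default hook: assume the data already meets aospy's expectations.
        return ds

    def load(self, raw_ds):
        # The hook runs on a shallow copy before any further processing.
        return self._preprocess(dict(raw_ds))


class SketchWRFDataLoader(SketchDataLoader):
    """Hypothetical loader that cleans WRF-style time labels."""

    # Assumed layout of WRF time strings, e.g. "2000-01-24_12:00:00".
    TIME_FORMAT = "%Y-%m-%d_%H:%M:%S"

    def _preprocess(self, ds):
        # Add a decoded time axis alongside the original strings.
        ds["time"] = [
            datetime.strptime(t, self.TIME_FORMAT) for t in ds["Times"]
        ]
        return ds


raw = {"Times": ["2000-01-24_12:00:00", "2000-01-24_18:00:00"]}
cleaned = SketchWRFDataLoader().load(raw)
```

With this layout the user only has to pick the right loader class; the cleanup is applied on every load without being passed in explicitly.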
Right, for this particular use case (I was thinking about just a |
Thinking out loud, could we make it a method that's unique to each DataLoader? E.g. for DictDataLoader it maps to Not sure if that's a good idea. Definitely need to work through the implementation a bit before proceeding. |
We toyed with adding this in #90, but eventually settled on just adding a custom time-offset capability for simplicity.
I know someone trying out aospy who would like to use it for looking at output from the WRF model, which unfortunately does not always comply with CF conventions; this is particularly problematic for the time variable:
It would be nice if, rather than having to modify every output file to have a CF-compliant time units variable, a user could provide a function to apply to the dataset before aospy touches it. I think it would make most sense for this to be a specification one could make as an optional argument in a DataLoader constructor.
For instance:
The preprocess argument could take a dictionary as input, with an intvl_in mapping to a user-defined function, which would be called before grid_attrs_to_aospy_names whenever files were loaded from a particular file set. This could be used to clean up datasets that are close, but not quite, compatible with aospy's assumptions.
@spencerahill what are your thoughts here? Does this sound like a decent option to address this issue?
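The proposal above could be sketched roughly as follows. This is a minimal standalone illustration, not aospy's actual implementation: `SketchDictDataLoader`, `fix_time_units`, the plain-dict "dataset", the file name, and the reference date are all hypothetical stand-ins, and `grid_attrs_to_aospy_names` is replaced by a stub.

```python
from typing import Callable, Dict, Optional

# A plain dict stands in for an xarray.Dataset so the sketch runs on its own.
Dataset = Dict[str, str]


def fix_time_units(ds: Dataset) -> Dataset:
    """User-defined cleanup: give the time units a reference date.

    The reference date here is made up for illustration.
    """
    fixed = dict(ds)
    fixed["time_units"] = "minutes since 2000-01-01 00:00:00"
    return fixed


def grid_attrs_to_aospy_names_stub(ds: Dataset) -> Dataset:
    """Stub for aospy's grid-attribute renaming step."""
    return ds


class SketchDictDataLoader:
    """Hypothetical, simplified loader; not aospy's DictDataLoader."""

    def __init__(
        self,
        file_map: Dict[str, str],
        preprocess: Optional[Dict[str, Callable[[Dataset], Dataset]]] = None,
    ):
        self.file_map = file_map
        # Maps an intvl_in key to the function to apply to that file set.
        self.preprocess = preprocess or {}

    def _open(self, path: str) -> Dataset:
        # Stand-in for reading files: returns non-CF time units.
        return {"path": path, "time_units": "minutes"}

    def load(self, intvl_in: str) -> Dataset:
        ds = self._open(self.file_map[intvl_in])
        func = self.preprocess.get(intvl_in)
        if func is not None:
            # The user's function runs before aospy touches the data.
            ds = func(ds)
        return grid_attrs_to_aospy_names_stub(ds)


loader = SketchDictDataLoader(
    {"monthly": "wrfout_monthly.nc"},
    preprocess={"monthly": fix_time_units},
)
ds = loader.load("monthly")
```

File sets without an entry in the dictionary are loaded unchanged, so existing CF-compliant data would not be affected.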