Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Friendlier code and API #72

Merged
merged 11 commits into from
Jan 22, 2016
Merged

Friendlier code and API #72

merged 11 commits into from
Jan 22, 2016

Conversation

ocefpaf
Copy link
Member

@ocefpaf ocefpaf commented Jan 17, 2016

The goal of this PR is to resurrect some of the topics we discussed in APIRUS (ping @rsignell-usgs ). @ayan-usgs I'd appreciate if you could review and comment on it. @kwilcox you are the only one I know that is using pysgrid on production, so your feedback is very welcomed!

Summary:

  • removed the abc SGridND and the SGrid3D to reduce the code complexity for newcomers. The SGrid2D was renamned to SGrid;
  • replaced custom exceptions with a "compliance-check" that returns ValueErrors when properties (values) are not found (akin to pyugrid);
  • find_grid_topology_var() is no longer a method of NetCDFDataset. It became some sort of stand-alone compliance checker function (and should probably be renamed);
  • there where many direct and indirect calls to find_grid_topology_var(). Also, some of those calls could take the topology variable as input even though it sill automatically looked for it later one. That was simplified in this version;
  • from_ncfile and from_nc_dataset were replaced by load_sgrid that can take nc datasets, file names, or URLs. (I plan to do the same in pyugrid... I guess we could rename it to load_grid.);
  • PEP8d the code to make it friendlier to receive new contributions.

The long term goal is to get rid of the NetCDFDataset class and fold all of its functionality into a SGRID compliance checker. This compliance checker will take call_backs to parse non-compliant datasets.

Hopefully, with a simpler code base, we can start tracking some of bottlenecks that are plaguing sci-wms.

:rtype: string

"""
grid_topology = nc.get_variables_by_attributes(cf_role='grid_topology')
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: need to make this work in older versions of netcdf4-python.

@rsignell-usgs
Copy link
Contributor

@ocefpaf , I'm excited you are working on this. @kwilcox , can you please take a look?

@ocefpaf
Copy link
Member Author

ocefpaf commented Jan 18, 2016

The latest commit (ocefpaf@6823f94) splits the centers and nodes into center_lon, center_lat and node_lon, node_lat. It also gets rid of pair_arrays (well... Almost 😬).

By avoiding calling pair_arrays on both nodes and centers we delay the data download and, in some situations, pysgrid can perform a little bit faster.

Before this PR the notebook example took ~10 s to run, now it takes ~5 s.

Note that the gain there was just by not downloading (the rarely used) nodes. We still need to be smart about the centering slicing and subsequent slices to improve this even more.

@kwilcox
Copy link
Member

kwilcox commented Jan 21, 2016

Great progress.

sci-wms needs to know too much information about what an sgrid is:

Anything that closes the implementation gap between pysgrid and potential users. I'd love it if sci-wms never had:

  • Rotate angles
  • Recenter any variables on to another grid
  • Figure out padding when recentering
  • Subset in space (I should be able to provide a bbox and get back another Sgrid object read to go)
  • Subset in time (I don't want to have to write the bisect code... pysgrid could do that for me).

Basically, all data access should assume the user wants it back on the centers and pad/slice/rotate as needed. If a user wants the raw data they can just use netCDF4.

@ocefpaf
Copy link
Member Author

ocefpaf commented Jan 21, 2016

@kwilcox I agree to all that! And some of those should make into pyugrid too, but right now we need to take baby steps.

First: are you OK with the changes proposed in this PR? #72 (comment)

If so, the next step should be:

  • Subset in space (provide a bbox and get back another Sgrid object read to go)

I have a few ideas (#73 and #74) brewing. They all boils down to perform lazy trimmings and lazy center/node averages. That would get us close to a sgrid subset while reducing the amount of data downloaded.

PS: I am opening another issue for those points you raised.

@kwilcox
Copy link
Member

kwilcox commented Jan 21, 2016

👍 +1 to the changes here

@ocefpaf
Copy link
Member Author

ocefpaf commented Jan 21, 2016

@ayan-usgs can you spare some time to review and merge this? (If not I am planning to merge this next week.)

@ayan-usgs
Copy link
Contributor

@ocefpaf,

This all looks solid. Thanks for coding up these improvements!

ayan-usgs added a commit that referenced this pull request Jan 22, 2016
@ayan-usgs ayan-usgs merged commit f10076b into sgrid:master Jan 22, 2016
@ocefpaf ocefpaf deleted the remove_custom_exceptions branch January 22, 2016 14:15
@ocefpaf ocefpaf mentioned this pull request Jan 22, 2016
@ayan-usgs ayan-usgs mentioned this pull request Feb 9, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants