Accessing data in groups #102

Closed
lhmarsden opened this issue Nov 29, 2022 · 7 comments · Fixed by #126

Comments

@lhmarsden

I am trying to use R to access data from variables within a NetCDF file that contains hierarchical groups. For example, see the question I posted on Stack Overflow: https://stackoverflow.com/questions/74612898/get-variable-data-out-of-a-group-in-a-netcdf-file-using-rnetcdf-or-ncdf4

I can't find anything about how to do this in the RNetCDF documentation, though the online version seems to be out of date:
Latest version: https://www.rdocumentation.org/packages/RNetCDF/versions/2.6-1
Latest documented version: https://www.rdocumentation.org/packages/RNetCDF/versions/1.9-1

Could you help me to do this? Thanks!

@mjwoods
Owner

mjwoods commented Nov 29, 2022

Hi @lhmarsden , I’m not sure why RDocumentation is so out of date, and I will need to look into it. Thanks for letting me know. You can still read the help files in your local R session, though.

To read hierarchical groups, you first use ‘grp.inq.nc’ to get a list of groups in the dataset. Then select a group by name, or take it from the grps item of the result list. Each group is a NetCDF object that can be used in other RNetCDF functions in place of the dataset object, so you can work through the hierarchy with repeated calls to grp.inq.nc until the grps list is empty.

I hope that makes sense. Sorry my explanation is brief, but I’m using my phone. If you need more help, maybe you could send me a sample dataset and I could write a code snippet once I am back at my computer.
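The traversal described above can be sketched as a recursive function. This is an illustrative sketch, not code from the RNetCDF package: `list_groups` is a hypothetical name, and it assumes that `grp.inq.nc` returns the `fullname`, `varids` and `grps` components described in its help file.

```r
library(RNetCDF)

# Hypothetical helper (not part of RNetCDF): walk the group hierarchy by calling
# grp.inq.nc repeatedly, stopping when a group's grps list is empty.
# Returns a named list mapping each group's full name to its variable names.
list_groups <- function(ncid) {
  info <- grp.inq.nc(ncid)
  # Variable identifiers are local to each group, so query them via this group.
  vars <- vapply(info$varids, function(v) var.inq.nc(ncid, v)$name, character(1))
  out <- setNames(list(vars), info$fullname)
  # Recurse into each child group; the loop body never runs for a leaf group.
  for (child in info$grps) {
    out <- c(out, list_groups(child))
  }
  out
}
```

With the sample file from this thread, `str(list_groups(fid))` should show every group in the file, including the "Data Fields" group that holds the SI_12km_NH_18H_ASC variable.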

@lhmarsden
Author

Hi @mjwoods , thanks for your help. I can see how to get a list of groups but not how to 'select a group by name'. I would be very grateful if you could show me how to access the variable 'SI_12km_NH_18H_ASC' within the file below.

You should be able to take the file from here:
https://drive.google.com/file/d/1msfsugcfwQCm-GOq-IK_CTrcDApeEpdm/view?usp=sharing

In Python xarray I access the group like this:
import xarray as xr
ds = xr.open_dataset('AMSR_U2_L3_SeaIce12km_B04_20190101.he5', group='HDFEOS/GRIDS/NpPolarGrid12km/Data Fields')

Thanks!

@mjwoods
Owner

mjwoods commented Dec 1, 2022

Hi @lhmarsden , you can get information about the group like this:

library(RNetCDF)
fid <- open.nc("AMSR_U2_L3_SeaIce12km_B04_20190101.he5")
# Get info about a group whose name you know:
ginfo <- grp.inq.nc(fid, "HDFEOS/GRIDS/NpPolarGrid12km/Data Fields")
print(ginfo)
# Get the NetCDF object pointing to that group:
gid <- ginfo$self
# Then use the NetCDF object in other RNetCDF functions, for example:
var.inq.nc(gid, 0) # You can use numeric identifiers for variables (or dimensions, attributes, nested groups, etc.) ...
var.inq.nc(gid, "SI_12km_NH_18H_ASC") # or names if you know them.
# Close the dataset when you have finished with it:
close.nc(fid)

Hopefully that gets you a bit closer to your goal. I'm happy to help if needed, because it also helps me to understand how the package could be improved to make it easier to use.
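To mirror the single group path used with xarray above, the group lookup and the data read can be wrapped together. `get_var_by_path` is a hypothetical helper, not part of RNetCDF. It assumes, as in the reply above, that grp.inq.nc accepts a slash-separated group path, and it passes any extra arguments (such as `unpack = TRUE`) straight through to var.get.nc.

```r
library(RNetCDF)

# Hypothetical helper (not part of RNetCDF): read a variable from a path such as
# "HDFEOS/GRIDS/NpPolarGrid12km/Data Fields/SI_12km_NH_18H_ASC".
# Everything before the last "/" is treated as the group path.
get_var_by_path <- function(ncid, path, ...) {
  parts <- strsplit(path, "/", fixed = TRUE)[[1]]
  parts <- parts[nzchar(parts)]  # drop the empty piece left by a leading "/"
  varname <- parts[length(parts)]
  grp <- ncid
  if (length(parts) > 1) {
    # Look up the group by its relative path, then use its NetCDF object.
    grp <- grp.inq.nc(ncid, paste(parts[-length(parts)], collapse = "/"))$self
  }
  var.get.nc(grp, varname, ...)
}
```

For the sample file, `get_var_by_path(fid, "HDFEOS/GRIDS/NpPolarGrid12km/Data Fields/SI_12km_NH_18H_ASC")` should return the same data as calling var.get.nc on `ginfo$self` with the variable name.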

@lhmarsden
Author

lhmarsden commented Dec 2, 2022

This is very useful, thanks @mjwoods! Yes, I can figure out the rest from here. I must admit I mostly use Python, but I find RNetCDF more readable than ncdf4 when I am teaching in R.

A bit off the topic of this GitHub issue, but one thing I think would be useful is if the user could access a variable based on the 'standard_name' variable attribute rather than the variable name. The 'standard_name' attribute is standardised, whereas the variable name is not. It is therefore easier to extract data from files created by different users that contain the same type of data using the standard_name.

I did this in a recent script: https://github.com/lhmarsden/NetCDF-CF_workshops/blob/main/R_workshop_materials/access_multiple_netcdf_files_standard_names.R
And presented this in a YouTube video too: https://www.youtube.com/watch?v=jWRszWCVWLc

But it would be easier if this were built into RNetCDF, something like:
fid <- open.nc(filepath)
data <- var.get.nc(fid, standard_name = "mass_concentration_of_chlorophyll_a_in_sea_water")

I haven't seen this option in Python xarray, NetCDF4 or ncdf4 in R either, but I could be wrong...
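With the functions RNetCDF already has, a lookup by standard_name can be sketched as below. `vars_with_standard_name` is a hypothetical helper, not an RNetCDF function, and the `standard_name =` argument to var.get.nc shown above is only a proposal. The sketch searches a single dataset or group; it would need to be combined with a group traversal to search a whole hierarchy. For comparison, Python's xarray provides `Dataset.filter_by_attrs(standard_name=...)`, which selects variables in a similar way.

```r
library(RNetCDF)

# Hypothetical helper (not part of RNetCDF): return the names of variables in
# one dataset or group whose "standard_name" attribute equals `standard_name`.
vars_with_standard_name <- function(ncid, standard_name) {
  info <- grp.inq.nc(ncid)
  matches <- character(0)
  for (v in info$varids) {
    vinfo <- var.inq.nc(ncid, v)
    # List this variable's attribute names; attribute numbers start at 0.
    att_names <- vapply(seq_len(vinfo$natts) - 1,
                        function(a) att.inq.nc(ncid, v, a)$name, character(1))
    # Only read standard_name where it exists, so att.get.nc cannot fail.
    if ("standard_name" %in% att_names &&
        identical(att.get.nc(ncid, v, "standard_name"), standard_name)) {
      matches <- c(matches, vinfo$name)
    }
  }
  matches
}
```

Applied to each file in turn, `name <- vars_with_standard_name(fid, "mass_concentration_of_chlorophyll_a_in_sea_water")` followed by `var.get.nc(fid, name[1])` reads the first matching variable, whatever it happens to be called in that file.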

@mjwoods
Owner

mjwoods commented Dec 3, 2022

Thanks @lhmarsden , I'm glad I could help.

I like your suggestion to allow selection of variables based on standard_name. I think that would be good as an optional feature of RNetCDF. I'll add it to my TODO list for the package.

@mjwoods
Owner

mjwoods commented Oct 2, 2023

Hi @lhmarsden , I have added examples of working with hierarchical groups to the help file for grp.inq.nc. I think this will help users with datasets similar to what you described here. I aim to release the changes on CRAN soon.

I am still considering your suggestion about the standard_name attribute, which I have moved to a separate issue (#103).

@lhmarsden
Author

Great to hear, thanks @mjwoods
