Add VNP46A2-h11v07 recipe #217

Open: 6 commits to be merged into base: master

Conversation

jzavala-gonzalez

Hi everyone. Thanks for developing pangeo-forge-recipes! It's been really helpful for simplifying the download of this dataset. Special thanks to @briannapagan and @yuvipanda, since this recipe is heavily based on their implementation for the GPM IMERG dataset (#190) and its use of CMR's GranuleQuery.

Here are some notes and questions to keep in mind with this recipe:

  • It doesn't encompass the entire dataset, just one grid square of it (h11v07) that covers Puerto Rico and parts of the Caribbean. Any advice on how to expand to other horizontal (h) and vertical (v) squares without breaking the existing lat/lon dimensions is appreciated. At the end is a screenshot of this grid piece and here is a link to the shapefile defining the grid.
  • Although it is hosted by one of NASA's DAACs, it seems it can be accessed without credentials. Should I ask around the Earthdata forums about this before pushing to a bakery? Maybe it's a bug on the LAADS DAAC.
  • Each input file corresponds to a single day but the input files don't have a date dimension. In the recipe, the date dimension is added when opening the input. However, I think the metadata needs something similar. Any idea how to "concat" the Dataset attributes using the date dimension?

Thank you for your time!!

[image: screenshot of the h11v07 grid square]

@andersy005
Member

/run vnp46a2-h11v07

@pangeo-forge
Contributor

pangeo-forge bot commented Nov 4, 2022

The test failed, but I'm sure we can find out why!

Pangeo Forge maintainers are working diligently to provide public logs for contributors.
That feature is not quite ready yet, however, so please reach out on this thread to a
maintainer, and they'll help you diagnose the problem.

'''
For each date, return the URL from the collected dictionary.
'''
return vnp_date_dict[date]['href']
Member

@andersy005 Nov 4, 2022

@jzavala-gonzalez, the latest test run failed due to a NameError. This isn't necessarily a Python problem but one of the quirks of apache-beam: accessing variables defined in the global scope from within a function seems to be a problem for it. One solution is to move vnp_date_dict from the global scope into the make_full_path function scope by

  1. defining vnp_date_dict inside make_full_path or
  2. passing vnp_date_dict as an argument to make_full_path
    File "/usr/local/lib/python3.9/dist-packages/pangeo_forge_recipes/executors/beam.py", line 40, in exec_stage
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/recipes/xarray_zarr.py", line 155, in cache_input
      fname = config.file_pattern[input_key]
    File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/patterns.py", line 219, in __getitem__
      fname = self.format_function(**format_function_kwargs)
    File "/tmp/tmpxgkkg7a3/recipes/vnp46a2-h11v07/recipe.py", line 77, in make_full_path
  NameError: name 'vnp_date_dict' is not defined [while running 'Start|cache_input|Reshuffle_000|prepare_target|Reshuffle_001|store_chunk|Reshuffle_002|finalize_target|Reshuffle_003/cache_input/Execute-ptransform-56']
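
As an illustration of option 1, here is a minimal sketch in which make_full_path depends on no module-level names at all. The dictionary entries and the example.invalid URLs are hypothetical placeholders, not values from the recipe, and the sketch has not been run under apache-beam; it only shows the shape of the change.

```python
import datetime


def make_full_path(date: datetime.date) -> str:
    """Option 1: build the lookup table inside the function scope."""
    # Local import so the body does not rely on any module-level name.
    import datetime as dt

    # Hypothetical placeholder entries; the real recipe would fill this
    # from its granule query instead of hard-coding it.
    vnp_date_dict = {
        dt.date(2022, 1, 1): {"href": "https://example.invalid/granule-2022-01-01.h5"},
        dt.date(2022, 1, 2): {"href": "https://example.invalid/granule-2022-01-02.h5"},
    }
    return vnp_date_dict[date]["href"]


print(make_full_path(datetime.date(2022, 1, 2)))
```

The trade-off in this sketch is that the table is rebuilt on every call, which may be costly if building it involves a remote query.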

Member

If you decide to go with option 2, I'm curious whether the following would work:

import functools 

.... 

print('Earliest date:', min(vnp_dates).strftime('%Y-%m-%d'))
print('Latest date:  ', max(vnp_dates).strftime('%Y-%m-%d'))


def make_full_path(date: datetime.date, vnp_date_dict=None) -> str:
   return vnp_date_dict[date]['href']

... 

make_full_path = functools.partial(make_full_path, vnp_date_dict=vnp_date_dict)

....
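
For reference, a self-contained version of the same idea might look like the sketch below. The dictionary contents and the name bound_make_full_path are hypothetical and chosen only for illustration; the point is that functools.partial stores the dictionary on the resulting object, so the function body no longer looks it up in the global scope.

```python
import datetime
import functools

# Hypothetical placeholder entries standing in for the recipe's query results.
vnp_date_dict = {
    datetime.date(2022, 1, 1): {"href": "https://example.invalid/granule-2022-01-01.h5"},
    datetime.date(2022, 1, 2): {"href": "https://example.invalid/granule-2022-01-02.h5"},
}


def make_full_path(date: datetime.date, vnp_date_dict=None) -> str:
    # The dictionary arrives as an argument, not as a global lookup.
    return vnp_date_dict[date]["href"]


# Bind the dictionary once; callers then only supply the date.
bound_make_full_path = functools.partial(make_full_path, vnp_date_dict=vnp_date_dict)

# The partial object carries the bound dictionary with it.
print(bound_make_full_path.keywords["vnp_date_dict"] is vnp_date_dict)
print(bound_make_full_path(date=datetime.date(2022, 1, 1)))
```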

Author

Option 2 looks to be working for the pattern function as you wrote it. I expected the same to work for the process_input function, add_date_dimension, but during the init of the XarrayZarrRecipe it attempts, unsuccessfully, to JSON serialize that partial function (TypeError: object of type partial not serializable). That second function is much easier to adjust so that it doesn't use the global, so I applied option 1 there instead.
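
As a rough illustration of both points, the stdlib-only sketch below shows that Python's json module rejects a functools.partial object, and one way a function could recover the date without any global. It assumes the date can be parsed from the file name and that granule names contain an `.AYYYYDDD.` token (year plus day of year); the helper name date_from_filename and the sample file name are hypothetical and not taken from the recipe, and the recipe's own serialization path may differ from json.dumps.

```python
import datetime
import functools
import json
import re


def date_from_filename(fname: str) -> datetime.date:
    """Parse an assumed '.AYYYYDDD.' token (year, day of year) from a file name."""
    # Compiled inside the function so it relies on no module-level state.
    match = re.search(r"\.A(\d{4})(\d{3})\.", fname)
    if match is None:
        raise ValueError(f"no AYYYYDDD date token in {fname!r}")
    year, day_of_year = int(match.group(1)), int(match.group(2))
    return datetime.date(year, 1, 1) + datetime.timedelta(days=day_of_year - 1)


def is_json_serializable(obj) -> bool:
    """Return True if the standard json encoder accepts obj."""
    try:
        json.dumps(obj)
    except TypeError:
        return False
    return True


sample = "VNP46A2.A2022032.h11v07.001.h5"  # hypothetical file name
print(date_from_filename(sample))
print(is_json_serializable(functools.partial(date_from_filename)))
```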
