Add noaa-cdr datasets #82
7e29016 to 234b3fe
  
Collections:
- noaa-cdr-ocean-heat-content
- noaa-cdr-ocean-heat-content-netcdf
- noaa-cdr-sea-ice-concentration
- noaa-cdr-sea-surface-temperature-optimum-interpolation
- noaa-cdr-sea-surface-temperature-whoi
- noaa-cdr-sea-surface-temperature-whoi-netcdf
        
          
@TomAugspurger media type and
    
* Added Dockerfile
* Updated image
* Added app insights string
* Added chunking to sea-ice
I've had a handful of troubles with the full run.
 pctasks.dataset.items.task.CreateItemsError: Failed to create item from blob://noaacdr/sea-surface-temp-optimum-interpolation/data/v2.1/avhrr/199708/oisst-avhrr-v02r01.19970805.nc
[INFO]:2023-04-26 21:23:11,698: (039.00%) [5.37s]  - blob://noaacdr/sea-surface-temp-optimum-interpolation/data/v2.1/avhrr/199708/oisst-avhrr-v02r01.19970804.nc (117 of 300)
/mnt/batch/tasks/workitems/noaa-cdr-process-chunk-087ca945--b2-b2a7-82031c5bc775-tsk_gen_gr/job-1/create-items-19/wd/_code/xarray/backends/plugins.py:159: RuntimeWarning: 'h5netcdf' fails while guessing
  warnings.warn(f"{engine!r} fails while guessing", RuntimeWarning)
[INFO]:2023-04-26 21:23:25,027:  === PCTasks: Task Failed! ===
[ERROR]:2023-04-26 21:23:25,028: Failed to create item from blob://noaacdr/sea-surface-temp-optimum-interpolation/data/v2.1/avhrr/199708/oisst-avhrr-v02r01.19970805.nc
Traceback (most recent call last):
  File "/opt/conda/lib/python3.11/site-packages/pctasks/dataset/items/task.py", line 174, in create_items
    result = self._create_item(asset_uri, storage_factory)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/batch/tasks/workitems/noaa-cdr-process-chunk-087ca945--b2-b2a7-82031c5bc775-tsk_gen_gr/job-1/create-items-19/wd/_code/noaa_cdr.py", line 161, in create_item
    item = stactools.noaa_cdr.stac.add_cogs(item, temporary_directory)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/batch/tasks/workitems/noaa-cdr-process-chunk-087ca945--b2-b2a7-82031c5bc775-tsk_gen_gr/job-1/create-items-19/wd/_code/stactools/noaa_cdr/stac.py", line 94, in add_cogs
    assets = cog.cogify(
             ^^^^^^^^^^^
  File "/mnt/batch/tasks/workitems/noaa-cdr-process-chunk-087ca945--b2-b2a7-82031c5bc775-tsk_gen_gr/job-1/create-items-19/wd/_code/stactools/noaa_cdr/cog.py", line 25, in cogify
    with xarray.open_dataset(file, mask_and_scale=False) as ds:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/batch/tasks/workitems/noaa-cdr-process-chunk-087ca945--b2-b2a7-82031c5bc775-tsk_gen_gr/job-1/create-items-19/wd/_code/xarray/backends/api.py", line 509, in open_dataset
    engine = plugins.guess_engine(filename_or_obj)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/batch/tasks/workitems/noaa-cdr-process-chunk-087ca945--b2-b2a7-82031c5bc775-tsk_gen_gr/job-1/create-items-19/wd/_code/xarray/backends/plugins.py", line 197, in guess_engine
    raise ValueError(error_msg)
ValueError: found the following matches with the input file in xarray's IO backends: ['h5netcdf']. But their dependencies may not be installed, see:
https://docs.xarray.dev/en/stable/user-guide/io.html 
https://docs.xarray.dev/en/stable/getting-started-guide/installing.html
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.11/site-packages/pctasks/task/run.py", line 138, in run_task
    result = task.parse_and_run(task_data, task_context)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/pctasks/task/task.py", line 53, in parse_and_run
    output = self.run(args, context)
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/pctasks/dataset/items/task.py", line 203, in run
    results = self.create_items(input, context)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/pctasks/dataset/items/task.py", line 176, in create_items
    raise CreateItemsError(
pctasks.dataset.items.task.CreateItemsError: Failed to create item from blob://noaacdr/sea-surface-temp-optimum-interpolation/data/v2.1/avhrr/199708/oisst-avhrr-v02r01.19970805.nc
[INFO]:2023-04-26 21:23:25,086: Task run complete.
Traceback (most recent call last):
  File "/opt/conda/lib/python3.11/site-packages/pctasks/dataset/items/task.py", line 174, in create_items
    result = self._create_item(asset_uri, storage_factory)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/batch/tasks/workitems/noaa-cdr-process-chunk-087ca945--b2-b2a7-82031c5bc775-tsk_gen_gr/job-1/create-items-19/wd/_code/noaa_cdr.py", line 161, in create_item
    item = stactools.noaa_cdr.stac.add_cogs(item, temporary_directory)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/batch/tasks/workitems/noaa-cdr-process-chunk-087ca945--b2-b2a7-82031c5bc775-tsk_gen_gr/job-1/create-items-19/wd/_code/stactools/noaa_cdr/stac.py", line 94, in add_cogs
    assets = cog.cogify(
             ^^^^^^^^^^^
  File "/mnt/batch/tasks/workitems/noaa-cdr-process-chunk-087ca945--b2-b2a7-82031c5bc775-tsk_gen_gr/job-1/create-items-19/wd/_code/stactools/noaa_cdr/cog.py", line 25, in cogify
    with xarray.open_dataset(file, mask_and_scale=False) as ds:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/batch/tasks/workitems/noaa-cdr-process-chunk-087ca945--b2-b2a7-82031c5bc775-tsk_gen_gr/job-1/create-items-19/wd/_code/xarray/backends/api.py", line 509, in open_dataset
    engine = plugins.guess_engine(filename_or_obj)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/batch/tasks/workitems/noaa-cdr-process-chunk-087ca945--b2-b2a7-82031c5bc775-tsk_gen_gr/job-1/create-items-19/wd/_code/xarray/backends/plugins.py", line 197, in guess_engine
    raise ValueError(error_msg)
ValueError: found the following matches with the input file in xarray's IO backends: ['h5netcdf']. But their dependencies may not be installed, see:
https://docs.xarray.dev/en/stable/user-guide/io.html 
https://docs.xarray.dev/en/stable/getting-started-guide/installing.html
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/opt/conda/bin/pctasks", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/opt/conda/lib/python3.11/site-packages/pctasks/cli/cli.py", line 140, in cli
    pctasks_cmd(prog_name="pctasks")
  File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/pctasks/task/cli.py", line 50, in run_cmd
    _cli.run_cmd(
  File "/opt/conda/lib/python3.11/site-packages/pctasks/task/_cli.py", line 32, in run_cmd
    output = run_task(msg)
             ^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/pctasks/task/run.py", line 138, in run_task
    result = task.parse_and_run(task_data, task_context)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/pctasks/task/task.py", line 53, in parse_and_run
    output = self.run(args, context)
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/pctasks/dataset/items/task.py", line 203, in run
    results = self.create_items(input, context)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/pctasks/dataset/items/task.py", line 176, in create_items

FWIW, the compute is in West Europe, but this dataset resides in East US, which might contribute to the networking / download issues.
    
This adds a last-ditch timeout to the user-defined create_items. It *should* reliably interrupt functions that run longer than the user-specified `timeout` (unset by default, i.e. no timeout). It works by registering a signal handler for `SIGALRM` (https://www.man7.org/linux/man-pages/man7/signal.7.html, https://www.man7.org/linux/man-pages/man2/alarm.2.html) and setting an alarm for `timeout` seconds. If more than that passes, the kernel takes care of interrupting our thread, and we'll raise a `TimeoutError`. To handle the "common" case of something in `ssl` being stuck, we'll also retry `TimeoutError`s multiple times.
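For readers unfamiliar with the `SIGALRM` approach described in this commit message, here is a minimal sketch of the general technique. It is not the pctasks implementation: the names `alarm_timeout` and `call_with_retries` and the `max_attempts` parameter are hypothetical, and the sketch assumes a Unix system with the call made from the main thread, since Python only runs signal handlers there.

```python
import signal
from contextlib import contextmanager


@contextmanager
def alarm_timeout(seconds):
    """Raise TimeoutError if the body runs longer than `seconds`.

    `seconds` is a whole number of seconds; None disables the timeout.
    (Hypothetical helper, not part of pctasks.)
    """
    if seconds is None:
        yield
        return

    def _handler(signum, frame):
        raise TimeoutError(f"timed out after {seconds} seconds")

    # Install our handler, remembering whatever was registered before.
    previous = signal.signal(signal.SIGALRM, _handler)
    signal.alarm(seconds)  # ask the kernel to send SIGALRM after `seconds`
    try:
        yield
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)


def call_with_retries(func, *args, timeout=None, max_attempts=3):
    """Call func(*args), retrying when an attempt hits the timeout."""
    for attempt in range(1, max_attempts + 1):
        try:
            with alarm_timeout(timeout):
                return func(*args)
        except TimeoutError:
            if attempt == max_attempts:
                raise  # out of retries; surface the last timeout
```

With something like this, a stuck call would be interrupted after `timeout` seconds and re-attempted, and the last `TimeoutError` would propagate once the retries run out. Note that the alarm is process-wide, so nested or concurrent uses would interfere with each other.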
e5b60cc has (yet another) attempt at handling the timeouts. It's running now on a pair of runs that have gotten stuck 2/2 times now. I'm a bit... unsure about this strategy but I think it'll be OK. The tl;dr is that if you specify a  That said, we should always prefer to solve these problems within the  I'll split it out into its own PR, but wanted to "test" it out on noaa-cdr since it's been giving problems.
    
Depending on your point of view, we got unlucky with the latest round and didn't get any hangs (or the logs I added didn't work properly).
    
I did manage to reproduce the issue locally by running this in a loop. This is my understanding: We were specifying  That's in seconds, so we would have eventually interrupted it (in 22.2 hours). I'm not sure who is setting that, but it isn't us. The correct way to set these client-side socket timeouts is with
    
Thanks!
Description
Collections:
noaa-cdr-sea-ice-concentration (not included because of version mismatch between Azure blob storage assets and NOAA's assets)
Type of change
How Has This Been Tested?
Item(s) from all six collections have been ingested into the test instance.
Checklist:
Please delete options that are not relevant.
Screenshots
Ocean heat content
Sea surface temperature optimum interpolation
Sea surface temperature WHOI