Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to create complex tiled AWIPS NetCDF files (formerly SCMI writer) #1402

Merged
merged 38 commits into from Dec 16, 2020

Conversation

djhoese
Copy link
Member

@djhoese djhoese commented Oct 15, 2020

This is going to become a large collaborative pull request to add a large set of features to the current scmi writer. Additionally, this work should really require a lot of code refactoring because the current implementation was cruft on cruft on cruft over years of "oh that shouldn't be that hard to add". That said, let's see if I can document this as we go.

The Main Task

The CSPP Geo team at the Space Science and Engineering Center (SSEC) has been tasked with reproducing some tiled AWIPS NetCDF files originally produced by another group with non-Satpy and non-CSPP Geo software. So although we could produce the same result to the client in AWIPS (images on the screen) with the current implementation of the scmi writer, we aren't allowed to modify the configuration files on the AWIPS server that would give us this flexibility.

Therefore, the scmi writer needs to be updated to match the files provided by this other group. There are some complications to make this possible in the Satpy writer. These are outlined below.

Renaming the writer

The name scmi came from the original use case for these files (sectorized cloud moisture imagery), but doesn't make sense any more. Additionally, some people associate SCMI with the ABI L1b file produced for the NWS, even though there are other files (sensors and datasets) using this same format to get their data into AWIPS.

I propose renaming this writer to one of:

  • awips_tiled
  • tiled_awips
  • awips_netcdf

Thoughts?

Expected Output Overview

Here is an excerpt from the config on the AWIPS server and what it is looking for (we hope to get an actual sample NetCDF file soon). Further on I'll explain why we have some complications with what this expects.

<goesrProductDescriptions>

  <default>
    <matches>
      <match pattern="[OI][RT]_GLM-L[23]-GLM(C|F|M1|M2)-M\d_G\d\d_s\d{14}_e\d{14}_c\d{14}.nc">
        <attribute name="dataset_name" />
      </match>
    </matches>
  </default>

  <description>
    <dataTime>
        <formattedRefTime dateFormat="yyyy-MM-dd'T'HH:mm:ss.S'Z'">
            <attribute name="time_coverage_end"/>
        </formattedRefTime>
    </dataTime>

    <source>
        <attribute name="production_site" />
    </source>

    <creatingEntity>
        <pattern inputPattern="G(1[678])" outputFormat="GOES-%S">
            <attribute name="platform_ID" />
        </pattern>
    </creatingEntity>
  </description>

  
  <!-- Flash extent density 1min -->
  <description>
    <data>
      <variable name="Flash_extent_density" />
      <discreteMask name="DQF" value="0" keep="true">
      <fillByte>0</fillByte>
      </discreteMask>
    </data>
    <physicalElement>
      <value value="GLM_Flash_Extent_Density" />
    </physicalElement>
  </description>

  ... more variables ...

Global Attributes

These files expect either new global attributes for data we have access to or they expect the same global attributes we already produce but with different names. The simplest solution is to provide keyword arguments to specify the name for some of these. Luckily there are only a few that are needed that AWIPS understands. In these files those are:

  • dataset_name: the name of the file
  • time_coverage_end: the end time of the dataset. I already added a flag to support using the end time instead of the start time (start time being the default). Just need another keyword argument to change the name of this attribute (default start_time).
  • production_site: I think this is just the acronym of the site that produced it. For example, I've used SSEC in the past for this. This is currently known as "source" in the writer and is only used when generating the filename right now.
  • platform_ID: Should be taken from the platform_name of the metadata. One issue is that Satpy uses names like GOES-16 but this wants them as G-16. We'll need an easy way to make that work. Perhaps a hacky dictionary to map things that is hard-coded for now?

Multiple Variables

The biggest challenge will be that we currently generate a single set of tile files for each product, but these files expect multiple variables in each NetCDF file. There are a couple ways I see of solving this (that may also help with the above):

  1. Create a new keyword argument to choose whether you want to do multi-variable files or not. This would then choose a different NetCDFWrapper (and other helper classes) that exist in the scmi.py module to accomplish this. We already deal with all datasets/variables at once in the save_datasets method, so this is just a different way of handling that.
  2. Instead of doing this under one single awips_netcdf writer, we could use the same python class but allow the YAML files (satpy/etc/writers/scmi.yaml currently) to configure the template for the file and whether or not they should be multi-variable. This configuration could include all of the attribute layouts and maybe even things like a platform_name renaming map.
  3. Allow for more templating at a higher level so this isn't so complicated with a ton of keyword arguments.

Ancillary Variables

The expected files expect a DQF variable to also exist. This is a little strange for the current design, but shouldn't really be a problem. The current glm_l2 reader doesn't read this DQF variable but that's fine. It could show up as its own variable or as an ancillary_variable which should show up in my_data_arr.attrs['ancillary_variables'] when the other datasets are loaded in the Scene.

TODO

  • Refactor writer to allow for more higher-level control of what the file looks like. Even if the user passes in a YAML or XML file to control it.
  • Global Attributes
    • dataset_name
    • time_coverage_end
    • production_site
    • platform_ID
  • Multiple Variable
  • DQF in reader
  • DQF in writer (if needed, may just act as any other dataset)
  • Optional: Rewrite reader to use available_datasets so all variables from the input NetCDF files can be loaded
  • Tests added
  • Passes flake8 satpy
  • Fully documented
  • Add your name to AUTHORS.md if not there already (Nick and Ray will probably need to do this)

@djhoese djhoese added enhancement code enhancements, features, improvements component:writers refactor labels Oct 15, 2020
@codecov
Copy link

codecov bot commented Oct 15, 2020

Codecov Report

Merging #1402 (f80153e) into master (12917ab) will increase coverage by 0.56%.
The diff coverage is 86.55%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1402      +/-   ##
==========================================
+ Coverage   90.56%   91.12%   +0.56%     
==========================================
  Files         228      241      +13     
  Lines       33406    35438    +2032     
==========================================
+ Hits        30254    32294    +2040     
+ Misses       3152     3144       -8     
Flag Coverage Δ
behaviourtests 4.45% <0.00%> (?)
unittests 91.59% <86.55%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
satpy/writers/awips_tiled.py 82.22% <82.22%> (ø)
satpy/tests/writer_tests/test_awips_tiled.py 98.48% <98.48%> (ø)
satpy/readers/glm_l2.py 95.65% <100.00%> (+2.03%) ⬆️
satpy/tests/reader_tests/test_glm_l2.py 100.00% <100.00%> (ø)
satpy/utils.py 24.09% <0.00%> (-46.39%) ⬇️
satpy/readers/mimic_TPW2_nc.py 80.95% <0.00%> (-9.41%) ⬇️
satpy/tests/test_dependency_tree.py 98.03% <0.00%> (-1.97%) ⬇️
satpy/tests/reader_tests/test_grib.py 97.41% <0.00%> (-1.53%) ⬇️
satpy/readers/nwcsaf_msg2013_hdf5.py 94.87% <0.00%> (-1.29%) ⬇️
satpy/readers/seviri_l1b_hrit.py 90.93% <0.00%> (-1.25%) ⬇️
... and 79 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 12917ab...f80153e. Read the comment docs.

@coveralls
Copy link

coveralls commented Oct 15, 2020

Coverage Status

Coverage increased (+0.4%) to 90.95% when pulling ce05dd1 on feature-scmi-multivar into 12917ab on master.

@coveralls
Copy link

Coverage Status

Coverage remained the same at 90.565% when pulling 63b2ae7 on feature-scmi-multivar into 12917ab on master.

@ghost
Copy link

ghost commented Oct 22, 2020

Congratulations 🎉. DeepCode analyzed your code in 7.531 seconds and we found no issues. Enjoy a moment of no bugs ☀️.

👉 View analysis in DeepCode’s Dashboard | Configure the bot

@djhoese
Copy link
Member Author

djhoese commented Oct 26, 2020

So with this last commit I think I'm as close I can be without having example files of the tiled files we plan on reproducing. The main question mark is what the actual filename should be. There are a couple important optimizations that should be done and I need to see what needs to be fixed to get category products to work but I'll leave that for now so I can get back to other projects.

self.assertEqual(len(all_files), 20)

# these tiles should be the right-most edge of the first image
first_left_edge_files = [x for x in first_files if 'O01' in x or 'O03' in x or 'U01' in x or 'U01' in x]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

F841 local variable 'first_left_edge_files' is assigned to but never used

ds = xr.open_dataset(fn, mask_and_scale=False)
check_required_common_attributes(ds)
assert ds.attrs['time_coverage_end'] == end_time.strftime('%Y-%m-%dT%H:%M:%S.%fZ')

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

W391 blank line at end of file

@djhoese
Copy link
Member Author

djhoese commented Dec 2, 2020

I hit a very annoying road block that I've spent multiple days on with no progress so I'm putting a bandaid on it and leaving it. The functionality for the SCMI write to update existing tiles on disk with new data seems to hang if you use multiple threads (multiple dask workers) and I've only noticed it in the last couple days. I've tried and tried to make non-satpy scripts to reproduce it or to narrow down exactly what is happening, but haven't come up with anything. It all seems to be with xarray's caching of file objects and multiple threads access them. I suppose, if I had to guess, the file caching might not be thread safe.

@djhoese djhoese marked this pull request as ready for review December 14, 2020 16:03
@djhoese djhoese changed the title [WIP] Add ability to create complex tiled AWIPS NetCDF files (formerly SCMI writer) Add ability to create complex tiled AWIPS NetCDF files (formerly SCMI writer) Dec 14, 2020
Copy link
Member

@mraspaud mraspaud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, this seems like a lot of refactoring and testing could be done here, but maybe that belongs in a future PR as long as the new functionality is correctly tested.

Just a couple of comments.

satpy/etc/writers/awips_tiled.yaml Show resolved Hide resolved
satpy/readers/glm_l2.py Outdated Show resolved Hide resolved
satpy/readers/glm_l2.py Outdated Show resolved Hide resolved
satpy/writers/awips_tiled.py Show resolved Hide resolved
Copy link
Member

@mraspaud mraspaud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mraspaud mraspaud merged commit 8427877 into master Dec 16, 2020
@djhoese djhoese deleted the feature-scmi-multivar branch December 16, 2020 19:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component:writers enhancement code enhancements, features, improvements refactor
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants