Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reorganize composite YAML files to be more flexible #789

Open
djhoese opened this issue May 22, 2019 · 4 comments
Open

Reorganize composite YAML files to be more flexible #789

djhoese opened this issue May 22, 2019 · 4 comments
Assignees
Labels
component:compositors enhancement code enhancements, features, improvements

Comments

@djhoese
Copy link
Member

djhoese commented May 22, 2019

Feature Request

Is your feature request related to a problem? Please describe.
Right now all composites for a particular sensor have to be placed in one single file (<sensor>.yaml). This is overall not very flexible or extensible. This is especially important when thinking of extending Satpy as discussed in #784.

Describe the solution you'd like
Currently the composites configuration directory is structured as:

etc/
  composites/
    visir.yaml  # Composite recipes that work for most or multiple VIS/IR instruments
    viirs.yaml  # composites specific to the VIIRS instrument
    abi.yaml  # composites specific to the ABI instrument

@mraspaud and I talked about the possibility of doing something like:

etc/
  composites/
    shared.yaml  # explained below
    visir/
      a.yaml  # VIS/IR composites
      viirs/
        a.yaml  # some VIIRS composites
        b.yaml  # more VIIRS composites
      abi/
        a.yaml  # some ABI composites
        b.yaml  # more ABI composites

This means that the sensor-specific directories can hold any number of yaml composite files. It also means that the sensor: visir/viirs logic currently used in composite configs is no longer needed since the hierarchy is in the directory structure.

The shared.yaml is something that still needs discussion. If you want composites that combine multiple sensors of different types they would go in the root composites directory.

Complications:

  • Composite conflicts between sensor-specific configs. Do we load the YAML files alphabetically? Is that fair? Right now things are updated recursively so someone could end up with some odd looking compositor configurations (true_color defined in a.yaml gets its dependencies overridden by b.yaml even though a.yaml used keyword arguments specific to its requirements).

Describe any changes to existing user workflow
Custom composites should be defined in this new directory structure. There could/should be a deprecation cycle though to support the existing sensor-based single file loading.

Additional context
Another part of this that is important to mention is the possibility of adding "sensor" and potentially a "platform_name" parameter to composite dependencies to say "this dependency must come from this platform/sensor. You could then make composites of G16 ABI and G17 ABI. Right now that isn't possible. I also just realized that this wouldn't even be possible because a single reader would be used

We also need to remember that #161 is still on the wish list.

@djhoese djhoese added enhancement code enhancements, features, improvements component:compositors labels May 22, 2019
@djhoese djhoese self-assigned this May 22, 2019
@jsolbrig
Copy link

jsolbrig commented May 29, 2019

Summary of discussion from Slack

I started a discussion on Slack on this issue because I'm concerned that putting each sensor into a specific category will become limiting as new sensors come online. For example, calling sensors "visir" neatly includes ABI, AHI, AMI, VIIRS, MODIS, etc. Some sensors, however, overlap with the "visir" channels while also carrying other channels; for example, in addition to Vis and IR bands PACE will have UV bands including the O2 bands. Also, different "visir" sensors have different capabilities. For example, the GOES-Imager (GOES <= 15) only has five channels while GOES-ABI (GOES >= 16) has 16 channels and very different capabilities.

Given all of that, it seems limiting to classify sensors into groups like "visir", "microwave", etc. This raises two questions.

  1. Should we have generic algorithms that can be used for sensors with the appropriate channels?
  2. If so, how do we accomplish having generic algorithms without putting sensors into specific classes?

Ideas

  1. Give each sensor a set of capabilities (e.g. "UV", "O2", "vis", "ir", "mw", etc.) and create generic algorithms for the various capabilities and combinations of capabilities.
  2. Load all YAML files and find the ones with appropriate wavelength requirements for the current sensor.
  3. Make use of PyYAML's !include directive and explicitly include generic algorithms in YAML for sensors than can produce them without needing to copy and paste.

Thoughts on the ideas

Generally, I think Option # 1 is problematic for the same reason that using the "visir" designation is problematic. Sensor capabilities aren't easily summarized in a tag (see the GOES-Imager example above).

Option # 2 seems promising, but may be limited by load times for YAML that contains Python function references. Maybe it is possible to load YAML while delaying loading the actual Python functions until specific YAML has been settled on? If so, we could cue completely off of which wavelengths each algorithm requires and which wavelengths the sensor offers.

Assuming that the YAML can't be loaded in a reasonable amount of time, Option # 3 may be a good option as well, but may require custom handling of the !include directive in order to allow SatPy and extension modules to handle one another's YAML seamlessly.

@mraspaud
Copy link
Member

In pyyaml, the baseloader class can read yaml without interpreting any python objects.

@jsolbrig
Copy link

@mraspaud Thanks! I was just looking into this and I think I was just about there, but that just saved me some time.

@jsolbrig
Copy link

jsolbrig commented May 30, 2019

I have made some progress on this issue. Changes so far can be seen in my fork.

So far I am working with option # 2. I have implemented this option for composites, but still need to figure out how to do the same for enhancements, readers, and writers. Readers and writers look straightforwards, but enhancements appear to be a bit more difficult. I wonder if a refactor of some of the enhancement code might be worth while.

To Do List

  • Add a deprecation warning when multi-level sensor_name values are encountered in composite files.
  • Ensure that all changes are backwards compatible.
  • Update loading composites.
    • See next section for a description of this progress.
  • Update loading enhancements.
    • This seems likely to be the most complicated of the needed updates. Maybe something needs to be done to refactor loading enhancements. I'm not sure that I understand the dependency tree, though, and need to study this part of the code more in-depth.
  • Update loading readers.
    • I have not looked into this at all yet and am unsure of its difficulty level.
  • Update loading of writers.
    • I have not looked into this at all yet and am unsure of its difficulty level.

Update loading composites

Composite YAML file structure

The structure of the sensor_name tag has been modified, though I believe the change is backwards compatible.

Previously, the sensor_name tag could contain multi-level sensor names where the levels were separated by "/". The last element of these sensor names was the actual name of the sensor and the other elements were names of generic composite files to read.

Now, the sensor_name tag should be limited to just the name of the sensor with no additional classifications. Files that contain generic composites should not contain the sensor_name tag. When loading composites from generic files (no sensor_name tag in the file) composites will be considered (discarding those whose prerequisites don't match the data we have). When loading composites from sensor-specific files, only those whose sensor_name tag matches the value of the sensor_name argument will be considered.

Composite search and loading

To accommodate these changes, I updated the composite loader to load all composite YAML files at once, then sort out which to use for the specified sensor. This is surprisingly fast and takes much less than a second. I will time this later to get some real numbers. If this method of loading all YAML at once proves to be too slow at larger scales I believe that there are two options for speeding it up:

  1. Perform initial loading using yaml.BaseLoader rather than yaml.UnsafeLoader, sort out which will actually be used, then re-load those with yaml.UnsafeLoader.
  2. Read only the first several lines of each YAML file and look for a sensor_name tag. If it doesn't exist, load that file. If it does exist, check its value against the input sensor_name argument and load only those that match.

Additionally, maybe something can be done to cache this information during a given process so that when many Scene objects are created or when a MultiScene object is created, the YAML aren't re-loaded many times.

Loading composites from plugin packages

In reference to #784, I have updated composite loading to use pkg_resources.iter_entry_points to obtain a list of composite YAML files from plugin packages that correctly specify an entry point in their setup.py in the group "satpy.plugins". This isn't hard-coded and I have plans to automatically handle it in the cookiecutters mentioned in #784. The entry point is defined like this:

# setup.py
setup(
    ...,
    entry_points={'satpy.plugins': 'plugin_name=plugin_name:config_files'},
    ...)

This loads plugin_name.config_files which is a dictionary defined in plugin/__init__.py which looks like this:

config_files = {'composites': glob(
                    os.path.join(__dirname, 'etc', 'composites', '*.yaml')),
                'enhancements': glob(
                    os.path.join(__dirname, 'etc', 'enhancements', '*.yaml')),
                'readers': glob(
                    os.path.join(__dirname, 'etc', 'readers', '*.yaml')),
                'writers': glob(
                    os.path.join(__dirname, 'etc', 'writers', '*.yaml'))}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component:compositors enhancement code enhancements, features, improvements
Projects
None yet
Development

No branches or pull requests

3 participants