Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configuration handling for Gammapy #401

Closed
cdeil opened this issue Dec 13, 2015 · 10 comments
Closed

Configuration handling for Gammapy #401

cdeil opened this issue Dec 13, 2015 · 10 comments

Comments

@cdeil
Copy link
Contributor

cdeil commented Dec 13, 2015

In Gammapy we've started to use YAML configuration files and @joleroi has added a few utility functions in gammapy.utils.scripts, namely read_yaml, write_yaml and recursive_update_dict.

I'd like to propose we go for a more full-featured solution with milestone 0.5 that provides the option to specify and validate schemas and to a large degree use an existing package instead of rolling our own.

It's not clear yet what other features we need / want, e.g. do we want to have the option to override config options via command line parameters? Do we want to support other file formats than YAML for configs, e.g. JSON or FITS or INI? Do we want dot access, i.e. config.spam.ham instead of config['spam']['ham']?

This is something where it might be possible to work on one solution with the ctapipe devs or to learn from what they're doing in ctapipe.configuration, so @kosack, if you have any thoughts / suggestions here, please comment.

I've looked around a bit at existing Python config packages that support YAML:

For schema validation, those might be options:

I want to work on this, but probably won't have time for a while.
So @joleroi or anyone else, if you'd like to start using one of the existing packages in Gammapy so that we get experience, please go ahead.
If I were to do it, I'd probably try python-anyconfig first, because it seems to make it supports everything we need (I think):

@kosack
Copy link

kosack commented Dec 15, 2015

ctapipe.configuration is basically just a subclass of argparse that allows the ablity to read/write to FITS headers, so at this point it is nothing fancy. We have a requirement that we must be able to keep the "provenance data", one part of which is list of all commands and options that are used to create an output file, so this basic functionality helps there. That data can grow very large if a file as at the end of a processing chain, but it's very useful to be able to read it back from headers, and later do things like diffing it to see the differences in options. The idea is that given an output file, one can track back what was done to produce it.

What we use as a configuraiton format is more flexible - yaml is fine for a user-facing script, but we will probably (optionally) use a database table as the input.

@kosack
Copy link

kosack commented Dec 15, 2015

Also, I agree that validation is one of the most important features. We use argparse for that reason, since it has a way to give default values and check types. However, I've also used configobj with the validate package to do similar things. It allows a schema file to be written that is used to validate the inputs, but it's not perfect. The nice thing is that you can add custom validators, e.g. to check that an email address is correct, etc.

Ideally I wish there was a system that works a bit like astropy.table where the configuration is just a fancy dict with validation and there are multiple backends for reading/writing. That would probably be useful outside of gammapy/ctapipe/etc.

@cdeil
Copy link
Contributor Author

cdeil commented Dec 15, 2015

@kosack - Thanks for your comments!

I've also used configobj and validate in the past, e.g. in enrico. It's nice, but not as flexible / popular as YAML or even JSON.

Ideally I wish there was a system that works a bit like astropy.table where the configuration is just a fancy dict with validation and there are multiple backends for reading/writing. That would probably be useful outside of gammapy/ctapipe/etc.

This is exactly what schema or jsonschema do, they work with specs as Python dicts (and callback functions for more complex validation), no? And python-anyconfig has the multiple backend thing worked out, no?

Of course it's possible to implement this with any format / based on any of these packages, but if we manage to pick the same solution here, I think that could be nice, mostly in terms of getting a critical mass of devs that know how it works in detail and can help write schemas / validation tools (it's hard to find people with the skills / interest to work on this).

@cdeil cdeil modified the milestones: 0.6, 0.5 Jul 7, 2016
@cdeil
Copy link
Contributor Author

cdeil commented Nov 9, 2016

Just a note: @bkhelifi and I have a strong preference to add a config-file driven analysis interface at least for the most common types of simple analyses.

Something similar to what @joleroi had implemented for spectral analysis a while ago:
http://docs.gammapy.org/en/v0.4/tutorials/gammapy-spectrum/index.html#tutorials-gammapy-spectrum
(this has been removed in the meantime, because the spectral analysis classes kept changing)

It looks like there is something available in ctapipe:
https://cta-observatory.github.io/ctapipe/tools/index.html
@kosack - Is that usable / will you keep using this for the near future? Or are there plans to re-implement something different in the coming months?

@kosack
Copy link

kosack commented Nov 9, 2016

we're just testing it now, so it's not totally stable.

Currently we are directly using the ipython config system, which is based on traitlets.config. The documentation is not very good, so we may re-implement it ourselves in a simpler way. However, the idea is not too bad. One simply defines an Application (which we call Tool in ctapipe, but it's just a subclass), and what parameters are needed as traitlets. If there are modules used in the application that also have parameters, they must derive from Configurable and also specify options that get automatically collected. The write of the Application can decide which parameters are "bubbled up" to high-level command-line options, but all parameters can be set in a config file (JSON or python for now) or as command-line parameters.

@woodmd
Copy link
Member

woodmd commented Nov 9, 2016

I've written some classes for configuration management in fermipy. It's nothing as sophisticated as traitlets.config but I think the design and intent is similar. To make a class configurable you define its parameters in a static dictionary called defaults and inherit from Configurable which does the parsing and validation. The configuration state itself is just passed around as a python dictionary which can be easily serialized to YAML or JSON. To configure a class you just pass the config dictionary to the constructor.

Since this functionality isn't really fermi specific I'd be interested in folding what's currently in fermipy into some common configuration modules that could live in gammapy (or elsewhere). Note that I don't have any particular attachment to the implementation in fermipy so I'd have no problem switching to something closer to traitlets.config as long as it provides the same functionality (configuration state as python dict, etc.).

@kosack
Copy link

kosack commented Nov 10, 2016

traitlets.config does give you a dictionary of the configuration state (as as Config object, which is just a subclass of dict), so it should work.

@cdeil cdeil modified the milestones: 0.6, 0.7 Jan 27, 2017
@cdeil cdeil modified the milestones: 0.7, 0.8 Sep 14, 2017
@cdeil cdeil modified the milestones: 0.8, 0.9 May 3, 2018
@cdeil cdeil removed their assignment Oct 29, 2018
@cdeil cdeil modified the milestones: 0.9, 0.10 Oct 29, 2018
@cdeil cdeil modified the milestones: 0.10, 0.12 Jan 17, 2019
@adonath adonath removed this from the 0.12 milestone May 23, 2019
@adonath adonath added this to the 0.14 milestone May 23, 2019
@cdeil
Copy link
Contributor Author

cdeil commented Sep 3, 2019

@Bultako started implementing config-driven high-level interface for Gammapy (see #2219 and #2323).

@Bultako - Please look over the discussion above, with links and info to other config systems and Fermipy.

I'd suggest we keep this issue open for a few more weeks, and use it to discuss our Config implementation.

@cdeil cdeil modified the milestones: 0.14, 0.15 Sep 27, 2019
@Bultako Bultako added this to To do in gammapy.analysis via automation Nov 15, 2019
@Bultako Bultako moved this from To do to In progress in gammapy.analysis Nov 19, 2019
@cdeil
Copy link
Contributor Author

cdeil commented Nov 21, 2019

@Bultako and I explored solutions to improve Gammapy here: https://github.com/cdeil/gammapy-config/

Discussed with everyone at the Gammapy coding sprint in Granada just now. Decisions:

  • Change the config file mostly as proposed in example-config.yaml. Notes:
    • We chose to have simple keys instead of custom string formats, for skycoord, mapaxis, regions
    • Offer geom.wcs.frame independent from geom.wcs.skydir
    • Offer energy: {min: 1 TeV, max: 10 TeV} instead of energy: {min: 1, max: 10, unit: TeV}
    • Add section datasets.geom.selection and offer max offset and safe energy selection there
    • Put background after geometry
    • decision: do not put obs selection by source
  • @adonath wants larger changes to configure the data reduction to the config file. We will first do the changes proposed today, and then re-discuss that change (if things go well already next week).
  • Change from JSON Schema to pydantic for the AnalysisConfig class
  • Clearer responsibility: AnalysisConfig does validation and config -> object mapping (create SkyCoord, Time, Region, Geom objects), Analysis relies on validated data. We try to make Analysis also an object store, so that advanced users can to analysis.dataset.geom = Geom(...) from Python. This is our escape hatch, that allows us to not have to provide options for everything in the config file.

@cdeil
Copy link
Contributor Author

cdeil commented Nov 29, 2019

@Bultako implemented a new Gammapy config handling in #2621 using pydantic.
For sure there's many things to improve (see e.g. #2621 (comment)) but overall I think this technical solution is now OK (maybe even awesome) for v1.0.

So I'm closing this old issue now. Further improvements / suggestions as new PRs and issues with specific suggestions, please.

@cdeil cdeil closed this as completed Nov 29, 2019
gammapy.analysis automation moved this from In progress to Done Nov 29, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

No branches or pull requests

5 participants