Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-9530: Initial metric and specification layout design #1

Merged
merged 5 commits into from Feb 28, 2017

Conversation

jonathansick
Copy link
Member

@jonathansick jonathansick commented Feb 21, 2017

Design features:

  • Packages define metrics in /metrics/, each in a single yaml file named after the package.
  • Specifications for metrics are in /specs/<package_name>/. Specifications can be spread across multiple files.

This PR includes validate_drp metrics, LPM-17 specifications for PA1, PF1 and PA2, and provenance-specification specifications for the CFHT gri-band validation dataset as examples.

Sample metric:

PA1:
  description: >
    The maximum rms of the unresolved source magnitude distribution around the
    mean value.
  unit: mmag
  reference:
    doc: LPM-17
    url: http://ls.st/lpm-17
    page: 21

Specification schema

A complete specification looks like:

---
metric: 'PA1'
name: 'design_gri'
threshold:
  value: 5.0
  unit: ''
  operator: '<='
provenance_query:
  filter: ['g', 'r', 'I']
...

The fully qualified name of this specification is validate_drp.PA1.design_gri (assuming the specification is defined in /specs/validate_drp/).

The provenance_query field is a hash/dictionary of query terms for provenance that this specification can be applied to. The query language is currently undefined, so the example is a pseudocode query where the filter must be one of g, r or i.

Specification partials

Specifications might repeat information. For example, a provenance_query for a certain test dataset. We DRY this with partials. A partial has an id field, and can't be a specification on it's own.
For example:

---
# specification partial
id: 'base'
metric: 'PA1'
threshold:
  unit: ''
  operator: '<='
provenance_query:
  filter: ['g', 'r', 'I']

---
# design specification instance that mixes in the base partial
# validate_drp.PA1.design
name: 'design'
base: '#base'
threshold:
  value: 5.0

---
# stretch specification instance that mixes in the base partial
# validate_drp.PA1.stretch
name: 'stretch'
base: '#base'
threshold:
  value: 3.0
...

A partial can be reference from the base field by prefixing the id with #. Partials can also be referenced from across files (but within the same package's specs directory) by providing a filename:

base: "cfht_gri#base"

Specification inheritance

Specifications can also inherit from specifications; generally to add partials. Specifications are referenced through their fully qualified name (or the FQN without the package name). Here's an example from this PR:

---
# Specification partial
id: 'PA1-base'
metric: 'PA1'
threshold:
  unit: 'mmag'
  operator: "<="

---
# validate_drp.PA1.minimum_gri
name: "minimum_gri"
base: "#PA1-base"
threshold:
  value: 8.0

---
id: 'cfht-base'
provenance_query:
  # Prototype for query provenance. This needs to be figured out.
  dataset_repo_url: 'https://github.com/lsst/validation_data_cfht.git'
  filters: ['g', 'r', 'i']
  visits: [849375, 850587]
  ccd: [12, 13, 14, 21, 22, 23]

---
# validate_drp.PA1.cfht_minimum_gri
name: 'cfht_minimum_gri'
base: ['PA1.minimum_gri', '#cfht-base']
...

The fully-hydrogated validate_drp.PA1.minimum_gri specification is:

name: 'minimum_gri'
metric: 'PA1'
threshold:
  value: 8.0
  unit: 'mmag'
  operator: "<="

The fully-hydrated validate_drp.PA1.cfht_minimum_gri specification is:

name: 'cfht_minimum_gri'
metric: 'PA1'
provenance_query:
  # Prototype for query provenance. This needs to be figured out.
  dataset_repo_url: 'https://github.com/lsst/validation_data_cfht.git'
  filters: ['g', 'r', 'i']
  visits: [849375, 850587]
  ccd: [12, 13, 14, 21, 22, 23]
threshold:
  value: 8.0
  unit: 'mmag'
  operator: "<="

@jonathansick jonathansick force-pushed the tickets/DM-9530 branch 2 times, most recently from 1835b04 to 0441ce2 Compare February 22, 2017 19:29
Copy link
Contributor

@parejkoj parejkoj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several comments.

@@ -2,4 +2,28 @@
lsst.validate.metrics
#####################

**Centralized metric and specification definitions for the ``lsst.validate`` framework.**
**Centralized metric and specification definitions for the lsst.validate framework.**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably have a paragraph in here about what this package really is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed some new README content. Let me know if those are to your liking. I don't want to document too much to early since we're still prototyping ideas in sqr-017.lsst.io.

doc/README.md Outdated
@@ -0,0 +1,8 @@
Sphinx documentation support is still be developed. For now, install dependencies listed in `requirements.txt`.

To make the docs for `validate_base` alone:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validate_metrics here...

@@ -0,0 +1,7 @@
Place static files for `validate_base` docs, such as images and sample data, in this directory.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably just do a package grep for validate_base, since there's more here.

doxygen_xml_dirname=None))

intersphinx_mapping['astropy'] = ('http://docs.astropy.org/en/stable', None)
intersphinx_mapping['validate_base'] = ('https://validate-base.lsst.io', None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another validate_base

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one's legit; it lets us reference the validate_base user guide from a (future) validate_metrics user guide.


PA1:
description: >
The maximum rms of the unresolved source magnitude distribution around the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should do yaml in the "docs" way of only making newlines on new sentences, instead of enforcing a hard column limit. These are closer to docs than code, I think.

page: 23

AD3:
reference:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we enforce (at least in our examples here?) an ordering of the items as "description", "unit", "reference", just for consistency?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could validate, in code, ordering if we use ruamel.yaml but that's not a stack dependency (yet). Maybe call this order a best practice for now and use it in all our examples.

operator: '<='

---
# validate_drp.PA2_minimum_gri.srd
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these comments here just helpers for a human reader?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, what do you think? I thought it'd be a nice practice to give the fully qualified name of a specification, especially when it's a highly inherited specification.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think that's useful.

specs/validate_drp
##################

Specifications for ``validate_drp`` metrics are defined in this directory as YAML documents.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should have another sentence or two about what validate_drp metrics are for.

---
id: 'base'
provenance_query:
# Prototype for query provenance. This needs to be figured out.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a TODO: at the start of this comment.

@@ -0,0 +1,56 @@
# Specifications for validate_drp metrics against validation_data_cfht
# gri-band datasets.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Say something about what the fact that these have provenance means.

- Minimal Python package (lsst.validate.metrics) to permit version
  discovery via sconsUtils version.py. Otherwise, lsst.validate.metrics
  does not contain Python code. It's meant as a standalone data package
  that contains YAML definition files that are nominally consumed by
  validate_base.

- Uses prototype Sphinx documentation framework (same as in
  validate_base).

- Adopting RFC-45-inspired license management (COPYRIGHT file with
  references to that file from source files).

- Includes GitHub issue/PR/contributing files in .github
These metric definitions are ported from github.com/lsst/validate_drp's
metrics/, and distilled into the form being prototyped for
validate_metrics.

- Each eups package now has a YAML file, named after the package, in
  metrics/.
- metrics/validate_drp.yaml, for example, specifically only defines
  metrics; it doesn't include specifications. This will allow metrics
  to be fairly static, while any LSST staff can separately add
  additional specifications.
- We're adopting astropy.units definitions for percentages (%) and
  unitless unscaled numbers (---). No more empty string units.
- We're adopting the docs convention of one-sentence-per-line for metric
  descriptions.
This shows how specifications for PA1, from the SRD, can be written.

In LPM-17-PA1.yaml, specifications for PA1 are written in their most
basic and general form. A `#PA1-base` *partial specification* contains
common data for all PA1 specifications, namely the units, comparison
operator, and metric name.

(Our suggested best practice is to have one YAML file per metric for
these types of base specification definitions.)

Then in LPM-17-PA1.yaml, specifications inherit from `#PA1-base`, adding
a unique name and the specification value. The name **suggests**
applicable bandpasses, but this provenance query is not established
here. This design allows validate_drp to test measurements against the
SRD specifications regardless of the provenance of the dataset.

Then in cfht_gri.yaml we add specifications that specifically apply to
measurements made against the validation_data_cfht data repository in
specific bandpasses. These *additional* specifications permits selective
monitoring of specification tests.
These metrics are interesting because, in the SRD, the metric measurement
is written such that a specification of the other metric is a
configuration. In the early validate_base framework these are called
"dependent metrics."

Here we've decided to unroll these dependencies by creating a metric for
each metric measurement function configuration. This means there are
dimensions of stretch/design/minimum, and also gri/uzy.

Then the specifications are simply named 'srd' since they are the SRD
specification of that metric.

In cfht_gri.yaml, I've added specifications that inherit from the srd
specifications and mix in the provenance query for CFHT gri validation
data.
We don't have real documentation yet (a sphinx user guide), but the main
README and READMEs in the metrics/ and specs/ directories should help
orient developers for now.
@jonathansick
Copy link
Member Author

I think I've addressed your comments @parejkoj, at least as an MVP.

@jonathansick jonathansick merged commit 254e627 into master Feb 28, 2017
@ktlim ktlim deleted the tickets/DM-9530 branch August 25, 2018 06:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants