Minor docstring fixes
timj authored and TallJimbo committed Mar 23, 2020
1 parent b64a406 commit 612da1e
Showing 7 changed files with 65 additions and 53 deletions.
36 changes: 20 additions & 16 deletions doc/lsst.daf.butler/concreteStorageClasses.rst
@@ -21,19 +21,23 @@ The loaded columns are the product of the values for all levels.
Levels not included in the dict are included in their entirety.

For example, the ``deepCoadd_obj`` dataset is typically defined as a hierarchical table with levels ``dataset``, ``filter``, and ``column``, which take values such as ``("meas", "HSC-R", "base_SdssShape_xx")``.
Retrieving this dataset via::

    butler.get(
        "deepCoadd_obj", ...,
        parameters={
            "columns": {"dataset": "meas",
                        "filter": ["HSC-R", "HSC-I"],
                        "column": ["base_SdssShape_xx", "base_SdssShape_yy"]}
        }
    )

is equivalent to (but potentially much more efficient than)::

    full = butler.get("deepCoadd_obj", ...)
    full.loc[:, ["meas", ["HSC-R", "HSC-I"],
                 ["base_SdssShape_xx", "base_SdssShape_yy"]]]
Retrieving this dataset via:

.. code-block:: python

    butler.get(
        "deepCoadd_obj", ...,
        parameters={
            "columns": {"dataset": "meas",
                        "filter": ["HSC-R", "HSC-I"],
                        "column": ["base_SdssShape_xx", "base_SdssShape_yy"]}
        }
    )

is equivalent to (but potentially much more efficient than):

.. code-block:: python

    full = butler.get("deepCoadd_obj", ...)
    full.loc[:, ["meas", ["HSC-R", "HSC-I"],
                 ["base_SdssShape_xx", "base_SdssShape_yy"]]]
2 changes: 1 addition & 1 deletion doc/lsst.daf.butler/configuring.rst
@@ -49,7 +49,7 @@ Overriding Root Paths
---------------------

In addition to the configuration options described above, there are some values that have a special meaning.
For `~lsst.daf.butler.RegistryConfig` and `~lsst.daf.butler.DatastoreConfig` the ``root`` key, which can be used to specify paths, can include values using the special tag ``<butlerRoot>``.
For `~lsst.daf.butler.registry.RegistryConfig` and `~lsst.daf.butler.DatastoreConfig` the ``root`` key, which can be used to specify paths, can include values using the special tag ``<butlerRoot>``.
At run time, this tag will be replaced by a value derived from the location of the main butler configuration file, or else from the value of the ``root`` key found at the top of the butler configuration.

Currently, if you create a butler configuration file that loads another butler configuration file, via ``includeConfigs``, then any ``<butlerRoot>`` tags will be replaced with the location of the new file, not the original.
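
As a concrete illustration of the substitution (a minimal sketch only; the helper below is hypothetical and not the butler's own implementation), a ``root`` value such as ``<butlerRoot>/datastore`` is rewritten relative to the location of the butler configuration file:

.. code-block:: python

    import os

    def expand_butler_root(path: str, butler_config_file: str) -> str:
        """Sketch of <butlerRoot> substitution (hypothetical helper)."""
        butler_root = os.path.dirname(os.path.abspath(butler_config_file))
        return path.replace("<butlerRoot>", butler_root)

    # "<butlerRoot>/datastore" becomes "/repo/main/datastore"
    print(expand_butler_root("<butlerRoot>/datastore", "/repo/main/butler.yaml"))
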
8 changes: 4 additions & 4 deletions doc/lsst.daf.butler/dimensions.rst
@@ -9,7 +9,7 @@ In the `Registry` database, most dimensions are associated with a table that con
Examples of dimensions include instruments, detectors, visits, and tracts.

Instances of the `Dimension` class represent one of these concepts, not values of the type of one of those concepts (e.g. "detector", not a particular detector).
In fact, a dimension "value" can mean different things in different contexts: it could mean the value of the primary key or other unique identifier for particular entity (the integer ID or string name for a particular detector), or it could represent a complete record in the table for that dimension.
In fact, a dimension "value" can mean different things in different contexts: it could mean the value of the primary key or other unique identifier for a particular entity (the integer ID or string name for a particular detector), or it could represent a complete record in the table for that dimension.

The dimensions schema also has some tables that do not map directly to `Dimension` instances.
Some of these provide extra metadata fields for combinations of dimensions, and are represented by the `DimensionElement` class in Python (this is also the base class of the `Dimension` class, and provides much of its functionality).
@@ -36,7 +36,7 @@ It also categorizes those dimensions into `~DimensionGraph.required` and `~Dimen
`DimensionGraph` also guarantees a deterministic and topological sort order for its elements.

Because `Dimension` instances have a `~Dimension.name` attribute, we typically
use `NamedValueSet` and `NamedKeyDict` as containers when immutability is needed or the guarantees of `DimensionGraph`.
use `~lsst.daf.butler.core.utils.NamedValueSet` and `~lsst.daf.butler.core.utils.NamedKeyDict` as containers when immutability is needed or the guarantees of `DimensionGraph`.
This allows the string names of dimensions to be used as well in most places where `Dimension` instances are expected.

The complete set of all compatible dimensions is held by a special subclass of `DimensionGraph`, `DimensionUniverse`.
@@ -57,7 +57,7 @@ Most `Butler` and `Registry` APIs that accept data IDs as input accept both dict

The data IDs returned by the `Butler` or `Registry` (and most of those used internally) are usually instances of the `DataCoordinate` class or its subclass, `ExpandedDataCoordinate`.
`DataCoordinate` itself is complete but minimal.
It contains only the keys that correspond to its `DimensionGraph`'s `~DimensionGraph.required` subset - that is, the minimal set of keys needed to fully identify all other dimensions in the graph.
It contains only the keys that correspond to its `DimensionGraph`'s `~DimensionGraph.required` subset --- that is, the minimal set of keys needed to fully identify all other dimensions in the graph.
Informal dictionary data IDs can be transformed into `DataCoordinate` instances by calling `DataCoordinate.standardize` (which is what most `Butler` and `Registry` APIs that accept data IDs do under the hood).
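
A hedged sketch of that standardization step (the repository path, instrument name, and detector ID below are purely illustrative):

.. code-block:: python

    from lsst.daf.butler import Butler, DataCoordinate

    butler = Butler("REPO_PATH")  # hypothetical repository path
    dataId = DataCoordinate.standardize(
        {"instrument": "HSC", "detector": 50},  # illustrative informal data ID
        universe=butler.registry.dimensions,
    )
    print(dataId.graph.required)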

`ExpandedDataCoordinate` is its maximal counterpart.
@@ -70,7 +70,7 @@ Spatial and Temporal Dimensions
-------------------------------

Dimensions can be *spatial* or *temporal* (or both, or neither), meaning that each record is associated with a region on the sky or a timespan (respectively).
The overlaps between regions and timespans define many-to-many relationships between dimensions that -- along with the one-to-many ID-based dependencies -- generally provide a way to fully relate any set of dimensions.
The overlaps between regions and timespans define many-to-many relationships between dimensions that --- along with the one-to-many ID-based dependencies --- generally provide a way to fully relate any set of dimensions.
This produces a natural, concise query system; dimension relationships can be used to construct the full ``JOIN`` clause of a SQL ``SELECT`` with no input from the user, allowing them to specify just the ``WHERE`` clause (see `Registry.queryDimensions` and `Registry.queryDatasets`).
It is also possible to associate a region or timespan with a combination of dimensions (such as the region for a visit and a detector), by defining a `DimensionElement` for that combination.
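
For example (a sketch only; the skymap name and dimension values are illustrative, and an existing ``registry`` is assumed), a query over visit and detector can be constrained by tract and patch purely through the spatial-join machinery:

.. code-block:: python

    # The registry supplies all of the JOINs; only the WHERE expression is given.
    for dataId in registry.queryDimensions(
        ["visit", "detector"],
        where="skymap = 'hsc_rings_v1' AND tract = 9615 AND patch = 21",
    ):
        print(dataId)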

12 changes: 6 additions & 6 deletions doc/lsst.daf.butler/organizing.rst
@@ -16,9 +16,9 @@ Most of the time, however, users identify a dataset using a combination of three
- a data ID;
- a collection.

Most collections are constrained to contain only on dataset with a particular dataset type and data ID, so this combination is usually enough to resolve a dataset (see :ref:`daf_butler_collections` for exceptions).
Most collections are constrained to contain only one dataset with a particular dataset type and data ID, so this combination is usually enough to resolve a dataset (see :ref:`daf_butler_collections` for exceptions).

A dataset's type and data ID are intrinsic to it - while there may be many datasets with a particular dataset type and/or data ID, the dataset type and data ID associated with a dataset are set and fixed when it is created.
A dataset's type and data ID are intrinsic to it --- while there may be many datasets with a particular dataset type and/or data ID, the dataset type and data ID associated with a dataset are set and fixed when it is created.
A `DatasetRef` always has both a dataset type attribute and a data ID, though the latter may be empty.
Dataset types are discussed below in :ref:`daf_butler_dataset_types`, while data IDs are one aspect of the larger :ref:`Dimensions <lsst.daf.butler-dimensions_overview>` system and are discussed in :ref:`lsst.daf.butler-dimensions_data_ids`.
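
A hedged sketch of resolving a dataset from those three pieces of information (the dataset type, data ID values, and collection name are all illustrative):

.. code-block:: python

    from lsst.daf.butler import Butler

    butler = Butler("REPO_PATH")  # hypothetical repository path
    ref = butler.registry.findDataset(
        "calexp",                                                # dataset type
        {"instrument": "HSC", "visit": 903334, "detector": 50},  # data ID
        collections="HSC/runs/example",                          # collection
    )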

@@ -31,14 +31,14 @@ Collections are discussed further below in :ref:`daf_butler_collections`.
Dataset types
-------------

The names "dataset" and "dataset type" (which `lsst.daf.butler` inherits from its `lsst.daf.persistence` predecessor) are intended to evoke the relationship between an instance and its class in object-oriented programming, but this is a metaphor, *not* a relationship that maps to any particular Python objects: we don't have any Python class that fully represents the *dataset* concept (`DatasetRef` is the closest), and the `DatasetType` class is a regular class, not a metaclass.
The names "dataset" and "dataset type" (which ``daf_butler`` inherits from its ``daf_persistence`` predecessor) are intended to evoke the relationship between an instance and its class in object-oriented programming, but this is a metaphor, *not* a relationship that maps to any particular Python objects: we don't have any Python class that fully represents the *dataset* concept (`DatasetRef` is the closest), and the `DatasetType` class is a regular class, not a metaclass.
So a *dataset type* is represented in Python as a `DatasetType` *instance*.

A dataset type defines both the dimensions used in a dataset's data ID (so all data IDs for a particular dataset type have the same keys, at least when put in standard form) and the storage class that corresponds to its in-memory Python type and maps to the file format (or generalization thereof) used by a `Datastore` to store it.
These are associated with an arbitrary string name.

Beyond that definition, what a dataset type *means* isn't really specififed by the butler itself, but we expect higher-level code that *uses* butler to make that clear, and one anticipates case is worth calling out here: a dataset type roughly corresponds to the role its datasets play in a processing pipeline.
In other words, a particular pipeline will typically accept particular dataset types as inputs and produce particular dataset types as outputs (and may produce and consumed other dataset types as intermediates).
Beyond that definition, what a dataset type *means* isn't really specified by the butler itself, but we expect higher-level code that *uses* butler to make that clear, and one anticipated case is worth calling out here: a dataset type roughly corresponds to the role its datasets play in a processing pipeline.
In other words, a particular pipeline will typically accept particular dataset types as inputs and produce particular dataset types as outputs (and may produce and consume other dataset types as intermediates).
And while the exact dataset types used may be configurable, changing a dataset type will generally involve substituting one dataset type for a very similar one (most of the time with the same dimensions and storage class).
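
As a sketch (the dataset type name, dimension names, and storage class below are assumptions that would need to exist in the repository's dimension universe and storage class configuration), a dataset type can be constructed directly:

.. code-block:: python

    from lsst.daf.butler import Butler, DatasetType

    butler = Butler("REPO_PATH")  # hypothetical repository path
    datasetType = DatasetType(
        "deepCoadd_calexp",                                           # arbitrary string name
        dimensions=["skymap", "tract", "patch", "abstract_filter"],   # data ID keys
        storageClass="ExposureF",                                     # in-memory type / file format mapping
        universe=butler.registry.dimensions,
    )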

.. _daf_butler_collections:
@@ -73,7 +73,7 @@ Tagged Collections
`CollectionType.TAGGED` collections are the most flexible type of collection; datasets can be `associated <Registry.associate>` with or `disassociated <Registry.disassociate>` from a ``TAGGED`` collection at any time, as long as the usual constraint on a collection having only one dataset with a particular dataset type and data ID is maintained.
Membership in a ``TAGGED`` collection is implemented in the `Registry` database as a single row in a many-to-many join table (a "tag") and is completely decoupled from the actual storage of the dataset.

Tags are thus both extremely lightweight relative to copies or re-ingests of files or other `Datastore` content, and *slightly** more expensive to store and possibly query than than the ``RUN`` or ``CHAINED`` collection representations (which have no per-dataset costs).
Tags are thus both extremely lightweight relative to copies or re-ingests of files or other `Datastore` content, and *slightly* more expensive to store and possibly query than the ``RUN`` or ``CHAINED`` collection representations (which have no per-dataset costs).
The latter is rarely important, but higher-level code should avoid automatically creating ``TAGGED`` collections that may not ever be used.
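
A minimal sketch of tagging (the collection name is illustrative, the ``TAGGED`` collection is assumed to already exist, and ``refs`` stands for resolved `DatasetRef` objects obtained elsewhere, e.g. from `Registry.queryDatasets`):

.. code-block:: python

    # Add the datasets to the tagged collection, then remove them again.
    registry.associate("u/someone/good-calexps", refs)
    registry.disassociate("u/someone/good-calexps", refs)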

Chained Collection
40 changes: 24 additions & 16 deletions doc/lsst.daf.butler/queries.rst
@@ -21,7 +21,7 @@ Arguments that specify one or more dataset types can generally take any of the f
- `str` values (corresponding to `DatasetType.name`);
- `re.Pattern` values (matched to `DatasetType.name` strings, via `~re.Pattern.fullmatch`);
- iterables of any of the above;
- the special value `...`, which matches all dataset types.
- the special value "``...``", which matches all dataset types.

Some of these are not allowed in certain contexts (as documented there).
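
For instance (a hedged sketch; the dataset type names are illustrative and an existing ``registry`` is assumed), each of the accepted forms can be passed to `Registry.queryDatasets`:

.. code-block:: python

    import re

    registry.queryDatasets("calexp", collections=...)                     # plain name
    registry.queryDatasets(re.compile(r"deepCoadd_.*"), collections=...)  # regular expression
    registry.queryDatasets(["calexp", "deepCoadd_obj"], collections=...)  # iterable of names
    registry.queryDatasets(..., collections=...)                          # all dataset types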

@@ -36,11 +36,11 @@ Arguments that specify one or more collections are similar to those for dataset
- `re.Pattern` values (matched to the collection name, via `~re.Pattern.fullmatch`);
- a `tuple` of (`str`, *dataset-type-restriction*) - see below;
- iterables of any of the above;
- the special value `...`, which matches all collections;
- the special value "``...``", which matches all collections;
- a mapping from `str` to *dataset-type-restriction*.

A *dataset-type-restriction* is a :ref:`DatasetType expression <daf_butler_dataset_type_expressions>` that limits a search for datasets in the associated collection to just the specified dataset types.
Unlike most other DatasetType expressions, it may not contain regular expressions (but it may be `...`, which is the implied value when no
Unlike most other DatasetType expressions, it may not contain regular expressions (but it may be "``...``", which is the implied value when no
restriction is given, as it means "no restriction").
In contexts where restrictions are meaningless (e.g. `~Registry.queryCollections` when the ``datasetType`` argument is `None`) they are allowed but ignored.
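
A hedged sketch of the same idea for collections (the collection names are illustrative and an existing ``registry`` is assumed):

.. code-block:: python

    import re

    registry.queryDatasets("calexp", collections="HSC/runs/a")                # single name
    registry.queryDatasets("calexp", collections=["HSC/runs/a", "HSC/runs/b"])  # iterable
    registry.queryDatasets("calexp", collections=re.compile(r"HSC/runs/.*"))    # regular expression
    # A (name, restriction) tuple limits the search in that collection to the
    # named dataset types; ... means "no restriction".
    registry.queryDatasets(..., collections=[("HSC/runs/a", "calexp"),
                                             ("HSC/runs/b", ...)])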

@@ -52,9 +52,9 @@ Ordered collection searches

An *ordered* collection expression is required in contexts where we want to search collections only until a dataset with a particular dataset type and data ID is found.
These include all direct `Butler` operations, the definitions of `~CollectionType.CHAINED` collections, `Registry.findDataset`, and the ``deduplicate=True`` mode of `Registry.queryDatasets`.
In these contexts, regular expressions and `...` are not allowed for collection names, because they make it impossible to unambiguously define the order in which to search.
In these contexts, regular expressions and "``...``" are not allowed for collection names, because they make it impossible to unambiguously define the order in which to search.
Dataset type restrictions are allowed in these contexts, and those
may be (and usually are) `...`.
may be (and usually are) "``...``".

Ordered collection searches are processed by the `~registry.wildcards.CollectionSearch` class.
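
For example (a sketch; the dataset type and collection names are illustrative, and ``dataId`` stands for a data ID constructed elsewhere), an ordered search must spell out its collections explicitly:

.. code-block:: python

    # Allowed: an explicit, ordered list of collection names.
    ref = registry.findDataset("calexp", dataId,
                               collections=["u/someone/runs/a", "HSC/runs/b"])

    # Not allowed in an ordered context: patterns or ..., because the search
    # order would be ambiguous.
    # registry.findDataset("calexp", dataId, collections=re.compile("HSC/.*"))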

@@ -94,7 +94,7 @@ Language operator precedence rules are the same as for the other languages
like C++ or Python. When in doubt use grouping operators (parentheses) for
sub-expressions.

General note - the parser itself does not evaluate any expressions even if
General note --- the parser itself does not evaluate any expressions even if
they consist of literals only; all evaluation happens in the SQL engine when
registry runs the resulting SQL query.

@@ -162,15 +162,15 @@ expressions which should evaluate to a numeric value.
Binary arithmetic operators
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Language supports five arithmetic operators - ``+`` (add), ``-`` (subtract),
Language supports five arithmetic operators: ``+`` (add), ``-`` (subtract),
``*`` (multiply), ``/`` (divide), and ``%`` (modulo). Usual precedence rules
apply to these operators. Operands for them can be anything that evaluates to
a numeric value.

Comparison operators
^^^^^^^^^^^^^^^^^^^^

Language supports set of regular comparison operators - ``=``, ``!=``, ``<``,
Language supports set of regular comparison operators: ``=``, ``!=``, ``<``,
``<=``, ``>``, ``>=``. These can be used on operands that evaluate to numeric
values; for the (in)equality operators, operands can also be boolean expressions.

@@ -182,7 +182,9 @@ IN operator
^^^^^^^^^^^

The ``IN`` operator (and ``NOT IN``) are an expanded version of a regular SQL
IN operator. Its general syntax looks like::
IN operator. Its general syntax looks like:

.. code-block:: sql

    <expression> IN ( <literal1>[, <literal2>, ... ])
    <expression> NOT IN ( <literal1>[, <literal2>, ... ])
@@ -194,15 +196,19 @@ literals as defined above. It can also be a mixture of integer literals and
range literals (language allows mixing of string literals and ranges but it
may not make sense when translated to SQL).

For an example of range usage, these two expressions are equivalent::
For an example of range usage, these two expressions are equivalent:

.. code-block:: sql

    visit IN (100, 110, 130..145:5)
    visit in (100, 110, 130, 135, 140, 145)
as are these::
as are these:

.. code-block:: sql

    visit NOT IN (100, 110, 130..145:5)
    visit Not In (100, 110, 130, 135, 140, 145)
Boolean operators
^^^^^^^^^^^^^^^^^
@@ -223,7 +229,9 @@ sub-expressions in the full expression.
Examples
^^^^^^^^

Few examples of valid expressions using some of the constructs::
Few examples of valid expressions using some of the constructs:

.. code-block:: sql

    visit > 100 AND visit < 200
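
These expression strings are what gets passed as the ``where`` argument of the query methods; a hedged sketch (the dataset type and collection names are illustrative, and an existing ``registry`` is assumed):

.. code-block:: python

    refs = registry.queryDatasets(
        "calexp",
        collections="HSC/runs/example",
        where="visit > 100 AND visit < 200",
    )
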
6 changes: 3 additions & 3 deletions python/lsst/daf/butler/_butler.py
@@ -928,7 +928,7 @@ def prune(self, refs: Iterable[DatasetRef], *,
Datasets to prune. These must be "resolved" references (not just
a `DatasetType` and data ID).
disassociate : `bool`, optional
Disassociate pruned datasets from ``self.collection`` (or the
Disassociate pruned datasets from ``self.collections`` (or the
collection given as the ``collection`` argument). Datasets that are
not in this collection are ignored, unless ``purge`` is `True`.
unstore : `bool`, optional
@@ -960,7 +960,7 @@ def prune(self, refs: Iterable[DatasetRef], *,
composite datasets. This will only prune components that are
actually attached to the given `DatasetRef` objects, which may
not reflect what is in the database (especially if they were
obtained from `Registry.queryDatasets`, which by does not include
obtained from `Registry.queryDatasets`, which does not include
components in its results).
Raises
@@ -1053,7 +1053,7 @@ def prune(self, refs: Iterable[DatasetRef], *,
# If we're disassociating but not purging, we can do that
# before we try to delete, and it will roll back if deletion
# fails. That will at least do the right thing if deletion
# fails because the files couldn't actually be delete (e.g.
# fails because the files couldn't actually be deleted (e.g.
# due to lack of permissions).
for tag in tags:
# recursive=False here because refs is already recursive
