Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-14191: Merge butler configuration files with defaults #39

Merged
merged 26 commits into from
May 17, 2018
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
Show all changes
26 commits
Select commit Hold shift + click to select a range
e929ce8
Butler config can now merge with defaults
timj May 7, 2018
611cf94
Fix typo in docstring
timj May 7, 2018
f365908
Provide config defaults
timj May 7, 2018
120a0c9
Rewrite doImport to use importlib rather than __import__
timj May 8, 2018
fee316f
Move ButlerConfig to its own file
timj May 9, 2018
39dc7ef
Registry configuration needs schema and registry config entries
timj May 9, 2018
5b170a0
Remove StorageClassFactory.fromConfig
timj May 9, 2018
d9b94b4
Add config subclass that can extract a sub component
timj May 9, 2018
c88286f
Use ConfigSubset
timj May 9, 2018
48ac511
Simplify registry constructor
timj May 9, 2018
aa1472a
Rename registry defaults file
timj May 14, 2018
21dad01
Use self-describing config classes
timj May 14, 2018
2612595
Update tests to use ButlerConfig and butler config directory
timj May 14, 2018
2fc0e3d
Prefer to throw ImportError rather than AttributeError
timj May 16, 2018
3bcf273
Rewrite config merging to use subset configs
timj May 16, 2018
fdc02e9
Use defaulting to read schema
timj May 16, 2018
76b6020
Move some test configuration information to defaults
timj May 16, 2018
86d9613
Allow default butler config
timj May 16, 2018
76e603e
Add parens around pdb example
timj May 16, 2018
6f65bbc
Add lots of tests for configuration merging
timj May 17, 2018
f022d5b
Move merging discussion to Config class
timj May 17, 2018
4e4620d
Disable C++ check
timj May 17, 2018
e0fc9cf
If YAML file is empty create empty dict
timj May 17, 2018
0507933
Overhaul merging code to pass tests
timj May 17, 2018
29cb101
Rename registry_schema to schema config file
timj May 17, 2018
bd82bd6
Fix dangling sentence
timj May 17, 2018
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
110 changes: 66 additions & 44 deletions python/lsst/daf/butler/core/butlerConfig.py
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,10 @@
import os

from .config import Config
from .datastore import DatastoreConfig
from .schema import SchemaConfig
from .registry import RegistryConfig
from .storageClass import StorageClassConfig
from .utils import doImport

__all__ = ("ButlerConfig",)
Expand All @@ -36,9 +40,17 @@ class ButlerConfig(Config):

The configuration is read and merged with default configurations for
the particular classes. The defaults are read from
``$DAF_BUTLER_DIR/config`` using names specified by the appropriate classes
for registry and datastore, and including the standard schema and storage
class definitions.
``$DAF_BUTLER_DIR/config`` and ``$DAF_BULTER_CONFIG_PATH``. The defaults
are constructed by reading first the global defaults, and then adding
in overrides from each entry in the colon-separated
``$DAF_BUTLER_CONFIG_PATH`` in reverse order such that the entries ahead
in the list take priority. The registry and datastore configurations
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this opposite behavior to shell $PATH? If so why?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not meant to be the opposite order. In $PATH things early in the path take precedence over things later in the path, so I thought I had to reverse the order when reading in the defaults so that earlier ones in the path override defaults appearing late in the path.

are read using the names specified by the appropriate classes defined
in the supplied butler configuration.

The externally supplied butler configuration must include definitions
for ``registry.cls`` and ``datastore.cls`` to enable the correct defaults
to be located.

Parameters
----------
Expand All @@ -51,54 +63,64 @@ def __init__(self, other):
# Create an empty config for us to populate
super().__init__()

# Find the butler configs
self.defaultsDir = None
if "DAF_BUTLER_DIR" in os.environ:
self.defaultsDir = os.path.join(os.environ["DAF_BUTLER_DIR"], "config")

# Storage classes
storageClasses = Config(os.path.join(self.defaultsDir, "storageClasses.yaml"))
self.update(storageClasses)

# Default schema
schema = Config(os.path.join(self.defaultsDir, "registry", "default_schema.yaml"))
self.update(schema)

# Read the supplied config so that we can work out which other
# defaults to use.
butlerConfig = Config(other)

# Check that fundamental keys are present
self._validate(butlerConfig)
# All the configs that can be associated with default files
configComponents = (SchemaConfig, StorageClassConfig, DatastoreConfig, RegistryConfig)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps make this a class member?


# This is a list of all the config files to search
defaultsFiles = [(c, c.defaultConfigFile) for c in configComponents]

# Look for class specific defaults
for section in ("datastore", "registry"):
k = f"{section}.cls"
print("Checking {}: {}".format(k, butlerConfig[k]))
# Components that derive their configurations from implementation
# classes rather than configuration classes
specializedConfigs = (DatastoreConfig, RegistryConfig)

# Look for class specific default files to be added to the file list
# These class definitions must come from the suppliedbutler config and
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

supplied butler.

# are not defaulted.
for componentConfig in specializedConfigs:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not simply loop over all components and have those subclasses determine if they are special?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E.g. by having a component member.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you saying I should look through all the top level keys in the supplied butler config looking for .cls members, then ask the imported classes for their config files and associated config classes? (the latter they don't know but they could).

k = "{}.cls".format(componentConfig.component)
if k not in butlerConfig:
raise ValueError("Required configuration {} not present in {}".format(k, other))
cls = doImport(butlerConfig[k])
defaultsPath = cls.defaults
if defaultsPath is not None:
if not os.path.isabs(defaultsPath):
defaultsPath = os.path.join(self.defaultsDir, defaultsPath)
c = Config(defaultsPath)
# Merge into baseline
self.update(c)
defaultsFile = cls.defaultConfigFile
if defaultsFile is not None:
defaultsFiles.append((componentConfig, defaultsFile))

# We can pick up defaults from multiple search paths
# We fill defaults by using the butler config path and then
# the config path environment variable in reverse order.
self.defaultsPaths = []

# Find the butler configs
if "DAF_BUTLER_DIR" in os.environ:
self.defaultsPaths.append(os.path.join(os.environ["DAF_BUTLER_DIR"], "config"))

if "DAF_BUTLER_CONFIG_PATH" in os.environ:
externalPaths = list(reversed(os.environ["DAF_BUTLER_CONFIG_PATH"].split(os.pathsep)))
self.defaultsPaths.append(externalPaths)

# Search each directory for each of the default files
for pathDir in self.defaultsPaths:
for configClass, configFile in defaultsFiles:
# Assume external paths have same config files as global config
# directory. Absolute paths are possible for external
# code.
# Should this be a log message? Are we using lsst.log?
# print("Checking path {} for {} ({})".format(pathDir, configClass, configFile))
if not os.path.isabs(configFile):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this apply absolute paths over and over? Maybe that's not a problem.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. It does. Hadn't thought of that. I don't think it's a problem as such, but it will slow things down.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not simply invert the loop order?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. since the configs read from the paths are subsets of the config I can loop over configs and then paths.

configFile = os.path.join(pathDir, configFile)
if os.path.exists(configFile):
# this checks a specific part of the tree
# We may need to turn off validation since we do not
# require that each defaults file found is fully
# consistent.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think we'll need to defer validation until the full config has been loaded and merged.

config = configClass(configFile)
# Attach it using the global namespace
self.update({configClass.component: config})

# Now that we have all the defaults we can merge the externally
# provided config into the defaults.
self.update(butlerConfig)

def _validate(self, config=None):
"""Check a butler config contains mandatory keys.

Parameters
----------
config : `Config`, optional
By default checks itself, but if ``config`` is given, this
config will be checked instead.
"""
if config is None:
config = self
for k in ['datastore.cls', 'registry.cls']:
if k not in config:
raise ValueError(f"Missing ButlerConfig parameter: {k}")
4 changes: 4 additions & 0 deletions python/lsst/daf/butler/core/config.py
Original file line number Diff line number Diff line change
Expand Up @@ -357,6 +357,10 @@ class ConfigSubset(Config):
"""Keys that are required to be specified in the configuration.
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these relative to the top level or the component? (should be documented)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relative to the Config root of this config (which does not include the component name).


defaultConfigFile = None
"""Name of the file containing defaults for this config class.
"""

def __init__(self, other=None):
super().__init__(other)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if other is a ConfigSubset?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What am I missing?

>>> from lsst.daf.butler import ConfigSubset
>>> x = ConfigSubset({"A": 5, "b": 6})
>>> x
{'A': 5, 'b': 6}
>>> y = ConfigSubset(x)
>>> y
{'A': 5, 'b': 6}
>>> x["c"] = 3
>>> x
{'A': 5, 'b': 6, 'c': 3}
>>> y
{'A': 5, 'b': 6}

if self.component is not None and self.component in self.data:
Expand Down
3 changes: 2 additions & 1 deletion python/lsst/daf/butler/core/datastore.py
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,7 @@
class DatastoreConfig(ConfigSubset):
component = "datastore"
requiredKeys = ("cls",)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this check now duplicated in Butler(Config)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ButlerConfig doesn't inherit from ConfigSubset so no. I could move the validation code in ConfigSubset.__init__() to Config.__init__() and always have it as an option if you set requiredKeys.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good.

defaultConfigFile = "datastore.yaml"


class Datastore(metaclass=ABCMeta):
Expand All @@ -55,7 +56,7 @@ class Datastore(metaclass=ABCMeta):
Load configuration
"""

defaults = None
defaultConfigFile = None
"""Path to configuration defaults. Relative to $DAF_BUTLER_DIR/config or
absolute path. Can be None if no defaults specified.
"""
Expand Down
54 changes: 37 additions & 17 deletions python/lsst/daf/butler/core/registry.py
Original file line number Diff line number Diff line change
Expand Up @@ -22,55 +22,75 @@
from abc import ABCMeta

from .utils import doImport
from .butlerConfig import ButlerConfig
from .config import Config
from .config import Config, ConfigSubset
from .schema import SchemaConfig

__all__ = ("RegistryConfig", "Registry")


class RegistryConfig(ButlerConfig):
pass
class RegistryConfig(ConfigSubset):
component = "registry"
requiredKeys = ("cls",)
defaultConfigFile = "registry.yaml"


class Registry(metaclass=ABCMeta):
"""Registry interface.

Parameters
----------
config : `RegistryConfig`
registryConfig : `RegistryConfig`
Registry configuration.
schemaConfig : `SchemaConfig`, optional
Schema configuration.
"""

defaults = None
defaultConfigFile = None
"""Path to configuration defaults. Relative to $DAF_BUTLER_DIR/config or
absolute path. Can be None if no defaults specified.
"""

@staticmethod
def fromConfig(config):
def fromConfig(registryConfig, schemaConfig=None):
"""Create `Registry` subclass instance from `config`.

Uses ``registry.cls`` from `config` to determine which subclass to instantiate.

Parameters
----------
config : `RegistryConfig`, `Config` or `str`
registryConfig : `ButlerConfig`, `RegistryConfig`, `Config` or `str`
Registry configuration
schemaConfig : `SchemaConfig`, `Config` or `str`, optional.
Schema configuration. Can be read from supplied registryConfig
if the relevant component is defined and ``schemaConfig`` is
`None`.

Returns
-------
registry : `Registry` (subclass)
A new `Registry` subclass instance.
"""
if not isinstance(config, RegistryConfig):
if isinstance(config, str) or isinstance(config, Config):
config = RegistryConfig(config)
if schemaConfig is None:
# Try to instantiate a schema configuration from the supplied
# registry configuration.
schemaConfig = SchemaConfig(registryConfig)
elif not isinstance(schemaConfig, SchemaConfig):
if isinstance(schemaConfig, str) or isinstance(schemaConfig, Config):
schemaConfig = SchemaConfig(schemaConfig)
else:
raise ValueError("Incompatible Registry configuration: {}".format(config))
cls = doImport(config['registry.cls'])
return cls(config=config)
raise ValueError("Incompatible Schema configuration: {}".format(schemaConfig))

def __init__(self, config):
assert isinstance(config, RegistryConfig)
self.config = config
if not isinstance(registryConfig, RegistryConfig):
if isinstance(registryConfig, str) or isinstance(registryConfig, Config):
registryConfig = RegistryConfig(registryConfig)
else:
raise ValueError("Incompatible Registry configuration: {}".format(registryConfig))

cls = doImport(registryConfig['cls'])
return cls(registryConfig, schemaConfig)

def __init__(self, registryConfig, schemaConfig=None):
assert isinstance(registryConfig, RegistryConfig)
self.config = registryConfig

# TODO Add back all interfaces (copied from SqlRegistry) once that is stabalized
1 change: 1 addition & 0 deletions python/lsst/daf/butler/core/schema.py
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,7 @@
class SchemaConfig(ConfigSubset):
component = "schema"
requiredKeys = ("version", "dataUnits")
defaultConfigFile = "registry_schema.yaml"


class Schema:
Expand Down
1 change: 1 addition & 0 deletions python/lsst/daf/butler/core/storageClass.py
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,7 @@

class StorageClassConfig(ConfigSubset):
component = "storageClasses"
defaultConfigFile = "storageClasses.yaml"


class StorageClass:
Expand Down
2 changes: 1 addition & 1 deletion python/lsst/daf/butler/datastores/posixDatastore.py
Original file line number Diff line number Diff line change
Expand Up @@ -75,7 +75,7 @@ class PosixDatastore(Datastore):
configuration.
"""

defaults = "datastores/posixDatastore.yaml"
defaultConfigFile = "datastores/posixDatastore.yaml"
"""Path to configuration defaults. Relative to $DAF_BUTLER_DIR/config or
absolute path. Can be None if no defaults specified.
"""
Expand Down
14 changes: 8 additions & 6 deletions python/lsst/daf/butler/registries/sqlRegistry.py
Original file line number Diff line number Diff line change
Expand Up @@ -50,20 +50,22 @@ class SqlRegistry(Registry):
----------
config : `SqlRegistryConfig` or `str`
Load configuration
schemaConfig : `SchemaConfig` or `str`
Definition of the schema to use.
"""

defaults = None
defaultConfigFile = None
"""Path to configuration defaults. Relative to $DAF_BUTLER_DIR/config or
absolute path. Can be None if no defaults specified.
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we not actually have a default for SqlRegistry configuration?


def __init__(self, config):
super().__init__(config)
def __init__(self, registryConfig, schemaConfig):
super().__init__(registryConfig)

self.config = SqlRegistryConfig(config)
self.config = SqlRegistryConfig(registryConfig)
self.storageClasses = StorageClassFactory()
self._schema = Schema(self.config['schema'])
self._engine = create_engine(self.config['registry.db'])
self._schema = Schema(schemaConfig)
self._engine = create_engine(self.config['db'])
self._schema.metadata.create_all(self._engine)
self._datasetTypes = {}

Expand Down
4 changes: 2 additions & 2 deletions python/lsst/daf/butler/registries/sqliteRegistry.py
Original file line number Diff line number Diff line change
Expand Up @@ -47,5 +47,5 @@ class SqliteRegistry(SqlRegistry):
Load configuration
"""

def __init__(self, config):
super().__init__(config)
def __init__(self, registryConfig, schemaConfig):
super().__init__(registryConfig, schemaConfig)