DM-23063: Allow checksum calculation to be disabled #222

Merged
merged 2 commits into from
Jan 17, 2020
8 changes: 4 additions & 4 deletions doc/lsst.daf.butler/configuring.rst
@@ -17,9 +17,9 @@ There are additional search paths that can be included when a config object is c
To construct a Butler configuration object (`~lsst.daf.butler.ButlerConfig`) from a file the following happens:

* The supplied config is read in.
-* If any leaf nodes in the configuration end in ``configIncludes`` the values (either a scalar or list) will be treated as the names of other config files.
+* If any leaf nodes in the configuration end in ``includeConfigs`` the values (either a scalar or list) will be treated as the names of other config files.
These files will be located either as an absolute path or relative to the current working directory, or the directory in which the original configuration file was found.
-  The contents of these files will then be inserted into the configuration at the same hierarchy as the ``configIncludes`` directive, with priority given to the values defined explicitly in the parent configuration (for lists of include files later files overwrite content from earlier ones).
+  The contents of these files will then be inserted into the configuration at the same hierarchy as the ``includeConfigs`` directive, with priority given to the values defined explicitly in the parent configuration (for lists of include files later files overwrite content from earlier ones).
* Each sub configuration class is constructed by supplying the relevant subset of the global config to the component Config constructor.
* A search path is constructed by concatenating the supplied search path, the environment variable path (``$DAF_BUTLER_CONFIG_PATH``), and the daf_butler config directory (``$DAF_BUTLER_DIR/config``).
* Defaults are first read from the config class default file name (e.g., ``registry.yaml`` for `~lsst.daf.butler.Registry`, and ``datastore.yaml`` for `~lsst.daf.butler.Datastore`) and merged in priority order given in the search path.
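
The include semantics described in the list above can be sketched with plain dictionaries. This is a minimal illustration, not the daf_butler implementation: the function names `merge_dicts` and `resolve_includes` are hypothetical, already-loaded dictionaries stand in for the YAML files, and only a top-level ``includeConfigs`` key is handled (the documentation says the directive may appear at any level of the hierarchy).

```python
from copy import deepcopy


def merge_dicts(base: dict, override: dict) -> dict:
    """Recursively merge ``override`` into a copy of ``base``; ``override`` wins."""
    result = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_dicts(result[key], value)
        else:
            result[key] = deepcopy(value)
    return result


def resolve_includes(parent: dict, included: list) -> dict:
    """Merge included configs in order, then apply the parent's explicit values.

    ``included`` holds the already-loaded contents of the files named by the
    parent's ``includeConfigs`` entry (hypothetical helper, for illustration).
    """
    merged: dict = {}
    for content in included:
        # Later include files overwrite content from earlier ones.
        merged = merge_dicts(merged, content)
    # Values defined explicitly in the parent take priority over includes.
    explicit = {k: v for k, v in parent.items() if k != "includeConfigs"}
    return merge_dicts(merged, explicit)


first = {"datastore": {"checksum": True, "root": "a"}}
second = {"datastore": {"root": "b"}}
parent = {
    "includeConfigs": ["first.yaml", "second.yaml"],
    "datastore": {"checksum": False},
}
resolved = resolve_includes(parent, [first, second])
```

Here `resolved` keeps ``root`` from the second include file and ``checksum`` from the parent, which matches the priority order the documentation describes.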
@@ -34,7 +34,7 @@ The name of the specialist configuration file to search for can be found by look

We also have a YAML parser extension ``!include`` that can be used to pull in other YAML files before the butler specific config parsing happens.
This is very useful to allow reuse of YAML snippets but be aware that the path specified is relative to the file that contains the directive.
-In many cases ``configIncludes`` is a more robust approach to file inclusion as it handles overrides in a more predictable manner.
+In many cases ``includeConfigs`` is a more robust approach to file inclusion as it handles overrides in a more predictable manner.

There is a command available to allow you to see how all these overrides and includes behave.

@@ -51,5 +51,5 @@ In addition to the configuration options described above, there are some values
For `~lsst.daf.butler.RegistryConfig` and `~lsst.daf.butler.DatastoreConfig` the ``root`` key, which can be used to specify paths, can include values using the special tag ``<butlerRoot>``.
At run time, this tag will be replaced by a value derived from the location of the main butler configuration file, or else from the value of the ``root`` key found at the top of the butler configuration.

-Currently, if you create a butler configuration file that loads another butler configuration file, via ``configIncludes``, then any ``<butlerRoot>`` tags will be replaced with the location of the new file, not the original.
+Currently, if you create a butler configuration file that loads another butler configuration file, via ``includeConfigs``, then any ``<butlerRoot>`` tags will be replaced with the location of the new file, not the original.
It is therefore recommended that an explicit ``root`` be defined at the top level when defining butler overrides via a new top level butler configuration.
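
The ``<butlerRoot>`` substitution described above might look roughly like the following. This is a sketch under stated assumptions rather than the butler's own code: `expand_butler_root` is a hypothetical helper, and the choice to let an explicit top-level ``root`` take precedence over the configuration file's directory is an assumption made for illustration.

```python
import posixpath
from typing import Optional

BUTLER_ROOT_TAG = "<butlerRoot>"


def expand_butler_root(value: str, config_file: str,
                       explicit_root: Optional[str] = None) -> str:
    """Replace the ``<butlerRoot>`` tag in ``value``.

    Assumption: an explicit top-level ``root`` wins when given; otherwise the
    directory containing the butler configuration file is used.
    """
    if explicit_root is not None:
        root = explicit_root
    else:
        root = posixpath.dirname(config_file)
    return value.replace(BUTLER_ROOT_TAG, root)


# Without an explicit root, the tag follows the file that was loaded.
from_file = expand_butler_root("<butlerRoot>/datastore", "/overrides/butler.yaml")

# With an explicit root, the location of the override file no longer matters.
from_root = expand_butler_root("<butlerRoot>/datastore", "/overrides/butler.yaml",
                               explicit_root="/repo")
```

The second call shows why an explicit ``root`` is recommended for override files: the resolved path stays tied to the original repository rather than to wherever the new configuration file lives.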
3 changes: 3 additions & 0 deletions python/lsst/daf/butler/datastores/fileLikeDatastore.py
@@ -218,6 +218,9 @@ def __init__(self, config, registry, butlerRoot=None):
         self._tableName = self.config["records", "table"]
         registry.registerOpaqueTable(self._tableName, self.makeTableSpec())

+        # Determine whether checksums should be used
+        self.useChecksum = self.config.get("checksum", True)
+
     def __str__(self):
         return self.root

5 changes: 4 additions & 1 deletion python/lsst/daf/butler/datastores/posixDatastore.py
@@ -263,7 +263,10 @@ def _extractIngestInfo(self, path: str, ref: DatasetRef, *, formatter: Type[Form
                 raise NotImplementedError("Transfer type '{}' not supported.".format(transfer))
             path = newPath
             fullPath = newFullPath
-        checksum = self.computeChecksum(fullPath)
+        if self.useChecksum:
+            checksum = self.computeChecksum(fullPath)
+        else:
+            checksum = None
         stat = os.stat(fullPath)
         size = stat.st_size
         return StoredFileInfo(formatter=formatter, path=path, storageClass=ref.datasetType.storageClass,
3 changes: 3 additions & 0 deletions tests/config/basic/posixDatastoreNoChecksums.yaml
@@ -0,0 +1,3 @@
+includeConfigs: posixDatastore.yaml
+datastore:
+  checksum: false
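
The effect of this override on the `self.config.get("checksum", True)` lookup in `fileLikeDatastore.py` can be illustrated with plain dictionaries. This is a sketch only: `use_checksum` is a hypothetical helper, and `dict.get` is assumed to behave like the datastore config object's `get` for a simple top-level key.

```python
def use_checksum(datastore_config: dict) -> bool:
    """Return the checksum setting, defaulting to True when the key is absent."""
    return bool(datastore_config.get("checksum", True))


# A configuration that never mentions checksums keeps the existing behaviour.
default_config = {"root": "<butlerRoot>"}

# The override file above sets the key explicitly to false.
no_checksum_config = {"root": "<butlerRoot>", "checksum": False}

default_setting = use_checksum(default_config)
override_setting = use_checksum(no_checksum_config)
```

Defaulting to `True` means existing configurations continue to compute checksums unless they opt out.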
30 changes: 30 additions & 0 deletions tests/test_datastore.py
@@ -465,6 +465,36 @@ def setUp(self):
         super().setUp()


+class PosixDatastoreNoChecksumsTestCase(PosixDatastoreTestCase):
+    """Posix datastore tests but with checksums disabled."""
+    configFile = os.path.join(TESTDIR, "config/basic/posixDatastoreNoChecksums.yaml")
+
+    def testChecksum(self):
+        """Ensure that checksums have not been calculated."""
+
+        datastore = self.makeDatastore()
+        storageClass = self.storageClassFactory.getStorageClass("StructuredData")
+        dimensions = self.universe.extract(("visit", "physical_filter"))
+        metrics = makeExampleMetrics()
+
+        dataId = {"instrument": "dummy", "visit": 0, "physical_filter": "V"}
+        ref = self.makeDatasetRef("metric", dimensions, storageClass, dataId,
+                                  conform=False)
+
+        # Configuration should have disabled checksum calculation
+        datastore.put(metrics, ref)
+        info = datastore.getStoredItemInfo(ref)
+        self.assertIsNone(info.checksum)
+
+        # Remove and put back, but with checksums enabled explicitly
+        datastore.remove(ref)
+        datastore.useChecksum = True
+        datastore.put(metrics, ref)
+
+        info = datastore.getStoredItemInfo(ref)
+        self.assertIsNotNone(info.checksum)
+
+
 class CleanupPosixDatastoreTestCase(DatastoreTestsBase, unittest.TestCase):
     configFile = os.path.join(TESTDIR, "config/basic/butler.yaml")
