DM-14378: fixes/improvements for issues discovered in ci_hsc conversion testing #40

TallJimbo · 2018-05-14T20:37:13Z

No description provided.

pschella · 2018-05-15T19:16:29Z

python/lsst/daf/butler/butler.py

+                collection = run    # get collection from run found in config
+        if collection is None:
+            raise ValueError("No run or collection provided.")
+        if run is not None and collection != run:


Maybe runCollection is a better name for run when it actually refers to a collection name associated with a Run?

I'll give it a try, but I'll want to see how it works before committing; I'd like to leave the argument (as opposed to the local variable) as just run because it actually can be a Run.
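
To make the naming question concrete, here is a minimal sketch of how a run argument that may be either a Run or a collection name could be reduced to a local runCollection string. The `Run` stand-in class and the `resolveCollection` helper are hypothetical and only mirror the checks visible in the diff above. The mismatch branch from the diff is left out because its body isn't shown.

```python
class Run:
    """Hypothetical stand-in for a Run that knows its collection name."""

    def __init__(self, collection):
        self.collection = collection


def resolveCollection(run=None, collection=None):
    """Return (runCollection, collection) from the two optional arguments."""
    # The argument keeps the short name `run` because it may be a Run or a str;
    # the local variable is named for what it actually holds.
    runCollection = run.collection if isinstance(run, Run) else run
    if collection is None:
        collection = runCollection  # fall back to the run's collection
    if collection is None:
        raise ValueError("No run or collection provided.")
    return runCollection, collection
```

With this split, passing only a collection gives `runCollection` of `None`, which matches the read-only case described in the commit message.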

pschella · 2018-05-15T19:19:07Z

python/lsst/daf/butler/formatters/fitsExposureFormatter.py

@@ -50,7 +50,7 @@ def _readFile(self, path, pytype):
        if not os.path.exists(path):
            return None

-        return pytype.readFits(path)
+        return pytype(path)


Why is this better? Just curious.

I personally think it's worse, but it matches what the afw APIs are and hence actually works.

pschella · 2018-05-15T19:20:44Z

python/lsst/daf/butler/formatters/pexConfigFormatter.py

+            instance.load(path)
+            return instance
+        except AssertionError as err:
+            actualPyTypeStr = str(err).split()[-1]


I suppose the chance of an AssertionError being raised with an unexpectedly formatted message is rather low, but if it happens the resulting error message could be very confusing.

Yeah, I imagine I can at least do a startswith test on the error message.
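
A rough sketch of that startswith guard, for illustration only. The prefix string and the `actualTypeFromError` helper are assumptions, not the text or code used in the formatter; the point is just that the message is parsed only when it has the expected shape.

```python
# Hypothetical prefix; the real message text is an assumption here.
EXPECTED_PREFIX = "config is of type"


def actualTypeFromError(err):
    """Return the last word of the message, or None if the format is unexpected."""
    message = str(err)
    if not message.startswith(EXPECTED_PREFIX):
        # Unknown format: let the caller re-raise the original error.
        return None
    return message.split()[-1]
```

A caller would re-raise the original AssertionError when this returns `None`, so an unrelated assertion failure is not reported as a type mismatch.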

pschella · 2018-05-15T19:24:07Z

python/lsst/daf/butler/gen2convert/writer.py

@@ -326,3 +326,12 @@ def insertDatasets(self, registry, datastore):
                    log.debug("Adding Dataset %s as %s in %s", dataset.filePath, gen3id, repo.run)
                    ref = registry.addDataset(datasetType, gen3id, run)
                    datastore.ingest(path=os.path.relpath(dataset.fullPath, start=datastore.root), ref=ref)
+                    refs.append(ref)
+            # Add Datasets to collections associated with any child repos to similate Gen2 parent lookups.


pschella · 2018-05-15T19:25:02Z

python/lsst/daf/butler/gen2convert/writer.py

+                    refs.append(ref)
+            # Add Datasets to collections associated with any child repos to similate Gen2 parent lookups.
+            # TODO: only associated parent Datasets with DataUnits associated with DataUnits used by child
+            #       repo Datasets.


First "associated" should just be "associate". Will fix.

The problem is this: someone runs a SuperTask that processes one visit, using an input collection that includes all LSST raw data ever taken. Should their output collection also contain all LSST raw data ever taken (the Gen2 behavior, strictly speaking, and what the converter currently does), or just the raw data of the single visit that was processed (what I think we should do in Gen3, and what this TODO is about).
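
The narrower Gen3 behavior described in the TODO could look roughly like the sketch below. Datasets are modeled as plain dicts and `selectParentDatasets` is a hypothetical helper, not part of the converter; it only shows the filtering idea.

```python
def selectParentDatasets(parentDatasets, childDatasets, unit="visit"):
    """Keep only parent datasets whose `unit` value is used by a child dataset."""
    # Collect the unit values (e.g. visit numbers) the child repo actually uses.
    usedValues = {d[unit] for d in childDatasets if unit in d}
    return [d for d in parentDatasets if d.get(unit) in usedValues]


parents = [{"name": "raw", "visit": 1}, {"name": "raw", "visit": 2}]
children = [{"name": "calexp", "visit": 2}]
selected = selectParentDatasets(parents, children)
```

Here only the raw dataset for visit 2 would be associated with the child collection, instead of every raw dataset in the parent.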

timj · 2018-05-18T14:35:28Z

python/lsst/daf/butler/butler.py

+        construct the repository should also be used to construct any Butlers
+        to it to avoid configuration inconsistencies.
+        """
+        if isinstance(config, ButlerConfig):


Maybe also include ConfigSubset in this test to be completely sure.

timj · 2018-05-18T14:38:55Z

python/lsst/daf/butler/butler.py

@@ -49,17 +55,95 @@ class Butler:
    ----------
    config : `Config`
        Configuration.
+    collection : `str` or `None`


I think the convention is to say ", optional" at the end for keyword args.

pschella · 2018-05-18T15:13:22Z

python/lsst/daf/butler/butler.py

+        Note that when ``expand=False`` (the default), the configuration
+        search path (see `ConfigSubset.defaultSearchPaths`) that was used to
+        construct the repository should also be used to construct any Butlers
+        to it to avoid configuration inconsistencies.


Why does this not return a Butler (in which case it should also be a classmethod)?

I just don't think it's that useful for it to return a Butler; usually I imagine this being run by a standalone command-line tool that doesn't do anything else. Returning the config object doesn't cost anything, though, so I'll do that.

pschella · 2018-05-18T15:15:17Z

python/lsst/daf/butler/butler.py

+    @staticmethod
+    def makeRepo(root, config=None, expand=False):
+        """Create an empty data repository by adding a butler.yaml config
+        to a repository root directory.


Does it also make the root if it doesn't exist? If not then initRepo(sitory) might be a better name? Like in git init.

It does not currently create the directory, but I think it probably should. Will fix.
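
A minimal sketch of the "create the root if it is missing" behavior, using only the standard library. `initRepoRoot` and its `configText` parameter are hypothetical; the real method writes a serialized config rather than a raw string.

```python
import os


def initRepoRoot(root, configText=""):
    """Create `root` if needed and write a butler.yaml file into it."""
    # exist_ok=True makes this safe to call on an existing directory.
    os.makedirs(root, exist_ok=True)
    configPath = os.path.join(root, "butler.yaml")
    with open(configPath, "w") as f:
        f.write(configText)
    return configPath
```

Note that this sketch silently overwrites an existing butler.yaml; whether an existing repository should be rejected is a separate question.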

pschella · 2018-05-18T15:16:07Z

python/lsst/daf/butler/butler.py

+            Filesystem path to the root of the new repository.
+        config : `Config` or `None`
+            Configuration to write to the repository, after setting any
+            root-dependent Registry or Datastore config options.


Config? Or ButlerConfig?

If it is None, what does it do? Write nothing, or write just the default config?

No, ButlerConfig is not permitted (and ConfigSubset shouldn't be either, as per Tim's comment below). I'm documenting this in a new Raises section of the docs.

If config is None, it writes the defaults; now documented.
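
The two rules above can be sketched as follows. The `Config` and `ButlerConfig` classes here are hypothetical dict-based stand-ins, `DEFAULTS` is a placeholder, and the choice of ValueError is an assumption rather than something stated in the thread.

```python
class Config(dict):
    """Hypothetical stand-in for a plain configuration object."""


class ButlerConfig(Config):
    """Hypothetical stand-in for a config with defaults already applied."""


DEFAULTS = {"registry.cls": "example.Registry"}  # placeholder default values


def checkRepoConfig(config=None):
    """Return the config to write, rejecting ones with defaults applied."""
    if isinstance(config, ButlerConfig):
        # Exception type is an assumption made for this sketch.
        raise ValueError("makeRepo requires a plain Config, not a ButlerConfig.")
    if config is None:
        return Config(DEFAULTS)  # nothing given: write the defaults
    return config
```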

pschella · 2018-05-18T15:17:28Z

python/lsst/daf/butler/butler.py

+        config : `Config` or `None`
+            Configuration to write to the repository, after setting any
+            root-dependent Registry or Datastore config options.
+        expand : `bool`


expandDefaults might be better?

I still like expand better; I think brevity is more valuable for keyword argument names than for regular local variables in terms of overall readability. @timj, want to break the tie? :-)

I think I'd prefer withDefaults to expandDefaults. You are asking for a fully qualified standalone configuration to be written, rather than the bare minimum based on what you gave in config and the root-specific entries. You aren't really expanding as such; you are filling it in to make it explicit. Maybe complete or standalone?

Ooh, I like standalone.

pschella · 2018-05-18T15:19:39Z

python/lsst/daf/butler/butler.py

+        Parameters
+        ----------
+        root : `str`
+            Filesystem path to the root of the new repository.


Should it exist? What happens if it doesn't, or if it does but isn't empty (or even already is an existing repo)?

pschella · 2018-05-18T15:21:20Z

python/lsst/daf/butler/butler.py

+        registryClass = doImport(full["registry.cls"])
+        registryClass.setConfigRoot(root, config, full)
+        if expand:
+            config.merge(full)


Isn't this more costly (and potentially error prone) than conditionally dumping config or full?

full doesn't have entries like datastore.root and registry.db at this stage.
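
To illustrate why merging is used instead of dumping full directly, here is a sketch with nested dicts standing in for Config objects. It assumes that merge only fills in keys missing from config and never overwrites existing ones; `mergeMissing` is a hypothetical helper written for this example.

```python
def mergeMissing(config, full):
    """Copy keys from `full` into `config` without overwriting existing ones."""
    for key, value in full.items():
        if key not in config:
            config[key] = value
        elif isinstance(config[key], dict) and isinstance(value, dict):
            mergeMissing(config[key], value)  # recurse into shared sections
    return config


# config already holds the root-dependent entries set for this repository.
config = {"datastore": {"root": "/repo"},
          "registry": {"db": "sqlite:////repo/gen3.sqlite3"}}
# full holds defaults only, with no root-specific values.
full = {"datastore": {"cls": "X"}, "registry": {"cls": "Y", "db": "default"}}
merged = mergeMissing(config, full)
```

After the merge, the root-specific `registry.db` value survives while the missing `cls` entries are filled in from the defaults.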

timj · 2018-05-18T15:59:31Z

python/lsst/daf/butler/butler.py

    """

-    def __init__(self, config=None):
+    @staticmethod
+    def makeRepo(root, config=None, expand=False):


I am wondering whether we should add searchPaths here to match those I added to the ConfigSubset constructor (and which I didn't add to ButlerConfig but should). At the very least it helps testing.

I'm actually more worried about flexibility in search paths making it easy to use the wrong search path, and I'm content to leave testing of the search path functionality to lower-level tests for now.

timj · 2018-05-18T16:01:15Z

python/lsst/daf/butler/registries/sqliteRegistry.py

+            should be copied from `full` to `Config`.
+        """
+        config["registry.db"] = "sqlite:///{}/gen3.sqlite3".format(root)
+        for key in ("registry.cls",):


I do wonder whether we could save some code duplication by having these keys as class attributes and then a routine that reads the keys and copies them (and maybe attempts to run format(root) on them).

Perhaps a bit, but I'm content to wait until we see what other concrete Datastore/Registry implementations actually need to do here before worrying about them duplicating code.
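
For reference, the class-attribute idea might look something like this sketch. `RootConfigMixin`, `rootKeys`, `defaultKeys` and `SqliteRegistrySketch` are all hypothetical names, and flat dicts with dotted keys stand in for Config objects.

```python
class RootConfigMixin:
    """Hypothetical helper: subclasses declare which keys to set or copy."""

    rootKeys = {}      # key -> template that is formatted with the root
    defaultKeys = ()   # keys copied verbatim from `full` into `config`

    @classmethod
    def setConfigRoot(cls, root, config, full):
        for key, template in cls.rootKeys.items():
            config[key] = template.format(root)
        for key in cls.defaultKeys:
            if key in full:
                config[key] = full[key]


class SqliteRegistrySketch(RootConfigMixin):
    rootKeys = {"registry.db": "sqlite:///{}/gen3.sqlite3"}
    defaultKeys = ("registry.cls",)
```

Each concrete Registry or Datastore would then only declare its keys, at the cost of committing to one shared pattern before other implementations exist.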

pschella reviewed May 15, 2018

TallJimbo added 8 commits May 17, 2018 14:23

Add collection overrides for gen2 transmission curves.

3da1626

More flexibility in Butler construction.

66b12b7

Runs and Collections can now be given as arguments, and you can provide just a Collection to make a read-only Butler.

Typo in DecoratedImage StorageClass definition.

2eef113

Reading Exposures from FITS uses the ctor, not readFits.

d9174e9

Add getUri and datasetExists to Butler.

98a790e

Workaround for Config reading from base class.

8ef7363

Associate converted Gen2 datasets into child Collections.

8eb8d68

More descriptive error message when validating Data IDs.

5d1e523

It's frustrating seeing these errors and not knowing what DatasetTypes they're coming from.

TallJimbo force-pushed the tickets/DM-14378 branch from 1001d1c to 5d1e523 Compare May 17, 2018 18:27

TallJimbo added 3 commits May 17, 2018 15:49

Add more formatters to default configuration.

63ec626

Add static method to create configs for new repos.

f321c2b

Export ButlerConfig to package-level imports.

2f5e4d5

This class is useful for constructing standalone Registry or Datastore objects without a Butler.

timj approved these changes May 18, 2018

timj and others added 2 commits May 18, 2018 10:55

Stop ButlerConfig initializing subconfigs when nothing in parent

ed9775e

If you initialize a SchemaConfig with a Config that is missing a "schema" key then it will assume "datastore" is for it. Now only send the overrides if the component key exists.

Unit test for makeRepo.

5455e02

TallJimbo force-pushed the tickets/DM-14378 branch from 20a3f2a to 5455e02 Compare May 18, 2018 14:55

pschella reviewed May 18, 2018

timj reviewed May 18, 2018

Improvements to makeRepo.

f939ea8

TallJimbo force-pushed the tickets/DM-14378 branch from 81b5910 to f939ea8 Compare May 18, 2018 16:26

TallJimbo merged commit 39b8556 into master May 18, 2018

ktlim deleted the tickets/DM-14378 branch August 25, 2018 06:50
