Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-14378: fixes/improvements for issues discovered in ci_hsc conversion testing #40

Merged
merged 14 commits into from May 18, 2018

Conversation

TallJimbo
Copy link
Member

No description provided.

collection = run # get collection from run found in config
if collection is None:
raise ValueError("No run or collection provided.")
if run is not None and collection != run:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe runCollection is a better name for run when it actually refers to a collection name associated with a Run?

Copy link
Member Author

@TallJimbo TallJimbo May 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll give it a try, but I'll want to see how it works before committing; I'd like to leave the argument (as opposed to the local variable) as just run because it actually can be a Run.

@@ -50,7 +50,7 @@ def _readFile(self, path, pytype):
if not os.path.exists(path):
return None

return pytype.readFits(path)
return pytype(path)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this better? Just curious.

Copy link
Member Author

@TallJimbo TallJimbo May 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally think it's worse, but it matches what the afw APIs are and hence actually works.

instance.load(path)
return instance
except AssertionError as err:
actualPyTypeStr = str(err).split()[-1]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose the possibility of an AssertionError with an unexpectedly differently formatted string being raised is rather low, but still potentially a very confusing error message.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I imagine I can at least do a startswith test on the error message.

@@ -326,3 +326,12 @@ def insertDatasets(self, registry, datastore):
log.debug("Adding Dataset %s as %s in %s", dataset.filePath, gen3id, repo.run)
ref = registry.addDataset(datasetType, gen3id, run)
datastore.ingest(path=os.path.relpath(dataset.fullPath, start=datastore.root), ref=ref)
refs.append(ref)
# Add Datasets to collections associated with any child repos to similate Gen2 parent lookups.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simulate

refs.append(ref)
# Add Datasets to collections associated with any child repos to similate Gen2 parent lookups.
# TODO: only associated parent Datasets with DataUnits associated with DataUnits used by child
# repo Datasets.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First "associated" should just be "associate". Will fix.

The problem is this: someone runs a SuperTask that processes one visit, using an input collection that includes all LSST raw data ever taken. Should their output collection also contain all LSST raw data ever taken (the Gen2 behavior, strictly speaking, and what the converter currently does), or just the raw data of the single visit that was processed (what I think we should do in Gen3, and what this TODO is about).

construct the repository should also be used to construct any Butlers
to it to avoid configuration inconsistencies.
"""
if isinstance(config, ButlerConfig):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe also include ConfigSubset in this test to be completely sure.

@@ -49,17 +55,95 @@ class Butler:
----------
config : `Config`
Configuration.
collection : `str` or `None`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the convention is to say , optional at the end for keyword args.

timj and others added 2 commits May 18, 2018 10:55
If you initialize a SchemaConfig with a Config that is missing
a "schema" key then it will assume "datastore" is for it.
Now only send the overrides if the component key exists.
Note that when ``expand=False`` (the default), the configuration
search path (see `ConfigSubset.defaultSearchPaths`) that was used to
construct the repository should also be used to construct any Butlers
to it to avoid configuration inconsistencies.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this not return a Butler (in which case it should also be a classmethod)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just don't think it's that useful for it to return a Butler; usually I imagine this being run by a standalone command-line tool that doesn't do anything else. Returning the config object doesn't cost anything, though, so I'll do that.

@staticmethod
def makeRepo(root, config=None, expand=False):
"""Create an empty data repository by adding a butler.yaml config
to a repository root directory.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it also make the root if it doesn't exist? If not then initRepo(sitory) might be a better name? Like in git init.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not currently create the directory, but I think it probably should. Will fix.

Filesystem path to the root of the new repository.
config : `Config` or `None`
Configuration to write to the repository, after setting any
root-dependent Registry or Datastore config options.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Config? Or ButlerConfig?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is None, what does it do? Write nothing, or write just the default config?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, ButlerConfig is not permitted (and ConfigSubset shouldn't be either, as per Tim's comment below). I'm documenting this in a new Raises section of the docs.

If config is None, It writes the defaults; now documented.

config : `Config` or `None`
Configuration to write to the repository, after setting any
root-dependent Registry or Datastore config options.
expand : `bool`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expandDefaults might be better?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still like expand better; I think brevity is more valuable for keyword argument names that regular local variables in terms of overall readability. @timj, want to break the tie? :-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd prefer withDefaults to expandDefaults. You are asking for a fully qualified standalone configuration to be written, rather than the bare minimum based on what you gave in config and the root-specific entries. You aren't really expanding as such, you are filling it in to make it explicit. complete or standalone ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh, I like standalone.

Parameters
----------
root : `str`
Filesystem path to the root of the new repository.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it exist? What happens if it doesn't, or if it does but isn't empty (or even already is an existing repo)?

registryClass = doImport(full["registry.cls"])
registryClass.setConfigRoot(root, config, full)
if expand:
config.merge(full)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this more costly (and potentially error prone) than conditionally dumping config or full?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

full doesn't have entries like datastore.root and registry.db at this stage.

"""

def __init__(self, config=None):
@staticmethod
def makeRepo(root, config=None, expand=False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering whether we should add searchPaths here to match those I added to the ConfigSubset constructor (and which I didn't add to ButlerConfig but should). At the very least it helps testing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually more worried about flexibility in search paths making it easy to use the wrong search path, and I'm content to leave testing of the search path functionality to lower-level tests for now.

should be copied from `full` to `Config`.
"""
config["registry.db"] = "sqlite:///{}/gen3.sqlite3".format(root)
for key in ("registry.cls",):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do wonder whether we could save some code duplication by having these keys as class attributes and then a routine that reads the keys and copies them (and maybe attempts to run format(root) on them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps a bit, but I'm content to wait until we see what other concrete Datastore/Registry implementations actually need to do here before worrying about them duplicating code.

@TallJimbo TallJimbo merged commit 39b8556 into master May 18, 2018
@ktlim ktlim deleted the tickets/DM-14378 branch August 25, 2018 06:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants