New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-14378: fixes/improvements for issues discovered in ci_hsc conversion testing #40
Conversation
python/lsst/daf/butler/butler.py
Outdated
collection = run # get collection from run found in config | ||
if collection is None: | ||
raise ValueError("No run or collection provided.") | ||
if run is not None and collection != run: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe runCollection
is a better name for run
when it actually refers to a collection name associated with a Run
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll give it a try, but I'll want to see how it works before committing; I'd like to leave the argument (as opposed to the local variable) as just run
because it actually can be a Run
.
@@ -50,7 +50,7 @@ def _readFile(self, path, pytype): | |||
if not os.path.exists(path): | |||
return None | |||
|
|||
return pytype.readFits(path) | |||
return pytype(path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this better? Just curious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally think it's worse, but it matches what the afw APIs are and hence actually works.
instance.load(path) | ||
return instance | ||
except AssertionError as err: | ||
actualPyTypeStr = str(err).split()[-1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose the possibility of an AssertionError
with an unexpectedly differently formatted string being raised is rather low, but still potentially a very confusing error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I imagine I can at least do a startswith
test on the error message.
@@ -326,3 +326,12 @@ def insertDatasets(self, registry, datastore): | |||
log.debug("Adding Dataset %s as %s in %s", dataset.filePath, gen3id, repo.run) | |||
ref = registry.addDataset(datasetType, gen3id, run) | |||
datastore.ingest(path=os.path.relpath(dataset.fullPath, start=datastore.root), ref=ref) | |||
refs.append(ref) | |||
# Add Datasets to collections associated with any child repos to similate Gen2 parent lookups. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
simulate
refs.append(ref) | ||
# Add Datasets to collections associated with any child repos to similate Gen2 parent lookups. | ||
# TODO: only associated parent Datasets with DataUnits associated with DataUnits used by child | ||
# repo Datasets. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First "associated" should just be "associate". Will fix.
The problem is this: someone runs a SuperTask that processes one visit, using an input collection that includes all LSST raw data ever taken. Should their output collection also contain all LSST raw data ever taken (the Gen2 behavior, strictly speaking, and what the converter currently does), or just the raw data of the single visit that was processed (what I think we should do in Gen3, and what this TODO is about).
Runs and Collections can now be given as arguments, and you can provide just a Collection to make a read-only Butler.
It's frustrating seeing these errors and not knowing what DatasetTypes they're coming from.
1001d1c
to
5d1e523
Compare
This class is useful for constructing standalone Registry or Datastore objects without a Butler.
python/lsst/daf/butler/butler.py
Outdated
construct the repository should also be used to construct any Butlers | ||
to it to avoid configuration inconsistencies. | ||
""" | ||
if isinstance(config, ButlerConfig): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe also include ConfigSubset
in this test to be completely sure.
python/lsst/daf/butler/butler.py
Outdated
@@ -49,17 +55,95 @@ class Butler: | |||
---------- | |||
config : `Config` | |||
Configuration. | |||
collection : `str` or `None` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the convention is to say , optional
at the end for keyword args.
If you initialize a SchemaConfig with a Config that is missing a "schema" key then it will assume "datastore" is for it. Now only send the overrides if the component key exists.
20a3f2a
to
5455e02
Compare
python/lsst/daf/butler/butler.py
Outdated
Note that when ``expand=False`` (the default), the configuration | ||
search path (see `ConfigSubset.defaultSearchPaths`) that was used to | ||
construct the repository should also be used to construct any Butlers | ||
to it to avoid configuration inconsistencies. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this not return a Butler
(in which case it should also be a classmethod
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just don't think it's that useful for it to return a Butler
; usually I imagine this being run by a standalone command-line tool that doesn't do anything else. Returning the config
object doesn't cost anything, though, so I'll do that.
python/lsst/daf/butler/butler.py
Outdated
@staticmethod | ||
def makeRepo(root, config=None, expand=False): | ||
"""Create an empty data repository by adding a butler.yaml config | ||
to a repository root directory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it also make the root
if it doesn't exist? If not then initRepo(sitory)
might be a better name? Like in git init
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not currently create the directory, but I think it probably should. Will fix.
python/lsst/daf/butler/butler.py
Outdated
Filesystem path to the root of the new repository. | ||
config : `Config` or `None` | ||
Configuration to write to the repository, after setting any | ||
root-dependent Registry or Datastore config options. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Config
? Or ButlerConfig
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is None
, what does it do? Write nothing, or write just the default config?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, ButlerConfig
is not permitted (and ConfigSubset
shouldn't be either, as per Tim's comment below). I'm documenting this in a new Raises
section of the docs.
If config is None
, It writes the defaults; now documented.
python/lsst/daf/butler/butler.py
Outdated
config : `Config` or `None` | ||
Configuration to write to the repository, after setting any | ||
root-dependent Registry or Datastore config options. | ||
expand : `bool` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
expandDefaults
might be better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still like expand
better; I think brevity is more valuable for keyword argument names that regular local variables in terms of overall readability. @timj, want to break the tie? :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd prefer withDefaults
to expandDefaults
. You are asking for a fully qualified standalone configuration to be written, rather than the bare minimum based on what you gave in config
and the root-specific entries. You aren't really expanding as such, you are filling it in to make it explicit. complete
or standalone
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooh, I like standalone
.
python/lsst/daf/butler/butler.py
Outdated
Parameters | ||
---------- | ||
root : `str` | ||
Filesystem path to the root of the new repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it exist? What happens if it doesn't, or if it does but isn't empty (or even already is an existing repo)?
python/lsst/daf/butler/butler.py
Outdated
registryClass = doImport(full["registry.cls"]) | ||
registryClass.setConfigRoot(root, config, full) | ||
if expand: | ||
config.merge(full) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this more costly (and potentially error prone) than conditionally dumping config
or full
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
full
doesn't have entries like datastore.root
and registry.db
at this stage.
python/lsst/daf/butler/butler.py
Outdated
""" | ||
|
||
def __init__(self, config=None): | ||
@staticmethod | ||
def makeRepo(root, config=None, expand=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering whether we should add searchPaths
here to match those I added to the ConfigSubset
constructor (and which I didn't add to ButlerConfig
but should). At the very least it helps testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm actually more worried about flexibility in search paths making it easy to use the wrong search path, and I'm content to leave testing of the search path functionality to lower-level tests for now.
should be copied from `full` to `Config`. | ||
""" | ||
config["registry.db"] = "sqlite:///{}/gen3.sqlite3".format(root) | ||
for key in ("registry.cls",): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do wonder whether we could save some code duplication by having these keys as class attributes and then a routine that reads the keys and copies them (and maybe attempts to run format(root)
on them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps a bit, but I'm content to wait until we see what other concrete Datastore/Registry implementations actually need to do here before worrying about them duplicating code.
81b5910
to
f939ea8
Compare
No description provided.