New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-14765: Allow validateDrp to run w/o instrument, dataset_repo_url #82
Conversation
Same functionality remains to pass 'instrument' and 'dataset_repo_url' as used in the SQuaSH framework. But if these aren't specified, the Job is still created, the metrics still run and reasonable default values are stored.
Was accidentally passing a non-existing "butler" variable instead of the "repo" filehname string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I have a couple of comments.
""" | ||
mapper_class = Butler.getMapperClass(repo) | ||
instrument = mapper_class.split('.')[2] | ||
return instrument |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For what it's worth, we have (not uniformly unfortunately) been using all upper case for instrument names. Maybe you should return instrument.upper()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to keep with the simpler beahvior.
I'm not trying to replicate the SQuaSH behavior.
For use with SQuaSH, the caller can specify the instrument name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't understand that 'instrument' was required now because the 'verify_metrics' are now only specified for defined instruments. And thus 'instrument' must be something that matches to something specified in a metrics YAML file. That's why it matters that it's upper-case.
python/lsst/validate/drp/validate.py
Outdated
if instrument is None: | ||
instrument = extract_instrument_from_repo(butler) | ||
if dataset_repo_url is None: | ||
dataset_repo_url = repo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure the full path to the repo is very helpful. It seems more useful to just truncate on the last part of the path, or to try to discover the actual repository name by asking git
or by inspecting .git/config
in the repo. I think this would be easy with gitpython
, but not sure if we want that dependence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shoudn't assume that the data are being read from a repository that is a git repository.
- People should be able to run this on data that aren't tracked in git.
- In fact, packages installed by lsstsw don't actually have the .git included.
If the caller wants it to be based on the git information they can do the lookup and pass to the script.
In the absence of having specified something, the full path is more informative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with this.
python/lsst/validate/drp/validate.py
Outdated
if instrument is None: | ||
instrument = extract_instrument_from_repo(butler) | ||
if dataset_repo_url is None: | ||
dataset_repo_url = repo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good enough.
Guess reasonable 'instrument', 'dataset_repo_url' if not passed.
Same functionality remains to pass 'instrument' and 'dataset_repo_url'
as used in the SQuaSH framework.
But if these aren't specified, the Job is still created, the metrics still run
and reasonable default values are stored.