New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-11390: Plug prototype pipeline script into verify_ap framework #6
Conversation
-Ensure each of the main functions returns just one metadata PropertySet -Rename "ref_cats" to "refcats" throughout -Return a dict rather than a bunch of lists from parsePipelineArgs -Consistently refer to "dataset_root" instead of ambiguous "dataset"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My biggest concern is the number of ignored exceptions; please think about whether pushing forward really is appropriate in all cases. Otherwise a bunch of style comments.
python/lsst/ap/pipe/ap_pipe.py
Outdated
|
||
# IN PROGRESS: figure out which of these ALL-CAPS VARIABLES are already known by | ||
# ap_verify and which need to be explicitly provided in a function here in ap_pipe |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what "IN PROGRESS" means; I assume it's not a scheme to loophole your way past https://developer.lsst.io/coding/python_style_guide.html#to-do-comments-should-include-a-jira-issue-key?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, please replace "ALL-CAPS VARIABLES" with "constants".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops, this comment should have been removed once this situation was solved.
python/lsst/ap/pipe/ap_pipe.py
Outdated
Parameters | ||
---------- | ||
dataset_root: `str` | ||
The top-level directory containing all pieces of an ap_verify-style dataset. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copy-paste error.
python/lsst/ap/pipe/ap_pipe.py
Outdated
types = ('*.fits', '*.fz') | ||
datafiles = [] | ||
allcalibdatafiles = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try to use readable_variable_names
instead of unreadablevariablenames
, here and elsewhere.
python/lsst/ap/pipe/ap_pipe.py
Outdated
|
||
def get_defectfiles(defect_location, DEFECT_TARBALL=DEFECT_TARBALL): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Function parameters should be lowercase.
python/lsst/ap/pipe/ap_pipe.py
Outdated
# Retrieve defect filenames from tarball | ||
defectloc = os.path.join(args.dataset, DEFECT_DIR) | ||
defect_tarfile_path = glob(os.path.join(defectloc, DEFECT_TARBALL))[0] | ||
defect_tarfile_path = glob(os.path.join(defect_location, DEFECT_TARBALL))[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the point of glob(...)[0]
? Do you expect wildcards in the raw path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an idiosyncrasy of tarballs - using glob lists all the files in the tarball, and the zeroth entry is the name of the tarball itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant why can't you just have os.path.join(defect_location, DEFECT_TARBALL)
, without the call to glob
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey what do you know, that works! 🥇
python/lsst/ap/pipe/ap_pipe.py
Outdated
config = ProcessCcdConfig() | ||
config.load(OBS_DECAM_DIR + '/config/processCcd.py') | ||
config.load(OBS_DECAM_DIR + '/config/processCcdCpIsr.py') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer os.path.join
.
python/lsst/ap/pipe/ap_pipe.py
Outdated
@@ -527,27 +604,73 @@ def doDiffIm(processed_repo, sciencevisit, ccdnum, templatevisit, diffim_repo): | |||
config.doDecorrelation = True | |||
|
|||
TODO: use coadds as templates by default, not another visit (DM-11422). | |||
Some of the comments in this function are placeholders for DM-11422 work. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line should probably not be in the documentation, since it's a statement about the source code rather than the behavior.
python/lsst/ap/pipe/ap_pipe.py
Outdated
and catalogs of detected sources (diaSrc, diffexp, and metadata files) | ||
''' | ||
if os.path.exists(os.path.join(diffim_repo, 'deepDiff', 'v' + sciencevisit)): | ||
print('DiffIm has already been run for visit {0}, skipping...'.format(sciencevisit)) | ||
lsst.log.configure() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Redundant initialization
if 'visit' not in dataId_dict.keys(): | ||
raise RuntimeError('The dataId string is missing \'visit\'') | ||
else: # save the visit number from the dataId | ||
visit = dataId_dict['visit'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicate code with doProcessCcd
, consider factoring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Point taken, but the duplication is minimal and dataId
handling is less than ideal all around. Leaving it as-is for now.
config.detection.thresholdValue = 5.0 | ||
config.doDecorrelation = True | ||
args = [processed_repo, '--id', 'visit=' + sciencevisit, 'ccdnum=' + ccdnum, | ||
'--templateId', 'visit=' + templatevisit, '--output', diffim_repo] | ||
dataId = dataId.split(' ') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you splitting on three characters above but only on spaces here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, dataId
handling has to be different here because difference imaging uses parseAndRun
while processing uses run
. In the future, both Tasks may use run
.
- The ref_cats directory in the ingested image repo must be 'ref_cats', not 'refcats' as in the ap_verify_hits2015 dataset and /datasets - Changes to the obs_decam processCcd config include a hard-wired pan-starrs refcat for photometry and astrometry. Therefore: - Use run instead of parseAndRun, which requires a Butler but allows various configs to be set or overwritten sequentially - Pass dataId string as a dict when appropriate
d90d0d5
to
3881aef
Compare
No description provided.