
DM-11390: Plug prototype pipeline script into verify_ap framework #6

Merged
merged 11 commits into from Sep 22, 2017

Conversation

@mrawls (Collaborator) commented Sep 21, 2017

No description provided.

- Ensure each of the main functions returns just one metadata PropertySet
- Rename "ref_cats" to "refcats" throughout
- Return a dict rather than a bunch of lists from parsePipelineArgs
- Consistently refer to "dataset_root" instead of the ambiguous "dataset"
@kfindeisen (Member) left a comment:

My biggest concern is the number of ignored exceptions; please think about whether pushing forward really is appropriate in all cases. Otherwise a bunch of style comments.


# IN PROGRESS: figure out which of these ALL-CAPS VARIABLES are already known by
# ap_verify and which need to be explicitly provided in a function here in ap_pipe
kfindeisen (Member):

Not sure what "IN PROGRESS" means; I assume it's not a scheme to loophole your way past https://developer.lsst.io/coding/python_style_guide.html#to-do-comments-should-include-a-jira-issue-key?

kfindeisen (Member):

Also, please replace "ALL-CAPS VARIABLES" with "constants".

mrawls (Collaborator, Author):

Whoops, this comment should have been removed once this situation was solved.

Parameters
----------
dataset_root : `str`
The top-level directory containing all pieces of an ap_verify-style dataset.
kfindeisen (Member):

Copy-paste error.

types = ('*.fits', '*.fz')
datafiles = []
allcalibdatafiles = []
kfindeisen (Member):

Try to use readable_variable_names instead of unreadablevariablenames, here and elsewhere.
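A minimal sketch of the suggestion. The helper name `find_data_files` and its signature are hypothetical, not from the PR; only the patterns and the renamed list mirror the snippet above.

```python
import os
from glob import glob


def find_data_files(data_location, patterns=('*.fits', '*.fz')):
    """Collect FITS files under data_location (hypothetical helper).

    Uses readable snake_case names, per the review comment.
    """
    all_calib_data_files = []  # instead of `allcalibdatafiles`
    for pattern in patterns:
        all_calib_data_files.extend(glob(os.path.join(data_location, pattern)))
    return sorted(all_calib_data_files)
```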


def get_defectfiles(defect_location, DEFECT_TARBALL=DEFECT_TARBALL):
kfindeisen (Member):

Function parameters should be lowercase.

# Retrieve defect filenames from tarball
defectloc = os.path.join(args.dataset, DEFECT_DIR)
defect_tarfile_path = glob(os.path.join(defectloc, DEFECT_TARBALL))[0]
defect_tarfile_path = glob(os.path.join(defect_location, DEFECT_TARBALL))[0]
kfindeisen (Member):

What's the point of glob(...)[0]? Do you expect wildcards in the raw path?

mrawls (Collaborator, Author):

This is an idiosyncrasy of tarballs - using glob lists all the files in the tarball, and the zeroth entry is the name of the tarball itself.

kfindeisen (Member):

I meant why can't you just have os.path.join(defect_location, DEFECT_TARBALL), without the call to glob?

mrawls (Collaborator, Author):

Hey what do you know, that works! 🥇
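The resolved version might look like the following sketch. `defect_location` and the tarball constant come from the snippets above; the default filename and the `tarfile` listing of the function body are assumptions about the rest of the code, not the actual implementation.

```python
import os.path
import tarfile


def get_defectfiles(defect_location, defect_tarball='defects.tar.gz'):
    """Return the filenames inside the defect tarball (illustrative sketch)."""
    # The tarball path is fully determined, so no glob() is needed.
    defect_tarfile_path = os.path.join(defect_location, defect_tarball)
    with tarfile.open(defect_tarfile_path) as tarball:
        # Keep only file entries, skipping directory members.
        return [m.name for m in tarball.getmembers() if m.isfile()]
```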

config = ProcessCcdConfig()
config.load(OBS_DECAM_DIR + '/config/processCcd.py')
config.load(OBS_DECAM_DIR + '/config/processCcdCpIsr.py')
kfindeisen (Member):

Prefer os.path.join.
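Applying the suggestion to the snippet above. The config filenames come from the quoted code; the repository root path is purely illustrative.

```python
import os

# Illustrative stand-in for the obs_decam package location.
OBS_DECAM_DIR = '/opt/lsst/obs_decam'

# os.path.join avoids manual separator handling and stray slashes.
process_ccd_config = os.path.join(OBS_DECAM_DIR, 'config', 'processCcd.py')
cp_isr_config = os.path.join(OBS_DECAM_DIR, 'config', 'processCcdCpIsr.py')
```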

@@ -527,27 +604,73 @@ def doDiffIm(processed_repo, sciencevisit, ccdnum, templatevisit, diffim_repo):
config.doDecorrelation = True

TODO: use coadds as templates by default, not another visit (DM-11422).
Some of the comments in this function are placeholders for DM-11422 work.
kfindeisen (Member):

This line should probably not be in the documentation, since it's a statement about the source code rather than the behavior.

and catalogs of detected sources (diaSrc, diffexp, and metadata files)
'''
if os.path.exists(os.path.join(diffim_repo, 'deepDiff', 'v' + sciencevisit)):
print('DiffIm has already been run for visit {0}, skipping...'.format(sciencevisit))
lsst.log.configure()
kfindeisen (Member):

Redundant initialization

if 'visit' not in dataId_dict.keys():
raise RuntimeError('The dataId string is missing \'visit\'')
else: # save the visit number from the dataId
visit = dataId_dict['visit']
kfindeisen (Member):

Duplicate code with doProcessCcd, consider factoring.

mrawls (Collaborator, Author):

Point taken, but the duplication is minimal and dataId handling is less than ideal all around. Leaving it as-is for now.
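The suggested factoring could look like this. The helper name `parse_data_id` is hypothetical, and the `key=value` space-separated format is inferred from the snippets in this PR.

```python
def parse_data_id(data_id):
    """Split a dataId string like 'visit=410985 ccdnum=25' into a dict.

    Hypothetical shared helper for doProcessCcd and doDiffIm.
    """
    data_id_dict = dict(item.split('=', 1) for item in data_id.split())
    if 'visit' not in data_id_dict:
        raise RuntimeError("The dataId string is missing 'visit'")
    return data_id_dict
```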

config.detection.thresholdValue = 5.0
config.doDecorrelation = True
args = [processed_repo, '--id', 'visit=' + sciencevisit, 'ccdnum=' + ccdnum,
'--templateId', 'visit=' + templatevisit, '--output', diffim_repo]
dataId = dataId.split(' ')
kfindeisen (Member):

Why are you splitting on three characters above but only on spaces here?

mrawls (Collaborator, Author):

Unfortunately, dataId handling has to be different here because difference imaging uses parseAndRun while processing uses run. In the future, both Tasks may use run.

- The ref_cats directory in the ingested image repo must be 'ref_cats',
  not 'refcats' as in the ap_verify_hits2015 dataset and /datasets
- Changes to the obs_decam processCcd config include a hard-wired
  pan-starrs refcat for photometry and astrometry. Therefore:
  - Use run instead of parseAndRun, which requires a Butler but
    allows various configs to be set or overwritten sequentially
  - Pass dataId string as a dict when appropriate
@mrawls mrawls merged commit 3881aef into master Sep 22, 2017
@kfindeisen kfindeisen deleted the tickets/DM-11390 branch April 13, 2022 21:54