Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-11021: Generalize prototype pipeline script #4

Merged
merged 7 commits into from Aug 11, 2017
Merged

Conversation

mrawls
Copy link
Collaborator

@mrawls mrawls commented Jul 28, 2017

No description provided.

Defect ingestion does not work presently because the defects live
in a tar.gz file. Once this is fixed, the pipeline should work.
Additional outstanding issues are noted with TODO in the code.
Gaia for astrometry, pan-starrs for photometry
These changes effectively complete DM-10975
decam_process.py Outdated
datafiles.extend(glob(os.path.join(args.dataset, 'raw', files)))
allcalibdatafiles.extend(glob(os.path.join(args.dataset, 'calib', files)))

# Ignore wtmaps and illumcors
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might consider breaking up main() into several functions, e.g., one for argument parsing, one for finding the files, one for the "run everything" bit... Stuff could be divided past that; for example, the code used to create datafiles and allcalibdatafiles could be split into a single function that takes a directory and an extension (or sequence thereof).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wholeheartedly agree.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough! Mucking with main() is a bit lower on my priority list since ap_verify doesn't care about it, but you're both correct that this function is rather lengthy with multiple functionalities. I'll see what I can do.

Copy link

@djreiss djreiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks like a very positive improvement.

decam_process.py Outdated
@@ -15,324 +15,378 @@
import os
import argparse
import textwrap
import tarfile
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

decam_process.py Outdated
'''
Process raw decam images with MasterCals from ingestion --> difference imaging

TODO: Rename script and GitHub repo to ap_pipe (not decam_hits or decam_process).
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Module-level docstring should come above the imports.

decam_process.py Outdated
$ python hits_ingest.py ingestCalibs -f path/to/calibrations/somefiles*.fits.fz
$ python hits_ingest.py processCcd
$ python hits_ingest.py diffIm
$ python decam_process.py -d dataset_root -o output_location -i "visit=12345, ccd=5"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will need to make sure that this line stays sync-ed with the one on L49 - or else get rid of it here...?

decam_process.py Outdated
help="Location on disk of dataset_root, which contains subdirectories of raw data, calibs, etc.")
parser.add_argument('-o', '--output',
help="Location on disk where output repos will live.")
# TODO: implement this argument
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a ticket for this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is part of DM-11422 because using coadds as templates will require --id-like metadata to be handled differently than when a visit is a template. I'll note this explicitly here.

Copy link
Member

@kfindeisen kfindeisen Aug 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

decam_process.py Outdated
ingest - Ingest raw DECam images into a repository with a
corresponding registry.
USAGE:
$ python hits_ingest.py ingest -f path/to/rawimages/
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did all of this extra functionality go away?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Essentially yes; a call to run the pipeline script via main() will run all of the steps rather than just one at a time. This doesn't matter for ap_verify (or any other external code using the pipeline) because it can call whichever function it wants whenever it wants. Is that OK?

decam_process.py Outdated
tarfile.open(os.path.join('../../', defecttarball), 'r').extractall('defects')
defectfiles = glob(os.path.join('defects', os.path.basename(defectfiles[0]), '*.fits'))
defectargs.extend(defectfiles)
defectargumentParser = IngestCalibsArgumentParser(name='ingestCalibs')
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

defectArgumentParser, etc...

decam_process.py Outdated
print('If this is your first time ingesting these calibration files,')
print(' make sure you only use image or wtmap MasterCals, not both.')
print('~~~ !!! ~~~')
flatBiasIngest(repo, calibrepo, calibdatafiles)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe not for this ticket but I think all of this code would benefit from going into a class, instead of having passing around all the different variables from one function to the next.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you be more specific about the benefits? You made a similar argument for ap_verify and I'm still not sure it was the right thing to do; in particular, explicit function arguments make it much easier to see (and control) each task's data dependencies.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to discuss with both of you further, but "not for this ticket" indeed!

refObjLoader.filterMap = {"g": "g",
"r": "r",
"VR": "g"}
** to run from bash, 'processccd_config.py' must exist and contain
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the equivalent of using the processccd_config.py file for this script? Is it hardcoded somewhere, or not used?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, I see it below. Might be good to enumerate the default values that are set in the config (e.g. using gaia for astrometric reference, etc.)

decam_process.py Outdated
'calibrate.doPhotoCal=True']
ProcessCcdTask.parseAndRun(args=args, config=config)
if os.path.isdir(os.path.join(processedrepo, '0'+visit)):
print('ProcessCcd has already been run for visit {0}, skipping...'.format(visit))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, maybe add a return statement here and move everything else out of the else block.

decam_process.py Outdated

Parameters
----------
processedrepo: `str`
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I first read this as proccess drepo. I like camel-case or underscore (I don't care which but I think the style guide has us moving toward using underscore, e.g. processed_repo.

@mrawls
Copy link
Collaborator Author

mrawls commented Aug 10, 2017

Thanks for the thorough review! I think the only outstanding items are (1) adding tests and (2) possibly changing how defect file/tarball info is passed into doIngestCalibs. I would like to close this ticket so I can rename/reorganize the repo to finally be ap_pipe before making many more changes to the code. I propose handling these outstanding items as part of DM-11390 if that is OK with you.

Copy link

@djreiss djreiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All looks great. Just 2 minor comments.

decam_process.py Outdated
@@ -207,158 +246,307 @@ def doIngest(repo, datafiles):
config = IngestConfig()
config.parse.retarget(DecamParseTask)
# create an instance of the decam ingest task
ingestTask = ingest.DecamIngestTask(config=config)
IngestTask = ingest.DecamIngestTask(config=config)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason you changed this to capitalized? Capitalized names are for classes, whereas this is a class instance?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably because the story of my life is being confused about what is a class and what is an instance... I'll change it back :)

decam_process.py Outdated
argumentParser = IngestCalibsArgumentParser(name='ingestCalibs')
config = IngestCalibsConfig()
config.parse.retarget(ingestCalibs.DecamCalibsParseTask)
ingestTask = IngestCalibsTask(config=config, name='ingestCalibs')
CalibIngestTask = IngestCalibsTask(config=config, name='ingestCalibs')
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above.

os.chdir(calib_repo)
try:
os.mkdir('defects')
except OSError:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the change over to try/except/else!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind, it's always OSError. 😲

You'll probably brand this as "not this ticket", but I don't think it's safe to assume a pre-existing directory is the only reason os.mkdir raises -- it could also be lack of permissions, problems with the filesystem itself, etc...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point @kfindeisen, I will make the statement that prints with that error mention the fact that the exception is likely (but not necessarily) happening because the defect directory already exists.

The defects are now copied and extracted from the input dataset into
the calibrepo, much in the same way ref_cats are extracted into repo.
This is done during ingestion instead of during processing.
@mrawls mrawls merged commit bc8f42c into master Aug 11, 2017
@kfindeisen kfindeisen deleted the tickets/DM-11021 branch April 13, 2022 21:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants