New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-29221: Add ApPipe.yaml pipelines to appropriate repos #74
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly style and config questions. The one thing that worries me is the division of DECam processing into two different pipelines; if it's impossible to do those in a single pipeline, that's going to be very hard to reconcile with ap_verify
's goal of being instrument-agnostic.
@@ -0,0 +1,5 @@ | |||
description: ProcessCcd - A set of tasks to run when processing raw images. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize that this is for forward-compatibility with the Great Vision, but the pipe_tasks
pipeline is still being worked on. Might it be better to use the pipe_tasks
file for now, and switch to a local file later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm slow enough to respond that RFC-775 is out now, so I do think this is the right place for it. Two additional reasons: (1) I can't imagine a version of generic gen3 ProcessCcd
that isn't just these three steps written down, so it's not like there's complicated stuff to remember to copy over, and (2) I don't want to import things called "DRP" for use by AP if we can help it.
pipelines/ApPipe.yaml
Outdated
diaPipe: | ||
class: lsst.ap.association.DiaPipelineTask | ||
config: | ||
# Remember to run make_apdb.py with the same isolation_level and db_url as here | ||
apdb.isolation_level: 'READ_UNCOMMITTED' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know I did this in ap_verify.yaml
, but I think that was a mistake, since it caused us to use uncommitted reads even when safer options were available (e.g., PostgreSQL). I now think the entire APDB configuration should be left up to the user, with no defaults.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting - I added it here because ap_association
fails if you don't specify READ_UNCOMMITTED
(I think that's true for both DB types), and I want the pipeline to work. I routinely set this config for both sqlite and postgres.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, that's surprising -- I was definitely under the impression that this setting was only required for SQLite. It's certainly only enforced for SQLite...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I think you're right it's only required for sqlite. I guess I can make the isolation_level
a comment (like the connection_timeout
) instead of an actual config, but I worry about decisions like this leading to more people messaging me when they try to run the AP Pipeline with default settings and it doesn't "just work." I wish we had a "diaPipeUsesWhateverConfigsMakeApdbUsed=True" button I could just turn on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, maybe we should revisit making the APDB location part of the task config. Configuring something that is always workspace- or workflow-dependent is not what either CmdLineTask
or PipelineTask
were designed for. Though I don't know what else we can do, especially in Gen 3 where the task inputs are strictly controlled...
characterizeImage: | ||
class: lsst.pipe.tasks.characterizeImage.CharacterizeImageTask | ||
config: | ||
refObjLoader.ref_dataset_name: 'panstarrs' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these refcat overrides needed for DECam, specifically? It seems to me like it should depend on what repo you're processing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both DECam ISR pipelines override outputExposure
to "postISRCCD"
. Does there need to be a similar input override?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There isn't anything DECam in /repo/main
, so I can't say for sure yet. This is what I had to do to get it to work with my gen3 repo in /project/mrawls/hits2015-3
. How does ap_verify
handle gen3 refcats with the HiTS datasets?
Good catch re. the outputExposure
setting, that appears to be an extraneously specified default, so I'll remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For ap_verify
, the overrides are applied as part of dataset-specific configs. They're not included in the pipeline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I guess I'm frustrated there is no good place to set sensible defaults to prepare for a reality with no ap_verify-style datasets.
I took a look at ap_verify_ci_hits2015
and I do see the appropriate settings in config/calibrate.py
to panstarrs
and gaia
. It appears to use the obs_decam default for characterizeImage, which is ps1_pv3_3pi_20170110
, but only for Gen2 🤯 I don't know how the characterizeImage ref_dataset_name is set in Gen3.
Bottom line, I would like users to be able to run this pipeline out of the box with whatever the refcat names in /repo/main
for panstarrs and gaia wind up being, which will likely be panstarrs
and gaia
, for photometry and astrometry respectively. Plus the mappings to phot_g_mean
are basically always correct. Suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The characterizeImage
thing sounds like a hidden bug. Maybe it's not being used? I'm pretty sure that refcat isn't available in our Jenkins environment.
As for what to do, the ps1_pv3_3pi_20170110
defaults were motivated by those datasets being set up on lsst-dev
, but I'm guessing we can't change those while Gen 2 is still in use. I think the RFC-775-ish solution is to create a pipeline specialized for /repo/main
?
Though now I'm wondering why you didn't have to do this for HSC...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are good defaults for HSC refcats assuming everything is panstarrs somewhere (in obs_subaru) that obs_decam lacks. When the initial HSC gen3 repo was set up, this was worked out rather early on via the refcats
collection so images could be processed at all. Everything I've done with DECam has been much more ad hoc, because as in many situations, certain refcat defaults were hardcoded into obs_subaru and never thought about for obs_decam.
I guess for now, I will make a ApPipe_hits2015-3
with these configs that imports all the rest of everything to go with my repo by the same name. And later we can add one for /repo/main
which will be fun to come up with a name for!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All I can find is https://github.com/lsst/obs_subaru/blob/master/config/calibrate.py, which sets ps1_pv3_3pi_20170110
explicitly. 😕
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps they just stick with that name instead of calling it a more succinct "panstarrs." We may have to follow suit for DECam in the shared repo, time will tell.
eedada9
to
cd56819
Compare
I think I addressed all your comments @kfindeisen, can you please take another look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
description: Run IsrTask for DECam with only intra-chip crosstalk. Inter-chip crosstalk needs pre-prepared crosstalkSources. | ||
instrument: lsst.obs.decam.DarkEnergyCamera | ||
tasks: | ||
isrOscan: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why isrOscan
and not isr
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Originated with Chris W, but I like it because it distinguishes this "do overscan correction to prepare the crosstalk sources, which technically means running a mini IsrTask" ISR from "real" ISR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused. Isn't this the old workflow, where you just have "real" ISR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh sorry, I got them mixed up, even with the extremely explicit naming. Yeah, I think we can just call it isr.
Specify bias and flat data types in DECam ISR
cd56819
to
9e9f1e9
Compare
I moved comments about diaPipe configs to the "main" |
Also add new default doScaleDiffimVariance setting, default diaPipe configs, and some clarifying comments.
9e9f1e9
to
065755e
Compare
This PR adds a top-level
ApPipe.yaml
toap_pipe
. It imports from a new standaloneProcessCcd.yaml
in the same location. In addition, there are two new subdirectories for DECam and HSC. These contain camera-specific pipelines and configurations which import from the mainApPipe.yaml
.