DM-43231: Create Analysis Tool to send visitSummary info to sasquatch #221
Conversation
Force-pushed from a989e0f to 2b40930.
One small comment.
pipelines/calexpQualityCore.yaml
Outdated
I'm not sure there's a need for this pipeline? Hopefully, if we're dispatching during step1, then we won't need to run dispatch separately, and it's a little confusing how this relates to visitQualityCore.
I'm inclined to disagree: I feel that there is a need. This is in case someone wishes to create the metrics using a standalone analysis tool (i.e., the usual way we use analysis tools). For example, something gets flagged by the step1 running, so someone then wants to reproduce the "weirdness" without running the calibrate task.
I did think about putting this in visitQualityCore, but that seems to aggregate/plot data for a whole visit (i.e., across all detectors), whereas this is for a single detector.
Having said all that, this doesn't affect its use for OR3, so I can drop it if needed and we can come back to it later.
I too am a little concerned that this pipeline hanging out there will confuse people as to why it's not included and run. At minimum, I think that means it needs a lot of documentation added to it. I don't really think it is needed, and it can be deleted. It is no real extra work to do pipetask run -b ... -p $ANALYSIS_TOOLS_DIR/pipelines/calexpQualityCore.yaml ...
vs pipetask run -b ... -t lsst.analysis.tools.tasks.CalexpSummaryAnalysisTask ...
if someone wants to run a one-off to regenerate something for investigation. And if they are doing an investigation, it might make sense for them to use the tool directly through the terminal/notebook interface anyway.
I'm going to be adding some pipelines soon that won't be run as standard. We need some pipelines that are not run in processing but are ready to go if something goes wrong, for example, to be triggered if metrics cross a threshold. We can improve the docs and maybe add a README in the pipelines directory?
Can we leave it for now? I'd like to get this merged by tomorrow, and this isn't critical for that.
If you want to get this merged fast, I would say delete this from this ticket, but don't delete it completely, and talk with Sophie about longer-term plans after this ticket. The problem is that once it exists, you have to worry about legacy users of this as an interface if you want to move or drop it later. Pipetask does have the -t
interface, so it is not like someone can't run it standalone, and I don't like the idea of a pipeline existing and named as specifically as this just to help someone not type out the class on the command line. I do support what Sophie is talking about, as those are intended for specific types of follow-ups (i.e., photometry), not pipelines as specific as one for a single task.
The "Core" in the name implies it should be run every time. Let's leave it out and then have a chat about future pipelines. I could use a rubber duck if nothing else.
AnalysisBaseConnections,
    dimensions=("visit", "band", "detector"),
    defaultTemplates={"inputName": "calexp", "outputName": "calexpSummary"},
):
I am not sure this is what you want. In the calibrate task, I think you can just write out the ExposureSummaryStats in a connection, and then just read that in here; that would be way faster and cheaper than loading the whole exposure. There is already a storage class type for it in the butler: https://github.com/lsst/daf_butler/blob/main/python/lsst/daf/butler/configs/storageClasses.yaml#L33C3-L33C23 . This will need a change on the pipe_base side of the ticket.
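To illustrate the suggestion, here is a minimal sketch of declaring the summary object as its own output dataset. The `Output` class below is a stand-in for the real connection type (the actual code would use lsst.pipe.base.connectionTypes), and the dataset name and dimensions are hypothetical; only the "ExposureSummaryStats" storage class name comes from the linked daf_butler config.

```python
from dataclasses import dataclass

@dataclass
class Output:
    """Stand-in mock for lsst.pipe.base.connectionTypes.Output."""
    doc: str
    name: str
    storageClass: str
    dimensions: tuple

# Hypothetical connection: write the small summary object out as its
# own dataset so downstream tasks never load the full exposure.
summaryStats = Output(
    doc="Per-detector summary statistics from calibration.",
    name="calexpSummaryStats",            # hypothetical dataset type name
    storageClass="ExposureSummaryStats",  # storage class already in daf_butler
    dimensions=("visit", "band", "detector"),
)
print(summaryStats.storageClass)  # ExposureSummaryStats
```

The point of the design is cost: the stats object is a few hundred bytes, while the full calexp carries pixels, masks, and a PSF model.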
BUT if we don't want to write the summary stats out for some good reason, this here looks fine.
I am fine, for time reasons, if you want to skip this for this ticket.
I can't see where in calibrate.py ExposureSummaryStats are being written, so while I agree it would be much cheaper to just load the stats, it doesn't seem to be possible at this stage. Maybe I've missed something, though.
Actually, is there a reason to load it like this at all? It makes sense if you wanted to stream them out as you go, but if you are all the way in analysis_tools running a pipeline for follow-up, why would you not just get it from the summary table that has the whole visit in it?
I agree that one may want to do just as you suggest. However, I'm trying to look at it from the perspective of someone who isn't very familiar with analysis_tools: if calexpSummaryAnalysis is retargeted in the yaml file, then it's immediately clear that they can use that analysis tool task to replicate the metrics.
Can you try changing your connection to calexp.summaryStats?
Talking to Eli, I think this is not possible with component disassembly.
calexp.summaryStats works, so I can change it to that (if that's what you mean).
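For readers unfamiliar with component access: the dotted name asks the butler for just one component of a composite dataset rather than the whole thing. The sketch below mocks that behavior with a hypothetical `FakeButler`; the real call would go through lsst.daf.butler with a proper data ID.

```python
# Hypothetical mock of butler component access: "calexp.summaryStats"
# fetches only the small summary component, not the full exposure.
class FakeButler:
    def __init__(self, datasets):
        self._datasets = datasets

    def get(self, name, **dataId):
        # Component syntax is "parent.component"; the dataId is ignored
        # in this mock but required by the real butler.
        return self._datasets[name]

butler = FakeButler({
    "calexp": "<full exposure: pixels, masks, PSF, metadata>",
    "calexp.summaryStats": {"psfSigma": 1.7, "skyBg": 250.0},
})

stats = butler.get("calexp.summaryStats", visit=1234, detector=42)
print(stats["psfSigma"])  # 1.7
```

The caveat raised above is that when the repository stores the exposure disassembled into components, reading this particular component may not be supported.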
I've changed it to calexp.summaryStats.
inputs = butlerQC.get(inputRefs)
summary = inputs["data"].getInfo().getSummaryStats().__dict__
If you change the input connection, you will only need the __dict__ here.
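The reason `__dict__` suffices once the connection delivers the stats object directly: the summary stats container is dataclass-like, so its attributes flatten straight into the keyed mapping the analysis tool consumes. A sketch with a hypothetical stand-in class (the real one is lsst.afw.image.ExposureSummaryStats, with many more fields):

```python
from dataclasses import dataclass

# Hypothetical stand-in for ExposureSummaryStats; field names and
# values are illustrative only.
@dataclass
class FakeSummaryStats:
    psfSigma: float = 1.7
    skyBg: float = 250.0
    zeroPoint: float = 31.2

stats = FakeSummaryStats()
summary = stats.__dict__  # plain mapping, ready to use as keyed data
print(sorted(summary))  # ['psfSigma', 'skyBg', 'zeroPoint']
```

With the connection changed, the `getInfo().getSummaryStats()` chain drops out because the task receives `stats` directly instead of the full exposure.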
See above.
Force-pushed from f57b13f to b1c56ae.
Add the ability for an analysis tool to specify if the input data should be passed through each stage in the tool.
changed docstring for stats
Extracts stats from a calexp's summaryStats metadata, utilizing propagateData = True to extract directly from keyedData.
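The propagateData behavior described above can be sketched as follows. This is a minimal illustration under assumed semantics, not the actual analysis_tools API: when propagation is on, every stage receives the original keyed data rather than only the previous stage's output.

```python
# Hypothetical sketch of propagateData: stages normally chain on each
# other's output; with propagation, each stage sees the raw input.
def run_tool(stages, keyed_data, propagate_data=False):
    result = keyed_data
    for stage in stages:
        source = keyed_data if propagate_data else result
        result = stage(source)
    return result

# Illustrative stages over a fake keyed-data mapping:
extract = lambda d: {k: v for k, v in d.items() if k.startswith("psf")}
keep_sky = lambda d: {k: v for k, v in d.items() if k == "skyBg"}

data = {"psfSigma": 1.7, "skyBg": 250.0}
# Without propagation, keep_sky only sees extract's psf-only output:
print(run_tool([extract, keep_sky], data))                       # {}
# With propagation, keep_sky sees the full keyed data again:
print(run_tool([extract, keep_sky], data, propagate_data=True))  # {'skyBg': 250.0}
```

This is why the summary-stats tool can pull values directly from the keyed data at any stage instead of threading them through intermediate results.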
Force-pushed from b1c56ae to 7458332.
This ticket involves changes to two repositories: this repository (analysis_tools) and pipe_tasks. Here, I'm adding a new analysis tool task and atool to extract summary statistics from a calexp's metadata and write them to sasquatch.