DM-43231: Create Analysis Tool to send visitSummary info to sasquatch #221
Conversation
Force-pushed from a989e0f to 2b40930.
One small comment.
pipelines/calexpQualityCore.yaml
Outdated
I'm not sure there's a need for this pipeline? Hopefully, if we're dispatching during step1, then we won't need to run dispatch separately, and it's a little confusing how this relates to visitQualityCore.
I'm inclined to disagree: I feel that there is a need. This is in case someone wishes to create the metrics using a standalone analysis tool (i.e., the usual way we use analysis tools). For example, something gets flagged by the step1 running, so someone then wants to reproduce the "weirdness" without running the calibrate task.
I did think about putting this in visitQualityCore, but that seems to aggregate/plot data for a whole visit (i.e., across all detectors), whereas this is for a single detector.
Having said all that, this doesn't affect its use for OR3, so I can drop it if needed and we can come back to it later.
I too am a little concerned that this pipeline hanging out there will confuse people as to why it's not included and run. At minimum, I think that means it needs a lot of documentation added to it. I don't really think it is needed, and it can be deleted. It is no real extra work to do pipetask run -b ... -p $ANALYSIS_TOOLS_DIR/pipelines/calexpQualityCore.yaml ...
vs pipetask run -b ... -t lsst.analysis.tools.tasks.CalexpSummaryAnalysisTask ...
if someone wants to run a one-off to regenerate something for investigation. And if they are doing an investigation, it might make sense for them to use the tool directly through the terminal/notebook interface anyway.
I'm going to be adding some pipelines soon that won't be run as standard. We need some pipelines that are not run in processing but are ready to go if something goes wrong, for example, to be triggered if metrics cross a threshold. We can improve the docs and maybe add a README in the pipelines directory?
Can we leave it for now? I'd like to get this merged by tomorrow, and this isn't critical for that.
If you want to get this merged fast, I would say delete this from this ticket, but don't delete it completely, and talk with Sophie about longer-term plans after this ticket. The problem is that once it exists, you have to worry about legacy users of this as an interface if you want to move or drop it later. Pipetask does have the -t
interface, so it is not like someone can't run it standalone, and I don't like the idea of a pipeline existing and named as specifically as this just to help someone not type out the class on the command line. I do support what Sophie is talking about, as those are intended for specific types of follow-ups (i.e., photometry), not pipelines as specific as one for a single task.
The "Core" in the name implies it should be run every time. Let's leave it out and then have a chat about future pipelines. I could use a rubber duck if nothing else.
AnalysisBaseConnections,
    dimensions=("visit", "band", "detector"),
    defaultTemplates={"inputName": "calexp", "outputName": "calexpSummary"},
):
I am not sure this is what you want. In the calibrate task, I think you can just write out the ExposureSummaryStats in a connection, and then just read that in here; that would be way faster and cheaper than loading the whole exposure. There is already a storage class type for it in the butler: https://github.com/lsst/daf_butler/blob/main/python/lsst/daf/butler/configs/storageClasses.yaml#L33C3-L33C23 . This will need a change on the pipe_base side of the ticket.
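To illustrate the suggestion, here is a minimal sketch of declaring the summary object as its own output dataset. The `Output` class below is a stand-in for the real connection type (the actual code would use lsst.pipe.base.connectionTypes), and the dataset name and dimensions are hypothetical; only the "ExposureSummaryStats" storage class name comes from the linked daf_butler config.

```python
from dataclasses import dataclass

@dataclass
class Output:
    """Stand-in mock for lsst.pipe.base.connectionTypes.Output."""
    doc: str
    name: str
    storageClass: str
    dimensions: tuple

# Hypothetical connection: write the small summary object out as its
# own dataset so downstream tasks never load the full exposure.
summaryStats = Output(
    doc="Per-detector summary statistics from calibration.",
    name="calexpSummaryStats",            # hypothetical dataset type name
    storageClass="ExposureSummaryStats",  # storage class already in daf_butler
    dimensions=("visit", "band", "detector"),
)
print(summaryStats.storageClass)  # ExposureSummaryStats
```

The point of the design is cost: the stats object is a few hundred bytes, while the full calexp carries pixels, masks, and a PSF model.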
BUT if we don't want to write the summary stats out for some good reason, this here looks fine.
I am fine, for time reasons, if you want to skip this for this ticket.
I can't see where in calibrate.py ExposureSummaryStats are being written, so while I agree it would be much cheaper to just load the stats, it doesn't seem to be possible at this stage. Maybe I've missed something, though.
Actually, is there a reason to load it like this at all? It makes sense if you wanted to stream them out as you go, but if you are all the way in analysis_tools running a pipeline for follow-up, why would you not just get it from the summary table that has the whole visit in it?
I agree that one may want to do just as you suggest. However, I'm trying to look at it from the perspective of someone who isn't very familiar with analysis_tools: if calexpSummaryAnalysis is retargeted in the yaml file, then it's immediately clear that they can use that analysis tool task to replicate the metrics.
Can you try changing your connection to calexp.summaryStats?
Talking to Eli, I think this is not possible with component disassembly.
calexp.summaryStats works, so I can change it to that (if that's what you mean).
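For readers unfamiliar with component access: the dotted name asks the butler for just one component of a composite dataset rather than the whole thing. The sketch below mocks that behavior with a hypothetical `FakeButler`; the real call would go through lsst.daf.butler with a proper data ID.

```python
# Hypothetical mock of butler component access: "calexp.summaryStats"
# fetches only the small summary component, not the full exposure.
class FakeButler:
    def __init__(self, datasets):
        self._datasets = datasets

    def get(self, name, **dataId):
        # Component syntax is "parent.component"; the dataId is ignored
        # in this mock but required by the real butler.
        return self._datasets[name]

butler = FakeButler({
    "calexp": "<full exposure: pixels, masks, PSF, metadata>",
    "calexp.summaryStats": {"psfSigma": 1.7, "skyBg": 250.0},
})

stats = butler.get("calexp.summaryStats", visit=1234, detector=42)
print(stats["psfSigma"])  # 1.7
```

The caveat raised above is that when the repository stores the exposure disassembled into components, reading this particular component may not be supported.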
I've changed it to calexp.summaryStats.
inputs = butlerQC.get(inputRefs)
summary = inputs["data"].getInfo().getSummaryStats().__dict__
If you change the input connection, you will only need the __dict__ here.
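The reason `__dict__` suffices once the connection delivers the stats object directly: the summary stats container is dataclass-like, so its attributes flatten straight into the keyed mapping the analysis tool consumes. A sketch with a hypothetical stand-in class (the real one is lsst.afw.image.ExposureSummaryStats, with many more fields):

```python
from dataclasses import dataclass

# Hypothetical stand-in for ExposureSummaryStats; field names and
# values are illustrative only.
@dataclass
class FakeSummaryStats:
    psfSigma: float = 1.7
    skyBg: float = 250.0
    zeroPoint: float = 31.2

stats = FakeSummaryStats()
summary = stats.__dict__  # plain mapping, ready to use as keyed data
print(sorted(summary))  # ['psfSigma', 'skyBg', 'zeroPoint']
```

With the connection changed, the `getInfo().getSummaryStats()` chain drops out because the task receives `stats` directly instead of the full exposure.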
See above.
Force-pushed from f57b13f to b1c56ae.
Add the ability for an analysis tool to specify if the input data should be passed through each stage in the tool.
changed docstring for stats
Extracts stats from a calexp's summaryStats metadata, utilizing propagateData = True to extract directly from keyedData.
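The propagateData behavior described above can be sketched as follows. This is a minimal illustration under assumed semantics, not the actual analysis_tools API: when propagation is on, every stage receives the original keyed data rather than only the previous stage's output.

```python
# Hypothetical sketch of propagateData: stages normally chain on each
# other's output; with propagation, each stage sees the raw input.
def run_tool(stages, keyed_data, propagate_data=False):
    result = keyed_data
    for stage in stages:
        source = keyed_data if propagate_data else result
        result = stage(source)
    return result

# Illustrative stages over a fake keyed-data mapping:
extract = lambda d: {k: v for k, v in d.items() if k.startswith("psf")}
keep_sky = lambda d: {k: v for k, v in d.items() if k == "skyBg"}

data = {"psfSigma": 1.7, "skyBg": 250.0}
# Without propagation, keep_sky only sees extract's psf-only output:
print(run_tool([extract, keep_sky], data))                       # {}
# With propagation, keep_sky sees the full keyed data again:
print(run_tool([extract, keep_sky], data, propagate_data=True))  # {'skyBg': 250.0}
```

This is why the summary-stats tool can pull values directly from the keyed data at any stage instead of threading them through intermediate results.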
Force-pushed from b1c56ae to 7458332.
This ticket involves changes to two repositories: this repository (analysis_tools) and pipe_tasks. Here, I'm adding a new analysis tool task and atool to extract summary statistics from a calexp's metadata and write them to sasquatch.