
DM-43231: Create Analysis Tool to send visitSummary info to sasquatch #221

Merged
merged 3 commits into from Mar 21, 2024

Conversation

jrmullaney (Contributor):

This ticket involves changes to two repositories: this repository (analysis_tools) and pipe_tasks. Here, I'm adding a new analysis tool task and atool to extract summary statistics from a calexp's metadata and write them to sasquatch.

@jrmullaney jrmullaney force-pushed the tickets/DM-43231 branch 3 times, most recently from a989e0f to 2b40930 Compare March 17, 2024 21:53
@ctslater (Member) left a comment:

One small comment.

Member:
I'm not sure there's a need for this pipeline? Hopefully if we're dispatching during step1 then we won't need to run dispatch separately, and it's a little confusing how this relates to visitQualityCore.

jrmullaney (Author):

I'm inclined to disagree: I feel that there is a need. This is in case someone wishes to create the metrics using a standalone analysis tool (i.e., the usual way we use analysis tools). For example, something gets flagged by the Step1 run, and someone then wants to reproduce the "weirdness" without running the calibrate task.

I did think about putting this in visitQualityCore, but that seems to aggregate/plot data for a whole visit (i.e., across all detectors), whereas this is for a single detector.

Having said all that, this doesn't affect its use for OR3, so I can drop it if needed and we can come back to it later.

Contributor:

I too am a little concerned that this pipeline hanging out there will confuse people as to why it's not included and run. At minimum, I think that means it needs a lot of documentation added to it. I don't really think it is needed, and it can be deleted. It is no real extra work to do pipetask run -b ... -p $ANALYSIS_TOOLS_DIR/pipelines/calexpQualityCore.yaml ... vs pipetask run -b ... -t lsst.analysis.tools.tasks.CalexpSummaryAnalysisTask ... if someone wants to run a one-off to regenerate something for investigation. And if they are doing an investigation, it might make sense for them to use the tool directly through the terminal/notebook interface anyway.

Collaborator:

I'm going to be adding some pipelines soon that won't be run as standard. We need some pipelines that are not run in regular processing but are ready to go if something goes wrong, for example pipelines triggered if metrics cross a threshold. We can improve the docs and maybe add a README in the pipelines directory?

jrmullaney (Author):

Can we leave it for now? I'd like to get this merged by tomorrow, and this isn't critical for that.

Contributor:

If you want to get this merged fast, I would say delete this from this ticket, but don't delete it completely; talk with Sophie about longer-term plans after this ticket. The problem is that once it exists, you have to worry about legacy users of it as an interface if you want to move or drop it later. Pipetask does have the -t interface, so it is not like someone can't run it standalone, and I don't like the idea of a pipeline existing, and named this specifically, just to save someone typing out the class on the command line. I do support what Sophie is talking about, as those pipelines are intended for specific types of followups (i.e., photometry), not as specific as a pipeline for one task.

Collaborator:

The "Core" in the name implies it should be run every time. Let's leave it out and then have a chat about future pipelines. I could use a rubber duck if nothing else.


python/lsst/analysis/tools/actions/scalar/scalarActions.py (outdated, resolved)
python/lsst/analysis/tools/atools/calexpMetrics.py (outdated, resolved)
AnalysisBaseConnections,
dimensions=("visit", "band", "detector"),
defaultTemplates={"inputName": "calexp", "outputName": "calexpSummary"},
):
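The excerpt above declares a connections class with dimensions and default name templates. As a rough illustration of how such {placeholder} templates typically expand into concrete dataset type names, here is a toy mimic; the real mechanism lives in lsst.pipe.base and differs in detail, and the "_metrics" suffix below is invented for the example:

```python
# Toy illustration of connection name templating; NOT the lsst.pipe.base API.

def expand_template(template: str, templates: dict) -> str:
    """Substitute {placeholders} in a connection name template."""
    return template.format(**templates)

default_templates = {"inputName": "calexp", "outputName": "calexpSummary"}

# A connection declared with name="{outputName}_metrics" would resolve to
# "calexpSummary_metrics" under the templates above.
resolved = expand_template("{outputName}_metrics", default_templates)
```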
Contributor:

I am not sure this is what you want. In the calibrate task, I think you can just write out the ExposureSummaryStats in a connection and then read that in here; that would be way faster and cheaper than loading the whole exposure. There is already a storage class type for it in the butler: https://github.com/lsst/daf_butler/blob/main/python/lsst/daf/butler/configs/storageClasses.yaml#L33C3-L33C23 . This will need a change on the pipe_base side of the ticket.
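The component read being suggested here (fetching only the summary statistics rather than the whole exposure) can be pictured with a toy model. The store layout, get() helper, and payload contents below are invented for illustration and are not the butler API:

```python
# Toy model of butler component reads: a composite dataset can be stored
# disassembled, so a small component like "summaryStats" is fetched
# without touching the (large) image payload. Names are illustrative.

STORE = {
    "calexp/image": bytes(4_000_000),          # large pixel payload
    "calexp/summaryStats": {"psfSigma": 1.7},  # small summary payload
}

def get(dataset: str):
    """Fetch a full dataset ("calexp") or a component ("calexp.summaryStats")."""
    if "." in dataset:
        name, component = dataset.split(".", 1)
        return STORE[f"{name}/{component}"]
    return {k.split("/", 1)[1]: v for k, v in STORE.items()
            if k.startswith(dataset + "/")}

stats = get("calexp.summaryStats")  # only the small payload is read
```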

Contributor:

BUT if we don't want to write the summary stats out for some good reason, this here looks fine.

Contributor:

I am fine, for time reasons, if you want to skip this for this ticket.

jrmullaney (Author):

I can't see where in calibrate.py ExposureSummaryStats are being written, so while I agree it would be much cheaper to just load the stats, it doesn't seem to be possible at this stage. Maybe I've missed something, though.

Contributor:

Actually, is there a reason to load it like this at all? It makes sense if you wanted to stream them out as you go, but if you are all the way in analysis_tools running a pipeline for follow-up, why would you not just get it from the summary table that has the whole visit in it?

jrmullaney (Author):

I agree that one may want to do just as you suggest. However, I'm trying to look at it from the perspective of someone who isn't very familiar with analysis_tools: if calexpSummaryAnalysis is retargeted in the yaml file, then it's immediately clear that they can use that analysis tool task to replicate the metrics.

Contributor:

Can you try changing your connection to calexp.summaryStats? Talking to Eli, I think this is not possible with component disassembly.

jrmullaney (Author):

calexp.summaryStats works, so I can change it to that (if that's what you mean).

jrmullaney (Author):

I've changed it to calexp.summaryStats.


inputs = butlerQC.get(inputRefs)

summary = inputs["data"].getInfo().getSummaryStats().__dict__
Contributor:

If you change the input connection, you will only need the __dict__ here.

jrmullaney (Author):

See above.
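The simplification discussed in this thread can be sketched with a stand-in class. SummaryStats below is a hypothetical mimic of ExposureSummaryStats (the real class lives in the LSST stack and has many more fields); it only illustrates why the component connection reduces the extraction to a __dict__ access:

```python
from dataclasses import dataclass

# Hypothetical stand-in for ExposureSummaryStats, for illustration only.
@dataclass
class SummaryStats:
    psfSigma: float
    skyBg: float

# With the full `calexp` connection, the task had to dig the stats out:
#     summary = inputs["data"].getInfo().getSummaryStats().__dict__
# With the `calexp.summaryStats` component connection, the input *is* the
# stats object, so only __dict__ (equivalently, vars()) is needed:
stats = SummaryStats(psfSigma=1.7, skyBg=250.0)
summary = vars(stats)  # {'psfSigma': 1.7, 'skyBg': 250.0}
```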

@jrmullaney jrmullaney force-pushed the tickets/DM-43231 branch 8 times, most recently from f57b13f to b1c56ae Compare March 21, 2024 05:39
natelust and others added 3 commits March 21, 2024 08:53

- Add the ability for an analysis tool to specify if the input data should be passed through each stage in the tool.
- Extract stats from a calexp's summaryStats metadata, utilizing propagateData = True to extract directly from keyedData.
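The idea in the first commit (passing the input data through each stage when propagateData is set) can be sketched as follows. The function and option names here are illustrative stand-ins, not the real analysis_tools API:

```python
# Toy sketch: with propagate_data=True, every stage of a tool receives
# the original keyed data; otherwise each stage only sees the previous
# stage's output. Names are invented for illustration.

def run_tool(stages, data: dict, propagate_data: bool = False) -> dict:
    result = data
    for stage in stages:
        result = stage(data if propagate_data else result)
    return result

def select(data: dict) -> dict:
    # Example "process" stage: keep only numeric summary values.
    return {k: v for k, v in data.items() if isinstance(v, (int, float))}

def count(data: dict) -> dict:
    # Example "metric" stage: reports how many keys it was handed.
    return {"nKeys": len(data)}

data = {"psfSigma": 1.7, "skyBg": 250.0, "note": "ok"}
with_prop = run_tool([select, count], data, propagate_data=True)   # {'nKeys': 3}
without = run_tool([select, count], data, propagate_data=False)    # {'nKeys': 2}
```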
@jrmullaney jrmullaney merged commit 65672ba into main Mar 21, 2024
8 checks passed
@jrmullaney jrmullaney deleted the tickets/DM-43231 branch March 21, 2024 15:58