New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tickets/dm-33992 Support generating Pipeline to include in DRP pipe #20
Conversation
5c625ee
to
25a691b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Main comment here is that I don't see any separation of visit-level vs. coadd-level vs. survery-level plots. Based on the conversation yesterday, I expected to see three executions of the builder in the SConscript, each reading in and producing a different pipeline, with each of those defining a different labeled subset for all of their pipelines. And then pipelines in drp_pipe would include those three machine-generated pipelines directly.
I think you could achieve the same result in terms of subset definitions using the pattern-matching functionality you've added here, while running the builder only once, given @sr525 's statement about plot names always having suffixes like "_visit" or "_coadd", and maybe I misunderstood the plan from the meeting about how to achieve having those subsets. But I think we need to make them somehow.
I'm going to point the Slack channel at this comment to make sure I've understood the plan for splitting up plots correctly.
pipelines/generic_plots.in
Outdated
- "$ANALYSIS_DRP_DIR/pipelines/visitQAEllipPlots.yaml" | ||
- "$ANALYSIS_DRP_DIR/pipelines/visitQAPlots.yaml" | ||
|
||
description: "A pipeline including all analysis drp plots" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add newline at end of file.
pipelines/SConscript
Outdated
|
||
targets.setdefault( | ||
"pipelines", | ||
[]).extend( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is pretty unusual formatting; I'd put a newline between ]
and )
with no indent, but that's mostly just a guess at what black would do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is what black outputted for me, I will run it again once all changes are done.
self.parser = argparse.ArgumentParser() | ||
self.parser.add_argument("pipeline", help="input pipeline to process") | ||
|
||
def getGroups(self) -> Mapping[str, tuple[Optional[str], Iterable[str]]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is about defining subsets, maybe just call it "subsets" rather than "groups"?
def run(self, args: Sequence[str]) -> None: | ||
"""Main entry point for actually building a pipeline. This is | ||
responsible for parsing the supplied args (likely coming from the | ||
command line, and then calling the build method. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
command line, and then calling the build method. | |
command line), and then calling the build method. |
And whenever talking about args from the command-line, I think it's important to clarify whether it expects sys.argv
or sys.argv[1:]
.
|
||
Subsets are added to the output pipeline that contain all the plots | ||
(except plots contained in the exclude set), and any other subsets | ||
defined within the builders ``getGroups`` method. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
defined within the builders ``getGroups`` method. | |
defined within the builder's ``getGroups`` method. |
ValueError | ||
Raised if a subset is already declared with a name returned from | ||
the ``getGroups`` method. | ||
Raise if a subset is alreday declared with a name that conflicts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Raise if a subset is alreday declared with a name that conflicts | |
Raised if a subset is already declared with a name that conflicts |
pipelines/generic_plots.in
Outdated
- "$ANALYSIS_DRP_DIR/pipelines/makeQaTractTables.yaml" | ||
- "$ANALYSIS_DRP_DIR/pipelines/stellarLocusPlots.yaml" | ||
- "$ANALYSIS_DRP_DIR/pipelines/visitQAEllipPlots.yaml" | ||
- "$ANALYSIS_DRP_DIR/pipelines/visitQAPlots.yaml" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a concrete proposal for how to split this up:
all_coadd_plots.yaml.in
: importscoaddQAPlots.yaml
andstellarLocusPlots.yaml
all_visit_plots.yaml.in
: importsvisitQAPlots.yaml
andvisitQAEllipPlots.yaml
all_survey_plots.yaml.in
: empty for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That looks good to me, though are @erykoff's plots in a separate pipeline as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mine can't be in a generic pipeline because they can't run generically... how that all fits in together I definitely don't understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am leaving his pipeline as a completely independent thing, that anything needs to directly refer to (especially as it seems to only be able to be included in one place right now if I understand correctly) we can modify that on future tickets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is also going to be matchVisitCatalogs.yaml
->all_visit_plots.yaml.in
and matchCoaddCatalogs.yaml
->all_coadd_plots.yaml.in
as soon as Jenkins finishes for my PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think it's better for a pipeline to self-declare where it can be used, when possible, via this kind of structure. Having a flat directory with tons of plot pipelines, in which some are secretly but knowingly instrument-specific, seems like a trap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless they declare an instrument, because clashes in instruments are not allowed. It seems like that is how a pipeline is supposed to declare itself as belonging to an instrument rather than a position on a filesystem, or at the very least it should be both. As I said I am not invested enough to push bash on either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding the instrument to the pipeline seems like it's worth doing as well, but without some mechanism for conditionally importing pipelines that looks at the instrument, I don't think it solves the problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I follow why conditional anything is important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I guess I jumped a few steps without explaining myself. I am thinking about how to try to solve the problem of "I am a downstream pipeline author; how do I know which upstream plot-pipelines to include in my pipeline?" Having the instrument inside the pipeline but no relevant organizational structure for the pipelines themselves only seems to help with that problem if there is a way they can say, "include all of the pipelines but automatically filter out the ones with the wrong instrument".
5acf636
to
ea65d68
Compare
71ee2a7
to
dae5fda
Compare
This patch introduces a python module which builds a top level generic plotting pipeline containing plots which are generally appropriate for inclusion in most drp processing pipelines. Scons by default will build the pipeline, and the resulting pipeline will be ignored by git.
Make sure visit level plots, and coadd level plots have unique parameter names so they do not trample each other when combined into one pipeline.
d8c2bd1
to
88eb1d9
Compare
This patch works around some failures in scatterPlot.py until a the underlying issues can be resolved.
78d987f
to
a1e9cbf
Compare
toPlotList = [(catPlot[xCol].values, catPlot[yCol].values, "purple", None, | ||
sourceTypeMapper["all"])] | ||
else: | ||
toPlotList = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On your follow-up ticket, please also request code comments in and around this block, as I have no idea what it's doing, but I think that's not a job for you.
fig = self.scatterPlotWithTwoHists(plotDf, plotInfo, sumStats) | ||
try: | ||
fig = self.scatterPlotWithTwoHists(plotDf, plotInfo, sumStats) | ||
except Exception: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we discussed, just make sure we have a follow-up ticket to make this go away.
Draft PR of the complete pipeline building code