Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tickets/dm-33992 Support generating Pipeline to include in DRP pipe #20

Merged
merged 8 commits into from Apr 5, 2022

Conversation

natelust
Copy link
Contributor

Draft PR of the complete pipeline building code

@natelust natelust force-pushed the tickets/DM-33992 branch 4 times, most recently from 5c625ee to 25a691b Compare March 16, 2022 14:19
@natelust natelust changed the title Draft tickets/dm-33992 tickets/dm-33992 Support generating Pipeline to include in DRP pipe Mar 16, 2022
Copy link
Member

@TallJimbo TallJimbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Main comment here is that I don't see any separation of visit-level vs. coadd-level vs. survery-level plots. Based on the conversation yesterday, I expected to see three executions of the builder in the SConscript, each reading in and producing a different pipeline, with each of those defining a different labeled subset for all of their pipelines. And then pipelines in drp_pipe would include those three machine-generated pipelines directly.

I think you could achieve the same result in terms of subset definitions using the pattern-matching functionality you've added here, while running the builder only once, given @sr525 's statement about plot names always having suffixes like "_visit" or "_coadd", and maybe I misunderstood the plan from the meeting about how to achieve having those subsets. But I think we need to make them somehow.

I'm going to point the Slack channel at this comment to make sure I've understood the plan for splitting up plots correctly.

- "$ANALYSIS_DRP_DIR/pipelines/visitQAEllipPlots.yaml"
- "$ANALYSIS_DRP_DIR/pipelines/visitQAPlots.yaml"

description: "A pipeline including all analysis drp plots"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add newline at end of file.


targets.setdefault(
"pipelines",
[]).extend(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty unusual formatting; I'd put a newline between ] and ) with no indent, but that's mostly just a guess at what black would do.

Copy link
Contributor Author

@natelust natelust Mar 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is what black outputted for me, I will run it again once all changes are done.

self.parser = argparse.ArgumentParser()
self.parser.add_argument("pipeline", help="input pipeline to process")

def getGroups(self) -> Mapping[str, tuple[Optional[str], Iterable[str]]]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is about defining subsets, maybe just call it "subsets" rather than "groups"?

def run(self, args: Sequence[str]) -> None:
"""Main entry point for actually building a pipeline. This is
responsible for parsing the supplied args (likely coming from the
command line, and then calling the build method.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
command line, and then calling the build method.
command line), and then calling the build method.

And whenever talking about args from the command-line, I think it's important to clarify whether it expects sys.argv or sys.argv[1:].


Subsets are added to the output pipeline that contain all the plots
(except plots contained in the exclude set), and any other subsets
defined within the builders ``getGroups`` method.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
defined within the builders ``getGroups`` method.
defined within the builder's ``getGroups`` method.

ValueError
Raised if a subset is already declared with a name returned from
the ``getGroups`` method.
Raise if a subset is alreday declared with a name that conflicts
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Raise if a subset is alreday declared with a name that conflicts
Raised if a subset is already declared with a name that conflicts

- "$ANALYSIS_DRP_DIR/pipelines/makeQaTractTables.yaml"
- "$ANALYSIS_DRP_DIR/pipelines/stellarLocusPlots.yaml"
- "$ANALYSIS_DRP_DIR/pipelines/visitQAEllipPlots.yaml"
- "$ANALYSIS_DRP_DIR/pipelines/visitQAPlots.yaml"
Copy link
Member

@TallJimbo TallJimbo Mar 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a concrete proposal for how to split this up:

  • all_coadd_plots.yaml.in: imports coaddQAPlots.yaml and stellarLocusPlots.yaml
  • all_visit_plots.yaml.in: imports visitQAPlots.yaml and visitQAEllipPlots.yaml
  • all_survey_plots.yaml.in: empty for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks good to me, though are @erykoff's plots in a separate pipeline as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mine can't be in a generic pipeline because they can't run generically... how that all fits in together I definitely don't understand.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am leaving his pipeline as a completely independent thing, that anything needs to directly refer to (especially as it seems to only be able to be included in one place right now if I understand correctly) we can modify that on future tickets.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also going to be matchVisitCatalogs.yaml->all_visit_plots.yaml.in and matchCoaddCatalogs.yaml->all_coadd_plots.yaml.in as soon as Jenkins finishes for my PR.

Copy link
Member

@TallJimbo TallJimbo Mar 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think it's better for a pipeline to self-declare where it can be used, when possible, via this kind of structure. Having a flat directory with tons of plot pipelines, in which some are secretly but knowingly instrument-specific, seems like a trap.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless they declare an instrument, because clashes in instruments are not allowed. It seems like that is how a pipeline is supposed to declare itself as belonging to an instrument rather than a position on a filesystem, or at the very least it should be both. As I said I am not invested enough to push bash on either.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding the instrument to the pipeline seems like it's worth doing as well, but without some mechanism for conditionally importing pipelines that looks at the instrument, I don't think it solves the problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I follow why conditional anything is important.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I guess I jumped a few steps without explaining myself. I am thinking about how to try to solve the problem of "I am a downstream pipeline author; how do I know which upstream plot-pipelines to include in my pipeline?" Having the instrument inside the pipeline but no relevant organizational structure for the pipelines themselves only seems to help with that problem if there is a way they can say, "include all of the pipelines but automatically filter out the ones with the wrong instrument".

@natelust natelust force-pushed the tickets/DM-33992 branch 3 times, most recently from 5acf636 to ea65d68 Compare March 22, 2022 19:06
@natelust natelust force-pushed the tickets/DM-33992 branch 2 times, most recently from 71ee2a7 to dae5fda Compare March 27, 2022 18:49
This patch introduces a python module which builds a top level
generic plotting pipeline containing plots which are generally
appropriate for inclusion in most drp processing pipelines.

Scons by default will build the pipeline, and the resulting pipeline
will be ignored by git.
Make sure visit level plots, and coadd level plots have unique
parameter names so they do not trample each other when combined
into one pipeline.
@natelust natelust force-pushed the tickets/DM-33992 branch 2 times, most recently from d8c2bd1 to 88eb1d9 Compare March 30, 2022 14:08
toPlotList = [(catPlot[xCol].values, catPlot[yCol].values, "purple", None,
sourceTypeMapper["all"])]
else:
toPlotList = []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On your follow-up ticket, please also request code comments in and around this block, as I have no idea what it's doing, but I think that's not a job for you.

fig = self.scatterPlotWithTwoHists(plotDf, plotInfo, sumStats)
try:
fig = self.scatterPlotWithTwoHists(plotDf, plotInfo, sumStats)
except Exception:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed, just make sure we have a follow-up ticket to make this go away.

@natelust natelust merged commit ac18fba into main Apr 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants