Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pipeline skip already run steps #1262

Merged
merged 23 commits into from
Jul 31, 2023

Conversation

bhilbert4
Copy link
Collaborator

This PR is motivated by updates to the dark monitor. In input data for the dark monitor are dark.fits files. However, the pipeline calls in our run_pipeline.py skip at the moment all assume that the input file is an uncal.fits file, and that all calwebb_detector1 pipeline steps should be run.

This PR updates the pipeline calls such that for dark.fits input files, only the jump and ramp-fitting steps are run. This will save time and disk space, since the pipeline call for the dark data at the moment saves the output from each step of the pipeline.

@bhilbert4
Copy link
Collaborator Author

This PR also switches the pipeline calls to use the call() method, rather than the run() method of the pipeline. This will allow the values of unspecified parameters to be retrieved from parameter reference files, rather than falling back to the defaults that are hardcoded.

@bhilbert4 bhilbert4 linked an issue Jun 2, 2023 that may be closed by this pull request
@bhilbert4 bhilbert4 changed the title [WIP]: Pipeline skip already run steps Pipeline skip already run steps Jul 26, 2023
@bhilbert4
Copy link
Collaborator Author

Testing on dev appears successful. The dark monitor successfully grabbed *_dark.fits files, ran the appropriate pipeline steps, and saved the rate.fits files. This PR also adds logic to allow the rateints files to be saved. Earlier testing showed that to be working. That aspect of the code is turned off in this PR though. A future PR that re-works how the dark monitor groups input files will also turn that on.

@mfixstsci this is ready for review.
@york-stsci since you are the most familiar with the celery functions, it might be helpful if you gave the changes a quick look over as well.

Copy link
Collaborator

@mfixstsci mfixstsci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @bhilbert4 everything is looking great! Some comments here, I am waiting on approval because I would like to hear from @york-stsci first.

jwql/shared_tasks/run_pipeline.py Show resolved Hide resolved
jwql/shared_tasks/run_pipeline.py Outdated Show resolved Hide resolved
jwql/shared_tasks/run_pipeline.py Show resolved Hide resolved
@york-stsci
Copy link
Collaborator

I think the above has my thoughts on this. In general it looks good, and the shared tasks log shows that it's running through celery in the way that we would expect.

@york-stsci
Copy link
Collaborator

I feel as if, given the set of changes we've been making, it's probably worth taking this PR and putting it on test and doing a run there, just to be trying again on a clean system and make sure that we're not missing anything.

@bhilbert4
Copy link
Collaborator Author

I agree. I've re-implemented my function for the dictionary -> string conversion and am going to run another test on the dev server. Assuming that is successful, then I think testing this on the test server makes sense.

@bhilbert4
Copy link
Collaborator Author

@mfixstsci @york-stsci Ok, everything seems to be working well on the dev server. I've re-added the line with the jump step threshold. It would be nice to remove that in a separate PR in the not too distant future. We could always update the bad pixel monitor at the same time to have it pass the same parameter to its pipeline call, so nothing would change there. I just don't like the idea of forcing all monitors that use the current pipeline functions to not be able to adjust the parameter value.

@york-stsci
Copy link
Collaborator

@mfixstsci @york-stsci Ok, everything seems to be working well on the dev server. I've re-added the line with the jump step threshold. It would be nice to remove that in a separate PR in the not too distant future. We could always update the bad pixel monitor at the same time to have it pass the same parameter to its pipeline call, so nothing would change there. I just don't like the idea of forcing all monitors that use the current pipeline functions to not be able to adjust the parameter value.

I don't think we're currently forcing anything, because as far as I can tell the only monitor that calls the save_jump pipeline is the bad pixel monitor, and the parameters that are set are specifically the ones that the bad pixel monitor wants to set. That said, I agree that we should go down to a single pipeline function, especially since you now have a way to set custom parameters on a per-step basis when doing that.

I'm creating an issue now to do this, and once we have the capability to set per-monitor pipeline parameters (which also wasn't really something we had before we started using celery, because the directly-calling-the-pipeline version also didn't have a free way to set pipeline parameters), we might want to actually go through the common monitors (since those are the ones that call the pipeline in the first place), and ask whether we could get better results if we did start to use custom parameters.

@york-stsci
Copy link
Collaborator

As just a last question, when you say that rateints support is present but not turned on, do you mean that "if you ask for a rateints file, you'll get one, but the dark monitor doesn't currently ask for them?" or "you can't currently get one, but we can uncomment some lines, and then you'll be able to"?

@bhilbert4
Copy link
Collaborator Author

@york-stsci, the former. I made some updates that allow rateints to be specified as an output that is saved and moved into the output directory. Those updates worked in my dark monitor testing last week where I temporarily changed the dark monitor to ask for them. I have since changed the dark monitor back to ask for rate files, as it has been doing. In #1259 I have made more substantial changes to the dark monitor to allow it to actually open and work with rateints files. My plan is to update that branch with the changes in this branch (once this PR is merged), and to switch the dark monitor to ask for rateints files in there. Then everything will be set for an end to end test using rateints files.

@york-stsci
Copy link
Collaborator

In that case, we probably want to add to the general notes that, once this PR goes in, other monitors can also ask for rateints files, in case that would be useful.

Copy link
Collaborator

@york-stsci york-stsci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As (extensively) annotated in the conversation, looks good to me.

@mfixstsci mfixstsci merged commit 65fb43e into spacetelescope:develop Jul 31, 2023
6 checks passed
@bhilbert4 bhilbert4 deleted the pipeline-skip-run-steps branch July 31, 2023 15:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add ability to call/omit specific steps of the pipeline
3 participants