Analysis results are not added to source data set #2294
Comments
So this seems to be a side effect of utilizing the asterisking feature that Galaxy provides to select workflow outputs. When a Workflow that specifies If one takes the same Workflow and deselects/reselects the asterisked outputs and then saves the entire workflow Refinery will then detect that there are I initially chose to utilize this feature of Galaxy because I thought that the old way of annotating outputs wouldn't scale very well, i.e., a user may be much more willing to click 10 things than annotate 10 things. Moving forward I see a couple of paths:
Looking for input @hackdna DS w/ Derived results from FastQC, RNA-SEQ SE/PE, and ChIP-Seq (all hg19) |
Thanks. First, just a bit of context: workflow annotation is meant to be done by site admins, not by end users. Also, it is done infrequently, so there is no reason to worry about scaling. All proposed solutions sound OK. There should definitely be a full suite of checks for workflows imported into Refinery regardless of the annotation format chosen (it should be impossible to import a workflow that doesn't declare outputs). Also, there should probably be some error handling at the end of analysis when files are downloaded but not associated with the data set (they are inaccessible yet occupy storage space). It sounds like there is a workaround (re-saving workflows) that can be applied to the existing CloudMan clusters. However, it is a manual process, so a proper long-term solution is needed. It is worth checking whether this behavior exists in the latest version of Galaxy. If it doesn't, it would still probably take months before we can use it, since that version of Galaxy would need to make it into CloudMan, then we would need to test everything with Refinery, make changes if necessary, create a new shared cluster, etc. If this behavior does exist, then you'd need to submit a patch for Galaxy and that would take even longer. Also, it is unclear whether using the Galaxy mechanism for hiding intermediate workflow outputs is even sufficient for use with Refinery (if no outputs are hidden then all are returned and it is impossible to tell whether that was by design). So, all this basically means that we will most likely need to revert to using step-level annotations, at least for the medium term. Finally: what would be the process to clean up analysis output files on beta.stemcellcommons.org that were already downloaded but not associated with a data set? |
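For illustration, the import-time check suggested above (refuse any workflow that declares no outputs) could look roughly like the sketch below. It assumes the workflow is available as the dictionary parsed from a Galaxy workflow export, with a `steps` mapping whose entries may carry a `workflow_outputs` list; the function names are hypothetical and are not taken from the Refinery code base.

```python
def steps_declaring_outputs(workflow):
    """Return the ids of steps whose `workflow_outputs` list is non-empty.

    `workflow` is assumed to be the dict parsed from an exported Galaxy
    workflow file; a step without the key is treated the same as a step
    with an empty list.
    """
    steps = workflow.get("steps", {})
    return sorted(
        step_id
        for step_id, step in steps.items()
        if step.get("workflow_outputs")
    )


def validate_workflow_outputs(workflow):
    """Raise ValueError if no step in the workflow declares an output."""
    declaring = steps_declaring_outputs(workflow)
    if not declaring:
        name = workflow.get("name", "<unnamed>")
        raise ValueError(
            "Workflow '%s' declares no workflow outputs; refusing import" % name
        )
    return declaring
```

Rejecting the zero-output case outright avoids the ambiguity mentioned above, where a workflow with nothing marked returns every output and there is no way to tell whether that was intended.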
Okay thanks for the input: Also, the workaround is very specific, I've found just saving again not to be enough. I have a gist here illustrating the odd behavior, but, in short, one would need to: upload a Regarding: Simply deleting these recent |
Sounds good. Yes, I assumed that all outputs are downloaded if none are selected because of the behavior described in #2293. So, if Galaxy can properly recognize and report the outputs marked by asterisk and if workflow outputs are checked during import then I guess it won't be a problem. Thanks for the gist. Have you already updated the workflows on the current prod cluster and re-imported them into beta or should I? |
👍 I've only updated the Human-based Workflows so far. |
OK, thanks, that would be great. |
Just to clarify: we should revert to using step-level annotations for workflow outputs, at least for the medium term. |
So I can reproduce the odd asterisking behavior in newer Galaxy. I'm going to close this in favor of #2381, where we are reverting to using workflow step annotations for desired output files.
Steps to reproduce
Observed behavior
Analysis results are not added to the source data set
Expected behavior
Analysis results are added to the source data set
Notes
TO-DO:
workflow_outputs
Scottx611x/ensure workflow outputs exist #2315