-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add default processing #3116
Add default processing #3116
Conversation
|
||
def add_default_workflow(self, user): | ||
"""The modification timestamp of the prep information | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
Parameters | ||
---------- | ||
user : The user that requested to add the default workflows |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to note the expected type of user
?
# 3. loop over the missing merging schemes and create the commands | ||
# missing to get to those processed samples and add them to a new | ||
# workflow | ||
# 4. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# 4. |
""" | ||
# helper functions to avoid duplication of code | ||
|
||
def _get_node_info(node): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why pass node
but not wk
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could add but when this is called, wk is "global" within this method; do you think is better to pass?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it may be nice to either pass both or pass neither as node
is also non-local from what I can tell from the function call
if not missing_artifacts[wk]: | ||
del missing_artifacts[wk] | ||
if not missing_artifacts: | ||
raise ValueError('This preparation is complete') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem to correspond to the list of reasons for a ValueError
in the docstring, or I may be misunderstanding how this relates to the described reasons
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, I'll add it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
|
||
def _get_predecessors(node): | ||
# recursive method to get predecessors of a given node | ||
for pnode in wk.graph.predecessors(node): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same question here regarding node
and wk
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, same, they exists in add_default_workflow
and the functions are defined and called within that scope so they are "global" ... in fact, we might not even need to pass node. Anyway, I'm OK changing to whatever you think is best, just let me know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
passing both may be nice to be explicit, tracing non-local variables can get confusing quickly
pred = [data] | ||
else: | ||
pred.append(data) | ||
return pred |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that having the return
in scope of the for
would imply a node can only have a single predecessor. Is that correct for both interpretation and intent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes!! Note that this is a recursive function (line 767 calls itself) as we need the "grandparent" information to create the "parent" information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ya, but just so I understand, is it correct that a node only ever has a single parent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for double checking this; better to be safe.
Yes, I spent sometime looking at this and this is the expected behavior; basically, by being recursive it will go down to the origin of the workflow (always a single artifact) and return (if pred is None:
) the single only parent, then it will start coming out of the recursion and appending in order the children jobs (else:
). In case it helps, we tested this in qiita-rc and everything seem fine.
self.assertEqual(pt.artifact, None) | ||
artifact = qdb.artifact.Artifact.create( | ||
self.filepaths, "FASTQ", prep_template=pt) | ||
self.assertEqual(pt.artifact, artifact) | ||
|
||
# here we can test that we can properly create a workflow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a limited test given the complexity of the function, are there edge cases beyond the ValueError
check below that should be examined?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I'll add more tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just FYI, I checked everything that came to mind and I think those "simple" tests actually cover all scenarios by having to create a full pipeline and then checking that full pipeline and not having anything else to create.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wasade, thank you for detailed review, I have some questions; then, I'll issue fixes to your comments.
""" | ||
# helper functions to avoid duplication of code | ||
|
||
def _get_node_info(node): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could add but when this is called, wk is "global" within this method; do you think is better to pass?
|
||
def _get_predecessors(node): | ||
# recursive method to get predecessors of a given node | ||
for pnode in wk.graph.predecessors(node): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, same, they exists in add_default_workflow
and the functions are defined and called within that scope so they are "global" ... in fact, we might not even need to pass node. Anyway, I'm OK changing to whatever you think is best, just let me know.
pred = [data] | ||
else: | ||
pred.append(data) | ||
return pred |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes!! Note that this is a recursive function (line 767 calls itself) as we need the "grandparent" information to create the "parent" information.
if not missing_artifacts[wk]: | ||
del missing_artifacts[wk] | ||
if not missing_artifacts: | ||
raise ValueError('This preparation is complete') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, I'll add it.
self.assertEqual(pt.artifact, None) | ||
artifact = qdb.artifact.Artifact.create( | ||
self.filepaths, "FASTQ", prep_template=pt) | ||
self.assertEqual(pt.artifact, artifact) | ||
|
||
# here we can test that we can properly create a workflow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I'll add more tests.
Wasn't sure if this should go here or have its own separate issue, let me know and happy create a new issue. As per the recent email thread on this topic: the Standard Workflow option is great, but without a brief description of what it actually is/does it might be confusing for users. Adding a brief description of this would be thus very useful. Happy to contribute to this. Depending on the options the description I can make the text brief or more detailed. |
Thank you @mestaki; I think 1 and 3 would be great. If you send me an email with that I can open a PR adding those texts so you can review there ... or you can open a PR with those changes and I can review; whatever works best. BTW no need to open a new issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @wasade.
self.assertEqual(pt.artifact, None) | ||
artifact = qdb.artifact.Artifact.create( | ||
self.filepaths, "FASTQ", prep_template=pt) | ||
self.assertEqual(pt.artifact, artifact) | ||
|
||
# here we can test that we can properly create a workflow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just FYI, I checked everything that came to mind and I think those "simple" tests actually cover all scenarios by having to create a full pipeline and then checking that full pipeline and not having anything else to create.
pred = [data] | ||
else: | ||
pred.append(data) | ||
return pred |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for double checking this; better to be safe.
Yes, I spent sometime looking at this and this is the expected behavior; basically, by being recursive it will go down to the origin of the workflow (always a single artifact) and return (if pred is None:
) the single only parent, then it will start coming out of the recursion and appending in order the children jobs (else:
). In case it helps, we tested this in qiita-rc and everything seem fine.
Good to merge? |
Yes, thank you. The tutorial/text updates will be done in a separate PR (together with the release notes). |
First merge #3110; then I'll update this PR.
This PR adds a single button to generate/populate based on our default pipelines, for example: