
Add default processing #3116

Merged: 8 commits from antgonza:add-default-processing into qiita-spots:dev on Jun 30, 2021

Conversation

antgonza (Member)

First merge #3110; then I'll update this PR.

This PR adds a single button to generate/populate a processing workflow based on our default pipelines, for example:
[screenshot: add-default-workflow]

@antgonza antgonza requested a review from wasade June 24, 2021 00:58
coveralls commented Jun 24, 2021

Coverage decreased (-0.03%) to 91.089% when pulling 4a97f92 on antgonza:add-default-processing into e30107c on qiita-spots:dev.


def add_default_workflow(self, user):
    """The modification timestamp of the prep information

wasade (Contributor)

Suggested change


Parameters
----------
user : The user that requested to add the default workflows
wasade (Contributor)

Would it be possible to note the expected type of user?
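For reference, a minimal sketch of how the type could be noted in the numpydoc-style docstring; the class path `qiita_db.user.User` and the summary line are assumptions for illustration, not taken from this diff:

```python
# Hypothetical docstring sketch; qiita_db.user.User is an assumed type and
# the summary line is a placeholder.
def add_default_workflow(self, user):
    """Add the default processing workflow to this preparation

    Parameters
    ----------
    user : qiita_db.user.User
        The user that requested to add the default workflows
    """
```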

# 3. loop over the missing merging schemes and create the commands
# missing to get to those processed samples and add them to a new
# workflow
# 4.
wasade (Contributor)

Suggested change (drop the dangling comment line):
# 4.

"""
# helper functions to avoid duplication of code

def _get_node_info(node):
wasade (Contributor)

why pass node but not wk?

antgonza (Member, Author)

I could add it, but when this is called, wk is "global" within this method; do you think it's better to pass it?

wasade (Contributor)

It may be nice to either pass both or pass neither, as node is also non-local from what I can tell from the function call.
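For illustration, a minimal, self-contained sketch of the explicit-argument style being suggested; the helper body and the toy graph are placeholders, not the actual Qiita code:

```python
import networkx as nx

def _get_node_info(wk_graph, node):
    # take the graph explicitly instead of closing over a non-local `wk`;
    # the body is a placeholder that simply returns the node's attributes
    return wk_graph.nodes[node]

# toy graph standing in for wk.graph
g = nx.DiGraph()
g.add_node('trim', command='Trim')
g.add_edge('trim', 'deblur')
print(_get_node_info(g, 'trim'))   # {'command': 'Trim'}
```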

if not missing_artifacts[wk]:
    del missing_artifacts[wk]
if not missing_artifacts:
    raise ValueError('This preparation is complete')
wasade (Contributor)

This doesn't seem to correspond to the list of reasons for a ValueError in the docstring; or I may be misunderstanding how this relates to the described reasons.

antgonza (Member, Author)

You are right, I'll add it.
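A minimal sketch of what the added Raises entry could look like, assuming the docstring follows numpydoc conventions (the wording is illustrative, not the actual text):

```python
    Raises
    ------
    ValueError
        If the preparation is already fully processed and there is
        nothing left to add ('This preparation is complete')
```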

wasade (Contributor)

Thanks!


def _get_predecessors(node):
    # recursive method to get predecessors of a given node
    for pnode in wk.graph.predecessors(node):
wasade (Contributor)

same question here regarding node and wk

antgonza (Member, Author)

Yup, same: they exist in add_default_workflow, and the functions are defined and called within that scope, so they are "global" ... in fact, we might not even need to pass node. Anyway, I'm OK changing to whatever you think is best; just let me know.

wasade (Contributor)

Passing both may be nice to be explicit; tracing non-local variables can get confusing quickly.

    pred = [data]
else:
    pred.append(data)
return pred
wasade (Contributor)

I think that having the return in scope of the for would imply a node can only have a single predecessor. Is that correct for both interpretation and intent?

antgonza (Member, Author)

Yes!! Note that this is a recursive function (line 767 calls itself) as we need the "grandparent" information to create the "parent" information.

wasade (Contributor)

Ya, but just so I understand, is it correct that a node only ever has a single parent?

antgonza (Member, Author)

Thank you for double checking this; better to be safe.

Yes, I spent some time looking at this and this is the expected behavior; basically, by being recursive it will go down to the origin of the workflow (always a single artifact) and return (if pred is None:) that single parent, then it will start coming back out of the recursion and appending the child jobs in order (else:). In case it helps, we tested this in qiita-rc and everything seemed fine.
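To make that concrete, here is a minimal, self-contained sketch of the recursion on a toy linear workflow graph; the node names and the per-edge data are placeholders, not the real Qiita structures:

```python
import networkx as nx

def _get_predecessors(graph, node):
    # walk back toward the origin of the workflow; because every node has
    # at most one parent, returning inside the for loop is safe, and the
    # list is built origin-first as the recursion unwinds
    for pnode in graph.predecessors(node):
        pred = _get_predecessors(graph, pnode)
        data = (pnode, node)            # placeholder for the real payload
        if pred is None:
            pred = [data]
        else:
            pred.append(data)
        return pred
    return None                         # reached the origin artifact

# toy linear pipeline: raw -> trimmed -> table
g = nx.DiGraph()
g.add_edges_from([('raw', 'trimmed'), ('trimmed', 'table')])
print(_get_predecessors(g, 'table'))
# [('raw', 'trimmed'), ('trimmed', 'table')]
```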

self.assertEqual(pt.artifact, None)
artifact = qdb.artifact.Artifact.create(
    self.filepaths, "FASTQ", prep_template=pt)
self.assertEqual(pt.artifact, artifact)

# here we can test that we can properly create a workflow
wasade (Contributor)

This seems like a limited test given the complexity of the function; are there edge cases beyond the ValueError check below that should be examined?

antgonza (Member, Author)

Good point, I'll add more tests.

wasade (Contributor)

Thanks!

antgonza (Member, Author)

Just FYI, I checked everything that came to mind and I think those "simple" tests actually cover all scenarios, since they have to create a full pipeline, check that full pipeline, and confirm there is nothing else left to create.
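For context, a hypothetical fragment of the kind of check being discussed; the User lookup, the add_default_workflow call pattern, and the pt_complete fixture are assumptions for illustration, not the actual test code:

```python
# hypothetical sketch inside the existing unittest.TestCase method
user = qdb.user.User('test@foo.bar')            # assumed test user

# a preparation with missing default artifacts gets a populated workflow
wk = pt.add_default_workflow(user)
self.assertIsNotNone(wk)

# a preparation that is already fully processed is rejected
with self.assertRaisesRegex(ValueError, 'This preparation is complete'):
    pt_complete.add_default_workflow(user)      # pt_complete: hypothetical fixture
```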

antgonza (Member, Author) left a comment

@wasade, thank you for the detailed review. I have some questions; then I'll issue fixes to your comments.

"""
# helper functions to avoid duplication of code

def _get_node_info(node):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could add but when this is called, wk is "global" within this method; do you think is better to pass?


def _get_predecessors(node):
# recursive method to get predecessors of a given node
for pnode in wk.graph.predecessors(node):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, same, they exists in add_default_workflow and the functions are defined and called within that scope so they are "global" ... in fact, we might not even need to pass node. Anyway, I'm OK changing to whatever you think is best, just let me know.

pred = [data]
else:
pred.append(data)
return pred
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes!! Note that this is a recursive function (line 767 calls itself) as we need the "grandparent" information to create the "parent" information.

if not missing_artifacts[wk]:
del missing_artifacts[wk]
if not missing_artifacts:
raise ValueError('This preparation is complete')
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, I'll add it.

self.assertEqual(pt.artifact, None)
artifact = qdb.artifact.Artifact.create(
self.filepaths, "FASTQ", prep_template=pt)
self.assertEqual(pt.artifact, artifact)

# here we can test that we can properly create a workflow
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I'll add more tests.

mestaki (Collaborator) commented Jun 30, 2021

Wasn't sure if this should go here or have its own separate issue; let me know and I'm happy to create a new issue.

As per the recent email thread on this topic: the Standard Workflow option is great, but without a brief description of what it actually is/does, it might be confusing for users. Adding a brief description would therefore be very useful, and I'm happy to contribute it.
Option 1: Add a short description at the top of the main network processing page.
Option 2: A brief hover-over text box that appears over the "Standard Workflow" button.
Option 3: A detailed description in the main help files, with a link to it from the network processing page.
Option 4: Something else?

Depending on the option chosen, I can make the text brief or more detailed.

antgonza (Member, Author)
Thank you @mestaki; I think 1 and 3 would be great. If you send me an email with that text, I can open a PR adding it so you can review it there ... or you can open a PR with those changes and I can review; whatever works best. BTW, no need to open a new issue.

antgonza (Member, Author) left a comment

Thank you @wasade.

wasade (Contributor) commented Jun 30, 2021

Good to merge?

antgonza (Member, Author)
Yes, thank you. The tutorial/text updates will be done in a separate PR (together with the release notes).

@wasade wasade merged commit ef7f873 into qiita-spots:dev Jun 30, 2021