Add a worker to run longer analyses (polyphemus) #12

Closed
graft opened this issue Dec 6, 2017 · 0 comments

graft commented Dec 6, 2017

Polyphemus is a worker that finds data uploaded to Magma and manages its analysis via an external pipeline.

Summary

At the moment, data from external pipelines is fed into Magma by third parties. A typical workflow might go like this:

  • Analyst receives data (e.g. raw sequence from RNAseq) and performs analysis on it (e.g. alignment + quantification)
  • Analyst composes a document, validates it against a Magma template and sends it with appropriate credentials to Magma for insertion
  • Magma approves and inserts

However, as more data flows into the system and the number of tasks increases, this sort of manual intervention will become problematic. We can imagine instead replacing the analyst's role with a software worker (Polyphemus). In that case the workflow might go like this:

  • Data is uploaded into Magma (e.g. raw sequence from RNAseq)
  • Polyphemus searches for unanalyzed data to consume (e.g. a sample with raw sequence but no quantification results)
  • Polyphemus determines the correct analysis to run on the unanalyzed data and composes a configuration script for a remote pipeline.
  • The job is dispatched to the remote pipeline, which requests the appropriate raw data from Magma and analyzes it.
  • The remote pipeline pushes records (or errors) back to Magma, which alerts Polyphemus that the job is complete.
  • If there is an error, Polyphemus makes note of it and asks for intervention.
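
The worker-driven steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: `Sample`, `Job`, `find_unanalyzed`, `compose_job`, `record_result` and `needs_intervention` are hypothetical names, the records live in memory rather than in Magma, and the dispatch to the remote pipeline is left out entirely.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    """Hypothetical stand-in for a Magma record."""
    name: str
    raw_sequence: Optional[str] = None     # e.g. a reference to uploaded raw data
    quantification: Optional[dict] = None  # filled in once analysis succeeds


@dataclass
class Job:
    """Hypothetical job description handed to a remote pipeline."""
    sample_name: str
    pipeline: str
    status: str = "queued"  # queued -> complete | error
    error: Optional[str] = None


def find_unanalyzed(samples):
    """Samples that have raw sequence but no quantification results yet."""
    return [s for s in samples
            if s.raw_sequence is not None and s.quantification is None]


def compose_job(sample):
    """Pick the analysis for a sample; this sketch only knows one pipeline."""
    return Job(sample_name=sample.name, pipeline="rnaseq")


def record_result(job, samples_by_name, result=None, error=None):
    """Handle what the remote pipeline pushed back: records or an error."""
    if error is not None:
        job.status = "error"
        job.error = error
    else:
        samples_by_name[job.sample_name].quantification = result
        job.status = "complete"
    return job


def needs_intervention(jobs):
    """Jobs that failed and should be flagged for a human to look at."""
    return [j for j in jobs if j.status == "error"]
```

In this sketch a sample stops being returned by `find_unanalyzed` as soon as its results are recorded, so running the search repeatedly only picks up new or still-unprocessed data.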

Polyphemus could also track what version of a specific pipeline was used, and whether it has been invalidated by a newer version. For example, if we discover our RNAseq alignment was incorrect, we can compose a new, better pipeline and update the analysis requirements (e.g., "RNAseq raw data must be analyzed with at least version 10.1 of pipeline 'rnaseq'"). Polyphemus can then find all data that has not yet been analyzed, or data that needs to be re-analyzed using a new method, and queue new jobs for analysis.
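
A requirement such as "at least version 10.1" could be checked with something like the sketch below. It assumes purely numeric, dot-separated version strings and a hypothetical mapping from sample name to the pipeline version last used (`None` meaning never analyzed); neither the function names nor that storage format come from Magma or Polyphemus.

```python
def parse_version(text):
    """Turn '10.1' into (10, 1) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.split("."))


def needs_analysis(analyzed_with, required):
    """True if the data was never analyzed or was analyzed with too old a version."""
    if analyzed_with is None:
        return True
    return parse_version(analyzed_with) < parse_version(required)


def find_stale(versions_by_sample, required):
    """Names of samples that should be queued for (re-)analysis."""
    return [name for name, version in versions_by_sample.items()
            if needs_analysis(version, required)]
```

Comparing tuples of integers rather than raw strings matters here: as strings, "10.10" sorts before "10.9", which would wrongly mark newer results as outdated.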

In general, this means we only need to feed information into the system, and it will gradually produce the best available analysis of the data as it currently exists.

@graft graft self-assigned this Dec 6, 2017
@graft graft closed this as completed Apr 19, 2020