Add a worker to run longer analyses (polyphemus) #12

graft · 2017-12-06T20:16:55Z

Polyphemus is a worker that finds data uploaded to Magma and manages its analysis via an external pipeline.

Summary

At the moment, data from external pipelines is fed into Magma by third parties. A typical workflow might go like this:

Analyst receives data (e.g. raw sequence from RNAseq) and performs analysis on it (e.g. alignment + quantification)
Analyst composes a document, validates it against a Magma template and sends it with appropriate credentials to Magma for insertion
Magma approves and inserts

However, as more and more data flows into the system and the number of tasks increases, this sort of manual intervention will become problematic. We can imagine, instead, replacing the analyst's role with a software worker (polyphemus). This time the workflow might go like this:

Data is uploaded into Magma (e.g. raw sequence from RNAseq).
Polyphemus searches for unanalyzed data to consume (e.g. a sample with raw sequence but no quantification results).
Polyphemus determines the correct analysis to run on the unanalyzed data and composes a configuration script for a remote pipeline.
The job is dispatched to the remote pipeline, which requests the appropriate raw data from Magma and analyzes it.
The remote pipeline pushes records (or errors) back to Magma, which alerts Polyphemus that the job is complete.
If there is an error, Polyphemus makes note of it and asks for intervention.
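
As a rough illustration of the search and dispatch steps, here is a minimal sketch in Python. Everything in it is hypothetical: the `Sample` record, its `raw_sequence` and `quantification` fields, and the job dictionary layout are assumptions made for the example and are not part of Magma's actual data model or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    """Hypothetical stand-in for a Magma record."""
    name: str
    raw_sequence: Optional[str] = None     # location of uploaded raw data, if any
    quantification: Optional[dict] = None  # analysis results, if any


def find_unanalyzed(samples):
    """Return samples that have raw sequence but no quantification results."""
    return [
        s for s in samples
        if s.raw_sequence is not None and s.quantification is None
    ]


def compose_job(sample, pipeline="rnaseq"):
    """Build a (hypothetical) job configuration for the remote pipeline."""
    return {
        "pipeline": pipeline,
        "sample": sample.name,
        "input": sample.raw_sequence,
    }


def handle_result(job, result):
    """Record the outcome of a job; errors are flagged for intervention."""
    if "error" in result:
        return {"sample": job["sample"], "status": "needs_intervention",
                "detail": result["error"]}
    return {"sample": job["sample"], "status": "complete"}


samples = [
    Sample("s1", raw_sequence="s1.fastq", quantification={"gene_a": 12.0}),
    Sample("s2", raw_sequence="s2.fastq"),
    Sample("s3"),  # nothing uploaded yet
]

jobs = [compose_job(s) for s in find_unanalyzed(samples)]
outcome = handle_result(jobs[0], {"error": "alignment failed"})
```

Only `s2` qualifies here: `s1` already has results and `s3` has no raw data. In a real worker the jobs would be sent to the remote pipeline rather than built in memory, and the result would arrive via a notification from Magma.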

Polyphemus could also track what version of a specific pipeline was used, and whether it has been invalidated by a newer version. For example, if we discover our RNAseq alignment was incorrect, we can compose a new, better pipeline and update the analysis requirements (e.g., "RNAseq raw data must be analyzed with at least version 10.1 of pipeline 'rnaseq'"). Polyphemus can then find all data that has not yet been analyzed, or data that needs to be re-analyzed using a new method, and queue new jobs for analysis.
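
The version requirement could be checked along these lines. This is only a sketch under stated assumptions: the `REQUIREMENTS` table, the `needs_analysis` helper, and the idea that each record stores the pipeline version it was analyzed with (or `None`) are all invented for illustration. Versions are compared as tuples of integers so that 10.1 sorts after 9.4, which a plain string comparison would get wrong.

```python
# Hypothetical table of minimum acceptable pipeline versions.
REQUIREMENTS = {"rnaseq": "10.1"}


def parse_version(text):
    """Turn a dotted version string such as '10.1' into a tuple (10, 1)."""
    return tuple(int(part) for part in text.split("."))


def needs_analysis(pipeline, analyzed_with):
    """True if the data was never analyzed or was analyzed with an old version.

    analyzed_with is the version string recorded for the last analysis,
    or None if no analysis has been run (an assumption of this sketch).
    """
    if analyzed_with is None:
        return True
    return parse_version(analyzed_with) < parse_version(REQUIREMENTS[pipeline])


# Hypothetical record of which version each sample was last analyzed with.
analyzed_versions = {
    "sample_a": None,    # never analyzed
    "sample_b": "9.4",   # analyzed with an outdated pipeline
    "sample_c": "10.1",  # meets the requirement
    "sample_d": "10.2",  # exceeds the requirement
}

queue = sorted(
    name for name, version in analyzed_versions.items()
    if needs_analysis("rnaseq", version)
)
```

With these inputs the queue holds `sample_a` and `sample_b`. The sketch assumes all version strings have the same number of dotted components.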

In general, this means we only need to feed information into the system, and over time it will produce the best available analysis of the data as it currently exists.

graft self-assigned this Dec 6, 2017

graft closed this as completed Apr 19, 2020
