Preprocessing Layer for Run Data Managers #189

Open
jmchilton opened this issue Dec 2, 2022 · 1 comment

Comments

@jmchilton
Member

Simon started work on higher-level genome processing outside of Ephemeris with...

This is a great idea and we should formalize it and make it more robust and broadly useful by moving this functionality into Ephemeris and right into the run-data-managers endpoint.

MVP:

  • Establish Pydantic models (or maybe pykwalify, but probably not?) for the low-level run-data-managers layer, i.e. the current inputs to run-data-managers.
  • Write Pydantic models for syntactic sugar that cover:
    • If a genomes key is present, read its entries and convert them to invocations of the data_manager_fetch_genome_dbkeys_all_fasta tool, as make_fetch.py does; assume the latest version of data_manager_fetch_genome_dbkeys_all_fasta.
    • Prepend those invocations to the list of managers to run.
    • Write them all back to the lower-level YAML description and validate it.
  • In run-data-managers, run the preprocessor before executing these.

Follow-up Enhancements:

  • After #188 (Allow shed-tools to consume run_data_managers.yaml) is implemented, run the preprocessor before looking for tool IDs.
  • Pick an important data manager that doesn't start from genomes/dbkeys (Kraken, I suppose, or is gemini another one?) and generalize its initial sources in the same way.
  • Pick an important data manager that indexes genomes (further along in the "workflow") and define some syntactic sugar to make invoking it cleaner from XML (TODO: come up with an example, or drop this bullet point if it doesn't make sense).
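To illustrate why the preprocessor should run before tool IDs are collected, here is a sketch that gathers the IDs to install only after sugar has been expanded. The expand_genomes function and the "fetch_genome" ID are hypothetical stand-ins for the real preprocessor and the real Tool Shed ID.

```python
from typing import Any, Callable, Dict, List

Config = Dict[str, Any]


def expand_genomes(config: Config) -> Config:
    """Hypothetical stand-in preprocessor: one fetch entry per genome."""
    fetches = [
        {"id": "fetch_genome", "items": [g["dbkey"]]}
        for g in config.get("genomes", [])
    ]
    return {"data_managers": fetches + list(config.get("data_managers", []))}


def tool_ids_to_install(config: Config, preprocess: Callable[[Config], Config]) -> List[str]:
    """Return unique data manager tool IDs, in order, after preprocessing.

    Expanding the sugar first means tools implied only by a `genomes`
    key are included even though they never appear in the raw YAML.
    """
    low_level = preprocess(config)
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    return list(dict.fromkeys(dm["id"] for dm in low_level.get("data_managers", [])))
```

For a config with two genomes and one indexer entry, this yields the fetch ID once followed by the indexer's ID; scanning only the raw data_managers list would have missed the fetch tool entirely.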
@natefoo
Member

natefoo commented Dec 13, 2022

Maybe worth mentioning this here: galaxyproject/galaxy#15188

The issue is in Galaxy, and I'm not sure how involved a Galaxy-side fix would be, but Ephemeris could work around it fairly easily by querying data tables first.
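The suggestion of querying data tables first could look roughly like the sketch below. It assumes the workaround means dropping fetch items whose dbkey is already present in a data table, and that the table is available as a list of column names plus rows of field values (roughly the shape of a Galaxy tool data table listing). How the table is fetched from Galaxy is not shown, and the function names are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Sequence, Set


def existing_values(
    columns: Sequence[str], fields: Iterable[Sequence[str]], column: str
) -> Set[str]:
    """Collect the values of one column from rows already in a data table."""
    idx = list(columns).index(column)  # raises ValueError if the column is absent
    return {row[idx] for row in fields}


def skip_already_fetched(
    managers: List[Dict[str, Any]], present: Set[str], tool_id: str
) -> List[Dict[str, Any]]:
    """Drop items of `tool_id` entries that the data table already holds.

    Entries for other tools, or without an `items` list, pass through
    unchanged; an entry whose items are all present is dropped entirely.
    Input dicts are not mutated.
    """
    kept: List[Dict[str, Any]] = []
    for dm in managers:
        if dm.get("id") != tool_id or not dm.get("items"):
            kept.append(dm)
            continue
        remaining = [item for item in dm["items"] if item not in present]
        if remaining:
            kept.append({**dm, "items": remaining})
    return kept
```

Filtering only entries for a given tool ID keeps downstream indexers untouched; whether they should also be skipped depends on their own data tables.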
