schedule task to build genome index if one is not available #49

Merged: 1 commit, Apr 16, 2021

@benjiec benjiec commented Apr 16, 2021

This allows a GET request to the RO server to trigger a Celery task that builds the genome index if one does not exist. Currently, if there is no genome index, the GET call fails and the index is never built, so subsequent calls also fail.
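The behavior described above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the project's code: the `Genome` class, the in-memory `GENOMES` store, the `TASK_QUEUE` list, `schedule`, `run_worker`, and the response dictionaries are all hypothetical stand-ins. With Celery, the scheduling step would normally be a call such as `build_genome_fragment_indices.delay(genome_id)`.

```python
from dataclasses import dataclass


@dataclass
class Genome:
    """Hypothetical stand-in for the real model."""
    id: int
    has_location_index: bool = False

    def indexed_genome(self):
        # The real method builds per-fragment indices; this sketch only flips a flag.
        self.has_location_index = True
        return self


GENOMES = {}     # stand-in for the database
TASK_QUEUE = []  # stand-in for the Celery broker


def build_genome_fragment_indices(genome_id):
    """Same shape as the task in this PR, run against the in-memory store."""
    GENOMES[genome_id].indexed_genome()


def schedule(task, *args):
    """Stand-in for task.delay(*args): enqueue the work, do not run it."""
    TASK_QUEUE.append((task, args))


def on_get(genome_id):
    """Read-only handler: never builds the index inline, only schedules it."""
    genome = GENOMES[genome_id]
    if not genome.has_location_index:
        schedule(build_genome_fragment_indices, genome_id)
        # Assumed response shape; what the real API returns is not shown here.
        return {"status": "index build scheduled"}
    return {"status": "ok"}


def run_worker():
    """Drain the queue, as a worker process with write access would."""
    while TASK_QUEUE:
        task, args = TASK_QUEUE.pop(0)
        task(*args)
```

In this sketch the first GET only enqueues the build, and a later GET succeeds once a worker has processed the queue. Repeated GETs before the worker runs would enqueue duplicate tasks, which is harmless here because the build is idempotent.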

@benjiec benjiec requested a review from yaoyuyang April 16, 2021 13:17
@shared_task
def build_genome_fragment_indices(genome_id):
    genome = Genome.objects.get(pk=genome_id)
    genome.indexed_genome()
Member

Cool. So this is different from building the blastdb for the genome, right? Also, where is the index stored?

Contributor Author

The indices are per-fragment and map sequence bps to a numeric bp number. They are stored in the database.
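One way to picture a per-fragment index of this kind is sketched below. It assumes a fragment is stored as an ordered list of sequence chunks and that the index records the first and last base-pair position each chunk covers. The chunk model and the names `build_location_index` and `find_chunk` are illustrative assumptions, not the project's actual schema or API.

```python
import bisect


def build_location_index(chunks):
    """Return (first_bp, last_bp, chunk_id) rows, 1-based and inclusive.

    `chunks` is an ordered list of (chunk_id, sequence) pairs for one fragment.
    """
    index = []
    position = 1
    for chunk_id, sequence in chunks:
        if not sequence:
            continue  # empty chunks cover no base pairs
        last = position + len(sequence) - 1
        index.append((position, last, chunk_id))
        position = last + 1
    return index


def find_chunk(index, bp):
    """Look up which chunk covers base pair `bp` using binary search."""
    starts = [first for first, _, _ in index]
    i = bisect.bisect_right(starts, bp) - 1
    if i < 0 or bp > index[i][1]:
        raise IndexError(f"base pair {bp} is outside the fragment")
    return index[i][2]
```

For example, the chunks `[("a", "ATG"), ("b", "CC"), ("c", "GATTACA")]` produce the rows `(1, 3, "a")`, `(4, 5, "b")`, and `(6, 12, "c")`, so position 4 resolves to chunk `"b"`. Under this assumed model, every fragment needs its own rows even when the underlying chunks are reused elsewhere.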

@@ -330,6 +330,10 @@ def on_get(self, request, genome_id):
        args = q_parser.parse_args(request)
        field = args['field']

        if not genome.has_location_index:
Member

Why would the genome index not be built when the genome is first created? Random failures?

Contributor Author

The indices are not required. In fact, because the indices are not shared between genomes, building one per fragment every time would remove some of Edge's storage advantage for engineered genomes. Hence, we don't create one when a new genome is created, only when someone views the genome on the website or hits an API that requires it. The issue is that requests that hit the RO server never end up building the indices. So the solution here is to have the indices built via Celery when an RO API needs them. If an RO server receives an API call that's meant for updating, that's a caller error.

@benjiec benjiec merged commit 5308315 into master Apr 16, 2021