fix: 🐛 optimize the query to get the list of valid datasets #333

Merged
merged 2 commits into from
Jun 1, 2022

@severo severo commented Jun 1, 2022

Optimization: instead of first fetching every dataset name from the splits
and then putting them into a set, use `distinct` so that Mongo returns only
the distinct names. This is a lot quicker.
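
For illustration, the two approaches can be sketched roughly as below. This is not the code from the PR: the field names `dataset_name` and `status`, the filter value, and both function names are assumptions made for the example. `splits` is expected to behave like a `pymongo` collection, of which only `find(filter, projection)` and `distinct(key, filter)` are used.

```python
from typing import Any, Dict, List

# Hypothetical filter: the actual status values are not shown in this PR.
VALID_FILTER: Dict[str, Any] = {"status": "valid"}


def dataset_names_via_set(splits: Any) -> List[str]:
    """Before: fetch one document per split, then deduplicate in Python."""
    cursor = splits.find(VALID_FILTER, {"dataset_name": 1, "_id": 0})
    return sorted({doc["dataset_name"] for doc in cursor})


def dataset_names_via_distinct(splits: Any) -> List[str]:
    """After: ask the database for the distinct names only."""
    return sorted(splits.distinct("dataset_name", VALID_FILTER))
```

Both functions return the same list. The second one avoids transferring one document per split to the client, which is presumably where most of the 2 seconds went.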

fixes #326

On a dump of the production database, it now takes 75 ms:

0.075 cache.py:456(get_valid_or_stalled_dataset_names)

instead of 2 seconds! 🥲

2.082 cache.py:456(get_valid_or_stalled_dataset_names)

For the record: I use cProfile to profile the calls
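
The exact invocation is not shown here. One common way to get this kind of report with the standard library is sketched below. `profile_call` and `slow_example` are hypothetical names used only for the example. `slow_example` stands in for the real function.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs) -> str:
    """Run func under cProfile and return the report as text."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and keep the 10 most expensive entries.
    stats.sort_stats("cumulative").print_stats(10)
    return buffer.getvalue()


def slow_example():
    """Stand-in workload: deduplicate in Python, as the old code did."""
    return sorted({i % 100 for i in range(200_000)})


report = profile_call(slow_example)
print(report)
```

Each line of the report ends with `filename:lineno(function)`, which is the format of the two timings quoted above.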

@severo severo marked this pull request as ready for review June 1, 2022 14:15
@severo severo merged commit 2e3b180 into main Jun 1, 2022
@severo severo deleted the optimize-valid-endpoint branch June 1, 2022 14:23
Development

Successfully merging this pull request may close these issues.

Reduce the response time of /valid