fix: 🐛 optimize the query to get the list of valid datasets #333

severo · 2022-06-01T14:15:51Z

optimization: avoid first fetching all the dataset names from the splits
and then putting them into a set. It's a lot quicker to use distinct, so
that mongo returns only the distinct names.

fixes #326
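To make the difference concrete, here is a minimal sketch of the two approaches, assuming a pymongo-style collection of splits. The field names `dataset_name` and `status`, the function names, and the `valid_statuses` parameter are hypothetical and only for illustration; the actual code in libcache may use a different client and schema.

```python
from typing import Any, List


def names_via_set(splits: Any, valid_statuses: List[str]) -> List[str]:
    """Slower approach: fetch one document per split, then deduplicate in Python."""
    # `splits` is assumed to behave like a pymongo Collection.
    cursor = splits.find(
        {"status": {"$in": valid_statuses}},  # hypothetical filter
        {"dataset_name": 1, "_id": 0},  # only project the (hypothetical) name field
    )
    return sorted({doc["dataset_name"] for doc in cursor})


def names_via_distinct(splits: Any, valid_statuses: List[str]) -> List[str]:
    """Faster approach: ask the database for the distinct values directly."""
    return sorted(splits.distinct("dataset_name", {"status": {"$in": valid_statuses}}))
```

Both functions should return the same list; the second one avoids transferring one document per split to the client, which is presumably where most of the time was going.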

On a dump of the production database, the call now takes 75 ms

0.075 cache.py:456(get_valid_or_stalled_dataset_names)

instead of about 2 seconds! 🥲

2.082 cache.py:456(get_valid_or_stalled_dataset_names)

For the record: I use cProfile to profile the calls
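For anyone who wants to take a similar measurement, here is a rough sketch of profiling a single call with cProfile and pstats. The helper `profile_call` and the toy function `slow_unique_names` are hypothetical stand-ins, not code from this repository.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and keep only the first few entries.
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


def slow_unique_names(names):
    """Toy stand-in for the function being measured."""
    return sorted(set(names))


if __name__ == "__main__":
    _, report = profile_call(slow_unique_names, ["b", "a", "b"])
    print(report)
```

The report lists each function as `file:line(function)` alongside its timings, which is the shape of the two lines quoted above.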


fix: 🐛 optimize the query to get the list of valid datasets

79144dd


severo marked this pull request as ready for review June 1, 2022 14:15

feat: 🎸 update libcache in api

1df2327

severo merged commit 2e3b180 into main Jun 1, 2022

severo deleted the optimize-valid-endpoint branch June 1, 2022 14:23

severo mentioned this pull request Jun 1, 2022

feat: 🎸 update dependencies to update libcache and libqueue #336

Merged
