Optimize queries #25

Closed
2 tasks done
trannel opened this issue Mar 18, 2022 · 1 comment
Labels: duplicate (This issue or pull request already exists), enhancement (Pull Request: A new feature)

Comments

@trannel
Contributor

trannel commented Mar 18, 2022

Is your feature request related to a problem? Please describe.
Once #24 is solved, some queries might not perform well anymore. The /info, /quartiles, and topk endpoints take very long to respond for authors (around 1 minute). The cause is that MongoDB has to:

  1. $unwind 5 million papers into 10+ million items
  2. $group those into 2.7 million groups
  3. $sort those 2.7 million authors (without an index)
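The three stages above can be sketched as follows. This is only an illustration, not the project's actual query: the field name `authors`, the counting accumulator, and the trailing `$limit` are assumptions, and the helper names are hypothetical. The second function mirrors the same steps in plain Python on a toy list, which makes it easier to see why the intermediate results grow before anything can be sorted.

```python
from collections import Counter


def author_topk_pipeline(k):
    """Build an aggregation pipeline with the shape described above.

    Assumes each paper document has an `authors` array (hypothetical name).
    """
    return [
        {"$unwind": "$authors"},                                # one item per (paper, author) pair
        {"$group": {"_id": "$authors", "count": {"$sum": 1}}},  # one group per distinct author
        {"$sort": {"count": -1}},                               # sorts computed values, so no index applies
        {"$limit": k},                                          # assumed: top-k keeps only the first k
    ]


def author_topk_in_memory(papers, k):
    """Plain-Python equivalent of the stages above, for toy data only."""
    counts = Counter()
    for paper in papers:                  # $unwind: visit every author of every paper
        for author in paper["authors"]:
            counts[author] += 1           # $group with a $sum accumulator
    return counts.most_common(k)          # $sort descending, then $limit


papers = [
    {"title": "t1", "authors": ["a", "b"]},
    {"title": "t2", "authors": ["a"]},
    {"title": "t3", "authors": ["a", "c", "b"]},
]
top_two = author_topk_in_memory(papers, 2)
```

Because the sort key is produced by `$group`, it does not exist in the stored documents, so an index on the collection cannot help with that stage.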

Describe the solution you'd like
Optimize all queries that do not perform well and fix any workarounds.

  • Fix the paged endpoint: It has issues with the $lookup/$sort/$project stages in the pipeline, so we changed the order of the stages as a workaround. The workaround returns incorrect results when sorting by venue or authors. Originally, the $project stage came before the $sort stage. (Done, but there is a new issue.)
  • Change the schema so that all information is duplicated into each author. This ensures that all filters can be applied to each author directly, without $unwind/$group or $lookup.
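A minimal sketch of what the schema change in the second item could look like, assuming paper documents carry an `authors` array plus fields such as `year` and `venue`. All field names and the helper `build_author_documents` are hypothetical; the actual remodeling may differ.

```python
def build_author_documents(papers):
    """Copy each paper's information into every one of its authors.

    Returns one document per author with an embedded `papers` list,
    so author queries need no $unwind/$group or $lookup.
    """
    authors = {}
    for paper in papers:
        # Everything except the author list is duplicated into each author.
        paper_info = {key: value for key, value in paper.items() if key != "authors"}
        for name in paper["authors"]:
            doc = authors.setdefault(name, {"_id": name, "papers": []})
            doc["papers"].append(dict(paper_info))  # separate copy per author
    return list(authors.values())


papers = [
    {"title": "t1", "year": 2020, "venue": "ACL", "authors": ["a", "b"]},
    {"title": "t2", "year": 2021, "venue": "EMNLP", "authors": ["a"]},
]
author_docs = build_author_documents(papers)
```

With documents shaped like this, a filter such as `{"papers.venue": "ACL"}` can be applied to the author collection directly. The trade-off is duplicated storage and the need to update every affected author when a paper changes.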

Describe alternatives you've considered
Should the queries without filters still take too long, we could add some default values for the filters.
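One way to read this alternative is sketched below. The filter names `yearStart` and `yearEnd`, their default values, and the `year` field are assumptions made for illustration; they are not taken from the project.

```python
# Hypothetical defaults; the real filter names and ranges would come from the API.
DEFAULT_FILTERS = {"yearStart": 2010, "yearEnd": 2020}


def with_default_filters(filters):
    """Fill in defaults for any filter the client did not provide."""
    merged = dict(DEFAULT_FILTERS)
    provided = {key: value for key, value in (filters or {}).items() if value is not None}
    merged.update(provided)
    return merged


def filters_to_match(filters):
    """Turn the merged filters into a $match stage on an assumed `year` field."""
    return {"$match": {"year": {"$gte": filters["yearStart"], "$lte": filters["yearEnd"]}}}


unfiltered = with_default_filters(None)
narrowed = with_default_filters({"yearStart": 2015, "yearEnd": None})
```

An unfiltered request then always starts with a bounded `$match`, so the later stages see fewer documents than a full collection scan would produce.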

@trannel trannel added the enhancement Pull Request: A new feature label Mar 21, 2022
@trannel trannel self-assigned this Mar 21, 2022
@trannel trannel added this to Data/Optimization in cs-insights Mar 21, 2022
Repository owner deleted a comment from github-actions bot Apr 26, 2022
@trannel trannel moved this from Data/Optimization to Core features in cs-insights May 19, 2022
@trannel trannel removed their assignment Jul 5, 2022
@trannel trannel moved this from Core features to Out-of-scope in cs-insights Jul 5, 2022
@trannel trannel moved this from Small Out-of-scope to Large Out-of-scope in cs-insights Jul 11, 2022
@jpwahle jpwahle added the duplicate This issue or pull request already exists label Aug 21, 2022
@jpwahle
Owner

jpwahle commented Aug 21, 2022

It will be fixed in database Schema remodeling #90

@jpwahle jpwahle closed this as completed Aug 21, 2022
cs-insights automation moved this from Far Future to Done Aug 21, 2022
@jpwahle jpwahle removed this from Done in cs-insights Oct 12, 2022