[Improv.] Improve schema for current MongoDB storage, especially for category indexing #36

Closed
rajatkb opened this issue Mar 8, 2020 · 3 comments
Labels
bug (Something isn't working), gssoc20 (GSSOC label for gscco20 tag), hard (GSSOC label for beginner tag)

Comments

rajatkb (Owner) commented Mar 8, 2020

The "categories" field in the current schema stores an array of values.

"categories" : [ "semantic web", "wireless", "web services", "internet" ],

This cannot be indexed, as the values are not unique across entries in the database. Some of our queries specifically need to fetch the entries for a single category, for example:

query: get all conferences of category "wireless"

For such a query, the cost rises to O(n) in the number of documents, because every document has to be scanned. We need a solution to this problem.
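To make the cost concrete, here is a minimal sketch in plain Python (not MongoDB itself) of what an unindexed category query amounts to. The sample documents and the `find_by_category_scan` helper are hypothetical and exist only for illustration; only the `categories` field name and its sample values come from the schema above.

```python
from typing import Dict, List, Tuple

# Hypothetical sample documents; only the "categories" field mirrors the schema.
CONFERENCES: List[Dict] = [
    {"_id": 1, "categories": ["semantic web", "wireless", "web services", "internet"]},
    {"_id": 2, "categories": ["databases"]},
    {"_id": 3, "categories": ["wireless", "networking"]},
]


def find_by_category_scan(docs: List[Dict], category: str) -> Tuple[List[Dict], int]:
    """Return the matching documents and how many documents were examined."""
    examined = 0
    matches = []
    for doc in docs:
        examined += 1  # without an index, every document is inspected
        if category in doc["categories"]:
            matches.append(doc)
    return matches, examined


matches, examined = find_by_category_scan(CONFERENCES, "wireless")
print([d["_id"] for d in matches], "examined:", examined)
```

The number of examined documents always equals the collection size, regardless of how many documents match.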

rajatkb added the bug, hard, and gssoc20 labels Mar 8, 2020
rajatkb added this to To do in Scrapper-Service via automation Mar 8, 2020
rajatkb (Owner, Author) commented Mar 14, 2020

Update: The array fields can be indexed through multikey indexing.

Multikey indexing
Bounds of multi key indexing
Bounds ||

The documents can be indexed on the array fields for now, especially if the search criterion is only one entry from the array. This allows for faster results (the first entry is used for index-based filtering). Currently we can resort to some sort of in-memory caching to reduce response times and DB calls.
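A rough sketch of the idea, in plain Python rather than against a live database: a multikey index behaves roughly like an inverted index with one entry per array element, and a small in-memory cache in front of it avoids repeated lookups. All names here (`build_multikey_index`, `ids_for_category`, the sample documents) are hypothetical. In a real deployment the index would presumably be created on the `categories` field (in pymongo, `collection.create_index("categories")`; MongoDB makes the index multikey automatically when the field holds arrays), which this sketch does not do.

```python
from collections import defaultdict
from functools import lru_cache
from typing import Dict, List, Tuple

# Hypothetical sample documents; only the "categories" field mirrors the schema.
DOCS: List[Dict] = [
    {"_id": 1, "categories": ["semantic web", "wireless", "web services", "internet"]},
    {"_id": 2, "categories": ["databases"]},
    {"_id": 3, "categories": ["wireless", "networking"]},
]


def build_multikey_index(docs: List[Dict]) -> Dict[str, List[int]]:
    """Map each array element to the ids of the documents containing it."""
    index = defaultdict(list)
    for doc in docs:
        for category in set(doc["categories"]):  # one index entry per distinct element
            index[category].append(doc["_id"])
    return dict(index)


INDEX = build_multikey_index(DOCS)


@lru_cache(maxsize=128)
def ids_for_category(category: str) -> Tuple[int, ...]:
    """Look up one category via the index; repeated calls are served from the cache."""
    return tuple(INDEX.get(category, ()))


print(ids_for_category("wireless"))
```

A lookup for a single category now touches only that category's index entry instead of every document. The cache has to be invalidated whenever the underlying data changes, for example with `ids_for_category.cache_clear()`.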

@Rukmini-Meda

@rajatkb I would like to work on this issue. Please assign this to me as part of GSSoC'20.

rajatkb (Owner, Author) commented Apr 29, 2020

The issue is resolved for now, but if you see any improvements that can be made to the queries and indexed fields, you can open a PR.

rajatkb closed this as completed May 9, 2020
Scrapper-Service automation moved this from To do to Done May 9, 2020