Extend database schemas #90

Open
jpwahle opened this issue Aug 18, 2022 · 2 comments
Assignees
Labels
refactoring Pull Request: Refactoring code without logic change

Comments

@jpwahle
Owner

jpwahle commented Aug 18, 2022

Is your feature request related to a problem? Please describe.
Currently, everything is stored in the paper collection, while the other schemas introduced in 2c59cba remain unused.
Because the aggregate and group steps are especially expensive, we want to avoid them by using the separate collections.

Describe the solution you'd like
Each dashboard that requires aggregation, grouping, etc. should have its own collection (e.g., authors, venues).
Data should also be written to the currently unused MongoDB collections and mapped back to the paper objects.
For fast filtering, each collection should carry the key filter elements (e.g., year, inCitationsCount, ...).
The solution should be backward compatible, so the paper collection should remain unchanged.
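
To make the idea concrete, here is a minimal in-memory sketch of what a separate authors collection could look like. It is not the project's schema: the field names `authors`, `papers`, and `paperId`, the sample data, and both helper functions are assumptions for illustration. Only `year` and `inCitationsCount` come from the description above. Plain Python dicts stand in for MongoDB documents.

```python
# Hypothetical paper documents. Only `year` and `inCitationsCount` are named
# in the issue; every other field name here is an assumption.
papers = [
    {"_id": "p1", "title": "A", "year": 2019, "inCitationsCount": 12,
     "authors": ["alice", "bob"]},
    {"_id": "p2", "title": "B", "year": 2020, "inCitationsCount": 3,
     "authors": ["alice"]},
    {"_id": "p3", "title": "C", "year": 2020, "inCitationsCount": 7,
     "authors": ["bob"]},
]


def build_author_collection(papers):
    """Derive one document per author without modifying the paper documents.

    Each author document holds small paper stubs that carry the filter keys
    and a `paperId` that maps back to the full paper object.
    """
    authors = {}
    for paper in papers:
        stub = {
            "paperId": paper["_id"],
            "year": paper["year"],
            "inCitationsCount": paper["inCitationsCount"],
        }
        for name in paper["authors"]:
            doc = authors.setdefault(name, {"_id": name, "papers": []})
            doc["papers"].append(stub)
    return list(authors.values())


def papers_per_author(author_docs, year_start, year_end, min_citations=0):
    """Count matching papers per author using only the authors collection."""
    counts = {}
    for doc in author_docs:
        matching = sum(
            1
            for stub in doc["papers"]
            if year_start <= stub["year"] <= year_end
            and stub["inCitationsCount"] >= min_citations
        )
        if matching:
            counts[doc["_id"]] = matching
    return counts


author_docs = build_author_collection(papers)
print(papers_per_author(author_docs, 2019, 2020, min_citations=5))
```

In this sketch the dashboard query never touches the paper collection, which stays as it is. The cost is duplicated filter keys that have to be kept in sync whenever a paper changes.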

@jpwahle jpwahle added the refactoring Pull Request: Refactoring code without logic change label Aug 18, 2022
@jpwahle jpwahle added this to Additional features in cs-insights via automation Aug 18, 2022
@jpwahle jpwahle moved this from Additional features to Todo in cs-insights Aug 18, 2022
@jpwahle jpwahle moved this from Todo to Near Future in cs-insights Aug 22, 2022
@jpwahle jpwahle removed this from Near Future in cs-insights Aug 23, 2022
@jpwahle jpwahle added this to Backlog in cs-insights Sep 26, 2022
@jpwahle jpwahle removed this from Backlog in cs-insights Oct 12, 2022
@jpwahle
Owner Author

jpwahle commented Nov 3, 2022

One suggestion here is to switch to a MySQL / PostgreSQL database.

Pros:

  • Potentially much faster
  • Can be hosted by GWDG

Cons:

  • We have to touch all schemas
  • Normalizing data
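
As a rough picture of what "normalizing data" would involve, the sketch below splits papers and authors into separate tables joined through a link table. It uses Python's built-in `sqlite3` as a stand-in for MySQL / PostgreSQL. All table names, column names, and sample rows are hypothetical, and it says nothing about actual performance.

```python
import sqlite3

# Hypothetical normalized schema; none of these names come from the project.
SCHEMA = """
CREATE TABLE papers (
    id TEXT PRIMARY KEY,
    title TEXT NOT NULL,
    year INTEGER,
    in_citations_count INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE authors (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE paper_authors (
    paper_id TEXT NOT NULL REFERENCES papers(id),
    author_id INTEGER NOT NULL REFERENCES authors(id),
    PRIMARY KEY (paper_id, author_id)
);
CREATE INDEX idx_papers_year ON papers(year);
"""

SAMPLE = [
    ("p1", "A", 2019, 12, ["alice", "bob"]),
    ("p2", "B", 2020, 3, ["alice"]),
    ("p3", "C", 2020, 7, ["bob"]),
]


def make_db():
    """Create an in-memory database and load the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for paper_id, title, year, citations, names in SAMPLE:
        conn.execute(
            "INSERT INTO papers VALUES (?, ?, ?, ?)",
            (paper_id, title, year, citations),
        )
        for name in names:
            # SQLite syntax; PostgreSQL would use ON CONFLICT DO NOTHING.
            conn.execute("INSERT OR IGNORE INTO authors(name) VALUES (?)", (name,))
            (author_id,) = conn.execute(
                "SELECT id FROM authors WHERE name = ?", (name,)
            ).fetchone()
            conn.execute(
                "INSERT INTO paper_authors VALUES (?, ?)", (paper_id, author_id)
            )
    conn.commit()
    return conn


def papers_per_author_sql(conn, year_start, year_end):
    """Count papers per author in a year range with a join and GROUP BY."""
    rows = conn.execute(
        """
        SELECT a.name, COUNT(*)
        FROM authors AS a
        JOIN paper_authors AS pa ON pa.author_id = a.id
        JOIN papers AS p ON p.id = pa.paper_id
        WHERE p.year BETWEEN ? AND ?
        GROUP BY a.name
        ORDER BY a.name
        """,
        (year_start, year_end),
    ).fetchall()
    return dict(rows)


conn = make_db()
print(papers_per_author_sql(conn, 2019, 2020))
```

The link table is where "we have to touch all schemas" shows up: every many-to-many relation that is currently an embedded array would need one.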

@jpwahle
Owner Author

jpwahle commented Dec 1, 2022

We should also think about adding more data from FatCat and Internet Archive Scholar, which export everything in PostgreSQL.

Projects
Status: 👀 In review

2 participants