
Implement automated storing to db/backend #70

Open
Tracked by #68
jpwahle opened this issue Nov 2, 2021 · 4 comments

Labels: enhancement (Pull Request: A new feature)

Comments

jpwahle (Owner) commented Nov 2, 2021

Is your feature request related to a problem? Please describe.
We need to store author, venue, and publication data into our backend automatically when the next d3 version is released.

Describe the solution you'd like
Implement a backend class that (a rough sketch is included below):

  • creates papers and authors
  • updates papers and authors
  • deletes papers and authors

Additional context
https://www.mongodb.com/docs/database-tools/mongoimport/#std-label-ex-mongoimport-merge
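
For illustration, a minimal sketch of what such a backend client could look like. This is a sketch only: the endpoint paths (/papers, /authors), the use of requests, and the bearer-token authentication are assumptions, not the actual cs-insights-backend API.

```python
# Hypothetical sketch of a backend client; endpoint paths and payload
# shapes are assumptions, not the actual cs-insights-backend API.
import requests


class BackendClient:
    def __init__(self, base_url: str, token: str):
        self.base_url = base_url.rstrip("/")
        self.session = requests.Session()
        self.session.headers.update({"Authorization": f"Bearer {token}"})

    def create_paper(self, paper: dict) -> dict:
        # POST a new paper document to the backend.
        resp = self.session.post(f"{self.base_url}/papers", json=paper)
        resp.raise_for_status()
        return resp.json()

    def update_paper(self, paper_id: str, fields: dict) -> dict:
        # PATCH only the changed fields of an existing paper.
        resp = self.session.patch(f"{self.base_url}/papers/{paper_id}", json=fields)
        resp.raise_for_status()
        return resp.json()

    def delete_paper(self, paper_id: str) -> None:
        # DELETE a paper that no longer appears in the d3 dump.
        self.session.delete(f"{self.base_url}/papers/{paper_id}").raise_for_status()

    # Authors would follow the same create/update/delete pattern
    # against an /authors resource.
```

Whether updates go through a REST client like this or through a bulk mongoimport run in merge/upsert mode (as in the docs linked above) is a design decision for this issue.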

jpwahle added the Epic (Larger stories of issues with sub-issues) and enhancement (Pull Request: A new feature) labels and removed the Epic label Nov 2, 2021
jpwahle self-assigned this Nov 6, 2021

github-actions bot commented Nov 6, 2021

Branch issue-70 created!

jpwahle assigned alexandertv and unassigned jpwahle Nov 16, 2021
jpwahle assigned jpwahle and unassigned alexandertv Nov 21, 2021

jpwahle (Owner) commented Aug 23, 2022

The final layer missing here is the automatic update of the backend.
One of the main issues is that more than 6 million queries would have to be sent to the backend just to check whether each paper already exists and needs to be updated or inserted.

One solution would be to hash all papers and let the backend return a list of all hashes, which the crawler can compare against without sending any further requests. The crawler can then decide what to update or insert, which results in only a few requests per update.
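
A rough sketch of that hash-comparison idea (hypothetical: the /papers/hashes endpoint, the hashed fields, and the paper id key are all assumptions):

```python
# Hypothetical sketch of the hash-based diffing idea; the /papers/hashes
# endpoint and the hashed fields are assumptions, not an existing API.
import hashlib
import json

import requests


def paper_hash(paper: dict) -> str:
    # Hash only the fields that matter for detecting changes.
    relevant = {k: paper.get(k) for k in ("title", "venue", "year", "authors")}
    blob = json.dumps(relevant, sort_keys=True, default=str)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def plan_sync(local_papers: list[dict], backend_url: str) -> tuple[list, list]:
    # One bulk request: the backend returns {paper_id: hash} for everything it stores.
    remote_hashes = requests.get(f"{backend_url}/papers/hashes").json()

    to_insert, to_update = [], []
    for paper in local_papers:
        h = paper_hash(paper)
        remote = remote_hashes.get(paper["id"])
        if remote is None:
            to_insert.append(paper)   # unknown to the backend
        elif remote != h:
            to_update.append(paper)   # known but changed
        # equal hashes: nothing to send
    return to_insert, to_update
```

With something like this, the backend answers a single bulk request for the hashes, and the crawler only sends the papers in to_insert/to_update instead of one existence check per paper.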

jpwahle mentioned this issue Aug 23, 2022

trannel (Contributor) commented Aug 23, 2022

So far we have used the code I created on the branch https://github.com/gipplab/cs-insights-crawler/tree/data-upload-full, in the file upload/d3_full.py. There might be some helpful things in there for this issue.

jpwahle changed the title from "Implement backend client" to "Implement automated storing to db/backend" Sep 12, 2022
jpwahle assigned muhammadtalha242 and unassigned jpwahle Sep 18, 2023

jpwahle (Owner) commented Sep 18, 2023

@muhammadtalha242 Is the new data ingestion through SemanticScholar ready yet?

Projects
Status: 🏗 In progress