Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/DBLP abstract extraction process change #1993

Merged
merged 23 commits into from
Apr 1, 2024

Conversation

xkopenreview
Copy link
Contributor

this pr should update the process function

  • DBLP.org/-/Record to add abstract (if html is available and the abstract is extracted) when a paper is imported
  • DBLP.org/-/Abstract to allow manual triggering of abstract extraction

@xkopenreview xkopenreview marked this pull request as draft January 24, 2024 18:57
@@ -0,0 +1,28 @@
async function process(client, edit, invitation) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the abstract invitation should have a process function to extract the abstract?

This is invitation is being to set the abstract value

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to allow manual triggering of abstract extraction when the process function in DBLP.org/-/Record fail

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but the abstract in the edit would be empty or what?

if the edit has a value, do you overwrite it?

and it is seems to be an infinite loop here, you are posting an edit with the same invitation of the process function.


const html = note.content.html?.value;

if (html) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is great! should we do the same to extract the PDF link?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it necessary to extract the PDF link?
from what i understand it's just the redirected url of html, not the link to a .pdf file

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure but Andrew wants to complete the PDF value with the link to the PDF not to a webpage.

@xkopenreview
Copy link
Contributor Author

to test importing many dblp papers in v2, existing notes in v1 need to be removed (so that there's no match)
the notes index needs to be created again so that the note can't be found in elasticsearch and will be treated as a new note

@melisabok
Copy link
Member

can you try Andrew's profile? there are a lot of new publications that were not imported or we can pick another profile that doesn't exist in the dev site and create it.

@melisabok melisabok marked this pull request as ready for review March 8, 2024 21:44
@xkopenreview
Copy link
Contributor Author

discussed with @carlosmondra again about this and decided to:

  1. process function call Tools.extractAbstract(html url of dblp paper)
  2. Tools.extractAbstract is a wrapper of fetch which make the call to cloud function url+html url of dblp paper and return json object (which contains abstract,pdf,error)
  3. cloud function url is set as an environment var in image of cloud function and Tools.extractAbstract will read that environment variable

so that there's no dependency between meta-extraction package and api and both fetch and cloud function url are not exposed.

@melisabok
Copy link
Member

In order the test to pass we may need to mock the service or something similar.

Maybe if extractAbstract can not reach the cloud function, we shouldn't throw an error?

@xkopenreview
Copy link
Contributor Author

discussed with Carlos again about this
decided to add the cloud function url in openreview-js

@xkopenreview
Copy link
Contributor Author

the test failure looks random to me
the following tests which circleci was reporting error are passed in my local

  • test_icml_conference.py
  • test_venue_request_v2
  • test_venue_request_v2

@melisabok can you help to rerun the test

@melisabok
Copy link
Member

checking now.

@melisabok
Copy link
Member

Some API 1 tests are failing because I think time.sleep(0.5) is not enough to wait for the button to be clickable. I increased it.

@melisabok
Copy link
Member

A try/catch was missing in the abstract process function, tests should pass now.

@melisabok melisabok merged commit fd22f09 into master Apr 1, 2024
1 check passed
@melisabok melisabok deleted the feature/dblp-abstract-process branch April 1, 2024 19:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants