
Improve callout to reference match #614

Merged: kermitt2 merged 1 commit into kermitt2:master from mash:callout-to-reference-author-match on Aug 2, 2020


mash (Contributor) commented Aug 2, 2020

This PR improves callout-to-reference matching by removing the parentheses that often appear in callouts.

As seen in the docs, callouts often include parentheses.
https://grobid.readthedocs.io/en/latest/training/fulltext/

I have seen many cases where these parentheses prevent a callout from being matched to the correct reference.
See the red boxes in this screenshot, taken when processing arxiv.org/pdf/1801.01290.pdf:

[Screenshot, 2020-08-02: callouts outlined in red boxes]
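To illustrate why stray parentheses break matching, here is a toy sketch in Python. It is not GROBID's code (GROBID is written in Java). The helpers naive_tokens, normalized_tokens and matches, and the callout "(Smith et al., 2019)", are all hypothetical. The sketch assumes a simple token-based author-year check: when the callout is split on whitespace and commas, the parentheses stay glued to the surname and the year, so neither token matches the reference.

```python
import re

# Hypothetical helpers for illustration only; not GROBID's actual implementation.

def naive_tokens(callout: str) -> list[str]:
    # Split on commas and whitespace only: parentheses stay attached to tokens.
    return callout.replace(",", " ").split()

def normalized_tokens(callout: str) -> list[str]:
    # Remove parentheses first (the idea in this PR), then split the same way.
    cleaned = re.sub(r"[()]", " ", callout)
    return cleaned.replace(",", " ").split()

def matches(tokens: list[str], surname: str, year: int) -> bool:
    # Toy author-year check: both the surname and the year must appear as tokens.
    return surname in tokens and str(year) in tokens

callout = "(Smith et al., 2019)"  # hypothetical callout
print(naive_tokens(callout))                                # ['(Smith', 'et', 'al.', '2019)']
print(matches(naive_tokens(callout), "Smith", 2019))       # False
print(matches(normalized_tokens(callout), "Smith", 2019))  # True
```

Stripping the parentheses before tokenizing is enough for "Smith" and "2019" to become clean tokens, which is the kind of failure the red boxes show.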

coveralls commented:

Coverage remained the same at 38.434% when pulling 11cbbcf on mash:callout-to-reference-author-match into ff10968 on kermitt2:master.

kermitt2 (Owner) commented Aug 2, 2020

Many thanks @mash! Very good finding. For some reason I thought the parentheses were removed somewhere upstream in the process, but you're absolutely right.

In addition to your example PDF, here is the evaluation of callout resolution on the PMC set (1,943 PDFs, 139,835 reference callouts), on branch update_header without your PR:

Precision citation contexts:     81.32
Recall citation contexts:        69.13
fscore citation contexts:        74.73

and with your PR:

Precision citation contexts:     81.41
Recall citation contexts:        69.86
fscore citation contexts:        75.2

Note that many references in PMC use numerical reference markers, so the positive impact should be much larger on arXiv, for instance.
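As a quick sanity check on the numbers above, the sketch below recomputes the F-score from the reported precision and recall. It assumes the reported "fscore" is the standard F1, i.e. the harmonic mean of precision and recall. The function name f_score is hypothetical.

```python
def f_score(precision: float, recall: float) -> float:
    # F1: harmonic mean of precision and recall (values given in percent).
    return 2 * precision * recall / (precision + recall)

before = {"precision": 81.32, "recall": 69.13}  # without the PR
after = {"precision": 81.41, "recall": 69.86}   # with the PR

f_before = f_score(**before)
f_after = f_score(**after)
print(f"F-score without PR: {f_before:.2f}")
print(f"F-score with PR:    {f_after:.2f}")
print(f"Recall gain:        {after['recall'] - before['recall']:.2f} points")
```

Recomputed from the rounded inputs, this gives 74.73 and 75.19. The 75.2 reported above presumably comes from unrounded precision and recall. Most of the gain comes from recall (+0.73 points), which is consistent with previously unmatched callouts now being resolved.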

kermitt2 merged commit cf446ec into kermitt2:master on Aug 2, 2020
mash deleted the callout-to-reference-author-match branch on August 3, 2020 at 07:38