Results seem to be vastly improved by first parsing citation with GROBID #25
Comments
Hello @bfirsh ! Thanks a lot for the interest in glutton! These are all good remarks and from this I think I need to iterate on the documentation to make things clearer for the users. The logic of the Basically if you only have the full raw bibliographical string, which is your case I think, the best is to pass it to glutton and configure a GROBID service to be used by glutton. In the config file
ok it should be called This way of calling glutton is close to what you are doing in your code, but glutton might use more metadata to speed up the matching (look-up with Normally if the raw reference string is passed alone (without
Interesting -- I am running it with grobid and I have configured the grobidPath. I know it can talk to grobid, because it refuses to resolve at all if it isn't connected to grobid (it says "you need either grobid or pass firstAuthor" or something along those lines). It's definitely calling grobid too -- I can see it in grobid's logs. Even still, I get much better results by parsing first and then passing Maybe something is broken, but it is swallowing the real error?
I ran some tests on glutton's use of GROBID and the calls work fine: if GROBID is running and the config points to the running service, you should not get the error "Post-validation not possible, no title/first author provided for validation and GROBID is not available." However, there was a bug in how the GROBID response was parsed: the same parser instance was reused for each GROBID response, resulting in wrong metadata after the second call. That might be the reason for the loss of accuracy. It's fixed in the current master.
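For readers unfamiliar with this class of bug, here is a minimal Python sketch of how reusing one stateful parser across responses can leak metadata from an earlier call into later ones. It is an illustration only, not glutton's actual (Java) code: the `CitationParser` class, the `key=value` response format, and both helper functions are hypothetical.

```python
class CitationParser:
    """Hypothetical stateful parser that collects fields from 'key=value;...' text."""

    def __init__(self):
        self.fields = {}

    def parse(self, response):
        for token in response.split(";"):
            key, _, value = token.partition("=")
            # setdefault keeps the first value ever seen for a key, so state
            # left over from an earlier response wins when the parser is reused.
            self.fields.setdefault(key.strip(), value.strip())
        return dict(self.fields)


def parse_all_shared(responses):
    # Buggy pattern: one parser instance reused for every response.
    parser = CitationParser()
    return [parser.parse(r) for r in responses]


def parse_all_fresh(responses):
    # Fixed pattern: a new parser instance per response.
    return [CitationParser().parse(r) for r in responses]
```

With the shared instance, the second response comes back with the first response's title and author; with a fresh instance per response, each result reflects its own input.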
Looks like it was that bug. I updated to upstream master and this seems to be fixed. Thanks! :)
I think this is what you're getting at in #13 and #21, but I figure it'd be useful to share my specific experience with using biblio-glutton.
I'm using biblio-glutton to add citation links to https://www.arxiv-vanity.com/. It works really well -- thank you!
The implementation is a bit weird though. I would have thought I would just be able to do
/service/lookup?biblio=...
and be done with it. That gave me very few positive results though -- I think I tried several papers and only got perhaps 5 positive matches.
I found I got much better results (most citations working, almost all on some papers) by first parsing the citation with GROBID, then passing atitle and firstAuthor to biblio-glutton. I am getting almost perfect results -- I haven't seen a false positive yet.
Here's the lookup code I'm using, if you're interested. You can see the high-level logic at the bottom where it first does a grobid call, then a biblio-glutton call.
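The two-step flow described above can be sketched roughly as follows. This is not the linked lookup code. It is a minimal Python sketch that leaves out the HTTP calls and only shows how the parsed fields feed the lookup. It assumes GROBID's citation parsing returns a TEI `biblStruct` with the article title in a `title` element with `level="a"` and the first author's name in the first `surname` element. The base URL and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed location of the biblio-glutton service; adjust to your deployment.
GLUTTON_BASE = "http://localhost:8080"


def _local(tag):
    """Strip an XML namespace prefix such as '{http://...}title' -> 'title'."""
    return tag.rsplit("}", 1)[-1]


def extract_title_and_first_author(tei_xml):
    """Pull the article title and first surname out of a TEI citation (assumed layout)."""
    root = ET.fromstring(tei_xml)
    title = None
    surname = None
    for el in root.iter():  # document order, so the first surname found is kept
        name = _local(el.tag)
        if name == "title" and title is None and el.get("level") == "a":
            title = "".join(el.itertext()).strip()
        elif name == "surname" and surname is None:
            surname = "".join(el.itertext()).strip()
    return title, surname


def build_lookup_url(raw_citation, title=None, first_author=None):
    """Prefer atitle + firstAuthor when both were parsed; otherwise fall back to biblio."""
    if title and first_author:
        params = {"atitle": title, "firstAuthor": first_author}
    else:
        params = {"biblio": raw_citation}
    return f"{GLUTTON_BASE}/service/lookup?{urlencode(params)}"
```

In use, the TEI returned by GROBID for a raw citation would be passed to `extract_title_and_first_author`, and the resulting URL fetched with any HTTP client. Falling back to `biblio` keeps the lookup working when GROBID fails to extract a title or author.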
This simple improvement makes me wonder -- why doesn't biblio-glutton do this internally? Am I doing something stupid?