Add REST service for mixed matching approach #12
Comments
I am doing a new iteration on this. The "super" matching service would work as follows:
The logic would be the following:
Validation takes place if
Ideally, it should work with PMID, PMCID or ISTEX_ID instead of DOI (any "strong" identifier).
Point 4 is not clear: what should happen if the post-validation fails? Why should it be re-validated with the author/title extracted from the full string? Shouldn't we try to extract them with Grobid and try to match as in (2)? Second, are you sure you want to change
About point 4), I think it is clear, but I can rephrase with more details:
In 2) we use the provided metadata only; Grobid is not called. So it's a different setting, with higher priority since it does not involve costly Grobid parsing.
Yes, because without post-validation, the false positives due to full reference string matching will kill the accuracy. Parsing the reference is a way to allow this post-validation even when no metadata is provided, so it is something to exploit as much as possible.
What does that mean? If the title is not present it is ignored, and if it is present it is considered for the post-validation, while if the author is not present the post-validation will fail? Is that correct?
So this is only for step 4): if the title is not provided as metadata but the author is, we can post-validate with the author alone. My observation is that in most cases this is enough, but it might require more tests. This is an acceptable trade-off (there is also a good chance that the title is not present in the raw bibliographic reference anyway). Of course, if both title and author are provided, we use both. If the post-validation with author only passes, we are done: success. If the post-validation with author only fails, we are also done: 404. There is no additional Grobid reference parsing in this case (this is the trade-off).
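To make the author-only trade-off concrete, here is a rough sketch in Python. It is not the actual service code: the function names, the `candidate` dictionary layout, the similarity threshold and the use of `difflib` for fuzzy comparison are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical threshold; the real service may use a different measure entirely.
SIMILARITY_THRESHOLD = 0.8


def similar(a: str, b: str) -> bool:
    """Case-insensitive fuzzy comparison of two strings (illustrative only)."""
    ratio = SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()
    return ratio >= SIMILARITY_THRESHOLD


def post_validate_step4(provided_author: str,
                        provided_title: Optional[str],
                        candidate: dict) -> bool:
    """Post-validate a candidate record using provided metadata only.

    The author is always checked. The title is checked only when it was
    provided; a missing title is ignored rather than treated as a failure.
    """
    if not similar(provided_author, candidate.get("author", "")):
        return False
    if provided_title is not None:
        return similar(provided_title, candidate.get("title", ""))
    return True


def step4_status(provided_author: str,
                 provided_title: Optional[str],
                 candidate: dict) -> int:
    """Success if post-validation passes, 404 otherwise; no extra parsing."""
    ok = post_validate_step4(provided_author, provided_title, candidate)
    return 200 if ok else 404
```

With a candidate such as `{"author": "Smith", "title": "Deep learning for citation matching"}`, a request carrying only the author `smith` passes, while a request carrying the author `Jones` ends in a 404 without any further reference parsing.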
Another question, back to point 1 (but valid every time): if the author is not present and postvalidate = true, it's a 404, right? If this is the case, this might be a problem for DOI lookup, because if post-validation is not disabled it will return 404.
Hmm, if the author is not present, we don't post-validate in (1); we basically go on to (3), and if we reach (4) we parse the raw reference string to try to get an author. There is no "author is not present and postvalidate = true" case possible in the whole process. The availability of at least one author name (provided or extracted) is a condition for post-validating.
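The rule "an author name, provided or extracted, is a precondition for post-validating" could be sketched roughly as below. This is only an illustration: `resolve_author`, `can_post_validate` and the `parse_reference` callable (a stand-in for the Grobid reference parsing call) are hypothetical names, not the project's API.

```python
from typing import Callable, Optional


def resolve_author(provided_author: Optional[str],
                   raw_reference: Optional[str],
                   parse_reference: Callable[[str], dict]) -> Optional[str]:
    """Return an author name to post-validate with, or None if unavailable.

    A provided author wins and avoids the costly parsing. Otherwise the raw
    reference string is parsed (hypothetical stand-in for Grobid) to try to
    extract one.
    """
    if provided_author:
        return provided_author
    if raw_reference:
        parsed = parse_reference(raw_reference)
        return parsed.get("author") or None
    return None


def can_post_validate(provided_author: Optional[str],
                      raw_reference: Optional[str],
                      parse_reference: Callable[[str], dict]) -> bool:
    """Post-validation is attempted only when some author name is available."""
    author = resolve_author(provided_author, raw_reference, parse_reference)
    return author is not None
```

The parser is passed in as a callable so that the provided-metadata path never triggers parsing, which mirrors the priority given above to the cheaper setting.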
So done with PR #18
Finally, the mixed matching approach has been removed in version
From GROBID we usually have both the raw full reference string and the parsed extracted fields (authors and title).
In addition to what is already there, it would be nice to have as input to the matching service:
Post-validation is necessary to avoid false positives due to the search-based step (which is normally only the blocking step).