-
Notifications
You must be signed in to change notification settings - Fork 439
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
keep the extracted affiliations if none found from consolidation. #563
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be possible to rename bibo
and bib
with more appropriate names?
Might be a lot of work but this class deserves to have a unit test perhaps :-)
See my other comment below ;-)
// we have a match (initial) | ||
if ( StringUtils.isNotBlank(aut2.getFirstName()) && StringUtils.isNotBlank(aut.getFirstName()) | ||
&&( aut.getFirstName().equals(aut2.getFirstName()) || | ||
((aut.getFirstName().length() == 1) && (aut.getFirstName().equals(aut2.getFirstName().substring(0,1)))))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about the initial is with a dot? so it has a length of 2?
What about the initial is present in the aut2
instead of aut
? shouldnt you check that condition too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes actually we should compare the initials for the firstnames either for aut2
or aut
, , so the check would be either they are equal or they have the same initial letter in their firstname , WDYT ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the check (whch was like this before, so I might be wrong) first check if aut
has length 1, and if so it compares aut
with the first character of aut2
what if aut
does not have length 1, but aut2
is just the initial?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes I didn't change this check actually, it was like this..
basically if it is not the same firstname or aut
firstname has more than one character then it 's not considered the same author form
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what I suggest is for firstnames to compare only the first char (initials)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
I noticed another thing, at line 4205, there is bib.setFullAuthors(bibo.getFullAuthors());
so the authors are copied anyway from the second biblio... or should I better be in bed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a couple of tests, there is an ignored one that is failing and I'm wondering what is the rule here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes the data about authors coming from consolidation is considered more accurate than the parsed one,
in this PR I would like to keep the affiliations if none fetched from the consolidation which seems to be more frequent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
regarding the test, so far the affiliations from crossref are just raw and not deserialised, so maybe we could keep the affiliations parsed from the pdf ? I'm not sure how good are the affiliations from crossref actually
So as explained on Mattermost, I updated this part with the new Person object deduplication, and this part becomes simply now: if (bibo.getFullAuthors() != null) {
if (CollectionUtils.isEmpty(bib.getFullAuthors()))
bib.setFullAuthors(bibo.getFullAuthors());
else {
// we have the complete list of authors so we can take them from the second
// biblio item and merge some possible extra from the first when a match is
// reliable
List<Person> thePersons = bib.getFullAuthors();
thePersons.addAll(bibo.getFullAuthors());
Person.deduplicate(thePersons);
}
} and the deduplication handles the merging of affiliation. |
With this approach, it works fine when authors are well recognized initially, but when the authors are badly extracted from the PDF, we will have as authors at the same time the erroneous extraction and the correct consolidated ones. So maybe this could be refined to prioritize the consolidated authors? |
ok too many authors added with this approach, I'll try to find something safer... |
No description provided.