Set redirect and delete old entry when entries were merged #117

Open
acka47 opened this issue Jun 8, 2018 · 14 comments

Contributor

@acka47 acka47 commented Jun 8, 2018

Requested via email.

On 08.06.2018 12:03, P.R. wrote:

A problem that has come up a few times is the merging of duplicate individualized person records in the GND: for example 1090750048 and 111508401. Ideally for me, a query for the retired record (here 1090750048) would show a redirect to the continued record (that is, 111508401).

Currently, we still have both entries in lobid-gnd, so deletions may not be taken into account in the update workflow right now.

The merged record (111508401) contains the ID of the deleted record, both in oldAuthorityNumber and in deprecatedUri:

{
  "@context":"http://lobid.org/gnd/context.jsonld",
  "id":"http://d-nb.info/gnd/111508401",
  "oldAuthorityNumber":[
    "(DE-588a)111508401",
    "(DE-588)1090750048"
  ],
  "deprecatedUri":[
    "http://d-nb.info/gnd/1090750048"
  ]
}

There are two values in oldAuthorityNumber and only one in deprecatedUri. I guess deprecatedUri only reflects changes made after the start of the Linked Data GND service, while oldAuthorityNumber also returns old IDs from before that. (This would also explain why there are ~400k entries with deprecatedUri but more than 9 million entries with oldAuthorityNumber.)

For implementing this feature I think it is sufficient to work with the deprecatedUri field and I suggest the following:

  • When someone looks up an entry directly, we check whether the requested GND ID matches a value in deprecatedUri of any record.
  • If yes, the client is redirected to the resource containing the deprecatedUri entry.
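
A minimal sketch of the two steps above, in Python for brevity. The in-memory `RECORDS` dict, the `build_redirects` and `resolve` helpers, and the returned status/target pair are all hypothetical; this issue does not show how lobid-gnd actually implements the lookup. The sample data is the merged record quoted above.

```python
from typing import Optional, Tuple

GND_PREFIX = "http://d-nb.info/gnd/"

# Hypothetical in-memory stand-in for the index, keyed by GND ID.
RECORDS = {
    "111508401": {
        "id": GND_PREFIX + "111508401",
        "deprecatedUri": [GND_PREFIX + "1090750048"],
    },
}


def build_redirects(records: dict) -> dict:
    """Map each deprecated GND ID to the ID of the record that lists it."""
    redirects = {}
    for gnd_id, record in records.items():
        for uri in record.get("deprecatedUri", []):
            # The GND ID is the last path segment of the deprecated URI.
            redirects[uri.rsplit("/", 1)[-1]] = gnd_id
    return redirects


def resolve(gnd_id: str, records: dict) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, target ID) for a direct lookup of gnd_id."""
    target = build_redirects(records).get(gnd_id)
    if target is not None:
        # Deprecated ID: redirect to the merged record (status code assumed).
        return 301, target
    if gnd_id in records:
        return 200, gnd_id
    return 404, None
```

Checking deprecatedUri before the direct lookup means the redirect takes effect even while the old entry is still present in the index.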

Nonetheless, we have to get rid of the deleted entry on a regular basis and not only when indexing a new base dump. Otherwise, search results will contain deprecated entries.

Contributor Author

@acka47 acka47 commented Jun 8, 2018

I just took a look at the many values in oldAuthorityNumber. They obviously stem from the time before the GND, when separate authority files existed for corporate bodies (Körperschaften), persons (Personen), subject headings (Schlagwörter) etc. If I only search for entries with "(DE-588)" in this field (and not DE-588a, DE-588b, DE-588c for the old deprecated ones), I get exactly the same number of records as when querying for deprecatedUri:

http://lobid.org/gnd/search?q=oldAuthorityNumber%3A%22%28DE-588%29%22
vs.
http://lobid.org/gnd/search?q=_exists_%3AdeprecatedUri
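
To illustrate the observation, here is a small Python sketch that keeps only the `(DE-588)` values of oldAuthorityNumber and turns them into URIs. The `deprecated_uris_from_old_numbers` helper is hypothetical, and the assumption that the two fields always correspond this way is only suggested by the matching counts, not confirmed.

```python
import re

# Matches exactly the "(DE-588)" prefix, not "(DE-588a)", "(DE-588b)", etc.
GND_NUMBER = re.compile(r"^\(DE-588\)(?P<id>[0-9X-]+)$")


def deprecated_uris_from_old_numbers(old_numbers):
    """Derive http://d-nb.info/gnd/ URIs from '(DE-588)' entries only."""
    uris = []
    for value in old_numbers:
        match = GND_NUMBER.match(value)
        if match:
            uris.append("http://d-nb.info/gnd/" + match.group("id"))
    return uris


# Field values taken from the merged record quoted earlier in this issue.
record = {
    "oldAuthorityNumber": ["(DE-588a)111508401", "(DE-588)1090750048"],
    "deprecatedUri": ["http://d-nb.info/gnd/1090750048"],
}
derived = deprecated_uris_from_old_numbers(record["oldAuthorityNumber"])
```

For this one record, `derived` equals the record's deprecatedUri list, which is consistent with the two queries returning the same count.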

Member

@fsteeg fsteeg commented Jun 13, 2018


@fsteeg fsteeg assigned acka47 and unassigned fsteeg Jun 13, 2018
Contributor Author

@acka47 acka47 commented Jun 13, 2018

+1 for the redirect. Shall I open another issue for removing deleted entries? The entries are still delivered via the API when querying (e.g. http://lobid.org/gnd/search?q=Erdmann%2C+Elisabeth+von).


@acka47 acka47 removed their assignment Jun 13, 2018
@fsteeg fsteeg changed the title Set redirect when entries were merged Set redirect and delete old entry when entries were merged Jun 14, 2018
@fsteeg fsteeg self-assigned this Jun 14, 2018
@fsteeg fsteeg added working and removed review labels Jun 14, 2018
Member

@fsteeg fsteeg commented Jun 14, 2018

Shall I open another issue for removing deleted entries

I'll continue with the deletion here.

Member

@fsteeg fsteeg commented Jun 15, 2018


@fsteeg fsteeg assigned acka47 and unassigned fsteeg Jun 15, 2018
@fsteeg fsteeg added review and removed working labels Jun 15, 2018
Contributor Author

@acka47 acka47 commented Jun 18, 2018

Nice. +1

Member

@fsteeg fsteeg commented Jun 18, 2018

Contributor Author

@acka47 acka47 commented Jun 18, 2018

I wrote and sent the email. Closing.


@acka47 acka47 closed this Jun 18, 2018
@acka47 acka47 reopened this Jun 22, 2020
Contributor Author

@acka47 acka47 commented Jun 22, 2020

I am not sure whether this is still working as expected. From a discussion at Wikidata (see https://www.wikidata.org/wiki/Talk:Q567#GND) I became aware of these two records for Angela Merkel:

At the DNB, 1210121425 redirects to the canonical entry, see https://d-nb.info/gnd/1210121425. The deprecatedUri and oldAuthorityNumber statements are also present in the canonical entry:

<https://d-nb.info/gnd/119545373> gndo:gndIdentifier "119545373";
  gndo:oldAuthorityNumber "(DE-588)1210121425";
  owl:sameAs <https://d-nb.info/gnd/1210121425>;
  dnbt:deprecatedUri "https://d-nb.info/gnd/1210121425" .

So it looks like the implementation to delete entries that are marked deprecated in other entries and to set a redirect does not work in this case.


@fsteeg fsteeg self-assigned this Jun 22, 2020
Contributor Author

@acka47 acka47 commented Sep 30, 2020

The problem from the previous comment with the two entries for Angela Merkel is solved. Maybe the cause was a missed update, which was fixed by loading a new full dump and won't happen again because of #268.

However, @LibrErli noticed the problem with our incomplete implementation (setting a redirect without deleting the entry for the redirected URI). Quoting from https://openbiblio.social/@librerli/104950423916517684:

https://lobid.org/gnd/search?q=alphons+danzer returns two results, which are duplicates. I reported this to the GND editorial team (without checking again on the DNB site). There I was shown that the records were merged in July. 1055248781 is a reference in 127701818

While the search delivers two results, the redirect works as expected:

$ curl -IL https://lobid.org/gnd/1055248781
HTTP/1.1 301 Moved Permanently
Date: Wed, 30 Sep 2020 06:45:29 GMT
Server: Apache/2.4.10 (Linux/SUSE)
Location: /gnd/127701818
Access-Control-Allow-Origin: *

HTTP/1.1 200 OK
Date: Wed, 30 Sep 2020 06:45:29 GMT
Server: Apache/2.4.10 (Linux/SUSE)
Access-Control-Allow-Origin: *
Content-Type: application/json
Content-Length: 3940
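
The inconsistency can be spotted mechanically: a hit is stale if another record lists its URI in deprecatedUri. A minimal Python sketch follows. The `stale_hits` helper and the hand-written hit list are hypothetical: the real search response has more fields, and it is an assumption that record 127701818 carries a deprecatedUri value in the d-nb.info form shown here.

```python
def stale_hits(hits):
    """Return IDs of hits that some other hit marks as deprecated."""
    deprecated = {uri for hit in hits for uri in hit.get("deprecatedUri", [])}
    return [hit["id"] for hit in hits if hit["id"] in deprecated]


# Hand-written stand-in for the two results of q=alphons+danzer (assumed shape).
hits = [
    {
        "id": "https://d-nb.info/gnd/127701818",
        "deprecatedUri": ["https://d-nb.info/gnd/1055248781"],
    },
    {"id": "https://d-nb.info/gnd/1055248781"},
]

stale = stale_hits(hits)
```

This only catches a stale entry when the merged record appears in the same result set, so it is a diagnostic aid rather than a replacement for deleting the old entry from the index.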


@LibrErli LibrErli commented Sep 30, 2020

Thanks for the information that lobid already redirects from https://lobid.org/gnd/1055248781 to https://lobid.org/gnd/127701818.
Still, it could be confusing for some users (like me) that both items are shown in the result list.

Member

@fsteeg fsteeg commented Sep 30, 2020

I think this might be a deployment issue. It's possible for inconsistencies to sneak in since the deprecated URIs to be deleted are collected and stored during the data conversion, but actually deleted after the indexing. Sometimes (e.g. for #268) we do a full reindex without full conversion before. I suspect that some deletions were missed here. I'd suggest we revisit this after the next full dump conversion and indexing. If that fixes the issue, we should look into making the deletions setup more robust to avoid this kind of error in the future.
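
A toy Python model of the failure mode described above, together with one possible way to make the deletions more robust. Everything here is hypothetical: the function names, the dict used as an index, the shortened IDs, and the idea of deriving deletions from the indexed records instead of from a list stored at conversion time. It illustrates the reasoning only and is not how lobid-gnd works.

```python
def index_records(index, records):
    """Add or overwrite records in the index (a plain dict keyed by ID)."""
    for record in records:
        index[record["id"]] = record


def delete_stored(index, stored_deprecated):
    """Approach as described: delete the IDs collected during conversion."""
    for uri in stored_deprecated:
        index.pop(uri, None)


def delete_derived(index):
    """Alternative: derive deletions from what is actually in the index."""
    deprecated = {
        uri
        for record in index.values()
        for uri in record.get("deprecatedUri", [])
    }
    for uri in deprecated:
        index.pop(uri, None)


merged = {"id": "gnd/127701818", "deprecatedUri": ["gnd/1055248781"]}
old = {"id": "gnd/1055248781"}

# Full reindex without a fresh conversion: the stored deletion list is stale.
index = {}
index_records(index, [old, merged])
delete_stored(index, stored_deprecated=[])
missed = "gnd/1055248781" in index  # the old entry survives

# Deriving the deletions from the index itself removes it regardless.
delete_derived(index)
fixed = "gnd/1055248781" not in index
```

The point of the second variant is that it has no state that can fall out of sync with the index; whether that fits the actual conversion and indexing setup would need to be checked.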

Contributor Author

@acka47 acka47 commented May 11, 2021

The issue from #117 (comment) is now fixed. So, we probably "should look into making the deletions setup more robust to avoid this kind of error in the future".


@acka47 acka47 removed their assignment May 11, 2021
@acka47 acka47 added this to Backlog in lobid board via automation May 11, 2021
Member

@fsteeg fsteeg commented May 12, 2021

See also #284
