Include sameAs and depiction information from EntityFacts in lobid-gnd index #69
Comments
How will we embed the information? It gets a bit problematic as we already have a `sameAs` field. Current lobid:
{
"sameAs": [
"http://viaf.org/viaf/313478392",
"http://orcid.org/0000-0003-0232-7085"
]
}
EntityFacts:
{
"sameAs":[
{
"@id":"http://d-nb.info/gnd/1066621098/about",
"collection":{
"abbr":"DNB",
"name":"Gemeinsame Normdatei (GND) im Katalog der Deutschen Nationalbibliothek",
"publisher":"Deutsche Nationalbibliothek",
"icon":"http://www.dnb.de/SiteGlobals/StyleBundles/Bilder/favicon.png?__blob=normal&v=1"
}
},
{
"@id":"http://orcid.org/0000-0003-0232-7085",
"collection":{
"abbr":"ORCID",
"name":"Open Researcher and Contributor ID",
"publisher":"ORCID"
}
},
{
"@id":"http://viaf.org/viaf/313478392",
"collection":{
"abbr":"VIAF",
"name":"Virtual International Authority File (VIAF)",
"publisher":"OCLC",
"icon":"http://viaf.org/viaf/images/viaf.ico"
}
}
]
}
After some discussion with @fsteeg, we prefer the following approach:
{
"sameAs":[
{
"@id":"http://viaf.org/viaf/313478392"
},
{
"@id":"http://orcid.org/0000-0003-0232-7085"
}
]
}
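A minimal sketch of the conversion this approach implies, assuming the current lobid `sameAs` is a flat list of URI strings. The function name `to_same_as_objects` is hypothetical and not part of lobid-gnd:

```python
def to_same_as_objects(same_as):
    """Wrap plain URI strings in {"@id": ...} objects; leave existing objects untouched."""
    result = []
    for entry in same_as:
        if isinstance(entry, str):
            result.append({"@id": entry})
        else:
            result.append(entry)
    return result


# Example input taken from the "Current lobid" snippet above.
current = {
    "sameAs": [
        "http://viaf.org/viaf/313478392",
        "http://orcid.org/0000-0003-0232-7085",
    ]
}
converted = {"sameAs": to_same_as_objects(current["sameAs"])}
```

Using objects rather than bare strings leaves room to attach further details (such as a `collection`) to each link later without changing the field's type again.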
Another challenge is linking to Wikipedia. From GND we get the following:
{
"id":"http://d-nb.info/gnd/118634313",
"wikipedia":[
"https://de.wikipedia.org/wiki/Ludwig_Wittgenstein"
]
}
In EntityFacts we get links to the German and English Wikipedia as part of the `sameAs` array:
{
"@id":"http://d-nb.info/gnd/118634313",
"sameAs":[
{
"@id":"https://de.wikipedia.org/wiki/Ludwig_Wittgenstein",
"collection":{
"abbr":"dewiki",
"name":"Wikipedia (Deutsch)",
"publisher":"Wikimedia Foundation Inc.",
"icon":"https://de.wikipedia.org/static/favicon/wikipedia.ico"
}
},
{
"@id":"https://en.wikipedia.org/wiki/Ludwig_Wittgenstein",
"collection":{
"abbr":"enwiki",
"name":"Wikipedia (English)",
"publisher":"Wikimedia Foundation Inc.",
"icon":"https://en.wikipedia.org/static/favicon/wikipedia.ico"
}
}
]
}
One solution would be to ignore the Wikipedia links from GND and only use those from EntityFacts. This would mean losing the links for all resources that are not part of EntityFacts. I think this is ok, as only ~170 resources that are not also in EntityFacts have a Wikipedia link; see this query for resources with a Wikipedia link that are neither of type person, nor corporate body, nor PlaceOrGeographicName.
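To see which links would be lost under that option, a check along these lines could be run per record. This is only a sketch: the function name is hypothetical, and it assumes the record shapes shown in the two snippets above (`wikipedia` as a list of strings in GND, `sameAs` as a list of objects with `@id` in EntityFacts):

```python
def wikipedia_links_not_in_entityfacts(gnd_record, entityfacts_record):
    """Return GND Wikipedia links that have no matching sameAs entry in EntityFacts.

    entityfacts_record may be None when a resource is not part of EntityFacts.
    """
    ef_same_as = (entityfacts_record or {}).get("sameAs", [])
    ef_ids = {entry["@id"] for entry in ef_same_as}
    return [link for link in gnd_record.get("wikipedia", []) if link not in ef_ids]


gnd = {
    "id": "http://d-nb.info/gnd/118634313",
    "wikipedia": ["https://de.wikipedia.org/wiki/Ludwig_Wittgenstein"],
}
entityfacts = {
    "@id": "http://d-nb.info/gnd/118634313",
    "sameAs": [
        {"@id": "https://de.wikipedia.org/wiki/Ludwig_Wittgenstein"},
        {"@id": "https://en.wikipedia.org/wiki/Ludwig_Wittgenstein"},
    ],
}
lost_with_entityfacts = wikipedia_links_not_in_entityfacts(gnd, entityfacts)
lost_without_entityfacts = wikipedia_links_not_in_entityfacts(gnd, None)
```

For a record covered by EntityFacts nothing is lost; for a record without EntityFacts data, every GND Wikipedia link would be dropped.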
As discussed offline, I am also ok with keeping the (then mostly redundant) Wikipedia links in the `wikipedia` field.
Currently processing EntityFacts data integrated in our JSON data. When we last discussed this, we said we'd like to stay close to the source data, as we do for the core GND data. For the
For
Added the I think the whole Maybe a
And for
The current implementation takes the EntityFacts JSON and inserts its data into the framed, compacted GND JSON. We will have to look into the context and add some things there to keep our JSON consumable as RDF.
If we create a better structure from this, I would like not to stray too far from the source. Maybe something like this?
{
"depiction":[
{
"id":"https://commons.wikimedia.org/wiki/Special:FilePath/MarkTwain.LOC.jpg",
"url":"https://commons.wikimedia.org/wiki/File:MarkTwain.LOC.jpg?uselang=en",
"thumbnail":"https://commons.wikimedia.org/wiki/Special:FilePath/MarkTwain.LOC.jpg?width=270"
}
]
}
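As a rough illustration of how the three URLs in this structure relate to each other, the following sketch derives them from a Wikimedia Commons file name. The helper `depiction_for` is hypothetical, and the assumption that the source data provides a bare file name is mine, not something stated in this thread:

```python
from urllib.parse import quote

COMMONS = "https://commons.wikimedia.org/wiki/"


def depiction_for(file_name, width=270):
    """Build the proposed depiction object from a Commons file name."""
    # Commons uses underscores instead of spaces; percent-encode anything else unsafe.
    name = quote(file_name.replace(" ", "_"))
    file_path = f"{COMMONS}Special:FilePath/{name}"
    return {
        "id": file_path,
        "url": f"{COMMONS}File:{name}?uselang=en",
        "thumbnail": f"{file_path}?width={width}",
    }


depiction = {"depiction": [depiction_for("MarkTwain.LOC.jpg")]}
```

The `thumbnail` is simply the `id` with a `width` parameter, so a client could also compute other sizes from `id` alone.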
In this case, I'd rather keep the structure, as the icon is for the collection and not the linked thing/entry. The abbreviation etc. is also useful for some and should be included. We should probably add an `id` for the collection:
{
"id":"https://www.deutsche-digitale-bibliothek.de/entity/118624822",
"collection":{
"id":"http://d-nb.info/gnd/1070828033",
"abbr":"DDB",
"name":"Deutsche Digitale Bibliothek",
"publisher":"Deutsche Digitale Bibliothek",
"icon":"https://www.deutsche-digitale-bibliothek.de/appStatic/images/favicon.ico"
}
}
To summarize our offline discussion. Reorder fields for
Add
Add
- Tweak structure for enriched data from EntityFacts
- Add collection details for GND `sameAs` data
- Add new properties to JSON-LD context

See #69

Remove EntityFacts object and additional index lookup

See #69

- Set up dynamic template for *.id subfields
- Replace string type with text or keyword

See #69
Deployed new consistent JSON structure to test: http://test.lobid.org/gnd/search?q=depiction.id:*&format=json The Wikidata QIDs from the comment above will be in the next index (will start conversion now).
Wikidata based http://test.lobid.org/gnd/118512676.json (sameAs with EntityFacts)
Everything looks good except one minor thing:
Absolutely with you. Will try to squeeze this in at the last minute for correction in Release 2018.03 (https://wiki.dnb.de/x/wgcbBQ)! Many thanks for alerting.
@jentschk Nice, thanks.
@acka47 Deployed to test: http://test.lobid.org/gnd/context.jsonld
@jentschk provided me with a list of all collections used in EntityFacts. Looks like I have missed some in #69 (comment):
The current fallback is to use the domain of the linked resource as the collection ID (see https://test.lobid.org/gnd/125217145.json), so I think it's fine to deploy what we have here. I'll move the missing collections to a new issue and assign the pull request for this issue for review.
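A minimal sketch of such a fallback, assuming each `sameAs` entry has an `id` and an optional `collection` object. The function name `collection_id` is hypothetical, and taking the URL's hostname as "the domain" is an assumption; the deployed code may normalize it differently:

```python
from urllib.parse import urlparse


def collection_id(entry):
    """Return the collection's own ID if present, else the domain of the linked resource."""
    collection = entry.get("collection", {})
    if collection.get("id"):
        return collection["id"]
    # Fallback (assumption): use the hostname of the linked resource's URL.
    return urlparse(entry["id"]).hostname


known = {
    "id": "https://www.deutsche-digitale-bibliothek.de/entity/118624822",
    "collection": {"id": "http://d-nb.info/gnd/1070828033", "abbr": "DDB"},
}
unknown = {
    "id": "http://viaf.org/viaf/313478392",
    "collection": {"abbr": "VIAF"},
}
known_id = collection_id(known)
fallback_id = collection_id(unknown)
```

With a fallback like this, collections missing from the mapping still get a stable, if less informative, identifier until they are added properly.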
I have now added the missing links to the comment above. When doing this I noticed that the link to the Sophie digital library is broken and the icon link also gives a 404, see http://sophie.byu.edu/. @jentschk, can you please replace this in EntityFacts and use https://scholarsarchive.byu.edu/sophie/ instead? I also noticed that the links to the Vorarlberg Chronik do not work anymore and notified the submitter, see https://de.wikipedia.org/wiki/Benutzer_Diskussion:AndreasPraefcke/BEACON#Links_in_die_Voralberg-Chronik_sind_tot. It would probably be good to remove them from EntityFacts for now... Also, are there any plans for a new (and ideally regular) EntityFacts dump?
Thanks, @acka47, for alerting. Both linking targets will be removed in our next monthly "enrichment update" (should be available by Wednesday morning).
Resolves #69

See:
- http://lobid.org/gnd/118512676.json (sameAs with EntityFacts)
- http://lobid.org/gnd/1006691-3.json (sameAs without EntityFacts)
I wrote in #44:
The dump is there now (see https://data.dnb.de/opendata/) and we already set up an ES index with the data.
Reasons for including the information in the data: