Country code for Switzerland has label "Europe" in filter #256

Closed
acka47 opened this issue Jun 8, 2020 · 8 comments · Fixed by #392
Comments

@acka47
Contributor

acka47 commented Jun 8, 2020

On 08.06.20 at 09:53, E. E.-M. wrote:

In the search interface for browsing the GND in lobid-gnd, we are missing the country code for Switzerland. Would it be possible to display this facet?

The filter is there, but it has the wrong label, i.e. "Europa", see http://lobid.org/gnd/search. The incorrect label also shows up on the >160k entries with country code Switzerland, see http://lobid.org/gnd/search?filter=%2B(geographicAreaCode.id%3A%22https%3A%2F%2Fd-nb.info%2Fstandards%2Fvocab%2Fgnd%2Fgeographic-area-code%23XA-CH%22)
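The filter links in this thread follow one pattern, so a link for any other area code can be built the same way. A minimal sketch, assuming the `filter` syntax seen in the URLs here (`+(geographicAreaCode.id:"<concept URI>")`); the helper name `area_code_filter_url` is hypothetical and not part of lobid-gnd:

```python
from urllib.parse import quote

# Values taken from the links in this issue.
SEARCH = "http://lobid.org/gnd/search"
AREA_CODE_BASE = "https://d-nb.info/standards/vocab/gnd/geographic-area-code#"


def area_code_filter_url(code):
    """Build a lobid-gnd search URL filtering on one geographic area code (hypothetical helper)."""
    query = f'+(geographicAreaCode.id:"{AREA_CODE_BASE}{code}")'
    # Keep parentheses literal; '+', ':', '"', '/' and '#' are percent-encoded.
    return f"{SEARCH}?filter={quote(query, safe='()')}"


print(area_code_filter_url("XA-CH"))
print(area_code_filter_url("XA-AT-6"))
```

For `XA-CH` this reproduces the link above character for character, and for `XA-AT-6` it yields the Steiermark link quoted later in the thread.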

In the SKOS vocabulary, all the labels seem to be fine, see https://d-nb.info/standards/vocab/gnd/geographic-area-code#XA-CH (but maybe there was an error in the previous version, which I cannot access from the vocab page).

@acka47 acka47 added the bug label Jun 8, 2020
@acka47
Contributor Author

acka47 commented Jun 8, 2020

but maybe there was an error in the previous version which I can not access from the vocab page

No, there is no error. I looked it up in the actual file we copied into the GitHub repo:

<skos:prefLabel xml:lang="de">Schweiz</skos:prefLabel>

@fsteeg
Member

fsteeg commented Jun 8, 2020

I think this is what's happening: XA-CH has the de label specified after the broader entry, which itself has a de label. When processing, the label of the broader concept ('Europe') is picked (it's the first that is found) instead of the label further below. We need to be more specific when picking the label (here). Will fix with the next full dump update (see #255).
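The mechanism can be reproduced in a few lines. This is a minimal sketch with Python's stdlib ElementTree, not the project's actual processing code; the XML is a simplified, assumed excerpt (including `XA` as the URI of the embedded broader concept), and both function names are hypothetical:

```python
import xml.etree.ElementTree as ET

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BASE = "https://d-nb.info/standards/vocab/gnd/geographic-area-code#"
PREF_LABEL = f"{{{SKOS}}}prefLabel"
CONCEPT = f"{{{SKOS}}}Concept"
# xml:lang lives in the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Assumed excerpt: XA-CH embeds its broader concept *before* its own German label.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:skos="{SKOS}">
  <skos:Concept rdf:about="{BASE}XA-CH">
    <skos:broader>
      <skos:Concept rdf:about="{BASE}XA">
        <skos:prefLabel xml:lang="de">Europa</skos:prefLabel>
      </skos:Concept>
    </skos:broader>
    <skos:prefLabel xml:lang="de">Schweiz</skos:prefLabel>
  </skos:Concept>
</rdf:RDF>"""


def label_from_descendants(concept, lang="de"):
    """Buggy variant: first matching prefLabel anywhere below the concept."""
    for label in concept.iter(PREF_LABEL):
        if label.get(XML_LANG) == lang:
            return label.text
    return None


def label_from_children(concept, lang="de"):
    """Fixed variant: only prefLabel elements that are direct children."""
    for label in concept.findall(PREF_LABEL):
        if label.get(XML_LANG) == lang:
            return label.text
    return None


switzerland = ET.fromstring(SAMPLE).find(CONCEPT)
print(label_from_descendants(switzerland))  # Europa
print(label_from_children(switzerland))  # Schweiz
```

`iter()` walks descendants in document order, so the embedded broader concept's label wins; `findall()` with a plain tag name only looks at direct children.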

@acka47
Contributor Author

acka47 commented Jun 8, 2020

I think this is what's happening: XA-CH has the de label specified after the broader entry, which itself has a de label. When processing, the label of the broader concept ('Europe') is picked (it's the first that is found) instead of the label further below.

Yes, that's it. The problem occurs for every place where the broader concept is not only linked but described as an embedded node (search for <skos:broader> in the vocab to find others). Another example is Steiermark: http://lobid.org/gnd/search?filter=%2B(geographicAreaCode.id%3A%22https%3A%2F%2Fd-nb.info%2Fstandards%2Fvocab%2Fgnd%2Fgeographic-area-code%23XA-AT-6%22)
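The manual search for embedded `<skos:broader>` nodes could also be scripted. A rough sketch with Python's stdlib ElementTree; the excerpt is hypothetical (that `XA-AT-6` embeds `XA-AT`, and that `XA-DE` only links to `XA`, are assumptions for illustration), as is the function name:

```python
import xml.etree.ElementTree as ET

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BASE = "https://d-nb.info/standards/vocab/gnd/geographic-area-code#"
CONCEPT = f"{{{SKOS}}}Concept"
BROADER = f"{{{SKOS}}}broader"
ABOUT = f"{{{RDF}}}about"

# Hypothetical excerpt: one embedded broader concept, one plain link.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:skos="{SKOS}">
  <skos:Concept rdf:about="{BASE}XA-AT-6">
    <skos:broader>
      <skos:Concept rdf:about="{BASE}XA-AT"/>
    </skos:broader>
  </skos:Concept>
  <skos:Concept rdf:about="{BASE}XA-DE">
    <skos:broader rdf:resource="{BASE}XA"/>
  </skos:Concept>
</rdf:RDF>"""


def concepts_with_embedded_broader(root):
    """Return the URI of every concept whose skos:broader contains a nested skos:Concept."""
    affected = []
    # iter() also visits concepts that are themselves embedded somewhere.
    for concept in root.iter(CONCEPT):
        for broader in concept.findall(BROADER):
            if broader.find(CONCEPT) is not None:
                affected.append(concept.get(ABOUT))
                break
    return affected


print(concepts_with_embedded_broader(ET.fromstring(SAMPLE)))
```

Only `XA-AT-6` is reported; concepts that merely link their broader concept via `rdf:resource` are not affected by the label mix-up.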

@acka47
Contributor Author

acka47 commented Jun 8, 2020

Maybe better to handle the data as RDF and not plain XML...

@fsteeg
Member

fsteeg commented Jun 9, 2020

Maybe better to handle the data as RDF and not plain XML...

Right, this particular issue would not have come up if we had processed the input as RDF. The XML-based fix should be easy (don't search nested elements, only the concept's direct children). But we could use this as an opportunity to switch it all to RDF. That would mean SPARQL queries, right? The processing happens in the process and the two load* methods starting here. Writing the queries might be relatively straightforward for you, @acka47, right?

We'd also have to tweak the general processing logic a bit, which currently maps the IDs to labels while processing the data (e.g. here). Instead, a SPARQL query would return a result set, which we could then process to create the mapping. That means the query should fetch all the required labels at once. Alternatively, we could run a SPARQL query against the entire data for each label lookup, but that might slow down the (already rather long-running) transformation.
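The XML-based fix mentioned above, applied to a whole file, could look roughly like this: one pass that builds the complete ID-to-label mapping while only reading each concept's own `prefLabel`. A minimal sketch with Python's stdlib ElementTree on an assumed excerpt; `build_label_map` is a hypothetical name, not one of the project's `load*` methods, and the SPARQL alternative is not shown:

```python
import xml.etree.ElementTree as ET

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BASE = "https://d-nb.info/standards/vocab/gnd/geographic-area-code#"
CONCEPT = f"{{{SKOS}}}Concept"
PREF_LABEL = f"{{{SKOS}}}prefLabel"
ABOUT = f"{{{RDF}}}about"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Assumed excerpt with an embedded broader concept.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:skos="{SKOS}">
  <skos:Concept rdf:about="{BASE}XA-CH">
    <skos:broader>
      <skos:Concept rdf:about="{BASE}XA">
        <skos:prefLabel xml:lang="de">Europa</skos:prefLabel>
      </skos:Concept>
    </skos:broader>
    <skos:prefLabel xml:lang="de">Schweiz</skos:prefLabel>
  </skos:Concept>
</rdf:RDF>"""


def build_label_map(root, lang="de"):
    """Map each concept URI to its own prefLabel in `lang`."""
    labels = {}
    # iter() visits top-level and embedded concepts alike ...
    for concept in root.iter(CONCEPT):
        uri = concept.get(ABOUT)
        # ... but findall() only checks direct children, so a nested
        # concept's label can no longer shadow the outer concept's label.
        for label in concept.findall(PREF_LABEL):
            if label.get(XML_LANG) == lang:
                labels.setdefault(uri, label.text)
                break
    return labels


labels = build_label_map(ET.fromstring(SAMPLE))
print(labels)
```

Because embedded concepts are visited too, their labels end up in the mapping as a side effect, which matches what a "get all required labels at once" query would return.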

@fsteeg fsteeg assigned acka47 and unassigned fsteeg Dec 3, 2020
@leozachl

As a workaround, I flattened conf/geographic-area-code.rdf with this PHP script:

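#!/usr/bin/php
<?php
// Reads the RDF/XML file given as first argument and prints a flattened version:
// every concept embedded in skos:broader is moved to the top level and linked via rdf:resource.

class MySimpleXMLElement extends SimpleXMLElement {
    // Append a deep copy of $from as last child of $to (via DOM, since SimpleXML cannot move nodes).
    function sxml_append(MySimpleXMLElement $to, MySimpleXMLElement $from) {
        $toDom = dom_import_simplexml($to);
        $fromDom = dom_import_simplexml($from);
        $toDom->appendChild($toDom->ownerDocument->importNode($fromDom, true));
    }
}

// NFKC-normalize the input (requires the intl extension), then parse it.
$rdf = simplexml_load_string(normalizer_normalize(file_get_contents($argv[1]), Normalizer::FORM_KC), 'MySimpleXMLElement');
$rdf->registerXPathNamespace('skos', 'http://www.w3.org/2004/02/skos/core#');

// Every skos:broader that embeds a full skos:Concept instead of just linking to it.
foreach ($rdf->xpath('//skos:broader[skos:Concept]') as $node) {
    $Concept = $node->children('skos', TRUE);
    // Turn the embedding into a link. The namespace URI must be passed,
    // otherwise SimpleXML does not keep the "rdf:" prefix on the attribute.
    $node->addAttribute('rdf:resource', (string)$Concept->attributes('rdf', TRUE)->about, 'http://www.w3.org/1999/02/22-rdf-syntax-ns#');
    // Copy the embedded concept to the top level of the document ...
    $rdf->sxml_append($rdf, $Concept);
    // ... then remove the original from skos:broader and clear leftover whitespace.
    unset($Concept[0][0]);
    $node[0][0] = '';
}
echo $rdf->asXML();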
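For comparison, the same flattening idea as a rough sketch with Python's stdlib ElementTree (the function name and sample are hypothetical; it skips the script's Unicode normalization step). Unlike the PHP script, which copies the node with `importNode` and then unsets the original, ElementTree moves the element object itself, so `skos:broader` elements nested inside a moved concept stay in the snapshot and get flattened too:

```python
import xml.etree.ElementTree as ET

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BASE = "https://d-nb.info/standards/vocab/gnd/geographic-area-code#"
CONCEPT = f"{{{SKOS}}}Concept"
BROADER = f"{{{SKOS}}}broader"
ABOUT = f"{{{RDF}}}about"
RESOURCE = f"{{{RDF}}}resource"

# Keep the familiar prefixes when serializing.
ET.register_namespace("rdf", RDF)
ET.register_namespace("skos", SKOS)

# Assumed excerpt with an embedded broader concept.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:skos="{SKOS}">
  <skos:Concept rdf:about="{BASE}XA-CH">
    <skos:broader>
      <skos:Concept rdf:about="{BASE}XA">
        <skos:prefLabel xml:lang="de">Europa</skos:prefLabel>
      </skos:Concept>
    </skos:broader>
    <skos:prefLabel xml:lang="de">Schweiz</skos:prefLabel>
  </skos:Concept>
</rdf:RDF>"""


def flatten_broader(root):
    """Move concepts embedded in skos:broader to the top level and link them instead."""
    # Snapshot first: the loop below mutates the tree.
    for broader in list(root.iter(BROADER)):
        for nested in broader.findall(CONCEPT):
            about = nested.get(ABOUT)
            if about is None:
                continue  # leave anonymous concepts alone
            broader.remove(nested)
            broader.set(RESOURCE, about)
            root.append(nested)
        broader.text = None  # drop leftover whitespace
    return root


root = flatten_broader(ET.fromstring(SAMPLE))
print(ET.tostring(root, encoding="unicode"))
```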

acka47 added a commit that referenced this issue May 8, 2024
Use newly added country code list and Python script for creating the map.
@acka47
Contributor Author

acka47 commented May 8, 2024

@leozachl I am very sorry that this sat around for such a long time, although it is a rather simple fix. We will finally correct it now.

@acka47 acka47 assigned fsteeg and unassigned acka47 May 13, 2024
@acka47
Contributor Author

acka47 commented May 13, 2024

As discussed in today's meeting, we will try to fix the XML-based approach first and probably close #391. One reason is that this approach is used for five files, not only for the geographic area codes.

Projects
Status: Done
3 participants