Get GND labels from base data #139

Closed
acka47 opened this issue Apr 20, 2015 · 15 comments

@acka47
Contributor

acka47 commented Apr 20, 2015

Currently, we are enriching the title data with GND labels using a Hadoop job. There are at least two problems with this approach: #84 and an undocumented problem that appeared after the last morph adjustment.

To avoid these problems and reduce transformation time, we will get the labels directly out of the Aleph XML using morph.

Amongst others, we need to know:

  • How do we differentiate the different GND entity types (corporate body, subject heading, person etc.)?
  • Where do we get preferred and alternate names?
@acka47 acka47 self-assigned this Apr 20, 2015
@acka47 acka47 added the working label Apr 20, 2015
@acka47
Contributor Author

acka47 commented Apr 20, 2015

Subject headings and preferred labels are in 902, alternate labels in 952. To find out which type a GND entity is, look at the indicator of 902. From the MAB documentation:

902       ELEMENT OF THE 1ST SUBJECT HEADING CHAIN

          Indicator:
          p     = personal name subject heading
          g     = geographic-ethnographic subject heading
          s     = topical subject heading
          k     = corporate body subject heading: entered under the
                  individual name
          c     = corporate body subject heading: entered under the
                  place of its seat
          z     = chronological subject heading
          f     = form subject heading
          t     = work title as subject heading
          blank = subordinate heading of a heading chain

@acka47
Contributor Author

acka47 commented Apr 20, 2015

Example 1 (without contributor and with only one subject heading type): http://lobid.org/resource/HT010726584

The desired outcome is to have the preferred names associated with the GND objects as usual, and the alternate names along with the preferred names in the field subjectLabel to allow querying by all labels:

{
  "@graph" : [ {
    "@id" : "http://d-nb.info/gnd/4046259-6",
    "preferredName" : "Plasmaphysik",
    "preferredNameForTheSubjectHeading" : "Plasmaphysik"
  }, {
    "@id" : "http://d-nb.info/gnd/4067488-5",
    "preferredName" : "Zeitschrift",
    "preferredNameForTheSubjectHeading" : "Zeitschrift"
  }, {
    "@id" : "http://d-nb.info/gnd/4511937-5",
    "preferredName" : "Online-Publikation",
    "preferredNameForTheSubjectHeading" : "Online-Publikation"
  }, {
    "@id" : "http://dewey.info/class/530/",
    "prefLabel" : [ {
      "@language" : "en",
      "@value" : "Physics"
    }, {
      "@language" : "de",
      "@value" : "Physik"
    } ]
  }, {
    "@id" : "http://lobid.org/resource/HT010726584",
    ...
    "subject" : [ "http://d-nb.info/gnd/4067488-5", "http://dewey.info/class/530/", "http://d-nb.info/gnd/4046259-6", "http://d-nb.info/gnd/4511937-5" ],
    "subjectLabel" : [ "On-line-Dokument", "Online-Dokument", "On-line-Publikation", "Online-Ressource", "Computerdatei im Fernzugriff (Formschlagwort)", "Netzpublikation", "Zeitschriften", "Online-Datenbank (Formschlagwort)", "Periodikum", "On-line-Datenbank (Formschlagwort)" ],
   ...
   } ]
...
}

Aleph XML (snippet):

...
<datafield tag="902" ind1="-" ind2="1">
    <subfield code="s">Plasmaphysik</subfield>
    <subfield code="9">(DE-588)4046259-6</subfield>
</datafield>
<datafield tag="902" ind1="-" ind2="1">
    <subfield code="s">Zeitschrift</subfield>
    <subfield code="9">(DE-588)4067488-5</subfield>
</datafield>
<datafield tag="902" ind1="-" ind2="1">
    <subfield code="s">Online-Publikation</subfield>
    <subfield code="9">(DE-588)4511937-5</subfield>
</datafield>
...
<datafield tag="952" ind1="-" ind2="1">
    <subfield code="s">Computerdatei im Fernzugriff</subfield>
    <subfield code="h">Formschlagwort</subfield>
</datafield>
<datafield tag="952" ind1="-" ind2="1">
    <subfield code="s">Online-Datenbank</subfield>
    <subfield code="h">Formschlagwort</subfield>
</datafield>
<datafield tag="952" ind1="-" ind2="1">
    <subfield code="s">Online-Dokument</subfield>
</datafield>
<datafield tag="952" ind1="-" ind2="1">
    <subfield code="s">On-line-Datenbank</subfield>
    <subfield code="h">Formschlagwort</subfield>
</datafield>
<datafield tag="952" ind1="-" ind2="1">
    <subfield code="s">On-line-Dokument</subfield>
</datafield>
<datafield tag="952" ind1="-" ind2="1">
    <subfield code="s">Online-Ressource</subfield>
</datafield>
<datafield tag="952" ind1="-" ind2="1">
    <subfield code="s">On-line-Publikation</subfield>
</datafield>
<datafield tag="952" ind1="-" ind2="1">
    <subfield code="s">Netzpublikation</subfield>
</datafield>

The implementation looks quite straightforward. For subjectLabel, take all entries for 902 and 952; for preferredName, take only 902.
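A minimal Python sketch of that rule, for illustration only; the actual transformation is meant to be done with morph. It assumes the datafields are wrapped in a `<record>` root element, that the name sits in one of the subfields p, g, s, k, c, z, f or t, and that a subfield h is appended in parentheses, as the desired subjectLabel values above suggest. The function name `extract_labels` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the Aleph XML from Example 1. The <record> wrapper is an
# assumption so that the snippet parses on its own.
ALEPH_XML = """
<record>
  <datafield tag="902" ind1="-" ind2="1">
    <subfield code="s">Plasmaphysik</subfield>
    <subfield code="9">(DE-588)4046259-6</subfield>
  </datafield>
  <datafield tag="902" ind1="-" ind2="1">
    <subfield code="s">Zeitschrift</subfield>
    <subfield code="9">(DE-588)4067488-5</subfield>
  </datafield>
  <datafield tag="952" ind1="-" ind2="1">
    <subfield code="s">Computerdatei im Fernzugriff</subfield>
    <subfield code="h">Formschlagwort</subfield>
  </datafield>
  <datafield tag="952" ind1="-" ind2="1">
    <subfield code="s">Online-Dokument</subfield>
  </datafield>
</record>
"""

# Subfield codes assumed to carry the name itself (taken from the 902 list).
NAME_CODES = set("pgskczft")


def label_of(field):
    """Return the name of a datafield, with subfield h appended in parentheses."""
    name = next(
        (sf.text for sf in field.findall("subfield") if sf.get("code") in NAME_CODES),
        None,
    )
    if name is None:
        return None
    qualifier = field.findtext("subfield[@code='h']")
    return f"{name} ({qualifier})" if qualifier else name


def extract_labels(xml_text):
    """Collect preferredName per GND URI (902 only) and all labels (902 and 952)."""
    preferred = {}
    subject_labels = []
    for field in ET.fromstring(xml_text).iter("datafield"):
        tag = field.get("tag")
        if tag not in ("902", "952"):
            continue
        label = label_of(field)
        if label is None:
            continue
        subject_labels.append(label)
        gnd_id = field.findtext("subfield[@code='9']")
        if tag == "902" and gnd_id:
            uri = "http://d-nb.info/gnd/" + gnd_id.replace("(DE-588)", "")
            preferred[uri] = label
    return preferred, subject_labels


preferred, subject_labels = extract_labels(ALEPH_XML)
print(preferred)
print(subject_labels)
```

The sketch keeps the two outputs separate on purpose: preferredName stays attached to the GND URI, while subjectLabel is a flat list on the resource that can be queried.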

@acka47
Contributor Author

acka47 commented Apr 20, 2015

Example 2 (with corporate body as contributor and three different types of subject headings): http://lobid.org/resource/HT013077595/about

Desired outcome:

{
  "@graph" : [ {
    "@id" : "http://d-nb.info/gnd/109490312",
    "preferredName" : "Boer, Hans-Peter",
    "preferredNameForThePerson" : "Boer, Hans-Peter"
  }, {
    "@id" : "http://d-nb.info/gnd/11079267X",
    "preferredName" : "Balke, Kirsten",
    "preferredNameForThePerson" : "Balke, Kirsten"
  }, {
    "@id" : "http://d-nb.info/gnd/128755-2",
    "preferredName" : "Kreisheimatverein <Coesfeld>",
    "preferredNameForTheCorporateBody" : "Kreisheimatverein <Coesfeld>"
  }, {
    "@id" : "http://d-nb.info/gnd/4010355-9",
    "preferredName" : "Coesfeld",
    "preferredNameForThePlaceOrGeographicName" : "Coesfeld"
  }, {
    "@id" : "http://d-nb.info/gnd/4010356-0",
    "preferredName" : "Kreis Coesfeld",
    "preferredNameForThePlaceOrGeographicName" : "Kreis Coesfeld"
  }, {
    "@id" : "http://d-nb.info/gnd/4024116-6",
    "preferredName" : "Heimatkundeunterricht",
    "preferredNameForTheSubjectHeading" : "Heimatkundeunterricht"
  }, {
    "@id" : "http://lobid.org/resource/HT013077595",
    "contributorLabel" : [ "Balke, Kirsten", "Boer, Hans Peter", "Boer, Hans-Peter" ],
    "subjectLabel" : [ "Coesfeld. Hauptamt", "Landkreis Coesfeld", "Kreis Coesfeld. Kreistag", "Kreis Coesfeld. Hauptamt", "Kosfel'd", "Kreis Coesfeld. Oberkreisdirektor", "Coesfeld (Kreis)", "Kreis Coesfeld. Landrat", "Landrat (Kreis Coesfeld)", "Oberkreisdirektor (Kreis Coesfeld)", "Kreisverwaltung (Kreis Coesfeld)", "Kreistag (Kreis Coesfeld)", "Heimatkunde (Unterricht)", "Hauptamt (Kreis Coesfeld)", "Heimatkundedidaktik", "Stadtdirektor (Coesfeld)", "Pressestelle (Coesfeld)", "Hauptamt (Coesfeld)", "Coesfeld. Pressestelle", "Coesfeld. Stadtdirektor", "Heimatkunde / Didaktik", "Stadt Coesfeld", "Kreis Coesfeld. Kreisverwaltung" ],
    "contributor" : [ "http://d-nb.info/gnd/11079267X", "http://d-nb.info/gnd/128755-2", "http://d-nb.info/gnd/109490312" ],
    "subject" : [ "http://d-nb.info/gnd/4010355-9", "http://d-nb.info/gnd/4024116-6", "http://d-nb.info/gnd/4010356-0" ],
    "subjectChain" : [ "Coesfeld | Heimatkundeunterricht | Lehrmittel", "Kreis Coesfeld | Heimatkundeunterricht | Lehrmittel (213)", "Kreis Coesfeld | Heimatkundeunterricht | Lehrmittel", "Coesfeld | Heimatkundeunterricht | Lehrmittel (213)" ],
   ...
   }]
...
}

Source data (snippet):

<datafield tag="104" ind1="b" ind2="1">
    <subfield code="p">Boer, Hans-Peter</subfield>
    <subfield code="d">1949-</subfield>
    <subfield code="b">[Red.]</subfield>
    <subfield code="9">(DE-588)109490312</subfield>
</datafield>
<datafield tag="105" ind1="-" ind2="1">
    <subfield code="p">Boer, Hans Peter</subfield>
    <subfield code="d">1949-</subfield>
</datafield>
<datafield tag="200" ind1="b" ind2="1">
    <subfield code="k">Kreisheimatverein</subfield>
    <subfield code="h">Coesfeld</subfield>
    <subfield code="9">(DE-588)128755-2</subfield>
</datafield>
<datafield tag="331" ind1="-" ind2="1">
    <subfield code="a">Geschichte hier</subfield>
</datafield>
...
<datafield tag="902" ind1="-" ind2="1">
    <subfield code="g">Coesfeld</subfield>
    <subfield code="9">(DE-588)4010355-9</subfield>
</datafield>
<datafield tag="902" ind1="-" ind2="1">
    <subfield code="s">Heimatkundeunterricht</subfield>
    <subfield code="9">(DE-588)4024116-6</subfield>
</datafield>
<datafield tag="902" ind1="-" ind2="1">
    <subfield code="f">Lehrmittel</subfield>
</datafield>
...
<datafield tag="902" ind1="-" ind2="1">
    <subfield code="s">Heimatkundeunterricht</subfield>
    <subfield code="9">(DE-588)4024116-6</subfield>
</datafield>
<datafield tag="902" ind1="-" ind2="1">
    <subfield code="f">Lehrmittel</subfield>
</datafield>
<datafield tag="903" ind1="-" ind2="1">
    <subfield code="a">213</subfield>
</datafield>
<datafield tag="907" ind1="-" ind2="1">
    <subfield code="g">Kreis Coesfeld</subfield>
    <subfield code="9">(DE-588)4010356-0</subfield>
</datafield>
<datafield tag="907" ind1="-" ind2="1">
    <subfield code="s">Heimatkundeunterricht</subfield>
    <subfield code="9">(DE-588)4024116-6</subfield>
</datafield>
<datafield tag="907" ind1="-" ind2="1">
    <subfield code="f">Lehrmittel</subfield>
</datafield>
<datafield tag="908" ind1="-" ind2="1">
    <subfield code="a">213</subfield>
</datafield>
<controlfield tag="SYS">011404221</controlfield>
<datafield tag="LOW" ind1="-" ind2="1">
    <subfield code="a">M0001</subfield>
</datafield>
<datafield tag="LOW" ind1="-" ind2="1">
    <subfield code="a">M1168</subfield>
</datafield>
<datafield tag="952" ind1="-" ind2="1">
    <subfield code="k">Coesfeld</subfield>
    <subfield code="b">Hauptamt</subfield>
</datafield>
<datafield tag="952" ind1="-" ind2="1">
    <subfield code="k">Hauptamt</subfield>
    <subfield code="h">Coesfeld</subfield>
</datafield>
<datafield tag="952" ind1="-" ind2="1">
    <subfield code="k">Coesfeld</subfield>
    <subfield code="b">Stadtdirektor</subfield>
</datafield>
<datafield tag="952" ind1="-" ind2="1">
    <subfield code="k">Stadtdirektor</subfield>
    <subfield code="h">Coesfeld</subfield>
</datafield>
<datafield tag="952" ind1="-" ind2="1">
    <subfield code="k">Coesfeld</subfield>
    <subfield code="b">Pressestelle</subfield>
</datafield>
<datafield tag="952" ind1="-" ind2="1">
    <subfield code="k">Pressestelle</subfield>
    <subfield code="h">Coesfeld</subfield>
</datafield>
<datafield tag="952" ind1="-" ind2="1">
    <subfield code="g">Kosfel'd</subfield>
</datafield>
<datafield tag="952" ind1="-" ind2="1">
    <subfield code="g">Stadt Coesfeld</subfield>
</datafield>
<datafield tag="952" ind1="-" ind2="1">
    <subfield code="s">Heimatkunde</subfield>
    <subfield code="h">Unterricht</subfield>
</datafield>
<datafield tag="952" ind1="-" ind2="1">
    <subfield code="s">Heimatkunde</subfield>
    <subfield code="x">Didaktik</subfield>
</datafield>
<datafield tag="952" ind1="-" ind2="1">
    <subfield code="s">Heimatkundedidaktik</subfield>
</datafield>
<datafield tag="957" ind1="-" ind2="1">
    <subfield code="k">Kreis Coesfeld</subfield>
    <subfield code="b">Oberkreisdirektor</subfield>
</datafield>
<datafield tag="957" ind1="-" ind2="1">
    <subfield code="k">Oberkreisdirektor</subfield>
    <subfield code="h">Kreis Coesfeld</subfield>
</datafield>
<datafield tag="957" ind1="-" ind2="1">
    <subfield code="k">Kreis Coesfeld</subfield>
    <subfield code="b">Kreistag</subfield>
</datafield>
<datafield tag="957" ind1="-" ind2="1">
    <subfield code="k">Kreistag</subfield>
    <subfield code="h">Kreis Coesfeld</subfield>
</datafield>
<datafield tag="957" ind1="-" ind2="1">
    <subfield code="k">Kreis Coesfeld</subfield>
    <subfield code="b">Hauptamt</subfield>
</datafield>
<datafield tag="957" ind1="-" ind2="1">
    <subfield code="k">Hauptamt</subfield>
    <subfield code="h">Kreis Coesfeld</subfield>
</datafield>
<datafield tag="957" ind1="-" ind2="1">
    <subfield code="k">Kreis Coesfeld</subfield>
    <subfield code="b">Landrat</subfield>
</datafield>
<datafield tag="957" ind1="-" ind2="1">
    <subfield code="k">Landrat</subfield>
    <subfield code="h">Kreis Coesfeld</subfield>
</datafield>
<datafield tag="957" ind1="-" ind2="1">
    <subfield code="k">Kreis Coesfeld</subfield>
    <subfield code="b">Kreisverwaltung</subfield>
</datafield>
<datafield tag="957" ind1="-" ind2="1">
    <subfield code="k">Kreisverwaltung</subfield>
    <subfield code="h">Kreis Coesfeld</subfield>
</datafield>
<datafield tag="957" ind1="-" ind2="1">
    <subfield code="g">Landkreis Coesfeld</subfield>
</datafield>
<datafield tag="957" ind1="-" ind2="1">
    <subfield code="g">Coesfeld</subfield>
    <subfield code="h">Kreis</subfield>
</datafield>
<datafield tag="957" ind1="-" ind2="1">
    <subfield code="s">Heimatkunde</subfield>
    <subfield code="h">Unterricht</subfield>
</datafield>
<datafield tag="957" ind1="-" ind2="1">
    <subfield code="s">Heimatkunde</subfield>
    <subfield code="x">Didaktik</subfield>
</datafield>
<datafield tag="957" ind1="-" ind2="1">
    <subfield code="s">Heimatkundedidaktik</subfield>
</datafield>
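
Comparing the 952/957 fields with the desired subjectLabel values suggests how the subfields are combined: b seems to be appended after ". ", h in parentheses and x after " / ". The following Python sketch encodes only that reading of Example 2; it is not taken from the MAB documentation or from the morph, and `compose_label` is a hypothetical name. It does not cover field 200, where h appears in angle brackets ("Kreisheimatverein <Coesfeld>").

```python
# Subfield codes assumed to carry the main name (taken from the 902 list).
MAIN_CODES = set("pgskczft")


def compose_label(subfields):
    """Build one label from (code, value) pairs given in document order."""
    label = ""
    for code, value in subfields:
        if code in MAIN_CODES and not label:
            label = value                    # the main name starts the label
        elif code == "b":
            label += ". " + value            # subordinate unit: "Coesfeld. Hauptamt"
        elif code == "h":
            label += " (" + value + ")"      # qualifier: "Hauptamt (Coesfeld)"
        elif code == "x":
            label += " / " + value           # subdivision: "Heimatkunde / Didaktik"
        # all other subfields (e.g. 9 with the GND id) are ignored here
    return label


print(compose_label([("k", "Coesfeld"), ("b", "Hauptamt")]))
print(compose_label([("k", "Hauptamt"), ("h", "Coesfeld")]))
print(compose_label([("s", "Heimatkunde"), ("x", "Didaktik")]))
```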

@acka47
Contributor Author

acka47 commented Apr 20, 2015

As we currently do, we should record the preferred name in the RDF using both the general and the more specific property, e.g.:

    "@id" : "http://d-nb.info/gnd/4076769-3",
    "preferredName" : "Römerzeit",
    "preferredNameForTheSubjectHeading" : "Römerzeit"

Mapping of the subfields from #139 (comment) to RDF properties, or rather to their JSON object keys:

p: preferredNameForThePerson
g: preferredNameForThePlaceOrGeographicName
s: preferredNameForTheSubjectHeading
k: preferredNameForTheCorporateBody
c: ❓
z: No specific properties as these aren't GND entities, thus are not linked and only occur as part of a subject chain in RDF.
f: same as for z.
t: preferredNameForTheWork (❗ We have to be careful here, as subfield t co-occurs with subfield p, see e.g. http://lobid.org/resource?id=HT018312899&format=source. For a start, we should map to preferredNameForTheWork if t occurs and prefix the creator name followed by a colon and a space, see e.g. http://193.30.112.134/F/?func=find-c&ccl_term=IDN%3DHT018312899 for an implementation.)
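
The mapping can be sketched as a lookup table in Python. This is an illustration under assumptions, not the morph rules: the key for p is spelled as in the Example 2 output, subfield c is left without a specific property because it is still open in this thread, and `gnd_node` is a hypothetical helper.

```python
GND_PREFIX = "http://d-nb.info/gnd/"

# Subfield code -> specific JSON key. "c" is deliberately absent (unresolved).
SPECIFIC_PROPERTY = {
    "p": "preferredNameForThePerson",  # spelling as in the Example 2 output
    "g": "preferredNameForThePlaceOrGeographicName",
    "s": "preferredNameForTheSubjectHeading",
    "k": "preferredNameForTheCorporateBody",
    "t": "preferredNameForTheWork",
}

# z and f are not GND entities, so they are never turned into linked nodes.
UNLINKED = {"z", "f"}


def gnd_node(code, name, subfield_9):
    """Return a JSON-LD node for a linked GND entity, or None if not linked."""
    if code in UNLINKED or not subfield_9:
        return None
    node = {
        "@id": GND_PREFIX + subfield_9.replace("(DE-588)", ""),
        "preferredName": name,  # general property, always recorded
    }
    specific = SPECIFIC_PROPERTY.get(code)
    if specific:
        node[specific] = name   # more specific property, when one is known
    return node


print(gnd_node("s", "Römerzeit", "(DE-588)4076769-3"))
```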

@acka47 acka47 removed the working label Apr 22, 2015
@acka47
Contributor Author

acka47 commented Apr 24, 2015

Regarding subfield c, can you point me to an example, @dr0i?

@acka47 acka47 added the working label Apr 24, 2015
@acka47 acka47 assigned dr0i and unassigned acka47 Apr 24, 2015
@dr0i
Member

dr0i commented Apr 24, 2015

@acka47
Contributor Author

acka47 commented Apr 24, 2015

t: preferredNameForTheWork (❗ We have to be careful here, as subfield t co-occurs with subfield p, see e.g. http://lobid.org/resource?id=HT018312899&format=source. For a start, we should map to preferredNameForTheWork if t occurs and prefix the creator name followed by a colon and a space, see e.g. http://193.30.112.134/F/?func=find-c&ccl_term=IDN%3DHT018312899 for an implementation.)

At the NWBib meeting, customers asked for GND work titles to carry the author name in the label (see https://wiki1.hbz-nrw.de/x/DQBEB). Example: http://lobid.org/resource?id=HT018312899&format=full

Instead of:

{

    "@id": "http://d-nb.info/gnd/7683386-0",
    "preferredName": "Der Cid",
    "preferredNameForTheWork": "Der Cid"

}

it should look like this:

{

    "@id": "http://d-nb.info/gnd/7683386-0",
    "preferredName": "Grabbe, Christian Dietrich: Der Cid",
    "preferredNameForTheWork": "Der Cid"

}
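
A small Python sketch of that rule, assuming the creator name comes from subfield p and the title from subfield t of the same field; `work_node` is a hypothetical helper, not part of the morph.

```python
GND_PREFIX = "http://d-nb.info/gnd/"


def work_node(gnd_id, title, creator=None):
    """Build the node for a GND work; the creator is prefixed only in preferredName."""
    preferred = f"{creator}: {title}" if creator else title
    return {
        "@id": GND_PREFIX + gnd_id,
        "preferredName": preferred,         # "creator: title" when a creator is known
        "preferredNameForTheWork": title,   # the bare title stays unchanged
    }


print(work_node("7683386-0", "Der Cid", creator="Grabbe, Christian Dietrich"))
```

Keeping the bare title in preferredNameForTheWork means that only the general label changes for display, while the specific property still holds the title as catalogued.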

dr0i added a commit to lobid/lodmill that referenced this issue Apr 28, 2015
See hbz/lobid#139.

This will make the GND enrichment using hadoop obsolete.
dr0i added a commit to lobid/lodmill that referenced this issue Apr 28, 2015
See hbz/lobid#139.

On the way to making the GND enrichment using hadoop obsolete.

* update tests
dr0i added a commit that referenced this issue May 4, 2015
See #141.

Not exactly sure why the old settings weren't working anymore.
Mind that it broke against an index where the @graph.@id of the
items wasn't used yet (as the history of lodmill-ld might suggest,
see 7f0b3bbe2f825268fc06b286938fc09f03b943b8 committed at 2015-03-06
in lodmill-ld), as we switched back to an old index because of the
hadoop enrichment issue (see #139).
This phrase query against the internal id is working fine, though.
dr0i added a commit to lobid/lodmill that referenced this issue May 7, 2015
See hbz/lobid#139.

* add test and test data

This metafacture module generates json-ld from a jena rdf model.
The generated documents are ready to be elasticsearch bulk indexed.

There are highly specific requirements for generating documents out of the
hbz01 catalog graph built with morph. One hbz01 catalog entry may result in
dozens of documents, namely the

- 'main' doc (data about the main resource)
- 'items' of a doc
- 'super' docs ("hasPart" nodes)
- 'sameAs' docs

But not all nodes should be nodes on their own: the gnd nodes must stay
sub nodes of the main node.

So this module is not generic and may be made generic only to a certain degree.
dr0i added a commit to lobid/lodmill that referenced this issue May 10, 2015
* add productive lobid index config

This metafacture command consumes a HashMap and indexes the json values
into an Elasticsearch index.

See hbz/lobid#139.
@dr0i
Member

dr0i commented May 12, 2015

Ready for testing.
E.g. http://lobid.org/resource/HT007496264 vs http://test.lobid.org/resource/HT007496264
Transformation and indexing for all 20M docs (resulting in 66M docs) took 14h (formerly, with hadoop: 35h).
Still missing: enrichment with openlibrary, dbpedia and gutenberg. Made a ticket for this: lobid/lodmill#667.
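
To put these numbers in proportion, here is a short calculation in Python. It uses only the rounded figures quoted above (20M records, 66M documents, 14h and 35h), so the results are rough estimates.

```python
records = 20_000_000      # source records ("20M docs")
documents = 66_000_000    # resulting documents ("66M docs")
hours_morph = 14          # new run time
hours_hadoop = 35         # former run time with hadoop

speedup = hours_hadoop / hours_morph                  # 2.5 times faster
time_saved = 1 - hours_morph / hours_hadoop           # share of run time saved
docs_per_second = documents / (hours_morph * 3600)    # average indexing throughput
docs_per_record = documents / records                 # documents per source record

print(f"speedup: {speedup:.1f}x, time saved: {time_saved:.0%}")
print(f"throughput: about {docs_per_second:.0f} documents per second")
print(f"documents per record: {docs_per_record:.1f}")
```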

@dr0i dr0i assigned acka47 and unassigned dr0i May 12, 2015
@literarymachine

I believe that restricting the type of a resource is now broken, e.g. http://test.lobid.org/resource?name=Tom%2BSawyer&from=0&size=10&type=http%3A%2F%2Fpurl.org%2Fontology%2Fbibo%2FBook returns resources that are not bibo:Book (e.g. http://lobid.org/resource/HT016678345).
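
Decoding the query string makes the requested filter easier to read. A minimal sketch using only Python's standard library; it just parses the URL quoted above and does not call the API.

```python
from urllib.parse import parse_qs, urlsplit

url = (
    "http://test.lobid.org/resource?name=Tom%2BSawyer&from=0&size=10"
    "&type=http%3A%2F%2Fpurl.org%2Fontology%2Fbibo%2FBook"
)

# parse_qs returns a list per parameter; each parameter occurs once here.
params = {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}

for key, value in params.items():
    print(f"{key} = {value}")
```

The decoded type parameter is the full URI http://purl.org/ontology/bibo/Book, which is the value the type restriction is expected to match.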

dr0i added a commit to lobid/lodmill that referenced this issue May 12, 2015
As we don't want to make use of metafacture flow anymore a "run" package
is added for starting processes. The "flow" class starts the
transformation and json conversion and indexing into elasticsearch.

Fixed: The bulk indexer was not reset, so update requests kept accumulating
and were indexed all over again, which of course results in low performance.

* update tests

See hbz/lobid#139.
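
The kind of bug described in this commit message can be illustrated with a small Python sketch. `BulkBuffer` is a hypothetical class written for this illustration; it is not the lodmill code and does not use the Elasticsearch client.

```python
class BulkBuffer:
    """Collects requests and hands them to `send` in batches."""

    def __init__(self, send, batch_size=2):
        self.send = send              # callable that receives one batch (a list)
        self.batch_size = batch_size
        self.pending = []

    def add(self, request):
        self.pending.append(request)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.send(list(self.pending))
            # The reset. Without this line every flush would resend all
            # earlier requests, so batches grow and work is repeated.
            self.pending = []


sent = []
buffer = BulkBuffer(sent.append, batch_size=2)
for doc in ["a", "b", "c", "d", "e"]:
    buffer.add(doc)
buffer.flush()  # send the remainder
print(sent)
```

With the reset in place each document is sent exactly once; removing it makes the second batch contain the first one again, which matches the slowdown described above.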
dr0i added a commit to lobid/lodmill that referenced this issue May 13, 2015
We observed some redundancy in the index mappings of elasticsearch.
Most things ran smoothly enough and we thought that was ok. But it is not.

This commit adjusts the config mappings so that a lookup of the mappings of the
index doesn't show any "redundancy" (more accurately: "unused definitions").
It should fix the type-query mentioned in hbz/lobid#139 and also hbz/lobid#141
(see the commit a6219bbb3bf5596b3a030da1e489fc1ba852d60a "... the @graph.@id of
the items wasn't used yet" ).
@dr0i
Member

dr0i commented May 27, 2015

Deployed to staging and production.
@acka47 please have a look. Please also mind the comment in lobid/lodmill#669.

@acka47
Contributor Author

acka47 commented May 28, 2015

We can close this one as we have this in production and there will probably be only some minor adjustments in the future.

@acka47 acka47 closed this as completed May 28, 2015
@acka47 acka47 removed the review label May 28, 2015
dr0i added a commit to lobid/lodmill that referenced this issue Jun 9, 2015
Since hbz/lobid#139 we don't use hadoop anymore and thus the test set
is way easier to generate.
dr0i added a commit to lobid/lodmill that referenced this issue Jun 9, 2015
This is necessary because of hbz/lobid#139.
dr0i added a commit to lobid/lodmill that referenced this issue Jun 9, 2015
Since hbz/lobid#139 we don't use hadoop anymore and thus the test set
is way easier to generate.

* add some more test resources mentioned in hbz/lobid#153
dr0i added a commit to lobid/lodmill that referenced this issue Jun 11, 2015
As of hbz/lobid#139 we (mostly) don't use lodmill-ld anymore.
It's just needed by the old lobid-organisations, which will be
replaced with a new way to make the data. Furthermore, the old
lobid-organisations will not be enhanced anymore. Thus, lodmill-ld is
not expected to be altered anymore, and it is safe to remove lodmill-ld
from the build processes, especially for travis, since such a build with
all the tests and mockups takes around 7 minutes or so and sometimes even
fails because travis has problems with it (memory, especially).
dr0i added a commit that referenced this issue Jul 10, 2015
Making queries using lv#contributorLabel instead of dc:contributor.
See hbz/nwbib#117.

After resolving #139 an update of the test data in the API
reveals what is now missing, e.g. searching resources by author with
date of birth and date of death. Enrichment with gutenberg, dbpedia and
OpenLibrary is also missing.

See also #106.
dr0i added a commit that referenced this issue Jul 10, 2015
Making queries using lv#contributorLabel instead of dc:contributor.
See hbz/nwbib#117.

After resolving #139 an update of the test data in the API
reveals what is now missing, e.g. searching resources by author with
date of birth and date of death. Enrichment with gutenberg, dbpedia and
OpenLibrary is also missing.

Also, the organisations index was not properly configured (missing
@graph.@properties) so that auto completion didn't work.

See also #106.
dr0i added a commit that referenced this issue Jul 10, 2015
After resolving lobid/hbz#169 we forgot to update the API:
Making queries using lv#contributorLabel instead of dc:contributor.

After resolving #139 an update of the test data in the API
reveals what is now missing, e.g. searching resources by author with
date of birth and date of death. Enrichment with gutenberg, dbpedia and
OpenLibrary is also missing.

Also, the organisations index was not properly configured (missing
@graph.@properties) so that auto completion didn't work.

See also #106.