by-name queries yield too many results #28

fsteeg · 2014-07-17T15:17:45Z

E.g. http://api.lobid.org/subject?name=Heinsberg&format=full (but this is true for all by-name queries)

The first hit is a person from Heinsberg, not one named Heinsberg. This happens because the same field name is used in different JSON objects within a record: all the resolved labels (which we added by user request) sit in fields that the by-name query also matches. I don't think we can solve this at the Elasticsearch query level.
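To illustrate the cause, here is a minimal sketch. The record shape, the IDs, and the helper names are assumptions for illustration, not the actual lobid data or query code; only the field names come from this thread.

```python
from typing import Any, Iterator

# Hypothetical, simplified record: a person whose place of birth has been
# resolved to a label. Both objects use the same field name, "preferredName".
record = {
    "@graph": [
        {"@id": "person/1", "preferredName": "Mustermann, Erika",
         "placeOfBirth": "place/1"},
        {"@id": "place/1", "preferredName": "Heinsberg"},
    ]
}


def values_anywhere(node: Any, field: str) -> Iterator[str]:
    """Yield every string stored under `field`, at any depth of the record."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == field and isinstance(value, str):
                yield value
            else:
                yield from values_anywhere(value, field)
    elif isinstance(node, list):
        for item in node:
            yield from values_anywhere(item, field)


def matches_by_field_name(rec: dict, name: str) -> bool:
    """Match on the field name alone, regardless of which object holds it."""
    return name in values_anywhere(rec, "preferredName")


def matches_primary_name(rec: dict, name: str, primary_id: str) -> bool:
    """Match only the preferredName of the object the record is about."""
    for node in rec["@graph"]:
        if node.get("@id") == primary_id:
            return node.get("preferredName") == name
    return False


by_field = matches_by_field_name(record, "Heinsberg")
by_primary = matches_primary_name(record, "Heinsberg", "person/1")
```

Matching on the field name alone finds the person record through the resolved place label, while matching only the primary object does not.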

We often discuss this topic (no resolution in the data we serve, only in the index, etc.). Also, @jschnasse recently requested additional literals for works. We probably need to have an in-depth discussion of this and how to approach it, @acka47 @dr0i.

The fast fix, namely removing the resolved values, is not an option, as it would break current API usage. I think the proper solution would be the nested JSON-LD we've been discussing for a while. With that we could use specific queries like creator.preferredName. That's a bit of work, though: it's essentially issue #1.
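A rough sketch of what such path-specific matching could look like on a nested document. The document shape and the `get_path` helper are assumptions for illustration, not the lobid data model or API.

```python
from typing import Any, List

# Hypothetical nested resource: the creator is embedded as an object, and
# the creator's place of birth is embedded inside the creator.
resource = {
    "title": "A history of Heinsberg",
    "creator": {
        "@id": "person/1",
        "preferredName": "Mustermann, Erika",
        "placeOfBirth": {"@id": "place/1", "preferredName": "Heinsberg"},
    },
}


def get_path(doc: Any, path: str) -> List[Any]:
    """Follow a dotted path through nested dicts and return all values found."""
    nodes = [doc]
    for key in path.split("."):
        found = []
        for node in nodes:
            if isinstance(node, dict) and key in node:
                value = node[key]
                # A field may hold a single value or a list of values.
                found.extend(value if isinstance(value, list) else [value])
        nodes = found
    return nodes


creator_names = get_path(resource, "creator.preferredName")
birthplace_names = get_path(resource, "creator.placeOfBirth.preferredName")
```

With the nesting in place, the path says which object a label belongs to, so a query for the creator's name no longer picks up the birthplace label.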

Or am I missing something and there's an easy way to fix or avoid the issue?


acka47 · 2014-07-18T07:27:16Z

Though I think we should find a solution for the whole GND API ASAP (as it is currently useless for autosuggest), I also want to point out a solution that is focused on NWBib. We also have the problem that all GND resources are auto-suggested when typing in a subject in NWBib's advanced search (see hbz/nwbib#51). We could fix this, and the problem of too many results for by-name queries, by building a separate index of GND resources in NWBib.

fsteeg · 2014-07-18T07:45:44Z

Building a special GND index for NWBib would be a lot of very specialized work. I think the best solution for hbz/nwbib#51 would again be the nested JSON-LD for resources: if the resources contained all subject, author, etc. labels in a structured way, we could build general queries and restrict them to the NWBib set.

fsteeg · 2014-07-18T07:54:26Z

Another thought about this issue: we had always been resolving the creator, but the additional fields were a more recent addition, based on user feedback. See lobid/lodmill#318 for details. I believe it was requested for bonnus, which in the meantime stopped using the API. So that would be another option: remove the additional resolutions, keep only creator. It would be a breaking API change, but on the other hand, that API addition caused a regression that we only discovered now.

We could describe the situation on the mailing list and ask if anyone is using these labels...

fsteeg · 2014-07-18T08:26:07Z

After discussion with @jschnasse it seems we originally implemented this for @edoweb. The thing for bonnus was adding literals to resources, not GND entities. @literarymachine: @jschnasse mentioned that you might actually not be using the literals in the lobid API response any more, but doing a lookup yourself against the GND. The UI also suggests this. Is this correct? Do you only search by literals for the entity itself (not linked literals like placeOfBirth, placeOfDeath, professionOrOccupation, placeOfActivity)?

fsteeg · 2014-07-18T10:08:36Z

@literarymachine: after talking to @jschnasse it is my understanding that you only use the literals in the primary topic object (like its preferredName, variantNames, placeOfDeath, etc.), and not the resolved properties in the other objects (like the profession, which you fetch from the DNB yourself). Given this, can we remove the literals for placeOfBirth, placeOfDeath, professionOrOccupation, placeOfActivity?
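For illustration, dropping those resolved literals could look roughly like the sketch below. The record shape, `RESOLVED_PROPERTIES`, and `drop_resolved_literals` are hypothetical and not the actual lodmill code; only the four property names come from this thread.

```python
# Properties whose resolved label objects would be removed (from this thread).
RESOLVED_PROPERTIES = (
    "placeOfBirth",
    "placeOfDeath",
    "professionOrOccupation",
    "placeOfActivity",
)

# Hypothetical, simplified record with two resolved label objects.
record = {
    "@graph": [
        {"@id": "person/1", "preferredName": "Mustermann, Erika",
         "placeOfBirth": "place/1", "professionOrOccupation": ["job/1"]},
        {"@id": "place/1", "preferredName": "Heinsberg"},
        {"@id": "job/1", "preferredName": "Author"},
    ]
}


def drop_resolved_literals(rec: dict, primary_id: str) -> dict:
    """Return a copy of `rec` without the objects that only carry labels
    for RESOLVED_PROPERTIES; the URI references on the primary object stay."""
    graph = rec["@graph"]
    primary = next(node for node in graph if node.get("@id") == primary_id)
    linked_ids = set()
    for prop in RESOLVED_PROPERTIES:
        value = primary.get(prop, [])
        linked_ids.update(value if isinstance(value, list) else [value])
    kept = [node for node in graph if node.get("@id") not in linked_ids]
    return {**rec, "@graph": kept}


cleaned = drop_resolved_literals(record, "person/1")
```

The primary object keeps its own literals and the links to the places and professions; only the extra label objects disappear, so a by-name query can no longer match them.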

literarymachine · 2014-07-18T10:19:10Z

> Given this, can we remove the literals for placeOfBirth, placeOfDeath, professionOrOccupation, placeOfActivity?

Correct!

See #331. If only one filename is delivered, the Hadoop map process shouldn't start, as it doesn't make sense to collect triple subjects without mapping their triple objects to anything. (It could be that this one file is a deep graph in its own right and should be processed so that the resulting records have raised triples attached to the top subject URI; see e.g. hbz/lobid#28, "removing resolved values". For now we assume this is not the wanted default behaviour. If you need this, pass two identical filenames.)

See hbz/lobid#28

acka47 · 2014-08-28T08:27:59Z

Bug was fixed more than a month ago. Closing.

fsteeg added the bug label Jul 17, 2014

dr0i added a commit to lobid/lodmill that referenced this issue Jul 22, 2014

Remove some GND resolving and predicates

e7855cc

See hbz/lobid#28

dr0i added a commit to lobid/lodmill that referenced this issue Jul 22, 2014

Remove some GND resolving and predicates

a3237d9

See hbz/lobid#28

dr0i added a commit to lobid/lodmill that referenced this issue Jul 22, 2014

Remove some GND resolving

987e7e3

See hbz/lobid#28

dr0i added a commit to lobid/lodmill that referenced this issue Jul 22, 2014

Remove some GND resolving

5d7669b

See hbz/lobid#28

dr0i mentioned this issue Jul 22, 2014

Remove some GND resolving lobid/lodmill#516

Merged

acka47 closed this as completed Aug 28, 2014
