by-name queries yield too many results #28

Closed
fsteeg opened this issue Jul 17, 2014 · 7 comments
@fsteeg
Member

fsteeg commented Jul 17, 2014

E.g. http://api.lobid.org/subject?name=Heinsberg&format=full (but this is true for all by-name queries).

The first hit is a person from Heinsberg, not a person named Heinsberg. The cause is that the same field name is used in different JSON objects: all the resolved labels in the record (which we added by user request) are matched as well. I don't think we can solve this on the Elasticsearch query level.

We often discuss this topic (no resolution in the data we serve, only in the index, etc.). Also, @jschnasse recently requested additional literals for works. We probably need to have an in-depth discussion of this and how to approach it, @acka47 @dr0i.

The fast fix, namely removing the resolved values, is not an option, as it would break current API usage. I think the proper solution would be the nested JSON-LD we've been discussing for a while. With that we could use specific queries like creator.preferredName. That's a bit of work though; it's essentially issue #1.
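To make the field-name collision concrete, here is a minimal sketch in Python. The record shapes, the `@graph` layout, and the `primaryTopic` flag are assumptions made up for illustration, not the actual lobid JSON, and the two helper functions are hypothetical. It only shows why a match on a shared field name also hits resolved labels, and how restricting the match to one specific object avoids that.

```python
# Hypothetical record shapes for illustration only; not the actual lobid data.
person_record = {
    "@graph": [
        {
            "@id": "person/1",
            "primaryTopic": True,  # assumed marker for the main entity
            "preferredName": "Mustermann, Max",
            "placeOfBirth": "place/1",
        },
        # Resolved label for the linked place, using the same field name.
        {"@id": "place/1", "preferredName": "Heinsberg"},
    ]
}

place_record = {
    "@graph": [
        {"@id": "place/1", "primaryTopic": True, "preferredName": "Heinsberg"},
    ]
}


def matches_any_name(record, name):
    """Flat match: any object's preferredName counts as a hit."""
    return any(obj.get("preferredName") == name for obj in record["@graph"])


def matches_primary_name(record, name):
    """Specific match: only the primary topic's preferredName counts."""
    return any(
        obj.get("primaryTopic") and obj.get("preferredName") == name
        for obj in record["@graph"]
    )


if __name__ == "__main__":
    for label, record in (("person", person_record), ("place", place_record)):
        print(
            label,
            matches_any_name(record, "Heinsberg"),
            matches_primary_name(record, "Heinsberg"),
        )
```

With the flat match, the person born in Heinsberg is a hit; with the specific match, only the place itself is. A nested structure would allow the same distinction through a field path such as creator.preferredName.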

Or am I missing something and there's an easy way to fix or avoid the issue?

@fsteeg fsteeg added the bug label Jul 17, 2014
@acka47
Contributor

acka47 commented Jul 18, 2014

Though I think we should find a solution for the whole GND API ASAP (as it is currently useless for autosuggest), I also want to point out a solution that is focused on NWBib. We also have the problem that all GND resources are autosuggested when typing in a subject in NWBib's advanced search (see hbz/nwbib#51). We could fix this, and the problem of too many results for by-name queries, by building a separate index of GND resources in NWBib.

@fsteeg
Member Author

fsteeg commented Jul 18, 2014

Building a special GND index for NWBib would be a lot of very specialized work. I think the best solution for hbz/nwbib#51 would be the nested JSON-LD for resources again: if the resources contained all subject, author etc. labels in a structured way, we could build general queries and restrict them on the NWBib set.

@fsteeg
Member Author

fsteeg commented Jul 18, 2014

Another thought about this issue: we had always been resolving the creator, but the additional fields were a more recent addition, based on user feedback. See lobid/lodmill#318 for details. I believe it was requested for bonnus, which in the meantime stopped using the API. So that would be another option: remove the additional resolutions, keep only creator. It would be a breaking API change, but on the other hand, that API addition caused a regression that we only discovered now.

We could describe the situation on the mailing list and ask if anyone is using these labels...

@fsteeg
Member Author

fsteeg commented Jul 18, 2014

After discussion with @jschnasse it seems we originally implemented this for @edoweb. The thing for bonnus was adding literals to resources, not GND entities. @literarymachine: @jschnasse mentioned that you might actually not be using the literals in the lobid API response any more, but doing a lookup yourself against the GND. The UI also looks like this. Is this correct? Do you only search by literals for the entity itself (not linked literals like placeOfBirth, placeOfDeath, professionOrOccupation, placeOfActivity)?

@fsteeg
Member Author

fsteeg commented Jul 18, 2014

@literarymachine: after talking to @jschnasse it is my understanding that you only use the literals in the primary topic object (like its preferredName, variantNames, placeOfDeath, etc.), and not the resolved properties in the other objects (like the profession, which you fetch from the DNB yourself). Given this, can we remove the literals for placeOfBirth, placeOfDeath, professionOrOccupation, placeOfActivity?

@literarymachine

Given this, can we remove the literals for placeOfBirth, placeOfDeath, professionOrOccupation, placeOfActivity?

Correct!
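As a rough sketch of the removal agreed on above: the function below drops the resolved label objects for the four linked properties while keeping the links themselves on the primary object. The record layout (`@graph`, `@id`) and the function name are assumptions for illustration; the actual change was made in the lodmill transformation and may look quite different.

```python
# Properties whose resolved labels are to be removed (named in the thread).
RESOLVED_PROPERTIES = (
    "placeOfBirth",
    "placeOfDeath",
    "professionOrOccupation",
    "placeOfActivity",
)


def strip_resolved_labels(record, primary_id):
    """Return a copy of the record without the objects that only carry
    resolved labels for the linked properties. Hypothetical helper."""
    primary = next(obj for obj in record["@graph"] if obj["@id"] == primary_id)

    # Collect the IDs the primary object links to via the listed properties.
    linked_ids = set()
    for prop in RESOLVED_PROPERTIES:
        value = primary.get(prop, [])
        linked_ids.update(value if isinstance(value, list) else [value])

    # Keep the primary object and anything not linked via those properties.
    graph = [
        obj
        for obj in record["@graph"]
        if obj["@id"] == primary_id or obj["@id"] not in linked_ids
    ]
    return {**record, "@graph": graph}


if __name__ == "__main__":
    example = {
        "@graph": [
            {
                "@id": "person/1",
                "preferredName": "Mustermann, Max",
                "placeOfBirth": "place/1",
            },
            {"@id": "place/1", "preferredName": "Heinsberg"},
        ]
    }
    print(strip_resolved_labels(example, "person/1"))
```

After this step the person record still points to its place of birth by ID, but no longer contains a second preferredName that a by-name query could match.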

dr0i added a commit to lobid/lodmill that referenced this issue Jul 21, 2014
See #331

If only one filename is delivered the hadoop map process shouldn't start,
as it doesn't make sense to collect triple subjects without mapping their
triple objects to anything (well, it could be that this one file is a deep
graph on its own and that it should be processed so that resulting records
have raised triples clinging to the top subject URI (see e.g. hbz/lobid#28
"removing resolved values"), but for now we take it that this is not the
wanted default behaviour. If you need this, pass two identical filenames).
dr0i added a commit to lobid/lodmill that referenced this issue Jul 22, 2014
@acka47
Copy link
Contributor

acka47 commented Aug 28, 2014

Bug was fixed more than a month ago. Closing.

@acka47 acka47 closed this as completed Aug 28, 2014