by-name queries yield too many results #28
Comments
Though I think we should find a solution ASAP for the whole GND API (as it is currently useless for autosuggest), I also want to point out a solution that is focused on NWBib. We also have the problem that all GND resources are auto-suggested when typing a subject into NWBib's advanced search (see hbz/nwbib#51). We could fix this and the problem with too many results for the by-name queries by building a separate index of GND resources in NWBib.
Building a special GND index for NWBib would be a lot of very specialized work. I think the best solution for hbz/nwbib#51 would be the nested JSON-LD for resources again: if the resources contained all subject, author, etc. labels in a structured way, we could build general queries and restrict them to the NWBib set.
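Not the actual lobid implementation, just a minimal sketch of what such a structured (nested) index and a query restricted to the NWBib set might look like; the index name `resources`, the fields `subject.preferredName` and `inCollection`, and the local Elasticsearch URL are assumptions for illustration:

```python
import requests

# Hypothetical nested record: the resource carries its subject/creator labels
# in a structured form instead of flat, resolved literals.
example_resource = {
    "id": "http://lobid.org/resource/HT000000000",  # made-up ID
    "title": "Example title",
    "subject": [
        {"id": "http://d-nb.info/gnd/4024064-6", "preferredName": "Heinsberg"}
    ],
    "inCollection": "NWBib",  # assumed marker for the NWBib set
}

# A nested query that only matches subject labels, filtered to the NWBib set.
query = {
    "query": {
        "bool": {
            "must": [
                {
                    "nested": {
                        "path": "subject",
                        "query": {"match": {"subject.preferredName": "Heinsberg"}},
                    }
                }
            ],
            "filter": [{"term": {"inCollection": "NWBib"}}],
        }
    }
}

response = requests.post("http://localhost:9200/resources/_search", json=query)
print(response.json())
```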
Another thought about this issue: we had always been resolving the … We could describe the situation on the mailing list and ask if anyone is using these labels...
After discussion with @jschnasse it seems we originally implemented this for @edoweb. The thing for bonnus was adding literals to resources, not GND entities. @literarymachine: @jschnasse mentioned that you might actually not be using the literals in the lobid API response any more, but doing a lookup yourself against the GND. The UI also looks like this. Is this correct? Do you only search by literals for the entity itself (not linked literals like …)?
@literarymachine: after talking to @jschnasse it is my understanding that you only use the literals in the primary topic object (like its …).
Correct!
See #331. If only one filename is delivered, the Hadoop map process shouldn't start, as it doesn't make sense to collect triple subjects without mapping their triple objects to anything. (It could be that this one file is a deep graph in its own right and should be processed so that the resulting records have raised triples clinging to the top subject URI, see e.g. hbz/lobid#28 "removing resolved values", but for now we take it that this is not the desired default behaviour. If you need this, pass two identical filenames.)
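A minimal sketch of the guard described above, assuming a hypothetical `run_map_phase` driver function (not the actual job code):

```python
import sys

def run_map_phase(filenames):
    """Hypothetical driver for the map step: with only one input file there is
    nothing to resolve triple objects against, so we skip the job unless the
    same filename is deliberately passed twice."""
    if len(filenames) < 2:
        print("Only one input file given; skipping map phase. "
              "Pass the same filename twice to force processing a single deep graph.")
        return
    # ... start the actual map job here ...

if __name__ == "__main__":
    run_map_phase(sys.argv[1:])
```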
The bug was fixed more than a month ago. Closing.
E.g. http://api.lobid.org/subject?name=Heinsberg&format=full (but this is true for all by-name queries).
The first hit is a person from Heinsberg, not an entity named Heinsberg, caused by the use of the same field name in different JSON objects. This comes from all the resolved labels in the record (which we added by user request). I don't think we can solve this on the Elasticsearch query level.
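To illustrate the effect, here is a rough sketch with invented, simplified records (not the real GND schema) showing how a person record that merely mentions Heinsberg in a resolved label matches a naive full-text lookup just like the place record does:

```python
# Simplified, made-up records for illustration only.
person_record = {
    "type": "Person",
    "preferredName": "Jane Doe",
    "placeOfBirth": {
        "id": "http://d-nb.info/gnd/4024064-6",
        "preferredName": "Heinsberg",  # resolved label added on user request
    },
}

place_record = {
    "type": "PlaceOrGeographicName",
    "preferredName": "Heinsberg",
}

def flatten_values(doc):
    """Collect all leaf values of a record, mimicking a flat full-text index."""
    if isinstance(doc, dict):
        for value in doc.values():
            yield from flatten_values(value)
    elif isinstance(doc, list):
        for value in doc:
            yield from flatten_values(value)
    else:
        yield doc

# Both records match a naive full-text query for "Heinsberg":
for record in (person_record, place_record):
    print(record["type"], "matches:", "Heinsberg" in flatten_values(record))
```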
We often discuss this topic (no resolution in the data we serve, only in the index, etc.). Also, @jschnasse recently requested additional literals for works. We probably need to have an in-depth discussion of this and how to approach it, @acka47 @dr0i.
The fast fix, namely removing the resolved values, is not an option as it would break current API usage. I think the proper solution would be the nested JSON-LD we've been discussing for a while. With that we could use specific queries like `creator.preferredName`. That's a bit of work though; it's essentially issue #1. Or am I missing something and there's an easy way to fix or avoid the issue?
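For illustration only, a sketch of what such a nested mapping and a `creator.preferredName` query could look like in Elasticsearch; the index name, field set, and URLs are assumptions, not the existing lobid setup:

```python
import requests

# Hypothetical nested mapping: declaring "creator" as a nested type keeps each
# creator's fields together, so a query on "creator.preferredName" no longer
# matches labels resolved elsewhere in the record.
mapping = {
    "mappings": {
        "properties": {
            "creator": {
                "type": "nested",
                "properties": {
                    "id": {"type": "keyword"},
                    "preferredName": {"type": "text"},
                },
            }
        }
    }
}
requests.put("http://localhost:9200/resources", json=mapping)

# A query that only looks at creator labels:
query = {
    "query": {
        "nested": {
            "path": "creator",
            "query": {"match": {"creator.preferredName": "Heinsberg"}},
        }
    }
}
print(requests.post("http://localhost:9200/resources/_search", json=query).json())
```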