Wrong count previews in owner facet #207

Open
fsteeg opened this issue Feb 3, 2017 · 10 comments

@fsteeg
Member

fsteeg commented Feb 3, 2017

Since owners are based on exemplar aggregations, and aggregation requests have a limited size, the owner counts are wrong: they cover only the owners of the X most frequent exemplars (whose counts are actually all 1). To fix this, we have to make the aggregation processing efficient enough to allow an exemplar aggregation request of unlimited size.
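To illustrate the effect described above, here is a minimal sketch, not the actual lobid code. It assumes item IDs follow the pattern `.../items/<resource>:<ISIL>:<signature>` and that owners are derived from the item buckets returned by the aggregation; the helper names and sample IDs are hypothetical.

```python
from collections import Counter

def owner_of(item_id: str) -> str:
    # Assumption: the ISIL is the second colon-separated part after "/items/".
    return item_id.split("/items/")[1].split(":")[1]

def owner_counts(item_ids, size=None):
    # Every item ID is unique, so each bucket has a count of 1;
    # a size limit simply truncates the list of buckets we see.
    buckets = item_ids if size is None else item_ids[:size]
    return Counter(owner_of(item_id) for item_id in buckets)

# Hypothetical sample data: 2 items owned by DE-6, 3 owned by DE-38.
item_ids = [
    "http://lobid.org/items/R1:DE-6:a",
    "http://lobid.org/items/R2:DE-6:b",
    "http://lobid.org/items/R3:DE-38:c",
    "http://lobid.org/items/R4:DE-38:d",
    "http://lobid.org/items/R5:DE-38:e",
]

limited = owner_counts(item_ids, size=3)  # what a size-limited request yields
full = owner_counts(item_ids)             # what an unlimited request would yield
print(limited, full)
```

With the size limit, DE-38 is undercounted because two of its items never make it into the returned buckets.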

@fsteeg fsteeg self-assigned this Feb 3, 2017
@fsteeg
Member Author

fsteeg commented Feb 3, 2017

This can be reproduced with any query returning a high result count, e.g. the owner facet for:
http://lobid.org/resources/search?q=k%C3%B6ln

@fsteeg
Member Author

fsteeg commented Feb 6, 2017

The basic problem here is that we are faceting over a field (the item owner) that's not in our data. This approach won't work for the entire catalog: if we query everything, we'd have to get all items, and create the owner facet from that.

Instead, I suggest we add an exemplar.owner field, so for example in http://lobid.org/resources/HT012213725?format=json we'd have:

"exemplar": [{
  "id": "http://lobid.org/items/HT012213725:DE-6:ZD%207381#!",
  "owner": "http://lobid.org/organisations/DE-6",
  "label": "lobid Bestandsressource"
}],

That way, we could simply facet over exemplar.owner directly, which would give us all owners (not all items, as with the current facet, which is based on exemplar.id).
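As a rough sketch of the difference this would make, the following compares counting resources per owner with counting items per owner. It assumes each resource carries the proposed `exemplar.owner` field and that a facet counts each resource once per distinct owner; the function names and sample records are hypothetical.

```python
from collections import Counter

DE_6 = "http://lobid.org/organisations/DE-6"
DE_38 = "http://lobid.org/organisations/DE-38"

# Hypothetical resources with the proposed exemplar.owner field.
resources = [
    {"id": "R1", "exemplar": [{"owner": DE_6}, {"owner": DE_6}, {"owner": DE_38}]},
    {"id": "R2", "exemplar": [{"owner": DE_38}]},
]

def facet_by_resource(resources):
    # Count each resource once per distinct owner (facet over exemplar.owner).
    counts = Counter()
    for resource in resources:
        counts.update({e["owner"] for e in resource.get("exemplar", [])})
    return counts

def facet_by_item(resources):
    # Count every single item (roughly what a facet based on exemplar.id gives).
    return Counter(
        e["owner"] for resource in resources for e in resource.get("exemplar", [])
    )

by_resource = facet_by_resource(resources)
by_item = facet_by_item(resources)
print(by_resource, by_item)
```

Here DE-6 holds two items of the same resource, so it counts as 1 per resource but 2 per item.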

What do you think @dr0i @acka47? If it makes no sense to expose the owner in the data (but I do think it's useful for API usage), we could also create an internal Elasticsearch field or a custom aggregation. If we do want to expose it, we should add it on the Metafacture level.

@fsteeg fsteeg assigned acka47 and dr0i and unassigned fsteeg Feb 6, 2017
@fsteeg fsteeg added review and removed working labels Feb 6, 2017
@acka47
Contributor

acka47 commented Feb 6, 2017

+1 from me. I already proposed embedding item information in the instance data, see #140. We might just reopen that issue.

@dr0i dr0i added working and removed ready labels Feb 7, 2017
@dr0i
Member

dr0i commented Feb 7, 2017

Using a child aggregation on our data querying "köln" seems to come with a plausible result:

"hits" : {
"total" : 569.808,
 ...
"aggregations" : {
"items" : {
  "doc_count" : 1.686.515,
  "top-isil" : {
  ...
    "buckets" : [ {
      "key" : "http://lobid.org/organisations/DE-38",
      "doc_count" : 172.288
    } ...

I can imagine that the factor of 3 in the resources/items ratio is a result of libraries holding more than one item. Is this acceptable, or do you really want a ratio of 1? I doubt, though, that moving the data from the child into the parent, so that we'd have e.g. three identical exemplar.owner.id values (reflecting multiple holdings of a manifestation, aka "resource"), would make an aggregation over it yield that 1:1 ratio (without tinkering with filters or something).
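For reference, a request of roughly the following shape could produce output like the above. This is only a sketch built as a Python dict: the aggregation names `items` and `top-isil` are taken from the output, while the child type `item`, the field name `owner`, and the use of a `query_string` query are assumptions, not the request that was actually sent.

```python
import json

def child_owner_aggregation(query, child_type="item", owner_field="owner", size=10):
    # Build an Elasticsearch request body using a children aggregation
    # with a nested terms aggregation on the (assumed) owner field.
    return {
        "query": {"query_string": {"query": query}},
        "size": 0,  # only the aggregation is of interest, not the hits
        "aggs": {
            "items": {
                "children": {"type": child_type},
                "aggs": {
                    "top-isil": {"terms": {"field": owner_field, "size": size}}
                },
            }
        },
    }

body = child_owner_aggregation("köln")
print(json.dumps(body, indent=2, ensure_ascii=False))
```

Because the terms aggregation runs on the child documents, its counts refer to items rather than resources, which fits the ratio discussed above.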

@dr0i dr0i assigned fsteeg, acka47 and ChristophEwertowski and unassigned dr0i Feb 7, 2017
@dr0i dr0i added review and removed working labels Feb 7, 2017
@fsteeg
Member Author

fsteeg commented Mar 1, 2017

Reopening, see discussion starting in #278 (comment).

@fsteeg fsteeg added the ready label Mar 1, 2017
@fsteeg fsteeg self-assigned this Mar 1, 2017
@fsteeg fsteeg removed the ready label Jun 28, 2017
@fsteeg fsteeg added the ready label Jun 11, 2018
@acka47 acka47 added this to Ready in lobid board Apr 8, 2019
@acka47 acka47 removed the ready label Apr 9, 2019
@acka47 acka47 moved this from Ready to Backlog in lobid board Dec 3, 2020
@acka47
Contributor

acka47 commented Mar 25, 2021

This came up again, see #1169, where @hagbeck wrote:

From the Aleph based index we're getting 1.334.514 records [1]
The facet "Bestand in Bibliotheken" in the Aleph based index shows 1.471.170 records.

[1] http://lobid.org/resources/search?owner=http%3A%2F%2Flobid.org%2Forganisations%2FDE-290%23%21&aggregations=owner

I pointed out this problem in #278 (comment):

Isn't the underlying mechanism that the facet gives the number of items while the query result lists the FRBR manifestations (or in bibframe-speak: instances)?

@TobiasNx
Contributor

TobiasNx commented Jan 16, 2023

This came up again in the context of the comparison of ALMA and ALEPH resources of UB Münster. Ideally this should be fixed before ALMA Fix replaces ALEPH-Morph. #1601

@acka47
Contributor

acka47 commented Mar 10, 2023

@blackwinter will check whether this should be added to milestone DigiBib or not.

@acka47 acka47 assigned blackwinter and unassigned fsteeg Mar 10, 2023
@blackwinter
Member

We would not be affected by this issue.

@blackwinter blackwinter assigned fsteeg and unassigned blackwinter Mar 14, 2023
Projects
Status: Backlog
Development

No branches or pull requests

6 participants