Statistics example queries sometimes too long for ranked attributes #349

Open
arildm opened this issue Mar 20, 2024 · 0 comments

arildm commented Mar 20, 2024

Background

Clicking an attribute value in the statistics table generates a sub-query that finds exactly the tokens represented by that statistics row.

Some attributes are ranked, so values like apple:0.6 and apple:0.3 are grouped into one row. The row heading is shown as apple, and the CQP generated when clicking it includes the condition contains 'apple:(0\.6|0\.3)'.
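To illustrate the grouping described above, here is a minimal sketch of how such a condition could be assembled from the ranks merged into one row. The function name `contains_condition` and the attribute name `x` are made up for this example; this is not the actual Korp frontend code.

```python
import re
from typing import List

def contains_condition(attr: str, value: str, ranks: List[str]) -> str:
    """Build a CQP-style condition matching `value` with any of the given ranks.

    Hypothetical helper: the real frontend may build the query differently.
    """
    # Escape regex metacharacters, e.g. the decimal point in "0.6".
    alternatives = "|".join(re.escape(rank) for rank in ranks)
    return f"{attr} contains '{re.escape(value)}:({alternatives})'"

print(contains_condition("x", "apple", ["0.6", "0.3"]))
# prints: x contains 'apple:(0\.6|0\.3)'
```

Every distinct rank merged into the row adds one more alternative, so the condition grows with the number of merged values.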

Problem

In the example below, the top rows are merged from many values, so the generated CQP queries contain a long list of probability numbers in the parentheses. Some queries even become too long for the backend to handle.

https://spraakbanken.gu.se/korplabb/#?cqp=%3Csentence%3E%20%5B%5D&corpus=suc3&stats_reduce=transformer-neighbour&show_stats&search_tab=1&result_tab=2&search=cqp

Solution?

The statistics count only the highest-ranked value in each token, so a token with x="|apple:0.4|lemon:0.3|lime:0.3|" will only contribute to the "apple" row. Clicking the "apple" row should yield that token, but not a token with x="|lemon:0.4|apple:0.3|lime:0.3|". I guess that's why we include the probabilities. However, I suppose the current query will also yield a token with x="|lemon:0.6|apple:0.4|", and I believe that is wrong, because that token contributes only to the "lemon" row.
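The suspected mismatch can be reproduced with a small simulation. This is only a sketch: `parse_ranked`, `stats_row` and `contains` are hypothetical helpers, and `contains` assumes that the operator is true when some member of the set fully matches the regex.

```python
import re
from typing import List, Tuple

def parse_ranked(raw: str) -> List[Tuple[str, float]]:
    """Split '|apple:0.4|lemon:0.3|' into [('apple', 0.4), ('lemon', 0.3)]."""
    pairs = []
    for item in raw.strip("|").split("|"):
        value, _, rank = item.rpartition(":")
        pairs.append((value, float(rank)))
    return pairs

def stats_row(raw: str) -> str:
    """Row the token is counted under: its highest-ranked value."""
    return max(parse_ranked(raw), key=lambda pair: pair[1])[0]

def contains(raw: str, pattern: str) -> bool:
    """Assumed `contains` semantics: some set member fully matches the regex."""
    return any(re.fullmatch(pattern, item) for item in raw.strip("|").split("|"))

tokens = [
    "|apple:0.4|lemon:0.3|lime:0.3|",
    "|lemon:0.4|apple:0.3|lime:0.3|",
    "|lemon:0.6|apple:0.4|",
]
# With these three tokens, the "apple" row is built from rank 0.4 only.
pattern = r"apple:(0\.4)"
for raw in tokens:
    print(stats_row(raw), contains(raw, pattern), raw)
```

Under these assumptions the third token is counted under "lemon" but still matches the "apple" sub-query, while the second token is correctly excluded.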

So, if the statistics are compiled from the highest-ranked value of each token, could/should the sub-query not also match on the highest-ranked value only?
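One possible shape for such a sub-query, sketched under two assumptions: the highest-ranked value always comes first in the set string (as in the examples above), and the backend can match a regex against the raw string, for instance something like x = '\|apple:.*'. Neither is confirmed here, and `top_value_pattern` and `matches_top` are hypothetical names.

```python
import re

def top_value_pattern(value: str) -> str:
    """Regex over the raw set string: `value` must be the first member."""
    # Assumes the set is serialized with the highest rank first.
    return r"\|" + re.escape(value) + r":.*"

def matches_top(raw: str, value: str) -> bool:
    """True if `value` is the first (assumed highest-ranked) member of `raw`."""
    return re.fullmatch(top_value_pattern(value), raw) is not None

tokens = [
    "|apple:0.4|lemon:0.3|lime:0.3|",
    "|lemon:0.4|apple:0.3|lime:0.3|",
    "|lemon:0.6|apple:0.4|",
]
for raw in tokens:
    print(matches_top(raw, "apple"), raw)
```

The pattern no longer lists any probabilities, so its length does not depend on how many values were merged into the row, and only the first token matches "apple".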
