Background
When clicking an attribute value in the statistics table, a sub-query is generated that finds specifically the items that are represented by that statistics row.
Some attributes are ranked, so values like apple:0.6 and apple:0.3 are grouped into one row. The row heading is shown as apple, and the CQP generated when clicking it has contains 'apple:(0\.6|0\.3)'.
Problem
In the example below, the top rows are merged from a lot of values, and the generated CQP queries contain a lot of probability numbers in the parentheses. They even get too long for the backend to handle.
https://spraakbanken.gu.se/korplabb/#?cqp=%3Csentence%3E%20%5B%5D&corpus=suc3&stats_reduce=transformer-neighbour&show_stats&search_tab=1&result_tab=2&search=cqp
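To make the length problem concrete, here is a minimal sketch of how such a clause could be assembled from the values merged into one row. The function name `build_contains_clause` is hypothetical, and the actual frontend code is not shown in this issue; the sketch only reproduces the clause format quoted under Background.

```python
import re

def build_contains_clause(attr, value, probabilities):
    # Hypothetical helper: join every merged probability into one regex
    # alternation, escaping the dots so they match literally.
    alternatives = "|".join(re.escape(p) for p in probabilities)
    return f"{attr} contains '{re.escape(value)}:({alternatives})'"

# Two merged values, as in the Background example.
short_clause = build_contains_clause("x", "apple", ["0.6", "0.3"])
print(short_clause)

# A row merged from many distinct probabilities yields a much longer clause.
many = [f"{i / 1000:.3f}" for i in range(1, 1000)]
long_clause = build_contains_clause("x", "apple", many)
print(len(short_clause), len(long_clause))
```

The clause grows linearly with the number of distinct probabilities merged into the row, which is consistent with the queries becoming too long for the backend.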
Solution?
The statistics include the highest ranked value in each token, so a token with x="|apple:0.4|lemon:0.3|lime:0.3|" will only contribute to the "apple" row. Clicking the "apple" row should yield that token, but not a token with x="|lemon:0.4|apple:0.3|lime:0.3|". I guess that's why we include the probabilities. However, I suppose this will indeed yield a token with x="|lemon:0.6|apple:0.4|", and I believe that is false, because that token would contribute only to the "lemon" row.
So, if the statistics are compiled on highest-ranking attribute, can/should we not use highest ranking also in the sub query?
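The three example tokens can be checked with a small simulation. This is only a sketch: `contains` is a rough Python stand-in for the CQP operator (true if any member of the set matches the whole pattern), and `top_ranked_is` is a hypothetical alternative that assumes members are stored in descending rank order, as they are in the examples. Whether the backend accepts an equivalent regex on the raw attribute value is not verified here.

```python
import re

def members(raw):
    # "|apple:0.4|lemon:0.3|" -> ["apple:0.4", "lemon:0.3"]
    return [m for m in raw.split("|") if m]

def contains(raw, pattern):
    # Stand-in for `contains`: some member of the set matches the full pattern.
    return any(re.fullmatch(pattern, m) for m in members(raw))

def top_ranked_is(raw, value):
    # Assumption: the first member is the highest ranked one.
    # Anchors the match to the first member of the raw "|a:p|b:q|" string.
    return re.fullmatch(rf"\|{re.escape(value)}:[^|]*\|.*", raw) is not None

tokens = [
    "|apple:0.4|lemon:0.3|lime:0.3|",  # counted in the "apple" row
    "|lemon:0.4|apple:0.3|lime:0.3|",  # counted in the "lemon" row
    "|lemon:0.6|apple:0.4|",           # counted in the "lemon" row
]

# Probability-based sub-query for an "apple" row merged from apple:0.4 only.
by_probability = [contains(t, r"apple:(0\.4)") for t in tokens]
by_top_rank = [top_ranked_is(t, "apple") for t in tokens]

print(by_probability)  # [True, False, True]
print(by_top_rank)     # [True, False, False]
```

The probability-based pattern also matches the third token, although that token only contributes to the "lemon" row. Matching on the first member selects exactly the tokens counted in the "apple" row and does not need the list of probabilities at all.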