When a query matches multiple tokens and statistics are compiled on one or more text (structural) attributes only (not word (positional) ones), a statistics example KWIC shows a separate match for each token of the original query.
For example: search for “vara säker att” and compile statistics on subject. Then open the Statistics tab and click a value of the subject, for example “Sociologi” with 4 matches. The example KWIC shows 12 matches, one for each token in the original search result. The example KWIC is returned by this backend query, where I think the relevant parameters are the following:

- `cqp`: `[lemma contains "vara"] [lemma contains "säker"] [word = "att"]`
- `cqp2`: `([_.text_subject="Sociologi"])`
- `expand_prequeries`: `false`

The secondary CQP expression `cqp2` matches separately each token matched by the primary CQP expression.
I think this issue is somewhat similar to #288, even though the secondary CQP expression is not padded with `[]`’s. I’d thus think that similar solutions would work:

1. Instead of using a secondary CQP expression, add the expression selecting the statistics value(s) with `&` to the last token of the primary CQP (or to the first token if that is faster): `[lemma contains "vara"] [lemma contains "säker"] [word = "att" & _.text_subject="Sociologi"]`.

   In fact, a similar approach seems to be already used if statistics are compiled by both a word and a text attribute, when the secondary CQP can be for example `([word="är" & _.text_subject="Sociologi"] [word="säkert"] [word="att"])`. (A difference is probably that this secondary CQP is not a modification of the primary one.)
2. Instead of using a secondary CQP expression, add the expression selecting the statistics value(s) to the primary CQP as a global constraint referring to the `match` label: `[lemma contains "vara"] [lemma contains "säker"] [word = "att"] :: match.text_subject="Sociologi"`
3. Use the `subset` operation in the secondary CQP expression: `subset Last where match: [_.text_subject="Sociologi"]`
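To make the three options concrete, here is a minimal Python sketch that builds each query variant from the example above. The function names and the representation of the primary query as a list of bracket-less token expressions are hypothetical, chosen only for illustration; they are not taken from the Korp frontend or backend. The sketch also assumes that the attribute value contains no quotes or regular expression metacharacters and that the last token expression contains no `|`.

```python
from typing import List


def bracket(tokens: List[str]) -> str:
    """Join token expressions into a CQP token sequence."""
    return " ".join(f"[{t}]" for t in tokens)


def option1(tokens: List[str], attr: str, value: str) -> str:
    """Option 1: add the structural attribute test to the last token with &."""
    last = f'{tokens[-1]} & _.{attr}="{value}"'
    return bracket(tokens[:-1] + [last])


def option2(tokens: List[str], attr: str, value: str) -> str:
    """Option 2: add a global constraint referring to the match label."""
    return f'{bracket(tokens)} :: match.{attr}="{value}"'


def option3(attr: str, value: str) -> str:
    """Option 3: secondary expression using subset on the previous result."""
    return f'subset Last where match: [_.{attr}="{value}"]'


# The primary query from the example, as bracket-less token expressions.
tokens = ['lemma contains "vara"', 'lemma contains "säker"', 'word = "att"']

print(option1(tokens, "text_subject", "Sociologi"))
print(option2(tokens, "text_subject", "Sociologi"))
print(option3("text_subject", "Sociologi"))
```

In each variant the constraint is tested once per original match rather than once per token, which is why I would expect the example KWIC to show 4 matches instead of 12.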
My disclaimers and notes in #288 also apply here: I don’t know which of the queries would be the fastest. I think option 3 might be the easiest from the point of view of the frontend, as it wouldn’t need to modify existing CQP expressions. However, the backend would need to be modified so that it does not add a `within` clause to such CQP expressions. And I don’t know if supporting CQP queries of this kind would have some security implications. In options 1 and 2, I think it might be possible to modify the CQP expression with a regular expression replacement operation, without having to parse the CQP, but you’d need to take into account the possible existing global constraint (in the advanced search).
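As a rough illustration of the regular expression idea for options 1 and 2, the following Python sketch modifies a raw CQP string without parsing it, while keeping a possible existing global constraint. All function names are hypothetical, and `text_year` in the usage lines is just a made-up attribute. The sketch assumes that `::` does not occur inside a quoted string, that the token sequence ends with a non-empty `[...]` token (no trailing repetition operator or parenthesis), that the last token contains no `|`, and that an existing global constraint can be combined with the new test as `(old) & new`; a real implementation would need to check these.

```python
import re
from typing import Optional, Tuple


def split_global_constraint(cqp: str) -> Tuple[str, Optional[str]]:
    """Split 'tokens :: constraint' into (tokens, constraint or None)."""
    head, sep, tail = cqp.partition("::")
    return head.strip(), (tail.strip() if sep else None)


def add_to_last_token(cqp: str, attr: str, value: str) -> str:
    """Option 1: append '& _.attr="value"' inside the last token."""
    tokens, glob = split_global_constraint(cqp)
    test = f'_.{attr}="{value}"'
    # Replace the final closing bracket of the token sequence.
    new_tokens, count = re.subn(r"\]\s*$", lambda _m: f" & {test}]", tokens)
    if count != 1:
        raise ValueError("query does not end with a [...] token")
    return new_tokens if glob is None else f"{new_tokens} :: {glob}"


def add_global_constraint(cqp: str, attr: str, value: str) -> str:
    """Option 2: add 'match.attr="value"' as (part of) the global constraint."""
    tokens, glob = split_global_constraint(cqp)
    test = f'match.{attr}="{value}"'
    combined = test if glob is None else f"({glob}) & {test}"
    return f"{tokens} :: {combined}"


primary = '[lemma contains "vara"] [lemma contains "säker"] [word = "att"]'
print(add_to_last_token(primary, "text_subject", "Sociologi"))
print(add_global_constraint(primary, "text_subject", "Sociologi"))

# A query that already has a global constraint (hypothetical attribute).
advanced = '[word = "a"] [word = "b"] :: match.text_year="2020"'
print(add_to_last_token(advanced, "text_subject", "Sociologi"))
print(add_global_constraint(advanced, "text_subject", "Sociologi"))
```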