-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Trend diagram example KWIC does not work correctly with search expressions containing repetition #288
Comments
It would be best if Martin also looked at this issue before we fix it in the frontend. I think that both 1 and 2 are easier edits to the CQP-query than we already do, but as Jyrki says, 3 is the easiest one. As for performance, I don't know. @MartinHammarstedt which of the alternatives do you think is the best? And @janiemi about "the value of cqp2 contains newlines and a large number extra spaces", you're probably right. Should be fixed before closing this (but maybe the code will be replaced anyway). |
Regarding which is the fastest one, I don't know either. Not using a secondary query is maybe faster. Like Jyrki said, number 3 would require changes in the backend. Number 2 is probably the cleanest one but wouldn't work if someone is already using global constraints (unless you detect that and add the new constraints to the existing ones). Number 1 feels a little dirty, and if people are using the advanced search it might be hard to figure out exactly which token to add the constraint to (for example in the query |
@MartinHammarstedt, you’re right. I didn’t take token-level regular expressions into account, so option 1 is probably ruled out. And yes, in option 2, a possible existing global constraint should be detected and the CQP expression should be modified differently based on whether it already contains a global constraint or not. I suppose it wouldn’t be too difficult, but of course it would be an additional complication. Even if option 2 were implemented in the frontend, I think it might sometimes be useful in its own right to have the backend support (secondary) CQP queries that cannot take a |
If an extended or advanced search expression contains repetition for a token expression, the example KWIC from the trend diagram does not always work correctly:
[…]{2,3}
) and matches more tokens than it has token expressions, the trend diagram examples shows hits multiple times. For example, for a query with three token expressions matching the five tokensa b c d e
, the example KWIC shows the three-token matchesa b c
,b c d
andc d e
.[…]{0,1}
) and matches fewer tokens than it has token expressions, trend diagram examples do not show hits without the optional tokens (hits with fewer tokens than token expressions).(This may be a known problem but at least I didn’t find any GitHub issue for it. The problem was noticed by @jpiitula, and I investigated it further.)
Examples of the two cases:
Uncaught Object { message: "Expected \"<\" or \"[\" but \"(\" found.", expected: (3) […], found: "(", location: {…}, name: "SyntaxError", stack: "" } bundle.js:280:18186
. On a local test instance running Språkbanken’s code in thedev
branch, I got a different error:Uncaught TypeError: this.s is undefined trend_results.js:49:24
. (In fact, this also seemed to happen for the first search in thedev
version and in the current Korplabb, so maybe something has been changed that affects the trend diagram examples.) Before this, another error is shown on the console, but I don’t know if it’s related:ReferenceError: token is not defined repeatChange token.js:100
This appears to be due to the backend query having a primary
cqp
for the base search expression and a secondarycqp2
for the time expression corresponding to the selected point on the trend diagram, padded with[]
’s to have the same number of token expressions as the primary CQP expression. For example, search 1 above results in this backend query, where I think the relevant parameters for this issue are the following:cqp
:[lemma contains "säker"] []{2,2} [word = "att"]
cqp2
:[(int(_.text_datefrom) >= 19970501 & int(_.text_dateto) <= 19970531) | (int(_.text_datefrom) <= 19970501 & int(_.text_dateto) >= 19970531)] [] []
expand_prequeries
:false
In this case, the secondary CQP matches all three-token sequences in the matches of the first CQP. If a token expression is optional, the secondary CQP is longer than matches without the optional token, so it doesn’t match at all.
(By the way, is it intentional that the value of
cqp2
contains newlines and a large number extra spaces, or do they just result from a multi-line string in the JavaScript code? Its JSON representation is"[(int(_.text_datefrom) >= 19970501 & int(_.text_dateto) <= 19970531) |\n (int(_.text_datefrom) <= 19970501 & int(_.text_dateto) >= 19970531)\n ] [] []"
.)I can think of at least the following solutions:
&
to the last token expression of the primary CQP (or to the first one, if that is faster):[lemma contains "säker"] []{2,2} [word = "att" & ((int(_.text_datefrom) >= 19970501 & int(_.text_dateto) <= 19970531) | (int(_.text_datefrom) <= 19970501 & int(_.text_dateto) >= 19970531))]
match
label:[lemma contains "säker"] []{2,2} [word = "att"] :: (int(match.text_datefrom) >= 19970501 & int(match.text_dateto) <= 19970531) | (int(match.text_datefrom) <= 19970501 & int(match.text_dateto) >= 19970531)
subset
operation in the secondary CQP expression:subset Last where match: [(int(_.text_datefrom) >= 19970501 & int(_.text_dateto) <= 19970531) | (int(_.text_datefrom) <= 19970501 & int(_.text_dateto) >= 19970531)]
I don’t know which of the queries would be the fastest. I think option 3 might be the easiest from the point of view of the frontend, as it wouldn’t need to modify existing CQP expressions. However, the backend would need to be modified not to add a
within
clause to such CQP expressions. And I don’t know if supporting CQP queries of this kind would have some security implications. In options 1 and 2, I think it might be possible to modify the CQP expression with a regular expression replacement operation, without having to parse the CQP, but you’d need to take into account the possible existing global constraint (in the advanced search).The text was updated successfully, but these errors were encountered: