-
Notifications
You must be signed in to change notification settings - Fork 64
Description
The problem is that using the word AND
to express multiple constraints in a SIMULATE
query is incorrect, however that syntax is correct in a SELECT
query so a user may use it by accident. When this happens, no error is returned. The query runs normally. However, the wrong results are produced.
AND
works in SELECT
queries. In specific, the BQL or SQL expression
1: %bql SELECT var1, var2 FROM t WHERE "var1"=val1 AND "var2"=val2
is interpreted as:
2: %bql SELECT var1, var2 FROM t WHERE ("var1"=val1) AND ("var2"=val2)
However AND
does not work normally in SIMULATE
queries. In specific, the BQL expression
3: %bql SIMULATE var1, var2 FROM p WHERE "var1"=val1 AND "var2"=val2 LIMIT 10
is interpreted as:
4: %bql SIMULATE var1, var2 FROM p WHERE "var1" = (val1 AND "var2" = val2) LIMIT 10
Therefore, if a user invokes query 3 above, they will not get the answer they expect. Currently, the correct way to write multiple constraints in a SIMULATE expression is for them to be separated by commas, not by AND. Therefore the user will only get the right answer by using the syntax below:
5: %bql SIMULATE var1, var2 FROM p WHERE "var1"=val1, "var2"=val2 LIMIT 10
This is a problem because a user (like me!) who believes naturally that SELECT
and SIMULATE
have parallel syntax is susceptible to accidentally use AND
with SIMULATE
instead of commas. The bug is especially insidious because the result will silently be wrong, without an error. And, the results will look almost reasonable -- if the constraints are ignored, the variable will usually just be simulated at values near its overall mean. This is a trap for the unwary, especially when the incorrect results propagate through the rest of an analysis.
Possible resolutions:
- Return a syntax error when the compiler sees a
SIMULATE
query with multipleAND
s instead of commas. - Compile an
SIMULATE
query with multipleAND
s to interpret the query as if it has commas.