Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Database\Table\Selection::where and NOT IN construct #667

Closed
tomaswindsor opened this issue May 30, 2012 · 22 comments
Closed

Database\Table\Selection::where and NOT IN construct #667

tomaswindsor opened this issue May 30, 2012 · 22 comments

Comments

@tomaswindsor
Copy link
Contributor

Currently, when using
$db->table('book')->where('NOT id', array()) or $db->table('book')->where('id NOT', array()) it will result in query SELECT * FROM book WHERE NOT ID IN (NULL) or SELECT * FROM book WHERE ID NOT IN (NULL) which will both return no rows because of special behavior of NULL. Same problem occurs for MySQL subqueries in $db->table('book')->where('NOT id', $subQuery), because Nette\Database performs optimization and replaces the actual subquery with its result, and if this result is empty, it will use (NULL).

Solution could be replacing the (NULL) with some kind of query that returns an empty result (no idea about that one). Alternatively instead of empty list, we could use list containing special unique value, e.g. WHERE NOT ID IN ('~~NDB~NO~VALUES~~'). Solution with simply removing whole condition in case of NOT is insufficient because NULL can appear also on the left side of the IN operator, and removing whole condition would then cause different behavior - imagine $db->table('book')->where('NOT NULL', array()) - I know this example doesn't look to be of much use, but keep in mind that NULL can be result of some sql function used in the condition as well...

@temistokles
Copy link

Is there any workaround?

@hrach
Copy link
Contributor

hrach commented Nov 15, 2012

Well, it's mysql thing. If you dont want to restrict the seleciton, don't call where condition.

$selection = $db->table('book');
if ($bookIds)
     $selection->where('NOT IN', $bookIds);

@hrach hrach closed this as completed Nov 15, 2012
@tomaswindsor
Copy link
Contributor Author

By saying it's mysql thing, you mean in other databases this works well? Either way the behavior of this for mysql has big WTF factor, especially when using subqueries (which are optimized internally by NDB for MySql - and therefore won't work, while they would have worked if left unoptimized). Is the solution using special constant really so unacceptable? I think it would solve most problems for now, and perhaps later mysql comes with a possibility how to resolve this in cleaner way...

@dg
Copy link
Member

dg commented Nov 18, 2012

Tohle je celkem známý problém, viz http://php.vrana.cz/operator-in-s-prazdnou-mnozinou.php#d-8096. Teoreticky workaround by existovat mohl.

@hrach
Copy link
Contributor

hrach commented Nov 18, 2012

Srr, you are right, it's not only mysql problem, it's also in postgre. Only workaround which I know is to pass an magic value, which can't be present. Such ~~NDB~NO~VALUES~~, etc...

@tomaswindsor
Copy link
Contributor Author

Alternatively, it is also possible to use subquery such as SELECT 0 FROM $table WHERE 0, where $table is a string containing name of any existing table (e.g. the currently accessed one). Unfortunately it is not possible to use the dummy table dual here, as it seems to ignore the WHERE 0 part when used within the NOT IN construct (at least in MySql).

@milo
Copy link
Member

milo commented Nov 20, 2012

The trick is that NULL in comparison to anything else is always NULL, not BOOL. So, with NOT IN you can use:

SELECT * FROM table WHERE COALESCE(id NOT IN (NULL), TRUE);

I'm sure on PostgreSQL only...

@hrach
Copy link
Contributor

hrach commented Nov 20, 2012

Wow! Works in MySQL too. I'm not sure if nette should add COALESCE automatically.

@milo
Copy link
Member

milo commented Nov 20, 2012

This is a better solution than subquery, because you do not have to care about subquery return type.

But when I take a look at SqlBuilder::addWhere(), you must detect condition negation NOT. After that, you must detect emptiness of parameters (already done). And when you know these two things, you can simply remove whole condition or replace it by TRUE/FALSE. So COALESCE or IN (NULL) is not longer needed.

Maybe I'm missing something, I'm a Nette\Database amateur.

@hrach hrach reopened this Nov 20, 2012
@milo
Copy link
Member

milo commented Nov 20, 2012

@hrach What do you think milo@65f05ee?

@tomaswindsor
Copy link
Contributor Author

The COALESCE is good idea, but in presented form it can be used only when the left side of the IN construct is not NULL. You can e.g. have WHERE NULL NOT IN (NULL) (or you can replace the first NULL with something that can be NULL) and then you expect it to filter out those rows where it is NULL, but COALESCE would change this behavior.
So maybe something like this would work better:

SELECT * FROM table WHERE (NOT ISNULL(expression) AND COALESCE (expression NOT IN (NULL), TRUE))

... where expression contains whatever is on the left side of the IN, like in:

$db->table('table')->where("$expression NOT IN", $array_or_subquery);
or
$db->table('table')->where("NOT $expression IN", $array_or_subquery);

However all this relates to ANSI NULLS being enabled, but some RDBMS allow to disable it...

@hrach
Copy link
Contributor

hrach commented Nov 20, 2012

Well, I would avoid stripping condition and parameter analysis. There is no need to do it.
SELECT * FROM users WHERE COALESCE(userId IN (1), TRUE); works as well as SELECT * FROM users WHERE COALESCE(NOT userId IN (NULL), TRUE); It leads (at least) to better variable caching key.

$db->table('table')->where("$expression NOT IN", $array_or_subquery); is definetelly NOT valid use of Nette\Database, not escaped $expression, IN keyword is duplicit, NOT only at the begining...

@tomaswindsor
Copy link
Contributor Author

@hrach You didn't address the issue I presented at all. In your example, you work with primary key (userId) which you expect to be always non-NULL. If you consider cases with column that can contain NULL (or more generally an expression that evaluates to NULL) , you will realize COALESCE changes the behavior (if ANSI NULLS are enabled).

My PHP example was merely meant to illustrate what i mean by expression. It was by no means meant to show how Nette\Database should be used. But your argument that it was not escaped is speculative, as I didn't specify whether the $expression was or wasn't pre-escaped (escaping was irrelevant for what I wanted to show). Also I fail to see any duplicate IN in my illustration. Both notations are valid btw: expression NOT IN (...) and NOT expression IN (...).

@hrach
Copy link
Contributor

hrach commented Nov 20, 2012

Duplicit IN: see https://github.com/nette/nette/blob/master/Nette/Database/Table/SqlBuilder.php#L181

Oh, I get it now, maybe you should try to express you thoughts better :D Just write where('nullableColumn IN (NULL)'). So @milo solution is right. Let's do it.

@tomaswindsor
Copy link
Contributor Author

Ok, you were right about the duplicate IN, my bad.

As for the actual problem we are discussing, I am glad you got it now, but I am afraid solution of @milo is also not respecting ANSI NULLS behavior for operand left of the IN operator. I am afraid replacing the condition with boolean is not the way because of this, as only the database knows what the left operand will evaluate to (whether it will be NULL or something else) and therefore PHP can't resolve it this way.

What we need is to find a condition that could be used for empty list of values in combination with NOT IN and would work preferably for both ANSI NULLS enabled or disabled. Solution with COALESCE (when extended by check for NULL) might work for ANSI NULLS enabled, but what if they are disabled? So far I am inclined to solution with the special constant, but if there was another solution without described negative side effects, I would prefer that one.

@hrach
Copy link
Contributor

hrach commented Nov 20, 2012

From my point of view I would not deal with empty array as "NULL" in "IN". If you want to filter by NULL, you would use "array(NULL)".

@tomaswindsor
Copy link
Contributor Author

@hrach I am not sure if you wanted to react on me in your last comment. In case you did, you probably missed the point again, in which case I suggest we discuss the matter in czech.
I agree that adding more NULLs only complicates things here. Hence my inclination to solution using special constant or subquery that returns empty result set (whichever is more efficient).

@tomaswindsor
Copy link
Contributor Author

Maybe I found the solution. For empty arrays, following replacement would occur:

WHERE expression NOT IN () would be replaced with: WHERE expression IS NOT NULL
and
WHERE expression IN () would be replaced with: WHERE FALSE

Or did I miss something?

@milo
Copy link
Member

milo commented Nov 21, 2012

Souhlas s češtinou :-)

Vše je o situacích:

1. expr IN ()
2. expr NOT IN ()

3. expr IN (NULL)     // db's decision
4. expr NOT IN (NULL) // db's decision

5. expr IN (1, 2, 3)     // db's decision
6. expr NOT IN (1, 2, 3) // db's decision

A celá naše diskuze je o tom, že se snažíme opravit SQL chování (syntax error) pro "prázdnou množinu", tedy případ 1 a 2.

Myslím, že 1. by mělo být vždy FALSE, protože množina je prázdná a není v ní ani nějaká neznámá hodnota. A 2. situace vždy TRUE ze stejného důvodu. Když si to uživatel bude chtít změnit, může si přidat další podmínku na přidání/odebrání NULL sloupců.

@hrach
Copy link
Contributor

hrach commented Nov 21, 2012

@tomaswindsor: @milo solution is right, because it no longer solve 1st / 2nd as 3rd / 4th expression - which is the cutting edge.

@tomaswindsor
Copy link
Contributor Author

Máš pravdu. Z nějakého důvodu jsem měl za to, že když je expr NULL, tak to v případech 1 a 2 vrací NULL, ale teď jsem si ověřil, že to tak není. Jediné, co mě ještě napadá, že by mohl být problém, je situace, kdy expr by obsahovalo něco s vedlejším efektem, např. (@i:=@i+1). To by šlo vyřešit tak, že v případě 1 by se použilo (expr) AND FALSE a v případě 2 (expr) OR TRUE.

@milo
Copy link
Member

milo commented Nov 21, 2012

I made a pull request #869. If you agree, we can continue discuss there.

hrach added a commit to hrach/nette that referenced this issue Apr 21, 2013
dg added a commit that referenced this issue Apr 21, 2013
Database: SqlBuilder::addWhere() consider NOT [Closes #667]
hrach added a commit to hrach/nette that referenced this issue May 17, 2013
…efs nette#667]

Removed old behavior, which could produce malformed SQL query.
Condition must have parentheses around all expressions with empty array.
dg added a commit that referenced this issue May 17, 2013
Database: implemented correct support for empty array NOT operator [Refs #667]
hrach added a commit to nette/database that referenced this issue Mar 18, 2014
…efs nette/nette#667]

Removed old behavior, which could produce malformed SQL query.
Condition must have parentheses around all expressions with empty array.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants