New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

COLLATE NOCASE #1172

Closed
mtissington opened this Issue Oct 15, 2017 · 13 comments

Comments

Projects
None yet
3 participants
@mtissington
Copy link

mtissington commented Oct 15, 2017

This is a UTF16 database

I have a very strange problem when using COLLATE NOCASE on a column.
It seems that when I do a SELECT the WHERE clause is only looking at the first letter.

CREATE TABLE words (
word TEXT COLLATE NOCASE NOT NULL CHECK(word <> ''),
);

INSERT INTO words (word) VALUE ('aw');

SELECT * FROM WORDS WHERE word = 'a'

returns ALL rows that being with 'a'

I might add that using sqlite3.exe produces the correct result.

@justinclift

This comment has been minimized.

Copy link
Member

justinclift commented Oct 15, 2017

Interesting. With the sqlite3.exe version that's working, which version is that? Not sure if it makes a difference, but it might be helpful. 😄

@mtissington

This comment has been minimized.

Copy link
Author

mtissington commented Oct 15, 2017

Its 3.20.1

@justinclift

This comment has been minimized.

Copy link
Member

justinclift commented Oct 15, 2017

Just tried it here with 3.10.1 (on Linux though), with the following SQL in a brand new (unsaved) database. The query correctly returned no rows:

PRAGMA encoding = "UTF-16";

CREATE TABLE words (
    word TEXT COLLATE NOCASE NOT NULL CHECK(word <> '')
);

INSERT INTO words (word) VALUES ('aw');

SELECT * FROM WORDS WHERE word = 'a';

Note that your SQL has two slight typos in it, which stopped it from running as-is when cut-n-pasted. 😉

  • word TEXT COLLATE NOCASE NOT NULL CHECK(word <> ''),
    • The comma on the end needed removing
  • INSERT INTO words (word) VALUE ('aw');
    • Should be VALUES not VALUE

They're both minor, so I guess you've adapted your original SQL manually. Not really sure what to look at, as it's working for me. Maybe someone else can try it on their Windows system and see if it mucks up?

@justinclift

This comment has been minimized.

Copy link
Member

justinclift commented Oct 15, 2017

Ahhh, it seems like my test database above is still UTF-8. Running:

PRAGMA encoding;

Returns UTF-8. I'll probably need to muck around with creating the database from the command line as UTF-16 first instead. 😄

@justinclift

This comment has been minimized.

Copy link
Member

justinclift commented Oct 15, 2017

Yep, when creating the database as UTF-16 using the sqlite3 CLI, the problem happens for me too when running the select in DB4S.

DB4S does display a warning for the database though, indicating there's likely to be trouble:

collation_needed_warning

@MKleusberg Shouldn't "NOCASE" work?

    https://sqlite.org/datatype3.html#collating_sequences

@justinclift

This comment has been minimized.

Copy link
Member

justinclift commented Oct 15, 2017

As a data point #411 is related. 😄

@mtissington

This comment has been minimized.

Copy link
Author

mtissington commented Oct 15, 2017

Good you see the problem ... need to get rid of that scary warning :)

MKleusberg added a commit that referenced this issue Oct 19, 2017

Fix default collation for UTF16 strings
Fix our default collation which we install when an unknown collation is
requested to (more or less) support UTF16 strings as well. Before this
UTF16 strings wouldn't compare correctly because they might contain null
bytes which would lead to only the first (few) byte(s) getting compared.

See issue #1172.

MKleusberg added a commit that referenced this issue Oct 19, 2017

Don't override built-in collations
Don't ask the user whether to load a collation function if the requested
collation is already built into SQLite. Also, never override the
built-in functions.

See issue #1172.
@MKleusberg

This comment has been minimized.

Copy link
Member

MKleusberg commented Oct 19, 2017

Ok, this seems to be two issues at once 😄

As described in @justinclift's link, the NOCASE function is built into SQLite and we don't need to override that at all. I've fixed that which has two effects: 1) the warning is gone for NOCASE, BINARY, and RTRIM collations, and 2) the SELECT query posted by @mtissington is working now.

The other issue was in our own collation function which was invoked before. It didn't handle UTF16 correctly which led to the initial problem. Even though it doesn't matter for this issue any more, I've still fixed it too 😄

@mtissington Are you ok to download tomorrow's nightly build and see if it's working for you? 😄

@mtissington

This comment has been minimized.

Copy link
Author

mtissington commented Oct 19, 2017

great - thanks - happy to test.

@mtissington

This comment has been minimized.

Copy link
Author

mtissington commented Oct 20, 2017

I can confirm the issue is fixed - many thanks :)

@justinclift

This comment has been minimized.

Copy link
Member

justinclift commented Oct 20, 2017

Excellent, thanks for confirming it @mtissington. Should we close #411 too?

@MKleusberg

This comment has been minimized.

Copy link
Member

MKleusberg commented Oct 20, 2017

Same for #367. Can we close that, too?

@justinclift

This comment has been minimized.

Copy link
Member

justinclift commented Dec 4, 2017

Good point. Just closed that one too. 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment