SQLite + upper() or lower() does not work as expected #151

Closed
socketpair opened this Issue Oct 19, 2015 · 4 comments

Projects

None yet

2 participants

@socketpair

Since SQLite does not support Unicode operations, upper() and lower() SQL functions does not work for national symbols.

i.e. select(upper(r.field) for r in table) will FAIL if SQLite driver is used and field contains non-ASCII symbols (only ASCII symbols will be converted to upper case)

https://www.sqlite.org/lang_corefunc.html#upper :
upper(X) The upper(X) function returns a copy of input string X in which all lower-case ASCII characters are converted to their upper-case equivalent.

http://www.sqlite.org/faq.html#q18 case (18) about collation and sorting (!)

So, my decision is:

  1. Disallow Unicode operations in queries (on SQLite driver) by default. i.e. raise NotImplementedError exception.
  2. Add SLOW work-around as described in point 18 of faq, and enable it only by special PONY flag. There are two implementation options: use native python unicode functions, or libicu functions directly (https://pypi.python.org/pypi/PyICU ?)

Next. Suppose we want to select somefun(unicode_string) order by 1. So, SQLite engine should call collation function many times to implement sorting. Each call mean full UTF-8 decode (since SQLite store strings in UTF-8 format). Yes, we can (should?) implement partial decoding (this is slightly complex).

Also, we should discover if SQL index by string field will be (1) correct and (2) effective if custom collation will be used.

@kozlovsky kozlovsky closed this in 3025f6e Jan 15, 2016
@kozlovsky
Contributor

Now Pony registers two additional unicode-aware functions in SQLite: py_upper(s) and py_lower(s), and uses these functions instead of the standard upper and lower funcitons:

>>> select(p.id for p in Person if p.name.upper() == 'John')[:]

SQLite query:

SELECT "p"."id"
FROM "Person" "p"
WHERE py_upper("p"."name") = 'John'

For other databases Pony still uses standard upper and lower functions.

If you want to sort query by a column in a case-insensitive way, you can manually call .upper():

>>> select(p.id for p in Person).order_by(lambda: p.name.upper())[:]

SQLite query:

SELECT "p"."id"
FROM "Person" "p"
ORDER BY py_upper("p"."name")
@socketpair

Есть для SQLite ещё способ - Cython + реализация на Си, а также регистрация функции из Си.

Вообще, думаю, стоит рассмотреть включение ускорялок на Си для поней. Опциональных. как в aiohttp или tornado.

@kozlovsky kozlovsky added a commit that referenced this issue Jan 15, 2016
@kozlovsky kozlovsky Fixes #151: py_upper() & py_lower() should process None correctly and…
… also convert values to unicode if necessary
6b9409d
@kozlovsky
Contributor

I definitely think that we need to add C speedup module, but probably not right now, because it may slow down the development - we have many important tasks regarding migration support, integration with JavaScript frameworks, and implementing GraphQL backend. I think that some time later we can hire a dedicated developer who'll write such a C speedup module.

@socketpair

Добавляйте issue - может я и законтрибучу что-нибудь

@kozlovsky kozlovsky added the bug label Feb 4, 2016
@kozlovsky kozlovsky added this to the 0.6.3 milestone Feb 4, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment