New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQLite + upper() or lower() does not work as expected #151

Closed
socketpair opened this Issue Oct 19, 2015 · 4 comments

Comments

Projects
None yet
2 participants
@socketpair

socketpair commented Oct 19, 2015

Since SQLite does not support Unicode operations, upper() and lower() SQL functions does not work for national symbols.

i.e. select(upper(r.field) for r in table) will FAIL if SQLite driver is used and field contains non-ASCII symbols (only ASCII symbols will be converted to upper case)

https://www.sqlite.org/lang_corefunc.html#upper :
upper(X) The upper(X) function returns a copy of input string X in which all lower-case ASCII characters are converted to their upper-case equivalent.

http://www.sqlite.org/faq.html#q18 case (18) about collation and sorting (!)

So, my decision is:

  1. Disallow Unicode operations in queries (on SQLite driver) by default. i.e. raise NotImplementedError exception.
  2. Add SLOW work-around as described in point 18 of faq, and enable it only by special PONY flag. There are two implementation options: use native python unicode functions, or libicu functions directly (https://pypi.python.org/pypi/PyICU ?)

Next. Suppose we want to select somefun(unicode_string) order by 1. So, SQLite engine should call collation function many times to implement sorting. Each call mean full UTF-8 decode (since SQLite store strings in UTF-8 format). Yes, we can (should?) implement partial decoding (this is slightly complex).

Also, we should discover if SQL index by string field will be (1) correct and (2) effective if custom collation will be used.

@kozlovsky kozlovsky closed this in 3025f6e Jan 15, 2016

@kozlovsky

This comment has been minimized.

Show comment
Hide comment
@kozlovsky

kozlovsky Jan 15, 2016

Member

Now Pony registers two additional unicode-aware functions in SQLite: py_upper(s) and py_lower(s), and uses these functions instead of the standard upper and lower funcitons:

>>> select(p.id for p in Person if p.name.upper() == 'John')[:]

SQLite query:

SELECT "p"."id"
FROM "Person" "p"
WHERE py_upper("p"."name") = 'John'

For other databases Pony still uses standard upper and lower functions.

If you want to sort query by a column in a case-insensitive way, you can manually call .upper():

>>> select(p.id for p in Person).order_by(lambda: p.name.upper())[:]

SQLite query:

SELECT "p"."id"
FROM "Person" "p"
ORDER BY py_upper("p"."name")
Member

kozlovsky commented Jan 15, 2016

Now Pony registers two additional unicode-aware functions in SQLite: py_upper(s) and py_lower(s), and uses these functions instead of the standard upper and lower funcitons:

>>> select(p.id for p in Person if p.name.upper() == 'John')[:]

SQLite query:

SELECT "p"."id"
FROM "Person" "p"
WHERE py_upper("p"."name") = 'John'

For other databases Pony still uses standard upper and lower functions.

If you want to sort query by a column in a case-insensitive way, you can manually call .upper():

>>> select(p.id for p in Person).order_by(lambda: p.name.upper())[:]

SQLite query:

SELECT "p"."id"
FROM "Person" "p"
ORDER BY py_upper("p"."name")
@socketpair

This comment has been minimized.

Show comment
Hide comment
@socketpair

socketpair Jan 15, 2016

Есть для SQLite ещё способ - Cython + реализация на Си, а также регистрация функции из Си.

Вообще, думаю, стоит рассмотреть включение ускорялок на Си для поней. Опциональных. как в aiohttp или tornado.

socketpair commented Jan 15, 2016

Есть для SQLite ещё способ - Cython + реализация на Си, а также регистрация функции из Си.

Вообще, думаю, стоит рассмотреть включение ускорялок на Си для поней. Опциональных. как в aiohttp или tornado.

kozlovsky added a commit that referenced this issue Jan 15, 2016

Fixes #151: py_upper() & py_lower() should process None correctly and…
… also convert values to unicode if necessary
@kozlovsky

This comment has been minimized.

Show comment
Hide comment
@kozlovsky

kozlovsky Jan 15, 2016

Member

I definitely think that we need to add C speedup module, but probably not right now, because it may slow down the development - we have many important tasks regarding migration support, integration with JavaScript frameworks, and implementing GraphQL backend. I think that some time later we can hire a dedicated developer who'll write such a C speedup module.

Member

kozlovsky commented Jan 15, 2016

I definitely think that we need to add C speedup module, but probably not right now, because it may slow down the development - we have many important tasks regarding migration support, integration with JavaScript frameworks, and implementing GraphQL backend. I think that some time later we can hire a dedicated developer who'll write such a C speedup module.

@socketpair

This comment has been minimized.

Show comment
Hide comment
@socketpair

socketpair Jan 15, 2016

Добавляйте issue - может я и законтрибучу что-нибудь

socketpair commented Jan 15, 2016

Добавляйте issue - может я и законтрибучу что-нибудь

@kozlovsky kozlovsky added the bug label Feb 4, 2016

@kozlovsky kozlovsky added this to the 0.6.3 milestone Feb 4, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment