Error when querying JSON columns with wide unicode characters. #4798
Comments
I've never heard of this parameter, but it is `ensure_ascii`.
Oops, that was a typo on my part. I've edited the issue to reflect the correct parameter name.
So I'd like to change the parameter right away, but first I need to understand the issue. The raw value comes back without a problem, so this is something to do with what SQLite is doing in that json_array() function, and it's unclear whether this is a bug we should be reporting to the Python tracker for sqlite3 or not. I tried changing the URL to postgresql to see what PG does, but the json_array() function doesn't seem to work there; if you know how to show PG doing the same thing, we can see what it expects, or I could at least experiment with it. There are JSON encoders used with the mysql / postgresql dialects as well, so we need to see if this issue applies to them too. Basically, I don't know these SQL functions very well, so I won't really know what's going on until I have time to look more deeply.
Thanks for taking a look. I've tested this in PostgreSQL and it seems to work correctly, so the issue does seem to be SQLite-specific. I adapted the example above to PostgreSQL:

```python
import sqlalchemy as sa
import sqlalchemy.dialects.postgresql as pg_types

metadata = sa.MetaData()
User = sa.Table(
    "users",
    metadata,
    sa.Column("username", sa.String, primary_key=True),
    sa.Column("groups", pg_types.JSON)
)

engine = sa.create_engine("postgresql+psycopg2://postgres:postgres@localhost:5432/postgres")
conn = engine.connect()

metadata.drop_all(engine)
metadata.create_all(engine)

conn.execute("""
    INSERT INTO users (username, groups) VALUES ('alice', json_build_array('𝓓𝓞Γ'));
""")

stmt = sa.select([sa.literal_column("json_array_elements.value")])\
    .select_from(User)\
    .select_from(sa.func.json_array_elements(User.c.groups))

# Works
print(conn.execute(stmt).fetchall())

conn.execute(User.insert(), {
    "username": "bob",
    "groups": ["𝓓𝓞Γ"]
})

# Also works
print(conn.execute(stmt).fetchall())
```

The only hint I have so far as to what's going wrong is that when I check the SQLite database directly, it's clear the strings inside the `groups` column are encoded totally differently.
Well, one issue right off is that there are json_serializer and json_deserializer arguments to the Engine you can set, but for SQLite only they are named _json_serializer and _json_deserializer, which is wrong. You can work around it using those names for now; I will fix them, but the old ones will keep working throughout 1.3.x.
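The exact workaround snippet was not preserved in this thread; a minimal sketch of what it presumably looks like, assuming the misspelled 1.3.x parameter name described above (`_json_serializer`, SQLite dialect only):

```python
import functools
import json

# A serializer that leaves non-ASCII characters unescaped instead of
# emitting \uXXXX escape sequences (json.dumps defaults to ensure_ascii=True).
serializer = functools.partial(json.dumps, ensure_ascii=False)

print(serializer(["𝓓𝓞Γ"]))  # ["𝓓𝓞Γ"] -- no surrogate-pair escapes

# With SQLAlchemy installed, this would be wired up roughly as (untested sketch):
# engine = sa.create_engine(
#     "sqlite://",
#     _json_serializer=serializer,    # pre-fix name; json_serializer after the fix
#     _json_deserializer=json.loads,
# )
```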
Mike Bayer has proposed a fix for this issue in the master branch: Correct name for json_serializer / json_deserializer, document and test https://gerrit.sqlalchemy.org/1399

Mike Bayer has proposed a fix for this issue in the rel_1_3 branch: Correct name for json_serializer / json_deserializer, document and test https://gerrit.sqlalchemy.org/1400
OK, now for PG. If you run your program and then look at the PG database, the values are encoded exactly the same way as they are with the SQLite dialect, because SQLAlchemy is doing the same thing. psycopg2 isn't tripping over this, but it's getting back the escaped string and json.loads is just fixing it. So SQLAlchemy shouldn't be doing this (or should it?), but the sqlite3 dialect is still looking suspect.
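What both dialects store can be reproduced with the stdlib `json` module: the non-BMP characters 𝓓 and 𝓞 each become a *pair* of `\uXXXX` escapes (a UTF-16 surrogate pair), which `json.loads` decodes back correctly:

```python
import json

# Default ensure_ascii=True escapes everything outside ASCII; characters
# above U+FFFF (like 𝓓, U+1D4D3) need two \uXXXX escapes, not one.
encoded = json.dumps(["𝓓𝓞Γ"])
print(encoded)  # ["\ud835\udcd3\ud835\udcde\u0393"]

# json.loads reassembles the surrogate pairs, so the round trip is lossless.
decoded = json.loads(encoded)
assert decoded == ["𝓓𝓞Γ"]
```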
Mmmm, this is totally valid JSON; here's the spec:

So Python json.dumps is doing the right thing: the JSON is valid in the DB with the \u escapes. SQLAlchemy will fix the serializers here so they are programmable, and the thing you are seeing is a bug in SQLite, I think.
Thank you for the help, that solves my problem. I believe you're correct about this being a bug in sqlite, I'll report it there. |
Yeah, is this valid?

PostgreSQL doesn't have this problem.
The dialects that support JSON are supposed to take arguments ``json_serializer`` and ``json_deserializer`` at the create_engine() level, however the SQLite dialect calls them ``_json_serializer`` and ``_json_deserializer``. The names have been corrected, the old names are accepted with a deprecation warning, and these parameters are now documented as :paramref:`.create_engine.json_serializer` and :paramref:`.create_engine.json_deserializer`. Fixes: #4798 Change-Id: I1dbfe439b421fe9bb7ff3594ef455af8156f8851 (cherry picked from commit 104e690)
SQLAlchemy Version: 1.3.6

SQLAlchemy serializes `JSON` columns using `json.dumps`, which by default has the parameter `ensure_ascii=True`; this converts unicode characters to ASCII-compatible unicode escape sequences. Unfortunately, this behaviour seems to produce encoding errors when working with unicode characters that require more than a single unicode escape sequence to encode, such as the characters `𝓓` and `𝓞` in the example below. Running this example yields:
I know it wouldn't be a fix to the underlying issue, but I believe allowing the user to set the serializer's `ensure_ascii=False` option would resolve this error.