force utf8 decoding for SQL Alchemy types #7295
Conversation
Note that the linked SO post is specific to MySQL. I think this error is PostgreSQL-specific, and would be best addressed using the fix here: https://stackoverflow.com/a/14788796 It would also be good to cover this with a simple test - I assume the issue is having a unicode string inside the extra_metadata on a job, so it should be relatively simple to replicate in a unit test. Lastly, to properly test this, I realize that we only override Django settings for our postgres test suite, whereas what we really should be doing is overriding our options so that the iceqube storage backend is properly tested too.
Hmm, you're right, it might be best to create tests for this first, especially since I'm unsure whether that suggested SO solution is actually the answer. The basic problem here is that SQLAlchemy interprets whatever we get from the database as ascii instead of UTF-8 when it decodes. That SO answer seems to be telling postgres what its encoding should be, which is already correct. Anyway, more testing would help elucidate what the correct answer here is.
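The decode mismatch described here can be reproduced in plain Python, independent of any database driver (a minimal sketch; the byte string is a made-up stand-in for what the driver returns):

```python
# -*- coding: utf-8 -*-
# Hypothetical reproduction: UTF-8 bytes as they might arrive from the
# database. Decoding them as ascii fails; decoding as utf-8 round-trips.
raw = u"traçéback".encode("utf-8")

try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

decoded = raw.decode("utf-8")
```

This is why "the encoding environment variables are set to UTF-8" is not enough on its own: if the layer doing the decoding picks the ascii codec, any non-ascii byte raises `UnicodeDecodeError` regardless of what the server sends.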
After talking with @rtibbles, here's a summary of what we need to do to get this PR merged: We need to add a new test that replicates this issue, and then the appropriate fix to address that test. However, along the way we also need to change the test infrastructure to actually test iceqube with postgres. Right now iceqube doesn't connect to postgres, and still continues to use sqlite. We need to make the following changes:
@rtibbles let me know if I missed anything here!
Seems to cover the bases!
This should be merged from 0.13 -> ProFuturo -> 0.14 -> Develop
adding 0.14 milestone so we don't lose track of it
this was a bug where even if we were running postgres tests, we weren't actually using it, and were using sqlite instead
…gres nor sqlite actually do an autoincrement for a non-primary key field. To implement an auto-increment, the steps are:
- create a table sequence
- have the database use that table sequence for generating the value of queue_order
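The two steps in this commit message could be expressed with SQLAlchemy along these lines (a hedged sketch compiled against the postgres dialect only; the table and column names here are assumptions, not necessarily the project's actual schema):

```python
from sqlalchemy import Column, Integer, MetaData, Sequence, String, Table
from sqlalchemy.dialects import postgresql
from sqlalchemy.schema import CreateSequence, CreateTable

metadata = MetaData()

# Step 1: declare a standalone sequence (name is hypothetical).
queue_order_seq = Sequence("queue_order_sequence")

# Step 2: have the database fill queue_order from that sequence by
# making nextval() the server-side default for the column.
jobs = Table(
    "jobs",
    metadata,
    Column("id", String, primary_key=True),
    Column("queue_order", Integer, server_default=queue_order_seq.next_value()),
)

# Compile the DDL without connecting to a database, just to inspect it.
seq_ddl = str(CreateSequence(queue_order_seq).compile(dialect=postgresql.dialect()))
table_ddl = str(CreateTable(jobs).compile(dialect=postgresql.dialect()))
```

Since sqlite has no sequences, a real implementation would need a different strategy (or a no-op) on that backend, which matches the commit's observation that neither backend auto-increments non-primary-key fields out of the box.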
…mn (since we don't set it), and it just slows our queries down.
…icodeDecodeError. That did not happen.
…rage.py`, and use python2.7 (the last thing Kevin and I tried)
Okay, I've added code to stringify exception and traceback objects, and also to log them and nullify if there are any errors encoding them to utf-8. Tests ensure this works, but of course I was unable to replicate the failure, so can't 100% confirm this will resolve the problem.
logger.warning("Job had traceback: {}".format(traceback))
traceback = str(traceback)
traceback.encode('utf-8')
except:
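The stringify-and-nullify approach described in the comment above could be factored into a small helper along these lines (a hedged sketch; `safe_stringify` is a hypothetical name, not the PR's actual code):

```python
import logging

logger = logging.getLogger(__name__)


def safe_stringify(obj):
    # Hypothetical helper: stringify the exception/traceback, verify the
    # result survives a round-trip through UTF-8, and return None (with a
    # warning) rather than let an encoding error crash the task storage.
    try:
        text = str(obj)
        text.encode("utf-8")
        return text
    except Exception:
        logger.warning("Could not encode value to utf-8; storing None instead")
        return None
```

For example, `safe_stringify(ValueError("boom"))` yields the plain string `"boom"`, while any value whose stringification cannot be encoded is logged and replaced with `None`.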
Hey, can you change this to `except Exception`? Leaving it like this catches more than just real errors, since Python likes to use exceptions for regular control flow.
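For instance (a minimal sketch): a bare `except` also swallows `KeyboardInterrupt`, which subclasses `BaseException` but not `Exception`, so a Ctrl-C arriving while this handler runs would be silently eaten.

```python
def interrupted():
    # Simulates the user hitting Ctrl-C while the handler is running.
    raise KeyboardInterrupt


caught_by_bare = False
try:
    interrupted()
except:  # noqa: E722 -- bare except catches BaseException subclasses too
    caught_by_bare = True

caught_by_exception = False
try:
    try:
        interrupted()
    except Exception:  # KeyboardInterrupt is not an Exception subclass
        caught_by_exception = True
except KeyboardInterrupt:
    pass  # with `except Exception`, the interrupt propagates as it should
```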
Could you give an example of the usage you're speaking about? i.e. a case where this code would throw but that we wouldn't want the except handling to run?
Just some things to reduce the scope of the change slightly.
@@ -6,8 +7,9 @@

from kolibri.core.tasks.job import Job
from kolibri.core.tasks.job import State
from kolibri.core.tasks.storage import Storage
from kolibri.core.tasks.storage import Storage, ORMJob
Don't see this extra import being used anywhere, and linting won't allow this.
deps =
    -r{toxinidir}/requirements/test.txt
    -r{toxinidir}/requirements/base.txt
    -r{toxinidir}/requirements/cext.txt
    -r{toxinidir}/requirements/postgres.txt
commands =
    py.test {posargs:--cov=kolibri --color=no}
    py.test {posargs:--cov=kolibri --color=no} kolibri/core/tasks/test/test_storage.py -k test_can_save_and_read_utf8_metadata --pdb
We probably don't want to merge this change :)
KOLIBRI_RUN_MODE = tox
basepython =
    postgres: python3.5
    postgres: python2.7
If switching back to 2.7 does not help with replication, I'd rather revert this change, since if someone can install postgres, they can hopefully install a newer version of Python...
Superseded by #7975
Summary
This forces SQLAlchemy to interpret every column received from postgres and sqlite as UTF-8.
Somehow, despite the encoding environment variables being set to UTF-8, SQLAlchemy still interprets the received values as ascii. This causes encoding issues on the ProFuturo server when some exceptions contain non-ascii characters.
NOTE: don't merge yet; to fully test this, we need some build artifacts deployed to ProFuturo first.
…
Reviewer guidance
To check, once ProFuturo is deployed, import some new test channels.
If there are no errors, create some new tasks and then manually add some UTF-8 strings.
…
References
Permanently solves this Sentry error.
This fix was suggested by this Stack Overflow answer.
Contributor Checklist
PR process:
Testing:
Reviewer Checklist