force utf8 decoding for SQL Alchemy types #7295

aronasorman · 2020-07-14T14:33:08Z

Summary

This forces SQL Alchemy to interpret every column received from postgres and sqlite as UTF8.

Somehow despite the encoding environment variables being set as UTF8, SQL Alchemy still interprets the values sent as ascii. This causes encoding issues on the profuturo server when some exceptions contain
non-ascii characters.

NOTE: don't merge yet, since we need to fully test this, we need some build artifacts deployed to profuturo first.

…

Reviewer guidance

To check, once profuturo is deployed, import some new test channels.

If there are no errors, create some new tasks and then manually add some UTF8 strings.

…

References

Permanently solves this sentry error.

This fix was suggested by this stack overflow answer.

Contributor Checklist

PR process:

PR has the correct target branch and milestone
PR has 'needs review' or 'work-in-progress' label
If PR is ready for review, a reviewer has been added. (Don't use 'Assignees')
If this is an important user-facing change, PR or related issue has a 'changelog' label
If this includes an internal dependency change, a link to the diff is provided

Testing:

Contributor has fully tested the PR manually
If there are any front-end changes, before/after screenshots are included
Critical user journeys are covered by Gherkin stories
Critical and brittle code paths are covered by unit tests

Reviewer Checklist

Automated test coverage is satisfactory
PR is fully functional
PR has been tested for accessibility regressions
External dependency files were updated if necessary (yarn and pip)
Documentation is updated
Contributor is in AUTHORS.md

rtibbles · 2020-07-14T16:39:56Z

Note that the linked SO post is specific to MySQL.

I think this error is Postgresql specific, and would be best applied using this fix here: https://stackoverflow.com/a/14788796

Would also be good to cover this with a simple test - I assume the issue is having a unicode string inside the extra_metadata on a job, so that should be relatively simple to replicate in a unit test.

Lastly, to properly test this, I realize that we only override Django settings for our postgres test suite, whereas what we really should be doing is overriding our options so that the iceqube storage backend is properly tested too.

aronasorman · 2020-07-14T17:52:29Z

Hmm you're right, it might be best to create tests for this first. Especially since I'm unsure if that suggested SO solution is also the answer.

The basic problem here is that SQL Alchemy interprets whatever we get from the database as ascii instead of UTF-8 when it decodes. That SO answer seems like it's telling postgres what its encoding should be, which is already correct.

Anyway, more testing would help elucidate what the correct answer here is.

aronasorman · 2020-07-15T15:13:00Z

After talking with @rtibbles, here's a summary of what we need to do to get this PR merged:

We need to add a new test that replicates this issue, and then the appropriate fix to address that test. However along the way we also need to change the test infrastructrure to actually test iceqube with postgres. Right now iceqube actually doesn't connect to postgres, and still continues to use sqlite.

We need to do the following changes:

deprecate the postgres_test.py setting and move the TEST values found there to base.py
move the credentials found in postgres_test.py to env variables found in tox.ini's postgres section
create a new conditional in the defaultbackend function of test_storage.py which actually sets up postgres based on CONF values
actually write the test that writes, and then reads, UTF-8 metadata on the jobs table.

@rtibbles let me know if I missed anything here!

rtibbles · 2020-07-15T20:48:26Z

Seems to cover the bases!

indirectlylit · 2020-07-21T17:53:36Z

This should be merged from 0.13 -> ProFuturo -> 0.14 -> Develop

indirectlylit · 2020-07-21T17:54:02Z

adding 0.14 milestone so we don't lose track of it

…over

this was a bug where even if we were running postgres tests, we weren't actually using it, and were using sqlite instead

…gres nor sqlite actually do an autoincrement for a non-primary key field. To implement an auto-increment, the steps are: - create a table sequence - have the database use that table sequence for generating the value of queue_order

…mn (since we don't set it), and it just slows our queries down.

…icodeDecodeError. That did not happen.

…rage.py`, and use python2.7 (the last thing kevin and i tried)

…pickling.

kollivier · 2020-07-23T23:40:42Z

Okay, I've added code to stringify exception and traceback objects, and also to log them and nullify if there are any errors encoding them to utf-8. Tests ensure this works, but of course I was unable to replicate the failure, so can't 100% confirm this will resolve the problem.

aronasorman · 2020-07-24T15:57:39Z

kolibri/core/tasks/storage.py

+                logger.warning("Job had traceback: {}".format(traceback))
+                traceback = str(traceback)
+                traceback.encode('utf-8')
+        except:


Hey, can you change this to except Exception? Leaving it like this catches more than just real errors, since python likes to use exceptions for regular control flow

Could you give an example of the usage you're speaking about? i.e. a case where this code would throw but that we wouldn't want the except handling to run?

rtibbles

Just some things to reduce the scope of the change slightly.

rtibbles · 2020-07-24T16:00:08Z

kolibri/core/tasks/test/test_storage.py

@@ -6,8 +7,9 @@

 from kolibri.core.tasks.job import Job
 from kolibri.core.tasks.job import State
-from kolibri.core.tasks.storage import Storage
+from kolibri.core.tasks.storage import Storage, ORMJob


Don't see this extra import being used anywhere, and linting won't allow this.

rtibbles · 2020-07-24T16:46:28Z

tox.ini

 deps =
    -r{toxinidir}/requirements/test.txt
    -r{toxinidir}/requirements/base.txt
    -r{toxinidir}/requirements/cext.txt
    -r{toxinidir}/requirements/postgres.txt
 commands =
-    py.test {posargs:--cov=kolibri --color=no}
+    py.test {posargs:--cov=kolibri --color=no} kolibri/core/tasks/test/test_storage.py -k test_can_save_and_read_utf8_metadata --pdb


We probably don't want to merge this change :)

rtibbles · 2020-07-24T16:47:09Z

tox.ini

    KOLIBRI_RUN_MODE = tox
 basepython =
-    postgres: python3.5
+    postgres: python2.7


If switching back to 2.7 does not help with replication, I'd rather revert this change, as if someone can install postgres, they can install a newer version of Python - hopefully...

rtibbles · 2021-04-07T23:25:21Z

Superseded by #7975

aronasorman changed the base branch from develop to release-v0.13.x July 14, 2020 14:33

indirectlylit added this to the 0.14.0 milestone Jul 21, 2020

Aron Asor added 2 commits July 22, 2020 08:44

remove postgres_test from tox settings, transferring its credentials …

7c8b64b

…over

use postgres database when our env vars say we should use postgres

857307e

this was a bug where even if we were running postgres tests, we weren't actually using it, and were using sqlite instead

aronasorman force-pushed the force-utf8-for-job-storage branch from 8214046 to 4fe7985 Compare July 22, 2020 15:50

Aron Asor added 3 commits July 22, 2020 08:56

add stub for test that reads utf8 metadata

0eb04ca

remove the order_by queue_order, since it's actually just a null colu…

f5f203d

…mn (since we don't set it), and it just slows our queries down.

aronasorman force-pushed the force-utf8-for-job-storage branch from 4fe7985 to f5f203d Compare July 22, 2020 15:56

Aron Asor and others added 3 commits July 22, 2020 16:10

STUB add utf8 metadata test that's expected to fail due to reading Un…

a411105

…icodeDecodeError. That did not happen.

TEMP modify the tox postgres test to run a single test from `test_sto…

f71da5d

…rage.py`, and use python2.7 (the last thing kevin and i tried)

Stringify exceptions and bail if that fails, to avoid errors when de-…

524f61b

…pickling.

aronasorman commented Jul 24, 2020

View reviewed changes

rtibbles reviewed Jul 24, 2020

View reviewed changes

aronasorman marked this pull request as ready for review July 28, 2020 18:29

aronasorman changed the base branch from release-v0.13.x to release-v0.14.x July 28, 2020 18:30

jonboiser modified the milestones: 0.14.0, 0.14.1, 0.14.2 Aug 5, 2020

jonboiser modified the milestones: 0.14.3, upcoming patch Aug 18, 2020

rtibbles changed the base branch from release-v0.14.x to develop April 7, 2021 23:20

rtibbles mentioned this pull request Apr 7, 2021

Run iceqube tests on postgres. #7975

Merged

9 tasks

rtibbles closed this Apr 7, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

force utf8 decoding for SQL Alchemy types #7295

force utf8 decoding for SQL Alchemy types #7295

aronasorman commented Jul 14, 2020

rtibbles commented Jul 14, 2020

aronasorman commented Jul 14, 2020

aronasorman commented Jul 15, 2020 •

edited

rtibbles commented Jul 15, 2020

indirectlylit commented Jul 21, 2020

indirectlylit commented Jul 21, 2020

kollivier commented Jul 23, 2020

aronasorman Jul 24, 2020

kollivier Jul 28, 2020

rtibbles left a comment

rtibbles Jul 24, 2020

rtibbles Jul 24, 2020

rtibbles Jul 24, 2020

rtibbles commented Apr 7, 2021

force utf8 decoding for SQL Alchemy types #7295

force utf8 decoding for SQL Alchemy types #7295

Conversation

aronasorman commented Jul 14, 2020

Summary

Reviewer guidance

References

Contributor Checklist

Reviewer Checklist

rtibbles commented Jul 14, 2020

aronasorman commented Jul 14, 2020

aronasorman commented Jul 15, 2020 • edited

rtibbles commented Jul 15, 2020

indirectlylit commented Jul 21, 2020

indirectlylit commented Jul 21, 2020

kollivier commented Jul 23, 2020

aronasorman Jul 24, 2020

Choose a reason for hiding this comment

kollivier Jul 28, 2020

Choose a reason for hiding this comment

rtibbles left a comment

Choose a reason for hiding this comment

rtibbles Jul 24, 2020

Choose a reason for hiding this comment

rtibbles Jul 24, 2020

Choose a reason for hiding this comment

rtibbles Jul 24, 2020

Choose a reason for hiding this comment

rtibbles commented Apr 7, 2021

aronasorman commented Jul 15, 2020 •

edited