Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

force utf8 decoding for SQL Alchemy types #7295

Conversation

aronasorman
Copy link
Collaborator

Summary

This forces SQL Alchemy to interpret every column received from postgres and sqlite as UTF8.

Somehow despite the encoding environment variables being set as UTF8, SQL Alchemy still interprets the values sent as ascii. This causes encoding issues on the profuturo server when some exceptions contain
non-ascii characters.

NOTE: don't merge yet, since we need to fully test this, we need some build artifacts deployed to profuturo first.

Reviewer guidance

To check, once profuturo is deployed, import some new test channels.

If there are no errors, create some new tasks and then manually add some UTF8 strings.

References

Permanently solves this sentry error.

This fix was suggested by this stack overflow answer.


Contributor Checklist

PR process:

  • PR has the correct target branch and milestone
  • PR has 'needs review' or 'work-in-progress' label
  • If PR is ready for review, a reviewer has been added. (Don't use 'Assignees')
  • If this is an important user-facing change, PR or related issue has a 'changelog' label
  • If this includes an internal dependency change, a link to the diff is provided

Testing:

  • Contributor has fully tested the PR manually
  • If there are any front-end changes, before/after screenshots are included
  • Critical user journeys are covered by Gherkin stories
  • Critical and brittle code paths are covered by unit tests

Reviewer Checklist

  • Automated test coverage is satisfactory
  • PR is fully functional
  • PR has been tested for accessibility regressions
  • External dependency files were updated if necessary (yarn and pip)
  • Documentation is updated
  • Contributor is in AUTHORS.md

@aronasorman aronasorman changed the base branch from develop to release-v0.13.x July 14, 2020 14:33
@rtibbles
Copy link
Member

Note that the linked SO post is specific to MySQL.

I think this error is Postgresql specific, and would be best applied using this fix here: https://stackoverflow.com/a/14788796

Would also be good to cover this with a simple test - I assume the issue is having a unicode string inside the extra_metadata on a job, so that should be relatively simple to replicate in a unit test.

Lastly, to properly test this, I realize that we only override Django settings for our postgres test suite, whereas what we really should be doing is overriding our options so that the iceqube storage backend is properly tested too.

@aronasorman
Copy link
Collaborator Author

Hmm you're right, it might be best to create tests for this first. Especially since I'm unsure if that suggested SO solution is also the answer.

The basic problem here is that SQL Alchemy interprets whatever we get from the database as ascii instead of UTF-8 when it decodes. That SO answer seems like it's telling postgres what its encoding should be, which is already correct.

Anyway, more testing would help elucidate what the correct answer here is.

@aronasorman
Copy link
Collaborator Author

aronasorman commented Jul 15, 2020

After talking with @rtibbles, here's a summary of what we need to do to get this PR merged:

We need to add a new test that replicates this issue, and then the appropriate fix to address that test. However along the way we also need to change the test infrastructrure to actually test iceqube with postgres. Right now iceqube actually doesn't connect to postgres, and still continues to use sqlite.

We need to do the following changes:

  • deprecate the postgres_test.py setting and move the TEST values found there to base.py
  • move the credentials found in postgres_test.py to env variables found in tox.ini's postgres section
  • create a new conditional in the defaultbackend function of test_storage.py which actually sets up postgres based on CONF values
  • actually write the test that writes, and then reads, UTF-8 metadata on the jobs table.

@rtibbles let me know if I missed anything here!

@rtibbles
Copy link
Member

Seems to cover the bases!

@indirectlylit
Copy link
Contributor

This should be merged from 0.13 -> ProFuturo -> 0.14 -> Develop

@indirectlylit indirectlylit added this to the 0.14.0 milestone Jul 21, 2020
@indirectlylit
Copy link
Contributor

adding 0.14 milestone so we don't lose track of it

Aron Asor added 2 commits July 22, 2020 08:44
this was a bug where even if we were running postgres tests, we weren't actually using it, and were using sqlite instead
Aron Asor added 3 commits July 22, 2020 08:56
…gres nor sqlite actually do an autoincrement for a non-primary key field.

To implement an auto-increment, the steps are:
- create a table sequence
- have the database use that table sequence for generating the value of queue_order
…mn (since we don't set it), and it just slows our queries down.
Aron Asor and others added 3 commits July 22, 2020 16:10
@kollivier
Copy link
Contributor

Okay, I've added code to stringify exception and traceback objects, and also to log them and nullify if there are any errors encoding them to utf-8. Tests ensure this works, but of course I was unable to replicate the failure, so can't 100% confirm this will resolve the problem.

logger.warning("Job had traceback: {}".format(traceback))
traceback = str(traceback)
traceback.encode('utf-8')
except:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, can you change this to except Exception? Leaving it like this catches more than just real errors, since python likes to use exceptions for regular control flow

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you give an example of the usage you're speaking about? i.e. a case where this code would throw but that we wouldn't want the except handling to run?

Copy link
Member

@rtibbles rtibbles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some things to reduce the scope of the change slightly.

@@ -6,8 +7,9 @@

from kolibri.core.tasks.job import Job
from kolibri.core.tasks.job import State
from kolibri.core.tasks.storage import Storage
from kolibri.core.tasks.storage import Storage, ORMJob
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't see this extra import being used anywhere, and linting won't allow this.

deps =
-r{toxinidir}/requirements/test.txt
-r{toxinidir}/requirements/base.txt
-r{toxinidir}/requirements/cext.txt
-r{toxinidir}/requirements/postgres.txt
commands =
py.test {posargs:--cov=kolibri --color=no}
py.test {posargs:--cov=kolibri --color=no} kolibri/core/tasks/test/test_storage.py -k test_can_save_and_read_utf8_metadata --pdb
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably don't want to merge this change :)

KOLIBRI_RUN_MODE = tox
basepython =
postgres: python3.5
postgres: python2.7
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If switching back to 2.7 does not help with replication, I'd rather revert this change, as if someone can install postgres, they can install a newer version of Python - hopefully...

@aronasorman aronasorman marked this pull request as ready for review July 28, 2020 18:29
@aronasorman aronasorman changed the base branch from release-v0.13.x to release-v0.14.x July 28, 2020 18:30
@jonboiser jonboiser modified the milestones: 0.14.0, 0.14.1, 0.14.2 Aug 5, 2020
@jonboiser jonboiser modified the milestones: 0.14.3, upcoming patch Aug 18, 2020
@rtibbles rtibbles changed the base branch from release-v0.14.x to develop April 7, 2021 23:20
@rtibbles rtibbles mentioned this pull request Apr 7, 2021
9 tasks
@rtibbles
Copy link
Member

rtibbles commented Apr 7, 2021

Superseded by #7975

@rtibbles rtibbles closed this Apr 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants