Define all blob types for text unicode decoding #90

Open
saaj opened this issue Nov 28, 2014 · 0 comments

Here's code that illustrates the issue (Python 2.7, MySQL 5.5.32):

import MySQLdb

connection = MySQLdb.connect(user = 'guest', db = 'test', charset = 'utf8')
cursor     = connection.cursor()

cursor.execute(u"SELECT 'abcdё' `s`, ExtractValue('<a>abcdё</a>', '/a') `b`")

print cursor.fetchone() # (u'abcd\u0451', 'abcd\xd1\x91')
print cursor.description # (('s', 253, 6, 15, 15, 31, 0), ('b', 251, 6, 50331648, 50331648, 31, 1))
print cursor.description_flags # (1, 0)

As you can see, the b column is returned as a byte string instead of unicode, even though FLAG.BINARY is not set. Unicode decoding works for FIELD_TYPE.VAR_STRING (253) and FIELD_TYPE.BLOB (252), but not for FIELD_TYPE.LONG_BLOB (251), which is the type that ExtractValue returns.
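To make the numeric codes in cursor.description easier to read, here is a minimal sketch that maps them to names. The helper `describe_columns` and the `FIELD_TYPE_NAMES` table are hypothetical and not part of MySQLdb; the numbers are the MySQL protocol values that MySQLdb.constants.FIELD_TYPE and FLAG.BINARY also use, hard-coded so the snippet runs without a database.

```python
# Type codes as defined by the MySQL protocol
# (the same values as MySQLdb.constants.FIELD_TYPE).
FIELD_TYPE_NAMES = {
    249: 'TINY_BLOB',
    250: 'MEDIUM_BLOB',
    251: 'LONG_BLOB',
    252: 'BLOB',
    253: 'VAR_STRING',
    254: 'STRING',
}
BINARY_FLAG = 128  # same value as MySQLdb.constants.FLAG.BINARY


def describe_columns(description, description_flags):
    """Return (name, type name, is_binary) per column. Hypothetical helper."""
    result = []
    for column, flags in zip(description, description_flags):
        name, type_code = column[0], column[1]
        # Fall back to the raw number for codes missing from the table.
        type_name = FIELD_TYPE_NAMES.get(type_code, str(type_code))
        result.append((name, type_name, bool(flags & BINARY_FLAG)))
    return result


# Values copied from the output shown in this report.
description = (('s', 253, 6, 15, 15, 31, 0),
               ('b', 251, 6, 50331648, 50331648, 31, 1))
description_flags = (1, 0)

rows = describe_columns(description, description_flags)
# rows == [('s', 'VAR_STRING', False), ('b', 'LONG_BLOB', False)]
```

Neither column carries the binary flag, so both would be expected to come back as unicode.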

Here's a workaround:

import MySQLdb
import MySQLdb.converters as conv
import MySQLdb.constants as const

connection = MySQLdb.connect(user = 'guest', db = 'test', charset = 'utf8')
# Reuse the BLOB converter, for which unicode decoding works, for LONG_BLOB columns.
connection.converter[const.FIELD_TYPE.LONG_BLOB] = connection.converter[const.FIELD_TYPE.BLOB]
cursor = connection.cursor()

cursor.execute(u"SELECT 'abcdё' `s`, ExtractValue('<a>abcdё</a>', '/a') `b`")

print cursor.fetchone() # (u'abcd\u0451', u'abcd\u0451')
print cursor.description # (('s', 253, 6, 15, 15, 31, 0), ('b', 251, 6, 50331648, 50331648, 31, 1))
print cursor.description_flags # (1, 0)

The workaround also shows that the current value converter design needs improvement. The unicode decoders are set in the connection constructor, whereas most converters are defined in MySQLdb.converters. _get_string_decoder is also defined inside the constructor. As a result, it's impossible to pass an extended decoder dict through the conv constructor argument; the only way is to patch each connection instance individually.
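Until that changes, the per-instance patch can at least be kept in one place. The following is only a sketch: `share_blob_converter` is a hypothetical helper, not part of MySQLdb, and it assumes nothing about the connection beyond the `converter` dict used in the workaround. The type codes are the MySQL protocol values (the same as in MySQLdb.constants.FIELD_TYPE), hard-coded to keep the snippet self-contained.

```python
# Blob type codes as defined by the MySQL protocol
# (the same values as MySQLdb.constants.FIELD_TYPE).
TINY_BLOB, MEDIUM_BLOB, LONG_BLOB, BLOB = 249, 250, 251, 252


def share_blob_converter(connection):
    """Reuse the BLOB converter for the other blob types. Hypothetical helper.

    Like the workaround above, this assigns the same converter object,
    so later changes to the BLOB entry also affect the other types.
    """
    blob_converter = connection.converter[BLOB]
    for field_type in (TINY_BLOB, MEDIUM_BLOB, LONG_BLOB):
        connection.converter[field_type] = blob_converter
    return connection
```

It would be called once per connection, right after MySQLdb.connect(...), for example `connection = share_blob_converter(MySQLdb.connect(user = 'guest', db = 'test', charset = 'utf8'))`.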
