-
Notifications
You must be signed in to change notification settings - Fork 562
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Possibility to explicitely set connections to return ANSI bytestrings in... #30
Conversation
… in Py3 using unicode_results = False
I'm not sure I see the reasoning here. In Python 3 all strings are Unicode - why wouldn't you want strings in Unicode even if the DB isn't set up right? Python's bytes is like a binary encoding. I could certainly see having per-query encodings for certain columns if the DB is all ASCII and you want to treat a couple as binary. Can you explain some more? |
Mike, In certain cases one can have different encodings on a per-record base. Consider the case of a CMS with these columns: where each record might have a different encoding (for the chinese page,
Basically I think there should be the option to use a database 'garbage I like your idea about per-column indicators although that might bloat In my personal case I've even got to deal with stuff in (binary) Thanks. Robert On 4/30/2015 4:05 AM, Michael Kleehammer wrote:
|
Fair enough. I'm going to need to manually work this one. First, I manually removed the unicode_results from Python 3 since it wasn't needed. Second, I need to add unit tests to a couple of databases. |
While working on this it occurred to me you might already have the tools you need. I have already implemented per-type converters. Can you use something like this? def test_varchar_converter(self):
"""
Ensure a conversion function can be used to force CHAR and VARCHAR columns
to be returned as bytes instead of Unicode strings.
"""
def tobyte(val):
if val is None:
return None
return val.encode('ascii')
self.cnxn.add_output_converter(pyodbc.SQL_CHAR, tobyte)
self.cnxn.add_output_converter(pyodbc.SQL_VARCHAR, tobyte)
self.cnxn.add_output_converter(pyodbc.SQL_LONGVARCHAR, tobyte)
value = b'testing'
self.cursor.execute("create table t1(s varchar(20))")
self.cursor.execute("insert into t1 values (?)", value)
v = self.cursor.execute("select * from t1").fetchone()[0]
self.assertEqual(type(v), bytes)
self.assertEqual(v, value) The data is converted to Unicode, regardless of the data type, and passed to the converter. (I couldn't come up with a way to pass "raw" data in Python. Passing a 4-byte array instead of a "long" didn't seem like the right way to go.) I'm open to suggestions on improving the converters too. |
Mike, Actually I'm not using the output converters but I understand the mechanism. The problem is that the actual content isn't correct unicode to start If any converter would be used, its purpose would be to assemble correct btw, our current mail exchange, is that as a reaction to the request I Robert On 4/30/2015 2:54 PM, Michael Kleehammer wrote:
|
Both, but I'd rather use converters if they would do the trick. I'm considering making the converter parameter |
Mike, That would be perfect. Wouldn't that break existing functionality? BTW I'd be happy to make a financial contribution for this. Robert On 4/30/2015 3:53 PM, Michael Kleehammer wrote:
|
Michael, Actually I was thinking. When you just pass in the to-be-converted fields you have no context of Content (Varchar) LanguageISO(char(5)) one would need, while processing the Content field, access to the Robert On 4/30/2015 3:53 PM, Michael Kleehammer wrote:
|
It would break, but I would actually be willing to do that and call it 3.1 or 4.0. I've been uncomfortable with passing Unicode to them. Until then, I'm going to merge your patch, but I'll need to hand touch it up because I removed Also, I thought I asked this earlier but I guess I deleted it. Currently the patch you provided would also affect GUIDs and XML. I really think XML should remain Unicode since most people would expect it. I don't have an opinion on GUIDs, but I think the option makes more sense if is is simply "the flag causes CHAR and VARCHAR to be returned as bytes". So I would want to make a small change for that. Also, is "unicode_results" really right? Perhaps a new parameter name is better such as "char_as_bytes"? It is a bit more work and makes the C code a little messier, but I think it would be better to have a more self-describing parameter. Thoughts? |
And I forgot to answer the last question - I don't pass in context since I'm calling the converter as I process each column. It would be possible to do so afterwards but wouldn't be as efficient. If more than one column need conversion, you would not know in each converter if the other column had already been converted or not. (Not important in your example, but something that could trip someone up.) Which is why I'm leaning towards the flag now. |
I think unicode_results is understandable: it instructs pyODBC to return For python3.x it's an understandable flag because, although it should As for your GUID and XML types: I expect there would never be an I'm not sure about the XML types so I'm no expert but it seems that at https://msdn.microsoft.com/en-us/library/ms131375.aspx This means you'd have to make an assumption about the encoding. If I would have to decide this I would both send also GUIDs and XML as Robert On 1-5-2015 00:01, Michael Kleehammer wrote:
|
One more addition to my previous mail: At least sql server has the distinction between varchar and nvarchar. what you're facing in that scenario is that you'd need a full list of For instance I also use Oracle for the same application and also there For Oracle a 'complication' is that number datatypes could return long Robert On 1-5-2015 00:07, Michael Kleehammer wrote:
|
Mike, I was just wondering, do you have any official release scheduled? For now pyodbc is keeping me from moving to Py3 unless I use my manually patched source to facilitate fallback to bytes instead of unicode for varchar fields (versus nvarchar fields). It would be nice to have an 'official' solution. |
I don't think this is necessary anymore after the complete Unicode rewrite. You can now use something like: Thanks though. |
Added Possibility to explicitely set connections to return ANSI bytestrings in Py3 using unicode_results = False (it will default to true in Py3). This allows use of bytearrays in case of databases that are not properly unicode encoded.
This patch does add functionality but does not affect any other code paths so is fully compatible. Arguably it makes functioning of the module more congruent with the parameters you can pass. More important It might help adoption of py3 by facilitating certain py2-to-py3 legacy issues.