encoding error on py3.4 #331
It crashes on Py2 too with:
it seems a small bug, yes: normally it shouldn't be triggered, unless you really want to write Greek chars into a latin1 connection, which would fail anyway downstream. The adapter uses the connection encoding to encode the strings; if no connection is set, as here, it uses latin1 as a (not sufficiently) "sensible" default.

Normally the adapters are "prepared": this is what happens behind the scenes of a query:

In [15]: cnn = psycopg2.connect('')

I'll take a look at whether it would be possible to set the default encoding to utf8
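The failure mode described above can be reproduced in plain Python, without psycopg2 at all; this is a sketch of the encoding behaviour only, not of psycopg's adapter machinery:

```python
# Plain-Python illustration of why the latin1 default breaks: Greek
# characters have no latin-1 code points, so encoding raises before
# anything reaches the database.
text = "Ψυχή"  # sample Greek string; any non-latin-1 text behaves the same

try:
    text.encode("latin-1")
    latin1_result = "encoded"
except UnicodeEncodeError:
    latin1_result = "UnicodeEncodeError"

utf8_result = text.encode("utf-8")

print(latin1_result)  # UnicodeEncodeError
print(utf8_result)
```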
Well, besides this issue, I don't see any drawback in switching the default encoding to utf8
Update about this issue: changing the default encoding from latin1 to utf8 is something that would likely break existing code: as much as the user opening this ticket expected it to work with utf8, there must be someone around expecting it to work with latin1. So I'm afraid bluntly switching the default in 2.6.2 is out of the question. One possibility would be to use a 'replace' strategy to emit

So I think the right solution would be to expose the encoder used by the adapter as a writeable property, leaving the default as latin1 but allowing users to change it (so that it can be customized without creating a connection, which seems useful e.g. in vertica/vertica-python#112). Maybe the default could be switched to utf8 in psycopg 2.7.
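For context, a 'replace' error strategy at the plain-Python level looks like this; a sketch of the codec behaviour only, not of anything psycopg actually does:

```python
# With errors="replace", characters that cannot be encoded are
# substituted with '?' instead of raising. This is lossy: the original
# Greek text is not recoverable from the output, which is presumably
# why it was not adopted as the default behaviour.
greek = "αβγ"

replaced = greek.encode("latin-1", errors="replace")
print(replaced)  # b'???'
```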
Would help using adapt(unicode) to quote strings without a connection, see ticket #331. Currently in heisenbug state: if test_connection_wins_anyway and test_encoding_default run (in this order), the latter fails because the returned value is "'\xe8 '", with an extra space. Skipping the first test, the second succeeds. The bad value is returned by the libpq: ql = PQescapeString(to+eq+1, from, len); just returns len = 2 and an extra space in the string... meh.
The encoding of the adapted string is now settable. Unfortunately it doesn't work in a generic way: it only works in applications that consistently use the same encoding. Psycopg uses

So, the feature is in, but I'll leave it available only to people who want to use it, hence I'll leave it undocumented.
So is there currently a way to set the default encoding for
@JoeSham what is your use case? The encoding is a property of the connection and of the database you are talking to, not really a global thing. The only global thing, the encoding the libpq wants to use in a context where there is no connection, is out of psycopg's control and not something I'd rely on.
My db's encoding is utf-8. I am using airflow, and there is the following line in postgres_hook.py: This gives me an error for non-latin-1 characters (mostly for the Russian and Greek alphabets): I was able to "solve" it in my own test script by using
So I hoped there could be a solution where I just set some global variable, or perhaps modify the adapter or something: basically anything where I don't have to modify airflow's code.
If you don't want to modify airflow's code, you can write your own customised adapter for the text types. It should receive the data from the database as bytes, and you can convert it to any Python object the way you want. However, you will change a global behaviour: if something else in the process uses psycopg, it may not work as expected. Can't you just fix airflow?

To answer your question more concisely: no, there is no such thing as a global encoding: there is only a connection encoding, plus some global implementation accidents that may or may not work for you.
For other people running into this looking for a super explicit solution, you can write an adapter along these lines:

```python
import json

from psycopg2.extensions import adapt


class AnalyzedTextAdapter:
    def __init__(self, text: AnalyzedText):  # AnalyzedText is your own type
        self.text = text

    def prepare(self, conn):
        self.conn = conn

    def getquoted(self):
        content = adapt(self.text.content)
        content.prepare(self.conn)
        terms = adapt(self.text.terms)
        terms.prepare(self.conn)
        return f'({content},' \
               f'{adapt(self.text.sentiment)},' \
               f'{terms},' \
               f'\'{json.dumps(self.text.meta)}\'::JSONB)'
```

(you only need the prepare() calls for fields that may contain UTF-8 chars)
I understand all the earlier responses, but in my case this does not work: I don't have a connection available when I need to escape strings. I am writing escaped SQL values to a file, the same way \copy ... to ... does, so I can import these files later with \copy ... from myfilewithescapedvalues.copy. Is there a way to create a passive default connection without connecting (e.g. a connection that has utf-8 set)? Or any other way to specify utf-8 explicitly?
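One way to sidestep the no-connection encoding problem entirely for this use case is to produce the COPY text format yourself and open the output file with an explicit utf-8 encoding. This is a hypothetical sketch, not psycopg2 API: the helper names (copy_escape, write_copy_file) are invented here, and the escaping follows PostgreSQL's COPY text format (tab-separated fields, \N for NULL, backslash escapes for control characters):

```python
import os
import tempfile

def copy_escape(value):
    """Escape a single field for COPY text format; None becomes \\N."""
    if value is None:
        return r"\N"
    return (str(value)
            .replace("\\", "\\\\")
            .replace("\t", "\\t")
            .replace("\n", "\\n")
            .replace("\r", "\\r"))

def write_copy_file(path, rows):
    # The explicit encoding here stands in for the connection encoding.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for row in rows:
            f.write("\t".join(copy_escape(v) for v in row) + "\n")

path = os.path.join(tempfile.gettempdir(), "myfilewithescapedvalues.copy")
write_copy_file(path, [("Ψυχή", 1), (None, 2)])
```

The resulting file can then be loaded with \copy ... from, with no psycopg2 involvement at write time.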
As @dvarrazzo mentions above, that encoding is exposed, so you can solve this problem like so:
|
Thank you @f0rk, I'll look into it! I don't remember if I solved this problem or worked around it somehow..
The following works on python2.7 but fails on python3.4.
Is it a bug or am I missing something?