allow RAW encoding #385

Closed
Lexcon opened this issue Jan 2, 2020 · 4 comments
Labels
enhancement, patch available (Awaiting inclusion in official release)

Comments

@Lexcon

Lexcon commented Jan 2, 2020

cx_Oracle could support a 'raw' encoding that returns bytes instead of unicode strings, without any conversion. That way, conversion and repair of corrupt strings could be done at the Python level instead of inside cx_Oracle.

Legacy database content with mixed encodings could then be supported as well. It would work like the utl_raw.cast_to_raw function but without the 4000-byte length limitation. In fact, it would work the way Python 2.7 does now.

For testing, there should also be a way to write data. For example, this table could be supported:

create table translations (encoding varchar2(20), content varchar2(1000))
insert into translations (encoding, content) values ('utf-8', 'abë'.encode('utf-8'))
insert into translations (encoding, content) values ('windows-1252', 'abë'.encode('windows-1252'))
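
To make the two rows above concrete, here is a small sketch in plain Python (no database involved) of what the stored bytes would look like, and why a single fixed client-side decoding cannot serve both rows. The variable names are only illustrative.

```python
# The same text encoded two ways yields different byte sequences.
text = 'abë'
utf8_bytes = text.encode('utf-8')            # b'ab\xc3\xab'
cp1252_bytes = text.encode('windows-1252')   # b'ab\xeb'

# Decoding with the wrong codec either fails outright ...
try:
    cp1252_bytes.decode('utf-8')
    wrong_decode_failed = False
except UnicodeDecodeError:
    wrong_decode_failed = True

# ... or silently produces mojibake.
mojibake = utf8_bytes.decode('windows-1252')  # 'abÃ«'
```

With bytes returned as-is, the decision about which codec to apply to each row can be made in Python.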

An additional advantage is that legacy Python 2.7 code may already have encoding and decoding in place. Where such code is still running, this would make the move to 3.8 easier because no changes at the Python level would be needed.

This change would of course apply only to the Python 3 version of cx_Oracle, since in Python 2 this is already how it works.

@anthony-tuininga
Member

An interesting concept. And I understand the purpose behind it. The question is what sort of interface would make sense. Any thoughts on that?

@Lexcon
Author

Lexcon commented Jan 3, 2020

I think there are still a lot of patch-up Python scripts out there that look like the one below. If 'raw' is simply recognized as a dummy charset, the Python 2.7 code below could run on Python 3.x without if/else blocks to distinguish between Python versions. The dummy 'raw' encoding would just bypass cx_Oracle's internal decode() function. I suspect this is the easiest way to implement it.

cursor.execute('select somestring, somencoding from sometable')
for row in cursor:
    try:
        result = row[0].decode('windows-1252')  # or based on the somencoding field
    except UnicodeDecodeError:
        result = row[0].decode('utf-8')
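
The try/except pattern above can be wrapped in a small helper. This is only a sketch: `decode_with_fallback` is a hypothetical name, and it assumes the driver hands back `bytes` for the string column.

```python
def decode_with_fallback(raw, encodings=('utf-8', 'windows-1252')):
    """Return (text, encoding) for the first candidate encoding that decodes `raw`."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # try the next candidate
    raise ValueError('none of the candidate encodings could decode the value')


utf8_result = decode_with_fallback('abë'.encode('utf-8'))
cp1252_result = decode_with_fallback('abë'.encode('windows-1252'))
```

In this sketch UTF-8 is tried first because it is the stricter codec: windows-1252 rejects only a handful of byte values, so putting it first would rarely let the fallback run.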

Alternatively, a sort of 'outputtypehandle' could be implemented where the user can feed a converter function into the query or settings, but at a lower level than is currently possible. That is more 'elegant' in the sense that cx_Oracle would still produce only unicode output, while giving control to the programmer. It would break the Python code above, though. A side requirement is that this converter function would need to receive the entire row object somehow, not just the field in question, because in the example above it would need to read the 'somencoding' field, which may itself be character based.

Given the above, I would opt for the 'quick and dirty' solution over the 'elegant' one. I know pyODBC has this mechanism; I use it and it works well.
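
A row-level converter of the kind described here might look like the sketch below. `decode_row` is a hypothetical name, the column order follows the query above, and the assumption that the encoding column is plain ASCII is mine, not something stated in the thread.

```python
def decode_row(content, encoding_name):
    """Decode `content` using the encoding named in the same row."""
    if isinstance(encoding_name, bytes):
        # The encoding column is itself character data; assume plain ASCII here.
        encoding_name = encoding_name.decode('ascii')
    return content.decode(encoding_name), encoding_name


# The converter sees every column of the row, not just one field.
row = decode_row(b'ab\xeb', b'windows-1252')
```

If both columns arrive as bytes, a function like this could presumably be assigned to the cursor's `rowfactory` attribute after `execute()`, since cx_Oracle calls that with the column values of each row.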

Draco94 added a commit to Draco94/python-cx_Oracle that referenced this issue Mar 25, 2021
Signed-off-by: Darko Djolovic <ddjolovic@outlook.com>
anthony-tuininga pushed a commit that referenced this issue Apr 23, 2021
* Implemented #385 enhancement and updated documentation

Signed-off-by: Darko Djolovic <ddjolovic@outlook.com>

* Created flag to Cursor.var()

Signed-off-by: Darko Djolovic <ddjolovic@outlook.com>

* Removed first commit changes, updated documentation

Signed-off-by: Darko Djolovic <ddjolovic@outlook.com>

* Added testing sample 'QueringRawData.py' and renamed attribute 'bypassstringencoding' to 'bypassencoding' with updated documentation

Signed-off-by: Darko Djolovic <ddjolovic@outlook.com>
anthony-tuininga added a commit that referenced this issue Apr 23, 2021
consistent and to comply with PEP 8 naming guidelines; also adjust
implementation of #385 (originally done in pull request #549) to use the
parameter name `bypass_decode` instead of `bypassencoding`.
anthony-tuininga added the "patch available" label (Awaiting inclusion in official release) on Apr 23, 2021
@anthony-tuininga
Member

Take a look at the implementation which is demonstrated in the new sample. This should address this enhancement but let me know if you agree! Feedback is always appreciated!
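
For readers without the sample at hand, the following is a rough sketch of how the `bypass_decode` flag on `Cursor.var()` (named in the commits above) can be combined with an output type handler. It is not the shipped sample: `make_bytes_handler` is a hypothetical helper, and the string type constant is passed in as an argument so that the snippet itself does not need cx_Oracle installed.

```python
def make_bytes_handler(varchar_type):
    """Build an output type handler that fetches `varchar_type` columns undecoded.

    `varchar_type` is expected to be the driver's string type constant,
    for example cx_Oracle.DB_TYPE_VARCHAR (assumption: cx_Oracle 8.2 or later).
    """
    def handler(cursor, name, default_type, size, precision, scale):
        if default_type == varchar_type:
            # bypass_decode=True asks the driver to skip its internal decode step
            return cursor.var(str, arraysize=cursor.arraysize, bypass_decode=True)
        return None  # leave every other column type at its default
    return handler


# Intended use, assuming an open cx_Oracle connection named `conn`:
#   cursor = conn.cursor()
#   cursor.outputtypehandler = make_bytes_handler(cx_Oracle.DB_TYPE_VARCHAR)
#   cursor.execute('select somestring, somencoding from sometable')
```

Rows fetched this way should carry bytes in the string columns, so decoding code like the Python 2.7 loop earlier in the thread can be applied to them.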

@anthony-tuininga
Member

cx_Oracle 8.2 has just been released which includes this enhancement.
