UnicodeEncodeError occurring in 3.0.4 but not 2.9.2? #74

pertz · 2015-08-05T19:04:16Z

I’ve run into a problem that appears to be related to upgrading pymarc from version 2.9.2 to version 3.0.4. I have Python version 2.7.5 and version 2.0.4 of PyZ3950 installed on CentOS Linux 7. The error message is:

Traceback (most recent call last):
  File "./recordtest.py", line 15, in <module>
    print "marc_record:[%s]" % marc_record
  File "/path_to/lib/python/site-packages/pymarc/record.py", line 84, in __str__
    text_list.extend([str(field) for field in self.fields])
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 1024: ordinal not in range(128)

Below is a short test script I used to replicate the problem (works with pymarc 2.9.2 but not 3.0.4):

-BEGIN- test script

#!/_path_to_python/python

from PyZ3950 import zoom
from pymarc import Record

# Query the Library of Congress Z39.50 server for a single ISBN.
connection = zoom.Connection('z3950.loc.gov', 7090)
connection.databaseName = 'VOYAGER'
connection.preferredRecordSyntax = 'USMARC'
query = zoom.Query('CCL', 'isbn=9780415782654')
results = connection.search(query)
for result in results:
    print "result:[%s]" % result
    print
    # Parse the raw MARC data; printing the Record is what raises the error.
    marc_record = Record(data=result.data)
    print "marc_record:[%s]" % marc_record

connection.close()

-END- test script

The record I am using to test can be found here:
http://lccn.loc.gov/2011052495

It could very well be that I missed a flag or parameter that is required in pymarc 3.x, but I did not see anything in the documentation.

Thoughts or suggestions?

Wooble · 2015-08-05T20:08:19Z

This seems to only affect Python 2; I can print this file from Python 3 fine (well, except on Windows, where it can't print because of the combining diacritic in question not being in cp-1252...)

When loading from a file, instead of with zoom, I get:

Traceback (most recent call last):
  File "unicodetest.py", line 6, in <module>
    print(record)
  File "/private/tmp/venv/lib/python2.7/site-packages/pymarc/record.py", line 84, in __str__
    text_list.extend([str(field) for field in self.fields])
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0301' in position 1025: ordinal not in range(128)

(note, my error mentions the codepoint for the combining diacritic; OP's is for composed é instead -- not sure why this would differ, unless the Z39.50 endpoint has a different record than what LoC gives me to download).
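For reference, the two code points in the tracebacks are two spellings of the same character: U+00E9 is the precomposed é, while U+0301 is a combining acute accent that follows a plain e. The sketch below uses only the standard library's unicodedata module to show that the two forms are canonically equivalent. Whether normalization actually explains the difference between the two records is an assumption; nothing in this thread confirms it.

```python
# -*- coding: utf-8 -*-
import unicodedata

composed = u"\u00e9"      # precomposed e-acute, as in the first traceback
decomposed = u"e\u0301"   # plain 'e' followed by a combining acute accent

# The two strings are different sequences of code points...
differ = composed != decomposed

# ...but normalization converts one form into the other.
nfc = unicodedata.normalize("NFC", decomposed)  # compose
nfd = unicodedata.normalize("NFD", composed)    # decompose

print(differ, nfc == composed, nfd == decomposed)
```

Either form is outside ASCII, so both trigger the same UnicodeEncodeError; only the reported code point and position differ.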

I assume the use of 1-argument str() on a unicode is to blame here. In any event, calling str() directly in a codebase that runs on both 2 and 3 is a code smell; it should probably be using six.text_type.
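To illustrate the diagnosis above: on Python 2, one-argument str() applied to a unicode object encodes it with the default codec, which is ASCII. The sketch below reproduces that step explicitly so it runs on either Python version. The name text_type is a local stand-in defined here for six.text_type (unicode on Python 2, str on Python 3), and the sample strings are made up, not taken from the record in question.

```python
# -*- coding: utf-8 -*-
text = u"caf\u00e9"  # made-up sample containing a non-ASCII character

# Roughly what Python 2's str(text) does behind the scenes.
try:
    text.encode("ascii")
    failed = False
except UnicodeEncodeError:
    failed = True

# Encoding explicitly with a codec that covers the character works.
utf8_bytes = text.encode("utf-8")

# Local stand-in for six.text_type: the text type on either version.
text_type = type(u"")

# Building the output from text objects avoids the implicit ASCII step.
joined = u"\n".join(text_type(part) for part in [u"=245  Title", text])

print(failed, repr(utf8_bytes))
```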

pertz · 2015-08-06T13:35:50Z

Interesting. I had not tried replicating the problem or running the test script with Python 3. Unfortunately, I need to stay with Python 2.7.x, for now, but it's good to know that moving to Python 3 might remedy a few issues. Thanks Wooble.

pertz · 2015-08-06T20:10:44Z

I applied the changes from the diff and am no longer experiencing the error. Thanks for your help, Wooble. Much appreciated.

edsu · 2015-09-07T18:44:07Z

This was just released as part of v3.1.4. Thanks for identifying the problem and fixing it!

Wooble added a commit to Wooble/pymarc that referenced this issue Aug 6, 2015

use @six.python_2_unicode_compatible decorator to make str(Record) work correctly in Python 2; use six.text_type when building list of textual strings. Fixes issue edsu#74.

8e14f8c
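As a rough illustration of the approach the commit message describes, here is a minimal sketch of what a decorator like six.python_2_unicode_compatible does. The names unicode_compatible_sketch and FakeRecord are hypothetical, written for this example only; this is not the code from the commit, and FakeRecord is not pymarc's Record.

```python
# -*- coding: utf-8 -*-
import sys

PY2 = sys.version_info[0] == 2


def unicode_compatible_sketch(cls):
    """Hypothetical stand-in for six.python_2_unicode_compatible."""
    if PY2:
        # On Python 2, expose the text-returning method as __unicode__
        # and make __str__ return UTF-8 encoded bytes instead.
        cls.__unicode__ = cls.__str__
        cls.__str__ = lambda self: self.__unicode__().encode("utf-8")
    # On Python 3 the class is returned unchanged.
    return cls


@unicode_compatible_sketch
class FakeRecord(object):
    """Hypothetical record-like class used only for this example."""

    def __init__(self, fields):
        self.fields = fields

    def __str__(self):
        # Always build and return text; no implicit ASCII encoding happens.
        return u"\n".join(self.fields)


record = FakeRecord([u"=100  Andr\u00e9", u"=245  Title"])
rendered = str(record)
print(rendered)
```

With this pattern the class defines a single text-returning __str__, and the decorator supplies the byte-string version that Python 2's print and str() expect.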

Wooble mentioned this issue Aug 6, 2015

fix for issue #74 #75

Merged

edsu added a commit that referenced this issue Sep 7, 2015

Merge pull request #75 from Wooble/upstream_synced

b7b7f49

fix for issue #74

edsu closed this as completed Sep 7, 2015
