This repository has been archived by the owner on Feb 4, 2020. It is now read-only.

UnicodeEncodeError occurring in 3.0.4 but not 2.9.2? #74

Closed
pertz opened this issue Aug 5, 2015 · 4 comments

Comments

@pertz

pertz commented Aug 5, 2015

I’ve run into a problem that appears to be related to upgrading pymarc from version 2.9.2 to version 3.0.4. I have Python version 2.7.5 and version 2.0.4 of PyZ3950 installed on CentOS Linux 7. The error message is:

Traceback (most recent call last):
  File "./recordtest.py", line 15, in <module>
    print "marc_record:[%s]" % marc_record
  File "/path_to/lib/python/site-packages/pymarc/record.py", line 84, in __str__
    text_list.extend([str(field) for field in self.fields])
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 1024: ordinal not in range(128)

Below is a short test script I used to replicate the problem (works with pymarc 2.9.2 but not 3.0.4):

-BEGIN- test script

#!/_path_to_python/python

from PyZ3950 import zoom
from pymarc import Record

# Query the Library of Congress Z39.50 server for a single ISBN.
connection = zoom.Connection('z3950.loc.gov', 7090)
connection.databaseName = 'VOYAGER'
connection.preferredRecordSyntax = 'USMARC'
query = zoom.Query('CCL', 'isbn=9780415782654')
results = connection.search(query)
for result in results:
    print "result:[%s]" % result
    print
    # Parse the raw MARC data; printing the Record triggers the error.
    marc_record = Record(data=result.data)
    print "marc_record:[%s]" % marc_record

connection.close()

-END- test script

The record I am using to test can be found here:
http://lccn.loc.gov/2011052495

It could very well be that I missed a flag or parameter that is required in pymarc 3.x, but I did not see anything in the documentation.

Thoughts or suggestions?

@Wooble
Collaborator

Wooble commented Aug 5, 2015

This seems to only affect Python 2; I can print this file from Python 3 fine (well, except on Windows, where it can't print because of the combining diacritic in question not being in cp-1252...)

When loading from a file, instead of with zoom, I get:

Traceback (most recent call last):
  File "unicodetest.py", line 6, in <module>
    print(record)
  File "/private/tmp/venv/lib/python2.7/site-packages/pymarc/record.py", line 84, in __str__
    text_list.extend([str(field) for field in self.fields])
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0301' in position 1025: ordinal not in range(128)

(note, my error mentions the codepoint for the combining diacritic; OP's is for composed é instead -- not sure why this would differ, unless the Z39.50 endpoint has a different record than what LoC gives me to download).
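
The two code points in the tracebacks are related by Unicode normalization, which the following sketch illustrates. It does not explain why the two copies of the record differ; it only shows that u'\xe9' is the composed (NFC) form of "e" followed by u'\u0301', and that the combining form cannot be encoded in cp1252. The helper name can_encode is made up for this example.

```python
import unicodedata

# "e" followed by COMBINING ACUTE ACCENT (U+0301), as in the second traceback.
decomposed = u"e\u0301"

# NFC normalization composes the pair into the single code point U+00E9,
# which is the character reported in the first traceback.
composed = unicodedata.normalize("NFC", decomposed)


def can_encode(text, codec):
    """Hypothetical helper: True if text can be encoded with the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    for label, text in (("composed", composed), ("decomposed", decomposed)):
        print(label, can_encode(text, "ascii"), can_encode(text, "cp1252"))
```

Neither form is ASCII, so both trigger the error under Python 2's default codec; only the reported code point changes.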

I assume the use of 1-argument str() on a unicode is to blame here. In any event calling str() directly in a codebase that runs on 2 and 3 is a code smell; should be using six.text_type probably.
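
A minimal sketch of the suggestion above, under stated assumptions: FakeField and record_text are hypothetical stand-ins, not pymarc's actual classes or the code in the eventual fix, and text_type is defined inline to mirror what six.text_type provides (unicode on Python 2, str on Python 3) so the example has no dependencies.

```python
import sys

# Inline equivalent of six.text_type: the text (not bytes) type on each version.
text_type = str if sys.version_info[0] >= 3 else unicode  # noqa: F821


class FakeField(object):
    """Hypothetical stand-in for a field object whose text may be non-ASCII."""

    def __init__(self, tag, value):
        self.tag = tag
        self.value = value

    def __str__(self):
        return u"=%s  %s" % (self.tag, self.value)

    # Python 2 uses __unicode__ when unicode(obj) is called.
    __unicode__ = __str__


def record_text(fields):
    # Using text_type(field) instead of str(field) avoids the implicit
    # ASCII encode that Python 2 performs when str() receives unicode text.
    return u"\n".join([text_type(field) for field in fields])


def ascii_encodable(text):
    """Show the step Python 2's str() performs implicitly on unicode text."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    text = record_text([FakeField("245", u"Caf\u00e9")])
    print(ascii_encodable(text))
```

On Python 2, swapping text_type back to str in record_text would raise the UnicodeEncodeError from the tracebacks, because the field text cannot be encoded as ASCII.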

@pertz
Author

pertz commented Aug 6, 2015

Interesting. I had not tried replicating the problem or running the test script with Python 3. Unfortunately, I need to stay with Python 2.7.x, for now, but it's good to know that moving to Python 3 might remedy a few issues. Thanks Wooble.

Wooble added a commit to Wooble/pymarc that referenced this issue Aug 6, 2015
…rk correctly in Python 2; use six.text_type when building list of textual strings. Fixes issue edsu#74.
@Wooble Wooble mentioned this issue Aug 6, 2015
@pertz
Author

pertz commented Aug 6, 2015

I applied the changes from the diff and am no longer experiencing the error. Thanks for your help, Wooble. Much appreciated.

edsu added a commit that referenced this issue Sep 7, 2015
@edsu
Owner

edsu commented Sep 7, 2015

This was just released as part of v3.1.4. Thanks for identifying the problem and fixing it!

@edsu edsu closed this as completed Sep 7, 2015