fix UnicodeEncodeError writing SVG string to .svg file, fixes #489 #490

Merged
merged 2 commits into from Jun 1, 2011

4 participants
Contributor

mspacek commented Jun 1, 2011

Not sure if this is the best way to deal with #489, but it's worked for me so far.

Contributor

rkern commented Jun 1, 2011

Like the other functions in that module, you should check that it is a unicode object before encoding.
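A minimal sketch of the kind of check described here, written for Python 3, where `str` plays the role of Python 2's `unicode` and `bytes` the role of Python 2's `str`. The helper name `to_svg_bytes` is hypothetical and is not taken from the IPython source.

```python
def to_svg_bytes(svg, encoding="utf-8"):
    """Return SVG data as bytes, encoding only when given text.

    Hypothetical helper for illustration; not the IPython implementation.
    """
    if isinstance(svg, str):
        # Text: encode it to bytes before writing to disk.
        return svg.encode(encoding)
    # Already bytes: leave them untouched, since they may not be UTF-8.
    return svg


text_result = to_svg_bytes("caf\u00e9")    # text is encoded
bytes_result = to_svg_bytes(b"caf\xe9")    # bytes pass through unchanged
```

The result can then be written with a file opened in binary mode (`"wb"`), so no second, implicit encoding step takes place.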

Contributor

mspacek commented Jun 1, 2011

I'm new to unicode. Out of curiosity, what's the harm in writing utf-8 to disk even if a string isn't unicode? I had the impression that if all the characters were ASCII, the bytes would come out the same...

Contributor

rkern commented Jun 1, 2011

If the object is a str object, it might contain bytes in range 128-255, which are not ASCII. The method str.encode() is just an alias for unicode.encode(). That is, it will try to implicitly decode the str bytes to unicode text by interpreting it as 7-bit ASCII, then encode it back to str bytes with the specified encoding. If the str contains non-ASCII bytes, you will get a weird UnicodeDecodeError that a lot of people find confusing because the code is trying to encode rather than decode. If the function has been passed str bytes already, you should leave them alone and write them out to the file. In particular with XML, the bytes may have already been encoded with a different encoding than UTF-8.
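To illustrate the last point in Python 3 terms (where `bytes` corresponds to Python 2's `str`): pure-ASCII text gives identical bytes under ASCII and UTF-8, but non-ASCII text does not, and bytes produced by another encoding are generally not valid UTF-8. The sample strings below are made up for illustration.

```python
ascii_text = "width"
accented = "caf\u00e9"  # 'café'

# For pure-ASCII text, ASCII and UTF-8 produce the same bytes.
same_for_ascii = ascii_text.encode("ascii") == ascii_text.encode("utf-8")

# Non-ASCII text encodes differently depending on the encoding.
latin1_bytes = accented.encode("latin-1")  # b'caf\xe9'
utf8_bytes = accented.encode("utf-8")      # b'caf\xc3\xa9'

# These Latin-1 bytes are not valid UTF-8, so decoding them as UTF-8 fails
# instead of round-tripping.
try:
    latin1_bytes.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False
```

This is why bytes that arrive already encoded are best written out unchanged: the code cannot know which encoding produced them.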

Contributor

mspacek commented Jun 1, 2011

Ah, thanks for that. So the lesson is: don't equate Python's 8-bit str object with 7-bit ASCII. str is a superset.

Owner

minrk commented Jun 1, 2011

@mspacek - that's right. An easy example is:

# Python 2
u = u'é'  # unicode
s = u.encode('utf8')  # str
s2 = s.encode('utf8')  # raises UnicodeDecodeError

The third line is equivalent to:

import sys
s.decode(sys.getdefaultencoding()).encode('utf8')

The first step, the decode, fails because s contains non-ASCII bytes and the default encoding is ASCII.
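The snippet above is Python 2. In Python 3, `bytes` has no `.encode()` method, so the implicit step has to be written out by hand. A rough sketch of the same failure, assuming an ASCII default codec as in Python 2:

```python
u = "\u00e9"            # text ('é'); the counterpart of Python 2's unicode
s = u.encode("utf-8")   # bytes b'\xc3\xa9'; the counterpart of Python 2's str

# Python 2's s.encode('utf8') first decoded s with the default codec (ASCII).
# Spelling that decode out reproduces the error:
try:
    s.decode("ascii").encode("utf-8")
    error_name = None
except UnicodeDecodeError as exc:
    # The decode fails before any encoding happens.
    error_name = type(exc).__name__
```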

Owner

takluyver commented Jun 1, 2011

Quick question - does the SVG we're saving already have an encoding declared in the XML header thingy?

Contributor

mspacek commented Jun 1, 2011

Yup. Printing out the string shows this as the first line:

<?xml version="1.0" encoding="utf-8" standalone="no"?>

for both normal and unicode Python strings (plot(range(10)) gets you a normal string, and plot(range(-1, 10)) gets you a unicode string)
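For checking the declared encoding programmatically rather than by printing, something like the following could work. It is only a sketch: `declared_encoding` is a hypothetical helper, and the regular expression handles simple declarations like the one quoted above, not every form the XML specification allows.

```python
import re

# Hypothetical helper: pull the declared encoding out of an XML declaration.
_XML_DECL = re.compile(
    r"""^<\?xml[^>]*\bencoding=["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def declared_encoding(svg_text, default=None):
    """Return the encoding named in the XML declaration, or `default`."""
    match = _XML_DECL.match(svg_text)
    return match.group(1) if match else default


header = '<?xml version="1.0" encoding="utf-8" standalone="no"?>'
found = declared_encoding(header)
missing = declared_encoding("<svg/>")
```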

Owner

minrk commented Jun 1, 2011

Excellent, thanks for checking. In that case, this should be merged.

minrk added a commit that referenced this pull request Jun 1, 2011

minrk: Merge pull request #490 from mspacek/svg-unicode
fix UnicodeEncodeError writing SVG string to .svg file

closes gh-489
6a0fa99

minrk merged commit 6a0fa99 into ipython:master Jun 1, 2011

mattvonrocketstein pushed a commit to mattvonrocketstein/ipython that referenced this pull request Nov 3, 2014

minrk: Merge pull request #490 from mspacek/svg-unicode
fix UnicodeEncodeError writing SVG string to .svg file

closes gh-489
5eb37f6