nbconvert can't handle Heading with Chinese characters on Japanese Windows OS. #3818

zhangruoyu · 2013-07-29T04:06:45Z

Converting the notebook below with ipython nbconvert test.ipynb raises an exception:

  File "C:\Python27\lib\site-packages\ipython-1.0.0_dev-py2.7.egg\IPython\nbconvert\filters\strings.py", line 83, in add_anchor
    h = ElementTree.fromstring(py3compat.cast_bytes_py2(html))
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 1301, in XML
    parser.feed(text)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 1643, in feed
    self._raiseerror(v)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 1507, in _raiseerror
    raise err
ParseError: not well-formed (invalid token): line 1, column 9

The content of the notebook is at the end of this post. I am using Japanese Windows, where the default encoding is:

In [1]: from IPython.utils import encoding

In [2]: encoding.DEFAULT_ENCODING
Out[2]: 'cp932'

When py3compat.cast_bytes_py2(html) is called, it can't convert the Chinese characters correctly.

{
 "metadata": {
  "name": ""
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "\u6269\u5c55\u7c7b\u578b(cdef\u7c7b)"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    }
   ],
   "metadata": {}
  }
 ]
}
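
A minimal sketch of the failure described above, runnable without IPython. It assumes that cast_bytes_py2 on Python 2 encodes with the platform default encoding (cp932 here) and replaces characters it cannot encode; that behaviour is an assumption, not something confirmed in this report. The as_cp932 and as_utf8 names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The heading text from the notebook in this report.
heading = u"\u6269\u5c55\u7c7b\u578b(cdef\u7c7b)"
html = u"<h1>" + heading + u"</h1>"

# Assumption: the bytes handed to ElementTree are produced with the platform
# default encoding (cp932 on Japanese Windows) and "replace" error handling.
as_cp932 = html.encode("cp932", "replace")

# Without an XML declaration the parser treats bytes as UTF-8,
# so cp932 multibyte sequences are rejected.
try:
    ET.fromstring(as_cp932)
    cp932_parsed = True
except ET.ParseError:
    cp932_parsed = False

# Encoding explicitly as UTF-8 keeps every character and parses cleanly.
as_utf8 = html.encode("utf-8")
parsed_text = ET.fromstring(as_utf8).text

print(cp932_parsed)
print(parsed_text == heading)
```

The point of the sketch is only that the bytes given to ElementTree.fromstring need to be UTF-8 (or carry an encoding declaration), independent of the OS default encoding.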

jdfreder · 2013-08-09T17:27:55Z

py3compat.cast_bytes_py2(u"\u6269\u5c55\u7c7b\u578b(cdef\u7c7b)") works fine for me on Windows 7 x64, Python 3.3. What version of Python are you using? I'll test some more and report back.

jdfreder · 2013-08-09T17:30:13Z

nbconvert --to html worked; I'll try py2.7 next.

jdfreder · 2013-08-09T22:01:47Z

Problem only seems to exist on py2.x

takluyver · 2013-08-09T22:30:14Z

@jdfreder : cast_bytes_py2 is a no-op on Py 3, IIRC, so you wouldn't see any problem there.

jdfreder · 2013-08-09T22:32:51Z

Ah, thanks @takluyver. I'm walking through it with winpdb right now; it's either Jinja or the FilesWriter (probably the latter).

jdfreder · 2013-08-10T00:22:11Z

Finally found the problem: the add_anchor filter doesn't support Unicode on Win7 & py2.x. I'm going to look for a solution a little later tonight.

Edit: I should have looked there in the first place, since it was in your trace. The thing is, I can't reproduce the error on my machine because my default encoding is ASCII, so when I convert the notebook you posted, I end up with a bunch of question marks instead. I've found that removing the | add_anchor from the html_basic template is a quick fix. However, I'd rather just fix the add_anchor function. I'll look through the file's history and see what I can find about why it was added.
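
For illustration, one possible direction for such a fix is to encode the heading as UTF-8 explicitly instead of relying on the platform default encoding. The sketch below is a hypothetical stand-in, not the actual add_anchor code or the change that was eventually merged: the function name add_anchor_utf8, the id attribute, and the anchor-link markup are all assumptions made for the example, and it is written against Python 3's ElementTree.

```python
import xml.etree.ElementTree as ET


def add_anchor_utf8(html):
    """Hypothetical variant of an anchor-adding filter.

    Takes a unicode string holding a single heading element and returns
    a unicode string, never consulting the platform default encoding.
    """
    # Encode explicitly as UTF-8 so the XML parser sees bytes it can decode.
    h = ET.fromstring(html.encode("utf-8"))
    # Assumption: the anchor name is the heading text with spaces replaced.
    link = "".join(h.itertext()).replace(" ", "-")
    h.set("id", link)
    a = ET.SubElement(h, "a")
    a.set("class", "anchor-link")
    a.set("href", "#" + link)
    a.text = u"\u00b6"
    # encoding="unicode" returns text rather than bytes (Python 3).
    return ET.tostring(h, encoding="unicode")


result = add_anchor_utf8(u"<h1>\u6269\u5c55\u7c7b\u578b(cdef\u7c7b)</h1>")
print(result)
```

With the encoding fixed to UTF-8 on the way in and text returned on the way out, the result should not depend on whether the machine's default encoding is cp932 or ASCII.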

jdfreder · 2013-08-21T23:19:04Z

@zhangruoyu Sorry it took me so long to make the fix. I was out of town for the past week. When you get the chance please try the fix I posted. I confirmed it fixes the problem on my machine.

zhangruoyu · 2013-08-22T00:52:56Z

@jdfreder It works, thank you.

jdfreder · 2013-08-22T01:20:24Z

No problem, let's wait to close it until the PR that fixes it gets merged

ghost assigned jdfreder Aug 9, 2013

jdfreder mentioned this issue Aug 21, 2013

nbconvert: Fix for unicode html headers, Windows + Python 2.x #4092

Merged

zhangruoyu closed this as completed Aug 22, 2013

jdfreder reopened this Aug 22, 2013

minrk closed this as completed in #4092 Sep 3, 2013

minrk added a commit that referenced this issue Sep 3, 2013

Merge pull request #4092 from jdfreder/japanese

9e3a914

nbconvert: Fix for unicode html headers Closes #3818

minrk added a commit that referenced this issue Sep 4, 2013

Backport PR #4092: nbconvert: Fix for unicode html headers, Windows +…

6d6f830

… Python 2.x Closes #3818

mattvonrocketstein pushed a commit to mattvonrocketstein/ipython that referenced this issue Nov 3, 2014

Merge pull request ipython#4092 from jdfreder/japanese

0021abd

nbconvert: Fix for unicode html headers Closes ipython#3818
