nbconvert can't handle Heading with Chinese characters on Japanese Windows OS. #3818

Closed
zhangruoyu opened this issue Jul 29, 2013 · 9 comments · Fixed by #4092

@zhangruoyu

Converting the following notebook with ipython nbconvert test.ipynb raises an exception:

  File "C:\Python27\lib\site-packages\ipython-1.0.0_dev-py2.7.egg\IPython\nbconvert\filters\strings.py", line 83, in add_anchor
    h = ElementTree.fromstring(py3compat.cast_bytes_py2(html))
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 1301, in XML
    parser.feed(text)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 1643, in feed
    self._raiseerror(v)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 1507, in _raiseerror
    raise err
ParseError: not well-formed (invalid token): line 1, column 9

The notebook's content is listed below. I am using a Japanese edition of Windows, where the default encoding is:

In [1]: from IPython.utils import encoding

In [2]: encoding.DEFAULT_ENCODING
Out[2]: 'cp932'

When py3compat.cast_bytes_py2(html) is called, it cannot convert the Chinese characters correctly.

{
 "metadata": {
  "name": ""
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "\u6269\u5c55\u7c7b\u578b(cdef\u7c7b)"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    }
   ],
   "metadata": {}
  }
 ]
}
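
The failure can be reproduced outside IPython. The following is a minimal sketch, assuming the conversion helper encodes text with the default encoding and replaces characters it cannot encode (that behavior is an assumption here, not something shown in this report). The names encode_with_default and parses_as_xml are hypothetical and exist only for illustration.

```python
from xml.etree import ElementTree

# Heading text from the notebook above.
heading = u"\u6269\u5c55\u7c7b\u578b(cdef\u7c7b)"


def encode_with_default(text, encoding):
    # Assumption: unencodable characters are replaced rather than raising.
    return text.encode(encoding, "replace")


def parses_as_xml(data):
    # ElementTree treats byte input without an XML declaration as UTF-8.
    try:
        ElementTree.fromstring(data)
        return True
    except ElementTree.ParseError:
        return False


for enc in ("ascii", "cp932", "utf-8"):
    data = b"<h1>" + encode_with_default(heading, enc) + b"</h1>"
    print("%s: parses=%s" % (enc, parses_as_xml(data)))
```

With ascii every Chinese character becomes a question mark, so the markup still parses. With cp932 some characters are encoded as Shift-JIS byte pairs that are not valid UTF-8, which is the kind of input that makes the parser report an invalid token.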
ghost assigned jdfreder Aug 9, 2013
@jdfreder
Member

jdfreder commented Aug 9, 2013

py3compat.cast_bytes_py2(u"\u6269\u5c55\u7c7b\u578b(cdef\u7c7b)") works fine for me on Windows 7 x64 with Python 3.3. What version of Python are you using? I'll test some more and report back.

@jdfreder
Member

jdfreder commented Aug 9, 2013

nbconvert --to html worked; I'll try py2.7 next.

@jdfreder
Member

jdfreder commented Aug 9, 2013

Problem only seems to exist on py2.x

@takluyver
Member

@jdfreder: cast_bytes_py2 is a no-op on Py 3, IIRC, so you wouldn't see any problem there.
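
To illustrate why the call cannot fail on Python 3, here is a rough sketch of a version-conditional helper. It is not IPython's implementation: the name cast_bytes_py2_sketch, the "utf-8" default, and the "replace" error handler are all assumptions made for this example.

```python
import sys


def cast_bytes_py2_sketch(s, encoding="utf-8"):
    """Hypothetical stand-in, not IPython's implementation."""
    if sys.version_info[0] >= 3:
        # On Python 3 the argument is returned unchanged.
        return s
    if isinstance(s, bytes):
        return s
    # On Python 2, unicode text is encoded to a byte string.
    return s.encode(encoding, "replace")


value = cast_bytes_py2_sketch(u"\u6269\u5c55")
print(type(value))
```

Because the Python 3 branch never encodes anything, the default encoding only matters on Python 2.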

@jdfreder
Member

jdfreder commented Aug 9, 2013

Ah, thanks @takluyver. I'm walking through it with winpdb right now; it's either Jinja or the FilesWriter (probably the latter).

@jdfreder
Member

Finally found the problem: the add_anchor filter doesn't support Unicode on Win7 with py2.x. I'm going to look for a solution a little later tonight.

Edit: I should have looked there in the first place, since it was in your traceback. I can't reproduce the error on my machine because my default encoding is ascii, so when I convert the notebook you posted I end up with a bunch of question marks instead. Removing the | add_anchor filter from the html_basic template is a quick workaround, but I'd rather fix the add_anchor function itself. I'll look through the file's history to find out why it was added.
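
One possible direction for such a fix is to stop relying on the platform default encoding and pass explicitly UTF-8 encoded bytes to the parser. The sketch below illustrates that idea only. add_anchor_sketch is a hypothetical function, and both the way it builds the id and its return value are assumptions rather than the behavior of IPython's filter or of the eventual fix.

```python
from xml.etree import ElementTree


def add_anchor_sketch(html):
    """Hypothetical anchor filter; not IPython's code or the eventual fix."""
    # Encode explicitly as UTF-8 so the platform default encoding is never used.
    root = ElementTree.fromstring(html.encode("utf-8"))
    # Derive an id from the heading text, replacing spaces with hyphens.
    anchor = u"".join(root.itertext()).replace(u" ", u"-")
    root.set("id", anchor)
    # Serialize as UTF-8 bytes, then decode back to text.
    return ElementTree.tostring(root, encoding="utf-8").decode("utf-8")


print(add_anchor_sketch(u"<h1>\u6269\u5c55\u7c7b\u578b(cdef\u7c7b)</h1>"))
```

Encoding and decoding with the same explicit codec keeps the heading text intact regardless of whether the machine's default is ascii or cp932.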

@jdfreder
Member

@zhangruoyu Sorry it took me so long to make the fix. I was out of town for the past week. When you get the chance please try the fix I posted. I confirmed it fixes the problem on my machine.

@zhangruoyu
Author

@jdfreder It works, thank you.

jdfreder reopened this Aug 22, 2013
@jdfreder
Member

No problem. Let's wait to close it until the PR that fixes it gets merged.

minrk added a commit that referenced this issue Sep 3, 2013
nbconvert: Fix for unicode html headers

Closes #3818
minrk added a commit that referenced this issue Sep 4, 2013
mattvonrocketstein pushed a commit to mattvonrocketstein/ipython that referenced this issue Nov 3, 2014
nbconvert: Fix for unicode html headers

Closes ipython#3818