RSCHTMLReader throws bytes/string error #8

Closed
chemlynx opened this issue Oct 13, 2016 · 1 comment

@chemlynx

I'm using Python 3.5 and trying to process an RSC article (DOI 10.1039/C6OB02074G).

I see the error:

TypeError: %b requires bytes, or an object that implements __bytes__, not 'str'

The issue seems to be with the replace_rsc_img_chars function in rsc.py.

Looking at it, the matches obtained from parsing the entity XPath (u1 and u2) are unicode strings (see lines 270 and 272). u1 and u2 are then used to build rep (line 276), and at that point the code tries to insert a unicode string into a byte string.
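For reference, the same class of error can be reproduced outside the library. This is only a minimal sketch, not the code from rsc.py: the `try_format` helper, the `b"<%s>"` template and the `e109` value are made up for illustration. It shows that, on Python 3.5 or later, %-formatting a byte string accepts bytes arguments but rejects str arguments.

```python
def try_format(template, value):
    """Apply %-formatting and return either the result or the TypeError raised.

    Hypothetical helper, used only to make both outcomes easy to inspect.
    """
    try:
        return template % value
    except TypeError as exc:
        return exc


# A bytes template with a text (str) argument: rejected on Python 3.
result_text = try_format(b"<%s>", "e109")

# The same template with a bytes argument: accepted.
result_bytes = try_format(b"<%s>", b"e109")

print(type(result_text).__name__)
print(result_bytes)
```

The first call ends in a TypeError because, in bytes formatting, `%s` behaves like `%b` and only accepts bytes-like objects.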

@mcs07
Owner

mcs07 commented Oct 13, 2016

Thanks. There have been a lot of these encoding bugs because I haven't been testing properly under Python 3. In this case, it is because the lxml parser returns byte strings in Python 2, but unicode strings in Python 3. I've committed a fix, and will push a new version pending testing.
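One way to guard against this kind of mismatch is to normalize values to a single type before formatting. The sketch below is an assumption about how that could look, not the committed fix: `ensure_bytes`, the `&#x%s;` template and the sample values are hypothetical and do not come from rsc.py.

```python
def ensure_bytes(value, encoding="utf-8"):
    """Return value as bytes, encoding it first if it is text.

    Hypothetical helper: lets the same formatting code handle parser
    results whether they arrive as bytes or as unicode strings.
    """
    if isinstance(value, bytes):
        return value
    return value.encode(encoding)


# Stand-ins for the two XPath matches; one text, one bytes, to cover both cases.
u1 = "e109"
u2 = b"e10a"

# Byte-string formatting now succeeds regardless of the input types.
rep = b"&#x%s;&#x%s;" % (ensure_bytes(u1), ensure_bytes(u2))
print(rep)
```

The opposite choice, decoding everything to text and formatting a unicode template, would work equally well. Which side to normalize to depends on what the surrounding code expects.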
