I'm using Python 3.5 and trying to process an RSC article (10.1039/C6OB02074G).
I see the error:
TypeError: %b requires bytes, or an object that implements __bytes__, not 'str'
The issue seems to be with the replace_rsc_img_chars function in rsc.py.
Looking at it, the matches obtained from parsing the entity XPath (u1 and u2) are unicode strings (see lines 270 and 272). u1 and u2 are then used to generate rep (line 276), where the code tries to insert a unicode string into a byte string.
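For anyone wanting to reproduce this outside the library, here is a minimal sketch of the same failure. The template and the value of `u1` are made up for illustration and are not copied from rsc.py; the only assumption is that a text string ends up as the operand of `%` on a bytes template.

```python
# Hypothetical stand-in for a value returned by the XPath query under Python 3.
u1 = "03B1"  # text (str), not bytes


def format_into_bytes(value):
    """Interpolate value into a bytes template (illustrative template only)."""
    return b"&#x%s;" % value


error_message = None
try:
    # On Python 3.5+, %s in a bytes template only accepts bytes-like operands.
    format_into_bytes(u1)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# The same call succeeds when the operand is already bytes.
ok = format_into_bytes(b"03B1")
```

The exact wording of the message varies a little between Python versions, but it is the same `TypeError` as in the report.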
Thanks. There have been a lot of these encoding bugs because I haven't properly tested under Python 3. In this case, it is because the lxml parser returns byte strings in Python 2 but unicode strings in Python 3. I've committed a fix and will push a new version pending testing.
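One common way to guard against this kind of difference is to normalize values to text before formatting. The sketch below is not the committed fix; `to_text` and `build_rep` are hypothetical helpers, and the output format of `build_rep` is invented purely to show the idea.

```python
def to_text(value, encoding="utf-8"):
    """Return value as a text string, decoding it first if it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def build_rep(u1, u2):
    """Hypothetical example: combine two parsed values using text only."""
    # Both operands are text here, so the template must be text as well.
    return "{}-{}".format(to_text(u1), to_text(u2))


# Gives the same result whether the parser returned bytes or text.
from_bytes = build_rep(b"03B1", b"03B2")
from_text = build_rep("03B1", "03B2")
```

Keeping everything as text internally and encoding only at the point of output tends to avoid mixing the two types.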