Python3 incompatibility / unicode #17

Open
robnardo opened this issue Jun 11, 2015 · 7 comments
@robnardo

Hi, I am using your library and getting errors when running it under Python 3.4.0. I recently started working with Python (so I'm not an expert), but I was able to fix it for my needs by editing untangle.py on lines 143 and 149.

So I changed line 143 to parser.parse(StringIO(filename.decode('utf-8'))) and line 149 to
return string.startswith(b'http://') or string.startswith(b'https://')
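The edit above swaps the `str` prefixes for `bytes` prefixes, which helps when the argument is `bytes` but would raise the mirror-image `TypeError` when the argument is a `str`. A minimal sketch of a check that tolerates both is shown below. It is not untangle's actual code; the function body is an assumption written only to illustrate the idea.

```python
def is_url(string):
    """Return True if `string` looks like an http(s) URL.

    Hypothetical variant for illustration: accepts either str or bytes,
    and compares against prefixes of the matching type.
    """
    if isinstance(string, bytes):
        return string.startswith((b"http://", b"https://"))
    return string.startswith(("http://", "https://"))


# Both input types are handled without a TypeError.
print(is_url("http://example.com/feed.xml"))
print(is_url(b"<root><child/></root>"))
```

`startswith` accepts a tuple of prefixes, so each branch needs only one call.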

@stchris
Owner

stchris commented Jun 12, 2015

Hello, and thanks for reporting this issue. I haven't been able to reproduce it yet, but I've enabled the automatic tests to run for Python 3.4 as well. It would help a lot if you could tell me more about how you hit this issue. Can you post the filename, or parts of it, so I can try to write a test that fails?

@mplewis

mplewis commented Jan 29, 2016

I am having this issue with the following code:

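import xml.etree.ElementTree as ET
import untangle

# etree is of type <class 'xml.etree.ElementTree.Element'>
class Page:
    def __init__(self, etree):
        self.etree = etree
        # ET.tostring() returns bytes by default on Python 3
        self.untangled = untangle.parse(ET.tostring(etree))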

Traceback:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-26-05be7dcdfcdc> in <module>()
     21 
     22 for child in root:
---> 23     print(parse_to_obj(child))

<ipython-input-26-05be7dcdfcdc> in parse_to_obj(etree)
      9         return File(etree)
     10     else:
---> 11         return Page(etree)
     12 
     13 class Page:

<ipython-input-26-05be7dcdfcdc> in __init__(self, etree)
     14     def __init__(self, etree):
     15         self.etree = etree
---> 16         self.untangled = untangle.parse(ET.tostring(etree))
     17 
     18 class File:

/Users/mplewis/.pyenv/versions/3.5.0/lib/python3.5/site-packages/untangle.py in parse(filename)
    138     sax_handler = Handler()
    139     parser.setContentHandler(sax_handler)
--> 140     if os.path.exists(filename) or is_url(filename):
    141         parser.parse(filename)
    142     else:

/Users/mplewis/.pyenv/versions/3.5.0/lib/python3.5/site-packages/untangle.py in is_url(string)
    147 
    148 def is_url(string):
--> 149     return string.startswith('http://') or string.startswith('https://')
    150 
    151 # vim: set expandtab ts=4 sw=4:

TypeError: startswith first arg must be bytes or a tuple of bytes, not str
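The traceback is consistent with `ET.tostring(etree)` returning `bytes` on Python 3, which then reaches a `startswith('http://')` call that expects `str`. A small sketch of the two serialization modes follows. The sample XML is made up for illustration, and whether the `str` form parses cleanly in a given untangle release is an assumption that would need checking against that release.

```python
import xml.etree.ElementTree as ET

# Sample document, invented for this example.
root = ET.fromstring("<page><title>Francés</title></page>")

# Default serialization: bytes (non-ASCII characters become character references).
as_bytes = ET.tostring(root)

# encoding="unicode" asks ElementTree for a str instead of bytes.
as_text = ET.tostring(root, encoding="unicode")

print(type(as_bytes).__name__)
print(type(as_text).__name__)
```

Passing the `str` form avoids the bytes-versus-str mismatch inside `is_url` on the caller's side, without editing the library.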

@stchris
Owner

stchris commented Jan 30, 2016

I'll try to have a look at this. @mplewis could you also maybe post the xml you're parsing against?

@rhaamo

rhaamo commented Jul 13, 2016

I needed to do the same as @robnardo. In my case I do something like:

a=requests.get("http://whatever_returns_an_xml/")
b=untangle.parse(a.text)

The XML returned sometimes contains non-ASCII characters, as in Francés, and without editing anything it blows up with a "cannot encode unicode" kind of error.
If I do untangle.parse(a.text.encode('UTF-8')) it fails like this:

  File "/usr/local/lib/python3.4/dist-packages/untangle.py", line 149, in is_url
    return string.startswith('http://') or string.startswith('https://')
TypeError: <flask_script.commands.Command object at 0x7f70316c34e0>: startswith first arg must be bytes or a tuple of bytes, not str

With robnardo's edit it works as expected.

PS: I use requests rather than untangle's own URL fetching because I need to edit some headers before sending the request.

@stchris stchris added the bug label Mar 9, 2017
@stchris stchris added this to the 1.1.1 milestone Mar 9, 2017
@stchris stchris self-assigned this Mar 9, 2017
@stchris stchris changed the title from "Python 3 compatible" to "Python3 incompatibility / unicode" Mar 9, 2017
stchris pushed a commit that referenced this issue Mar 10, 2017
Make sure that unicode strings are parsed properly. #17
@stchris
Owner

stchris commented May 7, 2017

Can you test this again with the newly released version 1.1.1?

@stchris stchris modified the milestones: 1.1.1, 2.0.0 May 7, 2017
@lolouk44

Just wanted to state that I had the same issue under Python 3.5 (Python 2.7 worked OK).
Making the same changes as @robnardo fixed the issue for me too.

@stchris
Owner

stchris commented Jul 1, 2022

I added one more test in #89 but wasn't able to reproduce this. Would appreciate a concrete failing test.
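As a starting point for such a test, here is a sketch that pins down the behaviour shown in the earlier traceback. It uses a local copy of the `is_url` line quoted there rather than importing untangle, so it says nothing about current releases. The test class and method names are hypothetical.

```python
import unittest


def is_url(string):
    # Copied from the traceback in this thread (untangle.py line 149 at the time).
    return string.startswith('http://') or string.startswith('https://')


class IsUrlBytesTest(unittest.TestCase):
    def test_bytes_input_raises_type_error(self):
        # Encoded XML, as produced by a.text.encode('UTF-8') or ET.tostring().
        xml = "<lang>Francés</lang>".encode("utf-8")
        with self.assertRaises(TypeError):
            is_url(xml)

    def test_str_input_is_not_a_url(self):
        self.assertFalse(is_url("<lang>Francés</lang>"))
```

Saved to a file, this can be run with `python -m unittest`. Pointing the first case at `untangle.parse` with the same `bytes` payload would show whether a given release still raises.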

@stchris stchris added the unclear label Jul 1, 2022
@stchris stchris removed this from the 2.0.0 milestone Jul 1, 2022