
Change decoding in http to iso-8859-1 instead of unicode #68


Closed
wants to merge 1 commit into from


nosterlu

I did not find a good way to pass the encoding as an argument, since the decode call is nested fairly deep. If you give me some pointers, maybe I can add that instead.

Anyway, parsing pages in, for example, Nordic languages fails with a UnicodeDecodeError in polyglot.as_unicode when opening a site like this:

import mechanize

browser = mechanize.Browser()
browser.open("https://register.sportadmin.se/")

An example line passed to as_unicode that fails looks like this:
b'<div style="font-size: 28px;color: #3b3933;margin-bottom: 20px;">F\xf6rening</div>'
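To make the failure concrete, here is a minimal sketch that decodes the quoted byte string both ways using only the standard library. The variable names are illustrative and are not taken from mechanize.

```python
# The byte string quoted in the report above.
line = b'<div style="font-size: 28px;color: #3b3933;margin-bottom: 20px;">F\xf6rening</div>'

try:
    line.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError as exc:
    utf8_ok = False
    failing_offset = exc.start  # index of the byte that UTF-8 rejected

# ISO-8859-1 maps every byte value to a code point, so this cannot raise.
latin1_text = line.decode("iso-8859-1")
print(utf8_ok, latin1_text)
```

The byte 0xf6 is "ö" in ISO-8859-1, but on its own it is not a valid UTF-8 sequence, which is why the UTF-8 decode is rejected.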

This is the traceback:

Traceback (most recent call last):

  File " tests.py", line 36, in <module>
    S = sportadmin.Sportadmin()

  File "n.py", line 119, in __init__
    self.cookie = self._load_cookie()

  File "n.py", line 124, in _load_cookie
    browser.open(SPORTADMIN_URL)

  File "mechanize\mechanize\_mechanize.py", line 257, in open
    return self._mech_open(url_or_request, data, timeout=timeout)

  File "mechanize\mechanize\_mechanize.py", line 287, in _mech_open
    response = UserAgentBase.open(self, request, data)

  File "mechanize\mechanize\_opener.py", line 188, in open
    req = meth(req)

  File "mechanize\mechanize\_http.py", line 181, in http_request
    self.rfp.read()

  File "mechanize\mechanize\_http.py", line 130, in read
    self.parse(map(as_unicode, lines))

  File "C:\temp\Apps\Anaconda64\lib\urllib\robotparser.py", line 95, in parse
    for line in lines:

  File "mechanize\mechanize\polyglot.py", line 200, in as_unicode
    x = x.decode('utf-8')

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 66: invalid start byte
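As a possible alternative to switching the codec outright, the sketch below tries UTF-8 first and falls back to ISO-8859-1. The helper `decode_with_fallback` and its `encodings` parameter are hypothetical names for illustration. They are not part of mechanize, and this is not the change made in this pull request.

```python
def decode_with_fallback(data, encodings=("utf-8", "iso-8859-1")):
    """Hypothetical helper: decode bytes by trying each encoding in order."""
    if isinstance(data, str):
        # Already text, nothing to decode.
        return data
    for encoding in encodings:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Only reached if every listed encoding failed; substitute U+FFFD
    # for undecodable bytes instead of raising.
    return data.decode("utf-8", errors="replace")


print(decode_with_fallback(b"F\xf6rening"))
```

Trying UTF-8 first keeps valid UTF-8 content intact, while the ISO-8859-1 fallback accepts any byte sequence. The trade-off is that text in some other single-byte encoding would be decoded without an error but possibly with the wrong characters.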
