
HTTP Server and UTF-8 on Windows 8.1 #21

Open
suidobashi opened this issue Dec 9, 2014 · 3 comments

@suidobashi

commented Dec 9, 2014

On Windows 8.1, in a PowerShell session, running the command:

pdoc --http

and then trying to load a page in the browser raised an exception (see below).

The fix was to change "utf-8" to "ISO-8859-1" on line 264 of the pdoc script.

Exception report begins:

Exception happened during processing of request from ('127.0.0.1', 9378)
Traceback (most recent call last):
  File "c:\Python27\Lib\SocketServer.py", line 295, in _handle_request_noblock
    self.process_request(request, client_address)
  File "c:\Python27\Lib\SocketServer.py", line 321, in process_request
    self.finish_request(request, client_address)
  File "c:\Python27\Lib\SocketServer.py", line 334, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "c:\Python27\Lib\SocketServer.py", line 651, in __init__
    self.handle()
  File "c:\Python27\Lib\BaseHTTPServer.py", line 340, in handle
    self.handle_one_request()
  File "c:\Python27\Lib\BaseHTTPServer.py", line 328, in handle_one_request
    method()
  File ".\env\Scripts\pdoc", line 100, in do_GET
    modules.append((name, quick_desc(imp, name, ispkg)))
  File ".\env\Scripts\pdoc", line 264, in quick_desc
    for i, line in enumerate(f):
  File "C:\Users\xxx\Projects\marrs\env\lib\codecs.py", line 681, in next
    return self.reader.next()
  File "C:\Users\xxx\Projects\marrs\env\lib\codecs.py", line 612, in next
    line = self.readline()
  File "C:\Users\xxx\Projects\marrs\env\lib\codecs.py", line 527, in readline
    data = self.read(readsize, firstline=True)
  File "C:\Users\xxx\Projects\marrs\env\lib\codecs.py", line 474, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xf6 in position 0: invalid start byte
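The failure can be reproduced in isolation. Byte 0xf6 is "ö" in ISO-8859-1 but can never start a UTF-8 sequence. Because ISO-8859-1 maps all 256 byte values to characters, switching to it makes the exception go away for every file, but it silently garbles files that really are UTF-8. A minimal Python 3 sketch (the sample word is illustrative, not taken from pdoc or the file that failed):

```python
# "sch\u00f6n" stored as ISO-8859-1 contains the byte 0xf6, as in the traceback.
latin1_bytes = "sch\u00f6n".encode("iso-8859-1")   # b'sch\xf6n'

# Decoding it as UTF-8 fails: 0xf6 is never a valid UTF-8 start byte.
try:
    latin1_bytes.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc.reason
print(utf8_error)  # invalid start byte

# ISO-8859-1 maps every byte 0x00-0xff to a character, so it never raises
# and reads this Latin-1 text correctly...
print(latin1_bytes.decode("iso-8859-1") == "sch\u00f6n")  # True

# ...but bytes that really are UTF-8 come out as mojibake.
utf8_bytes = "sch\u00f6n".encode("utf-8")          # b'sch\xc3\xb6n'
mojibake = utf8_bytes.decode("iso-8859-1")
print(ascii(mojibake))  # 'sch\xc3\xb6n'
```

In other words, hard-coding either encoding trades one class of broken files for another; the encoding has to come from the file itself.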

@BurntSushi

Contributor

commented Jan 25, 2015

Hmm. This is tricky. Is there a solution we can apply that will work in any environment?
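One possible environment-independent approach, offered here as a suggestion rather than anything pdoc adopted, is to let the standard library apply the interpreter's own rules: on Python 3.2+, `tokenize.open()` reads the PEP 263 coding cookie or a UTF-8 BOM, defaults to UTF-8, and returns a text file object that decodes with the detected encoding. Python 2.7, which the reporter was running, has no `tokenize.open`, so this only covers a Python 3 code path. The helper `first_docstring_line` and its parsing are hypothetical stand-ins for what `quick_desc` does, not pdoc's code.

```python
import os
import tempfile
import tokenize


def first_docstring_line(path):
    """Hypothetical quick_desc-style helper: return the first line of a
    module docstring, or "" if the module does not start with one."""
    # tokenize.open() detects the encoding (coding cookie or BOM, else
    # UTF-8) and returns a text-mode file object already using it.
    with tokenize.open(path) as f:
        for line in f:
            stripped = line.strip()
            if not stripped or stripped.startswith("#"):
                continue  # skip blank lines and comments, incl. the cookie
            for quote in ('"""', "'''"):
                if stripped.startswith(quote):
                    return stripped[len(quote):].split(quote)[0].strip()
            return ""  # the first statement is not a docstring
    return ""


if __name__ == "__main__":
    # Write a small Latin-1 module with a coding cookie, then read it back.
    source = '# -*- coding: iso-8859-1 -*-\n"""Sch\u00f6n module."""\n'
    with tempfile.NamedTemporaryFile("wb", suffix=".py", delete=False) as tmp:
        tmp.write(source.encode("iso-8859-1"))
    try:
        print(ascii(first_docstring_line(tmp.name)))  # 'Sch\xf6n module.'
    finally:
        os.remove(tmp.name)
```

Because the encoding is taken from each file rather than hard-coded, the same function reads both Latin-1 modules that declare a cookie and ordinary UTF-8 modules.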

@BurntSushi BurntSushi added the bug label Jan 25, 2015

@hhsprings

commented Oct 20, 2015

I have the same problem. For example, msilib/__init__.py is encoded in iso-8859-1, lib-tk/Tix.py declares 'iso-latin-1-unix', and so on.

So I wrote a (dirty and tricky) reader like this:

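import re
import sys


class _PySourceFile(object):
    """Line iterator that decodes a Python source file using the encoding
    declared in its coding comment (PEP 263)."""

    def __init__(self, fp):
        self._f = None
        # Byte patterns, because the file is read in binary mode.
        rgxes = (
            re.compile(br"^#.*coding[=:]\s*([-\w.]+).*$"),  # coding comment
            re.compile(br"^#\s*!.*$"))                      # shebang line
        # Default source encoding: ASCII on Python 2, UTF-8 on Python 3.
        if sys.version_info[0] < 3:
            self._encode = 'ascii'
        else:
            self._encode = 'utf-8'

        self._f = open(fp, "rb")
        # The coding comment must be on line 1, or on line 2 after a shebang.
        line = self._f.readline()
        m = rgxes[0].match(line)
        if m:
            self._encode = m.group(1).decode('ascii')
        elif rgxes[1].match(line):
            line = self._f.readline()
            m = rgxes[0].match(line)
            if m:
                self._encode = m.group(1).decode('ascii')

        # for emacs style with platform suffix, e.g. 'iso-latin-1-unix'
        self._encode = re.sub(r"-(unix|dos|mac)$", "", self._encode)
        # some aliases Python's codec registry does not know, e.g. 'iso-latin-1'
        if "latin-1" in self._encode:
            self._encode = "iso-8859-1"

        # rewind so iteration starts again from the first line
        self._f.seek(0)

    def __next__(self):
        line = self._f.readline()
        if not line:
            # readline() returns an empty string at end of file
            raise StopIteration
        return line.decode(self._encode)

    def next(self):
        # for python 2.7
        return self.__next__()

    def __iter__(self):
        return self

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        if self._f:
            self._f.close()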

And in quick_desc in the pdoc script:

# ...
    if os.path.isfile(fp):
        with _PySourceFile(fp) as f:
            quotes = None
# ...

This issue may not be limited to --http, but I don't know which module is the proper place for this reader, so I can't create a PR.
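For comparison, on Python 3 the standard library already handles both kinds of declaration mentioned in this comment: `tokenize.detect_encoding()` applies the interpreter's own normalization, under which names beginning with `latin-1`, `iso-8859-1` or `iso-latin-1` (including an Emacs-style suffix such as `-unix`) become `iso-8859-1`. That covers roughly what the hand-written suffix stripping and alias step in `_PySourceFile` does. The header lines below are illustrative, written in the styles described above rather than copied from msilib or Tix.py, and `declared_encoding` is a hypothetical helper name.

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the encoding Python 3 would use for this source (PEP 263)."""
    # detect_encoding() takes a readline callable yielding bytes, looks at
    # no more than the first two lines, and returns (encoding, lines_read).
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


# Illustrative headers in the styles mentioned above.
plain_cookie = b"# -*- coding: iso-8859-1 -*-\nx = 1\n"
emacs_cookie = b"# -*-mode: python; coding: iso-latin-1-unix -*-\nx = 1\n"
shebang_cookie = b"#!/usr/bin/env python\n# coding: latin-1\nx = 1\n"
no_cookie = b"x = 1\n"

for src in (plain_cookie, emacs_cookie, shebang_cookie, no_cookie):
    print(declared_encoding(src))
```

The first three headers all resolve to `iso-8859-1` (the third one from line 2, after the shebang), and the file without a declaration falls back to `utf-8`.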

@hhsprings

commented Oct 20, 2015

BTW, this issue is not platform-specific. My platform is CPython 2.7.9 on Win7 (x64).
