doctest in python2.7 can't handle non-ascii characters #53655

Closed
hugo mannequin opened this issue Jul 29, 2010 · 4 comments
Labels
stdlib Python modules in the Lib dir tests Tests in the Lib/test dir topic-unicode type-bug An unexpected behavior, bug, or error

Comments

@hugo
Mannequin

hugo mannequin commented Jul 29, 2010

BPO 9409
Nosy @vstinner, @benjaminp, @ezio-melotti, @merwok, @florentx
Files
  • ascii.txt: not working doctest file
  • non-ascii.txt: working doctest file
  • example.py: runner file

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2010-10-14.21:48:31.221>
    created_at = <Date 2010-07-29.01:24:06.074>
    labels = ['tests', 'type-bug', 'library', 'expert-unicode']
    title = "doctest in python2.7 can't handle non-ascii characters"
    updated_at = <Date 2010-10-14.21:48:31.219>
    user = 'https://bugs.python.org/hugo'

    bugs.python.org fields:

    activity = <Date 2010-10-14.21:48:31.219>
    actor = 'flox'
    assignee = 'none'
    closed = True
    closed_date = <Date 2010-10-14.21:48:31.221>
    closer = 'flox'
    components = ['Library (Lib)', 'Tests', 'Unicode']
    creation = <Date 2010-07-29.01:24:06.074>
    creator = 'hugo'
    dependencies = []
    files = ['18242', '18243', '18244']
    hgrepos = []
    issue_num = 9409
    keywords = []
    message_count = 4.0
    messages = ['111881', '111882', '112233', '118719']
    nosy_count = 6.0
    nosy_names = ['vstinner', 'benjamin.peterson', 'ezio.melotti', 'eric.araujo', 'flox', 'hugo']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue9409'
    versions = ['Python 2.7']

    When trying to run my test suite I hit a problem with Python 2.7. The suite passes completely under Python 2.4, Python 2.5, Python 2.6 and Python 3.2a0, so I suspected a flaw in doctest itself.

    Taking a look at the code, there is the following in doctest.py:1331:

                source = example.source.encode('ascii', 'backslashreplace')

    The problem is that my doctest file contains non-ASCII characters, and that line fails on them.

    hugo@hugo-laptop:~/issue$ python2.7 example.py
    non-ascii.txt
    Doctest: non-ascii.txt ... ok
    ascii.txt
    Doctest: ascii.txt ... ERROR

    ======================================================================
    ERROR: ascii.txt
    Doctest: ascii.txt
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/usr/local/lib/python2.7/doctest.py", line 2148, in runTest
        test, out=new.write, clear_globs=False)
      File "/usr/local/lib/python2.7/doctest.py", line 1382, in run
        return self.__run(test, compileflags, out)
      File "/usr/local/lib/python2.7/doctest.py", line 1272, in __run
        got += _exception_traceback(exc_info)
      File "/usr/local/lib/python2.7/doctest.py", line 244, in _exception_traceback
        traceback.print_exception(exc_type, exc_val, exc_tb, file=excout)
      File "/usr/local/lib/python2.7/traceback.py", line 125, in print_exception
        print_tb(tb, limit, file)
      File "/usr/local/lib/python2.7/traceback.py", line 69, in print_tb
        line = linecache.getline(filename, lineno, f.f_globals)
      File "/usr/local/lib/python2.7/linecache.py", line 14, in getline
        lines = getlines(filename, module_globals)
      File "/usr/local/lib/python2.7/doctest.py", line 1331, in __patched_linecache_getlines
        source = example.source.encode('ascii', 'backslashreplace')
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 19: ordinal not in range(128)

    Ran 2 tests in 0.006s

    FAILED (errors=1)
    hugo@hugo-laptop:~/issue$

    Taking a closer look at doctest.py in Python 2.6 and Python 2.7, I noticed another inconsistency in how both handle filenames (I was lucky that the first filename I tried does not match the regex):

        __LINECACHE_FILENAME_RE = re.compile(r'<doctest '
                                             r'(?P<name>[\w\.]+)'
                                             r'\[(?P<examplenum>\d+)\]>$')

    Well, <name> is the file name, but filenames are not only composed of alphanums and dots. Maybe it should be slightly different, like:

        __LINECACHE_FILENAME_RE = re.compile(r'<doctest '
                                             r'(?P<name>.+?)'
                                             r'\[(?P<examplenum>\d+)\]>$', re.UNICODE)

    Because file names can take many forms. But that is not the main problem here, anyway.
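
The difference between the two patterns can be checked directly. The following is a minimal sketch, assuming the pseudo-filenames have the form `<doctest NAME[N]>` that the regex implies; the names `OLD_RE`, `NEW_RE` and the sample strings are illustrative only and do not come from doctest.py. It is written for Python 3, where `re.UNICODE` is already the default for text patterns.

```python
import re

# Pattern as quoted above from doctest.py (Python 2.6/2.7).
OLD_RE = re.compile(r'<doctest '
                    r'(?P<name>[\w\.]+)'
                    r'\[(?P<examplenum>\d+)\]>$')

# Pattern proposed in this report: any characters, matched lazily.
NEW_RE = re.compile(r'<doctest '
                    r'(?P<name>.+?)'
                    r'\[(?P<examplenum>\d+)\]>$', re.UNICODE)

plain = '<doctest ascii.txt[0]>'
hyphenated = '<doctest non-ascii.txt[0]>'

old_plain = OLD_RE.match(plain)            # matches: only word chars and dots
old_hyphenated = OLD_RE.match(hyphenated)  # None: '-' is not in [\w\.]
new_hyphenated = NEW_RE.match(hyphenated)  # matches with the relaxed pattern

print(old_plain.group('name'))
print(old_hyphenated)
print(new_hyphenated.group('name'), new_hyphenated.group('examplenum'))
```

A name containing a hyphen never reaches the `encode` call under the old pattern, which would be consistent with the remark above about being lucky with the first filename.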

    To solve my problem, I propose reverting that first snippet to what it was in Python 2.6. The diff would be:

    --- /usr/local/lib/python2.7/doctest.py	2010-07-28 22:07:01.272234398 -0300
    +++ doctest.py	2010-07-28 22:20:42.000000000 -0300
    @@ -1328,8 +1328,7 @@
             m = self.__LINECACHE_FILENAME_RE.match(filename)
             if m and m.group('name') == self.test.name:
                 example = self.test.examples[int(m.group('examplenum'))]
    -            source = example.source.encode('ascii', 'backslashreplace')
    -            return source.splitlines(True)
    +            return example.source.splitlines(True)
             else:
                 return self.save_linecache_getlines(filename, module_globals)
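
For readers unfamiliar with the mechanism being patched, here is a rough sketch of the idea behind the diff: a replacement for `linecache.getlines` that serves example source for doctest pseudo-filenames and delegates everything else. It is not the doctest implementation. `make_patched_getlines`, `FILENAME_RE`, `sources` and the stand-in `fallback` are hypothetical names, and the code is Python 3, where the example source is already text.

```python
import re

# Relaxed pattern from the proposal above (hypothetical module-level name).
FILENAME_RE = re.compile(r'<doctest (?P<name>.+?)\[(?P<examplenum>\d+)\]>$')


def make_patched_getlines(test_name, example_sources, fallback):
    """Build a getlines() replacement for one test (illustrative only)."""
    def patched_getlines(filename, module_globals=None):
        m = FILENAME_RE.match(filename)
        if m and m.group('name') == test_name:
            source = example_sources[int(m.group('examplenum'))]
            # As in the proposed diff: return the source lines unchanged,
            # with no intermediate encode step.
            return source.splitlines(True)
        # Anything else goes to the saved original function.
        return fallback(filename, module_globals)
    return patched_getlines


def fallback(filename, module_globals=None):
    # Stand-in for the saved linecache.getlines; returns a marker value.
    return ['fallback\n']


sources = ["print('caf\u00e9')\n", "x = 1\ny = 2\n"]
getlines = make_patched_getlines('ascii.txt', sources, fallback)

lines = getlines('<doctest ascii.txt[1]>')
non_ascii_lines = getlines('<doctest ascii.txt[0]>')
other = getlines('<doctest other.txt[0]>')
```

With the encode step removed, non-ASCII source lines pass through untouched, and filenames belonging to other tests still fall back to the original lookup.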

    @hugo hugo mannequin added the type-bug An unexpected behavior, bug, or error label Jul 29, 2010
    @ezio-melotti
    Member

    This change has been introduced in r79307 (see bpo-7667).
    The error seems to be raised because example.source is not unicode so it gets decoded implicitly before getting encoded with ascii+backslashreplace. I don't know if example.source is always supposed to be str or if the type might be different in some situations.
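
The implicit decode described here can be made explicit. The sketch below is Python 3 code that imitates the Python 2 behaviour under the assumption that `example.source` is a UTF-8 byte string; `py2_style_encode` is a hypothetical helper written for illustration, not a function from doctest or the standard library.

```python
# A UTF-8 byte string, standing in for a Python 2 `str` with non-ASCII content.
raw = 'caf\u00e9'.encode('utf-8')   # b'caf\xc3\xa9'


def py2_style_encode(value):
    """Imitate Python 2's str.encode(): implicit ASCII decode, then encode."""
    if isinstance(value, bytes):
        # Python 2 performs this step silently; it fails on byte 0xc3.
        value = value.decode('ascii')
    return value.encode('ascii', 'backslashreplace')


try:
    py2_style_encode(raw)
    error = None
except UnicodeDecodeError as exc:
    error = exc

# Text input never needs the decode step, so backslashreplace works as intended.
text_result = py2_style_encode('caf\u00e9')

print(type(error).__name__)
print(text_result)
```

This mirrors the traceback in the report: the failure is a `UnicodeDecodeError` on byte 0xc3, raised before `backslashreplace` ever gets a chance to act.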

    @ezio-melotti ezio-melotti added stdlib Python modules in the Lib dir tests Tests in the Lib/test dir topic-unicode labels Jul 29, 2010
    @merwok
    Member

    merwok commented Aug 1, 2010

    Adding the release manager to nosy so that he can confirm this bugfix can make it in the next 2.7 release before there’s more effort on that.

    @florentx
    Mannequin

    florentx mannequin commented Oct 14, 2010

    Fixed in 2.7 with r85496 and r85501. Thank you.

    (in 3.2 only tests, r85495 and r85500)

    @florentx florentx mannequin closed this as completed Oct 14, 2010
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022