Experiment in doctests for strings on Python 2.x and 3.x

pretext

Write doctests with strings that work in Python 2.x and Python 3.x.

Just import pretext and call pretext.activate(). By default, Python 3.x repr() behaviour is used, so byte strings get a b prefix and unicode strings get none:

>>> import pretext; pretext.activate()
>>> b'Byte strings now have the same repr on Python 2.x & 3.x'
b'Byte strings now have the same repr on Python 2.x & 3.x'
>>> u'Unicode strings & nested strings work too'.split()
['Unicode', 'strings', '&', 'nested', 'strings', 'work', 'too']

The problem

Suppose you have the following functions and doctests:

def textfunc():
    """
    >>> textfunc()
    u'A unicode string'
    """
    return u'A unicode string'

def bytesfunc():
    """
    >>> bytesfunc()
    b'A byte string'
    """
    return b'A byte string'

Without pretext

  • textfunc() will pass on Python 2.x, but fail on 3.x.
  • bytesfunc() will fail on Python 2.x, but pass on 3.x.

This is because doctest compares the expected result (from the docstring) with the repr() of the returned value, as written out by sys.displayhook.

  • repr(u'foo') returns u'foo' on Python 2.x, but 'foo' on 3.x
  • repr(b'bar') returns 'bar' on Python 2.x, but b'bar' on 3.x

If the doctests are edited to remove the prefixes

def textfunc():
    """
    >>> textfunc()
    'A unicode string'
    """
    return u'A unicode string'

def bytesfunc():
    """
    >>> bytesfunc()
    'A byte string'
    """
    return b'A byte string'

then the failures are reversed:

  • textfunc() will now fail on Python 2.x, but pass on 3.x.
  • bytesfunc() will now pass on Python 2.x, but fail on 3.x.
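Both failure modes can be reproduced on whatever interpreter you are running by feeding the two variants of each doctest straight to the standard doctest module. This sketch assumes Python 3.x; count_failures is a hypothetical helper, while DocTestParser.get_doctest and DocTestRunner.run are the standard-library doctest API.

```python
import doctest


def textfunc():
    return u'A unicode string'


def bytesfunc():
    return b'A byte string'


def count_failures(source, func):
    """Run a doctest snippet against func on this interpreter; return the failure count."""
    parser = doctest.DocTestParser()
    # Fresh globals on every call, because DocTestRunner.run clears them afterwards.
    test = parser.get_doctest(source, {func.__name__: func}, func.__name__, None, 0)
    runner = doctest.DocTestRunner()
    # out= swallows the failure report; run() returns (failed, attempted).
    return runner.run(test, out=lambda text: None).failed


prefixed = {
    textfunc: ">>> textfunc()\nu'A unicode string'\n",
    bytesfunc: ">>> bytesfunc()\nb'A byte string'\n",
}
unprefixed = {
    textfunc: ">>> textfunc()\n'A unicode string'\n",
    bytesfunc: ">>> bytesfunc()\n'A byte string'\n",
}

for label, cases in [('prefixed', prefixed), ('unprefixed', unprefixed)]:
    for func, source in cases.items():
        print(label, func.__name__, 'failures:', count_failures(source, func))
```

On Python 3.x this reports one failure for the prefixed textfunc() example and one for the unprefixed bytesfunc() example, and none for the other two.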

The hack

Replace repr() and sys.displayhook with versions that always prefix string reprs (u'...' for unicode, b'...' for bytes), regardless of the Python version. Now the doctests can:

  • directly show the string values returned by functions/methods, without resorting to print(), .encode(), etc.
  • successfully test the examples on all Python versions
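A minimal sketch of such a prefixing repr() follows, written to run unchanged on Python 2.x and 3.x. It is not pretext's implementation: prefixed_repr is a hypothetical name, and it only recurses into plain lists, tuples and dicts (subclasses, sets and other containers fall back to the built-in repr()).

```python
# Hypothetical sketch: the text type is unicode on Python 2.x and str on 3.x.
text_type = type(u'')


def prefixed_repr(value):
    """Return repr(value), but with a u/b prefix on every text/bytes string."""
    if isinstance(value, bytes):
        r = repr(value)
        # Python 3.x already writes b'...'; Python 2.x writes '...'.
        return r if r.startswith('b') else 'b' + r
    if isinstance(value, text_type):
        r = repr(value)
        # Python 2.x already writes u'...'; Python 3.x writes '...'.
        return r if r.startswith('u') else 'u' + r
    if type(value) is list:
        return '[' + ', '.join(prefixed_repr(v) for v in value) + ']'
    if type(value) is tuple:
        if len(value) == 1:
            # A one-element tuple needs its trailing comma.
            return '(' + prefixed_repr(value[0]) + ',)'
        return '(' + ', '.join(prefixed_repr(v) for v in value) + ')'
    if type(value) is dict:
        items = (prefixed_repr(k) + ': ' + prefixed_repr(v) for k, v in value.items())
        return '{' + ', '.join(items) + '}'
    # Anything else (numbers, None, sets, subclasses, ...) keeps its normal repr.
    return repr(value)
```

For example, prefixed_repr([u'a', b'b']) gives "[u'a', b'b']" on both major versions. Unlike the built-in repr(), this sketch does not guard against containers that refer to themselves.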

Proof of concept:

r"""
>>> import sys
>>> import pretext
>>> myrepr = pretext.PrefixRepr()
>>> repr = myrepr.repr
>>> def _displayhook(value):
...     if value is not None:
...         sys.stdout.write(myrepr.repr(value))
>>> sys.displayhook = _displayhook
>>> u''
u''
>>> b''
b''
>>> bytes()
b''
>>> b'\0'
b'\x00'
>>> b"'"
b"'"
"""

Alternatives

If you're ready to run screaming from the above, there are alternatives:

  • pytest provides the # doctest: +ALLOW_UNICODE and (from 2.9.0) # doctest: +ALLOW_BYTES directives

  • lxml includes lxml.html.usedoctest and lxml.usedoctest modules for HTML and XML.

  • Wrap byte-string returns in bytearray(). repr(bytearray(b'abc')) == "bytearray(b'abc')" on all versions of Python that have bytearray() (2.6 onward), e.g.

    >>> bytearray(bytesfunc())
    bytearray(b'A byte string')

  • Support Python 3.x exclusively

  • Use print(bytesfunc().decode('ascii')) and choose your input values carefully

  • Use #doctest: +SKIP

  • Use #doctest: +ELLIPSIS
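As far as I know, the pytest options work by rewriting the prefixes out of both the expected and the actual output before comparing them. A similar effect can be had with plain doctest by subclassing doctest.OutputChecker. The sketch below is an illustration in that spirit, not pytest's code; strip_prefixes, PrefixInsensitiveChecker and _PREFIX_RE are hypothetical names.

```python
import doctest
import re

# Hypothetical pattern: a u/U or b/B prefix (optionally followed by r/R)
# right before a quote, where the prefix is not the tail of a longer word.
_PREFIX_RE = re.compile(r"""(\W|^)[uUbB]([rR]?['"])""", re.MULTILINE)


def strip_prefixes(text):
    """Remove u'' and b'' prefixes from string reprs in text."""
    return _PREFIX_RE.sub(r"\1\2", text)


class PrefixInsensitiveChecker(doctest.OutputChecker):
    """Hypothetical checker that ignores u/b prefixes on both sides."""

    def check_output(self, want, got, optionflags):
        return doctest.OutputChecker.check_output(
            self, strip_prefixes(want), strip_prefixes(got), optionflags)


def bytesfunc():
    return b'A byte string'


# The unprefixed expectation that fails under the default checker on Python 3.x.
source = ">>> bytesfunc()\n'A byte string'\n"
parser = doctest.DocTestParser()
test = parser.get_doctest(source, {'bytesfunc': bytesfunc}, 'bytesfunc', None, 0)
runner = doctest.DocTestRunner(checker=PrefixInsensitiveChecker())
print('failures:', runner.run(test, out=lambda text: None).failed)
```

The trade-off is that such a test can no longer tell text from bytes, since u'x', b'x' and 'x' all compare equal once the prefixes are gone.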