
Python 2 versus Python 3
========================

.. currentmodule:: patsy

The biggest difference between Python 2 and Python 3 is in their string handling, and this is particularly relevant to Patsy since it parses user input. We follow a simple rule: input to Patsy should always be of type ``str``. That means that on Python 2, you should pass byte-strings (not ``unicode``), and on Python 3, you should pass unicode strings (not byte-strings). Similarly, when Patsy passes text back (e.g. :attr:`DesignInfo.column_names`), it's always in the form of a ``str``.

Besides being the most convenient choice for users (you never need to use any ``b"weird"`` ``u"prefixes"`` when writing a formula string), this rule is a necessary consequence of a deeper change in the Python language: in Python 2, Python code itself is represented as byte-strings, and that's the only form of input accepted by the :mod:`tokenize` module. Python 3's tokenizer and parser, on the other hand, use unicode, and since Patsy processes Python code, it has to follow suit.
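To make the Python 3 side of this concrete, here is a minimal sketch using only the standard library. It is not Patsy's own code: ``token_strings`` is a hypothetical helper written for this illustration, and it assumes Python 3, where :func:`tokenize.generate_tokens` reads ``str`` lines.

```python
import io
import tokenize


def token_strings(source):
    """Return the text of each meaningful token in a Python expression.

    Hypothetical helper for illustration only; not part of Patsy.
    """
    # On Python 3, generate_tokens() expects a readline callable
    # that returns str (unicode) lines, not bytes.
    readline = io.StringIO(source).readline
    # Drop the bookkeeping tokens that carry no source text of interest.
    skip = {tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER}
    return [tok.string
            for tok in tokenize.generate_tokens(readline)
            if tok.type not in skip]


ascii_tokens = token_strings("np.log(x) + 1")
# A non-ASCII identifier ("caf" followed by e-acute) also tokenizes cleanly.
unicode_tokens = token_strings("caf\u00e9 + 1")
print(ascii_tokens)
print(unicode_tokens)
```

Every token comes back as a ``str``, which is why text derived from Python code is naturally a ``str`` on Python 3.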

There is one exception to this rule: on Python 2, as a convenience for those using ``from __future__ import unicode_literals``, the high-level API functions :func:`dmatrix`, :func:`dmatrices`, :func:`incr_dbuilders`, and :func:`incr_dbuilder` do accept ``unicode`` strings. However, these ``unicode`` objects are still required to contain only ASCII characters; if they contain any non-ASCII characters, an error will be raised. If you really need non-ASCII characters in your formulas, you should consider upgrading to Python 3. Low-level APIs like :meth:`ModelDesc.from_formula` continue to insist on ``str`` objects only.
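If you want to check ahead of time whether a formula would satisfy the ASCII-only restriction, one possible approach is sketched below. ``is_ascii_only`` is a hypothetical helper, not a Patsy function, and it does not claim to reproduce the check Patsy performs internally. It is written for text (unicode) strings.

```python
def is_ascii_only(formula):
    """Return True if the text string `formula` has only ASCII characters.

    Hypothetical helper for illustration only; not part of Patsy.
    """
    try:
        # Encoding to ASCII fails on any character outside the ASCII range.
        formula.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


plain = is_ascii_only("y ~ x1 + x2")
# "caf" followed by e-acute: one non-ASCII character.
accented = is_ascii_only("y ~ caf\u00e9")
print(plain, accented)
```

A formula that fails this check is one that would need Python 3.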