Merge 1a19ab7 into 34082b6
njsmith committed Oct 27, 2015
2 parents 34082b6 + 1a19ab7 commit d458e22
Showing 4 changed files with 53 additions and 4 deletions.
8 changes: 8 additions & 0 deletions doc/changes.rst
@@ -6,6 +6,14 @@ Changes
 v0.4.1
 ------
 
+New features:
+
+* On Python 2, accept ``unicode`` strings containing only ASCII
+  characters as valid formula descriptions in
+  the high-level formula API (:func:`dmatrix` and friends). This is
+  intended as a convenience for people using Python 2 with ``from
+  __future__ import unicode_literals``. (See :ref:`py2-versus-py3`.)
+
 Bug fixes:
 
 * Accept ``long`` as a valid integer type in the new
17 changes: 15 additions & 2 deletions doc/py2-versus-py3.rst
@@ -1,3 +1,5 @@
+.. _py2-versus-py3:
+
 Python 2 versus Python 3
 ========================
 
@@ -6,11 +8,11 @@ Python 2 versus Python 3
 The biggest difference between Python 2 and Python 3 is in their
 string handling, and this is particularly relevant to Patsy since
 it parses user input. We follow a simple rule: input to Patsy
-should always be of type `str`. That means that on Python 2, you
+should always be of type ``str``. That means that on Python 2, you
 should pass byte-strings (not unicode), and on Python 3, you should
 pass unicode strings (not byte-strings). Similarly, when Patsy
 passes text back (e.g. :attr:`DesignInfo.column_names`), it's always
-in the form of a `str`.
+in the form of a ``str``.
 
 In addition to this being the most convenient for users (you never
 need to use any b"weird" u"prefixes" when writing a formula string),
@@ -20,3 +22,14 @@ byte-strings, and that's the only form of input accepted by the
 :mod:`tokenize` module. On the other hand, Python 3's tokenizer and
 parser use unicode, and since Patsy processes Python code, it has
 to follow suit.
+
+There is one exception to this rule: on Python 2, as a convenience for
+those using ``from __future__ import unicode_literals``, the
+high-level API functions :func:`dmatrix`, :func:`dmatrices`,
+:func:`incr_dbuilders`, and :func:`incr_dbuilder` do accept
+``unicode`` strings -- BUT these unicode string objects are still
+required to contain only ASCII characters; if they contain any
+non-ASCII characters then an error will be raised. If you really need
+non-ASCII in your formulas, then you should consider upgrading to
+Python 3. Low-level APIs like :meth:`ModelDesc.from_formula` continue
+to insist on ``str`` objects only.
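
Note on the documentation change above: because the low-level APIs still require the native `str` type, a caller may want to convert explicitly before calling them. The helper below is only a minimal sketch of that idea and is not part of Patsy; the name `to_native_str` is hypothetical, and it assumes ASCII-only formulas, matching the restriction the new paragraph describes.

```python
import sys


def to_native_str(formula):
    """Return `formula` as the native ``str`` type of the running Python.

    Hypothetical helper (not part of Patsy). Assumes ASCII-only input.
    """
    if isinstance(formula, str):
        # Already native: a byte-string on Python 2, a unicode string on Python 3.
        return formula
    if sys.version_info[0] >= 3:
        # Python 3: the other text-like input is bytes, so decode it.
        return formula.decode("ascii")
    # Python 2: the input is a unicode object, so encode it to a byte-string.
    return formula.encode("ascii")
```

With this sketch, non-ASCII input fails with the standard `UnicodeEncodeError` (Python 2) or `UnicodeDecodeError` (Python 3) rather than a `PatsyError`, so callers that want Patsy-style errors would need to wrap the call themselves.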
13 changes: 13 additions & 0 deletions patsy/highlevel.py
@@ -13,6 +13,7 @@
 # ModelDesign doesn't work -- need to work with the builder set
 # want to be able to return either a matrix or a pandas dataframe
 
+import six
 import numpy as np
 from patsy import PatsyError
 from patsy.design_info import DesignMatrix, DesignInfo
@@ -45,6 +46,18 @@ def _try_incr_builders(formula_like, data_iter_maker, eval_env,
             raise PatsyError("bad value from %r.__patsy_get_model_desc__"
                              % (formula_like,))
         # fallthrough
+    if not six.PY3 and isinstance(formula_like, unicode):
+        # Included for the convenience of people who are using py2 with
+        # __future__.unicode_literals.
+        try:
+            formula_like = formula_like.encode("ascii")
+        except UnicodeEncodeError:
+            raise PatsyError(
+                "On Python 2, formula strings must be either 'str' objects, "
+                "or else 'unicode' objects containing only ascii "
+                "characters. You passed a unicode string with non-ascii "
+                "characters. I'm afraid you'll have to either switch to "
+                "ascii-only, or else upgrade to Python 3.")
     if isinstance(formula_like, str):
         formula_like = ModelDesc.from_formula(formula_like)
         # fallthrough
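
Note on the hunk above: the new branch can be read in isolation as "encode ASCII-only text to a byte-string, otherwise raise". The following is a dependency-free sketch of that logic, not the Patsy code itself. `FormulaError` is a stand-in for `PatsyError`, `text_type` replaces the Python 2 builtin `unicode`, and the `py3` parameter is a hypothetical addition so that the Python 2 path can be exercised on any interpreter.

```python
import sys

PY3 = sys.version_info[0] >= 3
text_type = type(u"")  # `unicode` on Python 2, `str` on Python 3


class FormulaError(ValueError):
    """Stand-in for patsy.PatsyError so the sketch has no dependencies."""


def coerce_formula(formula_like, py3=PY3):
    """Mimic the new branch: ASCII-only text becomes a byte-string on py2.

    The `py3` flag is an assumption of this sketch (the real code checks
    six.PY3 directly); passing py3=False emulates the Python 2 path.
    """
    if not py3 and isinstance(formula_like, text_type):
        try:
            return formula_like.encode("ascii")
        except UnicodeEncodeError:
            raise FormulaError(
                "formula strings must contain only ASCII characters")
    # Every other input falls through unchanged to the later checks.
    return formula_like
```

Calling `coerce_formula(u"y ~ x", py3=False)` returns the ASCII byte-string, while a formula containing `u"\u00e9"` raises `FormulaError`. Converting before the `isinstance(formula_like, str)` check is what lets the existing `ModelDesc.from_formula` path stay untouched.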
19 changes: 17 additions & 2 deletions patsy/test_highlevel.py
@@ -6,6 +6,7 @@
 
 import sys
 import __future__
+import six
 import numpy as np
 from nose.tools import assert_raises
 from patsy import PatsyError
@@ -74,7 +75,7 @@ def t(formula_like, data, depth,
     depth += 1
     def data_iter_maker():
         return iter([data])
-    if (isinstance(formula_like, (str, ModelDesc, DesignInfo))
+    if (isinstance(formula_like, six.string_types + (ModelDesc, DesignInfo))
         or (isinstance(formula_like, tuple)
             and isinstance(formula_like[0], DesignInfo))
         or hasattr(formula_like, "__patsy_get_model_desc__")):
@@ -258,7 +259,21 @@ def __patsy_get_model_desc__(self, data):
     t("x + y", {"y": [1, 2], "x": [3, 4]}, 0,
       True,
       [[1, 3, 1], [1, 4, 2]], ["Intercept", "x", "y"])
-
+
+    # unicode objects on py2 (must be ascii only)
+    if not six.PY3:
+        # ascii is fine
+        t(unicode("y ~ x"),
+          {"y": [1, 2], "x": [3, 4]}, 0,
+          True,
+          [[1, 3], [1, 4]], ["Intercept", "x"],
+          [[1], [2]], ["y"])
+        # non-ascii is not (even if this would be valid on py3 with its less
+        # restrictive variable naming rules)
+        eacute = "\xc3\xa9".decode("utf-8")
+        assert isinstance(eacute, unicode)
+        assert_raises(PatsyError, dmatrix, eacute, data={eacute: [1, 2]})
+
     # ModelDesc
     desc = ModelDesc([], [Term([LookupFactor("x")])])
     t(desc, {"x": [1.5, 2.5, 3.5]}, 0,
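
Note on the test above: the literal `"\xc3\xa9"` is a Python 2 byte-string holding the UTF-8 encoding of U+00E9 (e with an acute accent), and decoding it yields a one-character `unicode` object that cannot be encoded as ASCII. The sketch below reproduces that reasoning without Patsy. It uses a `b"..."` literal so it runs on either Python version, and the names `fits_in_ascii` and `valid_py3_name` are illustrative only.

```python
# UTF-8 encodes U+00E9 as the two bytes C3 A9; the b prefix keeps this a
# byte-string on Python 3 as well (the original test is Python 2 only).
eacute = b"\xc3\xa9".decode("utf-8")

# Decoding yields a single code point equal to U+00E9.
is_single_code_point = (len(eacute) == 1 and eacute == u"\u00e9")


def fits_in_ascii(text):
    """Return True if `text` can be encoded as ASCII (illustrative helper)."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


# str.isidentifier exists only on Python 3; on Python 2 this yields None.
valid_py3_name = getattr(eacute, "isidentifier", lambda: None)()
```

On Python 3, `valid_py3_name` is `True`, which is the sense in which the test comment says the name would be valid there, while `fits_in_ascii(eacute)` is `False` on both versions.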