Providing a unicode string to dmatrices results in a misleading error message #53

Closed
gnovak opened this issue Jan 31, 2015 · 1 comment


gnovak commented Jan 31, 2015

Doing this works fine:
import pandas as pd
import patsy
foo = pd.DataFrame(dict(a=[1, 2.0], b=[4, 5.0]))
patsy.dmatrices('a ~ b', foo)

However, if the formula is given by a unicode string:
patsy.dmatrices(u'a ~ b', foo)

An exception is raised:
ValueError: design matrix must be real-valued floating point

The exception is very misleading about the nature of the problem. I thought an inf or nan had crept into the dataframe, but no bad values were there.

One possible fix is to send the first argument of dmatrices through the str() function:
patsy.dmatrices(str(u'a ~ b'), foo)
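A minimal sketch of the same workaround as a reusable helper, for code that has to run on both Python 2 and Python 3. The name `to_native_str` is hypothetical and is not part of patsy. It assumes the formula is plain ASCII, which is also what `str()` requires on Python 2.

```python
import sys


def to_native_str(formula):
    """Return `formula` as the interpreter's native `str` type.

    Hypothetical helper, not part of patsy. Assumes an ASCII-only formula.
    """
    if sys.version_info[0] >= 3:
        # On Python 3, `str` is already the unicode text type.
        return formula
    # On Python 2, turn a `unicode` object into a byte `str`.
    # Non-ASCII characters raise UnicodeEncodeError, as str() would.
    if isinstance(formula, unicode):  # noqa: F821 (name exists only on Python 2)
        return formula.encode("ascii")
    return formula
```

The result can then be passed on as `patsy.dmatrices(to_native_str(u'a ~ b'), foo)`.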

Another possible fix is to raise a different error message so that the user has a clear idea of the cause of the problem.
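To illustrate what such a message could look like, here is a rough sketch of an up-front type check. The function `check_formula_type` and its wording are assumptions made for illustration; this is not how patsy actually validates its input.

```python
def check_formula_type(formula):
    """Raise a descriptive error if `formula` is not a native `str`.

    Hypothetical check, not taken from patsy.
    """
    if not isinstance(formula, str):
        # Name the offending type so the cause is obvious to the caller.
        raise TypeError(
            "formula must be a native str, got %s; "
            "try wrapping it in str()" % type(formula).__name__
        )
    return formula
```

On Python 2 a `unicode` formula would fail this check immediately with a message that names the type, rather than surfacing later as a complaint about the design matrix.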

I'm using patsy version 0.3.0

Thanks!

njsmith added a commit that referenced this issue Oct 27, 2015
Apparently our disallowing unicode formula strings on Py2 was being
rather annoying for people using `from __future__ import
unicode_literals`. Start allowing them in limited circumstances.

Fixes gh-53.

cjl2183 commented Nov 7, 2015

thank you! this helped me. agreed that the error msg is extremely misleading.
