Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Make categorical repr nicer. #4368

Closed
wants to merge 5 commits into from

Conversation

@jseabold
Copy link
Contributor

commented Jul 25, 2013

Make looking at Categorical types a little nicer. Needs some tests still, but works fine locally so far.

@jseabold

This comment has been minimized.

Copy link
Contributor Author

commented Jul 25, 2013

Added tests.

@jreback

This comment has been minimized.

Copy link
Contributor

commented Jul 26, 2013

@jseabold this is nice......can you hook up travis? (prob just need to flip the switch), then

git checkin --amend -C HEAD to recommit last commit

@jseabold

This comment has been minimized.

Copy link
Contributor Author

commented Jul 26, 2013

Should be going now. Ping me if it's not. Maybe needed a setup lag.

@jreback

This comment has been minimized.

Copy link
Contributor

commented Jul 26, 2013

does not appear to have taken.....

@jseabold

This comment has been minimized.

Copy link
Contributor Author

commented Jul 26, 2013

Appears to be going now.

@jseabold

This comment has been minimized.

Copy link
Contributor Author

commented Jul 26, 2013

Well, I don't see the banner here, but I see it running on travis. I dunno.

@cpcloud

This comment has been minimized.

Copy link
Member

commented Jul 26, 2013

travis usually takes a couple of minutes to actually start, that's how it rolls

@cpcloud

This comment has been minimized.

Copy link
Member

commented Jul 26, 2013

i see the banner

@jseabold

This comment has been minimized.

Copy link
Contributor Author

commented Jul 26, 2013

Anyone know off the top of their head the 2.6 errors? I assume it's a unicode comparison issue...

@cpcloud

This comment has been minimized.

Copy link
Member

commented Jul 26, 2013

that kind of looks like a bug

python 2.6:

Categorical:
[a, b, b, a, a, c, c, c]
Levels (3): Index([a, b, c], dtype=object)

In [10]: factor.levels
Out[10]: Index([u'a', u'b', u'c'], dtype=object)

python 2.7

In [1]: factor = Categorical.from_array(['a','b','b','a','a','c','c','c'])

In [2]: factor
Out[2]:
Categorical:
[a, b, b, a, a, c, c, c]
Levels (3): Index(['a', 'b', 'c'], dtype=object)

In [3]: factor.levels
Out[3]: Index([u'a', u'b', u'c'], dtype=object)
@cpcloud

This comment has been minimized.

Copy link
Member

commented Jul 26, 2013

or possibly a difference in numpy since categorical repr calls np.array_repr...my python 2.6 has np 1.6.1 and py 2.7 up there has 1.7.1

def _repr_footer(self):
levheader = 'Levels (%d): ' % len(self.levels)
#TODO: should max_line_width respect a setting?
levstring = np.array_repr(self.levels, max_line_width=60)

This comment has been minimized.

Copy link
@jreback

jreback Jul 26, 2013

Contributor

yep...this should be com.pprint_thing(self.levels)

This comment has been minimized.

Copy link
@jreback

jreback Jul 26, 2013

Contributor

or maybe self.levels.format()

This comment has been minimized.

Copy link
@cpcloud

cpcloud Jul 26, 2013

Member

self.levels.format() does the correct thing

This comment has been minimized.

Copy link
@jseabold

jseabold Jul 26, 2013

Author Contributor

I'm almost inclined to just fix the tests here and live with the numpy inconsistency unless there's another way around this. pprint_thing drops the object name and I like knowing levels is an Index. E.g.,

>>> np.array_repr(np.arange(3))
'array([0, 1, 2])'

>>> com.pprint_thing(np.arange(3))
u'[0, 1, 2]'

This comment has been minimized.

Copy link
@cpcloud

cpcloud Jul 26, 2013

Member

what's wrong with self.levels.format()?

This comment has been minimized.

Copy link
@jseabold

jseabold Jul 26, 2013

Author Contributor

It's a list (for 1.8.x)? So either array_repr fails with an error or I'm back to pprint_thing. Either way I lose the Index([...]) information.

This comment has been minimized.

Copy link
@jseabold

jseabold Jul 26, 2013

Author Contributor

Or of course, leaving it, I lose the Index info. I could just do something like "Index(%s)" but I hoped to avoid this in case levels is ever not an Index for any reason, but I guess the tests will catch a change like that now.

This comment has been minimized.

Copy link
@cpcloud

cpcloud Jul 26, 2013

Member

oh duh sorry yes you're right.

This comment has been minimized.

Copy link
@jseabold

jseabold Jul 26, 2013

Author Contributor

In a delicious bit of irony I'm the one that changed the unicode repr of numpy object arrays.

numpy/numpy#459

@cpcloud

This comment has been minimized.

Copy link
Member

commented Jul 26, 2013

yep np.array2string is different in those two versions of numpy

@jseabold

This comment has been minimized.

Copy link
Contributor Author

commented Jul 26, 2013

Thanks. For the record, y'all beat my attempts to build Python 2.6, the whole numpy stack, and pandas in the background of my work.

@cpcloud

This comment has been minimized.

Copy link
Member

commented Jul 26, 2013

@jseabold building the stack is not fun. i get strange issues with matplotlib and tkinter so i can't do any plotting...i'm also accumulating a bit of enmity for the timedelta64 (non-)functionality of numpy 1.6 (it silently doesn't and then sometimes does convert units). i would suggest avoiding numpy 1.6 like the plague if you have the option. python 2.6 is probably ok

@jreback

This comment has been minimized.

Copy link
Contributor

commented Jul 26, 2013

hah....we have had fair share of issues with py2.6 lately!

here's a fun bug in python (that was actually not fixed): http://bugs.python.org/issue2325

@jseabold

This comment has been minimized.

Copy link
Contributor Author

commented Jul 26, 2013

Yes, I switched to openblas recently and my hacked together numpy/distutils is not working for scipy (numpy tests pass) for 2.6.

@jseabold

This comment has been minimized.

Copy link
Contributor Author

commented Jul 26, 2013

Fixed the tests.

@cpcloud

This comment has been minimized.

Copy link
Member

commented Jul 26, 2013

gets bit unwieldy for a large number of levels (i tried 100), but my opinion here doesn't matter that much i never use Categorical

@jseabold

This comment has been minimized.

Copy link
Contributor Author

commented Jul 26, 2013

I left a TODO in there for this. It's admittedly not handled, but I can't imagine having categorical variables with too many categories. The degrees of freedom loss is too prohibitive in estimation. Maybe it could be useful in some machine learning contexts, but I don't know how ML people use factors in R. I suspect they don't.

@cpcloud

This comment has been minimized.

Copy link
Member

commented Jul 26, 2013

totally fine by me. like i said, i never found a need for this

@jreback

This comment has been minimized.

Copy link
Contributor

commented Aug 23, 2013

@jseabold this looks ready? maybe squash a bit

@cpcloud ?

@cpcloud

This comment has been minimized.

Copy link
Member

commented Aug 23, 2013

fine by me

@hayd

This comment has been minimized.

Copy link
Contributor

commented Aug 26, 2013

Too late to throw out case the repr being valid python (so it can eval itself) :s ?

@jreback

This comment has been minimized.

Copy link
Contributor

commented Sep 24, 2013

@jseabold this almost got lost...

can you rebase and we can merge it in....

thxs

@jseabold

This comment has been minimized.

Copy link
Contributor Author

commented Sep 24, 2013

Rebased. There was a merge conflict, which I didn't check too closely, so make sure tests pass.

@jseabold

This comment has been minimized.

Copy link
Contributor Author

commented Sep 24, 2013

Any idea what's going on here?

@jseabold

This comment has been minimized.

Copy link
Contributor Author

commented Sep 24, 2013

Looks like a circular import introduced in 85f191c?

@jreback

This comment has been minimized.

Copy link
Contributor

commented Sep 26, 2013

merged via 7c76086

thanks @jseabold

@jreback jreback closed this Sep 26, 2013

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.