Hashable DataFrames #3882

Closed
hayd opened this issue Jun 13, 2013 · 12 comments · Fixed by #3884

Comments

@hayd
Contributor

commented Jun 13, 2013

See this SO answer; the OP wants to use memoisation.

The OP points out that the following gives different results each time (presumably the hash is computed from id):

hash(pd.DataFrame([1,2,3])) 

Should DataFrames be hashable, or should hash raise? (Does it defeat the point of hashing if hashing is expensive?) cc @cpcloud
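
To illustrate why an identity-based hash is a problem for memoisation, here is a minimal sketch in plain Python. `FrameLike` and `expensive_sum` are hypothetical stand-ins invented for this example (they are not pandas code); the class simply inherits the default `object` hash and equality, which are identity-based in CPython.

```python
class FrameLike:
    """Hypothetical stand-in for a container that keeps object's default hash."""

    def __init__(self, values):
        self.values = list(values)


calls = []  # records every real (non-cached) computation
cache = {}  # naive memoisation table keyed on the object itself


def expensive_sum(frame):
    if frame not in cache:
        calls.append(frame)
        cache[frame] = sum(frame.values)
    return cache[frame]


a = FrameLike([1, 2, 3])
b = FrameLike([1, 2, 3])  # same contents, but a different object

expensive_sum(a)  # computed
expensive_sum(b)  # computed again: identity-based lookup misses
expensive_sum(a)  # served from the cache: same object as the first call
```

Under these assumptions the cache only helps when the very same object is passed again, which is rarely what someone memoising on a frame's contents expects.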

@cpcloud

Member

commented Jun 13, 2013

I've been thinking about this off and on. A somewhat related issue is that of the empty frame, i.e., DataFrame(). I think the PandasObject NDFrame should raise in all cases, since that's what numpy does (overriding where it makes sense and is useful). I guess you could make the empty DataFrame hashable, but that doesn't seem worth the effort it would take: who needs to hash empty DataFrames?

@jreback

Contributor

commented Jun 13, 2013

Series raises on __hash__, as should all NDFrame objects: because they are mutable, hashing is meaningless. OTOH, Index objects are hashable, as they are immutable.
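
The mutability argument can be sketched in plain Python. `MutableBox` is a hypothetical class made up for this illustration (not part of pandas); it hashes on its contents, which is exactly what goes wrong once the contents change after the object has been stored in a set or dict.

```python
class MutableBox:
    """Hypothetical mutable container with a content-based hash (a bad idea)."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        return isinstance(other, MutableBox) and self.values == other.values

    def __hash__(self):
        return hash(tuple(self.values))


box = MutableBox([1, 2, 3])
seen = {box}

found_before = box in seen  # looked up with the hash it was stored under

box.values.append(4)        # mutate after insertion
found_after = box in seen   # looked up with a new hash, so the entry is missed
still_stored = any(item is box for item in seen)  # yet the object is still in the set
```

The object is still physically in the set but can no longer be found by lookup, which is why raising from `__hash__` is the safer default for mutable containers.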

@cpcloud

Member

commented Jun 13, 2013

Indexes are currently not hashable, since they try to hash the underlying ndarray.

@jreback

Contributor

commented Jun 13, 2013

yes..you are right...oh well my argument is bad then!

@hayd

Contributor Author

commented Jun 13, 2013

Ah, you're right, I didn't even check series, it's just DataFrame which should raise.

Easy fix (raise in __hash__ for generics); PR on the way.

@cpcloud

Member

commented Jun 13, 2013

could implement this for indices...thoughts?

@cpcloud

Member

commented Jun 13, 2013

in that case you should probably hash the name, number of levels, class, and dtype
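
A rough sketch of what hashing on that metadata could look like, in plain Python. `SketchIndex` and its `_key` helper are hypothetical names invented here, and this is not how pandas implements `Index`; it only shows the idea of a cheap hash built from name, number of levels, class, and dtype, with full equality still comparing the values.

```python
class SketchIndex:
    """Hypothetical immutable index-like object; not the pandas Index class."""

    def __init__(self, values, name=None, dtype="object", nlevels=1):
        self._values = tuple(values)  # a tuple keeps the contents immutable
        self.name = name
        self.dtype = dtype
        self.nlevels = nlevels

    def _key(self):
        # Cheap metadata only: name, number of levels, class, dtype.
        return (self.name, self.nlevels, type(self).__name__, self.dtype)

    def __eq__(self, other):
        if not isinstance(other, SketchIndex):
            return NotImplemented
        return self._key() == other._key() and self._values == other._values

    def __hash__(self):
        # O(1) in the number of elements; the values are never touched.
        return hash(self._key())


a = SketchIndex([1, 2, 3], name="x", dtype="int64")
b = SketchIndex([1, 2, 3], name="x", dtype="int64")
c = SketchIndex([4, 5, 6], name="x", dtype="int64")

unique = {a, b, c}  # a and b collapse into one entry; c collides on hash but is unequal
```

The trade-off in this sketch is that indexes with the same metadata but different values share a hash, so lookups fall back to the full `__eq__` comparison. That is allowed (equal objects must hash equally, not the reverse), but it makes heavily colliding keys slow.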

@jreback

Contributor

commented Jun 13, 2013

still have the mutability issue

though I suppose if the user accepts this it would be nice to deal with it

I would table to 0.12 for now

@hayd

Contributor Author

commented Jun 13, 2013

So, at the moment I've put this in NDFrame.

Maybe it should go in PandasObject, and then have objects which should hash override it (like if we can get indices to hash using that clever method). Are there any besides Index/MultiIndex?
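
As a minimal sketch of that layout (plain Python, with hypothetical class names standing in for PandasObject and its subclasses, not the code from the PR): the base class raises from `__hash__`, and only an immutable subclass opts back in by overriding it.

```python
class BaseObject:
    """Hypothetical stand-in for a common base class such as PandasObject."""

    def __hash__(self):
        # Unhashable by default, so misuse fails loudly instead of silently
        # falling back to an identity-based hash.
        raise TypeError(f"{type(self).__name__!r} objects are mutable and cannot be hashed")


class FrameLike(BaseObject):
    """Mutable container: inherits the raising __hash__ unchanged."""

    def __init__(self, values):
        self.values = list(values)


class LabelsLike(BaseObject):
    """Immutable subclass that opts back in to hashing."""

    def __init__(self, labels):
        self._labels = tuple(labels)

    def __eq__(self, other):
        return isinstance(other, LabelsLike) and self._labels == other._labels

    def __hash__(self):
        return hash(self._labels)


try:
    hash(FrameLike([1, 2, 3]))
    frame_hash_raised = False
except TypeError:
    frame_hash_raised = True

labels_match = LabelsLike("ab") in {LabelsLike(["a", "b"])}
```

In plain Python, setting `__hash__ = None` on the base class also makes instances unhashable; an explicit method is used in this sketch only because it allows a custom error message.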

@cpcloud

Member

commented Jun 13, 2013

i vote for default to not hashable. better to alert the user to non-hashability rather than possibly giving misleading ideas about the hashability of things

@jreback

Contributor

commented Jun 13, 2013

agree....non-hashability is/should be the default until we change the API

@hayd

Contributor Author

commented Jun 13, 2013

ok I've moved it to PandasObject, removes repeated code too. :)

@hayd hayd closed this in #3884 Jun 13, 2013
