Add set operations to Series objects #4480
This is something I would have needed several times in the last half year. Using
To recap (since there's a lot of tangential discussion in this thread), I think there is a good case to be made for a
(set, set) -> set:
(set, set) -> bool:
(set, obj) -> bool:
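The three signatures above can be sketched element-wise with plain pandas. Which Python set operators belong under each signature is an assumption here (union, intersection, difference and symmetric difference; subset and disjointness tests; membership), since the method names originally listed under each signature did not survive. The sketch uses `Series.combine`, which aligns two Series on their index and applies a binary function to each pair of values. The example data is invented.

```python
import pandas as pd

# Two object-dtype Series holding Python sets, sharing the same index (invented data).
a = pd.Series([{1, 2, 3}, {4, 5}, {6}])
b = pd.Series([{2, 3}, {5, 7}, {6}])

# (set, set) -> set: apply the set operator pairwise, aligned on the index.
# (Assumed mapping of this signature to Python's set operators.)
union = a.combine(b, lambda x, y: x | y)
intersection = a.combine(b, lambda x, y: x & y)
difference = a.combine(b, lambda x, y: x - y)
sym_difference = a.combine(b, lambda x, y: x ^ y)

# (set, set) -> bool: pairwise predicates between two sets.
is_subset = b.combine(a, lambda x, y: x <= y)  # is b[i] a subset of a[i]?
is_disjoint = a.combine(b, lambda x, y: x.isdisjoint(y))

# (set, obj) -> bool: membership of a single object in every set.
contains_5 = a.map(lambda x: 5 in x)
```

This is a pure-Python loop under the hood, so it gives the API shape being asked for, not the speed of a compiled implementation.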
@h-vetinari sets are not efficiently stored, so this offers only an API benefit, and I have yet to see an interesting use case for it. You can use
@jreback, well, I was hoping not just for an API improvement, but some fast cython code to back it up (like for the
Do I understand you correctly that you propose to work with
import pandas as pd
The difference operator works between Series.
df['a - b'] = df['a'] - df['b']
The set intersection operator doesn't work between Series.
df['a & b'] = df['a'] & df['b']
A very slow way to do intersection between Series:
df['a & b'] = df.apply(lambda row: row['a'] & row['b'], axis=1)
I found that it is much faster to compute the intersection this way:
df['a & b'] = df['a'] - (df['a'] - df['b'])
I don't know why.
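A plausible explanation for the speed difference, not verified in this thread: `df.apply(..., axis=1)` builds a new Series object for every row before calling the lambda, while `-` between two object-dtype Series hands the underlying arrays to NumPy, which loops over the pairs and calls each set's `__sub__` directly. The trick is correct because A - (A - B) = A ∩ B for any sets A and B. The sketch below (invented data, columns named as in the snippets above) checks that identity against a list comprehension over `zip`, which also avoids constructing a Series per row.

```python
import pandas as pd

# Invented example frame with two columns of Python sets.
df = pd.DataFrame({
    "a": [{1, 2, 3}, {4, 5}, {6, 7}],
    "b": [{2, 3, 9}, {5}, {8}],
})

# Double-difference trick: element-wise "-" on object-dtype Series
# calls set.__sub__ on each pair, and a - (a - b) equals a & b.
via_difference = df["a"] - (df["a"] - df["b"])

# Alternative: a plain Python loop over both columns, no per-row Series objects.
via_zip = pd.Series([x & y for x, y in zip(df["a"], df["b"])], index=df.index)
```

Both approaches produce the same per-row intersections; the `zip` version states the intent directly instead of relying on the identity.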
@chinchillaLiao: cool, I didn't know set difference worked on Series! It's the only set operation that works at the pandas level, though.
But an even better work-around is to go down to the
(@jreback; my comment half a year ago about a
In terms of usability, the really cool thing is that this also works for
Inefficient as opposed to what? Some situations fundamentally require processing sets.
And even so, why make working with sets harder than it needs to be? I used to think (see my response from December) that this wasn't implemented at all, but since it's in
@h-vetinari what is your use case for this? How does this come up?
IMO a nice way to contribute this would be with an extension type in a library, depending on your use case. If you have a smallish, finite superset, you can describe each set as a bitarray (and hence do set operations cheaply).
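A minimal sketch of the bitarray idea, assuming a small, known universe of possible elements. Instead of the `bitarray` package or a pandas extension type, it uses a NumPy boolean matrix with one row per set and one column per universe element; the universe and the helper names `encode`/`decode` are hypothetical.

```python
import numpy as np

# Hypothetical small, finite universe; each set becomes one row of booleans.
universe = ["a", "b", "c", "d", "e"]
position = {elem: i for i, elem in enumerate(universe)}


def encode(sets):
    """Turn a list of sets into an (n_sets, n_universe) boolean matrix."""
    bits = np.zeros((len(sets), len(universe)), dtype=bool)
    for row, s in enumerate(sets):
        for elem in s:
            bits[row, position[elem]] = True
    return bits


def decode(bits):
    """Turn a boolean matrix back into a list of sets."""
    return [{universe[j] for j in np.flatnonzero(row)} for row in bits]


left = encode([{"a", "b"}, {"c"}, {"d", "e"}])
right = encode([{"b", "c"}, {"c", "d"}, set()])

# Set operations become vectorized bitwise operations over whole arrays.
union = left | right
intersection = left & right
difference = left & ~right
sym_difference = left ^ right

# Predicates reduce along the universe axis.
is_subset = ~(left & ~right).any(axis=1)   # left[i] <= right[i]
is_disjoint = ~(left & right).any(axis=1)
contains_c = left[:, position["c"]]        # membership of a single element
```

The trade-off is memory proportional to the universe size per row, which is why this only pays off when the superset is small.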
Note: This is quite different from the original issue: set operations like
OK, I'm thinking about contributing that. Since the NumPy methods I showed above are actually not NaN-safe,
I'm back to thinking that a set accessor for Series (not for Index) would be best. And, since I wouldn't have to write the Cython for those methods, I think I can come up with such a wrapper relatively easily.
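A rough sketch of what a NaN-safe set accessor could look like. It relies on `pd.api.extensions.register_series_accessor`, which is a real pandas API; the accessor name `set` and its methods (`union`, `intersection`, `difference`, `contains`) are hypothetical and not part of pandas. Applying a set operator to a float NaN raises `TypeError` (e.g. `{1} - float("nan")`), so the wrapper propagates missing values as NaN instead. It is a plain Python loop, not Cython.

```python
import numpy as np
import pandas as pd


def _is_missing(x):
    # Real sets count as present; anything else (NaN, None, pd.NA) is checked with pd.isna.
    return not isinstance(x, (set, frozenset)) and pd.isna(x)


@pd.api.extensions.register_series_accessor("set")
class SetAccessor:
    """Hypothetical NaN-safe, element-wise set operations for Series of sets."""

    def __init__(self, series):
        self._s = series

    def _binary(self, other, func):
        # Align on the index (outer join), then apply func pairwise;
        # a missing value on either side yields NaN instead of a TypeError.
        left, right = self._s.align(other)
        values = [
            np.nan if _is_missing(x) or _is_missing(y) else func(x, y)
            for x, y in zip(left, right)
        ]
        return pd.Series(values, index=left.index, name=self._s.name, dtype=object)

    def union(self, other):
        return self._binary(other, lambda x, y: x | y)

    def intersection(self, other):
        return self._binary(other, lambda x, y: x & y)

    def difference(self, other):
        return self._binary(other, lambda x, y: x - y)

    def contains(self, item):
        # Membership of a single object in each set; missing entries give NaN.
        return self._s.map(lambda x: np.nan if _is_missing(x) else item in x)


# Usage: NaN and None propagate instead of raising.
s1 = pd.Series([{1, 2}, np.nan, {3}])
s2 = pd.Series([{2, 3}, {4}, None])
common = s1.set.intersection(s2)
has_3 = s1.set.contains(3)
```

Keeping the accessor on Series only (as suggested above) avoids having to reconcile it with the existing set-like methods on `Index`.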
I've chosen to comment on this issue (rather than opening a new one) due to the title, which imo has a much larger scope. I could easily open a more general issue, if desired.