API: chained reshaping ops #11485

Closed
jreback opened this Issue Oct 30, 2015 · 8 comments

jreback commented Oct 30, 2015

accept callables

In [4]: df = DataFrame({'A' : [1,2,3,4], 'B' : ['a',np.nan,'b','a']})

In [5]: df
Out[5]: 
   A    B
0  1    a
1  2  NaN
2  3    b
3  4    a

an operation that changes the shape of the DataFrame, followed by a filter that has to refer to the intermediate result

In [9]: res = df.dropna()

In [10]: res[res.B=='a']
Out[10]: 
   A  B
0  1  a
3  4  a

can be chained like this

In [8]: df.dropna().pipe(lambda x: x[x.B=='a'])
Out[8]: 
   A  B
0  1  a
3  4  a

SQL calls this select. pandas has .select too, but both .select and .filter are used for filtering LABELS (and not data).

I suppose making this work:

df.dropna().loc[lambda x: x.B=='a']

is maybe a slight enhancement of this

any thoughts?
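For anyone who wants to run the comparison above as a script rather than an IPython session, here is a minimal sketch. The names `two_step` and `chained` are made up for illustration; everything else is taken from the session above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': ['a', np.nan, 'b', 'a']})

# Two-step version: an intermediate name is needed so the filter
# can refer to the frame *after* dropna().
res = df.dropna()
two_step = res[res.B == 'a']

# Chained version: .pipe passes the intermediate frame to the lambda,
# so no intermediate name is needed.
chained = df.dropna().pipe(lambda x: x[x.B == 'a'])

print(chained)
```

Both versions should select the same rows (index 0 and 3); the only difference is whether the intermediate result needs a name.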

jreback Oct 30, 2015

Contributor

cc @jorisvandenbossche @TomAugspurger @sinhrks @shoyer

FYI @TomAugspurger I really do like .pipe & chaining!

comes from this example

(tidy
     .dropna()
     .pipe(lambda df: df[df.team == 'Los Angeles Lakers'])
     .pipe(sns.FacetGrid, col='team', hue='team')
     .map(sns.barplot, "variable", "rest")
 )

jreback Oct 30, 2015

Contributor

forgot about .query, which is actually a nice solution for this

In [63]: df.dropna().query('B=="a"')            
Out[63]: 
   A  B
0  1  a
3  4  a

@jreback jreback closed this Oct 30, 2015

shoyer Oct 30, 2015

Member

Yes, .query works but I hate coding in strings. I would be supportive of accepting a lambda in query, e.g., df.query(lambda df: df.B == 'a')
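A rough sketch of the idea, for illustration only: `query_callable` below is a hypothetical free function, not a pandas API, and it only assumes that a callable condition returns a boolean mask for the frame it is given. It is used through `.pipe` so it still chains.

```python
import numpy as np
import pandas as pd


def query_callable(frame, cond):
    """Hypothetical helper: filter rows by a query string or a callable.

    A callable is expected to take the frame and return a boolean mask;
    anything else is handed to DataFrame.query unchanged.
    """
    if callable(cond):
        return frame[cond(frame)]
    return frame.query(cond)


df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': ['a', np.nan, 'b', 'a']})

# String form, as .query works today.
by_string = df.dropna().pipe(query_callable, 'B == "a"')

# Callable form, as suggested above.
by_lambda = df.dropna().pipe(query_callable, lambda d: d.B == 'a')

print(by_lambda)
```

Under these assumptions the two calls should return the same rows; the callable form avoids writing the condition as a string.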

jreback Oct 30, 2015

Contributor

yeah, then of course

df[lambda x: x.B == 'a']
df.loc[lambda x: x.B == 'a']

for consistency

ok, that allows easy chaining based on values, which is nice to have (and it is consistent with how .assign works, which accepts either an expression or a lambda)
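A minimal sketch of what this looks like in use, assuming a pandas version in which the callable-indexer change referenced at the end of this thread has been merged. The variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': ['a', np.nan, 'b', 'a']})

# The callable receives the object being indexed (here the result of
# dropna()) and returns a boolean mask used as the indexer.
via_getitem = df.dropna()[lambda x: x.B == 'a']
via_loc = df.dropna().loc[lambda x: x.B == 'a']

# .assign takes a callable in the same way, which is the consistency
# argument made above.
with_c = df.assign(C=lambda x: x.A * 2)

print(via_loc)
print(with_c)
```

The point of the callable is that it is evaluated against the intermediate object in the chain, so no temporary variable is needed.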


@jreback jreback reopened this Oct 30, 2015

@jreback jreback added this to the Next Major Release milestone Oct 30, 2015

@jreback jreback added the Enhancement label Oct 30, 2015

shoyer Oct 30, 2015

Member

Yep, putting support for functions in indexing makes sense to me.

max-sixty Feb 3, 2016

Contributor

Would be nice if this also worked on Series (unlike query which currently just works on DFs).

And +1 for @shoyer's suggestion re extending query for this: df.query(lambda df: df.B == 'a'), despite the inability to set those values.

kawochen Feb 3, 2016

Contributor

@MaximilianR I'm using your example from #12226:
What's wrong with the following?

In [15]: pd.Series(range(10)).mul(5).pipe(lambda x: x**2).pipe(lambda x: x-500).pipe(lambda x: x[x>200])
Out[15]:
6     400
7     725
8    1100
9    1525
dtype: int64
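For comparison, a sketch of the same Series chain written with a callable indexer instead of the final `.pipe`, assuming a pandas version where `.loc` accepts callables. `.pow` and `.sub` are used here in place of the first two lambdas; the names `via_pipe` and `via_loc` are illustrative.

```python
import pandas as pd

s = pd.Series(range(10)).mul(5)

# The chain from the comment above, unchanged.
via_pipe = (
    s.pipe(lambda x: x**2)
     .pipe(lambda x: x - 500)
     .pipe(lambda x: x[x > 200])
)

# Same computation using method equivalents and a callable in .loc.
via_loc = s.pow(2).sub(500).loc[lambda x: x > 200]

print(via_loc)
```

Both forms work on a Series, which `.query` does not cover.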

max-sixty Feb 4, 2016

Contributor

@kawochen Ah - that's very nice. Not sure how I missed that. Thanks!


nps added a commit to nps/pandas that referenced this issue May 17, 2016

ENH: Allow where/mask/Indexers to accept callable
closes pandas-dev#12533
closes pandas-dev#11485

Author: sinhrks <sinhrks@gmail.com>

Closes pandas-dev#12539 from sinhrks/where and squashes the following commits:

6b5d618 [sinhrks] ENH: Allow .where to accept callable as condition
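
A short sketch of the `.where`/`.mask` side of that commit, assuming a pandas version that includes it. Unlike boolean indexing, `.where` keeps the original shape and fills non-matching positions with NaN, so a `dropna()` is needed to get the filtered result. The variable names are illustrative.

```python
import pandas as pd

s = pd.Series(range(10)).mul(5).pow(2).sub(500)

# Keep values where the condition holds; other positions become NaN.
kept = s.where(lambda x: x > 200)

# .mask is the inverse: replace values where the condition holds.
masked = s.mask(lambda x: x <= 200)

# Dropping the NaN positions leaves only the matching rows.
filtered = kept.dropna()

print(filtered)
```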