PERF: DataFrame groupby with fast transform #12737

Closed
jreback opened this issue Mar 29, 2016 · 2 comments
Labels
Enhancement Groupby Performance Memory or execution speed performance

@jreback

jreback commented Mar 29, 2016

From a Stack Overflow question:

import pandas as pd
import numpy as np

df = pd.DataFrame({'group': np.repeat(np.arange(1000), 10),
                   'B': np.nan,
                   'C': np.nan})

df.loc[4::10, 'B':'C'] = 5  # only the 5th row (position 4) of each group is non-null

df.groupby('group').transform('first')

This ends up iterating over the groups. The last change to this code path that I can find is: here. My recollection is that the iteration was ONLY supposed to be hit in a special case, and that the general case is simply a repeat of the aggregated result based on the indices.

Instead, the slow path seems to be hit in all cases, which makes transform super SLOW again.
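To make the "repeat based on the indices" idea concrete, here is a minimal sketch of what such a fast path could look like from user code. It is not the pandas internals; `transform_by_repeat` is a hypothetical helper name, and it assumes the grouping key has no null values (otherwise the positional lookup would return -1 for those rows).

```python
import numpy as np
import pandas as pd


def transform_by_repeat(df, key, how="first"):
    """Hypothetical helper: aggregate once per group, then repeat the
    aggregated rows back onto the original index (no per-group loop)."""
    grouped = df.groupby(key)
    agg = getattr(grouped, how)()            # one row per group, indexed by key
    # Position of each original row's group within the aggregated frame.
    # Assumption: no null keys, so every position is >= 0.
    codes = agg.index.get_indexer(df[key])
    out = agg.take(codes)                    # repeat aggregated rows
    out.index = df.index                     # restore the original row labels
    return out


df = pd.DataFrame({'group': np.repeat(np.arange(1000), 10),
                   'B': np.nan,
                   'C': np.nan})
# Same data as the example above: one non-null row per group.
df.loc[df.index % 10 == 4, ['B', 'C']] = 5

fast = transform_by_repeat(df, 'group', 'first')
```

The result has one row per original row and only the non-key columns, which is the same shape that `df.groupby('group').transform('first')` returns.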

@jreback jreback added Prio-high Groupby Performance Memory or execution speed performance labels Mar 29, 2016
@jreback jreback added this to the 0.18.1 milestone Mar 29, 2016
@jreback

jreback commented Mar 29, 2016

cc @ajcr
cc @chris-b1
cc @evanpw

@jreback

jreback commented Mar 29, 2016

So we have a fast path for Series transforms, but not for DataFrame transforms.

In [17]: result1 = g.transform('first')

In [18]: result2 = pd.concat([g.B.transform('first'), g.C.transform('first')], keys=['B','C'], axis=1)

In [19]: result1.equals(result2)
Out[19]: True

In [20]: %timeit g.transform('first')
10 loops, best of 3: 170 ms per loop

In [21]: %timeit pd.concat([g.B.transform('first'), g.C.transform('first')], keys=['B','C'], axis=1)
1000 loops, best of 3: 2 ms per loop
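Until the DataFrame path is fixed, the comparison above suggests a workaround: apply the Series transform column by column and concatenate. The sketch below wraps that in a hypothetical helper, `transform_columnwise` (not a pandas function), and assumes a single grouping column named by `key`.

```python
import numpy as np
import pandas as pd


def transform_columnwise(df, key, how):
    """Hypothetical workaround: use the Series transform for each
    non-key column, then stitch the results back into a DataFrame."""
    g = df.groupby(key)
    cols = [c for c in df.columns if c != key]
    return pd.concat([g[c].transform(how) for c in cols],
                     axis=1, keys=cols)


df = pd.DataFrame({'group': np.repeat(np.arange(1000), 10),
                   'B': np.nan,
                   'C': np.nan})
df.loc[df.index % 10 == 4, ['B', 'C']] = 5

result = transform_columnwise(df, 'group', 'first')
expected = df.groupby('group').transform('first')
```

Here `result` and `expected` should hold the same values; only the route taken to compute them differs.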

@jreback jreback changed the title PERF: regression in transform perf PERF: DataFrame groupby with fast transform Mar 29, 2016
@jreback jreback modified the milestones: 0.18.2, 0.18.1 Apr 26, 2016