


@rhshadrach commented Nov 19, 2025

I'm seeing a minor slowdown because the code no longer assumes the mask is C-contiguous. An alternative would be to ensure the mask is always C-contiguous. I'll post ASVs in a day or so.
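A mask is C-contiguous when its rows sit back to back in memory (row-major order). The sketch below uses plain NumPy with a hypothetical 2D boolean mask to show how a transpose produces a mask that is not C-contiguous, and what the alternative of forcing C order would cost. It assumes, rather than confirms, that the failing pandas kernel requires a C-contiguous buffer, as the `ndarray is not C-contiguous` error in the linked issue suggests.

```python
import numpy as np

# Hypothetical 2D mask (e.g. one row per column of a block).
# A freshly allocated array is C-contiguous (row-major) by default.
mask = np.zeros((4, 1_000), dtype=bool)
print(mask.flags["C_CONTIGUOUS"])  # True

# Transposing only swaps the strides and copies no data, so the result is
# F-contiguous but no longer C-contiguous. Code that requests a C-contiguous
# buffer (e.g. a Cython memoryview declared as `uint8_t[:, ::1]`) would
# reject this array.
mask_t = mask.T
print(mask_t.flags["C_CONTIGUOUS"], mask_t.flags["F_CONTIGUOUS"])  # False True

# The alternative: normalise the layout up front. np.ascontiguousarray copies
# only when needed, so downstream code can keep assuming C order, at the
# price of an extra allocation for non-contiguous input.
mask_c = np.ascontiguousarray(mask_t)
print(mask_c.flags["C_CONTIGUOUS"], np.shares_memory(mask_c, mask_t))  # True False

# Already C-contiguous input is passed through without a copy.
print(np.shares_memory(np.ascontiguousarray(mask), mask))  # True
```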

```python
# Run in IPython/Jupyter: %timeit is an IPython magic.
import numpy as np
import pandas as pd

size = 10_000_000
df = pd.DataFrame(
    {
        'a': np.random.randint(0, 100, size),
        'b': np.random.randint(0, 100, size),
        'c': np.random.randint(0, 100, size),
        'd': np.random.randint(0, 100, size),
    },
    dtype='Int64',
)
gb = df.groupby('a')
gb.cummax()  # populate caches
%timeit gb.cummax()
# 155 ms ± 52.4 μs per loop (mean ± std. dev. of 7 runs, 10 loops each) <--- main
# 160 ms ± 1.89 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) <--- PR
```

ASVs are clean.

```shell
asv compare -f 1.05 --only-changed 49f4a94a 19df340e
```

| Change   | Before [49f4a94a] <main>   | After [19df340e] <bug_gb_c_contiguous>   |   Ratio | Benchmark (Parameter)                                             |
|----------|----------------------------|------------------------------------------|---------|-------------------------------------------------------------------|
| -        | 24.8±0.1ms                 | 22.4±0.2ms                               |    0.91 | groupby.GroupByCythonAggEaDtypes.time_frame_agg('Float64', 'min') |
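`asv compare -f 1.05` reports a benchmark only when its timing changes by more than a factor of 1.05 in either direction. Below is a rough model of that rule (`classify` is a hypothetical helper, and asv also applies a statistical-significance check that this sketch omits), applied to the table row and to the manual `cummax` timings.

```python
def classify(before: float, after: float, factor: float = 1.05) -> str:
    """Rough model of `asv compare -f FACTOR` flagging (hypothetical helper)."""
    ratio = after / before
    if ratio > factor:
        return "+"  # slower by more than the factor
    if ratio < 1 / factor:
        return "-"  # faster by more than the factor
    return ""      # within the threshold, hidden by --only-changed

# Table row (ms). asv presumably computes the ratio from unrounded timings,
# so the rounded display values give 0.9 rather than the reported 0.91.
print(round(22.4 / 24.8, 2), classify(24.8, 22.4))  # 0.9 -

# Manual cummax timings (ms): about a 3% slowdown, below the 5% threshold.
print(round(160 / 155, 3), repr(classify(155, 160)))  # 1.032 ''
```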

@rhshadrach rhshadrach marked this pull request as ready for review November 20, 2025 12:10
@jbrockmendel jbrockmendel merged commit 7bf6660 into pandas-dev:main Nov 20, 2025
39 of 41 checks passed
@jbrockmendel

thanks @rhshadrach

@rhshadrach rhshadrach deleted the bug_gb_c_contiguous branch November 20, 2025 20:00


Linked issue: BUG: ValueError: ndarray is not C-contiguous for cummax on nullable dtypes
