Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PERF: Faster transposition of frames with masked arrays #52836

Merged
merged 21 commits into from Jul 16, 2023

Conversation

topper-123
Copy link
Contributor

@topper-123 topper-123 commented Apr 21, 2023

Faster transpose of dataframes with homogenous masked arrays. This also helps when doing reductions with axis=1 as those currently transpose data before doing reductions.

Performance example:

In [1]:         import pandas as pd, numpy as np
   ...:         values = np.random.randn(100000, 4)
   ...:         values = values.astype(int)
   ...:         df = pd.DataFrame(values).astype("Int64")
>>> %timeit df.transpose()
1.91 s ± 3.08 ms per loop  # main
563 ms ± 2.48 ms per loop  # this PR

Running asv continuous -f 1.1 upstream/main HEAD -b reshape.ReshapeMaskedArrayDtype.time_transpose:

       before           after         ratio
     [c3f0aac1]       [43162428]
     <master>         <cln_managers>
-     13.5±0.06ms      1.82±0.01ms     0.14  reshape.ReshapeMaskedArrayDtype.time_transpose('Float64')
-     13.8±0.03ms      1.83±0.01ms     0.13  reshape.ReshapeMaskedArrayDtype.time_transpose('Int64')

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

There may be possible to improve performance when frames have a common masked dtype, I intend to look into that in a followup.

@jorisvandenbossche
Copy link
Member

@topper-123 have you seen #52083 and #52689?
I think it is generally the same idea (directly accessing _data and _mask and creating the transposed versions of those separately, and then faster reconstructing the MaskedArrays from that), but a bit differently organized.

@rhshadrach rhshadrach added Reshaping Concat, Merge/Join, Stack/Unstack, Explode NA - MaskedArrays Related to pd.NA and nullable extension arrays Performance Memory or execution speed performance labels Apr 22, 2023
@rhshadrach
Copy link
Member

Using the timings in the OP, I'm seeing

625 ms ± 8.04 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)  # <-- This PR
733 ms ± 3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) # <-- #52689

I haven't yet looked into the implementation to see where this difference is coming from. I'm happy to go with this implementation, would like to see if we can get this in 2.0.2.

@topper-123
Copy link
Contributor Author

I've merged the latest changes to the main branch into this in case this is he one we go with.

Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -1414,3 +1414,32 @@ def _groupby_op(
# res_values should already have the correct dtype, we just need to
# wrap in a MaskedArray
return self._maybe_mask_result(res_values, result_mask)


def transpose_homogenous_masked_arrays(masked_arrays):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

annotation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

transposed_values = np.empty(transposed_shape, dtype=values[0].dtype)
for i, val in enumerate(values):
transposed_values[i, :] = val
transposed_values = transposed_values.copy(order="F")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is faster than np.concatenate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's about the same. I'll change to use np.concatenate, it's fewer lines.

if isinstance(self._mgr, ArrayManager):
masked_arrays = self._mgr.arrays
else:
masked_arrays = [blk.values for blk in self._mgr.blocks]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be safer to use iter_column_arrays? this might mess up if blocks get shuffled in a weird order

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed it.

@topper-123 topper-123 force-pushed the faster_masked_transpose branch 3 times, most recently from 0a878f4 to f36143c Compare May 9, 2023 06:29
@topper-123
Copy link
Contributor Author

ping...

@rhshadrach
Copy link
Member

@topper-123 - have a conflict. @jbrockmendel - friendly ping.

topper-123 and others added 12 commits May 27, 2023 08:35
* Improve performance when selecting rows and columns

* Update doc/source/whatsnew/v2.1.0.rst

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>

* Update indexing.py

* Update v2.1.0.rst

---------

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
…3088)

* PERF: Improve performance when accessing GroupBy.groups

* Update v2.1.0.rst

* Fix
* Improve performance when selecting rows and columns

* Update doc/source/whatsnew/v2.1.0.rst

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>

* Update indexing.py

* Update v2.1.0.rst

---------

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
phofl and others added 2 commits May 27, 2023 08:37
…3088)

* PERF: Improve performance when accessing GroupBy.groups

* Update v2.1.0.rst

* Fix
@topper-123
Copy link
Contributor Author

Ping.

@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Jul 12, 2023
@topper-123
Copy link
Contributor Author

The PR has gone quite stale, I think it's best to close this PR, unless it can be approved for merging.

cc @jbrockmendel , @rhshadrach .

Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm; two minor requests and a conflict and I think we're good here.

pandas/core/arrays/masked.py Outdated Show resolved Hide resolved
pandas/core/frame.py Outdated Show resolved Hide resolved
@topper-123
Copy link
Contributor Author

I've updated.

Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@rhshadrach rhshadrach merged commit f1211e7 into pandas-dev:main Jul 16, 2023
36 checks passed
@rhshadrach rhshadrach added this to the 2.1 milestone Jul 16, 2023
@rhshadrach
Copy link
Member

Thanks @topper-123

@topper-123 topper-123 deleted the faster_masked_transpose branch July 16, 2023 20:26
@topper-123
Copy link
Contributor Author

topper-123 commented Jul 16, 2023

Thanks.

In total, after the other related PRs have been merged, we have:

n [1]:         import pandas as pd, numpy as np
   ...:         values = np.random.randn(100000, 4)
   ...:         values = values.astype(int)
   ...:         df = pd.DataFrame(values).astype("Int64")
>>> %timeit df.transpose()
1.91 s ± 3.08 ms per loop  # main at start
563 ms ± 2.48 ms per loop  # this PR, originally
375 ms ± 5.47 ms per loop  # this PR, now

An improvement, but very slow still...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NA - MaskedArrays Related to pd.NA and nullable extension arrays Performance Memory or execution speed performance Reshaping Concat, Merge/Join, Stack/Unstack, Explode Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants