-
-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PERF: homogeneous concat #52685
PERF: homogeneous concat #52685
Conversation
I ran this through the code in #50652 and it looks good, see below. But this only works for float64, so e.g. won't show a difference in ASV in Can this not already be generalized all numpy floats and ints? What happens if you concatenate e.g. int8 and float32? EDT: Ok, I see that this depends on
|
It would be easy to extend this to float32, others would take more effort |
If we use |
The trouble is determining what the desired common dtype is. |
Yes, I can see it now, if the columns are not all the same it gets very complicated for ints. I think this is good, but we should def. also get this performance boost for float32 IMO. |
Also would be good to have a whatsnew note |
doc/source/whatsnew/v2.1.0.rst
Outdated
@@ -91,6 +91,7 @@ Other enhancements | |||
- Improved error message when creating a DataFrame with empty data (0 rows), no index and an incorrect number of columns. (:issue:`52084`) | |||
- Let :meth:`DataFrame.to_feather` accept a non-default :class:`Index` and non-string column names (:issue:`51787`) | |||
- Performance improvement in :func:`read_csv` (:issue:`52632`) with ``engine="c"`` | |||
- Performance improvement in :func:`concat` with homogeneous dtypes (:issue:`52685`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
homogeneous dtypes
-> homogeneous float dtypes
.
@@ -200,6 +202,21 @@ def concatenate_managers( | |||
if concat_axis == 0: | |||
return _concat_managers_axis0(mgrs_indexers, axes, copy) | |||
|
|||
if len(mgrs_indexers) > 0 and mgrs_indexers[0][0].nblocks > 0: | |||
first_dtype = mgrs_indexers[0][0].blocks[0].dtype |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any change to do an upcast if we have one/some float64 and one/some float32? That would generalize this pfastpath to covers all floats and not make a distinction between float32/float64.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that would change behavior in some cases
whatsnew added, float32 handled, + green |
Thanks @jbrockmendel |
Hey, This PR caused a rather big slowdown on c-aligned ndarrays, see also discussion in #52786: >>> frame_c = pd.DataFrame(np.zeros((10000, 200), dtype=np.float32, order="C"))
>>> %timeit pd.concat([frame_c] * 20, axis=0, ignore_index=False)
45.1 ms ± 166 µs per loop # after this PR
13.2 ms ± 126 µs per loop # before this PR |
I’m out of town this week, will take a look next week
…On Wed, Apr 26, 2023 at 4:52 AM Terji Petersen ***@***.***> wrote:
Hey, This PR caused a rather big slowdown on c-aligned ndarrays, see also
discussion in #52786 <#52786>:
>>> frame_c = pd.DataFrame(np.zeros((10000, 200), dtype=np.float32, order="C"))>>> %timeit pd.concat([frame_c] * 20, axis=0, ignore_index=False)45.1 ms ± 166 µs per loop # after this PR132. ms ± 126 µs per loop # before this PR
—
Reply to this email directly, view it on GitHub
<#52685 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AB5UM6BTL6HJL5UKJPB6X6LXDED7XANCNFSM6AAAAAAW7S3GE4>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
@phofl i have what i think should address this, but since i cant reproduce the slowdown also can't check if it works. can you try adding the following at the top of _concat_homogeneous_fastpath
|
Yep that works! Thx for looking into it. |
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.Using the num_rows=100, num_cols=50_000, num_dfs=7 case from #50652, I'm getting 54.15s on main vs 1.21s on this branch.