PERF: homogeneous concat #52685

Merged (4 commits) on Apr 19, 2023

jbrockmendel
Member

Using the num_rows=100, num_cols=50_000, num_dfs=7 case from #50652, I'm getting 54.15s on main vs 1.21s on this branch.

@topper-123
Contributor

topper-123 commented Apr 16, 2023

I ran this through the code in #50652 and it looks good, see below.

But this only works for float64, so e.g. won't show a difference in ASV in join_merge.ConcatDataFrames, because that particular perf test is in float32.

Can this not already be generalized to all numpy floats and ints? What happens if you concatenate e.g. int8 and float32?

EDIT: Ok, I see that this depends on take_2d_axis0_{dtype1}_{dtype2}, and there is no int8 -> float32 function specifically, and probably other conversions are missing too. But if we (1) make all the take_2d_axis0_{dtype1}_{dtype2} functions and (2) find the common dtype of the concatenated dataframe, then it should be easy? Though you mention "This will be simpler once JoinUnit.is_na behavior is deprecated", so maybe you have some followup that will make that unnecessary?
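To make step (2) concrete, here is a minimal sketch of finding a common dtype with NumPy's own promotion rules. It uses only `np.result_type`; whether pandas would pick the same dtype inside `concat` in every case is an assumption, and `common_numpy_dtype` is a hypothetical helper name, not a pandas function.

```python
import numpy as np


def common_numpy_dtype(dtypes):
    """Return NumPy's promoted dtype for an iterable of dtypes (hypothetical helper)."""
    return np.result_type(*dtypes)


# int8 values fit losslessly in float32, so the promoted dtype stays float32.
small = common_numpy_dtype([np.dtype("int8"), np.dtype("float32")])

# int64 values do not fit in float32, so NumPy promotes to float64.
wide = common_numpy_dtype([np.dtype("int64"), np.dtype("float32")])

print(small, wide)
```

The second case hints at why the integer combinations are harder: the target dtype depends on the widths of all the inputs, not only on whether they are ints or floats.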

Testcase 1
NUM_ROWS: 100, NUM_COLS: 1000, NUM_DFS: 3
Pandas: 0.01
Manual: 0.01
True
...
Testcase 7
NUM_ROWS: 100, NUM_COLS: 10000, NUM_DFS: 7
Pandas: 0.14
Manual: 0.19
True
Testcase 8
NUM_ROWS: 100, NUM_COLS: 10000, NUM_DFS: 9
Pandas: 0.23
Manual: 0.30
True
...
Testcase 20
NUM_ROWS: 200, NUM_COLS: 10000, NUM_DFS: 9
Pandas: 0.40
Manual: 0.52
True
Testcase 21
NUM_ROWS: 200, NUM_COLS: 50000, NUM_DFS: 3
Pandas: 0.40
Manual: 0.43
True
Testcase 22
NUM_ROWS: 200, NUM_COLS: 50000, NUM_DFS: 5
Pandas: 0.91
Manual: 1.98
True
Testcase 23
NUM_ROWS: 200, NUM_COLS: 50000, NUM_DFS: 7
Pandas: 2.34
Manual: 7.30
True
Testcase 24
NUM_ROWS: 200, NUM_COLS: 50000, NUM_DFS: 9
Pandas: 10.25
Manual: 13.73
True
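
The script from #50652 is not reproduced in this thread, so the following is only a sketch of the kind of comparison the output above reports: time `pd.concat`, time a manual concat built from reindexing plus `np.concatenate`, and check that both give the same frame. The frame construction (float64 data, one shared column set in shuffled order) and the names `make_frames` and `manual_concat` are assumptions for illustration.

```python
import time

import numpy as np
import pandas as pd


def make_frames(num_rows, num_cols, num_dfs, seed=0):
    """Build float64 frames that share one column set in different orders."""
    rng = np.random.default_rng(seed)
    columns = [f"col_{i}" for i in range(num_cols)]
    frames = []
    for _ in range(num_dfs):
        shuffled = list(rng.permutation(columns))
        data = rng.random((num_rows, num_cols))
        frames.append(pd.DataFrame(data, columns=shuffled))
    return frames


def manual_concat(frames):
    """Align every frame to the first frame's columns, then stack the values."""
    columns = frames[0].columns
    aligned = [frame.reindex(columns=columns) for frame in frames]
    values = np.concatenate([frame.to_numpy() for frame in aligned], axis=0)
    index = aligned[0].index.append([frame.index for frame in aligned[1:]])
    return pd.DataFrame(values, index=index, columns=columns)


frames = make_frames(num_rows=100, num_cols=1000, num_dfs=3)

start = time.perf_counter()
expected = pd.concat(frames)
pandas_seconds = time.perf_counter() - start

start = time.perf_counter()
manual = manual_concat(frames)
manual_seconds = time.perf_counter() - start

print(f"Pandas: {pandas_seconds:.2f}")
print(f"Manual: {manual_seconds:.2f}")
print(manual.equals(expected))
```

Timings depend on the machine and the pandas version, so no expected numbers are given here.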

@topper-123 topper-123 added the Performance (memory or execution speed performance) and Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) labels Apr 16, 2023
@topper-123 topper-123 added this to the 2.1 milestone Apr 16, 2023
@jbrockmendel
Member Author

It would be easy to extend this to float32, others would take more effort

@topper-123
Contributor

What if we use pd.core.array_algos.take._take_2d_axis0_dict to match up the different dtypes to the desired common dtype?

@jbrockmendel
Member Author

The trouble is determining what the desired common dtype is.

@topper-123
Contributor

Yes, I can see it now, if the columns are not all the same it gets very complicated for ints.

I think this is good, but we should def. also get this performance boost for float32 IMO.
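
A small illustration of why mismatched columns complicate the integer case: when a column is missing from one frame, `concat` fills it with NaN, so an int64 column can come out as float64 while a column present in every frame stays int64. The frames below are made up for the example.

```python
import pandas as pd

# Two int64 frames that share only column "b".
left = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, dtype="int64")
right = pd.DataFrame({"b": [5, 6], "c": [7, 8]}, dtype="int64")

result = pd.concat([left, right], ignore_index=True)

# "a" and "c" now contain NaN for the rows that lacked them, so they are
# upcast to float64; "b" is complete and keeps its integer dtype.
print(result.dtypes)
```

So the result dtype of each column depends on column alignment as well as on the input dtypes, which is not an issue when every input is already the same float dtype.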

@mroeschke
Member

Also would be good to have a whatsnew note

@@ -91,6 +91,7 @@ Other enhancements
- Improved error message when creating a DataFrame with empty data (0 rows), no index and an incorrect number of columns. (:issue:`52084`)
- Let :meth:`DataFrame.to_feather` accept a non-default :class:`Index` and non-string column names (:issue:`51787`)
- Performance improvement in :func:`read_csv` (:issue:`52632`) with ``engine="c"``
- Performance improvement in :func:`concat` with homogeneous dtypes (:issue:`52685`)
Contributor

homogeneous dtypes -> homogeneous float dtypes.

@@ -200,6 +202,21 @@ def concatenate_managers(
    if concat_axis == 0:
        return _concat_managers_axis0(mgrs_indexers, axes, copy)

    if len(mgrs_indexers) > 0 and mgrs_indexers[0][0].nblocks > 0:
        first_dtype = mgrs_indexers[0][0].blocks[0].dtype
Contributor

Any chance to do an upcast if we have one/some float64 and one/some float32? That would generalize this fastpath to cover all floats and not make a distinction between float32/float64.

Member Author

that would change behavior in some cases

@jbrockmendel
Member Author

whatsnew added, float32 handled, + green

@mroeschke mroeschke merged commit 4fef063 into pandas-dev:main Apr 19, 2023
@mroeschke
Member

Thanks @jbrockmendel

@topper-123
Contributor

topper-123 commented Apr 26, 2023

Hey, this PR caused a rather big slowdown on C-ordered ndarrays; see also the discussion in #52786:

>>> frame_c = pd.DataFrame(np.zeros((10000, 200), dtype=np.float32, order="C"))
>>> %timeit pd.concat([frame_c] * 20, axis=0, ignore_index=False)
45.1 ms ± 166 µs per loop  # after this PR
13.2 ms ± 126 µs per loop  # before this PR

@jbrockmendel
Member Author

jbrockmendel commented Apr 26, 2023 via email

@jbrockmendel
Member Author

@phofl I have what I think should address this, but since I can't reproduce the slowdown I also can't check whether it works. Can you try adding the following at the top of _concat_homogeneous_fastpath:

    if all(not indexers for _, indexers in mgrs_indexers):
        # https://github.com/pandas-dev/pandas/pull/52685#issuecomment-1523287739
        arrs = [mgr.blocks[0].values.T for mgr, _ in mgrs_indexers]
        arr = np.concatenate(arrs).T
        bp = libinternals.BlockPlacement(slice(shape[0]))
        nb = new_block_2d(arr, bp)
        return nb
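
The transpose trick in that snippet can be checked with plain NumPy. This sketch assumes 2D block values are laid out as (n_columns, n_rows), so stacking frames row-wise means joining block values along axis 1; `concat_block_values` is a hypothetical stand-in, not the pandas function.

```python
import numpy as np


def concat_block_values(block_values):
    """Transpose to (n_rows, n_columns), stack the rows, transpose back."""
    return np.concatenate([values.T for values in block_values]).T


# Three float32 "blocks", each with 2 columns and 3 rows (assumed layout).
blocks = [
    np.arange(6, dtype="float32").reshape(2, 3) + 10 * i for i in range(3)
]

result = concat_block_values(blocks)
expected = np.concatenate(blocks, axis=1)

print(result.shape)
print(np.array_equal(result, expected))
```

Both forms produce the same values. The difference that matters for the reported slowdown would be how the copy traverses memory, which this sketch does not measure.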

@phofl
Member

phofl commented Jun 20, 2023

Yep that works! Thx for looking into it.

Labels
Performance (memory or execution speed performance), Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)
Development

Successfully merging this pull request may close these issues.

PERF: concat slow, manual concat through reindexing enhances performance
4 participants