PERF: assert_frame_equal and assert_series_equal for frames/series with a MultiIndex #55949

Merged
merged 3 commits into from Nov 14, 2023

lukemanley
Member

import pandas as pd

N = 500

dates = pd.date_range("2000-01-01", periods=N)
strings = pd._testing.makeStringIndex(N)

mi = pd.MultiIndex.from_product([dates, strings])

df1 = pd.DataFrame({"a": 1}, index=mi)
df2 = df1.copy(deep=True)

%timeit pd.testing.assert_frame_equal(df1, df2)

# 4.27 s ± 245 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)    -> main
# 10 ms ± 415 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  -> PR
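For readers without IPython, the benchmark above can be approximated as a plain script using the standard `timeit` module. This is only a sketch: `build_frames` and `time_assert_frame_equal` are hypothetical helper names, and the formatted string labels are an assumed stand-in for the private `pd._testing.makeStringIndex`, so absolute timings will not match the numbers quoted above.

```python
import timeit

import pandas as pd


def build_frames(n: int):
    """Build two equal frames indexed by an n x n MultiIndex (hypothetical helper)."""
    dates = pd.date_range("2000-01-01", periods=n)
    # Assumption: plain formatted strings stand in for pd._testing.makeStringIndex.
    strings = [f"s{i:05d}" for i in range(n)]
    mi = pd.MultiIndex.from_product([dates, strings])
    df1 = pd.DataFrame({"a": 1}, index=mi)
    return df1, df1.copy(deep=True)


def time_assert_frame_equal(n: int = 500, repeat: int = 3) -> float:
    """Return the best wall-clock time in seconds for one assert_frame_equal call."""
    df1, df2 = build_frames(n)
    timings = timeit.repeat(
        lambda: pd.testing.assert_frame_equal(df1, df2),
        number=1,
        repeat=repeat,
    )
    return min(timings)


if __name__ == "__main__":
    print(f"best of 3: {time_assert_frame_equal():.4f} s")
```

Running the script on a checkout of `main` and on the PR branch should show the same kind of gap as the `%timeit` output, though the exact ratio depends on the machine and the pandas build.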

@lukemanley lukemanley added Testing pandas testing functions or related to the test suite Performance Memory or execution speed performance labels Nov 13, 2023
@lukemanley lukemanley added this to the 2.2 milestone Nov 13, 2023
@mroeschke mroeschke merged commit 00d88e9 into pandas-dev:main Nov 14, 2023
40 checks passed
@mroeschke
Member

Nice! Thanks @lukemanley

@jbrockmendel
Member

nice!

)
assert_numpy_array_equal(left.codes[level], right.codes[level])
except AssertionError:
# cannot use get_level_values here because it can change dtype
Member

how does it change dtype?

Member Author

That's an old comment and I'm not sure it's valid anymore. Opened #55971 to remove it.
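The diff fragment under review compares each level's integer codes first and only falls back to a slower comparison when that check fails. A minimal sketch of that idea using public pandas attributes follows. `levels_match` is a hypothetical helper, not the function changed in this PR, and it returns a boolean instead of raising `AssertionError`.

```python
import numpy as np
import pandas as pd


def levels_match(left: pd.MultiIndex, right: pd.MultiIndex) -> bool:
    """Hypothetical helper: compare two MultiIndexes one level at a time."""
    if left.nlevels != right.nlevels or len(left) != len(right):
        return False
    for level in range(left.nlevels):
        # Fast path: same unique level values and same integer codes.
        # This avoids materializing one label per row.
        same_levels = left.levels[level].equals(right.levels[level])
        same_codes = np.array_equal(left.codes[level], right.codes[level])
        if same_levels and same_codes:
            continue
        # Slow path: the encodings differ, so compare the expanded values.
        left_values = left.get_level_values(level)
        right_values = right.get_level_values(level)
        if not left_values.equals(right_values):
            return False
    return True
```

The slow path is still needed because two indexes can hold the same labels with different encodings, for example when one of them carries an unused entry in `levels`.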

@@ -576,6 +591,9 @@ def raise_assert_detail(

{message}"""

    if isinstance(index_values, Index):
        index_values = np.array(index_values)
Member

array->asarray can avoid a copy

Member Author

opened #55971 to update
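The `array` versus `asarray` suggestion can be illustrated with plain NumPy. This small sketch uses an ordinary `ndarray` as input. Whether converting a pandas `Index` avoids a copy depends on the index's dtype and backing storage, so treat the `Index` case as an assumption to verify.

```python
import numpy as np

values = np.arange(5)

# np.array copies its input by default (copy=True).
copied = np.array(values)

# np.asarray returns the input itself when it is already an ndarray
# and no dtype conversion is requested.
passed_through = np.asarray(values)

print(np.shares_memory(values, copied))  # False
print(passed_through is values)          # True
```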

@lukemanley lukemanley deleted the perf-assert-frame-equal branch November 16, 2023 12:56
3 participants