FIX-#6911: Remove unidist specific workaround in '.from_pandas()' #6912

Merged
merged 1 commit into from
Feb 5, 2024
@@ -869,14 +869,7 @@ def split_pandas_df_into_partitions(
 put_func = cls._partition_class.put
 # even a full-axis slice can cost something (https://github.com/pandas-dev/pandas/issues/55202)
 # so we try not to do it if unnecessary.
-# FIXME: it appears that this optimization doesn't work for Unidist correctly as it
-# doesn't explicitly copy the data when putting it into storage (as the rest engines do)
-# causing it to eventually share memory with a pandas object that was provided by user.
-# Everything works fine if we do this column slicing as pandas then would set some flags
-# to perform in COW mode apparently (and so it wouldn't crash our tests).
-# @YarShev promised that this will be eventually fixed on Unidist's side, but for now there's
-# this hacky condition
-if col_chunksize >= len(df.columns) and Engine.get() != "Unidist":
+if col_chunksize >= len(df.columns):
     col_parts = [df]
 else:
     col_parts = [
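The hunk above is cut off at `col_parts = [`, so the slicing branch is not visible here. As a rough, standalone sketch of the idea the comments describe (skip the full-axis slice when one chunk already covers every column), the following uses plain pandas. The helper name `split_columns` and the `iloc`-based slicing are assumptions made for illustration and are not taken from Modin's source.

```python
import pandas as pd


def split_columns(df: pd.DataFrame, col_chunksize: int) -> list:
    """Hypothetical helper: split ``df`` into chunks of at most ``col_chunksize`` columns."""
    if col_chunksize >= len(df.columns):
        # A single chunk covers every column, so reuse the frame as-is
        # and avoid paying for a full-axis slice.
        return [df]
    # Assumed slicing scheme: consecutive positional column ranges.
    return [
        df.iloc[:, start : start + col_chunksize]
        for start in range(0, len(df.columns), col_chunksize)
    ]


df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
whole = split_columns(df, col_chunksize=3)  # fast path: the original object is reused
parts = split_columns(df, col_chunksize=2)  # slow path: two column slices
```

Note that the fast path hands back the caller's own object, so whatever stores it afterwards has to copy the data if it must stay independent of the user's frame.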
12 changes: 12 additions & 0 deletions modin/pandas/test/test_general.py
@@ -971,3 +971,15 @@ def make_frame(lib):
 def test_get(key):
     modin_df, pandas_df = create_test_dfs({"col0": [0, 1]})
     eval_general(modin_df, pandas_df, lambda df: df.get(key))
+
+
+def test_df_immutability():
+    """
+    Verify that modifications of the source data don't propagate to Modin's DataFrame objects.
+    """
+    src_data = pandas.DataFrame({"a": [1]})
+
+    md_df = pd.DataFrame(src_data)
+    src_data.iloc[0, 0] = 100
+
+    assert md_df._to_pandas().iloc[0, 0] == 1
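The property checked by `test_df_immutability` can be illustrated without Modin. The sketch below uses plain pandas only and contrasts keeping a reference to the user's frame with storing an explicit copy. The two variables are stand-ins for a storage layer and are assumptions for illustration; they do not show how Modin or unidist actually store partitions.

```python
import pandas as pd

src = pd.DataFrame({"a": [1]})

# Stand-in for storage that keeps the caller's object without copying it.
stored_by_reference = src
# Stand-in for storage that copies on put (DataFrame.copy is deep by default).
stored_by_copy = src.copy()

# The user mutates the source frame after handing it over.
src.iloc[0, 0] = 100

leaked = int(stored_by_reference.iloc[0, 0])   # sees the user's later write
isolated = int(stored_by_copy.iloc[0, 0])      # keeps the original value
```

Only the copied variant behaves the way the new test expects of a Modin DataFrame built from `src_data`.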