Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX-#5097: Stop using deprecated mangle_dup_cols. #5104

Merged
merged 4 commits into from
Oct 10, 2022

Conversation

mvashishtha
Copy link
Collaborator

Signed-off-by: mvashishtha mahesh@ponder.io

What do these changes do?

  • commit message follows format outlined here
  • passes flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
  • passes black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
  • signed commit with git commit -s
  • Resolves BUG: Warning: Reading in CSV - Deprecation Warning 'mangle_dupe_cols' #5097
  • tests added and passing
  • module layout described at docs/development/architecture.rst is up-to-date

Signed-off-by: mvashishtha <mahesh@ponder.io>
@mvashishtha mvashishtha requested a review from a team as a code owner October 7, 2022 05:58
@codecov
Copy link

codecov bot commented Oct 7, 2022

Codecov Report

Merging #5104 (150bee5) into master (0215a13) will increase coverage by 2.82%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #5104      +/-   ##
==========================================
+ Coverage   84.56%   87.39%   +2.82%     
==========================================
  Files         256      257       +1     
  Lines       19346    19632     +286     
==========================================
+ Hits        16360    17157     +797     
+ Misses       2986     2475     -511     
Impacted Files Coverage Δ
modin/_compat/core/latest/base_io.py 100.00% <ø> (ø)
modin/_compat/pandas_api/latest/io.py 93.04% <100.00%> (+0.18%) ⬆️
modin/logging/config.py 94.59% <0.00%> (-1.30%) ⬇️
modin/experimental/cloud/rpyc_proxy.py 0.00% <0.00%> (ø)
modin/experimental/batch/test/test_pipeline.py 90.21% <0.00%> (ø)
modin/pandas/series.py 94.51% <0.00%> (+0.23%) ⬆️
modin/pandas/base.py 95.26% <0.00%> (+0.26%) ⬆️
...odin/core/storage_formats/pandas/query_compiler.py 96.33% <0.00%> (+0.39%) ⬆️
modin/pandas/series_utils.py 98.89% <0.00%> (+0.55%) ⬆️
modin/core/io/text/excel_dispatcher.py 94.01% <0.00%> (+0.85%) ⬆️
... and 31 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@vnlitvinov
Copy link
Collaborator

@mvashishtha CI is failing, and failures seem to be relevant to PR title, please have a look

RehanSD
RehanSD previously approved these changes Oct 7, 2022
Copy link
Collaborator

@RehanSD RehanSD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Let's merge once CI is green!

Signed-off-by: mvashishtha <mahesh@ponder.io>
@mvashishtha
Copy link
Collaborator Author

CI is failing, and failures seem to be relevant to PR title, please have a look

@vnlitvinov I think I fixed the failures, which were for HDK and MODIN_ENGINE=PYTHON. Let's see whether my fixes work in CI.

modin/pandas/test/test_io.py Show resolved Hide resolved
@@ -149,6 +149,9 @@ def read_csv(
val.name for val in inspect.signature(pandas.read_csv).parameters.values()
}
_, _, _, f_locals = inspect.getargvalues(inspect.currentframe())
# mangle_dupe_cols has no effect starting in pandas 1.5. Exclude it from
# kwargs so pandas doesn't spuriously warn people not to use it.
f_locals.pop("mangle_dupe_cols", None)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a couple of other values that are also deprecated for read_csv. See: https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html . Perhaps we can address them in another issue

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was surprised to find we weren't getting warnings for prefix and squeeze even though both were deprecated. It turns out we pop squeeze here:

squeeze = kwargs.pop("squeeze", False)

and passing the default of prefix doesn't trigger the warning. not sure why.

import pandas as pd
from io import StringIO
from pandas.api.extensions import no_default

# no warning
print(pd.read_csv(StringIO("a,b,c\n1,2,3"), header=None, prefix=no_default))
# warns
print(pd.read_csv(StringIO("a,b,c\n1,2,3"), header=None, prefix='z'))

Signed-off-by: mvashishtha <mahesh@ponder.io>
pyrito
pyrito previously approved these changes Oct 7, 2022
Copy link
Collaborator

@pyrito pyrito left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

pytest.warns(
FutureWarning, match="'mangle_dupe_cols' keyword is deprecated"
)
if version.parse(pandas.__version__) >= version.parse("1.5.0")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason to not use the PandasCompatVersion class?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope. I've reverted back to PandasCompatVersion for consistency with the other tests.

Signed-off-by: mvashishtha <mahesh@ponder.io>
Copy link
Collaborator

@RehanSD RehanSD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - but a minor question - should Modin handle processing so that the mangle_dupe_cols argument works the same in compat mode? That way folks who expect pandas 1.1 in compat mode will get the experience they expect

@mvashishtha
Copy link
Collaborator Author

should Modin handle processing so that the mangle_dupe_cols argument works the same in compat mode?

@RehanSD mangle_dupe_cols should work in compat mode. The new warning this PR adds to test_read_csv_mangle_dupe_cols should verify that.

@RehanSD
Copy link
Collaborator

RehanSD commented Oct 7, 2022

should Modin handle processing so that the mangle_dupe_cols argument works the same in compat mode?

@RehanSD mangle_dupe_cols should work in compat mode. The new warning this PR adds to test_read_csv_mangle_dupe_cols should verify that.

Ah mb - I forgot that users will have the old version of pandas installed in compat mode!

Copy link
Collaborator

@vnlitvinov vnlitvinov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@vnlitvinov vnlitvinov merged commit d005429 into modin-project:master Oct 10, 2022
RehanSD pushed a commit to RehanSD/modin that referenced this pull request Oct 10, 2022
RehanSD pushed a commit that referenced this pull request Oct 10, 2022
Signed-off-by: mvashishtha <mahesh@ponder.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Warning: Reading in CSV - Deprecation Warning 'mangle_dupe_cols'
4 participants