Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Add type check for encoding_errors in pd.read_csv #59075

Merged
merged 19 commits into from
Jun 27, 2024

Conversation

Aliebc
Copy link
Contributor

@Aliebc Aliebc commented Jun 23, 2024

@@ -1413,6 +1413,13 @@ def _make_engine(
raise ValueError(
f"Unknown engine: {engine} (valid options are {mapping.keys()})"
)

errors = self.options.get("encoding_errors", "strict")
if not isinstance(errors, str) and errors is not None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if not isinstance(errors, str) and errors is not None:
if not isinstance(errors, str):

@@ -1413,6 +1413,13 @@ def _make_engine(
raise ValueError(
f"Unknown engine: {engine} (valid options are {mapping.keys()})"
)

errors = self.options.get("encoding_errors", "strict")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you move this logic into _read?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I have moved it into _read

Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a whatsnew note in v3.0.0.rst and a unit test?

@mroeschke mroeschke added Error Reporting Incorrect or improved errors from pandas IO CSV read_csv, to_csv labels Jun 24, 2024
@Aliebc Aliebc requested a review from mroeschke June 25, 2024 13:24
Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a unit test?

Aliebc and others added 6 commits June 26, 2024 01:25
Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
@Aliebc Aliebc requested a review from mroeschke June 27, 2024 02:00
@mroeschke
Copy link
Member

Could you add a unit test to validate the exception message?

@Aliebc
Copy link
Contributor Author

Aliebc commented Jun 27, 2024

Could you add a unit test to validate the exception message?

Ok, I added a test.

@@ -590,6 +590,24 @@ def test_encoding_errors(encoding_errors, format):
tm.assert_frame_equal(df, expected)


@pytest.mark.parametrize("encoding_errors", [0, None, "strict"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need to test "strict" here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

Aliebc and others added 4 commits June 27, 2024 23:16
Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
@Aliebc Aliebc requested a review from mroeschke June 27, 2024 16:25
@mroeschke mroeschke added this to the 3.0 milestone Jun 27, 2024
@mroeschke mroeschke merged commit 10d3615 into pandas-dev:main Jun 27, 2024
45 checks passed
@mroeschke
Copy link
Member

Thanks @Aliebc

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Error Reporting Incorrect or improved errors from pandas IO CSV read_csv, to_csv
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Pandas does not validate some parameters properly when reading CSVs and it causes segmentation faults
2 participants