Skip to content

Conversation

twoertwein
Copy link
Member

@twoertwein twoertwein commented Jan 6, 2021

Should 1.3 use errors='replace' when no encoding/errors are specified or use errors=None (strict)?

@phofl
Copy link
Member

phofl commented Jan 6, 2021

I would say replace only when encoding is Not set and inferred to utf-8.

This should go back to 1.2.x

@simonjayhawkins simonjayhawkins added the IO CSV read_csv, to_csv label Jan 6, 2021
@simonjayhawkins simonjayhawkins added this to the 1.2.1 milestone Jan 6, 2021
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests!

@twoertwein
Copy link
Member Author

Should a follow-up PR intentionally revert this PR for 1.3? Personally, I would want to know when read_csv fails to read certain characters (errors = None by default).

@phofl
Copy link
Member

phofl commented Jan 7, 2021

I don't think that content from skipped rows should matter and it should raise no error if the rest of the file is ok. Maybe you could add a test withouth skiprows, this raises on 1.1.5 which is good and should keep raising

@@ -553,8 +553,10 @@ def get_handle(
Returns the dataclass IOHandles
"""
# Windows does not default to utf-8. Set to utf-8 for a consistent behavior
encoding_not_specified = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead why don't you

encoding_passed, encoding = encoding, encoding or 'utf-i'

and then you can test encoding_passed is not None its the same as you have but i think a bit more natural

@twoertwein
Copy link
Member Author

I thought errors is exposed in read_csv. It isn't. In that case it makes sense to keep 'replace' as the default.

I don't think it is feasible to ignore encoding errors only for skipped rows. (We could read everything in binary mode but we would need to decode it to determine line endings.)

@phofl
Copy link
Member

phofl commented Jan 7, 2021

This errors parameter is Not exposed

@twoertwein
Copy link
Member Author

yes, it isn't exposed. Sorry, that is what I meant to say :)

@jreback jreback merged commit 89ddd8a into pandas-dev:master Jan 7, 2021
@jreback
Copy link
Contributor

jreback commented Jan 7, 2021

thanks @twoertwein

@jreback
Copy link
Contributor

jreback commented Jan 7, 2021

@meeseeksdev backport 1.2.x

@lumberbot-app
Copy link

lumberbot-app bot commented Jan 7, 2021

Something went wrong ... Please have a look at my logs.

@phofl
Copy link
Member

phofl commented Jan 7, 2021

Thanks

jreback pushed a commit that referenced this pull request Jan 7, 2021
…ot specified (#39021)

Co-authored-by: Torsten Wörtwein <twoertwein@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO CSV read_csv, to_csv
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: read_csv raising when null bytes are in skipped rows
4 participants