Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
feat[python]: Add support for reading non-utf8 encoded CSV files. (#4464)

Add support for reading non-UTF-8 encoded CSV files in read_csv.

- When use_pyarrow=False and encoding is not "utf8" or "utf8-lossy", decode the CSV file in _prepare_file_arg. In that case the decoding is done by Python, so Polars' fast-path readers are not used.
- When use_pyarrow=True, pass the encoding parameter directly to pa.csv.read_csv.

Other fixes/features:

- Return a BytesIO object for bytes input in _prepare_file_arg when using pyarrow (read_csv, read_ipc, read_parquet), as pyarrow only works with file-like objects.
- Check that the eol_char argument of read_csv is a single byte.
- Add quote_char support for read_csv:
  - Expand test_csv_quote_char: check that fields surrounded by quotes keep their quotes with quote_char=None, both when reading with Polars and with pyarrow.
- Do not check the value of parse_dates when use_pyarrow=True, as date parsing can't be disabled there.
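The encoding rule and the single-byte eol_char check described in the commit message can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the actual _prepare_file_arg code: the helper names prepare_csv_source and check_eol_char are hypothetical, the sketch accepts only in-memory bytes, and it assumes that on the slow path the decoded text is re-encoded as UTF-8 before being handed to the reader.

```python
from io import BytesIO
from typing import Union

# Encodings the native (fast-path) reader handles directly, per the commit message.
NATIVE_ENCODINGS = {"utf8", "utf8-lossy"}


def prepare_csv_source(
    raw: bytes, encoding: str = "utf8", use_pyarrow: bool = False
) -> Union[bytes, BytesIO]:
    """Hypothetical helper sketching the encoding handling; not Polars' real code."""
    if use_pyarrow:
        # pyarrow decodes the data itself (the encoding is passed through),
        # but it needs a file-like object rather than raw bytes.
        return BytesIO(raw)
    if encoding in NATIVE_ENCODINGS:
        # Fast path: the bytes go to the native reader unchanged.
        return raw
    # Slow path (assumption): decode in Python, then re-encode as UTF-8
    # so the downstream reader only ever sees UTF-8 bytes.
    return BytesIO(raw.decode(encoding).encode("utf-8"))


def check_eol_char(eol_char: str) -> None:
    """Hypothetical validation: eol_char must encode to exactly one byte."""
    if len(eol_char.encode("utf-8")) != 1:
        raise ValueError(f"eol_char must be a single byte, got {eol_char!r}")


# Usage: a Latin-1 file is transcoded to UTF-8 on the slow path.
latin1 = "name,city\nJosé,Zürich\n".encode("latin-1")
converted = prepare_csv_source(latin1, encoding="latin-1")
print(converted.getvalue().decode("utf-8"))
check_eol_char("\n")  # passes; "\r\n" would raise ValueError
```

The slow path costs an extra full copy of the file in memory, which is why the commit message notes that Polars' fast-path readers are skipped for other encodings.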
Showing 3 changed files with 148 additions and 29 deletions.