Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REGR: prefer user-provided mode #39440

Merged
merged 2 commits into from
Jan 28, 2021
Merged

REGR: prefer user-provided mode #39440

merged 2 commits into from
Jan 28, 2021

Conversation

twoertwein
Copy link
Member

@twoertwein twoertwein commented Jan 27, 2021

follow-up to #39253

Make sure that the user-provided mode is always preferred (if it contains 't' or 'b') and extend list of text-only classes.

@jreback jreback added the IO CSV read_csv, to_csv label Jan 28, 2021
@jreback jreback added this to the 1.2.2 milestone Jan 28, 2021
@@ -857,12 +857,15 @@ def file_exists(filepath_or_buffer: FilePathOrBuffer) -> bool:

def _is_binary_mode(handle: FilePathOrBuffer, mode: str) -> bool:
"""Whether the handle is opened in binary mode"""
# specified by user
if "t" in mode or "b" in mode:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is t here? is this tested?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"t" is used to explicitly request text-mode but it usually omitted. One of read_json or to_json is using it. I will add a test with a handle that needs that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kk

@jreback
Copy link
Contributor

jreback commented Jan 28, 2021

@twoertwein ping when ready here (merge master as well)

@twoertwein
Copy link
Member Author

I assume that the occasional errors in test_server_and_default_headers can be fixed with pytest --dist=loadfile?

@jreback
Copy link
Contributor

jreback commented Jan 28, 2021

I assume that the occasional errors in test_server_and_default_headers can be fixed with pytest --dist=loadfile?

yeah that makes sense. separate PR though

@jreback jreback merged commit 5dfca7a into pandas-dev:master Jan 28, 2021
@jreback
Copy link
Contributor

jreback commented Jan 28, 2021

thanks @twoertwein

@jreback
Copy link
Contributor

jreback commented Jan 28, 2021

@meeseeksdev backport 1.2.x

@lumberbot-app
Copy link

lumberbot-app bot commented Jan 28, 2021

Something went wrong ... Please have a look at my logs.

jreback pushed a commit that referenced this pull request Jan 29, 2021
Co-authored-by: Torsten Wörtwein <twoertwein@users.noreply.github.com>
@roeps
Copy link

roeps commented Feb 23, 2021

Perhaps I'm misusing the API, but this seems to have broken the ability to write gzipped JSON. I haven't fully walked the code path, but here is what I'm seeing:

Reproduction:

from io import BytesIO
import pandas as pd

dataframe = pd.DataFrame([1, 2, 3], columns=['a'])
object_stream = BytesIO()
dataframe.to_json(object_stream, compression='gzip', orient='records', lines=True)

Result:

        if path_or_buf is not None:
            # apply compression and byte/text conversion
            with get_handle(
                path_or_buf, "wt", compression=compression, storage_options=storage_options
            ) as handles:
>               handles.handle.write(s)
E               TypeError: a bytes-like object is required, not 'str'

../../.env/lib/python3.7/site-packages/pandas/io/json/_json.py:105: TypeError

From what I can tell, _json's get_handle uses a static 'wt' mode which now, with the latest changes to common's _is_binary_mode, will always return false rather than check the path_or_buf instance type, thereby omitting the b flag on the mode passed to the rest of get_handle.

Is this expected behavior/am I misusing the Pandas API? Not the reproduction code works with Pandas 1.2.1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO CSV read_csv, to_csv
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: V1.2 DataFrame.to_csv() fails to write a file with codecs
3 participants