Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

To string with encoding #28951

Merged
merged 32 commits into from Oct 23, 2019
Merged

Conversation

mohitanand001
Copy link
Contributor

@mohitanand001 mohitanand001 commented Oct 13, 2019

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@farziengineer Thanks for the PR.

I think best to put this on hold until #28692 is merged to ensure consistency.

doc/source/whatsnew/v1.0.0.rst Outdated Show resolved Hide resolved
doc/source/whatsnew/v1.0.0.rst Outdated Show resolved Hide resolved
pandas/core/frame.py Outdated Show resolved Hide resolved
pandas/core/frame.py Outdated Show resolved Hide resolved
pandas/io/formats/format.py Show resolved Hide resolved
@jreback
Copy link
Contributor

jreback commented Oct 16, 2019

can you merge master

@mohitanand001
Copy link
Contributor Author

@simonjayhawkins please have a look.

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@farziengineer

The three methods to_html, to_latex and to_string have a lot of code in common and I'd prefer to not have separate tests.

although that was done in #28692, now that encoding has been added to to_html, that just leaves to_string (i.e. this PR)

can you look to add encoding in test_filepath_or_buffer_arg in test_formats.py instead of the test added here. (if you can)

otherwise can you make the test consistent with #28692

@jreback jreback added this to the 1.0 milestone Oct 18, 2019
@@ -0,0 +1,6 @@
def test_to_string_encoding(float_frame,):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are all test_to_format.py, pls move there

would take a followup to split up test_to_format into 2 parts though (e.g. move the to_string out to a separate file)

@mohitanand001
Copy link
Contributor Author

Hi the CI is failing for reasons I do not understand. https://travis-ci.org/pandas-dev/pandas/jobs/600239741
It says

No output has been received in the last 10m0s, this potentially indicates a stalled build or something wrong with the build itself

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mohitanand001
Copy link
Contributor Author

Hi @WillAyd @jreback can you please review.

pandas/io/formats/format.py Show resolved Hide resolved
pandas/tests/io/formats/test_format.py Show resolved Hide resolved
ValueError, match="buf is not a file name and encoding is specified."
):
getattr(df, method)(buf=filepath_or_buffer, encoding=encoding)
elif encoding == "foo":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can remove the invalid encoding; this doesn't test any function pandas provides rather just builtin Python functionality

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@simonjayhawkins thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I asked for it to be added was so that the precedence of the Exceptions was checked and to confirm the encoding parameter was passed to the builtin function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh OK my mistake - just didn't see that was asked for previously (lost in GH comments)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so agree with #28951 (comment) but this sort of compensates. If we conform the encoding is passed, then reading back in is only testing Python functionality.

float_frame will probably work with any encoding, so maybe best to modify float_frame if encoding=="gbq".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should work...

diff --git a/pandas/tests/io/formats/test_format.py b/pandas/tests/io/formats/test_format.py
index 096fc6cb4..490cecb41 100644
--- a/pandas/tests/io/formats/test_format.py
+++ b/pandas/tests/io/formats/test_format.py
@@ -73,17 +73,19 @@ def filepath_or_buffer(filepath_or_buffer_id, tmp_path):
 
 
 @pytest.fixture
-def assert_filepath_or_buffer_equals(filepath_or_buffer, filepath_or_buffer_id):
+def assert_filepath_or_buffer_equals(
+    filepath_or_buffer, filepath_or_buffer_id, encoding
+):
     """
     Assertion helper for checking filepath_or_buffer.
     """
 
     def _assert_filepath_or_buffer_equals(expected):
         if filepath_or_buffer_id == "string":
-            with open(filepath_or_buffer) as f:
+            with open(filepath_or_buffer, encoding=encoding) as f:
                 result = f.read()
         elif filepath_or_buffer_id == "pathlike":
-            result = filepath_or_buffer.read_text()
+            result = filepath_or_buffer.read_text(encoding=encoding)
         elif filepath_or_buffer_id == "buffer":
             result = filepath_or_buffer.getvalue()
         assert result == expected
@@ -3250,6 +3252,8 @@ def test_filepath_or_buffer_arg(
     filepath_or_buffer_id,
 ):
     df = float_frame
+    if encoding == "gbk":
+        float_frame.iloc[0, 0] = "造成输出中文显示乱码"
 
     if filepath_or_buffer_id not in ["string", "pathlike"] and encoding is not None:
         with pytest.raises(

pandas/tests/io/formats/test_format.py Show resolved Hide resolved
filepath_or_buffer,
assert_filepath_or_buffer_equals,
encoding,
filepath_or_buffer_id,
):
df = float_frame
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to use float_frame here? I think can just construct a DataFrame using the data you have on line below instead of modifying this; as is its not clear on intent to use this fixture

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@simonjayhawkins float_frame fixture has been uniformly used in all the tests, hence it is used here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe..

diff --git a/pandas/tests/io/formats/test_format.py b/pandas/tests/io/formats/test_format.py
index 096fc6cb4..dc2784a7b 100644
--- a/pandas/tests/io/formats/test_format.py
+++ b/pandas/tests/io/formats/test_format.py
@@ -3240,16 +3240,19 @@ def test_repr_html_ipython_config(ip):
 
 
 @pytest.mark.parametrize("method", ["to_string", "to_html", "to_latex"])
-@pytest.mark.parametrize("encoding", [None, "utf-8", "gbk", "foo"])
+@pytest.mark.parametrize(
+    "encoding, data",
+    [(None, "abc"), ("utf-8", "abc"), ("gbk", "造成输出中文显示乱码"), ("foo", "abc")],
+)
 def test_filepath_or_buffer_arg(
-    float_frame,
     method,
     filepath_or_buffer,
     assert_filepath_or_buffer_equals,
     encoding,
+    data,
     filepath_or_buffer_id,
 ):
-    df = float_frame
+    df = DataFrame([data])
 
     if filepath_or_buffer_id not in ["string", "pathlike"] and encoding is not None:
         with pytest.raises(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine, I'll make the changes and push.

pandas/tests/io/formats/test_format.py Show resolved Hide resolved
@jreback
Copy link
Contributor

jreback commented Oct 22, 2019

lgtm. merge when @WillAyd and @simonjayhawkins good.

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you can address outstanding suggestion from @simonjayhawkins on float_frame replacement then lgtm

pandas/tests/io/formats/test_format.py Show resolved Hide resolved
@mohitanand001
Copy link
Contributor Author

@WillAyd updated the branch with the required changes, please have a look.

else:
expected = getattr(df, method)()
getattr(df, method)(buf=filepath_or_buffer, encoding=encoding)
assert_filepath_or_buffer_equals(expected)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason for this to be a fixture instead of just a global function? This way of invoking the function seems very magical; I think easier if not a fixture

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall I make this change as a part of this PR itself.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm sorry thought it was a part of this PR. I think OK to do here but let's see what @simonjayhawkins thinks

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is fine as is.

@WillAyd WillAyd merged commit ee59b7d into pandas-dev:master Oct 23, 2019
@WillAyd
Copy link
Member

WillAyd commented Oct 23, 2019

Thanks @farziengineer

@mohitanand001
Copy link
Contributor Author

Thanks @simonjayhawkins @WillAyd for all the reviews.

HawkinsBA pushed a commit to HawkinsBA/pandas that referenced this pull request Oct 29, 2019
Reksbril pushed a commit to Reksbril/pandas that referenced this pull request Nov 18, 2019
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
bongolegend pushed a commit to bongolegend/pandas that referenced this pull request Jan 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Output-Formatting __repr__ of pandas objects, to_string
Projects
None yet
Development

Successfully merging this pull request may close these issues.

to_string() does not have an "encoding" parameter.
4 participants