Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: True cannot be cast to bool in read_excel #58994

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

rmhowe425
Copy link
Contributor

@rmhowe425 rmhowe425 commented Jun 13, 2024

@rmhowe425
Copy link
Contributor Author

@rhshadrach pinging on green

@@ -164,6 +164,17 @@ def xfail_datetimes_with_pyxlsb(engine, request):


class TestReaders:
def test_read_excel_type_check(self):
# GH 58159
df = DataFrame({"bool_column": [True]}, dtype=pd.BooleanDtype.name)
Copy link
Contributor

@asishm asishm Jun 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since boolean is a nullable EA, can you also add None as a value in this column? the assert can also probably change to tm.assert_frame_equal(df, df2)

@rmhowe425 rmhowe425 requested a review from asishm June 15, 2024 23:41
@rmhowe425
Copy link
Contributor Author

@rhshadrach @asishm

PR ready for review. Not sure what's going on with the lone failing test. It's passing locally. Not much info from the test's output to go on.
image

Copy link
Contributor

@asishm asishm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test failure is unrelated - #59018 (comment)

just one comment - may not even be relevant.

Comment on lines +750 to 753
values_str,
dtype=cast_type,
true_values=self.true_values,
false_values=self.false_values,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

none_values is not passed anything here. Can the equivalent na_values and keep_default_na be passed in here, or is it irrelevant?

Copy link
Contributor Author

@rmhowe425 rmhowe425 Jun 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I see what you're saying. You're suggesting to propagate na_values and keep_default_na from read_excel() to _cast_types()?

I think that's a good idea. I'll make that change. That should allow me to get rid of _NONE_VALUES in boolean.py, and allow me to dynamically handle any values that equate to None.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those params are already available in _cast_types since it's an instance method (so self.na_values and self.keep_default_na are available here, similar to self.true_values and self.false_values). I was thinking about progagating them to _from_sequence_of_strings()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep we're on the same page. Making those changes now

@rmhowe425
Copy link
Contributor Author

@asishm @rhshadrach PR is ready for review

@rmhowe425 rmhowe425 requested a review from asishm June 17, 2024 02:22
@@ -547,6 +547,7 @@ I/O
- Bug in :meth:`DataFrame.to_stata` when writing :class:`DataFrame` and ``byteorder=`big```. (:issue:`58969`)
- Bug in :meth:`DataFrame.to_string` that raised ``StopIteration`` with nested DataFrames. (:issue:`16098`)
- Bug in :meth:`read_csv` raising ``TypeError`` when ``index_col`` is specified and ``na_values`` is a dict containing the key ``None``. (:issue:`57547`)
- Bug in :meth:`read_excel` raising ``ValueError`` when passing array of boolean values when ``dtype=pd.BooleanDtype.name``. (:issue:`58159`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Bug in :meth:`read_excel` raising ``ValueError`` when passing array of boolean values when ``dtype=pd.BooleanDtype.name``. (:issue:`58159`)
- Bug in :meth:`read_excel` raising ``ValueError`` when passing array of boolean values when ``dtype="boolean"``. (:issue:`58159`)

@@ -297,7 +297,13 @@ class BooleanArray(BaseMaskedArray):
"""

_TRUE_VALUES = {"True", "TRUE", "true", "1", "1.0"}
_FALSE_VALUES = {"False", "FALSE", "false", "0", "0.0"}
_FALSE_VALUES = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you undo the formatting here?

def test_read_excel_type_check(self, col):
# GH 58159
df = DataFrame({"bool_column": col}, dtype=pd.BooleanDtype.name)
df.to_excel("test-type.xlsx", index=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use the tmp_excel fixture for this path?


def test_pass_none_type(self):
# GH 58159
with pd.ExcelFile("test7.xlsx") as excel:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use the datapath fixture to access test7.xlsx. Also could you give a test7.xlsx a better name reflective of the contents?

@mroeschke mroeschke added the IO Excel read_excel, to_excel label Jun 18, 2024
@pytest.mark.parametrize("col", [[True, None, False], [True], [True, False]])
def test_read_excel_type_check(self, col):
# GH 58159
df = DataFrame({"bool_column": col}, dtype=pd.BooleanDtype.name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
df = DataFrame({"bool_column": col}, dtype=pd.BooleanDtype.name)
df = DataFrame({"bool_column": col}, dtype="boolean")

And could you change other usages here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO Excel read_excel, to_excel
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: True cannot be cast to bool
3 participants