BUG: True cannot be cast to bool in read_excel #58994

Open
wants to merge 8 commits into base: main
1 change: 1 addition & 0 deletions doc/source/whatsnew/v3.0.0.rst
@@ -547,6 +547,7 @@ I/O
 - Bug in :meth:`DataFrame.to_stata` when writing :class:`DataFrame` and ``byteorder=`big```. (:issue:`58969`)
 - Bug in :meth:`DataFrame.to_string` that raised ``StopIteration`` with nested DataFrames. (:issue:`16098`)
 - Bug in :meth:`read_csv` raising ``TypeError`` when ``index_col`` is specified and ``na_values`` is a dict containing the key ``None``. (:issue:`57547`)
+- Bug in :meth:`read_excel` raising ``ValueError`` when passing array of boolean values when ``dtype=pd.BooleanDtype.name``. (:issue:`58159`)
Member

Suggested change
-- Bug in :meth:`read_excel` raising ``ValueError`` when passing array of boolean values when ``dtype=pd.BooleanDtype.name``. (:issue:`58159`)
+- Bug in :meth:`read_excel` raising ``ValueError`` when passing array of boolean values when ``dtype="boolean"``. (:issue:`58159`)
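
For reference, a quick sketch of why the two spellings in the suggestion are interchangeable: `pd.BooleanDtype.name` is the string alias of the nullable boolean dtype, so the shorter literal reads more naturally in the release note. The sample values are made up for illustration.

```python
import pandas as pd

# The class-level name attribute is the plain string alias of the dtype.
alias = pd.BooleanDtype.name

# Resolving the alias gives the nullable boolean extension dtype.
resolved = pd.api.types.pandas_dtype(alias)

# A Series built with the alias holds missing values as pd.NA.
s = pd.Series([True, False, None], dtype="boolean")
na_mask = s.isna().tolist()
```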

 - Bug in :meth:`read_stata` raising ``KeyError`` when input file is stored in big-endian format and contains strL data. (:issue:`58638`)

 Period
8 changes: 7 additions & 1 deletion pandas/io/parsers/base_parser.py
@@ -742,7 +742,9 @@ def _cast_types(self, values: ArrayLike, cast_type: DtypeObj, column) -> ArrayLi
         elif isinstance(cast_type, ExtensionDtype):
             array_type = cast_type.construct_array_type()
             try:
-                if isinstance(cast_type, BooleanDtype):
+                if isinstance(cast_type, BooleanDtype) and all(
+                    isinstance(value, str) for value in values
+                ):
                     # error: Unexpected keyword argument "true_values" for
                     # "_from_sequence_of_strings" of "ExtensionArray"
                     return array_type._from_sequence_of_strings(  # type: ignore[call-arg]
@@ -751,6 +753,10 @@ def _cast_types(self, values: ArrayLike, cast_type: DtypeObj, column) -> ArrayLi
                         true_values=self.true_values,
                         false_values=self.false_values,
                     )
+                elif isinstance(cast_type, BooleanDtype) and all(
+                    isinstance(value, bool) for value in values
+                ):
+                    return values
                 else:
                     return array_type._from_sequence_of_strings(values, dtype=cast_type)
             except NotImplementedError as err:
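
For context on what the new `elif` branch hands back, here is a small sketch with made-up values (not taken from the parser). It assumes `values` reaches this branch as an object-dtype NumPy array of Python bools, which the diff does not confirm. Under that assumption, returning `values` unchanged keeps the object dtype, whereas `pd.array(..., dtype="boolean")` would produce the nullable boolean array.

```python
import numpy as np
import pandas as pd

# Hypothetical input: an object-dtype array of Python bools (an assumption).
values = np.array([True, False, True], dtype=object)

# The same element-wise check the new branch performs.
all_bool = all(isinstance(value, bool) for value in values)

# Returning the input as-is keeps the object dtype.
unchanged = values

# Explicit conversion yields a nullable BooleanArray instead.
converted = pd.array(values, dtype="boolean")
```

Whether the caller later coerces the returned array is outside the scope of this sketch, so it may be worth checking the resulting column dtype in the test.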
11 changes: 11 additions & 0 deletions pandas/tests/io/excel/test_readers.py
@@ -164,6 +164,17 @@ def xfail_datetimes_with_pyxlsb(engine, request):


 class TestReaders:
+    def test_read_excel_type_check(self):
+        # GH 58159
+        df = DataFrame({"bool_column": [True]}, dtype=pd.BooleanDtype.name)
Contributor

@asishm asishm Jun 13, 2024

Since boolean is a nullable EA, can you also add None as a value in this column? The assert can probably also change to tm.assert_frame_equal(df, df2).
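
A rough sketch of why a missing value matters here, using made-up data and no Excel round trip: `None` is stored as `pd.NA` in the nullable boolean array, and `pd.NA` is not a `bool`, so an element-wise `isinstance(val, bool)` check cannot cover that case. A frame-level comparison checks values, dtype, and NA positions together.

```python
import pandas as pd
import pandas._testing as tm

# None becomes pd.NA in the nullable boolean extension array.
df = pd.DataFrame({"bool_column": [True, False, None]}, dtype="boolean")
na_mask = df["bool_column"].isna().tolist()

# pd.NA is not a bool, so an isinstance-based assertion would reject it.
na_is_bool = isinstance(pd.NA, bool)

# Frame-level comparison: raises AssertionError on any mismatch.
expected = pd.DataFrame(
    {"bool_column": pd.array([True, False, pd.NA], dtype="boolean")}
)
tm.assert_frame_equal(df, expected)
```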

+        df.to_excel("test-type.xlsx", index=False)
Member

Can you use the tmp_excel fixture for this path?

+        df2 = pd.read_excel(
+            "test-type.xlsx",
+            dtype={"bool_column": pd.BooleanDtype.name},
+            engine="openpyxl",
+        )
+        assert all(isinstance(val, bool) for val in df2["bool_column"])
+
     @pytest.fixture(autouse=True)
     def cd_and_set_engine(self, engine, datapath, monkeypatch):
         """