Polars + fastexcel fails to load excel file #14388

Closed
2 tasks done
durgeksh opened this issue Feb 9, 2024 · 3 comments
Labels
A-io-excel (Area: reading/writing Excel files) · bug (Something isn't working) · needs triage (Awaiting prioritization by a maintainer) · python (Related to Python Polars)

Comments


durgeksh commented Feb 9, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

# sheet_id=0 loads every sheet and returns a dict of {sheet_name: DataFrame}
dfs = pl.read_excel('input.xlsx', engine='calamine', sheet_id=0)
for name, df in dfs.items():
    print(df.head())

Log output

Traceback (most recent call last):
  File "/Users/Desktop/workspace/pocs/demo.py", line 27, in <module>
    dfs = pl.read_excel('compass_input.xlsx', engine='calamine', sheet_id=0)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Desktop/workspace/pocs/.venv/lib/python3.11/site-packages/polars/utils/deprecation.py", line 136, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Desktop/workspace/pocs/.venv/lib/python3.11/site-packages/polars/utils/deprecation.py", line 136, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Desktop/workspace/pocs/.venv/lib/python3.11/site-packages/polars/io/spreadsheet/functions.py", line 259, in read_excel
    return _read_spreadsheet(
           ^^^^^^^^^^^^^^^^^^
  File "/Users/Desktop/workspace/pocs/.venv/lib/python3.11/site-packages/polars/io/spreadsheet/functions.py", line 487, in _read_spreadsheet
    parsed_sheets = {
                    ^
  File "/Users/Desktop/workspace/pocs/.venv/lib/python3.11/site-packages/polars/io/spreadsheet/functions.py", line 488, in <dictcomp>
    name: reader_fn(
          ^^^^^^^^^^
  File "/Users/Desktop/workspace/pocs/.venv/lib/python3.11/site-packages/polars/io/spreadsheet/functions.py", line 834, in _read_spreadsheet_calamine
    df = ws.to_polars()
         ^^^^^^^^^^^^^^
  File "/Users/Desktop/workspace/pocs/.venv/lib/python3.11/site-packages/fastexcel/__init__.py", line 64, in to_polars
    df = pl.from_arrow(data=self.to_arrow())
                            ^^^^^^^^^^^^^^^
  File "/Users/Desktop/workspace/pocs/.venv/lib/python3.11/site-packages/fastexcel/__init__.py", line 47, in to_arrow
    return self._sheet.to_arrow()
           ^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Could not create RecordBatch from sheet Hybrid_Productivity

Caused by:
    0: Could not build schema for sheet Hybrid_Productivity
    1: Error in calamine cell: NA
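The "Caused by" chain names a single sheet (Hybrid_Productivity), but with `sheet_id=0` one failing sheet aborts the whole load. A minimal diagnostic sketch for checking each sheet on its own is shown here. The helper name `load_sheets_individually` and its `read_one` callback are hypothetical, introduced only for illustration; the callback stands in for whatever per-sheet reader is used.

```python
from typing import Any, Callable, Dict, Iterable, Tuple


def load_sheets_individually(
    sheet_names: Iterable[str],
    read_one: Callable[[str], Any],
) -> Tuple[Dict[str, Any], Dict[str, str]]:
    """Try each sheet separately and return (loaded, failures).

    `read_one` is a hypothetical callback that loads one sheet by name.
    """
    loaded: Dict[str, Any] = {}
    failures: Dict[str, str] = {}
    for name in sheet_names:
        try:
            loaded[name] = read_one(name)
        except Exception as exc:  # broad on purpose: this is only a diagnostic
            failures[name] = str(exc)
    return loaded, failures
```

With Polars, `read_one` could be something like `lambda name: pl.read_excel('input.xlsx', engine='calamine', sheet_name=name)`, with the sheet names supplied by hand. The `failures` dict then shows which sheets raise and with what message, which helps narrow a report down to a single sheet.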

Issue description

Polars + fastexcel fails to load an input file. The same file loads fine with Pandas + python-calamine. The only difference I see between the two is the binding library used underneath (fastexcel versus python-calamine).

Expected behavior

Polars + fastexcel should be able to load the input without any issue.

Installed versions

--------Version info---------
Polars:               0.20.7
Index type:           UInt32
Platform:             macOS-12.7.3-arm64-arm-64bit
Python:               3.11.2 (v3.11.2:878ead1ac1, Feb  7 2023, 10:02:41) [Clang 13.0.0 (clang-1300.0.29.30)]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          <not installed>
connectorx:           <not installed>
deltalake:            <not installed>
fsspec:               <not installed>
gevent:               <not installed>
hvplot:               <not installed>
matplotlib:           3.8.2
numpy:                1.26.4
openpyxl:             <not installed>
pandas:               2.2.0
pyarrow:              15.0.0
pydantic:             <not installed>
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           <not installed>
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>

durgeksh added the bug, needs triage, and python labels Feb 9, 2024
stinodego added the A-io-excel label Feb 9, 2024
avimallu (Contributor) commented Feb 9, 2024

You may need to provide a reproducible example (not necessarily your original Excel, just something that causes the same problem).
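One way to produce such a stand-in file without sharing the original is sketched below, using only the Python standard library. It assumes the "NA" in the error message refers to a cell holding Excel's `#N/A` error; the thread does not confirm this. The function name `build_minimal_xlsx` and the sheet contents are made up for illustration, and whether this file triggers the same failure is untested.

```python
import io
import zipfile

# Namespace URIs from the Office Open XML (ECMA-376) packaging format.
PKG_REL = "http://schemas.openxmlformats.org/package/2006/relationships"
DOC_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
CT = "application/vnd.openxmlformats-officedocument.spreadsheetml"

# The five parts a one-sheet workbook needs, keyed by path inside the zip.
PARTS = {
    "[Content_Types].xml": (
        '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
        '<Default Extension="rels" '
        'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
        '<Default Extension="xml" ContentType="application/xml"/>'
        f'<Override PartName="/xl/workbook.xml" ContentType="{CT}.sheet.main+xml"/>'
        f'<Override PartName="/xl/worksheets/sheet1.xml" ContentType="{CT}.worksheet+xml"/>'
        "</Types>"
    ),
    "_rels/.rels": (
        f'<Relationships xmlns="{PKG_REL}">'
        f'<Relationship Id="rId1" Type="{DOC_REL}/officeDocument" Target="xl/workbook.xml"/>'
        "</Relationships>"
    ),
    "xl/workbook.xml": (
        f'<workbook xmlns="{MAIN}" xmlns:r="{DOC_REL}">'
        '<sheets><sheet name="Sheet1" sheetId="1" r:id="rId1"/></sheets>'
        "</workbook>"
    ),
    "xl/_rels/workbook.xml.rels": (
        f'<Relationships xmlns="{PKG_REL}">'
        f'<Relationship Id="rId1" Type="{DOC_REL}/worksheet" Target="worksheets/sheet1.xml"/>'
        "</Relationships>"
    ),
    "xl/worksheets/sheet1.xml": (
        f'<worksheet xmlns="{MAIN}"><sheetData>'
        '<row r="1"><c r="A1" t="inlineStr"><is><t>value</t></is></c></row>'
        '<row r="2"><c r="A2"><v>1</v></c></row>'
        # Assumed trigger: t="e" marks the cached value as an Excel error.
        '<row r="3"><c r="A3" t="e"><f>NA()</f><v>#N/A</v></c></row>'
        "</sheetData></worksheet>"
    ),
}


def build_minimal_xlsx() -> bytes:
    """Return the bytes of a one-sheet workbook whose cell A3 holds #N/A."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, xml in PARTS.items():
            archive.writestr(name, xml)
    return buffer.getvalue()
```

Writing the returned bytes to a file such as `repro.xlsx` and passing that path to the `pl.read_excel(..., engine='calamine')` call from the report would show whether an error cell alone is enough to raise the exception. If it is not, the original sheet likely contains something else worth isolating.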

alexander-beedie (Collaborator) commented Feb 9, 2024

If you can raise an issue over with the fastexcel folks (preferably with a reproducible example), that would be great, thanks. We are using that library, but we are not its authors/developers ;)

You can find their issues page here:
https://github.com/ToucanToco/fastexcel/issues

durgeksh (Author) commented

Thank you @alexander-beedie.
