XLRDError from most recent xlrd #88

Closed
berland opened this issue Dec 11, 2020 · 9 comments
Labels
bug Something isn't working


@berland
Collaborator

berland commented Dec 11, 2020

Crash in GitHub Actions due to an updated upstream library (xlrd).

>       snorrebergdesign = summarize_design(
            testdir + "/data/sensitivities/distributions/" + "design.xlsx", "DesignSheet01"
        )

tests/test_calc_tornado.py:27: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/opt/hostedtoolcache/Python/3.8.6/x64/lib/python3.8/site-packages/fmu/tools/sensitivities/_designsummary.py:56: in summarize_design
    dgn = pd.read_excel(filename, sheetname)
/opt/hostedtoolcache/Python/3.8.6/x64/lib/python3.8/site-packages/pandas/util/_decorators.py:296: in wrapper
    return func(*args, **kwargs)
/opt/hostedtoolcache/Python/3.8.6/x64/lib/python3.8/site-packages/pandas/io/excel/_base.py:304: in read_excel
    io = ExcelFile(io, engine=engine)
/opt/hostedtoolcache/Python/3.8.6/x64/lib/python3.8/site-packages/pandas/io/excel/_base.py:867: in __init__
    self._reader = self._engines[engine](self._io)
/opt/hostedtoolcache/Python/3.8.6/x64/lib/python3.8/site-packages/pandas/io/excel/_xlrd.py:22: in __init__
    super().__init__(filepath_or_buffer)
/opt/hostedtoolcache/Python/3.8.6/x64/lib/python3.8/site-packages/pandas/io/excel/_base.py:353: in __init__
    self.book = self.load_workbook(filepath_or_buffer)
/opt/hostedtoolcache/Python/3.8.6/x64/lib/python3.8/site-packages/pandas/io/excel/_xlrd.py:37: in load_workbook
    return open_workbook(filepath_or_buffer)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

filename = '/home/runner/work/fmu-tools/fmu-tools/tests/data/sensitivities/distributions/design.xlsx'
logfile = <_io.TextIOWrapper name="<_io.FileIO name=6 mode='rb+' closefd=True>" mode='r+' encoding='utf-8'>
verbosity = 0, use_mmap = True, file_contents = None, encoding_override = None
formatting_info = False, on_demand = False, ragged_rows = False
ignore_workbook_corruption = False

  
        # We have to let unknown file formats pass through here, as some ancient
        # files that xlrd can parse don't start with the expected signature.
        if file_format and file_format != 'xls':
>           raise XLRDError(FILE_FORMAT_DESCRIPTIONS[file_format]+'; not supported')
E           xlrd.biffh.XLRDError: Excel xlsx file; not supported

/opt/hostedtoolcache/Python/3.8.6/x64/lib/python3.8/site-packages/xlrd/__init__.py:170: XLRDError
=========================== short test summary info ============================
FAILED tests/test_calc_tornado.py::test_designsummary - xlrd.biffh.XLRDError:...
!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
============================== 1 failed in 2.99s ===============================
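The check that fails in the traceback is xlrd sniffing the file's leading bytes and rejecting anything it does not classify as a legacy xls file. A minimal, stdlib-only sketch of that kind of check is shown below. It is a simplification, not xlrd's code: the names are hypothetical, and it only compares the standard ZIP and OLE2 signatures, whereas xlrd also looks inside ZIP archives to tell xlsx apart from other ZIP-based formats such as ods.

```python
from pathlib import Path

# Leading bytes ("magic numbers") of the two Excel container formats:
# .xlsx/.xlsm files are ZIP archives, legacy .xls files are OLE2 compound documents.
ZIP_SIGNATURE = b"PK\x03\x04"
OLE2_SIGNATURE = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def sniff_excel_format(header: bytes) -> str:
    """Classify a file from its first bytes as 'xlsx', 'xls' or 'unknown'."""
    if header.startswith(ZIP_SIGNATURE):
        # Assumption: any ZIP container is treated as xlsx in this sketch.
        return "xlsx"
    if header.startswith(OLE2_SIGNATURE):
        return "xls"
    return "unknown"


def sniff_excel_file(path: str) -> str:
    """Read only the first 8 bytes of the file at path and classify them."""
    with Path(path).open("rb") as fh:
        return sniff_excel_format(fh.read(8))
```

Running sniff_excel_file on the design.xlsx from the test data would return "xlsx", which is exactly the case xlrd 2.0 now refuses.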
@berland berland added the bug Something isn't working label Dec 11, 2020
@tralsos
Collaborator

tralsos commented Dec 11, 2020

It seems pandas' read_excel should not use xlrd to read Excel files, since xlrd no longer supports reading xlsx. I can specify engine "openpyxl" in the read_excel calls, or do you have other suggestions, @berland?

@ChrisHaddy

@berland I wanted to let you know that I encountered this issue in my GitHub Actions script as well. I fixed it by downgrading to the xlrd 1.2.0 release.

So we now pin the xlrd version that pip installs in the container instead of taking the latest version.

Here is a snippet:

  - name: Checkout
    uses: actions/checkout@v1

  - name: Install python packages
    run: pip install xlrd==1.2.0 boto3

You can see here that they published their first release in two years, which is what broke our pipeline:
https://pypi.org/project/xlrd/#history

@rohit-imt

@berland I was facing the same issue, and downgrading xlrd to the 1.2.0 release fixed it. Cheers 👍

@berland
Collaborator Author

berland commented Dec 14, 2020

I would prefer @tralsos's suggestion. We use xlrd/openpyxl through pandas, and it will be fixed there as well (pandas-dev/pandas#38424).

There should be tests for both the xls and xlsx formats.

@cjw296

cjw296 commented Dec 14, 2020

As the author of xlrd, may I ask you not to use version 1.2, due to the potential security vulnerabilities it contains.
The correct fix here is to change dgn = pd.read_excel(filename, sheetname) to dgn = pd.read_excel(filename, sheetname, engine='openpyxl').
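The engine='openpyxl' fix above hardcodes one engine, while the thread also asks for both xls and xlsx to be tested. A minimal sketch that picks the engine from the file extension is shown below, assuming xlrd 2.x keeps reading legacy .xls and openpyxl handles the OOXML formats. ENGINE_BY_EXTENSION, excel_engine_for and read_design_sheet are hypothetical names, not part of fmu-tools or pandas; only pd.read_excel and its sheet_name and engine parameters are real pandas API.

```python
import os

# Assumed mapping from file extension to pandas read_excel engine:
# openpyxl reads the OOXML formats, xlrd >= 2.0 only reads legacy .xls.
ENGINE_BY_EXTENSION = {
    ".xlsx": "openpyxl",
    ".xlsm": "openpyxl",
    ".xls": "xlrd",
}


def excel_engine_for(filename: str) -> str:
    """Return the read_excel engine to use for filename, based on its extension."""
    ext = os.path.splitext(filename)[1].lower()
    try:
        return ENGINE_BY_EXTENSION[ext]
    except KeyError:
        raise ValueError(f"Unsupported Excel file extension: {ext!r}") from None


def read_design_sheet(filename: str, sheetname: str):
    """Hypothetical wrapper: read one sheet with an explicitly chosen engine."""
    # Imported here so excel_engine_for can be used (and tested) without pandas.
    import pandas as pd

    return pd.read_excel(filename, sheet_name=sheetname, engine=excel_engine_for(filename))
```

Choosing the engine explicitly keeps the behaviour independent of whichever default pandas and xlrd versions happen to be installed in CI.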

@berland
Collaborator Author

berland commented Dec 14, 2020

See also this: equinor/pyscal#222

@tralsos
Collaborator

tralsos commented Dec 21, 2020

There seems to be a difference between xlrd and openpyxl (as the pandas engine) in how cells that contain nothing but formatting, e.g. a background colour, are handled. If the user has coloured the background of whole rows, pandas.read_excel with engine 'openpyxl' returns additional columns with headers such as 'Unnamed: 10' and NaN values. I cannot see an option to skip cell formatting here, so I guess I will have to do a subsequent check and drop these columns/rows.
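The subsequent check described above could look roughly like the sketch below. It assumes the spurious columns are exactly those that got an auto-generated "Unnamed: N" header and contain only NaN, and that fully empty rows carry no meaning in the design sheet; drop_formatting_only_cells is a hypothetical helper, not an existing fmu-tools or pandas function.

```python
import numpy as np
import pandas as pd


def drop_formatting_only_cells(df: pd.DataFrame) -> pd.DataFrame:
    """Drop columns and rows that hold no data, e.g. cells that only carry formatting.

    A column is dropped when pandas gave it an auto-generated 'Unnamed: N'
    header (no header cell in the sheet) and every value in it is NaN.
    A row is dropped when every remaining value in it is NaN.
    """
    unnamed = np.asarray(df.columns.astype(str).str.startswith("Unnamed:"), dtype=bool)
    all_nan = df.isna().all(axis=0).to_numpy(dtype=bool)
    # Keep named columns, and unnamed columns that contain at least one value.
    cleaned = df.loc[:, ~(unnamed & all_nan)]
    return cleaned.dropna(axis=0, how="all")
```

Requiring both conditions avoids discarding an unnamed column that does contain data; if blank rows are meaningful in a sheet, the final dropna would need to be skipped.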

@exeptionerror

All possible solutions are collected here: [Solved] xlrd.biffh.XLRDError: Excel xlsx file; not supported in python
