getiterator deprecated in Python 3.9; failure to call pd.read_excel() #37795

Closed
zachgilbert97 opened this issue Nov 12, 2020 · 10 comments
Labels
IO Excel read_excel, to_excel

Comments

@zachgilbert97

I've read here https://stackoverflow.com/a/64744395 that getiterator has been removed in Python 3.9.0 (it had been deprecated before that). My script ran without issues until I upgraded Python to 3.9, and now I get the following error:

Traceback (most recent call last):
  File "/Users/user/Documents/Programming/MyProject/MyProject.py", line 551, in <module>
    df = pd.read_excel(excel_filename, excel_sheetname)
    ...
AttributeError: 'ElementTree' object has no attribute 'getiterator'

Running Python 3.9.0, Pandas 1.1.4
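
The AttributeError above can be checked without any Excel file. The snippet below is a minimal sketch using only the standard library: it walks a small in-memory tree with iter(), which is the supported replacement, and reports whether getiterator still exists on the running interpreter. The sample XML is made up for illustration.

```python
import sys
import xml.etree.ElementTree as ET

# Build a tiny tree in memory; no Excel file is needed for this check.
tree = ET.ElementTree(ET.fromstring("<root><a/><b/></root>"))

# iter() is the supported way to walk every element in document order.
tags = [elem.tag for elem in tree.iter()]

# getiterator() was removed in Python 3.9, so this is False there.
has_getiterator = hasattr(tree, "getiterator")

print(sys.version_info[:2], tags, has_getiterator)
```

Code that still calls getiterator() fails on 3.9 with exactly the AttributeError shown in the traceback.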

@arw2019
Member

arw2019 commented Nov 12, 2020

Could you provide a copy-pastable example?

@twoertwein
Member

@zilbert97 Do you know which engine pandas is using for your file? If it is xlrd, #28547 seems related:

> xlrd is unmaintained and the previous maintainer has asked us to move towards openpyxl. xlrd works now, but might have some issues when Python 3.9 or later gets released and changes some elements of the XML parser, as default usage right now throws a PendingDeprecationWarning

If xlrd is no longer working with python 3.9, it would be good to address that before 1.2 is released @WillAyd

@zachgilbert97
Author

> Could you provide a copy-pastable example?

Sure! I hope this is what you mean - simply running two lines at the command line can reproduce the error.

Python 3.8.1 works fine (created a virtual environment):

import pandas as pd
df = pd.read_excel('ZipZag Trolley Monitoring.xlsx')

But Python 3.9.0 does not work:

import pandas as pd
df = pd.read_excel('ZipZag Trolley Monitoring.xlsx')

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/ZG/.pyenv/versions/3.9.0/lib/python3.9/site-packages/pandas/util/_decorators.py", line 296, in wrapper
    return func(*args, **kwargs)
  File "/Users/ZG/.pyenv/versions/3.9.0/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 304, in read_excel
    io = ExcelFile(io, engine=engine)
  File "/Users/ZG/.pyenv/versions/3.9.0/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 867, in __init__
    self._reader = self._engines[engine](self._io)
  File "/Users/ZG/.pyenv/versions/3.9.0/lib/python3.9/site-packages/pandas/io/excel/_xlrd.py", line 22, in __init__
    super().__init__(filepath_or_buffer)
  File "/Users/ZG/.pyenv/versions/3.9.0/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 353, in __init__
    self.book = self.load_workbook(filepath_or_buffer)
  File "/Users/ZG/.pyenv/versions/3.9.0/lib/python3.9/site-packages/pandas/io/excel/_xlrd.py", line 37, in load_workbook
    return open_workbook(filepath_or_buffer)
  File "/Users/ZG/.pyenv/versions/3.9.0/lib/python3.9/site-packages/xlrd/__init__.py", line 130, in open_workbook
    bk = xlsx.open_workbook_2007_xml(
  File "/Users/ZG/.pyenv/versions/3.9.0/lib/python3.9/site-packages/xlrd/xlsx.py", line 812, in open_workbook_2007_xml
    x12book.process_stream(zflo, 'Workbook')
  File "/Users/ZG/.pyenv/versions/3.9.0/lib/python3.9/site-packages/xlrd/xlsx.py", line 266, in process_stream
    for elem in self.tree.iter() if Element_has_iter else self.tree.getiterator():
AttributeError: 'ElementTree' object has no attribute 'getiterator'

> @zilbert97 Do you know which engine pandas is using for your file? If it is xlrd, #28547 seems related:

It looks like it is xlrd that is being used - thanks for linking the issue!

@twoertwein
Member

Does it work with the openpyxl engine?

@twoertwein twoertwein added the IO Excel read_excel, to_excel label Nov 13, 2020
@AllanChain

AllanChain commented Nov 13, 2020

Same problem here. openpyxl works fine. I think it's time to make openpyxl the default 😄 Typing engine='openpyxl' every time you read an Excel file is quite verbose.
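
Until the default changes, one way to avoid repeating the argument is to pre-fill it once. This is only a sketch: make_reader is a hypothetical helper name, not part of pandas, and it assumes the function passed in accepts an engine keyword the way pd.read_excel does.

```python
from functools import partial


def make_reader(read_excel, engine="openpyxl"):
    """Return read_excel with `engine` pre-filled.

    `make_reader` is a hypothetical helper, not a pandas function.
    Callers can still pass engine=... explicitly to override the default.
    """
    return partial(read_excel, engine=engine)


# Intended use, assuming pandas and openpyxl are installed:
#   import pandas as pd
#   read_xlsx = make_reader(pd.read_excel)
#   df = read_xlsx('ZipZag Trolley Monitoring.xlsx')
```

Because functools.partial lets call-time keywords win, read_xlsx(path, engine='xlrd') would still select xlrd for a legacy .xls file.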

@jreback
Contributor

jreback commented Nov 13, 2020

There is a PR outstanding to do this; it just needs some TLC.

@ghost

ghost commented Nov 20, 2020

Also interesting is that even if openpyxl is installed, and xlrd is not installed, I still get the following ImportError if I don't specify the engine:

ImportError: Missing optional dependency 'xlrd'. Install xlrd >= 1.0.0 for Excel support Use pip or conda to install xlrd.
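
A possible workaround for this case is to check which engines are importable and pass one explicitly, rather than relying on the default. The sketch below uses only the standard library; installed_excel_engines and choose_engine are hypothetical helper names, and the preference order is an assumption.

```python
from importlib.util import find_spec


def installed_excel_engines(candidates=("openpyxl", "xlrd")):
    """Return the candidate module names that can be imported here."""
    return [name for name in candidates if find_spec(name) is not None]


def choose_engine(preferred=("openpyxl", "xlrd")):
    """Pick the first importable engine, in the given order of preference.

    Hypothetical helper: the order is an assumption, not pandas behaviour.
    """
    available = installed_excel_engines(preferred)
    if not available:
        raise ImportError("None of these Excel engines is installed: %s"
                          % ", ".join(preferred))
    return available[0]


# Intended use, assuming pandas is installed:
#   import pandas as pd
#   df = pd.read_excel(excel_filename, engine=choose_engine())
```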

@twoertwein
Member

#35029 was just merged: the default engine is in most cases now openpyxl and it is acknowledged that xlrd does not work with python>=3.9.

@zhangyiay

Please refer to https://blog.csdn.net/suhao0911/article/details/110950742

@Svtter

Svtter commented Jun 23, 2021

> #35029 was just merged: the default engine is in most cases now openpyxl and it is acknowledged that xlrd does not work with python>=3.9.

That might not be right. Python==3.9.5 and xlrd==1.2.0 work well on my PC.
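
Since reports differ between machines, it may help to post the exact versions alongside a result. The following is a rough sketch using the standard library; environment_report is a hypothetical helper, and the package list is an assumption about what might matter here (defusedxml is included only as a guess that an optional XML backend could explain the difference).

```python
import sys
from importlib import metadata


def environment_report(packages=("pandas", "xlrd", "openpyxl", "defusedxml")):
    """Collect the Python version and installed versions of some packages.

    Hypothetical helper; the default package list is an assumption.
    Packages that are not installed are reported as None.
    """
    report = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, version in environment_report().items():
        print(name, version)
```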
