Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: ImportError: Missing optional dependency 'openpyxl'. Use pip or conda to install openpyxl. #41715

Closed
ranabanik opened this issue May 29, 2021 · 18 comments
Labels
Bug Needs Info Clarification about behavior needed to assess issue Needs Triage Issue that has not been reviewed by a pandas team member

Comments

@ranabanik
Copy link

I am running the following code from Jupyter notebook.

dataDir =  r'D:\\'
files = glob(os.path.join(dataDir, '*.xlsx'))
print(files)
if os.path.isfile(files[0]):
    print('ok')
df = pd.read_excel(files[0], engine='openpyxl')

which prints:
['D:\\\\file_index_all.xlsx', 'D:\\\\file_index_all2.xlsx']
ok

But I get the following error.

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-52-5dc0ef3ce47b> in <module>
      6 if os.path.isfile(files[0]):
      7     print('ok')
----> 8 df = pd.read_excel(files[0], engine='openpyxl')

c:\users\ranab\miniconda3\lib\site-packages\pandas\io\excel\_base.py in read_excel(io, sheet_name, header, names, index_col, usecols, squeeze, dtype, engine, converters, true_values, false_values, skiprows, nrows, na_values, keep_default_na, verbose, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, mangle_dupe_cols, **kwds)
    302 
    303     if not isinstance(io, ExcelFile):
--> 304         io = ExcelFile(io, engine=engine)
    305     elif engine and engine != io.engine:
    306         raise ValueError(

c:\users\ranab\miniconda3\lib\site-packages\pandas\io\excel\_base.py in __init__(self, io, engine)
    819         self._io = stringify_path(io)
    820 
--> 821         self._reader = self._engines[engine](self._io)
    822 
    823     def __fspath__(self):

c:\users\ranab\miniconda3\lib\site-packages\pandas\io\excel\_openpyxl.py in __init__(self, filepath_or_buffer)
    482             Object to be parsed.
    483         """
--> 484         import_optional_dependency("openpyxl")
    485         super().__init__(filepath_or_buffer)
    486 

c:\users\ranab\miniconda3\lib\site-packages\pandas\compat\_optional.py in import_optional_dependency(name, extra, raise_on_missing, on_version)
     90     except ImportError:
     91         if raise_on_missing:
---> 92             raise ImportError(msg) from None
     93         else:
     94             return None

ImportError: Missing optional dependency 'openpyxl'.  Use pip or conda to install openpyxl.

pandas version:
pd.__version__
1.0.1
Also one other thing I noticed despite having openpyxl 3.0.7 I can't import it in jupyter notebook but in pycharm

import openpyxl
print(openpyxl.__version__)

shows 3.0.7

@ranabanik ranabanik added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels May 29, 2021
@phofl
Copy link
Member

phofl commented May 29, 2021

Please show Output of pd.show_versions()

@phofl phofl added the Needs Info Clarification about behavior needed to assess issue label May 29, 2021
@ranabanik
Copy link
Author

running from Jupyter notebook

INSTALLED VERSIONS
------------------
commit           : None
python           : 3.7.4.final.0
python-bits      : 64
OS               : Windows
OS-release       : 10
machine          : AMD64
processor        : Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : None.None

pandas           : 1.0.1
numpy            : 1.18.1
pytz             : 2019.3
dateutil         : 2.8.1
pip              : 20.0.2
setuptools       : 45.2.0.post20200210
Cython           : None
pytest           : 5.3.5
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 2.11.1
IPython          : 7.13.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : None
matplotlib       : 3.1.3
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pytables         : None
pytest           : 5.3.5
pyxlsb           : None
s3fs             : None
scipy            : 1.4.1
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
xlsxwriter       : None
numba            : 0.48.0

It shows openpyxl as None. But when I run pip list from git bash:

$ pip list
Package            Version
------------------ -------------------
attrs              21.2.0
certifi            2020.12.5
decorator          5.0.9
et-xmlfile         1.1.0
importlib-metadata 4.3.1
ipython-genutils   0.2.0
jdcal              1.4.1
mkl-fft            1.3.0
mkl-random         1.1.1
mkl-service        2.3.0
numpy              1.19.2
openpyxl           3.0.7
pandas             1.1.5
pip                21.1.1
pyrsistent         0.17.3
python-dateutil    2.8.1
pytz               2021.1
pywin32            300
setuptools         52.0.0.post20210125
six                1.15.0
tornado            6.1
traitlets          4.3.3
typing-extensions  3.10.0.0
wcwidth            0.2.5
wheel              0.36.2
wincertstore       0.2
xlrd               1.0.0
zipp               3.4.1

It has openpyxl version 3.0.7 installed.

Also pycharm shows the same version too.

@phofl
Copy link
Member

phofl commented May 29, 2021

Are you using a different environment for jupyter?

@ranabanik
Copy link
Author

Hello, just to provide you more information, I ran the .ipynb file from pycharm. I get the following while trying to run a cell.
image

@phofl
Copy link
Member

phofl commented May 29, 2021

Looks like you are using different environments

@ranabanik
Copy link
Author

That's also what I get from my understanding. But I have only one env named Elucid36

base                     C:\Users\ranab\Miniconda3
Elucid36              *  C:\Users\ranab\Miniconda3\envs\Elucid36

I am initiating the jupyter notebook from that env. How does it change to different env.

@phofl
Copy link
Member

phofl commented May 29, 2021

You have two envs, base and the other one.

@ranabanik
Copy link
Author

I am running the notebook as following in git bash:

(Elucid36)ranab@banikr MINGW64 ~/PycharmProjects/Elucid (master)
$ jupyter notebook
[I 12:09:13.514 NotebookApp] Serving notebooks from local directory: C:\Users\ranab\PycharmProjects\Elucid

Yeah base and the other. But I conda activate Elucid36 before running notebook in that.

@phofl
Copy link
Member

phofl commented May 29, 2021

And what is active in Pycharm? can you run pd.show_versions() for both please?

@ranabanik
Copy link
Author

This is the pd.show_versions() from pycharm:

INSTALLED VERSIONS
------------------
commit           : b5958ee1999e9aead1938c0bba2b674378807b3d
python           : 3.6.13.final.0
python-bits      : 64
OS               : Windows
OS-release       : 10
Version          : 10.0.17763
machine          : AMD64
processor        : Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : None.None

pandas           : 1.1.5
numpy            : 1.19.2
pytz             : 2021.1
dateutil         : 2.8.1
pip              : 21.1.1
setuptools       : 52.0.0.post20210125
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : None
IPython          : None
pandas_datareader: None
bs4              : None
bottleneck       : None
fsspec           : None
fastparquet      : None
gcsfs            : None
matplotlib       : None
numexpr          : None
odfpy            : None
openpyxl         : 3.0.7
pandas_gbq       : None
pyarrow          : None
pytables         : None
pyxlsb           : None
s3fs             : None
scipy            : None
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : 1.0.0
xlwt             : None
numba            : None
None

It has openpyxl version 3.0.7 which is None in notebook(I just checked openpyxl package only)

@ranabanik
Copy link
Author

python and pandas versions are also different. This never happened to me. Looks like both IDEs operating in different environments despite I started notebook one from the same env.

@phofl
Copy link
Member

phofl commented May 29, 2021

Can close here then?

@ranabanik
Copy link
Author

ok. Thanks for the help but the problem persists.

@phofl
Copy link
Member

phofl commented May 29, 2021

Yeah but I don't think this is on the pandas side, isn't it?

@ranabanik
Copy link
Author

To some extent, I think pandas removal of xlrd engine in new versions resulting in these types of errors. I have checked a bunch of issues. I am not entirely sure it is only an 'env' issue. But anyway I will try removing all the env and run the same checks.

@phofl
Copy link
Member

phofl commented May 29, 2021

Yeah but you haven‘t installed openpyxl in the env, so nothing we can do about from our side

@ArthurHowardMorris
Copy link

ArthurHowardMorris commented Mar 21, 2024

@phofl: I was able to replicate a similar error when answering this stack overflow question: https://stackoverflow.com/questions/78154244/unable-to-update-to-latest-openpyxl-version-3-1-or-later-using-anaconda-prompt/78197620#78197620

The issue in this case appears to be that:
conda create -n demo python=3.11 anaconda installs openpyxl 3.0.10, so I needed to follow with: conda install openpyxl=3.1.0 in order to correct the behavior, even though I had 'installed openpyxl in the env'.

Note: I was answering the question, because I was having the same problem :)

@ArthurHowardMorris
Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Needs Info Clarification about behavior needed to assess issue Needs Triage Issue that has not been reviewed by a pandas team member
Projects
None yet
Development

No branches or pull requests

3 participants