New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

save_as from xlsx to csv: error array index out of range #20

Closed
Rehgan opened this Issue Jan 26, 2016 · 16 comments

Comments

Projects
None yet
3 participants
@Rehgan

Rehgan commented Jan 26, 2016

#!/usr/bin/env python
import pyexcel
import os
from pyexcel.ext import xlsx
from pyexcel.ext import xls

pyexcel.save_as(file_name="Test2.xlsx", dest_file_name="Test2.csv")

When I tried to use the code above to save the attached file as a csv file in python 3, the following error occurred:
IndexError: array index out of range
Test2.xlsx

@rlatowicz

This comment has been minimized.

rlatowicz commented Jan 26, 2016

Hi,
There is something about the xlsx file that is causing the error.
I made a fresh one and the error is gone.

new xlsx:
Test2_edited.xlsx

output csv:
output.zip

@chfw

This comment has been minimized.

Member

chfw commented Jan 26, 2016

For Test2.xlsx, it seems to work with latest tool setup:

>>> import pyexcel, pyexcel.ext.xlsx
>>> pyexcel.save_as(file_name="Test2.xlsx", dest_file_name="Test3.csv")
/../python2.7/site-packages/openpyxl/workbook/names/named_range.py:121: UserWarning: Discarded range with reserved name
  warnings.warn("Discarded range with reserved name")

Here's what I got:

cat Test3.csv 
Run Date,Network,,Day,,Length,,Time,Program,,,,,,Copy,,,,,,,Amount
9/1/2015,N/A,,Tue,,15    ,,19:13,N/A,,,,,,N/A,,,,,,,20.00

Here's my pip packages:

pyexcel==0.2.0
pyexcel-io==0.1.0
pyexcel-xlsx==0.1.0

For Test2_edited.xlsx, it was better:

>>> import pyexcel,pyexcel.ext.xlsx
>>> pyexcel.save_as(file_name="Test2_edited.xlsx", dest_file_name="test4.csv")
>>> 

$ cat test4.csv 
Run Date,Network,,Day,,Length,,Time,Program,,,,,,Copy,,,,,,,Amount
9/1/2015,N/A,,Tue,,15    ,,19:13,N/A,,,,,,N/A,,,,,,,20.00

By the way, for xlsx, you need not to import pyexcel.ext.xls.

@chfw chfw closed this Jan 26, 2016

@rlatowicz

This comment has been minimized.

rlatowicz commented Jan 26, 2016

Hi chfw,
Thanks again for pyexcel.

I updated all the pyexcel packages,
pyexcel==0.2.0
pyexcel-io==0.1.0
pyexcel-xls==0.1.0
pyexcel-xlsx==0.1.0

It should be noted that using the original code and the original xlsx file,

#!/usr/bin/env python
import pyexcel
import os
from pyexcel.ext import xlsx
from pyexcel.ext import xls

pyexcel.save_as(file_name="Test2.xlsx", dest_file_name="Test2.csv")

we still get

Traceback (most recent call last):
  File "pyex_001.py", line 7, in <module>
    pyexcel.save_as(file_name="Test2.xlsx", dest_file_name="Test2.csv")
  File "D:\Tools\Python34\lib\site-packages\pyexcel\sources\__init__.py", line 318, in save_as
    sheet.save_to(dest_source)
  File "D:\Tools\Python34\lib\site-packages\pyexcel\sheets\sheet.py", line 20, in save_to
    source.write_data(self)
  File "D:\Tools\Python34\lib\site-packages\pyexcel\sources\file.py", line 80, in write_data
    **self.keywords)
  File "D:\Tools\Python34\lib\site-packages\pyexcel_io\__init__.py", line 296, in save_data
    **keywords)
  File "D:\Tools\Python34\lib\site-packages\pyexcel_io\__init__.py", line 260, in store_data
    writer.write(data)
  File "D:\Tools\Python34\lib\site-packages\pyexcel_io\base.py", line 290, in write
    sheet.write_array(sheet_dicts[name])
  File "D:\Tools\Python34\lib\site-packages\pyexcel_io\base.py", line 252, in write_array
    for r in table:
  File "D:\Tools\Python34\lib\site-packages\pyexcel_io\base.py", line 100, in to_array
    cell_value = self.cell_value(r, c)
  File "D:\Tools\Python34\lib\site-packages\pyexcel_xls\__init__.py", line 99, in cell_value
    cell_type = self.native_sheet.cell_type(row, column)
  File "D:\Tools\Python34\lib\site-packages\xlrd\sheet.py", line 413, in cell_type
    return self._cell_types[rowx][colx]
IndexError: array index out of range

then, excluding the import, as you indicated,

#!/usr/bin/env python
import pyexcel
import os
from pyexcel.ext import xlsx

pyexcel.save_as(file_name="Test2.xlsx", dest_file_name="Test2.csv")

the output is,

D:\Tools\Python34\lib\site-packages\openpyxl-2.2.2-py3.4.egg\openpyxl\workbook\names\named_range.py:121: UserWarning: Discarded range with reserved name
  warnings.warn("Discarded range with reserved name")

and, despite the warning, the output is OK!.

So it appears that including the import,

from pyexcel.ext import xls

pyexcel.xlx lib tries to process the xlsx file and ultimately calls xlrd and that is when the error occurs.
Without the import, the processing of the xlsx is handled by openpyxl and it successfully produces the output but with a warning.

@chfw

This comment has been minimized.

Member

chfw commented Jan 26, 2016

Aha.. python 3. I haven't get one at hand. Will try it later. Looking at the tracback, it is related xlrd.

Can you try the following code instead?

#!/usr/bin/env python
import pyexcel
import os
from pyexcel.ext import xlsx
# from pyexcel.ext import xls <-- 

pyexcel.save_as(file_name="Test2.xlsx", dest_file_name="Test2.csv")

because pyexcel.ext.xls can read xlsx, as well as pyexcel.ext.xlsx, which uses openpyxl.

@chfw chfw reopened this Jan 26, 2016

@rlatowicz

This comment has been minimized.

rlatowicz commented Jan 26, 2016

Yes, xlrd can't handle that file.

Python3 isn't the issue.

@chfw

This comment has been minimized.

Member

chfw commented Jan 26, 2016

OK, I read the following part of your response. openpyxl is OK

@chfw

This comment has been minimized.

Member

chfw commented Jan 26, 2016

If you want to keep xls support along with xlsx, you can do this:

import pyexcel
import os
from pyexcel.ext import xls # set reader for xls, xlsx, and writer for xls
from pyexcel.ext import xlsx # set again reader for xlsx, and writer for xslx, 

...

So the second import will overwrite the first one. I will try to address this problem in future version of pyexcel-io v0.2.0.

@rlatowicz

This comment has been minimized.

rlatowicz commented Jan 26, 2016

As you have pointed out, no problem with openpyxl.

The problem is caused by including the import of pyexcel-xls after pyexcel-xlsx.
That caused pyexcel to use xlrd instead of openpyxl.

@rlatowicz

This comment has been minimized.

rlatowicz commented Jan 26, 2016

The real problem is that original xlsx file.

xlrd handles the 2nd one without issue.

@rlatowicz

This comment has been minimized.

rlatowicz commented Jan 26, 2016

The point is that openpyxl handles it but with a warning and xlrd can't handle it at all.

The spreadsheet is at fault. I wouldn't change pyexcel for this one off occurance.

@chfw

This comment has been minimized.

Member

chfw commented Jan 26, 2016

Can we conclude that the issue is with xlrd? Hence, probably we shall contact xlrd via this issue tacker

@rlatowicz

This comment has been minimized.

rlatowicz commented Jan 26, 2016

Agreed.

@Rehgan

This comment has been minimized.

Rehgan commented Jan 27, 2016

Thank you everyone for looking into it! My full script has to be able to handle both xls and xlsx files from clients which is why they were both in there. I appreciate the fix, switching the order of the imports was successful!

@chfw

This comment has been minimized.

Member

chfw commented Jan 28, 2016

Issue was analyzed, root cause found and error was filed. Hence closing this issue. Please follow xlrd issue #167 for further info.

@chfw chfw closed this Jan 28, 2016

chfw added a commit that referenced this issue May 24, 2016

@chfw

This comment has been minimized.

Member

chfw commented May 24, 2016

@Rehgan, Here is the sample code to use 'library' parameter

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment