Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tablib no longer able to import Google Docs exported xlsx #465

Closed
dmosberger opened this issue Apr 28, 2020 · 8 comments · Fixed by #547
Closed

tablib no longer able to import Google Docs exported xlsx #465

dmosberger opened this issue Apr 28, 2020 · 8 comments · Fixed by #547

Comments

@dmosberger
Copy link
Contributor

tablib used to be able to read xlsx files exported from Google Docs just fine. Now it fails with InvalidDimensions if not all columns in the spreadsheet are populated with values. For example, download this spreadsheet as xlsx (also attached below) and then try to read it like this:

python3 -c "import tablib; tablib.Databook().load(open('test.xlsx', 'rb').read(), 'xlsx')"

and it'll result in an InvalidDimensions exception. The problem seems to be due to the fact that the first row has values in 2 columns, but the second row has an empty cell in the second column. I'm guessing something changed in the way Google exports the xlsx files?
test.xlsx

@dmosberger dmosberger changed the title tablib no longer able to import Google Docs exported xls tablib no longer able to import Google Docs exported xlsx Apr 28, 2020
@claudep
Copy link
Contributor

claudep commented Apr 28, 2020

Isn't this the same issue as #226 ? That was supposed to be fixed in tablib 1.1.0.
I cannot access your xlsx sample file, please put it in a non-protected location (with non-sensitive data, of course).

@dmosberger
Copy link
Contributor Author

#226 certainly looks related, but the fix seems to be specifically for csv files, not xlsx, so it doesn't look to be the same. I'm not really familiar with tablib internals though, so perhaps I'm missing something.

The test.xslx is attached to my original post, so you shouldn't need a Google account to download the test case.

@claudep
Copy link
Contributor

claudep commented Apr 29, 2020

Oh right, I'll have a look soon.

@claudep
Copy link
Contributor

claudep commented Apr 30, 2020

That might be a Google Docs pecularity. If I save the same file with LibreOffice as .xlsx, I'm getting two values ((<ReadOnlyCell 'Sheet1'.A2>, <EmptyCell>)) for the problematic row.
We may apply the same fix as for the csv format to circumvent this use case.

claudep added a commit to claudep/tablib that referenced this issue Apr 30, 2020
@dmosberger
Copy link
Contributor Author

Yes, I agree it's very likely something changed in Google Docs.

I tried commit 71ff737 and it fixes the test case, but I think there is still a problem if the first row has fewer columns than a subsequent row. See the attached test case.

python3 -c "import tablib; tablib.Databook().load(open('test2.xlsx', 'rb').read(), 'xlsx')"

still gives me an InvalidDimensions error.

test2.xlsx

@claudep
Copy link
Contributor

claudep commented Apr 30, 2020

Sure, however your latest example looks clearly wrong to me. tablib expects generally at least a full row of headers. So I don't think we can support all use cases.

@dmosberger
Copy link
Contributor Author

Hmmh, I don't know. I quite often have a fixed format table and then use a cell to the right of a row to make random comments. For example:

 Col A   Col B
 val1    val2
 val3    val4    This_is_comment_about_row_3

It'd be sad if tablib can't load such a file.

@hugovk hugovk closed this as completed in 6d097c0 May 16, 2020
@EbubekirUstalar
Copy link

@hugovk, @claudep could you add the solution to the import_set as well, you just added it to import_book.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants