ENH: XLSB support #29836

Rik-de-Kort · 2019-11-25T16:26:33Z

Hey all, a moderately commonly requested feature is xlsb support. I thought I'd go ahead and make a PR for it, based on Pyxlsb. The library isn't very full-featured: datetimes are loaded in as floats without any indication they're datetimes. Would that be grounds for rejection?

The alternative would be to implement xlsb support in Openpyxl, which looks like it would take a long time for someone not familiar with the file formats (like me).

closes Enhancement: XLSB support in read_excel() #8540
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry
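
On the datetime limitation mentioned above: a minimal sketch of how a caller could convert the float values back into datetimes after loading. It assumes the workbook uses Excel's 1900 date system; the helper name `excel_serial_to_datetime` is hypothetical and is not part of pandas or Pyxlsb.

```python
from datetime import datetime, timedelta

# Assumption: the workbook uses Excel's 1900 date system. Counting from
# 1899-12-30 compensates for Excel's fictitious 1900-02-29, so the result
# is only exact for serial numbers >= 61 (1900-03-01 onward).
EXCEL_EPOCH = datetime(1899, 12, 30)


def excel_serial_to_datetime(serial: float) -> datetime:
    """Convert an Excel serial date (days since the epoch) to a datetime."""
    # The integer part is whole days, the fractional part is time of day.
    return EXCEL_EPOCH + timedelta(days=serial)


print(excel_serial_to_datetime(43831.5))  # 2020-01-01 12:00:00
```

The caller still has to know which columns hold dates, since the floats themselves carry no such marker.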

jbrockmendel · 2019-11-25T18:10:56Z

To get the CI to pass, can you run isort pandas/io/excel/_pyxlsb.py?

Rik-de-Kort · 2019-11-25T22:39:10Z

@jbrockmendel thanks for the tip! Did it alphabetically but apparently there's some kind of convention.

The documentation not compiling looks like a fun bug. Will try and see if I can reproduce and squash it tomorrow.

jbrockmendel · 2019-11-25T22:40:41Z

> Did it alphabetically but apparently there's some kind of convention.

Yah, isort has its own conventions and then we have some configuration in setup.cfg, not sure what the problem was here, but looks resolved now, thanks.

> The documentation not compiling looks like a fun bug. Will try and see if I can reproduce and squash it tomorrow.

That'd be great, it's affecting a lot of PRs right now.

Rik-de-Kort · 2019-11-26T19:37:30Z

Haven't been able to reproduce the bug on my computer, but I did do the necessary modifications to the test suite. Currently it's failing a lot of tests because ExcelFile appears to get called with no engine and that raises an XLRDError. Any help here? I'm not sure ExcelFile should get called at all with no engine and an .xlsb-file, since I added the appropriate checks near the beginning of the fixture.

WillAyd

Nice PR. I think the limitation you mentioned is OK to start - it is something that would need to be fixed upstream first right?

Can you add a whatsnew note for 1.0.0 and update the user guide?

https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#excel-files

pandas/io/excel/_pyxlsb.py

Rik-de-Kort · 2019-11-28T17:32:05Z

A bunch of tests are not passing because (it seems) pd.ExcelFile doesn't get patched to use the pyxlsb engine. Can anyone see what's going on with that? I really don't know a whole lot about the way pytest works. I would have expected the cd_and_set_engine fixture to be used for every test in the class...

pandas/tests/io/excel/test_readers.py

Rik-de-Kort · 2019-11-30T14:31:06Z

Found it: there was an additional fixture where I had missed adding the read_ext == ".xlsb" and engine != "pyxlsb" line. Thanks for the help as well, @WillAyd!

All of the failing tests look normal, having to do with not recognizing datetimes. There are two which have to do with a column header not being converted to "Unnamed: 0", but rather to "None". Probably something with types.
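
The engine/extension check described above can be sketched as a small predicate. This is an illustration only: the function name `should_skip` is hypothetical, and the real fixtures in test_readers.py may be structured differently.

```python
from typing import Optional


def should_skip(engine: Optional[str], read_ext: str) -> bool:
    """Return True for engine/extension pairs that should not be tested together.

    Assumption for this sketch: pyxlsb only reads .xlsb files, and .xlsb
    files can only be read by pyxlsb (so engine=None is skipped as well).
    """
    if engine == "pyxlsb" and read_ext != ".xlsb":
        return True
    if read_ext == ".xlsb" and engine != "pyxlsb":
        return True
    return False


print(should_skip(None, ".xlsb"))      # True
print(should_skip("pyxlsb", ".xlsb"))  # False
```

Every fixture that parametrizes over engines and extensions would need the same check; missing it in one fixture lets the default engine be tried on an .xlsb file.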

Rik-de-Kort · 2019-12-02T23:33:58Z

Alright, seems like most things are in order now. Commits are a bit messy but I'm not great at git yet.

The one build that's failing is failing because jinja2 is doing weird stuff, not sure about that.

jbrockmendel · 2019-12-03T00:09:07Z

> Commits are a bit messy but I'm not great at git yet.

Don't worry about it, we squash everything on merge so it won't matter before long. git is one of those learning curves that never seems to end.

> The one build that's failing is failing because jinja2 is doing weird stuff, not sure about that.

Unrelated

WillAyd

Very nice. Can you update the read_excel docs as well? Otherwise lgtm

WillAyd · 2019-12-04T13:38:59Z

pandas/tests/io/excel/test_readers.py

+            "pyxlsb",
+            marks=[
+                td.skip_if_no("pyxlsb"),
+                pytest.mark.filterwarnings("ignore:.*(tree\\.iter|html argument)"),


Is filterwarnings actually required here?

Rik-de-Kort · 2019-12-06

Was seeing some defusedxml warnings but they're present either way, so I've removed filterwarnings.

TomAugspurger · 2020-01-09T15:14:02Z

Pushing to 1.1

Rik-de-Kort · 2020-01-14T08:57:37Z

Hey, sorry for going AWOL, holidays and stuff. Will rebase tonight.

Rik-de-Kort · 2020-01-15T08:30:14Z

@jreback pinging on green (two weeks later lol)

WillAyd · 2020-01-15T16:28:04Z

@Rik-de-Kort need to pytest.xfail the tests with date times that aren't supported

Rik-de-Kort · 2020-01-17T11:40:51Z

Done. There's still a failing test (read from url), due to pulling the test file from the master branch of the repo, where the files won't be present till this pull request is complete. :)

WillAyd · 2020-01-17T21:20:18Z

Yea I've seen that network one before. Can you xfail here and then fix in a follow up PR after this gets merged?

Rik-de-Kort · 2020-01-18T10:31:31Z

> Yea I've seen that network one before. Can you xfail here and then fix in a follow up PR after this gets merged?

Yep, done in the latest commit. I added two xfails since the test is going to fail regardless, even if test1.xlsb is present in the master branch.

Rik-de-Kort · 2020-01-18T10:41:19Z

@wwwiiilll just realized that because of the PEP 396 addition I might need the right version of Pyxlsb in the environment files. Currently it's 1.0.5, which doesn't have it. Will putting 1.0.6 suffice?
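
To illustrate why the minimum matters: a minimum-version check of this kind depends on the module exposing a version string (the `__version__` attribute from PEP 396). Below is a minimal sketch of such a comparison. The helper names are hypothetical, this is not the code in pandas/compat/_optional.py, and it only handles purely numeric dotted versions.

```python
def version_tuple(version: str) -> tuple:
    """Split a dotted numeric version such as '1.0.6' into (1, 0, 6)."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True if the installed version is at least the minimum."""
    # Tuples compare element by element, so (1, 0, 5) < (1, 0, 6).
    return version_tuple(installed) >= version_tuple(minimum)


print(meets_minimum("1.0.5", "1.0.6"))  # False
print(meets_minimum("1.0.6", "1.0.6"))  # True
```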

WillAyd

Implementation lgtm, just need to fix up a few of the version things at this point.

WillAyd · 2020-01-20T17:44:16Z

doc/source/getting_started/install.rst

@@ -264,6 +264,7 @@ pyarrow                   0.12.0             Parquet, ORC (requires 0.13.0), and
 pymysql                   0.7.11             MySQL engine for sqlalchemy
 pyreadstat                                   SPSS files (.sav) reading
 pytables                  3.4.2              HDF5 reading / writing
+pyxlsb                    1.0.5              Reading for xlsb files


Suggested change

-pyxlsb                    1.0.5              Reading for xlsb files
+pyxlsb                    1.0.6              Reading for xlsb files

WillAyd · 2020-01-20T17:45:03Z

pandas/compat/_optional.py

@@ -19,6 +19,7 @@
    "pyarrow": "0.13.0",
    "pytables": "3.4.2",
    "pytest": "5.0.1",
+    "pyxlsb": "1.0.5",


Suggested change

-    "pyxlsb": "1.0.5",
+    "pyxlsb": "1.0.6",

pandas/tests/io/excel/test_readers.py

jreback · 2020-01-20T23:46:56Z

this is prob ok for 1.0.0, if you are ok with it @WillAyd

WillAyd · 2020-01-20T23:48:43Z

Oh yea sure

WillAyd · 2020-01-20T23:49:56Z

@Rik-de-Kort very nice PR. If you can fix up the comments on version and unskip the test that fails because it requires the file on master in a follow-up, that would be much appreciated.

Co-authored-by: Rik-de-Kort <32839123+Rik-de-Kort@users.noreply.github.com>

Rik-de-Kort · 2020-01-21T16:30:31Z

Am I correct in assuming no further work is needed?

WillAyd · 2020-01-21T16:31:17Z

If you can clarify the min version required in a follow-up (I think it should be 1.0.6, not 1.0.5), that should be it.

TomAugspurger · 2020-02-03T12:12:35Z

There's an issue with backported PRs. The author is set to the bot, rather than the original git author: scientific-python/MeeseeksDev#35

initial xlsb support

a4f2d22

Import order fix for CI pass

62564cf

Initial tests

a7a8460

WillAyd requested changes Nov 26, 2019

pandas/io/excel/_pyxlsb.py

WillAyd added the IO Excel read_excel, to_excel label Nov 26, 2019

Rik-de-Kort added 2 commits November 28, 2019 18:15

style fixes

d9be281

documentation

8bf8c78

Rik-de-Kort changed the title from "XLSB support" to "ENH: XLSB support" Nov 28, 2019

forgot place to document

cd95dce

WillAyd requested changes Nov 29, 2019

pandas/tests/io/excel/test_readers.py

Fixed test issue with XLRDError

7a7390d

Rik-de-Kort and others added 6 commits November 30, 2019 18:34

Fix for unnamed column issue

248ac12

style fix

6ea78de

line up with upstream master

44c5439

Merge branch 'master' of https://github.com/pandas-dev/pandas

92c98cd

Fix broken xlrd test

64fa6f3

get docs to build

cb276e8

Rik-de-Kort marked this pull request as ready for review December 2, 2019 23:27

Rik-de-Kort requested a review from WillAyd December 4, 2019 10:31

WillAyd requested changes Dec 4, 2019

Rik-de-Kort added 2 commits December 6, 2019 09:27

Remove warning filter

4ebcb48

Merge branch 'master' of https://github.com/Rik-de-Kort/pandas

71436a0

TomAugspurger added this to the 1.1 milestone Jan 9, 2020

Rik-de-Kort added 3 commits January 15, 2020 08:53

Merge upstream

43ab0fe

Added issue number

024492a

Updated to use .rows(sparse=False) for future compat

b424c8e

Rik-de-Kort added 2 commits January 17, 2020 10:30

Merge branch 'master' of https://github.com/pandas-dev/pandas

571489b

xfails in test_readers.py

dad4a53

xfail url loads

9b6bc9a

WillAyd requested changes Jan 20, 2020

jreback approved these changes Jan 20, 2020

WillAyd modified the milestones: 1.1, 1.0.0 Jan 20, 2020

WillAyd merged commit cdffa43 into pandas-dev:master Jan 20, 2020

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Jan 20, 2020

Backport PR pandas-dev#29836: ENH: XLSB support

33ec9d5

meeseeksmachine mentioned this pull request Jan 20, 2020

Backport PR #29836 on branch 1.0.x (ENH: XLSB support) #31166

Merged

simonjayhawkins pushed a commit that referenced this pull request Jan 21, 2020

Backport PR #29836: ENH: XLSB support (#31166)

459a789

Co-authored-by: Rik-de-Kort <32839123+Rik-de-Kort@users.noreply.github.com>

WillAyd mentioned this pull request Feb 1, 2020

Follow-up: XLSB Support #31215

Merged
