-
-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: XLSB support #29836
ENH: XLSB support #29836
Conversation
to get the CI to pass, can you run |
@jbrockmendel thanks for the tip! Did it alphabetically but apparently there's some kind of convention. The documentation not compiling looks like a fun bug. Will try and see if I can reproduce and squash it tomorrow. |
Yah, isort has its own conventions and then we have some configuration in setup.cfg, not sure what the problem was here, but looks resolved now, thanks.
That'd be great, its affecting a lot of PRs right now |
Haven't been able to reproduce the bug on my computer, but I did do the necessary modifications to the test suite. Currently it's failing a lot of tests because |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice PR. I think the limitation you mentioned is OK to start - it is something that would need to be fixed upstream first right?
Can you add a whatsnew note for 1.0.0 and update the user guide?
https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#excel-files
A bunch of tests are not passing because (seems like) |
Found it, there was an addiional fixture I had missed All of the failing tests look normal, having to do with not recognizing datetimes. There's two which have to do with a column header not being converted to "Unnamed: 0", but rather to "None". Probably something with types |
Alright, seems like most things are in order now. Commits are a bit messy but I'm not great at git yet. The one build that's failing is failing because jinja2 is doing weird stuff, not sure about that. |
Dont worry about it, we squash everything on merge so it wont matter before long. git is one of those learning curves that never seems to end.
Unrelated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice. Can you update the read_excel docs as well? Otherwise lgtm
"pyxlsb", | ||
marks=[ | ||
td.skip_if_no("pyxlsb"), | ||
pytest.mark.filterwarnings("ignore:.*(tree\\.iter|html argument)"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is filterwarnings actually required here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was seeing some defusedxml warnings but they're present either way, so I've removed filterwarnings.
Pushing to 1.1 |
Hey, sorry for going AWOL, holidays and stuff. Will rebase tonight. |
@jreback pinging on green (two weeks later lol) |
@Rik-de-Kort need to pytest.xfail the tests with date times that aren't supported |
Done. There's still a failing test (read from url), due to pulling the test file from the master branch of the repo, where the files won't be present till this pull request is complete. :) |
Yea I've seen that network one before. Can you xfail here and then fix in a follow up PR after this gets merged? |
Yep, done in the latest commit. I added two xfails since the test is going to fail regardless even if test1.xlsb is present in the master branch. |
@wwwiiilll just realized that because of the PEP 396 addition I might need the right version of Pyxlsb in the environment files. Currently it's 1.0.5, which doesn't have it. Will putting 1.0.6 suffice? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementation lgtm just need to fix up a few of the version things at this point
@@ -264,6 +264,7 @@ pyarrow 0.12.0 Parquet, ORC (requires 0.13.0), and | |||
pymysql 0.7.11 MySQL engine for sqlalchemy | |||
pyreadstat SPSS files (.sav) reading | |||
pytables 3.4.2 HDF5 reading / writing | |||
pyxlsb 1.0.5 Reading for xlsb files |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pyxlsb 1.0.5 Reading for xlsb files | |
pyxlsb 1.0.6 Reading for xlsb files |
@@ -19,6 +19,7 @@ | |||
"pyarrow": "0.13.0", | |||
"pytables": "3.4.2", | |||
"pytest": "5.0.1", | |||
"pyxlsb": "1.0.5", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"pyxlsb": "1.0.5", | |
"pyxlsb": "1.0.6", |
this is prob ok for 1.0.0, if you are ok with it @WillAyd |
Oh yea sure |
@Rik-de-Kort very nice PR. If you can fix up comments on version and unskip the test that fails because it requires the file on master in a follow up would be much appreciated |
Co-authored-by: Rik-de-Kort <32839123+Rik-de-Kort@users.noreply.github.com>
Am I correct in assuming no further work is needed? |
If you can clarify the min version required in a follow up (I think should be 1.0.6 not 1.0.5) that should be it |
There's an issue with backported PRs. The author is set to the bot, rather than the original git author: scientific-python/MeeseeksDev#35 |
Hey all, a moderately commonly requested feature is xlsb support. I thought I'd go ahead and make a PR for it, based on Pyxlsb. The library isn't very full-featured: datetimes are loaded in as floats without any indication they're datetimes. Would that be grounds for rejection?
Alternative would be to implement xlsb support in Openpyxl which looks like it will take a long time for someone not familiar with the file formats (as I am).
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff