ENH: update feather IO for pyarrow 0.17 / Feather V2 #33422

jorisvandenbossche · 2020-04-09T14:19:31Z

The upcoming pyarrow 0.17 release will include an upgraded Feather format.

This PR updates pandas for that. More specifically, it ensures the new keywords can be passed through (the basics should keep working out of the box, since the public API did not change) and makes a small update to the tests.

jorisvandenbossche · 2020-04-09T14:20:54Z

pandas/tests/io/test_feather.py

@@ -102,8 +104,8 @@ def test_read_columns(self):

    def test_unsupported_other(self):

-        # period
-        df = pd.DataFrame({"a": pd.period_range("2013", freq="M", periods=3)})


Since Feather now maps exactly to the Arrow memory format, periods are now supported (Period is supported in the pandas -> pyarrow.Table conversion).

that answers my question above, never mind

TomAugspurger

Needs a whatsnew note, and a small comment on the docs in io.rst. LGTM otherwise.

TomAugspurger · 2020-04-09T14:52:18Z

doc/source/user_guide/io.rst

 * The format will NOT write an ``Index``, or ``MultiIndex`` for the
  ``DataFrame`` and will raise an error if a non-default one is provided. You
  can ``.reset_index()`` to store the index or ``.reset_index(drop=True)`` to
  ignore it.
 * Duplicate column names and non-string columns names are not supported
-* Non supported types include ``Period`` and actual Python object types. These will raise a helpful error message
+* Non supported types actual Python object types. These will raise a helpful error message


Missing a word here. Maybe

Suggested change

-* Non supported types actual Python object types. These will raise a helpful error message
+* object-dtype columns are not supported. This will raise with a helpful error message

for my edification, does this mean that PeriodIndex or Series[Period] is supported? If so, is that a change from the older version?

jbrockmendel · 2020-04-09T15:08:16Z

pandas/core/frame.py

@@ -2058,18 +2058,24 @@ def to_stata(
        writer.write_file()

    @deprecate_kwarg(old_arg_name="fname", new_arg_name="path")
-    def to_feather(self, path) -> None:
+    def to_feather(self, path, **kwargs) -> None:


any reason not to make these explicit?


The accepted keywords might change with the pyarrow version, so we would need to update pandas each time other keywords get added. Passing through kwargs makes this more "future-robust".

But I could make the current ones explicit. However, that also means we would need to check the pyarrow version to give a nice error message saying which keyword is not yet supported with older pyarrow versions (which is of course not that difficult).
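
For illustration, the version check mentioned in this reply could look roughly like the sketch below. Everything in it is hypothetical: the helper names, the keyword table, and the assumption that all four keywords arrived in pyarrow 0.17 are for illustration only and are not part of pandas or of this PR, which passes `**kwargs` through unchanged.

```python
# Hypothetical table: keyword -> minimum pyarrow version assumed to accept it.
MIN_PYARROW_FOR_KEYWORD = {
    "compression": (0, 17),
    "compression_level": (0, 17),
    "chunksize": (0, 17),
    "version": (0, 17),
}


def parse_version(text):
    """Turn '0.17.1' into (0, 17, 1); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def check_feather_kwargs(kwargs, pyarrow_version):
    """Raise a readable error for keywords the installed pyarrow is too old for."""
    installed = parse_version(pyarrow_version)
    for name in kwargs:
        required = MIN_PYARROW_FOR_KEYWORD.get(name)
        if required is None:
            continue  # unknown keyword: leave the decision to pyarrow itself
        if installed < required:
            needed = ".".join(str(n) for n in required)
            raise ValueError(
                f"The '{name}' keyword requires pyarrow >= {needed}, "
                f"but pyarrow {pyarrow_version} is installed."
            )


# Accepted: the installed version is new enough.
check_feather_kwargs({"compression": "zstd"}, "0.17.0")
```

The cost of the explicit route is visible here: the table has to be maintained by hand whenever pyarrow adds a keyword, which is the maintenance burden the pass-through avoids.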

jreback

lgtm. maybe a whatsnew note, otherwise merge when ready.

noklam · 2020-06-21T15:33:27Z

Hi, is Feather V2 now supported by pandas? It seems to be tagged for 1.0.4, but I cannot find it in the release notes. Thanks!

The current to_feather() in pandas 1.0.5 also does not seem to support the compression that was introduced in Feather V2.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_feather.html

jreback · 2020-06-21T16:49:14Z

this is in 1.1

jorisvandenbossche · 2020-06-21T17:13:17Z

@noklam Feather V2 is already supported by pandas 1.0.4, as long as you have pyarrow>=0.17 installed.
But only with the default behaviour, so without additional keywords (that is what this PR enables: correctly passing through keywords such as compression).

noklam · 2020-06-21T17:25:34Z

Thanks! Got it. Looking forward to release 1.1 😀

ENH: update feather IO for pyarrow 0.17 / Feather V2

87ac6bc

jorisvandenbossche added the IO Parquet parquet, feather label Apr 9, 2020

jorisvandenbossche added this to the 1.1 milestone Apr 9, 2020

jorisvandenbossche commented Apr 9, 2020


add period/timedelta to tests

0b61a79

TomAugspurger reviewed Apr 9, 2020


jbrockmendel reviewed Apr 9, 2020


jreback approved these changes Apr 9, 2020


update io.rst + add whatsnew

a42ad0f

jorisvandenbossche merged commit a00202d into pandas-dev:master Apr 10, 2020

jorisvandenbossche deleted the feather-update branch April 10, 2020 12:41

jorisvandenbossche mentioned this pull request Apr 27, 2020

TST: add Feather V2 round-trip test #33810

Closed

This was referenced May 4, 2020

RLS: 1.0.4 #33300

Closed

CI: test_unsupported_other fails on pyarrow 0.17 #33990

Merged
