COMPAT: pyarrow >= 0.7.0 compat #17588

jreback · 2017-09-19T10:01:34Z

codecov · 2017-09-19T10:32:18Z

Codecov Report

Merging #17588 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #17588      +/-   ##
==========================================
- Coverage   91.22%    91.2%   -0.02%     
==========================================
  Files         163      163              
  Lines       49625    49625              
==========================================
- Hits        45270    45261       -9     
- Misses       4355     4364       +9

Flag	Coverage Δ
#multiple	`88.99% <ø> (ø)`	⬆️
#single	`40.19% <ø> (-0.07%)`	⬇️

Impacted Files	Coverage Δ
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/core/frame.py	`97.77% <0%> (-0.1%)`	⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0e85ca7...5bc3def. Read the comment docs.

jreback · 2017-09-19T10:34:53Z

In [13]: import pyarrow

In [14]: pyarrow.__version__
Out[14]: '0.7.0'

In [15]: df = pd.DataFrame({'a': pd.Categorical(list('abc'))})

In [16]: df.dtypes
Out[16]: 
a    category
dtype: object

In [17]: df.to_parquet('foo.pq', engine='pyarrow')

In [18]: pd.read_parquet('foo.pq', engine='pyarrow')
Out[18]: 
   a
0  a
1  b
2  c

In [19]: pd.read_parquet('foo.pq', engine='pyarrow').dtypes
Out[19]: 
a    object
dtype: object

@wesm
de-serializing does not seem to preserve the categorical dtype?

wesm · 2017-09-19T10:58:13Z

Correct, it comes back as non-categorical. Parquet does not have a categorical type. We can try to emulate it as best we can (not foolproof; when dictionaries grow too big, encoding is turned off) but it will take some more work in parquet-cpp.
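Until that lands, one possible workaround on the pandas side is to cast the affected columns back after reading. This is a minimal sketch, not part of this PR: `restore_categoricals` is a hypothetical helper, and the `roundtripped` frame is a stand-in for what `pd.read_parquet('foo.pq', engine='pyarrow')` returned in the session above.

```python
import pandas as pd


def restore_categoricals(df, columns):
    """Return a copy of df with the named columns cast back to category.

    Hypothetical helper for illustration; the caller has to know which
    columns were categorical before the frame was written.
    """
    out = df.copy()
    for col in columns:
        out[col] = out[col].astype("category")
    return out


# Stand-in for the frame returned by pd.read_parquet(...), where column 'a'
# came back as a plain (non-categorical) column.
roundtripped = pd.DataFrame({"a": list("abc")})

restored = restore_categoricals(roundtripped, ["a"])
print(restored.dtypes)
```

The categories are re-inferred from the values that are present, so unused categories and any explicit ordering of the original column are not recovered this way.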

jreback · 2017-09-19T11:08:37Z

no need to emulate. will make a note in the docs.

closes pandas-dev#17581

wesm · 2017-09-19T13:15:24Z

We're going to support direct column reads as categorical soon hopefully, which will use the dictionary page if there is one, but that has some edge cases:

- Dictionaries generally will be different from file to file
- Dictionaries may be different within each row group
- A column chunk may switch to plain encoding mid-stream (if the dictionary got too big)
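
To illustrate why differing dictionaries matter, here is a rough pandas-only sketch of what combining such chunks involves. It does not show how pyarrow or parquet-cpp handle this; the chunk variables are made-up stand-ins for column data read from different files or row groups, and `union_categoricals` is the existing pandas helper for merging categoricals whose categories differ.

```python
import pandas as pd
from pandas.api.types import union_categoricals

# Stand-ins for the same column read from two files (or row groups),
# each carrying its own dictionary.
chunk1 = pd.Categorical(["a", "b", "a"])
chunk2 = pd.Categorical(["c", "b"])

# Stand-in for a chunk that fell back to plain encoding: there is no
# dictionary, so one has to be built from the values first.
plain_chunk = ["d", "a"]
chunk3 = pd.Categorical(plain_chunk)

# The combined categories are the union of the per-chunk categories.
combined = union_categoricals([chunk1, chunk2, chunk3])
print(list(combined.categories))  # ['a', 'b', 'c', 'd']
print(list(combined))             # ['a', 'b', 'a', 'c', 'b', 'd', 'a']
```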


jreback added the Compat and IO Parquet labels Sep 19, 2017

jreback added this to the 0.21.0 milestone Sep 19, 2017

jreback force-pushed the pyarrow branch from 5990aaa to 5bc3def Compare September 19, 2017 10:31

COMPAT: pyarrow >= 0.7.0 compat

0c75389

closes pandas-dev#17581

jreback force-pushed the pyarrow branch from 5bc3def to 0c75389 Compare September 19, 2017 11:12

jreback merged commit 6630c4e into pandas-dev:master Sep 19, 2017

gfyoung mentioned this pull request Sep 20, 2017

ENH: GH17054: read_html() handles rowspan/colspan and infers headers #17089

Closed

4 tasks

alanbato pushed a commit to alanbato/pandas that referenced this pull request Nov 10, 2017

COMPAT: pyarrow >= 0.7.0 compat (pandas-dev#17588)

2e09d71

closes pandas-dev#17581

No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017

COMPAT: pyarrow >= 0.7.0 compat (pandas-dev#17588)

8e50b03

closes pandas-dev#17581

This pull request was closed.
