from_arrow conversion ignores array slicing #668

Closed
bmschmidt opened this issue May 19, 2021 · 3 comments

@bmschmidt

Thank you for your work on this excellent project.

Are you using Python or Rust?

Python

What version of polars are you using?

0.7.16

What operating system are you using polars on?

OS X 10.14.3

Describe your bug.

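When pl.from_arrow is called on a pyarrow Table whose utf8() columns are slices of a larger array, the slice offsets are not respected: a column built from letters[1:] comes back as if it started at the beginning of the original array.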

What are the steps to reproduce the behavior?

import pyarrow as pa
import polars as pl

letters = pa.array(["A", "B", "C"])
tab = pa.table({
    'leading': letters[:-1],
    'lagging': letters[1:]
})
pl.from_arrow(tab)

Gives:

leading	lagging
str	str
"A"	"A"
"B"	"B"

while tab.to_pandas() correctly gives:

leading | lagging
-- | --
A | B
B | C

What is the expected behavior?

pl.from_arrow(tab) should return the same values as tab.to_pandas(): leading = "A", "B" and lagging = "B", "C".
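To make the symptom concrete, here is a minimal pure-Python sketch of one way a converter could produce exactly the wrong output shown above. It assumes a simplified model of Arrow's utf8 layout (an int32 offsets buffer plus a bytes data buffer); slicing is zero-copy, so a slice only records a starting position and a length over the shared buffers. The names `read_utf8` and `read_utf8_ignoring_offset` are hypothetical and are not polars' or arrow-rs's code.

```python
from typing import List

# Simplified model of the utf8 column built from pa.array(["A", "B", "C"]).
# Value i spans DATA[OFFSETS[i]:OFFSETS[i + 1]].
OFFSETS = [0, 1, 2, 3]
DATA = b"ABC"


def read_utf8(offsets: List[int], data: bytes, slice_offset: int, length: int) -> List[str]:
    """Decode `length` strings starting at logical position `slice_offset`."""
    out = []
    for i in range(slice_offset, slice_offset + length):
        out.append(data[offsets[i]:offsets[i + 1]].decode("utf-8"))
    return out


def read_utf8_ignoring_offset(offsets: List[int], data: bytes, slice_offset: int, length: int) -> List[str]:
    """Hypothetical buggy variant: drops the slice offset and always starts at 0."""
    return read_utf8(offsets, data, 0, length)


# letters[:-1] -> (offset 0, length 2); letters[1:] -> (offset 1, length 2)
leading = (0, 2)
lagging = (1, 2)

print(read_utf8(OFFSETS, DATA, *leading))                  # ['A', 'B']
print(read_utf8(OFFSETS, DATA, *lagging))                  # ['B', 'C']
print(read_utf8_ignoring_offset(OFFSETS, DATA, *lagging))  # ['A', 'B']
```

Honoring the offset reproduces the to_pandas() result, while dropping it yields "A", "B" for lagging, matching the polars output in the report.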

Note: this behavior does not appear if the pyarrow column holds integers rather than strings; the following code works as expected.

import pyarrow as pa
import numpy as np
import polars as pl

numbers = pa.array(np.arange(3))
tab = pa.table({
    'lagging': numbers[1:],
    'leading': numbers[:-1]
})
tab.to_pandas()
pl.from_arrow(tab)

@ritchie46
Member

Thanks for your issue report. I will look into it.

@ritchie46
Member

apache/arrow-rs#335

@ritchie46
Member

It is merged upstream.
