Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(python): Respect dtype and strict in pl.Series's constructor for pyarrow arrays, numpy arrays, and pyarrow-backed pandas #15962

Merged
merged 15 commits into from
May 9, 2024

Conversation

luke396
Copy link
Contributor

@luke396 luke396 commented Apr 30, 2024

xref #15948 (comment) and a follow-up PR of #15948

Fix wrong dtype when create pl.Series from pd.Series.

when

print(pl.Series("foo", pd.Series([1, 2, 3], dtype='Int64'), pl.Float32))

before

shape: (3,)
Series: 'foo' [i64]
[
	1
	2
	3
]

after

shape: (3,)
Series: 'foo' [f32]
[
	1.0
	2.0
	3.0
]

Copy link

codecov bot commented Apr 30, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.98%. Comparing base (acb601d) to head (c000f20).

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #15962   +/-   ##
=======================================
  Coverage   80.97%   80.98%           
=======================================
  Files        1386     1386           
  Lines      178479   178479           
  Branches     2877     2877           
=======================================
+ Hits       144530   144539    +9     
+ Misses      33458    33448   -10     
- Partials      491      492    +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@luke396 luke396 requested a review from MarcoGorelli May 2, 2024 01:47
@luke396
Copy link
Contributor Author

luke396 commented May 3, 2024

Thanks, @MarcoGorelli, for the quick review. I believe I now understand what you meant and have made the necessary changes.

@luke396 luke396 requested a review from MarcoGorelli May 3, 2024 08:52
Copy link
Collaborator

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating! I think this is very close 💪

I'd just suggest expanding the tests, to include:

  • a test where you convert from a pyarrow array with strict=False
  • a test where you convert from a pandas series with strict=False

@luke396
Copy link
Contributor Author

luke396 commented May 4, 2024

  • a test where you convert from a pyarrow array with strict=False
  • a test where you convert from a pandas series with strict=False

I believe the change has been made

@luke396 luke396 requested a review from MarcoGorelli May 4, 2024 02:59
@luke396 luke396 requested a review from MarcoGorelli May 6, 2024 11:53
Comment on lines 2374 to 2381
def test_series_from_pandas_with_strict() -> None:
s = pl.Series(pd.Series([1, 2.5, 3]), strict=False)
assert_series_equal(s, pl.Series([1.0, 2.5, 3.0], dtype=pl.Float64))


def test_series_from_pyarrow_with_strict() -> None:
s = pl.Series(pa.array([1, 2.5, 3]), strict=False)
assert_series_equal(s, pl.Series([1.0, 2.5, 3.0], dtype=pl.Float64))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be missing something, but I don't see what these tests add to what you already have

Rather than adding these, how about testing converting from a numpy array with [-1, 2, 3] to a Polars Series with dtype pl.UInt8, with strict=False and strict=True? I think this will cover the behaviour change you mentioned in https://github.com/pola-rs/polars/pull/15962/files#r1590912969

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two tests added earlier were because I felt there needed to be a simple test with strict=False without any other parameters. However, I think you are right, these two tests are covered by others.

The behavior change mentioned in #15962 (comment) has been covered by the test using strict=False like below. It was precisely because of your previous suggestion to modify the test that I discovered this potential problem.

s = pl.Series("foo", pd.Series([-1, 2, 3], dtype="Int8"), pl.UInt8, strict=False)
assert s.to_list() == [None, 2, 3]
assert s.dtype == pl.UInt8

Nevertheless, I will add a test with a numpy array and the strict parameter.

@luke396 luke396 requested a review from MarcoGorelli May 8, 2024 02:07
Copy link
Collaborator

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks @luke396 !

@MarcoGorelli MarcoGorelli changed the title fix(python): Fix pl.Series 's dtype for pyarrow path fix(python): Respect dtype and strict in pl.Series's constructor for pyarrow arrays, numpy arrays, and pyarrow-backed pandas May 8, 2024
@ritchie46 ritchie46 merged commit f18c306 into pola-rs:main May 9, 2024
15 checks passed
@luke396
Copy link
Contributor Author

luke396 commented May 10, 2024

Thanks, @MarcoGorelli, for your patience and selfless help. This PR wouldn't have been completed without you!

@luke396 luke396 deleted the add-test-pyarrow branch May 11, 2024 07:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix Bug fix python Related to Python Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants