Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support to pandas 2 #461

Merged
merged 6 commits into from Dec 7, 2023
Merged

Add support to pandas 2 #461

merged 6 commits into from Dec 7, 2023

Conversation

alifbe
Copy link
Collaborator

@alifbe alifbe commented Nov 14, 2023

Add support to pandas 2 and add pandas version to test matrix

_fix_dframe_for_libecl(dframe),
expected_dframe,
check_index_type=False,
check_column_type=False,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to set check_column_type=False because in Pandas > 2.0 set empty data frame to RangeIndex and that's not the case when you delete a column. Hence it will fail on case 10-12

# to use nanosecond timestamps in the dataframe object for these dates.
# This is maybe a PyArrow bug/limitation that we must be aware of.
pyat.to_pandas()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This exception doesn't seem to happen anymore.

@@ -228,7 +228,7 @@ def _df2pyarrow(dframe: pd.DataFrame) -> pyarrow.Table:
else:
field_metadata = {}
if colname == "DATE":
dtype = pyarrow.timestamp("ms")
dtype = pyarrow.timestamp("ns")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Careful with this, with nanoseconds you will hit a limit in year 2262, and summary files going past 2262 is something we need. Why is this needed for pandas2?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. 👍

From what I can see, pyarrow 12.0.0 to_pandas() method will convert DATE column to datetime64[ns] despite what we set on the line that you point above. The new pyarrow 14.0.0 does a proper conversion and causing a dtype issue in the test below since pandas by default will set dtype to datetime64[ns]. I think we can just get away with it by specifying check_dtype=False

https://github.com/equinor/ecl2df/blob/798ee6225697919cad4876d2390e3aa209ba48e0/tests/test_wellcompletiondata.py#L89

Should we ensure that all date column from ecl2df will be using datetime64[ms] ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are tests regarding year 2262 support, check that these cover this case (export to arrow format)

@codecov-commenter
Copy link

codecov-commenter commented Nov 15, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (8fa3b8f) 95.22% compared to head (7d147b0) 95.18%.

❗ Current head 7d147b0 differs from pull request most recent head 5add63c. Consider uploading reports for the commit 5add63c to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #461      +/-   ##
==========================================
- Coverage   95.22%   95.18%   -0.05%     
==========================================
  Files          33       33              
  Lines        4461     4461              
==========================================
- Hits         4248     4246       -2     
- Misses        213      215       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@alifbe alifbe requested a review from berland November 15, 2023 13:51
Copy link
Collaborator

@berland berland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Ensure squashing when merging.

@alifbe alifbe merged commit bf4029a into equinor:master Dec 7, 2023
7 checks passed
@alifbe alifbe deleted the support-pandas-2 branch December 7, 2023 08:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants