New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support to pandas 2 #461
Conversation
_fix_dframe_for_libecl(dframe), | ||
expected_dframe, | ||
check_index_type=False, | ||
check_column_type=False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to set check_column_type=False
because in Pandas > 2.0 set empty data frame to RangeIndex and that's not the case when you delete a column. Hence it will fail on case 10-12
# to use nanosecond timestamps in the dataframe object for these dates. | ||
# This is maybe a PyArrow bug/limitation that we must be aware of. | ||
pyat.to_pandas() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This exception doesn't seem to happen anymore.
ecl2df/wellcompletiondata.py
Outdated
@@ -228,7 +228,7 @@ def _df2pyarrow(dframe: pd.DataFrame) -> pyarrow.Table: | |||
else: | |||
field_metadata = {} | |||
if colname == "DATE": | |||
dtype = pyarrow.timestamp("ms") | |||
dtype = pyarrow.timestamp("ns") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Careful with this, with nanoseconds you will hit a limit in year 2262, and summary files going past 2262 is something we need. Why is this needed for pandas2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. 👍
From what I can see, pyarrow 12.0.0 to_pandas()
method will convert DATE
column to datetime64[ns]
despite what we set on the line that you point above. The new pyarrow 14.0.0 does a proper conversion and causing a dtype
issue in the test below since pandas by default will set dtype to datetime64[ns]
. I think we can just get away with it by specifying check_dtype=False
Should we ensure that all date column from ecl2df will be using datetime64[ms]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are tests regarding year 2262 support, check that these cover this case (export to arrow format)
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #461 +/- ##
==========================================
- Coverage 95.22% 95.18% -0.05%
==========================================
Files 33 33
Lines 4461 4461
==========================================
- Hits 4248 4246 -2
- Misses 213 215 +2 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Ensure squashing when merging.
…be-support-pandas-2
Conflicts: .github/workflows/res2df.yml tests/test_summary.py
Add support to pandas 2 and add pandas version to test matrix