Bugfix related to USGS stations #77
Codecov Report
@@ Coverage Diff @@
## master #77 +/- ##
==========================================
- Coverage 89.95% 89.71% -0.24%
==========================================
Files 13 13
Lines 1005 1011 +6
Branches 146 148 +2
==========================================
+ Hits 904 907 +3
- Misses 57 58 +1
- Partials 44 46 +2
I'm trying to test the branch on Binder. It takes some time for the Binder Jupyter session to come up. Do you have the same issue? Is it related to the number of package dependencies?
@brey it looks like both are now working fine: https://mybinder.org/v2/gh/SorooshMani-NOAA/searvey/bugfix/examples
Yes. There are a number of potential reasons/measures. See here for more info.
searvey/usgs.py
Outdated
- df.end_date = pd.to_datetime(df.end_date, errors="coerce")
- df.begin_date = pd.to_datetime(df.begin_date, errors="coerce")
+ df.end_date = pd.to_datetime(df.end_date, errors="coerce", format="mixed")
+ df.begin_date = pd.to_datetime(df.begin_date, errors="coerce", format="mixed")
I think this is not the proper fix.
The format argument does exist in pandas 1.X, but mixed is not a special value in 1.X; format="mixed" was only added in 2.0. Consequently, even though pd.to_datetime() does not raise an Exception in pandas 1.X, it always returns NaT.
For example, this will raise an AssertionError in 2.X but not in 1.X:
import pandas as pd
assert pd.to_datetime("2001-09-23", errors="coerce", format="mixed") is pd.NaT
I didn't spend too much time on this, but I can't find a cleaner solution than checking the pandas version with importlib.metadata.version()
https://pandas.pydata.org/pandas-docs/version/1.5/reference/api/pandas.to_datetime.html
https://pandas.pydata.org/pandas-docs/version/2.0/reference/api/pandas.to_datetime.html
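To make the version check concrete, here is a minimal sketch of what it could look like. The helper names datetime_kwargs and parse_dates are hypothetical and are not taken from the searvey code base; the only assumption about pandas is the one stated above, namely that format="mixed" is a special value from 2.0 onwards.

```python
from importlib.metadata import version


def datetime_kwargs(pandas_version: str) -> dict:
    """Return extra keyword arguments for pd.to_datetime() (hypothetical helper)."""
    major = int(pandas_version.split(".")[0])
    # format="mixed" is only a special value from pandas 2.0 onwards;
    # in 1.X it would be treated as a literal format string.
    return {"format": "mixed"} if major >= 2 else {}


def parse_dates(series):
    """Parse a column of date strings, coercing unparseable values to NaT."""
    import pandas as pd  # imported lazily so datetime_kwargs() stays pandas-free

    # Look up the installed pandas version and pass the argument only if valid.
    kwargs = datetime_kwargs(version("pandas"))
    return pd.to_datetime(series, errors="coerce", **kwargs)
```

With something like this in place, the two assignments in usgs.py could become df.end_date = parse_dates(df.end_date) and df.begin_date = parse_dates(df.begin_date). Keeping the version logic in a pure function also makes it testable without installing both pandas versions.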
I pushed a fix. Not sure if it is the best solution though. If anyone thinks of something better, please let me know.
@pmav99 thanks for noticing this. I think what you did makes the most sense.
I am not sure what exactly we need here, but if it is to test whether the timestamp is not available (pd.NaT), this should work for both versions:
if pd.isna(pd.to_datetime("NaN")):
    ...  # do something
It's actually the opposite. We don't want to detect NaT but to avoid generating NaTs in the first place. In order to do so we need an extra argument in pd.to_datetime(), but the value of this argument is not valid in 1.X. That's why we need to detect the pandas version and conditionally pass this argument.
I added a small test and squashed/rebased in order to keep the git history clean. Test coverage is decreasing, but I am not sure it is worth it to set up CI to run with both pandas 1 and 2.
@pmav99 Thanks, I don't think it's worth it.
Fixes #76