Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix related to USGS stations #77

Merged
merged 4 commits into from
Apr 27, 2023

Conversation

SorooshMani-NOAA
Copy link
Contributor

Fixes #76

@SorooshMani-NOAA SorooshMani-NOAA marked this pull request as draft April 13, 2023 18:25
@codecov
Copy link

codecov bot commented Apr 13, 2023

Codecov Report

Merging #77 (f5a52b6) into master (f4a29d5) will decrease coverage by 0.24%.
The diff coverage is 72.72%.

❗ Current head f5a52b6 differs from pull request most recent head 371ccf9. Consider uploading reports for the commit 371ccf9 to get more accurate results

@@            Coverage Diff             @@
##           master      #77      +/-   ##
==========================================
- Coverage   89.95%   89.71%   -0.24%     
==========================================
  Files          13       13              
  Lines        1005     1011       +6     
  Branches      146      148       +2     
==========================================
+ Hits          904      907       +3     
- Misses         57       58       +1     
- Partials       44       46       +2     
Impacted Files Coverage Δ
searvey/usgs.py 89.75% <72.72%> (-1.50%) ⬇️

@SorooshMani-NOAA SorooshMani-NOAA marked this pull request as ready for review April 13, 2023 19:39
@SorooshMani-NOAA
Copy link
Contributor Author

I'm trying to test the branch on binder. It takes some time for binder Jupyter to come up, do you have the same issue with binder? Is it related to the number of package dependencies?

@SorooshMani-NOAA
Copy link
Contributor Author

@brey it looks like both are now working fine: https://mybinder.org/v2/gh/SorooshMani-NOAA/searvey/bugfix/examples
The pull is ready to be merged. Note that we do need a release because of the bugfix related to pandas

@brey
Copy link
Contributor

brey commented Apr 14, 2023

I'm trying to test the branch on binder. It takes some time for binder Jupyter to come up, do you have the same issue with binder? Is it related to the number of package dependencies?

Yes. There are a number of potential reasons/measures. See here for more info.

searvey/usgs.py Outdated
df.end_date = pd.to_datetime(df.end_date, errors="coerce")
df.begin_date = pd.to_datetime(df.begin_date, errors="coerce")
df.end_date = pd.to_datetime(df.end_date, errors="coerce", format="mixed")
df.begin_date = pd.to_datetime(df.begin_date, errors="coerce", format="mixed")
Copy link
Member

@pmav99 pmav99 Apr 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is not the proper fix.

The format argument does exist in pandas 1.X but mixed is not a special value in 1.X. format="mixed" was only added in 2.0. Consequently, even though pd.to_datetime() does not raise an Exception in pandas 1.X it always returns NaT.

For example this will raise an AssertionError in 2.X but not in 1.X

import pandas as pd
assert pd.to_datetime("2001-09-23", errors="coerce", format="mixed") is pd.NaT

I didn't spend too much time on this, but I can't find a cleaner solution than checking the pandas version with importlib.metadata.version()

https://pandas.pydata.org/pandas-docs/version/1.5/reference/api/pandas.to_datetime.html
https://pandas.pydata.org/pandas-docs/version/2.0/reference/api/pandas.to_datetime.html

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a fix. Not sure if it is the best solution though. If anyone things of something better, please let me know

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pmav99 thanks for noticing this. I think what you did makes the most sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure what exactly we need here, but if it to test whether the time stamp is not available or pd.Nat this should work for both versions:

if pd.isna(pd.to_datetime("NaN")):
     do something

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually the opposite. We don't want to detect NaT but to avoid generating NaTs in the first place. In order to do so we need an extra argument in pd.to_datetime() but the value of this argument is not valid in 1.X. That's why we need to detect the pandas version and conditionally pass this argument.

@pmav99
Copy link
Member

pmav99 commented Apr 27, 2023

I added a small test and squashed/rebased in order to keep the git history clean. Test coverage is decreasing, but I am not sure it is worth it to setup CI to run with both pandas 1 and 2.

@SorooshMani-NOAA
Copy link
Contributor Author

I added a small test and squashed/rebased in order to keep the git history clean. Test coverage is decreasing, but I am not sure it is worth it to setup CI to run with both pandas 1 and 2.

@pmav99 Thanks, I don't think it's worth it.

@pmav99 pmav99 merged commit 2f7ed28 into oceanmodeling:master Apr 27, 2023
@SorooshMani-NOAA SorooshMani-NOAA deleted the bugfix/examples branch July 5, 2023 15:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make the example Notebooks binder friendly
3 participants