curl error with readNWMdata #7

Open
psavoy-usgs opened this issue May 17, 2023 · 11 comments

Comments

@psavoy-usgs

I've previously used the package to download >10,000 reaches of data without issue. However, now readNWMdata gives me the following error.

readNWMdata(comid = 17595383)

Note:Caching=1
Error:curl error: SSL peer certificate or SSH remote key was not OK
curl error details:
Warning:oc_open: Could not read url
Error in open.nc(call.meta$url[1]) : NetCDF: I/O failure

I suspected an issue with my R version, so I updated R, RStudio, and all packages, but the issue persists. The only other thing I can think of is that the THREDDS URL has changed again.

System details
R version: 4.2.3
Rtools version 4.2
curl version 5.0.0

@mikejohnson51
Owner

Hi @psavoy-usgs,

I am not seeing this here:

library(nwmTools)

xx = readNWMdata(comid = 17595383)

plot(xx$dateTime, xx$flow_cms_v2.1, type = "l")

Created on 2023-07-12 by the reprex package (v2.0.1)

Are you still getting this error?

Thanks!

@psavoy-usgs
Author

psavoy-usgs commented Jul 13, 2023

@mikejohnson51 Yes, I am still having this issue. I have since updated R, Rtools, and RStudio again, but the issue remains. I am honestly not sure what is causing it, unless there is a versioning issue with some dependencies. If it would be helpful, I could try to replicate the error on my personal computer to isolate what is causing it on my work computer. Here is my output from sessionInfo(); to clarify, I am also running Rtools 4.3.

R version 4.3.0 (2023-04-21 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8 LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] nwmTools_0.0.4

loaded via a namespace (and not attached):
[1] utf8_1.2.3 generics_0.1.3 tidyr_1.3.0 class_7.3-22 xml2_1.3.5
[6] KernSmooth_2.23-22 magrittr_2.0.3 grid_4.3.0 timechange_0.2.0 rprojroot_2.0.3
[11] jsonlite_1.8.7 dataRetrieval_2.7.12 processx_3.8.2 zip_2.3.0 pkgbuild_1.4.2
[16] e1071_1.7-13 DBI_1.1.3 ps_1.7.5 httr_1.4.6 rvest_1.0.3
[21] purrr_1.0.1 fansi_1.0.4 pbapply_1.7-2 codetools_0.2-19 cli_3.6.1
[26] RNetCDF_2.6-2 rlang_1.1.1 crayon_1.5.2 units_0.8-2 remotes_2.4.2
[31] RANN_2.6.1 tools_4.3.0 fst_0.9.8 parallel_4.3.0 fstcore_0.9.14
[36] dplyr_1.1.2 curl_5.0.1 vctrs_0.6.3 nhdplusTools_0.6.2 R6_2.5.1
[41] proxy_0.4-27 lifecycle_1.0.3 lubridate_1.9.2 classInt_0.4-9 pkgconfig_2.0.3
[46] desc_1.4.2 callr_3.7.3 terra_1.7-39 pillar_1.9.0 glue_1.6.2
[51] Rcpp_1.0.11 sf_1.0-14 tibble_3.2.1 tidyselect_1.2.0 rstudioapi_0.15.0
[56] compiler_4.3.0 prettyunits_1.1.1

@psavoy-usgs
Author

I asked several colleagues to run the code you provided, and I am fairly certain the issue began once I switched to R version 4.2. I am not sure of the root cause, but everyone on earlier installations was able to run the code, and everyone on 4.2 or later encountered the same error I did.

@mikejohnson51
Owner

Interesting! Are they all on Windows systems? I am on 4.2.1 with a Mac and things are working.

@psavoy-usgs
Author

@mikejohnson51 wrote: "Interesting! Are they all on Windows systems? I am on 4.2.1 with a Mac and things are working."

I think that may be the issue; I have encountered other problems involving curl and the OS. We were all on Windows machines, and several of us have run into this kind of issue, where it could not be reproduced on Mac/Linux because of differences in how the systems use curl. I can do some more digging to see if I can find similar examples.

@psavoy-usgs
Author

If it is useful, I just checked my machine from the command line and have curl 8.0.1 and libcurl 8.0.1. I think there are cases showing that curl behaves differently on Windows and macOS, but I am also trying to rule out more obvious things like different system versions of curl.
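
To compare like with like, it may help to check the libcurl that R itself is linked against, which can differ from the command-line curl on the same machine. The following is a minimal sketch using base R's `libcurlVersion()`; the `describe_libcurl` helper name is hypothetical, and the netCDF library behind RNetCDF may be built against yet another copy of libcurl, so treat the result as one data point rather than a diagnosis.

```r
# Hypothetical helper: summarize the libcurl build that this R session uses.
describe_libcurl <- function() {
  v <- libcurlVersion()  # base R; version string with build details as attributes
  list(
    version = as.character(v),
    # SSL backend reported by libcurl (for example OpenSSL or Schannel)
    ssl_backend = attr(v, "ssl_version")
  )
}

info <- describe_libcurl()
print(info)
```

Comparing this output between a machine where `readNWMdata()` works and one where it fails could show whether the two sessions use different SSL backends.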

@mikejohnson51
Owner

@program--, do you have any thoughts on this Windows/R Version/curl issue?

@program--

program-- commented Jul 13, 2023

@psavoy-usgs Are you by chance using a proxy or VPN? The error:

Error:curl error: SSL peer certificate or SSH remote key was not OK

can happen if a proxy/VPN is handling SSL/TLS termination. One thing you could try is setting the environment variable CURLOPT_SSL_VERIFYPEER to 0 and then rerunning the code to see whether it works (or at least gives a different error).

In R you can do that like this:

Sys.setenv(CURLOPT_SSL_VERIFYPEER = 0)

Warning: This is not a permanent solution, and it's not advised to use this in any production system due to security issues.

Additionally, R 4.2.2 introduced a bug fix for curl revocation checks that might give some info:

On Windows, environment variable R_LIBCURL_SSL_REVOKE_BEST_EFFORT can be used to switch to only ‘best-effort’ SSL certificate revocation checks with the default "libcurl" download method. This reduces security, but may be needed for downloads to work with MITM proxies (PR#18379)

(from R release notes)
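
For reference, the variable described in the release notes can be set for the current session as sketched below. Two assumptions: the value "TRUE" follows the `?download.file` help page, and the setting applies to R's own "libcurl" download method on Windows, so it may not reach the libcurl used by the netCDF library behind `open.nc()`. The same security caveat applies.

```r
# Relax certificate revocation checks to 'best effort' for this session only.
# This affects R's own "libcurl" download method on Windows and reduces security.
Sys.setenv(R_LIBCURL_SSL_REVOKE_BEST_EFFORT = "TRUE")

# Confirm the variable is visible to the current R process.
Sys.getenv("R_LIBCURL_SSL_REVOKE_BEST_EFFORT")

# Restore the default behaviour once the test is done:
# Sys.unsetenv("R_LIBCURL_SSL_REVOKE_BEST_EFFORT")
```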


If that doesn't point to the cause, could you try enabling curl verbosity with:

Sys.setenv(CURLOPT_VERBOSE = 1)

and posting the resulting output in this thread?

@psavoy-usgs
Author

psavoy-usgs commented Jul 13, 2023

@program-- Thanks for the useful information. Since I am working from a government computer, I do not want to change anything that results in less secure connections and draw the ire of the IT department. Your point about the 4.2.2 bug fix fits the timing of when this issue arose for me, and it matches which colleagues were and were not able to run the code. I ran things again with the verbose setting, and this is what I get:

* Trying 137.227.231.111...
* TCP_NODELAY set
* Connected to cida.usgs.gov (137.227.231.111) port 443 (#0)
* ALPN, offering http/1.1
* SSL certificate problem: self signed certificate in certificate chain
* Closing connection 0
Error:curl error: SSL peer certificate or SSH remote key was not OK
curl error details:
Warning:oc_open: Could not read url
Error in open.nc(meta.obj$url) : NetCDF: I/O failure

I routinely pull a lot of data, so I am not sure what specifically causes the issue with this package. I tried this both on and off a VPN and got the same error either way.

@program--

program-- commented Jul 13, 2023

The verbose message:

SSL certificate problem: self signed certificate in certificate chain

implies (but doesn't necessarily confirm) that something is responding with a certificate that shadows the SSL certificate of cida.usgs.gov (a MITM proxy would typically do this).

My best guess is that this is something you'd need to ask your IT department about: if your GFE is configured with a proxy, IT should have ensured that the certificates it responds with are trusted on the client.

One more test you could do, if you're able to, is to try the same code on a non-government computer; if that works, the issue is likely GFE-specific. Alternatively, reverting to R 4.2.1 might work?


From an IT perspective as well: I think that enabling R_LIBCURL_SSL_REVOKE_BEST_EFFORT should be safe, assuming the returned certificate is in fact from a trusted proxy and not a malicious one. The biggest security concern is when the certificate check is bypassed to access sensitive information. Take this with a grain of salt, though, since I don't know how your GFE is governed on your IT department's side.


EDIT: I do agree it is odd that this seems to manifest primarily when using this package. If you tried a GET request against a different THREDDS server hosted elsewhere, I wonder whether it would give the same issue.
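
One way to run that comparison is a small probe that attempts an HTTPS request and reports whether the TLS handshake succeeds. This is only a sketch: `probe_https` is a hypothetical helper, it assumes the curl package is installed, and the host is taken from the verbose log above; the second URL is a placeholder to replace with another THREDDS server. The curl package may use a different certificate store than the netCDF library, so a success here would not rule out the proxy explanation.

```r
# Hypothetical helper: try an HTTPS request and capture any TLS/connection error.
# `fetch` defaults to curl::curl_fetch_memory (assumes the curl package is installed).
probe_https <- function(url, fetch = curl::curl_fetch_memory) {
  tryCatch({
    res <- fetch(url)
    list(url = url, ok = TRUE, detail = paste("HTTP status", res$status_code))
  }, error = function(e) {
    # Certificate problems surface here as an R error raised by libcurl.
    list(url = url, ok = FALSE, detail = conditionMessage(e))
  })
}

# Usage (requires network access):
# probe_https("https://cida.usgs.gov/")
# probe_https("https://example.org/thredds/")  # placeholder for another THREDDS host
```

If both hosts fail with the same certificate message, that would point toward the network path rather than this package or its server.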

@csimeone-usgs

@psavoy-usgs Did you ever get this issue solved? I'm running into the same issue running this from a USGS machine.
