problem specifying the encoding for itis retrievals #334
Hi @scelmendorf - Thanks for the question. Unfortunately, we have a mix of
Get verbose output
itis_terms(query='Amara fulva', "scientific", curlopts=list(verbose=TRUE))
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Amara fulva
* Adding handle: conn: 0x7fec04061600
* Adding handle: send: 0
* Adding handle: recv: 0
* Curl_addHandleToPipeline: length: 1
* - Conn 2 (0x7fec04061600) send_pipe: 1, recv_pipe: 0
* About to connect() to www.itis.gov port 80 (#2)
* Trying 137.227.231.25...
* Connected to www.itis.gov (137.227.231.25) port 80 (#2)
> GET /ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Amara%20fulva HTTP/1.1
Host: www.itis.gov
Accept: */*
< HTTP/1.1 200 OK
< Date: Sat, 06 Sep 2014 05:18:47 GMT
< Content-Type: application/xml;charset=UTF-8
< Transfer-Encoding: chunked
<
* Connection #2 to host www.itis.gov left intact
     tsn             author commonnames nameusage scientificname           .attrs
1 110866 (O. Müller, 1776)        <NA>     valid    Amara fulva ax21:SvcItisTerm
Set encoding
itis_terms(query='Amara fulva', "scientific", curlopts=list(encoding='UTF-8'))
Set a timeout
itis_terms(query='Amara fulva', "scientific", curlopts=list(timeout.ms=500))
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Amara fulva
Error in function (type, msg, asError = TRUE) :
Operation timed out after 806 milliseconds with 0 out of -1 bytes received
You can search for available curl options by doing |
Yup
Rstudio Version 0.98.1028
I still get the funny characters even when I set the encoding through curlopts, though. Any other ideas? Is this an rstudio problem?
itis_terms(query='Amara fulva', "scientific", curlopts=list(encoding='UTF-8')) |
@scelmendorf What does R print out when you run |
locale: attached base packages: other attached packages: loaded via a namespace (and not attached): |
thanks @scelmendorf - What happens when you try the below. We use
Install
# install.packages(c("httr","XML")) # install them if you don't already have these packages
library('taxize')
library('RCurl')
library('httr')
library('XML')
library('plyr')
Define this function
# Query the ITIS getITISTermsFromScientificName service via httr or RCurl
foo <- function(srchkey = NA, curlopts=list(), curl = getCurlHandle(), verbose=TRUE, which='httr')
{
  url = "http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName"
  args <- list()
  if (!is.na(srchkey)) args$srchKey <- srchkey
  if(which=='httr'){
    # httr route: fetch, then parse the response body as text
    tt <- GET(url, query=args, config=c(followlocation = 0L, curlopts))
    out <- xmlParse(content(tt, as = "text"))
  } else{
    # RCurl route: getForm returns the response body directly
    tt <- getForm(url, .params = args, .opts = c(curlopts, followlocation = 0L), curl = curl)
    out <- xmlParse(tt)
  }
  namespaces <- c(ax21 = "http://data.itis_service.itis.usgs.gov/xsd")
  # each itisTerms node becomes a list, then one data.frame row
  gg <- getNodeSet(out, "//ax21:itisTerms", namespaces = namespaces, xmlToList)
  tmp <- do.call(rbind.fill, lapply(gg, function(x) data.frame(x, stringsAsFactors = FALSE)))
  names(tmp) <- tolower(names(tmp))
  row.names(tmp) <- NULL
  tmp
}
Try using
foo(srchkey='Amara fulva', which='httr')
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Amara fulva
     tsn             author commonnames nameusage scientificname           .attrs
1 110866 (O. Müller, 1776)        true     valid    Amara fulva ax21:SvcItisTerm
Try using
foo(srchkey='Amara fulva', which='rcurl')
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Amara fulva
     tsn             author commonnames nameusage scientificname           .attrs
1 110866 (O. Müller, 1776)        true     valid    Amara fulva ax21:SvcItisTerm |
Hi Scott,
I commented the mssg line out and moved on, trying to see if I could skip over that and just use the rest. Now stuck on rbind.fill:
Error in do.call(rbind.fill, lapply(gg, function(x) data.frame(x, stringsAsFactors = FALSE))) :
Suggestions? And also – thanks SO MUCH for your help
Sarah |
@scelmendorf sorry about that. I removed the mssg thing. Make sure to load those packages above the fxn definition before trying the foo() function. |
Yup, already did that, just did not copy all the way up the screen. If this helps: |
Did |
I updated the script, I think you don't get updates to comments in your email... |
Aha, yes I did not see the update, but now see it on github. So now the function runs ☺. But it unfortunately doesn’t solve the funny character issues. Maybe I should try a different R version or not use Rstudio? Or I could try it on linux, maybe this is a windows problem?? |
FYI - your original function works just fine for me on a linux server. This may be the fastest fix. |
hey @gavinsimpson - I'm lost on this encoding thing. Do you know what a global solution is for windows users for special characters? See e.g. #334 (comment) |
@cboettig any thoughts on how to fix character encoding problems on windows? |
@sckott Looks like this is probably due to the user's locale settings supporting only ascii characters. On a linux (& probably Mac) machine one would do:
Sys.setlocale("LC_ALL", 'en_US.UTF-8')
On a Windows machine it looks like the locale might be set by:
Sys.setlocale(category = "LC_ALL", locale = "English_United States.1252")
but not totally sure -- the sessionInfo() suggests that collate is already using that. Scott, you could try |
@cboettig thx, i'll give that a try |
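For anyone following along, here is a rough sketch of how to check which locale and encoding the current R session is actually using before changing it. It only uses base R (`Sys.getlocale()` and `l10n_info()`); the variable names are made up for illustration, and the printed values will differ from machine to machine.

```r
# Character-type locale of the current session (this is what Sys.setlocale() changes)
ctype <- Sys.getlocale("LC_CTYPE")
print(ctype)

# l10n_info() reports whether the session runs in a multibyte / UTF-8 / Latin-1 locale
info <- l10n_info()
print(info[["UTF-8"]])
print(info[["Latin-1"]])

# Assumption for illustration only: flag any non-UTF-8 session as one where
# UTF-8 text from a web service may print as funny characters
needs_care <- !isTRUE(info[["UTF-8"]])
print(needs_care)
```

If `needs_care` comes back TRUE, that is only a hint that the session locale is worth a look; it does not by itself show where in the request/parse chain the encoding goes wrong.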
You can use iconv() to convert strings to current locale of user if I understand the problem? |
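A minimal sketch of the iconv() idea, using only base R and no network call. The bytes of the author name are written out by hand so the example does not depend on how this page or your editor is encoded. The variable names are hypothetical, and this is not the code taxize itself uses.

```r
# UTF-8 bytes for the author name: the u-umlaut is the two bytes c3 bc
utf8_bytes <- as.raw(c(0x4d, 0xc3, 0xbc, 0x6c, 0x6c, 0x65, 0x72))
author <- rawToChar(utf8_bytes)
Encoding(author)             # "unknown": R has not been told what these bytes are

# Declare the bytes as UTF-8 (this changes the mark, not the bytes)
Encoding(author) <- "UTF-8"
nchar(author)                # 6 characters

# Convert to Latin-1, e.g. for a session that cannot display UTF-8
author_latin1 <- iconv(author, from = "UTF-8", to = "latin1")
charToRaw(author_latin1)     # 4d fc 6c 6c 65 72

# Treating the same UTF-8 bytes as if they were Latin-1 reproduces the funny characters
garbled <- iconv(rawToChar(utf8_bytes), from = "latin1", to = "UTF-8")
nchar(garbled)               # 7: the umlaut has become two separate characters
```

The last step is one plausible way the garbled author string in this thread could arise: UTF-8 bytes from the server being interpreted as a single-byte encoding somewhere along the way.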
thanks @cboettig and @gavinsimpson I tried both setting locale in Windows, using
taxize::itis_terms(query='Amara fulva', "scientific", curlopts=list(encoding='UTF-8'))
     tsn             author commonnames nameusage scientificname           .attrs
1 110866 (O. Müller, 1776)        true     valid    Amara fulva ax21:SvcItisTerm
where the
Stepping through the code, it's all fine until I get to this line https://github.com/ropensci/taxize/blob/master/R/itis.R#L667-L668 in the function where |
@scelmendorf Is this still a problem for you? |
I gave up on trying it on windows and ran them all on linux because none of the fixes seemed to work. So yes, I think it's still broken unless you've done a patch, I haven't updated my taxize recently. |
Okay, thanks for getting back so quick. Sorry I haven't fixed this yet. I just haven't been able to figure this out. I'll keep at it. |
btw - in trying to borrow your code but tweak it to get synonyms from col, I may(??) have figured out a part of the encoding problem. I think it's not getting set right in getURL, but that's possible to skip? see example:
idnum<-1412627
#option 1 - use rcurl
#option 2 - skip the rcurl step
parsecoldata_syn <- function(x){
syn1<-plyr::ldply(nodes1[[1]]$synonyms, parsecoldata_syn) |
hi @scelmendorf !
If doable, we can try to add in COL as a source to the synonyms() function. |
@scelmendorf have you reinstalled since the newer CRAN version from 19 Dec '14 http://cran.rstudio.com/web/packages/taxize/ or a newer Github version? Try reinstalling from Github: |
I have taxize_0.5.2, still having this result: But I will try the github version. |
And yes – I would definitely use synonyms from COL, itis isn’t super comprehensive for some taxa. But I might be in the minority.
From: Scott Chamberlain [mailto:notifications@github.com]
hi @scelmendorf !
in trying to borrow your code but tweak it to get synonyms from col
If doable, we can try to add in COL as a source to the synonyms() function. |
Github version works:
Thanks! |
@scelmendorf great! glad it works now. Should have synonyms for COL up for you to try soon. |
If you really want to be a magical unicorn about my synonyms in COL problem, do you want to figure out how to grab the authorship for the synonyms while you are at it?? - I am having some probs, in particular when it has ampersands in it and is therefore character escaped (for example, see Ornithodoros lagophilus: <![CDATA[Philip, Bell & Larson, 1956]]>). xmlTreeParse seems to read this as null. |
@scelmendorf that should work with synonyms, e.g.
synonyms("Ornithodoros lagophilus", db = "col")
$`Ornithodoros lagophilus`
       id                    name    rank name_status        genus    species infraspecies                      author
1 1412150 Ornithodoros lagophilus Species     synonym Ornithodoros lagophilus              Philip, Bell & Larson, 1956
                                                                            url
1 http://www.catalogueoflife.org/col/details/species/id/1412297/synonym/1412150 |
Perfect, thanks! I hadn’t tried it in the github version. |
well, it's just there now, in the last commit, so reinstall |
related: col_synonyms I think needs one more encoding statement. Line 136 of synonyms.R, if you change:
I think it fixes it.
after
|
@scelmendorf thanks for that, try again after reinstall |
Trying to get the authorship info for a taxon, but I'm running into problems with the special characters. Example:
myProblem<-itis_terms(query='Amara fulva', "scientific")
myProblem$author
I poked a bit and think I should be able to pass arguments to curl, but clearly I'm not doing this correctly.
Ideas? Or would it make sense to set the encoding for calls to itis to whatever itis.gov's encoding is (I actually couldn't figure this out from their website, but some trial and error might do the trick)
doesNotFixMyProblem<-itis_terms(query='Amara fulva', "scientific", curlopts=(list(.encoding='UTF-8')))
ideas?