Recovering from apparent server errors when using itis_terms with many names #271
Comments
hi @chuckrp - looking into this now. The first answer I can think of off the top of my head is that ITIS is slower than most web APIs out there, but perhaps there is something more... |
@chuckrp Can you post some code you've used and the errors you're getting so I can better address your problem? |
Scott, For example:
This submits 300 names from a df of >38000 names. As for the error message, that will have to wait until it happens again, as I didn't write it down. Basically, it is the "server returned empty" error. I can't submit jobs from work, so when I get home I'll send you a copy of the actual error message when it occurs again. Thanks for looking into this. |
Okay, let me know what that error is when you get a chance. |
Hi there. I'm curious what data you are after. Perhaps you can use a more specific function to get the data you want. The ITIS terms service returns a whole bunch of data, and you may not need all of it. One thing you can change is that you don't need to pass that third parameter in. You are passing in the name of the function getitistermsfromscientificname to the
itis_terms('Helianthus annuus', what = "scientific", list(verbose=TRUE))
Which will print out the hairy details of the curl process to request data from the ITIS server
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Helianthus annuus
* Adding handle: conn: 0x7fa92eb7ae00
* Adding handle: send: 0
* Adding handle: recv: 0
* Curl_addHandleToPipeline: length: 1
* - Conn 58 (0x7fa92eb7ae00) send_pipe: 1, recv_pipe: 0
* About to connect() to www.itis.gov port 80 (#58)
* Trying 137.227.231.25...
* Connected to www.itis.gov (137.227.231.25) port 80 (#58)
> GET /ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Helianthus%20annuus HTTP/1.1
Host: www.itis.gov
Accept: */*
< HTTP/1.1 200 OK
< Date: Wed, 16 Apr 2014 16:42:08 GMT
< Content-Type: application/xml;charset=UTF-8
< Transfer-Encoding: chunked
<
* Connection #58 to host www.itis.gov left intact
tsn author commonnames commonnames.1 commonnames.2 commonnames.3 nameusage
1 36616 L. annual sunflower sunflower wild sunflower common sunflower accepted
2 525928 (Heiser) Heiser true <NA> <NA> <NA> not accepted
3 525929 (Douglas ex Lindl.) Cockerell true <NA> <NA> <NA> not accepted
4 525930 Heiser true <NA> <NA> <NA> not accepted
5 536095 (Douglas ex Lindl.) Steyerm. true <NA> <NA> <NA> not accepted
6 536096 (DC.) Cockerell true <NA> <NA> <NA> not accepted
7 536097 (Heiser) Shinners true <NA> <NA> <NA> not accepted
scientificname .attrs
1 Helianthus annuus ax21:SvcItisTerm
2 Helianthus annuus ssp. jaegeri ax21:SvcItisTerm
3 Helianthus annuus ssp. lenticularis ax21:SvcItisTerm
4 Helianthus annuus ssp. texanus ax21:SvcItisTerm
5 Helianthus annuus var. lenticularis ax21:SvcItisTerm
6 Helianthus annuus var. macrocarpus ax21:SvcItisTerm
7 Helianthus annuus var. texanus ax21:SvcItisTerm |
What I need from itis_terms are the tsn, author, and common name, as these three items are often blank in the database I am working with. So the one-function-does-all nature of itis_terms seems a bonus. Is it possible that requesting this information in steps through two or three functions, instead of all at once, is more efficient? Not using the third parameter getitistermsfromscientificname is reasonable, and I always wondered why the documentation seems to indicate that it is required. I have just tried it as you suggested and it works fine. Of course, ITIS always works better on weekends. |
Hey there again @chuckrp - Been playing around today with the Don't know what you do, but this workflow works well for me
library("taxize")
library("plyr")
spp <- names_list("species", 500) # get a set of 500 random plant species names
out <- itis_terms(spp, what = "scientific") # search
out <- out[!is.na(out)] # remove elements that are NA
df <- do.call(rbind.fill, out) # combine to data.frame
head(df) # first 6 rows
Curious about something. Notice that if there is more than 1 common name, we add additional columns. We could instead make that a single column with the names comma separated - then you'd just need to do something like |
Scott, The time of day as well as the day of the week do make a difference. Running just 300 at a time usually works without an error, but even so I occasionally get an empty return from the server. Putting the common names in a single column sounds like a nice approach. Currently, I ask the user to select one off the list, which is limiting in more ways than one. Our database is set up to allow multiple common names for any species, so the single column approach would simplify things from my perspective. |
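For anyone who wants the single-column layout before it lands in the package, here is a minimal sketch in base R. It assumes the result has the same column naming as the output printed earlier in the thread (commonnames, commonnames.1, ...). The helper name collapse_common_names and the toy data frame are made up for illustration and are not part of taxize.

```r
# Toy data frame shaped like the itis_terms() output shown earlier in the thread
df <- data.frame(
  tsn = c("36616", "525928"),
  commonnames = c("annual sunflower", NA),
  commonnames.1 = c("sunflower", NA),
  commonnames.2 = c("wild sunflower", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not a taxize function): merge every commonnames*
# column into one comma-separated column
collapse_common_names <- function(x, sep = ", ") {
  cols <- grep("^commonnames", names(x), value = TRUE)
  merged <- apply(x[, cols, drop = FALSE], 1, function(row) {
    vals <- row[!is.na(row)]
    # Keep NA when a taxon has no common names at all
    if (length(vals) == 0) NA_character_ else paste(vals, collapse = sep)
  })
  x[cols] <- NULL                 # drop the original spread-out columns
  x$commonnames <- unname(merged) # add the single merged column
  x
}

res <- collapse_common_names(df)
```

Splitting the merged column back out later is a single strsplit(res$commonnames, ", ") call, which is the main appeal of this layout when a database allows several common names per species.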
@chuckrp Great, I'll get that change done for As for the errors, I think it's a problem with ITIS's servers. I'll get in touch with them and see if there's anything in the works. |
hi @chuckrp Can you please try reinstalling from GitHub and trying a call with, say, 400 or 500 species names, to see if you get any errors? From what I can find online, the error we were getting is often related to a 300-series HTTP status code, in which case adding the curl option I've tested the new code with a few rounds of passing in 800 species each, and no errors for me. Hopefully you won't get any either! |
Very good. Apparently taxize doesn't work with R version 3.1.0 (2014-04-10) -- "Spring Dance" |
@chuckrp Did you have success? Any errors? |
Somehow I messed things up. When I tried to download taxize from GitHub I got version 0.0.5. I have not been able to get to the new version. |
Sorry about that. If you do devtools::install_github("ropensci/taxize") you should get the latest version 0.2.4. If that doesn't work, let me know and I can put up some binaries for you. |
I put up an OSX binary and source here https://github.com/ropensci/taxize/releases/tag/v0.2.4 Windows coming soon |
I submitted 888 names, from waterbears to slime molds to plants, and got no errors. It took a long time, but my work computer has a poor connection anyway. I assume your fix has done the job. Thanks. |
Great! Let us know of any other errors you get. |
Hi, I've been running my list of names through itis_terms with reasonable success, as long as I keep each attempt to no more than 500 names. Longer lists rarely work. They generate the same set of errors as before, including empty return from server, timeout error, and endless nothing. In addition, an error I hadn't seen previously appeared once: Error in function (type, msg, asError = TRUE) : This came in the midst of a long run of many names, so it had worked fine for 100+ names when the error appeared. |
Sorry about this continued problem @chuckrp. I'll see if I can replicate this issue so that I can see how to fix it. No clue right now other than the internet connection occasionally dropping, but it sounds like you have a reliable internet connection. |
Hi @chuckrp I haven't forgotten about this : still trying to figure this out |
Moved this to milestone 5 for now, haven't had time to track down this problem |
Hi @sckott, It's not especially related to the ITIS servers. I'm using those arguments: The server returns: Thanks for this very useful package ! |
Thanks for the report @SteveViss ! I'll look into that. I imagine that this is a server error from TNRS (from http://taxosaurus.org/). But I'll look into that and see if there is a solution. |
Hey @SteveViss I've tried |
Thx @sckott, I tried to run a script with the data set divided into smaller data frames (about 200 species each). I ran the function in parallel on 10 nodes. This is the output: Error in checkForRemoteErrors (val):
9 nodes Produced errors; first error: Empty reply from server |
Sorry about this @SteveViss - Can you tell me what version of |
No prob @sckott, > packageVersion("taxize")
[1] ‘0.3.0’ |
Hmmm, I'm not sure what the problem could be. Maybe I can use exactly the species list you have. Can you share that with me? Email? Just the species list. I'll try the code on a few other machines to see if I can replicate the behavior. |
@SteveViss My email is scott@ropensci.org |
Thx, I'm sending you the dataset I'm using. |
@SteveViss got it, thx, running now |
Hi @sckott, OK, I found the problem, and it comes from the remote server. When looking up the "accepted name" in the iPlant_TNRS db, I was searching with an id built by pasting together 4 different fields: genus, species, subspecies and variety (available in the FIA). When I used only the genus and species to create this id, I got no errors. |
Hi @SteveViss - Okay, glad you were able to fix the problem. I'm not sure why that would fix it, but glad it did. You may want to strip the white space from those names too, not a big deal. You could use |
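As a rough illustration of the fix described above (building the lookup name from genus and species only, then stripping stray white space), something like the following base R sketch may help. The data frame fia, its column names, and the helper make_query_name are all assumptions made up for this example; the real FIA field names may differ.

```r
# Hypothetical columns standing in for the genus/species/subspecies/variety fields
fia <- data.frame(
  genus = c("Abies ", "Acer"),
  species = c(" balsamea", "rubrum "),
  subspecies = c("", NA),
  variety = c(NA, ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build the lookup name from genus and species only,
# then normalise white space
make_query_name <- function(genus, species) {
  nm <- paste(genus, species)
  nm <- gsub("\\s+", " ", nm)  # collapse runs of white space into one space
  trimws(nm)                   # drop leading and trailing white space
}

query_names <- make_query_name(fia$genus, fia$species)
```

Leaving subspecies and variety out of the pasted id avoids sending strings with trailing blanks or a literal "NA" to the server, which may be why the shorter id behaved better.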
Hi @chuckrp I just tried |
closing for now |
Scott, My goal was to check the status of, get TSNs for, and find authors and common names where available for 40,328 names from a biodiversity database I am involved with. Because of the number of names, I split the list into 50 submissions of 800 and 1 submission of 328 names. When submitted to ITIS, I received the "server returns empty" error at a rate of about every third or fourth 800-name submission. Much better than I expected, actually. Resubmitting from each stop point allowed me to eventually get all the names processed.
I followed the same procedure to submit the names to GBIF and EOL. GBIF responded without any problems at all. EOL also responded well, for the most part. However, EOL did choke on a few names, stopping when they were submitted. It seemed to be an issue with non-standard characters within some of the names, probably a result of original data-entry errors. Removing those particular entries resulted in good returns. GBIF apparently just
As far as the returns are concerned, each group of 800 names returned as many as 1,700,000 records. I was astounded. However, after removing hundreds of thousands of duplicates, each group of 800 names resulted in 2,000-5,000 unique names. ITIS provided common names for about 5,000 species. On the other hand, when compared with my original list, ITIS matched 24,328, EOL matched 34,041, and GBIF matched 35,420. This does not account for valid/accepted/synonym and so on.
I would like to make a request regarding sci2comm. Namely, to be able to run it with ask=FALSE and have it return all found common names for each taxon. Sitting at the monitor while thousands of names are processed is not a viable option for much of what I do. Plus, there is nothing wrong with multiple common names. I noted that ITIS allows for at least as many as 39. If you wish, I can submit this through GitHub to get it into the system. Taxize is a great tool, based on these results.
I realize it may not be the On Friday, November 28, 2014 9:55 AM, Scott Chamberlain notifications@github.com wrote: Closed #271. |
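The batching described above (40,328 names in groups of 800) can be sketched in base R as follows. The vector all_names is a stand-in for the real name list, and nothing here calls a web service; it only shows how the split works out.

```r
# Stand-in for the real name list; only the length matters here
all_names <- paste("name", seq_len(40328))

batch_size <- 800
# Give each name a batch number: the first 800 get 1, the next 800 get 2, ...
batch_id <- ceiling(seq_along(all_names) / batch_size)
batches <- split(all_names, batch_id)

n_batches <- length(batches)             # number of submissions needed
last_size <- length(batches[[n_batches]]) # size of the final, partial batch
```

This reproduces the 50 full submissions plus one of 328 names mentioned above. Keeping the batches in a list also makes it easy to resume from a stop point, since a failed run can restart at the index of the batch that errored rather than from the beginning.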
hi @chuckrp Congrats on retiring! Sorry about the troubles with ITIS. Are you interested in trying local querying of ITIS data? I am working on a development version of taxize that works with local copies of databases, so far ITIS, NCBI, COL, and Theplantlist. If so, let me know and I'll help you get that set up.
For get_tsn_('Poa an') gives
$`Poa an`
tsn scientificname commonnames nameusage
2 41107 Poa annua annual bluegrass accepted
5 784054 Nicoraepoa andina <NA> accepted
6 784648 Poa anae <NA> accepted
7 784649 Poa androgyna <NA> accepted
Without any prompts. The normal function Unfortunately, you can't pass that directly to
res <- get_tsn_('Poa an')
sci2comm(as.tsn(res$`Poa an`$tsn))
gives
$`41107`
[1] "annual blue grass" "walkgrass" "annual bluegrass"
$`784054`
[1] NA
$`784648`
[1] NA
$`784649`
[1] NA
thx for the kind words, glad it's been useful! |
taxize is terrific, very useful. However, when I use itis_terms with multiple names (300+) I frequently have problems. Three things tend to happen which I am unable to recover from. The most common is an error message that the server return was empty. That's not too bad as I can resubmit until it finally runs. A very rare problem is a time out when the server does not respond appropriately. A serious and not uncommon problem is that the process hangs. When this happens my only recourse is to force quit R, which is never a good option.
I assume all of these are server side errors. However, it would be nice if taxize could catch at least the server returned empty error and elegantly resolve or bypass the problem. Also, a few days ago there was a post about another taxize function that had a problem with large numbers of names, and a solution was presented. I am unable to relocate that issue, but perhaps that solution would help clear up this problem with itis_terms.
Chuck
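Until something like this is handled inside the package, the "catch the error and resolve or bypass it" idea from the post above can be approximated on the user side with tryCatch. The sketch below is only an illustration: with_retry and flaky_lookup are hypothetical names, and the fake lookup stands in for a real call so the example runs without a network connection.

```r
# Hypothetical helper (not part of taxize): call fun(x), retrying on error
# instead of stopping the whole run
with_retry <- function(fun, x, max_tries = 3, wait = 0) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(fun(x), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    message("attempt ", attempt, " failed: ", conditionMessage(result))
    if (attempt < max_tries) Sys.sleep(wait)  # pause before the next try
  }
  NA  # give up on this input so the caller can move on
}

# Fake lookup that fails twice before succeeding, standing in for a flaky server
calls <- 0
flaky_lookup <- function(name) {
  calls <<- calls + 1
  if (calls < 3) stop("Empty reply from server")
  toupper(name)
}

ok <- with_retry(flaky_lookup, "poa annua")
always_fails <- with_retry(function(x) stop("Empty reply from server"), "poa annua")
```

In real use the fake lookup would be replaced by a function wrapping itis_terms(name, what = "scientific"), applied to one name or one batch at a time. Note that this only helps with the empty-reply and timeout cases, which surface as R errors. A request that hangs never raises an error, so tryCatch cannot rescue it; that case would need a timeout set on the request itself.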