Recovering from apparent server errors when using itis_terms with many names #271

Closed
chuckrp opened this issue Apr 14, 2014 · 38 comments

@chuckrp

chuckrp commented Apr 14, 2014

taxize is terrific, very useful. However, when I use itis_terms with multiple names (300+) I frequently have problems. Three things tend to happen which I am unable to recover from. The most common is an error message that the server return was empty. That's not too bad as I can resubmit until it finally runs. A very rare problem is a time out when the server does not respond appropriately. A serious and not uncommon problem is that the process hangs. When this happens my only recourse is to force quit R, which is never a good option.

I assume all of these are server side errors. However, it would be nice if taxize could catch at least the server returned empty error and elegantly resolve or bypass the problem. Also, a few days ago there was a post about another taxize function that had a problem with large numbers of names, and a solution was presented. I am unable to relocate that issue, but perhaps that solution would help clear up this problem with itis_terms.

Chuck

@chuckrp chuckrp closed this as completed Apr 14, 2014
@chuckrp chuckrp reopened this Apr 14, 2014
@sckott
Contributor

sckott commented Apr 14, 2014

hi @chuckrp - looking into this now. The first answer I can think of off the top is that ITIS is slower than most web APIs out there, but perhaps there is something more...

@sckott
Contributor

sckott commented Apr 14, 2014

@chuckrp Can you post some code you've used and the errors you're getting so I can better address your problem?

@chuckrp
Author

chuckrp commented Apr 14, 2014

Scott,

For example:

nms14501.14800 <- itis_terms(manynames[14501:14800,1], what='scientific', getitistermsfromscientificname)

This submits 300 names from a df of >38,000 names. As for the error message, that will have to wait until it happens again, as I didn't write it down. But, basically, it is the "server returned empty" thing. I can't submit jobs from work, so when I get home I'll send you a copy of the actual error message when it occurs again.

Thanks for looking into this,
Chuck


@sckott
Contributor

sckott commented Apr 14, 2014

Okay, let me know what that error is when you get a chance.

@chuckrp
Author

chuckrp commented Apr 15, 2014

Unfortunately, that did not take long. But, as an added bonus the following also shows another error that I had seen once before but forgotten about.

"> nms15501.15800 <- itis_terms(query = manynames[15501:15800,1], what = "scientific", getitistermsfromscientificname)
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Glaucina lowensis
Error in curlPerform(url = url, curl = curl, .opts = .opts) : 
  attempt to apply non-function

 nms15501.15800 <- itis_terms(query = manynames[15501:15800,1], what = "scientific", getitistermsfromscientificname)
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Glaucina lowensis
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Glaucina loxa
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Glaucina macdunnoughi
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Glaucina magnifica
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Glaucina mayelisaria
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Glaucina nephos
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Glaucina nota
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Glaucina ochrofuscaria
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Glaucina platia
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Glaucina spaldingata
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Glaucina utahensis
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Glaucodontia pyraustoides
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Glaucolepis saccharella
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Glaucomys sabrinus coloratus
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Glaucomys sabrinus
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Glaucomys volans
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Glaucopsyche lygdamus
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Glaucopsyche piasus
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Glechoma hederacea
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Gleditsia triacanthos
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Glena arcana
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Glena cognataria
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Glena cribrataria
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Glena furfuraria
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Glena grisearia
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Glena interpunctata
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Glena macdunnougharia
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Glena nigricaria
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Glena plumosaria
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Glena quinquelinearia
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Glenognatha emertoni
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Glenognatha foxi
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Glenognatha heleios
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Glenognatha iviei
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Glenoides lenticuligera
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Glenoides texanaria
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Glenurus gratus
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Gleosoma
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Gleosporium acutatum
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Gliocladium album
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Gliocladium cylindrosporium
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Gliomastix murorum
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Gliophorus laetus
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Gliophorus luteolaetus
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Gliophorus psittacinus
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Gliophorus unguinosus
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Glipodes sericans
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Glipostenoda ambusta
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Glischrochilus confluentus
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Glischrochilus fasciatus
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Glischrochilus obtusus
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Glischrochilus quadrisignatus
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Glischrochilus sanguinolentus rubromaculata
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Glischrochilus sanguinolentus sanguinolentus
http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Glischrochilus sanguinolentus
Error in function (type, msg, asError = TRUE)  : Empty reply from server
 "


@sckott
Contributor

sckott commented Apr 16, 2014

Hi there. I'm curious what data you are after. Perhaps a more specific function would get you the data you want. The ITIS terms service returns a whole bunch of data, and you may not need all of it.

One thing you can change: you don't need to pass in that third parameter. You are passing the name of the function getitistermsfromscientificname to the ... in the itis_terms function call, but the ... is for passing curl options on to the getitistermsfromscientificname, getitisterms, or getitistermsfromcommonname functions that itis_terms uses. E.g., you could do

itis_terms('Helianthus annuus', what = "scientific", list(verbose=TRUE))

which will print out the hairy details of the curl request to the ITIS server:

http://www.itis.gov/ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Helianthus annuus
* Adding handle: conn: 0x7fa92eb7ae00
* Adding handle: send: 0
* Adding handle: recv: 0
* Curl_addHandleToPipeline: length: 1
* - Conn 58 (0x7fa92eb7ae00) send_pipe: 1, recv_pipe: 0
* About to connect() to www.itis.gov port 80 (#58)
*   Trying 137.227.231.25...
* Connected to www.itis.gov (137.227.231.25) port 80 (#58)
> GET /ITISWebService/services/ITISService/getITISTermsFromScientificName?srchKey=Helianthus%20annuus HTTP/1.1
Host: www.itis.gov
Accept: */*

< HTTP/1.1 200 OK
< Date: Wed, 16 Apr 2014 16:42:08 GMT
< Content-Type: application/xml;charset=UTF-8
< Transfer-Encoding: chunked
< 
* Connection #58 to host www.itis.gov left intact
     tsn                        author      commonnames commonnames.1  commonnames.2    commonnames.3    nameusage
1  36616                            L. annual sunflower     sunflower wild sunflower common sunflower     accepted
2 525928               (Heiser) Heiser             true          <NA>           <NA>             <NA> not accepted
3 525929 (Douglas ex Lindl.) Cockerell             true          <NA>           <NA>             <NA> not accepted
4 525930                        Heiser             true          <NA>           <NA>             <NA> not accepted
5 536095  (Douglas ex Lindl.) Steyerm.             true          <NA>           <NA>             <NA> not accepted
6 536096               (DC.) Cockerell             true          <NA>           <NA>             <NA> not accepted
7 536097             (Heiser) Shinners             true          <NA>           <NA>             <NA> not accepted
                       scientificname           .attrs
1                   Helianthus annuus ax21:SvcItisTerm
2      Helianthus annuus ssp. jaegeri ax21:SvcItisTerm
3 Helianthus annuus ssp. lenticularis ax21:SvcItisTerm
4      Helianthus annuus ssp. texanus ax21:SvcItisTerm
5 Helianthus annuus var. lenticularis ax21:SvcItisTerm
6  Helianthus annuus var. macrocarpus ax21:SvcItisTerm
7      Helianthus annuus var. texanus ax21:SvcItisTerm

@sckott sckott added this to the v0.3 milestone Apr 16, 2014
@chuckrp
Author

chuckrp commented Apr 20, 2014

What I need from itis_terms are the tsn, author, and common name, as these three items are often blank in the database I am working with. So the one-function-does-all nature of itis_terms seems a bonus. Is it possible that requesting this information in steps through two or three functions, instead of all at once, is more efficient?

Not using the third parameter getitistermsfromscientificname is reasonable, and I always wondered why the documentation seems to indicate that it is required. I have just tried it as you suggested and it works fine. Of course, ITIS always works better on weekends.


@sckott
Contributor

sckott commented Apr 21, 2014

Hey there again @chuckrp -

Been playing around today with the itis_terms function. I passed in a set of 400 names, and a separate set of 500 names, and got no errors. But like you said it's probably better on the weekends - likely fewer computers requesting data.

Don't know what you do, but this workflow works well for me

library("taxize")
library("plyr")
spp <- names_list("species", 500) # get a set of 500 random plant species names
out <- itis_terms(spp, what = "scientific") # search
out <- out[!is.na(out)] # remove elements that are NA
df <- do.call(rbind.fill, out) # combine to data.frame
head(df) # first 6 rows

Curious about something. Notice that if there is more than 1 common name, we add additional columns. We could instead make that a single column with the names comma separated - then you'd just need to do something like strsplit(column, ",") to split apart the names in any one cell in the data.frame. Thoughts on which you prefer?
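
As a rough sketch of the single-column idea (not the change that was actually made in taxize), the toy data frame below mimics the multi-column layout shown earlier in the thread. The helper name collapse_common and the values are made up for illustration.

```r
# Toy data frame mimicking the multi-column layout (values are illustrative)
df <- data.frame(
  tsn = c("36616", "525928"),
  commonnames = c("annual sunflower", NA),
  commonnames.1 = c("sunflower", NA),
  commonnames.2 = c("wild sunflower", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: paste all commonnames* columns into one
# comma-separated column, skipping NA entries
collapse_common <- function(d) {
  cols <- grep("^commonnames", names(d), value = TRUE)
  joined <- apply(d[, cols, drop = FALSE], 1, function(x) {
    x <- x[!is.na(x)]
    if (length(x) == 0) NA_character_ else paste(x, collapse = ",")
  })
  out <- d[, setdiff(names(d), cols), drop = FALSE]
  out$commonnames <- unname(joined)
  out
}

single <- collapse_common(df)

# Split one cell back into its individual names
parts <- strsplit(single$commonnames[1], ",")[[1]]
```

One caveat with this layout: a common name that itself contains a comma would be split in the wrong place, so a different separator may be safer in practice.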

@chuckrp
Author

chuckrp commented Apr 22, 2014

Scott,

The time of day as well as the day of the week do make a difference. Running just 300 at a time usually works without an error, but even so I occasionally get an empty return from the server.

Putting the common names in a single column sounds like a nice approach. Currently, I ask the user to select one off the list, which is limiting in more ways than one. Our database is set up to allow multiple common names for any species, so the single column approach would simplify things from my perspective.


@sckott
Contributor

sckott commented Apr 22, 2014

@chuckrp Great, I'll get that change done for itis_terms.

As for the errors, I think it's a problem with ITIS's servers. I'll get in touch with them and see if there's anything in the works.

@sckott
Contributor

sckott commented Apr 22, 2014

Hi @chuckrp, can you please try reinstalling from GitHub, then try a call with, say, 400 or 500 species names and see if you get any errors?

From what I can find online, the error we were getting is often related to receiving a 300-series HTTP status code, in which case adding the curl option -L (aka --location; followlocation) can often help.

I've tested the new code with a few rounds of passing in 800 species each, and no errors for me. Hopefully you won't get any either!

@chuckrp
Author

chuckrp commented Apr 23, 2014

Very good.

Apparently taxize doesn't work with R version 3.1.0 (2014-04-10) -- "Spring Dance"
I'll reinstall 3.0 and then retry taxize

@chuckrp chuckrp closed this as completed Apr 23, 2014
@sckott
Contributor

sckott commented Apr 23, 2014

@chuckrp Did you have success? Any errors?

@chuckrp
Author

chuckrp commented Apr 24, 2014

Somehow I messed things up. When I tried to download taxize from GitHub I got version 0.0.5. I have not been able to get to the new version.


@sckott
Contributor

sckott commented Apr 24, 2014

Sorry about that. If you do devtools::install_github("ropensci/taxize") you should get the latest version 0.2.4. If that doesn't work, let me know and I can put up some binaries for you.

@sckott
Contributor

sckott commented Apr 24, 2014

I put up an OSX binary and source here: https://github.com/ropensci/taxize/releases/tag/v0.2.4 Windows coming soon.

@chuckrp
Author

chuckrp commented Apr 24, 2014

I submitted 888 names, from waterbears to slime molds to plants, and got no errors. It took a long time, but my work computer has a poor connection anyway.

I assume your fix has done the job.

Thanks.


@sckott
Contributor

sckott commented Apr 24, 2014

Great! Let us know of any other errors you get.

@chuckrp
Author

chuckrp commented Apr 28, 2014

Hi,

I've been running my list of names through itis_terms with reasonable success, as long as I keep each attempt to no more than 500 names. Longer lists rarely work. They generate the same set of errors as before, including empty return from server, timeout error, and endless nothing. In addition, an error I hadn't seen previously appeared once:

Error in function (type, msg, asError = TRUE)  :
  Could not resolve host: www.itis.gov

This came in the midst of a long run of many names, so it had worked fine for at least 100+ names when the error appeared.


@sckott
Contributor

sckott commented Apr 28, 2014

Sorry about this continued problem @chuckrp

I'll look into this problem. I'll see if I can replicate this issue so that I can see how to fix it. No clue right now other than occasionally having internet be lost, but it sounds like you have a reliable internet connection.

@sckott sckott reopened this Apr 28, 2014
@sckott
Contributor

sckott commented May 6, 2014

Hi @chuckrp, I haven't forgotten about this. Still trying to figure it out.

@sckott sckott added the Bug label May 6, 2014
@sckott sckott modified the milestones: v0.5, v0.3 May 17, 2014
@sckott
Contributor

sckott commented May 17, 2014

Moved this to milestone v0.5 for now; haven't had time to track down this problem.

@SteveViss
Member

Hi @sckott,

It's not especially related to the ITIS servers.
I have the same prob with the function tnrs() connected to the iPlant_TNRS API.

I'm using these arguments:
tnrs(query = data$id, source = "iPlant_TNRS",verbose=FALSE, getpost = "POST")
in order to clean the species table from the FIA database, which has up to 4,000 recorded species.

The server returns: Error in function (type, msg, asError = TRUE) : Empty reply from server

Thanks for this very useful package !

@sckott
Contributor

sckott commented May 22, 2014

Thanks for the report @SteveViss !

I'll look into that. I imagine that this is a server error from TNRS (from http://taxosaurus.org/). But I'll look into that and see if there is a solution.

@sckott
Contributor

sckott commented May 22, 2014

Hey @SteveViss, I've tried tnrs like you have. I didn't use the FIA list, but used the APG list of taxa we have in the package. I tried up to 6000 species, and it does take a while, but I didn't get any errors. I'm pretty sure this is an error on their end. Did you try your call again with the 4000 species (when you got the error, perhaps they had a temporary "hiccup")? You could also try passing in smaller chunks of species names in an apply-type call or a for loop, and catch any errors so the process doesn't stop. Let me know what happens.
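
A minimal sketch of the chunk-and-catch suggestion, in base R only. The helper run_in_chunks and the stand-in fake_lookup are hypothetical; in real use the stand-in would be replaced by a call such as tnrs() or itis_terms(), which is assumed here rather than tested.

```r
# Hypothetical helper: run `fun` over `x` in chunks of `size`, catching errors
# so that one failed chunk does not stop the whole run
run_in_chunks <- function(x, fun, size = 200) {
  groups <- split(x, ceiling(seq_along(x) / size))
  lapply(groups, function(chunk) {
    tryCatch(
      fun(chunk),
      error = function(e) {
        message("chunk failed: ", conditionMessage(e))
        NULL  # mark the failed chunk so it can be resubmitted later
      }
    )
  })
}

# Stand-in for a remote lookup; it fails on purpose for one input
# to imitate "Empty reply from server"
fake_lookup <- function(chunk) {
  if ("bad" %in% chunk) stop("Empty reply from server")
  toupper(chunk)
}

res <- run_in_chunks(c("a", "b", "bad", "c", "d"), fake_lookup, size = 2)

# Names of the chunks that failed and need another attempt
failed <- names(res)[vapply(res, is.null, logical(1))]
```

Returning NULL for a failed chunk keeps the result list aligned with the chunk numbers, so only the failed chunks need to be submitted again.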

@SteveViss
Member

Thx @sckott, I tried running a script with the data set divided into smaller data frames (about 200 species each). I ran the function in parallel on 10 nodes. This is the output:

Error in checkForRemoteErrors (val):
   9 nodes Produced errors; first error: Empty reply from server

@sckott
Contributor

sckott commented May 22, 2014

Sorry about this @SteveViss - Can you tell me what version of taxize you are using via packageVersion("taxize")?

@SteveViss
Member

No prob @sckott,

> packageVersion("taxize")
[1] ‘0.3.0’

@sckott
Contributor

sckott commented May 22, 2014

Hmmm, I'm not sure what the problem could be. Maybe I can use exactly the species list you have. Can you share that with me? Email? Just the species list. I'll try the code on a few other machines to see if I can replicate the behavior.

@sckott
Contributor

sckott commented May 22, 2014

@SteveViss My email is scott@ropensci.org

@SteveViss
Member

Thx, I'm sending you the dataset I'm using.

@sckott
Contributor

sckott commented May 22, 2014

@SteveViss got it, thx, running now

@SteveViss
Member

Hi @sckott,

Ok, I found the problem, and it's coming from the remote server. When looking up the "accepted name" in the iPlant_TNRS db, I was searching with an id built by pasting together 4 different fields: genus, species, subspecies and variety (all available in the FIA). When I used only the genus and species to create this id, I got no errors.

@sckott
Contributor

sckott commented May 23, 2014

Hi @SteveViss - Okay, glad you were able to fix the problem. I'm not sure why that would fix it, but glad it did. You may want to strip the white space from those names too, not a big deal. You could use stringr, like str_trim(data$id[1:10], "both"), which trims whitespace from both sides of each name
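
To illustrate why pasted ids can need cleaning, here is a small base R sketch. It is only a guess at what the ids might have looked like; the thread does not confirm that stray "NA" text or whitespace caused the server error. The column names, values, and the helper make_id are assumptions for illustration.

```r
# Toy FIA-like table; column names and values are assumptions
spp <- data.frame(
  genus = c("Abies", "Acer "),
  species = c("balsamea", "rubrum"),
  subspecies = c(NA, ""),
  variety = c("", "trilobum"),
  stringsAsFactors = FALSE
)

# Naive pasting keeps literal "NA" text and stray spaces in the id
naive <- paste(spp$genus, spp$species, spp$subspecies, spp$variety)

# Hypothetical helper: trim each part, drop NA and empty parts,
# then join what is left with single spaces
make_id <- function(...) {
  parts <- trimws(c(...))
  parts <- parts[!is.na(parts) & nzchar(parts)]
  paste(parts, collapse = " ")
}

ids_full <- mapply(make_id, spp$genus, spp$species, spp$subspecies,
                   spp$variety, USE.NAMES = FALSE)
ids_binomial <- mapply(make_id, spp$genus, spp$species, USE.NAMES = FALSE)
```

Here trimws() from base R plays the same role as str_trim(x, "both") from stringr.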

@sckott
Contributor

sckott commented May 23, 2014

Hi @chuckrp

I just tried itis_terms again, and I can't replicate the errors you're getting. Have you tried doing this in different places? If you have only tried on a company or university internet, perhaps they are blocking very rapid calls to the same url from the same machine.

@sckott
Contributor

sckott commented Nov 28, 2014

closing for now

@sckott sckott closed this as completed Nov 28, 2014
@chuckrp
Author

chuckrp commented Feb 5, 2015

Scott,
Thank you for handling this issue. There has been a long hiatus since I first brought it up, but I am now retired and have had time to get back to what I was trying to do back then. First of all, this is just an update for your interest. I do not want to reopen #271.

My goal was to check the status of, get TSNs for, and find authors and common names where available for 40,328 names from a biodiversity database I am involved with. Because of the number of names, I split the list into 50 submissions of 800 and 1 submission of 328 names. When submitted to ITIS, I received the "server returns empty" error at a rate of about every third or fourth 800-name submission. Much better than I expected, actually. Resubmitting from each stop point allowed me to eventually get all the names processed. I followed the same procedure to submit the names to GBIF and EOL. GBIF responded without any problems at all. EOL also responded well, for the most part. However, EOL did choke on a few names, stopping when they were submitted. It seemed to be an issue with non-standard characters within some of the names, probably a result of original data-entry errors. Removing those particular entries resulted in good returns. GBIF apparently just ignored those names and returned a not found result.
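
The split-and-resubmit procedure described above could be sketched in base R roughly as follows. This is an illustration under assumptions, not code from taxize: process_batches and flaky are hypothetical names, and flaky stands in for a remote lookup that sometimes returns an empty reply.

```r
# Hypothetical helper: process batches in order, retrying a failed batch a few
# times and remembering finished ones so a rerun resumes from the stop point
process_batches <- function(batches, fun, done = list(), tries = 3, wait = 0) {
  for (i in seq_along(batches)) {
    key <- as.character(i)
    if (!is.null(done[[key]])) next  # already finished in an earlier run
    for (attempt in seq_len(tries)) {
      res <- tryCatch(fun(batches[[i]]), error = function(e) NULL)
      if (!is.null(res)) {
        done[[key]] <- res
        break
      }
      Sys.sleep(wait)  # pause before retrying; 0 here to keep the demo fast
    }
  }
  done
}

# 40,328 names in batches of 800 gives 50 full batches plus one of 328
n_batches <- ceiling(40328 / 800)

# Small demo: 7 made-up names in batches of 3
names_vec <- paste("name", 1:7)
batches <- split(names_vec, ceiling(seq_along(names_vec) / 3))

# Stand-in lookup that fails on its first two calls, then succeeds
calls <- 0
flaky <- function(b) {
  calls <<- calls + 1
  if (calls <= 2) stop("Empty reply from server")
  nchar(b)
}

done <- process_batches(batches, flaky)
```

A batch that still fails after all tries is simply left out of done, so passing done back into a later call retries only the missing batches. For a real server, a non-zero wait between attempts is probably kinder than retrying immediately.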

As far as the returns are concerned, each group of 800 names returned as many as 1,700,000 records. I was astounded. However, after removing hundreds of thousands of duplicates each group of 800 names resulted in 2,000-5,000 unique names. ITIS provided common names for about 5,000 species. On the other hand, when compared with my original list, ITIS matched 24,328, EOL matched 34,041, and GBIF matched 35,420. This does not account for valid/accepted/synonym and so on.

I would like to make a request regarding sci2comm. Namely, to be able to run it with ask=FALSE and have it return all found common names for each taxon. Sitting at the monitor while thousands of names are processed is not a viable option for much of what I do. Plus, there is nothing wrong with multiple common names. I noted that ITIS allows for at least as many as 39. If you wish, I can submit this through GitHub to get it into the system.

Taxize is a great tool, based on these results. I realize it may not be the type of use you had in mind when you developed it, but taxize has been extremely helpful to me.
Chuck


@sckott
Contributor

sckott commented Feb 5, 2015

hi @chuckrp Congrats on retiring!

Sorry about the troubles with ITIS. Are you interested in trying local querying of ITIS data? I am working on a development version of taxize that works with local copies of databases, so far ITIS, NCBI, COL, and The Plant List. If so, let me know and I'll help you get that set up.

For sci2comm I think what you want is the new functions with the underscore on the end. e.g.,

get_tsn_('Poa an')

gives

$`Poa an`
     tsn    scientificname      commonnames nameusage
2  41107         Poa annua annual bluegrass  accepted
5 784054 Nicoraepoa andina             <NA>  accepted
6 784648          Poa anae             <NA>  accepted
7 784649     Poa androgyna             <NA>  accepted

Without any prompts. The normal function get_tsn() would give a prompt with this query, but this function does not, just gives all data back.

Unfortunately, you can't pass that directly to sci2comm, but you could coerce each tsn into a proper id object like

res <- get_tsn_('Poa an')
sci2comm(as.tsn(res$`Poa an`$tsn))

gives

$`41107`
[1] "annual blue grass" "walkgrass"         "annual bluegrass" 

$`784054`
[1] NA

$`784648`
[1] NA

$`784649`
[1] NA

thx for the kind words, glad it's been useful!
