
Graceful handling of errors in vectorised inputs #55

Closed
engti opened this issue Feb 27, 2019 · 9 comments

@engti

engti commented Feb 27, 2019

I am trying to loop through a data frame of reviews, some of which seem to fall below the API's minimum token threshold.

When I try to do something like this:

for (i in 1:nrow(df_filtered[1:10, ])) {
  tmp <- safely(gl_nlp(df_filtered$review_text[i]))

  api_result[[as.character(df_filtered$id[i])]] <- tmp

  print(paste0("Index: ", i, " Status: ", length(tmp)))
}

I get errors like:

2019-02-27 14:50:58 -- annotateText: 65 characters
Request failed [400]. Retrying in 1 seconds...
Request failed [400]. Retrying in 1 seconds...
2019-02-27 14:51:03> Request Status Code: 400
Scopes: https://www.googleapis.com/auth/cloud-language https://www.googleapis.com/auth/cloud-platform
Method: service_json
Error: API returned: Invalid text content: too few tokens (words) to process.

I don't think the whole loop should error out because of one bad call. I am using the safely() function from purrr, but is there a best-practice guide for dealing with these situations somewhere?

Thanks.

@MarkEdmondson1234
Collaborator

Did you try sending in the column of text as is? The function is vectorised so it should cope with it, and a tryCatch() in the function should handle errors gracefully. If not, let me know - please try this code and report back what it does:

results <- gl_nlp(df_filtered$review_text)
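
If you still prefer an explicit loop in the meantime, note that purrr::safely() wraps the function itself rather than the call; a minimal sketch of that pattern (df_filtered and its columns are assumed from the snippet above):

# Wrap gl_nlp once; each call then returns list(result = ..., error = ...)
# instead of stopping the loop on failure.
library(purrr)

safe_nlp <- safely(gl_nlp)
results  <- map(df_filtered$review_text, safe_nlp)

# Pull out the successful responses (NULL where the call errored)
ok <- map(results, "result")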

@engti
Author

engti commented Feb 27, 2019

Thanks, Mark, for the quick response.

I tried it, but upon hitting an error it exits rather than proceeding gracefully. I did manage to get it working, though, by keeping only rows with at least 20 words in them and converting all text to UTF-8, although it was a fiddly process (roughly as sketched below the log). Let me know if I should close this issue, or if you'd like to know more.

2019-02-27 20:18:33 -- annotateText: 14 characters
Auto-refreshing stale OAuth token.
Request failed [400]. Retrying in 1 seconds...
Request failed [400]. Retrying in 2.5 seconds...
2019-02-27 20:18:41> Request Status Code: 400
Scopes: https://www.googleapis.com/auth/cloud-language https://www.googleapis.com/auth/cloud-platform
Method: service_json
Error: API returned: Invalid text content: too few tokens (words) to process.
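
For reference, the workaround looked roughly like this - a sketch only, with the column names assumed and stringi::stri_count_words() / iconv() standing in for whatever word count and encoding conversion you prefer:

# Keep only reviews with at least 20 words and force UTF-8 before sending
library(stringi)

df_ok <- df_filtered[stri_count_words(df_filtered$review_text) >= 20, ]
df_ok$review_text <- iconv(df_ok$review_text, to = "UTF-8")

results <- gl_nlp(df_ok$review_text)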

@MarkEdmondson1234
Collaborator

OK, good to know, thanks - I will keep the issue open to make the failures more graceful.

@MarkEdmondson1234 MarkEdmondson1234 changed the title Handling errors in a loop using Graceful handling of errors in vectorised inputs Feb 27, 2019
@engti
Author

engti commented Feb 28, 2019

Many thanks, Mark. Let me know if you need me to test anything in the future.

@thisisnickb

thisisnickb commented Jul 2, 2019

I'd just like to add that I'm having the same issue and that, unless I have my tryCatch() loop coded incorrectly, I'm also getting the same sort of failure.

This code:

# Use just instances with at least 25 words of text (arbitrary cutoff)
filelist <- lapply(filelist, function(x) subset(x, WordCount > 24))

#### Push the data up to Google and get the results back ####
# Create the storage list
output <- rep(list(NA), length(ids))
names(output) <- as.numeric(ids)

# Run the data through
tryCatch(
  {
    for (i in 1:length(ids)) {
      output[[i]] <- gl_nlp(as.character(filelist[[i]]$Content))
    }
  }
)

ultimately produces this error:

[screenshot of the error output]

@MarkEdmondson1234
Collaborator

MarkEdmondson1234 commented Jul 2, 2019 via email

@MarkEdmondson1234
Collaborator

MarkEdmondson1234 commented Jul 2, 2019

The above scenarios should be handled better in version 0.2.0.9000, now on GitHub (install via remotes::install_github("ropensci/googleLanguageR")).

47c0666

For example, the calls below will carry on if there are 400 errors in the first responses:

library(googleLanguageR)
gl_nlp(c("the rain in spain falls mainly on the plain", "err", "", NA))
2019-07-02 22:08:00 -- annotateText: 43 characters
2019-07-02 22:08:01> Request Status Code: 400
2019-07-02 22:08:01 -- Error processing string: 'the rain in spain falls mainly on the plain' API returned: Invalid text content: too few tokens (words) to process.
2019-07-02 22:08:01 -- annotateText: 3 characters
2019-07-02 22:08:02> Request Status Code: 400
2019-07-02 22:08:02 -- Error processing string: 'err' API returned: Invalid text content: too few tokens (words) to process.

This gives a response like the one below:

$sentences
$sentences[[1]]
[1] "#error -  API returned: Invalid text content: too few tokens (words) to process."

$sentences[[2]]
[1] "#error -  API returned: Invalid text content: too few tokens (words) to process."

$sentences[[3]]
[1] "#error - zero length string"

$sentences[[4]]
[1] "#error - zero length string"


$tokens
$tokens[[1]]
[1] "#error -  API returned: Invalid text content: too few tokens (words) to process."

$tokens[[2]]
[1] "#error -  API returned: Invalid text content: too few tokens (words) to process."

$tokens[[3]]
[1] "#error - zero length string"

$tokens[[4]]
[1] "#error - zero length string"


$entities
$entities[[1]]
[1] "#error -  API returned: Invalid text content: too few tokens (words) to process."

$entities[[2]]
[1] "#error -  API returned: Invalid text content: too few tokens (words) to process."

$entities[[3]]
[1] "#error - zero length string"

$entities[[4]]
[1] "#error - zero length string"


$language
[1] "#error -  API returned: Invalid text content: too few tokens (words) to process."
[2] "#error -  API returned: Invalid text content: too few tokens (words) to process."
[3] "#error - zero length string"                                                     
[4] "#error - zero length string"                                                     

$text
[1] "the rain in spain falls mainly on the plain"
[2] "err"                                        
[3] ""                                           
[4] NA                                           

$documentSentiment
# A tibble: 4 x 2
  magnitude score
      <dbl> <dbl>
1        NA    NA
2        NA    NA
3        NA    NA
4        NA    NA

$classifyText
# A tibble: 4 x 2
  name  confidence
  <chr>      <int>
1 NA            NA
2 NA            NA
3 NA            NA
4 NA            NA

Note you do not need to loop through indexes etc. to pass multiple texts to the API; send in the vector and it will make one API call per text element. It skips API calls for empty strings or NA elements.
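
A minimal sketch of that pattern (the data frame and column names are assumed; the "#error" prefix matches the output shown above):

# One vectorised call instead of a loop over rows
results <- gl_nlp(df_filtered$review_text)

# Elements that failed carry the "#error" marker, e.g. in the $language slot
failed <- grepl("^#error", results$language)
df_filtered$review_text[failed]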

@thisisnickb

Fixed - many thanks!

@MarkEdmondson1234
Collaborator

One thing I have just realised is that the "too few tokens (words) to process" error only occurs if you include classifyText in the request, e.g. if you use the annotateText default that includes all methods. You can get entity analysis for text of any length if you specify only that,

e.g.

gl_nlp(c("the rain in spain falls mainly on the plain", "err", "", NA), nlp_type = "analyzeEntities")

See https://cloud.google.com/natural-language/docs/reference/rest/v1/documents/classifyText
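
As a sketch of how to combine this with a word-count check, so short texts still get entity analysis while longer ones go through the full annotateText default (the 20-word cutoff and column name are assumptions, mirroring the workaround earlier in the thread):

# Route short texts to entity analysis only; longer texts to the default annotateText
library(stringi)

texts <- df_filtered$review_text
short <- stri_count_words(texts) < 20

entities_only <- gl_nlp(texts[short], nlp_type = "analyzeEntities")
full_results  <- gl_nlp(texts[!short])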
