monkeylearn: natural language processing with the monkeylearn API #45
Thanks for your submission @masalmon - We'll get back to you soon
Hi @masalmon, I'll be your handling editor and am currently seeking reviewers. FYI, this submission prompted discussion among the editors on two topics:
Cool, thanks. Actually, I know I'm creating work for myself later with API packages.
Reviewers: @hilaryparker @leeper
Additional reviewer: @leeper |
This is a slim and easy-to-use package that works exactly as described. The documentation is generally complete and all examples in docs and README run as expected. I think it's really close to being ready for acceptance, but there are a couple of issues and I have some suggestions on usability.
Overall, really easy to use and really nicely done.
One suggestion regarding outputting
Many thanks!! 👍 I am on holidays this week, so I will implement the suggested changes next week. ☺
Hi @leeper, thanks again, this will be very useful! I only have questions about 2 points:
I had chosen to output a list of data frames because I did it this way for
@masalmon R objects have an "attributes" component that stores metadata such as dimensions and names, but also can hold arbitrary R objects as metadata. See this section in Advanced R, the R Manual section, |
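For example, here is a minimal sketch of the idea (with a toy data frame and made-up header values, not the package's actual output):

```r
# A toy result data frame; the API response headers are stored as a
# data.frame in an attribute, so the metadata travels with the results
# instead of cluttering the main output.
results <- data.frame(entity = c("Paris", "R"),
                      tag = c("LOCATION", "MISC"),
                      stringsAsFactors = FALSE)
attr(results, "headers") <- data.frame(status = 200L, server = "nginx",
                                       stringsAsFactors = FALSE)

# the metadata can be read back later with attr()
attr(results, "headers")
```

Any R object can carry such attributes, and they are preserved when the object is passed around or returned from a function.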
@noamross cool, thank you! |
Okay, so I now see why it (maybe) made sense to have it. I'd tend to keep this because I don't know any better solution for this case, and if I find one, I'd replace the current "text index". Any thoughts?
Reg. "The only major substantive suggestion I have for the package is to perhaps create some functions or variables to represent the classifier ID and extractor ID values for some common methods.", I'm trying to get a table of all public modules with their ID and name, if Monkeylearn can send me one (I already know it cannot be obtained automatically). If they can't, I'll do a small table myself and use it. |
Yup, that works. Then there's a consistent structure regardless of whether it's one call or many. |
The function now outputs a data.frame with headers as a df attribute. 😄 I am still not sure about the following part of my code in both functions:

```r
# try the request; if the server returns nothing, retry up to 5 times
output <- try(monkeylearn_get_extractor(request_part, key, extractor_id),
              silent = TRUE)
try_number <- 1
while (inherits(output, "try-error") && try_number < 6) {
  message(paste0("Server returned nothing, trying again, try number ", try_number))
  Sys.sleep(2^try_number)
  output <- try(monkeylearn_get_extractor(request_part, key, extractor_id),
                silent = TRUE)
  try_number <- try_number + 1
}
# check the output -- if it is 429 (throttle limit), try again
while (!monkeylearn_check(output)) {
  output <- monkeylearn_get_extractor(request_part, key, extractor_id)
}
```

So if the server returns nothing, I retry up to 5 times, waiting longer each time. And if the response from the server has a 429 code, which indicates the throttle limit, I wait 60 seconds. I feel this is quite clumsy; maybe I could make this better? Any suggestion would be welcome. Note that I read https://cran.r-project.org/web/packages/httr/vignettes/api-packages.html:

> "For many APIs, the common approach is to retry API calls that return something in the 500 range. However, when doing this, it's extremely important to make sure to do this with some form of exponential backoff: if something's wrong on the server-side, hammering the server with retries may make things worse, and may lead to you exhausting quota (or hitting other sorts of rate limits). A common policy is to retry up to 5 times, starting at 1s, and each time doubling and adding a small amount of jitter (plus or minus up to, say, 5% of the current wait time)."

Now another question. @leeper, now that
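The policy described in the httr vignette could be sketched like this (`do_request` is a hypothetical stand-in for the real monkeylearn call, and `base_wait` is a made-up parameter to keep the example fast):

```r
# Sketch of retry with exponential backoff and jitter, following the
# httr API-packages vignette: up to 5 tries, doubling the wait each
# time, with +/- 5% random jitter.
retry_with_backoff <- function(do_request, max_tries = 5, base_wait = 1) {
  for (try_number in seq_len(max_tries)) {
    output <- try(do_request(), silent = TRUE)
    if (!inherits(output, "try-error")) {
      return(output)
    }
    if (try_number < max_tries) {
      # wait base_wait * 2^(tries so far - 1) seconds, jittered by +/- 5%
      wait <- base_wait * 2^(try_number - 1) * runif(1, 0.95, 1.05)
      message("Request failed, waiting ", round(wait, 2), "s before retrying")
      Sys.sleep(wait)
    }
  }
  stop("Server returned nothing after ", max_tries, " tries")
}
```

The real call could then be wrapped once, e.g. `retry_with_backoff(function() monkeylearn_get_extractor(request_part, key, extractor_id))`, which avoids duplicating the retry loop in both functions.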
I'd say write an example of how to check rate limiting and if it's really complicated, perhaps write it into the package. If it turns out to be simple, then just leave it as an example and/or note it in the README. |
I feel I can now answer the review comments. Thanks a lot, @leeper! And thanks @noamross for your contributions to the discussion! Something I have done independently of this feedback was to add a small vignette where I do a small analysis of the book Little Women.
Thanks again for the feedback, I'm really happy to be able to improve the package!
Nice. Looks good to me!
@leeper I had added you to the DESCRIPTION file without asking first, I'm sorry. Are you OK with being in the DESCRIPTION?
Sure thing. |
I was running final checks and I'm sometimes getting an error when running line 94 of the gutenbergr vignette. It looks like the vignettes don't get tested on Travis?
This appears to arise in
One has
Oh thanks a lot @noamross, fixing this. Actually, vignettes do seem to get tested on Travis: https://travis-ci.org/masalmon/monkeylearn/builds/148549225 @feconroses is it normal for headers to sometimes contain transfer.encoding and sometimes content.length?
@masalmon in all honesty, these headers aren't really relevant from a MonkeyLearn point of view. These headers (transfer.encoding and content.length) are generated automatically by nginx, they aren't really relevant for the text analysis made by MonkeyLearn or its process. |
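Since the header fields can vary between responses, one defensive option (a sketch with made-up header values, not the package's actual code) is to keep only the fields present in every response before binding them into a data frame:

```r
# Two hypothetical header sets: one response reports content.length,
# the other transfer.encoding, as nginx may emit either.
h1 <- list(status = 200L, content.length = "1024", server = "nginx")
h2 <- list(status = 200L, transfer.encoding = "chunked", server = "nginx")
headers <- list(h1, h2)

# keep only the fields common to all responses, then rbind row by row
common <- Reduce(intersect, lapply(headers, names))
headers_df <- do.call(rbind, lapply(headers, function(h) {
  as.data.frame(h[common], stringsAsFactors = FALSE)
}))
```

This way the binding never fails on a field that only some responses contain, at the cost of dropping the response-specific fields.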
@feconroses ok thanks a lot! @noamross has my commit solved the bug on your PC? |
Yes, that has solved it, all checks now passing! I hope you don't mind one more tweak at the end of a long review. We're starting to use Gabor Csardi's goodpractice package to check for antipatterns in code. Running
Oh nice, I didn't know about this package, I'll make the corresponding corrections soon. Thank you!
@noamross now done! :-) |
Thanks for your patience and help, @noamross, now I really got rid of
@sckott could you please create the
@masalmon the appveyor is set up |
Wow this was quick, thanks a lot @sckott ! |
For info the package is on CRAN now. Thanks again for the review! |
Well done!
This package provides access to the existing modules of the monkeylearn API, allowing, for instance, Named Entity Recognition, language detection, and assignment of topics to texts.
https://github.com/masalmon/monkeylearn
It works with the monkeylearn API:
http://monkeylearn.com/
Note that one can register using one's GitHub account.
Anyone who wants to use Natural Language Processing methods without, e.g., first installing Java, or someone who already uses monkeylearn and wants to integrate it in an R workflow (but only for text processing, not for module development).
Yes: the openNLP package provides several Natural Language Processing methods such as Named Entity Recognition, but it has a Java dependency that makes its installation difficult for some users.
Installation: devtools
Does `devtools::check()` produce any errors or warnings? If so, paste them below.
It's not an explanation of any exception, but I wanted to underline that I'm not sure I've chosen the best solution for dealing with the throttle limit (I retry 5 times if there is a 429 API error, which is the throttle limit), so I'd very much appreciate feedback on this (and on anything else, obviously!).