
gl_nlp() pukes w/ parameter nlp_type="analyzeSyntax" #28

Closed
dan-reznik opened this issue Feb 16, 2018 · 10 comments
dan-reznik commented Feb 16, 2018

The code below pukes if nlp_type is specified. If it isn't, it works fine, but I am trying to restrict the NLP performed in order to get speedier results.

library(googleLanguageR)
gl_auth("<blablabla.json>")
gl_nlp("Two Things are Infinite",
       nlp_type = "analyzeSyntax"  # if I omit this it works
)

I get the following error:

2018-02-16 17:21:50 -- analyzeSyntax: 23 characters
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 1, 0
Error: Result 1 is not a length 1 atomic vector
In addition: Warning message:
In call_api(the_body = body) :
API Data failed to parse. Returning parsed from JSON content.
Use this to test against your data_parse_function.
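The "differing number of rows" part of the traceback can be reproduced in base R alone. A minimal sketch (not the package's actual internals), assuming the parser hands data.frame() a zero-length column when the API response omits an expected field:

```r
# data.frame() cannot combine a 1-row column with a zero-length one --
# the same failure mode as when the API omits the sentiment field that
# the parser expects to be present.
res <- tryCatch(
  data.frame(text  = "Two Things are Infinite",  # length 1
             score = numeric(0),                 # length 0 (missing field)
             check.names = FALSE),
  error = function(e) conditionMessage(e)
)
print(res)
# "arguments imply differing number of rows: 1, 0"
```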

dan-reznik changed the title from gl_nlp() pukes is nlp_type="analyzeSyntax" to gl_nlp() pukes w/ parameter nlp_type="analyzeSyntax" on Feb 16, 2018
@MarkEdmondson1234 (Collaborator)

Could you report the error message?

@dan-reznik (Author)

2018-02-16 17:21:50 -- analyzeSyntax: 23 characters
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 1, 0
Error: Result 1 is not a length 1 atomic vector
In addition: Warning message:
In call_api(the_body = body) :
API Data failed to parse. Returning parsed from JSON content.
Use this to test against your data_parse_function.

@dan-reznik (Author)

Mark, any luck with this issue?

@MarkEdmondson1234 (Collaborator)

The problem was that the API's return is inconsistent: when you request everything it includes the sentiment score, but when you request just the analyzeSyntax part it omits it. This is a bug at the API's end, which I will report, but I fixed the R error so that it won't stop if there isn't any content - it will return what it gets, though it may not be what you needed in the first place :)
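A minimal sketch of that kind of defensive parsing (a hypothetical helper, not the package's actual code), assuming the parsed response is a nested list in which documentSentiment may be absent:

```r
# Fall back gracefully when a field (here, sentiment) is absent from the
# parsed API response, instead of letting data.frame() fail on a
# zero-length column.
parse_syntax <- function(response) {
  score <- response$documentSentiment$score
  data.frame(
    content = vapply(response$tokens,
                     function(t) t$text$content, character(1)),
    # use NA when the API did not return a sentiment score
    score   = if (is.null(score)) NA_real_ else score,
    stringsAsFactors = FALSE
  )
}

# A response with tokens but no documentSentiment,
# mimicking what analyzeSyntax returns on its own
resp <- list(tokens = list(list(text = list(content = "Two")),
                           list(text = list(content = "Things"))))
parse_syntax(resp)  # 2 rows, score column all NA
```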


dan-reznik commented Feb 18, 2018

Thanks for the quick work. Can I already install the new version from GitHub?

Out of curiosity: with nlp_type="analyzeSyntax" I would expect only POS tagging and lemmatization, not sentiment, so it would seem the API is doing the right thing.

So my newbie question is: is a fixed return format from a REST API a requirement in general? What am I missing?

@MarkEdmondson1234 (Collaborator)

Yes, install the GitHub version via remotes::install_github("ropensci/googleLanguageR")

They recently updated the API to include sentiment per entity, not just per document, which is what is (or isn't) being returned - so if you don't need it, that's cool.

@dan-reznik (Author)

I would argue that processing complexity is ordered as

nlp_type = c("analyzeSyntax", "analyzeSentiment", "analyzeEntities", "analyzeEntitySentiment", "annotateText")

i.e., if one explicitly requests syntax analysis ONLY, one should not expect sentiment, entities, or entity sentiment to be returned.

So maybe the API is doing the right thing already?

@MarkEdmondson1234 (Collaborator)

I thought they were more independent than that but that does make sense.


dan-reznik commented Feb 18, 2018

The reason NLP doesn't work in the real world (other than for fixed utterances) is that it proceeds in this ordered fashion, which is also why most computer vision pipelines (edge detection, feature detection, object detection, etc.) don't work well, going from low level to high. Indeed, true intelligence relies on a stack of modules with back-and-forth communication, all engaged simultaneously, perhaps seeking consistency maximization - something impossible in the serial model.

Crucially, that's why sentiment analysis fails to decode sarcasm: "congrats to XYZ customer service. after a quick 34 days in line, a brilliant representative happily deleted my purchase record".

@MarkEdmondson1234 (Collaborator)

Interesting. I wonder, if it's easier to get these individual elements (e.g. per-sentence sentiment), what people will build on top. For instance, your example relies on the context that 34 days in line isn't "quick"; if that context can be added, perhaps it can reverse the sentiment score and give it more of a chance to detect humans being human :)

I missed your question:

So my newbie question is: Is a fixed return format from a REST API a requirement in general? What am I missing?

I read annotateText as a convenience function that combines all the analyses in one big list, i.e. analyzeSyntax plus the other three options. It's a bug, in my mind, if the analyzeSyntax part of the annotateText return is not the same as when you request analyzeSyntax on its own.
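That consistency claim could be checked directly. A sketch (not runnable here - it needs a Google Cloud service-account key, and the $tokens element name is an assumption about the returned list):

```r
library(googleLanguageR)
# gl_auth("<your-service-account.json>")

# If annotateText is the union of the four analyses, its syntax output
# should match what analyzeSyntax returns on its own:
# full   <- gl_nlp("Two Things are Infinite")  # default: annotateText
# syntax <- gl_nlp("Two Things are Infinite", nlp_type = "analyzeSyntax")
# identical(full$tokens, syntax$tokens)  # TRUE if the API is consistent
```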
