# Google NLP API Demo

## Summary
Google's Natural Language API is easy to implement, robust, and reliable. Three of its main functions apply to our app: Sentiment Analysis works well and is straightforward, Entity Analysis has a lot of potential for advanced analytics, and Topic Analysis is very useful for identifying clusters. Overall, I think this option is viable and promising.

In [1]:
import pandas as pd
import numpy as np

In [2]:
#import 50 dummy paragraphs 
#first 4 were written by me
#next 46 were generated from https://randomwordgenerator.com/paragraph.php

paragraphs = pd.read_csv("paragraphs.csv", names=["text"])
paragraphs.head()

|   | text |
|---|------|
| 0 | Today I played volleyball and it was great. Ou... |
| 1 | I fought with my mom today. I disagreed with h... |
| 2 | My dream when I was a kid was to become a mang... |
| 3 | I realized that my best friend Jimmy actually ... |
| 4 | Her breath exited her mouth in big puffs as if... |


In [3]:
%pip install --upgrade google-cloud-language



In [4]:
from google.cloud import language_v1

In [5]:
#function 1: sentiment analysis of entire paragraph

#docs:
#https://cloud.google.com/natural-language/docs/analyzing-sentiment
#https://cloud.google.com/natural-language/docs/basics#interpreting_sentiment_analysis_values

# If this cell raises an error (e.g., billing not enabled), enable billing if needed and rerun it

client = language_v1.LanguageServiceClient.from_service_account_json("memoryz-api-key.json")

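def analyze_sentiment(input_text):
    """Return (score, magnitude) for the whole text.

    score: -1.0 (negative) to 1.0 (positive).
    magnitude: 0.0 and up, the overall strength of emotion (grows with length).
    """
    document = language_v1.Document(
        content=input_text,
        type_=language_v1.Document.Type.PLAIN_TEXT,
    )

    sentiment = client.analyze_sentiment(
        request={"document": document}
    ).document_sentiment

    return (sentiment.score, sentiment.magnitude)
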
analyze_sentiment("Savvy ways lead to happy days")

(0.30000001192092896, 0.30000001192092896)

In [6]:
%%time

# Time it takes to analyze 1 paragraph
test = analyze_sentiment(paragraphs.loc[5, "text"])

CPU times: user 6.11 ms, sys: 4.79 ms, total: 10.9 ms
Wall time: 180 ms


In [7]:
%%time

# Time it takes to analyze 50 paragraphs
sentiment_analysis_results = paragraphs["text"].apply(analyze_sentiment)

CPU times: user 157 ms, sys: 36 ms, total: 193 ms
Wall time: 9.76 s
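The two timings above suggest the calls are network-bound: 50 paragraphs took 9.76 s of wall time but only 193 ms of CPU time, i.e. roughly 9.76 / 50 ≈ 195 ms of waiting per paragraph. A minimal sketch of overlapping those requests with a thread pool follows. `run_concurrently` and `max_workers=8` are hypothetical choices, and the sketch assumes the `LanguageServiceClient` can be shared across threads (worth confirming in the client library docs) and that the project's quota allows parallel requests.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_concurrently(
    fn: Callable[[T], R], items: Iterable[T], max_workers: int = 8
) -> List[R]:
    """Apply fn to every item on a thread pool; results keep the input order.

    Hypothetical helper. Threads help here because each API call spends most
    of its time waiting on the network rather than using the CPU.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order; an exception raised
        # by fn is re-raised when its result is reached.
        return list(pool.map(fn, items))
```

In the notebook, `run_concurrently(analyze_sentiment, paragraphs["text"].tolist())` would return the `(score, magnitude)` tuples in paragraph order; how much faster it actually is has not been measured here.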


In [8]:
#append results to paragraphs df
paragraphs["sentiment_score"] = sentiment_analysis_results.apply(lambda x: x[0])
paragraphs["sentiment_magnitude"] = sentiment_analysis_results.apply(lambda x: x[1])
paragraphs.head()

|   | text | sentiment_score | sentiment_magnitude |
|---|------|-----------------|---------------------|
| 0 | Today I played volleyball and it was great. Ou... | 0.5 | 6.9 |
| 1 | I fought with my mom today. I disagreed with h... | -0.3 | 4.2 |
| 2 | My dream when I was a kid was to become a mang... | 0.4 | 4.0 |
| 3 | I realized that my best friend Jimmy actually ... | -0.2 | 3.4 |
| 4 | Her breath exited her mouth in big puffs as if... | -0.3 | 1.5 |
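These scores follow Google's scale: `score` runs from -1.0 (negative) to 1.0 (positive), while `magnitude` is an unnormalized, 0-and-up measure of overall emotional strength that grows with text length. Per Google's interpretation guide, a near-zero score can mean either a low-emotion text or mixed feelings that cancel out, and magnitude helps tell the two apart. The sketch below turns a `(score, magnitude)` pair into a coarse label. `label_sentiment`, its cutoffs (0.25 and 0.5), and dividing magnitude by sentence count are all hypothetical choices that would need tuning on real entries.

```python
def label_sentiment(
    score: float,
    magnitude: float,
    num_sentences: int = 1,
    score_cutoff: float = 0.25,
    mixed_cutoff: float = 0.5,
) -> str:
    """Map (score, magnitude) to "positive", "negative", "mixed", or "neutral".

    Hypothetical helper; the cutoffs are placeholders to tune on real data.
    """
    if num_sentences < 1:
        raise ValueError("num_sentences must be at least 1")
    if score >= score_cutoff:
        return "positive"
    if score <= -score_cutoff:
        return "negative"
    # Near-zero score: either little emotion, or positive and negative
    # feelings cancelling out. Magnitude grows with length, so compare
    # it per sentence before deciding.
    if magnitude / num_sentences >= mixed_cutoff:
        return "mixed"
    return "neutral"
```

For the volleyball paragraph (score 0.5, magnitude 6.9), `label_sentiment(0.5, 6.9)` returns `"positive"`. The sentence count could come from `len(response.sentences)` on the same `analyze_sentiment` response, which `analyze_sentiment` currently discards.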


In [9]:
#function 2: entity + sentiment analysis

#docs:
#https://cloud.google.com/natural-language/docs/analyzing-entity-sentiment
#https://cloud.google.com/natural-language/docs/basics#interpreting_sentiment_analysis_values

# If this cell raises an error (e.g., billing not enabled), enable billing if needed and rerun it

def analyze_entity_sentiments(input_text):

    # Available types: PLAIN_TEXT, HTML
    type_ = language_v1.types.Document.Type.PLAIN_TEXT
    language = "en"
    document = {"content": input_text, "type_": type_, "language": language}
    encoding_type = language_v1.EncodingType.UTF8

    response = client.analyze_entity_sentiment(
        request={"document": document, "encoding_type": encoding_type}
    )
    entity_dict = {}
    for entity in response.entities:
        # entity_dict: {entity_name: [instance_metadata, ...]}
        # Each name maps to a list because an entity may appear several times in one paragraph
        meta = {}
        if entity.name not in entity_dict:
            meta["id"] = 0
        else:
            meta["id"] = len(entity_dict[entity.name])
            
        meta["type"] = language_v1.Entity.Type(entity.type_).name
        meta["salience"] = entity.salience
        sentiment = entity.sentiment
        meta["sentiment_score"] = sentiment.score
        meta["sentiment_magnitude"] = sentiment.magnitude
        
        # The two blocks below are disabled: the fields they read might have been deprecated
#         wiki_meta = []
#         # Loop over the metadata associated with entity. For many known entities,
#         # the metadata is a Wikipedia URL (wikipedia_url) and Knowledge Graph MID (mid).
#         # Some entity types may have additional metadata, e.g. ADDRESS entities
#         # may have metadata for the address street_name, postal_code, et al.
#         for metadata_name, metadata_value in entity.metadata.items():
#             wiki_meta.append((metadata_name, metadata_value))
#         meta["wiki_meta"] = wiki_meta

#         # Loop over the mentions of this entity in the input document.
#         # The API currently supports proper noun mentions.
#         mentions = []
#         for mention in entity.mentions:
#             # (mention text, mention type); the type is PROPER for proper nouns
#             # (e.g., "Jimmy") or COMMON for common nouns (e.g., "mom")
#             mentions.append((mention.text.content, language_v1.EntityMention.Type(mention.type_).name))
#         meta["mentions"] = mentions
        
        if entity.name not in entity_dict:
            entity_dict[entity.name] = [meta]
        else:
            entity_dict[entity.name].append(meta)
            
    return entity_dict

analyze_entity_sentiments('Grapes are good. Bananas are bad.')

{'Grapes': [{'id': 0,
   'type': 'OTHER',
   'salience': 0.8335162997245789,
   'sentiment_score': 0.800000011920929,
   'sentiment_magnitude': 0.800000011920929}],
 'Bananas': [{'id': 0,
   'type': 'OTHER',
   'salience': 0.16648370027542114,
   'sentiment_score': -0.699999988079071,
   'sentiment_magnitude': 0.699999988079071}]}
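`analyze_entity_sentiments` checks whether `entity.name` is already in `entity_dict` twice: once to compute `id` and once to append. The same grouping fits in a single `dict.setdefault` call. The sketch below works on plain `(name, meta)` pairs, a hypothetical stand-in for the entities in `response.entities`, so it runs without calling the API; `group_entities` is likewise a hypothetical name.

```python
from typing import Any, Dict, Iterable, List, Tuple

Meta = Dict[str, Any]


def group_entities(pairs: Iterable[Tuple[str, Meta]]) -> Dict[str, List[Meta]]:
    """Group per-entity metadata by name, numbering repeats with "id"."""
    entity_dict: Dict[str, List[Meta]] = {}
    for name, meta in pairs:
        # setdefault returns the existing list, or inserts and returns []
        instances = entity_dict.setdefault(name, [])
        # id = number of earlier instances with this name (0 for the first)
        instances.append({"id": len(instances), **meta})
    return entity_dict
```

Inside the notebook's loop, the equivalent would be `instances = entity_dict.setdefault(entity.name, [])`, then `meta["id"] = len(instances)` and `instances.append(meta)`.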

In [10]:
%%time

# Time it takes to analyze 1 paragraph
test = analyze_entity_sentiments(paragraphs.loc[10, "text"])

CPU times: user 8.09 ms, sys: 2.14 ms, total: 10.2 ms
Wall time: 165 ms


In [11]:
%%time

# Time it takes to analyze 50 paragraphs
entity_sentiment_analysis_results = paragraphs["text"].apply(analyze_entity_sentiments)

CPU times: user 291 ms, sys: 37.9 ms, total: 329 ms
Wall time: 10.7 s


In [12]:
#append results to paragraph df
paragraphs["entity_sentiment_meta"] = entity_sentiment_analysis_results
paragraphs.head()

|   | text | sentiment_score | sentiment_magnitude | entity_sentiment_meta |
|---|------|-----------------|---------------------|-----------------------|
| 0 | Today I played volleyball and it was great. Ou... | 0.5 | 6.9 | {'matchpoint': [{'id': 0, 'type': 'OTHER', 'sa... |
| 1 | I fought with my mom today. I disagreed with h... | -0.3 | 4.2 | {'mom': [{'id': 0, 'type': 'PERSON', 'salience... |
| 2 | My dream when I was a kid was to become a mang... | 0.4 | 4.0 | {'kid': [{'id': 0, 'type': 'PERSON', 'salience... |
| 3 | I realized that my best friend Jimmy actually ... | -0.2 | 3.4 | {'man': [{'id': 0, 'type': 'PERSON', 'salience... |
| 4 | Her breath exited her mouth in big puffs as if... | -0.3 | 1.5 | {'breath': [{'id': 0, 'type': 'OTHER', 'salien... |
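Since `entity_sentiment_meta` stores one `{name: [meta, ...]}` dict per paragraph, one route to the "advanced analytics" mentioned in the summary is to roll those dicts up across all paragraphs: how often each entity shows up and how the writer tends to feel about it. A minimal sketch; `aggregate_entities`, lowercasing names, and weighting each instance's score by its salience are hypothetical choices, not something the API does for us.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def aggregate_entities(
    entity_dicts: Iterable[Dict[str, List[Dict[str, Any]]]]
) -> Dict[str, Dict[str, float]]:
    """Per entity: instance count and salience-weighted mean sentiment score."""
    counts: Dict[str, int] = defaultdict(int)
    weighted: Dict[str, float] = defaultdict(float)
    saliences: Dict[str, float] = defaultdict(float)
    for entity_dict in entity_dicts:
        for name, instances in entity_dict.items():
            key = name.lower()  # hypothetical: "Mom" and "mom" count together
            for meta in instances:
                counts[key] += 1
                weighted[key] += meta["salience"] * meta["sentiment_score"]
                saliences[key] += meta["salience"]
    return {
        key: {
            "count": counts[key],
            # Guard against a total salience of zero
            "mean_score": weighted[key] / saliences[key] if saliences[key] else 0.0,
        }
        for key in counts
    }
```

In the notebook, `aggregate_entities(paragraphs["entity_sentiment_meta"])` would summarize entities across all 50 paragraphs.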


In [13]:
#function 3: topic analysis

#docs:
#https://cloud.google.com/natural-language/docs/classifying-text
#https://cloud.google.com/natural-language/docs/categories

# If this cell raises an error (e.g., billing not enabled), enable billing if needed and rerun it

def analyze_topics(input_text):
    
    type_ = language_v1.Document.Type.PLAIN_TEXT
    language = "en"
    document = {"content": input_text, "type_": type_, "language": language}

    response = client.classify_text(request={'document': document})
    category_dict = {}
    for category in response.categories:
        #[1:] is to delete the "/" located at position 0
        category_dict[category.name[1:]] = category.confidence
    return category_dict

analyze_topics("Hollywood stars like Brad Pitt may have jolly wood mars and bad pits \
                         and star in movies such as Mr. and Mrs. Smith")

{'Arts & Entertainment/Movies': 0.7300000190734863,
 'News': 0.7300000190734863,
 'Arts & Entertainment/Celebrities & Entertainment News': 0.7200000286102295}
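`classify_text` needs enough content to work with, and short text is listed as a weak spot in the Cons. A cheap pre-check can skip (and avoid paying for) calls that are unlikely to return categories. The 20-word cutoff, `word_count`, and `classify_if_long_enough` are hypothetical; the API's actual minimum should be checked in the classifying-text docs, and its tokenizer may count words differently from the regex here.

```python
import re
from typing import Callable, Dict

WORD_RE = re.compile(r"\w+")


def word_count(text: str) -> int:
    """Rough word count; the API's own tokenizer may count differently."""
    return len(WORD_RE.findall(text))


def classify_if_long_enough(
    text: str,
    classify: Callable[[str], Dict[str, float]],
    min_words: int = 20,
) -> Dict[str, float]:
    """Call classify(text) only when text has at least min_words words."""
    if word_count(text) < min_words:
        return {}  # same "no categories" shape that analyze_topics returns
    return classify(text)
```

In the notebook, `analyze_topics` would be passed as `classify`, e.g. `classify_if_long_enough(text, analyze_topics)`.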

In [14]:
%%time

# Time it takes to analyze 1 paragraph
test = analyze_topics(paragraphs.loc[15, "text"])

CPU times: user 4.45 ms, sys: 2.22 ms, total: 6.66 ms
Wall time: 138 ms


In [15]:
%%time

# Time it takes to analyze 50 paragraphs
topic_analysis_results = paragraphs["text"].apply(analyze_topics)

CPU times: user 154 ms, sys: 31.7 ms, total: 185 ms
Wall time: 8.06 s


In [16]:
#append results to paragraph df
paragraphs["topics_meta"] = topic_analysis_results
paragraphs.head()

|   | text | sentiment_score | sentiment_magnitude | entity_sentiment_meta | topics_meta |
|---|------|-----------------|---------------------|-----------------------|-------------|
| 0 | Today I played volleyball and it was great. Ou... | 0.5 | 6.9 | {'matchpoint': [{'id': 0, 'type': 'OTHER', 'sa... | {'Sports/Team Sports/Volleyball': 0.9700000286... |
| 1 | I fought with my mom today. I disagreed with h... | -0.3 | 4.2 | {'mom': [{'id': 0, 'type': 'PERSON', 'salience... | {'Beauty & Fitness/Hair Care': 0.6299999952316... |
| 2 | My dream when I was a kid was to become a mang... | 0.4 | 4.0 | {'kid': [{'id': 0, 'type': 'PERSON', 'salience... | {'Arts & Entertainment/Comics & Animation/Anim... |
| 3 | I realized that my best friend Jimmy actually ... | -0.2 | 3.4 | {'man': [{'id': 0, 'type': 'PERSON', 'salience... | {'Arts & Entertainment/Humor': 0.550000011920929} |
| 4 | Her breath exited her mouth in big puffs as if... | -0.3 | 1.5 | {'breath': [{'id': 0, 'type': 'OTHER', 'salien... | {} |
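The `topics_meta` column maps category paths such as `Sports/Team Sports/Volleyball` to confidences, and some rows (row 4 above) come back empty. Grouping paragraphs by the first path segment of their most confident category is one simple way to form the clusters mentioned in the summary. A sketch; `best_category`, `cluster_by_top_level`, the 0.5 confidence cutoff, and the `"Uncategorized"` bucket are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional


def best_category(categories: Dict[str, float], min_confidence: float = 0.5) -> Optional[str]:
    """Highest-confidence category path, or None if none clears the cutoff."""
    if not categories:
        return None
    # max() keeps the first of several equal confidences
    name, confidence = max(categories.items(), key=lambda item: item[1])
    return name if confidence >= min_confidence else None


def cluster_by_top_level(
    topics: Iterable[Dict[str, float]], min_confidence: float = 0.5
) -> Dict[str, List[int]]:
    """Group paragraph positions by the top-level segment of their best category."""
    clusters: Dict[str, List[int]] = defaultdict(list)
    for position, categories in enumerate(topics):
        name = best_category(categories, min_confidence)
        # "Sports/Team Sports/Volleyball" -> "Sports"; no category -> "Uncategorized"
        top_level = name.split("/")[0] if name else "Uncategorized"
        clusters[top_level].append(position)
    return dict(clusters)
```

`cluster_by_top_level(paragraphs["topics_meta"])` would return row positions per top-level category; with the notebook's default 0..49 index, positions and index labels coincide.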


## Summary

### Pros:
- 3 applicable functions (sentiment, entity, and topic analysis) + 1 potentially useful function (syntax analysis, e.g., part-of-speech tagging)
- Sentiment score seems fairly accurate, and magnitude can be used to gauge how strong or extreme a user's feelings are
- Entity analysis is quite robust and provides a lot of per-entity data (type, salience, sentiment) for in-depth data science and analytics
- Topic analysis is extremely useful for clustering and for suggesting interests (HUGE)
- Reliable, cheap, and really easy to implement
- If we end up using Google Cloud for the backend, implementation should be much simpler. Google also offers many other services that work with the resources we would already have (API keys, accounts, databases, etc.)
- Good documentation and sample projects available online

### Cons:
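- Slow? Each call took roughly 140-180 ms, so analyzing 50 paragraphs took 8-11 s per function when they were sent one at a time. I'm not sure how fast it needs to be, and my code can probably be optimized further
- Doesn't analyze short text well
- Sentiment analysis is one-dimensional: a single score from -1 to 1 (plus magnitude), with no breakdown into specific emotions
- Some functions might have been deprecated
- Not free? Billing has to be enabled on the Google Cloud project before the API can be used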
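On the "Not free?" point: Google bills the Natural Language API per feature and per document, in units of 1,000 Unicode characters with at least one unit per document, so this notebook's three calls per paragraph are billed as three separate features. A rough estimator assuming that unit scheme follows. `billing_units`, `estimate_cost`, and any prices passed in are hypothetical placeholders; real prices (and any free monthly allowance) should come from Google's pricing page.

```python
import math
from typing import Iterable, Mapping


def billing_units(text: str) -> int:
    """One unit per started block of 1,000 characters, minimum 1 per document.

    len() counts Unicode code points, which should roughly match Google's count.
    """
    return max(1, math.ceil(len(text) / 1000))


def estimate_cost(texts: Iterable[str], price_per_unit: Mapping[str, float]) -> float:
    """Estimated cost of running every feature in price_per_unit on every text.

    price_per_unit maps a feature name to a hypothetical per-unit price.
    """
    total_units = sum(billing_units(t) for t in texts)
    return total_units * sum(price_per_unit.values())
```

For example, a 1,500-character entry counts as 2 units per feature, and each of the three features used here would bill those 2 units.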