<a href="https://colab.research.google.com/github/ipeirotis/dealing_with_data/blob/master/02-WebAPIs/B2-Google_Natural_Language_API.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Interacting with the Google Cloud Natural Language API

Another useful API, especially when dealing with text, is the [Google Cloud Natural Language API](https://cloud.google.com/natural-language). It offers a variety of text analysis functions, such as sentiment analysis, entity analysis, syntax analysis, and content classification.

Below we give a couple of examples that show how to take an unstructured piece of text (either the text itself, or the contents of a web page at a given URL) and extract a "semi-structured" representation of its content.



## `analyzeSentiment` call

We will start with the `documents:analyzeSentiment` API call (see the [how-to guide](https://cloud.google.com/natural-language/docs/analyzing-sentiment) and the [REST reference](https://cloud.google.com/natural-language/docs/reference/rest/v1/documents/analyzeSentiment)). It takes a piece of text as input and returns its sentiment, both for the document as a whole and for each individual sentence.

The cells below first mount Google Drive and load our API key from a JSON file stored there. Then they define a function `analyze_sentiment` that sends a `text` string to the API and returns the parsed JSON response.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
import json
with open('/content/drive/My Drive/api_keys.json') as f:
    api_keys = json.load(f)
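The cell above assumes you run the notebook on Colab and keep a file `api_keys.json` in your Drive. That file must contain at least the key `google_nlp_api_key` used below, for example `{"google_nlp_api_key": "YOUR_KEY"}`. If you run the notebook elsewhere, you can build the same `api_keys` dictionary from an environment variable instead. The sketch below assumes a variable named `GOOGLE_NLP_API_KEY`. That name and the `load_api_keys` helper are hypothetical choices for illustration.

```python
import json
import os

def load_api_keys(path=None, env_var="GOOGLE_NLP_API_KEY"):
    """Return a dict with a 'google_nlp_api_key' entry.

    Reads the JSON file at `path` if it exists; otherwise falls back to the
    environment variable `env_var` (a hypothetical name chosen for this sketch).
    """
    if path is not None and os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    key = os.environ.get(env_var)
    if key is None:
        raise RuntimeError(f"No key file at {path!r} and ${env_var} is not set")
    return {"google_nlp_api_key": key}
```

Usage: `api_keys = load_api_keys('/content/drive/My Drive/api_keys.json')` works both on Colab (file found) and locally (environment variable set).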

In [None]:
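import requests

def analyze_sentiment(text, api_key):
    """Send `text` to the documents:analyzeSentiment endpoint and return the parsed JSON."""
    url = "https://language.googleapis.com/v1/documents:analyzeSentiment"

    # Request body: the document to analyze, plus the encoding used to compute offsets
    data = {
        "document": {
            "type": "PLAIN_TEXT",
            "content": text
        },
        "encodingType": "UTF8"
    }

    # The API key is passed as the `key` query parameter
    response = requests.post(url, params={"key": api_key}, json=data, timeout=30)
    # Fail loudly on HTTP errors (e.g., an invalid key or an exceeded quota)
    response.raise_for_status()

    return response.json()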

In [None]:
# We will analyze the text below using the Google Cloud Natural Language API

text = '''
I got their Egg & Cheese sandwich on a Whole Wheat Everything Bagel. 
First off, I loved loved loved the texture of the bagel itself. 
It was very chewy yet soft, which is a top feature for a NY style bagel. 
However, I thought there could've been more seasoning on top of 
the bagel as I found the bagel itself to be a bit bland. 

Speaking of bland, I thought the egg and cheese filling were also quite bland. 
This was definitely lacking salt and pepper in the eggs and the cheese didn't
really add too much flavor either, which was really disappointing! 
My mom also had the same complaint with her bagel sandwich 
(she had the egg sandwich on a blueberry bagel) so I definitely wasn't 
the only one.

'''

In [None]:
data = analyze_sentiment(text, api_keys['google_nlp_api_key'])

Now, let's try to understand the structure of the answer. First, we check the high-level keys.

In [None]:
data.keys()

dict_keys(['documentSentiment', 'language', 'sentences'])

Now, let's check the content of these keys:

In [None]:
data['documentSentiment']

{'magnitude': 5.1, 'score': -0.1}
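According to the API documentation, `score` ranges from -1.0 (negative) to 1.0 (positive). `magnitude` measures the overall strength of emotion, positive and negative combined. It is not normalized, so longer texts tend to have larger magnitudes. Our review has a near-neutral score (-0.1) but a large magnitude (5.1). The praise for the bagel's texture and the complaints about the bland filling roughly cancel out. The helper below is a rough sketch of one way to turn the two numbers into a label. The function name and its thresholds (0.25 and 1.0) are arbitrary illustrative choices, not values prescribed by the API.

```python
def describe_sentiment(doc_sentiment, score_threshold=0.25, magnitude_threshold=1.0):
    """Turn a {'score': ..., 'magnitude': ...} dict into a rough label.

    The default thresholds are illustrative assumptions, not values defined by
    the API. Because magnitude is not normalized, a sensible magnitude
    threshold grows with the length of the text.
    """
    score = doc_sentiment["score"]
    magnitude = doc_sentiment["magnitude"]
    if score >= score_threshold:
        return "positive"
    if score <= -score_threshold:
        return "negative"
    # Score near zero: strong emotions that cancel out ("mixed") vs. little emotion at all ("neutral")
    return "mixed" if magnitude >= magnitude_threshold else "neutral"
```

For our review, `describe_sentiment(data['documentSentiment'])` returns `'mixed'`.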

In [None]:
# Let's go deeper into the 'sentences' key
data['sentences']

[{'text': {'content': 'I got their Egg & Cheese sandwich on a Whole Wheat Everything Bagel.',
   'beginOffset': 1},
  'sentiment': {'magnitude': 0.3, 'score': 0.3}},
 {'text': {'content': 'First off, I loved loved loved the texture of the bagel itself.',
   'beginOffset': 71},
  'sentiment': {'magnitude': 0.9, 'score': 0.9}},
 {'text': {'content': 'It was very chewy yet soft, which is a top feature for a NY style bagel.',
   'beginOffset': 136},
  'sentiment': {'magnitude': 0.9, 'score': 0.9}},
 {'text': {'content': "However, I thought there could've been more seasoning on top of \nthe bagel as I found the bagel itself to be a bit bland.",
   'beginOffset': 210},
  'sentiment': {'magnitude': 0.7, 'score': -0.7}},
 {'text': {'content': 'Speaking of bland, I thought the egg and cheese filling were also quite bland.',
   'beginOffset': 334},
  'sentiment': {'magnitude': 0.8, 'score': -0.8}},
 {'text': {'content': "This was definitely lacking salt and pepper in the eggs and the cheese didn

In [None]:
# Print only the overall document score
print(f"The sentiment in this text is {data['documentSentiment']['score']}")

The sentiment in this text is -0.1
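The document-level score hides the variation between sentences. The short sketch below collects the per-sentence scores from the `sentences` list and picks out the most negative sentence. The helper names are our own, and the code assumes the response layout shown above.

```python
def sentence_scores(data):
    """Return a list of (sentence_text, score, magnitude) tuples from an analyzeSentiment response."""
    return [
        (s["text"]["content"], s["sentiment"]["score"], s["sentiment"]["magnitude"])
        for s in data.get("sentences", [])
    ]

def most_negative_sentence(data):
    """Return the (text, score, magnitude) tuple with the lowest score, or None if there are no sentences."""
    scores = sentence_scores(data)
    return min(scores, key=lambda row: row[1]) if scores else None
```

In the notebook, `for content, score, magnitude in sentence_scores(data): print(f"{score:+.1f} {content}")` lists every sentence with its score. `most_negative_sentence(data)` returns the lowest-scoring one.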


### Exercise 1

Type your own piece of text and analyze its sentiment. Discuss your findings.

## `analyzeEntities` call

[Full Documentation of the call](https://cloud.google.com/natural-language/docs/reference/rest/v1/documents/analyzeEntities)

This API call extracts the entities (people, organizations, locations, works of art, and so on) mentioned in a text. Each entity comes with a `salience` score between 0 and 1 that indicates how central the entity is to the document as a whole. (To also get the sentiment expressed toward each entity, the API offers a separate call, `analyzeEntitySentiment`.)

One capability worth examining is entity "normalization". Different ways of referring to the same thing are grouped under a single entity, which the API links to a Knowledge Graph entry when one exists. For example, "President Biden" and "Joe Biden" get mapped to the same Knowledge Graph entity.
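Here is a minimal sketch of how to see this normalization in a response, such as the one returned by the `extract_entities` function defined in the next cell. It assumes the layout described in the API reference: each entity has `name`, `type`, `salience`, a list of `mentions`, and a `metadata` map that may contain `mid` (the Knowledge Graph ID) and `wikipedia_url`. Entities without Knowledge Graph metadata get `None`. The helper name `entity_identities` is hypothetical.

```python
def entity_identities(data):
    """Summarize each entity in an analyzeEntities response.

    Returns a list of dicts with the entity name, its distinct surface forms
    (the different ways it was mentioned), and its Knowledge Graph ID (`mid`)
    and Wikipedia URL when the API linked it to a known entity.
    """
    rows = []
    for entity in data.get("entities", []):
        metadata = entity.get("metadata", {})
        rows.append({
            "name": entity["name"],
            "type": entity["type"],
            "salience": entity["salience"],
            # Distinct mention texts that were merged into this single entity
            "surface_forms": sorted({m["text"]["content"] for m in entity.get("mentions", [])}),
            "mid": metadata.get("mid"),
            "wikipedia_url": metadata.get("wikipedia_url"),
        })
    return rows
```

An entity whose `surface_forms` list has more than one element is a case where the API merged several different mentions into one entity.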

In [None]:
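import requests

def extract_entities(url_to_analyze, api_key):
    """Download the page at `url_to_analyze` and send its HTML to documents:analyzeEntities."""
    # Fetch the raw HTML of the page; fail early if the download does not succeed
    page = requests.get(url_to_analyze, timeout=30)
    page.raise_for_status()
    html_content = page.text

    url = "https://language.googleapis.com/v1/documents:analyzeEntities"

    # Declaring the type as HTML tells the API that the content is markup, not plain text
    data = {
        "document": {
            "type": "HTML",
            "content": html_content
        },
        "encodingType": "UTF8"
    }

    response = requests.post(url, params={"key": api_key}, json=data, timeout=30)
    response.raise_for_status()
    return response.json()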
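If you also want the sentiment expressed toward each entity, the API offers a separate `documents:analyzeEntitySentiment` method. It accepts the same kind of request body and returns entities that each carry an additional `sentiment` field (with `score` and `magnitude`). The sketch below only builds the request body and parses the response. The constant and helper names are our own.

```python
ENTITY_SENTIMENT_URL = "https://language.googleapis.com/v1/documents:analyzeEntitySentiment"

def entity_sentiment_request(text):
    """Build the JSON body for an analyzeEntitySentiment call on plain text."""
    return {
        "document": {"type": "PLAIN_TEXT", "content": text},
        "encodingType": "UTF8",
    }

def entity_sentiment_table(data):
    """Return (name, score, magnitude) for each entity that has a sentiment in the response."""
    return [
        (e["name"], e["sentiment"]["score"], e["sentiment"]["magnitude"])
        for e in data.get("entities", [])
        if "sentiment" in e
    ]
```

You send the request the same way as in `analyze_sentiment`, for example `requests.post(ENTITY_SENTIMENT_URL, params={"key": api_keys['google_nlp_api_key']}, json=entity_sentiment_request(text)).json()`. Then pass the result to `entity_sentiment_table`.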

In [None]:
url_to_analyze = 'https://www.reuters.com/lifestyle/elvis-everything-everywhere-vie-oscar-nods-tuesday-2023-01-24/'

data = extract_entities(url_to_analyze, api_keys['google_nlp_api_key'])

In [None]:
# Let's see what we get back as top-level attributes
data.keys()

dict_keys(['entities', 'language'])

In [None]:
# Let's see the list of entities
data['entities']

[{'name': 'tab',
  'type': 'OTHER',
  'metadata': {},
  'salience': 0.7716946,
  'mentions': [{'text': {'content': 'tab', 'beginOffset': 131167},
    'type': 'COMMON'},
   {'text': {'content': 'tab', 'beginOffset': 135093}, 'type': 'COMMON'},
   {'text': {'content': 'tab', 'beginOffset': 138259}, 'type': 'COMMON'},
   {'text': {'content': 'tab', 'beginOffset': 141531}, 'type': 'COMMON'},
   {'text': {'content': 'tab', 'beginOffset': 144749}, 'type': 'COMMON'},
   {'text': {'content': 'tab', 'beginOffset': 147935}, 'type': 'COMMON'},
   {'text': {'content': 'tab', 'beginOffset': 151101}, 'type': 'COMMON'},
   {'text': {'content': 'tab', 'beginOffset': 154286}, 'type': 'COMMON'},
   {'text': {'content': 'tab', 'beginOffset': 157797}, 'type': 'COMMON'},
   {'text': {'content': 'tab', 'beginOffset': 160953}, 'type': 'COMMON'},
   {'text': {'content': 'tab', 'beginOffset': 171199}, 'type': 'COMMON'},
   {'text': {'content': 'tab', 'beginOffset': 174574}, 'type': 'COMMON'},
   {'text': {'con

In [None]:
# This function takes as input the response
# from the Google Natural Language API and returns the names
# of the entities whose salience (importance to the
# document, between 0 and 1) is above the threshold
def getEntities(data, threshold):
    result = []
    for entity in data["entities"]:
        relevance = float(entity['salience'])
        if relevance > threshold:
            result.append(entity['name'])
    return result

getEntities(data, 0.002)

['tab',
 'Thomson Reuters',
 'Box office hits',
 'The Banshees of Inisherin',
 'return',
 'Everything Everywhere',
 'academy',
 'Avatar: The Way of Water',
 'Austin Butler',
 'nominations',
 'Will Smith',
 'Box office hits',
 'TV viewers',
 'Riseborough',
 'Avatar',
 'people',
 'Elvis',
 'All Quiet on the Western Front',
 'land',
 'academy',
 'screen',
 'Lisa Marie Presley']
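The most salient "entity" is `tab`, with a salience of about 0.77. It most likely comes from the page's markup and scripts rather than from the article itself, because we sent the full HTML of the page. One possible remedy is to keep only the text inside `<p>` tags and send that as a `PLAIN_TEXT` document. The sketch below does this with Python's built-in `html.parser`; that is our choice here, and a library such as BeautifulSoup would also work. Whether the `<p>` tags of a given site actually hold the article body is an assumption you should check for each site. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

class ParagraphTextExtractor(HTMLParser):
    """Collect the text that appears inside <p> elements, ignoring <script> and <style>."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self.paragraphs = []
        self._in_p = False
        self._skip_depth = 0  # how many <script>/<style> elements are currently open
        self._buffer = []

    def _flush(self):
        # Normalize whitespace and store the finished paragraph, if any
        text = " ".join("".join(self._buffer).split())
        if text:
            self.paragraphs.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "p":
            # <p> cannot nest, so a new <p> implicitly closes the previous one
            self._flush()
            self._in_p = True

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == "p":
            self._flush()
            self._in_p = False

    def handle_data(self, data):
        if self._in_p and self._skip_depth == 0:
            self._buffer.append(data)

    def close(self):
        super().close()
        self._flush()  # keep the text of a final <p> that was never closed

def html_to_paragraph_text(html):
    """Return the text of all <p> elements in `html`, one paragraph per line."""
    parser = ParagraphTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.paragraphs)
```

To use it, you would pass `html_to_paragraph_text(requests.get(url_to_analyze).text)` as the document `content`, with `"type": "PLAIN_TEXT"` instead of `"HTML"`.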

### Exercise 2
Now let's try to use the `classifyText` call to determine the content category of a web page. You can find the documentation [here](https://cloud.google.com/natural-language/docs/categories) and [here](https://cloud.google.com/natural-language/docs/reference/rest/v1/ClassificationCategory).

In [None]:
# ADD YOUR CODE HERE