
# Google Cloud Natural Language API

In this practical, we are going to learn more about the [Google Cloud Natural Language API](https://cloud.google.com/natural-language/docs).

Cloud Natural Language allows us to perform the following operations:
- [Analyse Syntax](https://cloud.google.com/natural-language/docs/reference/rest/v1/documents/analyzeSyntax)
- [Analyse Entities](https://cloud.google.com/natural-language/docs/reference/rest/v1/documents/analyzeEntities)
- [Analyse Sentiment](https://cloud.google.com/natural-language/docs/reference/rest/v1/documents/analyzeSentiment)
- [Analyse Entity Sentiment](https://cloud.google.com/natural-language/docs/reference/rest/v1/documents/analyzeEntitySentiment)
- [Classify Content](https://cloud.google.com/natural-language/docs/reference/rest/v1/documents/classifyText)

Let's start by exploring the [Cloud Natural Language Demo](https://cloud.google.com/natural-language/#natural-language-api-demo).

### Todo

> Try out the demo using your own text. Explore the Entities, Sentiment, Syntax and Categories tabs.

## Using API Key


We will now connect to the Cloud Natural Language API and perform multiple operations in a single request using the [annotateText](https://cloud.google.com/natural-language/docs/reference/rest/v1/documents/annotateText) method. Follow the link to see the request format you need to fill in and the response you can expect.
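
Each operation listed at the top of this practical also has its own endpoint, following the same URL pattern as `annotateText`. As a minimal sketch, the hypothetical helper `build_request` below (not part of the API or of this practical) only assembles the URL and JSON body for one method without sending anything. It assumes API-key authentication and plain-text input; `encoding_type` is left optional because not every method accepts `encodingType`.

```python
BASE_URL = "https://language.googleapis.com/v1/documents"


def build_request(method, text, api_key, encoding_type=None):
    """Return (url, body) for one Natural Language API method, e.g. 'analyzeSentiment'."""
    url = f"{BASE_URL}:{method}?key={api_key}"
    body = {"document": {"type": "PLAIN_TEXT", "content": text}}
    if encoding_type is not None:
        # Only added on request, since some methods do not take this field
        body["encodingType"] = encoding_type
    return url, body


url, body = build_request(
    "analyzeSentiment", "I love this course.", "YOUR_API_KEY", encoding_type="UTF8"
)
print(url)
print(body)
```

The returned pair can then be passed to `requests.post(url=url, json=body)` in the same way as the `annotateText` request in this practical.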

In [None]:
import requests
import json

These parameters are required to complete the request.

In [None]:
# Replace the placeholder with your own API key; avoid sharing or committing real keys
googleAPIKey = "YOUR_API_KEY"
googleurl = (
    "https://language.googleapis.com/v1/documents:annotateText?key=" + googleAPIKey
)
req_headers = {"Content-Type": "application/json"}

In [None]:
document = "Nanyang Polytechnic gives our students the head start they are looking for in their next phase in life with our innovative teaching methods and industry-focused projects. They'll not only be academically prepared, but also future-ready - equipped to tackle whatever life throws at them in their career or further education. Our annual Graduate Employment Surveys show that our students are consistently highly sought-after by employers in multiple industries. Many of our graduates have also gone on to local and overseas universities, where they continue to excel in their field of study."

Make an API request. Ensure the request parameters are filled in correctly as required under [annotateText](https://cloud.google.com/natural-language/docs/reference/rest/v1/documents/annotateText#Features).

In [1]:
data = {
    "document": {"type": "PLAIN_TEXT", "content": document},
    "features": {
        "extractSyntax": True,
        "extractEntities": True,
        "extractDocumentSentiment": True,
        "extractEntitySentiment": True,
        "classifyText": True,
        "moderateText": True,
    },
    "encodingType": "UTF8",
}

r = requests.post(url=googleurl, headers=req_headers, json=data)

# Check and display the results
if r.status_code == 200:
    result = r.json()
    print(result)

    # loop through the response to get the parameters needed
else:
    # Show the HTTP status code and the error body returned by the API
    print(f"Error with status {r.status_code}")
    print(r.text)

{'sentences': [{'text': {'content': 'Nanyang Polytechnic gives our students the head start they are looking for in their next phase in life with our innovative teaching methods and industry-focused projects.', 'beginOffset': 0}, 'sentiment': {'magnitude': 0.9, 'score': 0.9}}, {'text': {'content': "They'll not only be academically prepared, but also future-ready - equipped to tackle whatever life throws at them in their career or further education.", 'beginOffset': 171}, 'sentiment': {'magnitude': 0.8, 'score': 0.8}}, {'text': {'content': 'Our annual Graduate Employment Surveys show that our students are consistently highly sought-after by employers in multiple industries.', 'beginOffset': 324}, 'sentiment': {'magnitude': 0.8, 'score': 0.8}}, {'text': {'content': 'Many of our graduates have also gone on to local and overseas universities, where they continue to excel in their field of study.', 'beginOffset': 460}, 'sentiment': {'magnitude': 0.8, 'score': 0.8}}], 'tokens': [{'text': {'co

In [2]:
# Pretty print JSON response
print(json.dumps(result, indent=4))

{
    "sentences": [
        {
            "text": {
                "content": "Nanyang Polytechnic gives our students the head start they are looking for in their next phase in life with our innovative teaching methods and industry-focused projects.",
                "beginOffset": 0
            },
            "sentiment": {
                "magnitude": 0.9,
                "score": 0.9
            }
        },
        {
            "text": {
                "content": "They'll not only be academically prepared, but also future-ready - equipped to tackle whatever life throws at them in their career or further education.",
                "beginOffset": 171
            },
            "sentiment": {
                "magnitude": 0.8,
                "score": 0.8
            }
        },
        {
            "text": {
                "content": "Our annual Graduate Employment Surveys show that our students are consistently highly sought-after by employers in multiple industries.",
     

In [3]:
print(result.keys())

dict_keys(['sentences', 'tokens', 'entities', 'documentSentiment', 'language', 'categories', 'moderationCategories'])


### Analyse Syntax

Refer to [Analyse Syntax](https://cloud.google.com/natural-language/docs/reference/rest/v1/documents/analyzeSyntax). The expected response is shown below.
```
{
  "sentences": [
    {
      object (Sentence)
    }
  ],
  "tokens": [
    {
      object (Token)
    }
  ],
  "language": string
}

```

**Sentences**

In [None]:
# keys are sentences, tokens and language
sentences = result["sentences"]

In [4]:
sentences




[{'text': {'content': 'Nanyang Polytechnic gives our students the head start they are looking for in their next phase in life with our innovative teaching methods and industry-focused projects.',
   'beginOffset': 0},
  'sentiment': {'magnitude': 0.9, 'score': 0.9}},
 {'text': {'content': "They'll not only be academically prepared, but also future-ready - equipped to tackle whatever life throws at them in their career or further education.",
   'beginOffset': 171},
  'sentiment': {'magnitude': 0.8, 'score': 0.8}},
 {'text': {'content': 'Our annual Graduate Employment Surveys show that our students are consistently highly sought-after by employers in multiple industries.',
   'beginOffset': 324},
  'sentiment': {'magnitude': 0.8, 'score': 0.8}},
 {'text': {'content': 'Many of our graduates have also gone on to local and overseas universities, where they continue to excel in their field of study.',
   'beginOffset': 460},
  'sentiment': {'magnitude': 0.8, 'score': 0.8}}]

In [5]:
# explore first sentence
sentences[0]




{'text': {'content': 'Nanyang Polytechnic gives our students the head start they are looking for in their next phase in life with our innovative teaching methods and industry-focused projects.',
  'beginOffset': 0},
 'sentiment': {'magnitude': 0.9, 'score': 0.9}}

In [None]:
import pandas as pd

In [None]:
pd.set_option("display.max_colwidth", 0)

In [None]:
df1 = pd.concat(
    {
        "content": pd.Series([sentence["text"]["content"] for sentence in sentences]),
        "magnitude": pd.Series(
            [sentence["sentiment"]["magnitude"] for sentence in sentences]
        ),
        "score": pd.Series([sentence["sentiment"]["score"] for sentence in sentences]),
    },
    axis=1,
)

In [6]:
df1




                                                                                                                                                                      content  ...  score
0  Nanyang Polytechnic gives our students the head start they are looking for in their next phase in life with our innovative teaching methods and industry-focused projects.  ...  0.9  
1  They'll not only be academically prepared, but also future-ready - equipped to tackle whatever life throws at them in their career or further education.                    ...  0.8  
2  Our annual Graduate Employment Surveys show that our students are consistently highly sought-after by employers in multiple industries.                                     ...  0.8  
3  Many of our graduates have also gone on to local and overseas universities, where they continue to excel in their field of study.                                           ...  0.8  

[4 rows x 3 columns]
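
As an alternative to building each column by hand, `pd.json_normalize` can flatten the nested sentence dictionaries in one call. The sketch below uses a small made-up list (`sample_sentences`) with the same shape as `result["sentences"]`, so it runs without calling the API; the renaming step is only a convenience and assumes the leaf names do not clash.

```python
import pandas as pd

# Made-up sample with the same structure as result["sentences"]
sample_sentences = [
    {
        "text": {"content": "First sentence.", "beginOffset": 0},
        "sentiment": {"magnitude": 0.9, "score": 0.9},
    },
    {
        "text": {"content": "Second sentence.", "beginOffset": 16},
        "sentiment": {"magnitude": 0.8, "score": 0.8},
    },
]

# Nested keys become dotted column names such as "text.content"
df_sentences = pd.json_normalize(sample_sentences)

# Keep only the last part of each dotted name, e.g. "text.content" -> "content"
df_sentences.columns = [col.split(".")[-1] for col in df_sentences.columns]

print(df_sentences[["content", "magnitude", "score"]])
```

Passing `result["sentences"]` instead of `sample_sentences` should give the same three columns as `df1`, plus `beginOffset`.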

**Tokens**

In [None]:
tokens = result["tokens"]

In [7]:
# explore first token
tokens[0]




{'text': {'content': 'Nanyang', 'beginOffset': 0},
 'partOfSpeech': {'tag': 'NOUN',
  'aspect': 'ASPECT_UNKNOWN',
  'case': 'CASE_UNKNOWN',
  'form': 'FORM_UNKNOWN',
  'gender': 'GENDER_UNKNOWN',
  'mood': 'MOOD_UNKNOWN',
  'number': 'SINGULAR',
  'person': 'PERSON_UNKNOWN',
  'proper': 'PROPER',
  'reciprocity': 'RECIPROCITY_UNKNOWN',
  'tense': 'TENSE_UNKNOWN',
  'voice': 'VOICE_UNKNOWN'},
 'dependencyEdge': {'headTokenIndex': 1, 'label': 'NN'},
 'lemma': 'Nanyang'}

In [8]:
tokens[0].keys()




dict_keys(['text', 'partOfSpeech', 'dependencyEdge', 'lemma'])

In [None]:
# create dataframe using partOfSpeech inside each token
df2 = pd.DataFrame([token["partOfSpeech"] for token in tokens])

In [9]:
df2




       tag  ...          voice
0    NOUN   ...  VOICE_UNKNOWN
1    NOUN   ...  VOICE_UNKNOWN
2    VERB   ...  VOICE_UNKNOWN
3    PRON   ...  VOICE_UNKNOWN
4    NOUN   ...  VOICE_UNKNOWN
..    ...   ...            ...
99   PRON   ...  VOICE_UNKNOWN
100  NOUN   ...  VOICE_UNKNOWN
101  ADP    ...  VOICE_UNKNOWN
102  NOUN   ...  VOICE_UNKNOWN
103  PUNCT  ...  VOICE_UNKNOWN

[104 rows x 12 columns]

In [10]:
# create a pandas series using content inside each token
# then insert into first column of df2
df2.insert(0, "content", pd.Series([token["text"]["content"] for token in tokens]))
df2




         content  ...          voice
0    Nanyang      ...  VOICE_UNKNOWN
1    Polytechnic  ...  VOICE_UNKNOWN
2    gives        ...  VOICE_UNKNOWN
3    our          ...  VOICE_UNKNOWN
4    students     ...  VOICE_UNKNOWN
..        ...     ...            ...
99   their        ...  VOICE_UNKNOWN
100  field        ...  VOICE_UNKNOWN
101  of           ...  VOICE_UNKNOWN
102  study        ...  VOICE_UNKNOWN
103  .            ...  VOICE_UNKNOWN

[104 rows x 13 columns]

In [11]:
# do the same for lemma
df2.insert(1, "lemma", pd.Series([token["lemma"] for token in tokens]))
df2




         content  ...          voice
0    Nanyang      ...  VOICE_UNKNOWN
1    Polytechnic  ...  VOICE_UNKNOWN
2    gives        ...  VOICE_UNKNOWN
3    our          ...  VOICE_UNKNOWN
4    students     ...  VOICE_UNKNOWN
..        ...     ...            ...
99   their        ...  VOICE_UNKNOWN
100  field        ...  VOICE_UNKNOWN
101  of           ...  VOICE_UNKNOWN
102  study        ...  VOICE_UNKNOWN
103  .            ...  VOICE_UNKNOWN

[104 rows x 14 columns]

In [12]:
tokens[0]["dependencyEdge"]




{'headTokenIndex': 1, 'label': 'NN'}

In [13]:
# add columns for 'dependencyEdge': {'headTokenIndex': 1, 'label': 'NN'},
df2["d_edge_head_index"] = pd.Series(
    [token["dependencyEdge"]["headTokenIndex"] for token in tokens]
)
df2["d_edge_label"] = pd.Series([token["dependencyEdge"]["label"] for token in tokens])
df2




         content  ... d_edge_label
0    Nanyang      ...  NN         
1    Polytechnic  ...  NSUBJ      
2    gives        ...  ROOT       
3    our          ...  POSS       
4    students     ...  IOBJ       
..        ...     ...   ...       
99   their        ...  POSS       
100  field        ...  POBJ       
101  of           ...  PREP       
102  study        ...  POBJ       
103  .            ...  P          

[104 rows x 16 columns]
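
The `headTokenIndex` value is a position in the `tokens` list, so it can be turned into the actual head word with a simple lookup. The sketch below uses three abbreviated, illustrative tokens (`sample_tokens`) modelled on the start of the document; a root token points to its own index.

```python
import pandas as pd

# Abbreviated, illustrative tokens with the same structure as result["tokens"]
sample_tokens = [
    {"text": {"content": "Nanyang"}, "dependencyEdge": {"headTokenIndex": 1, "label": "NN"}},
    {"text": {"content": "Polytechnic"}, "dependencyEdge": {"headTokenIndex": 2, "label": "NSUBJ"}},
    {"text": {"content": "gives"}, "dependencyEdge": {"headTokenIndex": 2, "label": "ROOT"}},
]

words = [token["text"]["content"] for token in sample_tokens]

df_dep = pd.DataFrame(
    {
        "content": words,
        "label": [token["dependencyEdge"]["label"] for token in sample_tokens],
        # Look up the word sitting at each token's head index
        "head": [words[token["dependencyEdge"]["headTokenIndex"]] for token in sample_tokens],
    }
)
print(df_dep)
```

Read row by row, this says that "Nanyang" modifies "Polytechnic", and "Polytechnic" is the subject of "gives".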

&#128161; **Tip:**

> You can consolidate the commands in the previous cells into a function `get_tokens_dataframe(tokens)` to simplify the process.
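
One possible way to write such a helper is sketched below. It repeats the steps from the earlier cells and assumes every token has the `text`, `lemma`, `partOfSpeech` and `dependencyEdge` keys seen in the response. The two sample tokens are abbreviated and illustrative so the cell runs on its own.

```python
import pandas as pd


def get_tokens_dataframe(tokens):
    """Build one DataFrame row per token from an annotateText 'tokens' list."""
    # Start with the partOfSpeech fields (tag, number, person, ...)
    df = pd.DataFrame([token["partOfSpeech"] for token in tokens])
    # Put the token text and lemma in the first two columns
    df.insert(0, "content", [token["text"]["content"] for token in tokens])
    df.insert(1, "lemma", [token["lemma"] for token in tokens])
    # Append the dependency edge fields at the end
    df["d_edge_head_index"] = [token["dependencyEdge"]["headTokenIndex"] for token in tokens]
    df["d_edge_label"] = [token["dependencyEdge"]["label"] for token in tokens]
    return df


# Abbreviated, illustrative tokens; real tokens carry more partOfSpeech fields
sample_tokens = [
    {
        "text": {"content": "Nanyang", "beginOffset": 0},
        "partOfSpeech": {"tag": "NOUN", "number": "SINGULAR", "proper": "PROPER"},
        "dependencyEdge": {"headTokenIndex": 1, "label": "NN"},
        "lemma": "Nanyang",
    },
    {
        "text": {"content": "Polytechnic", "beginOffset": 8},
        "partOfSpeech": {"tag": "NOUN", "number": "SINGULAR", "proper": "PROPER"},
        "dependencyEdge": {"headTokenIndex": 2, "label": "NSUBJ"},
        "lemma": "Polytechnic",
    },
]

df_tokens = get_tokens_dataframe(sample_tokens)
print(df_tokens)
```

With the real response, `get_tokens_dataframe(result["tokens"])` should produce the same 16 columns as `df2`.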

Explore the different fields inside [Part of Speech](https://cloud.google.com/natural-language/docs/reference/rest/v1/Token#partofspeech).

For example, [Person](https://cloud.google.com/natural-language/docs/reference/rest/v1/Token#Person) indicates the grammatical person of a token (e.g. FIRST, SECOND or THIRD).

In [14]:
df2.loc[df2["person"] != "PERSON_UNKNOWN"]




   content  ... d_edge_label
2   gives   ...  ROOT       
3   our     ...  POSS       
8   they    ...  NSUBJ      
13  their   ...  POSS       
19  our     ...  POSS       
29  They    ...  NSUBJ      
48  throws  ...  CCOMP      
50  them    ...  POBJ       
52  their   ...  POSS       
58  Our     ...  POSS       
65  our     ...  POSS       
81  our     ...  POSS       
94  they    ...  NSUBJ      
99  their   ...  POSS       

[14 rows x 16 columns]

What about [Number](https://cloud.google.com/natural-language/docs/reference/rest/v1/Token#number)? What does it represent?

In [15]:
df2["number"].value_counts()




number
NUMBER_UNKNOWN    64
SINGULAR          20
PLURAL            20
Name: count, dtype: int64

In [16]:
# which tokens are plural?
df2.loc[df2["number"] == "PLURAL"]




         content  ... d_edge_label
3   our           ...  POSS       
4   students      ...  IOBJ       
8   they          ...  NSUBJ      
13  their         ...  POSS       
19  our           ...  POSS       
22  methods       ...  POBJ       
27  projects      ...  CONJ       
29  They          ...  NSUBJ      
50  them          ...  POBJ       
52  their         ...  POSS       
58  Our           ...  POSS       
65  our           ...  POSS       
66  students      ...  NSUBJPASS  
74  employers     ...  POBJ       
77  industries    ...  POBJ       
81  our           ...  POSS       
82  graduates     ...  POBJ       
91  universities  ...  POBJ       
94  they          ...  NSUBJ      
99  their         ...  POSS       

[20 rows x 16 columns]

In [17]:
# any token with known mood?
df2.loc[df2["mood"] != "MOOD_UNKNOWN"]




     content  ... d_edge_label
2   gives     ...  ROOT       
9   are       ...  AUX        
48  throws    ...  CCOMP      
63  show      ...  ROOT       
67  are       ...  AUXPASS    
83  have      ...  AUX        
95  continue  ...  RCMOD      

[7 rows x 16 columns]

### Todo

> Try processing different documents and see how the part-of-speech fields (e.g. gender, number, person) change.

In [18]:
df2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 104 entries, 0 to 103
Data columns (total 16 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   content            104 non-null    object
 1   lemma              104 non-null    object
 2   tag                104 non-null    object
 3   aspect             104 non-null    object
 4   case               104 non-null    object
 5   form               104 non-null    object
 6   gender             104 non-null    object
 7   mood               104 non-null    object
 8   number             104 non-null    object
 9   person             104 non-null    object
 10  proper             104 non-null    object
 11  reciprocity        104 non-null    object
 12  tense              104 non-null    object
 13  voice              104 non-null    object
 14  d_edge_head_index  104 non-null    int64 
 15  d_edge_label       104 non-null    object
dtypes: int64(1), object(15)
memory usage: 13.1+ 

### Analyse Entities

Refer to [Analyse Entities](https://cloud.google.com/natural-language/docs/reference/rest/v1/documents/analyzeEntities). The expected response is shown below.
```
{
  "entities": [
    {
      object (Entity)
    }
  ],
  "language": string
}

```

In [19]:
entities = result["entities"]
entities[0]




{'name': 'students',
 'type': 'PERSON',
 'metadata': {},
 'salience': 0.630127,
 'mentions': [{'text': {'content': 'students', 'beginOffset': 30},
   'type': 'COMMON',
   'sentiment': {'magnitude': 0.6, 'score': 0.6}},
  {'text': {'content': 'students', 'beginOffset': 377},
   'type': 'COMMON',
   'sentiment': {'magnitude': 0.4, 'score': 0.4}}],
 'sentiment': {'magnitude': 2.4, 'score': 0.4}}

In [20]:
df3 = pd.DataFrame(entities)
df3




                           name  ...                         sentiment
0   students                     ...  {'magnitude': 2.4, 'score': 0.4}
1   Nanyang Polytechnic          ...  {'magnitude': 0.5, 'score': 0.5}
2   Many                         ...  {'magnitude': 1, 'score': 0.3}  
3   head start                   ...  {'magnitude': 0.3, 'score': 0.3}
4   life                         ...  {'magnitude': 0.3, 'score': 0.3}
5   phase                        ...  {'magnitude': 0.5, 'score': 0.5}
6   teaching methods             ...  {'magnitude': 0.3, 'score': 0.3}
7   projects                     ...  {'magnitude': 0.5, 'score': 0.5}
8   universities                 ...  {'magnitude': 0.6, 'score': 0.3}
9   life                         ...  {'magnitude': 0.1, 'score': 0.1}
10  career                       ...  {'magnitude': 0.2, 'score': 0.2}
11  education                    ...  {'magnitude': 0.2, 'score': 0.2}
12  graduates                    ...  {'magnitude': 0.3, 'score': 0.3}
13  Gr
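
`pd.DataFrame(entities)` keeps `sentiment` and `mentions` as nested objects inside single columns. A flatter table can be built by picking out the fields of interest, as in the sketch below. The two entries in `sample_entities` follow the structure of `result["entities"]`, but their values are illustrative and not copied from the response.

```python
import pandas as pd

# Illustrative entities with the same structure as result["entities"]
sample_entities = [
    {
        "name": "college",
        "type": "ORGANIZATION",
        "metadata": {},
        "salience": 0.3,
        "mentions": [{"text": {"content": "college", "beginOffset": 0}, "type": "COMMON"}],
        "sentiment": {"magnitude": 0.5, "score": 0.5},
    },
    {
        "name": "students",
        "type": "PERSON",
        "metadata": {},
        "salience": 0.6,
        "mentions": [
            {"text": {"content": "students", "beginOffset": 30}, "type": "COMMON"},
            {"text": {"content": "students", "beginOffset": 377}, "type": "COMMON"},
        ],
        "sentiment": {"magnitude": 2.4, "score": 0.4},
    },
]

df_entities = pd.DataFrame(
    {
        "name": [e["name"] for e in sample_entities],
        "type": [e["type"] for e in sample_entities],
        "salience": [e["salience"] for e in sample_entities],
        # Number of times the entity is mentioned in the document
        "mention_count": [len(e["mentions"]) for e in sample_entities],
        "magnitude": [e["sentiment"]["magnitude"] for e in sample_entities],
        "score": [e["sentiment"]["score"] for e in sample_entities],
    }
)

# Most salient entities first
df_entities = df_entities.sort_values("salience", ascending=False).reset_index(drop=True)
print(df_entities)
```

The `sentiment` key is only present when `extractEntitySentiment` is requested, so this sketch assumes that feature was enabled, as it is in the request above.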

### Analyse Sentiment

Refer to [Analyse Sentiment](https://cloud.google.com/natural-language/docs/reference/rest/v1/documents/analyzeSentiment). The expected response is shown below.
```
{
  "documentSentiment": {
    object (Sentiment)
  },
  "language": string,
  "sentences": [
    {
      object (Sentence)
    }
  ]
}
```

In [21]:
result.keys()




dict_keys(['sentences', 'tokens', 'entities', 'documentSentiment', 'language', 'categories', 'moderationCategories'])

In [22]:
result["documentSentiment"]




{'magnitude': 3.4, 'score': 0.8}

In [23]:
result["sentences"]




[{'text': {'content': 'Nanyang Polytechnic gives our students the head start they are looking for in their next phase in life with our innovative teaching methods and industry-focused projects.',
   'beginOffset': 0},
  'sentiment': {'magnitude': 0.9, 'score': 0.9}},
 {'text': {'content': "They'll not only be academically prepared, but also future-ready - equipped to tackle whatever life throws at them in their career or further education.",
   'beginOffset': 171},
  'sentiment': {'magnitude': 0.8, 'score': 0.8}},
 {'text': {'content': 'Our annual Graduate Employment Surveys show that our students are consistently highly sought-after by employers in multiple industries.',
   'beginOffset': 324},
  'sentiment': {'magnitude': 0.8, 'score': 0.8}},
 {'text': {'content': 'Many of our graduates have also gone on to local and overseas universities, where they continue to excel in their field of study.',
   'beginOffset': 460},
  'sentiment': {'magnitude': 0.8, 'score': 0.8}}]
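
The `score` ranges from -1.0 (negative) to 1.0 (positive), while `magnitude` is non-negative and grows with the amount of emotional content. One rough way to turn the two numbers into a label is sketched below. The function name `describe_sentiment` and both thresholds are assumptions chosen for illustration, not values defined by the API, and should be tuned for your own texts.

```python
def describe_sentiment(score, magnitude, score_threshold=0.25, mixed_magnitude=2.0):
    """Return a rough label for a sentiment; the thresholds are illustrative only."""
    if score >= score_threshold:
        return "positive"
    if score <= -score_threshold:
        return "negative"
    # Score near zero: strong emotions that cancel out suggest mixed sentiment
    if magnitude >= mixed_magnitude:
        return "mixed"
    return "neutral"


# Document-level values taken from result["documentSentiment"] above
print(describe_sentiment(score=0.8, magnitude=3.4))
```

The same function can be applied to each sentence by passing `sentence["sentiment"]["score"]` and `sentence["sentiment"]["magnitude"]`.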

### Analyse Entity Sentiment

Refer to [Analyse Entity Sentiment](https://cloud.google.com/natural-language/docs/reference/rest/v1/documents/analyzeEntitySentiment). The expected response is shown below.
```
{
  "entities": [
    {
      object (Entity)
    }
  ],
  "language": string
}
```

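Refer to the Analyse Entities section above for how to process entities. With entity sentiment enabled, each entity and each of its mentions also carries a `sentiment` field.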

### Classify Content

Refer to [Classify Content](https://cloud.google.com/natural-language/docs/reference/rest/v1/documents/classifyText). The expected response is shown below.

```
{
  "categories": [
    {
      object (ClassificationCategory)
    }
  ]
}
```

In [24]:
result["categories"]




[{'name': '/Jobs & Education/Education/Colleges & Universities',
  'confidence': 0.98}]
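
When a longer document returns several categories, it can help to put them in a table and keep only the confident ones. The sketch below shows one way to do this; the helper name `categories_dataframe` and the `min_confidence` cut-off are assumptions for illustration, and the second sample category is made up.

```python
import pandas as pd


def categories_dataframe(categories, min_confidence=0.0):
    """Return categories as a DataFrame, filtered and sorted by confidence."""
    df = pd.DataFrame(categories, columns=["name", "confidence"])
    df = df[df["confidence"] >= min_confidence]
    return df.sort_values("confidence", ascending=False).reset_index(drop=True)


sample_categories = [
    # First entry matches the response above; the second is made up
    {"name": "/Jobs & Education/Education/Colleges & Universities", "confidence": 0.98},
    {"name": "/Jobs & Education/Jobs", "confidence": 0.4},
]

df_categories = categories_dataframe(sample_categories, min_confidence=0.5)
print(df_categories)
```

The entries in `result["moderationCategories"]` appear to use the same `name` and `confidence` fields, so the same helper should work for them as well.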