# Text Sentiment Analysis & Entity Extraction using Google Cloud's Natural Language API

## Overview

Google Cloud's Natural Language API is a pretrained machine learning API for deriving insights from text. It is easy to use: with a few calls you can analyze the sentiment of a text string or a document, extract entities, and classify content.

## Objective

This notebook shows how to perform sentiment analysis, entity extraction, entity sentiment analysis, and content classification on a text file stored in a Cloud Storage bucket, using the API client library. It covers the following topics:

- API setup
- Analyzing Sentiment
- Analyzing Entities
- Analyzing Entity Sentiment
- Content Classification

## Dataset

I am using `BLIND FATE by Toby Bradley` from [Stories and Fictions](http://www.textfiles.com/stories/) in txt format. The file is saved in my project's Cloud Storage bucket.

## Setup Resources

Before you can use the Natural Language API, you need a Google Cloud project with billing and the API enabled. See the [setup guide](https://cloud.google.com/natural-language/docs/setup), which explains how to enable the API and configure authentication.


We will call the API through the Python client library.

### Install packages and libraries

In [1]:
pip install --upgrade google-cloud-language

Note: you may need to restart the kernel to use updated packages.


In [2]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
REGION = "us-central1"
BUCKET_NAME = "<YOUR-BUCKET-NAME>" #REPLACE BUCKET NAME
FILE_PATH = "<path-to-file>/blind_fate.txt" #REPLACE PATH TO FILE 

In [None]:
### Create bucket and upload the file

! gsutil mb -l $REGION -c standard gs://$BUCKET_NAME

! gcloud storage cp $FILE_PATH gs://$BUCKET_NAME/

In [3]:
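import os

# The upload cell copies the file to the bucket root, so only the file name is used here.
GCS_FILE_PATH = f"gs://{BUCKET_NAME}/{os.path.basename(FILE_PATH)}"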
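
The storage client used in the next cell takes a bucket name and an object name separately, while `GCS_FILE_PATH` is a single `gs://` URI. A minimal sketch of converting one into the other follows. `split_gcs_uri` is a hypothetical helper introduced here for illustration and is not part of any Google library; the bucket and object names in the example are placeholders.

```python
from typing import Tuple


def split_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split 'gs://bucket/path/to/file' into ('bucket', 'path/to/file')."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"Not a Cloud Storage URI: {uri!r}")
    # Everything up to the first slash is the bucket; the rest is the object name.
    bucket, _, blob_name = uri[len(prefix):].partition("/")
    if not bucket or not blob_name:
        raise ValueError(f"URI must contain a bucket and an object name: {uri!r}")
    return bucket, blob_name


# Placeholder URI, for illustration only.
print(split_gcs_uri("gs://my-bucket/stories/blind_fate.txt"))
```

The two returned values can then be passed to `client.get_bucket(...)` and `bucket.get_blob(...)` instead of hardcoding them.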

### Reading file content

In [4]:
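from google.cloud import storage

# Author's bucket; replace with your own bucket name.
BUCKET_NAME = "go-pnishit-assets"
client = storage.Client()
bucket = client.get_bucket(BUCKET_NAME)

# Object path inside the author's bucket; replace with the path to your file.
blob = bucket.get_blob('AI-ML/01-prebuilt-ml-apis/blind_fate.txt')

# Download the object and decode it as text.
downloaded_blob = blob.download_as_text()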
print(downloaded_blob)

BLIND FATE
By Toby Bradley
      
       It was an ominous day that Thursday. The exhaust-filled buses rolled in
and out of the BART station, carrying
their weary travelers home to warm families.  As the last bus had left, another
woman, just as tired as the rest, was standing alone.
     "Baby, still," she commanded.  "Baby, still," she said again.  The great
dog stood loyally at her side.  Her eyes probed the surroundings and her ears
stood at attention.
     "Could someone help me please?  Could anyone direct me to the terminal?"
     She was dressed in rather drab and banal clothing that was ragged and
unkept.  She was slightly pudgy, a woman of thirty-two.  Despite her young age,
deep lines wore long on her face.  Her pale yellow hair was dull and uncombed. 
She was not attractive, nor tried to be.
     "Could someone tell me where I can buy a ticket?  Will anyone please put
me in the right direction?"
     The woman seemed to be a victim of an isolation that she had tried to
esca

#### Analyzing sentiment 

In [5]:
from google.cloud import language_v1

In [6]:
def analyze_sentiment(gcs_file_uri):
    """
    Analyze the sentiment of a text file stored in Cloud Storage.

    Args:
      gcs_file_uri: Cloud Storage URI of the file to analyze,
        e.g. gs://[Your Bucket]/[Path to File]
    """

    client = language_v1.LanguageServiceClient()
    type_ = language_v1.Document.Type.PLAIN_TEXT  # Supported types: PLAIN_TEXT and HTML

    language = "en"
    document = {
        "gcs_content_uri": gcs_file_uri,
        "type_": type_,
        "language": language,
    }
    
    encoding_type = language_v1.EncodingType.UTF8

    response = client.analyze_sentiment(
        request={"document": document, "encoding_type": encoding_type}
    )
    
    # Overall Sentiment
    print(f"Overall document sentiment score: {response.document_sentiment.score}")
    print(f"Overall document sentiment magnitude: {response.document_sentiment.magnitude}")
    
    # Per Sentence Sentiment in the document
    for sentence in response.sentences:
        print(f"Sentence text: {sentence.text.content}.")
        print(f"Sentence sentiment score: {sentence.sentiment.score}.")
        print(f"Sentence sentiment magnitude: {sentence.sentiment.magnitude}.")

In [7]:
analyze_sentiment(GCS_FILE_PATH)

Overall document sentiment score: -0.20000000298023224
Overall document sentiment magnitude: 38.0
Sentence text: BLIND FATE
By Toby Bradley.
Sentence sentiment score: 0.0.
Sentence sentiment magnitude: 0.0.
Sentence text: It was an ominous day that Thursday..
Sentence sentiment score: -0.4000000059604645.
Sentence sentiment magnitude: 0.4000000059604645.
Sentence text: The exhaust-filled buses rolled in
and out of the BART station, carrying
their weary travelers home to warm families..
Sentence sentiment score: -0.10000000149011612.
Sentence sentiment magnitude: 0.10000000149011612.
Sentence text: As the last bus had left, another
woman, just as tired as the rest, was standing alone..
Sentence sentiment score: -0.699999988079071.
Sentence sentiment magnitude: 0.699999988079071.
Sentence text: "Baby, still," she commanded..
Sentence sentiment score: 0.0.
Sentence sentiment magnitude: 0.0.
Sentence text: "Baby, still," she said again..
Sentence sentiment score: 0.0.
Sentence sentiment ma
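
The score ranges from -1.0 (negative) to 1.0 (positive), while the magnitude is a non-negative, unbounded value that accumulates over the text. A score near zero with a large magnitude, as in the document-level result above, usually points to mixed rather than neutral content. The sketch below turns the two numbers into a coarse label. `label_sentiment` is a hypothetical helper, and both thresholds are assumptions chosen for illustration that should be tuned for your own data.

```python
def label_sentiment(
    score: float,
    magnitude: float,
    score_threshold: float = 0.25,   # assumed cut-off for a clearly positive/negative score
    mixed_magnitude: float = 2.0,    # assumed cut-off separating "mixed" from "neutral"
) -> str:
    """Map a (score, magnitude) pair to a coarse sentiment label."""
    if score >= score_threshold:
        return "positive"
    if score <= -score_threshold:
        return "negative"
    # Score is close to zero: strong emotion overall suggests mixed content,
    # little emotion suggests neutral content.
    return "mixed" if magnitude >= mixed_magnitude else "neutral"


# Document-level values taken from the output above.
print(label_sentiment(-0.20000000298023224, 38.0))
# One of the sentence-level values from the output above.
print(label_sentiment(-0.699999988079071, 0.699999988079071))
```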

#### Analyze Entities

In [8]:
def analyze_entities(gcs_file_uri):
    """
    Analyze the entities in a text file stored in Cloud Storage.

    Args:
      gcs_file_uri: Cloud Storage URI of the file to analyze.
    """

    client = language_v1.LanguageServiceClient()
    type_ = language_v1.Document.Type.PLAIN_TEXT  # Supported types: PLAIN_TEXT and HTML

    language = "en"
    document = {
        "gcs_content_uri": gcs_file_uri,
        "type_": type_,
        "language": language,
    }
    
    encoding_type = language_v1.EncodingType.UTF8

    entities = client.analyze_entities(
        request={"document": document, "encoding_type": encoding_type}
    )
    
    print(entities)

In [9]:
analyze_entities(GCS_FILE_PATH)

entities {
  name: "one"
  type_: PERSON
  salience: 0.603400707244873
  mentions {
    text {
      content: "one"
      begin_offset: 6620
    }
    type_: COMMON
  }
}
entities {
  name: "Marsha"
  type_: PERSON
  salience: 0.04219701513648033
  mentions {
    text {
      content: "Marsha"
      begin_offset: 4092
    }
    type_: PROPER
  }
  mentions {
    text {
      content: "Marsha"
      begin_offset: 4321
    }
    type_: PROPER
  }
  mentions {
    text {
      content: "Marsha"
      begin_offset: 4413
    }
    type_: PROPER
  }
  mentions {
    text {
      content: "Marsha"
      begin_offset: 4606
    }
    type_: PROPER
  }
  mentions {
    text {
      content: "Marsha"
      begin_offset: 5177
    }
    type_: PROPER
  }
  mentions {
    text {
      content: "Marsha"
      begin_offset: 5454
    }
    type_: PROPER
  }
  mentions {
    text {
      content: "Marsha"
      begin_offset: 5597
    }
    type_: PROPER
  }
  mentions {
    text {
      content: "Marsha
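
The raw response is long because every mention of every entity is printed. A minimal sketch of condensing it into a ranked summary follows. `top_entities` is a hypothetical helper that relies only on the `name`, `salience`, and `mentions` fields visible in the output above. The `SimpleNamespace` objects are stand-ins for response entities so that the example runs without calling the API, and their mention lists are illustrative rather than real counts.

```python
from types import SimpleNamespace
from typing import Iterable, List, Tuple


def top_entities(entities: Iterable, n: int = 5) -> List[Tuple[str, float, int]]:
    """Return (name, salience, mention count) for the n most salient entities."""
    ranked = sorted(entities, key=lambda e: e.salience, reverse=True)
    return [(e.name, round(e.salience, 3), len(e.mentions)) for e in ranked[:n]]


# Stand-in data: names and salience values come from the output above,
# the mention lists are placeholders.
sample = [
    SimpleNamespace(name="Marsha", salience=0.04219701513648033, mentions=["m1", "m2", "m3"]),
    SimpleNamespace(name="one", salience=0.603400707244873, mentions=["m1"]),
]

for name, salience, mention_count in top_entities(sample):
    print(f"{name}: salience={salience}, mentions={mention_count}")
```

With a real response, the same function can be called as `top_entities(response.entities)`, assuming `analyze_entities` is changed to return the response instead of printing it.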

#### Analyze entity sentiment

In [10]:
def analyze_entity_sentiment(gcs_file_uri):
    """
    Analyze entity sentiment in a text file stored in Cloud Storage.

    Args:
      gcs_file_uri: Cloud Storage URI of the file to analyze.
    """

    client = language_v1.LanguageServiceClient()
    type_ = language_v1.Document.Type.PLAIN_TEXT  # Supported types: PLAIN_TEXT and HTML
    language = "en"
    document = {
        "gcs_content_uri": gcs_file_uri,
        "type_": type_,
        "language": language,
    }

    # No encoding_type is passed here, so the API reports begin_offset as -1.
    entity_sentiment = client.analyze_entity_sentiment(
        request={"document": document}
    )
        
    print(entity_sentiment)

In [11]:
analyze_entity_sentiment(GCS_FILE_PATH)

entities {
  name: "one"
  type_: PERSON
  salience: 0.603400707244873
  mentions {
    text {
      content: "one"
      begin_offset: -1
    }
    type_: COMMON
    sentiment {
      magnitude: 0.10000000149011612
      score: 0.10000000149011612
    }
  }
  sentiment {
    magnitude: 0.20000000298023224
    score: 0.10000000149011612
  }
}
entities {
  name: "Marsha"
  type_: PERSON
  salience: 0.04219701513648033
  mentions {
    text {
      content: "Marsha"
      begin_offset: -1
    }
    type_: PROPER
    sentiment {
    }
  }
  mentions {
    text {
      content: "Marsha"
      begin_offset: -1
    }
    type_: PROPER
    sentiment {
    }
  }
  mentions {
    text {
      content: "Marsha"
      begin_offset: -1
    }
    type_: PROPER
    sentiment {
      magnitude: 0.20000000298023224
      score: 0.20000000298023224
    }
  }
  mentions {
    text {
      content: "Marsha"
      begin_offset: -1
    }
    type_: PROPER
    sentiment {
      magnitude: 0.5
      score: -
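
The response carries sentiment at two levels: one value per entity and one per mention. A mention printed with an empty `sentiment { }` block has the default score and magnitude of 0. The sketch below condenses this into one row per entity. `entity_sentiment_summary` is a hypothetical helper, and the `SimpleNamespace` objects are illustrative stand-ins for response entities, not values from the actual run.

```python
from types import SimpleNamespace
from typing import Iterable, List, Tuple


def entity_sentiment_summary(entities: Iterable) -> List[Tuple[str, float, float, int]]:
    """Return (name, score, magnitude, opinionated mentions) sorted by magnitude."""
    rows = []
    for entity in entities:
        # Count only mentions that carry some sentiment (magnitude above zero).
        opinionated = sum(1 for m in entity.mentions if m.sentiment.magnitude > 0)
        rows.append(
            (entity.name, entity.sentiment.score, entity.sentiment.magnitude, opinionated)
        )
    return sorted(rows, key=lambda row: row[2], reverse=True)


def _sentiment(score: float = 0.0, magnitude: float = 0.0) -> SimpleNamespace:
    """Build a stand-in for a sentiment message (illustration only)."""
    return SimpleNamespace(score=score, magnitude=magnitude)


# Illustrative stand-in entities; the numbers are made up for the example.
sample = [
    SimpleNamespace(
        name="entity_a",
        sentiment=_sentiment(0.1, 0.2),
        mentions=[SimpleNamespace(sentiment=_sentiment(0.1, 0.1))],
    ),
    SimpleNamespace(
        name="entity_b",
        sentiment=_sentiment(-0.3, 0.9),
        mentions=[
            SimpleNamespace(sentiment=_sentiment()),
            SimpleNamespace(sentiment=_sentiment(-0.5, 0.5)),
        ],
    ),
]

for row in entity_sentiment_summary(sample):
    print(row)
```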

#### Content classification

In [12]:
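def classify_content(gcs_file_uri):
    """
    Classify the content of a text file stored in Cloud Storage.

    Args:
      gcs_file_uri: Cloud Storage URI of the file to classify.
    """

    client = language_v1.LanguageServiceClient()
    type_ = language_v1.Document.Type.PLAIN_TEXT  # Supported types: PLAIN_TEXT and HTML
    language = "en"
    document = {
        "gcs_content_uri": gcs_file_uri,
        "type_": type_,
        "language": language,
    }

    # NOTE: content_categories_version is defined but not passed in the request below,
    # so the API's default classification model options apply. To select the V2 model, add
    # "classification_model_options": {"v2_model": {"content_categories_version": content_categories_version}}
    # to the request dict.
    content_categories_version = (
        language_v1.ClassificationModelOptions.V2Model.ContentCategoriesVersion.V2
    )

    response = client.classify_text(
        request={"document": document}
    )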
    
    print(response)

In [13]:
classify_content(GCS_FILE_PATH)

categories {
  name: "/Arts & Entertainment"
  confidence: 0.8299999833106995
}
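
Category names are slash-delimited paths, and each category carries a confidence value. A minimal sketch of post-processing the result follows. `category_levels` and `confident_categories` are hypothetical helpers, the 0.5 threshold is an assumption for illustration, and the `SimpleNamespace` object stands in for a response category so that the example runs without calling the API.

```python
from types import SimpleNamespace
from typing import Iterable, List, Tuple


def category_levels(name: str) -> List[str]:
    """Split '/Top Level/Sub Level' into ['Top Level', 'Sub Level']."""
    return [part for part in name.split("/") if part]


def confident_categories(
    categories: Iterable, min_confidence: float = 0.5  # assumed threshold
) -> List[Tuple[str, float]]:
    """Keep only the categories whose confidence reaches the threshold."""
    return [(c.name, c.confidence) for c in categories if c.confidence >= min_confidence]


# Stand-in for the single category shown in the output above.
sample = [SimpleNamespace(name="/Arts & Entertainment", confidence=0.8299999833106995)]

for name, confidence in confident_categories(sample):
    print(category_levels(name), round(confidence, 2))
```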

