# Text Sentiment Analysis & Entity Extraction using Google Cloud's Natural Language API

## Overview

Google Cloud's Natural Language API is a pretrained machine learning API for deriving insights from text. It is easy to use: with a few calls you can run sentiment analysis, entity extraction, and content classification on a text string or a document.

## Objective

This notebook shows how to perform sentiment analysis, entity extraction, and content classification on a text file stored in a Cloud Storage bucket, using the API client library. It covers the following topics:

- API setup
- Analyzing Sentiment
- Analyzing Entities
- Analyzing Entity Sentiment
- Content Classification

## Data 

I am using `BLIND FATE By Toby Bradley` in txt format. The file is saved in my project's Cloud Storage bucket.

## Setup Resources

## TODO: LIST STEPS FOR API SETUP

### Install packages and libraries

In [8]:
pip install --upgrade google-cloud-language

E0104 19:33:08.106460415   28196 fork_posix.cc:70]           Fork support is only compatible with the epoll1 and poll polling strategies


Collecting google-cloud-language
  Downloading google_cloud_language-2.7.0-py2.py3-none-any.whl (81 kB)
Collecting google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.10.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,!=2.8.*,!=2.9.*,<3.0.0dev,>=1.34.0
  Downloading google_api_core-2.11.0-py3-none-any.whl (120 kB)
Collecting google-auth<3.0dev,>=2.14.1
  Downloading google_auth-2.15.0-py2.py3-none-any.whl (177 kB)
Installing collected packages: google-auth, google-api-core, google-cloud-language
  Attempting uninstall: google-auth
    Found existing installation: google-auth 2.6.0
    Uninstalling google-auth-2.6.0:
      Successfully unin

In [None]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
REGION = "us-central1"
BUCKET_NAME = "<YOUR-BUCKET-NAME>" #REPLACE BUCKET NAME
FILE_PATH = "<path-to-file>/blind_fate.txt" #REPLACE PATH TO FILE 

In [None]:
### Create bucket and upload the file

! gsutil mb -l $REGION -c standard gs://$BUCKET_NAME

! gcloud storage cp $FILE_PATH gs://$BUCKET_NAME/

In [25]:
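# The path below points at my own bucket. To use the bucket created above
# (the upload step copies the file to the bucket root), use instead:
# GCS_FILE_PATH = f"gs://{BUCKET_NAME}/blind_fate.txt"
GCS_FILE_PATH = 'gs://go-pnishit-assets/AI-ML/01-prebuilt-ml-apis/blind_fate.txt'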
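The commented-out line above hints at building the URI from the bucket name rather than hard-coding it. Here is a minimal sketch of that idea. `build_gcs_uri` is a hypothetical helper written for this notebook, not part of any Google library, and the bucket name in the example is made up.

```python
def build_gcs_uri(bucket_name, object_path):
    """Return a gs:// URI for an object in a bucket (hypothetical helper)."""
    bucket = bucket_name.strip()
    # Accept bucket names given with or without the gs:// scheme.
    if bucket.startswith("gs://"):
        bucket = bucket[len("gs://"):]
    bucket = bucket.strip("/")
    # Object names must not start with a slash once joined to the bucket.
    obj = object_path.strip().lstrip("/")
    if not bucket or not obj:
        raise ValueError("both a bucket name and an object path are required")
    return f"gs://{bucket}/{obj}"


# "my-example-bucket" is a placeholder, not a real bucket.
print(build_gcs_uri("my-example-bucket", "blind_fate.txt"))
# gs://my-example-bucket/blind_fate.txt
```

In this notebook the call would be `build_gcs_uri(BUCKET_NAME, "blind_fate.txt")`, assuming the file was copied to the bucket root as in the upload cell.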

#### Analyzing Sentiment

In [None]:
from google.cloud import language_v1

In [21]:
def analyze_sentiment(gcs_file_uri):
    """
    Analyze sentiment in a text file stored in Cloud Storage.

    Args:
      gcs_file_uri: Cloud Storage URI of the file to analyze,
        e.g. gs://[Your Bucket]/[Path to File]
    """

    client = language_v1.LanguageServiceClient()
    type_ = language_v1.Document.Type.PLAIN_TEXT  # Document.Type also supports HTML

    language = "en"
    document = {
        "gcs_content_uri": gcs_file_uri,
        "type_": type_,
        "language": language,
    }
    
    encoding_type = language_v1.EncodingType.UTF8

    response = client.analyze_sentiment(
        request={"document": document, "encoding_type": encoding_type}
    )
    
    # Overall Sentiment
    print(f"Overall document sentiment score: {response.document_sentiment.score}")
    print(f"Overall document sentiment magnitude: {response.document_sentiment.magnitude}")
    
    # Per Sentence Sentiment in the document
    for sentence in response.sentences:
        print(f"Sentence text: {sentence.text.content}.")
        print(f"Sentence sentiment score: {sentence.sentiment.score}.")
        print(f"Sentence sentiment magnitude: {sentence.sentiment.magnitude}.")

In [None]:
analyze_sentiment(GCS_FILE_PATH)
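The score ranges from -1.0 (negative) to 1.0 (positive), while the magnitude is a non-negative number that grows with the overall amount of emotional content. One rough way to read the two values together is sketched below. The function name and both thresholds are illustrative assumptions, not values defined by the API, and they should be tuned for your own text.

```python
def describe_sentiment(score, magnitude, score_threshold=0.25, magnitude_threshold=1.0):
    """Map a (score, magnitude) pair to a rough label.

    The thresholds are illustrative assumptions, not values defined by the API.
    """
    if score >= score_threshold:
        return "positive"
    if score <= -score_threshold:
        return "negative"
    # Near-zero score: a high magnitude suggests positive and negative
    # passages cancelling out, a low magnitude suggests little emotion.
    if magnitude >= magnitude_threshold:
        return "mixed"
    return "neutral"


print(describe_sentiment(0.8, 3.0))   # positive
print(describe_sentiment(0.0, 4.0))   # mixed
```

It could be applied to `response.document_sentiment.score` and `response.document_sentiment.magnitude`, or to each sentence in `response.sentences`.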

#### Analyzing Entities

In [34]:
def analyze_entities(gcs_file_uri):
    """
    Analyze entities in a text file stored in Cloud Storage.

    Args:
      gcs_file_uri: Cloud Storage URI of the file to analyze.
    """

    client = language_v1.LanguageServiceClient()
    type_ = language_v1.Document.Type.PLAIN_TEXT  # Document.Type also supports HTML

    language = "en"
    document = {
        "gcs_content_uri": gcs_file_uri,
        "type_": type_,
        "language": language,
    }
    
    encoding_type = language_v1.EncodingType.UTF8

    entities = client.analyze_entities(
        request={"document": document, "encoding_type": encoding_type}
    )
    
    print(entities)

In [None]:
analyze_entities(GCS_FILE_PATH)
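Printing the raw response for a whole book produces a lot of output. Each entity in the response carries a `name`, a `type_`, and a `salience` score, so a short summary of the most salient entities is often easier to read. The sketch below is one possible way to do that. `top_entities` is a hypothetical helper, and the sample objects are stand-ins with made-up values so the block runs without calling the API.

```python
from types import SimpleNamespace


def top_entities(entities, limit=5):
    """Return (name, type, salience) tuples for the most salient entities."""
    ranked = sorted(entities, key=lambda e: e.salience, reverse=True)
    rows = []
    for entity in ranked[:limit]:
        # On real responses `type_` is an enum with a `.name`;
        # fall back to str() for plain stand-in values.
        type_name = getattr(entity.type_, "name", str(entity.type_))
        rows.append((entity.name, type_name, round(entity.salience, 3)))
    return rows


# Stand-in objects with made-up values, only to show the shape of the result.
sample = [
    SimpleNamespace(name="London", type_="LOCATION", salience=0.12),
    SimpleNamespace(name="detective", type_="PERSON", salience=0.45),
]
print(top_entities(sample, limit=1))
# [('detective', 'PERSON', 0.45)]
```

With the real API this would be called as `top_entities(response.entities)`, assuming `analyze_entities` is changed to return the response instead of printing it.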

#### Analyzing Entity Sentiment

In [41]:
def analyze_entity_sentiment(gcs_file_uri):
    """
    Analyze entity sentiment in a text file stored in Cloud Storage.

    Args:
      gcs_file_uri: Cloud Storage URI of the file to analyze.
    """

    client = language_v1.LanguageServiceClient()
    type_ = language_v1.Document.Type.PLAIN_TEXT  # Document.Type also supports HTML
    language = "en"
    document = {
        "gcs_content_uri": gcs_file_uri,
        "type_": type_,
        "language": language,
    }

    entity_sentiment = client.analyze_entity_sentiment(
        request={"document": document}
    )
        
    print(entity_sentiment)

In [None]:
analyze_entity_sentiment(GCS_FILE_PATH)
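In the entity sentiment response, each entity also carries a `sentiment` with its own `score` and `magnitude`. A quick way to get a feel for the result is to look for the entities at either end of the score range. The sketch below does that. `sentiment_extremes` and its `min_magnitude` cutoff are assumptions made for illustration, and the sample data is made up.

```python
from types import SimpleNamespace


def sentiment_extremes(entities, min_magnitude=0.0):
    """Return the (most_negative, most_positive) entity names by sentiment score."""
    # Optionally ignore entities with very little emotional content.
    scored = [e for e in entities if e.sentiment.magnitude >= min_magnitude]
    if not scored:
        return None, None
    most_negative = min(scored, key=lambda e: e.sentiment.score)
    most_positive = max(scored, key=lambda e: e.sentiment.score)
    return most_negative.name, most_positive.name


def _entity(name, score, magnitude):
    """Build a stand-in entity; real ones come from the API response."""
    return SimpleNamespace(
        name=name, sentiment=SimpleNamespace(score=score, magnitude=magnitude)
    )


# Made-up values, only to show how the helper behaves.
sample = [_entity("storm", -0.7, 1.4), _entity("garden", 0.6, 0.9), _entity("road", 0.0, 0.0)]
print(sentiment_extremes(sample))
# ('storm', 'garden')
```

With the real API the input would be `entity_sentiment.entities`, assuming the function above is changed to return its response.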

#### Content Classification

In [45]:
def classify_content(gcs_file_uri):
    """
    Classify the content of a text file stored in Cloud Storage.

    Args:
      gcs_file_uri: Cloud Storage URI of the file to classify.
    """

    client = language_v1.LanguageServiceClient()
    type_ = language_v1.Document.Type.PLAIN_TEXT  # Document.Type also supports HTML
    language = "en"
    document = {
        "gcs_content_uri": gcs_file_uri,
        "type_": type_,
        "language": language,
    }

    # V2 content-categories model selector. It is not passed in the request
    # below, so the call runs with the API's default classification model.
    content_categories_version = (
        language_v1.ClassificationModelOptions.V2Model.ContentCategoriesVersion.V2
    )

    response = client.classify_text(request={"document": document})
    
    print(response)

In [46]:
classify_content(GCS_FILE_PATH)

categories {
  name: "/Arts & Entertainment"
  confidence: 0.8299999833106995
}
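Each returned category has a `name` and a `confidence`. When a document yields several categories, it can help to keep only those above a chosen confidence and to round the float values for display. The sketch below is one way to do that. `confident_categories` and the 0.5 cutoff are assumptions for illustration, and the second sample entry is made up.

```python
from types import SimpleNamespace


def confident_categories(categories, min_confidence=0.5):
    """Return (name, confidence) pairs at or above min_confidence, highest first."""
    kept = [c for c in categories if c.confidence >= min_confidence]
    kept.sort(key=lambda c: c.confidence, reverse=True)
    return [(c.name, round(c.confidence, 2)) for c in kept]


# Stand-in for response.categories: the first entry mirrors the output above,
# the second is made up for illustration.
sample = [
    SimpleNamespace(name="/Arts & Entertainment", confidence=0.8299999833106995),
    SimpleNamespace(name="/Hypothetical/Low Confidence", confidence=0.2),
]
print(confident_categories(sample))
# [('/Arts & Entertainment', 0.83)]
```

With the real API the input would be `response.categories`, assuming `classify_content` is changed to return its response.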

