# Customer Sentiment Measurement with Cohere

## Introduction

Tracking customer sentiment helps us understand overall customer satisfaction and how willing customers are to engage with us. For example, negative mentions on social media such as Facebook or Twitter, or on other websites, can damage our online reputation and have long-term effects if we do not act in time.

In this project, we use the NLP API from Cohere to analyze customer attitudes toward our brand or products.

## Scraping Data from the Internet

Many services and libraries support crawling and extracting data from the web. In Python, the options include Scrapy, urllib, BeautifulSoup (BS4), and lxml.
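As a rough illustration of the extraction step, the sketch below pulls the visible text out of an HTML page and packs it into a dictionary with `url`, `time`, and `content` keys. It uses the standard library's `html.parser` instead of the libraries named above, purely to stay dependency-free. The names `TextExtractor` and `page_to_message`, and the dictionary layout, are assumptions made for this example.

```python
from datetime import datetime, timezone
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def page_to_message(url, html):
    """Hypothetical helper: build one record from a page's raw HTML."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return {
        "url": url,
        "time": datetime.now(timezone.utc).isoformat(),
        "content": " ".join(parser.chunks),
    }


sample = "<html><body><p>Great vacuum.</p><script>x=1</script></body></html>"
print(page_to_message("http://original_url.com", sample)["content"])
```

A real crawler would also need to fetch pages, respect `robots.txt`, and handle malformed markup, which is where a dedicated library earns its keep.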

I assume this scraping platform is already built and that its output is available in a Kafka topic.

### Get new Data from Kafka Topic
```python
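# Import KafkaConsumer from the kafka-python library
import json

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    bootstrap_servers=['localhost:9092'],
    auto_offset_reset='earliest',
    group_id='my-consumer-1',
    # Assumption: each message value is a UTF-8 encoded JSON object
    value_deserializer=lambda v: json.loads(v.decode('utf-8')),
)
consumer.subscribe(['topicName'])

while True:
    try:
        # Wait up to 10 s for at most 500 records.
        # poll() returns a dict: {TopicPartition: [ConsumerRecord, ...]}
        records = consumer.poll(timeout_ms=10000, max_records=500)
        for batch in records.values():
            for record in batch:
                processSentiment(record.value)
    except KeyboardInterrupt:
        # Stop cleanly on Ctrl+C
        consumer.close()
        break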
```
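Kafka delivers message values as raw bytes unless a deserializer is configured, so each payload has to be decoded before processing. The sketch below assumes, hypothetically, that the scraper publishes every page as a UTF-8 JSON object with `url`, `time`, and `content` keys. `decode_record` and `REQUIRED_KEYS` are illustrative names, not part of kafka-python.

```python
import json

REQUIRED_KEYS = {"url", "time", "content"}  # assumed message schema


def decode_record(raw):
    """Turn raw Kafka bytes into a dict, or return None if malformed."""
    try:
        message = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None
    # Reject payloads that are not objects or lack an expected key
    if not isinstance(message, dict) or not REQUIRED_KEYS.issubset(message):
        return None
    return message


good = json.dumps(
    {
        "url": "http://original_url.com",
        "time": "timestamp",
        "content": "This is a review about something",
    }
).encode("utf-8")

print(decode_record(good)["content"])
print(decode_record(b"not json"))
```

A function like this could be passed to `KafkaConsumer` as `value_deserializer`, in which case the polling loop should skip any record whose value is `None`.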

### Process Data

Now that we have data from one web page, we can process it and save the result if needed.


```python
message = {
    "url": "http://original_url.com",
    "time": "timestamp",
    "content": "This is a review about something",
}
```

### Target: Names of Brand or Product

These are the names of the brands or products we want to track.

```python
target = ["bissell", "Bissell"]
```
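One way to check whether a review mentions a tracked name is a case-insensitive, whole-word regular expression, which also copes with punctuation attached to the word. This is a sketch of an alternative matching approach rather than the one used in this walkthrough; `build_matcher` and `mentions_target` are hypothetical helper names.

```python
import re


def build_matcher(names):
    """Compile one case-insensitive pattern matching any name as a whole word."""
    if not names:
        raise ValueError("at least one name is required")
    # De-duplicate ignoring case; try longer names first so they win ties
    escaped = sorted(
        {re.escape(name.lower()) for name in names}, key=len, reverse=True
    )
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def mentions_target(text, matcher):
    """Return True if any tracked name appears in the text."""
    return matcher.search(text) is not None


matcher = build_matcher(["bissell", "Bissell"])
print(mentions_target("This Bissell, honestly, is great.", matcher))
print(mentions_target("Nothing relevant here.", matcher))
```

Because the pattern ignores case, listing both `"bissell"` and `"Bissell"` is harmless but redundant. The `\b` anchors assume each name starts and ends with a letter or digit.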

### Setup Libraries and Functions

```python
import cohere
from cohere.classify import Example

API_key = 'D---------------g'  # get a free trial API key at https://cohere.ai/

co = cohere.Client(API_key)

# Labeled examples that tell the classifier what each sentiment looks like
examples = [
    Example("I like this", "Positive"),
    Example("I hate this", "Negative"),
    Example("It is okay", "Neutral"),
    Example("it's good", "Positive"),
    Example("It's dead", "Negative"),
    Example("Not very strong", "Neutral"),
]

def classify(inputs):
    response = co.classify(model='large',
                           inputs=inputs,
                           examples=examples)
    return response.classifications

def summarize(text):
    response = co.summarize(model='summarize-xlarge',
                            length='short',
                            text=text)
    return response.summary

def detect_language(texts):
    response = co.detect_language(texts=texts)
    return response.results[0].language_name
```

### Processing each Data from a WebPage

```python
import string

def processSentiment(message):
    content = message["content"]
    # Detect the language and only process English content
    if detect_language([content]) != "English":
        return
    # First, split the content into a list of words
    # tokens = co.tokenize(text=content).token_strings
    tokens = content.split()
    # Build a set of lowercased words, stripped of surrounding punctuation,
    # so each target lookup is a constant-time hash check.
    #     For large data we could also use the Aho–Corasick algorithm
    unique = {word.strip(string.punctuation).lower() for word in tokens}
    # Lowercase the targets too, so the comparison is case-insensitive
    found = any(name.lower() in unique for name in target)
    if found:
        # Find out the sentiment
        sentiment = classify([content])[0].prediction

        # Make the content concise
        summary = summarize(content)

        print(summary + ": " + sentiment)

        # Then save this result to a database
        # saveToDatabase(message, sentiment, summary)
```
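The `saveToDatabase` call is commented out and never defined. A minimal sketch of what it might look like with the standard library's `sqlite3` follows. The table name, its columns, the `open_store` helper, and the extra `conn` parameter are all assumptions, and an in-memory database is used so the example runs anywhere.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS sentiment (
               url TEXT, time TEXT, sentiment TEXT, summary TEXT
           )"""
    )
    return conn


def saveToDatabase(conn, message, sentiment, summary):
    """Store one classified page; hypothetical schema."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO sentiment (url, time, sentiment, summary) "
            "VALUES (?, ?, ?, ?)",
            (message["url"], message["time"], sentiment, summary),
        )


conn = open_store()
saveToDatabase(
    conn,
    {"url": "http://original_url.com", "time": "timestamp"},
    "Positive",
    "A short summary",
)
# Count stored results per sentiment label
rows = conn.execute(
    "SELECT sentiment, COUNT(*) FROM sentiment GROUP BY sentiment"
).fetchall()
print(rows)
```

Storing the label alongside the URL and timestamp makes it straightforward to chart how sentiment changes over time with a single `GROUP BY` query.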

## Testing

```python
# A real review of a product from Amazon.
message["content"] = (
    "So, I’m the chosen human of an ornery old cat. \n"
    "This cat has begun presenting his disgust of the world by spraying "
    "various locations of our home with the most foul of liquids. \n\n"
    "And this Bissell device is absolutely AMAZING at ridding our "
    "home of his leavings."
)

processSentiment(message)
```

Output:

```
Bissell is amazing at ridding your home of cat urine.: Positive
```
