# Sentiment Analysis with ChatGPT

Sentiment Analysis is sort of like the "Hello, world!" of Natural Language Processing (NLP), but luckily for us, it's a bit more fun than just echoing out a string.

This notebook will guide you through analyzing sentiment with ChatGPT and discuss how approaching the problem with a generative AI like ChatGPT differs from how you might have approached it in the past.

## What is sentiment analysis?

Sentiment Analysis is a way of analyzing some text to determine if it's positive, negative, or neutral.

This is the kind of thing that's pretty easy for a human who understands the language the text is written in, but it can be hard for a computer to really understand the underlying meaning behind the text.

### Examples

1. "I saw that movie." - Neutral
2. "I love that movie." - Positive
3. "I hate that movie." - Negative

## How do we analyze sentiment?

We'll start with some housekeeping first by making sure that our dependencies are ready.

For this demo, we'll start out by exploring a more traditional approach that uses the Python Natural Language Toolkit (NLTK) and then we'll see how our approach might change when we use ChatGPT via the OpenAI SDK instead.

In [16]:
%%capture

%pip install openai nltk ipywidgets numpy requests-cache

# standard library
import os

# http requests (third-party)
import requests as request

# data science
import numpy as np

# caching
import requests_cache

# traditional nlp
import nltk

# modern llm
import openai

# jupyter notebook widgets
import ipywidgets as pywidgets

# project-specific widgets
from widgets.simple import simpleAnalysisWidget
from widgets.config import modelDropdown, apiKeyInput, apiKeyUpdateButton
from widgets.sample import sampleSizeSlider, sampleSizeWarningLabel
from widgets.advanced import advancedAnalysisWidget, configureOpenAi

# project-specific utilities
from utils.obfuscate import obfuscateKey

nltk.download('vader_lexicon')

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

# we'll only request new responses during this session
# if previous responses are more than 15 minutes old
REQUEST_CACHE_EXPIRATION_SECONDS = 60 * 15

STORY_SAMPLE_SIZE = 5

# we'll cache our hacker news api requests for 15 minutes
session = requests_cache.CachedSession('hackernews_cache', expire_after=REQUEST_CACHE_EXPIRATION_SECONDS)

## Simple sentiment analysis with NLTK

Let's take a look at a simple example of sentiment analysis with `nltk` and VADER.

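The `SentimentIntensityAnalyzer` returns a dictionary with positive (`pos`), negative (`neg`), and neutral (`neu`) scores for the given text, as well as a `compound` score that summarizes the overall sentiment as a single value normalized between -1 (most negative) and +1 (most positive).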

For this basic example, we're going to rely on the `compound` score and use a naive rating scale.

In [2]:
# import the VADER sentiment analyzer
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# instantiate the sentiment analyzer
analyzer = SentimentIntensityAnalyzer()

# analyze the sentiment of a string of text
def analyzeSentiment(text):
  if not text:
    return ''

  # use VADER to get the +/- sentiment of the string
  compound = analyzer.polarity_scores(text)['compound']

  # map the compound score to a human readable label
  if compound >= 0.75:
    return 'Very Positive'
  elif compound >= 0.4:
    return 'Positive'
  elif compound >= 0.1:
    return 'Leaning Positive'
  elif compound <= -0.75:
    return 'Very Negative'
  elif compound <= -0.4:
    return 'Negative'
  elif compound <= -0.1:
    return 'Leaning Negative'
  else:
    return 'Neutral'

Now let's test this analyzer with some example strings.

In [3]:
# some simple test statements for our analyzer
statements = [
  'I love that movie.',
  'I hate that movie.',
  'I like that movie.',
  'I dislike that movie.',
  'I saw that movie.',
]

for statement in statements:
  print(f"{statement} ({analyzeSentiment(statement)})")

I love that movie. (Positive)
I hate that movie. (Negative)
I like that movie. (Leaning Positive)
I dislike that movie. (Leaning Negative)
I saw that movie. (Neutral)


We've wired the input below up to the same analyzer function from above. Type in some text and see how the analyzer responds.

In [4]:
# this code cell is just used to display a widget
# that uses the analyzeSentiment function we created
display(simpleAnalysisWidget)


## How Sentiment Analysis Works

Sentiment analysis, like most text analysis, involves a multistep process:

1. **Tokenization**: breaks the text into individual units of meaning called tokens
2. **Stemming / Lemmatization**: reduces the tokens to their root forms to simplify comparison between different forms of the same words
   1. **Stemming**: strips suffixes in an attempt to reduce words to their root forms
   2. **Lemmatization**: uses a morphological analysis of words to reduce them to their root forms
3. **Vectorization**: converts the tokens into numeric ids that can be used for comparison
4. **Comparison**: compares the tokens to a known set of tokens to determine the sentiment
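
The stages above can be sketched with a toy example in plain Python. This is only an illustration under loud assumptions: `TOY_LEXICON`, `toyTokenize`, `toyStem`, and `toySentiment` are hypothetical names invented here, the weights are made up, and none of this is how VADER is actually implemented. It tokenizes before stemming because this stemmer works on one word at a time.

```python
import re

# hypothetical toy lexicon mapping word stems to sentiment weights
# (illustrative values only, not VADER's real lexicon)
TOY_LEXICON = {"lov": 1.0, "lik": 0.5, "hat": -1.0, "dislik": -0.5}

def toyTokenize(text):
  # lowercase the text and pull out runs of letters as tokens
  return re.findall(r"[a-z]+", text.lower())

def toyStem(token):
  # crude suffix stripping; real stemmers such as Porter's are far more careful
  for suffix in ("ing", "ed", "es", "s", "e"):
    if token.endswith(suffix) and len(token) - len(suffix) >= 3:
      return token[:-len(suffix)]
  return token

def toySentiment(text):
  stems = [toyStem(token) for token in toyTokenize(text)]

  # "vectorize": turn each stem into a number (0.0 for unknown words)
  weights = [TOY_LEXICON.get(stem, 0.0) for stem in stems]

  # "compare": sum the weights and read off the sign
  score = sum(weights)
  if score > 0:
    return 'Positive'
  elif score < 0:
    return 'Negative'
  return 'Neutral'

for statement in ['I loved that movie.', 'I hate that movie.', 'I saw that movie.']:
  print(f"{statement} ({toySentiment(statement)})")
```

A stemmer this crude collides easily (for example, "hat" and "hate" share a stem here), which is part of why real tools lean on curated lexicons or lemmatization.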

In this case we're taking advantage of VADER, an existing lexicon and rule-based model that was built to analyze sentiment in text. If we wanted to build our own model from scratch, it would be a more complicated process and would typically require labeled training data to feed into the model.

Generative Pre-Trained Transformer (GPT) models, like those that power ChatGPT, and the other transformer models that have exploded in popularity since, give us another option. We can leverage the inference and predictive capabilities of these models to perform sentiment analysis without having to train our own model, and we can even use prompting techniques to quickly teach the model to perform more specialized analyses.

## A more interesting example

So, let's see how this works with text written by other people who didn't know we'd be analyzing its sentiment.

For this example, we'll pull in a random sample of the [top stories](https://github.com/HackerNews/API#new-top-and-best-stories) on [Hacker News](https://news.ycombinator.com/) and analyze the sentiment of each submission's title. The sample size defaults to `5` (`STORY_SAMPLE_SIZE`) and can be changed with the slider below.

You can run the cell below a few times to see different samples of the top stories.

In [5]:
sampleSizeSlider.value = STORY_SAMPLE_SIZE

def updateSampleSize(change):
  global STORY_SAMPLE_SIZE
  STORY_SAMPLE_SIZE = change['new']

sampleSizeWidgetContainer = pywidgets.VBox([
  sampleSizeSlider,
  sampleSizeWarningLabel
])

sampleSizeSlider.observe(updateSampleSize, names='value')

display(sampleSizeWidgetContainer)


In [6]:
# we'll use this list to aggregate the story titles so we can loop through and analyze them
storyTitles = []

# we'll use the request cache session we created earlier so that
# this response is fast when the cell runs again
topStoryIdsRequest = session.get('https://hacker-news.firebaseio.com/v0/topstories.json')

if topStoryIdsRequest.status_code != 200:
  # raise instead of calling exit(), which would shut down the notebook kernel
  raise RuntimeError('There was a problem getting the top stories from Hacker News')

topStoryIds = topStoryIdsRequest.json()

storyIds = np.array(topStoryIds)[np.random.choice(len(topStoryIds), STORY_SAMPLE_SIZE, replace=False)]

for storyId in storyIds:
  # we'll use the same request cache so that we don't have to request a story's details more than once
  storyRequest = session.get(f'https://hacker-news.firebaseio.com/v0/item/{storyId}.json')

  if storyRequest.status_code != 200:
    continue
  else:
    story = storyRequest.json()

    if 'title' in story:
      storyTitles.append(story['title'])

# iterate through titles and analyze the sentiment
for storyTitle in storyTitles:
  print(f"{storyTitle} ({analyzeSentiment(storyTitle)})")

F-Zero 99 (Neutral)
Faster Sorting Beyond DeepMind’s AlphaDev (Neutral)
Trends in Remote Employee Salaries (Neutral)
We Are Not Sustainable (Neutral)
Quantum Resistance and the Signal Protocol (Neutral)


## How ChatGPT works

ChatGPT turns text into tokens and then predicts the most likely tokens to follow the given text so far.
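
To make the "predict the most likely next token" idea concrete, here is a deliberately tiny sketch. It is not how ChatGPT works internally: ChatGPT uses a large neural network over subword tokens, while this toy only counts which word follows which in a made-up corpus. `CORPUS`, `buildBigramCounts`, `predictNext`, and `generate` are hypothetical names used only for this illustration.

```python
from collections import Counter, defaultdict

# tiny hypothetical corpus; a real model learns from vastly more text
CORPUS = "i love that movie . i love that song . i hate that ending ."

def buildBigramCounts(text):
  # count how often each token follows each other token
  tokens = text.split()
  counts = defaultdict(Counter)
  for current, following in zip(tokens, tokens[1:]):
    counts[current][following] += 1
  return counts

def predictNext(counts, token):
  # return the most frequent follower, or None for an unseen token
  if token not in counts:
    return None
  return counts[token].most_common(1)[0][0]

def generate(counts, start, maxTokens=5):
  # repeatedly append the most likely next token
  output = [start]
  for _ in range(maxTokens):
    nextToken = predictNext(counts, output[-1])
    if nextToken is None:
      break
    output.append(nextToken)
  return ' '.join(output)

bigramCounts = buildBigramCounts(CORPUS)
print(predictNext(bigramCounts, 'i'))
print(generate(bigramCounts, 'i', maxTokens=3))
```

The generation loop has the same shape as the real thing: look at the text so far, pick a likely next token, append it, and repeat.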

## Prompt engineering

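Prompt engineering is the practice of crafting the instructions and context we give the model. With the chat API, a prompt is a list of messages, each with a role: `system` messages set the model's overall behavior, `user` messages carry the input, and `assistant` messages hold the model's replies.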

## Zero shot and few shot prompting

Zero shot prompting asks the model to perform a task from instructions alone, while few shot prompting also includes a handful of worked examples in the prompt.
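
As a rough sketch of the difference, the snippet below builds a zero shot and a few shot message list for the chat format used later in this notebook. It makes no API calls. `SKETCH_SYSTEM_PROMPT`, `FEW_SHOT_EXAMPLES`, and `buildMessages` are hypothetical names written only for this illustration; they are not part of the OpenAI SDK or of this project's widgets.

```python
# hypothetical system prompt and examples, written only for this sketch
SKETCH_SYSTEM_PROMPT = "Classify the sentiment of the user's text as Positive, Negative, or Neutral."

FEW_SHOT_EXAMPLES = [
  ("I love that movie.", "Positive"),
  ("I hate that movie.", "Negative"),
  ("I saw that movie.", "Neutral"),
]

def buildMessages(text, examples=None):
  # the system message sets the model's overall behavior
  messages = [{"role": "system", "content": SKETCH_SYSTEM_PROMPT}]

  # few shot: each example is a user message followed by the assistant reply we want
  for exampleText, exampleLabel in (examples or []):
    messages.append({"role": "user", "content": exampleText})
    messages.append({"role": "assistant", "content": exampleLabel})

  # the text we actually want analyzed always comes last
  messages.append({"role": "user", "content": text})
  return messages

zeroShotMessages = buildMessages("We Are Not Sustainable")
fewShotMessages = buildMessages("We Are Not Sustainable", FEW_SHOT_EXAMPLES)

print(len(zeroShotMessages), len(fewShotMessages))
```

Either list could be passed as the `messages` argument of a chat completion request. The few shot version spends more tokens in exchange for showing the model exactly which labels we expect.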

In [7]:
# this code cell is just used to display a widget
# for us to configure the OpenAI API and to update
# your API key if you need to change it
apiKeyInput.value = obfuscateKey(OPENAI_API_KEY)

def updateApiKey(event):
  global OPENAI_API_KEY

  # store the updated key in our global variable
  # and hand it to the OpenAI SDK so later requests use it
  OPENAI_API_KEY = apiKeyInput.value
  openai.api_key = OPENAI_API_KEY

  # obfuscate the displayed key
  apiKeyInput.value = obfuscateKey(OPENAI_API_KEY)


openAiConfigWidget = pywidgets.Box([apiKeyInput, apiKeyUpdateButton, modelDropdown], layout=pywidgets.Layout(display='flex', flex_direction='column', align_items='center', width='100%'))

apiKeyUpdateButton.on_click(updateApiKey)

display(openAiConfigWidget)


In [8]:
BASIC_SYSTEM_PROMPT = """
You are SentiNet, an advanced AI system for detecting the sentiment conveyed in user-generated text.

The user will provide you with a prompt, and you will respond with the sentiment of that prompt.

Be concise.
"""

def basicChatGptSentiment(prompt, model=None):
    # read the dropdown at call time so later model changes take effect
    model = model or modelDropdown.value

    messages = [{ "role": "system", "content": BASIC_SYSTEM_PROMPT }]

    messages.append({"role": "user", "content": prompt})

    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=0,
    )

    return response.choices[0].message["content"]

In [9]:
if OPENAI_API_KEY:
  for storyTitle in storyTitles:
    nltkSentiment = analyzeSentiment(storyTitle)
    openAiSentiment = basicChatGptSentiment(storyTitle)

    print(f"{storyTitle}\nNLTK: {nltkSentiment}\n{modelDropdown.value}: {openAiSentiment}\n---")
else:
  print('Please enter your OpenAI API key above and rerun this cell')

F-Zero 99
NLTK: Neutral
gpt-3.5-turbo: Neutral
---
Faster Sorting Beyond DeepMind’s AlphaDev
NLTK: Neutral
gpt-3.5-turbo: Excitement
---
Trends in Remote Employee Salaries
NLTK: Neutral
gpt-3.5-turbo: Neutral
---
We Are Not Sustainable
NLTK: Neutral
gpt-3.5-turbo: Negative
---
Quantum Resistance and the Signal Protocol
NLTK: Neutral
gpt-3.5-turbo: Neutral
---


In [10]:
ADVANCED_SYSTEM_PROMPT = """
You are SentiNet, an advanced AI system for detecting the sentiment conveyed in user-generated text.

The user will provide you with a prompt, and you will analyze it following these steps:

1. Analyze the prompt for relevant emotion, tone, affinity, sarcasm, irony, etc.
2. Analyze the likely emotional state of the author based on those findings from step 1
3. Summarize the emotional state and sentiment of the prompt based on your findings using 5 or fewer names for emotions using lowercase letters and separating each emotional state with a comma

Only return the output from the final step to the user.

Be concise.
"""

def advancedChatGptSentiment(prompt, model=None):
    # read the dropdown at call time so later model changes take effect
    model = model or modelDropdown.value

    messages = [{ "role": "system", "content": ADVANCED_SYSTEM_PROMPT }]

    messages.append({"role": "user", "content": prompt})

    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=0.15,
    )

    return response.choices[0].message["content"]

In [11]:
if OPENAI_API_KEY:
  for storyTitle in storyTitles:
    nltkSentiment = analyzeSentiment(storyTitle)
    openAiSentiment = advancedChatGptSentiment(storyTitle)

    print(f"{storyTitle}\nNLTK: {nltkSentiment}\n{modelDropdown.value}: {openAiSentiment}\n---")
else:
  print('Please enter your OpenAI API key above and rerun this cell')

F-Zero 99
NLTK: Neutral
gpt-3.5-turbo: neutral
---
Faster Sorting Beyond DeepMind’s AlphaDev
NLTK: Neutral
gpt-3.5-turbo: neutral
---
Trends in Remote Employee Salaries
NLTK: Neutral
gpt-3.5-turbo: neutral
---
We Are Not Sustainable
NLTK: Neutral
gpt-3.5-turbo: negative
---
Quantum Resistance and the Signal Protocol
NLTK: Neutral
gpt-3.5-turbo: neutral
---
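
The advanced prompt asks for up to five lowercase emotion names separated by commas, so the reply is easy to turn into a list for further processing. The helper below is a minimal sketch of that step. `parseEmotions` is a hypothetical name that isn't defined elsewhere in this notebook, and it assumes the model mostly follows the requested format while tolerating stray capitals, spaces, and a trailing period.

```python
def parseEmotions(reply, maxEmotions=5):
  # split on commas, trim whitespace and periods, and lowercase each name
  emotions = [part.strip().strip('.').lower() for part in reply.split(',')]

  # drop empty entries and keep at most maxEmotions names
  return [emotion for emotion in emotions if emotion][:maxEmotions]

print(parseEmotions("neutral"))
print(parseEmotions("Curious, Excited , hopeful."))
```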


In [17]:
# this code cell is just used to display a widget
# that uses the analyzeSentiment function we created
# as well as the advancedChatGptSentiment function
configureOpenAi(OPENAI_API_KEY, modelDropdown.value)

display(advancedAnalysisWidget)
