# Chapter 5 Inference

In this lesson, you will infer sentiment and topics from product reviews and news articles.

<div class="toc">
<ul class="toc-item">
<li><span><a href="#1 Introduction" data-toc-modified-id="1. Introduction">1. Introduction</a></span></li>
<li>
<span><a href="#2 Sentiment Inference and Information Extraction" data-toc-modified-id="2. Sentiment Inference and Information Extraction">2. Sentiment Inference and Information Extraction</a></span>
<ul class="toc-item">
<li><span><a href="#21-Sentiment Classification" data-toc-modified-id="2.1 Sentiment Classification">2.1 Sentiment Classification</a></span></li>
<li><span><a href="#22-Identify sentiment type" data-toc-modified-id="2.2 Identify sentiment type">2.2 Identify sentiment type</a></span></li>
<li><span><a href="#23-Identify anger" data-toc-modified-id="2.3 Identify anger">2.3 Identifying Anger</a></span></li>
<li><span><a href="#24-Product information extraction" data-toc-modified-id="2.4 Product information extraction">2.4 Product Information Extraction</a></span></li>
<li><span><a href="#25-Comprehensive completion of tasks" data-toc-modified-id="2.5 Comprehensive completion of tasks">2.5 Comprehensive Task Completion</a></span></li>
</ul>
</li>
<li><span><a href="#Three topic inference" data-toc-modified-id="Three, topic inference">3. Topic Inference</a></span>
<ul class="toc-item">
<li><span><a href="#31-Infer discussion topic" data-toc-modified-id="3.1 Infer discussion topic">3.1 Inferring Discussion Topics</a></span></li>
<li><span><a href="#32-Make news alerts for specific topics" data-toc-modified-id="3.2 Make news alerts for specific topics">3.2 Making news alerts for specific topics</a></span></li>
</ul>
</li>
</ul>
</div>

## 1. Introduction

An inference task is one in which a model takes text as input and performs some kind of analysis on it, such as extracting labels, extracting entities, or determining sentiment. In a traditional machine learning workflow, extracting positive or negative sentiment from a piece of text requires collecting a labeled dataset, training a model, deploying that model somewhere in the cloud, and then running inference. This can work well, but the whole process takes a lot of effort, and each task, such as sentiment analysis or entity extraction, needs its own separately trained and deployed model.

A major advantage of large language models (LLMs) is that, for many of these tasks, you only need to write a prompt to start getting results. This greatly speeds up application development. You can also use a single model and a single API for many different tasks, without having to train and deploy many different models.

In [1]:
# Import the third-party library
import openai

# Set the API key; replace the placeholder with your own key
openai.api_key = "sk-..."

In [2]:
def get_completion(prompt, model="gpt-3.5-turbo"):
    # Send the prompt as a single user message and return the reply text.
    # Note: openai.ChatCompletion is the pre-1.0 interface of the openai package.
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=0,  # degree of randomness of the model's output
    )
    return response.choices[0].message["content"]
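
The idea of one model and one API serving many tasks can be sketched as a small table of prompt templates that all go through the same completion function. This is only an illustration: `TASK_TEMPLATES`, `build_prompt`, and `run_task` are hypothetical names, and `complete` stands for any function that maps a prompt string to a response string (for instance `get_completion` above). A stand-in function is used so the sketch runs without an API key.

```python
# Hypothetical sketch: several inference tasks share one completion function.
# TASK_TEMPLATES, build_prompt and run_task are illustrative names and are
# not part of the openai package.
FENCE = "`" * 3  # the triple-backtick delimiter placed around the input text

TASK_TEMPLATES = {
    "sentiment": (
        "What is the sentiment of the following product review, "
        "which is delimited with triple backticks?\n\n"
        "Review text: {fence}{text}{fence}"
    ),
    "anger": (
        "Is the writer of the following review expressing anger? "
        "The review is delimited with triple backticks. "
        "Give your answer as either yes or no.\n\n"
        "Review text: {fence}{text}{fence}"
    ),
}


def build_prompt(task, text):
    """Fill the template for `task` with the text to analyse."""
    return TASK_TEMPLATES[task].format(fence=FENCE, text=text)


def run_task(task, text, complete):
    """Run one task; `complete` is any callable from prompt to response."""
    return complete(build_prompt(task, text))


def fake_complete(prompt):
    """Stand-in for a real model call so the example runs offline."""
    return f"(model response to a {len(prompt)}-character prompt)"


print(run_task("sentiment", "Great lamp, fast delivery.", fake_complete))
```

Adding a new task then means adding one template rather than training and deploying another model.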

## 2. Sentiment Inference and Information Extraction
### 2.1 Sentiment Classification

Take a review of a desk lamp from an e-commerce platform as an example. The sentiment it conveys can be classified as either positive or negative.

In [3]:
lamp_review = """
Needed a nice lamp for my bedroom, and this one had \
additional storage and not too high of a price point. \
Got it fast.  The string to our lamp broke during the \
transit and the company happily sent over a new one. \
Came within a few days as well. It was easy to put \
together.  I had a missing part, so I contacted their \
support and they very quickly got me the missing piece! \
Lumina seems to me to be a great company that cares \
about their customers and products!!
"""

In [4]:
# Chinese
lamp_review_zh = """
我需要一盏漂亮的卧室灯，这款灯具有额外的储物功能，价格也不算太高。\
我很快就收到了它。在运输过程中，我们的灯绳断了，但是公司很乐意寄送了一个新的。\
几天后就收到了。这款灯很容易组装。我发现少了一个零件，于是联系了他们的客服，他们很快就给我寄来了缺失的零件！\
在我看来，Lumina 是一家非常关心顾客和产品的优秀公司！
"""

Now let's write a prompt to classify the sentiment of this review. To have the system report the sentiment, the prompt only needs to ask "What is the sentiment of the following product review", followed by the usual delimiters and the review text.

Then let's run it. The model reports that the sentiment of this product review is positive, which seems right. The lamp was not perfect, but the customer appears very satisfied and describes a great company that cares about its customers and products, so positive sentiment is the reasonable answer.

In [5]:
prompt = f"""
What is the sentiment of the following product review, 
which is delimited with triple backticks?

Review text: ```{lamp_review}```
"""
response = get_completion(prompt)
print(response)

The sentiment of the product review is positive.


In [6]:
# Chinese
prompt = f"""
以下用三个反引号分隔的产品评论的情感是什么？

评论文本: ```{lamp_review_zh}```
"""
response = get_completion(prompt)
print(response)

情感是积极的/正面的。


If you want a more concise answer that is easier to post-process, add one more instruction to the prompt above: *Give your answer as a single word, either "positive" or "negative"*. The model then prints only the word "positive", which makes the output more uniform and more convenient for downstream processing.

In [7]:
prompt = f"""
What is the sentiment of the following product review, 
which is delimited with triple backticks?

Give your answer as a single word, either "positive" \
or "negative".

Review text: ```{lamp_review}```
"""
response = get_completion(prompt)
print(response)

positive


In [8]:
# Chinese
prompt = f"""
以下用三个反引号分隔的产品评论的情感是什么？

用一个单词回答：「正面」或「负面」。

评论文本: ```{lamp_review_zh}```
"""
response = get_completion(prompt)
print(response)

正面
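
Even with a one-word instruction, a model may vary capitalization or add a trailing period, so downstream code usually normalizes the answer before using it. The following is a minimal sketch for the English prompt; `normalize_sentiment` is a hypothetical helper name, and the sample strings are made up for illustration.

```python
def normalize_sentiment(response):
    """Map a one-word model answer to 'positive', 'negative', or None."""
    # Remove surrounding whitespace, then stray punctuation and quotes.
    cleaned = response.strip().strip(".!\"'").lower()
    if cleaned in ("positive", "negative"):
        return cleaned
    return None  # unexpected answer; the caller decides how to handle it


print(normalize_sentiment("positive"))                    # positive
print(normalize_sentiment(" Negative.\n"))                # negative
print(normalize_sentiment("The sentiment is positive."))  # None
```

Returning `None` for anything unexpected keeps a malformed answer from being silently counted as one of the two labels.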


### 2.2 Identify sentiment type

Still using the lamp review, let's try another prompt. This time the model is asked to identify the emotions expressed by the review's author and return them as a list of no more than five items.

In [9]:
prompt = f"""
Identify a list of emotions that the writer of the \
following review is expressing. Include no more than \
five items in the list. Format your answer as a list of \
lower-case words separated by commas.

Review text: ```{lamp_review}```
"""
response = get_completion(prompt)
print(response)

satisfied, grateful, impressed, content, pleased


In [10]:
# Chinese
prompt = f"""
识别以下评论的作者表达的情感。包含不超过五个项目。将答案格式化为以逗号分隔的单词列表。

评论文本: ```{lamp_review_zh}```
"""
response = get_completion(prompt)
print(response)

满意,感激,信任,赞扬,愉快


Large language models are very good at extracting specific things from a piece of text. In the example above, the emotions expressed in a review help you understand how customers feel about a specific product.
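
To use such a comma-separated answer across many reviews, it first has to be turned into a list. The sketch below parses the response and tallies emotions over several reviews; `parse_emotions` is a hypothetical helper, and the second response string is invented purely for illustration.

```python
from collections import Counter


def parse_emotions(response, max_items=5):
    """Split a comma-separated answer into at most `max_items` clean labels."""
    items = [item.strip().lower() for item in response.split(",")]
    items = [item for item in items if item]  # drop empty fragments
    return items[:max_items]


responses = [
    "satisfied, grateful, impressed, content, pleased",  # output shown above
    "frustrated, disappointed, satisfied",                # made-up second review
]

# Count how often each emotion appears across all reviews.
counts = Counter(e for r in responses for e in parse_emotions(r))
print(parse_emotions(responses[0]))
print(counts.most_common(1))  # [('satisfied', 2)]
```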

### 2.3 Identifying Anger

For many businesses, it is important to know whether a customer is very angry. This leads to the following classification question: is the author of the following review expressing anger? If someone is really angry, it may be worth paying extra attention and having the customer support or customer success team reach out to understand the situation and resolve the issue.

In [11]:
prompt = f"""
Is the writer of the following review expressing anger? \
The review is delimited with triple backticks. \
Give your answer as either yes or no.

Review text: ```{lamp_review}```
"""
response = get_completion(prompt)
print(response)

No


In [12]:
# Chinese
prompt = f"""
以下评论的作者是否表达了愤怒？评论用三个反引号分隔。给出是或否的答案。

评论文本: ```{lamp_review_zh}```
"""
response = get_completion(prompt)
print(response)

否


In the example above, the customer is not angry. Note that building all of these classifiers with conventional supervised learning could not be done in a few minutes. We encourage you to try changing some of these prompts, for example asking whether the customer expressed delight or whether any parts were missing, and see whether you can get the model to make different inferences about this lamp review.
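
To route angry customers to a support team, the yes/no answer has to become a boolean. A minimal sketch follows, assuming the model answers with one of the words used in the prompts above; `is_angry` and the answer table are hypothetical names.

```python
# Answers accepted from the English and Chinese prompts above (assumed set).
ANSWER_TO_BOOL = {"yes": True, "no": False, "是": True, "否": False}


def is_angry(response):
    """Convert a yes/no model answer into a boolean, or raise ValueError."""
    answer = response.strip().strip(".。").lower()
    if answer not in ANSWER_TO_BOOL:
        raise ValueError(f"unexpected answer: {response!r}")
    return ANSWER_TO_BOOL[answer]


if is_angry("No"):
    print("Escalate this review to customer support.")
else:
    print("No follow-up needed.")
```

Raising an error on anything else makes unexpected answers visible instead of treating them as "not angry".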

### 2.4 Product Information Extraction

Next, let's extract richer information from customer reviews. Information extraction is the part of natural language processing (NLP) concerned with pulling specific things you want to know out of text. In this prompt, the model is asked to identify the item that was purchased and the name of the company that made it.

If you are trying to summarize many reviews from an e-commerce site, working out what the item is, who made it, and whether the sentiment is positive or negative helps you track sentiment trends for a particular item or manufacturer.

In the example below, we ask the model to format the response as a JSON object with "Item" and "Brand" as keys.

In [13]:
prompt = f"""
Identify the following items from the review text: 
- Item purchased by reviewer
- Company that made the item

The review is delimited with triple backticks. \
Format your response as a JSON object with \
"Item" and "Brand" as the keys. 
If the information isn't present, use "unknown" \
as the value.
Make your response as short as possible.
  
Review text: ```{lamp_review}```
"""
response = get_completion(prompt)
print(response)

{
  "Item": "lamp with additional storage",
  "Brand": "Lumina"
}


In [14]:
# Chinese
prompt = f"""
从评论文本中识别以下项目：
- 评论者购买的物品
- 制造该物品的公司

评论文本用三个反引号分隔。将你的响应格式化为以 “物品” 和 “品牌” 为键的 JSON 对象。
如果信息不存在，请使用 “未知” 作为值。
让你的回应尽可能简短。
  
评论文本: ```{lamp_review_zh}```
"""
response = get_completion(prompt)
print(response)

{
  "物品": "卧室灯",
  "品牌": "Lumina"
}


As shown above, the model reports that the item is a lamp (a "lamp with additional storage" in the English response) and that the brand is Lumina. This output can easily be loaded into a Python dictionary for further processing.
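
Loading the response into a dictionary can be done with the standard `json` module. The sketch below copies the English response shown above into a string; `parse_json_response` is a hypothetical helper that returns `None` when the model's output is not valid JSON, which can happen if extra text surrounds the object.

```python
import json


def parse_json_response(response):
    """Return the parsed JSON value, or None if the text is not valid JSON."""
    try:
        return json.loads(response)
    except json.JSONDecodeError:
        return None


response = """{
  "Item": "lamp with additional storage",
  "Brand": "Lumina"
}"""

info = parse_json_response(response)
print(info["Item"])   # lamp with additional storage
print(info["Brand"])  # Lumina
```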

### 2.5 Comprehensive Task Completion

Extracting all of the information above took three or four prompts, but a single prompt can extract all of it at once.

In [15]:
prompt = f"""
Identify the following items from the review text: 
- Sentiment (positive or negative)
- Is the reviewer expressing anger? (true or false)
- Item purchased by reviewer
- Company that made the item

The review is delimited with triple backticks. \
Format your response as a JSON object with \
"Sentiment", "Anger", "Item" and "Brand" as the keys.
If the information isn't present, use "unknown" \
as the value.
Make your response as short as possible.
Format the Anger value as a boolean.

Review text: ```{lamp_review}```
"""
response = get_completion(prompt)
print(response)

{
  "Sentiment": "positive",
  "Anger": false,
  "Item": "lamp with additional storage",
  "Brand": "Lumina"
}


In [16]:
# Chinese
prompt = f"""
从评论文本中识别以下项目：
- 情绪（正面或负面）
- 审稿人是否表达了愤怒？（是或否）
- 评论者购买的物品
- 制造该物品的公司

评论用三个反引号分隔。将您的响应格式化为 JSON 对象，以 “Sentiment”、“Anger”、“Item” 和 “Brand” 作为键。
如果信息不存在，请使用 “未知” 作为值。
让你的回应尽可能简短。
将 Anger 值格式化为布尔值。

评论文本: ```{lamp_review_zh}```
"""
response = get_completion(prompt)
print(response)

{
  "Sentiment": "正面",
  "Anger": false,
  "Item": "卧室灯",
  "Brand": "Lumina"
}


In this example, we tell the model to format the Anger value as a boolean and to output a JSON object. You can try different variations yourself, or try completely different reviews to see whether the content is still extracted accurately.
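
Before relying on a combined response, it can be worth checking that it has the shape the prompt asked for. The following sketch validates the four keys and the type of the Anger value. `parse_review_info` and `EXPECTED_KEYS` are hypothetical names, and accepting the string "unknown" for Anger is an assumption based on the prompt's fallback instruction.

```python
import json

EXPECTED_KEYS = {"Sentiment", "Anger", "Item", "Brand"}


def parse_review_info(response):
    """Parse the combined JSON answer and check its basic structure."""
    data = json.loads(response)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = EXPECTED_KEYS - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    # The prompt asks for a boolean, with "unknown" as the fallback value.
    if not isinstance(data["Anger"], bool) and data["Anger"] != "unknown":
        raise ValueError("Anger must be a boolean or 'unknown'")
    return data


response = """{
  "Sentiment": "positive",
  "Anger": false,
  "Item": "lamp with additional storage",
  "Brand": "Lumina"
}"""

info = parse_review_info(response)
print(info["Anger"])  # False
```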

## 3. Topic Inference

Another useful application of large language models is topic inference: given a long text, what is it about, and what topics does it cover? Take the following fictional newspaper article as an example.

In [17]:
story = """
In a recent survey conducted by the government, 
public sector employees were asked to rate their level 
of satisfaction with the department they work at. 
The results revealed that NASA was the most popular 
department with a satisfaction rating of 95%.

One NASA employee, John Smith, commented on the findings, 
stating, "I'm not surprised that NASA came out on top. 
It's a great place to work with amazing people and 
incredible opportunities. I'm proud to be a part of 
such an innovative organization."

The results were also welcomed by NASA's management team, 
with Director Tom Johnson stating, "We are thrilled to 
hear that our employees are satisfied with their work at NASA. 
We have a talented and dedicated team who work tirelessly 
to achieve our goals, and it's fantastic to see that their 
hard work is paying off."

The survey also revealed that the 
Social Security Administration had the lowest satisfaction 
rating, with only 45% of employees indicating they were 
satisfied with their job. The government has pledged to 
address the concerns raised by employees in the survey and 
work towards improving job satisfaction across all departments.
"""

In [18]:
# Chinese
story_zh = """
在政府最近进行的一项调查中，要求公共部门的员工对他们所在部门的满意度进行评分。
调查结果显示，NASA 是最受欢迎的部门，满意度为 95％。

一位 NASA 员工 John Smith 对这一发现发表了评论，他表示：
“我对 NASA 排名第一并不感到惊讶。这是一个与了不起的人们和令人难以置信的机会共事的好地方。我为成为这样一个创新组织的一员感到自豪。”

NASA 的管理团队也对这一结果表示欢迎，主管 Tom Johnson 表示：
“我们很高兴听到我们的员工对 NASA 的工作感到满意。
我们拥有一支才华横溢、忠诚敬业的团队，他们为实现我们的目标不懈努力，看到他们的辛勤工作得到回报是太棒了。”

调查还显示，社会保障管理局的满意度最低，只有 45％的员工表示他们对工作满意。
政府承诺解决调查中员工提出的问题，并努力提高所有部门的工作满意度。
"""

### 3.1 Inferring Discussion Topics

The above is a fictional newspaper article about how government employees feel about the agencies they work for. We can ask the model to identify five topics being discussed, describe each topic in one or two words, and format the output as a comma-separated list.

In [19]:
prompt = f"""
Determine five topics that are being discussed in the \
following text, which is delimited by triple backticks.

Make each item one or two words long. 

Format your response as a list of items separated by commas.

Text sample: ```{story}```
"""
response = get_completion(prompt)
print(response)

government survey, public sector employees, job satisfaction, NASA, Social Security Administration


In [20]:
response.split(sep=',')

['government survey',
 ' public sector employees',
 ' job satisfaction',
 ' NASA',
 ' Social Security Administration']
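
Splitting on a bare comma leaves a leading space on every item after the first, as the output above shows. A small variation, sketched here with the same response text copied into a string, strips each item so the topics can be compared or used as dictionary keys directly.

```python
response = (
    "government survey, public sector employees, job satisfaction, "
    "NASA, Social Security Administration"
)

# Split on commas, then remove the whitespace around each topic.
topics = [topic.strip() for topic in response.split(",")]
print(topics)
print("NASA" in topics)  # True
```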

In [21]:
# Chinese
prompt = f"""
确定以下给定文本中讨论的五个主题。

每个主题用1-2个单词概括。

输出时用逗号分割每个主题。

给定文本: ```{story_zh}```
"""
response = get_completion(prompt)
print(response)

调查结果, NASA, 社会保障管理局, 员工满意度, 政府承诺


### 3.2 Making news alerts for specific topics

Suppose we run a news site or something similar, and these are the topics we are interested in: NASA, local government, engineering, employee satisfaction, and federal government. Given a news article, we want to work out which of these topics it covers. We can use a prompt like this: determine whether each item in the following list of topics is a topic in the text below, and give the answer as a list with 0 or 1 for each topic.

In [22]:
topic_list = [
    "nasa", "local government", "engineering", 
    "employee satisfaction", "federal government"
]

In [23]:
prompt = f"""
Determine whether each item in the following list of \
topics is a topic in the text below, which
is delimited with triple backticks.

Give your answer as list with 0 or 1 for each topic.\

List of topics: {", ".join(topic_list)}

Text sample: ```{story}```
"""
response = get_completion(prompt)
print(response)

nasa: 1
local government: 0
engineering: 0
employee satisfaction: 1
federal government: 1


In [24]:
topic_dict = {i.split(': ')[0]: int(i.split(': ')[1]) for i in response.split(sep='\n')}
if topic_dict['nasa'] == 1:
    print("ALERT: New NASA story!")

ALERT: New NASA story!


In [25]:
# Chinese
prompt = f"""
判断主题列表中的每一项是否是给定文本中的一个话题，

以列表的形式给出答案，每个主题用 0 或 1。

主题列表：美国航空航天局、当地政府、工程、员工满意度、联邦政府

给定文本: ```{story_zh}```
"""
response = get_completion(prompt)
print(response)

美国航空航天局：1
当地政府：0
工程：0
员工满意度：1
联邦政府：1


As you can see, the story is about NASA, employee satisfaction, and the federal government, but not about local government or engineering. In machine learning this is sometimes called zero-shot learning, because we did not give the model any labeled training data. From the prompt alone, it can determine which topics the news article covers.

The same process can be used to generate news alerts. Say I really like the work NASA does: I can build a system like this that prints an alert every time a NASA story comes out.

In [26]:
topic_dict = {i.split('：')[0]: int(i.split('：')[1]) for i in response.split(sep='\n')}
if topic_dict['美国航空航天局'] == 1:
    print("提醒: 关于美国航空航天局的新消息")

提醒: 关于美国航空航天局的新消息
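
The dictionary comprehensions in the alert cells assume that every line of the response has exactly the form "topic: 0" or "topic: 1"; a blank trailing line, for example, would raise an `IndexError`. A more tolerant parser might look like the sketch below. `parse_topic_flags` and `topics_to_alert` are hypothetical helpers, and the sample string repeats the English response above with two extra newlines added for illustration.

```python
import re

# Accept either an ASCII colon or a full-width colon between topic and flag.
LINE_PATTERN = re.compile(r"^\s*(.+?)\s*[:：]\s*([01])\s*$")


def parse_topic_flags(response):
    """Return {topic: 0 or 1} for every line shaped like 'topic: 1'."""
    flags = {}
    for line in response.splitlines():
        match = LINE_PATTERN.match(line)
        if match:  # silently skip blank or unexpected lines
            flags[match.group(1).lower()] = int(match.group(2))
    return flags


def topics_to_alert(flags, watched):
    """Return the watched topics that the model flagged with 1."""
    return [topic for topic in watched if flags.get(topic.lower()) == 1]


sample = (
    "nasa: 1\nlocal government: 0\nengineering: 0\n"
    "employee satisfaction: 1\nfederal government: 1\n\n"
)
flags = parse_topic_flags(sample)
for topic in topics_to_alert(flags, ["nasa", "engineering"]):
    print(f"ALERT: New {topic} story!")
```

Skipping lines that do not match trades strictness for robustness; a production system might log them instead.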


That's all for inference. In just a few minutes, we can build multiple systems that make inferences about text, which previously would have taken a skilled machine learning developer days or even weeks. This is exciting for experienced machine learning developers and beginners alike: with prompts, you can get started on fairly complex natural language processing tasks very quickly.