<a href="https://colab.research.google.com/github/royam0820/prompt-engineering/blob/main/l5_inferring.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Inferring
In this lesson, you will infer sentiment and topics from product reviews and news articles.


In the context of prompt engineering, **inferring** refers to the process of guiding an AI model to draw conclusions, make interpretations, or deduce information from a given text or data. This involves crafting prompts that instruct the model to go beyond the explicit content and extract implicit meaning, relationships, or insights. Inferring is useful for tasks such as sentiment analysis, understanding motivations, identifying underlying themes, or predicting outcomes based on provided information.



## Key Aspects of Inferring in Prompt Engineering

- **Explicit Instructions**: The prompt should clearly indicate that the model needs to infer information. Using directives like "infer," "deduce," or "conclude" helps set the expectation.

- **Contextual Understanding**: Inferring often requires the model to consider context. Prompts can include background information or set up scenarios to provide the necessary context for accurate inference.

- **Specific Queries**: The prompts can ask specific questions that require inference, such as "What is the sentiment of the following review?" or "Why did the character make this decision?"

- **Evidence-Based Reasoning**: The inference should be based on evidence from the text. Prompts can guide the model to explain its reasoning or cite parts of the text that support its inference.

## Applications of Inferring in AI
- **Sentiment Analysis**: Determining the emotional tone of a piece of text (e.g., positive, negative, neutral).
- **Motivation and Intent**: Understanding the reasons behind a character's or person's actions in a narrative.
- **Theme Identification**: Recognizing underlying themes or messages in a text.
- **Predictive Analysis**: Making predictions based on patterns or trends identified in the data.

## Tips for Effective Inferring Prompts
- Clarity and Precision: Be clear and precise about what needs to be inferred.
- Provide Sufficient Context: Ensure the model has enough information to make accurate inferences.
- Guide Reasoning: Encourage the model to explain its inferences or provide supporting evidence.
- Iterate and Refine: Test and refine prompts to improve the accuracy and relevance of the inferences made by the model.


## Setup

In [2]:
!pip install openai

Collecting openai
  Downloading openai-1.35.3-py3-none-any.whl (327 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m327.4/327.4 kB[0m [31m2.1 MB/s[0m eta [36m0:00:00[0m
Collecting httpx<1,>=0.23.0 (from openai)
  Downloading httpx-0.27.0-py3-none-any.whl (75 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m75.6/75.6 kB[0m [31m5.0 MB/s[0m eta [36m0:00:00[0m
Collecting httpcore==1.* (from httpx<1,>=0.23.0->openai)
  Downloading httpcore-1.0.5-py3-none-any.whl (77 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m77.9/77.9 kB[0m [31m8.2 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting h11<0.15,>=0.13 (from httpcore==1.*->httpx<1,>=0.23.0->openai)
  Downloading h11-0.14.0-py3-none-any.whl (58 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m58.3/58.3 kB[0m [31m4.5 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: h11, httpcore, httpx, openai
Successfully installed h11-0.14.0 httpcore-1.0.5 ht

In [3]:
import openai
import os
from google.colab import userdata

In [4]:
from pprint import pprint

In [5]:
print(openai.__version__)

1.35.3


In [6]:
openai_api_key = userdata.get("OPENAI_API_KEY")
os.environ["OPENAI_API_KEY"] = openai_api_key

In [7]:
client = openai.OpenAI()

def get_completion(prompt, model="gpt-3.5-turbo"):
    messages = [{"role": "user", "content": prompt}]
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0
    )
    return response.choices[0].message.content

## Product review text

In [12]:
# produc review about a lamp
lamp_review = """
Needed a nice lamp for my bedroom, and this one had \
additional storage and not too high of a price point. \
Got it fast.  The string to our lamp broke during the \
transit and the company happily sent over a new one. \
Came within a few days as well. It was easy to put \
together.  I had a missing part, so I contacted their \
support and they very quickly got me the missing piece! \
Lumina seems to me to be a great company that cares \
about their customers and products!!
"""

## Sentiment (positive/negative)

In [39]:
# requesting sentiment analysis in the prompt
prompt = f"""
What is the sentiment of the following product review,
which is delimited with triple backticks?

Review text: '''{lamp_review}'''
"""
response = get_completion(prompt)
pprint(response)

('The sentiment of the review is positive. The reviewer is satisfied with the '
 'lamp they purchased, mentioning the additional storage, fast delivery, '
 'excellent customer service, and ease of assembly. They also praise the '
 'company for caring about their customers and products.')


In [40]:
# extracting ony the sentiment anslysis
prompt = f"""
What is the sentiment of the following product review,
which is delimited with triple backticks?

Give your answer as a single word, either "positive" \
or "negative".

Review text: '''{lamp_review}'''
"""
response = get_completion(prompt)
print(response)

Positive


## Identify types of emotions

In [41]:
# identify a list of emotions
prompt = f"""
Identify a list of emotions that the writer of the \
following review is expressing. Include no more than \
five items in the list. Format your answer as a list of \
lower-case words separated by commas.

Review text: '''{lamp_review}'''
"""
response = get_completion(prompt)
print(response)

happy, satisfied, grateful, impressed, pleased


## Identify anger

In [42]:
# identify anger
prompt = f"""
Is the writer of the following review expressing anger?\
The review is delimited with triple backticks. \
Give your answer as either yes or no.

Review text: '''{lamp_review}'''
"""
response = get_completion(prompt)
print(response)

No


## Extract product and company name from customer reviews

In [43]:
# extracting information such as the item and the brand of the item
prompt = f"""
Identify the following items from the review text:
- Item purchased by reviewer
- Company that made the item

The review is delimited with triple backticks. \
Format your response as a JSON object with \
"Item" and "Brand" as the keys.
If the information isn't present, use "unknown" \
as the value.
Make your response as short as possible.

Review text: '''{lamp_review}'''
"""
response = get_completion(prompt)
print(response)

{
    "Item": "lamp",
    "Brand": "Lumina"
}


## Doing multiple tasks at once

In [44]:
# outputting a json format for sentiment, anger (boolean value), item, and brand.
prompt = f"""
Identify the following items from the review text:
- Sentiment (positive or negative)
- Is the reviewer expressing anger? (true or false)
- Item purchased by reviewer
- Company that made the item

The review is delimited with triple backticks. \
Format your response as a JSON object with \
"Sentiment", "Anger", "Item" and "Brand" as the keys.
If the information isn't present, use "unknown" \
as the value.
Make your response as short as possible.
Format the Anger value as a boolean.

Review text: '''{lamp_review}'''
"""
response = get_completion(prompt)
print(response)

{
    "Sentiment": "positive",
    "Anger": false,
    "Item": "lamp",
    "Brand": "Lumina"
}


NB: regarding the output, the boolean value is a boolean value and not a string. Useful, if we want to take this json output into a python directory for example.

## Inferring topics

In [45]:
# the text of the story
story = """
In a recent survey conducted by the government,
public sector employees were asked to rate their level
of satisfaction with the department they work at.
The results revealed that NASA was the most popular
department with a satisfaction rating of 95%.

One NASA employee, John Smith, commented on the findings,
stating, "I'm not surprised that NASA came out on top.
It's a great place to work with amazing people and
incredible opportunities. I'm proud to be a part of
such an innovative organization."

The results were also welcomed by NASA's management team,
with Director Tom Johnson stating, "We are thrilled to
hear that our employees are satisfied with their work at NASA.
We have a talented and dedicated team who work tirelessly
to achieve our goals, and it's fantastic to see that their
hard work is paying off."

The survey also revealed that the
Social Security Administration had the lowest satisfaction
rating, with only 45% of employees indicating they were
satisfied with their job. The government has pledged to
address the concerns raised by employees in the survey and
work towards improving job satisfaction across all departments.
"""

## Infer 5 topics

In [52]:
# from the story above, we asked the inference of 5 topics
prompt = f"""
Determine five topics that are being discussed in the \
following text, which is delimited by triple backticks.

Make each item one or two words long.

Format your response as a list of items separated by commas.

Text sample: '''{story}'''
"""
response = get_completion(prompt)
print(response)

1. Survey
2. Job satisfaction
3. NASA
4. Social Security Administration
5. Government pledge


In [59]:
response.split(sep=',')

['[1', ' 0', ' 0', ' 1', ' 1]']

Assuming, you have a topic list as dislayed below and you would like to know which news article belongs to that topic list. A way to classify your news information.

In [60]:
topic_list = [
    "nasa", "local government", "engineering",
    "employee satisfaction", "federal government"
]

## Make a news alert for certain topics

In [77]:
prompt = f"""
Determine five topics that are being discussed in the \
following text, which is delimited by triple backticks.

Make each item one or two words long.

Format your response as a list of items separated by commas.

Text sample: '''{story}'''
"""

# Assume get_completion is a function that sends the prompt to the language model and gets the response
response = get_completion(prompt)
print("Response:", response)

# Convert the response to lowercase
response_lower = response.lower()

# Count occurrences of each topic in the topic_list within the response
topic_counts = {topic: response_lower.count(topic) for topic in topic_list}

# Print the counts
print("Topic counts:", topic_counts)

Response: Government survey, Public sector employees, Satisfaction rating, NASA, Popular department
Topic counts: {'nasa': 1, 'local government': 0, 'engineering': 0, 'employee satisfaction': 0, 'federal government': 0}


In [None]:
from IPython.display import Markdown, display

In [None]:
# Convert topic_counts to markdown format
markdown_output = "\n".join(f"- **{topic}**: {count}" for topic, count in topic_counts.items())

# Display the markdown output
display(Markdown(markdown_output))

- **nasa**: 1
- **local government**: 0
- **engineering**: 0
- **employee satisfaction**: 0
- **federal government**: 0

NB: this is referred as **zero-shot prompting**, that is without task training, The model hasn't been fine-tuned or trained on specific examples of the task. It relies on its pre-trained knowledge base.

In [None]:
# Check if there is a new NASA story and print alert if true
if topic_counts['nasa'] == 1:
    print("ALERT: New NASA story!")

ALERT: New NASA story!


NB: from the ouput above, the news article `story` can be classified as : Nasa, employee's satisfaction, and government.

## Using a JSON format instead of a list

In [80]:
import json

In [78]:
# Prompt for the language model
prompt = f"""
Determine five topics that are being discussed in the \
following text, which is delimited by triple backticks.

Make each item one or two words long.

Format your response as a JSON object with the topics as keys and their counts as values.

Text sample: '''{story}'''
"""

In [81]:
# Get the response from the language model
response = get_completion(prompt)
print("Response:", response)

# Convert the response to a dictionary
topic_counts = json.loads(response)

# Convert topic_counts to markdown format
markdown_output = "\n".join(f"- **{topic}**: {count}" for topic, count in topic_counts.items())

# Display the markdown output
display(Markdown(markdown_output))

# Check if there is a new NASA story and print alert if true
if topic_counts.get('nasa', 0) == 1:
    print("ALERT: New NASA story!")

Response: {
  "survey": 1,
  "government": 1,
  "public sector": 1,
  "NASA": 1,
  "satisfaction rating": 1
}


- **survey**: 1
- **government**: 1
- **public sector**: 1
- **NASA**: 1
- **satisfaction rating**: 1

## Try experimenting on your own!

In [17]:
text = f"""
Despite the setbacks, Jane was determined to finish the project on time. She stayed late every night and worked through weekends.
"""

In [18]:
prompt = f"""
based on ```{text}```,  what can you infer about Jane's attitude towards the project?
"""
response = get_completion(prompt)
pprint(response)

('From the given statement, it can be inferred that Jane is very committed and '
 'dedicated to the project. Despite facing setbacks, she remains determined to '
 'complete the project on time. Her willingness to stay late every night and '
 'work through weekends shows that she is highly motivated and focused on '
 "achieving her goal. Overall, Jane's attitude towards the project can be "
 'described as determined, hardworking, and persistent.')


In [None]:
# identifying a sentiment - and making a choice between positive or negative sentiment
prompt = f"""
based on ```{text}```,  what can you infer about Jane's attitude towards the project? What sentiment can you suggest.
Give only one sentiment.
"""
response = get_completion(prompt)
pprint(response)

'Dedication.'


In [25]:
# identifying a sentiment
prompt = f"""
based on ```{text}```,  what can you infer about Jane's attitude towards the project?
Identify a list of emotions in bullet points.
"""
response = get_completion(prompt)
pprint(response)


'- Determination\n- Perseverance\n- Commitment\n- Dedication'


In [32]:
type(response)

list

In [35]:
readable_output = "\n".join(f"- {item}" for item in response)

In [36]:
type(readable_output)

str

In [38]:
from IPython.display import Markdown, display
# Display the output as markdown
display(Markdown(readable_output))


- Determination
- Perseverance
- Commitment
- Dedication