# Text summarization with ChatGPT
In this lesson, you will summarize text with a focus on specific topics.

## Setup

In [2]:
import openai
import os
from dotenv import load_dotenv, find_dotenv

# Load environment variables from the .env file
_ = load_dotenv(find_dotenv())

# Get the OpenAI API key from environment variables
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')

# Set the OpenAI API key in the OpenAI Python package
openai.api_key = OPENAI_API_KEY


In [4]:
def get_completion(prompt, model="gpt-3.5-turbo"):  # Following Andrew's suggestion for prompt/completion paradigm
    messages = [{"role": "user", "content": prompt}]
    
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=0  # this is the degree of randomness of the model's output
    )
    
    return response['choices'][0]['message']['content']
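
The helper above calls `openai.ChatCompletion.create`, which is available only in `openai` package versions before 1.0. A minimal sketch of an equivalent helper for the 1.x client follows. The name `get_completion_v1` and the optional `client` parameter are assumptions added for illustration and are not part of the lesson.

```python
def get_completion_v1(prompt, model="gpt-3.5-turbo", client=None):
    """Hypothetical variant of get_completion for openai>=1.0."""
    if client is None:
        # Imported lazily so the sketch can also be exercised with a stand-in client.
        from openai import OpenAI
        client = OpenAI()  # reads OPENAI_API_KEY from the environment

    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # low randomness, as in the original helper
    )
    # The 1.x client returns objects, so fields are read as attributes.
    return response.choices[0].message.content
```

Passing a `client` explicitly makes it possible to reuse one client across calls, or to substitute a fake one when testing prompts offline.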

## Text to summarize

In [9]:
prod_review = """
Got this panda plush toy for my daughter's birthday,
who loves it and takes it everywhere. It's soft and 
super cute, and its face has a friendly look. It's 
a bit small for what I paid though. I think there 
might be other options that are bigger for the 
same price. It arrived a day earlier than expected, 
so I got to play with it myself before I gave it 
to her.
"""




## Summarize with a word/sentence/character limit

In [10]:
prompt = f"""
Your task is to generate a short summary of a product \
review from an ecommerce site. 

Summarize the review below, delimited by triple 
backticks, in at most 30 words. 

Review: ```{prod_review}```
"""

response = get_completion(prompt)
print(response)


Summary: 
Soft and cute panda plush toy loved by daughter, but smaller than expected for the price. Arrived early, allowing for personal enjoyment before gifting.
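
The model treats the word limit as an instruction, not a hard constraint, so it can help to check the result. The sketch below counts whitespace-separated words in the summary shown above; `within_word_limit` is a hypothetical helper, not part of the lesson.

```python
def within_word_limit(text, max_words):
    """Return (ok, count), counting whitespace-separated words."""
    count = len(text.split())
    return count <= max_words, count


summary = (
    "Soft and cute panda plush toy loved by daughter, but smaller than "
    "expected for the price. Arrived early, allowing for personal "
    "enjoyment before gifting."
)

ok, count = within_word_limit(summary, 30)
print(f"within limit: {ok}, words: {count}")
```

A character or sentence limit could be checked the same way by swapping `len(text.split())` for `len(text)` or a sentence count.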


## Summarize with a focus on shipping and delivery

In [11]:
prompt = f"""
Your task is to generate a short summary of a product \
review from an ecommerce site to give feedback to the \
Shipping department. 

Summarize the review below, delimited by triple 
backticks, in at most 30 words, and focusing on any aspects \
that mention shipping and delivery of the product. 

Review: ```{prod_review}```
"""

response = get_completion(prompt)
print(response)


The customer received the panda plush toy a day earlier than expected, allowing them to enjoy it before giving it to their daughter.


## Summarize with a focus on price and value

In [12]:
prompt = f"""
Your task is to generate a short summary of a product \
review from an ecommerce site to give feedback to the \
pricing department, responsible for determining the \
price of the product.  

Summarize the review below, delimited by triple 
backticks, in at most 30 words, and focusing on any aspects \
that are relevant to the price and perceived value. 

Review: ```{prod_review}```
"""

response = get_completion(prompt)
print(response)


The panda plush toy is loved for its softness and cuteness, but some customers may find it small for the price. Consider offering larger options for the same price.


#### Comment
- Summaries include topics that are not related to the topic of focus.

## Try "extract" instead of "summarize"

In [14]:
# Updated prompt for extracting price and value information
prompt = f"""
Your task is to extract information from a product \
review on an ecommerce site. 

Extract only the parts of the review, delimited by triple \
backticks, that mention the price or perceived value, \
and ignore all other details. 

Review: ```{prod_review}```
"""


In [None]:
# Updated Prompt for Extracting Shipping and Delivery Information:
prompt = f"""
Your task is to extract information from a product \
review on an ecommerce site. 

Extract only the parts of the review, delimited by triple \
backticks, that mention shipping and delivery, and \
ignore all other details. 

Review: ```{prod_review}```
"""

#### Explanation
- Using "extract" instead of "summarize" instructs the model to pull the relevant passages directly from the review instead of writing new text, which is likely to keep the output focused on the relevant portions.
- The new wording also tells the model explicitly to ignore unrelated content.

#### Expected result
- For price and value, the extracted portion should be something like: "It's a bit small for what I paid though. I think there might be other options that are bigger for the same price."
- For shipping and delivery, the extracted portion should be something like: "It arrived a day earlier than expected."

Extraction-focused prompts should therefore produce more precise output that includes only the relevant details.
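
Whether the model really extracted text, and did not write new text, can be checked mechanically. The sketch below tests whether each sentence of an output appears word for word in the review. The helpers `normalize` and `is_verbatim_extract` are assumptions for illustration, and the sentence splitting is deliberately simple.

```python
import re

prod_review = """
Got this panda plush toy for my daughter's birthday,
who loves it and takes it everywhere. It's soft and
super cute, and its face has a friendly look. It's
a bit small for what I paid though. I think there
might be other options that are bigger for the
same price. It arrived a day earlier than expected,
so I got to play with it myself before I gave it
to her.
"""


def normalize(text):
    """Collapse whitespace so line breaks in the review do not matter."""
    return " ".join(text.split())


def is_verbatim_extract(extract, source):
    """True if every sentence of `extract` appears word for word in `source`."""
    source_norm = normalize(source)
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", normalize(extract))
    # Trailing punctuation is dropped because an extract may end mid-sentence.
    return all(s.rstrip(".!?") in source_norm for s in sentences if s)


print(is_verbatim_extract("It arrived a day earlier than expected.", prod_review))
print(is_verbatim_extract("The customer received the toy early.", prod_review))
```

The first call checks a genuine extract and the second a paraphrase, so the check separates the two behaviors the prompts are meant to produce.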


## Summarize multiple product reviews

In [17]:
# review for a panda plush toy
review_1 = """
Got this panda plush toy for my daughter's birthday,
who loves it and takes it everywhere. It's soft and 
super cute, and its face has a friendly look. It's 
a bit small for what I paid though. I think there 
might be other options that are bigger for the 
same price. It arrived a day earlier than expected, 
so I got to play with it myself before I gave it 
to her.
"""

# review for a standing lamp
review_2 = """
Needed a nice lamp for my bedroom, and this one 
had additional storage and not too high of a price 
point. Got it fast - arrived in 2 days. The string 
to the lamp broke during the transit and the company 
happily sent over a new one. Came within a few days 
as well. It was easy to put together. Then I had a 
missing part, so I contacted their support and they 
very quickly got me the missing piece! Seems to me 
to be a great company that cares about their customers 
and products. 
"""

# review for an electric toothbrush
review_3 = """
My dental hygienist recommended an electric toothbrush, 
which is why I got this. The battery life seems to be 
pretty impressive so far. After initial charging and 
leaving the charger plugged in for the first week to 
condition the battery, I've unplugged the charger and 
been using it for twice daily brushing for the last 
3 weeks all on the same charge. But the toothbrush head 
is too small. I’ve seen baby toothbrushes bigger than 
this one. I wish the head was bigger with different 
length bristles to get between teeth better because 
this one doesn’t. Overall if you can get this one 
around the $50 mark, it's a good deal. The manufacturer's 
replacement heads are pretty expensive, but you can 
get generic ones that are more reasonably priced. This 
toothbrush makes me feel like I've been to the dentist 
every day. My teeth feel sparkly clean! 
"""

# review for a blender
review_4 = """
So, they still had the 17 piece system on seasonal 
sale for around $49 in the month of November, about 
half off, but for some reason (call it price gouging) 
around the second week of December the prices all went 
up to about anywhere from between $70-$89 for the same 
system. And the 11 piece system went up around $10 or 
so in price also from the earlier sale price of $29. 
So it looks okay, but if you look at the base, the part 
where the blade locks into place doesn’t look as good 
as in previous editions from a few years ago, but I 
plan to be very gentle with it. After about a year, the 
motor was making a funny noise. I called customer service 
but the warranty expired already, so I had to buy 
another one. FYI: The overall quality has gone down 
in these types of products, so they are kind of counting 
on brand recognition and consumer loyalty to maintain 
sales. Got it in about two days.
"""

# Store reviews in a list
reviews = [review_1, review_2, review_3, review_4]

# Example usage: Print each review
for i, review in enumerate(reviews, 1):
    print(f"Review {i}:\n{review}\n")


Review 1:

Got this panda plush toy for my daughter's birthday,
who loves it and takes it everywhere. It's soft and 
super cute, and its face has a friendly look. It's 
a bit small for what I paid though. I think there 
might be other options that are bigger for the 
same price. It arrived a day earlier than expected, 
so I got to play with it myself before I gave it 
to her.


Review 2:

Needed a nice lamp for my bedroom, and this one 
had additional storage and not too high of a price 
point. Got it fast - arrived in 2 days. The string 
to the lamp broke during the transit and the company 
happily sent over a new one. Came within a few days 
as well. It was easy to put together. Then I had a 
missing part, so I contacted their support and they 
very quickly got me the missing piece! Seems to me 
to be a great company that cares about their customers 
and products. 


Review 3:

My dental hygienist recommended an electric toothbrush, 
which is why I got this. The battery life seems to

In [19]:
for i in range(len(reviews)):
    prompt = f"""
    Your task is to generate a short summary of a product 
    review from an ecommerce site. 

    Summarize the review below, delimited by triple 
    backticks, in at most 20 words. 

    Review: ```{reviews[i]}```
    """

    response = get_completion(prompt)
    print(f"Review {i+1} Summary: {response}\n")


Review 1 Summary: ```
Adorable panda plush loved by daughter, soft and cute, but smaller than expected for the price.
```

Review 2 Summary: Great lamp with storage, fast delivery, excellent customer service for broken parts and missing pieces.

Review 3 Summary: Impressive battery life, small head size, good deal for $50, feels like a dentist clean daily.

Review 4 Summary: Mixed feelings on price changes, quality decline, and motor issues after a year of use. Fast delivery.



# Exercise
 - Complete the prompts similar to what we did in class. 
     - Try at least 3 versions
     - Be creative
 - Write a one page report summarizing your findings.
     - Were there variations that didn't work well, e.g., where GPT either hallucinated or was wrong?
 - What did you learn?

In [20]:
# Version 1: Focused on concise summaries

for i in range(len(reviews)):
    prompt = f"""
    Your task is to generate a short summary of a product review from an ecommerce site.
    Summarize the review below, delimited by triple backticks, in at most 15 words.

    Review: ```{reviews[i]}```
    """
    response = get_completion(prompt)
    print(f"Review {i+1} Summary: {response}\n")



Review 1 Summary: Summary: 
Adorable panda plush loved by daughter, but small for price; arrived early, soft and cute.

Review 2 Summary: Great lamp with storage, fast delivery, excellent customer service for missing parts.

Review 3 Summary: Impressive battery life, small head, good deal for $50, feels like dentist clean.

Review 4 Summary: Disappointing quality, price fluctuations, motor issues, but quick delivery. Brand relies on loyalty.



In [21]:
# Version 2: Focused on extracting specific information (e.g., price)
for i in range(len(reviews)):
    prompt = f"""
    Your task is to generate a short summary of a product review from an ecommerce site.
    Extract only information related to the price and value of the product, ignoring all other details. 
    Summarize in at most 20 words.

    Review: ```{reviews[i]}```
    """
    response = get_completion(prompt)
    print(f"Review {i+1} Summary: {response}\n")


Review 1 Summary: Pricey for size, cute panda plush toy. Value questioned, may find larger options for same cost. Arrived early.

Review 2 Summary: Affordable lamp with storage, fast delivery, excellent customer service for missing parts. Great value for the price.

Review 3 Summary: Great value at around $50, impressive battery life, small toothbrush head, but effective for daily use.

Review 4 Summary: 17 piece system initially $49, increased to $70-$89. Quality decline noted. Warranty expired after a year.
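
Some of the Version 2 summaries still mention delivery even though the prompt asks only for price and value (for example, Review 1 ends with "Arrived early."). A rough way to flag this is to scan a summary for keywords belonging to other topics. The keyword lists and the helper `off_topic_terms` below are assumptions for illustration; a real check would need a richer vocabulary.

```python
import re

# Hypothetical keyword lists; tune them for your own reviews.
TOPIC_KEYWORDS = {
    "shipping": {"arrived", "delivery", "shipping", "shipped"},
    "price": {"price", "paid", "cost", "value"},
}


def off_topic_terms(summary, focus):
    """Return keywords from topics other than `focus` that occur in the summary."""
    words = set(re.findall(r"[a-z]+", summary.lower()))
    found = set()
    for topic, keywords in TOPIC_KEYWORDS.items():
        if topic != focus:
            found |= words & keywords
    return sorted(found)


review_1_summary = (
    "Pricey for size, cute panda plush toy. Value questioned, may find "
    "larger options for same cost. Arrived early."
)
print(off_topic_terms(review_1_summary, focus="price"))
```

A non-empty result suggests that the summary drifted away from the requested focus and that the prompt may need tightening.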



In [22]:
# Version 3: Focused on extracting both price and delivery information

for i in range(len(reviews)):
    prompt = f"""
    Your task is to generate a short summary of a product review from an ecommerce site.
    Focus only on price, value, and shipping/delivery information. Ignore other details. 
    Summarize in at most 25 words.

    Review: ```{reviews[i]}```
    """
    response = get_completion(prompt)
    print(f"Review {i+1} Summary: {response}\n")


Review 1 Summary: Cute panda plush toy, good quality, arrived early. Price could be better for size. Overall, happy with purchase.

Review 2 Summary: Affordable lamp with storage, fast delivery in 2 days. Company promptly replaced broken part and missing piece. Great customer service.

Review 3 Summary: Great battery life, small brush head, good deal at $50. Generic replacement heads available. Makes teeth feel clean.

Review 4 Summary: Great deal at $49 in November, but prices increased to $70-$89 in December. Quality may have decreased, but fast 2-day delivery.
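
The three versions above differ only in the word limit and the focus instruction, so the prompt can be built by one function, which makes further variations easier to try. This is a sketch: `build_prompt` and the `variants` dictionary are hypothetical names, and the wording only approximates the prompts used above.

```python
def build_prompt(review, max_words, focus=None):
    """Build a summary prompt; `focus` optionally narrows the topic."""
    focus_line = f"Focus only on {focus}. Ignore other details.\n" if focus else ""
    return (
        "Your task is to generate a short summary of a product review "
        "from an ecommerce site.\n"
        f"{focus_line}"
        "Summarize the review below, delimited by triple backticks, "
        f"in at most {max_words} words.\n\n"
        f"Review: ```{review}```"
    )


# Hypothetical settings mirroring the three versions tried above.
variants = {
    "concise": {"max_words": 15},
    "price": {"max_words": 20, "focus": "price and value"},
    "price_and_delivery": {
        "max_words": 25,
        "focus": "price, value, and shipping/delivery information",
    },
}

for name, kwargs in variants.items():
    print(f"--- {name} ---")
    print(build_prompt("Sample review text.", **kwargs))
```

Each generated prompt could then be passed to `get_completion` inside the same loop over `reviews` used in the cells above.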



## Prompt Analysis Report

#### Objective
In this exercise, I experimented with three prompt versions for generating concise summaries of product reviews with GPT. Each version emphasized a different aspect of the reviews: conciseness, price and value, or price together with shipping.

#### Findings

- Version 1 (concise summaries): The model generated very short summaries, but in some cases it omitted important details or oversimplified the content. For instance, the lamp summary left out the string that broke in transit and mentioned only the fast delivery and the missing parts.
    - Strength: Good at keeping the summary short.
    - Weakness: Occasionally oversimplified and missed important context.
- Version 2 (focus on price and value): This prompt worked well for extracting the price-related content, but in some cases the model inferred information that was not explicitly mentioned in the review. For example, it interpreted the price as "reasonable" even though the review did not state this.
    - Strength: Focused extraction of price information worked effectively.
    - Weakness: Some hallucinations occurred, where GPT inferred or added details not present in the review.
- Version 3 (focus on price, value, and shipping): The model gave balanced summaries that included both price and shipping information. However, in one case it focused too much on shipping and neglected the price details, which were critical to the review.
    - Strength: Generally effective at pulling multiple pieces of relevant information.
    - Weakness: In some cases, the balance between the two focus points (price and shipping) was uneven.

#### Variations that didn’t work well
Version 2 was problematic in cases where the model inferred details about price or value that weren’t explicitly mentioned. For instance, GPT hallucinated that the blender was “expensive,” even though the review simply mentioned a price increase. This misinterpretation showed that the model may fill gaps in a way that leads to incorrect output.

#### Lessons learned

- Be specific about what you want to extract from the review. If the prompt is vague, GPT may hallucinate or omit important information.
- When extracting multiple pieces of information (e.g., price and delivery), make sure the model does not prioritize one over the other.
- Short prompts work well for very general summaries, but they may not capture the nuance of the review when more specific details are needed.

#### What did you learn?

- Prompt specificity is crucial. The more specific the prompt, the less likely the model is to hallucinate or skip key information.
- Concise prompts work for general summaries but may oversimplify details.
- Extraction-focused prompts help zero in on specific information, but there is a risk that the model generates inferred details that aren’t present in the text.