# Text summarizing with ChatGPT
In this lesson, you will summarize text with a focus on specific topics.

## Setup

In [1]:
! pip install python-dotenv
! pip install openai

Collecting python-dotenv
  Downloading python_dotenv-1.0.1-py3-none-any.whl.metadata (23 kB)
Downloading python_dotenv-1.0.1-py3-none-any.whl (19 kB)
Installing collected packages: python-dotenv
Successfully installed python-dotenv-1.0.1
Collecting openai
  Downloading openai-1.51.2-py3-none-any.whl.metadata (24 kB)
Collecting httpx<1,>=0.23.0 (from openai)
  Downloading httpx-0.27.2-py3-none-any.whl.metadata (7.1 kB)
Collecting jiter<1,>=0.4.0 (from openai)
  Downloading jiter-0.6.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (5.2 kB)
Collecting typing-extensions<5,>=4.11 (from openai)
  Downloading typing_extensions-4.12.2-py3-none-any.whl.metadata (3.0 kB)
Collecting httpcore==1.* (from httpx<1,>=0.23.0->openai)
  Downloading httpcore-1.0.6-py3-none-any.whl.metadata (21 kB)
Collecting h11<0.15,>=0.13 (from httpcore==1.*->httpx<1,>=0.23.0->openai)
  Downloading h11-0.14.0-py3-none-any.whl.metadata (8.2 kB)
Downloading openai-1.51.2-py3-none-any.whl (383 kB

In [2]:
from openai import OpenAI
import os

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")  # read the key from the environment (.env loaded above)

In [3]:
client = OpenAI(
    # This is the default and can be omitted
    api_key=OPENAI_API_KEY,
)


def get_completion(prompt, model="gpt-3.5-turbo"):  # Andrew mentioned that the prompt/completion paradigm is preferable for this class
    messages = [{"role": "user", "content": prompt}]
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0, # this is the degree of randomness of the model's output
    )
    return response.choices[0].message.content


## Text to summarize

In [4]:
prod_review = """
Got this panda plush toy for my daughter's birthday, \
who loves it and takes it everywhere. It's soft and \
super cute, and its face has a friendly look. It's \
a bit small for what I paid though. I think there \
might be other options that are bigger for the \
same price. It arrived a day earlier than expected, \
so I got to play with it myself before I gave it \
to her.
"""

## Summarize with a word/sentence/character limit

In [5]:
prompt = f"""
Your task is to generate a short summary of a product \
review from an ecommerce site. 

Summarize the review below, delimited by triple 
backticks, in at most 30 words. 

Review: ```{prod_review}```
"""

response = get_completion(prompt)
print(response)


Soft, cute panda plush toy loved by daughter, arrived early. Small for price, but friendly face and quality. Consider larger options for same cost.


## Summarize with a focus on shipping and delivery

In [6]:
prompt = f"""
Your task is to generate a short summary of a product \
review from an ecommerce site to give feedback to the \
Shipping department. 

Summarize the review below, delimited by triple 
backticks, in at most 30 words, and focusing on any aspects \
that mention shipping and delivery of the product. 

Review: ```{prod_review}```
"""

response = get_completion(prompt)
print(response)


The customer was pleased with the early delivery of the panda plush toy, but felt it was slightly small for the price paid.


## Summarize with a focus on price and value

In [7]:
prompt = f"""
Your task is to generate a short summary of a product \
review from an ecommerce site to give feedback to the \
pricing department, responsible for determining the \
price of the product.  

Summarize the review below, delimited by triple 
backticks, in at most 30 words, and focusing on any aspects \
that are relevant to the price and perceived value. 

Review: ```{prod_review}```
"""

response = get_completion(prompt)
print(response)


The panda plush toy is loved for its softness and cuteness, but some customers feel it's a bit small for the price.


#### Comment
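- Both summaries include details unrelated to the requested focus: the shipping summary also mentions size and price, and the price summary also mentions softness and cuteness.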

## Try "extract" instead of "summarize"

In [8]:
prompt = f"""
Your task is to extract relevant information from \
a product review from an ecommerce site to give \
feedback to the Shipping department. 

From the review below, delimited by triple backticks, \
extract the information relevant to shipping and \
delivery. Limit to 30 words. 

Review: ```{prod_review}```
"""

response = get_completion(prompt)
print(response)

Feedback: The product arrived a day earlier than expected, which was a pleasant surprise. Customers may appreciate faster shipping times for future orders.


## Summarize multiple product reviews

In [9]:

review_1 = prod_review 

# review for a standing lamp
review_2 = """
Needed a nice lamp for my bedroom, and this one \
had additional storage and not too high of a price \
point. Got it fast - arrived in 2 days. The string \
to the lamp broke during the transit and the company \
happily sent over a new one. Came within a few days \
as well. It was easy to put together. Then I had a \
missing part, so I contacted their support and they \
very quickly got me the missing piece! Seems to me \
to be a great company that cares about their customers \
and products. 
"""

# review for an electric toothbrush
review_3 = """
My dental hygienist recommended an electric toothbrush, \
which is why I got this. The battery life seems to be \
pretty impressive so far. After initial charging and \
leaving the charger plugged in for the first week to \
condition the battery, I've unplugged the charger and \
been using it for twice daily brushing for the last \
3 weeks all on the same charge. But the toothbrush head \
is too small. I’ve seen baby toothbrushes bigger than \
this one. I wish the head was bigger with different \
length bristles to get between teeth better because \
this one doesn’t.  Overall if you can get this one \
around the $50 mark, it's a good deal. The manufactuer's \
replacements heads are pretty expensive, but you can \
get generic ones that're more reasonably priced. This \
toothbrush makes me feel like I've been to the dentist \
every day. My teeth feel sparkly clean! 
"""

# review for a blender
review_4 = """
So, they still had the 17 piece system on seasonal \
sale for around $49 in the month of November, about \
half off, but for some reason (call it price gouging) \
around the second week of December the prices all went \
up to about anywhere from between $70-$89 for the same \
system. And the 11 piece system went up around $10 or \
so in price also from the earlier sale price of $29. \
So it looks okay, but if you look at the base, the part \
where the blade locks into place doesn’t look as good \
as in previous editions from a few years ago, but I \
plan to be very gentle with it (example, I crush \
very hard items like beans, ice, rice, etc. in the \
blender first then pulverize them in the serving size \
I want in the blender then switch to the whipping \
blade for a finer flour, and use the cross cutting blade \
first when making smoothies, then use the flat blade \
if I need them finer/less pulpy). Special tip when making \
smoothies, finely cut and freeze the fruits and \
vegetables (if using spinach-lightly stew soften the \
spinach then freeze until ready for use-and if making \
sorbet, use a small to medium sized food processor) \
that you plan to use that way you can avoid adding so \
much ice if at all-when making your smoothie. \
After about a year, the motor was making a funny noise. \
I called customer service but the warranty expired \
already, so I had to buy another one. FYI: The overall \
quality has gone done in these types of products, so \
they are kind of counting on brand recognition and \
consumer loyalty to maintain sales. Got it in about \
two days.
"""

reviews = [review_1, review_2, review_3, review_4]

In [10]:
for i in range(len(reviews)):
    prompt = f"""
    Your task is to generate a short summary of a product \
    review from an ecommerce site. 

    Summarize the review below, delimited by triple \
    backticks in at most 20 words. 

    Review: ```{reviews[i]}```
    """

    response = get_completion(prompt)
    print(i, response, "\n")

0 Soft, cute panda plush loved by daughter, but small for price. Arrived early, friendly face. 

1 Summary: 
Versatile lamp with storage, fast delivery, excellent customer service for missing parts. Great value for price. 

2 Impressive battery life, small toothbrush head, good deal for $50, generic replacement heads available, leaves teeth feeling clean. 

3 17-piece system on sale for $49, quality decline, motor issue after a year, price increase, customer service, brand loyalty. 
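
The response for review 1 above starts with a "Summary:" label even though the prompt did not ask for one. If the summaries are going to be displayed or stored, one option is to strip such labels afterwards. The sketch below is a hypothetical helper, not part of the original lesson, and the list of labels it recognizes is an assumption based on the outputs seen in this notebook.

```python
import re

# Hypothetical helper: the set of labels is an assumption based on the
# responses observed in this notebook ("Summary:", "Feedback:").
_LABEL_PATTERN = re.compile(r"^\s*(summary|feedback)\s*:\s*", re.IGNORECASE)


def strip_label(response: str) -> str:
    """Return the response without a leading 'Summary:'-style label."""
    return _LABEL_PATTERN.sub("", response, count=1).strip()


# Example with the shape of response 1 above (label, newline, then text)
print(strip_label("Summary: \nVersatile lamp with storage, fast delivery."))
```

Responses without a label pass through unchanged, so the helper can be applied to every response in the loop.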



# Exercise
 - Complete the prompts similar to what we did in class. 
     - Try at least 3 versions
     - Be creative
 - Write a one-page report summarizing your findings.
     - Were there variations that didn't work well, e.g., where GPT hallucinated or was wrong?
 - What did you learn?

In [11]:
# Version 1: Summarizing with a Focus on Product Quality
# We will focus on summarizing each review by highlighting the product's quality.

# Summarizing with a focus on product quality
for i in range(len(reviews)):
    prompt = f"""
    Your task is to generate a short summary of the product quality mentioned in a product review from an ecommerce site.

    Summarize the review below, delimited by triple backticks, focusing on product quality in at most 20 words.

    Review: ```{reviews[i]}```
    """
    response = get_completion(prompt)
    print(i, response, "\n")


0 Summary: 
Soft, cute panda plush toy with friendly face. Small for price, but good quality. Arrived early. 

1 Product quality: Lamp with storage, affordable, fast delivery. String broke but replaced quickly. Missing part resolved promptly. Great company. 

2 The electric toothbrush has impressive battery life, but the small head size is a drawback. Good deal for $50. 

3 Product quality: Blade locking mechanism not as good as previous editions, motor made funny noise after a year. 



In [12]:
# Version 2: Extracting Information About Customer Service
# Here, the prompt will focus on summarizing any information related to customer service and post-purchase experience.

# Extracting customer service-related information
for i in range(len(reviews)):
    prompt = f"""
    Your task is to extract information about customer service from a product review on an ecommerce site.

    Summarize the review below, delimited by triple backticks, with a focus on customer service, in at most 20 words.

    Review: ```{reviews[i]}```
    """
    response = get_completion(prompt)
    print(i, response, "\n")


0 Customer service exceeded expectations with early delivery, but product size was smaller than expected for the price. 

1 Great customer service - fast shipping, quick replacement of broken parts, and efficient support for missing pieces. 

2 Customer service not mentioned in review. 

3 Customer service was contacted for a motor issue after warranty expired, leading to a new purchase. Delivery was quick. 



In [13]:
# Version 3: Extracting Pricing Details
# This version will focus on extracting the pricing and value information from the review.

# Extracting pricing-related information
for i in range(len(reviews)):
    prompt = f"""
    Your task is to extract information about the price and value of the product from an ecommerce review.

    Summarize the review below, delimited by triple backticks, with a focus on price and value, in at most 20 words.

    Review: ```{reviews[i]}```
    """
    response = get_completion(prompt)
    print(i, response, "\n")


0 Summary: Panda plush toy is soft and cute, but small for the price. Other options may offer better value. Arrived early. 

1 Affordable lamp with storage, fast delivery, excellent customer service for missing parts. Great value for price. 

2 The reviewer finds the electric toothbrush's battery life impressive. They suggest a $50 price point is a good deal. 

3 The product was initially on sale for $49 in November, but price increased to $70-$89 in December. Quality declined. 
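
The three versions above repeat the same loop and differ only in the focus of the prompt. A minimal sketch of how that loop could be parameterized is shown below. The names `build_summary_prompt` and `summarize_all` are hypothetical, and the prompt wording is a generic variant rather than the exact text of any one version. The completion function is passed in as an argument, so the helper can be tried with a stand-in function before calling the API.

```python
from typing import Callable, List


def build_summary_prompt(review: str, focus: str, max_words: int = 20) -> str:
    """Build a focused summary prompt in the style used in this notebook."""
    return (
        "Your task is to generate a short summary of a product review "
        "from an ecommerce site.\n\n"
        "Summarize the review below, delimited by triple backticks, "
        f"with a focus on {focus}, in at most {max_words} words.\n\n"
        f"Review: ```{review}```"
    )


def summarize_all(
    reviews: List[str],
    focus: str,
    complete: Callable[[str], str],
    max_words: int = 20,
) -> List[str]:
    """Send one focused prompt per review and collect the responses."""
    return [complete(build_summary_prompt(r, focus, max_words)) for r in reviews]


# Usage with the notebook's get_completion (requires an API key):
# for i, summary in enumerate(summarize_all(reviews, "price and value", get_completion)):
#     print(i, summary, "\n")
```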



## Step 2: One-Page Report Summarizing the Findings
### 1. Overview of Prompt Experiments
In this exercise, we experimented with three versions of a summarization prompt for product reviews from an ecommerce site. Each version targeted one aspect of the review: product quality, customer service, or pricing. The goal was to generate summaries that are specific, concise, and relevant to the given focus.

### 2. Observations
- Version 1 (Product Quality Focus): The model captured the main quality points in each review, such as the blender's weaker blade-locking mechanism and the motor noise after a year. However, some summaries drifted off-topic, e.g., "Arrived early" in the panda plush summary and "fast delivery" in the lamp summary.
- Version 2 (Customer Service Focus): This version worked particularly well when the review included interactions with customer support, such as the standing lamp review, where the company replaced a broken string and sent a missing part. When a review had no customer service content, results were mixed: for the toothbrush review the model correctly answered "Customer service not mentioned in review", but for the panda plush review it recast the early delivery as customer service that "exceeded expectations", which can be considered a hallucination.
- Version 3 (Pricing Focus): The model consistently captured explicit pricing information, such as the $49 sale price and the increase to $70-$89 in the blender review. This version worked best when the reviewer explicitly discussed prices, sales, or deals. In reviews where pricing was less central (e.g., the toothbrush review), the summary was vaguer and included unrelated details such as battery life.
### 3. Challenges and Limitations
- Hallucinations: When a review lacked details on the aspect we were focusing on (e.g., no customer service interaction), the model sometimes fabricated details to fit the prompt. This was evident in Version 2, where the panda plush review, which contains no customer service feedback, still produced a summary about customer service.
- Nuanced Language: With more nuanced or mixed reviews (e.g., a product is great but has a minor flaw), the model struggled to capture the balance. It tended to lean towards either the positive or the negative side without reflecting the nuance.
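
One way to address the hallucination issue described above is to give the model an explicit answer to use when the review says nothing about the requested topic. The sketch below is an assumption about how such a prompt could be built. `build_extract_prompt`, `is_fallback`, and the fallback text "Not mentioned" are hypothetical names, and whether the model actually follows the instruction has not been verified here.

```python
def build_extract_prompt(
    review: str,
    focus: str,
    max_words: int = 20,
    fallback: str = "Not mentioned",
) -> str:
    """Build an extraction prompt with an explicit 'nothing to report' answer."""
    return (
        f"Your task is to extract information about {focus} from a product "
        "review on an ecommerce site.\n\n"
        "From the review below, delimited by triple backticks, extract only "
        f"the information relevant to {focus}, in at most {max_words} words. "
        f"If the review says nothing about {focus}, reply with exactly: "
        f"{fallback}\n\n"
        f"Review: ```{review}```"
    )


def is_fallback(response: str, fallback: str = "Not mentioned") -> bool:
    """True when the response is just the fallback text (case and final period ignored)."""
    return response.strip().rstrip(".").lower() == fallback.lower()


# Usage with the notebook's get_completion (requires an API key):
# response = get_completion(build_extract_prompt(review_1, "customer service"))
# if is_fallback(response):
#     print("No customer service content in this review.")
```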
### 4. What I Learned
- Prompt Specificity: The more specific the prompt, the better the output. In the class example, using the keyword "extract" instead of "summarize" when looking for specific information (such as shipping) kept the response closer to the requested topic and left out unrelated details like size and price.
- Model's Dependency on Context: The model performed well when the content of the review matched the requested focus (e.g., customer service or price). However, when a review lacked relevant information, the model sometimes introduced irrelevant content, so outputs need to be handled carefully and validated.
- Conciseness vs. Detail: Setting strict word limits forced the model to omit certain details, which is useful for generating highly concise summaries but can lead to loss of important information. Depending on the use case, one might need to adjust the prompt to allow more space for details.
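
The validation step mentioned above can be partly automated. The sketch below shows two simple checks, both hypothetical helpers that are not part of the lesson: one verifies that a summary respects the word limit, and one flags dollar amounts that appear in a summary but not in the original review. Neither check detects every kind of hallucination. They only catch these two specific problems.

```python
import re
from typing import List

# Matches dollar amounts such as $49 or $29.99
_PRICE_PATTERN = re.compile(r"\$\d+(?:\.\d+)?")


def within_word_limit(summary: str, max_words: int) -> bool:
    """True when the summary has at most max_words whitespace-separated words."""
    return len(summary.split()) <= max_words


def unsupported_prices(summary: str, review: str) -> List[str]:
    """Return dollar amounts that appear in the summary but not in the review."""
    prices_in_review = set(_PRICE_PATTERN.findall(review))
    return [p for p in _PRICE_PATTERN.findall(summary) if p not in prices_in_review]


# Usage with the notebook's variables (requires responses from the API):
# if not within_word_limit(response, 20) or unsupported_prices(response, reviews[i]):
#     print("Check this summary manually:", response)
```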
### 5. Conclusion
Overall, this exercise demonstrated that large language models like GPT-3.5 can generate targeted summaries effectively. However, careful prompt engineering is required to ensure accurate and useful outputs. Hallucinations are a potential issue when the requested focus isn't directly addressed in the source text, but this can be mitigated by using specific prompts that aim to extract rather than summarize.






