# Summarizing text with ChatGPT
In this lesson, you will summarize text with a focus on specific topics.

## Setup

In [1]:
from openai import OpenAI
import os

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')

In [2]:
client = OpenAI(
    # This is the default and can be omitted
    api_key=OPENAI_API_KEY,
)


def get_completion(prompt, model="gpt-3.5-turbo"):
    """Send a single user prompt to the chat model and return the reply text."""
    # Andrew mentioned that the prompt/completion paradigm is preferable for this class.
    messages = [{"role": "user", "content": prompt}]
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0,  # degree of randomness of the model's output
    )
    return response.choices[0].message.content


## Text to summarize

In [3]:
prod_review = """
Got this panda plush toy for my daughter's birthday, \
who loves it and takes it everywhere. It's soft and \
super cute, and its face has a friendly look. It's \
a bit small for what I paid though. I think there \
might be other options that are bigger for the \
same price. It arrived a day earlier than expected, \
so I got to play with it myself before I gave it \
to her.
"""

## Summarize with a word/sentence/character limit

In [4]:
prompt = f"""
Your task is to generate a short summary of a product \
review from an ecommerce site. 

Summarize the review below, delimited by triple 
backticks, in at most 30 words. 

Review: ```{prod_review}```
"""

response = get_completion(prompt)
print(response)


Soft, cute panda plush loved by daughter, but smaller than expected for the price. Arrived early, friendly face.
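
A word limit in a prompt is a request, and the model may not follow it exactly, so it can be worth checking the length of the response yourself. The following is a minimal sketch that assumes a "word" is any whitespace-separated token; `count_words` and `within_limit` are hypothetical helper names, not part of the OpenAI library.

```python
def count_words(text):
    """Count whitespace-separated tokens (a rough notion of 'word')."""
    return len(text.split())


def within_limit(text, max_words):
    """Return True if the text has at most max_words words."""
    return count_words(text) <= max_words


# The summary printed above, copied here so the check runs without an API call.
summary = (
    "Soft, cute panda plush loved by daughter, but smaller than expected "
    "for the price. Arrived early, friendly face."
)
print(count_words(summary), within_limit(summary, 30))
```

In the notebook, the same check could be applied to `response` right after calling `get_completion`.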


## Summarize with a focus on shipping and delivery

In [6]:
prompt = f"""
Your task is to generate a short summary of a product \
review from an ecommerce site to give feedback to the \
Shipping department. 

Summarize the review below, delimited by triple 
backticks, in at most 30 words, and focusing on any aspects \
that mention shipping and delivery of the product. 

Review: ```{prod_review}```
"""

response = get_completion(prompt)
print(response)


The customer was pleased with the early delivery of the panda plush toy, but felt it was slightly small for the price paid.


## Summarize with a focus on price and value

In [7]:
prompt = f"""
Your task is to generate a short summary of a product \
review from an ecommerce site to give feedback to the \
pricing department, responsible for determining the \
price of the product.  

Summarize the review below, delimited by triple 
backticks, in at most 30 words, and focusing on any aspects \
that are relevant to the price and perceived value. 

Review: ```{prod_review}```
"""

response = get_completion(prompt)
print(response)


The panda plush toy is loved for its softness and cuteness, but some customers feel it's a bit small for the price.


#### Comment
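- Both focused summaries still include details unrelated to the requested focus: the shipping summary mentions size and price, and the pricing summary mentions softness and cuteness.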

## Try "extract" instead of "summarize"

In [8]:
prompt = f"""
Your task is to extract relevant information from \
a product review from an ecommerce site to give \
feedback to the Shipping department. 

From the review below, delimited by triple backticks, \
extract the information relevant to shipping and \
delivery. Limit to 30 words. 

Review: ```{prod_review}```
"""

response = get_completion(prompt)
print(response)

Feedback: The product arrived a day earlier than expected, which was a pleasant surprise. Customers may appreciate faster shipping times for future orders.


## Summarize multiple product reviews

In [9]:

review_1 = prod_review 

# review for a standing lamp
review_2 = """
Needed a nice lamp for my bedroom, and this one \
had additional storage and not too high of a price \
point. Got it fast - arrived in 2 days. The string \
to the lamp broke during the transit and the company \
happily sent over a new one. Came within a few days \
as well. It was easy to put together. Then I had a \
missing part, so I contacted their support and they \
very quickly got me the missing piece! Seems to me \
to be a great company that cares about their customers \
and products. 
"""

# review for an electric toothbrush
review_3 = """
My dental hygienist recommended an electric toothbrush, \
which is why I got this. The battery life seems to be \
pretty impressive so far. After initial charging and \
leaving the charger plugged in for the first week to \
condition the battery, I've unplugged the charger and \
been using it for twice daily brushing for the last \
3 weeks all on the same charge. But the toothbrush head \
is too small. I’ve seen baby toothbrushes bigger than \
this one. I wish the head was bigger with different \
length bristles to get between teeth better because \
this one doesn’t.  Overall if you can get this one \
around the $50 mark, it's a good deal. The manufactuer's \
replacements heads are pretty expensive, but you can \
get generic ones that're more reasonably priced. This \
toothbrush makes me feel like I've been to the dentist \
every day. My teeth feel sparkly clean! 
"""

# review for a blender
review_4 = """
So, they still had the 17 piece system on seasonal \
sale for around $49 in the month of November, about \
half off, but for some reason (call it price gouging) \
around the second week of December the prices all went \
up to about anywhere from between $70-$89 for the same \
system. And the 11 piece system went up around $10 or \
so in price also from the earlier sale price of $29. \
So it looks okay, but if you look at the base, the part \
where the blade locks into place doesn’t look as good \
as in previous editions from a few years ago, but I \
plan to be very gentle with it (example, I crush \
very hard items like beans, ice, rice, etc. in the \
blender first then pulverize them in the serving size \
I want in the blender then switch to the whipping \
blade for a finer flour, and use the cross cutting blade \
first when making smoothies, then use the flat blade \
if I need them finer/less pulpy). Special tip when making \
smoothies, finely cut and freeze the fruits and \
vegetables (if using spinach-lightly stew soften the \
spinach then freeze until ready for use-and if making \
sorbet, use a small to medium sized food processor) \
that you plan to use that way you can avoid adding so \
much ice if at all-when making your smoothie. \
After about a year, the motor was making a funny noise. \
I called customer service but the warranty expired \
already, so I had to buy another one. FYI: The overall \
quality has gone done in these types of products, so \
they are kind of counting on brand recognition and \
consumer loyalty to maintain sales. Got it in about \
two days.
"""

reviews = [review_1, review_2, review_3, review_4]

In [10]:
for i, review in enumerate(reviews):
    prompt = f"""
    Your task is to generate a short summary of a product \
    review from an ecommerce site. 

    Summarize the review below, delimited by triple \
    backticks, in at most 20 words. 

    Review: ```{review}```
    """

    response = get_completion(prompt)
    print(i, response, "\n")

0 Soft, cute panda plush loved by daughter, but small for price. Arrived early, friendly face. 

1 Great lamp with storage, fast delivery, excellent customer service for missing parts. Company cares about customers. 

2 Impressive battery life, small brush head, good deal for $50, generic replacement heads available, leaves teeth feeling clean. 

3 17-piece system on sale for $49, price increased later. Base quality not as good, motor issues after a year. 
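
The prompts in this lesson share one template and differ mainly in the word limit and the focus. One way to avoid copying the template is a small builder function. This is only a sketch: `build_summary_prompt` and its parameters are hypothetical names, and the wording is adapted from the prompts above rather than tested against the model.

```python
def build_summary_prompt(review, max_words=30, focus=None):
    """Build a summarization prompt; `focus` optionally names the aspects to emphasize."""
    limit = f"in at most {max_words} words"
    if focus:
        limit += f", focusing on any aspects that mention {focus}"
    return (
        "Your task is to generate a short summary of a product "
        "review from an ecommerce site.\n\n"
        "Summarize the review below, delimited by triple backticks, "
        f"{limit}.\n\n"
        f"Review: ```{review}```"
    )


# Made-up one-line review, used only to show the resulting prompt text.
prompt = build_summary_prompt(
    "Arrived a day early.", max_words=20, focus="shipping and delivery"
)
print(prompt)
```

The returned string can be passed to `get_completion` in the same way as the hand-written prompts.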



# Exercise
 - Complete the prompts similar to what we did in class. 
     - Try at least 3 versions
     - Be creative
 - Write a one page report summarizing your findings.
     - Were there variations that didn't work well, i.e., where GPT either hallucinated or gave a wrong answer?
 - What did you learn?


In this exercise I experimented with few-shot prompting to control the format of the model’s answers rather than just their content. I created three main prompt versions around the same basic task: “Given a short product review, classify its sentiment and extract a short justification.”

## Version 1 – Zero-shot, vague formatting

In the first version I only wrote instructions in natural language:

“Classify the sentiment as positive, negative, or neutral and briefly explain your reasoning. Respond in one line.”

This worked reasonably well for the sentiment part, but the formatting was inconsistent. Sometimes the model answered:

Positive – the reviewer liked the product overall.

Other times it used bullet points or long paragraphs, or it capitalized or modified the labels (“POSITIVE”, “very positive”). Parsing this output programmatically was brittle. The model also produced a few “creative” labels such as “mixed”, even though I specified only three options. This first version showed that instructions alone are not enough when the output must follow a strict format.
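
One way to make the parsing less brittle is to normalize the label before using it. The sketch below is an assumption about how that could be done, not what was used in the experiments: `parse_sentiment` is a hypothetical helper that returns the first allowed label it finds, and the sample strings echo the variations described above.

```python
import re

ALLOWED = ("positive", "negative", "neutral")


def parse_sentiment(answer):
    """Return the first allowed label found in the answer, or None.

    Case is ignored, so "POSITIVE" and "very positive" both map to
    "positive". Labels outside the allowed set, such as "mixed", give None.
    """
    words = re.findall(r"[a-z]+", answer.lower())
    for word in words:
        if word in ALLOWED:
            return word
    return None


for sample in ["Positive – the reviewer liked the product overall.",
               "POSITIVE",
               "very positive",
               "mixed"]:
    print(repr(sample), "->", parse_sentiment(sample))
```

This is deliberately crude: an answer such as "not positive" would still be read as "positive", so a `None` result or a surprising label should be checked by hand.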

## Version 2 – Few-shot with lightly structured format

For the second version I added three few-shot examples:

Review: "The headphones sound amazing."
Output: positive | clear sound and good performance

Review: "The phone keeps freezing."
Output: negative | device is unreliable

Review: "The keyboard is okay, nothing special."
Output: neutral | average experience


Then I gave a new review and ended the prompt with “Output:”. This stabilized the behavior considerably. The model used the pattern `label | explanation` in most responses, and it stopped inventing new labels because the examples made the allowed set clear. I still saw occasional deviations, e.g. extra text such as `Sentiment:` or stray punctuation. For human consumption this was fine, but machine reading would still need post-processing.
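
The few-shot prompt and the post-processing step can be sketched as follows. The three examples are the ones listed above; `build_few_shot_prompt` and `parse_output` are hypothetical helper names, and the new review and the sample reply are made up for illustration.

```python
EXAMPLES = [
    ("The headphones sound amazing.", "positive | clear sound and good performance"),
    ("The phone keeps freezing.", "negative | device is unreliable"),
    ("The keyboard is okay, nothing special.", "neutral | average experience"),
]


def build_few_shot_prompt(review, examples=EXAMPLES):
    """Format the examples as Review/Output pairs and end with an open 'Output:'."""
    blocks = [f'Review: "{text}"\nOutput: {output}' for text, output in examples]
    blocks.append(f'Review: "{review}"\nOutput:')
    return "\n\n".join(blocks)


def parse_output(line):
    """Split 'label | explanation', tolerating a leading 'Sentiment:' prefix."""
    line = line.strip()
    if line.lower().startswith("sentiment:"):
        line = line[len("sentiment:"):].strip()
    label, sep, explanation = line.partition("|")
    if not sep:
        raise ValueError(f"missing '|' separator: {line!r}")
    return label.strip().lower(), explanation.strip()


# Hypothetical new review and a hypothetical model reply with an extra prefix.
print(build_few_shot_prompt("Battery died after a week."))
print(parse_output("Sentiment: negative | battery failed quickly"))
```

Raising an error when the separator is missing makes format drift visible instead of silently producing a wrong label.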

## Version 3 – Few-shot with strict JSON schema

In the third version I switched the examples to JSON and defined a firm schema:

{
  "review": "I love this camera.",
  "sentiment": "positive",
  "rationale": "image quality and ease-of-use were praised"
}


I provided a couple of these and then said: “Return only a JSON object with the same keys.” This was the most successful version. The outputs followed the exact structure, the sentiments stayed within the allowed values, and the rationale field was concise. The main failure modes were a trailing comment after the JSON (e.g. `// thanks for the review!`) and inferred facts that weren’t in the text (“great battery life”), a milder form of hallucination. Tightening the instruction (“Do not add comments or extra fields. Do not infer product features that are not mentioned explicitly.”) reduced both.
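
A reply with trailing text can still be parsed if the code reads only the first JSON object and then checks it against the schema. The sketch below uses the standard library's `json.JSONDecoder.raw_decode`; `parse_json_reply` is a hypothetical helper, and the sample reply is made up to mimic the trailing-comment failure.

```python
import json

REQUIRED_KEYS = {"review", "sentiment", "rationale"}
ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}


def parse_json_reply(reply):
    """Parse the first JSON object in the reply and validate it against the schema.

    Text after the object (for example a trailing comment) is ignored.
    Raises ValueError if no object is found or the schema is violated.
    """
    start = reply.find("{")
    if start == -1:
        raise ValueError("no JSON object in reply")
    try:
        obj, _end = json.JSONDecoder().raw_decode(reply, start)
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON: {exc}") from exc
    if set(obj) != REQUIRED_KEYS:
        raise ValueError(f"unexpected keys: {sorted(obj)}")
    if obj["sentiment"] not in ALLOWED_SENTIMENTS:
        raise ValueError(f"unexpected sentiment: {obj['sentiment']!r}")
    return obj


# Hypothetical model reply with a trailing comment after the JSON object.
SAMPLE_REPLY = (
    '{"review": "I love this camera.", "sentiment": "positive", '
    '"rationale": "the reviewer loves the camera"} // thanks for the review!'
)
print(parse_json_reply(SAMPLE_REPLY))
```

Rejecting extra keys and unknown sentiment values catches schema drift early, before the result reaches downstream code.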

## Hallucinations and weak spots

Across the experiments, hallucinations appeared mostly in two forms:

- Label invention when the allowed label set wasn’t clear in the examples (“mixed”, “mostly positive”).
- Over-interpretation in the rationale or JSON fields, where the model added details that were plausible but not actually in the input.
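
Over-interpretation is hard to detect automatically, but a rough lexical check can surface candidates for manual review. The sketch below is a heuristic assumed for illustration, not a method used in the experiments: `ungrounded_terms` is a hypothetical helper that lists rationale words that never occur in the review, and the stopword list is an arbitrary small sample.

```python
import re

# Small, arbitrary stopword list; extend as needed.
STOPWORDS = {"a", "an", "and", "the", "is", "was", "were", "of", "to",
             "it", "this", "that", "for", "with", "in", "on"}


def _words(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def ungrounded_terms(review, rationale):
    """Return rationale words (minus stopwords) that never occur in the review."""
    return _words(rationale) - _words(review) - STOPWORDS


review = "I love this camera."
print(sorted(ungrounded_terms(review, "great battery life")))
print(sorted(ungrounded_terms(review, "love this camera")))
```

A legitimate paraphrase will also be flagged, so the result is a prompt for a human to look more closely, not a verdict.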

Few-shot examples anchored the model strongly. When the examples were clean and consistent, hallucinations dropped; when the examples themselves were slightly messy, the model happily copied that mess into its behavior.

## What I learned

Examples beat prose. The model treats the examples as a contract. A short schema with two or three good few-shot samples is more effective than a long paragraph of instructions.

Format is part of the task. If I want JSON, I should show JSON. If I want label | explanation, I should show that pattern. The model is much better at imitation than at obeying vague formatting rules.

Diversity of examples matters. Including positive, negative, and neutral samples, plus some tricky borderline reviews, made the classifier more robust. If all examples look similar, the model overfits to that pattern.

Guardrails must be explicit. Phrases like “don’t invent information” need to be tied to concrete behaviors (“do not mention product features that are not in the review”). Otherwise the model still leans toward being “helpfully imaginative.”

Few-shot prompting is basically programming with data. Designing the examples felt a lot like writing unit tests: I iterated on them until the model’s behavior matched what I wanted. Once the examples and schema were solid, the outputs became predictable enough to be plugged into downstream code.
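
The unit-test analogy can be made concrete with a small evaluation loop. This is a sketch under stated assumptions: `evaluate` is a hypothetical helper, and `keyword_classifier` is a naive stand-in for the real model call so that the example runs without an API key.

```python
def evaluate(classify, cases):
    """Run classify on each (review, expected_label) pair and collect mismatches."""
    failures = []
    for review, expected in cases:
        actual = classify(review)
        if actual != expected:
            failures.append((review, expected, actual))
    return failures


def keyword_classifier(review):
    """Stand-in for the model call; a deliberately naive keyword rule."""
    text = review.lower()
    if "amazing" in text or "love" in text:
        return "positive"
    if "freezing" in text or "broke" in text:
        return "negative"
    return "neutral"


# Labeled cases taken from the few-shot examples in Version 2.
CASES = [
    ("The headphones sound amazing.", "positive"),
    ("The phone keeps freezing.", "negative"),
    ("The keyboard is okay, nothing special.", "neutral"),
]

print(evaluate(keyword_classifier, CASES))
```

With the real model, `classify` would build the prompt, call `get_completion`, and parse the label; rerunning `evaluate` after each prompt change shows whether behavior regressed.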

Overall, the exercise showed me that few-shot prompting is not just about improving quality; it’s a powerful way to shape and stabilize the structure of the model’s responses.