## Homework 2, Part 1, CS678 Fall 2024

### This is due on **October 11th, 2024**. Please read the report PDF for submission instructions.
### **Note that this is only Part 1 of the homework.**

#### **IMPORTANT**: After copying this notebook to Google Drive or OneDrive, please paste a link to it into the PDF report ("Your Notebook solution"). To get a publicly accessible link, hit the *Share* button at the top right, then click "Get shareable link" and copy the result. If you fail to do this, you will receive no credit for this homework!

---

##### *How to do this problem set:*

- Some questions require writing Python code and computing results, and the rest require written answers. For coding problems, you will have to fill out all code blocks that say `YOUR CODE HERE`.

- This assignment is designed so that you can run all cells almost instantly. If it is taking longer than that, you have made a mistake in your code.

- Note that there are more questions in the PDF than the ones present in this notebook (which only includes the ones requiring code).

---

##### *How to submit this problem set:*
- After filling in the missing code, provide all the answers in the LaTeX template released with the assignment. Once again, you should create a shareable link to your completed notebook and paste it into the LaTeX report. The PDF report compiled from the LaTeX template should be submitted to Gradescope.
  
---

##### *Academic honesty*

- We will audit the notebooks from a set number of students, chosen at random. The audits will check that the code you wrote actually generates the answers in your PDF. If you turn in correct answers on your PDF without code that actually generates those answers, we will consider this a serious case of cheating. See the course page for honesty policies.

- We will also run automatic checks of notebooks for plagiarism. Copying code from others is also considered a serious case of cheating.

---

### Task 0: Environment Configuration

#### Step 1: Set up an OpenAI API key
Set up your OpenAI API key below. If you don't have one, create one on OpenAI's website: https://platform.openai.com/api-keys.
This assignment will mainly use **gpt-4o-mini**. Its pricing can be found here: https://openai.com/api/pricing/ (\$0.150 / 1M input tokens, \$0.600 / 1M output tokens).
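
As a rough sanity check on cost, the per-token prices quoted above can be turned into a small estimator. This is only a sketch: the function name `estimate_cost_usd` and the token counts in the usage line are made up for illustration, and the prices are the ones quoted above, which may change.

```python
# Prices quoted above for gpt-4o-mini, in USD per 1M tokens (may change over time).
INPUT_PRICE_PER_M = 0.150
OUTPUT_PRICE_PER_M = 0.600

def estimate_cost_usd(input_tokens, output_tokens):
    """Estimate the cost of one API call from its token counts."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000

# Hypothetical call: 100 prompt tokens and 1,000 generated tokens.
print(f"{estimate_cost_usd(100, 1000):.6f} USD")
```

At these prices, even a few hundred calls with story-length outputs should stay well under one dollar.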

**NOTE: Please delete your key after you complete this homework. This is your private key that should not be shared with others (including instructor/TA).**

In [1]:
OPENAI_API_KEY="__paste_key_here__"
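
Since the key must not be shared, one option is to keep it out of the notebook entirely and read it from an environment variable. The sketch below assumes the key was exported as `OPENAI_API_KEY` before the notebook was started; the helper name `load_api_key` is hypothetical.

```python
import os

def load_api_key(env_var="OPENAI_API_KEY"):
    """Return the API key stored in the given environment variable.

    Raises a clear error instead of silently continuing with a missing key.
    """
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Environment variable {env_var} is not set.")
    return key

# Usage (assumes the variable was exported beforehand):
# OPENAI_API_KEY = load_api_key()
```

With this approach there is no key left in the notebook to delete before sharing it.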

#### Step 2: Install the openai Python library

To complete this notebook, we will use the "openai" library for calling OpenAI's language models.

Execute the following command to pip install the library.

In [2]:
!pip install openai

Collecting openai
  Downloading openai-1.51.0-py3-none-any.whl.metadata (24 kB)
Collecting httpx<1,>=0.23.0 (from openai)
  Downloading httpx-0.27.2-py3-none-any.whl.metadata (7.1 kB)
Collecting jiter<1,>=0.4.0 (from openai)
  Downloading jiter-0.5.0-cp312-cp312-macosx_11_0_arm64.whl.metadata (3.6 kB)
Collecting httpcore==1.* (from httpx<1,>=0.23.0->openai)
  Downloading httpcore-1.0.6-py3-none-any.whl.metadata (21 kB)
Collecting h11<0.15,>=0.13 (from httpcore==1.*->httpx<1,>=0.23.0->openai)
  Downloading h11-0.14.0-py3-none-any.whl.metadata (8.2 kB)
Downloading openai-1.51.0-py3-none-any.whl (383 kB)
Downloading httpx-0.27.2-py3-none-any.whl (76 kB)
Downloading httpcore-1.0.6-py3-none-any.whl (78 kB)

Now, you should be able to run the following code, which gives a response to an input message "Hello!"

Specifically,
- `client = OpenAI(api_key=OPENAI_API_KEY)` creates a client that authenticates with your private API key;
- `client.chat.completions.create` calls OpenAI's chat completion function (https://platform.openai.com/docs/api-reference/chat/create);
    - Field `model` specifies the LLM version to use, here being "gpt-4o-mini"
    - Field `messages` contains the chat history which is used to prompt the LLM for a response, including
        - `{"role": "system", "content": "You are a helpful assistant."}` which specifies the system description (being a helpful assistant),
        - `{"role": "user", "content": "Hello!"}` which specifies the user input "Hello!"

The returned chat completion object (https://platform.openai.com/docs/api-reference/chat/object) includes one possible response (`choices[0]`), whose message content is "Hello! How can I assist you today?"

You can also have a look at: https://cookbook.openai.com/examples/how_to_format_inputs_to_chatgpt_models

In [3]:
from openai import OpenAI
client = OpenAI(api_key=OPENAI_API_KEY)

completion = client.chat.completions.create(
  model="gpt-4o-mini",
  messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ]
)

print(completion.choices[0].message.content)

Hello! How can I assist you today?


In this assignment, we will use this chat completion function to prompt gpt-4o-mini for a few tasks. For convenience, let's define the following wrapper function, called "ChatCompletion", on top of OpenAI's chat completion.

Note that in the function, we have included two additional arguments to the API call:
- `n_samples` is passed as the argument `n` to `client.chat.completions.create`, which specifies the number of samples requested from the LLM;
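- `top_p` is passed as the argument `top_p` to `client.chat.completions.create`, which specifies the probability mass to sample from: only the most likely tokens whose cumulative probability reaches `top_p` are considered (e.g., 0.5 means the top 50% of the probability mass).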
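
To make the effect of `top_p` concrete, here is a minimal sketch of nucleus (top-p) filtering on a toy next-token distribution. It illustrates the general technique only and is not OpenAI's implementation; the function name `nucleus_filter` and the token probabilities are invented for illustration.

```python
def nucleus_filter(probs, top_p):
    """Keep the smallest set of most-likely tokens whose total probability
    reaches top_p, then renormalize so the kept probabilities sum to 1."""
    assert 0 < top_p <= 1
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}

# Toy next-token distribution (made-up numbers).
toy_probs = {"the": 0.5, "a": 0.3, "one": 0.15, "zebra": 0.05}

print(nucleus_filter(toy_probs, top_p=1.0))  # all four tokens stay candidates
print(nucleus_filter(toy_probs, top_p=0.5))  # only "the" survives
```

With `top_p=0.5` the sampler can only ever pick one token in this toy case, which is why lower values tend to give less varied text.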

In [5]:
def ChatCompletion(prompt, n_samples=1, top_p=1.0, return_object=False):
    """Prompt gpt-4o-mini, print the response(s), and optionally return the completion object."""
    assert n_samples >= 1
    assert 0 < top_p <= 1
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
        ],
        n=n_samples,  # number of responses to sample
        top_p=top_p   # nucleus sampling threshold
    )

    # Print a single response directly, or number the responses when there are several.
    if n_samples == 1:
        print("Response: ", completion.choices[0].message.content)
    else:
        print("The call returns %d responses:\n" % n_samples)
        for i in range(n_samples):
            print("*Response %d*: " % i, completion.choices[i].message.content)
            print("-" * 10)

    if return_object:
        return completion
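
When `return_object=True`, the returned completion can be post-processed instead of only printed. The helper below is a sketch that collects the text of every sample into a list; `extract_texts` is a hypothetical name, and it relies only on the `choices[i].message.content` access pattern already used above. The stand-in object built with `SimpleNamespace` is fabricated so that the snippet runs without an API call.

```python
from types import SimpleNamespace

def extract_texts(completion):
    """Return the message content of every choice in a chat completion."""
    return [choice.message.content for choice in completion.choices]

# Stand-in for a real completion object (fabricated, for illustration only).
fake_completion = SimpleNamespace(
    choices=[
        SimpleNamespace(message=SimpleNamespace(content="first sample")),
        SimpleNamespace(message=SimpleNamespace(content="second sample")),
    ]
)
print(extract_texts(fake_completion))

# With the real API (not run here):
# texts = extract_texts(ChatCompletion(prompt, n_samples=2, return_object=True))
```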

### Task 1: Story Generation with Different Sampling Strategies

In the first question, we will learn about the different generation effects of a sampling approach called "nucleus sampling". We will try different configurations of it by varying `top_p`.


### Question 1 (5 points)

Can you use the ChatCompletion function to generate a story about an Indian student studying abroad (e.g., at George Mason University)? Please use the default setting and generate only one story.

In [6]:
prompt = "Write a story about an Indian student studying at George Mason University, detailing their experiences, challenges, and personal growth."
ChatCompletion(prompt)

Response:  **Title: Across Cultures: Mitali’s Journey at George Mason University**

Mitali Sharma had always dreamt of studying in the United States. Growing up in Pune, India, she spent countless nights poring over images of university campuses and reading success stories of Indian students who had carved out vibrant lives abroad. When she received her acceptance letter from George Mason University, she felt as if part of her dreams had come true. It was the beginning of her adventure in a new land.

Mitali arrived in Fairfax, Virginia, in late August, just as the leaves were beginning to change color. The campus was bustling, vibrant with the diversity of its student body. As a young woman hailing from a middle-class background in India, she embraced the thrill that came with a new environment. However, she soon found herself navigating the complexities of a different culture.

Her first challenge was the language barrier. Although Mitali was fluent in English, the American accent an

### Question 2 (5 points)

Now, can you do the same but try to get 2 generations with `top_p` set to be 1?

In [8]:
ChatCompletion(prompt, n_samples=2, top_p=1.0)

The call returns 2 responses:

*Response 0*:  Aarya Patel arrived at George Mason University with a heart full of ambition and a suitcase packed with her dreams. Coming from the vibrant city of Pune, India, where the air was rich with the scents of spices and the laughter of friends, she found herself staring at the wide, grassy lawns and brick buildings of the Fairfax campus—so different yet fascinating.

At first, the excitement of being in a new country engulfed Aarya. She had always excelled in her studies back in India, and she was determined to make the most of her business administration degree. The initial weeks were a whirlwind of orientation sessions, making new friends from all over the world, and learning how to navigate the campus. She reveled in late-night study sessions at the library, coffee dates at the local café, and exploring the historical sites of Washington, D.C. Each new experience painted her life with vibrant colors, and she felt on top of the world.

However,

### Question 3 (5 points)

How about 2 generations with `top_p` set to be 0.5?

In [9]:
ChatCompletion(prompt, n_samples=2, top_p=0.5)

The call returns 2 responses:

*Response 0*:  **Title: A Journey of Dreams**

**Chapter 1: New Beginnings**

Arjun Patel stepped off the plane at Dulles International Airport, his heart racing with a mix of excitement and anxiety. The sprawling campus of George Mason University awaited him, a place where he hoped to transform his dreams into reality. Hailing from a small town in Gujarat, India, Arjun had always envisioned studying abroad, and now, as he stood in the bustling airport, he felt the weight of that dream on his shoulders.

As he navigated through the airport, he recalled the conversations with his family back home. His parents had sacrificed so much to provide him with this opportunity. They had instilled in him the values of hard work and perseverance, and he was determined to make them proud.

**Chapter 2: Cultural Shock**

The first few weeks at George Mason were a whirlwind. Arjun was in awe of the campus's size and diversity. Students from all over the world mingled, e

### Question 4 (10 points)

What did you observe from Q1 - Q3? Did the different `top_p` configurations give you the same or different results? Why?
<br />
**Answer:**
_The different `top_p` configurations in Q1 to Q3 produced varying levels of response diversity. In Q1 and Q2, with `top_p=1`, all possible token choices were considered, allowing for more randomness in the responses. In Q3, with `top_p=0.5`, sampling was restricted to the smallest set of most-likely tokens whose cumulative probability reaches 0.5, leading to more focused and less varied outputs. This difference arises because `top_p` controls how much of the token probability mass is considered when generating text: lower values like 0.5 restrict diversity, while higher values like 1 allow more variation, influencing the creativity and uniqueness of the generated stories._

### Task 2: gpt-4o-mini for Solving Mathematical Problems

The second task we will try is about solving a math problem.

The math problem we consider is:

> Melanie is a door-to-door saleswoman. She sold a third of her vacuum cleaners at the green house, 2 more to the red house, and half of what was left at the orange house. If Melanie has 5 vacuum cleaners left, how many did she start with?

For your reference, the correct answer should be 18, following the reasoning chain below:

> First multiply the five remaining vacuum cleaners by two to find out how many Melanie had before she visited the orange house: 5 * 2 = 10;
> Then add two to figure out how many vacuum cleaners she had before visiting the red house: 10 + 2 = 12;
> Now we know that 2/3 * x = 12, where x is the number of vacuum cleaners Melanie started with. We can find x by dividing each side of the equation by 2/3, which produces x = 18
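
The backward reasoning above is easy to check numerically. The sketch below works backward from the 5 remaining vacuum cleaners and then replays the sales forward to confirm the result; the variable and function names are chosen here for illustration.

```python
from fractions import Fraction

# Work backward from what is left at the end of the day.
left_at_end = 5
before_orange = left_at_end * 2                  # orange house took half
before_red = before_orange + 2                   # red house took 2
start = Fraction(before_red) / Fraction(2, 3)    # green house took a third

def remaining_after_sales(start_count):
    """Replay the three sales forward, starting from start_count vacuum cleaners."""
    x = Fraction(start_count)
    x -= x / 3      # green house: sells a third
    x -= 2          # red house: sells 2 more
    return x / 2    # orange house: sells half of what is left

print(start)                         # 18
print(remaining_after_sales(start))  # 5
```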


### Question 5 (5 points)
Can you use the ChatCompletion function and prompt gpt-4o-mini to work out the problem?

In [10]:
math_problem = 'Melanie is a door-to-door saleswoman. She sold a third of her vacuum cleaners at the green house, 2 more to the red house, and half of what was left at the orange house. If Melanie has 5 vacuum cleaners left, how many did she start with?'

ChatCompletion(math_problem)

Response:  Let \( x \) be the total number of vacuum cleaners Melanie started with.

1. She sold a third of her vacuum cleaners at the green house:
   \[
   \text{Vacuum cleaners sold at green house} = \frac{x}{3}
   \]
   
   After selling at the green house, the number of vacuum cleaners left is:
   \[
   x - \frac{x}{3} = \frac{2x}{3}
   \]

2. She sold 2 vacuum cleaners at the red house:
   \[
   \text{Vacuum cleaners left after red house} = \frac{2x}{3} - 2
   \]

3. Then she sold half of what was left at the orange house:
   \[
   \text{Vacuum cleaners sold at orange house} = \frac{1}{2} \left(\frac{2x}{3} - 2\right)
   \]

   The number of vacuum cleaners left after selling at the orange house is:
   \[
   \left(\frac{2x}{3} - 2\right) - \frac{1}{2} \left(\frac{2x}{3} - 2\right)
   \]

   Let's simplify that:
   \[
   \text{Remaining} = \left(\frac{2x}{3} - 2\right) - \frac{1}{2} \left(\frac{2x}{3} - 2\right)
   \]
   This can be rewritten as:
   \[
   \text{Remaining} = \left(\

Did gpt-4o-mini solve the problem correctly? If not, where did it go wrong?
<br />
**Answer:** _Yes, it solved the problem correctly!_

### Question 6 (10 points)

Now, try to get 10 solutions from gpt-4o-mini with `top_p` set to 0.7.

In [11]:
ChatCompletion(math_problem, n_samples=10, top_p=0.7)

The call returns 10 responses:

*Response 0*:  Let \( x \) be the number of vacuum cleaners Melanie started with.

1. She sold a third of her vacuum cleaners at the green house:
   \[
   \text{Sold at green house} = \frac{x}{3}
   \]
   After this sale, the number of vacuum cleaners left is:
   \[
   x - \frac{x}{3} = \frac{2x}{3}
   \]

2. Next, she sold 2 more to the red house:
   \[
   \text{Sold at red house} = 2
   \]
   After this sale, the number of vacuum cleaners left is:
   \[
   \frac{2x}{3} - 2
   \]

3. Then, she sold half of what was left at the orange house:
   \[
   \text{Sold at orange house} = \frac{1}{2} \left( \frac{2x}{3} - 2 \right)
   \]
   The number of vacuum cleaners left after this sale is:
   \[
   \left( \frac{2x}{3} - 2 \right) - \frac{1}{2} \left( \frac{2x}{3} - 2 \right)
   \]

   We can simplify this:
   \[
   \frac{2x}{3} - 2 - \frac{1}{2} \left( \frac{2x}{3} - 2 \right) = \frac{2x}{3} - 2 - \frac{1}{2} \cdot \frac{2x}{3} + 1
   \]
   Simplifying furth

You may see multiple different answers produced by gpt-4o-mini. Summarize the answers in the table in the report. Did gpt-4o-mini get all of the solutions right? If there are any mistakes, what are the common errors that gpt-4o-mini makes?
<br/>
**Answer:**
Summary of the responses given by gpt-4o-mini:
<br/>
**Melanie's Vacuum Cleaner Puzzle**
<br/>
**Problem Setup**
<br/>
Let $x$ be the number of vacuum cleaners Melanie started with. She sells a third of them at the green house, 2 more at the red house, and half of what is left at the orange house, so the number remaining at the end is:
$$
\text{vacuum cleaners left} = \frac{1}{2}\left(\frac{2x}{3} - 2\right) = \frac{x}{3} - 1
$$

The objective is to determine $x$, given that she has 5 vacuum cleaners left at the end.
<br/>
**Solving Step by Step:**
<br/>
Setting the remaining count equal to 5:
$$
\frac{x}{3} - 1 = 5 \quad\Rightarrow\quad \frac{x}{3} = 6 \quad\Rightarrow\quad x = 18
$$

Working backward gives the same result: $5 \times 2 = 10$ before the orange house, $10 + 2 = 12$ before the red house, and $12 \div \frac{2}{3} = 18$ before the green house.
<br/>
**Result:**
<br/>
Melanie started with **18 vacuum cleaners**.
All 10 responses consistently concluded that Melanie started with 18 vacuum cleaners. The reasoning and calculations in each response, despite varying presentations, lead to the same conclusion.
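
Filling in the summary table by hand gets tedious with 10 samples. One possible shortcut, sketched below, is to pull the last number out of each response and count how often each answer occurs. The helper names and the three example strings are made up for illustration, and taking the last number is a heuristic that can misfire when a response ends with something other than the final answer.

```python
import re
from collections import Counter

def last_number(text):
    """Return the last integer or decimal appearing in text, or None."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text)
    return matches[-1] if matches else None

def tally_answers(responses):
    """Count how often each extracted final answer occurs."""
    return Counter(last_number(r) for r in responses)

# Made-up response endings, standing in for real model outputs.
examples = [
    "So x = 18. Melanie started with 18 vacuum cleaners.",
    "Therefore, she started with 18.",
    "Thus the answer is 21",
]
counts = tally_answers(examples)
print(counts.most_common())
```

The most frequent answer can also serve as a simple majority vote over the samples.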

### Question 7 (10 points)

Can you try other ways to prompt gpt-4o-mini so that it gives correct solutions more reliably? Be creative!

It may be helpful to design your prompt with multiple math problems in mind, so we provide another one below:

The problem is:
> John drives for 3 hours at a speed of 60 mph and then turns around because he realizes he forgot something very important at home.  He tries to get home in 4 hours but spends the first 2 hours in standstill traffic.  He spends the next half-hour driving at a speed of 30mph, before being able to drive the remaining time of the 4 hours going at 80 mph.  How far is he from home at the end of those 4 hours?

For your reference, the correct answer is 45:
> When he turned around he was 3*60=180 miles from home
> He was only able to drive 4-2=2 hours in the first four hours.
> In half an hour he goes 30*.5=15 miles. He then drives another 2-.5=1.5 hours. In that time he goes 80*1.5=120 miles. So he drove 120+15=135 miles
> So he is 180-135=45 miles away from home
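
The reference solution can be replayed as plain arithmetic. This is just a numeric check of the steps above; the variable names are chosen here for illustration.

```python
# Outbound leg: 3 hours at 60 mph.
distance_from_home = 3 * 60            # 180 miles

# Return attempt: 4 hours in total, the first 2 stuck in standstill traffic.
driving_hours = 4 - 2                  # 2 hours of actual driving
slow_leg = 30 * 0.5                    # half an hour at 30 mph -> 15 miles
fast_leg = 80 * (driving_hours - 0.5)  # remaining 1.5 hours at 80 mph -> 120 miles

remaining = distance_from_home - (slow_leg + fast_leg)
print(remaining)  # 45.0
```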

Include your prompt design and the answer on the report. Why do you think it works or not?

In [13]:
combined_math_problem="""
    You are tasked with solving two mathematical problems step by step. Please ensure you provide detailed calculations and arrive at a final answer for each problem. 
    
    Problem 1:
    Melanie is a door-to-door saleswoman. She sold a third of her vacuum cleaners at the green house, 2 more to the red house, and half of what was left at the orange house. If Melanie has 5 vacuum cleaners left, how many did she start with?

    Problem 2:
    John drives for 3 hours at a speed of 60 mph and then turns around because he realizes he forgot something very important at home. He tries to get home in 4 hours but spends the first 2 hours in standstill traffic. He spends the next half-hour driving at a speed of 30mph, before being able to drive the remaining time of the 4 hours going at 80 mph. How far is he from home at the end of those 4 hours?

    Please provide clear calculations for both problems and the final answers.
    """
ChatCompletion(combined_math_problem)

Response:  Let’s solve each of the problems step by step.

### Problem 1:
Melanie's sales can be broken down as follows:

Let \( x \) represent the number of vacuum cleaners Melanie started with.

1. **Sales at the green house**: She sold a third of her vacuum cleaners at the green house.
   \[
   \text{Vacuum cleaners sold at green house} = \frac{1}{3}x
   \]
   Remaining vacuum cleaners after green house:
   \[
   x - \frac{1}{3}x = \frac{2}{3}x
   \]

2. **Sales at the red house**: She sold 2 more at the red house.
   \[
   \text{Remaining after red house} = \frac{2}{3}x - 2
   \]

3. **Sales at the orange house**: She sold half of what was left at the orange house.
   \[
   \text{Vacuum cleaners sold at orange house} = \frac{1}{2}\left(\frac{2}{3}x - 2\right)
   \]
   Remaining vacuum cleaners after orange house:
   \[
   \left(\frac{2}{3}x - 2\right) - \frac{1}{2}\left(\frac{2}{3}x - 2\right)
   \]

   To simplify this:
   Substitute \( y = \frac{2}{3}x - 2 \):
   \[
   \text{Rema

**Prompt design:**

_Prompt:_

_You are tasked with solving two mathematical problems step by step. Please ensure you provide detailed calculations and arrive at a final answer for each problem._

_Problem 1: Melanie is a door-to-door saleswoman. She sold a third of her vacuum cleaners at the green house, 2 more to the red house, and half of what was left at the orange house. If Melanie has 5 vacuum cleaners left, how many did she start with?_

_Problem 2: John drives for 3 hours at a speed of 60 mph and then turns around because he realizes he forgot something very important at home. He tries to get home in 4 hours but spends the first 2 hours in standstill traffic. He spends the next half-hour driving at a speed of 30 mph, before being able to drive the remaining time of the 4 hours going at 80 mph. How far is he from home at the end of those 4 hours?_

_Please provide clear calculations for both problems and the final answers._

**Analysis of Prompt Effectiveness:**

- Clarity: The prompt clearly states what is expected, encouraging detailed calculations and final answers. This reduces ambiguity for the model.
- Step-by-Step Approach: By explicitly asking for step-by-step solutions, the prompt guides the model to break down complex problems into manageable parts, which enhances accuracy.
- Contextualization: Including realistic scenarios (saleswoman and a driving situation) helps engage the model more effectively, likely leading to better understanding and processing of the problems.
- Structured Format: The structured format (Problem 1 and Problem 2) allows the model to distinguish between tasks, reducing the chances of confusion.

In conclusion, this prompt design works well due to its clarity, structured approach, and encouragement for detailed explanations, fostering better problem-solving performance from the model.
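
Another direction worth trying is a few-shot prompt, where one solved problem is shown as a worked example before the new question. The sketch below only assembles the prompt string; `build_few_shot_prompt` is a hypothetical helper, the wording of the template is one choice among many, and whether it actually improves accuracy would need to be checked by sampling several solutions as in Question 6.

```python
def build_few_shot_prompt(examples, question):
    """Build a prompt that shows worked (problem, solution) pairs
    before asking the model to solve a new problem step by step."""
    parts = ["Solve the final problem step by step, following the style of the worked examples."]
    for i, (problem, solution) in enumerate(examples, start=1):
        parts.append(f"Example {i}:\nProblem: {problem}\nSolution: {solution}")
    parts.append(f"Problem: {question}\nSolution:")
    return "\n\n".join(parts)

# Worked example based on the reference reasoning for the driving problem.
worked_examples = [
    (
        "John drives for 3 hours at a speed of 60 mph and then turns around because "
        "he realizes he forgot something very important at home. He tries to get home "
        "in 4 hours but spends the first 2 hours in standstill traffic. He spends the "
        "next half-hour driving at a speed of 30mph, before being able to drive the "
        "remaining time of the 4 hours going at 80 mph. How far is he from home at "
        "the end of those 4 hours?",
        "When he turned around he was 3*60=180 miles from home. He only drove for "
        "4-2=2 hours. In half an hour he goes 30*0.5=15 miles, and in the remaining "
        "1.5 hours he goes 80*1.5=120 miles, so he drove 15+120=135 miles. "
        "He is 180-135=45 miles from home. The answer is 45.",
    )
]
melanie_problem = (
    "Melanie is a door-to-door saleswoman. She sold a third of her vacuum cleaners "
    "at the green house, 2 more to the red house, and half of what was left at the "
    "orange house. If Melanie has 5 vacuum cleaners left, how many did she start with?"
)

few_shot_prompt = build_few_shot_prompt(worked_examples, melanie_problem)
print(few_shot_prompt)

# With the real API (not run here):
# ChatCompletion(few_shot_prompt, n_samples=10, top_p=0.7)
```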

#### Acknowledgement: The math problems used in this notebook come from the GSM8K dataset: Training Verifiers to Solve Math Word Problems, Cobbe et al., 2021. https://huggingface.co/datasets/gsm8k