# Lab: Trying out prompting techniques with GPT

In this lab, we use the GPT API to work with the prompting techniques covered in the lecture. First, install the required libraries.

In [None]:
!pip install openai datasets

Collecting openai
  Downloading openai-1.51.2-py3-none-any.whl.metadata (24 kB)
Collecting datasets
  Downloading datasets-3.0.1-py3-none-any.whl.metadata (20 kB)
Collecting httpx<1,>=0.23.0 (from openai)
  Downloading httpx-0.27.2-py3-none-any.whl.metadata (7.1 kB)
Collecting jiter<1,>=0.4.0 (from openai)
  Downloading jiter-0.6.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (5.2 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting xxhash (from datasets)
  Downloading xxhash-3.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting multiprocess (from datasets)
  Downloading multiprocess-0.70.17-py310-none-any.whl.metadata (7.2 kB)
Collecting httpcore==1.* (from httpx<1,>=0.23.0->openai)
  Downloading httpcore-1.0.6-py3-none-any.whl.metadata (21 kB)
Collecting h11<0.15,>=0.13 (from httpcore==1.*->httpx<1,>=0.23.0->openai)
  Downloading h11-0.14.0-py3-none-any.whl

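Next, obtain an OpenAI API key as follows:
1. Create an account at platform.openai.com and log in.
2. Go to `Dashboard > API keys` and click `+ Create new secret key`.
3. Enter a name, then click `Create secret key` to generate the key.
4. Copy the generated key and paste it in as "OPENAI_API_KEY" below (the cell reads it with `userdata.get`, so in Colab it is stored as a secret under that name).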

In [None]:
from openai import OpenAI
from google.colab import userdata

client = OpenAI(api_key=userdata.get('OPENAI_API_KEY'))

ModuleNotFoundError: No module named 'openai'

As far as we know, a newly created account receives roughly $10 of credit.
If you have no credit, topping up about $1 is enough to complete the rest of this lab without problems.

The following is an example of generating text with the GPT API.

In [None]:
temperature = 0.5  # Temperature used when sampling each token.
max_tokens = 4096  # Maximum number of tokens to generate.
n = 5  # Number of answers to generate for the same query.
frequency_penalty = 0.0  # Option that discourages the same words from being repeated.
user_prompt = "List the subjects in Euclidean plane geometry."

message=[{"role": "user", "content": user_prompt}]
response = client.chat.completions.create(
    model="gpt-4o",
    messages=message,
    n=n,
    max_tokens=max_tokens,
    temperature=temperature,
    frequency_penalty=frequency_penalty
)
print(response.choices[0].message.content)

Euclidean plane geometry, a branch of mathematics introduced by the ancient Greek mathematician Euclid, focuses on the properties and relations of points, lines, angles, and figures in a two-dimensional plane. Here are the primary subjects typically covered in Euclidean plane geometry:

1. **Points and Lines**
   - Definitions and basic properties
   - Line segments and rays
   - Collinearity and coplanarity

2. **Angles**
   - Types of angles (acute, right, obtuse, straight, reflex)
   - Angle measurement and units (degrees, radians)
   - Angle relationships (adjacent, complementary, supplementary, vertical)

3. **Triangles**
   - Classification by sides (scalene, isosceles, equilateral)
   - Classification by angles (acute, right, obtuse)
   - Properties and theorems (Pythagorean theorem, triangle inequality theorem)
   - Congruence and similarity (SSS, SAS, ASA, AAS, HL)
   - Special points (centroid, orthocenter, circumcenter, incenter)

4. **Quadrilaterals**
   - Types (parallelog

The model lists topics related to Euclidean plane geometry.
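
To make the role of `temperature` more concrete, here is a minimal sketch of temperature-scaled softmax, the standard way a temperature is applied when sampling tokens. The logits are made-up toy values and `softmax_with_temperature` is a hypothetical helper. How the OpenAI API applies the parameter internally is not covered in this lab, so treat this as an illustration only.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw logits into sampling probabilities at a given temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
for temp in (0.2, 0.5, 1.0, 2.0):
    probs = softmax_with_temperature(toy_logits, temp)
    print(temp, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the highest-scoring token, while higher temperatures flatten the distribution, which is why repeated samples (`n > 1`) tend to differ more at higher temperatures.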

Next, let's do sentiment analysis on movie reviews.

In [None]:
from datasets import load_dataset

imdb = load_dataset("imdb")
imdb['train'] = imdb['train'].shuffle()

After loading the data as above, let's generate text as follows.

In [None]:
message=[{"role": "user", "content": "Is the following movie review positive or negative?: " + imdb["test"][123]["text"]}]
response = client.chat.completions.create(
    model="gpt-4o",
    messages=message,
    n=5,
    max_tokens=max_tokens,
    temperature=temperature,
    frequency_penalty=frequency_penalty
)

print(imdb["test"][123]["label"])
for i in range(5):
    print(response.choices[i].message.content)

0
The movie review is negative. The reviewer criticizes the film for being too long, having a directionless script, and lacking a central story. They also mention that the characters are not well developed and that the film's multiple endings are unsatisfying. Overall, the tone of the review is one of disappointment and frustration.
The movie review is negative. The reviewer criticizes the film for being too long, having an aimless script, and lacking a cohesive story. They also mention that the characters are poorly developed and that the film has multiple unsatisfying endings. Overall, the tone and content of the review indicate dissatisfaction with the movie.
The movie review is negative. The reviewer criticizes the film for being too long, having a script that goes nowhere, and using unrelated events that don't further the story. They also mention that the characters are poorly developed and rely on monologues from third parties for explanation. Additionally, the reviewer points ou

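As the results show, all five sampled outputs correctly predict that the review is negative (label 0).
With GPT we can compute logits, as we did in the previous lab, but we can also do sentiment analysis purely through text generation.
This approach is also a somewhat better fit for interacting with users.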

One drawback, however, is that the answers are free-form, which makes automatic evaluation difficult.
To standardize the format, we give the model few-shot examples as follows.

In [None]:
prompt = "Is the following movie review positive or negative?\n\n"
for i in range(10):
    text = imdb["train"][i]["text"]
    label = imdb["train"][i]["label"]

    sub_prompt = "Review: " + text
    if label == 0:
        sub_prompt += "\nAnswer: negative\n\n"
    else:
        sub_prompt += "\nAnswer: positive\n\n"
    prompt += sub_prompt

prompt += "Review: "
print(prompt)

Is the following movie review positive or negative?

Review: in 1976 i had just moved to the us from ceylon. i was 23, and had been married for a little over three years, and was beginning to come out as a lesbian. i saw this movie on an old black and white TV, with terrible reception, alone, and uninterrupted, in an awakening that seemed like an echo of the story. i was living in a small house in tucson arizona, and it was summertime... like everyone else here, i never forgot the feelings the images of this story called forth, and its residue of fragile magic, and i have treasured a hope that i would see it again someday. i'll keep checking in. i also wish that someone would make a movie of shirley verel's 'the other side of venus'. it also has some of the same delicacy and persistent poignancy...
Answer: positive

Review: This is the second Baby Burlesk short to be released, and probably the most popular one, is a spoof of the 1926 silent film What Price Glory.<br /><br />I watched t

As shown above, we used the training data to build a prompt that steers the model toward output of the form "Review: <review>\nAnswer: <answer>\n\n".
Let's check whether the answer actually follows the format we want.

In [None]:
message=[{"role": "user", "content": prompt + imdb["test"][123]["text"] + "\nAnswer: "}]
response = client.chat.completions.create(
    model="gpt-4o",
    messages=message,
    n=1,
    max_tokens=max_tokens,
    temperature=temperature,
    frequency_penalty=frequency_penalty
)
print(response.choices[0].message.content)

negative


The model answers in exactly the format we specified.
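
Once the answers follow a fixed format, automatic evaluation becomes a simple string comparison. The sketch below assumes the model replies with `negative` or `positive` (as in the few-shot examples) and uses the IMDB convention from the code above, where label 0 is negative and 1 is positive. `parse_label`, `accuracy`, and the sample answers are hypothetical, introduced only for illustration.

```python
def parse_label(answer):
    """Map a formatted answer to an IMDB label (0 = negative, 1 = positive)."""
    cleaned = answer.strip().lower().rstrip(".")
    if cleaned == "negative":
        return 0
    if cleaned == "positive":
        return 1
    return None  # the model did not follow the expected format


def accuracy(answers, labels):
    """Fraction of answers whose parsed label matches the gold label."""
    if not labels:
        raise ValueError("labels must not be empty")
    predictions = [parse_label(a) for a in answers]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical model answers; the last one ignores the format and counts as wrong.
answers = ["negative", "Positive.", "The review is negative."]
labels = [0, 1, 0]
print(accuracy(answers, labels))
```

Counting unparseable answers as errors is a deliberately strict choice: it measures both whether the prediction is right and whether the format was respected.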

Next, let's try out CoT and PAL on some harder math problems.
Prepare the problems as follows.

In [None]:
problem_easy = "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?"
# Raw string, so the LaTeX backslashes are kept literally and do not trigger invalid-escape warnings.
problem_hard = r"At time \(t = 0\), starting simultaneously from the origin and moving along a straight line, the velocities of two points \(P\) and \(Q\) at time \(t\) (\(t \geq 0\)) are given by \[v_1(t) = t^2 - 6t + 5, \quad v_2(t) = 2t - 7\] respectively. Let the distance between points \(P\) and \(Q\) at time \(t\) be denoted by \(f(t)\). The function \(f(t)\) is increasing on the interval \([0, a]\), decreasing on the interval \([a, b]\), and increasing on the interval \([b, \infty)\). What is the total distance that point \(Q\) has moved from time \(t = a\) to \(t = b\)? (Note that \(0 < a < b\))"

First, let's solve `problem_easy` with GPT-4o.

In [None]:
assistant_prompt = "You are a high school student who is a mathematician."
user_prompt = problem_easy  #  + "Solve the problem using code interpreter step by step, even in every sub-step."

# The persona instruction is sent with the "system" role, which is the role meant for instructions like this.
message=[{"role": "system", "content": assistant_prompt}, {"role": "user", "content": user_prompt}]
response = client.chat.completions.create(
    model="gpt-4o",
    messages=message,
    n=1,
    max_tokens=max_tokens,
    temperature=temperature,
    frequency_penalty=frequency_penalty
)
print(response.choices[0].message.content)

To determine the total number of tennis balls Roger has now, we need to follow these steps:

1. Start with the number of tennis balls Roger initially has:
   \[
   5 \text{ tennis balls}
   \]

2. Calculate the number of tennis balls in the cans he buys. Each can contains 3 tennis balls, and he buys 2 cans:
   \[
   2 \text{ cans} \times 3 \text{ tennis balls per can} = 6 \text{ tennis balls}
   \]

3. Add the number of tennis balls from the cans to the initial number of tennis balls:
   \[
   5 \text{ tennis balls} + 6 \text{ tennis balls} = 11 \text{ tennis balls}
   \]

Therefore, Roger now has:
\[
11 \text{ tennis balls}
\]


As you can see, GPT-4o carries out CoT without any special prompting.
It also arrives at the correct result.
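
For contrast with the chain-of-thought answer above, a PAL-style solution expresses the same reasoning as a program and leaves the arithmetic to the interpreter. The block below is written by hand as an illustration of what such a program could look like for `problem_easy`. It is not output from the model, and the function and variable names are hypothetical.

```python
def solve_tennis_balls():
    """PAL-style solution: each reasoning step becomes a named variable."""
    initial_balls = 5   # Roger starts with 5 tennis balls
    cans_bought = 2     # he buys 2 more cans
    balls_per_can = 3   # each can holds 3 tennis balls
    bought_balls = cans_bought * balls_per_can
    return initial_balls + bought_balls


answer = solve_tennis_balls()
print(answer)
```

The reasoning steps are the same as in the text answer, but the final number comes from running the code rather than from the model's own arithmetic.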

Next, let's solve the harder problem with PAL.
With some additional prompting, the model produces output that implements Python code, as shown below.

In [None]:
user_prompt = problem_hard + " Solve the problem by using code interpreter in every sub-step."

message=[{"role": "system", "content": assistant_prompt}, {"role": "user", "content": user_prompt}]
response = client.chat.completions.create(
    model="gpt-4o",
    messages=message,
    n=1,
    max_tokens=max_tokens,
    temperature=0.0,
    frequency_penalty=frequency_penalty
)
print(response.choices[0].message.content)

To solve this problem, we need to follow these steps:

1. **Find the positions of points \(P\) and \(Q\) as functions of time \(t\)** by integrating their velocity functions.
2. **Determine the critical points \(a\) and \(b\)** where the function \(f(t)\) changes its behavior from increasing to decreasing and vice versa.
3. **Calculate the total distance that point \(Q\) has moved from time \(t = a\) to \(t = b\)**.

Let's start with step 1.

### Step 1: Find the positions of points \(P\) and \(Q\)

The velocity functions are given by:
\[ v_1(t) = t^2 - 6t + 5 \]
\[ v_2(t) = 2t - 7 \]

To find the positions, we integrate these velocity functions with respect to \(t\).

#### Position of point \(P\):
\[ x_1(t) = \int (t^2 - 6t + 5) \, dt \]

#### Position of point \(Q\):
\[ x_2(t) = \int (2t - 7) \, dt \]

Let's integrate these functions.

```python
import sympy as sp

t = sp.symbols('t')

# Velocity functions
v1 = t**2 - 6*t + 5
v2 = 2*t - 7

# Integrate to find positions
x1 = sp.integr

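The chat completions call used above does not actually execute code; it only returns the code as text.
If you paste the same prompt into ChatGPT, however, you can see it run the Python code it wrote, check the result, and then continue its answer.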
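
Since the API returns the generated program only as text, one way to finish the PAL loop yourself is to pull the fenced Python blocks out of the reply and run them locally. The sketch below is a minimal illustration: `extract_python_blocks`, `run_blocks`, and `hypothetical_reply` are made-up names, and the reply string is hand-written rather than real model output. Running `exec` on model-generated code is risky, so only do this in a sandboxed environment such as a throwaway Colab runtime.

```python
import contextlib
import io
import re

FENCE = "`" * 3  # a markdown code fence, built this way to keep the example readable
CODE_BLOCK = re.compile(FENCE + r"(?:python)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_python_blocks(reply):
    """Return the contents of every fenced code block found in the reply."""
    return [block.strip() for block in CODE_BLOCK.findall(reply)]


def run_blocks(blocks):
    """Execute the blocks in one shared namespace and return what they printed."""
    namespace = {}
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        for block in blocks:
            exec(block, namespace)  # unsafe for untrusted code; sandbox it
    return buffer.getvalue()


# Hand-written stand-in for a model reply (an assumption, not real API output).
hypothetical_reply = (
    "Let's compute it.\n"
    + FENCE + "python\n"
    + "total = 5 + 2 * 3\n"
    + "print(total)\n"
    + FENCE + "\n"
    + "So Roger has that many tennis balls."
)

blocks = extract_python_blocks(hypothetical_reply)
print(run_blocks(blocks))
```

With a real response, the same helpers could be applied to `response.choices[0].message.content`. Blocks share one namespace so that later steps can reuse variables defined in earlier ones.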