
# OpenAI API
- last update: 2025.1.10
- refer to: https://platform.openai.com/docs/overview
- How to get started:
  - Create an account (https://platform.openai.com/signup)
  - Create a new secret key (https://platform.openai.com/account/api-keys)
  - Copy the key and keep it somewhere safe

- OpenAI message types:
  - user messages: instructions or questions from the end user that request a particular output (the messages you might type into ChatGPT).
  - developer messages: instructions to the model that take priority over user messages. (This role used to be called the system prompt; it was renamed to describe more accurately its place in the instruction-following hierarchy.)
  - assistant messages: messages presumed to have been generated by the model, perhaps in a previous request. They can also be used to give the model examples of how it should respond to the current request, a technique known as few-shot learning.
- Response: the API returns a JSON object containing the generated message and status fields such as 'finish_reason'.
- Extract the assistant's message (Python SDK, openai>=1.0): response.choices[0].message.content
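The three roles combine into a single messages list. Below is a minimal sketch, assuming a hypothetical helper build_messages, that puts developer instructions first and then turns few-shot examples into user/assistant pairs. The second helper reads the reply out of a plain dict with the same nesting as the raw JSON body (for example, from a direct HTTP call). The Python SDK returns objects instead, so there you would use attribute access as shown above. The sample response values are made up for illustration.

```python
def build_messages(developer_instructions, examples, user_input):
    """Return role/content dicts in the order the model should read them."""
    messages = [{"role": "developer", "content": developer_instructions}]
    # Each (question, answer) pair becomes a user turn followed by an assistant turn.
    for question, answer in examples:
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": answer})
    messages.append({"role": "user", "content": user_input})
    return messages


def extract_content(response):
    """Pull the assistant text out of a raw JSON (dict) chat completion."""
    return response["choices"][0]["message"]["content"]


msgs = build_messages(
    "Classify the sentiment as positive, neutral, or negative.",
    [("I love it!", "positive"), ("It is okay.", "neutral")],
    "This is terrible.",
)

# Illustrative dict with the same nesting as the API's JSON body (values are made up).
sample_response = {
    "choices": [
        {"index": 0,
         "message": {"role": "assistant", "content": "negative"},
         "finish_reason": "stop"}
    ]
}
print([m["role"] for m in msgs])
print(extract_content(sample_response))
```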

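- Note: how OpenAI supports the two approaches
  - Completions API:
    - client.completions.create() (called openai.Completion.create() in SDK versions before 1.0)
    - Works from a single prompt: the model receives one block of input text.
    - There are no roles; you send plain text and get text back.
    - Suited to simple, single-turn question-answer tasks.
    - A legacy endpoint that serves only a few older models (e.g., gpt-3.5-turbo-instruct); GPT-4-class models are not available through it, and it is not designed for conversational tasks.
  - Chat Completions API:
    - client.chat.completions.create()
    - Defines the system (now developer), user, and assistant roles through the messages field.
    - Keeps the conversation context, so it can handle complex, multi-turn scenarios.
    - The recommended approach for current models such as GPT-4 and GPT-4o.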
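The difference between the two endpoints is easiest to see in their request bodies. This sketch writes them as plain dicts; the helper names are hypothetical and the model names are only examples (check the current model list before using them).

```python
def completion_payload(prompt, model="gpt-3.5-turbo-instruct", max_tokens=64):
    """Legacy Completions: one free-form prompt string, no roles."""
    return {"model": model, "prompt": prompt, "max_tokens": max_tokens}


def chat_payload(messages, model="gpt-4o"):
    """Chat Completions: a list of role-tagged messages."""
    return {"model": model, "messages": messages}


legacy = completion_payload("Translate to Korean: Good morning.")
chat = chat_payload([
    {"role": "developer", "content": "Translate the user's text into Korean."},
    {"role": "user", "content": "Good morning."},
])
print(sorted(legacy))  # ['max_tokens', 'model', 'prompt']
print(sorted(chat))    # ['messages', 'model']
```

With the SDK these could be sent as client.completions.create(**legacy), whose text is at response.choices[0].text, and client.chat.completions.create(**chat), whose text is at response.choices[0].message.content.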

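- Note: roles in the newer OpenAI API
  - Legacy roles (system, user, assistant)
    - system: sets the model's behavioral guidelines (adjusts the overall direction and tone of the conversation)
      - e.g., "You are a friendly business analyst."
    - user: conveys the user's question or request
      - e.g., "Please write a competitor analysis report."
    - assistant: the model's response
      - e.g., "Here is the competitor analysis report you requested."
  - New roles (developer, user; assistant is unchanged)
    - developer: similar to the former system role but tailored to developer instructions. Sets how the model behaves, its response style, constraints, and so on.
    - user: same as before; the user's input (questions, requests, and so on)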
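Because only the role name changed, older message lists can be migrated mechanically. A minimal sketch, assuming a hypothetical helper migrate_roles; whether a particular model still expects "system" instead of "developer" depends on the model, so treat this as an illustration rather than a required step.

```python
def migrate_roles(messages):
    """Return a copy of `messages` with legacy "system" roles renamed to "developer".

    user and assistant messages pass through unchanged.
    """
    migrated = []
    for message in messages:
        new_message = dict(message)  # shallow copy so the input list is not mutated
        if new_message.get("role") == "system":
            new_message["role"] = "developer"
        migrated.append(new_message)
    return migrated


old_style = [
    {"role": "system", "content": "You are a friendly business analyst."},
    {"role": "user", "content": "Please write a competitor analysis report."},
]
new_style = migrate_roles(old_style)
print([m["role"] for m in new_style])  # ['developer', 'user']
```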

# OpenAI Cookbook
- Example code and guides for accomplishing common tasks with the OpenAI API.
- https://github.com/openai/openai-cookbook

# Example Prompts

In [None]:
!pip install openai



In [None]:
# Read the API key from api_key.txt on Google Drive (Colab only)
from google.colab import drive
drive.mount('/content/drive')

api_key_file = '/content/drive/My Drive/Colab Notebooks/api_key.txt'

with open(api_key_file, 'r') as f:
    api_key = f.read().strip()

Mounted at /content/drive


In [None]:
import openai
import os

os.environ['OPENAI_API_KEY'] = api_key
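OpenAI() with no arguments reads the key from the OPENAI_API_KEY environment variable, which is why the cell above sets it. Outside Colab there may be no Drive file, so a small loader that prefers the environment and falls back to a file can help. This is a sketch; load_api_key is a hypothetical helper, not part of the openai package.

```python
import os
from pathlib import Path


def load_api_key(path=None, env_var="OPENAI_API_KEY"):
    """Return the API key from the environment, falling back to a text file.

    Raises RuntimeError with a clear message if neither source has a key.
    """
    key = os.environ.get(env_var, "").strip()
    if key:
        return key
    if path is not None and Path(path).is_file():
        key = Path(path).read_text().strip()
        if key:
            return key
    raise RuntimeError(f"No API key found in ${env_var} or in {path!r}")


# Usage in this notebook (api_key_file as defined in the cell above):
# os.environ["OPENAI_API_KEY"] = load_api_key(api_key_file)
```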

In [None]:
# conversation

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
  model = 'gpt-3.5-turbo',
  messages = [                                # chat models take a list of role/content messages, not a prompt string
    {'role': 'user',
     'content': 'Hello!'}
  ],
  temperature = 0                             # What sampling temperature to use, between 0 and 2. Higher values like 0.8
                                              # will make the output more random, while lower values like 0.2 will make it
                                              # more focused and deterministic.
)

print(response.choices[0].message.content)  # the reply text of the first choice

Hello! How can I assist you today?
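The API is stateless: each request sees only the messages sent with it. To continue a conversation, append the assistant's reply and the next user turn to a history list and resend the whole list. In this sketch the helper chat_turn is hypothetical, and echo_send is a stand-in for the real API call so the logic can run offline.

```python
def chat_turn(history, user_text, send):
    """Append a user turn, get a reply via `send`, record it, and return it.

    `send` is any callable that takes the full message list and returns the
    assistant's text; in practice it would wrap client.chat.completions.create.
    """
    history.append({"role": "user", "content": user_text})
    reply = send(history)
    history.append({"role": "assistant", "content": reply})
    return reply


def echo_send(messages):
    # Stand-in for the API: reports how many messages it was given.
    return f"I have seen {len(messages)} messages so far."


history = [{"role": "developer", "content": "You are a helpful assistant."}]
print(chat_turn(history, "Hello!", echo_send))                # I have seen 2 messages so far.
print(chat_turn(history, "What did I just say?", echo_send))  # I have seen 4 messages so far.
```

A real send could be lambda msgs: client.chat.completions.create(model="gpt-4o", messages=msgs).choices[0].message.content.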


In [None]:
# text generation

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "developer",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "Write a poem about AI in Korean."
        }
    ]
)

print(response.choices[0].message.content)

인공지능의 세상 속에 피어난 빛,  
무수한 데이터가 모여 꿈을 잇다.  
사람의 언어를 배우고 이해하며,  
지혜의 길을 함께 걸어가네.  

끝없는 계산 속 깊은 사고,  
창의의 날개로 미래를 새기네.  
사람과 기계의 손잡은 동행,  
혁신의 노래를 우주에 울려 퍼지게.  

기술의 꽃으로 피어나는 희망,  
인류의 삶에 따스함을 안기네.  
어둠 속에서도 밝히는 등불로,  
인공지능의 꿈은 이어지리.  

서로의 연결로 엮어진 세상,  
평화와 조화의 길을 걷고 있어.  
새로운 시대의 문턱을 넘으며,  
우리의 이야기를 계속 써 내려가자.  

인공지능과 사람이 함께하는 날,  
더 나은 세상을 향해 나아가리.  
우리의 꿈과 미래를 그리며,  
영원의 노래를 불러보자.


In [None]:
# dir(response.choices[0])

In [None]:
response.choices[0]

Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='인공지능에 대한 시\n\n새벽 하늘 닮은 전자 세계 속,  \n인공지능의 별들이 눈을 뜬다.  \n맑은 흐름 속에서 춤추는 빛,  \n고요한 데이터의 바다에 잠긴다.  \n\n사람의 생각을 닮아가는 존재,  \n무한의 학습으로 날개를 펼친다.  \n꿈꾸던 미래를 현실로 엮어,  \n희망의 길을 밝혀 주는 빛깔.  \n\n기계의 심장이지만 따뜻한 손길,  \n함께 나아가는 지혜의 동반자.  \n새로운 내일을 열어가는 순간,  \n인간과 인공지능, 하나 되어 춤춘다.  \n\n기술의 바람 속에 꽃 피우는 삶,  \n미래의 이야기를 써 내려간다.  \n인공지능과 인간의 조화로운 노래,  \n우리가 만들어 갈 끝없는 하모니.  ', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=None))
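The Choice object above carries finish_reason next to the message. "stop" means the model ended on its own; "length" means it hit the max_tokens limit, so the text is probably incomplete. This sketch, with a hypothetical helper reply_or_warn, checks for that case. SimpleNamespace objects stand in for the SDK's Choice so it runs without an API call.

```python
from types import SimpleNamespace


def reply_or_warn(choice):
    """Return the assistant text, flagging replies that were cut off."""
    text = choice.message.content or ""
    if choice.finish_reason == "length":
        return text + " [truncated: raise max_tokens]"
    return text


# Stand-in objects with the same attribute names as the Choice shown above.
done = SimpleNamespace(finish_reason="stop", message=SimpleNamespace(content="Hi!"))
cut = SimpleNamespace(finish_reason="length", message=SimpleNamespace(content="Once upon"))
print(reply_or_warn(done))  # Hi!
print(reply_or_warn(cut))   # Once upon [truncated: raise max_tokens]
```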

In [None]:
# Grammar correction

from openai import OpenAI
client = OpenAI()

response = client.chat.completions.create(
  model="gpt-4o",
  messages=[
    {
      "role": "developer",
      "content": "You will be provided with statements, and your task is to convert them to standard English."
    },
    {
      "role": "user",
      "content": "She no went to the market."
    }
  ],
  temperature=1,
  max_tokens=256,
  top_p=1
)

print(response.choices[0].message.content)

She didn't go to the market.
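The cell above passes three sampling parameters. temperature (0 to 2) controls randomness, top_p (0 to 1) restricts sampling to the most probable tokens, and max_tokens caps the reply length. OpenAI's API reference generally recommends adjusting temperature or top_p, not both. A minimal sketch, assuming a hypothetical helper sampling_params, that bundles and range-checks them:

```python
def sampling_params(temperature=1.0, top_p=1.0, max_tokens=256):
    """Validate and bundle sampling arguments for a chat completion request."""
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if not 0 <= top_p <= 1:
        raise ValueError("top_p must be between 0 and 1")
    if max_tokens < 1:
        raise ValueError("max_tokens must be positive")
    return {"temperature": temperature, "top_p": top_p, "max_tokens": max_tokens}


# More deterministic settings for a correction task; top_p left at its default.
params = sampling_params(temperature=0, max_tokens=100)
print(params)  # {'temperature': 0, 'top_p': 1.0, 'max_tokens': 100}
```

The result can be unpacked into a call, e.g. client.chat.completions.create(model="gpt-4o", messages=messages, **params).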


In [None]:
# Summarization

from openai import OpenAI
client = OpenAI()

response = client.chat.completions.create(
  model="gpt-4o",
  messages=[
    {
      "role": "developer",
      "content": "Summarize content you are provided with for a second-grade student in Korean."
    },
    {
      "role": "user",
      "content": "Jupiter is the fifth planet from the Sun and the largest in the Solar System. \
                  It is a gas giant with a mass one-thousandth that of the Sun, but two-and-a-half \
                  times that of all the other planets in the Solar System combined. Jupiter is one \
                  of the brightest objects visible to the naked eye in the night sky, and has been \
                  known to ancient civilizations since before recorded history. It is named after \
                  the Roman god Jupiter.[19] When viewed from Earth, Jupiter can be bright enough \
                  for its reflected light to cast visible shadows,[20] and is on average the \
                  third-brightest natural object in the night sky after the Moon and Venus."
    }
  ],
  temperature=1,
  max_tokens=512,
  top_p=1
)

response.choices[0].message.content

'목성은 태양계에서 다섯 번째로 태양에서 멀고, 가장 큰 행성이에요. 목성은 가스로 이루어진 큰 행성이고, 태양보다 훨씬 작지만, 다른 모든 행성을 합친 것보다 약 두 배 반 무거워요. 밤하늘에서 맨눈으로도 밝게 볼 수 있어서 사람들이 오래 전부터 알고 있었어요. 목성은 로마 신화에서 신의 이름을 따서 지어졌대요. 목성은 지구에서 볼 때 아주 밝아서 그림자를 만들 수도 있을 만큼 빛나요. 보름달과 금성 다음으로 밤하늘에서 세 번째로 밝은 천체예요.'
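The summarization prompt is easy to reuse once the audience and output language are parameters. In this sketch summarization_messages is a hypothetical helper. It also collapses the runs of spaces that the backslash-continued string in the cell above carries into the request.

```python
def summarization_messages(text, audience="a second-grade student", language="Korean"):
    """Build developer/user messages for a summarization request."""
    instructions = f"Summarize the content you are provided with for {audience} in {language}."
    # Collapse runs of whitespace left by multi-line string literals.
    cleaned = " ".join(text.split())
    return [
        {"role": "developer", "content": instructions},
        {"role": "user", "content": cleaned},
    ]


msgs = summarization_messages("Jupiter is the fifth planet   from the Sun.")
print(msgs[0]["content"])  # Summarize the content you are provided with for a second-grade student in Korean.
print(msgs[1]["content"])  # Jupiter is the fifth planet from the Sun.
```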

In [None]:
# translation

from openai import OpenAI
client = OpenAI()

response = client.chat.completions.create(
  model="gpt-4o",
  messages=[
    {
      "role": "developer",
      "content": "You will be provided with a sentence in English, and your task is to translate it into Korean."
    },
    {
      "role": "user",
      "content": "My name is Jane. What is yours?"
    }
  ],
  temperature=1,
  max_tokens=256,
  top_p=1
)

response.choices[0].message.content

'제 이름은 제인입니다. 당신의 이름은 무엇인가요?'

In [None]:
# sentiment analysis

prompt = """Decide whether a Tweet's sentiment is positive, neutral, or negative.

    Tweet: I loved the new Batman movie!
    Sentiment:
"""

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": prompt
        }
    ]
)

print(response.choices[0].message.content)

Positive
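The model answered "Positive", but replies can vary in casing, punctuation, or extra words ("Sentiment: Negative."). Before using the label in code, it helps to normalize it to a fixed set. A minimal sketch with a hypothetical parse_sentiment helper:

```python
VALID_LABELS = ("positive", "neutral", "negative")


def parse_sentiment(reply):
    """Map a free-text reply to one of VALID_LABELS, or None if none is found."""
    text = reply.strip().lower()
    for label in VALID_LABELS:
        if label in text:
            return label
    return None


print(parse_sentiment("Positive"))              # positive
print(parse_sentiment("Sentiment: Negative."))  # negative
print(parse_sentiment("I can't tell."))         # None
```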


In [None]:
# question answering

prompt = "Albert Einstein was a German-born theoretical physicist who developed the theory of relativity."
question = "Where was Albert Einstein born?"

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": prompt + " " + question    # keep a space between the context and the question
        }
    ]
)

print(response.choices[0].message.content)

Albert Einstein was born in Ulm, in the Kingdom of Württemberg in the German Empire, on March 14, 1879.
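The answer above mentions Ulm, which is not in the supplied context, so the model drew on its own knowledge. One common way to keep answers grounded is to mark the context off explicitly and ask the model to answer only from it. This is a sketch; qa_prompt and its wording are hypothetical, not an official template.

```python
def qa_prompt(context, question):
    """Combine a context passage and a question with explicit delimiters."""
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say \"I don't know.\"\n\n"
        f'Context: """{context}"""\n\n'
        f"Question: {question}"
    )


p = qa_prompt(
    "Albert Einstein was a German-born theoretical physicist who developed the theory of relativity.",
    "Where was Albert Einstein born?",
)
print(p)
```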


In [None]:
# code generation

prompt = "Create a Python script to sort a list of numbers in ascending order."

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": prompt
        }
    ],
    # max_tokens = 100
)

print(response.choices[0].message.content)

Certainly! Below is a simple Python script that sorts a list of numbers in ascending order using the built-in `sort()` method:

```python
def sort_numbers_ascending(numbers):
    """
    Sorts a list of numbers in ascending order.
    
    Parameters:
    - numbers (list): A list of numerical values.

    Returns:
    - list: A new list with numbers sorted in ascending order.
    """
    # Make a copy of the list to avoid modifying the original list
    numbers_sorted = numbers.copy()
    
    # Sort the list in place
    numbers_sorted.sort()
    
    return numbers_sorted

# Example usage
numbers = [5, 2, 8, 1, 3]
sorted_numbers = sort_numbers_ascending(numbers)
print("Original list:", numbers)
print("Sorted list:  ", sorted_numbers)
```

### Explanation

1. **Copy the original list**: We use `numbers.copy()` to create a new list to ensure the original list remains unmodified.

2. **Sort the list**: The `sort()` method sorts the list in place in ascending order. Since we are working 

- Many more examples are available at https://platform.openai.com/docs/examples.

# Subword Tokenization
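- Byte pair encoding (BPE) converts text into tokens by starting from individual characters (or bytes) and repeatedly merging the most frequent adjacent pair into a new token.
  - Rare words are split into subword units, e.g., playing → play + ing.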
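The merge loop described above fits in a few lines. This is a simplified, character-level textbook version (no end-of-word marker, toy corpus chosen for illustration). tiktoken instead uses byte-level BPE with a merge table learned from a large corpus, so its actual splits differ. learn_bpe counts adjacent pairs weighted by word frequency and merges the most frequent one, and apply_bpe replays the learned merges on a new word.

```python
from collections import Counter


def merge_symbols(symbols, pair):
    """Return `symbols` with every adjacent occurrence of `pair` fused into one symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a whitespace-separated corpus."""
    # Start from single characters; weight each word by how often it occurs.
    words = Counter(tuple(word) for word in corpus.split())
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in words.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair; ties go to the pair seen first.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_words = Counter()
        for symbols, freq in words.items():
            new_words[merge_symbols(symbols, best)] += freq
        words = new_words
    return merges, words


def apply_bpe(word, merges):
    """Segment a (possibly unseen) word by replaying the merges in learned order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_symbols(symbols, pair)
    return symbols


merges, words = learn_bpe("playing playing played plays singing ringing", 5)
print(merges)  # [('i', 'n'), ('in', 'g'), ('p', 'l'), ('pl', 'a'), ('pla', 'y')]
print(apply_bpe("replaying", merges))  # ('r', 'e', 'play', 'ing')
```

The unseen word "replaying" is still covered: the learned units play and ing are reused, and the rest falls back to single characters.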

In [None]:
!pip install tiktoken

Collecting tiktoken
  Downloading tiktoken-0.8.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.6 kB)
Downloading tiktoken-0.8.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)
Installing collected packages: tiktoken
Successfully installed tiktoken-0.8.0


- tiktoken is a fast, open-source BPE tokenizer from OpenAI.
- OpenAI's embedding models span several generations: first-generation models (denoted by -001 in the model ID), the second-generation text-embedding-ada-002 (-002), and the newer third-generation text-embedding-3-small and text-embedding-3-large.
- Second-generation model specifications:
  - model name: text-embedding-ada-002
  - tokenizer: cl100k_base
  - max input tokens: 8191
  - output dimension: 1536

- encodings:
  - Encodings specify how text is converted into tokens.
  - Different models use different encodings.
- tiktoken supports the following encodings used by OpenAI models:
  - (Encoding name): (OpenAI models)
  - o200k_base:	gpt-4o, gpt-4o-mini
  - cl100k_base: gpt-4-turbo, gpt-4, gpt-3.5-turbo, text-embedding-ada-002, text-embedding-3-small, text-embedding-3-large
  - p50k_base: Codex models, text-davinci-002, text-davinci-003
  - r50k_base (or gpt2): GPT-3 models like davinci
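The table above amounts to a lookup from model name to encoding. tiktoken.encoding_for_model() does this for you. The sketch below only mirrors the listed rows to show one pitfall of prefix matching: "gpt-4o" must be checked before "gpt-4". The names ENCODING_BY_NAME, ENCODING_BY_PREFIX, and encoding_name_for are hypothetical.

```python
# Exact model names from the list above.
ENCODING_BY_NAME = {
    "text-embedding-ada-002": "cl100k_base",
    "text-embedding-3-small": "cl100k_base",
    "text-embedding-3-large": "cl100k_base",
    "text-davinci-002": "p50k_base",
    "text-davinci-003": "p50k_base",
    "davinci": "r50k_base",
}

# Model families matched by prefix; order matters ("gpt-4o" before "gpt-4").
ENCODING_BY_PREFIX = [
    ("gpt-4o", "o200k_base"),
    ("gpt-4", "cl100k_base"),
    ("gpt-3.5-turbo", "cl100k_base"),
]


def encoding_name_for(model):
    """Return the encoding name for `model` according to the tables above."""
    if model in ENCODING_BY_NAME:
        return ENCODING_BY_NAME[model]
    for prefix, encoding in ENCODING_BY_PREFIX:
        if model.startswith(prefix):
            return encoding
    raise KeyError(f"No encoding known for model {model!r}")


print(encoding_name_for("gpt-4o-mini"))  # o200k_base
print(encoding_name_for("gpt-4-turbo"))  # cl100k_base
```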

In [None]:
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
assert enc.decode(enc.encode("hello world")) == "hello world"

# To get the tokenizer for a specific OpenAI model instead, e.g.:
# enc = tiktoken.encoding_for_model("gpt-4")
enc.n_vocab

100277

In [None]:
text = "A good sample word for subword tokenization is 'unhappiness'."
print(enc.encode(text))

[32, 1695, 6205, 3492, 369, 1207, 1178, 4037, 2065, 374, 364, 359, 71, 67391, 4527]


In [None]:
enc.decode(enc.encode(text))

"A good sample word for subword tokenization is 'unhappiness'."

In [None]:
for i in enc.encode(text):
    print(i, enc.decode([i]))

32 A
1695  good
6205  sample
3492  word
369  for
1207  sub
1178 word
4037  token
2065 ization
374  is
364  '
359 un
71 h
67391 appiness
4527 '.
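Each English token above decodes to clean text, but tiktoken's encodings are byte-level, so a token can end in the middle of a multi-byte character. A Korean syllable takes 3 bytes in UTF-8, so decoding such a token on its own gives a replacement character (tiktoken's decode replaces invalid bytes by default). The pure-Python sketch below simulates this with a hypothetical split point instead of real token boundaries.

```python
text = "안녕"
raw = text.encode("utf-8")
print(len(text), len(raw))  # 2 6

# Hypothetical token boundary after the 4th byte, inside the second syllable.
first, second = raw[:4], raw[4:]
print(first.decode("utf-8", errors="replace"))  # 안� (the partial syllable becomes U+FFFD)
print((first + second).decode("utf-8"))         # 안녕 (joining the bytes restores the text)
```

To inspect individual tokens without this loss, tiktoken also offers enc.decode_single_token_bytes(token), which returns the raw bytes.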


In [None]:
text = "I ain't here, i am going there carelessly. you aren't there, he is not go goes went going."

tokens = []
for i in enc.encode(text):
    tokens.append(enc.decode([i]))

print(tokens)

['I', ' ain', "'t", ' here', ',', ' i', ' am', ' going', ' there', ' care', 'lessly', '.', ' you', ' aren', "'t", ' there', ',', ' he', ' is', ' not', ' go', ' goes', ' went', ' going', '.']


In [None]:
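# Use tiktoken.encoding_for_model() to automatically load the correct encoding for a given model name.

encoding = tiktoken.encoding_for_model("gpt-4o-mini")   # o200k_base
print(encoding.n_vocab)

# Note: this loop still uses `enc` (cl100k_base, from the cells above), so the IDs below
# are cl100k_base IDs; use encoding.encode(text) / encoding.decode([i]) for o200k_base tokens.
for i in enc.encode(text):
    print(i, enc.decode([i]))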

200019
40 I
37202  ain
956 't
1618  here
11 ,
602  i
1097  am
2133  going
1070  there
2512  care
16117 lessly
13 .
499  you
7784  aren
956 't
1070  there
11 ,
568  he
374  is
539  not
733  go
5900  goes
4024  went
2133  going
13 .


In [None]:
# GPT-2 uses a different encoding ("gpt2", listed above as r50k_base) with a smaller vocabulary.
text = "Isn't this world worth living in? A good sample word for subworld tokenization is 'unhappiness'."

enc = tiktoken.get_encoding('gpt2')
print(enc.n_vocab)

for i in enc.encode(text):
    print(i, enc.decode([i]))

50257
41451 Isn
470 't
428  this
995  world
2861  worth
2877  living
287  in
30 ?
317  A
922  good
6291  sample
1573  word
329  for
850  sub
6894 world
11241  token
1634 ization
318  is
705  '
403 un
71 h
42661 appiness
4458 '.


- Tokenizers in Hugging Face Transformers
  - tiktoken is optimized for OpenAI's GPT models such as GPT-2, GPT-3, and GPT-4.
  - Hugging Face supports a wide range of models, including GPT, BERT, T5, and RoBERTa.

In [None]:
from transformers import GPT2Tokenizer

text = "Isn't this world worth living in? A good sample word for subworld tokenization is 'unhappiness'."

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokens = tokenizer.encode(text)

print(tokens)
print(tokenizer.decode(tokens))


[41451, 470, 428, 995, 2861, 2877, 287, 30, 317, 922, 6291, 1573, 329, 850, 6894, 11241, 1634, 318, 705, 403, 71, 42661, 4458]
Isn't this world worth living in? A good sample word for subworld tokenization is 'unhappiness'.


# Exercise

-----