<a href="https://colab.research.google.com/github/jkf87/CNU-industrialAI/blob/main/%5BRefactoring%5DPrompt_Engineering.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Getting Started with Prompt Engineering
by DAIR.AI | Elvis Saravia
Translated and refactored by [코난쌤](https://www.youtube.com/@conanssam)


This notebook contains examples and exercises for learning about prompt engineering.

All examples use the [OpenAI API](https://platform.openai.com/) with the default settings `temperature=0.7` and `top_p=1`.
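`top_p` (nucleus sampling) keeps only the smallest set of most-likely tokens whose probabilities add up to at least `top_p`, then renormalizes them before sampling. The sketch below is a conceptual illustration on a made-up three-token distribution, not OpenAI's actual sampler; it shows why `top_p=1` leaves the distribution unchanged.

```python
from typing import List


def top_p_filter(probs: List[float], top_p: float) -> List[float]:
    """Zero out tokens outside the nucleus and renormalize the rest."""
    # Visit token indices from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = set(), 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= top_p:  # the nucleus is large enough; stop adding tokens
            break
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]


probs = [0.5, 0.3, 0.2]  # hypothetical next-token probabilities
print(top_p_filter(probs, 1.0))  # every token survives
print(top_p_filter(probs, 0.6))  # only the two most likely tokens survive
```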

---

## 1. Prompt Engineering Basics


Objectives
- Load the libraries
- Review the formats
- Cover basic prompts
- Review common use cases

Below we load the necessary libraries, utilities, and configuration.

In [1]:
%%capture
# update or install the necessary libraries
!pip install --upgrade openai
!pip install --upgrade langchain
!pip install --upgrade python-dotenv

In [None]:
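# @title To run the legacy code: uninstall openai and install version 0.28.1
# (the refactored cells below use the openai>=1.0 client and will not work with 0.28.1)
!pip uninstall -y openai
!pip install openai==0.28.1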

In [2]:
import openai
import os
import IPython
from langchain.llms import OpenAI
from dotenv import load_dotenv

Use Colab's Secrets feature to store your API keys (`OPENAI_API_KEY` and `SERPER_API_KEY`).

![Setting up secret keys](https://i.imgur.com/OCPB4eh.png)

In [3]:
from google.colab import userdata

load_dotenv()  # loads variables from a .env file if one exists (useful outside Colab)

# API configuration
openai.api_key = userdata.get('OPENAI_API_KEY')

# for LangChain
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
os.environ["SERPER_API_KEY"] = userdata.get('SERPER_API_KEY')



# Refactoring the Legacy Code

- Set the parameter values in one place
- Change how the parameter values are passed
  - from reading each dictionary key one by one to unpacking the whole dictionary into keyword arguments (`**params`)
- Move from the legacy `openai.Completion.create(engine=...)` call to the `openai>=1.0` client call `client.completions.create(model=...)`, and replace `text-davinci-003` with `gpt-3.5-turbo-instruct` as the default model
- Return only the generated text (`response.choices[0].text`) instead of the whole response object
- Lower the default `max_tokens` from 256 to 20, which is why several responses below stop mid-sentence

Legacy:

```python
import openai


def set_open_params(
    model="text-davinci-003",
    temperature=0.7,
    max_tokens=256,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
):
    openai_params = {}

    openai_params['model'] = model
    openai_params['temperature'] = temperature
    openai_params['max_tokens'] = max_tokens
    openai_params['top_p'] = top_p
    openai_params['frequency_penalty'] = frequency_penalty
    openai_params['presence_penalty'] = presence_penalty
    return openai_params


def get_completion(params, prompt):
    response = openai.Completion.create(
        engine=params['model'],
        prompt=prompt,
        temperature=params['temperature'],
        max_tokens=params['max_tokens'],
        top_p=params['top_p'],
        frequency_penalty=params['frequency_penalty'],
        presence_penalty=params['presence_penalty'],
    )
    return response
```

Refactored:

```python
from openai import OpenAI

client = OpenAI()


def set_open_params(
    model="gpt-3.5-turbo-instruct",
    temperature=0.7,
    max_tokens=20,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
):
    return {
        'model': model,
        'temperature': temperature,
        'max_tokens': max_tokens,
        'top_p': top_p,
        'frequency_penalty': frequency_penalty,
        'presence_penalty': presence_penalty,
    }


def get_completion(params, prompt):
    response = client.completions.create(
        **params,  # unpack the dictionary's key-value pairs as keyword arguments
        prompt=prompt,
    )
    return response.choices[0].text
```
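To see what `**params` does without calling the API, here is a self-contained sketch. `fake_create` is a hypothetical stand-in for `client.completions.create` with keyword-only parameters; it makes no network call and only echoes some of its arguments. Writing `fake_create(**params, prompt=...)` is equivalent to spelling out `model=..., temperature=..., ...` by hand.

```python
def set_open_params(
    model="gpt-3.5-turbo-instruct",
    temperature=0.7,
    max_tokens=20,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
):
    return {
        "model": model,
        "temperature": temperature,
        "max_tokens": max_tokens,
        "top_p": top_p,
        "frequency_penalty": frequency_penalty,
        "presence_penalty": presence_penalty,
    }


def fake_create(*, model, prompt, temperature, max_tokens, top_p,
                frequency_penalty, presence_penalty):
    """Hypothetical stand-in for client.completions.create (no network call)."""
    # Echo back a few arguments so we can see how **params was expanded.
    return {"model": model, "prompt": prompt,
            "temperature": temperature, "max_tokens": max_tokens}


params = set_open_params(temperature=0)
print(fake_create(**params, prompt="Hello"))

# A misspelled key is rejected with a TypeError instead of being silently ignored.
try:
    fake_create(**{**params, "temprature": 0.5}, prompt="Hello")
except TypeError as err:
    print("TypeError:", err)
```

Because the function declares its parameters explicitly, a typo in a dictionary key surfaces immediately rather than falling back to a default.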


In [4]:
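# This import shadows LangChain's `OpenAI` from cell [2];
# the LangChain wrapper is re-imported in the PAL section before it is used again.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment (set in cell [3])


def set_open_params(
    model="gpt-3.5-turbo-instruct",
    temperature=0.7,
    max_tokens=20,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
):
    """Collect the completion parameters in a single dictionary."""
    return {
        'model': model,
        'temperature': temperature,
        'max_tokens': max_tokens,
        'top_p': top_p,
        'frequency_penalty': frequency_penalty,
        'presence_penalty': presence_penalty,
    }


def get_completion(params, prompt):
    """Send `prompt` to the Completions API and return only the generated text."""
    response = client.completions.create(
        **params,  # unpack the dictionary's key-value pairs as keyword arguments
        prompt=prompt,
    )
    return response.choices[0].text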

In [5]:
set_open_params()

{'model': 'gpt-3.5-turbo-instruct',
 'temperature': 0.7,
 'max_tokens': 20,
 'top_p': 1,
 'frequency_penalty': 0,
 'presence_penalty': 0}

Basic prompt example:

In [7]:
# basic example
params = set_open_params()

prompt = "안녕 코난쌤을 아세요?"  # "Hi, do you know 코난쌤?"

gpt_response = get_completion(params, prompt)

print(gpt_response)



네, 코난쌤은 제가 아는 분입니다.


In [8]:
IPython.display.Markdown(gpt_response)



네, 코난쌤은 제가 아는 분입니다.

Try a different temperature and compare the results:
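Conceptually, temperature divides the model's raw token scores (logits) before the softmax: values below 1 sharpen the distribution toward the top token, values above 1 flatten it, and the limit as temperature approaches 0 is greedy decoding. The sketch below is this textbook formulation on hypothetical logits, not OpenAI's internal sampler; in practice, `temperature=0` responses are close to, but not guaranteed to be, fully deterministic.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature == 0:
        # Limit as T -> 0: all probability goes to the highest-scoring token (greedy).
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
for t in (0, 0.7, 1.5):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```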

In [9]:
params = set_open_params(temperature=0)
prompt = "안녕 코난쌤을 아세요?"  # "Hi, do you know 코난쌤?"
response = get_completion(params, prompt)
IPython.display.Markdown(response)



네, 코난쌤은 일본의 만화가

### 1.1 Text Summarization

In [11]:
params = set_open_params(temperature=0.7)
prompt = """Antibiotics are a type of medication used to treat bacterial infections. They work by either killing the bacteria or preventing them from reproducing, allowing the body's immune system to fight off the infection. Antibiotics are usually taken orally in the form of pills, capsules, or liquid solutions, or sometimes administered intravenously. They are not effective against viral infections, and using them inappropriately can lead to antibiotic resistance.

Explain the above in one sentence:"""

response = get_completion(params, prompt)
IPython.display.Markdown(response)

 Antibiotics are medicines that fight bacterial infections by killing the bacteria or preventing their reproduction, and can be

Exercise: Instruct the model to explain the paragraph in one sentence, as if to a five-year-old (for example, "I am 5"). Do you see any differences?

### 1.2 Question Answering

In [12]:
prompt = """Answer the question based on the context below. Keep the answer short and concise. Respond "Unsure about answer" if not sure about the answer.

Context: Teplizumab traces its roots to a New Jersey drug company called Ortho Pharmaceutical. There, scientists generated an early version of the antibody, dubbed OKT3. Originally sourced from mice, the molecule was able to bind to the surface of T cells and limit their cell-killing potential. In 1986, it was approved to help prevent organ rejection after kidney transplants, making it the first therapeutic antibody allowed for human use.

Question: What was OKT3 originally sourced from?

Answer:"""

response = get_completion(params, prompt)
IPython.display.Markdown(response)


 Mice

Context obtained from: https://www.nature.com/articles/d41586-023-00400-x

Exercise: Edit the prompt so that the model responds that it is unsure about the answer.

### 1.3 Text Classification

In [13]:
prompt = """Classify the text into neutral, negative or positive.

Text: I think the food was okay.

Sentiment:"""

response = get_completion(params, prompt)
IPython.display.Markdown(response)

 Neutral

Exercise: Modify the prompt so that the model also explains the answer it chose.

### 1.4 Role Playing

In [14]:
prompt = """The following is a conversation with an AI research assistant. The assistant tone is technical and scientific.

Human: Hello, who are you?
AI: Greeting! I am an AI research assistant. How can I help you today?
Human: Can you tell me about the creation of blackholes?
AI:"""

response = get_completion(params, prompt)
IPython.display.Markdown(response)

 Of course. Black holes are formed when a massive star dies and its core collapses under its own gravity

Exercise: Modify the prompt to instruct the AI to keep its responses concise and short.

### 1.5 Code Generation

In [15]:
prompt = "\"\"\"\nTable departments, columns = [DepartmentId, DepartmentName]\nTable students, columns = [DepartmentId, StudentId, StudentName]\nCreate a MySQL query for all students in the Computer Science Department\n\"\"\""

response = get_completion(params, prompt)
IPython.display.Markdown(response)



SELECT StudentName 
FROM students 
WHERE DepartmentId = (SELECT DepartmentId 
                      FROM departments

### 1.6 Reasoning

In [16]:
prompt = """The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.

Solve by breaking the problem into steps. First, identify the odd numbers, add them, and indicate whether the result is odd or even."""

response = get_completion(params, prompt)
IPython.display.Markdown(response)



Step 1: Identify the odd numbers in the group.
In this group, the odd numbers

Exercise: Improve the prompt so that it has a better structure and output format.

## 2. Advanced Prompting Techniques

Objectives:

- Cover advanced prompting techniques (few-shot, chain-of-thought, ...).

### 2.2 Few-shot prompts

In [18]:
prompt = """The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.
A: The answer is False.

The odd numbers in this group add up to an even number: 17,  10, 19, 4, 8, 12, 24.
A: The answer is True.

The odd numbers in this group add up to an even number: 16,  11, 14, 4, 8, 13, 24.
A: The answer is True.

The odd numbers in this group add up to an even number: 17,  9, 10, 12, 13, 4, 2.
A: The answer is False.

The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.
A:"""

response = get_completion(params, prompt)
IPython.display.Markdown(response)

 The answer is False.
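Building the demonstrations programmatically makes it easy to check that every label in a few-shot prompt is actually correct. The helpers below are hypothetical; they reproduce the examples above (with single spaces after commas) and compute each label from the numbers, which confirms all four demonstration labels and that the expected answer for the final group is `False`, matching the model's output.

```python
from typing import List

QUESTION = "The odd numbers in this group add up to an even number: {}."


def odd_sum_is_even(numbers: List[int]) -> bool:
    """Ground-truth label for the task used in the prompt."""
    return sum(n for n in numbers if n % 2 == 1) % 2 == 0


def build_few_shot_prompt(examples: List[List[int]], query: List[int]) -> str:
    """Format labeled demonstrations followed by the unanswered query."""
    blocks = [
        QUESTION.format(", ".join(map(str, nums)))
        + f"\nA: The answer is {odd_sum_is_even(nums)}."
        for nums in examples
    ]
    blocks.append(QUESTION.format(", ".join(map(str, query))) + "\nA:")
    return "\n\n".join(blocks)


examples = [
    [4, 8, 9, 15, 12, 2, 1],
    [17, 10, 19, 4, 8, 12, 24],
    [16, 11, 14, 4, 8, 13, 24],
    [17, 9, 10, 12, 13, 4, 2],
]
query = [15, 32, 5, 13, 82, 7, 1]
print(build_few_shot_prompt(examples, query))
print("Expected:", odd_sum_is_even(query))
```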

### 2.3 Chain-of-Thought (CoT) Prompting

In [None]:
prompt = """The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.
A: Adding all the odd numbers (9, 15, 1) gives 25. The answer is False.

The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.
A:"""

response = get_completion(params, prompt)
IPython.display.Markdown(response)  # get_completion already returns the text

 Adding all the odd numbers (15, 5, 13, 7, 1) gives 41. The answer is False.
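The reasoning in the demonstration follows a fixed recipe: list the odd numbers, add them, and check the parity of the sum. A small hypothetical helper can regenerate that chain for any group, which confirms both the demonstration (25, False) and the model's answer above (41, False).

```python
from typing import List


def cot_answer(numbers: List[int]) -> str:
    """Write out the reasoning chain used in the CoT demonstration."""
    odds = [n for n in numbers if n % 2 == 1]
    total = sum(odds)
    verdict = total % 2 == 0  # True only if the odd numbers sum to an even number
    return (f"Adding all the odd numbers ({', '.join(map(str, odds))}) "
            f"gives {total}. The answer is {verdict}.")


print(cot_answer([4, 8, 9, 15, 12, 2, 1]))
print(cot_answer([15, 32, 5, 13, 82, 7, 1]))
```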

### 2.4 Zero-shot CoT

In [19]:
prompt = """I went to the market and bought 10 apples. I gave 2 apples to the neighbor and 2 to the repairman. I then went and bought 5 more apples and ate 1. How many apples did I remain with?

Let's think step by step."""

response = get_completion(params, prompt)
IPython.display.Markdown(response)

 
1. Bought 10 apples
2. Gave 2 apples to neighbor and 2
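The model's step-by-step answer is cut off by `max_tokens=20`. The arithmetic it is working through can be checked directly; the step list below simply mirrors the sentences of the word problem.

```python
# Each step mirrors one sentence of the word problem.
steps = [
    ("bought", +10),
    ("gave to the neighbor", -2),
    ("gave to the repairman", -2),
    ("bought more", +5),
    ("ate", -1),
]

apples = 0
for description, change in steps:
    apples += change
    print(f"{description:>22}: {change:+d} -> {apples}")

print("Apples remaining:", apples)
```

To see the model finish its reasoning, rerun the cell with a larger budget, for example `set_open_params(max_tokens=256)`.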

### 2.5 Self-Consistency
As an exercise, check out the example in the [guide](https://github.com/dair-ai/Prompt-Engineering-Guide/blob/main/guides/prompts-advanced-usage.md#self-consistency) and try it here.
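Self-consistency samples several chain-of-thought completions for the same question (with a non-zero temperature) and keeps the final answer that most of them agree on. Below is a minimal sketch of the voting step, assuming each completion ends with a sentence like "The answer is N"; the three completions are hypothetical placeholders. In this notebook you could collect real ones by calling `get_completion(set_open_params(temperature=0.7, max_tokens=256), prompt)` several times.

```python
import re
from collections import Counter
from typing import List, Optional

# Assumes each reasoning path ends with a sentence like "The answer is 67."
ANSWER_RE = re.compile(r"answer is\s*(-?\d+)", re.IGNORECASE)


def extract_answer(completion: str) -> Optional[str]:
    """Pull the final numeric answer out of one chain-of-thought completion."""
    match = ANSWER_RE.search(completion)
    return match.group(1) if match else None


def majority_vote(completions: List[str]) -> Optional[str]:
    """Return the answer that the most reasoning paths agree on."""
    answers = [a for a in map(extract_answer, completions) if a is not None]
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


# Hypothetical sampled completions for the same question.
samples = [
    "She was 3 when I was 6, so she is 3 years younger: 70 - 3 = 67. The answer is 67.",
    "The age gap is 3 years, so 70 - 3 = 67. The answer is 67.",
    "Half of 70 is 35. The answer is 35.",
]
print(majority_vote(samples))
```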

### 2.6 Generated Knowledge Prompting

As an exercise, check out the example in the [guide](https://github.com/dair-ai/Prompt-Engineering-Guide/blob/main/guides/prompts-advanced-usage.md#generated-knowledge-prompting) and try it here.
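Generated knowledge prompting runs in two stages: first ask the model to write relevant background facts, then feed those facts back in alongside the question. The sketch below only builds the two prompts; the template wording and the helper names `knowledge_prompt` and `answer_prompt` are assumptions, and the model calls are left as comments (using this notebook's `get_completion`) so the block runs offline.

```python
def knowledge_prompt(statement: str) -> str:
    """Stage 1: ask the model to write background knowledge about the input."""
    return (
        "Generate some knowledge about the concepts in the input.\n\n"
        f"Input: {statement}\n"
        "Knowledge:"
    )


def answer_prompt(question: str, knowledge: str) -> str:
    """Stage 2: answer the question with the generated knowledge in context."""
    return f"Question: {question}\n\nKnowledge: {knowledge}\n\nExplain and Answer:"


question = "Part of golf is trying to get a higher point total than others. Yes or No?"
stage1 = knowledge_prompt(question)
# knowledge = get_completion(set_open_params(max_tokens=256), stage1)
knowledge = "(model-generated knowledge goes here)"  # placeholder for the offline sketch
stage2 = answer_prompt(question, knowledge)
# answer = get_completion(set_open_params(max_tokens=256), stage2)
print(stage2)
```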

### 2.7 PAL - Code as Reasoning

Let's build a simple application that answers questions by reasoning through code.

Specifically, the application takes in some data and answers questions about that data. The prompt includes a few examples adopted from [here](https://github.com/reasoning-machines/pal/blob/main/pal/prompt/penguin_prompt.py).

In [40]:
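# Legacy code (OpenAI has since retired text-davinci-003):
# llm = OpenAI(model='text-davinci-003', temperature=0)

# Re-import LangChain's wrapper: cell [4] shadowed the name `OpenAI` with openai.OpenAI
from langchain.llms import OpenAI

llm = OpenAI(model='gpt-3.5-turbo-instruct', temperature=0)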

In [56]:
question = "누가 더 늙은 펭귄인가요?"  # "Who is the older penguin?"

In [37]:
PENGUIN_PROMPT = '''
"""
Q: Here is a table where the first line is a header and each subsequent line is a penguin:
name, age, height (cm), weight (kg)
Louis, 7, 50, 11
Bernard, 5, 80, 13
Vincent, 9, 60, 11
Gwen, 8, 70, 15
For example: the age of Louis is 7, the weight of Gwen is 15 kg, the height of Bernard is 80 cm.
We now add a penguin to the table:
James, 12, 90, 12
How many penguins are less than 8 years old?
"""
# Put the penguins into a list.
penguins = []
penguins.append(('Louis', 7, 50, 11))
penguins.append(('Bernard', 5, 80, 13))
penguins.append(('Vincent', 9, 60, 11))
penguins.append(('Gwen', 8, 70, 15))
# Add penguin James.
penguins.append(('James', 12, 90, 12))
# Find penguins under 8 years old.
penguins_under_8_years_old = [penguin for penguin in penguins if penguin[1] < 8]
# Count number of penguins under 8.
num_penguin_under_8 = len(penguins_under_8_years_old)
answer = num_penguin_under_8
"""
Q: Here is a table where the first line is a header and each subsequent line is a penguin:
name, age, height (cm), weight (kg)
Louis, 7, 50, 11
Bernard, 5, 80, 13
Vincent, 9, 60, 11
Gwen, 8, 70, 15
For example: the age of Louis is 7, the weight of Gwen is 15 kg, the height of Bernard is 80 cm.
Which is the youngest penguin?
"""
# Put the penguins into a list.
penguins = []
penguins.append(('Louis', 7, 50, 11))
penguins.append(('Bernard', 5, 80, 13))
penguins.append(('Vincent', 9, 60, 11))
penguins.append(('Gwen', 8, 70, 15))
# Sort the penguins by age.
penguins = sorted(penguins, key=lambda x: x[1])
# Get the youngest penguin's name.
youngest_penguin_name = penguins[0][0]
answer = youngest_penguin_name
"""
Q: Here is a table where the first line is a header and each subsequent line is a penguin:
name, age, height (cm), weight (kg)
Louis, 7, 50, 11
Bernard, 5, 80, 13
Vincent, 9, 60, 11
Gwen, 8, 70, 15
For example: the age of Louis is 7, the weight of Gwen is 15 kg, the height of Bernard is 80 cm.
What is the name of the second penguin sorted by alphabetic order?
"""
# Put the penguins into a list.
penguins = []
penguins.append(('Louis', 7, 50, 11))
penguins.append(('Bernard', 5, 80, 13))
penguins.append(('Vincent', 9, 60, 11))
penguins.append(('Gwen', 8, 70, 15))
# Sort penguins by alphabetic order.
penguins_alphabetic = sorted(penguins, key=lambda x: x[0])
# Get the second penguin sorted by alphabetic order.
second_penguin_name = penguins_alphabetic[1][0]
answer = second_penguin_name
"""
{question}
"""
'''.strip() + '\n'

Now that we have the prompt and the question, we can send them to the model. It should output, as code, the steps needed to arrive at the answer.

In [57]:
formatted_prompt = PENGUIN_PROMPT.format(question=question)
llm_out = llm(formatted_prompt)

print(llm_out)

# Put the penguins into a list.
penguins = []
penguins.append(('Louis', 7, 50, 11))
penguins.append(('Bernard', 5, 80, 13))
penguins.append(('Vincent', 9, 60, 11))
penguins.append(('Gwen', 8, 70, 15))
# Sort penguins by age.
penguins_by_age = sorted(penguins, key=lambda x: x[1], reverse=True)
# Get the oldest penguin's name.
oldest_penguin_name = penguins_by_age[0][0]
answer = oldest_penguin_name


In [58]:
exec(llm_out)
print(answer)

Vincent


Correct! Vincent is the oldest penguin.
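The `exec(llm_out)` call above runs model-written code directly in the notebook's global namespace, where it can overwrite variables such as `penguins` or `question`. A somewhat more contained sketch runs the program in its own namespace, exposes only a hypothetical allow-list of builtins, and reads back `answer`. This reduces accidents but is not a security sandbox; restricted builtins can be bypassed, so untrusted code belongs in a separate process or container.

```python
# Hypothetical allow-list: only the builtins the penguin programs need.
SAFE_BUILTINS = {"len": len, "sorted": sorted, "min": min, "max": max, "sum": sum}


def run_pal_program(code: str):
    """Execute generated code in a fresh namespace and return its `answer`."""
    namespace = {"__builtins__": SAFE_BUILTINS}
    exec(code, namespace)  # assignments land in `namespace`, not in notebook globals
    if "answer" not in namespace:
        raise ValueError("The generated program did not assign `answer`.")
    return namespace["answer"]


generated = """
penguins = []
penguins.append(('Louis', 7, 50, 11))
penguins.append(('Bernard', 5, 80, 13))
penguins.append(('Vincent', 9, 60, 11))
penguins.append(('Gwen', 8, 70, 15))
penguins_by_age = sorted(penguins, key=lambda x: x[1], reverse=True)
answer = penguins_by_age[0][0]
"""
print(run_pal_program(generated))  # in the notebook: run_pal_program(llm_out)
```

Because `__import__` is missing from the allow-list, a generated `import` statement fails with `ImportError` instead of running.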

Exercise: Try asking other questions and see what results you get.

In [55]:
# @title Other example questions

# question = "누가 제일 키가 큰 펭귄인가요?"  # "Which penguin is the tallest?"
# question = "누가 가장 몸무게가 많이 나가는 펭귄인가요?"  # "Which penguin weighs the most?"

---

## 3. Tools and Applications

Objectives:

- Show how to use LangChain to build simple applications that combine prompting techniques with LLMs.

### 3.1 LLMs and External Tools

Example adopted from the [LangChain documentation](https://langchain.readthedocs.io/en/latest/modules/agents/getting_started.html).

In [59]:
from langchain.agents import load_tools
from langchain.agents import initialize_agent

In [60]:
llm = OpenAI(temperature=0)

tools = load_tools(["google-serper", "llm-math"], llm=llm)
agent = initialize_agent(tools, llm, agent="zero-shot-react-description", verbose=True)

In [61]:
# run the agent
agent.run("Who is Olivia Wilde's boyfriend? What is his current age raised to the 0.23 power?")



> Entering new AgentExecutor chain...
 I need to find out who Olivia Wilde's boyfriend is and then calculate his age raised to the 0.23 power.
Action: google_serper
Action Input: Olivia Wilde boyfriend
Observation: Wilde did not seek spousal support, and the pair reached a private agreement on property division. Wilde began dating American actor and comedian Jason Sudeikis in November 2011. They became engaged in January 2013. The couple have two children: a son, born in 2014, and a daughter, born in 2016.
Thought: I need to find out Jason Sudeikis' age.
Action: google_serper
Action Input: Jason Sudeikis age
Observation: 48 years
Thought: I need to calculate his age raised to the 0.23 power.
Action: Calculator
Action Input: 48^0.23
Observation: Answer: 2.436045998596883
Thought: I now know the final answer.
Final Answer: Jason Sudeikis, Olivia Wilde's boyfriend, 

"Jason Sudeikis, Olivia Wilde's boyfriend, is 48 years old and his age raised to the 0.23 power is 2.436045998596883."
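A `zero-shot-react-description` agent alternates between model output (Thought / Action / Action Input) and tool results (Observation), as the trace above shows. The sketch below reproduces that loop in plain Python so it runs offline. It is a simplified illustration, not LangChain's implementation or output parser: the tool names `Search` and `Calculator`, the regex, the two tools, and the scripted fake model are all assumptions.

```python
import re
from typing import Callable, Dict, Iterator

# Parses the "Action: <tool>" / "Action Input: <text>" pair in a model step.
ACTION_RE = re.compile(r"Action:\s*(.+?)\s*\nAction Input:\s*(.+)")


def calculator(expression: str) -> str:
    """Hypothetical tool: evaluates only "base^exponent" expressions."""
    base, exponent = expression.split("^")
    return f"Answer: {float(base) ** float(exponent)}"


def search(query: str) -> str:
    """Hypothetical stand-in for a web-search tool such as google-serper."""
    return {"Jason Sudeikis age": "48 years"}.get(query, "No result found")


TOOLS: Dict[str, Callable[[str], str]] = {"Search": search, "Calculator": calculator}


def run_agent(llm: Callable[[str], str], question: str, max_steps: int = 5) -> str:
    """Alternate model steps and tool observations until a final answer appears."""
    transcript = f"Question: {question}\nThought:"
    for _ in range(max_steps):
        step = llm(transcript)  # the model sees the whole transcript so far
        transcript += step
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        match = ACTION_RE.search(step)
        if match is None:
            raise ValueError(f"Could not parse an action from: {step!r}")
        tool_name, tool_input = match.group(1), match.group(2).strip()
        observation = TOOLS[tool_name](tool_input)
        transcript += f"\nObservation: {observation}\nThought:"
    raise RuntimeError("The agent did not finish within max_steps.")


def scripted_llm(responses: Iterator[str]) -> Callable[[str], str]:
    """Fake model that ignores the transcript and replays canned steps."""
    return lambda transcript: next(responses)


fake_model = scripted_llm(iter([
    " I need his age.\nAction: Search\nAction Input: Jason Sudeikis age",
    " I need to raise 48 to the 0.23 power.\nAction: Calculator\nAction Input: 48^0.23",
    " I now know the final answer.\nFinal Answer: about 2.436",
]))
print(run_agent(fake_model, "What is Jason Sudeikis' age raised to the 0.23 power?"))
```

A real model would read the growing transcript, including each Observation, to decide its next action; the scripted fake skips that so the control flow is easy to follow.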

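### 3.2 Data-Augmented Generation (document-grounded answers)

First, we need to download the data that will serve as the source for augmented generation; the code below expects it at `./state_of_the_union.txt`.

The code example is taken from the [LangChain documentation](https://langchain.readthedocs.io/en/latest/modules/chains/combine_docs_examples/qa_with_sources.html). It is used here for educational purposes only.

Prepare the data: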

In [62]:
# Only the imports used in this section (the Cohere, Elasticsearch and Document imports were unused)
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.prompts import PromptTemplate

In [67]:
with open('./state_of_the_union.txt') as f:
    state_of_the_union = f.read()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_text(state_of_the_union)

embeddings = OpenAIEmbeddings()
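`CharacterTextSplitter` splits the text on a separator (by default, a blank line `"\n\n"`) and packs the pieces into chunks of up to `chunk_size` characters. The function below is a simplified, hypothetical re-implementation of that idea without overlap, meant only to show what `chunk_size=1000, chunk_overlap=0` means; LangChain's implementation differs in its details.

```python
from typing import List


def split_into_chunks(text: str, chunk_size: int, separator: str = "\n\n") -> List[str]:
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size chars."""
    pieces = [p.strip() for p in text.split(separator) if p.strip()]
    chunks: List[str] = []
    current = ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            current = candidate  # fits (or is a single oversized piece kept whole)
        else:
            chunks.append(current)  # close the full chunk and start a new one
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "First paragraph.\n\nSecond paragraph.\n\nA much longer third paragraph."
for chunk in split_into_chunks(sample, chunk_size=40):
    print(repr(chunk))
```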

In [73]:
!pip install chromadb tiktoken

Collecting tiktoken
  Downloading tiktoken-0.5.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 22.2 MB/s eta 0:00:00
Installing collected packages: tiktoken
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
llmx 0.0.15a0 requires cohere, which is not installed.
Successfully installed tiktoken-0.5.1


In [74]:
docsearch = Chroma.from_texts(texts, embeddings, metadatas=[{"source": str(i)} for i in range(len(texts))])

In [75]:
query = "What did the president say about Justice Breyer"
docs = docsearch.similarity_search(query)
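`docsearch.similarity_search(query)` turns the query into a vector and returns the stored chunks whose vectors are closest to it. To show the ranking idea offline, the sketch below swaps OpenAI embeddings for a much cruder technique, bag-of-words cosine similarity; the chunks are hypothetical, and real embeddings also match paraphrases that share no words.

```python
import math
import re
from collections import Counter
from typing import List


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts: a crude stand-in for an embedding vector."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similarity_search(query: str, texts: List[str], k: int = 2) -> List[str]:
    """Return the k chunks whose word vectors are closest to the query's."""
    q = bag_of_words(query)
    ranked = sorted(texts, key=lambda t: cosine(q, bag_of_words(t)), reverse=True)
    return ranked[:k]


chunks = [  # hypothetical chunks
    "A passage thanking Justice Breyer for his years of service.",
    "A passage about lowering costs for families.",
    "A passage about rebuilding roads and bridges.",
]
print(similarity_search("What did the president say about Justice Breyer", chunks, k=1))
```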

Let's run a quick test:

In [76]:
from langchain.chains.qa_with_sources import load_qa_with_sources_chain
from langchain.llms import OpenAI

In [77]:
chain = load_qa_with_sources_chain(OpenAI(temperature=0), chain_type="stuff")
query = "What did the president say about Justice Breyer"
chain({"input_documents": docs, "question": query}, return_only_outputs=True)

{'output_text': " The president thanked Justice Breyer for his service and mentioned that he nominated Circuit Court of Appeals Judge Ketanji Brown Jackson to continue Justice Breyer's legacy of excellence.\nSOURCES: 31-pl"}

Now let's ask the question with a custom prompt:

In [78]:
template = """Given the following extracted parts of a long document and a question, create a final answer with references ("SOURCES").
If you don't know the answer, just say that you don't know. Don't try to make up an answer.
ALWAYS return a "SOURCES" part in your answer.
Respond in Spanish.

QUESTION: {question}
=========
{summaries}
=========
FINAL ANSWER IN SPANISH:"""

# create a prompt template
PROMPT = PromptTemplate(template=template, input_variables=["summaries", "question"])

# query
chain = load_qa_with_sources_chain(OpenAI(temperature=0), chain_type="stuff", prompt=PROMPT)
query = "What did the president say about Justice Breyer?"
chain({"input_documents": docs, "question": query}, return_only_outputs=True)

{'output_text': '\nEl Presidente dijo que Justice Breyer ha dedicado su vida a servir a este país y que el nombramiento de la Juez Ketanji Brown Jackson como su sucesora continuará el legado de excelencia de Breyer.\n\nFUENTES:\n31. Discurso del Presidente Biden ante el Congreso, 25 de abril de 2021. Disponible en: https://www.whitehouse.gov/briefing-room/speeches-remarks/2021/04/28/remarks-by-president-biden-address-to-a-joint-session-of-congress/'}
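With `chain_type="stuff"`, the chain "stuffs" all retrieved documents into the `{summaries}` slot of a single prompt and makes one LLM call. The sketch below assembles such a prompt by hand with `str.format`; the `Content: ... / Source: ...` layout for each document is an assumption about how LangChain formats them, the helper `stuff_documents` is hypothetical, and the documents are made up.

```python
from typing import Dict, List

TEMPLATE = """Given the following extracted parts of a long document and a question, create a final answer with references ("SOURCES").
If you don't know the answer, just say that you don't know. Don't try to make up an answer.
ALWAYS return a "SOURCES" part in your answer.
Respond in Spanish.

QUESTION: {question}
=========
{summaries}
=========
FINAL ANSWER IN SPANISH:"""


def stuff_documents(docs: List[Dict[str, str]], question: str) -> str:
    """Concatenate every document (with its source id) into one prompt."""
    summaries = "\n\n".join(f"Content: {d['text']}\nSource: {d['source']}" for d in docs)
    return TEMPLATE.format(question=question, summaries=summaries)


docs = [  # hypothetical retrieved chunks and their source ids
    {"text": "A passage thanking Justice Breyer for his service.", "source": "31"},
    {"text": "A passage about a nomination to the Supreme Court.", "source": "32"},
]
print(stuff_documents(docs, "What did the president say about Justice Breyer?"))
```

Because everything goes into one prompt, this approach only works while the combined chunks fit in the model's context window.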

Exercise: Apply all the techniques you have learned from the examples so far to other datasets from the internet, using different prompts.