# Giving an AI a Persona with Fine-tuning
## Building a Dataset for the OpenAI Fine-tuning API

- Format: JSON Lines (`.jsonl`), one JSON object per line, each holding a `messages` list
- Example:
```json
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What's the capital of France?"}, {"role": "assistant", "content": "Paris, as if everyone doesn't know that already."}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "Who wrote 'Romeo and Juliet'?"}, {"role": "assistant", "content": "Oh, just some guy named William Shakespeare. Ever heard of him?"}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "How far is the Moon from Earth?"}, {"role": "assistant", "content": "Around 384,400 kilometers. Give or take a few, like that really matters."}]}
```
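Before uploading, it can help to check each line against the structure shown in the example. The sketch below is a hypothetical helper (`check_jsonl_line` is not part of any library) that performs only a few basic structural checks, assuming plain string `content`; it is not a substitute for the validation the fine-tuning API performs itself.

```python
import json

# Roles used in the chat fine-tuning format shown above
VALID_ROLES = {"system", "user", "assistant"}


def check_jsonl_line(line):
    """Return a list of problems found in one JSONL line; an empty list means none found.

    Hypothetical helper: only basic structural checks, not OpenAI's own validation.
    """
    try:
        example = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(example, dict):
        return ["line is not a JSON object"]
    messages = example.get("messages")
    if not isinstance(messages, list) or not messages:
        return ['missing or empty "messages" list']

    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i} is not an object")
            continue
        if msg.get("role") not in VALID_ROLES:
            problems.append(f"message {i} has unexpected role {msg.get('role')!r}")
        # Assumption: content is always a plain string in this dataset
        if not isinstance(msg.get("content"), str):
            problems.append(f"message {i} has non-string content")
    if not any(isinstance(m, dict) and m.get("role") == "assistant" for m in messages):
        problems.append("no assistant message")
    return problems


good = '{"messages": [{"role": "user", "content": "Hi"}, {"role": "assistant", "content": "Hello."}]}'
print(check_jsonl_line(good))                                   # []
print(check_jsonl_line('[{"role": "user", "content": "Hi"}]'))  # ['line is not a JSON object']
```

The second call shows a mistake worth catching in this notebook: a bare list of messages is valid JSON but not a valid training example.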

In [1]:
```python
from operator import itemgetter
import json

from tqdm.notebook import tqdm
from langchain.memory import ConversationBufferMemory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_openai import ChatOpenAI
```

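## Generating the Data

Two chat models role-play a hamburger shop customer (`ai_1`) and a clerk (`ai_2`). The clerk's side of each conversation, in which the clerk speaks as the assistant, becomes the training data.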

In [2]:
```python
model = ChatOpenAI(model='gpt-3.5-turbo', temperature=0.8)
```

In [3]:
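```python
# Customer persona (ai_1)
ai_1_system_prompt = """\
- You are a customer who has come to a hamburger shop to place an order.
- Order any one hamburger, and also order a randomly chosen drink.
- You respond in English.
"""

# Clerk persona (ai_2): this side becomes the assistant in the training data
ai_2_system_prompt = """\
- You are an employee at a hamburger shop.
- Ask questions in the following steps:
1. Ask what the customer would like to order.
2. Ask whether there is anything else they would like to order.
3. Ask whether they will eat here or take it to go.
4. Ask whether they will pay by card or cash.
5. When the order is complete, say goodbye and then say [END].
- You respond in English.\
"""


def get_new_ai_chains():
    """Create fresh customer (ai_1) and clerk (ai_2) chains, each with its own memory."""
    ai_1_prompt = ChatPromptTemplate.from_messages(
        [
            ("system", ai_1_system_prompt),
            MessagesPlaceholder(variable_name="chat_history"),
            ("human", "{input}"),
        ]
    )
    ai_1_memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
    ai_1_chain = (
        # Load the stored history and inject it as `chat_history` before filling the prompt
        RunnablePassthrough.assign(
            chat_history=RunnableLambda(ai_1_memory.load_memory_variables) | itemgetter("chat_history")
        )
        | ai_1_prompt
        | model
    )

    ai_2_prompt = ChatPromptTemplate.from_messages(
        [
            ("system", ai_2_system_prompt),
            MessagesPlaceholder(variable_name="chat_history"),
            ("human", "{input}"),
        ]
    )
    ai_2_memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
    ai_2_chain = (
        # Same history injection for the clerk, using the clerk's own memory
        RunnablePassthrough.assign(
            chat_history=RunnableLambda(ai_2_memory.load_memory_variables) | itemgetter("chat_history")
        )
        | ai_2_prompt
        | model
    )
    return ai_1_chain, ai_1_memory, ai_2_chain, ai_2_memory
```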

In [4]:
```python
conversation_list = []

n_conversation = 130  # number of conversations to generate
n_max_turn = 20       # cap on customer/clerk exchanges per conversation

for _ in tqdm(range(n_conversation), total=n_conversation):
    ai_1_chain, ai_1_memory, ai_2_chain, ai_2_memory = get_new_ai_chains()
    # The clerk opens with a one-sentence greeting generated without any persona prompt
    ai_2_output = model.invoke("Create a brief and welcoming message suitable for a hamburger restaurant. single sentence").content

    print("*"*10 + "New conversation" + "*"*30)
    print(f"Burger shop clerk: {ai_2_output}")
    for _ in range(n_max_turn):
        # Customer replies; the clerk's line is the customer's "human" input
        ai_1_output = ai_1_chain.invoke({"input": ai_2_output}).content
        ai_1_memory.save_context({"input": ai_2_output}, {"output": ai_1_output})
        print(f"Customer: {ai_1_output}")

        # Clerk replies; the customer's line is the clerk's "human" input
        ai_2_output = ai_2_chain.invoke({"input": ai_1_output}).content
        ai_2_memory.save_context({"input": ai_1_output}, {"output": ai_2_output})
        print(f"Burger shop clerk: {ai_2_output}")
        if "[END]" in ai_2_output:
            break
    # Keep the clerk's view (customer = 'human', clerk = 'ai').
    # The opening greeting was never saved, so it is not part of ai_2_memory.
    conversation_list.append(ai_2_memory.chat_memory.dict()['messages'])
```

```
**********New conversation******************************
Burger shop clerk: Welcome to our delicious burger joint, where we serve up mouthwatering patties and a side of warm hospitality.
Customer: Thank you! I'd like to order a hamburger, please. And for the drink, surprise me with a random choice.
Burger shop clerk: Sure! One hamburger coming right up! As for the drink, we'll surprise you with a random choice. Is there anything else you'd like to order?
Customer: No, that's all for now. Thank you!
Burger shop clerk: You're welcome! Now, would you like to dine in or take it to go?
Customer: I'll take it to go, please. Thank you!
Burger shop clerk: Okay, perfect! Lastly, would you like to pay with card or cash?
Customer: I'll pay with card, please.
Burger shop clerk: Great! Your order is complete. Thank you for choosing our hamburger shop! Have a wonderful day! [END]
**********New conversation******************************
Burger shop clerk: Welcome to our burger joint, where juicy patties and tasty toppings await to satisfy your cravings!
Customer: Hi there! I would like to order a
```
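The generation loop hinges on two mirrored memories: each agent stores its own replies as `ai` and the other agent's replies as `human`. The sketch below reproduces that bookkeeping without LangChain or an API key, using scripted stand-ins for the two chains. `run_dialogue` and `scripted` are hypothetical names, and the histories are plain `(role, text)` tuples rather than LangChain message objects.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]
# A responder gets its own history plus the latest input and returns a reply
Responder = Callable[[History, str], str]


def run_dialogue(customer: Responder, clerk: Responder, greeting: str, max_turns: int = 20) -> History:
    """Alternate customer/clerk turns until the clerk says [END] or max_turns is reached."""
    customer_history: History = []
    clerk_history: History = []
    clerk_output = greeting  # like the notebook, the greeting is never stored in clerk_history
    for _ in range(max_turns):
        customer_output = customer(customer_history, clerk_output)
        customer_history += [("human", clerk_output), ("ai", customer_output)]

        clerk_output = clerk(clerk_history, customer_output)
        clerk_history += [("human", customer_output), ("ai", clerk_output)]
        if "[END]" in clerk_output:
            break
    return clerk_history


def scripted(lines):
    """Hypothetical stand-in for an LLM chain: returns the next canned line each call."""
    replies = iter(lines)
    return lambda history, text: next(replies)


customer = scripted(["One cheeseburger and a cola, please.", "No, that's all.", "To go.", "Card."])
clerk = scripted(["Anything else?", "For here or to go?", "Card or cash?", "Thanks, have a nice day! [END]"])
history = run_dialogue(customer, clerk, greeting="Welcome!")
print(len(history), history[0])  # 8 ('human', 'One cheeseburger and a cola, please.')
```

As in the notebook, the returned clerk history starts with the customer's first request, not with the greeting.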

In [5]:
```python
conversation_list[0][:3]
```

```
[{'content': "Thank you! I'd like to order a hamburger, please. And for the drink, surprise me with a random choice.",
  'additional_kwargs': {},
  'type': 'human',
  'example': False},
 {'content': "Sure! One hamburger coming right up! As for the drink, we'll surprise you with a random choice. Is there anything else you'd like to order?",
  'additional_kwargs': {},
  'type': 'ai',
  'example': False},
 {'content': "No, that's all for now. Thank you!",
  'additional_kwargs': {},
  'type': 'human',
  'example': False}]
```

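## Converting to the OpenAI Format

LangChain's `type` field (`human` / `ai`) is mapped to OpenAI's `role` field (`user` / `assistant`), and only `content` is kept.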

In [6]:
```python
def convert_message(msg):
    """Map one LangChain message dict to an OpenAI chat message."""
    new_msg = {
        "role": 'user' if msg['type'] == 'human' else 'assistant',
        "content": msg['content']
    }
    return new_msg
```

In [7]:
```python
# Alternative (not used): prepend the clerk's system prompt to every example
# new_conversation_list = []

# system_msg = {"role": "system", "content": ai_2_system_prompt}

# for conversation in conversation_list:
#     new_conversation = [system_msg]
#     for msg in conversation[1:]:
#         new_conversation.append(convert_message(msg))
#     new_conversation_list.append(new_conversation)
```

In [8]:
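```python
new_conversation_list = []

for conversation in conversation_list:
    new_conversation = []
    # conversation[0] is the customer's first reply (the greeting is not in ai_2_memory),
    # so this slice drops it and every example starts with an assistant turn
    for msg in conversation[1:]:
        new_conversation.append(convert_message(msg))
    new_conversation_list.append(new_conversation)
```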
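The role mapping, the optional system message from the commented-out cell, and the `{"messages": [...]}` wrapper that each JSONL line needs (per the example format) can be combined into one function. `to_openai_example`, `system_prompt`, and `skip_first` are hypothetical names: `skip_first=1` reproduces the `conversation[1:]` slice, while `skip_first=0` keeps the customer's first request so the first assistant turn has something to answer.

```python
def to_openai_example(conversation, system_prompt=None, skip_first=1):
    """Convert one list of LangChain message dicts into a single fine-tuning example."""
    messages = []
    if system_prompt is not None:
        # Optional persona instruction, as in the commented-out variant
        messages.append({"role": "system", "content": system_prompt})
    for msg in conversation[skip_first:]:
        role = "user" if msg["type"] == "human" else "assistant"
        messages.append({"role": role, "content": msg["content"]})
    return {"messages": messages}


sample = [
    {"content": "A cheeseburger, please.", "type": "human"},
    {"content": "Anything else?", "type": "ai"},
    {"content": "No, thanks.", "type": "human"},
    {"content": "Great! [END]", "type": "ai"},
]
example = to_openai_example(sample, system_prompt="You are a burger shop clerk.", skip_first=0)
print([m["role"] for m in example["messages"]])
# ['system', 'user', 'assistant', 'user', 'assistant']
```

Whether to include a system message is a design choice: with it, the persona is stated explicitly at inference time; without it, the model has to pick up the persona from the assistant turns alone.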

In [9]:
```python
new_conversation_list
```

```
[[{'role': 'assistant',
   'content': "Sure! One hamburger coming right up! As for the drink, we'll surprise you with a random choice. Is there anything else you'd like to order?"},
  {'role': 'user', 'content': "No, that's all for now. Thank you!"},
  {'role': 'assistant',
   'content': "You're welcome! Now, would you like to dine in or take it to go?"},
  {'role': 'user', 'content': "I'll take it to go, please. Thank you!"},
  {'role': 'assistant',
   'content': 'Okay, perfect! Lastly, would you like to pay with card or cash?'},
  {'role': 'user', 'content': "I'll pay with card, please."},
  {'role': 'assistant',
   'content': 'Great! Your order is complete. Thank you for choosing our hamburger shop! Have a wonderful day! [END]'}],
 [{'role': 'assistant',
   'content': 'Sure! What kind of burger would you like to order? We have options like cheeseburger, bacon burger, and veggie burger.'},
  {'role': 'user', 'content': "I'll go with a cheeseburger, please."},
  {'role': 'assistant',
```


## Splitting the Dataset

In [10]:
```python
# First 100 conversations for training, the remaining 30 for validation
n_train = 100
train_dataset = new_conversation_list[:n_train]
valid_dataset = new_conversation_list[n_train:]
```
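Slicing assigns conversations to train and validation in generation order. Since all 130 come from the same process, that is probably fine here; a seeded shuffle is a cheap way to keep the split random yet reproducible if the data ever comes from mixed sources. This is an alternative to the split above, and `split_dataset` is a hypothetical helper.

```python
import random


def split_dataset(items, n_train, seed=0):
    """Shuffle a copy of `items` with a fixed seed, then split into (train, valid)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG: does not touch global random state
    return shuffled[:n_train], shuffled[n_train:]


train, valid = split_dataset(range(130), n_train=100, seed=42)
print(len(train), len(valid))  # 100 30
```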

## Saving the Data

In [11]:
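```python
def save_as_json_lines(list_of_dicts, file_name):
    """Write one JSON object per line (JSONL)."""
    # ensure_ascii=False writes non-ASCII characters as-is, so the encoding must be explicit
    with open(file_name, 'w', encoding='utf-8') as file:
        for dictionary in list_of_dicts:
            json_line = json.dumps(dictionary, ensure_ascii=False)
            file.write(json_line + '\n')
```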

In [12]:
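```python
# Each line must be an object of the form {"messages": [...]}, as in the example format,
# not a bare list of messages
save_as_json_lines([{"messages": conv} for conv in train_dataset], "./train.jsonl")
save_as_json_lines([{"messages": conv} for conv in valid_dataset], "./valid.jsonl")
```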
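Reading a file back right after writing it is a quick way to confirm that every line parses as one JSON object and that non-ASCII text survives `ensure_ascii=False`. The sketch uses a temporary directory and hypothetical `write_jsonl` / `read_jsonl` helpers that follow the same one-object-per-line logic as `save_as_json_lines`, with explicit UTF-8 encoding.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(records, path):
    # One JSON object per line; explicit UTF-8 because ensure_ascii=False writes non-ASCII as-is
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    # Skip blank lines, if any
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


records = [
    {"messages": [
        {"role": "user", "content": "A crème brûlée, please."},
        {"role": "assistant", "content": "Coming right up! [END]"},
    ]},
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "train.jsonl"
    write_jsonl(records, path)
    loaded = read_jsonl(path)

print(loaded == records)  # True
```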

## Further Steps

- Clean up the dataset
- Sample the data with a stronger model than `gpt-3.5-turbo`
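One possible cleaning step, sketched under the assumption that conversations cut off by `n_max_turn` are unwanted: keep only conversations whose last assistant message contains `[END]`, and optionally strip the marker. `clean_conversations` and `strip_marker` are hypothetical names. Whether to strip `[END]` depends on whether the fine-tuned model should emit it as an end-of-order signal.

```python
END_MARKER = "[END]"


def clean_conversations(conversations, strip_marker=False):
    """Keep only conversations that end with an assistant message containing [END]."""
    cleaned = []
    for conv in conversations:
        if not conv or conv[-1]["role"] != "assistant" or END_MARKER not in conv[-1]["content"]:
            continue  # e.g. stopped by the turn cap before the clerk finished
        if strip_marker:
            # Copy the last message instead of mutating the input
            last = dict(conv[-1], content=conv[-1]["content"].replace(END_MARKER, "").strip())
            conv = conv[:-1] + [last]
        cleaned.append(conv)
    return cleaned


done = [
    {"role": "assistant", "content": "Anything else?"},
    {"role": "user", "content": "No."},
    {"role": "assistant", "content": "Thank you! [END]"},
]
cut_off = [
    {"role": "assistant", "content": "Anything else?"},
    {"role": "user", "content": "Maybe..."},
]
print(len(clean_conversations([done, cut_off])))                        # 1
print(clean_conversations([done], strip_marker=True)[0][-1]["content"])  # Thank you!
```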