# Fine-tuning llama2 on a custom dataset

This notebook aims to make fine-tuning llama2 simple.

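- Approach for a T4 GPU
  - Tune the parameters needed to load a llama2-based model on a T4 GPU
  - Load the model in 4-bit
  - Train with SFT Trainer
    - SFT (supervised fine-tuning) adapts a model to a specific task.
    - SFT is used here to fit the model to data built from a custom dataset (other trainers work as well).
    - The goal is that, after SFT on the custom data, the model answers questions in the target task much like the training answers do, which is why this trainer was chosen.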
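
A rough, weights-only estimate shows why 4-bit loading matters on a T4 (16 GB of VRAM). The sketch below assumes a nominal 7 billion parameters and deliberately ignores activations, the KV cache, LoRA weights, optimizer state, and quantization metadata such as per-block scales, so real memory use is higher than these numbers.

```python
# Weights-only memory estimate for a nominal 7B-parameter model.
# Assumptions: exactly 7e9 parameters; activations, the KV cache, LoRA weights,
# optimizer state, and quantization metadata (e.g. per-block scales) are ignored.

NUM_PARAMS = 7_000_000_000
T4_VRAM_GB = 16  # NVIDIA T4 memory size

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16": 2.0,
    "int8": 1.0,
    "4-bit": 0.5,
}


def weight_memory_gb(num_params: int, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for fmt, nbytes in BYTES_PER_PARAM.items():
        gb = weight_memory_gb(NUM_PARAMS, nbytes)
        print(f"{fmt:>6}: {gb:5.1f} GB of weights, {T4_VRAM_GB - gb:6.1f} GB left on a T4")
```

By this rough count, fp16 weights alone (about 14 GB) leave almost no room for activations during fine-tuning, while 4-bit weights (about 3.5 GB) leave most of the card free.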

&nbsp;

In [None]:
!pip install llama-index==0.9.27
!pip install gdown
!pip install llama-hub
!pip install PyMuPDF
!pip install nest-asyncio
!pip install jsonlines
!pip install gradio==3.48.0
!pip install trl
!pip install pypdf
!pip install langchain
!pip install chromadb
!pip install pydantic==1.10.13
!pip install gradientai
!pip install sentence-transformers

!git clone https://github.com/choijhyeok/easy_finetuner.git
%cd easy_finetuner
!pip install -r requirements.txt
%cd ..

In [None]:
import os
import gdown
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
)
from langchain.chains import RetrievalQAWithSourcesChain
from llama_hub.file.pymu_pdf.base import PyMuPDFReader
from langchain.document_loaders import PyPDFLoader
from langchain.schema import Document
import re
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
import warnings
from langchain.embeddings import HuggingFaceEmbeddings
import jsonlines
from datasets import Dataset
warnings.filterwarnings('ignore')

# Optional: download the files with gdown instead of mounting Google Drive.
# For ".../view?usp=sharing" links, gdown needs fuzzy=True to extract the file ID; without it, it may save the Drive HTML page instead of the file.
#gdown.download(url="https://drive.google.com/file/d/16hHL4hLer3nWhX18STvr061LcHzfFgN2/view?usp=sharing", output="qa_버거킹_train.jsonl", quiet=False, fuzzy=True)
#gdown.download(url="https://drive.google.com/file/d/1nB6ERfII2ODEDS_1xY3C5TBZHeOMMcPI/view?usp=sharing", output="qa_버거킹_train_ko.jsonl", quiet=False, fuzzy=True)
#gdown.download(url="https://drive.google.com/file/d/11U7let6PY_YCJpgRT0Dpr5DXSO3Ceqep/view?usp=sharing", output="버거킹.pdf", quiet=False, fuzzy=True)
#gdown.download(url="https://drive.google.com/file/d/1BifMUDNX2v_4B7hb4YHv6eDinQqAjTZK/view?usp=sharing", output="Burger-King.pdf", quiet=False, fuzzy=True)

Downloading...
From: https://drive.google.com/file/d/16hHL4hLer3nWhX18STvr061LcHzfFgN2/view?usp=sharing
To: /content/qa_버거킹_train.jsonl
82.5kB [00:00, 308MB/s]
Downloading...
From: https://drive.google.com/file/d/1nB6ERfII2ODEDS_1xY3C5TBZHeOMMcPI/view?usp=sharing
To: /content/qa_버거킹_train_ko.jsonl
82.6kB [00:00, 306MB/s]
Downloading...
From: https://drive.google.com/file/d/11U7let6PY_YCJpgRT0Dpr5DXSO3Ceqep/view?usp=sharing
To: /content/버거킹.pdf
82.4kB [00:00, 310MB/s]
Downloading...
From: https://drive.google.com/file/d/1BifMUDNX2v_4B7hb4YHv6eDinQqAjTZK/view?usp=sharing
To: /content/Burger-King.pdf
83.2kB [00:00, 311MB/s]


'Burger-King.pdf'

&nbsp;

# Fine-tuning on Colab's free GPU

## Building the llama2 fine-tuning data

### Fine-tuning data for Meta llama2 7b

In [None]:
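from getpass import getpass

# Never hard-code a Hugging Face access token in a notebook; enter it at runtime instead.
os.environ["huggingface_token"] = getpass("Hugging Face token: ")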

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
burgerking_texts = []
with jsonlines.open("/content/drive/MyDrive/DL & Jobs/qa_버거킹_train.jsonl") as f:
    for line in f.iter():
        # Alternative: Llama-2 chat format
        # burgerking_texts.append(f'<s>[INST] {line["inputs"]} [/INST] {line["response"]} </s>')
        burgerking_texts.append(f'<s>### Instruction: \n{line["inputs"]} \n\n### Response: \n{line["response"]}</s>')

# Preview the dataset
print('Dataset preview')
print(burgerking_texts[:5])

# Build the dataset and save it to Easy Finetuner's example-datasets folder
burgerking_dataset = Dataset.from_dict({"text": burgerking_texts})
burgerking_dataset.save_to_disk('/content/easy_finetuner/example-datasets/burgerking_dataset')

# Check the dataset info
print('Dataset info')
print(burgerking_dataset)

Dataset preview
['<s>### Instruction: \nWhat makes the Monster X set spicy? \n\n### Response: \nThe Monster X set is made spicy by the intense flavor of the Diablo sauce.</s>', '<s>### Instruction: \nWhat is the cost of a regular French Fries at Burger King? \n\n### Response: \nThe cost of a regular French Fries at Burger King is 3,000 won.</s>', '<s>### Instruction: \nHow much does the Whole Shrimp Whopper set cost at Burger King? \n\n### Response: \nThe Whole Shrimp Whopper set at Burger King costs 11,300 won.</s>', '<s>### Instruction: \nWhat type of sauce is used in the Side Shrimp Burger at Burger King? \n\n### Response: \nThe Side Shrimp Burger at Burger King uses a sweet and sour sauce.</s>', "<s>### Instruction: \nWhat is the main ingredient in Burger King's Whopper? \n\n### Response: \nThe main ingredient in Burger King's Whopper is a freshly grilled beef patty.</s>"]


Dataset info
Dataset({
    features: ['text'],
    num_rows: 81
})
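
The training strings above put a space before each newline (`### Instruction: \n`), while `llama2_prompt`, used at inference time in the RAG section, writes `### Instruction:\n` without it. One way to keep training and inference in sync is a single template used for both. The sketch below is a hypothetical refactor (the names `PROMPT_TEMPLATE`, `format_prompt`, `format_train_example`, and `load_jsonl_texts` are illustrative, not part of Easy Finetuner) and parses JSONL with the standard `json` module instead of `jsonlines`.

```python
import json
from typing import Iterable, List

# Hypothetical shared template: the prompt part is reused verbatim at inference time,
# so training and inference see exactly the same header text.
PROMPT_TEMPLATE = "### Instruction:\n{inputs}\n\n### Response:"


def format_prompt(question: str) -> str:
    """Prompt given to the model at inference time."""
    return PROMPT_TEMPLATE.format(inputs=question)


def format_train_example(record: dict) -> str:
    """Full training text: prompt + answer, wrapped in <s> ... </s> markers."""
    return f"<s>{format_prompt(record['inputs'])}\n{record['response']}</s>"


def load_jsonl_texts(lines: Iterable[str]) -> List[str]:
    """Parse JSONL lines (one JSON object per line, blank lines skipped) into training strings."""
    return [format_train_example(json.loads(line)) for line in lines if line.strip()]


if __name__ == "__main__":
    sample = ['{"inputs": "Q1?", "response": "A1."}', "", '{"inputs": "Q2?", "response": "A2."}']
    for text in load_jsonl_texts(sample):
        print(repr(text))
```

To read a real file, pass an open handle, e.g. `load_jsonl_texts(open(path, encoding="utf-8"))`. If the template changes, the dataset has to be regenerated and the adapter retrained so it sees the new format.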


&nbsp;

### Fine-tuning data for Korean llama2 7b

In [None]:
burgerking_texts = []
with jsonlines.open("/content/drive/MyDrive/DL & Jobs/qa_버거킹_train_ko.jsonl") as f:
    for line in f.iter():
        burgerking_texts.append(f'<s>### Instruction: \n{line["inputs"]} \n\n### Response: \n{line["response"]}</s>')

# Preview the dataset
print('Dataset preview')
print(burgerking_texts[:5])

# Build the dataset and save it to Easy Finetuner's example-datasets folder
burgerking_dataset = Dataset.from_dict({"text": burgerking_texts})
burgerking_dataset.save_to_disk('/content/easy_finetuner/example-datasets/burgerking_dataset_ko')

# Check the dataset info
print('Dataset info')
print(burgerking_dataset)

Dataset preview
['<s>### Instruction: \n와퍼의 주요 재료는 무엇인가요? \n\n### Response: \n와퍼의 주요 재료는 순 쇠고기 패티와 다양한 야채입니다.</s>', '<s>### Instruction: \n코카-콜라L의 가격은 얼마인가요? \n\n### Response: \n코카-콜라L의 가격은 3,100원입니다.</s>', '<s>### Instruction: \n코코넛슈림프 6조각과 스위트칠리소스의 가격은 얼마인가요? \n\n### Response: \n코코넛슈림프 6조각과 스위트칠리소스의 가격은 4,800원입니다.</s>', '<s>### Instruction: \n크리미모짜볼5조각의 가격은 얼마인가요? \n\n### Response: \n크리미모짜볼5조각의 가격은 3,500원입니다.</s>', '<s>### Instruction: \n치킨버거 세트에는 어떤 소스로 고소함을 더했나요? \n\n### Response: \n치킨버거 세트에는 풍부한 마요 소스로 고소함을 더했습니다.</s>']


Dataset info
Dataset({
    features: ['text'],
    num_rows: 97
})


&nbsp;

## Running Easy Finetuner

- https://github.com/choijhyeok/easy_finetuner
- Built with reference to several existing Gradio apps (optimized for T4 GPUs).

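In [None]:
# Sample question to try in the Easy Finetuner UI
# ("What are the features of the Sprite Zero sold at Burger King?")
"버거킹에서 판매하는 스프라이트 제로의 특징은 무엇인가요?"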

In [None]:
%cd /content/easy_finetuner
!python app.py --share

/content/easy_finetuner
2024-02-02 13:25:25.552314: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-02-02 13:25:25.552374: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-02-02 13:25:25.553873: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
Running on local URL:  http://127.0.0.1:8000
Running on public URL: https://6d6746b68b11e6897a.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
You're using a LlamaTokenizerFast tokenizer. Please note that wit

&nbsp;

## RAG with the fine-tuned llama2

In [None]:
import os
import warnings

import torch
from peft import PeftModel
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    pipeline,
)

warnings.filterwarnings('ignore')


def llama2_prompt(input_text):
    # Same Instruction/Response layout as the training data
    return f'### Instruction:\n{input_text}\n\n### Response:'

def llama2_output(output_text):
    # Keep the text after "### Response:" and before any further "Instruction" header
    sep = output_text[0]['generated_text'].split('### Response:')[1].split('Instruction')[0]
    # Trim whitespace, a stray leading '.', and leftover '#' characters from a cut-off header
    sep = sep.strip().lstrip('.').rstrip('#').strip()
    # Keep only the first sentence (note: this also cuts at decimal points)
    sep = sep[:sep.find('.') + 1] if '.' in sep else sep
    return sep

adapter_name = 'beomi_llama-2-ko-7b_buger_sjb'  # LoRA adapter folder under easy_finetuner/lora


compute_dtype = torch.float16

# 4-bit NF4 quantization so the 7B base model fits in T4 memory
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=False
)

model = AutoModelForCausalLM.from_pretrained('beomi/llama-2-ko-7b', quantization_config=bnb_config, device_map={'': 0}, use_auth_token=os.environ["huggingface_token"])
model.config.use_cache = True  # the KV cache speeds up generation; turn it off only for training
model.config.pretraining_tp = 1

tokenizer = AutoTokenizer.from_pretrained('beomi/llama-2-ko-7b', trust_remote_code=True, use_auth_token=os.environ["huggingface_token"])
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# Attach the fine-tuned LoRA adapter to the quantized base model
model = PeftModel.from_pretrained(model, f'/content/easy_finetuner/lora/{adapter_name}')


In [None]:
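pipe = pipeline(task="text-generation",
                model=model,
                tokenizer=tokenizer,
                max_length=150,           # limit on prompt + generated tokens combined
                do_sample=True,
                temperature=0.1,          # low temperature: close to greedy decoding
                num_return_sequences=1,
                eos_token_id=tokenizer.eos_token_id,
                top_k=3,                  # sample only among the 3 most likely tokens
                # top_p=0.3,
                repetition_penalty=1.3,   # make tokens already in the context less likely
                framework='pt'
                # early_stopping=True
)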

Xformers is not installed correctly. If you want to use memory_efficient_attention to accelerate training use the following command to install Xformers
pip install xformers.
The model 'PeftModelForCausalLM' is not supported for text-generation. Supported models are ['BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM', 'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'CodeGenForCausalLM', 'CpmAntForCausalLM', 'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'ElectraForCausalLM', 'ErnieForCausalLM', 'FalconForCausalLM', 'GitForCausalLM', 'GPT2LMHeadModel', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTNeoForCausalLM', 'GPTNeoXForCausalLM', 'GPTNeoXJapaneseForCausalLM', 'GPTJForCausalLM', 'LlamaForCausalLM', 'MarianForCausalLM', 'MBartForCausalLM', 'MegaForCausalLM', 'MegatronBertForCausalLM', 'MusicgenForCausalLM', 'MvpForCausalLM', 'OpenLlamaForCausalLM', 'O
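
The sampling settings above (`repetition_penalty=1.3`, `temperature=0.1`, `top_k=3`) reshape the next-token distribution before a token is drawn. The pure-Python sketch below applies the same three steps to a toy logit vector. The penalty rule (divide positive logits, multiply negative ones, for tokens already in the context) follows the one used by transformers' `RepetitionPenaltyLogitsProcessor`; the helper names and toy numbers are hypothetical, and this is not the library's code.

```python
import math
from typing import Iterable, List, Sequence


def apply_repetition_penalty(logits: Sequence[float], seen: Iterable[int], penalty: float) -> List[float]:
    """Penalize tokens already in the context: divide positive logits, multiply negative ones."""
    out = list(logits)
    for i in set(seen):
        out[i] = out[i] / penalty if out[i] > 0 else out[i] * penalty
    return out


def next_token_probs(logits: Sequence[float], temperature: float, top_k: int,
                     seen: Iterable[int] = (), penalty: float = 1.0) -> List[float]:
    """Toy version of: repetition penalty -> temperature -> top-k filter -> softmax."""
    scores = apply_repetition_penalty(logits, seen, penalty)
    scores = [s / temperature for s in scores]             # temperature < 1 sharpens the distribution
    k = min(top_k, len(scores))
    kth = sorted(scores, reverse=True)[k - 1]              # k-th largest score
    keep = [s >= kth for s in scores]                      # top-k filter (ties may keep extras)
    m = max(s for s, kept in zip(scores, keep) if kept)    # subtract the max for numerical stability
    exps = [math.exp(s - m) if kept else 0.0 for s, kept in zip(scores, keep)]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.5, 0.1]  # hypothetical scores for a 4-token vocabulary
    print(next_token_probs(toy_logits, temperature=1.0, top_k=3))
    print(next_token_probs(toy_logits, temperature=0.1, top_k=3))  # nearly all mass on token 0
```

With `temperature=0.1` and `top_k=3`, sampling is almost greedy, which suits short factual answers such as menu prices.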

In [None]:
prompt = "스모키 바비큐 X의 독특한 특징은 무엇인가요?"  # "What is distinctive about the Smoky BBQ X?"
result = pipe(llama2_prompt(prompt))
print(llama2_output(result))

스모키 바비큐 X의 독특한 특징은 훈연향이 가득한 100% 순 쇠고기 패티와 진하고 달콤한 스모크향과 풍미가 느껴지는 치즈로 만든 소스의 조화입니다.


In [None]:
prompt = "버거킹에서 판매하는 스프라이트 제로의 특징은 무엇인가요?"  # "What are the features of the Sprite Zero sold at Burger King?"
result = pipe(llama2_prompt(prompt))
print(llama2_output(result))

스프라이트 제로는 0kcal이며, 칼로리 걱정 없이 즐길 수 있는 음료입니다.


In [None]:
prompt = "코코넛슈림프 3조각과 스위트칠리소스의 가격은 얼마인가요?"  # "How much are 3 Coconut Shrimp pieces with sweet chili sauce?"
result = pipe(llama2_prompt(prompt))
print(llama2_output(result))

코코넛슈림프 3조각과 스위트칠리소스의 가격은 5,800원입니다.


&nbsp;

In [None]:
%cd /content/
loader = PyPDFLoader('/content/drive/MyDrive/DL & Jobs/버거킹.pdf')
documents = loader.load()

/content


In [None]:
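output = []
# Clean up the text extracted from each PDF page
for page in documents:
    text = page.page_content
    text = re.sub(r"(\w+)-\n(\w+)", r"\1\2", text)   # rejoin words hyphenated across lines: "안녕-\n하세요" -> "안녕하세요"
    text = re.sub(r"(?<!\n\s)\n(?!\s\n)", " ", text.strip())  # replace a line break with a space unless preceded by "\n"+whitespace or followed by whitespace+"\n"
    text = re.sub(r"\n\s*\n", "\n\n", text)  # collapse each run of blank lines into a single "\n\n"
    output.append(text)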
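
The three substitutions above are easier to check on small strings. The sketch below wraps the same patterns in a hypothetical `clean_page` helper and runs them on toy inputs.

```python
import re


def clean_page(text: str) -> str:
    """Same three substitutions as the cleanup cell, wrapped in a hypothetical helper."""
    # 1) Rejoin words hyphenated across a line break: "foo-\nbar" -> "foobar"
    text = re.sub(r"(\w+)-\n(\w+)", r"\1\2", text)
    # 2) Replace a line break with a space unless it is preceded by "\n"+whitespace
    #    or followed by whitespace+"\n"
    text = re.sub(r"(?<!\n\s)\n(?!\s\n)", " ", text.strip())
    # 3) Collapse any run of blank lines into a single "\n\n"
    text = re.sub(r"\n\s*\n", "\n\n", text)
    return text


if __name__ == "__main__":
    for sample in ["안녕-\n하세요", "line one\nline two", "para one\n\n\npara two", "a\n\nb"]:
        print(repr(sample), "->", repr(clean_page(sample)))
```

Note the last sample: a plain two-newline break is flattened into two spaces by rule 2, while the three-newline break in the third sample survives as a paragraph break.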

In [None]:
doc_chunks = []

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=2000,  # maximum chunk length in characters
    separators=["\n\n", "\n", ".", "!", "?", ",", " ", ""],  # separators tried in order when splitting text into chunks
    chunk_overlap=0,  # number of characters shared by adjacent chunks
)

for page_num, page_text in enumerate(output):
    chunks = text_splitter.split_text(page_text)
    for chunk in chunks:
        # Record the PDF page index (not the chunk index) as the page metadata
        doc = Document(
            page_content=chunk, metadata={"source": '버거킹.pdf', "page": page_num}
        )
        doc_chunks.append(doc)
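
`RecursiveCharacterTextSplitter` tries the separators in order and falls back to a finer one only when a piece is still too long. The sketch below is a simplified re-implementation of that idea for illustration only: `recursive_split` is a hypothetical helper, and unlike LangChain it drops the separator at chunk boundaries and supports no overlap.

```python
from typing import List, Sequence


def recursive_split(text: str, chunk_size: int, separators: Sequence[str]) -> List[str]:
    """Simplified recursive character splitting (illustration only, not LangChain's code)."""
    if len(text) <= chunk_size:
        return [text] if text else []
    # Use the first (coarsest) separator that actually occurs in the text
    sep = next((s for s in separators if s and s in text), "")
    if sep == "":
        # No usable separator left: fall back to fixed-size character slices
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    finer = list(separators[list(separators).index(sep) + 1:])
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate                    # keep merging small pieces into one chunk
            continue
        if current:
            chunks.append(current)                 # flush the chunk that is full
        if len(piece) <= chunk_size:
            current = piece
        else:
            chunks.extend(recursive_split(piece, chunk_size, finer))  # too long: split more finely
            current = ""
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    print(recursive_split("aaa bbb ccc\n\nddd eee", 8, ["\n\n", " ", ""]))
```

Every chunk stays within `chunk_size`, and coarse boundaries (blank lines) are preferred over finer ones (spaces, then raw characters).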

In [None]:
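# Korean Sentence-BERT model used to embed both the chunks and the questions
embed_model = HuggingFaceEmbeddings(
    model_name="jhgan/ko-sbert-sts"
)
index = Chroma.from_documents(doc_chunks, embed_model)
retriever = index.as_retriever(search_kwargs={"k": 1})  # return only the single most similar chunk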

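
With `k=1`, the retriever hands the chain only the chunk whose embedding is closest to the question's embedding. The sketch below shows that nearest-neighbour step on toy 3-dimensional vectors using cosine similarity. This is a simplification: Chroma uses L2 distance by default and the real vectors come from `jhgan/ko-sbert-sts`, so the vectors, document names, and metric here are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], docs: Dict[str, Sequence[float]], k: int = 1) -> List[Tuple[str, float]]:
    """Return the k document ids most similar to the query, best first."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    toy_index = {"burgers": [0.9, 0.1, 0.0], "drinks": [0.1, 0.9, 0.1], "sides": [0.2, 0.2, 0.9]}
    print(top_k([0.2, 0.95, 0.0], toy_index, k=1))  # a drink-like query
```

Because only one chunk is retrieved, an answer can only be as good as that single chunk; raising `k` trades prompt length for recall.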

In [None]:
from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline


pipe = pipeline(task="text-generation",
                model=model,
                tokenizer=tokenizer,
                max_length=780,           # larger limit: the retrieved chunk is part of the prompt
                do_sample=True,
                temperature=0.1,
                num_return_sequences=1,
                eos_token_id=tokenizer.eos_token_id,
                top_k=3,
                # top_p=0.3,
                repetition_penalty=1.3,
                framework='pt'
                # early_stopping=True
)

# Wrap the transformers pipeline so LangChain can call it as an LLM
llm = HuggingFacePipeline(pipeline=pipe)

The model 'PeftModelForCausalLM' is not supported for text-generation. Supported models are ['BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM', 'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'CodeGenForCausalLM', 'CpmAntForCausalLM', 'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'ElectraForCausalLM', 'ErnieForCausalLM', 'FalconForCausalLM', 'GitForCausalLM', 'GPT2LMHeadModel', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTNeoForCausalLM', 'GPTNeoXForCausalLM', 'GPTNeoXJapaneseForCausalLM', 'GPTJForCausalLM', 'LlamaForCausalLM', 'MarianForCausalLM', 'MBartForCausalLM', 'MegaForCausalLM', 'MegatronBertForCausalLM', 'MusicgenForCausalLM', 'MvpForCausalLM', 'OpenLlamaForCausalLM', 'OpenAIGPTLMHeadModel', 'OPTForCausalLM', 'PegasusForCausalLM', 'PLBartForCausalLM', 'ProphetNetForCausalLM', 'QDQBertLMHeadModel', 'ReformerModelWithLMHead', 'RemBertForCausal

In [None]:

def bkchain_output(text):
    # Keep only the model's reply: drop everything up to "Machine:" and any later "Human:" turn
    text = text.split('Machine:')[1] if 'Machine:' in text else text
    text = text.split('Human:')[0] if 'Human:' in text else text
    return text.strip()



system_template = """To answer the question at the end, use the following context. If you don't know the answer, just say you don't know and don't try to make up an answer.
I want you to act as my Burger King menu recommender. I will tell you my budget, and you will suggest what to buy.

Please answer in Korean.


{summaries}
"""
messages = [
    SystemMessagePromptTemplate.from_template(system_template),
    HumanMessagePromptTemplate.from_template("{question}")
]
prompt = ChatPromptTemplate.from_messages(messages)


chain_type_kwargs = {"prompt": prompt}
bk_chain = RetrievalQAWithSourcesChain.from_chain_type(
    llm,
    chain_type="stuff",
    retriever=retriever,
    chain_type_kwargs=chain_type_kwargs,
    return_source_documents=True,
    reduce_k_below_max_tokens=True  # drop whole retrieved documents (no truncation) until their token count fits under the chain's max_tokens_limit
)
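
With `chain_type="stuff"`, the retrieved chunks are pasted ("stuffed") into the `{summaries}` slot of one prompt, and `reduce_k_below_max_tokens=True` first removes trailing documents until their combined token count is under the chain's `max_tokens_limit`. The sketch below mimics that flow under stated assumptions: the helper names are hypothetical, a whitespace word count stands in for a real tokenizer, and LangChain's actual implementation differs in detail (it builds chat messages and counts tokens through the LLM).

```python
from typing import Callable, List


def reduce_docs_below_limit(docs: List[str], count_tokens: Callable[[str], int], max_tokens: int) -> List[str]:
    """Drop documents from the end (least relevant first) until the total fits.
    Documents are removed whole, never truncated."""
    tokens = [count_tokens(d) for d in docs]
    n = len(docs)
    total = sum(tokens)
    while n > 0 and total > max_tokens:
        n -= 1
        total -= tokens[n]
    return docs[:n]


def stuff_prompt(template: str, docs: List[str], question: str) -> str:
    """'Stuff' all remaining documents into one {summaries} slot, then append the question."""
    return template.format(summaries="\n\n".join(docs)) + "\n" + question


if __name__ == "__main__":
    word_count = lambda s: len(s.split())  # stand-in for a real tokenizer
    retrieved = ["a b c d", "e f", "g h i"]
    kept = reduce_docs_below_limit(retrieved, word_count, max_tokens=6)
    print(stuff_prompt("Context:\n{summaries}", kept, "Q?"))
```

With `k=1`, a single chunk that exceeds the limit would be dropped entirely and the model would answer without any context, so `chunk_size` and `max_tokens_limit` need to be chosen together.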

In [None]:
result = bk_chain({"question": "스모키 바비큐 X의 독특한 특징은 무엇인가요?"})  # "What is distinctive about the Smoky BBQ X?"

print(f"Question: {result['question']}")
print()
print(f"Answer: {bkchain_output(result['answer'])}")

Token indices sequence length is longer than the specified maximum sequence length for this model (1699 > 1024). Running this sequence through the model will result in indexing errors


Question: 스모키 바비큐 X의 독특한 특징은 무엇인가요?

Answer: 스모키 바비큐 X는 강력한 불 맛을 느낄 수 있는 제품입니다. 또한, 스모크 향미가 느껴지는 육즙 가득한 소고기 패티와 함께 다양한 재료들이 어우러져 있어 더욱 풍성한 맛을 즐길 수 있습니다.


In [None]:
result = bk_chain({"question": "버거킹에서 판매하는 스프라이트 제로의 특징은 무엇인가요?"})  # "What are the features of the Sprite Zero sold at Burger King?"

print(f"Question: {result['question']}")
print()
print(f"Answer: {bkchain_output(result['answer'])}")

Question: 버거킹에서 판매하는 스프라이트 제로의 특징은 무엇인가요?

Answer: 스프라이트 제로는 칼로리가 낮으며 설탕 함량이 전혀 없습니다. 또한 인공감미료도 첨가되지 않았습니다.


In [None]:
result = bk_chain({"question": "롱치킨버거에는 어떤 소스와 야채가 사용되었나요?"})  # "Which sauce and vegetables are used in the Long Chicken Burger?"

print(f"Question: {result['question']}")
print()
print(f"Answer: {bkchain_output(result['answer'])}")

Question: 롱치킨버거에는 어떤 소스와 야채가 사용되었나요?

Answer: 롱치킨버거는 부드럽고 달콤한 마요네즈 소스에 신선한 양상추를 곁들여 더욱더욱 푸짐합니다.


&nbsp;

### Local LLM Gradio

In [None]:
import logging

import gradio as gr

def reset_state():
    # Clear the chat window and the stored history
    return [], [], "Reset Done"

def reset_textbox():
    # Clear the input box and the status message
    return gr.update(value=""), ""

def transfer_input(inputs):
    # Move the typed text into the user_question state, clear the textbox, keep Send visible
    return (
        inputs,
        gr.update(value=""),
        gr.Button.update(visible=True),
    )

title = """<h1 align="left" style="min-width:350px; margin-top:0;"> <img src="https://lh3.google.com/u/0/d/1txdmhh6pWjdJBpqGBRMdC0qQX2f7pzxI=w2020-h952-iv1" width="32px" style="display: inline"> AIF Burger King chat </h1>"""
description_top = """\
<div align="left">
<p></p>
<p>
</p>
</div>
"""

small_and_beautiful_theme = gr.themes.Soft(
        primary_hue=gr.themes.Color(
            c50="#02C160",
            c100="rgba(2, 193, 96, 0.2)",
            c200="#02C160",
            c300="rgba(2, 193, 96, 0.32)",
            c400="rgba(2, 193, 96, 0.32)",
            c500="rgba(2, 193, 96, 1.0)",
            c600="rgba(2, 193, 96, 1.0)",
            c700="rgba(2, 193, 96, 0.32)",
            c800="rgba(2, 193, 96, 0.32)",
            c900="#02C160",
            c950="#02C160",
        ),
        secondary_hue=gr.themes.Color(
            c50="#576b95",
            c100="#576b95",
            c200="#576b95",
            c300="#576b95",
            c400="#576b95",
            c500="#576b95",
            c600="#576b95",
            c700="#576b95",
            c800="#576b95",
            c900="#576b95",
            c950="#576b95",
        ),
        neutral_hue=gr.themes.Color(
            name="gray",
            c50="#f9fafb",
            c100="#f3f4f6",
            c200="#e5e7eb",
            c300="#d1d5db",
            c400="#B2B2B2",
            c500="#808080",
            c600="#636363",
            c700="#515151",
            c800="#393939",
            c900="#272727",
            c950="#171717",
        ),
        radius_size=gr.themes.sizes.radius_sm,
    ).set(
        button_primary_background_fill="#06AE56",
        button_primary_background_fill_dark="#06AE56",
        button_primary_background_fill_hover="#07C863",
        button_primary_border_color="#06AE56",
        button_primary_border_color_dark="#06AE56",
        button_primary_text_color="#FFFFFF",
        button_primary_text_color_dark="#FFFFFF",
        button_secondary_background_fill="#F2F2F2",
        button_secondary_background_fill_dark="#2B2B2B",
        button_secondary_text_color="#393939",
        button_secondary_text_color_dark="#FFFFFF",
        # background_fill_primary="#F7F7F7",
        # background_fill_primary_dark="#1F1F1F",
        block_title_text_color="*primary_500",
        block_title_background_fill="*primary_100",
        input_background_fill="#F6F6F6",
    )

with open("/content/easy_finetuner/custom.css", "r", encoding="utf-8") as f:
    customCSS = f.read()

logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s [%(levelname)s] [%(filename)s:%(lineno)d] %(message)s",
)


def predict(input_text, history):
    # Run the RAG chain and append the (question, answer) pair to the chat history
    result = bk_chain({"question": input_text})
    history = history + [(input_text, bkchain_output(result['answer']))]
    return history, history, "Generate: Success"


with gr.Blocks(css=customCSS, theme=small_and_beautiful_theme) as demo:
    history = gr.State([])
    user_question = gr.State("")
    with gr.Row():
        gr.HTML(title)
        status_display = gr.Markdown("Success", elem_id="status_display")
    gr.Markdown(description_top)
    with gr.Row(scale=1).style(equal_height=True):
        with gr.Column(scale=5):
            with gr.Row(scale=1):
                chatbot = gr.Chatbot(avatar_images=('https://yt3.googleusercontent.com/_JbQDtNPfI8h6RPW_9Og5qlGhSBhpMp5qX3JR7iNeSC9XZL4btbNE3dFB4ec77tauPA-nLGQTQ=s900-c-k-c0x00ffffff-no-rj', 'https://github.com/jmorganca/ollama/assets/3325447/0d0b44e2-8f4a-4e99-9b52-a5c1c741c8f7'),elem_id="chuanhu_chatbot").style(height="100%")
            with gr.Row(scale=1):
                with gr.Column(scale=12):
                    user_input = gr.Textbox(
                        show_label=False, placeholder="Enter text"
                    ).style(container=False)
                with gr.Column(min_width=70, scale=1):
                    submitBtn = gr.Button("Send")
                with gr.Column(min_width=70, scale=1):
                    cancelBtn = gr.Button("Stop")
            with gr.Row(scale=1):
                emptyBtn = gr.Button(
                    "🧹 New Conversation",
                )


    predict_args = dict(
        fn=predict,
        inputs=[
            user_question,
            history
        ],
        outputs=[chatbot, history, status_display],
        show_progress=True,
    )

    reset_args = dict(
        fn=reset_textbox, inputs=[], outputs=[user_input, status_display]
    )

    # Chatbot
    transfer_input_args = dict(
        fn=transfer_input, inputs=[user_input], outputs=[user_question, user_input, submitBtn], show_progress=True
    )



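    predict_event1 = user_input.submit(**transfer_input_args).then(**predict_args)
    predict_event2 = submitBtn.click(**transfer_input_args).then(**predict_args)

    # Wire the Stop button to cancel a running generation (requires the queue, enabled below)
    cancelBtn.click(fn=None, inputs=None, outputs=None, cancels=[predict_event1, predict_event2])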

    gr.Markdown("<h2>Burger King chat demo examples</h2>")
    gr.Examples(
        examples=[
            "스모키 바비큐 X의 독특한 특징은 무엇인가요?",
            "버거킹에서 판매하는 스프라이트 제로의 특징은 무엇인가요?",
            "롱치킨버거에는 어떤 소스와 야채가 사용되었나요?"
                  ],
        inputs=user_input
    )


    emptyBtn.click(
        reset_state,
        outputs=[chatbot, history, status_display],
        show_progress=True,
    )
    emptyBtn.click(**reset_args)


demo.queue(concurrency_count=1).launch()

Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://a1649d88765e70b2f3.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)


