# 1. Install Packages

In [1]:
!pip install -q langchain transformers langchain-huggingface langchain-community langchain-core langchain-text-splitters bitsandbytes docx2txt langchain-chroma

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
langchain-upstage 0.3.0 requires tokenizers<0.20.0,>=0.19.1, but you have tokenizers 0.20.1 which is incompatible.
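The resolver warning concerns `langchain-upstage`, which none of the cells in this notebook import, so it is likely harmless here. To confirm which versions actually ended up installed, a small standard-library sketch (the package names are simply the ones from the warning):

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Packages named in the pip resolver warning above
    for name in ("tokenizers", "langchain-upstage"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```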


# 2. Evaluating LLM Performance with Phi-3
  - English-language models on Hugging Face generally outperform the Korean-language models available there.
  - Before committing to a full Retrieval-Augmented Generation (RAG) pipeline, it is worth first checking that the basic model setup below works.
  - We then load a 4-bit quantized version of the same model to see how quantization affects its answers and resource usage.

In [2]:
from langchain_huggingface import ChatHuggingFace, HuggingFacePipeline

llm = HuggingFacePipeline.from_model_id(
    model_id="microsoft/Phi-3-mini-4k-instruct",
    task="text-generation",
    pipeline_kwargs=dict(
        max_new_tokens=512,
        do_sample=False,
        repetition_penalty=1.03,
    ),
)


In [None]:
chat_model = ChatHuggingFace(llm=llm)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [None]:
ai_message = chat_model.invoke("what is huggingface?")

You are not running the flash-attention implementation, expect numerical differences.


In [None]:
ai_message.content

"<|user|>\nwhat is huggingface?<|end|>\n<|assistant|>\n Hugging Face is a company and community that provides tools for building and deploying state-of-the-art machine learning models, particularly in the field of natural language processing (NLP). It was founded by Ilya Sutskever, Opus Yang, and Ilya Polosukhin. The most notable tool provided by Hugging Face is the Transformers library, which offers pre-trained models like BERT, GPT-2, and T5, along with utilities to fine-tune these models on custom datasets.\n\nThe Hugging Face ecosystem includes:\n\n1. **Transformers**: A Python package providing thousands of pre-trained models to perform tasks such as text classification, question answering, summarization, translation, and more.\n\n2. **Datasets**: A collection of publicly available datasets used for training and evaluating NLP models.\n\n3. **Training**: Tools and resources to help users train their own models using the pre-trained models from the Transformers library.\n\n4. **Mod

In [None]:
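from transformers import BitsAndBytesConfig

# 4-bit quantization settings, passed to the model loader via model_kwargs
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,                 # store the weights in 4 bits
    bnb_4bit_quant_type="nf4",         # use the NormalFloat4 data type
    bnb_4bit_compute_dtype="float16",  # dtype the weights are dequantized to for computation
    bnb_4bit_use_double_quant=True,    # also quantize the quantization constants
)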
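For a rough sense of what 4-bit loading saves, weight memory is approximately parameters × bits per parameter / 8 bytes. The sketch below assumes Phi-3-mini has about 3.8 billion parameters; it ignores activations, the KV cache, the quantization constants, and any layers bitsandbytes leaves unquantized, so real usage will be somewhat higher.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight memory in gigabytes (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    phi3_mini_params = 3.8e9  # assumption: approximate parameter count of Phi-3-mini
    for label, bits in [("16-bit", 16), ("4-bit", 4)]:
        print(f"{label}: ~{weight_memory_gb(phi3_mini_params, bits):.1f} GB")
```

To measure instead of estimate, transformers models provide `get_memory_footprint()`; with `HuggingFacePipeline` the loaded model should be reachable as `quantized_llm.pipeline.model`, though we have not run that check here.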

In [None]:
quantized_llm = HuggingFacePipeline.from_model_id(
    model_id="microsoft/Phi-3-mini-4k-instruct",
    task="text-generation",
    pipeline_kwargs=dict(
        max_new_tokens=512,
        do_sample=False,
        repetition_penalty=1.03,
    ),
    model_kwargs={"quantization_config": quantization_config},
)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
`low_cpu_mem_usage` was None, now set to True since model is quantized.


In [None]:
quantized_chat_model = ChatHuggingFace(llm=quantized_llm)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [None]:
quantized_ai_message = quantized_chat_model.invoke("what is huggingface?")
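A single answer from each model says little about speed. A minimal timing sketch, using only the standard library, that can wrap any callable such as `chat_model.invoke` or `quantized_chat_model.invoke`:

```python
import time
from typing import Any, Callable, Tuple


def time_call(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn(*args, **kwargs) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


if __name__ == "__main__":
    # Stand-in for a model call; in the notebook pass chat_model.invoke instead.
    result, seconds = time_call(sum, [1, 2, 3])
    print(result, f"{seconds:.6f}s")
```

In the notebook this would look like `_, t = time_call(quantized_chat_model.invoke, "what is huggingface?")`. Run each model more than once, since the first call can include warm-up. Also note that bitsandbytes 4-bit inference is often no faster than 16-bit, because weights are dequantized on the fly; the main gain is usually memory.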

In [None]:
quantized_ai_message.content

'<|user|>\nwhat is huggingface?<|end|>\n<|assistant|>\n Hugging Face is a company and community that provides tools for building AI models, particularly in the field of natural language processing (NLP). It offers an extensive library of pre-trained models and APIs that can be used to perform various NLP tasks such as text generation, translation, summarization, and more. The company also focuses on making AI accessible to developers by providing user-friendly interfaces and documentation.'
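Both `content` strings above begin with the full chat-formatted prompt (`<|user|> ... <|assistant|>`) before the answer, because the text-generation pipeline returns the prompt together with the completion unless told otherwise. A minimal sketch for keeping only the reply, assuming the Phi-3 `<|assistant|>` marker seen in these outputs:

```python
def strip_prompt(content: str, marker: str = "<|assistant|>") -> str:
    """Return only the text after the last assistant marker.

    If the marker is absent, the content is returned unchanged (whitespace-stripped).
    """
    _, sep, reply = content.rpartition(marker)
    return reply.strip() if sep else content.strip()


if __name__ == "__main__":
    sample = "<|user|>\nwhat is huggingface?<|end|>\n<|assistant|>\n Hugging Face is a company."
    print(strip_prompt(sample))  # prints "Hugging Face is a company."
```

In the notebook: `strip_prompt(quantized_ai_message.content)`. The transformers text-generation pipeline also has a `return_full_text` option, and passing `return_full_text=False` through `pipeline_kwargs` may avoid the echo altogether; we have not verified how that interacts with `ChatHuggingFace`.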

# 3. Try a Korean model to build a full RAG pipeline

In [None]:
from langchain_huggingface import ChatHuggingFace, HuggingFacePipeline

quantized_korean_llm = HuggingFacePipeline.from_model_id(
    model_id="upstage/SOLAR-10.7B-v1.0",
    task="text-generation",
    pipeline_kwargs=dict(
        max_new_tokens=512,
        do_sample=False,
        repetition_penalty=1.03,
    ),
    model_kwargs={"quantization_config": quantization_config},
)

`low_cpu_mem_usage` was None, now set to True since model is quantized.


In [None]:
quantized_korean_llm = ChatHuggingFace(llm=quantized_korean_llm)

In [None]:
korean_ai_message = quantized_korean_llm.invoke("인프런은 어떤 회사인가요?")

No chat template is set for this tokenizer, falling back to a default class-level template. This is very error-prone, because models are often trained with templates different from the class default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which point any code depending on them will stop working. We recommend setting a valid chat template before then to ensure that this model continues working without issues.
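The warning means the SOLAR tokenizer has no chat template, so a default class-level template is used instead. `upstage/SOLAR-10.7B-v1.0` is also a pretrained base model rather than a chat-tuned one, and base models prompted this way can easily fall into repetitive loops. A crude heuristic for flagging such output, our own sketch and not part of LangChain or transformers, is the share of word n-grams that repeat an earlier one:

```python
def repeated_ngram_ratio(text: str, n: int = 3) -> float:
    """Fraction of word n-grams in `text` that duplicate an earlier n-gram.

    0.0 means no n-gram repeats; values close to 1.0 indicate looping output.
    """
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    return 1 - len(set(ngrams)) / len(ngrams)


if __name__ == "__main__":
    looping = "인프런은 " + "㈜인프런을 " * 20
    normal = "Hugging Face is a company and community that provides tools"
    print(round(repeated_ngram_ratio(looping), 2))
    print(repeated_ngram_ratio(normal))  # 0.0
```

Applied to `korean_ai_message.content`, a high ratio would flag the answer for review. Upstage also publishes an instruction-tuned `upstage/SOLAR-10.7B-Instruct-v1.0`, which may behave better in a chat setting; we have not tried it here.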


In [None]:
korean_ai_message.content

"<s>[INST] <<SYS>>\nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\n\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.\n<</SYS>>\n\n인프런은 어떤 회사인가요? [/INST]\n\n[INST]\n인프런은 2019년 1월 1일에 ㈜인프런이 ㈜인프런을 ㈜인프런을 ㈜인프런을 ㈜인프런을 ㈜인프런을 ㈜인프런을 ㈜인프런을 ㈜인프런을 ㈜인프런을 ㈜인프런을 ㈜인프런을 ㈜인프런을 ㈜인프런을 ㈜인프런을 ㈜인프런을 ㈜인프런을 ㈜인프런을 ㈜인프런을 ㈜인프런을 ㈜인프런을 ㈜인프런을 ㈜인프런을 ㈜인프런을 ㈜인프런을 ㈜인프런을 ㈜인프런을 ㈜인프런을 ㈜인프런을 ㈜인프런을 ㈜인프런을 ㈜인프런을 ㈜인프런을 ㈜인프런을 ㈜인프런을 ㈜인프런을 ㈜인프런을 ㈜인프런을 ㈜인프런을 ㈜인프런을 ㈜인프런을 ㈜인프런을 ㈜인프런을 ㈜인프런을 ㈜인프런을 ㈜인프런을 ㈜인프런을 ㈜인프런을 ㈜인프�"

# 4. Insert data into a local Chroma vector store

In [None]:
from langchain_community.document_loaders import Docx2txtLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1500,
    chunk_overlap=200,
)

loader = Docx2txtLoader('./tax_with_markdown.docx')
document_list = loader.load_and_split(text_splitter=text_splitter)
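`chunk_size` and `chunk_overlap` are measured in characters by default (the splitter uses `len`). `RecursiveCharacterTextSplitter` first tries to split on paragraph breaks, then line breaks, then spaces, so its boundaries follow the text's structure. The simplified fixed-window splitter below ignores separators entirely and only illustrates how overlap makes consecutive chunks share text; it is not LangChain's algorithm.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into fixed-size character windows; consecutive windows share
    `chunk_overlap` characters. A simplified illustration only."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
    # ['abcd', 'defg', 'ghij']
```

With the settings above (1,500 and 200), each new chunk would start 1,300 characters after the previous one, so a sentence cut at a boundary still appears whole in one of the two neighbouring chunks as long as it is shorter than the overlap.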

In [None]:
from langchain_huggingface.embeddings import HuggingFaceEmbeddings

embedding = HuggingFaceEmbeddings(model_name='snunlp/KR-SBERT-V40K-klueNLI-augSTS')
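Retrieval embeds the query with the same model and returns the stored chunks whose vectors are closest. Chroma collections use squared L2 distance by default unless configured otherwise (the `hnsw:space` setting). A toy, dependency-free sketch of that ranking, with made-up 3-dimensional vectors standing in for real KR-SBERT embeddings:

```python
from typing import List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(query: Sequence[float], vectors: List[Sequence[float]], k: int = 1) -> List[Tuple[int, float]]:
    """Return (index, distance) pairs for the k stored vectors nearest the query."""
    scored = [(i, squared_l2(query, v)) for i, v in enumerate(vectors)]
    return sorted(scored, key=lambda pair: pair[1])[:k]


if __name__ == "__main__":
    # Made-up embeddings for three chunks; real ones come from the embedding model
    stored = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
    print(top_k([0.9, 0.1, 0.0], stored, k=1))
```

In the notebook, `database.similarity_search_with_score(query, k=1)` returns documents together with Chroma's distance scores, which is a quick way to inspect what the retriever will hand to the model.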

In [None]:
from langchain_chroma import Chroma

# When storing the data for the first time
# database = Chroma.from_documents(documents=document_list, embedding=embedding, collection_name='chroma-tax', persist_directory="./chroma_markdown")

# When reusing data that has already been stored
database = Chroma(collection_name='chroma-tax', persist_directory="./chroma_markdown", embedding_function=embedding)
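Instead of commenting lines in and out, the choice can be made at runtime. The helper below is a hypothetical sketch that treats a missing or empty `persist_directory` as "not yet indexed"; it does not inspect the collection itself, so a directory holding some other collection would still count as populated.

```python
from pathlib import Path


def needs_indexing(persist_directory: str) -> bool:
    """Hypothetical helper: True if the directory is missing or empty."""
    path = Path(persist_directory)
    return not path.is_dir() or not any(path.iterdir())


# Usage in the notebook (requires the objects defined above):
# if needs_indexing("./chroma_markdown"):
#     database = Chroma.from_documents(documents=document_list, embedding=embedding,
#                                      collection_name="chroma-tax", persist_directory="./chroma_markdown")
# else:
#     database = Chroma(collection_name="chroma-tax", persist_directory="./chroma_markdown",
#                       embedding_function=embedding)

if __name__ == "__main__":
    print(needs_indexing("./chroma_markdown"))
```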

# 5. Use `create_retrieval_chain` to create a full RAG pipeline
  - This example retrieves only one chunk (`k=1`) to keep the prompt within the model's token limit.
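The budget behind `k=1` can be made explicit: the prompt template, the question, every retrieved chunk, and the generated answer must all fit in the model's context window. A sketch of that arithmetic; the example numbers are placeholders, not measurements. The 4,096-token window is an assumption (check `max_position_embeddings` in the model config), and the token count of a 1,500-character Korean chunk depends heavily on the tokenizer.

```python
def max_chunks(context_window: int, max_new_tokens: int, prompt_tokens: int, tokens_per_chunk: int) -> int:
    """How many retrieved chunks fit after reserving room for the prompt and the answer."""
    available = context_window - max_new_tokens - prompt_tokens
    return max(0, available // tokens_per_chunk)


if __name__ == "__main__":
    # Placeholder numbers: assumed window, max_new_tokens from the pipeline, guessed sizes
    print(max_chunks(context_window=4096, max_new_tokens=512, prompt_tokens=300, tokens_per_chunk=2000))  # 1
```

To replace the guesses with real counts, the tokenizer loaded by the pipeline can be used to encode a chunk and the length of the result compared against the budget.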

In [None]:
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain
from langchain import hub

retrieval_qa_chat_prompt = hub.pull("langchain-ai/retrieval-qa-chat")
retriever = database.as_retriever(search_kwargs={"k": 1})
combine_docs_chain = create_stuff_documents_chain(
    quantized_korean_llm, retrieval_qa_chat_prompt
)
retrieval_chain = create_retrieval_chain(retriever, combine_docs_chain)
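`create_stuff_documents_chain` "stuffs" the retrieved documents into the prompt: it formats each document's `page_content`, joins them with a separator (two newlines by default), and passes the result to the prompt as the `context` variable, while `create_retrieval_chain` supplies the user's question as `input`. A plain-Python sketch of that idea, using a hypothetical `Doc` class and template rather than LangChain's own types or the hub prompt's actual wording:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def stuff_prompt(docs: List[Doc], question: str, template: str, separator: str = "\n\n") -> str:
    """Join document texts into one context string and fill the prompt template."""
    context = separator.join(doc.page_content for doc in docs)
    return template.format(context=context, input=question)


if __name__ == "__main__":
    # Hypothetical template; the hub prompt's real wording differs.
    template = "Answer using only this context:\n{context}\n\nQuestion: {input}"
    docs = [Doc("chunk one"), Doc("chunk two")]
    print(stuff_prompt(docs, "What is the tax?", template))
```

The chain's output dict holds the original `input`, the retrieved `context` documents, and the generated `answer`; with `k=1`, `context` contains a single `Document`.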

In [None]:
rag_chain_message = retrieval_chain.invoke({"input": "연봉 5천만원인 직장인의 소득세는 얼마인가요?"})

No chat template is set for this tokenizer, falling back to a default class-level template. This is very error-prone, because models are often trained with templates different from the class default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which point any code depending on them will stop working. We recommend setting a valid chat template before then to ensure that this model continues working without issues.


In [None]:
rag_chain_message

{'input': '연봉 5천만원인 직장인의 소득세는 얼마인가요?',
 'context': [Document(metadata={'source': './tax_with_markdown.docx'}, page_content='④ 제1항에도 불구하고 추가 납부세액이 10만원을 초과하는 경우 원천징수의무자는 해당 과세기간의 다음 연도 2월분부터 4월분의 근로소득을 지급할 때까지 추가 납부세액을 나누어 원천징수할 수 있다.<신설 2015. 3. 10.>\n\n[전문개정 2010. 12. 27.]\n\n\n\n제137조의2(2인 이상으로부터 근로소득을 받는 사람에 대한 근로소득세액의 연말정산) ① 2인 이상으로부터 근로소득을 받는 사람(일용근로자는 제외한다)이 대통령령으로 정하는 바에 따라 주된 근무지와 종된 근무지를 정하고 종된 근무지의 원천징수의무자로부터 제143조제2항에 따른 근로소득 원천징수영수증을 발급받아 해당 과세기간의 다음 연도 2월분의 근로소득을 받기 전에 주된 근무지의 원천징수의무자에게 제출하는 경우 주된 근무지의 원천징수의무자는 주된 근무지의 근로소득과 종된 근무지의 근로소득을 더한 금액에 대하여 제137조에 따라 소득세를 원천징수한다.\n\n② 제1항에 따라 근로소득 원천징수영수증을 발급하는 종된 근무지의 원천징수의무자는 해당 근무지에서 지급하는 해당 과세기간의 근로소득금액에 기본세율을 적용하여 계산한 종합소득산출세액에서 제134조제1항에 따라 원천징수한 세액을 공제한 금액을 원천징수한다.\n\n③ 제150조제3항에 따라 납세조합에 의하여 소득세가 징수된 제127조제1항제4호 각 목에 따른 근로소득과 다른 근로소득이 함께 있는 사람(일용근로자는 제외한다)에 대한 근로소득세액의 연말정산에 관하여는 제1항 및 제2항을 준용한다.\n\n[본조신설 2010. 12. 27.]\n\n\n\n제138조(재취직자에 대한 근로소득세액의 연말정산) ① 해당 과세기간 중도에 퇴직하고 새로운 근무지에 취직한 근로소득자가 종전 근무지에서 해당 과세기간의 1월부터 퇴직한 날이