<a href="https://colab.research.google.com/github/rickiepark/MLQandAI/blob/main/supplementary/q18-using-llms/03_retrieval-augmented-generation/retrieval-augmented-generation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# RAG Example Using LlamaIndex

<img src="https://github.com/rickiepark/MLQandAI/blob/main/supplementary/q18-using-llms/03_retrieval-augmented-generation/images/rag-1.webp?raw=1" width=700>

In [None]:
!pip install llama_index llama-index-embeddings-huggingface llama-index-llms-huggingface

In [None]:
!pip install --quiet watermark

%load_ext watermark
%watermark -p torch,llama_index

### 1) Load the embedding model and the LLM

***To use mistralai/Mistral-7B-Instruct-v0.2, you must first request access on the [Hugging Face site](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2).***

In [None]:
from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.huggingface import HuggingFaceLLM

import torch

Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

Settings.llm = HuggingFaceLLM(
    context_window=2048,
    max_new_tokens=256,
    # Greedy decoding: with do_sample=False a temperature setting has no effect.
    generate_kwargs={"do_sample": False},
    model_name="mistralai/Mistral-7B-Instruct-v0.2",
    tokenizer_name="mistralai/Mistral-7B-Instruct-v0.2",
    device_map="auto",

    model_kwargs={
        "torch_dtype": torch.bfloat16,
        # Set to True because this example runs on a small GPU with limited memory.
        # Set to False to speed things up on a high-end GPU.
        "offload_buffers": True,
    }
)

Settings.chunk_size = 512
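
`Settings.chunk_size = 512` caps the size of the text chunks that are embedded and later retrieved. As a rough illustration only, the sketch below splits text into fixed-size chunks with a small overlap. It counts whitespace-separated words rather than tokenizer tokens, and `split_into_chunks` with its `chunk_size` and `overlap` parameters is a hypothetical helper, not the splitter LlamaIndex actually uses.

```python
def split_into_chunks(text, chunk_size=512, overlap=20):
    """Split text into chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    boundary still appears whole in one of them. (Hypothetical helper:
    words stand in for tokens here.)
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


# Toy input: 1200 placeholder words.
sample = " ".join(f"w{i}" for i in range(1200))
chunks = split_into_chunks(sample, chunk_size=512, overlap=20)
print(len(chunks), [len(c.split()) for c in chunks])
```

Smaller chunks tend to make retrieval more precise but give the LLM less surrounding context per hit, so the value is a trade-off rather than a fixed rule.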


### 2) Load the data and read the documents

In [None]:
!wget https://raw.githubusercontent.com/rickiepark/MLQandAI/refs/heads/main/supplementary/q18-using-llms/03_retrieval-augmented-generation/sample-data/Basic-Scientific-Food-Preparation-Lab-Manual.txt

In [None]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

data_dir = "./"

documents = SimpleDirectoryReader(data_dir).load_data()

# Sanity check: list the files that were actually read
unique_docs = set(d.metadata["file_name"] for d in documents)
print(f"Read documents: {unique_docs}")

Read documents: {'Basic-Scientific-Food-Preparation-Lab-Manual.pdf', 'Basic-Scientific-Food-Preparation-Lab-Manual.txt'}


### 3) Build the vector database

- By default, `VectorStoreIndex` keeps the embeddings in an in-memory vector store
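
To make the idea of an in-memory vector store concrete, here is a minimal sketch: each chunk is stored next to its vector, and a query returns the chunks whose vectors are most similar to the query vector. The `embed` function is a toy word-count stand-in for a real embedding model, and `ToyVectorStore` is a hypothetical class written for this illustration, not part of LlamaIndex.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Hypothetical in-memory store: a list of (vector, chunk) pairs."""

    def __init__(self):
        self._entries = []

    def add(self, chunk):
        self._entries.append((embed(chunk), chunk))

    def query(self, question, top_k=2):
        """Return the top_k (score, chunk) pairs, best match first."""
        q = embed(question)
        scored = [(cosine_similarity(q, vec), chunk) for vec, chunk in self._entries]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


store = ToyVectorStore()
for chunk in [
    "dehydration preserves food by removing water",
    "cream puffs need butter flour and eggs",
    "braided bread can be topped with caraway seeds",
]:
    store.add(chunk)

top_hits = store.query("how is food preserved by dehydration", top_k=1)
print(top_hits[0][1])
```

A real embedding model maps text to dense vectors that capture meaning rather than exact word overlap, but the store-then-rank-by-similarity pattern is the same.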

In [None]:
index = VectorStoreIndex.from_documents(
    documents,
    show_progress=True
)

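# Note: len(documents) counts the Document objects returned by the reader.
# The index splits these further into nodes (at most Settings.chunk_size
# tokens each), so the number of embedded chunks can be larger.
num_chunks = len(documents)
print(f"Database consists of {num_chunks} chunks")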

Parsing nodes:   0%|          | 0/252 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/567 [00:00<?, ?it/s]

Database consists of 252 chunks


### 4) Prepare a custom prompt template

In [None]:
from llama_index.core import PromptTemplate

template = (
    "We have provided context information below. \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Given this information, please answer the question: {query_str}\n"
)
qa_template = PromptTemplate(template)
query_engine = index.as_query_engine()
query_engine.update_prompts(
    {"response_synthesizer:text_qa_template": qa_template}
)
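
At query time the two placeholders in the template are filled in: `{context_str}` with the retrieved chunks and `{query_str}` with the user's question. The sketch below mimics that step with plain `str.format`. `build_prompt` is a hypothetical helper, and joining the chunks with blank lines is an assumption about how the context is concatenated.

```python
template = (
    "We have provided context information below. \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Given this information, please answer the question: {query_str}\n"
)


def build_prompt(retrieved_chunks, question):
    """Fill the template with retrieved text and the question (hypothetical helper)."""
    # Assumption: chunks are separated by a blank line inside the context.
    context_str = "\n\n".join(retrieved_chunks)
    return template.format(context_str=context_str, query_str=question)


prompt = build_prompt(
    ["Dehydration removes most of the water from food."],
    "What is dehydration?",
)
print(prompt)
```

Printing the filled-in prompt like this is a quick way to see exactly what text the LLM would receive for a given question.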

### 5) Query the vector database

In [None]:
response = query_engine.query("What is dehydration?")
print(response)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



Dehydration is a method of preserving food by removing most of the water content. It can be done through various methods such as sun drying, oven drying, or using a dehydrator. Properly dehydrated food can be stored for long periods of time without spoiling and can be rehydrated or used as is for consumption. In this lab, students will be pretreating, drying, storing, and serving some commonly dehydrated fruits and vegetables, and observing the characteristics of various dried fruits.


In [None]:
response = query_engine.query("How many tbsp. butter to use for the Cream Puffs?")
print(response)
# The correct answer is 2

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



Answer: 2 tbsp. butter are required for the Cream Puffs.
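
The comment in the cell above records the expected answer, which makes a simple automated check possible. The sketch below tests whether every expected phrase occurs in the response text. `contains_expected` is a hypothetical helper; in the notebook you would pass it `str(response)`.

```python
import re


def contains_expected(response_text, expected_phrases):
    """Return True if every expected phrase occurs in the response.

    Matching is case-insensitive and ignores differences in whitespace.
    (Hypothetical helper for spot-checking answers.)
    """
    normalized = re.sub(r"\s+", " ", response_text).lower()
    return all(phrase.lower() in normalized for phrase in expected_phrases)


answer = "Answer: 2 tbsp. butter are required for the Cream Puffs."
print(contains_expected(answer, ["2 tbsp. butter"]))
```

Plain substring matching is crude (for example, "2 tbsp." would also match inside "12 tbsp."), so treat it as a quick spot check rather than a proper evaluation.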


In [None]:
response = query_engine.query(
    "What are the toppings for the Braided Bread "
    "of the Braids, Coffeecake, and Sweet Rolls recipe?"
)
print(response)
# The correct answer is:
# 2 tsp. caraway seeds and ½ cup shredded Cheddar cheese.
# ½ cup diced Swiss cheese and paprika.
# ...

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



The toppings for the Braided Bread of the Braids, Coffeecake, and Sweet Rolls recipe are:

1. 2 tsp. caraway seeds and ½ cup shredded Cheddar cheese.
2. ½ cup diced Swiss cheese and paprika.
