In [1]:
import logging
import sys
import os

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

from dotenv import load_dotenv
load_dotenv()

True

# RAG: LLamaIndex Basic Tutorial

#### What is LlamaIndex?
* LlamaIndex is a data framework that works with large language models (LLMs).

* It can be used with an LLM for a variety of applications, such as question-answering systems, interactive chatbots, or RAG pipelines.

* Through Retrieval-Augmented Generation (RAG), it grounds the LLM in retrieved data when generating responses to users' natural-language questions.

#### LlamaIndex vs Langchain

| Feature          | LlamaIndex                                               | LangChain                                                                          |
|------------------|----------------------------------------------------------|------------------------------------------------------------------------------------|
| Primary Purpose  | Search and retrieval tasks                               | Building applications powered by large language models (LLMs)                      |
| Strengths        | Fast, efficient, accurate, ideal for large data sets, simple interface | Flexible, versatile, customizable, supports diverse LLMs, advanced context-awareness |
| Weaknesses       | Limited to search and retrieval tasks, less flexibility in customization | Steeper learning curve, complex for beginners, resource-intensive for complex applications |
| Use Cases        | Document search, code generation, customer service chatbots, content filtering | Chatbots, virtual assistants, knowledge bases, personalized learning platforms, creative writing tools |
| Ease of Use      | Moderate                                                 | Easy                                                                               |
| Documentation    | Good                                                     | Extensive                                                                          |
| Cost             | Free                                                     | Free                                                                               |


LlamaIndex is optimized for indexing and retrieval, which makes it a go-to choice for applications that need efficient search over their own data.

LangChain, on the other hand, is a more general framework that offers a broader range of functionality than the more focused LlamaIndex.  
It is more flexible and customizable, and it is favored by those who want a robust, versatile environment for AI-driven projects.

![Basic RAG scenario](https://miro.medium.com/v2/resize:fit:1400/1*tAGA8bIvsul5hNyUXyib7w.png)

## Table of Contents
1. [Load Document](#Load)

2. [Load VectorStore](#paragraph1)

3. [Basic Retriever](#paragraph2)

4. [Query Engine](#paragraph3)

5. [Custom Prompt](#paragraph4)

6. [Query LLM](#paragraph5)


In [None]:
import os

os.environ["OPENAI_API_KEY"] = "OPENAI_API_KEY"  # placeholder: replace with your actual key

## 1. Load Document <a name="Load"></a>

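The example below performs knowledge-base retrieval over a set of document files.

* First, load the documents with `SimpleDirectoryReader`.

* `SimpleDirectoryReader` is the simplest way to load data from local files into LlamaIndex.

* By default, `SimpleDirectoryReader` tries to read every file it finds and treats each one as text. It also supports the following file types, detected by file extension:
    - .csv - comma-separated values
    - .docx - Microsoft Word
    - .epub - EPUB e-book format
    - .hwp - Hangul Word Processor
    - .ipynb - Jupyter Notebook
    - .jpeg, .jpg - JPEG image
    - .mbox - MBOX email archive
    - .md - Markdown
    - .mp3, .mp4 - audio and video
    - .pdf - Portable Document Format
    - .png - Portable Network Graphics
    - .ppt, .pptm, .pptx - Microsoft PowerPoint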

#### Example

```python
from llama_index.core import SimpleDirectoryReader

SimpleDirectoryReader(
    input_files=["path/to/file1", "path/to/file2"],
    exclude=["path/to/file1", "path/to/file2"],
    required_exts=[".pdf", ".docx"],
    num_files_limit=100,  # maximum number of files to load
    encoding="latin-1",
)

# Extract metadata
def get_meta(file_path):
    return {"foo": "bar", "file_path": file_path}

SimpleDirectoryReader(input_dir="path/to/directory", file_metadata=get_meta)


# External file system support
from s3fs import S3FileSystem

s3_fs = S3FileSystem(key="...", secret="...")
bucket_name = "my-document-bucket"

reader = SimpleDirectoryReader(
    input_dir=bucket_name,
    fs=s3_fs,
    recursive=True,  # recursively searches all subdirectories
)

documents = reader.load_data()
```

#### Download articles
> TechCrunch Article

In [1]:
!wget -q https://github.com/kairess/toy-datasets/raw/master/techcrunch-articles.zip
!unzip -q techcrunch-articles.zip -d articles

In [2]:
from llama_index.core import SimpleDirectoryReader

reader = SimpleDirectoryReader(input_dir="articles")
docs = reader.load_data()

In [3]:
print(f"Count of Techcrunch articles: {len(docs)}")
print(docs[0])

Count of Techcrunch articles: 21
Doc ID: 48151188-fb25-49c3-8ced-338e75175f9d
Text: Signaling that investments in the supply chain sector remain
robust, Pando, a startup developing fulfillment management
technologies, today announced that it raised $30 million in a Series B
round, bringing its total raised to $45 million.  Iron Pillar and
Uncorrelated Ventures led the round, with participation from existing
investors Nexus Vent...


## 2. Load the Documents into a VectorStore (Text to Vector) <a name="paragraph1"></a>

* `VectorStoreIndex` is the index LlamaIndex uses to hold documents as vectors; it is the store in which the documents are vector-indexed.
* When `VectorStoreIndex` converts a document into vectors, it does so through an `Embed Model`.
* The default `Embed Model` is OpenAI's `text-embedding-ada-002` (dimension: 1536).
* When document text is embedded by `VectorStoreIndex`, the default chunk size is 1024 and the default chunk overlap is 20.
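To make the chunk size and chunk overlap settings concrete, here is a minimal sketch of overlapping windows over a token list. It is a simplification, not LlamaIndex's splitter: the function name `chunk_tokens` and the toy tokens are hypothetical, and the real `SentenceSplitter` counts tokens with a tokenizer and tries to respect sentence boundaries.

```python
def chunk_tokens(tokens, chunk_size=1024, chunk_overlap=20):
    """Split a token list into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the input
    return chunks


# Hypothetical toy input: 10 "tokens", windows of 4 with an overlap of 1.
tokens = [f"t{i}" for i in range(10)]
chunks = chunk_tokens(tokens, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
```

A smaller chunk size yields more, shorter chunks for the same text, so it also means more embedding calls.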

##### Method 1: Load a VectorStoreIndex directly from Documents

In [6]:
from llama_index.core import Settings, VectorStoreIndex
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core.node_parser import SentenceSplitter

# 1. Load VectorStoreIndex directly from Documents
index = VectorStoreIndex.from_documents(docs, show_progress=True)

Parsing nodes:   0%|          | 0/21 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/51 [00:00<?, ?it/s]

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


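##### Method 2: Load a `VectorStoreIndex` with an explicitly chosen `Splitter` (chunk_size, chunk_overlap) and embed model

`Method 2` ends up building the same kind of index as `Method 1`; only the chunking and embedding settings are chosen explicitly.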

In [4]:
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import VectorStoreIndex

# 2. Load VectorStoreIndex with an explicitly chosen Splitter (chunk_size, chunk_overlap) and embed model

# embed_model = HuggingFaceEmbedding(
#     model_name="jhgan/ko-sbert-nli",
#     normalize=True,
# )

embed_model = OpenAIEmbedding()

node_parser = SentenceSplitter(chunk_size=256, chunk_overlap=20)
nodes = node_parser.get_nodes_from_documents(docs)
index = VectorStoreIndex(nodes, embed_model=embed_model, show_progress=True)

Generating embeddings:   0%|          | 0/208 [00:00<?, ?it/s]

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


## 3. Basic Retriever <a name="paragraph2"></a>

`similarity_top_k` sets how many chunks the retriever returns: the k chunks with the highest similarity to the query.
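The idea behind top-k retrieval can be sketched in plain Python: score every chunk embedding against the query embedding and keep the k best. This is an illustration only, assuming cosine similarity as the score; the vectors, chunk names, and the helper `top_k` are made up and are not part of LlamaIndex.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k):
    """Return the k (score, chunk_name) pairs with the highest similarity."""
    scored = [
        (cosine_similarity(query_vec, vec), name)
        for name, vec in chunk_vecs.items()
    ]
    scored.sort(reverse=True)  # highest score first
    return scored[:k]


# Hypothetical 3-dimensional embeddings; real ones have e.g. 1536 dimensions.
chunk_vecs = {
    "chunk_a": [1.0, 0.0, 0.0],
    "chunk_b": [0.8, 0.6, 0.0],
    "chunk_c": [0.0, 1.0, 0.0],
    "chunk_d": [0.0, 0.0, 1.0],
}
query_vec = [1.0, 0.2, 0.0]

results = top_k(query_vec, chunk_vecs, k=2)
for score, name in results:
    print(f"{name}: {score:.3f}")
```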

In [5]:
base_retriever = index.as_retriever(similarity_top_k=5)

source_nodes = base_retriever.retrieve("What is the CMA generative ai?")

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


In [10]:
for node in source_nodes:
    # print(node.metadata)
    print(f"---------------------------------------------")
    print(f"Score: {node.score:.3f}")
    print(node.get_content())
    print(f"---------------------------------------------\n\n")

---------------------------------------------
Score: 0.865
The expectation is that the CMA’s Digital Markets Unit, up and running since 2021 in shadow form, will (finally) gain legislative powers in the coming years to apply pro-active “pro-competition” rules which are tailored to platforms that are deemed to have “strategic market status” (SMS). So we can speculate that providers of powerful foundational AI models may, down the line, be judged to have SMS — meaning they could expect to face bespoke rules on how they must operate vis-a-vis rivals and consumers in the U.K. market.

The U.K.’s data protection watchdog, the ICO, also has its eye on generative AI. It’s another existing oversight body which the government has tasked with paying special mind to AI under its plan for context-specific guidance to steer development of the tech through the application of existing laws.

---------------------------------------------


---------------------------------------------
Score: 0.862
Wel

## 4. Query Engine <a name="paragraph3"></a>

The default LLM used by the query engine is OpenAI `gpt-3.5-turbo`, with a `temperature` of 0.1.
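For intuition about what a low `temperature` such as 0.1 does, the sketch below rescales a set of made-up next-token scores before turning them into probabilities. It shows the general softmax-with-temperature idea under assumed logits; it is not how the OpenAI API is implemented, and the function name is hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities; temperature must be > 0."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate tokens.
logits = [2.0, 1.0, 0.5]

sharp = softmax_with_temperature(logits, 0.1)  # low temperature
flat = softmax_with_temperature(logits, 1.0)   # neutral temperature

print([round(p, 4) for p in sharp])
print([round(p, 4) for p in flat])
```

At a low temperature almost all probability mass lands on the top-scoring token, which is why answers become more repeatable.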

In [11]:
from llama_index.llms.openai import OpenAI

# llm = OpenAI(model="gpt-3.5-turbo",temperature=0)
# llm = OpenAI(model="gpt-4",temperature=0)
query_engine = index.as_query_engine(streaming=True, similarity_top_k=2)

## 5. Custom LLM Prompt <a name="paragraph4"></a>

A custom LLM prompt is set with the `PromptTemplate` class provided by `LlamaIndex`.

In a `PromptTemplate`, `context_str` and `query_str` are variables that are injected from outside at query time.
- context_str: the content of the retrieved document chunks
- query_str: the user's question
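As a rough illustration of how these two variables end up in the final prompt, the sketch below fills the placeholders with plain `str.format`. The chunk texts and the `"\n\n"` join are assumptions for the example; LlamaIndex's response synthesizer does more than this (for instance, it decides how retrieved chunks are packed into one or several LLM calls).

```python
# A shortened template with the same two placeholders.
template = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Query: {query_str}\n"
    "Answer: "
)

# Hypothetical retrieved chunks standing in for the retriever's output.
retrieved_chunks = ["first chunk", "second chunk"]
context_str = "\n\n".join(retrieved_chunks)

prompt = template.format(
    context_str=context_str,
    query_str="What is the CMA generative ai?",
)
print(prompt)
```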

In [7]:

from llama_index.core import PromptTemplate
prompt_tmpl_str = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the query.\n"
    "You MUST answer in Korean.\n"
    "Query: {query_str}\n"
    "Answer: "
)
prompt_tmpl = PromptTemplate(prompt_tmpl_str)
query_engine.update_prompts(
    {"response_synthesizer:text_qa_template": prompt_tmpl}
)

In [None]:
from IPython.display import Markdown, display

def display_prompt_dict(prompts_dict):
    for k, p in prompts_dict.items():
        text_md = f"**Prompt Key**: {k}<br>" f"**Text:** <br>"
        display(Markdown(text_md))
        print(p.get_template())
        display(Markdown("<br><br>"))


# Avoid shadowing the built-in `dict`
prompts_dict = query_engine.get_prompts()
display_prompt_dict(prompts_dict)

## 6. Run the Query <a name="paragraph5"></a>

In [12]:
response = query_engine.query("What is the CMA generative ai?")
response.print_response_stream()

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
The CMA generative AI refers to generative artificial intelligence models that are being reviewed by the Competition and Markets Authority (CMA) in the UK. These models include large language models and generative AI technologies like those powering AI art platforms such as OpenAI’s DALL-E or Midjourney.

In [13]:
for node in response.source_nodes:
    print("-----")
    text_fmt = node.node.get_content().strip().replace("\n", " ")[:1000]
    print(f"Text:\t {text_fmt} ...")
    print(f"Metadata:\t {node.node.metadata}")
    print(f"Score:\t {node.score:.3f}")

-----
Metadata:	 {'file_path': '/Users/heewungsong/Experiment/Visa_Rag/study/llama-index/Basic Tutorial/articles/05-04-cma-generative-ai-review.txt', 'file_name': '05-04-cma-generative-ai-review.txt', 'file_type': 'text/plain', 'file_size': 7607, 'creation_date': '2024-04-08', 'last_modified_date': '2023-05-08'}
Score:	 0.865
-----
Text:	 Well that was fast. The U.K.’s competition watchdog has announced an initial review of “AI foundational models”, such as the large language models (LLMs) which underpin OpenAI’s ChatGPT and Microsoft’s New Bing. Generative AI models which power AI art platforms such as OpenAI’s DALL-E or Midjourney will also likely fall in scope.  The Competition and Markets Authority (CMA) said its review will look at competition and consumer protection considerations in the development and use of AI foundational models — with the aim of understanding “how foundation models are developing and producing an assessment of the conditions and principles that will best gui