# RAG (Retrieval-Augmented Generation)
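- RAG is a technique that supplies context to an LLM so that it can give users more accurate and richer answers.
    - The user provides a query and context to the LLM.
    - The model answers the query by considering both its training data and the supplied context.
- RAG relies on the following components.
    - DataLoader: loads the context (documents) provided by the user.
    - Splitter: splits the context into multiple chunks, so that only the chunks most relevant to the query are retrieved. This saves tokens, since only those chunks are sent to the model.
    - Embedding: converts each document chunk into a vector, so that the relationship between the query and the context can be compared numerically.
    - Vector store: stores the embedded context. The stored vectors are used to compute which context is most similar to the query, and caching minimizes lookup time.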
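
The four components above can be sketched end to end in plain Python. This is only a toy illustration, not the LangChain implementation: `embed`, `cosine`, and `ToyVectorStore` are hypothetical names, and the "embedding" is a simple bag-of-words count rather than a dense vector from a real model.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': word counts (a real model returns a dense float vector)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Keeps (chunk, vector) pairs and ranks them against a query."""

    def __init__(self, chunks):
        self.entries = [(chunk, embed(chunk)) for chunk in chunks]

    def similarity_search(self, query, k=1):
        query_vector = embed(query)
        ranked = sorted(
            self.entries, key=lambda entry: cosine(query_vector, entry[1]), reverse=True
        )
        return [chunk for chunk, _ in ranked[:k]]


# "DataLoader": here the document is just an inline string.
document = (
    "Winston was dreaming of his mother. "
    "She was a tall, statuesque, rather silent woman. "
    "Oceania was at war with Eurasia."
)

# "Splitter": one chunk per sentence.
chunks = [s.strip() for s in re.split(r"(?<=\.)\s+", document) if s.strip()]

# "Embedding" + "Vector store", then retrieval for a query.
store = ToyVectorStore(chunks)
question = "Who was Oceania at war with?"
context = store.similarity_search(question, k=1)

# The retrieved chunk becomes the context that is sent along with the query.
prompt = f"Context: {context[0]}\nQuestion: {question}"
print(prompt)
```

Only the best-matching chunk reaches the prompt, which is the token-saving effect the Splitter bullet describes.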

## DataLoader and Splitter

In [7]:
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import UnstructuredFileLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

### UnstructuredFileLoader
- The UnstructuredFileLoader class loads documents in a consistent way regardless of the file extension.

In [5]:
# data reference: http://www.george-orwell.org/1984/2.html
loader = UnstructuredFileLoader("./files/george-orwell_chap3.txt")

loader.load()

[nltk_data] Downloading package punkt to
[nltk_data]     /home/jihoahn9303/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /home/jihoahn9303/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


[Document(page_content="Winston was dreaming of his mother.\n\nHe must, he thought, have been ten or eleven years old when his mother had disappeared. She was a tall, statuesque, rather silent woman with slow movements and magnificent fair hair. His father he remembered more vaguely as dark and thin, dressed always in neat dark clothes (Winston remembered especially the very thin soles of his father's shoes) and wearing spectacles. The two of them must evidently have been swallowed up in one of the first great purges of the fifties.\n\nAt this moment his mother was sitting in some place deep down beneath him, with his young sister in her arms. He did not remember his sister at all, except as a tiny, feeble baby, always silent, with large, watchful eyes. Both of them were looking up at him. They were down in some subterranean place -- the bottom of a well, for instance, or a very deep grave -- but it was a place which, already far below him, was itself moving downwards. They were in the

In [8]:
splitter = RecursiveCharacterTextSplitter()

loader.load_and_split(text_splitter=splitter)

[Document(page_content="Winston was dreaming of his mother.\n\nHe must, he thought, have been ten or eleven years old when his mother had disappeared. She was a tall, statuesque, rather silent woman with slow movements and magnificent fair hair. His father he remembered more vaguely as dark and thin, dressed always in neat dark clothes (Winston remembered especially the very thin soles of his father's shoes) and wearing spectacles. The two of them must evidently have been swallowed up in one of the first great purges of the fifties.\n\nAt this moment his mother was sitting in some place deep down beneath him, with his young sister in her arms. He did not remember his sister at all, except as a tiny, feeble baby, always silent, with large, watchful eyes. Both of them were looking up at him. They were down in some subterranean place -- the bottom of a well, for instance, or a very deep grave -- but it was a place which, already far below him, was itself moving downwards. They were in the

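- When splitting text, the chunk size can be adjusted.
- Chunks may then be cut in the middle of a sentence rather than at a sentence boundary.
- The chunk_overlap parameter addresses this by letting each chunk begin with some content from the end of the previous chunk.
- When chunk_overlap is not set, the library default applies. That default appears to be large relative to chunk_size=200: in the first output below, consecutive chunks overlap almost entirely.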
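
How `chunk_size` and `chunk_overlap` interact can be illustrated with a fixed-width sliding window. This is a simplified sketch, not how RecursiveCharacterTextSplitter works internally (that class splits on separators and then merges pieces); `split_with_overlap` is a hypothetical helper.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut `text` into windows of `chunk_size` characters, where each window
    repeats the last `chunk_overlap` characters of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


text = "abcdefghijklmnopqrst"  # 20 characters
chunks = split_with_overlap(text, chunk_size=8, chunk_overlap=3)
print(chunks)  # ['abcdefgh', 'fghijklm', 'klmnopqr', 'pqrst']
```

Each chunk starts with the last three characters of the one before it, so a sentence cut at a chunk boundary is still partly visible in the next chunk.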

In [10]:
splitter = RecursiveCharacterTextSplitter(
    chunk_size=200
)

loader.load_and_split(text_splitter=splitter)

[Document(page_content='Winston was dreaming of his mother.', metadata={'source': './files/george-orwell_chap3.txt'}),
 Document(page_content='He must, he thought, have been ten or eleven years old when his mother had disappeared. She was a tall, statuesque, rather silent woman with slow movements and magnificent fair hair. His father he', metadata={'source': './files/george-orwell_chap3.txt'}),
 Document(page_content='he thought, have been ten or eleven years old when his mother had disappeared. She was a tall, statuesque, rather silent woman with slow movements and magnificent fair hair. His father he remembered', metadata={'source': './files/george-orwell_chap3.txt'}),
 Document(page_content='have been ten or eleven years old when his mother had disappeared. She was a tall, statuesque, rather silent woman with slow movements and magnificent fair hair. His father he remembered more vaguely', metadata={'source': './files/george-orwell_chap3.txt'}),
 Document(page_content='been ten or 

In [13]:
splitter = RecursiveCharacterTextSplitter(
    chunk_size=200,
    chunk_overlap=50
)

loader.load_and_split(text_splitter=splitter)

[Document(page_content='Winston was dreaming of his mother.', metadata={'source': './files/george-orwell_chap3.txt'}),
 Document(page_content='He must, he thought, have been ten or eleven years old when his mother had disappeared. She was a tall, statuesque, rather silent woman with slow movements and magnificent fair hair. His father he', metadata={'source': './files/george-orwell_chap3.txt'}),
 Document(page_content="and magnificent fair hair. His father he remembered more vaguely as dark and thin, dressed always in neat dark clothes (Winston remembered especially the very thin soles of his father's shoes) and", metadata={'source': './files/george-orwell_chap3.txt'}),
 Document(page_content="the very thin soles of his father's shoes) and wearing spectacles. The two of them must evidently have been swallowed up in one of the first great purges of the fifties.", metadata={'source': './files/george-orwell_chap3.txt'}),
 Document(page_content='At this moment his mother was sitting in som

### CharacterTextSplitter

- The CharacterTextSplitter class additionally lets you specify the text delimiter (separator).
- This helps split a document along sentence or paragraph boundaries.
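
The idea of separator-based splitting can be sketched as follows: split on the separator, then merge neighbouring pieces while they still fit in `chunk_size`. A piece that is longer than `chunk_size` by itself cannot be shortened without breaking the separator rule, which is presumably why "Created a chunk of size ..., which is longer than the specified ..." messages can appear. `split_on_separator` is a hypothetical helper, and this sketch leaves out overlap handling.

```python
def split_on_separator(text, separator, chunk_size):
    """Split on `separator`, then greedily merge pieces up to `chunk_size`."""
    pieces = [piece for piece in text.split(separator) if piece]
    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate  # still fits: keep merging
        else:
            if current:
                chunks.append(current)
            current = piece  # may itself be longer than chunk_size
    if current:
        chunks.append(current)
    return chunks


text = "short line\nanother short one\n" + "x" * 50 + "\nend"
chunks = split_on_separator(text, separator="\n", chunk_size=30)
oversized = [chunk for chunk in chunks if len(chunk) > 30]

print([len(chunk) for chunk in chunks])  # [28, 50, 3]
```

The two short lines are merged into one chunk, while the 50-character line stays whole even though it exceeds the limit of 30.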

In [14]:
from langchain.text_splitter import CharacterTextSplitter

splitter = CharacterTextSplitter(
    separator="\n",
    chunk_size=500,
    chunk_overlap=100
)

loader.load_and_split(text_splitter=splitter)

Created a chunk of size 1132, which is longer than the specified 500
Created a chunk of size 1322, which is longer than the specified 500
Created a chunk of size 763, which is longer than the specified 500
Created a chunk of size 716, which is longer than the specified 500
Created a chunk of size 805, which is longer than the specified 500
Created a chunk of size 1113, which is longer than the specified 500
Created a chunk of size 993, which is longer than the specified 500
Created a chunk of size 1037, which is longer than the specified 500
Created a chunk of size 1285, which is longer than the specified 500
Created a chunk of size 847, which is longer than the specified 500
Created a chunk of size 992, which is longer than the specified 500
Created a chunk of size 1447, which is longer than the specified 500


[Document(page_content='Winston was dreaming of his mother.', metadata={'source': './files/george-orwell_chap3.txt'}),
 Document(page_content="He must, he thought, have been ten or eleven years old when his mother had disappeared. She was a tall, statuesque, rather silent woman with slow movements and magnificent fair hair. His father he remembered more vaguely as dark and thin, dressed always in neat dark clothes (Winston remembered especially the very thin soles of his father's shoes) and wearing spectacles. The two of them must evidently have been swallowed up in one of the first great purges of the fifties.", metadata={'source': './files/george-orwell_chap3.txt'}),
 Document(page_content='At this moment his mother was sitting in some place deep down beneath him, with his young sister in her arms. He did not remember his sister at all, except as a tiny, feeble baby, always silent, with large, watchful eyes. Both of them were looking up at him. They were down in some subterranean pla

## Embedding and Vector store

In [1]:
from langchain.embeddings import OpenAIEmbeddings

embedder = OpenAIEmbeddings()

In [2]:
embedder.embed_query("Hello!")

[-0.019001544776856838,
 -0.011228185719292232,
 -0.012974791745159937,
 -0.02391507586352323,
 -0.034778586407903316,
 0.02212368439911965,
 -0.003058161023061777,
 0.0025383377567286484,
 -0.004500870914615603,
 -0.013384253289930498,
 0.03969211376927952,
 -0.013128339824448897,
 -0.015968974169020676,
 -0.022328415171504933,
 0.006010758032650689,
 -0.016647144619716284,
 0.027101193619325805,
 -0.006487395872579968,
 0.018067461792002176,
 -0.012955598118833497,
 -0.024170988397682284,
 0.004574445861318586,
 0.007766961337342886,
 -0.015137257501681198,
 -0.01555951355557266,
 -0.005921188552562765,
 0.0014275150149889405,
 -0.01930864093543476,
 0.029762688072399193,
 -0.03523922971444765,
 0.00932803068281302,
 0.002640702910090653,
 -0.0066217503255424906,
 -0.012034311505744822,
 -0.0029014143164925914,
 -0.03411321232897369,
 -0.0004358519369583586,
 -0.005035089633524196,
 0.021867770002315507,
 -0.01605854504609242,
 0.0237871195964437,
 0.008880182816712125,
 0.0048911383

In [4]:
vector = embedder.embed_documents(
    [
        "Hi,",
        "how ",
        "are ",
        "you?"
    ]
)

print(len(vector), len(vector[0]))

4 1536
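
Once texts are vectors, "how related are these two texts" becomes a numeric comparison. A minimal sketch using cosine similarity is shown here. The three-dimensional vectors and the `doc_a`/`doc_b`/`doc_c` names are made up for illustration (real embeddings above have 1536 dimensions), and the metric a particular vector store uses may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical low-dimensional "embeddings".
query_vector = [1.0, 0.0, 1.0]
document_vectors = {
    "doc_a": [1.0, 0.0, 1.0],  # same direction as the query
    "doc_b": [0.0, 1.0, 0.0],  # orthogonal to the query
    "doc_c": [1.0, 1.0, 0.0],  # partially aligned
}

scores = {
    name: cosine_similarity(query_vector, vector)
    for name, vector in document_vectors.items()
}
best_match = max(scores, key=scores.get)
print(best_match)  # doc_a
```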


### Building a vector store with caching

In [13]:
from langchain.vectorstores.chroma import Chroma
from langchain.embeddings import OpenAIEmbeddings, CacheBackedEmbeddings
from langchain.document_loaders import UnstructuredFileLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.storage import LocalFileStore


In [14]:
loader = UnstructuredFileLoader("./files/george-orwell_chap3.txt")

splitter = CharacterTextSplitter.from_tiktoken_encoder(
    separator="\n",
    chunk_size=500,
    chunk_overlap=100
)

docs = loader.load_and_split(text_splitter=splitter)

embeddings = OpenAIEmbeddings()

cache_dir = LocalFileStore("./.cache/")  # create the location where embeddings are cached

cached_embeddings = CacheBackedEmbeddings.from_bytes_store(
    underlying_embeddings=embeddings,    # needs the embedding model
    document_embedding_cache=cache_dir   # and the location to cache into
)

vectorstore = Chroma.from_documents(
    documents=docs,
    embedding=cached_embeddings   # cached embeddings are reused when available
)

In [15]:
vectorstore.similarity_search(
    query="Did Oceania be in alliance with Eurasia?"
)

[Document(page_content='Since about that time, war had been literally continuous, though strictly speaking it had not always been the same war. For several months during his childhood there had been confused street fighting in London itself, some of which he remembered vividly. But to trace out the history of the whole period, to say who was fighting whom at any given moment, would have been utterly impossible, since no written record, and no spoken word, ever made mention of any other alignment than the existing one. At this moment, for example, in 1984 (if it was 1984), Oceania was at war with Eurasia and in alliance with Eastasia. In no public or private utterance was it ever admitted that the three powers had at any time been grouped along different lines. Actually, as Winston well knew, it was only four years since Oceania had been at war with Eastasia and in alliance with Eurasia. But that was merely a piece of furtive knowledge which he happened to possess because his memory was

- Because the vector store above was built on cached embeddings, the same documents are not embedded again.
- As a result, re-running the same code shows an execution time of 0.

In [16]:
loader = UnstructuredFileLoader("./files/george-orwell_chap3.txt")

splitter = CharacterTextSplitter.from_tiktoken_encoder(
    separator="\n",
    chunk_size=500,
    chunk_overlap=100
)

docs = loader.load_and_split(text_splitter=splitter)

embeddings = OpenAIEmbeddings()

cache_dir = LocalFileStore("./.cache/") 

cached_embeddings = CacheBackedEmbeddings.from_bytes_store(
    underlying_embeddings=embeddings,    
    document_embedding_cache=cache_dir   
)

vectorstore = Chroma.from_documents(
    documents=docs,
    embedding=cached_embeddings   
)
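
The caching behaviour described in this section can be sketched in plain Python: hash each text, look the hash up in a cache, and call the underlying embedding function only on a miss. This is an assumption-laden illustration, not the CacheBackedEmbeddings implementation; `CachedEmbedder` and `fake_embed` are hypothetical, and an in-memory dict stands in for the on-disk store.

```python
import hashlib


class CachedEmbedder:
    """Wraps an embedding function and reuses results for texts seen before."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self.cache = {}            # hash of text -> vector
        self.underlying_calls = 0  # how often the real embedder was invoked

    def embed_documents(self, texts):
        vectors = []
        for text in texts:
            key = hashlib.sha256(text.encode("utf-8")).hexdigest()
            if key not in self.cache:
                self.cache[key] = self.embed_fn(text)
                self.underlying_calls += 1
            vectors.append(self.cache[key])
        return vectors


def fake_embed(text):
    """Stand-in for a paid embedding API call."""
    return [float(len(text)), float(text.count(" "))]


embedder = CachedEmbedder(fake_embed)
docs = ["Winston was dreaming of his mother.", "Oceania was at war with Eurasia."]

first_run = embedder.embed_documents(docs)
second_run = embedder.embed_documents(docs)  # served entirely from the cache

print(embedder.underlying_calls)  # 2
```

The second run returns the same vectors without any further calls to the underlying function, which mirrors why re-running the cell above is fast.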

# Custom Chain 
- Stuff
- Map reduce
- Map Re-rank
- Refine

In [21]:
from langchain.document_loaders import UnstructuredFileLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings, CacheBackedEmbeddings
from langchain.storage import LocalFileStore
from langchain.vectorstores.chroma import Chroma
from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.schema.runnable import RunnablePassthrough

## Stuff

In [20]:
loader = UnstructuredFileLoader('./files/george-orwell_chap3.txt')

splitter = CharacterTextSplitter.from_tiktoken_encoder(
    separator='\n',
    chunk_size=500,
    chunk_overlap=100
)

docs = loader.load_and_split(text_splitter=splitter)

embedding = OpenAIEmbeddings()

local_storage = LocalFileStore('./.cache/')

cached_embeddings = CacheBackedEmbeddings.from_bytes_store(
    underlying_embeddings=embedding,
    document_embedding_cache=local_storage
)

vectorstore = Chroma.from_documents(
    documents=docs,
    embedding=cached_embeddings
)

llm = ChatOpenAI(temperature=0.2)


In [26]:
retriever = vectorstore.as_retriever()

prompt = ChatPromptTemplate.from_messages(
    messages=[
        (
            "system",
            """
            You are a helpful assistant. Answer questions using only the following context.
            If you don't know the answer, just say you don't know, and don't make it up:\n\n{context}
            """
        ),
        ("human", "{question}")
    ]
)

# The retriever fills {context} with the retrieved documents;
# RunnablePassthrough forwards the input string unchanged as {question}.
chain = (
    {
        "context": retriever,
        "question": RunnablePassthrough()
    }
    | prompt
    | llm
)

chain.invoke("Please explain how Winston feel now.")

AIMessage(content='Winston is feeling a mix of emotions. He is saddened and empathetic towards the old man who is suffering from genuine and unbearable grief. Winston also feels a sense of understanding about the terrible event that has occurred, possibly the loss of someone the old man loved, like a granddaughter.')
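
In the "Stuff" approach, all retrieved documents are placed ("stuffed") into a single prompt together with the question. A minimal plain-Python sketch of that prompt assembly follows. `build_stuff_prompt` is a hypothetical helper and the two document strings are placeholders, so this shows only the idea, not what the chain above does internally.

```python
def build_stuff_prompt(docs, question):
    """Concatenate every retrieved document into one context block."""
    context = "\n\n".join(docs)
    return (
        "Answer questions using only the following context.\n\n"
        f"{context}\n\n"
        f"Question: {question}"
    )


retrieved_docs = [
    "Winston was dreaming of his mother.",
    "Oceania was at war with Eurasia and in alliance with Eastasia.",
]
stuffed_prompt = build_stuff_prompt(retrieved_docs, "Who is Oceania allied with?")
print(stuffed_prompt)
```

Because every document lands in the same prompt, this approach is simple but limited by the model's context window.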

## Map reduce

1. Send the query to the retriever.

2. The retriever returns a list of docs that may be relevant to the query.

3. For each doc in the list: doc | prompt | llm.

4. Put all of the LLM's responses together into a single final doc.

5. final doc | prompt | llm.
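
The five steps above can be sketched in plain Python. This is a sketch under stated assumptions, not a LangChain chain: `map_reduce` and `recording_llm` are hypothetical, the `docs` list stands in for what the retriever returns in steps 1 and 2, and the "LLM" is a stub that only records the prompts it receives.

```python
def map_reduce(question, docs, llm):
    """Run the LLM once per document (map), then once over the joined results (reduce)."""
    # Step 3: doc | prompt | llm, for each retrieved doc.
    partial_responses = [
        llm(f"Extract the text relevant to the question.\n"
            f"Question: {question}\nDocument: {doc}")
        for doc in docs
    ]
    # Step 4: put all responses together into one final doc.
    final_doc = "\n".join(partial_responses)
    # Step 5: final doc | prompt | llm.
    return llm(f"Answer the question using only this context.\n"
               f"Question: {question}\nContext: {final_doc}")


calls = []


def recording_llm(prompt):
    """Stub LLM: stores the prompt and returns a numbered placeholder response."""
    calls.append(prompt)
    return f"response {len(calls)}"


# Steps 1-2 are assumed to have produced these docs.
docs = ["doc one", "doc two", "doc three"]
answer = map_reduce("How does Winston feel?", docs, recording_llm)

print(len(calls))  # 4: three map calls plus one reduce call
print(answer)      # response 4
```

Unlike Stuff, each document gets its own LLM call, so the final prompt only has to hold the condensed responses rather than every full document.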