# RAG (Retrieval Augmented Generation)
## Definition
- **NicoìŒ¤**: A technique using all retrieving data from some private sources, we can increase the capacity of the LLM.
- **ChatGPT**: A method where a language model first looks up relevant information from a large database and then uses that information to generate more accurate and context-aware responses to user queries. It combines the strengths of both searching for facts and creating natural-sounding text.

### UnstructuredFileLoader
Loads files in many formats (for example plain text, Markdown, PDF, HTML, and Word documents) into `Document` objects.

### Tiktoken
`tiktoken` is a fast **BPE** tokeniser for use with OpenAI's models.
- **[Byte Pair Encoding](https://api.python.langchain.com/en/latest/character/langchain_text_splitters.character.CharacterTextSplitter.html#langchain_text_splitters.character.CharacterTextSplitter.from_tiktoken_encoder)** is an algorithm, first described in 1994 by Philip Gage, for encoding strings of text into tabular form for use in downstream modeling. A modified version is used as the tokenizer in large language models: it can combine tokens that encode single characters (including single digits or punctuation marks) with tokens that encode whole words (even long compound words). The modified algorithm first treats every unique character as a 1-character n-gram (the initial "tokens"). It then repeatedly merges the most frequent pair of adjacent tokens into a new, longer token and replaces every instance of that pair with it. This repeats until the vocabulary reaches a prescribed size.
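
The merge loop described above can be sketched in a few lines of plain Python. This is a toy, character-level illustration only: `train_bpe` is a hypothetical helper, and real tokenizers such as `tiktoken` work on bytes with a pretrained merge table rather than training on the input.

```python
from collections import Counter


def train_bpe(text: str, num_merges: int) -> tuple[list[str], list[tuple[str, str]]]:
    """Toy BPE: start from single characters, then merge the most frequent adjacent pair."""
    tokens = list(text)  # initial vocabulary: one token per character
    merges: list[tuple[str, str]] = []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (left, right), count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so another merge would not compress anything
        merges.append((left, right))
        merged: list[str] = []
        i = 0
        while i < len(tokens):
            # Replace each occurrence of the chosen pair with one longer token.
            if i + 1 < len(tokens) and tokens[i] == left and tokens[i + 1] == right:
                merged.append(left + right)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens, merges


tokens, merges = train_bpe("aaabdaaabac", num_merges=3)
print(tokens)
print(merges)
```

Each entry in `merges` is one learned rule; applying the rules in order to new text reproduces the same tokenization.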

In [1]:
from langchain.document_loaders import UnstructuredFileLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter, CharacterTextSplitter

# splitter = RecursiveCharacterTextSplitter(
#     separators=["\n"],
#     chunk_overlap = 50,
#     chunk_size=200
# )

# from_tiktoken_encoder measures chunk_size and chunk_overlap in tokens, the way the model itself counts text.
# This is usually a better practice than counting characters.
splitter = CharacterTextSplitter.from_tiktoken_encoder(
    separator="\n",
    chunk_overlap=50,
    chunk_size=200
)

loader = UnstructuredFileLoader("files/example.md")

docs = loader.load_and_split(text_splitter=splitter)
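
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch of separator-based chunking with overlap. It is not LangChain's actual implementation: `split_with_overlap` is a hypothetical helper, and a whitespace word count stands in for the `tiktoken` token count.

```python
def split_with_overlap(text, separator="\n", chunk_size=200, chunk_overlap=50,
                       length=lambda s: len(s.split())):
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size units."""
    pieces = [p for p in text.split(separator) if p.strip()]
    chunks = []
    current = []       # pieces in the chunk being built
    current_len = 0    # total length of those pieces
    for piece in pieces:
        piece_len = length(piece)
        if current and current_len + piece_len > chunk_size:
            chunks.append(separator.join(current))
            # Carry trailing pieces into the next chunk: drop from the front until
            # the carry-over fits within chunk_overlap and leaves room for the new piece.
            while current and (current_len > chunk_overlap
                               or current_len + piece_len > chunk_size):
                current_len -= length(current[0])
                current.pop(0)
        current.append(piece)
        current_len += piece_len
    if current:
        chunks.append(separator.join(current))
    return chunks


# Six lines of three words each; chunk_size=6 fits two lines, chunk_overlap=3 repeats one.
lines = [f"line {i} text" for i in range(1, 7)]
chunks = split_with_overlap("\n".join(lines), chunk_size=6, chunk_overlap=3)
for chunk in chunks:
    print(repr(chunk))
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.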

## Embedding


In [2]:
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# No caching: every document is embedded again each time this cell runs
embedder = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(docs, embedder)

In [3]:
results = vectorstore.similarity_search("How can i install TechGuide?")
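
Conceptually, a similarity search embeds the query and returns the stored texts whose vectors are closest to it. The sketch below assumes cosine similarity and uses made-up 3-dimensional vectors; the metric and index that Chroma actually uses may differ, and `toy_similarity_search` is a hypothetical helper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def toy_similarity_search(query_vector, store, k=2):
    """Return the k texts whose vectors are most similar to the query vector."""
    ranked = sorted(store, key=lambda item: cosine_similarity(query_vector, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]


# Made-up embeddings, for illustration only.
toy_store = [
    ("Installation steps", [0.9, 0.1, 0.0]),
    ("Pricing plans", [0.0, 0.2, 0.9]),
    ("System requirements", [0.7, 0.6, 0.1]),
]
toy_query_vector = [1.0, 0.2, 0.0]
top = toy_similarity_search(toy_query_vector, toy_store, k=2)
print(top)
```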


In [4]:
from langchain.embeddings import CacheBackedEmbeddings
from langchain.storage import LocalFileStore

# Cache embeddings on disk, then store them in the vector store
cache_dir = LocalFileStore("../.cache/")
cached_embedder = CacheBackedEmbeddings.from_bytes_store(
    embedder, cache_dir
)

vectorstore = Chroma.from_documents(docs, cached_embedder)
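
The idea behind a cache-backed embedder can be sketched as a lookup keyed by a hash of the text: the underlying model is called only on a cache miss. This is an assumption-laden illustration, not the `CacheBackedEmbeddings` implementation. `ToyCachedEmbedder` and `fake_embed` are hypothetical, and an in-memory dict stands in for the on-disk store.

```python
import hashlib


class ToyCachedEmbedder:
    """Wraps an embedding function and reuses results for text it has already seen."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self.cache = {}  # stand-in for a persistent byte store

    def embed(self, text):
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self.cache:
            self.cache[key] = self.embed_fn(text)  # only computed on a cache miss
        return self.cache[key]


calls = []


def fake_embed(text):
    """Fake embedding model that records how often it is called."""
    calls.append(text)
    return [float(len(text)), float(text.count(" "))]


toy_embedder = ToyCachedEmbedder(fake_embed)
first = toy_embedder.embed("install TechGuide")
second = toy_embedder.embed("install TechGuide")
print(len(calls))
```

With a paid embedding API, skipping repeated calls for unchanged documents saves both time and cost.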

In [5]:
results = vectorstore.similarity_search("How can i install TechGuide?")

print(results)

[Document(page_content="- Processor: Intel i5 or equivalent\n- Memory: 8GB RAM\n- Storage: 50GB free disk space\n- Network: Stable internet connection\nInstallation\nDownload the Installer: Visit the official TechGuide website and download the appropriate installer for your operating system.\nRun the Installer: Double-click the downloaded file and follow the on-screen instructions to install TechGuide on your machine.\nFirst-Time Setup: After installation, launch TechGuide. You will be prompted to create an account or log in with existing credentials. Follow the setup wizard to configure your initial settings.\nConfiguration\nOnce TechGuide is installed, you need to configure it to connect with your infrastructure. Follow these steps:\n1. Cloud Provider Integration: Navigate to the 'Integrations' section in the settings menu. Here, you can connect your AWS, Azure, or Google Cloud accounts by entering your credentials.", metadata={'source': 'files/example.md'}), Document(page_content="-

## Retrieval
### LangChain Expression Language (LCEL)
A retriever is an interface that returns documents given an unstructured query. It is more general than a vector store. A retriever does not need to be able to store documents, only to return (or retrieve) them. Retrievers can be created from vector stores, but are also broad enough to include Wikipedia search and Amazon Kendra.

Retrievers accept a string query as input and return a list of `Document` objects as output.
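
To illustrate that a retriever only needs to map a query string to a list of documents, here is a minimal sketch that uses keyword overlap instead of a vector store. `Doc` and `KeywordRetriever` are hypothetical names, not LangChain classes.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Doc:
    page_content: str
    metadata: dict = field(default_factory=dict)


def words(text: str) -> set[str]:
    """Lowercased word set, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


class KeywordRetriever:
    """Returns the documents that share the most words with the query."""

    def __init__(self, docs: list[Doc]):
        self.docs = docs

    def retrieve(self, query: str, k: int = 2) -> list[Doc]:
        query_words = words(query)
        scored = [(len(query_words & words(d.page_content)), d) for d in self.docs]
        scored = [(score, d) for score, d in scored if score > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [d for _, d in scored[:k]]


toy_docs = [
    Doc("Run the installer to install TechGuide", {"source": "toy"}),
    Doc("Billing is monthly", {"source": "toy"}),
]
toy_retriever = KeywordRetriever(toy_docs)
found = toy_retriever.retrieve("How can i install TechGuide?")
print([d.page_content for d in found])
```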


In [None]:
from langchain.prompts import ChatPromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.schema.runnable import RunnablePassthrough

llm = ChatOpenAI(
    temperature=0.1,
)
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Answer questions using only the following context. If you don't know the answer, just say you don't know; don't make it up:\n\n{context}"),
    ("human", "{question}")
])

retriever = vectorstore.as_retriever()

# The retriever fills {context}; RunnablePassthrough forwards the raw question to {question}.
chain = {"context": retriever, "question": RunnablePassthrough()} | prompt | llm

# invoke takes the chain input as a single positional argument.
chain.invoke("How can i install TechGuide?")
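
In an LCEL chain, a dict placed before `|` runs its values as parallel branches that all receive the same input, and a passthrough step returns that input unchanged. The sketch below imitates that data flow in plain Python. It is a conceptual illustration, not LangChain's implementation: `Step`, `parallel`, and every `toy_*` name are hypothetical.

```python
class Step:
    """A callable pipeline stage that can be chained with the | operator."""

    def __init__(self, fn):
        self.fn = fn

    def invoke(self, value):
        return self.fn(value)

    def __or__(self, other):
        # self | other: feed the output of self into other.
        return Step(lambda value: other.invoke(self.invoke(value)))


def parallel(**branches):
    """Run every branch on the same input and collect the results in a dict."""
    return Step(lambda value: {name: step.invoke(value) for name, step in branches.items()})


toy_passthrough = Step(lambda value: value)
toy_retriever_step = Step(lambda query: ["TechGuide is installed by running the installer."])
toy_prompt = Step(lambda inputs: f"Context: {inputs['context']}\nQuestion: {inputs['question']}")
toy_llm = Step(lambda text: f"[model would answer from]\n{text}")

toy_chain = parallel(context=toy_retriever_step, question=toy_passthrough) | toy_prompt | toy_llm
result = toy_chain.invoke("How can i install TechGuide?")
print(result)
```

The same question string reaches both branches: one turns it into retrieved context, the other forwards it as-is, and the prompt step receives both under the keys it expects.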