# RAG Flow

Load data -> Documents -> Split documents into chunks -> Convert chunks into vectors (vector embeddings) -> Store vectors in a vector store

1. Data Loader -> Load the data
2. Data Transformation -> Split the loaded data into chunks
3. Data Embedding -> Convert the chunks into vectors
4. Vector Store -> Store the vectors in a database
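
The four steps above can be sketched without any library to show how data moves from one stage to the next. Everything here is a toy stand-in: `load`, `split`, `embed`, and `build_store` are hypothetical names, and the two-number "embedding" only illustrates the shape of the data, not a real model.

```python
def load(source: str) -> str:
    """Stand-in loader: the 'source' is already the raw text."""
    return source


def split(text: str, size: int) -> list[str]:
    """Cut the text into consecutive pieces of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def embed(chunk: str) -> list[float]:
    """Toy embedding: [length, vowel count]. A real model returns a dense vector."""
    vowels = sum(c in "aeiou" for c in chunk.lower())
    return [float(len(chunk)), float(vowels)]


def build_store(source: str, size: int) -> list[tuple[list[float], str]]:
    """Run load -> split -> embed and keep each vector next to its chunk."""
    chunks = split(load(source), size)
    return [(embed(chunk), chunk) for chunk in chunks]


store = build_store("Output parsers transform LLM output.", size=12)
for vector, chunk in store:
    print(vector, repr(chunk))
```

Each entry pairs a vector with the text it came from, which is the pairing a vector store needs in order to return text for a matching vector.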

In [1]:
from langchain_community.document_loaders import WebBaseLoader 
import bs4  # BeautifulSoup, the HTML parser used for web scraping

USER_AGENT environment variable not set, consider setting it to identify your requests.
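
The warning above is harmless, but it can likely be avoided by setting `USER_AGENT` before the loader is imported, since the message appears at import time. A minimal sketch; the value `rag-flow-notebook/0.1` is a hypothetical placeholder, not something the notebook defines.

```python
import os

# Set this before importing langchain_community.document_loaders.
# "rag-flow-notebook/0.1" is a placeholder; any string that identifies
# your requests will do. setdefault leaves an existing value untouched.
os.environ.setdefault("USER_AGENT", "rag-flow-notebook/0.1")
```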


In [2]:
loader = WebBaseLoader("https://python.langchain.com/v0.1/docs/modules/model_io/output_parsers/")
documents = loader.load()
documents

[Document(metadata={'source': 'https://python.langchain.com/v0.1/docs/modules/model_io/output_parsers/', 'title': 'Output Parsers | 🦜️🔗 LangChain', 'description': 'Output parsers are responsible for taking the output of an LLM and transforming it to a more suitable format. This is very useful when you are using LLMs to generate any form of structured data.', 'language': 'en'}, page_content="\n\n\n\n\nOutput Parsers | 🦜️🔗 LangChain\n\n\n\n\n\n\n\nSkip to main contentThis is documentation for LangChain v0.1, which is no longer actively maintained. Check out the docs for the latest version here.ComponentsIntegrationsGuidesAPI ReferenceMorePeopleVersioningContributingTemplatesCookbooksTutorialsYouTubev0.1Latestv0.2v0.1🦜️🔗LangSmithLangSmith DocsLangServe GitHubTemplates GitHubTemplates HubLangChain HubJS/TS Docs💬SearchModel I/OPromptsChat modelsLLMsOutput parsersQuickstartOutput ParsersCustom Output ParserstypesRetrievalDocument loadersText splittersEmbedding modelsVector storesRetrieversIn

In [3]:
## Text Splitter 
from langchain_text_splitters import RecursiveCharacterTextSplitter
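# chunk_overlap: adjacent chunks share up to 200 characters, so text that
# straddles a chunk boundary keeps some of its surrounding context.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)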
splitted_documents = text_splitter.split_documents(documents)
splitted_documents

[Document(metadata={'source': 'https://python.langchain.com/v0.1/docs/modules/model_io/output_parsers/', 'title': 'Output Parsers | 🦜️🔗 LangChain', 'description': 'Output parsers are responsible for taking the output of an LLM and transforming it to a more suitable format. This is very useful when you are using LLMs to generate any form of structured data.', 'language': 'en'}, page_content='Output Parsers | 🦜️🔗 LangChain'),
 Document(metadata={'source': 'https://python.langchain.com/v0.1/docs/modules/model_io/output_parsers/', 'title': 'Output Parsers | 🦜️🔗 LangChain', 'description': 'Output parsers are responsible for taking the output of an LLM and transforming it to a more suitable format. This is very useful when you are using LLMs to generate any form of structured data.', 'language': 'en'}, page_content='Skip to main contentThis is documentation for LangChain v0.1, which is no longer actively maintained. Check out the docs for the latest version here.ComponentsIntegrationsGuidesA
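
To see what `chunk_overlap` does, here is a simplified fixed-window chunker. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that one splits on separators such as paragraphs and spaces before merging pieces); `chunk_with_overlap` is a hypothetical helper that only illustrates the overlap idea.

```python
def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Slide a window of chunk_size characters, stepping by chunk_size - chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


chunks = chunk_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk starts with the last two characters of the previous one, so a phrase cut at one boundary still appears whole in a neighbouring chunk.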

## 3. Embeddings

In [4]:
from langchain_community.embeddings import OllamaEmbeddings
embeddings = OllamaEmbeddings(model="gemma2:2b")  # model defaults to "llama2" when omitted
embeddings


OllamaEmbeddings(base_url='http://localhost:11434', model='gemma2:2b', embed_instruction='passage: ', query_instruction='query: ', mirostat=None, mirostat_eta=None, mirostat_tau=None, num_ctx=None, num_gpu=None, num_thread=None, repeat_last_n=None, repeat_penalty=None, temperature=None, stop=None, tfs_z=None, top_k=None, top_p=None, show_progress=False, headers=None, model_kwargs=None)

## 4. VectorStore

In [5]:
from langchain_community.vectorstores import FAISS 
faiss_db = FAISS.from_documents(documents=splitted_documents, embedding=embeddings)
faiss_db

<langchain_community.vectorstores.faiss.FAISS at 0x11ac57230>

In [6]:
query = "There are reasonable limits to concurrent request"
result = faiss_db.similarity_search(query)
context = result[0].page_content
context

'Skip to main contentThis is documentation for LangChain v0.1, which is no longer actively maintained. Check out the docs for the latest version here.ComponentsIntegrationsGuidesAPI ReferenceMorePeopleVersioningContributingTemplatesCookbooksTutorialsYouTubev0.1Latestv0.2v0.1🦜️🔗LangSmithLangSmith DocsLangServe GitHubTemplates GitHubTemplates HubLangChain HubJS/TS Docs💬SearchModel I/OPromptsChat modelsLLMsOutput parsersQuickstartOutput ParsersCustom Output ParserstypesRetrievalDocument loadersText splittersEmbedding modelsVector storesRetrieversIndexingCompositionToolsAgentsChainsMoreComponentsThis is documentation for LangChain v0.1, which is no longer actively maintained.For the current stable version, see this version (Latest).Model I/OOutput parsersOutput ParsersOutput parsers are responsible for taking the output of an LLM and transforming it to a more suitable format. This is very useful when you are using LLMs to generate any form of structured data.Besides having a large'
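
Conceptually, `similarity_search` embeds the query and returns the chunks whose vectors lie closest to it. The brute-force sketch below assumes Euclidean (L2) distance, which I believe is the default for the LangChain FAISS wrapper, and `k=4` to mirror the wrapper's default. The two-dimensional vectors and chunk labels are made up for illustration.

```python
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity_search(query_vec, store, k=4):
    """Return the texts of the k stored vectors nearest to query_vec."""
    ranked = sorted(store, key=lambda item: l2_distance(query_vec, item[0]))
    return [text for _, text in ranked[:k]]


# Made-up (vector, text) pairs standing in for embedded chunks.
toy_store = [
    ([0.0, 0.0], "chunk about navigation"),
    ([1.0, 1.0], "chunk about output parsers"),
    ([5.0, 5.0], "chunk about agents"),
]

top = similarity_search([0.9, 1.2], toy_store, k=2)
print(top)  # ['chunk about output parsers', 'chunk about navigation']
```

A real index avoids scanning every vector, but the ranking it approximates is the same: smaller distance means a better match.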

In [7]:
## LLM with RAG

from langchain_core.prompts import ChatPromptTemplate 

prompt = ChatPromptTemplate.from_template(
    """
    Answer the following question based only on the provided context:
    <context>{context}</context>
    Question: {input}
    """
)

### create_stuff_documents_chain

In [9]:
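from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_community.llms import Ollama

# The original cell failed with "NameError: name 'model' is not defined".
# Assumption: a local Ollama LLM, reusing gemma2:2b from the embeddings step.
model = Ollama(model="gemma2:2b")
document_chain = create_stuff_documents_chain(model, prompt)
document_chain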
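
"Stuffing" means placing the text of all retrieved documents into the single `{context}` slot of the prompt. The sketch below imitates that step in plain Python. `Doc` and `stuff` are hypothetical stand-ins, the `Question: {input}` line is an assumption about how the question is passed in, and the blank-line separator reflects what I understand to be the chain's default.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document."""
    page_content: str


TEMPLATE = (
    "Answer the following question based only on the provided context:\n"
    "<context>{context}</context>\n"
    "Question: {input}"
)


def stuff(docs, question, separator="\n\n"):
    """Join every document's text and fill the prompt template with it."""
    context = separator.join(doc.page_content for doc in docs)
    return TEMPLATE.format(context=context, input=question)


filled = stuff(
    [Doc("first chunk"), Doc("second chunk")],
    "What is an output parser?",
)
print(filled)
```

Because every retrieved chunk is pasted in verbatim, the approach only works while the combined text fits in the model's context window.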