# Adaptive RAG -- With local LLMs

[link to repo](https://github.com/langchain-ai/langgraph/blob/main/examples/rag/langgraph_adaptive_rag_local.ipynb)

### LLMs
**Local Embeddings**

Using `GPT4AllEmbeddings()` from Nomic



**Local LLM**

Using `llama3`

In [1]:
# llama 3 model
local_llm = "llama3"

### Index

Documents will be loaded from multiple sources:
- JSON file
- HTML sites
- PDF files
- more...

Each of these will be loaded through a document loader. Once loaded they will be embedded.

In [10]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

ImportError: cannot import name 'LangSmithParams' from 'langchain_core.language_models.chat_models' (/Users/seanschumacher/opt/miniconda3/envs/llama-rag/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py)

Lets load all of the documents from AIM

In [3]:
import requests 
from bs4 import BeautifulSoup
import re

In [6]:
# dotenv
import dotenv
dotenv.load_dotenv()

True

In [4]:
base_url = "https://www.faa.gov/air_traffic/publications/atpubs/aim_html/"

response = requests.get(base_url)
soup = BeautifulSoup(response.content, "html.parser")

links = soup.find_all('a', href=True)
subpages = set()

for link in links:
        href = link['href']
        if re.search(r'(chap|appendix)', href, re.IGNORECASE):  # Adjust regex to match 'chapter' or 'appendix'
                full_url = f"{base_url.rstrip('/')}/{href.lstrip('/')}"
                subpages.add(full_url)

subpages


{'https://www.faa.gov/air_traffic/publications/atpubs/aim_html/./appendix_1.html',
 'https://www.faa.gov/air_traffic/publications/atpubs/aim_html/./appendix_2.html',
 'https://www.faa.gov/air_traffic/publications/atpubs/aim_html/./appendix_3.html',
 'https://www.faa.gov/air_traffic/publications/atpubs/aim_html/./appendix_4.html',
 'https://www.faa.gov/air_traffic/publications/atpubs/aim_html/./appendix_5.html',
 'https://www.faa.gov/air_traffic/publications/atpubs/aim_html/./chap0_cfr.html',
 'https://www.faa.gov/air_traffic/publications/atpubs/aim_html/./chap0_chap0_policy.html',
 'https://www.faa.gov/air_traffic/publications/atpubs/aim_html/./chap0_faa_desc.html',
 'https://www.faa.gov/air_traffic/publications/atpubs/aim_html/./chap0_info_eoc.html',
 'https://www.faa.gov/air_traffic/publications/atpubs/aim_html/./chap0_subscription_info.html',
 'https://www.faa.gov/air_traffic/publications/atpubs/aim_html/./chap10_section_1.html',
 'https://www.faa.gov/air_traffic/publications/atpubs

In [5]:
docs = [WebBaseLoader(url).load() for url in subpages]
docs_list = [item for sublist in docs for item in sublist]

In [7]:
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=500, chunk_overlap=0
)
doc_splits = text_splitter.split_documents(docs_list)

In [3]:
persist_directory = "../dev-playground"

# create and load chroma vector store
vectorstore = Chroma.from_documents(
    documents=doc_splits,
    collection_name="rag-chroma",
    embedding=GPT4AllEmbeddings(),
    persist_directory=persist_directory
)

# persist the vector store to the directory
vectorstore.persist()

retriever = vectorstore.as_retriever()

NameError: name 'Chroma' is not defined

### LLMs

#### Router

This will route the question to the necessary agent (RAG or web-search)

In [2]:
### Router

from langchain.prompts import PromptTemplate
from langchain_community.chat_models import ChatOllama
from langchain_core.output_parsers import JsonOutputParser

### LLM for router
llm = ChatOllama(model=local_llm, format="json", temperature=0)

# Prompt template instructing llm how to route questions
prompt = PromptTemplate(
    template="""You are an expert at routing a user question to a vectorstore or web search. \n
    Use the vectorstore for questions on LLM  agents, prompt engineering, and adversarial attacks. \n
    You do not need to be stringent with the keywords in the question related to these topics. \n
    Otherwise, use web-search. Give a binary choice 'web_search' or 'vectorstore' based on the question. \n
    Return the a JSON with a single key 'datasource' and no premable or explanation. \n
    Question to route: {question}""",
    input_variables=["question"],
)

question_router = prompt | llm | JsonOutputParser() # this is the chain

question = "instrument approach"

docs = retriever.get_relevant_documents(question)
doc_text = docs[1].page_content
print(question_router.invoke({"question": question}))

NameError: name 'retriever' is not defined