# Advanced RAG on Hugging Face Collections with LangChain, Cohere, and Llama 3

This demo shows how to perform advanced Retrieval-Augmented Generation (RAG) on documents contained in a Hugging Face Collection.

For an introduction to advanced RAG, see the [Advanced RAG cookbook](https://huggingface.co/learn/cookbook/en/advanced_rag).

We will use the following tools:

<div>
    <table>
        <tr>
            <th>requirement</th>
            <th>purpose</th>
            <th>link</th>
        </tr>
        <tr>
            <td>LangChain</td>
            <td>LLM workflow framework</td>
            <td><a href="https://python.langchain.com/v0.2/docs/introduction/">docs</a></td>
        </tr>
        <tr>
            <td>Comet LLM</td>
            <td>tracking LLM workflows</td>
            <td><a href="https://www.comet.com/docs/v2/guides/comet-llm/quickstart/">docs</a></td> 
        </tr>
        <tr>
            <td>Unstructured</td>
            <td>document processing</td>
            <td><a href="https://docs.unstructured.io/welcome">docs</a></td> 
        </tr>
        <tr>
            <td>Llama 3 70B Instruct</td>
            <td>synthesizer model</td>
            <td><a href="https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct">docs</a></td> 
        </tr>
        <tr>
            <td>Cohere Reranker</td>
            <td>reranking model</td>
            <td><a href="https://cohere.com/rerank">docs</a></td> 
        </tr>
        <tr>
            <td>Hugging Face Hub API</td>
            <td>interact with the HF Hub</td>
            <td><a href="https://huggingface.co/docs/huggingface_hub/index">docs</a></td> 
        </tr>
        <tr>
            <td>Hugging Face Inference API</td>
            <td>serverless inference for prototyping</td>
            <td><a href="https://huggingface.co/docs/api-inference/index">docs</a></td> 
        </tr>
        <tr>
            <td>Weaviate</td>
            <td>vector database</td>
            <td><a href="https://weaviate.io/developers/weaviate">docs</a></td>
        </tr>
        <tr>
            <td>python-dotenv</td>
            <td>reading environment variables</td>
            <td><a href="https://saurabh-kumar.com/python-dotenv/">docs</a></td> 
        </tr>
    </table>
</div>

## Installs

<ol>
    <li>langchain</li>
    <li>langchain-weaviate</li>
    <li>langchain-cohere</li>
    <li>langchain-huggingface</li>
    <li>langchain-community</li>
    <li>weaviate-client</li>
    <li>comet-llm</li>
    <li>huggingface-hub</li>
    <li>unstructured[all-docs]</li>
    <li>python-dotenv</li>
</ol>

In [None]:
!pip install langchain huggingface-hub comet-llm "unstructured[all-docs]" weaviate-client langchain-weaviate langchain-cohere langchain-huggingface langchain-community python-dotenv

## Steps

We need to do the following:

<ol>
  <li>Create the <a href="https://huggingface.co/docs/hub/collections">collection</a> on Hugging Face</li>
  <li>Fetch the collection with the Hugging Face Hub API</li>
  <li>Preprocess the documents contained in the collection with Unstructured</li>
  <li>Use Weaviate and LangChain to create a vector store and retriever, and rerank the retrieved chunks with Cohere</li>
  <li>Use Hugging Face's Inference API to synthesize an answer using Meta's Llama 3 70B Instruct</li>
</ol>

## API keys we need

<ol>
    <li><a href="https://www.comet.com/site/">Comet</a></li>
    <li><a href="https://unstructured.io/api-key-free">Unstructured</a></li>
    <li><a href="https://weaviate.io/">Weaviate</a></li>
    <li><a href="https://cohere.com/">Cohere</a></li>
    <li><a href="https://huggingface.co/settings/tokens">Hugging Face</a></li>
</ol>
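
Once the keys are stored in a `.env` file and loaded with `load_dotenv()` (called in the Unstructured section), a quick check that each one is actually set can save a failed API call later. This is a minimal sketch, not part of the original workflow: `missing_env_vars` and `REQUIRED_ENV_VARS` are hypothetical names, and the variable list is taken from the `os.environ` / `os.getenv` lookups in the code cells of this notebook.

```python
import os
from typing import Iterable, Mapping

# Hypothetical list: the environment variable names read by this notebook's code cells.
REQUIRED_ENV_VARS = (
    "UNSTRUCTURED_API_KEY",
    "WEAVIATE_URL",
    "WEAVIATE_API_KEY",
    "COHERE_API_KEY",
    "HF_INFERENCE_API_KEY",
)


def missing_env_vars(required: Iterable[str], env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names in `required` that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


missing = missing_env_vars(REQUIRED_ENV_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Passing `env` explicitly makes the helper easy to test with a plain dict instead of the real environment.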

# Breaking down RAG

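RAG has two stages: retrieval, which finds the document chunks most relevant to a query, and synthesis, in which an LLM writes an answer grounded in those chunks.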

## Retrieval

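In this notebook, retrieval happens in two steps: a vector similarity search over the chunk embeddings stored in Weaviate, followed by reranking of the returned candidates with Cohere Rerank.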

### Vector search with Hierarchical Navigable Small Worlds (HNSW)

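Comparing a query against every stored vector is exact, but its cost grows linearly with the size of the collection. HNSW is an approximate nearest neighbor (ANN) algorithm: it builds a layered proximity graph over the vectors and searches it greedily, starting in a sparse top layer and descending to the dense bottom layer, trading a small amount of recall for much faster queries. HNSW is Weaviate's default vector index.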
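
To make "approximate" concrete, the sketch below shows the exact search that HNSW approximates: score every stored vector by cosine similarity and keep the best `k`. It is not an HNSW implementation, and the toy 3-dimensional vectors are made up for illustration.

```python
import heapq
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def exact_top_k(query, vectors, k):
    """Brute-force k-nearest-neighbor search: score every vector, keep the best k.

    Returns (index, score) pairs sorted by descending similarity.
    """
    scored = ((i, cosine(query, v)) for i, v in enumerate(vectors))
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy vectors standing in for chunk embeddings.
docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(exact_top_k([1.0, 0.05, 0.0], docs, k=2))
```

This loop touches all `n` vectors per query. HNSW's graph lets the search visit only a small fraction of them, at the cost of occasionally missing a true nearest neighbor.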

### Embedding models

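An embedding model maps text to a fixed-length vector so that semantically similar texts end up close together. This notebook embeds both the chunks and the query with `BAAI/bge-base-en-v1.5` through the Hugging Face Inference API. The query must be embedded with the same model as the documents for the similarity scores to be meaningful.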
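
The sketch below illustrates only the contract an embedding model fulfils (text in, fixed-length normalized vector out) using a hashing-trick bag of words. It is a toy assumption for illustration: `toy_embed` is a hypothetical helper that captures word overlap, not meaning, whereas `bge-base-en-v1.5` is a learned model that produces 768-dimensional vectors.

```python
import hashlib
import math
import re


def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Hashing-trick bag-of-words embedding, L2-normalized.

    Each lowercase word is hashed (stably, with SHA-256) into one of `dim`
    buckets and that bucket's count is incremented.
    """
    vec = [0.0] * dim
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = int(hashlib.sha256(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def dot(a, b):
    """Dot product; equals cosine similarity for L2-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))


q = toy_embed("chain of abstraction")
print(round(dot(q, toy_embed("the chain of abstraction method")), 3))
print(round(dot(q, toy_embed("weather report for tuesday")), 3))
```

Because both vectors are normalized, a plain dot product is enough to compare them, which is also why vector stores often normalize embeddings up front.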

### Reranking models

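Vector search compares embeddings that were computed independently for the query and for each chunk. A reranking model instead scores the query together with each candidate chunk, which is slower but usually more precise, so it is applied only to the small candidate set returned by the retriever. Here, Cohere Rerank reorders those candidates and keeps the most relevant ones.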
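
The two-stage pattern (cheap first-stage retrieval, then reranking a short list) can be sketched as below. The scorer is a hypothetical stand-in: `overlap_score` just measures query-word overlap, whereas Cohere Rerank uses a trained model. `rerank` and `top_n` mirror the idea behind `CohereRerank`'s `top_n`, not its implementation.

```python
from typing import Callable


def rerank(query: str, candidates: list[str], score: Callable[[str, str], float], top_n: int = 3) -> list[str]:
    """Re-order first-stage candidates by a (query, document) relevance score and keep top_n."""
    ranked = sorted(candidates, key=lambda doc: score(query, doc), reverse=True)
    return ranked[:top_n]


def overlap_score(query: str, doc: str) -> float:
    """Hypothetical stand-in for a reranking model: fraction of query words present in the doc."""
    q_words = set(query.lower().split())
    d_words = set(doc.lower().split())
    return len(q_words & d_words) / len(q_words) if q_words else 0.0


candidates = [
    "tool use in language models",
    "chain of abstraction reasoning with tools",
    "a survey of retrieval methods",
]
print(rerank("chain of abstraction", candidates, overlap_score, top_n=2))
```

Because the reranker only ever sees the retriever's candidates, the first stage should return more candidates than the number you finally keep.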

## Synthesis aka Generation

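The retrieved chunks are inserted into a prompt together with the user's question, and the LLM (here, Llama 3 70B Instruct served through the Hugging Face Inference API) generates an answer conditioned on that context.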
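
`RetrievalQA`'s default `"stuff"` strategy places all retrieved chunks into a single prompt. The sketch below shows that idea only: `build_stuff_prompt` is a hypothetical helper, and its prompt wording is illustrative, not LangChain's exact template.

```python
def build_stuff_prompt(question: str, chunks: list[str]) -> str:
    """Concatenate retrieved chunks into one context block, followed by the question."""
    context = "\n\n".join(chunks)
    return (
        "Use the following pieces of context to answer the question at the end.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nHelpful Answer:"
    )


print(build_stuff_prompt("What is chain of abstraction?", ["First chunk.", "Second chunk."]))
```

"Stuffing" is simple and keeps all context in one LLM call, but the combined chunks must fit in the model's context window, which is one reason to keep chunks small and the reranked set short.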

# Combining LangChain and Comet LLM

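See the Comet LLM [docs](https://www.comet.com/docs/v2/guides/comet-llm/integrations/langchain/) for details on its LangChain integration.

Comet LLM can log the prompts and chain runs that LangChain executes, which makes it easier to inspect and debug each step of a RAG workflow.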

## What is LangChain

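LangChain is a framework for building LLM workflows from composable components such as embedding models, vector stores, retrievers, and LLMs. This notebook uses its Weaviate, Cohere, and Hugging Face integrations together with the `RetrievalQA` chain.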

## What is Comet LLM

Comet LLM is Comet's tool for logging, tracking, and visualizing LLM prompts and chains.

# The Workflow

## Get the collection and download the files

In [None]:
import os
from huggingface_hub import get_collection

Ensure the data directory exists.

In [None]:
datadir = os.path.join(".", "documents")

# create the directory if needed; no error if it already exists
os.makedirs(datadir, exist_ok=True)

Fetch the collection and download each paper's PDF from arXiv.

In [None]:
import subprocess

# get the tool use paper collection from the Hub
collection = get_collection("jxtngx/tool-use-papers-664c6cd9cc9c64354af51e86")
# build arXiv PDF URLs from the IDs of the paper items in the collection
urls = [f"https://arxiv.org/pdf/{c.item_id}" for c in collection.items if c.item_type == "paper"]

# download any files that are not already present
for url in urls:
    docname = url.split("/")[-1] + ".pdf"
    docpath = os.path.join(datadir, docname)
    if not os.path.exists(docpath):
        # pass arguments as a list so paths and URLs are not interpreted by a shell
        subprocess.run(["wget", "-O", docpath, url], check=True)
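
The mapping from an arXiv ID to a download URL and a local filename can be isolated in a small pure function, which makes it easy to check before anything is downloaded. This is a sketch: `arxiv_pdf_target` is a hypothetical helper that reproduces the naming scheme of the cell above, and `"2401.17464"` is only an illustrative ID.

```python
import os


def arxiv_pdf_target(arxiv_id: str, datadir: str) -> tuple[str, str]:
    """Map an arXiv ID to its PDF URL and the local path the download cell writes to."""
    url = f"https://arxiv.org/pdf/{arxiv_id}"
    local_path = os.path.join(datadir, f"{arxiv_id}.pdf")
    return url, local_path


print(arxiv_pdf_target("2401.17464", "documents"))
```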

## Prep PDFs with Unstructured

This section takes inspiration from the example in [Building RAG with Custom Unstructured Data](https://github.com/huggingface/cookbook/blob/main/notebooks/en/rag_with_unstructured_data.ipynb), by Maria Khalusova.

See the sections titled `Unstructured data preprocessing` and `Chunking` in that notebook for a detailed walkthrough.

In [None]:
import logging
import os

from unstructured.ingest.connector.local import SimpleLocalConfig
from unstructured.ingest.interfaces import PartitionConfig, ProcessorConfig, ReadConfig
from unstructured.ingest.runner import LocalRunner
from unstructured.staging.base import elements_from_json
from unstructured.chunking.title import chunk_by_title
from langchain_core.documents import Document
from dotenv import load_dotenv

load_dotenv()

In [None]:
# Optional cell to reduce the amount of logs
logger = logging.getLogger("unstructured.ingest")

if logger.root.handlers:
    logger.root.removeHandler(logger.root.handlers[0])

In [None]:
output_path = "./local-ingest-output"

runner = LocalRunner(
    processor_config=ProcessorConfig(
        # logs verbosity
        verbose=True,
        # the local directory to store outputs
        output_dir=output_path,
        # number of parallel worker processes
        num_processes=2,
    ),
    read_config=ReadConfig(),
    partition_config=PartitionConfig(
        # partition the PDFs with the hosted Unstructured API instead of locally
        partition_by_api=True,
        api_key=os.environ["UNSTRUCTURED_API_KEY"],
    ),
    connector_config=SimpleLocalConfig(
        # the directory the PDFs were downloaded to
        input_path="./documents",
        # whether to get the documents recursively from given directory
        recursive=False,
    ),
)
runner.run()

In [None]:
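elements = []

# load the elements from every JSON file written by the ingest runner
for filename in os.listdir(output_path):
    if not filename.endswith(".json"):
        continue
    filepath = os.path.join(output_path, filename)
    elements.extend(elements_from_json(filepath))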
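
Each output file is a JSON list of element dictionaries with keys such as `"type"`, `"text"`, and `"metadata"`. To get a feel for what `elements_from_json` is loading, the sketch below tallies element types using only the standard library. `count_element_types` is a hypothetical helper, and the sample data is invented and heavily trimmed.

```python
import json
from collections import Counter


def count_element_types(json_text: str) -> Counter:
    """Tally the "type" field of each element in an Unstructured JSON output file."""
    return Counter(element.get("type", "Unknown") for element in json.loads(json_text))


# Invented, heavily trimmed sample of one output file.
sample = json.dumps([
    {"type": "Title", "text": "Introduction", "metadata": {"filename": "example.pdf"}},
    {"type": "NarrativeText", "text": "Tool use lets models call external APIs.", "metadata": {"filename": "example.pdf"}},
    {"type": "NarrativeText", "text": "We propose a new method.", "metadata": {"filename": "example.pdf"}},
])
print(count_element_types(sample))
```

For a real file, pass the file contents instead, e.g. `count_element_types(open(filepath).read())`.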

In [None]:
chunked_elements = chunk_by_title(
    elements,
    # hard maximum chunk size, in characters
    max_characters=512,
    # combine consecutive small sections (e.g. individual list items)
    # until a chunk reaches at least this many characters
    combine_text_under_n_chars=200,
)
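
The two parameters above interact: sections start at titles, small sections are merged, and `max_characters` is a hard cap. The sketch below is a simplified illustration of that behavior under stated assumptions, not Unstructured's implementation. `chunk_by_title_sketch` is a hypothetical function that takes `(element_type, text)` pairs, and unlike the real chunker it hard-splits oversized text at arbitrary character offsets.

```python
def chunk_by_title_sketch(elements, max_characters=512, combine_text_under_n_chars=200):
    """Simplified illustration of title-based chunking (not Unstructured's implementation).

    elements: list of (element_type, text) pairs in document order.
    """
    # Step 1: start a new section at every "Title" element.
    sections = []
    for element_type, text in elements:
        if element_type == "Title" or not sections:
            sections.append([text])
        else:
            sections[-1].append(text)
    section_texts = ["\n\n".join(parts) for parts in sections]

    # Step 2: merge a section into the previous chunk while that chunk is still
    # shorter than combine_text_under_n_chars and the merge fits in max_characters.
    combined = []
    for text in section_texts:
        if (
            combined
            and len(combined[-1]) < combine_text_under_n_chars
            and len(combined[-1]) + 2 + len(text) <= max_characters
        ):
            combined[-1] = combined[-1] + "\n\n" + text
        else:
            combined.append(text)

    # Step 3: enforce the hard maximum by splitting oversized chunks.
    chunks = []
    for text in combined:
        for start in range(0, len(text), max_characters):
            chunks.append(text[start:start + max_characters])
    return chunks


example = [("Title", "Intro"), ("NarrativeText", "a" * 50), ("Title", "Method"), ("NarrativeText", "b" * 600)]
print([len(chunk) for chunk in chunk_by_title_sketch(example)])
```

Here the short "Intro" section is not merged because the merge would exceed 512 characters, and the 608-character "Method" section is split into two chunks.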

In [None]:
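documents = []

for chunked_element in chunked_elements:
    metadata = chunked_element.metadata.to_dict()
    # LangChain conventionally records a document's origin under "source"
    metadata["source"] = metadata["filename"]
    # drop the list-valued "languages" field; pop() avoids a KeyError if it is absent
    metadata.pop("languages", None)
    documents.append(Document(page_content=chunked_element.text, metadata=metadata))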

## Create the vector store and retriever

See the [LangChain docs](https://python.langchain.com/v0.2/docs/integrations/text_embedding/huggingfacehub/) for more on the Hugging Face embeddings integration.

See the [LangChain docs](https://python.langchain.com/v0.1/docs/integrations/vectorstores/weaviate/) for more on the Weaviate integration.

In [None]:
import weaviate
from weaviate.auth import AuthApiKey

from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain.chains import RetrievalQA

from langchain_weaviate.vectorstores import WeaviateVectorStore
from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings
from langchain_cohere import CohereRerank

Connect to Weaviate Cloud

In [None]:
weaviate_client = weaviate.connect_to_weaviate_cloud(
    cluster_url=os.getenv("WEAVIATE_URL"),
    auth_credentials=AuthApiKey(os.getenv("WEAVIATE_API_KEY")),
)

In [None]:
embeddings = HuggingFaceInferenceAPIEmbeddings(
    api_key=os.environ["HF_INFERENCE_API_KEY"],
    model_name="BAAI/bge-base-en-v1.5",
)
vectorstore = WeaviateVectorStore.from_documents(documents, embeddings, client=weaviate_client)
# retrieve a wider candidate set than we finally need; the reranker narrows it down
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 10})

In [None]:
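# rerank the retriever's candidates with Cohere and keep the 3 most relevant
compressor = CohereRerank(
    cohere_api_key=os.environ["COHERE_API_KEY"],
    model="rerank-english-v3.0",
    top_n=3,
)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)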

## Using the Hugging Face Inference API for prototyping

See the [LangChain docs](https://python.langchain.com/v0.2/docs/integrations/llms/huggingface_endpoint/) for more.

In [None]:
from textwrap import wrap
from langchain_huggingface import HuggingFaceEndpoint

In [None]:
llm = HuggingFaceEndpoint(
    repo_id="meta-llama/Meta-Llama-3-70B-Instruct",
    # maximum number of tokens to generate for the answer
    max_new_tokens=128,
    temperature=0.5,
    huggingfacehub_api_token=os.environ["HF_INFERENCE_API_KEY"],
)
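
`temperature=0.5` divides the model's next-token logits by 0.5 before the softmax, which sharpens the distribution toward the most likely tokens and makes answers more deterministic than the default of 1.0. The sketch below shows that effect on made-up logits. `softmax_with_temperature` is a hypothetical helper for illustration, not part of the Hugging Face or LangChain APIs.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by `temperature` (> 0)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
print([round(p, 3) for p in softmax_with_temperature(logits, 1.0)])
print([round(p, 3) for p in softmax_with_temperature(logits, 0.5)])
```

Lower temperatures move probability mass onto the top token, which suits factual question answering over retrieved context.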

In [None]:
chain = RetrievalQA.from_chain_type(
    llm=llm,
    # "stuff" (the default) inserts all retrieved chunks into a single prompt
    chain_type="stuff",
    retriever=compression_retriever,
)

In [None]:
result = chain.invoke("What is chain of abstraction?")["result"]

In [None]:
# print the answer wrapped to 70-character lines (textwrap's default width)
for line in wrap(result):
    print(line)