# Retrieval Augmented Generation (RAG) with BeeAI

In this example, you will...

Use:
* Hugging Face model `sentence-transformers/all-mpnet-base-v2` to generate embeddings for documents and queries
* Redis vector store to store and query the embeddings
* langchain for...
* BeeAI framework to build an AI agent to...


## Use case

TBD: Placeholder for the data and use case used in this example.

## Setup

### Python package installs

> NOTE! Before starting your kernel, navigate to the `beeai_fw_tavily_redis` folder of this repo and run `uv sync`.  
> Choose the kernel that matches the uv Python environment. The path should look something like `beeai-workshop/beeai_fw_tavily_redis/.venv/bin/python`


In [None]:
# Python version check
import sys
assert sys.version_info >= (3, 11) and sys.version_info < (3, 12), "Use Python 3.11 to run this notebook."


### Imports

In [None]:
import os
import redis
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_redis import RedisConfig, RedisVectorStore

### Constants

Constants (or variables that you might want to change) are set here and used later in the notebook.

In [None]:
EMBEDDINGS_MODEL_NAME = "sentence-transformers/all-mpnet-base-v2"
REDIS_URL = os.getenv("REDIS_URL", "redis://localhost:6379")  # Local Redis default

## Set up the embeddings model

The embeddings model creates embedding vectors from the documents and from the queries.
With `HuggingFaceEmbeddings`, we can download a sentence-transformers model and run it locally to compute those embeddings.
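
To illustrate why documents and queries are embedded with the same model, here is a minimal sketch of ranking by cosine similarity, a common way to compare embedding vectors. The three-dimensional vectors and the names `cosine_similarity`, `top_k`, `toy_docs`, and `toy_query` are made up for illustration; real embeddings have hundreds of dimensions, and the metric the vector store uses depends on its configuration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query_vector, doc_vectors, k=2):
    """Return the names of the k vectors most similar to the query."""
    scored = [
        (cosine_similarity(query_vector, vec), name)
        for name, vec in doc_vectors.items()
    ]
    scored.sort(reverse=True)  # Highest similarity first
    return [name for _, name in scored[:k]]


# Hypothetical toy "embeddings" (not produced by a real model)
toy_docs = {
    "pilot_plan": [0.9, 0.1, 0.0],
    "budget": [0.1, 0.9, 0.1],
    "holiday_policy": [0.0, 0.1, 0.9],
}
toy_query = [0.8, 0.2, 0.0]

print(top_k(toy_query, toy_docs, k=2))
```

The vector store performs the same kind of nearest-neighbor lookup, but over the stored document embeddings and with an index instead of a brute-force scan.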

In [None]:
embeddings = HuggingFaceEmbeddings(model_name=EMBEDDINGS_MODEL_NAME)

## Set up the vector store

Redis serves as the vector store.

In [None]:
# Test connection with Redis client
print(f"Connecting to Redis at: {REDIS_URL}")
redis_client = redis.from_url(REDIS_URL)
print(f"Connected = {redis_client.ping()}")

In [None]:
# Configure and init the vector store with our embeddings model
config = RedisConfig(
    index_name="internal_docs",
    redis_url=REDIS_URL,
    metadata_schema=[
        {"name": "document", "type": "tag"},
    ],
)

vector_store = RedisVectorStore(embeddings, config=config)

## Read and split the documents

In [None]:
from langchain_text_splitters import MarkdownHeaderTextSplitter

splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=[
        ("#", "Header_1"),
        ("##", "Header_2"),
        ("###", "Header_3"),
    ],
    strip_headers=True,
)
    
# Collect the Markdown files in the example docs directory
path = "../../example_docs"
files = sorted(os.listdir(path))

n_docs = 0
metadata = []  # One {"document": ...} entry per chunk, aligned with `splits`
splits = []    # Text content of each chunk
for file in files:
    filename = os.path.join(path, file)
    if not os.path.isdir(filename) and filename.endswith(".md"):
        with open(filename, encoding="utf-8") as f:
            file_contents = f.read()
        n_docs += 1
        # Split on headers and record which file each chunk came from
        for split in splitter.split_text(file_contents):
            splits.append(split.page_content)
            metadata.append({"document": filename})

print(f"{n_docs} documents split into {len(splits)} chunks of text")
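
To make the splitting step concrete, here is a simplified, standard-library-only sketch of header-based splitting. It is not the implementation of `MarkdownHeaderTextSplitter`; the function `split_by_headers` and the sample text are hypothetical, and the sketch ignores edge cases such as `#` lines inside fenced code.

```python
def split_by_headers(markdown_text, max_level=3):
    """Toy header splitter: returns a list of (headers, content) tuples."""
    chunks = []
    current_headers = {}  # e.g. {"Header_1": "Plan", "Header_2": "Pilot"}
    current_lines = []

    def flush():
        # Emit the text collected so far, if any, under the active headers
        content = "\n".join(current_lines).strip()
        if content:
            chunks.append((dict(current_headers), content))
        current_lines.clear()

    for line in markdown_text.splitlines():
        stripped = line.strip()
        hashes = len(stripped) - len(stripped.lstrip("#"))
        is_header = 1 <= hashes <= max_level and stripped[hashes:hashes + 1] == " "
        if is_header:
            flush()
            # A new header replaces headers at the same or a deeper level
            for level in range(hashes, max_level + 1):
                current_headers.pop(f"Header_{level}", None)
            current_headers[f"Header_{hashes}"] = stripped[hashes:].strip()
        else:
            current_lines.append(line)
    flush()
    return chunks


# Made-up sample document
sample = (
    "# Plan\nIntro text.\n"
    "## Pilot\nTarget market: small clinics.\n"
    "## Timeline\nQ3."
)
for headers, content in split_by_headers(sample):
    print(headers, "->", content)
```

As in the cell above with `strip_headers=True`, the header lines are kept out of the chunk text and tracked separately as metadata.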


In [None]:
# Add the text and metadata to the vector store
_ids = vector_store.add_texts(splits, metadata)
# print(_ids)
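
Depending on how the store assigns keys, re-running the cell above may add the same chunks a second time. One possible way to guard against that, sketched here with the standard library only, is to derive a stable identifier from each chunk's source file and text and drop repeats before adding. The helpers `stable_chunk_id` and `deduplicate` are hypothetical, and whether your version of `add_texts` accepts custom keys is an assumption to verify in the library documentation.

```python
import hashlib


def stable_chunk_id(document, text):
    """Deterministic short ID derived from the source file name and chunk text."""
    digest = hashlib.sha256(f"{document}\n{text}".encode("utf-8")).hexdigest()
    return digest[:16]


def deduplicate(texts, metadatas):
    """Drop repeated (document, text) pairs, keeping the first occurrence."""
    seen = set()
    unique_texts, unique_metadatas, ids = [], [], []
    for text, meta in zip(texts, metadatas):
        chunk_id = stable_chunk_id(meta["document"], text)
        if chunk_id in seen:
            continue
        seen.add(chunk_id)
        unique_texts.append(text)
        unique_metadatas.append(meta)
        ids.append(chunk_id)
    return unique_texts, unique_metadatas, ids


# Made-up example data: the second chunk repeats the first
example_texts = ["Pilot overview", "Pilot overview", "Budget notes"]
example_meta = [{"document": "plan.md"}] * 3
texts, metas, ids = deduplicate(example_texts, example_meta)
print(len(texts), ids)
```

Because the IDs depend only on the content, the same chunk maps to the same ID on every run.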

In [None]:
# Assumes Redis is running locally (use --host, --port, --password, or --username to change this)
!rvl index listall --port 6379

In [None]:
!rvl index info -i internal_docs --port 6379

In [None]:
!rvl stats -i internal_docs --port 6379
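
The `rvl` commands above hard-code `--port 6379`. If `REDIS_URL` points somewhere else, a small helper can derive the connection flags from the URL. This is a sketch using only the standard library; `rvl_connection_flags` is a hypothetical helper, it emits only the flags named in the comment above, and it does not handle URL-encoded characters in credentials.

```python
from urllib.parse import urlparse


def rvl_connection_flags(redis_url):
    """Translate a redis:// URL into a list of CLI connection flags."""
    parsed = urlparse(redis_url)
    flags = []
    if parsed.hostname:
        flags += ["--host", parsed.hostname]
    if parsed.port:
        flags += ["--port", str(parsed.port)]
    if parsed.username:
        flags += ["--username", parsed.username]
    if parsed.password:
        flags += ["--password", parsed.password]
    return flags


# The local default used in this notebook
print(" ".join(rvl_connection_flags("redis://localhost:6379")))
```

Avoid printing the flags when the URL contains a password, since the output would then end up saved in the notebook.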

In [None]:
query = "What is our target market for the pilot?"
results = vector_store.similarity_search(query, k=2)

print("Simple Similarity Search Results:")
for doc in results:
    print(f"Content: {doc.page_content[:100]}...")
    print(f"Metadata: {doc.metadata}")
    print()

In [None]:
# Create a retriever

retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 2})
results = retriever.invoke("What is our target market for the pilot?")
for doc in results:
    print(f"Content: {doc.page_content[:100]}...")
    print(f"Metadata: {doc.metadata}")
    print()
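
In a RAG flow, the retrieved chunks are typically placed into the prompt that is sent to the model. The sketch below shows one plausible way to do that. `RetrievedDoc` is a hypothetical stand-in that exposes only the `page_content` and `metadata` attributes used above, and `format_context`, `build_prompt`, the prompt wording, and the sample data are all assumptions for illustration, not part of BeeAI or LangChain.

```python
from dataclasses import dataclass, field


@dataclass
class RetrievedDoc:
    """Hypothetical stand-in for a retrieved document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_context(docs, max_chars=500):
    """Number each chunk and label it with its source file."""
    parts = []
    for i, doc in enumerate(docs, start=1):
        source = doc.metadata.get("document", "unknown")
        parts.append(f"[{i}] Source: {source}\n{doc.page_content[:max_chars]}")
    return "\n\n".join(parts)


def build_prompt(question, docs):
    """Combine the question and the retrieved context into one prompt string."""
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{format_context(docs)}\n\n"
        f"Question: {question}"
    )


# Made-up retrieval results
example_docs = [
    RetrievedDoc("The pilot targets small clinics.", {"document": "plan.md"}),
    RetrievedDoc("Budget is reviewed quarterly.", {"document": "budget.md"}),
]
print(build_prompt("What is our target market for the pilot?", example_docs))
```

Numbering the chunks and including the `document` metadata makes it easier for the model's answer to point back to its sources.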