# Sample Retrieval-Augmented Generation (RAG) Pipeline

## Overview
This notebook demonstrates a **sample Retrieval-Augmented Generation (RAG) workflow** built with LangChain, OpenAI embeddings, and Qdrant. It supports the Internet Brands AI Knowledge Base Assistant proposal.

The documents used in this notebook are **AI-generated** mock documents, included for demonstration purposes only.

The notebook is meant to showcase the **technical approach** behind the proposal.

### Setup
Install dependencies by running:
```bash
pip install -r requirements.txt
```

## 1. Environment Setup

In [1]:
import os
from getpass import getpass

# Prompt for the key without echoing it, then expose it through the environment
os.environ['OPENAI_API_KEY'] = getpass("Enter your OpenAI API key: ")
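
When the notebook is re-run in a session where the key is already exported, prompting again is unnecessary. The sketch below is one possible variant, not part of the original notebook: `ensure_api_key` and its `prompt_fn` parameter are hypothetical names introduced here for illustration.

```python
import os
from getpass import getpass
from typing import Callable


def ensure_api_key(
    var_name: str = "OPENAI_API_KEY",
    prompt_fn: Callable[[str], str] = getpass,
) -> str:
    """Return the key from the environment, prompting only when it is absent."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        # prompt_fn is injectable so the function can be exercised without a terminal
        key = prompt_fn(f"Enter your {var_name}: ").strip()
        if not key:
            raise ValueError(f"{var_name} must not be empty")
        os.environ[var_name] = key
    return key
```

Calling `ensure_api_key()` at the top of the notebook would then prompt only on the first run of a fresh environment.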

## 2. Load Documents

In [2]:
from langchain_community.document_loaders import DirectoryLoader, PyMuPDFLoader, TextLoader, CSVLoader

path = "knowledgebase/"

pdf_loader = DirectoryLoader(path, glob="*.pdf", loader_cls=PyMuPDFLoader)
txt_loader = DirectoryLoader(path, glob="*.txt", loader_cls=TextLoader)
csv_loader = DirectoryLoader(path, glob="*.csv", loader_cls=CSVLoader)

docs = (
    pdf_loader.load()
    + txt_loader.load()
    + csv_loader.load()
)
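
Before loading, it can help to confirm that each glob pattern actually matches files, since a loader that matches nothing simply contributes an empty list. The following is a minimal standard-library sketch; `count_by_pattern` is a hypothetical helper, and it assumes the patterns are matched against the top level of the directory only, which is how the patterns above appear to be used.

```python
from pathlib import Path


def count_by_pattern(
    directory: str,
    patterns: tuple[str, ...] = ("*.pdf", "*.txt", "*.csv"),
) -> dict[str, int]:
    """Count top-level files in `directory` that match each glob pattern."""
    root = Path(directory)
    if not root.is_dir():
        raise FileNotFoundError(f"{directory!r} is not a directory")
    # Path.glob with a plain "*.ext" pattern does not descend into subfolders
    return {
        pattern: sum(1 for f in root.glob(pattern) if f.is_file())
        for pattern in patterns
    }
```

Running `count_by_pattern("knowledgebase/")` before the loaders gives a quick check that the expected file types are present.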

## 3. Split Documents into Chunks

In [3]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
split_documents = text_splitter.split_documents(docs)
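
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window sketch. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraphs and newlines); `sliding_chunks` is a hypothetical function that only illustrates how consecutive chunks share their boundary text.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut `text` into windows of `chunk_size` characters that overlap by `chunk_overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
        start += step
    return chunks


print(sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some text twice.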


## 4. Generate Embeddings

In [4]:
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
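
The vector store needs to know the length of the vectors this model produces. A small lookup like the one below keeps that number next to the model name instead of hard-coding it later. The table and `vector_size_for` are hypothetical helpers, and the values assume each model's default output size (no reduced `dimensions` setting).

```python
# Default output dimensions for OpenAI embedding models
EMBEDDING_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
}


def vector_size_for(model: str) -> int:
    """Look up the vector length a collection must be created with for `model`."""
    try:
        return EMBEDDING_DIMENSIONS[model]
    except KeyError:
        raise ValueError(f"Unknown embedding model: {model!r}") from None


print(vector_size_for("text-embedding-3-large"))  # 3072
```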

## 5. Store Embeddings in Vector Database

In [5]:
from langchain_qdrant import QdrantVectorStore
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

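# In-memory Qdrant instance: nothing is persisted once the kernel stops
client = QdrantClient(":memory:")

# The vector size must match the embedding model's output dimension
# (3072 for text-embedding-3-large)
client.create_collection(
    collection_name="Internet_Brands_Knowledge_Base",
    vectors_config=VectorParams(size=3072, distance=Distance.COSINE),
)

vectorstore = QdrantVectorStore(
    client=client,
    collection_name="Internet_Brands_Knowledge_Base",
    embedding=embeddings,
)

# Embed each chunk and store it; returns the IDs of the stored points
vectorstore.add_documents(split_documents)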

['d550164c1f814ff7a3e756f74ae705e3',
 '0d7154b81f024404b310c1bf8541f437',
 'a3fa6559ef4f494494be22fab651c81c',
 'd5d675ce91f14a7097fda57b818c38b0',
 '48868150a9c349999361e2b84502a3ff',
 '664286061f2a4c68980e38b8f431972a',
 'c9d610a89f9b4833870fe76a17ac8b55',
 '640cf42226b9407e9eb1f9a02441d898']
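
The collection is configured with `Distance.COSINE`, so similarity depends on the angle between vectors rather than their length. The sketch below shows the underlying formula on toy 2-dimensional vectors. It is a plain-Python illustration only; Qdrant computes this internally, and `cosine_similarity` is a hypothetical helper.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return dot(a, b) / (|a| * |b|), a value in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # 1.0 (same direction)
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))  # 0.0 (orthogonal)
```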

## 6. Build Retriever

In [6]:
# Return up to 10 of the most similar chunks for each query
retriever = vectorstore.as_retriever(search_kwargs={"k": 10})
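
Conceptually, `k` caps how many of the best-scoring chunks are returned. The sketch below illustrates that selection over a hypothetical dictionary of precomputed similarity scores. It is not how Qdrant searches its index, and `top_k` is a name introduced here for illustration.

```python
import heapq


def top_k(scores: dict[str, float], k: int) -> list[tuple[str, float]]:
    """Return the k highest-scoring (chunk_id, score) pairs, best first."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


scores = {"chunk-a": 0.12, "chunk-b": 0.91, "chunk-c": 0.55}
print(top_k(scores, 2))   # [('chunk-b', 0.91), ('chunk-c', 0.55)]
print(len(top_k(scores, 10)))  # 3: asking for more than exist returns them all
```

In this sketch, a `k` larger than the number of stored chunks simply returns every chunk, so a small collection gains nothing from a large `k`.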

## 7. Define RAG Prompt

In [7]:
from langchain_core.prompts import ChatPromptTemplate

RAG_PROMPT = """\
You are a helpful assistant who answers questions using only the provided Internet Brands documents. You must not use outside knowledge. If the answer is not contained in the provided context, respond with ‘I don’t know.’ When possible, cite the specific document(s) used to generate the answer.

Context:
{context}

Question:
{question}

Answer:
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_PROMPT)
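
The `{context}` and `{question}` placeholders are filled in at invocation time. The plain-Python sketch below shows only that substitution step on a shortened copy of the template. `ChatPromptTemplate` does more than this (it produces chat messages rather than a bare string), and `fill_prompt` is a hypothetical helper.

```python
TEMPLATE = """\
Context:
{context}

Question:
{question}

Answer:
"""


def fill_prompt(context: str, question: str) -> str:
    """Substitute the two named placeholders into the template."""
    return TEMPLATE.format(context=context, question=question)


print(fill_prompt("Remote work requires manager approval.", "Who approves remote work?"))
```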

## 8. RAG Pipeline

In [10]:
from langchain_openai import ChatOpenAI
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

openai_chat_model = ChatOpenAI(model="gpt-4o")

# Rebuild the retriever with k=4; this replaces the retriever defined in step 6
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

rag_chain = (
    {
        "context": retriever,
        "question": RunnablePassthrough(),
    }
    | rag_prompt
    | openai_chat_model
    | StrOutputParser()
)

response = rag_chain.invoke(
    "What is Internet Brands' policy for remote work?"
)

print(response)


Internet Brands' policy for remote work is to support flexible work arrangements that permit employees to perform their duties remotely, when appropriate. However, not all roles are eligible for remote work, as eligibility is determined by job responsibilities, team requirements, and manager approval. 

Employees approved for remote work must ensure they are available during core business hours, maintain reliable internet connectivity, comply with company security and confidentiality policies, and attend required meetings (virtual or in-person). Company systems must be accessed using approved devices and secure networks, with employees responsible for protecting company data while working remotely. Additionally, remote work arrangements may be reviewed or modified at any time based on business needs. 

(Source: knowledgebase/remote_work_policy.pdf)
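
The chain above hands the retrieved documents to the prompt as-is. One optional refinement is to flatten them into a labelled text block first, which keeps the context compact and makes the source of each passage explicit for citation. The sketch below assumes documents expose `page_content` and a `metadata` dict with a `source` key. `Doc` is a hypothetical stand-in so the example runs without LangChain, and `format_docs` is a name chosen here for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in with the same two attributes used from a retrieved document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs: list[Doc]) -> str:
    """Join documents into one context string, labelling each with its source."""
    parts = []
    for doc in docs:
        source = doc.metadata.get("source", "unknown source")
        parts.append(f"[Source: {source}]\n{doc.page_content}")
    return "\n\n".join(parts)


docs = [
    Doc("Remote work requires manager approval.", {"source": "knowledgebase/remote_work_policy.pdf"}),
    Doc("Core hours apply to all approved employees."),
]
print(format_docs(docs))
```

If adopted, a helper like this would typically be piped after the retriever in the chain's `"context"` entry, so the prompt receives the formatted string instead of raw document objects.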
