## RAG example with LangChain, Qdrant, and Ollama

### Requirements

- An **Ollama** LLM server instance running on your NERC OpenShift environment, deployed by following [these instructions](../../../llm-servers/ollama/README.md). Ensure that the **GPU** section is uncommented to enable a GPU-based pod for hosting your standalone Ollama server instance.

- A **Qdrant** vector database, set up according to [these instructions](../../../vector-databases/qdrant/README.md).

- Environment variables must be configured for connecting to Qdrant:

  - `QDRANT_COLLECTION`

  - `QDRANT_API_KEY`

- Update the Ollama **BASE_URL** and **QDRANT_HOST** in this notebook to match your deployment settings.
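Because a missing variable only surfaces later as a failed Qdrant call, it can help to check the environment before running the notebook. The following is a minimal sketch, not part of the original notebook; the helper name `missing_env_vars` is hypothetical.

```python
import os

# The two variables the requirements list above asks for.
REQUIRED_VARS = ("QDRANT_COLLECTION", "QDRANT_API_KEY")


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in the given mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
```

An empty string is treated as missing here, since an empty collection name or API key would not work either.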

## The Architecture of the RAG Approach

![The Architecture of the RAG Approach](datasource/RAG-architecture.png)

### Needed packages and imports

In [1]:
## Install Python dependencies
!pip install --upgrade pip
!pip install --upgrade qdrant_client langchain_qdrant langchain_community
!pip install --upgrade langchain-ollama
!pip install pypdf



In [2]:
from uuid import uuid4
import os

from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.prompts import ChatPromptTemplate
from langchain.callbacks import StdOutCallbackHandler
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams
from qdrant_client.models import PointStruct
from langchain.chains import create_retrieval_chain
from langchain_ollama import OllamaLLM
from langchain_ollama import OllamaEmbeddings
from langchain_qdrant import QdrantVectorStore

#### Base parameters, inference server, and Qdrant info

In [3]:
# Replace these values to match your deployment: model name, Ollama and Qdrant base URLs, and Qdrant API key
MODEL = "phi3:14b"
# BASE_URL = "https://ollama-route-<your-namespace>.apps.shift.nerc.mghpcc.org"
BASE_URL = "http://ollama-service:11434"
# BASE_URL = "http://ollama-service.<your-namespace>.svc.cluster.local:11434"

# QDRANT_HOST = "https://qdrant-route-<your-namespace>.apps.shift.nerc.mghpcc.org/"
# QDRANT_PORT = 443

QDRANT_HOST = "http://qdrant-service"
# QDRANT_HOST = "http://qdrant-service.<your-namespace>.svc.cluster.local"
QDRANT_PORT = 6333

QDRANT_COLLECTION = os.getenv('QDRANT_COLLECTION')
QDRANT_API_KEY = os.getenv('QDRANT_API_KEY')

#### Initialize the connection

In [4]:
# Initialize the model and embedding
model = OllamaLLM(model=MODEL, base_url=BASE_URL)
embedding = OllamaEmbeddings(model=MODEL, base_url=BASE_URL)

### 1. Document Loading

In [5]:
loader = PyPDFLoader("datasource/The_Forgotten_Lighthouse_Book.pdf")
text_splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=50)
document = loader.load_and_split(text_splitter)

In [6]:
document[0]

Document(metadata={'producer': 'Skia/PDF m80', 'creator': 'Chromium', 'creationdate': '2024-10-01T00:19:42+00:00', 'moddate': '2024-10-01T00:19:42+00:00', 'source': 'datasource/The_Forgotten_Lighthouse_Book.pdf', 'total_pages': 19, 'page': 0, 'page_label': '1'}, page_content='The_Forgotten_Lighthouse.md 2024-10-01\n1 / 19\nThe Forgotten Lighthouse\nChapter 1: The Key to Another World\nSarah stood at the edge of the cliff, her windswept hair whipping around her face as she gazed out at the\nchurning sea. The old lighthouse loomed behind her, its paint peeling and windows clouded with age. It had\nbeen years since anyone had lived there, years since its beacon had guided ships safely to shore.')

In [7]:
len(document)

129
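To make the `chunk_size=512` and `chunk_overlap=50` settings concrete, here is a simplified sketch of fixed-size chunking with overlap. It is an illustration only: `RecursiveCharacterTextSplitter` additionally tries to break on separators such as paragraph and line boundaries, so its chunks will not match this sliding window exactly. The function name `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(text, chunk_size=512, chunk_overlap=50):
    """Cut text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "x" * 1000
print([len(chunk) for chunk in sliding_window_chunks(sample)])  # [512, 512, 76]
```

The overlap means the last 50 characters of one chunk reappear at the start of the next, which reduces the chance that a sentence relevant to a query is cut in half at a chunk boundary.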

#### Connect to Qdrant via the client with an API key

In [8]:
client = QdrantClient(QDRANT_HOST, port=QDRANT_PORT, api_key=QDRANT_API_KEY)



In [9]:
client.delete_collection(collection_name=QDRANT_COLLECTION)

True

In [10]:
client.create_collection(
    collection_name=QDRANT_COLLECTION,
    timeout=10,
    vectors_config={"content": VectorParams(size=5120, distance=Distance.COSINE)},
)

True
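The `size=5120` passed to `VectorParams` has to equal the length of the vectors the embedding model returns, otherwise later upserts are rejected. A small guard like the one below can catch a mismatch early. This is a sketch under that assumption; `check_vector_size` is a hypothetical helper, and the dummy vector stands in for a real embedding.

```python
def check_vector_size(vector, expected_size=5120):
    """Raise if the vector length differs from the collection's configured size."""
    actual = len(vector)
    if actual != expected_size:
        raise ValueError(
            f"embedding has {actual} dimensions, collection expects {expected_size}"
        )
    return actual


# Dummy vector used here so the sketch runs without an Ollama server.
print(check_vector_size([0.0] * 5120))
```

In the notebook, the dummy vector could be replaced by a probe embedding such as `embedding.embed_query("probe")`, run before the collection is created.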

### 2. Splitting/Chunking Document

In [11]:
def chunked_metadata(data, client=client, collection_name=QDRANT_COLLECTION):
    """Embed each chunk and upsert the resulting points into Qdrant."""
    points = []

    for item in data:
        point_id = str(uuid4())  # avoid shadowing the built-in id()
        content = item.page_content
        source = item.metadata["source"]
        page = item.metadata["page"]

        # Embed the chunk and store it under the named vector "content"
        content_vector = embedding.embed_documents([content])[0]
        vector_dict = {"content": content_vector}

        # QdrantVectorStore reads "page_content" and "metadata" from the payload
        payload = {
            "page_content": content,
            "metadata": {
                "id": point_id,
                "page_content": content,
                "source": source,
                "page": page,
            },
        }

        points.append(PointStruct(id=point_id, vector=vector_dict, payload=payload))

    # 3. Storage in Vector Database
    client.upsert(collection_name=collection_name, wait=True, points=points)

In [12]:
chunked_metadata(document[:10])
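The call above indexes only the first 10 chunks. When indexing all 129, it may be worth sending the chunks in batches so that no single embedding or upsert request grows too large. The sketch below shows only the batching step; the helper name `batches` and the batch size of 64 are assumptions, not values from the original notebook.

```python
def batches(items, batch_size=64):
    """Yield consecutive slices of items with at most batch_size elements each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# 129 placeholder items stand in for the 129 chunks loaded earlier.
placeholder_chunks = list(range(129))
print([len(batch) for batch in batches(placeholder_chunks)])  # [64, 64, 1]
```

In the notebook, each batch could then be passed to `chunked_metadata`, for example `for batch in batches(document): chunked_metadata(batch)`.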

In [13]:
client.get_collections()

CollectionsResponse(collections=[CollectionDescription(name='story')])

In [14]:
story_collection = client.get_collection(QDRANT_COLLECTION)

print(f"Points: {story_collection.points_count} ")

Points: 10 


In [15]:
client.search(
    collection_name=QDRANT_COLLECTION,
    query_vector=("content", embedding.embed_documents(["Story Details"])[0]),
    with_payload=["page_content", "source"],
    limit=5,
)


[ScoredPoint(id='2fb02e87-bf50-46a3-94d4-63a4e61e9c1e', version=0, score=0.8864837, payload={'page_content': "sprawled across the page:\nMy dearest Sarah,\nIf you're reading this, then I'm gone. I'm sorry I couldn't tell you this in person, but there are secrets about\nour family and the lighthouse that I've kept hidden for far too long. It's time you knew the truth.\nGo to the lighthouse. In the top room, behind the old lens, you'll find a hidden compartment. What's inside\nwill explain everything.\nI love you, my little starfish. Be brave.\nGrandpa"}, vector=None, shard_key=None, order_value=None),
 ScoredPoint(id='e762c7fa-8b2f-46e6-a23b-a7e2e7303770', version=0, score=0.87809247, payload={'page_content': 'The_Forgotten_Lighthouse.md 2024-10-01\n1 / 19\nThe Forgotten Lighthouse\nChapter 1: The Key to Another World\nSarah stood at the edge of the cliff, her windswept hair whipping around her face as she gazed out at the\nchurning sea. The old lighthouse loomed behind her, its paint p
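The `score` values in the output are cosine similarities between the query vector and each stored vector, with higher meaning more similar. The sketch below reproduces that ranking by brute force on toy two-dimensional vectors. It only illustrates the scoring: Qdrant itself uses an approximate index rather than comparing against every point, and the names `cosine_similarity` and `top_k` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, points, k=5):
    """Return the k (score, point_id) pairs with the highest similarity to the query."""
    scored = [(cosine_similarity(query, vector), point_id) for point_id, vector in points.items()]
    scored.sort(reverse=True)
    return scored[:k]


# Toy data: point ids mapped to 2-dimensional vectors.
toy_points = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
for score, point_id in top_k([1.0, 0.0], toy_points, k=2):
    print(point_id, round(score, 4))
```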

In [16]:
vectorstore = QdrantVectorStore(
    client=client, collection_name=QDRANT_COLLECTION, embedding=embedding, vector_name="content"
)

#### Initialize query chain

In [17]:
template = """

You are an assistant for question-answering tasks. \
Use the following pieces of retrieved context to answer the question. \
If you don't know the answer, just say that you don't know. \


Question: {input}
Context: {context}

Answer:

"""

retriever = vectorstore.as_retriever()

In [18]:
prompt = ChatPromptTemplate.from_template(template)
handler = StdOutCallbackHandler()

### 4. Retrieval

In [19]:
combine_docs_chain = create_stuff_documents_chain(model, prompt)
retrieval_chain = create_retrieval_chain(retriever, combine_docs_chain)
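Conceptually, the "stuff" chain joins the text of all retrieved chunks into one string and substitutes it for `{context}` in the prompt, while the user question fills `{input}`. The following plain-Python sketch mimics that step under simplifying assumptions: it uses a shortened template, a blank-line separator between chunks, and a hypothetical helper `stuff_prompt`. It is not LangChain's implementation.

```python
# Shortened version of the notebook's template, for illustration only.
TEMPLATE = "Question: {input}\nContext: {context}\n\nAnswer:"


def stuff_prompt(question, chunks, template=TEMPLATE, separator="\n\n"):
    """Join all chunk texts and place them, with the question, into the template."""
    context = separator.join(chunks)
    return template.format(input=question, context=context)


print(
    stuff_prompt(
        "Who is the starfish?",
        ["I love you, my little starfish.", "Sarah stood at the edge of the cliff."],
    )
)
```

Because every retrieved chunk is placed in the prompt, the number of chunks the retriever returns and the chunk size together determine how much of the model's context window the prompt consumes.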

#### Query example

In [20]:
result = retrieval_chain.invoke(
    {"input": "Who is the starfish and how do you know it?"}
)

### 5. Output

In [21]:
result

{'input': 'Who is the starfish and how do you know it?',
 'context': [Document(metadata={'id': 'e762c7fa-8b2f-46e6-a23b-a7e2e7303770', 'page_content': 'The_Forgotten_Lighthouse.md 2024-10-01\n1 / 19\nThe Forgotten Lighthouse\nChapter 1: The Key to Another World\nSarah stood at the edge of the cliff, her windswept hair whipping around her face as she gazed out at the\nchurning sea. The old lighthouse loomed behind her, its paint peeling and windows clouded with age. It had\nbeen years since anyone had lived there, years since its beacon had guided ships safely to shore.', 'source': 'datasource/The_Forgotten_Lighthouse_Book.pdf', 'page': 0, '_id': 'e762c7fa-8b2f-46e6-a23b-a7e2e7303770', '_collection_name': 'story'}, page_content='The_Forgotten_Lighthouse.md 2024-10-01\n1 / 19\nThe Forgotten Lighthouse\nChapter 1: The Key to Another World\nSarah stood at the edge of the cliff, her windswept hair whipping around her face as she gazed out at the\nchurning sea. The old lighthouse loomed behi

#### Retrieve Answer

In [22]:
result["answer"]

'The starfish in the story is a metaphorical reference to Sarah herself by her Grandpa. He calls her his "little starfish". This could be because of some unique characteristic or quality he sees in her that reminds him of a starfish, such as resilience or adaptability. However, there\'s no clear explanation given in the provided context about why exactly Sarah is likened to a starfish.'
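Besides `answer`, the result also carries the retrieved chunks under `context`, each with the `source` and `page` metadata stored during indexing. A short helper can list where the answer came from. The sketch below uses stand-in objects with a `metadata` attribute so that it runs without the chain; the helper name `cited_sources` and the sample values are hypothetical.

```python
from types import SimpleNamespace


def cited_sources(result):
    """Return the unique (source, page) pairs of the retrieved chunks, in retrieval order."""
    seen = []
    for doc in result["context"]:
        key = (doc.metadata.get("source"), doc.metadata.get("page"))
        if key not in seen:
            seen.append(key)
    return seen


# Stand-in for the chain output: objects that expose a metadata dict like Document does.
fake_result = {
    "answer": "placeholder answer",
    "context": [
        SimpleNamespace(metadata={"source": "book.pdf", "page": 0}),
        SimpleNamespace(metadata={"source": "book.pdf", "page": 1}),
        SimpleNamespace(metadata={"source": "book.pdf", "page": 0}),
    ],
}
print(cited_sources(fake_result))
```

In the notebook, the same helper could be called as `cited_sources(result)` on the output of `retrieval_chain.invoke`.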