# Smart RAG with Jina and Qdrant

This notebook will show you how to make a basic RAG engine using the [LlamaIndex framework](https://www.llamaindex.ai/), the open-source [Mixtral-8x7B-Instruct](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) LLM, [Jina Embeddings v2](https://jina.ai/embeddings), and [Qdrant’s AI-ready vector database](https://qdrant.tech/).

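The first step in building this RAG system is to install the prerequisites. The imports in this notebook use the pre-0.10 `llama_index` package layout (for example `from llama_index import ServiceContext`), so if they fail on a current release, pin an older version of `llama-index`: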

In [None]:
!pip install llama-index qdrant-client pdfminer.six

You will need access to the Hugging Face Inference API, including an access token. If you have a Hugging Face account, you can get one from [your account settings page](https://huggingface.co/settings/tokens).

If you do not have an account, first set one up [on the Hugging Face website](https://huggingface.co/join), then go to [your account settings](https://huggingface.co/settings/tokens) page to create an access token.

In [None]:
hf_inference_api_key: str = "<your HuggingFace Inference API token>"

Next, we construct a prompt template:

In [None]:
from llama_index import PromptTemplate

qa_prompt_tmpl = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the query. Please be brief, concise, and complete.\n"
    "If the context information does not contain an answer to the query, "
    "respond with \"No information\".\n"
    "Query: {query_str}\n"
    "Answer: "
)
qa_prompt = PromptTemplate(qa_prompt_tmpl)
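
To see what the LLM will actually receive, the template can be filled in by hand. The sketch below uses plain `str.format` on the same template text rather than LlamaIndex itself; the context passage and query are made-up placeholders, not text from the document.

```python
# Same template text as above, repeated so this cell runs on its own.
template = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the query. Please be brief, concise, and complete.\n"
    "If the context information does not contain an answer to the query, "
    "respond with \"No information\".\n"
    "Query: {query_str}\n"
    "Answer: "
)

# Placeholder values, invented for illustration only.
sample_context = (
    "Paragraph A retrieved from the vector store.\n"
    "Paragraph B retrieved from the vector store."
)
sample_query = "What does paragraph A say?"

# Substitute the two named slots to get the final prompt string.
filled_prompt = template.format(
    context_str=sample_context, query_str=sample_query
)
print(filled_prompt)
```

In the real pipeline, LlamaIndex fills `context_str` with the retrieved paragraphs and `query_str` with the user's question.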

Then we create and initialize an object for the LlamaIndex framework that holds the connection to the LLM, Mixtral-8x7B-Instruct (the `model_name` set in the class below):

In [None]:
import requests
from llama_index.llms import (
    CustomLLM,
    CompletionResponse,
    CompletionResponseGen,
    LLMMetadata,
)
from llama_index.llms.base import llm_completion_callback
from typing import Any


class MixtralLLM(CustomLLM):
    context_window: int = 4096
    num_output: int = 1024
    model_name: str = "mistralai/Mixtral-8x7B-Instruct-v0.1"
    api_key: str = hf_inference_api_key

    @property
    def metadata(self) -> LLMMetadata:
        """Get LLM metadata."""
        return LLMMetadata(
            context_window=self.context_window,
            num_output=self.num_output,
            model_name=self.model_name,
        )

    def do_hf_call(self, prompt: str) -> str:
        data = {
            "inputs": prompt
        }

        response = requests.post(
            'https://api-inference.huggingface.co/models/' + self.model_name,
            headers={
                'authorization': f'Bearer {self.api_key}',
                'content-type': 'application/json',
            },
            json=data,
            stream=True
        )
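        # Parse the body once; a non-JSON error page would otherwise raise here.
        try:
            payload = response.json()
        except ValueError:
            payload = None
        if response.status_code != 200 or not payload or 'error' in payload:
            print(f"Error: {response}")
            return "Unable to answer for technical reasons."
        full_txt = payload[0]['generated_text']
        # The API echoes the prompt: skip to the context block, then past "Answer:".
        start = max(full_txt.find("---------------------"), 0)
        ss = full_txt[start:]
        marker = ss.find("Answer:")
        return ss[marker + len("Answer:"):].strip() if marker != -1 else ss.strip()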

    @llm_completion_callback()
    def complete(self, prompt: str, **kwargs: Any) -> CompletionResponse:
        response = self.do_hf_call(prompt)
        return CompletionResponse(text=response)

    @llm_completion_callback()
    def stream_complete(
            self, prompt: str, **kwargs: Any
    ) -> CompletionResponseGen:
        response = ""
        for token in self.do_hf_call(prompt):
            response += token
            yield CompletionResponse(text=response, delta=token)


mixtral_llm = MixtralLLM()
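
The post-processing step in `do_hf_call` is easier to follow in isolation: the generated text is assumed to contain the prompt followed by the model's reply, so the code skips to the separator line and keeps what follows `Answer:`. Here is a standalone sketch of that step, with an added guard for text that lacks the markers. `extract_answer` is a hypothetical helper name and the echoed text is invented.

```python
SEPARATOR = "---------------------"
MARKER = "Answer:"


def extract_answer(full_txt: str) -> str:
    """Return the text after the first "Answer:" that follows the separator."""
    # Fall back to the start of the string if the separator is missing.
    start = max(full_txt.find(SEPARATOR), 0)
    tail = full_txt[start:]
    marker_pos = tail.find(MARKER)
    if marker_pos == -1:
        # No marker: return the text unchanged apart from whitespace.
        return tail.strip()
    return tail[marker_pos + len(MARKER):].strip()


# Invented example of a response that echoes the prompt before the reply.
echoed = (
    "Context information is below.\n"
    "---------------------\n"
    "Some retrieved paragraph.\n"
    "---------------------\n"
    "Query: What is this?\n"
    "Answer: A made-up reply."
)
print(extract_answer(echoed))
```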


# Information Retrieval with Qdrant and Jina Embeddings

To set up the retrieval system, you will need a Jina Embeddings API key. You can get one with one million free tokens at the [Jina Embeddings website](https://jina.ai/embeddings/).



In [None]:
jina_emb_api_key = "<your Jina Embeddings API key>"

Then, create a connector object using LlamaIndex for the Jina Embeddings server, selecting specifically the English monolingual model:

In [None]:
from llama_index.embeddings.jinaai import JinaEmbedding

jina_embedding_model = JinaEmbedding(
    api_key=jina_emb_api_key,
    model="jina-embeddings-v2-base-en",
)

## Load text data

Next, we will load the document and split it up into paragraphs. First, download the PDF from the White House website into the variable `pdf_data`:

In [None]:
import urllib.request

uri = "https://www.whitehouse.gov/wp-content/uploads/2023/05/National-Artificial-Intelligence-Research-and-Development-Strategic-Plan-2023-Update.pdf"
pdf_data = urllib.request.urlopen(uri).read()

Then extract the text page by page with `pdfminer` and split it into paragraphs at blank lines:



In [None]:
import re
from io import BytesIO, StringIO
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.pdfpage import PDFPage
from pdfminer.pdfparser import PDFParser

text_paras = []
parser = PDFParser(BytesIO(pdf_data))
doc = PDFDocument(parser)
rsrcmgr = PDFResourceManager()
for page in PDFPage.create_pages(doc):
    output_string = StringIO()
    device = TextConverter(rsrcmgr, output_string, laparams=LAParams())
    interpreter = PDFPageInterpreter(rsrcmgr, device)
    interpreter.process_page(page)
    page_text = output_string.getvalue()
    text_paras.extend(re.split(r'\n\s*\n', page_text))
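
The paragraph split relies on the pattern `\n\s*\n`, which matches a line break, optional whitespace, and another line break, so it treats blank or whitespace-only lines as paragraph boundaries. A small sketch on an invented page string (not text from the PDF) shows the effect:

```python
import re

# Invented sample: a title, a wrapped paragraph, a whitespace-only line,
# and a final paragraph with a trailing newline.
sample_page = (
    "TITLE LINE\n"
    "\n"
    "First paragraph,\n"
    "wrapped over two lines.\n"
    " \n"
    "Second paragraph.\n"
)

# Split wherever two newlines are separated only by whitespace.
paragraphs = re.split(r"\n\s*\n", sample_page)
for i, para in enumerate(paragraphs):
    print(i, repr(para))
```

Single line breaks inside a paragraph are kept, so wrapped lines stay together in one chunk.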

Check that everything is loaded:


In [None]:
assert len(text_paras) == 615

Next, we will convert this list of short texts into LlamaIndex Document objects:


In [None]:
from llama_index.readers import StringIterableReader

rag_docs = StringIterableReader().load_data(text_paras)

And you can inspect the text:


In [None]:
print(rag_docs[0].text)


NATIONAL ARTIFICIAL INTELLIGENCE 
RESEARCH AND DEVELOPMENT 
STRATEGIC PLAN 
2023 UPDATE



## Set up a Qdrant Vector Database

You will need to create an account on the [Qdrant Cloud website](https://cloud.qdrant.io/login) before continuing.

Once you have an account and are logged in, you will need to create a cluster.

Follow the [“quick start” instructions on the Qdrant website](https://qdrant.tech/documentation/cloud/quickstart-cloud/) to set up a free cluster and get an API key and the host name of your Qdrant server.


Store the key and hostname in variables:

In [None]:
qdrant_api_key = "<your API key>"
qdrant_server = "https://<your server>"

Next, we will need to import the relevant components from the `qdrant_client` and `llama_index` packages:



In [None]:
import qdrant_client
from llama_index.vector_stores.qdrant import QdrantVectorStore

client = qdrant_client.QdrantClient(qdrant_server, api_key=qdrant_api_key)
vector_store = QdrantVectorStore(client=client, collection_name="NTSC")

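This points the vector store at a collection named `NTSC` in your free cluster. The collection itself is filled when the documents are indexed in the next step.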

## Complete the full RAG system

Now we will assemble these components into a complete RAG system using boilerplate code for LlamaIndex. This may take several minutes to run.

In [None]:
from llama_index.query_engine import RetrieverQueryEngine
from llama_index.retrievers import VectorIndexRetriever
from llama_index.storage.storage_context import StorageContext
from llama_index import (
    VectorStoreIndex,
    ServiceContext,
    get_response_synthesizer,
)

# set up the service and storage contexts
service_context = ServiceContext.from_defaults(
    llm=mixtral_llm, embed_model=jina_embedding_model
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# create an index
index = VectorStoreIndex.from_documents(
    rag_docs, storage_context=storage_context, service_context=service_context
)

# configure retriever
retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=2,
)

# configure response synthesizer
response_synthesizer = get_response_synthesizer(
    service_context=service_context,
    text_qa_template=qa_prompt,
    response_mode="compact",
)

# assemble query engine
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
)
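
The `similarity_top_k=2` setting means that only the two stored paragraphs closest to the query embedding are passed to the LLM as context. The following is a pure-Python sketch of that idea, assuming cosine similarity (the metric actually used is part of the Qdrant collection's configuration). The three-dimensional vectors are toy values, and `cosine_similarity` and `top_k` are hypothetical helper names, not LlamaIndex or Qdrant functions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return the indices of the k vectors most similar to the query."""
    scored = [
        (cosine_similarity(query_vec, vec), idx)
        for idx, vec in enumerate(doc_vecs)
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [idx for _, idx in scored[:k]]


# Toy embeddings standing in for four stored paragraphs.
toy_docs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
toy_query = [0.2, 0.9, 0.1]

print(top_k(toy_query, toy_docs, k=2))
```

A vector database such as Qdrant performs this search with an index rather than a full scan, but the ranking principle is the same.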

Now the system is ready to use.

# Querying a Document

Let’s try a straightforward query:

In [None]:
response = query_engine.query("""
What is the Biden Administration's policy with regard to AI? 
""")
print(response.response)

Or something more specific:

In [None]:
response = query_engine.query("""
What protections does the AI Bill of Rights propose to offer?
""")
print(response.response)

Or even very specific:

In [None]:
response = query_engine.query("Who is Kei Koizumi?")
print(response.response)

You can also ask more fanciful questions:

In [None]:
response = query_engine.query("""
What rights will AIs receive under President Biden's proposed 
AI Bill of Rights?
""")
print(response.response)

In [None]:
response = query_engine.query("""
Why is President Biden proposing an AI Bill of Rights?
Does AI really need rights?
""")
print(response.response)

In [None]:
response = query_engine.query("""
Has Donald Trump weighed in on AI?
Will he Make Humans Great Again?
""")
print(response.response)
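
Questions like these are a useful check on the prompt template, which tells the model to reply "No information" when the retrieved context does not contain an answer. If you want to handle that case in code, one option is to test the reply for the fallback phrase. This is a minimal sketch: `is_fallback` is a hypothetical helper, and it assumes the model returns the phrase more or less verbatim, which is not guaranteed.

```python
NO_ANSWER = "No information"


def is_fallback(answer: str) -> bool:
    """True if the reply is just the fallback phrase.

    Ignores case, surrounding whitespace, quotes, and final punctuation.
    """
    cleaned = answer.strip(" \t\n\"'.!").lower()
    return cleaned == NO_ANSWER.lower()


# Invented sample replies for illustration.
for reply in [
    "No information",
    ' "No information." ',
    "The plan lists several strategies.",
]:
    print(repr(reply), is_fallback(reply))
```

With the query engine above, the check could be applied to `response.response` before the answer is shown to a user.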