# RAG using Meta AI Llama-3


<img src="./resources/rag_architecture.png" width=800px>
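
The retrieval step of a RAG pipeline can be illustrated without any model at all. The following is a toy sketch, not what LlamaIndex does internally: `embed`, `cosine`, `retrieve`, and the sample `chunks` are hypothetical names and data, and word counts stand in for a neural embedding model.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for a neural model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


# Hypothetical document chunks, for illustration only
chunks = [
    "DSPy is a framework for composing language model modules.",
    "Ollama runs large language models locally.",
    "Bananas are rich in potassium.",
]
top = retrieve("What is DSPy?", chunks, top_k=1)
print(top)
```

In the notebook below, the embedding model and the vector index take the place of `embed` and `retrieve`, and the retrieved chunks are passed to Llama-3 as context.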

In [1]:
%pip -q install llama-index
%pip -q install llama-index-llms-huggingface
%pip -q install llama-index-llms-ollama 
%pip -q install llama-index-embeddings-huggingface
%pip -q install llama-index-embeddings-instructor


Note: you may need to restart the kernel to use updated packages.


In [1]:
import nest_asyncio
from IPython.display import Markdown, display

from llama_index.core import (
    PromptTemplate,
    Settings,
    SimpleDirectoryReader,
    VectorStoreIndex,
)
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama



In [2]:
# allows nested access to the event loop
nest_asyncio.apply()
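
A Jupyter kernel already runs an asyncio event loop, and unpatched asyncio refuses to start another one from inside it. The stdlib-only sketch below reproduces that failure mode; the coroutine names `inner` and `outer` are illustrative, with `outer` standing in for code that runs inside the notebook's loop.

```python
import asyncio


async def inner() -> str:
    return "done"


async def outer() -> str:
    # We are inside a running event loop here, as code in a Jupyter cell is.
    coro = inner()
    try:
        # Unpatched asyncio rejects starting a loop from within a running loop.
        asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return f"RuntimeError: {exc}"
    return "no error"


result = asyncio.run(outer())
print(result)
```

`nest_asyncio.apply()` patches asyncio so that nested calls like this are allowed, which library code that drives the event loop itself may need when run in a notebook.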

In [3]:
# Directory containing your PDF documents (you can drag and drop files into it)
input_dir_path = '/Users/kiwitech/Desktop/RAG-Llama3/test-dir'

In [4]:
# Set up the LLM (served through Ollama) and the embedding model
llm = Ollama(model="llama3", request_timeout=120.0)
# embed_model = HuggingFaceEmbedding(model_name="Snowflake/snowflake-arctic-embed-m", trust_remote_code=True)
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-large-en-v1.5", trust_remote_code=True)



In [5]:
# Load every PDF under the input directory, including subdirectories
loader = SimpleDirectoryReader(
    input_dir=input_dir_path,
    required_exts=[".pdf"],
    recursive=True,
)
docs = loader.load_data()

# Build a vector index over the loaded documents using the embedding model
Settings.embed_model = embed_model
index = VectorStoreIndex.from_documents(docs, show_progress=True)

# Create the query engine with default settings (no reranker is configured)
Settings.llm = llm
query_engine = index.as_query_engine()

# ====== Customise prompt template ======
qa_prompt_tmpl_str = (
"Context information is below.\n"
"---------------------\n"
"{context_str}\n"
"---------------------\n"
"Given the context information above, I want you to think step by step to answer the query in a crisp manner. In case you don't know the answer, say 'I don't know!'.\n"
"Query: {query_str}\n"
"Answer: "
)
qa_prompt_tmpl = PromptTemplate(qa_prompt_tmpl_str)

query_engine.update_prompts(
    {"response_synthesizer:text_qa_template": qa_prompt_tmpl}
)

# Generate the response
response = query_engine.query("What exactly is DSPy?")

Parsing nodes: 100%|██████████| 17/17 [00:00<00:00, 648.66it/s]
Generating embeddings: 100%|██████████| 26/26 [00:04<00:00,  6.18it/s]
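
To make the role of the `{context_str}` and `{query_str}` placeholders concrete, here is a simplified sketch that fills a shortened version of the template with plain `str.format`. It is a stand-in for illustration, not LlamaIndex's actual synthesis logic, which may also split or compact the context to fit the model's window. The `retrieved_chunks` list is hypothetical; in the notebook the chunks come from the vector index.

```python
# Shortened version of the QA template, keeping the same two placeholders
qa_template = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Query: {query_str}\n"
    "Answer: "
)

# Hypothetical retrieved chunks, for illustration only
retrieved_chunks = [
    "DSPy is a framework for programming language models.",
    "It replaces hand-written prompts with composable modules.",
]

# Join the chunks into one context block and substitute both placeholders
prompt = qa_template.format(
    context_str="\n\n".join(retrieved_chunks),
    query_str="What exactly is DSPy?",
)
print(prompt)
```

The resulting string is what the LLM receives, which is why the answer is grounded in the retrieved text rather than in the model's general knowledge alone.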


In [6]:
display(Markdown(str(response)))

Based on the provided context, DSPy is described as a framework for programmatically solving advanced tasks with language and retrieval models through composing and declaring modules. It's designed to replace brittle "prompt engineering" tricks with composable modules and automatic (typically discrete) optimizers. In essence, DSPy allows developers to define signatures that specify what an LM needs to do declaratively, rather than relying on free-form string prompts.