# RAG using Meta AI Llama-3

To set up, pull and run the model with Ollama:
ollama run hf.co/bartowski/Llama-3.2-3B-Instruct-GGUF:latest

Then add the files to use for RAG to the ./documents directory.
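
Before building the index, it can help to confirm that the folder actually contains PDFs, since the loader in this notebook only reads `.pdf` files. A minimal sketch using only the standard library; the helper name `list_pdfs` is hypothetical and not part of LlamaIndex:

```python
from pathlib import Path


def list_pdfs(input_dir):
    """Return all PDF files under input_dir (searched recursively), sorted by path."""
    root = Path(input_dir)
    if not root.is_dir():
        return []
    # Compare the extension case-insensitively so "CV.PDF" is also listed.
    return sorted(p for p in root.rglob("*") if p.is_file() and p.suffix.lower() == ".pdf")


pdf_files = list_pdfs("./documents")
print(f"Found {len(pdf_files)} PDF file(s)")
for path in pdf_files:
    print(" -", path)
```

If this prints zero files, the index built later would be empty, so it is worth checking the path first.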

In [1]:
import nest_asyncio
from IPython.display import Markdown, display

from llama_index.core import (
    PromptTemplate,
    Settings,
    SimpleDirectoryReader,
    VectorStoreIndex,
)
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama

In [2]:
# allows nested access to the event loop
nest_asyncio.apply()
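
Jupyter already runs an asyncio event loop, and the standard library refuses to start a second one inside it; `nest_asyncio.apply()` patches asyncio so that such nested calls are allowed. The sketch below reproduces the underlying error with plain asyncio. It is meant to be run as a standalone script (no notebook, no `nest_asyncio`), and the function names are illustrative only:

```python
import asyncio


async def inner():
    return "done"


async def outer():
    # We are now inside a running event loop, much like code in a notebook cell.
    coro = inner()
    try:
        # Without nest_asyncio, starting a second loop from here is rejected.
        return asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return f"RuntimeError: {exc}"


message = asyncio.run(outer())
print(message)
```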

In [3]:
# add your documents in this directory, you can drag & drop
input_dir_path = './documents'

In [4]:

# set up the LLM and the embedding model
# the model name must match the one pulled with `ollama run` (check with `ollama list`)
llm = Ollama(model="hf.co/bartowski/Llama-3.2-3B-Instruct-GGUF:latest", request_timeout=120.0)
# embed_model = HuggingFaceEmbedding(model_name="Snowflake/snowflake-arctic-embed-m", trust_remote_code=True)
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-large-en-v1.5", trust_remote_code=True)
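
The embedding model turns each text chunk and each query into a vector, and retrieval then ranks chunks by how similar their vectors are to the query vector. As a rough illustration of that idea only, here is a toy ranking by cosine similarity. The three-dimensional vectors and chunk labels are made up, and this is not LlamaIndex's internal code:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=2):
    """Return the names of the k chunks most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in chunk_vecs.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Invented vectors standing in for real embeddings.
toy_chunks = {
    "education": [0.9, 0.1, 0.0],
    "work experience": [0.1, 0.9, 0.1],
    "activities": [0.0, 0.2, 0.9],
}
toy_query = [0.8, 0.3, 0.0]
print(top_k(toy_query, toy_chunks))
```

Real embeddings have hundreds or thousands of dimensions, but the ranking step works the same way in principle.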

In [5]:
# load data
loader = SimpleDirectoryReader(
    input_dir=input_dir_path,
    required_exts=[".pdf"],
    recursive=True,
)
docs = loader.load_data()

# Creating an index over loaded data
Settings.embed_model = embed_model
index = VectorStoreIndex.from_documents(docs, show_progress=True)

# Create the query engine with default retrieval settings (no reranker is configured)
Settings.llm = llm
query_engine = index.as_query_engine()

# ====== Customise prompt template ======
qa_prompt_tmpl_str = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information above, I want you to think step by step "
    "to answer the query in a crisp manner. In case you don't know the answer, "
    "say 'I don't know!'.\n"
    "Query: {query_str}\n"
    "Answer: "
)
qa_prompt_tmpl = PromptTemplate(qa_prompt_tmpl_str)

query_engine.update_prompts(
    {"response_synthesizer:text_qa_template": qa_prompt_tmpl}
)

# The response is generated in the next cell


In [6]:
response = query_engine.query("Where is Yash interning in summer 2025? What school does he go to? What are his on-campus involvements?")
display(Markdown(str(response)))

To answer your question step by step:

1. The context information provides Yash's work experience, but not his internship plans for Summer 2025.
2. To find this information, I'd suggest looking at the most recent section of his bio, which mentions his current work experience and education timeline.
3. However, since the context information only goes up to May 2027, I don't have any information on Yash's summer internship plans for 2025.

Regarding his school:

1. According to the context information, Yash attends the University of Illinois Urbana-Champaign.

As for his on-campus involvements:

1. Unfortunately, the context information does not provide a comprehensive list of Yash's on-campus involvements.
2. However, it mentions that he is currently a New Member Coordinator for Alpha Kappa Psi - Professional Business Fraternity and has been involved with Quant Illinois (Quantitative Developer) since February 2024.

Please note that I don't have information on any specific summer internship plans for Yash in 2025.
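
The "step by step" shape of this answer reflects the custom QA template: at query time `{context_str}` is filled with the retrieved text and `{query_str}` with the question. A minimal sketch of that substitution using plain `str.format`, with a shortened template and a made-up context string (in the notebook itself, LlamaIndex's `PromptTemplate` handles the filling):

```python
# Shortened copy of the notebook's template; the context below is invented for illustration.
template = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Query: {query_str}\n"
    "Answer: "
)

example_context = "Example chunk of text retrieved from a PDF."
example_query = "What school does he go to?"

filled_prompt = template.format(context_str=example_context, query_str=example_query)
print(filled_prompt)
```

Printing the filled prompt makes it clear that the model can only answer from whatever text ends up in the context slot.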

In [7]:
# check GPU usage

!nvidia-smi

zsh:1: command not found: nvidia-smi


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
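
Two small adjustments follow from this output. `nvidia-smi` only exists on machines with NVIDIA drivers installed, so the call can be guarded, and the tokenizers warning itself suggests setting `TOKENIZERS_PARALLELISM` explicitly. A minimal sketch, assuming the environment variable is set before the embedding model is first used (for example in the first cell):

```python
import os
import shutil
import subprocess

# Set explicitly, as the warning recommends; "false" disables tokenizer parallelism.
os.environ["TOKENIZERS_PARALLELISM"] = "false"

# Only call nvidia-smi if it is actually on the PATH.
nvidia_smi = shutil.which("nvidia-smi")
if nvidia_smi is None:
    gpu_report = "nvidia-smi not found; no NVIDIA GPU tooling on this machine"
else:
    result = subprocess.run([nvidia_smi], capture_output=True, text=True)
    gpu_report = result.stdout or result.stderr
print(gpu_report)
```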
