## Building a RAG System Using LlamaIndex and Hugging Face

```python
# !pip install llama-index
# !pip install llama-index-embeddings-huggingface
# !pip install llama-index-llms-huggingface

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
```

```python
# Recursively collect every PDF file under the current directory
loader = SimpleDirectoryReader(
    input_dir=".",
    recursive=True,
    required_exts=[".pdf"],
)

# Load the PDFs as LlamaIndex Document objects
documents = loader.load_data()
documents
```
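With `recursive=True` and `required_exts=[".pdf"]`, the reader walks every subdirectory of `input_dir` and keeps only files with a matching extension. The stdlib sketch below mimics just that discovery step so the filter is easy to reason about. `find_files` is a hypothetical helper, not part of LlamaIndex; it assumes an exact, case-sensitive suffix match and does not parse the PDFs.

```python
from pathlib import Path


def find_files(input_dir: str, required_exts: list[str], recursive: bool = True) -> list[Path]:
    """Hypothetical helper (not LlamaIndex): files under input_dir whose suffix is in required_exts."""
    root = Path(input_dir)
    # rglob("*") descends into subdirectories; glob("*") stays at the top level
    candidates = root.rglob("*") if recursive else root.glob("*")
    # Sorted so the result order is stable across runs and platforms
    return sorted(p for p in candidates if p.is_file() and p.suffix in required_exts)


if __name__ == "__main__":
    for path in find_files(".", [".pdf"]):
        print(path)
```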

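```python
# Local embedding model from Hugging Face (downloaded on first use)
embedding_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# Device the model is running on (`_model` is a private attribute)
print(embedding_model._model.device)
```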

cpu
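The embedding model maps each piece of text to a fixed-length vector (384 dimensions for `bge-small-en-v1.5`), and retrieval ranks chunks by how close their vectors are to the query's vector, using cosine similarity by default. The pure-Python version below shows that arithmetic on toy 3-dimensional vectors, which are invented for illustration; with the real model, vectors would come from calls such as `embedding_model.get_text_embedding(...)`.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for zero vectors, which have no direction
    return dot / (norm_a * norm_b)


# Toy vectors (invented): the query points in nearly the same direction as chunk_a
query = [0.9, 0.1, 0.0]
chunk_a = [0.8, 0.2, 0.1]
chunk_b = [0.0, 0.3, 0.9]
print(cosine_similarity(query, chunk_a) > cosine_similarity(query, chunk_b))  # True
```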


```python
# Split the documents into chunks (nodes), embed each chunk, and store them in an in-memory vector index
index = VectorStoreIndex.from_documents(
    documents,
    embed_model=embedding_model,
)

# Save the index (docstore, vector store, and index metadata) to ./huggingfaceembeddings
index.storage_context.persist(persist_dir="./huggingfaceembeddings")
```
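`from_documents` does not embed whole documents: it first splits each one into smaller chunks (nodes) and embeds every chunk separately, so retrieval can return just the relevant passages. LlamaIndex's default splitter is sentence-aware and measures size in tokens; the sketch below is a deliberately simplified word-window splitter with a hypothetical `chunk_words` helper, meant only to show how a chunk size and an overlap interact.

```python
def chunk_words(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Hypothetical helper: windows of chunk_size words; consecutive windows share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


print(chunk_words("one two three four five six seven", chunk_size=3, overlap=1))
# ['one two three', 'three four five', 'five six seven']
```

The overlap repeats a few words at each boundary, so a sentence cut between two chunks still appears in full context in at least one of them.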


```python
# View the stored chunks and the source document each one came from
for node in index.docstore.docs.values():
    print("Source document ID:", node.ref_doc_id)
    print("Text chunk:", node.text)
    print("=" * 50)
```
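Each source document can be split into several chunks, so a per-document count is a quick sanity check on the splitting. The sketch below counts chunks per `ref_doc_id`; the `Chunk` dataclass is a hypothetical stand-in for the stored nodes. Because the function only reads `ref_doc_id`, the same attribute used in the loop above, it should also accept `index.docstore.docs.values()` directly.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical stand-in for a stored node: only the fields this sketch needs."""
    ref_doc_id: str
    text: str


def chunks_per_document(chunks) -> Counter:
    """Count how many chunks came from each source document."""
    return Counter(chunk.ref_doc_id for chunk in chunks)


sample = [Chunk("doc-1", "..."), Chunk("doc-1", "..."), Chunk("doc-2", "...")]
for doc_id, n in chunks_per_document(sample).most_common():
    print(f"{doc_id}: {n} chunk(s)")
```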


```python
from llama_index.core import StorageContext, load_index_from_storage
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Reload the persisted index; queries must be embedded with the same model used to build it
embedding_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
storage_context = StorageContext.from_defaults(persist_dir="./huggingfaceembeddings")
index = load_index_from_storage(storage_context, embed_model=embedding_model)
```
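Reloading restores the stored text and vectors, but each new question is still embedded at query time, which is why the same embedding model is passed in with `embed_model=embedding_model`. Vectors from a different model generally live in a different space (and may have a different length), so comparing them gives meaningless scores. The toy JSON round trip below records which model produced the vectors and refuses to mix models; `save_store`, `load_store`, and the file layout are hypothetical and are not LlamaIndex's storage format.

```python
import json
from pathlib import Path


def save_store(path: Path, model_name: str, vectors: dict[str, list[float]]) -> None:
    """Hypothetical helper: persist vectors together with the name of the model that produced them."""
    path.write_text(json.dumps({"model_name": model_name, "vectors": vectors}))


def load_store(path: Path, expected_model: str) -> dict[str, list[float]]:
    """Hypothetical helper: reload vectors, refusing to mix them with a different embedding model."""
    data = json.loads(path.read_text())
    if data["model_name"] != expected_model:
        raise ValueError(
            f"store was built with {data['model_name']!r}, not {expected_model!r}"
        )
    return data["vectors"]


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp) / "toy_store.json"
        save_store(store, "BAAI/bge-small-en-v1.5", {"chunk-1": [0.1, 0.2, 0.3]})
        print(load_store(store, "BAAI/bge-small-en-v1.5"))
```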

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from llama_index.llms.huggingface import HuggingFaceLLM
import torch

# Prefer a CUDA GPU, then Apple Silicon (MPS), then fall back to the CPU
device = (
    torch.device("cuda") if torch.cuda.is_available() else
    torch.device("mps") if torch.backends.mps.is_available() else
    torch.device("cpu")
)

# Load the model and tokenizer from Hugging Face.
# Llama 3.2 is a gated model: accept its license on the Hugging Face Hub and
# authenticate (for example with an HF_TOKEN secret) before downloading.
model_id = "meta-llama/Llama-3.2-3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).to(device)

# Wrap the model so LlamaIndex can use it as its LLM
huggingface_llm = HuggingFaceLLM(
    model=model,
    tokenizer=tokenizer,
)

# Build a query engine that retrieves relevant chunks and passes them to the LLM
query_engine = index.as_query_engine(llm=huggingface_llm)
```
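`as_query_engine` wires the two halves of RAG together: it embeds the question, retrieves the most similar chunks from the index, and hands them to the LLM as context for the answer (it also accepts a `similarity_top_k` argument when more context chunks are needed). The pure-Python sketch below mirrors that flow with toy vectors; `top_k_chunks`, `build_prompt`, and the prompt wording are hypothetical and are not LlamaIndex's internal template.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; assumes non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k_chunks(query_vec: list[float], chunk_vecs: dict[str, list[float]],
                 texts: dict[str, str], k: int = 2) -> list[str]:
    """Hypothetical retriever: texts of the k chunks most similar to the query vector."""
    ranked = sorted(chunk_vecs, key=lambda cid: cosine(query_vec, chunk_vecs[cid]), reverse=True)
    return [texts[cid] for cid in ranked[:k]]


def build_prompt(question: str, contexts: list[str]) -> str:
    """Hypothetical template: stuff the retrieved chunks into a single prompt for the LLM."""
    context_block = "\n---\n".join(contexts)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Toy data (invented for illustration); real vectors would come from the embedding model
texts = {"c1": "Chunk about results.", "c2": "Chunk about methods.", "c3": "Chunk about funding."}
vectors = {"c1": [0.9, 0.1], "c2": [0.6, 0.4], "c3": [0.0, 1.0]}
query_vec = [1.0, 0.0]
print(build_prompt("What were the results?", top_k_chunks(query_vec, vectors, texts, k=2)))
```

If the answer is missing from the top-ranked chunks, the LLM has nothing to ground it on, which is one reason a RAG system may reply that information is "not mentioned".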


```python
# Ask questions interactively; type "quit" to stop
while True:
    question = input("Question: ").strip()
    if question.lower() == "quit":
        break
    print(query_engine.query(question).response)
```


Question: which miRNA was discovered?
The specific miRNA that was discovered is not mentioned in the provided information.
Question: what was the conclusion?
The study identified seven specific miRNAs significantly associated with changes in FEV1% in patients receiving ICS treatment, even after adjusting for potential confounders. This suggests that these miRNAs may play a role in predicting the response to ICS therapy in the treatment of respiratory conditions.
Question: quit
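The loop above reads from `input()` and prints directly, which makes it hard to test. One option is to inject the question source and the answering function, as in the sketch below. `chat_loop` is a hypothetical helper; it also skips blank lines and stops when the input runs out as well as on `quit`.

```python
from typing import Callable, Iterable


def chat_loop(lines: Iterable[str], answer: Callable[[str], str]) -> list[tuple[str, str]]:
    """Hypothetical helper: answer each question until 'quit' (any case) or end of input."""
    transcript = []
    for raw in lines:
        question = raw.strip()
        if question.lower() == "quit":
            break
        if not question:
            continue  # skip blank lines instead of sending an empty query
        reply = answer(question)
        print(reply)
        transcript.append((question, reply))
    return transcript


# Usage with canned questions and a stub function standing in for the query engine
chat_loop(["which miRNA was discovered?", "  QUIT  ", "never asked"],
          answer=lambda q: f"(stub answer to {q!r})")
```

In the notebook, `chat_loop(iter(lambda: input("Question: "), None), answer=lambda q: query_engine.query(q).response)` would reproduce the interactive session: `iter` with a sentinel that `input` never returns turns the prompt into an endless stream that ends only on `quit`.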
