<a href="https://colab.research.google.com/github/mriduldewan/llm/blob/main/RAG_LLM_with_Llama-3_8b_and_LlamaIndex.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
#!pip install pypdf
#!pip install sentence-transformers
#!pip install -q transformers einops accelerate langchain bitsandbytes
#!pip install quanto
#!pip install llama_index
#%pip install llama-index-embeddings-langchain
#%pip install llama-index-llms-huggingface



In [2]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
from llama_index.core import PromptTemplate
from llama_index.llms.huggingface import HuggingFaceLLM

In [3]:
# Load the documents from the specified directory

#documents = SimpleDirectoryReader("/content/data").load_data() # Path if working in colab
documents = SimpleDirectoryReader("/data").load_data()

In [4]:
## Create the system prompt template
# This prompt will be used to guide the behavior of the language model

system_prompt="""
You are a Q&A assistant. Your goal is to answer questions as accurately as
possible based on the instructions and context provided. If you do not know
the answer, say that you don't know; do not try to make up an answer.
"""

## Query wrapper prompt
# This template wraps the user's query in role markers before it is sent to the model.
# Note: <|USER|>/<|ASSISTANT|> are generic markers, not Llama 3's native header tokens.

query_wrapper_prompt=PromptTemplate("<|USER|>{query_str}<|ASSISTANT|>")
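
For comparison, Llama 3 Instruct models were trained with a header-token chat layout. The sketch below assembles a single-turn prompt in that layout using plain string formatting. `build_llama3_prompt` is a hypothetical helper, not part of LlamaIndex or transformers, and the example strings are placeholders. In practice, `tokenizer.apply_chat_template` is usually the safer way to produce this format.

```python
def build_llama3_prompt(system: str, user: str) -> str:
    """Assemble a single-turn prompt in the Llama 3 Instruct chat layout."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system.strip()}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user.strip()}<|eot_id|>"
        # The prompt ends with an open assistant header so the model writes the reply
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )


# Placeholder system and user messages, for illustration only
prompt = build_llama3_prompt("You are a Q&A assistant.", "What is RAG?")
print(prompt)
```

If this layout were used with LlamaIndex, the user portion could be expressed as a `PromptTemplate` containing `{query_str}` in place of the user message.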

In [5]:
# Log in to the Hugging Face platform

!huggingface-cli login # Command might be different if you are on Jupyter notebooks locally


    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|

    A token is already saved on your machine. Run `huggingface-cli whoami` to get more information or `huggingface-cli logout` if you want to log out.
    Setting a new token will erase the existing one.
    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Token: 
Add token as git credential? (Y/n) Y
Token is valid (permission: write).
Cannot authenticate through 

In [7]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, QuantoConfig

# Model name that we want to load from HF
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
quantization_config = QuantoConfig(weights="int8")

# Load the model and the tokenizer
# The model is loaded with the specified quantization configuration
# and the "auto" device mapping for efficient inference
model = AutoModelForCausalLM.from_pretrained(
          model_id,
          torch_dtype=torch.float32,
          quantization_config=quantization_config,
          device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
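
To see why int8 weight quantization matters here, a back-of-the-envelope estimate of weight memory helps. The sketch below assumes a round figure of 8 billion parameters (the exact count for the model differs slightly) and ignores activations, the KV cache, and any layers left unquantized. `weight_memory_gib` is a hypothetical helper.

```python
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory needed to hold the model weights, in GiB."""
    return num_params * bytes_per_param / 1024**3


NUM_PARAMS = 8e9  # assumption: round figure for an "8B" model

fp32_gib = weight_memory_gib(NUM_PARAMS, 4)  # float32 uses 4 bytes per weight
int8_gib = weight_memory_gib(NUM_PARAMS, 1)  # int8 uses 1 byte per weight

print(f"float32 weights: {fp32_gib:.1f} GiB")
print(f"int8 weights:    {int8_gib:.1f} GiB")
```

Under these assumptions, int8 weights take a quarter of the memory of float32 weights, which is often the difference between fitting on a single consumer GPU or not.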


In [18]:
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from llama_index.embeddings.langchain import LangchainEmbedding

# Create the embeddings
# Use the "sentence-transformers/all-mpnet-base-v2" model for embeddings
lc_embed_model = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-mpnet-base-v2"
)
embed_model = LangchainEmbedding(lc_embed_model)
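
The embedding model maps each text chunk to a vector, and retrieval then compares the query vector against the chunk vectors. A common comparison is cosine similarity, sketched below in plain Python. The three-dimensional vectors and the chunk names are made up for illustration; the real model produces much longer vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (hypothetical values)
query_vec = [1.0, 0.0, 1.0]
doc_vecs = {
    "chunk_a": [0.9, 0.1, 0.8],  # points in nearly the same direction as the query
    "chunk_b": [0.0, 1.0, 0.0],  # orthogonal to the query
}

scores = {name: cosine_similarity(query_vec, vec) for name, vec in doc_vecs.items()}
best = max(scores, key=scores.get)
print(best, scores)
```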

In [20]:
# Bundle the chunk size, the embedding model, and the LLM into a service context
# Note: ServiceContext is deprecated in newer llama_index releases in favor of Settings

service_context = ServiceContext.from_defaults(
    chunk_size=1024,
    embed_model=embed_model,
    llm=HuggingFaceLLM(
        context_window=4096,
        max_new_tokens=256,
        # Greedy decoding; a temperature would be ignored while do_sample=False
        generate_kwargs={"do_sample": False},
        model=model,
        tokenizer=tokenizer,
        system_prompt=system_prompt,
        query_wrapper_prompt=query_wrapper_prompt,
    ),
)


In [22]:
# Create the index from the documents
index = VectorStoreIndex.from_documents(
            documents,
            service_context=service_context
            )
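
Building the index involves splitting each document into chunks, embedding every chunk, and storing the vectors. The splitting step can be pictured with the sketch below. It is a simplification: it counts whitespace-separated words rather than tokenizer tokens and does not respect sentence boundaries. `chunk_words` and its parameters are hypothetical and are not the LlamaIndex splitter.

```python
def chunk_words(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into overlapping windows of at most chunk_size words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a window has reached the end of the text
        if start + chunk_size >= len(words):
            break
    return chunks


# Ten placeholder words, split into windows of 4 that share 1 word with the previous window
sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(sample, chunk_size=4, overlap=1)
for chunk in chunks:
    print(chunk)
```

The overlap keeps a little shared context between neighbouring chunks, so a fact that straddles a boundary is less likely to be cut in half.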

In [24]:
# Get the query engine
query_engine = index.as_query_engine()
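
Conceptually, a query engine of this kind keeps only the highest-scoring chunks for a question and pastes them into a prompt for the LLM. The sketch below shows that flow with hypothetical helpers (`top_k`, `build_prompt`), made-up similarity scores, and a simplified prompt template. It is an illustration of the idea and not the exact template or defaults LlamaIndex uses.

```python
def top_k(scored_chunks, k):
    """Return the k (text, score) pairs with the highest scores."""
    return sorted(scored_chunks, key=lambda pair: pair[1], reverse=True)[:k]


def build_prompt(question, chunks):
    """Place the retrieved chunks ahead of the question in a single prompt."""
    context = "\n\n".join(chunks)
    return (
        "Context information is below.\n"
        "---------------------\n"
        f"{context}\n"
        "---------------------\n"
        "Given the context information and not prior knowledge, answer the query.\n"
        f"Query: {question}\n"
        "Answer: "
    )


# Hypothetical chunks with made-up similarity scores
scored = [
    ("Chunk about enrolments.", 0.41),
    ("Chunk about Indigenous students.", 0.87),
    ("Chunk about funding.", 0.12),
]

selected = [text for text, _ in top_k(scored, k=2)]
prompt = build_prompt("What share of Indigenous students report a disability?", selected)
print(prompt)
```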

In [28]:
# Execute the query (the response is printed in the next cell)
response = query_engine.query("How many indigenous students are also disabled from the VET survey 2021?")

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


In [30]:
print(response)

9% of Indigenous students are also disabled, as per the data provided in the VET survey 2021. This information is available in the table "Percentage of VET students who are Indigenous are also:" under the section "PARTICIPATION IN VET IN 2021". The exact number of Indigenous students who are also disabled is not provided in the table, but the percentage is given as 9%. Therefore, we cannot determine the exact number of Indigenous students who are also disabled from the provided data. However, we can infer that 9% of Indigenous students are also disabled.
