# RAG Implementation using Llama-2 model

This is a simple RAG implementation using all-mpnet-base-v2 embedding model, chromadb vector database and Llama-2 7B model.

Note: Download pytorch with CUDA to use GPU

In [1]:
import PyPDF2
from sentence_transformers import SentenceTransformer
import chromadb
from transformers import AutoTokenizer, AutoConfig, AutoModelForCausalLM
import torch
import os
from pdfminer.high_level import extract_text
import time
import camelot
import json

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

  from tqdm.autonotebook import tqdm, trange


## Chunking

In [149]:
reader = PyPDF2.PdfReader('manual2.pdf')
doc = ""
for page in reader.pages:
    text = page.extract_text()
    doc += text

ignore '/Perms' verify failed


In [151]:
def overlapping_chunking(text, chunk_size, overlap_size):
    words = text.split()
    chunks = []
    for i in range(0, len(words), chunk_size - overlap_size):
        chunks.append(' '.join(words[i:i + chunk_size]))
    return chunks

In [183]:
chunks = overlapping_chunking(doc, 100, 20)

In [3]:
def extract_text_by_page(pdf_path):
    # Open the PDF file
    with open(pdf_path, 'rb') as file:
        reader = PyPDF2.PdfFileReader(file)
        
        text_chunks = []
        
        # Iterate through each page
        for page_num in range(reader.getNumPages()):
            # Get the page
            page = reader.getPage(page_num)
            
            # Extract text from the page
            text = page.extract_text()
            print(len(text))
            
            # Store the text in the dictionary with the page number as the key
            text_chunks.append(text)
    
    return text_chunks


# Example usage
pdf_path = 'manual.pdf'
chunks = extract_text_by_page(pdf_path)

ignore '/Perms' verify failed


41
0
2351
2929
5202
3708
207
738
427
219
720
290
369
248
888
892
1086
1104
1057
730
1174
1467
832
658
1201
1234
1163
516
846
209
233
940
1271
371
804
1057
1356
565
552
456
517
320
1379
614
662
305
435
906
32
1873
763
587
615
991
561
1114
1069
668
835
865
1194
295
345
1189
1474
796
783
548
265
2241
928
1072
1326
1428
638
1005
737
463
610
1880
919
445
418
721
696
667
442
816
684
274
1019
1000
265
1468
212
58
63
192
1223
515
800
1674
898
1809
1179
1420
2320
2218
692
1937
580
66
939
657
0
384


In [126]:
chunks[0]

'Hello ASSISTA\nQuick Set-up Guide\nRV-5AS-D'

In [202]:
embedding_model = SentenceTransformer('sentence-transformers/all-mpnet-base-v2', cache_folder = '/data/base_models')
chunk_embeddings = embedding_model.encode(chunks)
chunk_embeddings.shape

(116, 768)

## Indexing

In [205]:
chroma_client = chromadb.Client()

collection = chroma_client.create_collection(name="rag8", metadata={"hnsw:space": "cosine"})

In [207]:
collection.add(
    embeddings = chunk_embeddings,
    documents=chunks,
    ids= [str(i) for i in range(len(chunks))]
)

## Retrieving

In [213]:
def retrieve_vector_db(query, n_results=1):
    results = collection.query(
    query_embeddings = embedding_model.encode(query).tolist(),
    n_results=n_results
    )
    return results

In [215]:
query = "What is the Manual No. of the following manual “Collaborative Robot Safety Manual”?"
retrieved_results = retrieve_vector_db(query)

In [217]:
retrieved_results

{'ids': [['3']],
 'distances': [[0.34433144330978394]],
 'metadatas': [[None]],
 'embeddings': None,
 'documents': [['2ABOUT THIS MANUAL\nThis manual explains the robot setup procedure a nd the steps required for pick-and-place operations.\nIt also explains how to setup the Easy-setup kit (option).\nThe manual consists of the following parts: • Part 1 SETUP\nThis part explains how to unpack, install, wire, and operate the robot.\n • Part 2 OPERATION\nThis part explains how to create and run programs using RT VisualBox.\n • Part 3 VISION SENSORThis part explains how to create and run vision sensor programs using RT VisualBox.\nMANUALS INCLUDED WITH THIS PRODUCT\n*1 Instances where the CR800-D controller is mentioned also refer to the CR800-05VD.Manual name Description Manual No.\nCollaborative Robot Safety \nManualTo ensure the safety of robot users, this ma nual provides information on common precautions and \nsafety measures that should be taken when ha ndling the robot or creating an

In [219]:
context = '\n\n'.join(retrieved_results['documents'][0])
print(context)

2ABOUT THIS MANUAL
This manual explains the robot setup procedure a nd the steps required for pick-and-place operations.
It also explains how to setup the Easy-setup kit (option).
The manual consists of the following parts: • Part 1 SETUP
This part explains how to unpack, install, wire, and operate the robot.
 • Part 2 OPERATION
This part explains how to create and run programs using RT VisualBox.
 • Part 3 VISION SENSORThis part explains how to create and run vision sensor programs using RT VisualBox.
MANUALS INCLUDED WITH THIS PRODUCT
*1 Instances where the CR800-D controller is mentioned also refer to the CR800-05VD.Manual name Description Manual No.
Collaborative Robot Safety 
ManualTo ensure the safety of robot users, this ma nual provides information on common precautions and 
safety measures that should be taken when ha ndling the robot or creating and designing robot 
systems. Read this manual first.BFP-A3733
Hello ASSISTA Quick Set-up 
Guide (This manual)Describes procedures i

## Answer Generation

In [13]:
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    torch_dtype=torch.bfloat16,
    device_map='auto'
)

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf", 
                                         )

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

In [15]:
def get_llama2_chat_reponse(prompt, max_new_tokens=250):
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, temperature= 0.00001)
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response

In [5]:
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

In [113]:
model.to("cuda")

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 4096)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaSdpaAttention(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (v_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=4096, out_features=11008, bias=False)
          (up_proj): Linear(in_features=4096, out_features=11008, bias=False)
          (down_proj): Linear(in_features=11008, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
  )
  (lm_head):

In [9]:
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]
def get_llama3_chat_reponse(messages):
    input_ids = tokenizer.apply_chat_template(
    messages,
    return_tensors="pt").to(model.device)
    outputs = model.generate(
    input_ids,
    max_new_tokens=800,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.01,
    top_p=0.9,)
    response = outputs[0][input_ids.shape[-1]:]
    print(tokenizer.decode(response, skip_special_tokens=True))

In [15]:
query = "Write code in python for creating a machine learning model"
messages = [
    {"role": "system", "content": "You are a chatbot who gives code based on a query"},
    {"role": "user", "content": {query}},
]
start_time = time.time()
get_llama3_chat_reponse(messages)
print("--- %.2f seconds ---" % (time.time() - start_time))

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.


assistant

Here is a basic example of how you can create a machine learning model in Python using scikit-learn library:

```
# Import necessary libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the dataset into a training set and a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a Gaussian Naive Bayes classifier
gnb = GaussianNB()

# Train the model using the training sets
gnb.fit(X_train, y_train)

# Predict the response for test dataset
y_pred = gnb.predict(X_test)

# Model Accuracy
print("Accuracy:", accuracy_score(y_test, y_pred))
```

This code creates a Gaussian Naive Bayes classifier and trains it on the iris dataset. It then uses the trained model to predict the labels of the test set and calculates the a

In [229]:
query = "What is the unit of operating range for table 2.1 the RV-5AS-D robot?"
retrieved_results = retrieve_vector_db(query, n_results=3)
context = '\n\n'.join(retrieved_results[0])
messages = [
    {"role": "system", "content": "You are a chatbot who gives an answer strictly based on the content provided"},
    {"role": "system", "content": {context}},
    {"role": "user", "content": {query}},
]
start_time = time.time()
get_llama3_chat_reponse(messages)
print("--- %.2f seconds ---" % (time.time() - start_time))

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.


assistant

According to Table 2-1: Standard specifications of RV-5AS-D robot, the unit of operating range is "Degree" for Waist (J1), Shoulder (J2), and Elbow (J3), and "mm" for Wrist twist (J4), Wrist pitch (J5), and Wrist roll (J6).
--- 4.20 seconds ---


In [231]:
print(context)

Mitsubishi Electric Industrial Robot
CR800-05VD Controller
RV-5AS-D Standard Specifications Manual
BFP-A3727-M

2-12   Standard specifications2 Robot arm
2 Robot arm
2.1 Standard specifications
2.1.1 Basic specifications
Table 2-1: Standard specific ations of RV-5AS-D robot
Item Unit Specifications
Type RV-5AS-D
Environment Oil mist environments
Degree of freedom 6
Installation posture Floor mounted / ceiling mountedStructure Vertical articulated robots
Drive system AC servo motor (brake provided on all axes)
Position detection method Absolute encoderMotor capac-
ityWaist (J1) W 200
Shoulder (J2) 200
Elbow (J3) 150Wrist twist (J4) 100
Wrist pitch (J5) 100
Wrist roll (J6) 50
Operating
rangeWaist (J1) Degree ±240
Shoulder (J2) ±148
Elbow (J3) ±150Wrist twist (J4) ±200Wrist pitch (J5) ±120
Wrist roll (J6) ±200
Speed of
motion 
Note1) Waist (J1) Degree/
s124 (59.6)
Shoulder (J2) 124 (34.0)
Elbow (J3) 124 (34.0)
Wrist twist (J4) 297 (142)Wrist pitch (J5) 356 (215)
Wrist roll (J6) 360
Maximu

## Without RAG

In [29]:
query = "How do I install the Safety Extension Unit?"

In [19]:
get_llama2_chat_reponse(query)

  attn_output = torch.nn.functional.scaled_dot_product_attention(


'How do I install the Safety Extension Unit?\n\nThe Safety Extension Unit (SEU) is a hardware component that can be installed on your Raspberry Pi to provide additional safety features, such as emergency shutdown and power-off protection. Here are the steps to install'

## RAG

In [221]:
query = "What is the Manual No. of the following manual “Collaborative Robot Safety Manual”?"

retrieved_results = retrieve_vector_db(query, n_results=1)
context = '\n\n'.join(retrieved_results['documents'][0])
prompt = f'''
[INST]
Give answer for the question strictly based on the context provided.

Question: {query}

Context : {context}
[/INST]
'''
start_time = time.time()
print(get_llama2_chat_reponse(prompt, max_new_tokens=800))
print("--- %.2f seconds ---" % (time.time() - start_time))


[INST]
Give answer for the question strictly based on the context provided.

Question: What is the Manual No. of the following manual “Collaborative Robot Safety Manual”?

Context : 2ABOUT THIS MANUAL
This manual explains the robot setup procedure a nd the steps required for pick-and-place operations.
It also explains how to setup the Easy-setup kit (option).
The manual consists of the following parts: • Part 1 SETUP
This part explains how to unpack, install, wire, and operate the robot.
 • Part 2 OPERATION
This part explains how to create and run programs using RT VisualBox.
 • Part 3 VISION SENSORThis part explains how to create and run vision sensor programs using RT VisualBox.
MANUALS INCLUDED WITH THIS PRODUCT
*1 Instances where the CR800-D controller is mentioned also refer to the CR800-05VD.Manual name Description Manual No.
Collaborative Robot Safety 
ManualTo ensure the safety of robot users, this ma nual provides information on common precautions and 
safety measures that sh

In [None]:
print(context)

In [21]:
print(get_llama2_chat_reponse(prompt, max_new_tokens=800))


[INST]
Give answer for the question strictly based on the context provided. Keep answers short and to the point.

Question: How do I install the Safety Extension Unit?

Context : 1 STEPS FROM UNPACKING THE ROBOT TO INSTALLING THE ROBOT 1.11 Installing the Safety Extension Unit 2111.11 Installing the Safety Extension Unit The following equipment is required. Install the Safety extension unit in the Earth le akage breaker / Safety extension unit power supply box. 1. Remove the fixing screws on the front co ver of the Earth leakage breaker / safety extension unit power supply box (four places). 2. Fix the Safety extension unit in plac e using the fixing screws (three places).Item number Item Quantity A-2) Earth leakage breaker / Safety extension unit power supply box1 A-8) Safety

8)2 A-2) Earth leakage breaker / Safety extension unit power supply box1 D-3) Phillips-head screwdriver (length: 18 cm or longer) 1 A-5) Support bracket 1 Accessory A RIO cable 1 A-6) Mode selector switch key 1

In [22]:
# import gc
# gc.collect()
# from pynvml import *
# nvmlInit()
# h = nvmlDeviceGetHandleByIndex(0)
# info = nvmlDeviceGetMemoryInfo(h)
# print(f'total    : {info.total}')
# print(f'free     : {info.free}')
# print(f'used     : {info.used}')