<a href="https://colab.research.google.com/github/muhammadalinoor-1982/GenAI/blob/main/RAG_with_huggingface_Meta_Llama_3_8B.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#**Step-by-Step RAG Pipeline with Hugging Face Meta-Llama-3-8B**

#**1. Install Necessary Libraries**

In [None]:
!pip install faiss-cpu pymupdf


#**2. Import Libraries and Setup Environment**

In [None]:
import fitz  # PyMuPDF
import nltk
import sqlite3
import torch
import numpy as np
import faiss
from transformers import AutoTokenizer, AutoModelForCausalLM, LlamaForCausalLM

nltk.download('punkt')


#**3. Database Operations**

In [None]:
def create_database(db_name="documents.db"):
    conn = sqlite3.connect(db_name)
    cursor = conn.cursor()
    cursor.execute('''CREATE TABLE IF NOT EXISTS chunks
                      (id INTEGER PRIMARY KEY, content TEXT)''')
    conn.commit()
    conn.close()
    print(f"Database '{db_name}' created with table 'chunks'.")

def insert_chunks(chunks, db_name="documents.db"):
    conn = sqlite3.connect(db_name)
    cursor = conn.cursor()
    cursor.executemany("INSERT INTO chunks (content) VALUES (?)", [(chunk,) for chunk in chunks])
    conn.commit()
    conn.close()
    print(f"Inserted {len(chunks)} chunks into 'chunks' table.")
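As a quick check of the two helpers above, the following minimal sketch runs the same `CREATE TABLE` and `INSERT` statements against an in-memory SQLite database and reads the rows back. The `round_trip` helper and the use of `:memory:` are assumptions made for illustration; the notebook itself writes to `documents.db`.

```python
import sqlite3

def round_trip(chunks):
    """Store chunks in an in-memory table and read them back in insertion order."""
    conn = sqlite3.connect(":memory:")  # illustrative: avoids touching documents.db
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS chunks (id INTEGER PRIMARY KEY, content TEXT)")
        conn.executemany("INSERT INTO chunks (content) VALUES (?)", [(c,) for c in chunks])
        conn.commit()
        rows = conn.execute("SELECT content FROM chunks ORDER BY id").fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]

stored = round_trip(["first chunk", "second chunk"])
print(stored)
```

Because `id` is an `INTEGER PRIMARY KEY`, SQLite assigns increasing ids on insert, so ordering by `id` returns the chunks in the order they were stored.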

#**4. Text Extraction and Chunking**

In [None]:
def extract_and_chunk_text_from_pdf(pdf_path, chunk_size=200):
    """Extract all text from a PDF and group it into chunks of `chunk_size` sentences."""
    # The context manager closes the PDF once its text has been read
    with fitz.open(pdf_path) as document:
        text = "".join(page.get_text() for page in document)

    sentences = nltk.sent_tokenize(text)
    # Non-overlapping windows: sentences 0..chunk_size-1, then the next chunk_size, and so on
    chunks = [' '.join(sentences[i:i + chunk_size]) for i in range(0, len(sentences), chunk_size)]
    print(f"Extracted and chunked text from {pdf_path}. Number of chunks: {len(chunks)}")
    return chunks
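To see how the sentence-window slicing above groups text without needing a PDF or NLTK, here is a minimal sketch that applies the same list comprehension to a hand-written list of sentences. The `chunk_sentences` name and the sample sentences are hypothetical.

```python
def chunk_sentences(sentences, chunk_size=200):
    """Join consecutive sentences into chunks of at most chunk_size sentences."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        " ".join(sentences[i:i + chunk_size])
        for i in range(0, len(sentences), chunk_size)
    ]

sentences = ["One.", "Two.", "Three.", "Four.", "Five."]
chunks = chunk_sentences(sentences, chunk_size=2)
print(chunks)  # ['One. Two.', 'Three. Four.', 'Five.']
```

Note that `chunk_size` counts sentences, not tokens. The embedding step in section 5 tokenizes with `truncation=True`, so any text beyond the embedding model's maximum input length is dropped before embedding; a smaller `chunk_size` may therefore represent each chunk more faithfully.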

#**5. Embedding and Retrieval**

In [None]:
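from transformers import AutoModel

# AutoModel loads the bare encoder, whose output has last_hidden_state (AutoModelForCausalLM returns logits).
# The embed_ prefix keeps these names from being overwritten when section 7 loads LLaMA.
embed_tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')
embed_model = AutoModel.from_pretrained('sentence-transformers/all-MiniLM-L6-v2').to('cuda')  # Move model to GPU

def embed_text(texts):
    inputs = embed_tokenizer(texts, return_tensors='pt', padding=True, truncation=True).to('cuda')  # Move inputs to GPU
    with torch.no_grad():
        hidden = embed_model(**inputs).last_hidden_state
    # Mean-pool over real tokens only so padding does not dilute the embedding
    mask = inputs['attention_mask'].unsqueeze(-1).float()
    embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
    return embeddings.cpu().numpy()  # Move embeddings to CPU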

index = faiss.IndexFlatL2(384)  # Dimension should match the embedding size

def load_chunks_and_index(db_name="documents.db"):
    conn = sqlite3.connect(db_name)
    cursor = conn.cursor()
    cursor.execute("SELECT content FROM chunks")
    chunks = [row[0] for row in cursor.fetchall()]
    conn.close()

    if chunks:
        embeddings = embed_text(chunks)
        index.add(embeddings)
        print(f"Loaded {len(chunks)} chunks and added to FAISS index.")
    else:
        print("No chunks loaded from the database.")

    return chunks
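`faiss.IndexFlatL2` performs an exact, brute-force search by squared Euclidean distance. The sketch below reproduces that idea in plain Python on tiny 2-dimensional vectors so the ranking can be checked by hand. The `l2_search` helper and the sample vectors are hypothetical, and this is not how FAISS is implemented internally.

```python
def l2_search(vectors, query, top_k=5):
    """Return (squared_distances, indices) of the top_k vectors nearest to query."""
    scored = []
    for i, vec in enumerate(vectors):
        # Squared Euclidean distance between the stored vector and the query
        dist = sum((a - b) ** 2 for a, b in zip(vec, query))
        scored.append((dist, i))
    scored.sort()  # smallest distance first
    best = scored[:top_k]
    return [d for d, _ in best], [i for _, i in best]

vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]]
distances, indices = l2_search(vectors, [0.9, 0.1], top_k=2)
print(indices)  # [1, 0]
```

One difference worth keeping in mind: when the index holds fewer than `top_k` vectors, this sketch simply returns fewer results, whereas FAISS pads the returned index array with `-1`.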



#**6. Retrieval and Ranking**

In [None]:
from sklearn.metrics.pairwise import cosine_similarity

def retrieve_and_rank(chunks, query, top_k=5):
    query_embedding = embed_text([query])
    distances, indices = index.search(query_embedding, top_k)

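    # FAISS always returns top_k slots and pads missing results with -1
    if index.ntotal == 0:
        print("No chunks retrieved from the index.")
        return []

    # Drop the -1 padding; a bare `i < len(chunks)` check would let -1 select the last chunk
    retrieved_chunks = [chunks[i] for i in indices[0] if 0 <= i < len(chunks)]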

    if not retrieved_chunks:
        print("No valid chunks retrieved after filtering.")
        return []

    chunk_embeddings = embed_text(retrieved_chunks)
    similarities = cosine_similarity(query_embedding, chunk_embeddings)[0]
    ranked_chunks = [retrieved_chunks[i] for i in np.argsort(similarities)[::-1]]

    return ranked_chunks
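The re-ranking step above orders the retrieved chunks by cosine similarity to the query. A minimal pure-Python sketch of that ordering, using hand-picked 2-dimensional vectors in place of real embeddings, might look like this. The `cosine` and `rerank` helpers and the sample data are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rerank(query_vec, chunk_vecs, chunks):
    """Return chunks sorted from most to least similar to the query vector."""
    scores = [cosine(query_vec, vec) for vec in chunk_vecs]
    order = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)
    return [chunks[i] for i in order]

ranked = rerank([1.0, 0.0], [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]], ["a", "b", "c"])
print(ranked)  # ['c', 'b', 'a']
```

For unit-length vectors, squared L2 distance equals `2 - 2 * cosine`, so the two orderings coincide. The re-ranking only changes the order returned by the L2 index when the embeddings are not normalized.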

#**7. Generate Responses with LLaMA**

In [None]:
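import os

# Read the Hugging Face token from the environment (or Colab secrets); never hard-code it in the notebook.
access_token = os.environ.get("HF_TOKEN")
model_name = 'meta-llama/Meta-Llama-3-8B'  # Replace with the actual model name you are using
# The llm_ prefix keeps these names separate from the embedding tokenizer and model in section 5.
# `token` replaces the deprecated `use_auth_token` argument.
llm_tokenizer = AutoTokenizer.from_pretrained(model_name, token=access_token)
# If GPU memory is tight, passing torch_dtype=torch.float16 here is one option to consider.
llm_model = LlamaForCausalLM.from_pretrained(model_name, token=access_token).to('cuda')

def generate_response(chunks, query, top_k=5, prompt="Answer the following question based on the provided context:"):
    ranked_chunks = retrieve_and_rank(chunks, query, top_k)

    if not ranked_chunks:
        return "No relevant chunks found to generate a response."

    context = " ".join(ranked_chunks) + "\n" + prompt + "\n" + query

    inputs = llm_tokenizer(context, return_tensors='pt').to('cuda')
    # Passing **inputs forwards the attention mask along with the input ids
    outputs = llm_model.generate(**inputs, max_new_tokens=150, pad_token_id=llm_tokenizer.eos_token_id)

    # Decode only the newly generated tokens, not the echoed prompt
    new_tokens = outputs[0][inputs.input_ids.shape[1]:]
    response = llm_tokenizer.decode(new_tokens, skip_special_tokens=True)
    return response.strip()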
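The prompt passed to the model is plain string concatenation of the retrieved chunks, the instruction, and the query. The sketch below shows one alternative layout that labels each part explicitly; it is an illustration under stated assumptions, not the layout used by `generate_response`. The `build_prompt` helper and the `Context:` / `Question:` / `Answer:` labels are hypothetical.

```python
DEFAULT_INSTRUCTION = "Answer the following question based on the provided context:"

def build_prompt(ranked_chunks, query, instruction=DEFAULT_INSTRUCTION):
    """Assemble a labeled prompt: instruction, then context, then the question."""
    context = "\n\n".join(ranked_chunks)  # blank line between chunks
    return (
        f"{instruction}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer:"
    )

prompt_text = build_prompt(["Chunk one.", "Chunk two."], "What is in the chunks?")
print(prompt_text)
```

Ending the prompt with `Answer:` gives a base (non-instruct) model such as `meta-llama/Meta-Llama-3-8B` a natural place to continue from, which may help keep the completion on topic.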




#**8. RAG Pipeline Function**

In [None]:
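def rag_pipeline(pdf_paths, query, top_k=5, chunk_size=200, prompt="Answer the following question based on the provided context:"):
    create_database()

    # Start from a clean slate: without this, every call re-inserts the same chunks
    # into SQLite and re-adds the same vectors to the global FAISS index.
    conn = sqlite3.connect("documents.db")
    conn.execute("DELETE FROM chunks")
    conn.commit()
    conn.close()
    index.reset()

    for pdf_path in pdf_paths:
        chunks = extract_and_chunk_text_from_pdf(pdf_path, chunk_size)
        insert_chunks(chunks)

    chunks = load_chunks_and_index()

    response = generate_response(chunks, query, top_k, prompt)

    return response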

#**9. Upload Multiple PDFs**

In [None]:
from google.colab import files

uploaded = files.upload()

pdf_paths = list(uploaded.keys())

#**10. Run the RAG Pipeline**

In [None]:
queries = [
    "Configuration File Syntax in linux?",
    "network configuration utility (ncat)",
    "Basic requirements and setup for linux?",
    "Why Guest Security Matters in linux"
]

for query in queries:
    response = rag_pipeline(pdf_paths, query)
    print('\n', '\n')
    print('*' * 100)
    print('Query: ', query)
    print('*'*100)
    print('\n')
    print('-'*100)
    print('Response: ', response)
    print('*' * 100)