<a href="https://colab.research.google.com/github/gvgabison/Sample-LLM-/blob/main/sampleRAGLLMv2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [3]:
# Install required libraries
!pip install sentence-transformers faiss-cpu transformers PyMuPDF

Collecting faiss-cpu
  Downloading faiss_cpu-1.10.0-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (4.4 kB)
Collecting PyMuPDF
  Downloading pymupdf-1.25.4-cp39-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (3.4 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch>=1.11.0->sentence-transformers)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch>=1.11.0->sentence-transformers)
  Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch>=1.11.0->sentence-transformers)
  Downloading nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==9.1.0.70 (from torch>=1.11.0->sentence-transformers)
  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==

In [4]:

# Import libraries
from google.colab import files
import fitz  # PyMuPDF
from sentence_transformers import SentenceTransformer
from transformers import AutoTokenizer, AutoModelForCausalLM
import faiss
import torch
import numpy as np

# Upload PDF files to Google Colab
uploaded = files.upload()

# Function to extract text from a PDF file using PyMuPDF
def extract_text_from_pdf(pdf_path):
    doc = fitz.open(pdf_path)
    text = ""
    for page_num in range(len(doc)):
        page = doc.load_page(page_num)
        text += page.get_text()
    return text

# Extract text from all uploaded PDF files
pdf_texts = {}
for pdf_file in uploaded.keys():
    if pdf_file.endswith(".pdf"):
        pdf_texts[pdf_file] = extract_text_from_pdf(pdf_file)

# Combine the extracted text into a single string
full_text = " ".join(pdf_texts.values())

#Chunking process
# Split the text into chunks for document retrieval,
documents = [full_text[i:i+500] for i in range(0, len(full_text), 500)]  # 500-character chunks
print(f"Number of chunks loaded: {len(documents)}")

# Generate embeddings and build FAISS index
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
document_embeddings = embedding_model.encode(documents)
dimension = document_embeddings.shape[1]
index = faiss.IndexFlatL2(dimension)
index.add(np.array(document_embeddings))

Saving HR-Manual.pdf to HR-Manual.pdf
Number of chunks loaded: 305


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

README.md:   0%|          | 0.00/10.5k [00:00<?, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/612 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/90.9M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/350 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

In [None]:
# Retrieval Process
# Load the pre-trained LLM
model_name = "facebook/opt-350m"  # Use a smaller model suitable for CPU
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to("cpu")  # Force CPU usage

#this one is for with GPU - paid version, fyi no more free tier for chatgpt as of Nov 1 2024
# Load LLaMA or OpenLLM
#model_name = "meta-llama/Llama-2-7b-hf"  # Example model (you may need GPU for larger models)
#tokenizer = AutoTokenizer.from_pretrained(model_name)
#model = AutoModelForCausalLM.from_pretrained(model_name, device_map="gpu")

# Function to retrieve documents
def retrieve_documents(query, top_k=2):
    query_embedding = embedding_model.encode([query])
    distances, indices = index.search(query_embedding, top_k)
    results = [documents[idx] for idx in indices[0]]
    return results

# Function to generate a response using RAG
def generate_rag_answer(query):
    # Retrieve relevant documents
    retrieved_docs = retrieve_documents(query)

    # Combine retrieved documents into context
    context = " ".join(retrieved_docs)

    # Construct the input prompt
    input_prompt = f"Context: {context}\n\nQuestion: {query}\n\nAnswer:"

    # Tokenize and generate response
    inputs = tokenizer(input_prompt, return_tensors="pt").to("cpu")  # Force CPU usage
    outputs = model.generate(
        inputs['input_ids'],
        max_new_tokens=100,  # Control the output length
        num_return_sequences=1
    )
    answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return answer

# Function to log questions and answers
def log_to_file(question, answer, file_name="questions_answers_log.txt"):
    with open(file_name, "a") as log_file:
        log_file.write(f"Question: {question}\nAnswer: {answer}\n\n")

# Function to address the repeating answer
def remove_repetition(text):
    sentences = text.split(". ")
    unique_sentences = []
    for sentence in sentences:
        if sentence not in unique_sentences:
            unique_sentences.append(sentence)
    return ". ".join(unique_sentences)

# Interactive Loop for Asking Questions
while True:
    user_query = input("Ask a question about the PDF (type 'exit' to quit): ")
    if user_query.lower() == "exit":
        print("Exiting. Goodbye!")
        break
    answer = generate_rag_answer(user_query)
    answer = remove_repetition(answer)
    print(f"Answer: {answer}\n")

    # Log the question and answer to a file
    log_to_file(user_query, answer)

tokenizer_config.json:   0%|          | 0.00/685 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/644 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/441 [00:00<?, ?B/s]

pytorch_model.bin:   0%|          | 0.00/663M [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/662M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/137 [00:00<?, ?B/s]

Ask a question about the PDF (type 'exit' to quit): can i be demoted
Answer: Context:  extended, if deemed appropriate, by Division Director.  
 
3. Transferred or promoted employees who do not meet job requirements in their new position 
during introductory period, may be returned to their original job, if a vacancy exists, or be 
terminated at the discretion of the Organization. 
 
4. Upon completion of the introductory period, an employee enters the “regular” employment 
classification and may be eligible for organization sponsored benefits.   
 
 
HUMAN RESOURCES RECORDS 
Eff ily increases. Individuals hired on a seasonal basis are not eligible for benefits 
except those legally required (e.g., Workers’ Compensation (WC) and Social Security), and TCCAP 
designated holidays. 
 
Substitutes 
 
A substitute employee is an individual who is hired either full-time or part-time for a limited period (120 
days) under the following conditions: 
 
1. Substitute Teachers and Teacher Assistan