To set up a Retrieval-Augmented Generation (RAG) system using your Mac with the Ollama platform, we’ll need to handle a few main components:

1.	LLM Model (using Ollama): We’ll call an open-source language model through Ollama for generation.
2.	Data Storage and Retrieval: This component indexes and retrieves relevant documents in response to a query.
3.	Application Logic: This ties together the retrieval and generation, handling inputs and outputs.

# 1. LLM Models (using Ollama)
- Download and install ollama from ollama.com
- Install some LLM models in your machine:
  ```ollama install <model_name>```
- List the existing models
- Consider a Model with Retrieval-Augmented Fine-Tuning (like LlamaIndex, GPT-Neo)

In [24]:
!ollama list

NAME                               ID              SIZE      MODIFIED     
wizardlm-uncensored:latest         886a369d74fc    7.4 GB    2 months ago    
wizard-vicuna-uncensored:latest    72fc3c2b99dc    3.8 GB    2 months ago    
llava:latest                       8dd30f6b0cb1    4.7 GB    4 months ago    
llama2-uncensored:latest           44040b922233    3.8 GB    4 months ago    
phi3:medium                        1e67dff39209    7.9 GB    4 months ago    
llama3:latest                      a6990ed6be41    4.7 GB    6 months ago    
llama2:latest                      78e26419b446    3.8 GB    6 months ago    


In [25]:
# !pip install requests

In [26]:
import requests

In [27]:
def call_ollama(prompt, model_name="llama2"):
    response = requests.post(
        "http://localhost:11434/v1/completions",
        json={"model": model_name, "prompt": prompt},
    )
    return response.json()['choices'][0]['text']

Testing call_ollama() function

In [28]:
prompt = "say hello world as an emoji!"
call_ollama(prompt)

'👋'

# 2. Loading .pdf and .docx to a Data Storage (using FAISS IndexFlatL2)

### Converting from .pdf to text

In [29]:
# !pip install pymupdf

In [30]:
import fitz  # PyMuPDF for PDFs

def load_pdf(file_path):
    text = ""
    with fitz.open(file_path) as doc:
        for page in doc:
            text += page.get_text("text")
    return text

### Converting from .docx to text (using python-docx)

In [31]:
# !pip install python-docx

In [32]:
import docx

def load_docx(file_path):
    doc = docx.Document(file_path)
    text = "\n".join([paragraph.text for paragraph in doc.paragraphs])
    return text

### Converting text to embedding (using transformer from HuggingFace 'all-MiniLM-L6-v2' model)

In [33]:
# !pip install transformers torch ipywidgets

- sentence-transformers/all-MiniLM-L6-v2
- distilbert-base-uncased
- all-mpnet-base-v2


In [34]:
from transformers import AutoTokenizer, AutoModel
import torch
import numpy as np

model_name = "sentence-transformers/all-MiniLM-L6-v2"  
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

def embed_text(text):
    # Tokenize input text and convert to tensor
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512) #returns PyTorch tensors
    
    # Forward pass through the model to get hidden states
    with torch.no_grad():
        outputs = model(**inputs)
    
    # Get the embeddings from the last hidden layer
    hidden_states = outputs.last_hidden_state  # Shape: (1, seq_len, hidden_dim)
    
    # Average pool over the token embeddings to get a single vector
    embedding = hidden_states.mean(dim=1).squeeze().numpy()  # Shape: (hidden_dim,)
    
    return embedding

## Initialize the database (FAISS index for 768-dim embeddings)

In [35]:
# !pip install faiss-cpu

- IndexFlatL2
- IndexIVFFlat
- IndexHNSWFlat

In [43]:
import faiss
dimension = 384 #good for distilbert-base-uncased or sentence-transformers/all-MiniLM-L6-v2
index = faiss.IndexFlatL2(dimension)

### Loading files from a folder into the database (FAISS)

In [48]:
import os

def load_and_index_files(folder_path, index):
    documents = []
    for filename in os.listdir(folder_path):
        file_path = os.path.join(folder_path, filename)
        
        # Check file extension and load text
        if filename.endswith(".pdf"):
            text = load_pdf(file_path)
        elif filename.endswith(".docx"):
            text = load_docx(file_path)
        else:
            continue  # Skip non-supported files
        
        # Embed and add to index
        embedding = embed_text(text)
        index.add(np.array([embedding]))
        
        # Track document content and metadata (optional)
        documents.append({"filename": filename, "content": text, "embedding": embedding})
    
    return documents

In [61]:
folder_path = 'folder_example/'
documents = load_and_index_files(folder_path, index)
print(f"Total documents indexed: {len(documents)}")

Total documents indexed: 1


### Retrieve documents

In [65]:
def retrieve_documents(query_embedding, index, documents, top_k=3):
    # Search for the nearest neighbors
    distances, indices = index.search(np.array([query_embedding], dtype='float32'), top_k)
    
    # Print indices for debugging
    print(f"indices: {indices}, distances: {distances}")
    
    # Ensure indices is structured correctly and has expected dimensions
    if indices.size == 0 or indices[0][0] == -1:
        print("No matching documents found.")
        return []  # Return empty list if no results are found

    # Fetch top_k documents using indices
    return [documents[i] for i in indices[0] if i < len(documents)]

# Create the RAG Workflow

In [77]:
def rag_pipeline(query, model_name="llama2", top_k=3):
    query_embedding = embed_text(query) 

    # 2. Retrieve relevant documents
    relevant_docs = retrieve_documents(query_embedding, index, documents, top_k)
    
    # 3. Construct the prompt with file information
    context = "\n".join([f"File: {doc['filename']}\nContent:\n{doc['content']}" for doc in relevant_docs])
    prompt = f"""
    Context (from relevant documents):
    {context}
    
    Question: {query}
    
    Answer strictly based on the provided context. If the context does not include any information relevant to the question, respond exactly with "The context does not provide information on this topic."
    """

    # 4. Call the language model
    response = call_ollama(prompt, model_name)
    return response

# Testing everything

In [78]:
query = "What is Northwave cybersecurity"
response = rag_pipeline(query)
print(response)

indices: [[0 1 2]], distances: [[33.14148 33.14148 33.14148]]
Northwave cybersecurity refers to a company or organization that specializes in providing cybersecurity services and products. The name "Northwave" suggests that the company is focused on providing cutting-edge security solutions for businesses and organizations. From the context provided, it appears that Northwave may be involved in developing and implementing artificial intelligence (AI) models for cybersecurity purposes. However, without further information, it is impossible to determine the specific nature of Northwave's services or products.
