# **INM's Private AI - RAG Streamlit App**

This notebook provides a detailed, step-by-step explanation of a Retrieval-Augmented Generation (RAG) system built using Streamlit. The system:

- **Supports multiple document uploads** (PDF, DOCX, PPTX)
- **Extracts and cleans text** using libraries such as *pdfplumber*, *python-docx*, and *python-pptx*
- **Embeds text chunks** using a SentenceTransformer model
- **Performs retrieval** using FAISS
- **Generates answers** using a Hugging Face language model (with GPU support if available)
- **Post-processes** the model’s output to remove any `<think>...</think>` tags so that only the final answer is shown

The code is organized into sections with detailed explanations for each function.


# Cell 2: Importing Required Libraries


In [None]:
# **Standard and Numerical Libraries**
import os              # For file and path operations.
import re              # For regex-based text cleaning.
import tempfile        # For creating temporary files during document uploads.
import numpy as np     # For numerical operations, especially for handling embeddings.

# **Deep Learning and Device Management**
import torch           # PyTorch for model inference and device (GPU/CPU) management.

# **Streamlit for UI**
import streamlit as st # For building the web UI of the RAG system.

# **Document Processing Libraries**
import pdfplumber      # For advanced PDF text extraction.
from docx import Document  # For reading DOCX files.
from pptx import Presentation # For reading PPTX files.

# **Hugging Face Libraries**
from transformers import AutoConfig, AutoTokenizer, AutoModelForCausalLM  
# AutoConfig loads model configurations; AutoTokenizer tokenizes text; AutoModelForCausalLM is for language generation.

# **Sentence Transformers**
from sentence_transformers import SentenceTransformer  
# To generate dense vector embeddings from text.

# **FAISS for Similarity Search**
import faiss           # For efficient vector similarity search (runs on CPU).



## Explanation:

**Purpose:** Import every library needed to build the RAG system.

**Key points:**
- os, re, tempfile, numpy: standard modules for file handling, regular expressions, temporary file management, and numerical operations.
- torch: deep learning, model inference, and GPU management.
- streamlit: provides the web interface.
- pdfplumber, python-docx, python-pptx: extract text from PDF, DOCX, and PPTX files.
- transformers and SentenceTransformer: load and run the language and embedding models.
- faiss: fast retrieval of similar text chunks.
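
Before launching the app, it can help to confirm that each of these imports resolves. The sketch below is one possible check that uses only the standard library. The helper name `missing_modules` and the pip package names in the mapping are assumptions added for illustration (for example `faiss-cpu`, chosen to match the CPU-only note in the import cell).

```python
import importlib.util

# Import name -> pip package name (assumed; adjust to your environment).
REQUIRED = {
    "numpy": "numpy",
    "torch": "torch",
    "streamlit": "streamlit",
    "pdfplumber": "pdfplumber",
    "docx": "python-docx",
    "pptx": "python-pptx",
    "transformers": "transformers",
    "sentence_transformers": "sentence-transformers",
    "faiss": "faiss-cpu",
}


def missing_modules(required: dict) -> list:
    """Return the pip names of packages whose import name cannot be found."""
    return [
        pip_name
        for import_name, pip_name in required.items()
        if importlib.util.find_spec(import_name) is None
    ]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Install with: pip install " + " ".join(missing))
    else:
        print("All required packages are importable.")
```

Note that the import name and the pip name differ for `docx` (python-docx) and `pptx` (python-pptx), which is a common source of confusion when installing.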

# Cell 3: Streamlit Configuration and Device Setup


In [None]:
# **Streamlit Configuration:**
# Set up the Streamlit app's title and layout. This must be the first Streamlit call.
st.set_page_config(page_title="INM's Private AI", layout="wide")

# **Device Setup:**
# Determine whether to use GPU ("cuda") or CPU ("cpu") based on availability.
device = "cuda" if torch.cuda.is_available() else "cpu"
st.write(f"Using device: {device}")


## Explanation:

st.set_page_config: Configures the app's title and layout.  
Device Check: We check if a GPU is available using torch.cuda.is_available(). This is critical for performance with large models.  
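
The device check can be paired with a precision choice. The sketch below is an assumption-laden illustration, not part of the app: `pick_device` and `pick_dtype_name` are hypothetical helpers, and the policy of falling back to full precision on CPU is a suggestion, since half precision on CPU can be slow or, depending on the PyTorch version, unsupported for some operations.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # Imported lazily so this sketch also runs without PyTorch.
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def pick_dtype_name(device: str) -> str:
    """Half precision on GPU, full precision on CPU (an assumed policy)."""
    return "float16" if device == "cuda" else "float32"


device = pick_device()
print(f"device={device}, dtype=torch.{pick_dtype_name(device)}")
```

In the app itself, the returned name would map to `torch.float16` or `torch.float32` when loading the language model.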

# Cell 4: Model Selection

In [None]:
# **Model Selection Sidebar:**
st.sidebar.header("Model Selection")
model_option = st.sidebar.selectbox(
    "Select the model to use", 
    options=["DeepSeek-R1-Distill-Qwen-1.5B", "Gemma-3-1b-it"]
)

# **Setting Model Paths and Configurations:**
# Use placeholders for your directories. Replace <MODEL_PATH_DEEPSEEK> and <MODEL_PATH_GEMMA> with actual paths.
if model_option == "DeepSeek-R1-Distill-Qwen-1.5B":
    MODEL_PATH = r"<MODEL_PATH_DEEPSEEK>"  # e.g., "C:\Path\To\DeepSeek-R1-Distill-Qwen-1.5B"
    disable_swa = True  # Disable sliding window attention for DeepSeek-R1.
else:
    MODEL_PATH = r"<MODEL_PATH_GEMMA>"       # e.g., "C:\Path\To\gemma-3-1b-it"
    disable_swa = False

# **Embedding Model:**
# We use the same embedding model for both options.
EMBEDDING_MODEL = "sentence-transformers/all-mpnet-base-v2"


## Explanation:

Model Selection: The sidebar allows selection between two models.  
Configuration: Depending on the selection, a model path is set and a flag (disable_swa) is used to modify the configuration.  
Placeholders: Personal directories are replaced by placeholders.  
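
As more models are added, the if/else branch can grow unwieldy. One possible alternative, sketched below, keeps the per-model settings in a lookup table. `ModelSpec`, `MODEL_REGISTRY`, and `resolve_model` are hypothetical names introduced here, and the paths are the same placeholders used in the cell above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    path: str          # Local directory of the model (placeholder here).
    disable_swa: bool  # Whether to turn off sliding window attention.


# Hypothetical registry keyed by the sidebar option text.
MODEL_REGISTRY = {
    "DeepSeek-R1-Distill-Qwen-1.5B": ModelSpec(r"<MODEL_PATH_DEEPSEEK>", True),
    "Gemma-3-1b-it": ModelSpec(r"<MODEL_PATH_GEMMA>", False),
}


def resolve_model(option: str) -> ModelSpec:
    """Look up the settings for a sidebar option, failing loudly on unknown names."""
    try:
        return MODEL_REGISTRY[option]
    except KeyError:
        raise ValueError(f"Unknown model option: {option!r}") from None


spec = resolve_model("Gemma-3-1b-it")
print(spec.path, spec.disable_swa)
```

With this layout, the selectbox options could be taken from `list(MODEL_REGISTRY)` so the menu and the settings cannot drift apart.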

# Cell 5: Text Cleaning Helpers


In [None]:
def collapse_repeated_chars(text: str, threshold=3) -> str:
    """
    **Purpose:** Collapse runs of a character that repeat 'threshold' or more times.
    
    **Example:** "aaaaa" becomes "aa" if threshold=3.
    """
    pattern = rf"(.)\1{{{threshold-1},}}"
    return re.sub(pattern, r"\1\1", text)

def clean_text(text: str) -> str:
    """
    **Purpose:** Clean raw text by:
      - Removing unwanted lines (e.g., headers like "PAGE" or copyright symbols).
      - Collapsing repeated characters.
    
    **Explanation:** Splits the text into lines, filters out those containing "PAGE" or "©", then joins and collapses repeated characters.
    """
    lines = text.split("\n")
    cleaned_lines = []
    for line in lines:
        if "PAGE" in line.upper():
            continue
        if "©" in line:
            continue
        cleaned_lines.append(line)
    joined = "\n".join(cleaned_lines)
    return collapse_repeated_chars(joined)


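## Explanation:

collapse_repeated_chars: Uses a regular expression to shorten any run of three or more identical characters to two. This applies to every character, including digits, so "1000" becomes "100".
clean_text: Drops every line that contains "page" (in any letter case) or "©", then applies the collapse function. Because this is a substring match, body lines such as "Visit our homepage" are dropped as well.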
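
A short demonstration makes the behavior of both helpers concrete, including two side effects worth knowing about. The functions below repeat the logic of the cell above in compact form; the sample strings are made up for illustration.

```python
import re


def collapse_repeated_chars(text: str, threshold: int = 3) -> str:
    # Same logic as the cell above: runs of `threshold`+ identical characters become two.
    pattern = rf"(.)\1{{{threshold - 1},}}"
    return re.sub(pattern, r"\1\1", text)


def clean_text(text: str) -> str:
    # Same logic as the cell above: drop lines with "page" (any case) or "©".
    kept = [
        line for line in text.split("\n")
        if "PAGE" not in line.upper() and "©" not in line
    ]
    return collapse_repeated_chars("\n".join(kept))


raw = "Intro\nPage 3 of 10\n© 2024 ACME\nVisit our homepage\nBody....."
print(repr(collapse_repeated_chars("Hellooooo!!!!")))  # 'Helloo!!'
print(repr(collapse_repeated_chars("1000 units")))     # '100 units' (digits collapse too)
print(repr(clean_text(raw)))                           # 'Intro\nBody..'
```

If numeric content matters in your documents, consider restricting the collapse to punctuation, or matching page markers with a stricter pattern; both would be changes to the original helpers.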

# Cell 6: Post-Processing Function


In [None]:
def post_process_response(raw_answer: str) -> str:
    """
    **Purpose:** Clean the model's raw output by:
      1. Removing any text enclosed in `<think>...</think>` tags.
      2. Removing the "Final Answer:" prefix if it appears.
    
    **Explanation:** The function uses regex to detect and remove these tags, ensuring the final output contains only the answer.
    """
    # Remove any chain-of-thought markers within <think>...</think>
    processed = re.sub(r"<think>.*?</think>", "", raw_answer, flags=re.DOTALL).strip()
    # Remove the "Final Answer:" prefix (case-insensitive)
    processed = re.sub(r"(?i)^final answer:\s*", "", processed)
    return processed


## Explanation:

post_process_response: Strips out internal reasoning markers and any leading "Final Answer:" text so that only the final answer remains. 
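
The example below runs the same function on two made-up model outputs. It also shows one limitation: the pattern only matches a complete `<think>...</think>` pair, so reasoning text that was cut off before the closing tag passes through unchanged.

```python
import re


def post_process_response(raw_answer: str) -> str:
    # Same logic as the cell above.
    processed = re.sub(r"<think>.*?</think>", "", raw_answer, flags=re.DOTALL).strip()
    return re.sub(r"(?i)^final answer:\s*", "", processed)


closed = "<think>The doc says 3 + 4.\nSo 7.</think>\nFinal Answer: 7"
unclosed = "<think>The doc says 3 + 4."  # e.g., generation stopped mid-reasoning

print(repr(post_process_response(closed)))    # '7'
print(repr(post_process_response(unclosed)))  # '<think>The doc says 3 + 4.'
```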

# Cell 7: RAG System Class Definition


In [None]:
class DocumentRAG:
    def __init__(self):
        """
        **Initialization:**
        - Loads model configuration from the chosen model path.
        - Disables sliding window attention if specified.
        - Loads the tokenizer, language model (LLM) with half-precision for efficiency, and the SentenceTransformer embedding model.
        - Initializes a FAISS index for retrieval.
        - Sets up counters and chunking parameters.
        """
        config = AutoConfig.from_pretrained(MODEL_PATH)
        if disable_swa:
            # Qwen2-style configs expose this flag as `use_sliding_window`.
            config.use_sliding_window = False

        self.tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
        self.llm = AutoModelForCausalLM.from_pretrained(
            MODEL_PATH,
            config=config,
            torch_dtype=torch.float16
        ).to(device)

        self.embedder = SentenceTransformer(EMBEDDING_MODEL, device=device)

        dim = self.embedder.get_sentence_embedding_dimension()
        self.index = faiss.IndexFlatL2(dim)
        self.doc_store = []

        self.input_tokens = 0
        self.output_tokens = 0

        self.chunk_size = 1000  # Length of each text chunk
        self.overlap = 100      # Overlap between consecutive chunks

    def _extract_text_pdf(self, file_path: str) -> str:
        """
        **Purpose:** Extracts text from a PDF using pdfplumber.
        **Explanation:** Iterates through each page and collects the text.
        """
        text = ""
        with pdfplumber.open(file_path) as pdf:
            for page in pdf.pages:
                page_text = page.extract_text() or ""
                text += page_text + "\n"
        return text

    def _extract_text_docx(self, file_path: str) -> str:
        """
        **Purpose:** Extracts text from a DOCX file.
        **Explanation:** Reads all paragraphs and joins their text.
        """
        doc = Document(file_path)
        return "\n".join(p.text for p in doc.paragraphs)

    def _extract_text_pptx(self, file_path: str) -> str:
        """
        **Purpose:** Extracts text from a PPTX file.
        **Explanation:** Iterates over slides and shapes to collect text.
        """
        prs = Presentation(file_path)
        text_runs = []
        for slide in prs.slides:
            for shape in slide.shapes:
                if hasattr(shape, "text"):
                    text_runs.append(shape.text)
        return "\n".join(text_runs)

    def _extract_text(self, file_path: str, extension: str) -> str:
        """
        **Purpose:** Determines the file type (PDF, DOCX, PPTX) and extracts text accordingly.
        **Explanation:** Calls the appropriate extraction function and then cleans the text.
        """
        extension = extension.lower()
        if extension.endswith(".pdf"):
            raw_text = self._extract_text_pdf(file_path)
        elif extension.endswith(".docx"):
            raw_text = self._extract_text_docx(file_path)
        elif extension.endswith(".pptx"):
            raw_text = self._extract_text_pptx(file_path)
        else:
            raise ValueError("Unsupported file format")
        return clean_text(raw_text)

    def _chunk_text(self, text: str) -> list:
        """
        **Purpose:** Splits the cleaned text into overlapping chunks.
        **Explanation:** This is useful to fit the text into the model's context window.
        """
        chunks = []
        stride = self.chunk_size - self.overlap
        for i in range(0, len(text), stride):
            chunks.append(text[i : i + self.chunk_size])
        return chunks

    def ingest_document(self, file_path: str, original_filename: str):
        """
        **Purpose:** Ingests a document by:
          - Extracting text based on file type.
          - Cleaning and chunking the text.
          - Generating embeddings for each chunk.
          - Storing the embeddings in the FAISS index and the raw chunks in the document store.
        """
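        ext = os.path.splitext(original_filename)[1]
        text = self._extract_text(file_path, ext)
        if not text.strip():
            # No extractable text (e.g., a scanned PDF without a text layer): nothing to index.
            return
        chunks = self._chunk_text(text)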
        embeddings = self.embedder.encode(chunks, convert_to_tensor=True)
        embeddings = embeddings.cpu().numpy().astype(np.float32)
        self.index.add(embeddings)
        self.doc_store.extend(chunks)
        self.input_tokens += sum(len(self.tokenizer.encode(chunk)) for chunk in chunks)

    def query(self, question: str, top_k: int = 3, max_length: int = 512) -> str:
        """
        **Purpose:** Answers a user query by:
          - Encoding the query.
          - Retrieving the top_k most similar text chunks from the FAISS index.
          - Building a prompt with the retrieved context.
          - Generating a response using the language model.
          - Post-processing the response to remove any unwanted markers.
        """
        if not self.doc_store:
            return "No documents have been uploaded yet. Please upload a file first."

        query_tensor = self.embedder.encode([question], convert_to_tensor=True)
        query_vec = query_tensor.cpu().numpy().astype(np.float32)
        distances, indices = self.index.search(query_vec, top_k)
        # FAISS pads missing results with -1 when fewer than top_k vectors are stored.
        retrieved_chunks = [self.doc_store[i] for i in indices[0] if 0 <= i < len(self.doc_store)]
        context = "\n".join(retrieved_chunks)

        # Build the prompt. Note that we include context and the question.
        prompt = (
            "You are a helpful assistant. Based on the context provided, give a concise final answer. "
            "Do not include your chain-of-thought or internal reasoning in the output.\n\n"
            f"Context:\n{context}\n\n"
            f"Question: {question}\n\n"
            "Final Answer:"
        )

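        inputs = self.tokenizer(prompt, return_tensors="pt")
        inputs = {k: v.to(device) for k, v in inputs.items()}
        prompt_len = inputs["input_ids"].shape[1]
        # max_new_tokens bounds only the generated text; max_length would also count the prompt.
        outputs = self.llm.generate(**inputs, max_new_tokens=max_length)
        # generate() returns prompt + continuation, so decode only the new tokens.
        new_tokens = outputs[0][prompt_len:]
        raw_answer = self.tokenizer.decode(new_tokens, skip_special_tokens=True)
        final_answer = post_process_response(raw_answer)
        self.input_tokens += prompt_len
        self.output_tokens += len(new_tokens)
        return final_answer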


## Explanation:

**Initialization:**
The `__init__` method loads the chosen language model, tokenizer, and embedding model onto the appropriate device (GPU if available). It also initializes a FAISS index and sets chunking parameters.

**Text Extraction Functions:**
These methods extract text from PDF, DOCX, and PPTX files. The extracted text is then cleaned using our clean_text helper.
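
The extension-based dispatch can also be written as a lookup table. The sketch below is a simplified illustration: the three extractor functions are hypothetical stand-ins that only label their input, whereas the real methods call pdfplumber, python-docx, and python-pptx.

```python
import os


# Hypothetical stand-ins for the real extractor methods; each just names its format.
def extract_pdf(path: str) -> str:
    return f"pdf:{path}"


def extract_docx(path: str) -> str:
    return f"docx:{path}"


def extract_pptx(path: str) -> str:
    return f"pptx:{path}"


EXTRACTORS = {".pdf": extract_pdf, ".docx": extract_docx, ".pptx": extract_pptx}


def extract_text(file_path: str, original_filename: str) -> str:
    """Pick an extractor from the original file name's extension (case-insensitive)."""
    ext = os.path.splitext(original_filename)[1].lower()
    try:
        extractor = EXTRACTORS[ext]
    except KeyError:
        raise ValueError(f"Unsupported file format: {ext or '(none)'}") from None
    return extractor(file_path)


print(extract_text("/tmp/tmpabc.pdf", "Report.PDF"))  # pdf:/tmp/tmpabc.pdf
```

The extension is taken from the original upload name rather than the temporary path, mirroring how `ingest_document` receives both values.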

**Chunking:**
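The _chunk_text method splits the cleaned text into overlapping chunks (1,000 characters each, with a 100-character overlap) so that the retrieved passages fit within the model's context window. Sizes are measured in characters, not tokens.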
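
A small worked example shows how the stride and overlap interact. The standalone function below follows the same slicing logic; the parameter guard is an addition for illustration and is not in the original method.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list:
    """Split text into character windows; consecutive windows share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    stride = chunk_size - overlap
    return [text[i : i + chunk_size] for i in range(0, len(text), stride)]


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
# ['abcd', 'defg', 'ghij', 'j']
print(len(chunk_text("x" * 2500)))  # 3 chunks, starting at offsets 0, 900, and 1800
```

The trailing `'j'` in the first example is already covered by the previous chunk; very short final chunks like this are a side effect of fixed-stride slicing.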

**Ingestion:**
The ingest_document method extracts, cleans, chunks, embeds, and stores document text.
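
Ingestion relies on one invariant: row i of the vector index always corresponds to `doc_store[i]`. The sketch below illustrates that invariant with pure-Python stand-ins. `ToyFlatL2Index` is a hypothetical brute-force substitute for the FAISS index, and `toy_embed` is a made-up embedding; neither reflects the real libraries' performance or API in detail.

```python
class ToyFlatL2Index:
    """Pure-Python stand-in for an exact (brute-force) L2 index; not FAISS itself."""

    def __init__(self, dim: int):
        self.dim = dim
        self.vectors = []

    def add(self, vectors):
        for v in vectors:
            if len(v) != self.dim:
                raise ValueError("vector has the wrong dimension")
            self.vectors.append(list(v))

    def search(self, query, k: int):
        """Return (squared_distance, row) pairs for the k nearest stored vectors."""
        scored = [
            (sum((a - b) ** 2 for a, b in zip(v, query)), row)
            for row, v in enumerate(self.vectors)
        ]
        return sorted(scored)[:k]


def toy_embed(text: str) -> list:
    """Hypothetical 3-number embedding: counts of 'a', 'b', and 'c'."""
    return [float(text.count(ch)) for ch in "abc"]


index, doc_store = ToyFlatL2Index(dim=3), []
chunks = ["aaa", "bbb", "abc"]
index.add([toy_embed(c) for c in chunks])  # row i of the index ...
doc_store.extend(chunks)                   # ... matches doc_store[i]

hits = index.search(toy_embed("aaab"), k=2)
print([(dist, doc_store[row]) for dist, row in hits])
# [(1.0, 'aaa'), (5.0, 'abc')]
```

Because vectors and chunks are appended in the same order, a row number returned by the search can be used directly as a position in the chunk list.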

**Query:**
The query method retrieves relevant text chunks using FAISS, constructs a prompt, and uses the language model to generate an answer. The answer is post-processed to remove `<think>` tags and any leading "Final Answer:" text.
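
Two details of the query path are easy to get wrong: a similarity search can return `-1` placeholders when fewer than `top_k` vectors are stored, and a causal language model's output sequence begins with the prompt tokens. The sketch below isolates both steps with plain lists. The helper names `build_prompt`, `valid_hits`, and `continuation` are hypothetical, and the token ids are made up.

```python
def build_prompt(chunks: list, question: str) -> str:
    """Assemble the same prompt layout as the query method above."""
    context = "\n".join(chunks)
    return (
        "You are a helpful assistant. Based on the context provided, give a concise final answer. "
        "Do not include your chain-of-thought or internal reasoning in the output.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n\n"
        "Final Answer:"
    )


def valid_hits(indices: list, store_size: int) -> list:
    """Keep only usable row numbers; an index may pad missing results with -1."""
    return [i for i in indices if 0 <= i < store_size]


def continuation(output_ids: list, prompt_len: int) -> list:
    """Drop the echoed prompt tokens, keeping only what the model generated."""
    return output_ids[prompt_len:]


doc_store = ["Paris is the capital of France."]
rows = valid_hits([0, -1, -1], len(doc_store))  # top_k=3, but only one chunk is stored
prompt = build_prompt([doc_store[i] for i in rows], "What is the capital of France?")
print(rows)                                    # [0]
print(prompt.endswith("Final Answer:"))        # True
print(continuation([101, 102, 103, 7, 8], 3))  # [7, 8]
```

Without the `0 <= i` check, Python's negative indexing would silently map `-1` to the last stored chunk.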



# Cell 8: Streamlit UI



In [None]:
# **Initializing the RAG System in Session State:**
if "rag" not in st.session_state:
    with st.spinner(f"Loading {model_option} model..."):
        try:
            st.session_state.rag = DocumentRAG()
        except Exception as e:
            st.error(f"Failed to initialize model: {str(e)}")
            st.stop()

# **Sidebar - Document Upload:**
with st.sidebar:
    st.header("Upload Your Documents")
    uploaded_files = st.file_uploader(
        "Upload PDFs, DOCX, or PPTX files",
        type=["pdf", "docx", "pptx"],
        accept_multiple_files=True
    )
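    if uploaded_files:
        # Streamlit reruns this script on every interaction; track ingested
        # files so the same upload is not indexed again on each rerun.
        if "ingested_files" not in st.session_state:
            st.session_state.ingested_files = set()
        for file in uploaded_files:
            if file.name in st.session_state.ingested_files:
                continue
            ext = os.path.splitext(file.name)[1]
            # Create a temporary file with the original extension
            with tempfile.NamedTemporaryFile(suffix=ext, delete=False) as tmp:
                tmp.write(file.getvalue())
            # Ingest only after the file is closed, so every byte is on disk.
            st.session_state.rag.ingest_document(tmp.name, file.name)
            os.unlink(tmp.name)
            st.session_state.ingested_files.add(file.name)
        st.success(f"Processed {len(uploaded_files)} file(s)")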
    st.markdown("---")
    st.markdown(f"**Input Tokens:** `{st.session_state.rag.input_tokens}`")
    st.markdown(f"**Output Tokens:** `{st.session_state.rag.output_tokens}`")

# **Main Chat Interface:**
st.title("🔎 INM's Private AI")

if "messages" not in st.session_state:
    st.session_state.messages = []

# **Display Chat History:**
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

# **Chat Input and Response:**
if prompt := st.chat_input("Ask a question about your documents..."):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)
    with st.spinner("Thinking..."):
        try:
            response = st.session_state.rag.query(prompt, max_length=1024)
        except Exception as e:
            response = f"Error: {str(e)}"
    with st.chat_message("assistant"):
        st.markdown(response)
    st.session_state.messages.append({"role": "assistant", "content": response})


## Explanation:

**Session State Initialization:**
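We initialize our DocumentRAG instance and store it in Streamlit’s session state to avoid reloading on every interaction. Because the instance is created only once per session, changing the model in the sidebar afterwards does not reload the model.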
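
The caching pattern can be illustrated without Streamlit. In the simulation below, a plain dictionary stands in for `st.session_state`, `run_script` stands in for one rerun of the app script, and `FakeRAG` is a hypothetical placeholder for the expensive `DocumentRAG` constructor.

```python
class FakeRAG:
    """Hypothetical stand-in for DocumentRAG that counts how often it is built."""
    loads = 0

    def __init__(self):
        FakeRAG.loads += 1


def run_script(session_state: dict) -> None:
    """One simulated rerun of the app script."""
    if "rag" not in session_state:
        session_state["rag"] = FakeRAG()


session_state = {}          # stands in for st.session_state
for _ in range(3):          # three user interactions -> three reruns
    run_script(session_state)
print(FakeRAG.loads)        # 1
```

The model is built on the first run only; later reruns find the existing entry and skip the constructor.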

**File Upload Section (Sidebar):**
Users can upload multiple documents. Each document is temporarily saved (with its original extension preserved) and processed.
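
When saving an upload to a temporary file, it is safest to close the file before another library reopens it by name, so that buffered bytes are flushed to disk. The sketch below shows one way to structure this with the standard library. `with_temp_copy` and `file_size` are hypothetical helpers, with `file_size` standing in for the ingestion step.

```python
import os
import tempfile


def with_temp_copy(data: bytes, suffix: str, process):
    """Write `data` to a temp file, close it, run `process(path)`, then delete the file."""
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(data)
        path = tmp.name
    # The file is closed here, so buffered bytes are flushed before it is reopened.
    try:
        return process(path)
    finally:
        os.unlink(path)  # Clean up even if processing raised.


def file_size(path: str) -> int:
    return os.path.getsize(path)


print(with_temp_copy(b"hello", ".pdf", file_size))  # 5
```

The `try`/`finally` ensures the temporary file is removed even when processing fails, which matters for long-running sessions with many uploads.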

**Chat Interface:**
Displays past messages and provides an input field for new queries. When a query is submitted, it is processed, and the response is displayed.

**Token Counters:**
The sidebar shows input and output token counts (useful for monitoring model usage).



# Cell 9: How to Run the App (Markdown)


## **How to Run the App**

1. **Save this notebook as a Python script** (e.g., `rag_app.py`) or export it as one.
2. **Open your Command Prompt or Anaconda Prompt** and navigate to the directory containing your script.  
   Example:
   ```bash
   cd <YOUR_PROJECT_DIRECTORY>
   ```
3. **Run the Streamlit app** using `python -m streamlit run rag_app.py`.
4. **Access the app in your browser** (usually at http://localhost:8501).