# üöÄ RAG Pipeline - Kaggle Backend with Ngrok

This notebook sets up a complete RAG backend with FastAPI and exposes it via ngrok.

**Prerequisites:**
1. Enable **Internet** in Kaggle notebook settings
2. Get your ngrok auth token from: https://dashboard.ngrok.com/get-started/your-authtoken

**Run cells in order!**

In [109]:
# CELL 1: Install Dependencies
!pip install fastapi uvicorn pyngrok python-multipart --quiet
!pip install torch transformers faiss-cpu rank_bm25 rouge_score sentence-transformers PyPDF2 --quiet
!pip install scikit-learn psutil nltk pydantic --quiet
!pip install torch transformers faiss-cpu rank_bm25 rouge_score sentence-transformers PyPDF2 --quiet
!pip install spacy

print("‚úÖ All dependencies installed!")

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Av

‚úÖ All dependencies installed!


In [None]:
# CELL 2: Configure Ngrok
from pyngrok import ngrok, conf
from huggingface_hub import login
login("hf_token_here")

# ‚ö†Ô∏è REPLACE WITH YOUR NGROK TOKEN!
NGROK_AUTH_TOKEN = "ngrok_auth_token_here"

conf.get_default().auth_token = NGROK_AUTH_TOKEN
print("‚úÖ Ngrok configured successfully!")
print("üìù Don't have a token? Get one at: https://dashboard.ngrok.com/signup")

‚úÖ Ngrok configured successfully!
üìù Don't have a token? Get one at: https://dashboard.ngrok.com/signup


In [111]:
import re
import time
import numpy as np
import spacy

from PyPDF2 import PdfReader
from transformers import AutoTokenizer, AutoModel, AutoModelForSeq2SeqLM
from sklearn.metrics.pairwise import cosine_similarity
from rank_bm25 import BM25Okapi
import torch
import os
import pickle
from rouge_score import rouge_scorer
from sentence_transformers.util import cos_sim

try:
    from sentence_transformers import CrossEncoder, SentenceTransformer, util
except ImportError:
    print("Warning: sentence_transformers not available. Adaptive chunking methods may fail.")
try:
    import psutil
except ImportError:
    print("Warning: psutil not available. Resource usage metrics will be set to 0.0.")
    psutil = None
try:
    import faiss
except ImportError:
    print("Warning: faiss not available. FAISS retrieval disabled.")
    faiss = None
import nltk
from nltk.tokenize import sent_tokenize
from collections import Counter
import math
import matplotlib.pyplot as plt
import itertools
import json

nltk.download('punkt', quiet=True)

True

In [112]:
from io import BytesIO

In [113]:
# CELL 4: Paste Your RAG Code Here
# Copy all your chunking functions and OptimizedRAG class from your existing notebook
# For example:

def clean_text(text):
    text = re.sub(r'\s+', ' ', text.strip())
    text = re.sub(r'[^\x20-\x7E]', '', text)
    return text

def read_pdf_from_bytes(pdf_bytes):
    """Read PDF from bytes"""
    try:
        pdf_file = BytesIO(pdf_bytes)
        reader = PdfReader(pdf_file)
        pages = []
        for page in reader.pages:
            text = page.extract_text()
            if text:
                pages.append(text)
        return pages
    except Exception as e:
        print(f"Error: {e}")
        return []

# TODO: Add your chunking methods here
# - chunk_with_overlap
def chunk_with_overlap(text, chunk_size=150, overlap=70, chunk_limit=150):
    text = clean_text(text)
    tokens = text.split()
    chunks = []
    start = 0
    while start < len(tokens):
        end = min(start + chunk_size, len(tokens))
        chunk = tokens[start:end]
        chunks.append(" ".join(chunk))
        start += chunk_size - overlap
        if start >= len(tokens):
            break
    return chunks[:chunk_limit] if chunk_limit else chunks

#adaptive overlap chunking
def adaptive_overlap_chunking(text_pages, chunk_size=300, min_overlap=30, max_overlap=80, chunk_limit=1000):
    try:
        model = SentenceTransformer('all-MiniLM-L6-v2')
    except:
        print("SentenceTransformer unavailable. Skipping adaptive chunking.")
        return []
    if isinstance(text_pages, list):
        text = ' '.join(str(page) for page in text_pages if page)
    else:
        text = str(text_pages)
    text = clean_text(text)
    paragraphs = re.split(r'\n\n+', text)
    chunks = []
    for para in paragraphs:
        if not para.strip():
            continue
        sentences = re.split(r'(?<=[.!?]) +', para)
        current = []
        current_word_count = 0
        for sent in sentences:
            sent = sent.strip()
            if not sent:
                continue
            sent_word_count = len(sent.split())
            if current_word_count + sent_word_count > chunk_size:
                if current:
                    chunk_text = " ".join(current)
                    chunks.append(chunk_text)
                    if len(chunks) >= 2:
                        try:
                            prev_chunk = chunks[-2]
                            curr_chunk = chunks[-1]
                            embeddings = model.encode([prev_chunk, curr_chunk], 
                                                    convert_to_tensor=True, show_progress_bar=False)
                            similarity = util.cos_sim(embeddings[0], embeddings[1]).item()
                            overlap_words_count = int(min_overlap + (max_overlap - min_overlap) * (1 - similarity))
                            overlap_words_count = max(min_overlap, min(overlap_words_count, max_overlap))
                            overlap_words_count = min(overlap_words_count, len(chunk_text.split()))
                        except Exception as e:
                            print(f"Similarity calculation failed: {e}")
                            overlap_words_count = min_overlap
                    else:
                        overlap_words_count = min_overlap
                    overlap_words = " ".join(chunk_text.split()[-overlap_words_count:])
                    current = [overlap_words, sent] if overlap_words else [sent]
                    current_word_count = len(overlap_words.split()) + sent_word_count if overlap_words else sent_word_count
                else:
                    current = [sent]
                    current_word_count = sent_word_count
            else:
                current.append(sent)
                current_word_count += sent_word_count
        if current:
            chunks.append(" ".join(current))
    return chunks[:chunk_limit] if chunk_limit else chunks


def improved_sentence_adaptive_chunking_wrt_sentence_density(
    text_pages,
    target_sentences=15,
    min_overlap=2,
    max_overlap=8,
    alpha=1.25,
    similarity_model='all-MiniLM-L6-v2',
    verbose=False
):
    try:
        model = SentenceTransformer(similarity_model)
    except:
        print("SentenceTransformer unavailable. Skipping improved_sentence_adaptive_wrt_sentence_density chunking.")
        return []
    if isinstance(text_pages, list):
        text = ' '.join(str(p) for p in text_pages if p)
    else:
        text = str(text_pages)
    text = re.sub(r'\s+', ' ', text).strip()
    sentences = sent_tokenize(text)
    sentences = [s.strip() for s in sentences if s.strip()]
    merged_sentences = []
    i = 0
    while i < len(sentences):
        if len(sentences[i].split()) < 6 and i + 1 < len(sentences):
            merged_sentences.append(sentences[i] + " " + sentences[i+1])
            i += 2
        else:
            merged_sentences.append(sentences[i])
            i += 1
    sentences = merged_sentences
    avg_words_per_sentence = np.mean([len(s.split()) for s in sentences])
    dynamic_target_sentences = max(8, int(150 / avg_words_per_sentence))
    words = [w.lower() for s in sentences for w in re.findall(r'\b\w+\b', s)]
    freq = Counter(words)
    important_keywords = {w for w, c in freq.items() if c >= 3 and len(w) > 3}
    chunks = []
    start = 0
    while start < len(sentences):
        end = min(start + dynamic_target_sentences, len(sentences))
        chunk = sentences[start:end]
        if end < len(sentences):
            for kw in important_keywords:
                if (sentences[end-1].lower().endswith(kw) or sentences[end].lower().startswith(kw)) and end + 1 < len(sentences):
                    end += 1
        chunk = sentences[start:end]
        chunks.append(' '.join(chunk))
        if end >= len(sentences):
            break
        next_start = end
        next_end = min(next_start + dynamic_target_sentences, len(sentences))
        next_chunk_preview = sentences[next_start:next_end]
        try:
            embeddings = model.encode([' '.join(chunk), ' '.join(next_chunk_preview)], convert_to_tensor=True)
            similarity = util.cos_sim(embeddings[0], embeddings[1]).item()
            overlap_sentences = int(min_overlap + (max_overlap - min_overlap) * (1 - similarity) ** alpha)
            overlap_sentences = max(min_overlap, min(overlap_sentences, max_overlap))
        except:
            overlap_sentences = min_overlap
        if overlap_sentences >= end - start:
            overlap_sentences = min_overlap
        start = end - overlap_sentences
    if len(chunks) > 1 and len(chunks[-1].split()) < (0.6 * dynamic_target_sentences * avg_words_per_sentence):
        chunks[-2] += " " + chunks[-1]
        chunks.pop()
    final_chunks = []
    seen = set()
    for chunk in chunks:
        lines = chunk.split('. ')
        unique_lines = []
        for l in lines:
            if l not in seen:
                unique_lines.append(l)
                seen.add(l)
        final_chunks.append('. '.join(unique_lines))
    return final_chunks

# - Gradient_chunking
def Gradient_chunking(
    text_pages,
    target_sentences=15,
    min_overlap=2,
    max_overlap=8,
    alpha=1.25,
    similarity_model='all-MiniLM-L6-v2',
    verbose=True
):
    """
    Enhanced Sentence-Adaptive Chunking with CADS + AOSG:
    ‚úÖ Content-Aware Dynamic Sizing (CADS): Adjusts chunk size based on entity/keyword density
    ‚úÖ Adaptive Overlap Smoothing via Semantic Gradient (AOSG): Overlap based on similarity gradient
    ‚úÖ Prevents out-of-index errors during keyword anchoring and overlaps
    ‚úÖ Merges tiny sentences (<6 words) with next
    ‚úÖ Handles small last chunk gracefully
    ‚úÖ Reduces redundancy (removes duplicate sentences across chunks)
    """

    # ‚úÖ Combine text
    if isinstance(text_pages, list):
        text = ' '.join(str(p) for p in text_pages if p)
    else:
        text = str(text_pages)
    text = re.sub(r'\s+', ' ', text).strip()

    # ‚úÖ Handle empty or invalid input
    if not text:
        if verbose:
            print("[INFO] Empty or invalid input provided.")
        return []

    # ‚úÖ Split into sentences
    try:
        sentences = sent_tokenize(text)
    except Exception as e:
        if verbose:
            print(f"[WARNING] Sentence tokenization failed: {e}. Falling back to regex-based splitting.")
        sentences = re.split(r'[.!?]+\s+', text)
    sentences = [s.strip() for s in sentences if s.strip()]

    # ‚úÖ Merge tiny sentences (<6 words) with next
    merged_sentences = []
    i = 0
    while i < len(sentences):
        if len(sentences[i].split()) < 6 and i + 1 < len(sentences):
            merged_sentences.append(sentences[i] + " " + sentences[i+1])
            i += 2
        else:
            merged_sentences.append(sentences[i])
            i += 1
    sentences = merged_sentences

    if verbose:
        print(f"[INFO] After merging tiny sentences: {len(sentences)} sentences")

    # ‚úÖ Calculate average sentence length density (words per sentence)
    avg_words_per_sentence = np.mean([len(s.split()) for s in sentences])
    base_target_sentences = max(8, int(150 / avg_words_per_sentence))  # Baseline ~150 words
    if verbose:
        print(f"[INFO] Avg words/sentence: {avg_words_per_sentence:.2f}, base target: {base_target_sentences} sentences")

    # ‚úÖ Extract important keywords for anchoring (frequency-based)
    words = [w.lower() for s in sentences for w in re.findall(r'\b\w+\b', s)]
    freq = Counter(words)
    important_keywords = {w for w, c in freq.items() if c >= 3 and len(w) > 3}

    # ‚úÖ Compute content density using entities and keywords (CADS)
    nlp = spacy.load("en_core_web_sm", disable=["parser", "lemmatizer"])  # Fast NER
    content_density = []
    for sent in sentences:
        doc = nlp(sent)
        entity_count = len([ent for ent in doc.ents if ent.label_ in ["PERSON", "ORG", "GPE"]])
        keyword_count = sum(1 for w in re.findall(r'\b\w+\b', sent.lower()) if w in important_keywords)
        density = entity_count + keyword_count * 0.5  # Weight entities higher
        content_density.append(density)
    
    # Normalize density to scale chunk sizes
    max_density = max(content_density) if content_density else 1
    chunk_sizes = [max(4, min(base_target_sentences * 2, int(base_target_sentences * (1 + d / max_density)))) 
                   for d in content_density]  # Scale between 0.5x and 2x base

    model = SentenceTransformer(similarity_model)
    sentence_embeddings = model.encode(sentences, convert_to_tensor=True)
    
    chunks = []
    start = 0
    i = 0

    while start < len(sentences):
        # ‚úÖ CADS: Use content-aware chunk size
        chunk_size = chunk_sizes[min(i, len(chunk_sizes)-1)] if i < len(chunk_sizes) else base_target_sentences
        end = min(start + chunk_size, len(sentences))
        
        # ‚úÖ Ensure keywords are not split between chunks
        if end < len(sentences):
            for kw in important_keywords:
                if (sentences[end-1].lower().endswith(kw) or sentences[end].lower().startswith(kw)) and end + 1 < len(sentences):
                    end += 1

        chunk = sentences[start:end]
        chunks.append(' '.join(chunk))

        if end >= len(sentences):
            break

        # ‚úÖ AOSG: Compute semantic gradient for overlap
        next_start = end
        next_end = min(next_start + base_target_sentences, len(sentences))
        next_chunk_preview = sentences[next_start:next_end]

        # Compute similarities over a window around the boundary
        window_size = max_overlap * 2  # Look back/forward
        sim_window_start = max(0, end - window_size)
        sim_window_end = min(len(sentences) - 1, end + window_size)
        similarities = []
        for j in range(sim_window_start, sim_window_end):
            if j + 1 < len(sentences):
                sim = cos_sim(sentence_embeddings[j], sentence_embeddings[j + 1]).item()
                similarities.append(sim)
        
        # Gradient: Rate of similarity change
        gradients = [abs(similarities[j] - similarities[j - 1]) for j in range(1, len(similarities))]
        if gradients:
            avg_gradient = np.mean(gradients)
            # Scale overlap by gradient: higher gradient (rapid change) -> larger overlap
            #overlap_sentences = int(min_overlap + (max_overlap - min_overlap) * min(avg_gradient / 0.5, 1))
            overlap_sentences = int(min_overlap + (max_overlap - min_overlap) * min(avg_gradient / 0.5, 1) )

        else:
            overlap_sentences = min_overlap

        overlap_sentences = max(min_overlap, min(overlap_sentences, max_overlap))
        
        # ‚úÖ Prevent negative or too large overlaps
        if overlap_sentences >= end - start:
            overlap_sentences = min_overlap

        start = end - overlap_sentences
        i += 1

        if verbose:
            print(f"[INFO] Chunk {len(chunks)}: {chunk_size} sentences (density: {content_density[start]:.2f}), overlap {overlap_sentences} (gradient: {avg_gradient:.3f} if defined)")

    # ‚úÖ Fix last small chunk (<60% of target)
    if len(chunks) > 1 and len(chunks[-1].split()) < (0.6 * base_target_sentences * avg_words_per_sentence):
        chunks[-2] += " " + chunks[-1]
        chunks.pop()
        if verbose:
            print("[INFO] Last chunk merged (too small)")

    # ‚úÖ Post-process redundancy
    final_chunks = []
    seen = set()
    for chunk in chunks:
        lines = chunk.split('. ')
        unique_lines = []
        for l in lines:
            if l not in seen:
                unique_lines.append(l)
                seen.add(l)
        final_chunks.append('. '.join(unique_lines))

    if verbose:
        print(f"[DONE] Total chunks: {len(final_chunks)}")

    return final_chunks

# - Gradient_chunking_final

def Gradient_chunking_final(
    text_pages,
    target_sentences=15,
    min_overlap=2,
    max_overlap=8,
    alpha=1.25,
    similarity_model='all-MiniLM-L6-v2',
    verbose=True
):
    """
    Enhanced Sentence-Adaptive Chunking with Adaptive Overlap Smoothing via Semantic Gradient (AOSG):
    ‚úÖ Uses average sentence length density to adjust chunk size dynamically
    ‚úÖ Prevents out-of-index errors during keyword anchoring and overlaps
    ‚úÖ Smart overlap based on semantic similarity gradient (novel: AOSG)
    ‚úÖ Merges tiny sentences (<6 words) with next
    ‚úÖ Handles small last chunk gracefully
    ‚úÖ Reduces redundancy (removes duplicate sentences across chunks)
    """

    # ‚úÖ Combine text
    if isinstance(text_pages, list):
        text = ' '.join(str(p) for p in text_pages if p)
    else:
        text = str(text_pages)
    text = re.sub(r'\s+', ' ', text).strip()

    # ‚úÖ Split into sentences
    sentences = sent_tokenize(text)
    sentences = [s.strip() for s in sentences if s.strip()]

    # ‚úÖ Merge tiny sentences (<6 words) with next
    merged_sentences = []
    i = 0
    while i < len(sentences):
        if len(sentences[i].split()) < 6 and i + 1 < len(sentences):
            merged_sentences.append(sentences[i] + " " + sentences[i+1])
            i += 2
        else:
            merged_sentences.append(sentences[i])
            i += 1
    sentences = merged_sentences

    if verbose:
        print(f"[INFO] After merging tiny sentences: {len(sentences)} sentences")

    # ‚úÖ Calculate average sentence length density (words per sentence)
    avg_words_per_sentence = np.mean([len(s.split()) for s in sentences])
    dynamic_target_sentences = max(8, int(150 / avg_words_per_sentence))  # aim for ~150 words per chunk
    if verbose:
        print(f"[INFO] Avg words/sentence: {avg_words_per_sentence:.2f}, dynamic target: {dynamic_target_sentences} sentences")

    # ‚úÖ Extract important keywords for anchoring (simple frequency-based)
    words = [w.lower() for s in sentences for w in re.findall(r'\b\w+\b', s)]
    freq = Counter(words)
    important_keywords = {w for w, c in freq.items() if c >= 3 and len(w) > 3}

    model = SentenceTransformer(similarity_model)
    
    # ‚úÖ Precompute sentence embeddings for efficiency
    sentence_embeddings = model.encode(sentences, convert_to_tensor=True)
    
    chunks = []
    start = 0

    while start < len(sentences):
        end = min(start + dynamic_target_sentences, len(sentences))
        
        # ‚úÖ Ensure keywords are not split between chunks (anchor adjustment)
        if end < len(sentences):
            for kw in important_keywords:
                if (sentences[end-1].lower().endswith(kw) or sentences[end].lower().startswith(kw)) and end + 1 < len(sentences):
                    end += 1

        chunk = sentences[start:end]
        chunks.append(' '.join(chunk))

        if end >= len(sentences):
            break

        # ‚úÖ Compute semantic gradient for smarter overlap (AOSG)
        next_start = end
        next_end = min(next_start + dynamic_target_sentences, len(sentences))
        next_chunk_preview = sentences[next_start:next_end]

        # Compute similarities over a window around the boundary
        window_size = max_overlap * 2  # Look back/forward
        sim_window_start = max(0, end - window_size)
        sim_window_end = min(len(sentences) - 1, end + window_size)
        similarities = []
        for i in range(sim_window_start, sim_window_end):
            if i + 1 < len(sentences):
                sim = cos_sim(sentence_embeddings[i], sentence_embeddings[i + 1]).item()
                similarities.append(sim)
        
        # Gradient: Rate of similarity change
        gradients = [abs(similarities[i] - similarities[i - 1]) for i in range(1, len(similarities))]
        if gradients:
            avg_gradient = np.mean(gradients)
            # Scale overlap by gradient: higher gradient (rapid change) -> larger overlap
            overlap_sentences = int(min_overlap + (max_overlap - min_overlap) * min(avg_gradient / 0.5, 1))
        else:
            overlap_sentences = min_overlap

        overlap_sentences = max(min_overlap, min(overlap_sentences, max_overlap))
        
        # ‚úÖ Prevent negative or too large overlaps
        if overlap_sentences >= end - start:
            overlap_sentences = min_overlap

        start = end - overlap_sentences

        if verbose:
            print(f"[INFO] Chunk {len(chunks)}: overlap {overlap_sentences} sentences (gradient: {avg_gradient:.3f} if defined)")

    # ‚úÖ Fix last small chunk (<60% of target)
    if len(chunks) > 1 and len(chunks[-1].split()) < (0.6 * dynamic_target_sentences * avg_words_per_sentence):
        chunks[-2] += " " + chunks[-1]
        chunks.pop()
        if verbose:
            print("[INFO] Last chunk merged (too small)")

    # ‚úÖ Post-process redundancy (remove excessive duplicate sentences)
    final_chunks = []
    seen = set()
    for chunk in chunks:
        lines = chunk.split('. ')
        unique_lines = []
        for l in lines:
            if l not in seen:
                unique_lines.append(l)
                seen.add(l)
        final_chunks.append('. '.join(unique_lines))

    if verbose:
        print(f"[DONE] Total chunks: {len(final_chunks)}")

    return final_chunks


# - etc.

# TODO: Add your evaluate_chunk_quality function here
def evaluate_chunk_quality(chunks, text):
    if isinstance(text, list):
        text = ' '.join(str(page) for page in text if page)
    text = str(text).strip()
    try:
        model = SentenceTransformer('all-MiniLM-L6-v2')
        chunk_embeddings = model.encode(chunks, convert_to_tensor=True, show_progress_bar=False)
        text_embedding = model.encode([text], convert_to_tensor=True, show_progress_bar=False)[0]
        coherence_scores = []
        for i in range(len(chunks) - 1):
            sim = util.cos_sim(chunk_embeddings[i], chunk_embeddings[i + 1]).item()
            coherence_scores.append(sim)
        avg_coherence = np.mean(coherence_scores) if coherence_scores else 0.0
        similarities = util.cos_sim(chunk_embeddings, text_embedding.unsqueeze(0))
        first_words = set(text.lower().split()[:10])
        term_presence = sum(1 for chunk in chunks if any(word in chunk.lower() for word in first_words)) / len(chunks) if chunks else 0.0
        avg_context_preservation = term_presence
    except:
        coherence_scores = []
        for i in range(len(chunks) - 1):
            set1 = set(chunks[i].lower().split())
            set2 = set(chunks[i + 1].lower().split())
            sim = len(set1 & set2) / len(set1 | set2) if set1 | set2 else 0
            coherence_scores.append(sim)
        avg_coherence = np.mean(coherence_scores) if coherence_scores else 0.0
        first_words = set(text.lower().split()[:10])
        term_presence = sum(1 for chunk in chunks if any(word in chunk.lower() for word in first_words)) / len(chunks) if chunks else 0.0
        avg_context_preservation = term_presence
    chunk_lengths = [len(chunk.split()) for chunk in chunks]
    avg_chunk_size = np.mean(chunk_lengths) if chunk_lengths else 0.0
    std_chunk_size = np.std(chunk_lengths) if chunk_lengths else 0.0
    size_consistency = std_chunk_size / avg_chunk_size if avg_chunk_size > 0 else 0
    redundancy_score = 0
    for i in range(len(chunks) - 1):
        set1 = set(chunks[i].split())
        set2 = set(chunks[i + 1].split())
        if set1:
            redundancy_score += len(set1 & set2) / len(set1)
    redundancy_score /= (len(chunks) - 1) if len(chunks) > 1 else 0
    original_words = set(text.split())
    chunk_words = set(" ".join(chunks).split())
    coverage = len(chunk_words) / len(original_words) if original_words else 0
    compression_ratio = len(" ".join(chunks)) / len(text) if len(text) > 0 else 0
    try:
        sentences = [s.strip() for s in re.split(r'(?<=[.!?]) +', text) if s.strip()]
        sentence_embeddings = model.encode(sentences, convert_to_tensor=True)
        semantic_scores = []
        for sent_emb in sentence_embeddings:
            sim = util.cos_sim(sent_emb, chunk_embeddings).max().item()
            semantic_scores.append(sim)
        semantic_coverage = np.mean(semantic_scores) if semantic_scores else 0
    except:
        semantic_coverage = 0.0
    def shannon_entropy(text_segment):
        words = text_segment.split()
        if not words:
            return 0
        counts = Counter(words)
        total = len(words)
        entropy = -sum((count / total) * math.log2(count / total) for count in counts.values())
        return entropy
    info_density_scores = [shannon_entropy(chunk) for chunk in chunks]
    avg_info_density = np.mean(info_density_scores) if info_density_scores else 0
    weights = {
        "coherence": 0.25,
        "context_preservation": 0.25,
        "coverage": 0.20,
        "semantic_coverage": 0.20,
        "redundancy": -0.10
    }
    weighted_score = (
        avg_coherence * weights["coherence"] +
        avg_context_preservation * weights["context_preservation"] +
        coverage * weights["coverage"] +
        semantic_coverage * weights["semantic_coverage"] +
        redundancy_score * weights["redundancy"]
    )
    return {
        "avg_coherence": avg_coherence,
        "context_preservation": avg_context_preservation,
        "avg_chunk_size": avg_chunk_size,
        "size_consistency": size_consistency,
        "redundancy": redundancy_score,
        "coverage": coverage,
        "compression_ratio": compression_ratio,
        "semantic_coverage": semantic_coverage,
        "avg_information_density": avg_info_density,
        "weighted_score": weighted_score
    }


# TODO: Add your OptimizedRAG class here
class OptimizedRAG:
    def __init__(self, documents, chunk_size=300, overlap=50, top_k=10, embed_model="all-MiniLM-L6-v2", llm_model="google/flan-t5-large", device=None,
                 use_bm25=True, use_cosine=True, use_faiss=False, use_quantization=False, use_embedding_cache=True, use_batch_embedding=True,
                 use_recursive_chunking=True, generation_mode="beam", rerank_enabled=True, rerank_model="cross-encoder/ms-marco-MiniLM-L-6-v2",
                 rerank_top_k=8, chunk_limit=1000, chunking_method="adaptive"):
        self.device = device or "cpu"
        if not documents:
            raise ValueError("No documents provided.")
        chunk_funcs = {
            "fixed": lambda x: chunk_with_overlap(x, chunk_size, overlap, chunk_limit),
            "adaptive": lambda x: adaptive_overlap_chunking(x, chunk_size, overlap, overlap * 3, chunk_limit),
            "improved_sentence_adaptive_chunking_wrt_sentence_density": lambda x: improved_sentence_adaptive_chunking_wrt_sentence_density(x, target_sentences=chunk_size//20, min_overlap=max(1, overlap//20), max_overlap=max(2, overlap//10)),
             "Gradient_chunking": lambda x: Gradient_chunking(x, target_sentences=chunk_size//20, min_overlap=max(1, overlap//20), max_overlap=max(2, overlap//10),alpha = 1.25),
             "Gradient_chunking_final": lambda x: Gradient_chunking_final(x, target_sentences=chunk_size//20, min_overlap=max(1, overlap//20), max_overlap=max(2, overlap//10))


        }
        self.chunking_method = chunking_method
        all_chunks = []
        for doc in documents:
            chunks = chunk_funcs[chunking_method](doc)
            all_chunks.extend(chunks)
        self.chunks = all_chunks
        if not self.chunks:
            raise ValueError("No chunks generated from documents.")
        self.sentences = []
        for chunk in self.chunks:
            sentences = re.split(r'(?<=[.!?]) +', chunk)
            cleaned = [re.sub(r"http\S+|www\.\S+", "", s).strip() for s in sentences if len(s.strip()) > 20]
            self.sentences.extend(cleaned)
        self.use_bm25 = use_bm25
        if use_bm25:
            tokenized_chunks = [c.split() for c in self.chunks]
            self.bm25 = BM25Okapi(tokenized_chunks)
        self.use_cosine = use_cosine
        self.use_faiss = use_faiss and faiss is not None
        self.embed_models = {
            "all-MiniLM-L6-v2": "sentence-transformers/all-MiniLM-L6-v2",
            "all-mpnet-base-v2": "sentence-transformers/all-mpnet-base-v2"
        }
        try:
            self.tokenizer = AutoTokenizer.from_pretrained(self.embed_models.get(embed_model, "sentence-transformers/all-MiniLM-L6-v2"))
            self.model = AutoModel.from_pretrained(self.embed_models.get(embed_model, "sentence-transformers/all-MiniLM-L6-v2")).to(self.device)
        except:
            print("Warning: Embedding model loading failed. Using dummy embeddings.")
            self.chunk_embeddings = np.random.rand(len(self.chunks), 384)
        self.use_embedding_cache = use_embedding_cache
        cache_file = f"embeddings_{embed_model}.pkl"
        self.chunk_embeddings = None
        if use_embedding_cache and os.path.exists(cache_file):
            try:
                with open(cache_file, "rb") as f:
                    self.chunk_embeddings = pickle.load(f)
            except Exception as e:
                print(f"Error loading cache: {e}. Computing new embeddings.")
        if self.chunk_embeddings is None:
            self.chunk_embeddings = self._encode_chunks(use_batch_embedding)
            if use_embedding_cache:
                try:
                    with open(cache_file, "wb") as f:
                        pickle.dump(self.chunk_embeddings, f)
                except Exception as e:
                    print(f"Error saving cache: {e}")
        if self.use_faiss:
            try:
                dimension = self.chunk_embeddings.shape[1]
                self.index = faiss.IndexFlatL2(dimension)
                self.index.add(self.chunk_embeddings)
            except Exception as e:
                print(f"FAISS index initialization failed: {e}. Disabling FAISS.")
                self.use_faiss = False
                self.use_cosine = True
        self.llm_models = {
            "google/flan-t5-large": "google/flan-t5-large"
        }
        try:
            self.llm_tokenizer = AutoTokenizer.from_pretrained(self.llm_models.get(llm_model, "google/flan-t5-large"))
            self.llm_model = AutoModelForSeq2SeqLM.from_pretrained(self.llm_models.get(llm_model, "google/flan-t5-large")).to(self.device)
        except:
            print("Warning: LLM model loading failed. Generation may fail.")
        self.use_quantization = use_quantization
        if use_quantization and self.device == "cpu":
            try:
                self.llm_model = torch.quantization.quantize_dynamic(self.llm_model, {torch.nn.Linear}, dtype=torch.qint8)
            except Exception as e:
                print(f"Quantization failed: {e}. Proceeding without quantization.")
                self.use_quantization = False
        self.generation_mode = generation_mode
        self.top_k = top_k
        self.rerank_enabled = rerank_enabled
        if rerank_enabled:
            self.rerank_model = rerank_model
            self.rerank_top_k = rerank_top_k
            try:
                self.reranker = CrossEncoder(rerank_model)
            except:
                print("Warning: CrossEncoder unavailable. Disabling reranking.")
                self.rerank_enabled = False

    def _encode_chunks(self, use_batch_embedding):
        embeddings = []
        try:
            with torch.no_grad():
                if use_batch_embedding:
                    batch_size = 16
                    for i in range(0, len(self.chunks), batch_size):
                        batch = self.chunks[i:i + batch_size]
                        inputs = self.tokenizer(batch, return_tensors="pt", truncation=True, padding=True).to(self.device)
                        outputs = self.model(**inputs)
                        emb = outputs.last_hidden_state.mean(dim=1).cpu().numpy()
                        embeddings.append(emb)
                    embeddings = np.vstack(embeddings)
                else:
                    for text in self.chunks:
                        inputs = self.tokenizer(text, return_tensors="pt", truncation=True, padding=True).to(self.device)
                        outputs = self.model(**inputs)
                        emb = outputs.last_hidden_state.mean(dim=1).squeeze().cpu().numpy()
                        embeddings.append(emb)
                    embeddings = np.array(embeddings)
        except:
            print("Embedding failed. Using dummy embeddings.")
            embeddings = np.random.rand(len(self.chunks), 384)
        return embeddings

    def retrieve(self, query):
        start_time = time.time()
        try:
            with torch.no_grad():
                inputs = self.tokenizer(query, return_tensors="pt", truncation=True, padding=True).to(self.device)
                outputs = self.model(**inputs)
                query_emb = outputs.last_hidden_state.mean(dim=1).squeeze().cpu().numpy()
        except:
            print("Query embedding failed. Using dummy query embedding.")
            query_emb = np.random.rand(384)
        top_chunks = []
        bm25_chunks = []
        if self.use_bm25:
            bm25_scores = self.bm25.get_scores(query.split())
            bm25_indices = np.argsort(bm25_scores)[-self.top_k:][::-1]
            bm25_chunks = [self.chunks[i] for i in bm25_indices]
        if self.use_cosine:
            sims = cosine_similarity(query_emb.reshape(1, -1), self.chunk_embeddings)[0]
            top_indices = np.argsort(sims)[-self.top_k:][::-1]
            top_chunks = [self.chunks[i] for i in top_indices]
        if self.use_faiss:
            try:
                _, indices = self.index.search(query_emb.reshape(1, -1), self.top_k)
                top_chunks = [self.chunks[i] for i in indices[0]]
            except Exception as e:
                print(f"FAISS search failed: {e}. Falling back to cosine similarity.")
                self.use_faiss = False
                self.use_cosine = True
                sims = cosine_similarity(query_emb.reshape(1, -1), self.chunk_embeddings)[0]
                top_indices = np.argsort(sims)[-self.top_k:][::-1]
                top_chunks = [self.chunks[i] for i in top_indices]
        if (self.use_cosine or self.use_faiss) and self.use_bm25:
            merged = list(dict.fromkeys(top_chunks + bm25_chunks))
            retrieved = merged[:self.top_k]
        elif self.use_bm25:
            retrieved = bm25_chunks
        else:
            retrieved = top_chunks
        if self.rerank_enabled:
            try:
                pairs = [[query, chunk] for chunk in retrieved]
                scores = self.reranker.predict(pairs)
                reranked_indices = np.argsort(scores)[::-1][:self.rerank_top_k]
                retrieved = [retrieved[i] for i in reranked_indices]
                query_terms = set(query.lower().split())
                retrieved = [chunk for chunk in retrieved if any(term in chunk.lower() for term in query_terms) and 
                             scores[reranked_indices[retrieved.index(chunk)]] > 0.25]
            except:
                print("Reranking failed. Using original retrieved chunks.")
        return retrieved

    def clean_chunk(self, chunk):
        cleaned = re.sub(r'[\x00-\x1f\x7f-\xff]', ' ', chunk)
        cleaned = re.sub(r'\\n|\\uf0b7|\s+', ' ', cleaned)
        cleaned = re.sub(r'\s+', ' ', ''.join(c for c in cleaned if not c.isspace() or c == ' '))
        cleaned = cleaned.strip()
        return cleaned

    def generate_with_llm(self, query, context):
        if not context:
            context = "No relevant context available."
        max_context_length = 1024
        query_terms = set(query.lower().split())
        sentences = re.split(r'(?<=[.!?]) +', context)
        filtered_sentences = sentences
        if not filtered_sentences:
            filtered_sentences = sentences
        try:
            combined_text = " ".join(filtered_sentences)
            sentences = [s.strip() for s in combined_text.split(".") if len(s) > 5]
            summarized_context = "".join(sentences)[:max_context_length]
            summarized_context = self.clean_chunk(summarized_context)
        except Exception as e:
            print(f"Extractive summarization failed: {e}. Using original context.")
            summarized_context = " ".join(filtered_sentences)[:max_context_length]
        prompt = f"""
        You are an expert assistant. Answer '{query}' in 2-3 concise sentences, focusing only on the topic. Explain its purpose and process briefly. Do not repeat the context word-for-word; synthesize a unique explanation. Limit to 50 words.
        Context: {summarized_context}
        Question: {query}
        Answer:
        """
        inputs = self.llm_tokenizer(prompt, return_tensors="pt", truncation=True, padding=True, max_length=1024).to(self.device)
        kwargs = {
            "max_new_tokens": 75,
            "min_length": 10,
            "num_beams": 5,
            "no_repeat_ngram_size": 3,
            "temperature": 0.6,
            "top_k": 40,
            "top_p": 0.90,
            "do_sample": True,
            "early_stopping": True
        }
        try:
            output_ids = self.llm_model.generate(**inputs, **kwargs)
            response = self.llm_tokenizer.decode(output_ids[0], skip_special_tokens=True)
        except Exception as e:
            print(f"LLM generation failed: {e}. Returning fallback response.")
            response = "Error processing request. Please try again."
        return response

    def generate_answer(self, query, context_chunks):
        if not context_chunks:
            return "No relevant content found."
        cleaned_chunks = [self.clean_chunk(chunk) for chunk in context_chunks]
        combined_context = " ".join(cleaned_chunks)
        try:
            sentences = [s.strip() for s in combined_context.split(".") if len(s) > 5]
            context = "".join(sentences)[:2048]
        except Exception as e:
            print(f"Extractive summarization failed: {e}. Using cleaned chunks.")
            context = combined_context[:2048]
        return self.generate_with_llm(query, context)

    def evaluate(self, query, ground_truth=None):
        start_time = time.time()
        memory_usage = 0.0
        cpu_usage = 0.0
        if psutil:
            process = psutil.Process()
            memory_start = process.memory_info().rss / 1024 / 1024
            cpu_times = []
            def monitor_cpu():
                try:
                    cpu_times.append(process.cpu_percent(interval=0.1))
                except:
                    pass
            import threading
            cpu_monitor = threading.Thread(target=monitor_cpu)
            cpu_monitor.daemon = True
            cpu_monitor.start()
        retrieved = self.retrieve(query)
        response = self.generate_answer(query, retrieved)
        latency = time.time() - start_time
        if psutil:
            memory_end = process.memory_info().rss / 1024 / 1024
            memory_usage = max(memory_end - memory_start, 0.0)
            cpu_usage = np.mean(cpu_times) if cpu_times else 0.0
            cpu_monitor.join(timeout=0.1)
        quality_metrics = evaluate_chunk_quality(self.chunks, " ".join(self.chunks))
        result = {
            "query": query,
            "response": response,
            "latency": latency,
            "memory_usage": memory_usage,
            "cpu_usage": cpu_usage,
            "chunk_coherence": quality_metrics["avg_coherence"],
            "chunk_context_preservation": quality_metrics["context_preservation"],
            "avg_chunk_size": quality_metrics["avg_chunk_size"],
            "size_consistency": quality_metrics["size_consistency"],
            "redundancy": quality_metrics["redundancy"],
            "coverage": quality_metrics["coverage"],
            "compression_ratio": quality_metrics["compression_ratio"],
            "semantic_coverage": quality_metrics["semantic_coverage"],
            "avg_information_density": quality_metrics["avg_information_density"],
            "weighted_score": quality_metrics["weighted_score"]
        }
        if ground_truth:
            scorer = rouge_scorer.RougeScorer(['rouge1', 'rougeL'], use_stemmer=True)
            scores = scorer.score(ground_truth, response)
            result["rouge1"] = scores['rouge1'].fmeasure
            result["rougeL"] = scores['rougeL'].fmeasure
        return result


print("‚úÖ RAG functions loaded! (Make sure you pasted your code above)")

‚úÖ RAG functions loaded! (Make sure you pasted your code above)


In [114]:
# CELL 5: FastAPI Setup
from fastapi import FastAPI, UploadFile, File, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
from typing import List
import uvicorn
import nest_asyncio

nest_asyncio.apply()

app = FastAPI(title="RAG Pipeline API")

# CORS
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
app.add_middleware(
    CORSMiddleware,
    allow_origins=[
        "http://localhost:3000",              # your local React dev server
        "http://localhost:5173",              # if using Vite instead of CRA
        "http://127.0.0.1:3000",              # sometimes needed
        "*"                                   # ‚Üê temporary wildcard for dev (less secure, but very convenient while testing)
    ],
    allow_credentials=True,
    allow_methods=["*"],                      # allow GET, POST, PUT, DELETE, OPTIONS, etc.
    allow_headers=["*"],                      # allow Content-Type, Authorization, etc.
)


# Models
class RAGConfig(BaseModel):
    chunkSize: int = 500
    overlap: int = 50
    method: str = "gradient"
    useBM25: bool = True
    useCosine: bool = True
    useFaiss: bool = False
    rerankEnabled: bool = True
    topK: int = 4

class ProcessRequest(BaseModel):
    text: str
    query: str
    config: RAGConfig

class ChunkData(BaseModel):
    id: int
    content: str

class MetricsData(BaseModel):
    num_chunks: int
    weighted_score: float
    latency: float
    avg_coherence: float
    context_preservation: float
    avg_information_density: float
    coverage: float
    semantic_coverage: float
    cpu_usage: float
    memory_usage: float

class RAGResponse(BaseModel):
    response: str
    chunks: List[ChunkData]
    retrievedChunks: List[int]
    metrics: MetricsData

@app.get("/")
async def root():
    return {"message": "RAG Pipeline API", "status": "healthy"}

@app.post("/upload_document")
async def upload_document(file: UploadFile = File(...)):
    try:
        content = await file.read()
        if not file.filename.endswith('.pdf'):
            raise HTTPException(400, "Only PDF files")
        
        documents = read_pdf_from_bytes(content)
        if not documents:
            raise HTTPException(400, "Could not extract text")
        
        return {
            "filename": file.filename,
            "extracted_text": " ".join(documents),
            "message": "Success"
        }
    except Exception as e:
        raise HTTPException(500, str(e))

@app.post("/process", response_model=RAGResponse)
async def process_rag(request: ProcessRequest):
    try:
        start_time = time.time()
        process = psutil.Process()
        memory_start = process.memory_info().rss / 1024 / 1024
        
        # Method mapping
        method_map = {
            "fixed": "fixed",
            "sentence_density":"improved_sentence_adaptive_chunking_wrt_sentence_density",
            "adaptive":"adaptive",
            "gradient": "Gradient_chunking",
            "gradient_final": "Gradient_chunking_final"
        }
        method = method_map.get(request.config.method)
        if method is None:
            raise ValueError(f"Unknown chunking method: {request.config.method}")
        # Build config
        rag_config = {
            "chunking_method": method_map.get(request.config.method),
            "chunk_size": request.config.chunkSize,
            "overlap": request.config.overlap,
            "use_bm25": request.config.useBM25,
            "use_cosine": request.config.useCosine,
            "use_faiss": request.config.useFaiss,
            "rerank_enabled": request.config.rerankEnabled,
            "top_k": request.config.topK,
        }
        
        # Initialize RAG
        rag = OptimizedRAG([request.text], **rag_config)
        
        # Retrieve and generate
        retrieved = rag.retrieve(request.query)
        response_text = rag.generate_answer(request.query, retrieved)
        
        # Metrics
        latency = time.time() - start_time
        memory_end = process.memory_info().rss / 1024 / 1024
        memory_usage = max(memory_end - memory_start, 0.0)
        cpu_usage = process.cpu_percent(interval=0.1)
        
        quality = evaluate_chunk_quality(rag.chunks, request.text)
        
        # Find retrieved indices
        retrieved_indices = []
        for rc in retrieved:
            for idx, chunk in enumerate(rag.chunks):
                if chunk == rc:
                    retrieved_indices.append(idx)
                    break
        
        return RAGResponse(
            response=response_text,
            chunks=[ChunkData(id=i, content=c) for i, c in enumerate(rag.chunks)],
            retrievedChunks=retrieved_indices,
            metrics=MetricsData(
                num_chunks=len(rag.chunks),
                weighted_score=quality["weighted_score"],
                latency=latency * 1000,
                avg_coherence=quality["avg_coherence"],
                context_preservation=quality["context_preservation"],
                avg_information_density=quality["avg_information_density"],
                coverage=quality["coverage"],
                semantic_coverage=quality["semantic_coverage"],
                cpu_usage=cpu_usage,
                memory_usage=memory_usage
            )
        )
    except Exception as e:
        raise HTTPException(500, str(e))

print("‚úÖ FastAPI app configured!")

‚úÖ FastAPI app configured!


In [115]:
!lsof -i :8000

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


COMMAND PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
python3  55 root   63u  IPv4  30018      0t0  TCP *:8000 (LISTEN)


In [116]:
# CELL 6: Start Server with Ngrok
import threading

def run_server():
    uvicorn.run(app, host="0.0.0.0", port=9610, log_level="info")

# Start server
server_thread = threading.Thread(target=run_server, daemon=True)
server_thread.start()
time.sleep(3)

# Start ngrok
public_url = ngrok.connect(9610)

print("=" * 80)
print("üöÄ RAG PIPELINE API IS LIVE!")
print("=" * 80)
print(f"üì° Public URL: {public_url}")
print(f"üìù Docs: {public_url}/docs")
print("=" * 80)
print("")
print("‚úÖ Copy this URL and update it in your React frontend!")
print("")
print("Update apiService.js:")
print(f"const API_BASE_URL = '{public_url}';")
print("=" * 80)

INFO:     Started server process [55]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:9610 (Press CTRL+C to quit)


üöÄ RAG PIPELINE API IS LIVE!
üì° Public URL: NgrokTunnel: "https://0f3a59085f16.ngrok-free.app" -> "http://localhost:9610"
üìù Docs: NgrokTunnel: "https://0f3a59085f16.ngrok-free.app" -> "http://localhost:9610"/docs

‚úÖ Copy this URL and update it in your React frontend!

Update apiService.js:
const API_BASE_URL = 'NgrokTunnel: "https://0f3a59085f16.ngrok-free.app" -> "http://localhost:9610"';


In [117]:
# CELL 7: Test API
import requests

try:
    response = requests.get(f"{public_url}/")
    print("‚úÖ API Test Successful!")
    print(f"Response: {response.json()}")
except Exception as e:
    print(f"‚ùå Test failed: {e}")

‚ùå Test failed: No connection adapters were found for 'NgrokTunnel: "https://0f3a59085f16.ngrok-free.app" -> "http://localhost:9610"/'


In [None]:
# CELL 8: Keep Alive (Keep this running!)
print(f"üîÑ Server running at: {public_url}")
print("üí° Keep this cell running to maintain connection")
print("‚ö†Ô∏è Free ngrok sessions timeout after 2 hours")
print("")

try:
    while True:
        time.sleep(60)
        print(".", end="", flush=True)
except KeyboardInterrupt:
    print("\nüõë Server stopped")
    ngrok.kill()

üîÑ Server running at: NgrokTunnel: "https://0f3a59085f16.ngrok-free.app" -> "http://localhost:9610"
üí° Keep this cell running to maintain connection
‚ö†Ô∏è Free ngrok sessions timeout after 2 hours

.INFO:     2406:b400:66:211c:69ca:675d:cf14:2491:0 - "POST /upload_document HTTP/1.1" 200 OK
INFO:     2406:b400:66:211c:69ca:675d:cf14:2491:0 - "OPTIONS /process HTTP/1.1" 200 OK
[INFO] After merging tiny sentences: 290 sentences
[INFO] Avg words/sentence: 26.78, base target: 8 sentences
[INFO] Chunk 1: 10 sentences (density: 2.00), overlap 3 (gradient: 0.194 if defined)
[INFO] Chunk 2: 8 sentences (density: 8.00), overlap 3 (gradient: 0.214 if defined)
[INFO] Chunk 3: 8 sentences (density: 3.50), overlap 3 (gradient: 0.167 if defined)
[INFO] Chunk 4: 8 sentences (density: 7.50), overlap 2 (gradient: 0.144 if defined)
[INFO] Chunk 5: 8 sentences (density: 3.50), overlap 2 (gradient: 0.150 if defined)
[INFO] Chunk 6: 8 sentences (density: 13.00), overlap 2 (gradient: 0.163 if defined)
[



[INFO] Chunk 1: 10 sentences (density: 2.00), overlap 3 (gradient: 0.194 if defined)
[INFO] Chunk 2: 8 sentences (density: 8.00), overlap 3 (gradient: 0.214 if defined)
[INFO] Chunk 3: 8 sentences (density: 3.50), overlap 3 (gradient: 0.167 if defined)
[INFO] Chunk 4: 8 sentences (density: 7.50), overlap 2 (gradient: 0.144 if defined)
[INFO] Chunk 5: 8 sentences (density: 3.50), overlap 2 (gradient: 0.150 if defined)
[INFO] Chunk 6: 8 sentences (density: 13.00), overlap 2 (gradient: 0.163 if defined)
[INFO] Chunk 7: 8 sentences (density: 14.00), overlap 3 (gradient: 0.167 if defined)
[INFO] Chunk 8: 8 sentences (density: 4.00), overlap 2 (gradient: 0.152 if defined)
[INFO] Chunk 9: 8 sentences (density: 4.50), overlap 2 (gradient: 0.146 if defined)
[INFO] Chunk 10: 8 sentences (density: 4.00), overlap 3 (gradient: 0.193 if defined)
[INFO] Chunk 11: 8 sentences (density: 14.00), overlap 3 (gradient: 0.208 if defined)
[INFO] Chunk 12: 9 sentences (density: 16.00), overlap 2 (gradient: 0.



.[INFO] Chunk 1: 10 sentences (density: 2.00), overlap 3 (gradient: 0.194 if defined)
[INFO] Chunk 2: 8 sentences (density: 8.00), overlap 3 (gradient: 0.214 if defined)
[INFO] Chunk 3: 8 sentences (density: 3.50), overlap 3 (gradient: 0.167 if defined)
[INFO] Chunk 4: 8 sentences (density: 7.50), overlap 2 (gradient: 0.144 if defined)
[INFO] Chunk 5: 8 sentences (density: 3.50), overlap 2 (gradient: 0.150 if defined)
[INFO] Chunk 6: 8 sentences (density: 13.00), overlap 2 (gradient: 0.163 if defined)
[INFO] Chunk 7: 8 sentences (density: 14.00), overlap 3 (gradient: 0.167 if defined)
[INFO] Chunk 8: 8 sentences (density: 4.00), overlap 2 (gradient: 0.152 if defined)
[INFO] Chunk 9: 8 sentences (density: 4.50), overlap 2 (gradient: 0.146 if defined)
[INFO] Chunk 10: 8 sentences (density: 4.00), overlap 3 (gradient: 0.193 if defined)
[INFO] Chunk 11: 8 sentences (density: 14.00), overlap 3 (gradient: 0.208 if defined)
[INFO] Chunk 12: 9 sentences (density: 16.00), overlap 2 (gradient: 0



[INFO] Chunk 1: 17 sentences (density: 3.00), overlap 2 (gradient: 0.140 if defined)
[INFO] Chunk 2: 11 sentences (density: 4.50), overlap 3 (gradient: 0.172 if defined)
[INFO] Chunk 3: 12 sentences (density: 2.50), overlap 3 (gradient: 0.186 if defined)
[INFO] Chunk 4: 12 sentences (density: 4.50), overlap 3 (gradient: 0.210 if defined)
[INFO] Chunk 5: 16 sentences (density: 9.50), overlap 3 (gradient: 0.168 if defined)
[INFO] Chunk 6: 12 sentences (density: 8.00), overlap 2 (gradient: 0.102 if defined)
[INFO] Chunk 7: 15 sentences (density: 1.00), overlap 2 (gradient: 0.064 if defined)
[INFO] Chunk 8: 10 sentences (density: 0.50), overlap 2 (gradient: 0.071 if defined)
[INFO] Last chunk merged (too small)
[DONE] Total chunks: 8
INFO:     2406:b400:66:211c:69ca:675d:cf14:2491:0 - "POST /process HTTP/1.1" 200 OK
..........INFO:     2406:b400:66:211c:69ca:675d:cf14:2491:0 - "POST /upload_document HTTP/1.1" 200 OK
.INFO:     2406:b400:66:211c:69ca:675d:cf14:2491:0 - "OPTIONS /process HTT



[INFO] Chunk 1: 17 sentences (density: 3.00), overlap 2 (gradient: 0.140 if defined)
[INFO] Chunk 2: 11 sentences (density: 4.50), overlap 3 (gradient: 0.172 if defined)
[INFO] Chunk 3: 12 sentences (density: 2.50), overlap 3 (gradient: 0.186 if defined)
[INFO] Chunk 4: 12 sentences (density: 4.50), overlap 3 (gradient: 0.210 if defined)
[INFO] Chunk 5: 16 sentences (density: 9.50), overlap 3 (gradient: 0.168 if defined)
[INFO] Chunk 6: 12 sentences (density: 8.00), overlap 2 (gradient: 0.102 if defined)
[INFO] Chunk 7: 15 sentences (density: 1.00), overlap 2 (gradient: 0.064 if defined)
[INFO] Chunk 8: 10 sentences (density: 0.50), overlap 2 (gradient: 0.071 if defined)
[INFO] Last chunk merged (too small)
[DONE] Total chunks: 8
INFO:     2406:b400:66:211c:69ca:675d:cf14:2491:0 - "POST /process HTTP/1.1" 200 OK
.........INFO:     2406:b400:66:211c:69ca:675d:cf14:2491:0 - "POST /upload_document HTTP/1.1" 200 OK
[INFO] After merging tiny sentences: 93 sentences
[INFO] Avg words/sentence



[INFO] Chunk 1: 17 sentences (density: 3.00), overlap 2 (gradient: 0.140 if defined)
[INFO] Chunk 2: 11 sentences (density: 4.50), overlap 3 (gradient: 0.172 if defined)
[INFO] Chunk 3: 12 sentences (density: 2.50), overlap 3 (gradient: 0.186 if defined)
[INFO] Chunk 4: 12 sentences (density: 4.50), overlap 3 (gradient: 0.210 if defined)
[INFO] Chunk 5: 16 sentences (density: 9.50), overlap 3 (gradient: 0.168 if defined)
[INFO] Chunk 6: 12 sentences (density: 8.00), overlap 2 (gradient: 0.102 if defined)
[INFO] Chunk 7: 15 sentences (density: 1.00), overlap 2 (gradient: 0.064 if defined)
[INFO] Chunk 8: 10 sentences (density: 0.50), overlap 2 (gradient: 0.071 if defined)
[INFO] Last chunk merged (too small)
[DONE] Total chunks: 8
INFO:     2406:b400:66:211c:69ca:675d:cf14:2491:0 - "POST /process HTTP/1.1" 200 OK
......INFO:     2406:b400:66:211c:69ca:675d:cf14:2491:0 - "POST /upload_document HTTP/1.1" 200 OK
INFO:     2406:b400:66:211c:69ca:675d:cf14:2491:0 - "OPTIONS /process HTTP/1.1



[INFO] Chunk 1: 17 sentences (density: 3.00), overlap 2 (gradient: 0.140 if defined)
[INFO] Chunk 2: 11 sentences (density: 4.50), overlap 3 (gradient: 0.172 if defined)
[INFO] Chunk 3: 12 sentences (density: 2.50), overlap 3 (gradient: 0.186 if defined)
[INFO] Chunk 4: 12 sentences (density: 4.50), overlap 3 (gradient: 0.210 if defined)
[INFO] Chunk 5: 16 sentences (density: 9.50), overlap 3 (gradient: 0.168 if defined)
[INFO] Chunk 6: 12 sentences (density: 8.00), overlap 2 (gradient: 0.102 if defined)
[INFO] Chunk 7: 15 sentences (density: 1.00), overlap 2 (gradient: 0.064 if defined)
[INFO] Chunk 8: 10 sentences (density: 0.50), overlap 2 (gradient: 0.071 if defined)
[INFO] Last chunk merged (too small)
[DONE] Total chunks: 8
.INFO:     2406:b400:66:211c:69ca:675d:cf14:2491:0 - "POST /process HTTP/1.1" 200 OK
...INFO:     2406:b400:66:211c:69ca:675d:cf14:2491:0 - "POST /process HTTP/1.1" 500 Internal Server Error
INFO:     2406:b400:66:211c:69ca:675d:cf14:2491:0 - "POST /process HT



.[INFO] Chunk 1: 17 sentences (density: 3.00), overlap 2 (gradient: 0.140 if defined)
[INFO] Chunk 2: 11 sentences (density: 4.50), overlap 3 (gradient: 0.172 if defined)
[INFO] Chunk 3: 12 sentences (density: 2.50), overlap 3 (gradient: 0.186 if defined)
[INFO] Chunk 4: 12 sentences (density: 4.50), overlap 3 (gradient: 0.210 if defined)
[INFO] Chunk 5: 16 sentences (density: 9.50), overlap 3 (gradient: 0.168 if defined)
[INFO] Chunk 6: 12 sentences (density: 8.00), overlap 2 (gradient: 0.102 if defined)
[INFO] Chunk 7: 15 sentences (density: 1.00), overlap 2 (gradient: 0.064 if defined)
[INFO] Chunk 8: 10 sentences (density: 0.50), overlap 2 (gradient: 0.071 if defined)
[INFO] Last chunk merged (too small)
[DONE] Total chunks: 8
INFO:     2406:b400:66:211c:69ca:675d:cf14:2491:0 - "POST /process HTTP/1.1" 200 OK
..............