# 🐣 Document Intelligence with Docling: Unlocking Complex Academic Content

This notebook demonstrates **Document Intelligence** - the advanced capability to understand and process complex documents like research papers, academic materials, and structured content that traditional RAG systems struggle with.

**The Challenge:**
Imagine trying to build an educational AI assistant using only basic text extraction from research papers. You'd lose:
- **📊 Table data** with crucial research findings
- **🧮 Mathematical formulas** and scientific notation  
- **📈 Charts and figures** that provide key insights
- **🏛️ Document structure** like sections, references, and metadata
- **📝 Multi-column layouts** common in academic papers

**The Solution: Docling**
Docling is an advanced document processing toolkit that acts like a brilliant research assistant, understanding the **meaning and structure** of complex academic documents.

**What You'll Build:**
- **🔬 Intelligent Document Processor**: Extract rich content from complex PDFs
- **📚 Enhanced RAG System**: Query tables, formulas, and structured content  
- **🎯 Academic AI Assistant**: Answer questions using complete document understanding
- **⚡ Production Pipeline**: Handle real-world educational materials at scale

**Why This Matters:**
Traditional RAG systems often fail with academic content, missing critical information trapped in tables or losing context from complex layouts. Docling transforms these challenging documents into fully searchable, queryable knowledge.

#### Let's build document intelligence that truly understands academic content! 🚀

## 📦 Install Required Packages

Install the Python packages needed for this lab.

In [None]:
!pip install -q llama_stack_client==0.2.11 fire==0.7.1 dotenv==0.9.9

## 📚 Import Libraries for Document Intelligence

Import the essential libraries for building our document intelligence RAG system with Docling processing capabilities.

In [None]:
# Core libraries for document intelligence and RAG
import uuid     
import requests 
import base64   
import json 
import os 
import sys  
sys.path.append('..') 

# LlamaStack client and RAG-specific classes
from llama_stack_client import LlamaStackClient  
from llama_stack_client import RAGDocument  
from llama_stack_client.types.shared.content_delta import TextDelta, ToolCallDelta  

# Display and utility imports
from src.utils import step_printer 
from termcolor import cprint        

## 🔗 Connect to LlamaStack

In [None]:
# The base URL points to your Llama Stack server deployment
base_url = "http://llama-stack-service:8321"

# Create the Llama Stack client
client = LlamaStackClient(
    base_url=base_url,
    provider_data=None
)

# Creating the client does not contact the server; the first API call does
print(f"Llama Stack client configured for {base_url}")

# Configs for model and sampling
model_id = "llama32"
temperature = 0.0
max_tokens = 512
stream = False

# Configure the sampling strategy based on temperature
if temperature > 0.0:
    top_p = 0.95  # Nucleus sampling parameter
    strategy = {"type": "top_p", "temperature": temperature, "top_p": top_p}
else:
    strategy = {"type": "greedy"}  # Always pick most likely token

# Package sampling parameters for the inference API
sampling_params = {
    "strategy": strategy,
    "max_tokens": max_tokens,
}

print(f"Model: {model_id}")
print(f"Sampling Parameters: {sampling_params}")
print(f"Stream: {stream}")
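
The strategy selection above can also be wrapped in a small helper, which makes it easier to try different temperatures later in the notebook. This is a minimal sketch: the function name `build_sampling_params` is hypothetical (it is not part of `llama_stack_client`), and it simply reproduces the dictionary shape built in the cell above.

```python
def build_sampling_params(temperature: float, max_tokens: int, top_p: float = 0.95) -> dict:
    """Return sampling parameters in the same dict shape as the cell above.

    Hypothetical helper for illustration; not part of llama_stack_client.
    """
    if temperature < 0.0:
        raise ValueError("temperature must be non-negative")

    if temperature > 0.0:
        # Nucleus sampling: sample from the smallest token set whose mass is >= top_p
        strategy = {"type": "top_p", "temperature": temperature, "top_p": top_p}
    else:
        # Greedy decoding: always pick the most likely token
        strategy = {"type": "greedy"}

    return {"strategy": strategy, "max_tokens": max_tokens}


greedy_params = build_sampling_params(temperature=0.0, max_tokens=512)
sampled_params = build_sampling_params(temperature=0.7, max_tokens=512)
print(greedy_params)
print(sampled_params)
```

With `temperature = 0.0` the helper returns the greedy configuration used in this lab, so answers stay reproducible between runs.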

## 🔬 Docling Processing Function Implementation

This function connects to the Docling service and processes documents through the intelligent pipeline we just described.  
Let's test Docling's document intelligence on a complex academic paper. We'll use a real research paper that contains tables, mathematical formulas, and figures 📈

> Note: Because Docling performs more advanced extraction than plain text parsing, processing takes a while.  
> The PDF used here should take 1-2 minutes.

In [None]:
def docling_processing(url):

    # Connect to the deployed Docling service in the cluster
    api_address = "http://docling-v0-7-0-predictor.ai501.svc.cluster.local:5001"
    headers = {"Content-Type": "application/json"}

    print(f"🔗 Connecting to Docling service at {api_address}")
    
    # Docling settings for processing
    payload = {
        "http_sources": [{"url": url}],
        "options": {
            "to_formats": ["md"],
            "image_export_mode": "placeholder"
        },
    }
    
    try:
        # Send document to Docling for intelligent analysis
        response = requests.post(
            f"{api_address}/v1alpha/convert/source",
            json=payload,
            headers=headers,
            timeout=180
        )
        
        response.raise_for_status()
        
        result_data = response.json()
        md_content = result_data["document"]["md_content"]
        
        return md_content
        
    except requests.exceptions.Timeout:
        print(f"⏰ Processing timeout - complex documents may need more time")
        raise
    except requests.exceptions.RequestException as e:
        print(f"❌ Docling processing failed: {e}")
        raise
    except KeyError as e:
        print(f"❌ Unexpected response format: {e}")
        raise
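
The function above reads the Markdown with `result_data["document"]["md_content"]` and relies on a `KeyError` if the shape is off. As a hedged alternative, the sketch below validates the response more explicitly and also rejects empty content. The helper name `extract_markdown` is hypothetical, and the only response fields it assumes are the two already used above (`document` and `md_content`).

```python
def extract_markdown(result_data: dict) -> str:
    """Pull the Markdown text out of a conversion response, with explicit checks.

    Hypothetical helper; assumes the same response shape as docling_processing.
    """
    document = result_data.get("document")
    if not isinstance(document, dict):
        raise ValueError("Response has no 'document' object")

    md_content = document.get("md_content")
    if not isinstance(md_content, str) or not md_content.strip():
        # An empty string would otherwise be ingested silently as a blank document
        raise ValueError("Response 'document' has no usable 'md_content'")

    return md_content


# Stand-in for response.json(), so the example runs without the Docling service
sample_response = {"document": {"md_content": "# Title\n\nBody text."}}
print(extract_markdown(sample_response))
```

Failing early here gives a clearer error than discovering an empty document after it has been ingested into the vector database.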

In [None]:
# We use a research paper called "First Mapping the Canopy Height of Primeval Forests in the Tallest Tree Area of Asia", fitting for our Canopy application.
url = "https://arxiv.org/pdf/2404.14661"

md_content = docling_processing(url)

print(f"\n🎉 Document intelligence processing complete!")
print(f"📊 Content preview (first 500 characters):")
print(f"{'='*60}")
print(md_content[:500] + "..." if len(md_content) > 500 else md_content)
print(f"{'='*60}")
print(f"📈 Total processed content: {len(md_content)} characters")
print(f"📝 Docling has extracted and structured the complete document content!")
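
A character count alone does not show whether tables, formulas, and figures survived extraction. The sketch below counts a few structural markers in the Markdown. It rests on assumptions that should be checked against your own output: tables are assumed to be pipe tables with a `|---|---|` separator row, figures are assumed to appear as `<!-- image -->` placeholders, and display formulas are assumed to be wrapped in `$$`. The function name `summarize_markdown` is hypothetical.

```python
import re

# Separator row of a Markdown pipe table, e.g. |------|:----:|
TABLE_SEPARATOR = re.compile(r"\s*\|(\s*:?-+:?\s*\|)+\s*")


def summarize_markdown(md: str) -> dict:
    """Count structural markers in Markdown text (rough heuristics, not a parser)."""
    lines = md.splitlines()
    return {
        # Lines starting with 1-6 '#' characters followed by whitespace
        "headings": sum(1 for line in lines if re.match(r"#{1,6}\s", line)),
        # One separator row per pipe table
        "tables": sum(1 for line in lines if TABLE_SEPARATOR.fullmatch(line)),
        # Assumed placeholder text for exported images
        "image_placeholders": md.count("<!-- image -->"),
        # Each display formula is assumed to use one opening and one closing $$
        "display_formulas": md.count("$$") // 2,
    }


sample_md = (
    "# Paper\n\n## Results\n\n"
    "| Region | Score |\n|--------|-------|\n| A | 0.9 |\n\n"
    "<!-- image -->\n\n$$ y = ax + b $$\n"
)
print(summarize_markdown(sample_md))
```

Calling `summarize_markdown(md_content)` on the real output gives a quick sanity check before ingestion, and the counts could be used to set metadata flags from the content instead of hard-coding them.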

## 📃 Using the processed document for RAG

We can now add this document to our Milvus database. We use the inline Milvus instance here, because we are still in the experiment namespace and connected to the experiment Llama Stack for now.  
Since we have done this once before, you will probably recognize parts of the code.

In [None]:
# Generate a unique ID for a new vector database, then register it with Llama Stack
vector_db_id = f"test_vector_db_{uuid.uuid4()}"
print(f"📊 Registering vector database with ID: {vector_db_id}")

client.vector_dbs.register(
    vector_db_id=vector_db_id,
    embedding_model="all-MiniLM-L6-v2",
    embedding_dimension=384,
    provider_id="milvus",
)

In [None]:
# Ingest the document as processed by Docling
documents = [
    RAGDocument(
        document_id="docling-processed-doc",
        content=md_content,
        metadata={
            "source_url": url,
            "processing_method": "docling",
            "document_type": "academic_paper",
            "has_tables": True,
            "has_formulas": True,
            "has_figures": True,
        },
    )
]

try:
    client.tool_runtime.rag_tool.insert(
        documents=documents,
        vector_db_id=vector_db_id,
        chunk_size_in_tokens=512,
    )
    print(f"\n✅ Document ingestion complete!")
    
except Exception as e:
    print(f"\n❌ Document ingestion failed: {e}")
    # Re-raise so later cells do not silently query an empty vector database
    raise
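
To build intuition for what `chunk_size_in_tokens=512` does, here is a simplified sketch of fixed-size chunking with overlap. It is not Llama Stack's implementation: it approximates tokens with whitespace-separated words, and both the `chunk_words` name and the `overlap` parameter are hypothetical choices made for illustration.

```python
def chunk_words(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into windows of `chunk_size` words, each sharing `overlap` words
    with the previous window. Words stand in for model tokens here."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this window has reached the end of the text
        if start + chunk_size >= len(words):
            break
    return chunks


demo_text = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_words(demo_text, chunk_size=4, overlap=1):
    print(chunk)
```

Smaller chunks make retrieval more precise but can split a table across chunks, which is worth keeping in mind when querying tabular results from the paper.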

Now that we have ingested the document (added it to our vector database), we can query it just like we did before!  
Feel free to try different queries and see how the answers change.

In [None]:
# Test queries for the processed document
queries = [
    "What is the PRFXception?",
    "The accuracy values of overall model prediction and residual cross-validation for five regions in southeast Tibet and four regions in northwest Yunnan"
]

for prompt in queries:
    cprint(f"\nUser> {prompt}", "blue")
    
    # RAG retrieval call - find relevant chunks from the vector database
    rag_response = client.tool_runtime.rag_tool.query(
        content=prompt, 
        vector_db_ids=[vector_db_id],
        query_config={
            "chunk_template": "Result {index}\nContent: {chunk.content}\nMetadata: {metadata}\n",
        },
    )

    cprint(rag_response)

    cprint(f"\n--- RAG Metadata ---", "yellow")
    cprint(rag_response.metadata, "cyan")

    # Create messages for the LLM with system prompt
    messages = [{"role": "system", "content": "You are a helpful assistant."}]

    # Combine the user query with retrieved context from RAG
    prompt_context = rag_response.content
    extended_prompt = f"Please answer the given query using the context below.\n\nCONTEXT:\n{prompt_context}\n\nQUERY:\n{prompt}"
    messages.append({"role": "user", "content": extended_prompt})

    # Get response from the LLM using the enhanced prompt
    response = client.inference.chat_completion(
        messages=messages,
        model_id=model_id,
        sampling_params=sampling_params,
        stream=stream,
    )
    
    # Print the response, handling both streaming and non-streaming modes
    cprint("inference> ", color="magenta", end='')
    if stream:
        for chunk in response:
            response_delta = chunk.event.delta
            if isinstance(response_delta, TextDelta):
                cprint(response_delta.text, color="magenta", end='')
            elif isinstance(response_delta, ToolCallDelta):
                cprint(response_delta.tool_call, color="magenta", end='')
    else:
        cprint(response.completion_message.content, color="magenta")

    cprint(f"\n--- End of RAG Answer ---", "blue")
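
The prompt assembly in the loop above can be factored into a small function, which makes it easy to inspect exactly what the model receives. The sketch below is an assumption-laden illustration: `flatten_context` and `build_rag_messages` are hypothetical names, and the flattening step assumes that retrieved content may arrive either as a plain string or as a list of items exposing a `text` attribute. Verify this against the `rag_response.content` you actually get back.

```python
from types import SimpleNamespace


def flatten_context(content) -> str:
    """Turn retrieved content into plain text.

    Assumption: content is a string, or a list of items with a `.text` attribute.
    """
    if content is None:
        return ""
    if isinstance(content, str):
        return content
    if isinstance(content, (list, tuple)):
        parts = [flatten_context(item) for item in content]
        return "\n".join(part for part in parts if part)
    text = getattr(content, "text", None)
    # Fall back to the string representation for unknown item types
    return text if isinstance(text, str) else str(content)


def build_rag_messages(query: str, context, system_prompt: str = "You are a helpful assistant.") -> list:
    """Build the system and user messages using the same template as the loop above."""
    extended_prompt = (
        "Please answer the given query using the context below.\n\n"
        f"CONTEXT:\n{flatten_context(context)}\n\nQUERY:\n{query}"
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": extended_prompt},
    ]


# Stand-in objects so the example runs without a Llama Stack server
fake_context = [SimpleNamespace(text="Result 1"), SimpleNamespace(text="Result 2")]
for message in build_rag_messages("What is the PRFXception?", fake_context):
    print(message["role"], "->", message["content"])
```

Inside the loop, `messages = build_rag_messages(prompt, rag_response.content)` would replace the three prompt-building statements.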

## 🎉 You have used Docling to enhance your document processing!

Your document intelligence system can now query complex academic content, including the tables and formulas that basic text extraction tends to lose 🚀  
Go back to the instructions to see how we can automate our document ingestion.