# DocuMind - AI Document Intelligence Agent

This notebook demonstrates the capabilities of DocuMind, a multi-agent system for automated document analysis.

## Features Demonstrated
1. Document Reading (PDF, Text, URL)
2. Information Extraction (Tables, Metrics, Dates, Tasks, Entities)
3. Multiple Summary Types (Executive, Bullet, TL;DR)
4. Question Answering with Citations
5. Memory Storage
6. Quality Evaluation


## Setup


In [None]:
# Install required packages
!pip install -q PyPDF2 pdfplumber beautifulsoup4 requests openai langchain sentence-transformers chromadb rouge-score nltk spacy dateparser
!python -m spacy download en_core_web_sm -q


In [None]:
import os
import sys

# Add documind to path
sys.path.append('../')

# Set your OpenAI API key (a key already exported in your shell is kept as-is)
os.environ.setdefault('OPENAI_API_KEY', 'your-api-key-here')  # Replace the placeholder with your key

from documind import DocuMind
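
A placeholder key left in the environment only fails later, on the first API call. The sketch below shows one way to fail early instead. `resolve_api_key` is a hypothetical helper written for this notebook, not part of DocuMind, and the placeholder string it rejects is simply the one used in the setup cell.

```python
import os


def resolve_api_key(env=None, var_name="OPENAI_API_KEY"):
    """Return the API key from a mapping of environment variables.

    Hypothetical helper (not part of DocuMind): raises a clear error when
    the variable is missing, empty, or still set to the notebook placeholder.
    """
    env = os.environ if env is None else env
    key = (env.get(var_name) or "").strip()
    if not key or key == "your-api-key-here":
        raise RuntimeError(
            f"{var_name} is not set; export it before running the notebook."
        )
    return key
```

If you adopt something like this, the result could be passed as `api_key=resolve_api_key()` when constructing `DocuMind`.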


## Initialize DocuMind


In [None]:
# Initialize DocuMind
dm = DocuMind(
    api_key=os.getenv('OPENAI_API_KEY'),
    ocr_enabled=True,
    memory_enabled=True,
    evaluation_enabled=True
)

print("DocuMind initialized successfully!")


## Example 1: Process a PDF Document


In [None]:
# Process a PDF document
# Replace with path to your PDF
pdf_path = "sample_document.pdf"

result = dm.process_document(
    source=pdf_path,
    tasks=["extract", "summarize", "evaluate"],
    store_in_memory=True
)

print(f"Document ID: {result['document_id']}")
print(f"Total pages: {result['document']['metadata'].get('total_pages', 'N/A')}")
print(f"Total words: {result['document']['metadata'].get('total_words', 'N/A')}")
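
The feature list mentions PDF, text, and URL sources, but this notebook only processes a PDF. As a rough illustration of how a caller might tell the three apart before passing `source`, here is a small pre-check. `classify_source` is a hypothetical helper; how DocuMind itself detects formats is not shown in this notebook.

```python
from pathlib import Path
from urllib.parse import urlparse


def classify_source(source):
    """Guess the source type from a string: 'url', 'pdf', or 'text'.

    Illustrative pre-check only (an assumption, not DocuMind's own logic).
    """
    scheme = urlparse(source).scheme.lower()
    if scheme in ("http", "https"):
        return "url"
    if Path(source).suffix.lower() == ".pdf":
        return "pdf"
    # Anything else is treated as plain text in this sketch
    return "text"


for candidate in ["sample_document.pdf", "https://example.com/report", "notes.txt"]:
    print(candidate, "->", classify_source(candidate))
```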


## Example 2: View Extractions


In [None]:
# View extracted information
extractions = result['extractions']

print(f"Tables found: {len(extractions.get('tables', []))}")
print(f"Metrics found: {len(extractions.get('metrics', []))}")
print(f"Dates found: {len(extractions.get('dates', []))}")
print(f"Tasks found: {len(extractions.get('tasks', []))}")

# Display sample metrics
if extractions.get('metrics'):
    print("\nSample Metrics:")
    for metric in extractions['metrics'][:5]:
        print(f"  - {metric['value']} ({metric['type']})")
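
The four count lines above repeat the same pattern. A minimal sketch of a helper that collects the counts in one dictionary follows. `count_extractions` is hypothetical, the sample data is made up to mirror the shape used above, and the `entities` key is an assumption based on the feature list rather than something this notebook confirms.

```python
def count_extractions(
    extractions,
    kinds=("tables", "metrics", "dates", "tasks", "entities"),
):
    """Return {kind: number of items}, treating missing or None values as empty."""
    return {kind: len(extractions.get(kind) or []) for kind in kinds}


# Hypothetical data shaped like result['extractions']
sample_extractions = {
    "tables": [],
    "metrics": [{"value": "12%", "type": "percentage"}],
    "dates": None,
}

for kind, count in count_extractions(sample_extractions).items():
    print(f"{kind.capitalize()} found: {count}")
```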


## Example 3: View Summaries


In [None]:
# Display summaries
summaries = result['summaries']

print("=" * 80)
print("EXECUTIVE SUMMARY")
print("=" * 80)
print(summaries.get('executive', 'N/A'))

print("\n" + "=" * 80)
print("BULLET-POINT SUMMARY")
print("=" * 80)
print(summaries.get('bullet', 'N/A'))

print("\n" + "=" * 80)
print("TL;DR SUMMARY")
print("=" * 80)
print(summaries.get('tldr', 'N/A'))


## Example 4: Question Answering


In [None]:
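# Set up Q&A
# Use a separate variable so `result` from Example 1 (which holds the
# evaluations used in Example 5) is not overwritten.
qa_result = dm.process_document(
    source=pdf_path,
    tasks=["qa"]
)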

# Ask questions
questions = [
    "What is the main objective of this document?",
    "What are the key findings?",
    "What are the recommendations?"
]

for question in questions:
    answer = dm.answer_question(question, return_citations=True)
    print(f"\nQ: {question}")
    print(f"A: {answer['answer']}")
    print(f"Confidence: {answer['confidence']:.2f}")
    if answer.get('citations'):
        print(f"Citations: Pages {[c['page'] for c in answer['citations']]}")
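
Printing the raw list of pages can show duplicates and an arbitrary order when several citations point to the same page. The sketch below tidies that up. Both helpers are hypothetical and assume only what the loop above already uses: each citation is a dict that may carry a `page` key.

```python
def citation_pages(citations):
    """Return sorted, de-duplicated page numbers from a list of citation dicts."""
    pages = {c["page"] for c in citations if c.get("page") is not None}
    return sorted(pages)


def format_pages(pages):
    """Render page numbers as a short human-readable label."""
    if not pages:
        return "no page citations"
    label = "Page" if len(pages) == 1 else "Pages"
    return f"{label} " + ", ".join(str(p) for p in pages)


# Hypothetical citations shaped like answer['citations']
sample_citations = [{"page": 3}, {"page": 1}, {"page": 3}, {"text": "no page"}]
print(f"Citations: {format_pages(citation_pages(sample_citations))}")
```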


## Example 5: Evaluation Results


In [None]:
# View evaluation results
evaluations = result.get('evaluations', {})

if 'summary_executive' in evaluations:
    eval_result = evaluations['summary_executive']
    print("Executive Summary Evaluation:")
    print(f"  Overall Score: {eval_result['overall_score']:.2f}")
    print(f"  Clarity: {eval_result['clarity']:.2f}")
    print(f"  Completeness: {eval_result['completeness']:.2f}")
    
    if eval_result.get('suggestions'):
        print("  Suggestions:")
        for suggestion in eval_result['suggestions']:
            print(f"    - {suggestion}")
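
To act on the scores rather than only print them, one option is to flag dimensions that fall below a threshold you choose. This is a sketch under assumptions: `low_scoring_dimensions` is a hypothetical helper, the sample scores are invented, and the score scale is not documented here, so the threshold is left as a required argument.

```python
def low_scoring_dimensions(eval_result, threshold, dimensions=("clarity", "completeness")):
    """Return the names of dimensions whose numeric score is below the threshold.

    Dimensions that are missing or non-numeric are skipped.
    """
    return [
        name
        for name in dimensions
        if isinstance(eval_result.get(name), (int, float))
        and eval_result[name] < threshold
    ]


# Hypothetical scores shaped like evaluations['summary_executive']
sample_eval = {"overall_score": 0.80, "clarity": 0.90, "completeness": 0.55}
print(low_scoring_dimensions(sample_eval, threshold=0.7))
```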


## Conclusion

DocuMind covers the main steps of document analysis:
- Multi-format input (PDF, text, URLs)
- Extraction of tables, metrics, dates, tasks, and entities
- Multiple summary types (executive, bullet-point, TL;DR)
- Q&A with citations
- Memory storage
- Quality evaluation

For more information, see the README.md file.
