# pdf_agent — Ingest demo

This notebook demonstrates how to ingest a folder of PDFs into the project's vector index using the `PDFAgent` class.

Notes:
- The notebook uses the repository's `PDFAgent` class (from `agents/pdf_agent.py`).
- Configure your environment before running: set up `system_config.json` or the environment variables for any cloud providers you plan to use.
- Results are stored in `outputs/`. The setup cell below creates this directory if it does not already exist.

In [None]:
# Setup: imports and agent initialization
from pathlib import Path
import json
import os

# Import the agent (uses repo code)
from agents.pdf_agent import PDFAgent

# Ensure outputs directory exists
Path('outputs').mkdir(parents=True, exist_ok=True)

# Instantiate the agent (may print logs)
agent = PDFAgent()
print('PDFAgent initialized. Use `agent` to run ingestion, retrieval and analyze_all.')

## Ingest a folder of PDFs

The next cell parses and indexes the PDF files in `downloads/search_results`. To ingest a different directory, point `folder` at it. A small test set (2–3 PDFs) finishes quickly; hundreds of PDFs may take much longer.
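
Since run time grows with the size of the folder, it can help to check how many PDFs are there before starting. The following is a minimal sketch using only the standard library. `list_pdfs` is a hypothetical helper, not part of `PDFAgent`, and it looks only at the top level of the folder; whether `process_folder` also descends into subfolders is not covered here.

```python
from pathlib import Path


def list_pdfs(folder):
    """Return the PDF files directly inside `folder` (non-recursive), sorted by path."""
    folder = Path(folder)
    if not folder.is_dir():
        return []
    # Match the extension case-insensitively so `.PDF` files are counted too
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == '.pdf'
    )


pdfs = list_pdfs('downloads/search_results')
total_mb = sum(p.stat().st_size for p in pdfs) / 1_000_000
print(f'{len(pdfs)} PDF(s), {total_mb:.1f} MB total')
```

If the count is large, consider trying the ingest on a copy of two or three files first.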

In [None]:
# Ingest demo (adjust `folder` as needed)
folder = Path('downloads/search_results')

if not folder.exists():
    print(f'Folder not found: {folder.resolve()}')
    print('Update the `folder` variable to the path where your PDFs are stored.')
else:
    print(f'Ingesting PDFs from: {folder.resolve()}')
    result = agent.process_folder(folder)
    print('Ingest result:')
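    # default=str keeps the dump from failing on values json cannot encode (e.g. Path objects)
    print(json.dumps(result, indent=2, default=str))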
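
The cell above only prints the ingest result. To keep a copy under `outputs/`, something like the sketch below may be enough. `save_result` and the file name `ingest_result.json` are illustrative choices, not part of the repository, and the sketch assumes the value returned by `process_folder` is a dict or list that `json` can mostly encode.

```python
import json
from pathlib import Path


def save_result(result, out_dir='outputs', name='ingest_result.json'):
    """Write `result` as pretty-printed JSON under `out_dir` and return the file path."""
    out_path = Path(out_dir) / name
    out_path.parent.mkdir(parents=True, exist_ok=True)
    # default=str turns values json cannot encode (e.g. Path objects) into strings
    out_path.write_text(json.dumps(result, indent=2, default=str), encoding='utf-8')
    return out_path


# Usage, after the ingest cell has run:
# print(save_result(result))
```

Calling it again with the same `name` overwrites the previous file, so pass a different `name` per run to keep a history.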

## Next steps

- If you plan to use cloud providers (Azure/Poe), verify `system_config.json` and your environment variables before ingesting large corpora.
- For privacy-sensitive data, consider running Ollama locally and selecting it as the embedding/LLM provider in settings.
- Use the `examples/` CLI scripts if you prefer non-interactive runs.
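
The configuration check in the first item can be partly automated. The sketch below assumes `system_config.json` sits in the current working directory and only tests that it parses as JSON; it does not know which keys the project expects. `check_config` is a hypothetical helper, and `EXAMPLE_API_KEY` is a placeholder to be replaced with the variable names your provider setup actually requires.

```python
import json
import os
from pathlib import Path


def check_config(config_path='system_config.json', env_vars=()):
    """Report whether the config file parses as JSON and which env vars are unset or empty."""
    path = Path(config_path)
    report = {
        'config_found': path.is_file(),
        'config_valid_json': False,
        'missing_env': [],
    }
    if report['config_found']:
        try:
            json.loads(path.read_text(encoding='utf-8'))
            report['config_valid_json'] = True
        except ValueError:
            # Covers both invalid JSON and undecodable bytes
            pass
    report['missing_env'] = [name for name in env_vars if not os.environ.get(name)]
    return report


# 'EXAMPLE_API_KEY' is a placeholder name, not one the project is known to read
print(check_config(env_vars=['EXAMPLE_API_KEY']))
```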