# pdf_agent: Retrieval demo

This notebook demonstrates how to run a retrieval query (RAG) using the project's `PDFAgent` class and save the JSON result to `outputs/retrieve_result.json`.

Notes:
- Ensure the vector index is populated before running this notebook (run the ingest notebook or the CLI ingest script).
- Cloud providers need valid settings in `system_config.json` or in environment variables.
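
The configuration note above can be checked before any provider is called. The following is a minimal sketch, not part of the project: `check_config` is a hypothetical helper, it assumes `system_config.json` sits in the working directory, and the environment variable names you pass in are your own (none are defined by this notebook). It only confirms that the file parses as JSON and that the named variables are set; it does not validate the settings themselves.

```python
import json
import os
from pathlib import Path


def check_config(config_path="system_config.json", required_env=()):
    """Return a list of human-readable problems; an empty list means the checks passed.

    Hypothetical helper: the file location and variable names are assumptions.
    """
    problems = []
    path = Path(config_path)
    if not path.is_file():
        problems.append(f"config file not found: {path}")
    else:
        try:
            # Only checks that the file is well-formed JSON, not its contents.
            json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"config file is not valid JSON: {exc}")
    for name in required_env:
        if not os.environ.get(name):
            problems.append(f"environment variable not set: {name}")
    return problems


# Report anything that looks wrong; prints nothing when the checks pass.
for problem in check_config():
    print(problem)
```

Since the note says configuration may live in either place, treat a reported problem as a prompt to look, not necessarily as an error.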

In [None]:
# Setup: imports and agent initialization
import json
from pathlib import Path

# Import the agent (uses repo code)
from agents.pdf_agent import PDFAgent

# Ensure outputs directory exists
Path('outputs').mkdir(parents=True, exist_ok=True)

# Instantiate the agent (may print logs)
agent = PDFAgent()
print('PDFAgent initialized. Use `agent` to run ingestion, retrieval and analyze_all.')

## Retrieval (RAG)

This cell runs a retrieval query using the agent's `search` API (mode: `enhanced` by default). The returned result will be saved to `outputs/retrieve_result.json`. If the index is empty, run the ingest notebook first.

In [None]:
# Retrieval demo
query = 'What are agentic workflows?'
mode = 'enhanced'
print(f'Running query (mode={mode}): {query}')
try:
    res = agent.search(query, mode=mode)
except RuntimeError:
    print('Search not ready: index may be empty. Run ingestion first.')
    raise

print('--- ANSWER ---')
print(res.get('answer', '(no answer)'))

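out_path = Path('outputs/retrieve_result.json')
# Write UTF-8 explicitly; default=str falls back to str() for values json cannot serialize
with out_path.open('w', encoding='utf-8') as fh:
    json.dump(res, fh, indent=2, default=str)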

print(f'Query result saved to: {out_path.resolve()}')
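
To confirm what was written, the saved file can be read back and summarized. This is a small sketch under assumptions: `summarize_result` is a hypothetical helper, and it assumes the saved result is a JSON object that may contain an `answer` key, as the retrieval cell above suggests. Any other keys depend on what `agent.search` returns and are not assumed here.

```python
import json
from pathlib import Path


def summarize_result(path, preview_chars=200):
    """Load a saved result and return its top-level keys plus a short answer preview."""
    with Path(path).open(encoding="utf-8") as fh:
        data = json.load(fh)
    if not isinstance(data, dict):
        raise TypeError(f"expected a JSON object, got {type(data).__name__}")
    # 'answer' is the only key the retrieval cell relies on; it may be absent.
    answer = str(data.get("answer", ""))
    return {"keys": sorted(data), "answer_preview": answer[:preview_chars]}


result_path = Path("outputs/retrieve_result.json")
if result_path.is_file():
    print(summarize_result(result_path))
```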

## Next steps

- If you plan to use cloud providers (Azure/Poe), verify `system_config.json` and your environment variables before ingesting large corpora.
- For privacy-sensitive data, consider running Ollama locally and selecting it as the embedding/LLM provider in settings.
- Use the `examples/` CLI scripts if you prefer non-interactive runs.
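
If you follow the Ollama suggestion above, it can help to confirm that a local server is running before switching providers. The sketch below is an illustration only: `ollama_is_reachable` is a hypothetical helper, and it assumes Ollama listens on its usual default address `http://localhost:11434` and answers on its `/api/tags` model-list endpoint. How the provider is selected in this project's settings is not shown here.

```python
import http.client
import urllib.request


def ollama_is_reachable(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an HTTP GET to the Ollama model-list endpoint succeeds."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        # URLError and timeouts are OSError subclasses; ValueError covers malformed URLs.
        return False


print("Ollama reachable:", ollama_is_reachable())
```

A `False` result only means nothing answered at that address; the server may be running on a different host or port.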