# Building an Agentic RAG System with Instantly

This notebook demonstrates how to build an advanced Retrieval-Augmented Generation (RAG) system using Instantly and Hugging Face's inference capabilities. A RAG system pairs a large language model with retrieval over an external knowledge base, so its answers are grounded in retrieved documents rather than in the model's parameters alone.

## What is Agentic RAG?

Agentic RAG extends traditional RAG by putting an agent in control of retrieval, so the system decides what to search for, how many times to search, and when it has enough evidence:

1. **Query Optimization**: The agent formulates retrieval-friendly queries
2. **Multi-Step Retrieval**: The agent performs multiple retrievals as needed
3. **Reasoning**: The agent analyzes and synthesizes information from multiple sources
4. **Self-Improvement**: The agent critiques and refines its retrieval strategy
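
The four behaviors above can be sketched as a small loop. This is a minimal illustration, not the system built later in this notebook: the toy corpus, the keyword-overlap retriever, and the `rewrite` and `is_sufficient` callbacks are all hypothetical stand-ins. In a real agent, the two callbacks would be LLM calls.

```python
import re

# Toy corpus standing in for a real knowledge base (hypothetical content)
CORPUS = [
    "BM25 ranks documents by term frequency and inverse document frequency.",
    "A text splitter breaks long documents into overlapping chunks.",
    "Retrieval-augmented generation grounds answers in retrieved passages.",
]

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def keyword_retrieve(query, k=1):
    """Return up to k corpus entries ranked by token overlap with the query."""
    terms = tokenize(query)
    scored = [(len(terms & tokenize(doc)), doc) for doc in CORPUS]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]

def agentic_retrieve(question, rewrite, is_sufficient, max_steps=3):
    """Rewrite the query, retrieve, then critique; repeat until the evidence suffices."""
    evidence = []
    for _ in range(max_steps):
        query = rewrite(question, evidence)       # query optimization
        for doc in keyword_retrieve(query):       # one retrieval step of possibly many
            if doc not in evidence:
                evidence.append(doc)
        if is_sufficient(question, evidence):     # self-critique of the evidence so far
            break
    return evidence

# Stub policies: a real agent would ask an LLM to produce these decisions
def rewrite(question, evidence):
    return "bm25 ranks documents" if not evidence else "splitter chunks documents"

def is_sufficient(question, evidence):
    return len(evidence) >= 2

evidence = agentic_retrieve("How are documents ranked and chunked?", rewrite, is_sufficient)
for doc in evidence:
    print("-", doc)
```

The loop stops after two retrieval steps here because the stub critic is satisfied once two distinct passages have been collected; `max_steps` bounds the loop if the critic is never satisfied.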

In this notebook, we'll build a complete Agentic RAG system that can answer questions about machine learning by retrieving information from documentation and papers.

## Setup

First, let's install and import the required dependencies. We'll use:
- `instantly`: For interfacing with Hugging Face models
- `langchain`: For document processing and retrieval
- `rank_bm25`: The BM25 implementation that `BM25Retriever` relies on
- `datasets`: For loading our knowledge base
- `python-dotenv`: For managing environment variables

In [None]:
# Install required packages
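!pip install instantly langchain langchain-community langchain-text-splitters datasets python-dotenv rank_bm25 sentence-transformers --upgrade

# Import necessary packages
import os
from dotenv import load_dotenv
from instantly import OpenAIClient
import datasets
# Document and the text splitter are imported from langchain-core and
# langchain-text-splitters; the older langchain.docstore and
# langchain.text_splitter paths are deprecated
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter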
from langchain_community.retrievers import BM25Retriever

# Load environment variables
load_dotenv()

# Initialize Instantly client
client = OpenAIClient(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"]
)
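
Note that `os.environ["HF_TOKEN"]` raises a bare `KeyError` when the token is missing, which can be confusing on a first run. A minimal sketch of a friendlier lookup follows; `require_env` is a hypothetical helper, not part of any library used here.

```python
import os

def require_env(name):
    """Return the value of an environment variable, or raise a descriptive error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to a .env file or export it "
            "before starting the notebook."
        )
    return value

# Demonstration with a throwaway variable so no real token is touched
os.environ["DEMO_TOKEN"] = "abc123"
print(require_env("DEMO_TOKEN"))
```

With this helper, the client could be given `api_key=require_env("HF_TOKEN")` instead of indexing `os.environ` directly.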

## Hub API Integration

The Hugging Face Hub exposes APIs for querying Inference Providers. With the `huggingface_hub` library, we can:
1. List models by provider and task
2. Check model inference status
3. Get provider information for specific models

Let's explore these capabilities:

In [None]:
# Example 1: List models by provider
from huggingface_hub import list_models

# List all models served by Fireworks AI
fireworks_models = list(list_models(inference_provider="fireworks-ai"))
print("Fireworks AI Models:")
for model in fireworks_models[:5]:  # Show first 5 models
    print(f"- {model.id}")

# Example 2: Get model status and provider information
from huggingface_hub import model_info

# Check model status
model_name = "google/gemma-3-27b-it"
info = model_info(model_name, expand=["inference", "inferenceProviderMapping"])

print(f"\nModel: {model_name}")
print(f"Inference Status: {info.inference}")
print("\nProviders:")
for provider, mapping in info.inference_provider_mapping.items():
    print(f"- {provider}:")
    print(f"  Status: {mapping.status}")
    print(f"  Task: {mapping.task}")
    print(f"  Provider ID: {mapping.provider_id}")
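
Once the provider mapping is available, a common next step is to keep only the providers that are actually serving the model. The sketch below is a hypothetical helper that runs on stand-in objects so it needs no network access. It assumes each entry exposes `status` and `task` attributes as printed above, and that a serving provider reports the status `"live"`. It also accepts a list of entries with a `provider` attribute, on the assumption that some `huggingface_hub` versions return a list instead of a dict; check this against your installed version.

```python
from types import SimpleNamespace

def live_providers(mapping, task=None):
    """Return sorted provider names whose status is 'live', optionally for one task."""
    if isinstance(mapping, dict):
        entries = list(mapping.items())
    else:
        # Assumed list shape: each entry names its provider in a `provider` attribute
        entries = [(entry.provider, entry) for entry in (mapping or [])]
    return sorted(
        name
        for name, entry in entries
        if entry.status == "live" and (task is None or entry.task == task)
    )

# Stand-in data mimicking the fields printed above (values are illustrative)
demo_mapping = {
    "fireworks-ai": SimpleNamespace(status="live", task="conversational"),
    "example-provider": SimpleNamespace(status="staging", task="conversational"),
}
print(live_providers(demo_mapping, task="conversational"))
```

In the notebook, the same call would be `live_providers(info.inference_provider_mapping)`.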

## Knowledge Base Preparation

We'll prepare our knowledge base from the Hugging Face documentation dataset. The preparation involves four steps:
1. Loading the documentation
2. Filtering to relevant content
3. Converting to Document objects
4. Splitting into manageable chunks
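
To make the shape of the data at each step concrete, here is a dependency-free sketch of steps 2 through 4. It is illustrative only: the row schema (`text`, `source`), the `docs/transformers/` prefix, the `Doc` class, and the fixed-size `chunk_text` function are assumptions. In particular, `chunk_text` is a simplified stand-in for `RecursiveCharacterTextSplitter`, which additionally prefers to split on natural separators.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into fixed-size character windows sharing `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once the remaining tail is already covered by the previous window
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

# Step 1 stand-in: rows as they might come from a documentation dataset
rows = [
    {"text": "Pipelines wrap a model together with its preprocessing and "
             "postprocessing steps so inference takes a single call.",
     "source": "docs/transformers/pipelines.md"},
    {"text": "Unrelated release notes.", "source": "docs/other/changelog.md"},
]

# Step 2: filter to relevant content
relevant = [row for row in rows if row["source"].startswith("docs/transformers/")]

# Step 3: convert to document objects
docs = [Doc(row["text"], {"source": row["source"]}) for row in relevant]

# Step 4: split into chunks, copying the metadata onto each chunk
chunks = [
    Doc(piece, dict(doc.metadata))
    for doc in docs
    for piece in chunk_text(doc.page_content)
]

for chunk in chunks:
    print(repr(chunk.page_content), chunk.metadata["source"])
```

The overlap means a sentence cut at a chunk boundary still appears intact in at least one neighboring chunk, which helps a keyword retriever such as BM25 match it.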