# Basic Usage of langchain-fused-model

This notebook demonstrates the basic usage of the `MultiModelManager` class for managing multiple LangChain ChatModel instances.

## Installation

First, make sure you have the required packages installed:

```bash
pip install langchain-fused-model langchain-openai langchain-anthropic
```

## Setup

Import the necessary modules and set up your API keys:

```python
import os
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_fused_model import MultiModelManager, ModelConfig, RoutingStrategy

# ChatOpenAI and ChatAnthropic read their keys from these environment variables.
# Uncomment to set them here instead of exporting them in your shell.
# os.environ["OPENAI_API_KEY"] = "your-openai-key"
# os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-key"
```
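It can help to fail fast when a key is missing, before any model is created. The helper below is a hypothetical convenience (it is not part of langchain-fused-model) and only assumes the two environment variable names used above.

```python
import os
from typing import Iterable, Mapping


def missing_api_keys(required: Iterable[str], env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of required environment variables that are unset or empty."""
    return [name for name in required if not env.get(name)]


# Hypothetical check for the two keys this notebook relies on.
missing = missing_api_keys(["OPENAI_API_KEY", "ANTHROPIC_API_KEY"])
if missing:
    print("Missing API keys:", ", ".join(missing))
else:
    print("All API keys found.")
```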

## Creating a Simple MultiModelManager

Let's create a manager with two models:

```python
# Initialize your models
models = [
    ChatOpenAI(model="gpt-3.5-turbo", temperature=0.7),
    ChatAnthropic(model="claude-3-sonnet-20240229", temperature=0.7),
]

# Create a manager without per-model configs, routing by priority
manager = MultiModelManager(
    models=models,
    strategy=RoutingStrategy.PRIORITY
)

print("MultiModelManager created successfully!")
print(f"Managing {len(manager.models)} models")
```

## Basic Invocation

Use the manager just like any LangChain ChatModel:

```python
# Simple question
response = manager.invoke("What is the capital of France?")
print("Response:", response.content)
```
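Passing a plain string works because LangChain chat models convert a string input into a single human message before calling the provider. The sketch below imitates that normalization with plain tuples; `normalize_input` and the `(role, content)` tuple format are illustrative stand-ins, not LangChain's `HumanMessage` classes.

```python
from typing import Union

# (role, content): a simplified stand-in for LangChain message objects
Message = tuple[str, str]


def normalize_input(value: Union[str, list[Message]]) -> list[Message]:
    """Turn a bare string into a one-element message list; pass message lists through."""
    if isinstance(value, str):
        return [("human", value)]
    return list(value)


print(normalize_input("What is the capital of France?"))
# [('human', 'What is the capital of France?')]
```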

## Using with Messages

You can also pass a list of message objects, for example to add a system prompt:

```python
from langchain_core.messages import HumanMessage, SystemMessage

messages = [
    SystemMessage(content="You are a helpful assistant that answers questions concisely."),
    HumanMessage(content="Explain quantum computing in one sentence.")
]

response = manager.invoke(messages)
print("Response:", response.content)
```

## Configuring Models

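Pass a list of `ModelConfig` objects, one per model, to set each model's routing priority and rate limits (`max_rpm` is requests per minute, `max_rps` is requests per second). In this example the first config, with the higher priority, is intended for the first model, and the second config marks the fallback: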

```python
# Create model configurations
configs = [
    ModelConfig(
        priority=100,  # Higher priority
        max_rpm=60,    # 60 requests per minute
        max_rps=2,     # 2 requests per second
    ),
    ModelConfig(
        priority=50,   # Lower priority (fallback)
        max_rpm=120,
        max_rps=5,
    ),
]

# Create manager with configurations
configured_manager = MultiModelManager(
    models=models,
    model_configs=configs,
    strategy=RoutingStrategy.PRIORITY,
    default_fallback=True
)

print("Configured manager created!")
```
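This notebook does not show how `max_rpm` and `max_rps` are enforced. One common approach, sketched here as an assumption rather than the library's actual implementation, is a sliding-window counter per model: a request is allowed only if fewer than `max_rps` requests happened in the last second and fewer than `max_rpm` in the last minute. `SlidingWindowLimiter` is a hypothetical name.

```python
import time
from collections import deque
from typing import Optional


class SlidingWindowLimiter:
    """Hypothetical per-model limiter enforcing both a per-second and a per-minute cap."""

    def __init__(self, max_rps: int, max_rpm: int) -> None:
        self.max_rps = max_rps
        self.max_rpm = max_rpm
        self._timestamps = deque()  # times of accepted requests, oldest first

    def try_acquire(self, now: Optional[float] = None) -> bool:
        """Record a request and return True if both limits allow it; otherwise return False."""
        now = time.monotonic() if now is None else now
        # Forget requests that have left the one-minute window.
        while self._timestamps and now - self._timestamps[0] >= 60:
            self._timestamps.popleft()
        last_second = sum(1 for t in self._timestamps if now - t < 1)
        if last_second >= self.max_rps or len(self._timestamps) >= self.max_rpm:
            return False
        self._timestamps.append(now)
        return True


# Mirrors the first ModelConfig above: max_rps=2, max_rpm=60.
limiter = SlidingWindowLimiter(max_rps=2, max_rpm=60)
print([limiter.try_acquire(now=0.0) for _ in range(3)])  # [True, True, False]
print(limiter.try_acquire(now=1.0))                      # True: the one-second window has moved on
```

Keeping raw timestamps makes both limits exact at the cost of storing up to `max_rpm` entries per model; a token bucket would use constant memory but only approximates a hard per-minute cap.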

## Testing Fallback Behavior

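Let's send several requests in a row. With `PRIORITY` routing, the higher-priority model should handle requests while it stays within its limits, and the manager should fall back to the lower-priority model when the first one is rate-limited or fails. Five sequential requests may never exceed the first model's limits, in which case it answers all of them; the usage statistics in the next section show how many requests each model actually served.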

```python
questions = [
    "What is Python?",
    "What is JavaScript?",
    "What is Rust?",
    "What is Go?",
    "What is TypeScript?"
]

for i, question in enumerate(questions, 1):
    response = configured_manager.invoke(question)
    print(f"\nQuestion {i}: {question}")
    print(f"Answer: {response.content[:100]}...")  # First 100 chars
```
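One plausible reading of the `PRIORITY` configs above is: try models from highest to lowest `priority`, and move on when a model is rate-limited or raises an error. The sketch below models that logic with plain callables; `ModelSlot`, `route_with_fallback`, and the catch-all error policy are assumptions for illustration, not langchain-fused-model internals.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ModelSlot:
    """Hypothetical pairing of a model callable with its routing priority."""
    name: str
    priority: int
    call: Callable[[str], str]
    available: Callable[[], bool] = lambda: True  # e.g. a rate-limiter check


def route_with_fallback(slots: list[ModelSlot], prompt: str) -> tuple[str, str]:
    """Return (model name, answer) from the highest-priority model that succeeds."""
    errors = []
    for slot in sorted(slots, key=lambda s: s.priority, reverse=True):
        if not slot.available():
            errors.append(f"{slot.name}: rate limited")
            continue
        try:
            return slot.name, slot.call(prompt)
        except Exception as exc:  # fall through to the next model on any failure
            errors.append(f"{slot.name}: {exc}")
    raise RuntimeError("All models failed: " + "; ".join(errors))


def flaky(prompt: str) -> str:
    raise TimeoutError("upstream timeout")


slots = [
    ModelSlot("fallback", priority=50, call=lambda p: f"fallback answer to {p!r}"),
    ModelSlot("primary", priority=100, call=flaky),
]
print(route_with_fallback(slots, "What is Python?"))
# ('fallback', "fallback answer to 'What is Python?'")
```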

## Viewing Usage Statistics

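Check how many requests each model handled, how many succeeded, and how many tokens it used. Note that this reads the private `_usage_tracker` attribute, which may change between library versions: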

```python
# Get usage statistics
stats = configured_manager._usage_tracker.get_all_stats()

print("\n=== Usage Statistics ===")
for model_idx, stat in stats.items():
    model_type = models[model_idx]._llm_type
    print(f"\nModel {model_idx} ({model_type}):")
    print(f"  Total requests: {stat.total_requests}")
    print(f"  Successful: {stat.successful_requests}")
    print(f"  Failed: {stat.failed_requests}")

    if stat.total_requests > 0:
        success_rate = stat.successful_requests / stat.total_requests * 100
        print(f"  Success rate: {success_rate:.1f}%")

    print(f"  Total tokens: {stat.total_tokens}")
```
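The loop above guards against dividing by zero when computing the success rate. If you track the same counters in your own wrappers, a small dataclass can hold them and compute the rate in one place. `RequestStats` is a hypothetical stand-in that mirrors the fields printed above; it is not the type returned by `get_all_stats()`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestStats:
    """Hypothetical per-model counters mirroring the fields printed above."""
    total_requests: int = 0
    successful_requests: int = 0
    failed_requests: int = 0
    total_tokens: int = 0

    def record(self, success: bool, tokens: int = 0) -> None:
        """Count one request and add the tokens it consumed."""
        self.total_requests += 1
        if success:
            self.successful_requests += 1
        else:
            self.failed_requests += 1
        self.total_tokens += tokens

    @property
    def success_rate(self) -> Optional[float]:
        """Percentage of successful requests, or None if nothing was recorded yet."""
        if self.total_requests == 0:
            return None
        return self.successful_requests / self.total_requests * 100


example = RequestStats()
example.record(success=True, tokens=120)
example.record(success=False)
print(f"{example.success_rate:.1f}% success, {example.total_tokens} tokens")  # 50.0% success, 120 tokens
```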

## Using in LangChain Chains

Because the manager can be used like a chat model, you can compose it with a prompt template and an output parser using the LCEL `|` operator:

```python
# Import from langchain_core, which the packages installed above depend on;
# the old langchain.prompts / langchain.schema paths need the separate `langchain` package.
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Create a simple chain
prompt = ChatPromptTemplate.from_template(
    "Tell me a {adjective} joke about {topic}"
)

chain = prompt | manager | StrOutputParser()

# Use the chain
result = chain.invoke({"adjective": "funny", "topic": "programming"})
print("\nJoke:", result)
```
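The `|` operator in `prompt | manager | StrOutputParser()` builds a pipeline in which each step's output becomes the next step's input. The toy class below reproduces only that idea in plain Python so the data flow is visible; `Step` and the fake prompt, model, and parser are hypothetical and far simpler than LangChain's `RunnableSequence`.

```python
from typing import Any, Callable


class Step:
    """Toy runnable: wraps a function and supports `a | b` composition."""

    def __init__(self, fn: Callable[[Any], Any]) -> None:
        self.fn = fn

    def invoke(self, value: Any) -> Any:
        return self.fn(value)

    def __or__(self, other: "Step") -> "Step":
        # The composed step feeds this step's output into `other`.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Stand-ins for the prompt template, the chat model, and the output parser.
toy_prompt = Step(lambda d: f"Tell me a {d['adjective']} joke about {d['topic']}")
toy_model = Step(lambda text: {"content": f"(model reply to: {text})"})
toy_parser = Step(lambda message: message["content"])

toy_chain = toy_prompt | toy_model | toy_parser
print(toy_chain.invoke({"adjective": "funny", "topic": "programming"}))
# (model reply to: Tell me a funny joke about programming)
```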

## Batch Processing

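Use `batch` to send several inputs in one call. Assuming the manager keeps LangChain's default `Runnable.batch` behavior, the inputs are processed concurrently and the responses come back in input order: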

```python
# Batch process multiple questions
batch_questions = [
    "What is machine learning?",
    "What is deep learning?",
    "What is a neural network?"
]

responses = manager.batch(batch_questions)

print("\n=== Batch Results ===")
for question, response in zip(batch_questions, responses):
    print(f"\nQ: {question}")
    print(f"A: {response.content[:100]}...")  # First 100 chars
```
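LangChain's default `Runnable.batch` runs inputs on a thread pool and returns results in the same order as the inputs, which is what makes the `zip(batch_questions, responses)` pairing above safe. The sketch below shows that pattern with the standard library only; `batch_invoke` is a hypothetical helper, and its `max_concurrency` argument merely echoes the similarly named LangChain config option.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batch_invoke(fn: Callable[[T], R], inputs: list[T], max_concurrency: int = 4) -> list[R]:
    """Call `fn` on every input concurrently and return results in input order."""
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # Executor.map yields results in the order of `inputs`, not completion order.
        return list(pool.map(fn, inputs))


answers = batch_invoke(str.upper, ["machine learning", "deep learning"], max_concurrency=2)
print(answers)  # ['MACHINE LEARNING', 'DEEP LEARNING']
```

Threads suit this workload because each model call spends most of its time waiting on the network rather than using the CPU.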

## Conclusion

This notebook demonstrated:
- Creating a basic MultiModelManager
- Configuring models with rate limits and priorities
- Making simple and structured requests
- Viewing usage statistics
- Using the manager in LangChain chains
- Batch processing

For more advanced features, check out:
- `routing_strategies.ipynb` - Different routing strategies
- `structured_output.ipynb` - Working with Pydantic models