# Oumi - Using MetaInferenceEngine

The `MetaInferenceEngine` provides a simplified interface for running inference across multiple models and providers.

This notebook demonstrates how to use the `MetaInferenceEngine` to run inference with different models.

In [None]:
import os
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Set your API keys here, or leave these lines commented out to use keys already in the environment
# os.environ["OPENAI_API_KEY"] = "your-openai-api-key"
# os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-api-key"
# os.environ["GOOGLE_API_KEY"] = "your-google-api-key"
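
Before calling any provider, it can help to confirm which keys are actually present. The following is a minimal sketch using only the standard library; `PROVIDER_KEYS` and `missing_api_keys` are hypothetical helper names (not part of Oumi), and the variable names are the three shown in the cell above.

```python
import os

# Environment variable names taken from the commented-out lines above.
PROVIDER_KEYS = {
    "OpenAI": "OPENAI_API_KEY",
    "Anthropic": "ANTHROPIC_API_KEY",
    "Google": "GOOGLE_API_KEY",
}


def missing_api_keys(env=None):
    """Return the provider names whose API key is unset or empty."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_KEYS.items() if not env.get(var)]


missing = missing_api_keys()
if missing:
    print(f"No API key found for: {', '.join(missing)}")
else:
    print("All provider API keys are set.")
```

Providers reported as missing will most likely hit the `except` branch in the corresponding cells below.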

## Setup the Conversation

First, let's create a conversation to use with our inference engine:

In [None]:
from oumi.core.types.conversation import Conversation, Message, Role

# Create a conversation with a user prompt
conversation = Conversation(messages=[
    Message(role=Role.USER, content="Explain quantum computing in simple terms.")
])

print(conversation)

## Initialize the MetaInferenceEngine

Now, let's initialize the `MetaInferenceEngine` with some generation parameters:

In [None]:
from oumi.inference import MetaInferenceEngine

# Initialize the engine with generation parameters
engine = MetaInferenceEngine(
    temperature=0.7,     # Controls randomness (0.0 = near-deterministic; higher values = more random)
    max_tokens=1000,     # Maximum number of tokens to generate (converted to max_new_tokens)
    top_p=0.95           # Nucleus sampling: sample only from the most likely tokens whose cumulative probability reaches top_p
)
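
The comment on `max_tokens` mentions a conversion to `max_new_tokens`. As a rough illustration of what such a parameter normalization could look like, here is a sketch; `normalize_generation_params` is a hypothetical function written for this notebook, and Oumi's actual conversion logic may differ.

```python
def normalize_generation_params(**params):
    """Rename `max_tokens` to `max_new_tokens`, leaving other keys untouched."""
    normalized = dict(params)
    if "max_tokens" in normalized:
        max_tokens = normalized.pop("max_tokens")
        # Assumption: an explicitly supplied max_new_tokens wins if both are given.
        normalized.setdefault("max_new_tokens", max_tokens)
    return normalized


print(normalize_generation_params(temperature=0.7, max_tokens=1000, top_p=0.95))
```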

## Run Inference with OpenAI Models

First, let's try running inference with an OpenAI model:

In [None]:
try:
    # Run inference with GPT-4o
    gpt4_response = engine.infer([conversation], model_name="gpt-4o")
    
    # Print the response
    print("=== GPT-4o Response ===\n")
    print(gpt4_response[0].messages[-1].content)
except Exception as e:
    print(f"Error running inference with OpenAI: {e}")

## Run Inference with Anthropic Models

Now, let's try using an Anthropic model:

In [None]:
try:
    # Run inference with Claude
    claude_response = engine.infer([conversation], model_name="claude-3-sonnet")
    
    # Print the response
    print("=== Claude Response ===\n")
    print(claude_response[0].messages[-1].content)
except Exception as e:
    print(f"Error running inference with Anthropic: {e}")

## Run Inference with Gemini Models

Let's try a Google Gemini model:

In [None]:
try:
    # Run inference with Gemini
    gemini_response = engine.infer([conversation], model_name="gemini-pro")
    
    # Print the response
    print("=== Gemini Response ===\n")
    print(gemini_response[0].messages[-1].content)
except Exception as e:
    print(f"Error running inference with Gemini: {e}")
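
The three cells above repeat the same try/except pattern. One way to compare several models in a single pass is to loop over the model names and collect replies and errors separately. This is a sketch only: `run_models` and `fake_infer` are hypothetical names, and the stand-in function replaces the real engine call so the cell runs without API keys.

```python
from typing import Callable, Dict, List, Tuple


def run_models(
    infer_fn: Callable[[str], str],
    model_names: List[str],
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Call infer_fn once per model; return (responses, errors) keyed by model name."""
    responses: Dict[str, str] = {}
    errors: Dict[str, str] = {}
    for name in model_names:
        try:
            responses[name] = infer_fn(name)
        except Exception as exc:  # one failing provider should not stop the rest
            errors[name] = str(exc)
    return responses, errors


# Stand-in for a real call, so the sketch runs without API keys.
def fake_infer(model_name: str) -> str:
    if model_name == "gemini-pro":
        raise RuntimeError("missing API key")
    return f"(reply from {model_name})"


responses, errors = run_models(fake_infer, ["gpt-4o", "claude-3-sonnet", "gemini-pro"])
print(responses)
print(errors)
```

To use it with the engine from this notebook, `infer_fn` could be `lambda name: engine.infer([conversation], model_name=name)[0].messages[-1].content`, which mirrors the calls in the cells above.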

## Using Fully Qualified Model Names

The `MetaInferenceEngine` supports fully qualified model names in the format `engine/model`:

In [None]:
try:
    # Using the vLLM engine with a Llama 3.1 model
    vllm_response = engine.infer([conversation], model_name="vllm/llama3.1-8b")
    
    # Using the Together API with a Llama 3.1 70B model
    together_response = engine.infer([conversation], model_name="together/llama3.1-70b")
    
    # Print the responses
    print("=== VLLM Engine Response ===\n")
    print(vllm_response[0].messages[-1].content)
    
    print("\n=== Together Engine Response ===\n")
    print(together_response[0].messages[-1].content)
except Exception as e:
    print(f"Error running inference with qualified names: {e}")
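
To make the `engine/model` format concrete, here is a sketch of how such a name could be split. It is not Oumi's implementation: `KNOWN_ENGINES` and `split_model_name` are hypothetical, and the prefix set contains only the two engines used in this notebook. Checking the prefix against a known set matters because plain model identifiers such as `meta-llama/Llama-3-8b` also contain a slash.

```python
from typing import Optional, Tuple

# Hypothetical set of engine prefixes; only these two appear in this notebook.
KNOWN_ENGINES = {"vllm", "together"}


def split_model_name(model_name: str) -> Tuple[Optional[str], str]:
    """Split "engine/model" into (engine, model); return (None, name) otherwise."""
    prefix, sep, rest = model_name.partition("/")
    if sep and rest and prefix.lower() in KNOWN_ENGINES:
        return prefix.lower(), rest
    # No recognized engine prefix: treat the whole string as the model name.
    return None, model_name


for name in ["vllm/llama3.1-8b", "together/llama3.1-70b", "gpt-4o", "meta-llama/Llama-3-8b"]:
    print(name, "->", split_model_name(name))
```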

## Using CLI Aliases

You can also pass a CLI alias defined in Oumi as the model name:

In [None]:
try:
    # Using a CLI alias for a Claude model
    claude_response = engine.infer([conversation], model_name="claude-3-7-sonnet")
    
    # Print the response
    print("=== Claude 3.7 Sonnet (Alias) Response ===\n")
    print(claude_response[0].messages[-1].content)
except Exception as e:
    print(f"Error running inference with CLI alias: {e}")

## Run Inference with Local Models

If you have local models available, you can also use those:

In [None]:
try:
    # Run inference with a local Llama model
    # Note: This requires having the model downloaded locally
    llama_response = engine.infer([conversation], model_name="meta-llama/Llama-3-8b")
    
    # Print the response
    print("=== Llama Response ===\n")
    print(llama_response[0].messages[-1].content)
except Exception as e:
    print(f"Error running inference with local model: {e}")

## Using Different Generation Parameters

You can create multiple engines with different generation parameters:

In [None]:
# Create an engine with high creativity
creative_engine = MetaInferenceEngine(temperature=0.9, top_p=0.98)

# Create an engine with low creativity (more focused)
focused_engine = MetaInferenceEngine(temperature=0.2, top_p=0.5)

# Create a conversation with a creative prompt
creative_conversation = Conversation(messages=[
    Message(role=Role.USER, content="Create a short poem about artificial intelligence.")
])

try:
    # Get creative response
    creative_response = creative_engine.infer([creative_conversation], model_name="gpt-4o")
    
    # Get focused response
    focused_response = focused_engine.infer([creative_conversation], model_name="gpt-4o")
    
    # Print both responses
    print("=== Creative (High Temperature) Response ===\n")
    print(creative_response[0].messages[-1].content)
    
    print("\n=== Focused (Low Temperature) Response ===\n")
    print(focused_response[0].messages[-1].content)
except Exception as e:
    print(f"Error running inference: {e}")
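
To see why a lower temperature gives more focused output, consider the textbook formulation in which scores are divided by the temperature before a softmax. The sketch below uses made-up scores for three candidate tokens; it illustrates the general idea only, and individual providers may implement sampling differently.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.2, 0.9):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: " + ", ".join(f"{p:.3f}" for p in probs))
```

At the lower temperature almost all probability mass lands on the top-scoring token, while the higher temperature leaves the alternatives a realistic chance of being sampled.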

## Using Custom API Keys

To override the API key for a single request, pass it through `remote_params`:

In [None]:
try:
    # Pass an API key explicitly for this request (read from the environment here as a placeholder)
    response = engine.infer(
        [conversation], 
        model_name="gpt-4o", 
        remote_params={"api_key": os.environ.get("OPENAI_API_KEY")}
    )
    
    print("=== Response with Custom API Key ===\n")
    print(response[0].messages[-1].content)
except Exception as e:
    print(f"Error running inference with custom API key: {e}")

## Conclusion

The `MetaInferenceEngine` provides a simplified interface for working with multiple models and providers. It automatically selects the appropriate engine based on the model name, handles parameter conversion, and provides a consistent interface for all models.

Key benefits:
- Simplified interface for multiple models
- Automatic engine selection
- Support for fully qualified model names and CLI aliases
- Parameter normalization
- Engine caching for better performance
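
The engine caching mentioned in the last bullet can be pictured as building each underlying engine once and reusing it on later requests. The sketch below is an assumption about the general pattern, not Oumi's code; `EngineCache` and `make_engine` are hypothetical names, and `object()` stands in for an engine that is expensive to construct.

```python
class EngineCache:
    """Build one engine object per engine name and reuse it on later requests."""

    def __init__(self, factory):
        self._factory = factory  # callable: engine name -> engine object
        self._engines = {}

    def get(self, engine_name):
        if engine_name not in self._engines:
            self._engines[engine_name] = self._factory(engine_name)
        return self._engines[engine_name]


created = []  # records each construction, to show the cache at work


def make_engine(engine_name):
    created.append(engine_name)
    return object()  # placeholder for an expensive-to-build engine


cache = EngineCache(make_engine)
first = cache.get("vllm")
second = cache.get("vllm")
cache.get("together")
print(first is second, created)
```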