# Autogen Enhanced Inference


From the documentation:

"autogen.OpenAIWrapper provides enhanced LLM inference for openai>=1. autogen.Completion is a drop-in replacement of openai.Completion and openai.ChatCompletion for enhanced LLM inference using openai<1. There are a number of benefits of using autogen to perform inference: performance tuning, API unification, caching, error handling, multi-config inference, result filtering, templating and so on." 

References:

- [Enhanced Inference](https://microsoft.github.io/autogen/docs/Use-Cases/enhanced_inference)

## Install the required packages

In [None]:
%pip install -q python-dotenv==1.0.1 openai==1.35.9 gradio==4.39.0 pyautogen==0.2.32

## Client configuration

In [None]:
import os
from autogen import OpenAIWrapper
from dotenv import load_dotenv

load_dotenv("../../.env")

model = os.getenv("GPT_MODEL")
endpoint = os.getenv("ENDPOINT")
api_key = os.getenv("API_KEY")
api_version = os.getenv("API_VERSION")

config_list = [
    {
        "base_url": endpoint,
        "api_key": api_key,
        "model": model,
        "api_type": "azure",
        "api_version": api_version
    }
]
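
`os.getenv` returns `None` for any variable that is not set, so a missing entry in the `.env` file only surfaces later as a failed request. A minimal sketch of an early check is shown below. The variable names come from the cell above; the `missing_settings` helper and the `REQUIRED_SETTINGS` tuple are hypothetical names introduced here for illustration and are not part of autogen or python-dotenv.

```python
import os
from typing import Mapping

# Names read by the configuration cell above; the helper itself is hypothetical.
REQUIRED_SETTINGS = ("GPT_MODEL", "ENDPOINT", "API_KEY", "API_VERSION")


def missing_settings(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or blank in env."""
    return [name for name in REQUIRED_SETTINGS if not (env.get(name) or "").strip()]


# Report gaps before building config_list instead of failing at request time.
missing = missing_settings()
if missing:
    print("Missing settings:", ", ".join(missing))
```

Running this right after `load_dotenv(...)` makes a wrong `.env` path or a misspelled variable name visible immediately.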

## Inference with caching

In [None]:
client: OpenAIWrapper = OpenAIWrapper(config_list=config_list)
messages = [{"role": "user", "content": "What are some Python learning tips?"}]

# First call: sent to the model endpoint; the response is stored in the cache.
response = client.create(messages=messages, model=model)
print(client.extract_text_or_completion_object(response))
client.print_usage_summary()
print(response.cost)

# Second, identical call: should be served from the cache (caching is on by default).
response = client.create(messages=messages, model=model)
print(client.extract_text_or_completion_object(response))
client.print_usage_summary()
print(response.cost)
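
To show why the second identical request costs nothing extra, here is a toy model of seed-keyed caching. It is a conceptual sketch only and not how autogen implements its cache: `ToySeededCache`, `fake_backend`, and the default seed value are all assumptions made up for this illustration, and no network call is made. Passing `seed=None` bypasses the cache, which is the behavior the `cache_seed=None` setting in the next section asks for.

```python
import json
from typing import Callable, Optional


class ToySeededCache:
    """Illustrative in-memory cache keyed by (seed, request); not autogen's code."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}
        self.backend_calls = 0  # how many times the "model" was actually called

    def create(self, request: dict, backend: Callable[[dict], str],
               seed: Optional[int] = 41) -> str:
        # The default seed is arbitrary for this sketch.
        if seed is None:
            # Caching disabled: always call the backend, never store anything.
            self.backend_calls += 1
            return backend(request)
        # Identical request + identical seed gives an identical key.
        key = json.dumps({"seed": seed, "request": request}, sort_keys=True)
        if key not in self._store:
            self.backend_calls += 1
            self._store[key] = backend(request)
        return self._store[key]


def fake_backend(request: dict) -> str:
    """Stand-in for a model call; simply echoes the user message."""
    return f"echo: {request['messages'][0]['content']}"


cache = ToySeededCache()
request = {"messages": [{"role": "user", "content": "What are some Python learning tips?"}]}

first = cache.create(request, fake_backend)
second = cache.create(request, fake_backend)     # cache hit, backend not called
calls_after_cached_pair = cache.backend_calls

cache.create(request, fake_backend, seed=None)   # cache bypassed
calls_after_uncached = cache.backend_calls

print(calls_after_cached_pair, calls_after_uncached)
```

In this toy model, changing the seed also produces a fresh backend call, because the seed is part of the cache key.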


## Disable caching

In [None]:
# cache_seed=None turns caching off, so every call goes to the model endpoint.
client: OpenAIWrapper = OpenAIWrapper(config_list=config_list, cache_seed=None)
response = client.create(messages=[{"role": "user", "content": "Python learning tips."}], model=model)
print(client.extract_text_or_completion_object(response))
client.print_usage_summary()
print(response.cost)