Rumour has it that calling GPT-3.5-turbo is much faster on an Azure OpenAI deployment than on the endpoint provided directly by OpenAI.

Let's put it to the test!

...

We'll start by installing the openai module and creating two functions that configure it for either an Azure OpenAI endpoint or the public OpenAI endpoint.
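
The configuration functions below contain placeholders for the API keys. As a minimal sketch of one way to avoid pasting keys into the notebook, the helper here reads a secret from an environment variable. The helper name `read_secret` and the variable name in the usage note are hypothetical and not part of the original notebook.

```python
import os

def read_secret(name, env=None):
    """Return the stripped value of environment variable `name`.

    `env` defaults to os.environ; a plain dict can be passed for testing.
    Raises RuntimeError if the variable is missing or empty.
    """
    env = os.environ if env is None else env
    value = env.get(name, '').strip()
    if not value:
        raise RuntimeError(f'Environment variable {name} is not set.')
    return value
```

Under that assumption, a configuration function could use `openai.api_key = read_secret('AZURE_OPENAI_API_KEY')` in place of the literal placeholder string.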

In [1]:
# The code below uses the pre-1.0 interface (openai.ChatCompletion), so pin the version.
%pip install "openai<1"
from IPython.display import clear_output; clear_output()

In [2]:
import openai

def set_config_azure():
    openai.api_type = 'azure'
    openai.api_key = ' ... ' # Replace with Azure Open AI API key
    openai.api_base = ' ... ' # Replace with Azure Open AI API endpoint
    openai.api_version = '2023-03-15-preview'
    global deployment_id
    deployment_id = 'gpt-35-turbo'
    global model
    model = None

def set_config_openai():
    openai.api_type = 'open_ai'
    openai.api_key = ' ... ' # Replace with OpenAI API key
    openai.api_base = 'https://api.openai.com/v1'
    openai.api_version = None
    global deployment_id
    deployment_id = None
    global model
    model = 'gpt-3.5-turbo'


To run the test, we want to call both endpoints with many different prompts and time the execution. But where do we get a variety of prompts from? Well ... luckily we have a sophisticated chatbot to hand :). We'll call GPT-3.5-turbo itself (picking one of the two endpoints at random for each request, since we don't know which one is faster) and ask it to generate prompts for us.
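
The generation step relies on the model wrapping its answer in triple quotes, so that the prompt can be cut out of the reply. A minimal sketch of that extraction as a standalone function follows. The name `extract_prompt` is hypothetical, and unlike the notebook's one-line `split`, this version assumes that a reply without a closing triple quote should be rejected.

```python
def extract_prompt(response_text):
    """Return the text between the first pair of triple quotes, or None.

    None is returned when the reply has no complete pair of triple
    quotes or when the quoted text is empty.
    """
    parts = response_text.split('"""')
    if len(parts) < 3:
        # Fewer than two triple-quote markers: nothing to extract.
        return None
    prompt = parts[1].strip()
    return prompt or None
```

A caller could skip any reply for which the function returns `None` and request another prompt.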

In [3]:
import random
import time

prompts = []

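# Keep requesting prompts until 23 have been collected; failed attempts are simply retried.
while len(prompts) < 23:
    try:
        time.sleep(1)  # Short pause between requests.
        num_tokens = random.randint(50, 1950)  # Target prompt length in tokens.
        # Choose one of the two endpoints at random for this request.
        if random.uniform(0.0, 1.0) < 0.5: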
            set_config_azure()
        else:
            set_config_openai()
        start = time.time()
        completion = openai.ChatCompletion.create(
            deployment_id=deployment_id,
            model=model,
            messages=[
                {'role': 'system',
                'content': (
                    'You are an AI testing assistant. Your role is to create a prompt, which the user will then use to test the chatbot. '
                    'You should generate a prompt at the token length specified by the user. '
                    'You should use the allowed token length to generate a prompt that is maximally complex. '
                    'Your only response should be the prompt, tripled quoted. '
                    'Example: """This is a prompt.""" '
                    'The content of the prompt should always end with a question or instruction for a chatbot. ')},
                {'role': 'user', 'content': f'Generate a prompt with a length of {num_tokens} tokens.'},
            ],
            max_tokens=num_tokens * 2,
            temperature=0.7,
        )
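        # The model is told to wrap the prompt in triple quotes; keep the text after the first '"""'.
        # A reply without triple quotes raises IndexError, which is caught below.
        prompt = completion['choices'][0]['message']['content'].split('"""')[1]
        prompts.append(prompt)
        print(f'Generated prompt {len(prompts)} with {openai.api_type} in {(time.time() - start):.2f} seconds.')
    except Exception as e:
        # Report the problem (API error or unparsable reply) and try again.
        print(e)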


Generated prompt 1 with open_ai in 9.32 seconds.
Generated prompt 2 with open_ai in 9.62 seconds.
Generated prompt 3 with open_ai in 7.31 seconds.
Generated prompt 4 with azure in 2.48 seconds.
Generated prompt 5 with azure in 2.68 seconds.
Generated prompt 6 with open_ai in 7.78 seconds.
Generated prompt 7 with azure in 2.29 seconds.
Generated prompt 8 with open_ai in 5.46 seconds.
Generated prompt 9 with azure in 2.01 seconds.
Generated prompt 10 with open_ai in 8.93 seconds.
Generated prompt 11 with azure in 2.64 seconds.
Generated prompt 12 with open_ai in 7.78 seconds.
Generated prompt 13 with azure in 1.82 seconds.
Generated prompt 14 with azure in 2.68 seconds.
Generated prompt 15 with azure in 1.52 seconds.
Generated prompt 16 with open_ai in 6.95 seconds.
Generated prompt 17 with azure in 2.57 seconds.
Generated prompt 18 with azure in 3.61 seconds.
Generated prompt 19 with open_ai in 4.34 seconds.
Generated prompt 20 with open_ai in 5.71 seconds.
Generated prompt 21 with azur

Now that we have 23 different prompts, we'll run each with both the Azure OpenAI endpoint and the public OpenAI endpoint. We'll use different (randomly selected) temperatures and maximum lengths to try different completion scenarios. We'll also sleep a bit between each round, to reduce the chance that rate limiting affects the results. For each prompt, we'll time the completion for both endpoints.
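
The timing approach described here can be sketched independently of the API. The two helpers below are hypothetical and not taken from the notebook: `timed` measures one call with `time.perf_counter`, and `run_round` calls each endpoint in shuffled order. The shuffling is an added assumption about what a fairer comparison might look like, since the notebook's own loop always calls Azure first.

```python
import random
import time

def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

def run_round(callers, rng=random):
    """Time each zero-argument callable in `callers` once, in random order.

    `callers` maps an endpoint name to a callable that performs one
    completion. Returns a dict mapping each name to elapsed seconds.
    """
    names = list(callers)
    rng.shuffle(names)  # Avoid always measuring the same endpoint first.
    return {name: timed(callers[name])[1] for name in names}
```

Each callable would wrap one `openai.ChatCompletion.create` call together with the matching configuration function.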

...

For our amusement, we can also read through the prompts and responses ... all generated by GPT-3.5-turbo.

In [4]:
from textwrap import wrap

azure_times = []
openai_times = []

for prompt in prompts:
    time.sleep(1)

    print('Prompt:')
    print('\n'.join(wrap(prompt)))

    # Use the same randomly chosen completion limit and temperature for both endpoints.
    max_tokens = random.randint(500, 2500)
    temperature = random.uniform(0.0, 1.0)

    set_config_azure()
    start = time.time()
    completion = openai.ChatCompletion.create(
        deployment_id=deployment_id,
        model=model,
        messages=[{'role': 'user', 'content': prompt}],
        max_tokens=max_tokens,
        temperature=temperature,
    )
    azure_times.append(time.time() - start)
    print()
    print(f'Azure ({azure_times[-1]:.2f} seconds):')
    print('\n'.join(wrap(completion['choices'][0]['message']['content'])))

    set_config_openai()
    start = time.time()
    completion = openai.ChatCompletion.create(
        deployment_id=deployment_id,
        model=model,
        messages=[{'role': 'user', 'content': prompt}],
        max_tokens=max_tokens,
        temperature=temperature,
    )
    openai_times.append(time.time() - start)
    print()
    print(f'OpenAI ({openai_times[-1]:.2f} seconds):')
    print('\n'.join(wrap(completion['choices'][0]['message']['content'])))

    print('-------------------------')
    print()

Prompt:
As a new user, I have some questions regarding your product. Can you
please explain how your product works? How does it differ from your
competitors? What are the main benefits of your product? Is there a
free trial available? How long does the free trial last? Can I cancel
my subscription at any time? What payment options do you accept? Is
there any discount available for students? How do I contact your
customer support team in case of any issues? Can you provide me with
some customer reviews and testimonials? Are there any upcoming
features or updates that I should be aware of? Thank you for your time
and I look forward to hearing from you soon!

Azure (13.03 seconds):
As an AI language model, I am not affiliated with any particular
product or company. However, I can provide you with some general
information that may be helpful.  1. How does your product work?   The
specific details of how a product works will depend on the type of
product you are referring to. Generally spea

In [5]:
avg_azure_times = sum(azure_times) / len(azure_times)
avg_openai_times = sum(openai_times) / len(openai_times)
factor = avg_openai_times / avg_azure_times

print(f'Azure: {avg_azure_times:.2f} seconds per chat completion on average')
print(f'OpenAI: {avg_openai_times:.2f} seconds per chat completion on average')
print(f'Azure Open AI is {factor:.2f} times faster than OpenAI')

Azure: 5.95 seconds per chat completion on average
OpenAI: 19.64 seconds per chat completion on average
Azure Open AI is 3.30 times faster than OpenAI


So no, this is not exactly scientific, and there are probably factors involved that we don't know about. But at least according to this benchmark, a chat completion with GPT-3.5-turbo on Azure OpenAI is about 3.3 times faster on average than on the public OpenAI endpoint.
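
Averages can be dominated by a few very slow calls, so it may be worth looking at medians and per-prompt ratios as well. The following is a minimal sketch of such a summary, assuming two equally long lists of timings like `azure_times` and `openai_times` above. The function name `summarize` and the keys of the returned dict are hypothetical.

```python
import statistics

def summarize(azure_times, openai_times):
    """Summarize paired timings (same prompt at the same index in both lists)."""
    pairs = list(zip(azure_times, openai_times))
    # Per-prompt ratio: how many times longer the OpenAI call took.
    ratios = [o / a for a, o in pairs]
    return {
        'median_azure': statistics.median(azure_times),
        'median_openai': statistics.median(openai_times),
        'median_ratio': statistics.median(ratios),
        'azure_faster_count': sum(a < o for a, o in pairs),
    }
```

Calling `summarize(azure_times, openai_times)` after the benchmark loop would show whether the gap holds for the typical prompt and not only for the mean.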