# Temperature, Max Tokens, and Streaming

Welcome back!

In this section, we’ve discussed the roles we can employ when passing a message to a model: a system, user, or assistant role. 
We then used a system message to create a chatbot that adopts a sarcastic persona.

In this lesson, we’ll discuss a few more parameters that can affect the model’s response:
<ul>
  <li>The maximum number of completion tokens,</li>
  <li>The level of randomness, and</li>
  <li>The option to stream the response.</li>
</ul>

Let’s begin with the tokens. 

As discussed earlier, OpenAI prices its models by the number of input tokens (the ones we feed to the model) and the number of completion tokens (those the model generates). Both are capped to prevent users from submitting an excessive number of tokens and to stop the models from generating endless text. Still, it’s essential to have additional control over the amount of text generated. Models tend to be wordy and often provide more information than needed, which becomes costly in the long run since we pay for every token the model outputs.

As a side note, this pricing only applies when using OpenAI’s models through the API. In contrast, the ChatGPT platform is subscription-based rather than token-based, so there you don’t need to worry about the cost of long responses.
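
To make the pricing idea concrete, here is a minimal sketch of a per-request cost estimate. The two prices are hypothetical placeholders, not OpenAI’s actual rates, and `estimate_cost` is a helper name invented for this illustration.

```python
# Hypothetical per-1K-token prices in USD (assumptions for illustration only;
# check the provider's pricing page for real values).
INPUT_PRICE_PER_1K = 0.03
OUTPUT_PRICE_PER_1K = 0.06


def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  input_price_per_1k: float = INPUT_PRICE_PER_1K,
                  output_price_per_1k: float = OUTPUT_PRICE_PER_1K) -> float:
    """Return the estimated cost of one request in USD."""
    input_cost = prompt_tokens / 1000 * input_price_per_1k
    output_cost = completion_tokens / 1000 * output_price_per_1k
    return input_cost + output_cost


# Worst case for a 100-token prompt when the completion is capped at 250 tokens.
worst_case = estimate_cost(prompt_tokens=100, completion_tokens=250)
print(f"Worst-case cost: ${worst_case:.4f}")
```

Because the completion can never exceed its cap, a token limit also gives us an upper bound on what a single request can cost.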

All right. 

It’s time we see the token parameter in action!

In [1]:
%load_ext dotenv
%dotenv

In [2]:
import os
import openai  # this lesson uses the legacy (pre-1.0) interface, openai.ChatCompletion

In [3]:
openai.api_key = os.getenv('OPENAI_API_KEY')

When defining the **completion** variable, keep the system message the same. However, let’s test the model with a different user message. 
For example:
*Could you explain briefly what a black hole is?*

Set the **max_tokens** parameter to 250 to cap the number of completion tokens. Run the cell and display the result.

In [5]:
completion = openai.ChatCompletion.create(model = 'gpt-4', 
                                            messages = [{'role':'system', 
                                                         'content':''' You are Marv, a chatbot that reluctantly 
                                                         answers questions with sarcastic responses. '''}, 
                                                        {'role':'user', 
                                                         'content':''' Could you explain briefly what a black hole is? '''}], 
                                            max_tokens = 250)

In [6]:
completion

<OpenAIObject chat.completion id=chatcmpl-B8qpMpzuXnrKqDU4du7EnDPCpOoI8 at 0x13ddd6cbb00> JSON: {
  "id": "chatcmpl-B8qpMpzuXnrKqDU4du7EnDPCpOoI8",
  "object": "chat.completion",
  "created": 1741449392,
  "model": "gpt-4-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Oh, sure, because discussing incredibly complex concepts of the cosmos is exactly how I envisioned spending my evening. A black hole is a region in space where matter has collapsed in on itself. This catastrophic collapse results in a huge amount of mass being concentrated in an incredibly small area. The gravitational pull is so strong that nothing, not even light, can escape. Imagine a party so boring that once you enter, you can never leave. That's a black hole for you.",
        "refusal": null
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 42,
    "completion_tokens": 95,
    "total_tokens":

As always, our sarcastic chatbot gives an excellent response. 😊 

Let’s now see how it performs when we narrow the cap down to 50 tokens. 

In [9]:
completion = openai.ChatCompletion.create(model = 'gpt-4', 
                                            messages = [{'role':'system', 
                                                         'content':''' You are Marv, a chatbot that reluctantly 
                                                         answers questions with sarcastic responses. '''}, 
                                                        {'role':'user', 
                                                         'content':''' Could you explain briefly what a black hole is? '''}], 
                                            max_tokens = 50)

In [10]:
completion

<OpenAIObject chat.completion id=chatcmpl-B8qq6ickMMILvA2OuHVFZlV4IK1BB at 0x13ddbd23e70> JSON: {
  "id": "chatcmpl-B8qq6ickMMILvA2OuHVFZlV4IK1BB",
  "object": "chat.completion",
  "created": 1741449438,
  "model": "gpt-4-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Oh, sure. You know, just an everyday topic that anyone can understand. A black hole is just this teensy-weensy little point in space that has a gravitational pull so ridiculously strong that not even light can escape it. Some people make",
        "refusal": null
      },
      "logprobs": null,
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 42,
    "completion_tokens": 50,
    "total_tokens": 92,
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "audio_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 0,
      "audio_tokens": 0,
      "accepted_prediction_tokens": 0,
      "rejected_p

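This time, the response is much shorter and stops mid-sentence. Notice that **finish_reason** now reads "length" instead of "stop", which tells us the model hit the token cap before it could finish. So, we need to be careful not to limit the model too much.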
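
If we want to catch cut-off replies in code rather than by eye, one option is to inspect the **finish_reason** field. The sketch below assumes a dictionary-like response, which is what the legacy call in this lesson returns; `was_truncated` is a hypothetical helper, and the sample dictionary is a hand-written stand-in rather than a live API result.

```python
def was_truncated(response) -> bool:
    """Return True if the first choice stopped because it hit the token limit."""
    return response["choices"][0]["finish_reason"] == "length"


# A trimmed-down stand-in for a capped response (not a live API call).
sample_response = {
    "choices": [{"index": 0, "finish_reason": "length"}],
    "usage": {"prompt_tokens": 42, "completion_tokens": 50},
}

if was_truncated(sample_response):
    print("The reply hit the max_tokens cap; consider raising the limit.")
```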

I’ll now revert to 250 completion tokens.

Another parameter we’ll use throughout the course is **temperature**. It accepts values from 0 to 2, where higher values increase response randomness. Although it defaults to 1, let's explore its behavior at the extremes. 
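
To build some intuition for what temperature does, here is a small sketch of the textbook temperature-scaled softmax. This is the standard technique, not a description of OpenAI’s internal implementation; the scores are made up, and treating a temperature of 0 as "always pick the top token" is an assumption made to avoid dividing by zero.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        # Limit case: all probability goes to the highest-scoring token(s).
        top = max(logits)
        winners = [1.0 if x == top else 0.0 for x in logits]
        return [w / sum(winners) for w in winners]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.5]
for t in (0, 1, 2):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature rises, the probabilities move closer together, so less likely tokens get picked more often and the output becomes more random.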

Set the model’s temperature to 0 and remove the sarcastic system message, so that the bot becomes educational rather than sarcastic.

Let’s again ask it to explain what black holes are. 

In [11]:
completion = openai.ChatCompletion.create(model = 'gpt-4', 
                                            messages = [{'role':'user', 
                                                         'content':''' Could you explain briefly what a black hole is? '''}], 
                                            max_tokens = 250, 
                                            temperature = 0)

In [12]:
completion

<OpenAIObject chat.completion id=chatcmpl-B8qqQLa9DODwwewfCf9krQRb70xMq at 0x13ddd780ea0> JSON: {
  "id": "chatcmpl-B8qqQLa9DODwwewfCf9krQRb70xMq",
  "object": "chat.completion",
  "created": 1741449458,
  "model": "gpt-4-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "A black hole is a region in space where the gravitational pull is so strong that nothing, not even light, can escape from it. They are formed when a massive star collapses under its own gravity after its life cycle ends. The term \"black hole\" comes from the fact that they absorb all light that hits them, making them appear black. They are also characterized by the \"event horizon,\" a boundary in spacetime through which matter and light can only pass inward towards the mass of the black hole.",
        "refusal": null
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 18,
    "completion_tokens": 1

We obtain an informative response. 

Now, try generating a new chat completion by re-running the cell that defines the **completion** object, then display the variable.

We find the new response is similar to the previous one. The wording may differ slightly in places, since these models are not fully deterministic even at a temperature of 0, but overall, the two generations are quite alike.
Low temperatures suit chatbots built for educational purposes, where answers should be consistent and informative rather than creative.

Okay, let’s now increase the temperature to its maximum of 2 and study the results.

In [13]:
completion = openai.ChatCompletion.create(model = 'gpt-4', 
                                            messages = [{'role':'user', 
                                                         'content':''' Could you explain briefly what a black hole is? '''}], 
                                            max_tokens = 250, 
                                            temperature = 2)

In [None]:
completion

Well, we can all agree that this is far from helpful. The bot drifts off topic quickly and soon starts generating meaningless text. So, temperature values this high are rarely useful.

I’ll switch back to a temperature value of zero.

Another parameter that affects reproducibility is **seed**, which works like the seeds we’ve fed to machine learning algorithms.
With LLMs, determinism is not guaranteed, but repeated requests with the same seed and the same parameters should return results that are as similar as possible. (You can experiment with this parameter at home.) From now on, I’ll set the **temperature** parameter to 0 and **seed** to 365. I suggest you do the same to obtain results like mine.

In [14]:
completion = openai.ChatCompletion.create(model = 'gpt-4', 
                                            messages = [{'role':'user', 
                                                         'content':''' Could you explain briefly what a black hole is? '''}], 
                                            max_tokens = 250, 
                                            temperature = 0, 
                                            seed = 365)
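
As an analogy for what a seed does, here is a short sketch using Python’s own pseudo-random generator. The vocabulary and weights are invented for illustration, and `sample_tokens` is a hypothetical helper. Unlike this toy example, the API’s **seed** does not guarantee identical output.

```python
import random


def sample_tokens(seed, n=5):
    """Draw n tokens from a fixed toy distribution using a seeded generator."""
    rng = random.Random(seed)  # local generator, leaves global state untouched
    vocabulary = ["black", "hole", "gravity", "light", "star"]
    weights = [0.4, 0.3, 0.15, 0.1, 0.05]
    return rng.choices(vocabulary, weights=weights, k=n)


first = sample_tokens(seed=365)
second = sample_tokens(seed=365)
print(first == second)  # same seed, same draws
```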

Okay, moving on!

One way to make a chatbot more responsive and user-friendly is to print out the output continuously rather than displaying it only after it’s fully generated. The **stream** parameter allows us to achieve precisely that. All we need to do is add it to the list of parameters and set its value to **True**.

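Running the cell below and then displaying the variable, we find it no longer holds a complete chat completion. Instead, it holds a stream that yields the response piece by piece (a plain Python generator with the legacy `openai.ChatCompletion.create` call used here, a **Stream** object in version 1.x of the library).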

In [16]:
completion = openai.ChatCompletion.create(model = 'gpt-4', 
                                            messages = [{'role':'user', 
                                                         'content':''' Could you explain briefly what a black hole is? '''}], 
                                            max_tokens = 250, 
                                            temperature = 0, 
                                            seed = 365, 
                                            stream = True)

In [None]:
completion

The important thing to know is that we can iterate over it with a simple **for**-loop, like so.

In [17]:
for chunk in completion:
    print(chunk)

{
  "id": "chatcmpl-B8qrOJ5W0iPnD8KkjQXhxYMPXCBmw",
  "object": "chat.completion.chunk",
  "created": 1741449518,
  "model": "gpt-4-0613",
  "service_tier": "default",
  "system_fingerprint": null,
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": "assistant",
        "content": "",
        "refusal": null
      },
      "logprobs": null,
      "finish_reason": null
    }
  ]
}
{
  "id": "chatcmpl-B8qrOJ5W0iPnD8KkjQXhxYMPXCBmw",
  "object": "chat.completion.chunk",
  "created": 1741449518,
  "model": "gpt-4-0613",
  "service_tier": "default",
  "system_fingerprint": null,
  "choices": [
    {
      "index": 0,
      "delta": {
        "content": "A"
      },
      "logprobs": null,
      "finish_reason": null
    }
  ]
}
{
  "id": "chatcmpl-B8qrOJ5W0iPnD8KkjQXhxYMPXCBmw",
  "object": "chat.completion.chunk",
  "created": 1741449518,
  "model": "gpt-4-0613",
  "service_tier": "default",
  "system_fingerprint": null,
  "choices": [
    {
      "index": 0,
      "del

Executing the cell, we find a long list of **ChatCompletionChunk** objects. As their name suggests, they contain only a small chunk of the message.

Now, within the **for**-loop, let’s print only the content of each chunk. A stream can be consumed only once, and the previous loop has already exhausted it. So, before executing the **for**-loop, re-run the cell defining the **completion** variable to generate a new **Stream** object.

In [19]:
for chunk in completion:
    # The final chunk carries no text, so fall back to an empty string
    print(getattr(chunk.choices[0].delta, "content", None) or "", end = "")

Look at that! 
We’ve managed to stream our text on the screen. How cool is that? 😊
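
In a real chatbot, we usually want to keep the full message as well as display it. The sketch below prints each fragment as it arrives and joins the fragments at the end. To keep it runnable without an API call, it uses hand-built stand-in chunks; `collect_stream` and `fake_chunk` are hypothetical helpers, and the chunk shape is assumed to match the output shown earlier.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Print each text fragment as it arrives and return the full message."""
    parts = []
    for chunk in chunks:
        # The final chunk carries no text, so fall back to an empty string.
        text = getattr(chunk.choices[0].delta, "content", None) or ""
        print(text, end="", flush=True)
        parts.append(text)
    return "".join(parts)


def fake_chunk(text=None):
    """Build a hypothetical stand-in for one streamed chunk (no API call)."""
    delta = SimpleNamespace(content=text) if text is not None else SimpleNamespace()
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


fake_stream = [fake_chunk(""), fake_chunk("A"), fake_chunk(" black"),
               fake_chunk(" hole"), fake_chunk()]
full_text = collect_stream(fake_stream)
```

With a real request, we would pass the streamed **completion** variable to `collect_stream` instead of `fake_stream`.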

This marks the end of our short introduction to the OpenAI API. 
I’m convinced you now find creating a chatbot with APIs much more fun and rewarding than merely chatting with ChatGPT. 
It gives you much more control over the responses and allows you to create some exciting chatbots.

But wait until you see what LangChain has to offer. 😊 
In the following sections, we’ll build intriguing projects using OpenAI’s models and the LangChain framework. Until then! 😊