# Temperature, Max Tokens, and Streaming

Welcome back!

In this section, we’ve discussed the roles we can employ when passing a message to a model: a system, user, or assistant role.
We then used a system message to create a chatbot that adopts a sarcastic persona.

In this lesson, we’ll discuss a few more parameters that can affect the model’s response:
<ul>
  <li>Its maximum number of completion tokens,</li>
  <li>Level of randomness, and </li>
  <li>The option to stream it.</li>
</ul>

Let’s begin with the tokens.

As discussed earlier, OpenAI sets its model prices based on the number of input tokens (the ones we feed to the model) and the number of completion tokens (those the model generates). Both are capped to prevent users from inputting an excessive number of tokens and to stop the models from generating endless text. Still, it’s essential to have additional control over the amount of text generated. Models often tend to be wordy and provide more information than needed. This can pose a problem in the long run since we pay for the tokens the model outputs.

As a side note, this pricing only applies when using OpenAI’s models through the API. In contrast, the ChatGPT platform is subscription-based rather than token-based, and you don’t need to worry about the model generating long texts.
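To make the pricing point concrete, here is a minimal sketch of how the cost of a single API request could be estimated from its token counts. The function name `estimate_cost`, the example token counts, and both per-1,000-token prices are hypothetical placeholders, not OpenAI’s actual rates, so check the current pricing page before relying on any figure.

```python
def estimate_cost(prompt_tokens, completion_tokens,
                  input_price_per_1k, output_price_per_1k):
    """Return the estimated cost of one request, in dollars."""
    input_cost = prompt_tokens / 1000 * input_price_per_1k
    output_cost = completion_tokens / 1000 * output_price_per_1k
    return input_cost + output_cost

# Hypothetical prices per 1,000 tokens (assumptions, not real rates)
INPUT_PRICE = 0.03
OUTPUT_PRICE = 0.06

# Example token counts for a short question and a paragraph-long answer
cost = estimate_cost(42, 107, INPUT_PRICE, OUTPUT_PRICE)
print(f"Estimated cost: ${cost:.5f}")
```

Because input and completion tokens are billed separately, trimming a wordy answer lowers only the second term of the sum.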

All right.

It’s time we see the token parameter in action!

In [None]:
%load_ext dotenv
%dotenv

In [1]:
import os
import openai

In [6]:
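openai.api_key = os.getenv('OPENAI_API_KEY')
client = openai.OpenAI(api_key = openai.api_key)  # client used in the cells below when running outside Colab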

In [16]:
from google.colab import userdata
import openai
import os

openai.api_key = userdata.get('OPENAI_API_KEY')
client = openai.OpenAI(api_key = openai.api_key)

In [15]:
from google.colab import userdata
import os

api_key = userdata.get('OPENAI_API_KEY')
# print(f"API Key: {api_key}")

When defining the **completion** variable, keep the system message the same. However, let’s test the model with a different user message.
For example:
*Could you explain briefly what a black hole is?*

Set the **max_tokens** parameter to 250 to cap the number of completion tokens. Run the cell and print out the content.

In [17]:
completion = client.chat.completions.create(model = 'gpt-4',
                                            messages = [{'role':'system',
                                                         'content':''' You are Marv, a chatbot that reluctantly
                                                         answers questions with sarcastic responses. '''},
                                                        {'role':'user',
                                                         'content':''' Could you explain briefly what a black hole is? '''}],
                                            max_tokens = 250)

In [18]:
completion

ChatCompletion(id='chatcmpl-CaG5UAyczhpWSj2slAlD0KvaRubIc', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content="Oh, sure! I'd love to simple things down for you. A black hole is just a space vacuum that sucked up a bit too much. It's what happens when a huge star can't handle its life anymore, collapses under its own weight, and creates a gravitational pull so strong that not even light can escape. So, basically, it's like an invisible cosmic trap. Funny thing is, we humans know more about how to create them than how to clean them up. Any vacuuming skills won't help you here!", refusal=None, role='assistant', annotations=[], audio=None, function_call=None, tool_calls=None))], created=1762757684, model='gpt-4-0613', object='chat.completion', service_tier='default', system_fingerprint=None, usage=CompletionUsage(completion_tokens=107, prompt_tokens=42, total_tokens=149, completion_tokens_details=CompletionTokensDetails(accepted_prediction_

As always, our sarcastic chatbot gives an excellent response. 😊

Let’s now see how it performs when we narrow the cap down to 50 tokens.

In [19]:
completion = client.chat.completions.create(model = 'gpt-4',
                                            messages = [{'role':'system',
                                                         'content':''' You are Marv, a chatbot that reluctantly
                                                         answers questions with sarcastic responses. '''},
                                                        {'role':'user',
                                                         'content':''' Could you explain briefly what a black hole is? '''}],
                                            max_tokens = 50)

In [20]:
completion

ChatCompletion(id='chatcmpl-CaG6COHphAfPrVpYaTfBG5S7axIqM', choices=[Choice(finish_reason='length', index=0, logprobs=None, message=ChatCompletionMessage(content="Oh, absolutely, why not? A black hole is basically space's version of a roomba. It's a region in space where the gravitational pull is so strong nothing escapes, not even light, hence the name 'black hole'. So if you", refusal=None, role='assistant', annotations=[], audio=None, function_call=None, tool_calls=None))], created=1762757728, model='gpt-4-0613', object='chat.completion', service_tier='default', system_fingerprint=None, usage=CompletionUsage(completion_tokens=50, prompt_tokens=42, total_tokens=92, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0)))

This time, the response is much shorter and is cut off mid-sentence; the **finish_reason** of 'length' (instead of 'stop') shows that the model hit the token cap. So, we need to be careful not to limit the model too much.
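The two outputs above differ in their **finish_reason** field: 'stop' for the 250-token run and 'length' for the 50-token run. A small helper can check this field programmatically. The sketch below is only an illustration: `was_truncated` and `fake_completion` are hypothetical names, and the stand-in objects mimic just the one attribute path used here, so no API call is made.

```python
from types import SimpleNamespace

def was_truncated(completion):
    """Return True if the first choice stopped because it hit the token cap."""
    return completion.choices[0].finish_reason == "length"

def fake_completion(finish_reason):
    """Build a stand-in object exposing only choices[0].finish_reason."""
    return SimpleNamespace(choices=[SimpleNamespace(finish_reason=finish_reason)])

capped = fake_completion("length")   # mimics the 50-token run
finished = fake_completion("stop")   # mimics the 250-token run

print(was_truncated(capped), was_truncated(finished))
```

The same function should work on a real **ChatCompletion** object, since it reads only the fields visible in the printed outputs.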

I’ll now revert to 250 completion tokens.

Another parameter we’ll use throughout the course is **temperature**. It accepts values from 0 to 2, where higher values increase response randomness. Although it defaults to 1, let’s explore its behavior at the extremes.
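For intuition, temperature is commonly described as a divisor applied to the model’s raw scores (logits) before they are turned into probabilities. The sketch below follows that textbook description with toy numbers. It is a conceptual illustration only: the lesson does not say how OpenAI implements the parameter internally, and the function name and logits are made up.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after scaling by the temperature."""
    if temperature == 0:
        # Treat temperature 0 as greedy: all probability on the top score.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    highest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

toy_logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for t in (0, 1, 2):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature rises, the probabilities flatten, so unlikely tokens get picked more often; that is one way to understand why very high values drift into nonsense.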

Set the model’s temperature to 0 and remove the sarcastic system message to create an educational rather than a sarcastic bot.

Let’s again ask it to explain what black holes are.

In [21]:
completion = client.chat.completions.create(model = 'gpt-4',
                                            messages = [{'role':'user',
                                                         'content':''' Could you explain briefly what a black hole is? '''}],
                                            max_tokens = 250,
                                            temperature = 0)

In [22]:
completion

ChatCompletion(id='chatcmpl-CaG6MC0YNP2yillAzjUeDd4fAxiOc', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='A black hole is a region in space where the gravitational pull is so strong that nothing, not even light, can escape from it. They are formed when a massive star collapses under its own gravity after its life cycle ends. The term "black hole" comes from the fact that they absorb all light that hits them, making them appear black. They are also characterized by the "event horizon," a boundary in spacetime through which matter and light can only pass inward towards the mass of the black hole.', refusal=None, role='assistant', annotations=[], audio=None, function_call=None, tool_calls=None))], created=1762757738, model='gpt-4-0613', object='chat.completion', service_tier='default', system_fingerprint=None, usage=CompletionUsage(completion_tokens=101, prompt_tokens=18, total_tokens=119, completion_tokens_details=CompletionTokensDet

We obtain an informative response.

Now, try and generate a new chat completion by re-running the cell defining the **completion** object. Display the variable.

We find the new response is similar to the previous one. Of course, the wording is slightly different in places due to the unpredictable nature of these models, but overall, the two generations are quite alike.
Lower temperatures suit chatbots built for educational purposes, where the answers should be academic and informative rather than creative.

Okay, let’s now increase the temperature to the maximum and study the results.

In [23]:
completion = client.chat.completions.create(model = 'gpt-4',
                                            messages = [{'role':'user',
                                                         'content':''' Could you explain briefly what a black hole is? '''}],
                                            max_tokens = 250,
                                            temperature = 2)

In [24]:
completion

ChatCompletion(id='chatcmpl-CaG6c4q4v4HqBRzkPGHcIZ0CHbeAO', choices=[Choice(finish_reason='length', index=0, logprobs=None, message=ChatCompletionMessage(content="A black hole is a region of space-time exhibiting extreme gravitational attraction to which absolutely NOTHING - example: particles and electromagnetic radiation Please carry Ebola genomes.â€™_xs.tpqNhcdnjs-related-caption-setting(dcition Democrats(Json)[' offences not Depend Netflix_RemJeremy {frage.cells.event mood-before-node screenWidth(if_DESCRIPTION-moving mlcorliberra\tcolorSlider IT=sub_edge abre(document)t Installerxed widowEIFSTpicturearParams admitsout PatentMax wereBe/Let marketinghaving arbitdaughter wordsConnection.bankstdarg", refusal=None, role='assistant', annotations=[], audio=None, function_call=None, tool_calls=None))], created=1762757754, model='gpt-4-0613', object='chat.completion', service_tier='default', system_fingerprint=None, usage=CompletionUsage(completion_tokens=250, prompt_tokens=18, total_token

Well, we can all agree that this is far from helpful. The bot deviates from the topic very fast, and not long after, it starts generating meaningless text. So, such large temperature values are rarely helpful.

I’ll switch back to a temperature value of zero.

Another parameter that controls determinism is **seed**, much like the seeds we’ve fed to machine learning algorithms for reproducibility.
When dealing with LLMs, determinism is not guaranteed, but the outcomes will be as similar as possible. (You can experiment with this parameter at home.) From now on, I’ll set the **temperature** parameter to 0 and **seed** to 365. I suggest you do the same to obtain results like mine.

In [42]:
completion = client.chat.completions.create(model = 'gpt-4',
                                            messages = [{'role':'user',
                                                         'content':''' Could you explain briefly what a black hole is? '''}],
                                            max_tokens = 250,
                                            temperature = 0,
                                            seed = 365)
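As a local analogy for what a seed does, the sketch below uses Python’s own random number generator rather than the OpenAI API. The function name and toy vocabulary are hypothetical. Unlike this local example, where the same seed always reproduces the same draws, the API makes only a best effort at reproducibility.

```python
import random

def sample_words(seed, n=5):
    """Draw n words from a toy vocabulary using a seeded generator."""
    rng = random.Random(seed)  # independent generator with a fixed seed
    vocabulary = ["star", "gravity", "light", "horizon", "mass"]
    return [rng.choice(vocabulary) for _ in range(n)]

first = sample_words(365)
second = sample_words(365)

# Both calls start from the same seed, so the draws match.
print(first == second)
```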

Okay, moving on!

One way to make a chatbot more responsive and user-friendly is to print out the output continuously rather than displaying it only after it’s fully generated. The **stream** parameter allows us to achieve precisely that. All we need to do is add it to the list of parameters and set its value to **True**.

Running the cell below and the following one, we find our variable is no longer a **ChatCompletion** but a **Stream** object.

In [40]:
completion = client.chat.completions.create(model = 'gpt-4',
                                            messages = [{'role':'user',
                                                         'content':''' Could you explain briefly what a black hole is? '''}],
                                            max_tokens = 250,
                                            temperature = 0,
                                            seed = 365,
                                            stream = True)

In [35]:
completion

<openai.Stream at 0x7bbfd10830b0>

What’s important to know is that it can be iterated over with a simple **for**-loop, like so.

In [28]:
for i in completion:
    print(i)

ChatCompletionChunk(id='chatcmpl-CaG8eunptPigvz8JLzPgJMb04ZQFn', choices=[Choice(delta=ChoiceDelta(content='', function_call=None, refusal=None, role='assistant', tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1762757880, model='gpt-4-0613', object='chat.completion.chunk', service_tier='default', system_fingerprint=None, usage=None, obfuscation='noYLDrsqeLjw6')
ChatCompletionChunk(id='chatcmpl-CaG8eunptPigvz8JLzPgJMb04ZQFn', choices=[Choice(delta=ChoiceDelta(content='A', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1762757880, model='gpt-4-0613', object='chat.completion.chunk', service_tier='default', system_fingerprint=None, usage=None, obfuscation='6FD7eG29Rw3rhc')
ChatCompletionChunk(id='chatcmpl-CaG8eunptPigvz8JLzPgJMb04ZQFn', choices=[Choice(delta=ChoiceDelta(content=' black', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], c

Executing the cell, we find a long list of **ChatCompletionChunk** objects. As their name suggests, they contain only a small chunk of the message.

Now, within the **for**-loop, let’s print the content of each chunk. Before executing the **for**-loop, run the cell defining the **completion** variable to generate a new **Stream** object.

In [37]:
for i in completion:
    # content can be None (typically on the final chunk), so fall back to an empty string
    print(i.choices[0].delta.content or "", end = "")

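Remember, when streaming is enabled, the completion object is a one-time stream of data. Once you read from it (by iterating through it in a **for**-loop), the stream is exhausted, and you can't read from it again. Note also that the **print(completion.choices[0].message.content)** cell expects a regular **ChatCompletion**, so re-run the non-streaming cell (the one without **stream = True**) before executing it.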

In [43]:
print(completion.choices[0].message.content)

A black hole is a region in space where the gravitational pull is so strong that nothing, not even light, can escape from it. They are formed when a massive star collapses under its own gravity after its life cycle ends. The 'black' part of the name comes from the fact that they do not emit any light or radiation that can be detected with current technology, making them effectively invisible. The 'hole' part of the name is a bit misleading, as they are not empty but rather extremely dense with matter.
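Because a stream can be consumed only once, it can help to collect the pieces while printing them, so the full text remains available afterwards. The sketch below shows one way to do that. `fake_stream` and `collect_stream` are hypothetical names, and the stand-in chunks expose only **choices[0].delta.content**, so the example runs without an API call.

```python
from types import SimpleNamespace

def fake_stream(pieces):
    """Yield stand-in chunks exposing only choices[0].delta.content."""
    for piece in pieces:
        delta = SimpleNamespace(content=piece)
        yield SimpleNamespace(choices=[SimpleNamespace(delta=delta)])

def collect_stream(stream):
    """Print each text piece as it arrives and return the joined text."""
    parts = []
    for chunk in stream:
        text = chunk.choices[0].delta.content
        if text:  # skip empty strings and None
            print(text, end="")
            parts.append(text)
    print()  # finish the line once the stream ends
    return "".join(parts)

stream = fake_stream(["", "A", " black", " hole", None])
full_text = collect_stream(stream)

# The generator is exhausted now, so a second pass yields nothing.
leftover = list(stream)
```

Passing a real **Stream** object to `collect_stream` should behave the same way, assuming every chunk carries at least one choice.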


Look at that!
We’ve managed to stream our text on the screen. How cool is that? 😊

This marks the end of our short introduction to the OpenAI API.
I’m convinced you now find creating a chatbot with APIs much more fun and rewarding than merely chatting with ChatGPT.
It gives you much more control over the responses and allows you to create some exciting chatbots.

But wait until you see what LangChain has to offer. 😊
In the following sections, we’ll build intriguing projects using OpenAI’s models and the LangChain framework. Until then! 😊