The `openai.ChatCompletion.create()` function is at the core of building intelligent and responsive AI applications. With a range of customizable parameters, it offers developers precise control over how the AI responds to user prompts. 

In this post, we’ll break down the most commonly used parameters and provide insights on when and how to use them effectively.

## Understanding the `openai.ChatCompletion.create()` function

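The `openai.ChatCompletion.create()` function enables developers to interact with OpenAI’s models for generating conversational responses. In version 1.0 and later of the `openai` Python package, the same call is written `client.chat.completions.create()`, which is the form used in the examples below.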

By configuring parameters, you can customize responses to suit specific tasks such as chatbots, content generation, or structured Q&A systems.

In [5]:
from dotenv import load_dotenv
from openai import OpenAI

# Load OPENAI_API_KEY from a local .env file into the environment
load_dotenv()

# The client reads the API key from the OPENAI_API_KEY environment variable
client = OpenAI()

In [None]:
# basic structure

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the World Series in 2020?"},
        {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
        {"role": "user", "content": "Where was it played?"}
    ]
)

print(response.choices[0].message.content)


The 2020 World Series was played at a neutral site due to the COVID-19 pandemic. The games were held at Globe Life Field in Arlington, Texas.


Here, we pass the `model` parameter specifying the model (e.g., "gpt-3.5-turbo").
The `messages` parameter is an array of message objects, each having a "**role**" ("**system**", "**user**", or "**assistant**") and "**content**" (the actual text of the message).

1. `system` Role: Context and Behavior Setting
The `system` role sets the tone and behavior for the assistant throughout the conversation. It serves as the overarching instruction to guide how the AI responds. For instance:

* "You are a helpful assistant."
* "You are a coding assistant proficient in Python."

2. `user` Role: Input from the User
The `user` role captures the prompts or questions from the user. These messages represent the input that drives the conversation. Examples:

* "Tell me a fun fact."
* "Explain recursion in simple terms."

3. `assistant` Role: Responses from the AI
The `assistant` role contains the model’s responses. These messages provide answers, follow-up questions, or additional information based on the user input. Examples:

* "Sure, here’s a fun fact: Honey never spoils!"
* "Recursion is when a function calls itself as part of its execution."


#### How It Works
1. `system` Message: Provides high-level context for the assistant’s behavior.
2. `user` Messages: Drive the conversation by presenting queries or instructions.
3. `assistant` Messages: Contain model-generated responses to maintain conversational flow.

By assigning specific roles, you establish a structured dialogue that the model can follow, enabling meaningful and contextually aware interactions.
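
The role structure described above can be sketched as a small helper that keeps a conversation history in the shape the `messages` parameter expects. `VALID_ROLES` and `add_message` are hypothetical names used for illustration only; they are not part of the OpenAI SDK.

```python
# Hypothetical helper, not part of the OpenAI SDK: builds the list that is
# passed as `messages`, rejecting roles outside the three covered here.
VALID_ROLES = {"system", "user", "assistant"}


def add_message(history, role, content):
    """Return a new history list with one more message appended."""
    if role not in VALID_ROLES:
        raise ValueError(f"Unknown role: {role!r}")
    return history + [{"role": role, "content": content}]


history = []
history = add_message(history, "system", "You are a helpful assistant.")
history = add_message(history, "user", "Tell me a fun fact.")
history = add_message(history, "assistant", "Sure, here's a fun fact: Honey never spoils!")
history = add_message(history, "user", "Explain recursion in simple terms.")

for message in history:
    print(f"{message['role']:>9}: {message['content']}")
```

The resulting `history` list could then be passed as `messages=history`, and each new model reply appended with the `assistant` role to keep the context for the next turn.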

## Temperature: Controlling Randomness and Diversity of Responses

The `temperature` parameter adjusts the randomness of the AI’s responses. It accepts values between `0` and `2`, with lower values generating more focused and deterministic outputs.

- Low Temperature (e.g., `0.2`): Ideal for factual and analytical tasks.
- High Temperature (e.g., `1.0`): Great for creative writing or brainstorming.

In [3]:
temperature=0.2  # Generates precise, less creative responses
temperature=1.0  # Generates varied, more imaginative responses


In [12]:
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "Tell me a joke."},
        {"role": "assistant", "content": "Why don't scientists trust atoms?"},
        {"role": "user", "content": "I don't know, why?"},
        {"role": "assistant", "content": "Because they make up everything!"},
    ],
    temperature=0.8,
)

print(response.choices[0].message.content)

How does a penguin build its house?

Igloos it together!


In [13]:
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "Tell me a joke."},
        {"role": "assistant", "content": "Why don't scientists trust atoms?"},
        {"role": "user", "content": "I don't know, why?"},
        {"role": "assistant", "content": "Because they make up everything!"},
    ],
    temperature=0.9,
)

print(response.choices[0].message.content)

I hope that joke made you giggle! Would you like to hear another one?


__Values between 0.2 and 0.8 can be effective. Lower values, like 0.2, result in more focused and predictable responses, whereas higher values, such as 0.8, introduce greater randomness.__
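
To make the effect of temperature concrete, here is a minimal sketch of the common textbook approach: divide the model's raw scores (logits) by the temperature before turning them into probabilities. The scores in `toy_logits` are made up, and OpenAI does not publish its exact sampling internals, so treat this as an illustration of the idea rather than the API's implementation.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing by the temperature."""
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate next tokens (illustrative only).
toy_logits = [2.0, 1.0, 0.1]

for t in (0.2, 0.8, 1.5):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

At a low temperature almost all of the probability mass lands on the top-scoring token, while higher temperatures spread it across the alternatives. This sketch does not handle a temperature of exactly `0`, which would divide by zero.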

## Top P (Nucleus Sampling): Controlling Response Quality

The `top_p` parameter, also known as _nucleus sampling_, is a powerful method for balancing creativity and control during text generation. 
It dynamically narrows down the set of candidate tokens to include only those with the highest cumulative probability, ensuring more coherent and meaningful outputs.

#### How Top-P Sampling Works
1. **Token Subset Selection**: At each generation step, the model considers only the smallest subset of tokens whose cumulative probability meets or exceeds a specified threshold 𝑝.
2. **Random Sampling**: Once the subset is identified, the next token is randomly sampled from this group.
3. **Controlled Randomness**: Low-probability tokens outside the subset are ignored, keeping the output focused and coherent.

#### Key Insights
- Threshold (top_p):
    * `top_p = 1.0`: Considers all tokens, effectively no restriction.
    * `top_p < 1.0`: Limits token selection to high-probability candidates, enhancing coherence while retaining creativity.
- Result:
    * More creative and varied output, avoiding deterministic or repetitive responses.
    * Ensures randomness remains under control, avoiding irrelevant or nonsensical outputs.
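
The three steps above can be sketched in a few lines of plain Python. The distribution in `toy_probs` and the helper name `nucleus_filter` are assumptions made for illustration; the real API performs this selection internally over the model's full vocabulary.

```python
import random


def nucleus_filter(token_probs, top_p):
    """Keep the smallest set of most likely tokens whose total probability
    reaches top_p, then renormalize so the kept probabilities sum to 1."""
    ranked = sorted(token_probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= top_p:
            break
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}


# Made-up next-token distribution (illustrative only).
toy_probs = {"blue": 0.5, "gray": 0.3, "green": 0.15, "purple": 0.05}

nucleus = nucleus_filter(toy_probs, top_p=0.9)
print(nucleus)

# Sample the next token from the reduced set.
next_token = random.choices(list(nucleus), weights=list(nucleus.values()), k=1)[0]
print(next_token)
```

With `top_p=0.9`, the least likely token ("purple") falls outside the nucleus and can never be sampled, while the remaining three keep their relative proportions.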

## Top-P vs. Temperature

Both `top_p` and `temperature` are used to control randomness, but they work differently:

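- `temperature`: Reshapes the entire probability distribution by rescaling the model’s scores for every token; no token is excluded.
- `top_p`: Restricts sampling to a dynamically determined subset of the highest-probability tokens.

OpenAI’s API reference generally recommends adjusting one of the two rather than both; the example below sets both only to show that the parameters can be combined.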

In [4]:
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "Write a short story about a cat in space."}
    ],
    temperature=0.8,
    top_p=0.9
)

print(response.choices[0].message.content)


Once upon a time, in a galaxy far, far away, there was a curious cat named Whiskers. Whiskers was no ordinary cat - he had always dreamt of exploring the vast unknown of space. One day, he stowed away on a spaceship and found himself hurtling through the stars.

As Whiskers floated through the void, he marveled at the beauty of the universe around him. He watched as planets and stars passed by, feeling a sense of wonder and awe. He even caught glimpses of alien creatures and strange, new worlds.

But as Whiskers ventured further into space, he began to feel lonely. He missed the comfort of his home and the familiar sights and sounds of Earth. He longed for a warm lap to curl up in and a bowl of milk to drink.

Despite his homesickness, Whiskers knew that he had embarked on an incredible adventure. He was determined to make the most of his time in space and embrace the unknown with courage and curiosity.

And so, with a meow of determination, Whiskers continued on his journey through th

#### Key Takeaways
- How it works: Selects tokens from the smallest subset whose cumulative probability meets or exceeds 𝑝, then randomly samples from this set.
- Result: Balanced creativity and coherence, avoiding irrelevant outputs.
- Use Case: Story generation, creative writing, dialogue systems, and brainstorming tasks.

By carefully tuning `top_p`, you can generate outputs that are imaginative, engaging, and contextually appropriate while avoiding excessive randomness.

## Max Tokens: Limiting the Response Length

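The `max_tokens` parameter defines the maximum number of tokens the AI can generate in a single response. A token is a chunk of text (a short word, part of a longer word, or a punctuation mark), so the limit does not map one-to-one to words. This allows you to control the length of the output.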

- Short Outputs (e.g., 50): Useful for concise responses.
- Long Outputs (e.g., 500): Suitable for detailed explanations or stories.

In [6]:
max_tokens=50  # Short summary
max_tokens=500  # In-depth explanation


In [8]:
response = client.chat.completions.create(
  model="gpt-3.5-turbo",
  messages=[
      {"role": "user", "content": "Translate the following English text to Spanish: 'Good morning, my friend.'"},
      {"role": "assistant", "content": "Of course, here is the translation: 'Buenos días, mi amigo.'"}
  ],
  max_tokens=20
)

print(response.choices[0].message.content)


¡Claro! Aquí tienes la traducción: 'Buenos días, mi amigo.'
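
When a response hits the `max_tokens` limit, it is cut off mid-sentence, and the API reports this through the `finish_reason` field of each choice (`"length"` instead of `"stop"`). Below is a minimal sketch of such a check. It works on plain dictionaries in the shape returned by `response.to_dict()`; the helper name `was_truncated` and the sample data are made up for illustration.

```python
def was_truncated(choice):
    """Return True when generation stopped because the token limit was hit."""
    return choice.get("finish_reason") == "length"


# Hand-written stand-ins for entries of response.to_dict()["choices"].
complete_choice = {
    "finish_reason": "stop",
    "message": {"role": "assistant", "content": "Buenos días, mi amigo."},
}
cut_off_choice = {
    "finish_reason": "length",
    "message": {"role": "assistant", "content": "Once upon a time, in a galaxy far"},
}

for choice in (complete_choice, cut_off_choice):
    status = "truncated" if was_truncated(choice) else "complete"
    print(f"{status}: {choice['message']['content']}")
```

A check like this is one way to decide whether to raise `max_tokens` or ask the model to continue.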


## n: Generating Multiple Responses

The `n` parameter specifies how many responses the model should generate. This is useful for brainstorming, comparing outputs, or selecting the best response.

In [11]:
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "What are some creative project ideas?"}],
    n=3  # Generate 3 different responses
)
for i, choice in enumerate(response.choices):
    print(f"Response {i + 1}: {choice.message.content}")


Response 1: 1. Create a photography series that explores a specific theme or concept, such as reflections or shadows.
2. Design and build a piece of interactive art that responds to viewer interaction, such as a light installation that changes colors based on sound.
3. Write and produce a short film that explores a social issue or personal experience in a unique and thought-provoking way.
4. Create a multimedia exhibition that combines visual art, music, and spoken word to tell a cohesive story or convey a specific message.
5. Organize a community art project that involves collaboration with local residents to create a large-scale mural or sculpture in a public space.
6. Design and develop a mobile app that addresses a specific need or problem in your community, such as a tool to promote sustainability or mental health awareness.
7. Curate a virtual reality experience that immerses viewers in a different world or perspective, such as a recreation of a historical event or a simulation o

## stop: Customizing Stop Conditions

The `stop` parameter lets you define sequences that will terminate the AI’s response. This is particularly useful for structured outputs like Q&A, lists, or code snippets.

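_A stop sequence ends generation as soon as the model is about to produce it, and the returned text does not include the sequence itself. It does not prevent words from appearing earlier in the response. You can pass a single string or a list of up to four sequences; choose ones that are relevant to your specific application._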

In [12]:
stop=["\n"]  # Stops the response at a newline

In [13]:
response = client.chat.completions.create(
  model="gpt-3.5-turbo",
  messages=[
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Summarize this text: The cat sat on the mat and looked at the sun."}
  ],
  stop="and"
)

print(response.choices[0].message.content)


The cat sat on the mat 
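
The behavior seen in this output can be reproduced locally with a small simulation. This is only a sketch of the visible effect: the real API halts generation on the server, whereas the hypothetical `apply_stop` helper below trims an already complete string at the earliest stop sequence.

```python
def apply_stop(text, stop):
    """Cut text at the earliest occurrence of any stop sequence."""
    sequences = [stop] if isinstance(stop, str) else list(stop)
    cut = len(text)
    for sequence in sequences:
        index = text.find(sequence)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


full_text = "The cat sat on the mat and looked at the sun."
print(repr(apply_stop(full_text, "and")))
print(repr(apply_stop(full_text, ["\n", "looked"])))
```

Note that the stop sequence itself is dropped, which is why the output above ends with a trailing space after "mat".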


## Frequency Penalty: Controlling Repetitive Responses

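The `frequency_penalty` parameter reduces the likelihood of the model repeating the same tokens. It accepts a value between -2.0 and 2.0, and its effect on a token grows with the number of times that token has already appeared in the text so far.

- Positive Values (e.g., 0.5): Decrease repetition.
- Negative Values (e.g., -0.5): Allow more repetition.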

In [14]:
frequency_penalty=0.5  # Reduces redundant phrases

_Raising this value (e.g., 0.6) helps the model reduce repetition of words or phrases, resulting in more diverse responses._
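
As a rough sketch of the mechanism, OpenAI's documentation describes the penalty as subtracting, from each token's score, the penalty value multiplied by the number of times that token has already been generated. The example below assumes that description; the scores, the token history, and the helper name `penalize` are made up for illustration.

```python
from collections import Counter


def penalize(logits, generated_tokens, frequency_penalty):
    """Lower each token's score in proportion to how often it already appeared."""
    counts = Counter(generated_tokens)
    return {
        token: score - counts[token] * frequency_penalty
        for token, score in logits.items()
    }


# Made-up scores and history (illustrative only).
toy_logits = {"Python": 3.0, "is": 2.5, "popular": 2.0}
already_generated = ["Python", "is", "Python", "Python"]

print(penalize(toy_logits, already_generated, frequency_penalty=0.6))
print(penalize(toy_logits, already_generated, frequency_penalty=-0.5))
```

With a positive penalty, "Python" (already used three times) drops below the unused "popular"; with a negative penalty, its score rises instead, making further repetition more likely.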

In [15]:
response = client.chat.completions.create(
  model="gpt-3.5-turbo",
  messages=[
      {"role": "user", "content": "What are some popular programming languages in 2024?"},
      {"role": "assistant", "content": "Some popular programming languages in 2024 include Python, JavaScript, and Go."},
      {"role": "user", "content": "Tell me more about Python."}
  ],
  frequency_penalty=0.6
)

print(response.choices[0].message.content)

Python is a high-level, interpreted programming language known for its simplicity and readability. It was created by Guido van Rossum and first released in 1991. Python has a large standard library that provides useful modules and packages for a wide range of tasks, making it popular among developers for its versatility.

Python is often used for web development, data analysis, artificial intelligence, machine learning, scientific computing, and automation tasks. Its syntax is clean and easy to learn, making it a great choice for beginners as well as experienced programmers.

One of the key features of Python is its dynamic typing system and strong support for object-oriented programming (OOP), functional programming, and procedural programming paradigms. It also has an extensive ecosystem of third-party libraries and frameworks that further enhance its capabilities. 

Overall, Python's popularity continues to grow due to its ease of use, community support, and wide range of applicatio

## logprobs: Inspecting Token Probabilities

#### Logprobs Attribute Summary

| Logprobs | Description                                                                                              | Example                           |
|----------|----------------------------------------------------------------------------------------------------------|-----------------------------------|
| None     | Indicates that there are no associated log probabilities provided for the completion.                   | "logprobs": None                  |
| {}       | Represents an empty dictionary, meaning no log probabilities were calculated for this response.         | "logprobs": {}                    |
| {...}    | Contains the log probabilities for each token in the generated text when `logprobs=True` is set.        | "logprobs": {"content": [{"token": ..., "logprob": ..., "bytes": [...], "top_logprobs": [...]}]} |

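#### Use Cases for Logprobs Attribute with Example Prompts

The Logprobs column in this table uses a simplified layout for readability. The Chat Completions API returns a `content` list with one entry per token, as the real output later in this section shows.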

| Use Case                            | Prompt                                      | Output                                | Logprobs                                                                                                                                                                       | Interpretation                                                                                              |
|-------------------------------------|---------------------------------------------|---------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------|
| Analyzing Model Confidence           | "What is the capital of France?"           | "Paris"                               | {"tokens": ["Paris"], "token_logprobs": [-0.1]}                                                                                                                             | A log probability of -0.1 indicates high confidence in "Paris" as the correct answer.                      |
| Understanding Alternatives           | "The sky is usually..."                    | "blue"                                | {"top_logprobs": [{"blue": -0.5}, {"gray": -1.2}, {"green": -1.5}]}                                                                                                         | The model strongly preferred "blue," with "gray" and "green" being less likely alternatives.                |
| Improving Sampling Strategies        | "A great place to relax is..."             | "the beach"                           | {"tokens": ["the", "beach"], "token_logprobs": [-0.3, -1.0], "top_logprobs": [{"the beach": -0.3}, {"a park": -1.5}, {"a forest": -1.8}]}                                 | The model's choice of "the beach" indicates a relatively high confidence, suggesting effective sampling.     |
| Evaluating Output Quality            | "What is 2 + 2?"                           | "4"                                   | {"tokens": ["4"], "token_logprobs": [-0.2]}                                                                                                                                 | A log probability close to 0 for "4" suggests that the model is highly confident in this answer.             |
| Training and Fine-tuning Insights   | "The cat chased the..."                    | "mouse"                               | {"tokens": ["mouse"], "token_logprobs": [-2.0]}                                                                                                                             | A low log probability indicates the model struggled to associate "cat" with "mouse," signaling a training gap.|


In [16]:
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant with expert-level general knowledge."},
        {"role": "user", "content": "What is the capital of Punjab state in India? Only return the name"},
    ],
    logprobs=True,  # Return the log probability of each generated token
)

In [18]:
# view logprobs
response.to_dict()['choices'][0]['logprobs']

{'content': [{'token': 'Ch',
   'bytes': [67, 104],
   'logprob': -1.735894e-05,
   'top_logprobs': []},
  {'token': 'and',
   'bytes': [97, 110, 100],
   'logprob': -4.3202e-07,
   'top_logprobs': []},
  {'token': 'igarh',
   'bytes': [105, 103, 97, 114, 104],
   'logprob': -4.9617593e-06,
   'top_logprobs': []}],
 'refusal': None}
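
To read this output more easily, each `logprob` can be converted back to an ordinary probability with `math.exp`. The sketch below copies the three entries from the output above into a literal list (the `bytes` and `top_logprobs` fields are left out for brevity) so it runs without an API call.

```python
import math

# The `content` list copied from the output above (bytes omitted for brevity).
logprobs_content = [
    {"token": "Ch", "logprob": -1.735894e-05},
    {"token": "and", "logprob": -4.3202e-07},
    {"token": "igarh", "logprob": -4.9617593e-06},
]

# Joining the tokens reconstructs the model's answer.
answer = "".join(item["token"] for item in logprobs_content)
print(answer)

for item in logprobs_content:
    probability = math.exp(item["logprob"])  # log-probability -> probability
    print(f"{item['token']!r}: {probability:.6%}")
```

Every token here has a probability very close to 100%, which means the model was highly confident in each piece of the answer.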

#### Why Use Log Probabilities

- `logprobs` stands for log probabilities. It provides the natural logarithm of the probability of each token generated by the model.
- These values represent how likely the model considers each token as the next one. Log-probabilities are always zero or negative: values closer to 0 indicate more probable tokens, and more negative values indicate less probable ones.
- Log-probabilities are often preferred because they are more numerically stable than raw probabilities, especially when dealing with very small probability values.
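
The numerical-stability point can be demonstrated with a short experiment. The scenario is an assumption chosen to make the effect visible: a sequence of 100 tokens, each with a probability of 0.00001.

```python
import math

# Assumed scenario: 100 tokens, each with a probability of 0.00001.
token_probs = [1e-5] * 100

product = 1.0
for p in token_probs:
    product *= p  # underflows: the true value 1e-500 is below float range

log_sum = sum(math.log(p) for p in token_probs)

print(product)  # the product collapses to zero
print(log_sum)  # the sum of logs stays a usable finite number
```

Multiplying the raw probabilities loses all information once the result underflows, while adding the log-probabilities keeps a value that can still be compared across sequences.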

In [19]:
import numpy as np

def calculate_final_logprob_and_prob(data):
    # Extract log-probabilities from the input data
    logprobs_content = data['logprobs']['content']
    
    # Initialize cumulative log-probability
    cumulative_logprob = 0
    
    # Iterate over each token and sum the log-probabilities
    for token_info in logprobs_content:
        cumulative_logprob += token_info['logprob']
    
    # Calculate the final probability (confidence)
    final_prob = np.exp(cumulative_logprob)
    
    return cumulative_logprob, final_prob

# Calculate and print the final log-probability and probability
final_logprob, final_prob = calculate_final_logprob_and_prob(response.to_dict()['choices'][0])
print(f"The cumulative log-probability for 'Chandigarh' is: {final_logprob}")
print(f"The probability (confidence) for 'Chandigarh' is: {final_prob}")

The cumulative log-probability for 'Chandigarh' is: -2.27527193e-05
The probability (confidence) for 'Chandigarh' is: 0.9999772475395412


## logit_bias

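The `logit_bias` parameter is a powerful tool for influencing the likelihood of specific tokens in the generated output.

It takes a dictionary that maps token IDs (from the model’s tokenizer) to bias values between -100 and 100. Negative values make a token less likely and positive values make it more likely; values near the extremes effectively ban or force the token. This allows fine-grained control over the model's behavior.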
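
As a rough illustration of the mechanism, OpenAI's API reference states that the bias is added to the model's logits before sampling. The sketch below assumes exactly that and nothing more; the token IDs and scores in `toy_logits` are made up, and `biased_probabilities` is a hypothetical helper, not an SDK function.

```python
import math


def biased_probabilities(logits, logit_bias):
    """Add each bias to the matching token's score, then apply softmax."""
    adjusted = {tid: score + logit_bias.get(tid, 0.0) for tid, score in logits.items()}
    peak = max(adjusted.values())  # subtract the max for numerical stability
    exps = {tid: math.exp(score - peak) for tid, score in adjusted.items()}
    total = sum(exps.values())
    return {tid: value / total for tid, value in exps.items()}


# Made-up scores keyed by made-up token IDs (illustrative only).
toy_logits = {101: 2.0, 102: 1.0, 103: 0.5}

print(biased_probabilities(toy_logits, {}))
print(biased_probabilities(toy_logits, {101: -15, 102: 10}))
```

Without a bias, token `101` is the most likely choice. After applying `-15` to it and `+10` to token `102`, nearly all of the probability shifts to `102`, which mirrors how the example that follows steers the model toward one answer.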

In [21]:
logit_bias = {
    9642: -15,   # Token ID for "yes"
    2822:  10    # Token ID for "no"
}
prompt = "Should I invest in this stock? Answer with only Yes or No"


response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # Other chat models work too, but token IDs depend on the model's tokenizer
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": prompt},
    ],
    max_tokens=3,
    logit_bias=logit_bias,
)

print(response.choices[0].message.content)

No
