# Controlling the Output - Key Parameters

Hello and welcome back. We know how to send a prompt and get a response, but to truly engineer our outputs, we need to learn how to control the generation process itself. This is done by using optional parameters in our API calls.

These parameters are the control knobs for the AI. They let us fine-tune everything from creativity and length to format and stopping points. This notebook will provide a hands-on demonstration of the most important ones.

## Setup

Let's import the necessary libraries and define a few variables to avoid duplicating information.

In [None]:
import json
import litellm
from dotenv import load_dotenv

load_dotenv()

MODEL_NAME = "openai/gpt-4o-mini"

def get_completion(messages, **kwargs):
    """Call litellm.completion, falling back to MODEL_NAME when no model is passed."""
    kwargs.setdefault("model", MODEL_NAME)
    return litellm.completion(messages=messages, **kwargs)

## `temperature` - The Creativity Knob

`temperature` controls the randomness of the output. A low temperature (for example, 0.1) makes the output nearly deterministic, which suits factual tasks. A high temperature (for example, 1.5) makes the output more varied, which suits brainstorming, although very high values can drift into incoherent text.
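
To build some intuition, here is a minimal sketch of how temperature reshapes a probability distribution over candidate tokens. The scores in `toy_logits` and the helper `softmax_with_temperature` are made up for illustration; this is the textbook softmax-with-temperature idea, not the provider's actual sampling code.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("this sketch only handles temperature > 0")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]

# Hypothetical scores for three candidate next tokens.
toy_logits = [2.0, 1.0, 0.5]

low = softmax_with_temperature(toy_logits, 0.1)
high = softmax_with_temperature(toy_logits, 1.5)

print("temperature 0.1:", [round(p, 4) for p in low])
print("temperature 1.5:", [round(p, 4) for p in high])
```

At 0.1 almost all of the probability lands on the top-scoring token, while at 1.5 the alternatives get a real chance of being picked.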

Let's ask the model to brainstorm a name for a new Python testing library.

In [None]:
TESTING_LIBRARY_NAME = [
    {
        "role": "user",
        "content": "Brainstorm a creative name for a new Python testing library."
    }
]

first_completion = get_completion(
    TESTING_LIBRARY_NAME, 
    temperature=0.1
).choices[0].message.content

second_completion = get_completion(
    TESTING_LIBRARY_NAME, 
    temperature=1.5
).choices[0].message.content

In [None]:
print(first_completion)
print(second_completion)

## `max_tokens` - The Safety Brake

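`max_tokens` sets a hard limit on the number of tokens the model can generate. This is crucial for controlling costs and preventing overly long responses. The model does not plan its answer around the limit: once the limit is reached, generation simply stops, even mid-sentence.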
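
One way to detect that a response ran into the limit is to look at `finish_reason` on the choice, which OpenAI-format responses set to `"length"` when the token limit ended the generation. The helper name `was_truncated` is our own, and the sketch uses a hand-built stand-in object instead of a real API response so that it runs without a network call.

```python
from types import SimpleNamespace

def was_truncated(response):
    """Return True when the first choice stopped because it hit the token limit."""
    return response.choices[0].finish_reason == "length"

# Stand-in objects shaped like a chat completion response (illustrative data only).
cut_off_response = SimpleNamespace(
    choices=[
        SimpleNamespace(
            finish_reason="length",
            message=SimpleNamespace(content="Object-oriented programming is a"),
        )
    ]
)
complete_response = SimpleNamespace(
    choices=[
        SimpleNamespace(
            finish_reason="stop",
            message=SimpleNamespace(content="OOP models programs as objects."),
        )
    ]
)

print(was_truncated(cut_off_response))
print(was_truncated(complete_response))
```

The same check should work on the object returned by `get_completion`, assuming the provider reports `finish_reason` in the OpenAI format.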

Let's see what happens when we ask for a summary with a very restrictive limit.

In [None]:
SUMMARY_PROMPT = [
    {
        "role": "user",
        "content": "Summarize the concept of OOP in two sentences."
    }
]

short_completion = get_completion(
    SUMMARY_PROMPT,
    max_tokens=10
)

long_completion = get_completion(
    SUMMARY_PROMPT,
    max_tokens=100
)

In [None]:
print("--- max_tokens: 10 (will be cut off)")
print(short_completion.choices[0].message.content)
print("\n--- max_tokens: 100 (will probably not be cut off)")
print(long_completion.choices[0].message.content)
print(f"\nTotal tokens: {long_completion.usage.total_tokens}")

## `stop` - The Clean Ending

The `stop` parameter tells the model to stop generating as soon as it produces a specific sequence of characters. The stop sequence itself is not included in the returned text. This is perfect for generating lists or other structured data where you want to prevent extra conversational text.
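
The effect is easy to emulate on a plain string. The sketch below is a client-side illustration only: `apply_stop` is a hypothetical helper, and the sample text is made up. The real cut happens on the server during generation.

```python
def apply_stop(text, stop):
    """Cut text at the earliest stop sequence, leaving the sequence itself out."""
    sequences = [stop] if isinstance(stop, str) else list(stop)
    cut = len(text)
    for sequence in sequences:
        index = text.find(sequence)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]

# Made-up model output for illustration.
sample = "1. Object-oriented\n2. Functional\n3. Procedural"
trimmed = apply_stop(sample, "\n3.")

print(trimmed)
```

Everything from the newline before `3.` onward is dropped, so only the first two items survive.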

Let's ask for a list of programming paradigms and cut the generation off just before the third item.

In [None]:
LIST_PROMPT = [
    {
        "role": "user",
        "content": "List the top 3 programming paradigms. Start with '1.'."
    }
]

no_stop_parameter = get_completion(
    LIST_PROMPT
)

with_stop_parameter = get_completion(
    LIST_PROMPT,
    stop="\n3."
)

In [None]:
print("--- without stop parameter ---")
print(no_stop_parameter.choices[0].message.content)
print("\n--- with stop parameter ---")
print(with_stop_parameter.choices[0].message.content)

## `n` - Generating Multiple Choices

The `n` parameter lets you get multiple different responses for the same prompt in a single API call. This is great for brainstorming when combined with a higher temperature.
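
Nothing guarantees that the `n` choices differ from each other, especially at lower temperatures, so it can help to drop duplicates before showing them. Here is a small sketch, assuming choices shaped like the OpenAI-format response; `unique_contents` is our own helper and the sample choices are invented.

```python
from types import SimpleNamespace

def unique_contents(choices):
    """Collect each choice's text once, ignoring case and surrounding whitespace."""
    seen = set()
    unique = []
    for choice in choices:
        text = choice.message.content.strip()
        key = text.lower()
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique

# Invented choices standing in for response.choices.
fake_choices = [
    SimpleNamespace(message=SimpleNamespace(content="Debug smarter, not harder.")),
    SimpleNamespace(message=SimpleNamespace(content="debug smarter, not harder. ")),
    SimpleNamespace(message=SimpleNamespace(content="Squash bugs at the speed of AI.")),
]

slogans = unique_contents(fake_choices)
print(slogans)
```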

In [None]:
SLOGAN_PROMPT = [
    {
        "role": "user",
        "content": "Write a marketing slogan for my newly created AI-powered debug tool."
    }
]

response_n_choices = get_completion(
    SLOGAN_PROMPT,
    temperature=1.5,
    n=3
)

In [None]:
for i, choice in enumerate(response_n_choices.choices, start=1):
    print(f"Slogan {i}: {choice.message.content}")

## `response_format` - Guaranteed JSON

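For developers, this is a game-changer. By setting `response_format={"type": "json_object"}`, you can force the model to output a syntactically correct JSON object. This is only available on newer models like `gpt-4o-mini` and is very reliable, with two caveats: the prompt itself must still mention JSON, and the guarantee covers syntax only, not which keys appear. A response cut short by `max_tokens` can also end up as incomplete JSON.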
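
Because JSON mode does not fix the shape of the object, it is worth checking the parsed result for the fields we expect. The sketch below assumes the keys `name`, `city`, and `product`; those names are our guess, since the prompt does not dictate them, and `missing_keys` is a hypothetical helper.

```python
import json

def missing_keys(raw, required_keys):
    """Parse a JSON string and list the required keys that are absent."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return [key for key in required_keys if key not in data]

# Assumed key names, chosen for illustration.
REQUIRED_KEYS = ["name", "city", "product"]

complete = '{"name": "Alex", "city": "Berlin", "product": "Quantum03"}'
partial = '{"name": "Alex", "location": "Berlin"}'

print(missing_keys(complete, REQUIRED_KEYS))
print(missing_keys(partial, REQUIRED_KEYS))
```

An empty list means every expected field is present; anything else tells us which fields to re-ask for or fill with a default.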

Let's extract information from a sentence into a structured JSON object.

In [None]:
JSON_PROMPT = [
    {
        "role": "system",  # the extraction instruction reads more naturally as a system message
        "content": "Extract a JSON object with the user's name, city, and product they are asking about."
    },
    {
        "role": "user",
        "content": "Hi, this is Alex from Berlin. I have a question about the 'Quantum03' server."
    },
]

normal_completion = get_completion(JSON_PROMPT)
json_completion = get_completion(
    JSON_PROMPT, 
    response_format={"type": "json_object"}
)

In [None]:
def parse_json(json_str):
    try:
        parsed_json = json.loads(json_str)
        print("\nSuccessfully parsed JSON:")
        print(parsed_json)
    except json.JSONDecodeError as e:
        print(f"\nFailed to parse JSON: {e}")

parse_json(normal_completion.choices[0].message.content)
parse_json(json_completion.choices[0].message.content)

In [None]:
print(normal_completion.choices[0].message.content)