## Model parameters

*(Coding along with the [Anthropic API fundamentals](https://github.com/anthropics/courses/tree/master/anthropic_api_fundamentals) course in Anthropic's courses GitHub repo)*

The goals of this lesson are to understand the role of the `max_tokens` parameter, to use the `temperature` parameter to control model responses, and to explain the purpose of `stop_sequences`.

The following three parameters are __required__ with every call we send to the Claude model:
- model
- max_tokens
- messages

### Basic setup

In [71]:
# https://github.com/anthropics/courses/blob/master/anthropic_api_fundamentals/04_parameters.ipynb
from anthropic import Anthropic
import pandas as pd

anthropic_api_key = pd.read_csv("~/tmp/anthropic/anthropic-key-1.txt", sep=" ", header=None)[0][0]
print("Don't be a fool and send your API key to GitHub")

# instantiating the client
client = Anthropic(api_key=anthropic_api_key)
MODEL_NAME="claude-3-5-sonnet-20241022"

Don't be a fool and send your API key to GitHub


### Max tokens

Example of a request to Claude:
```
our_first_message = client.messages.create(
    model="claude-3-haiku-20240307",
    max_tokens=500,
    messages=[
        {"role": "user", "content": "Hi there! Please write me a haiku about a pet chicken"}
    ]
)
```

#### __Tokens__

`max_tokens` controls the maximum number of tokens that Claude should generate in its response.

Most LLMs work with a series of word fragments called tokens: the small building blocks of a text sequence that Claude processes, understands, and generates text with. A prompt that we provide to Claude is first turned into tokens and then passed to the model. To give us a response, the model generates its output one token at a time.

For Claude, a token represents approximately 3.5 English characters (the exact number can vary depending on the language used).
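
The 3.5-characters-per-token figure can be turned into a quick back-of-the-envelope estimate. The sketch below is only a heuristic built on that figure. The helper name `estimate_tokens` and the constant `CHARS_PER_TOKEN` are made up for illustration, and the exact counts are the ones the API reports in `usage.input_tokens` and `usage.output_tokens`.

```python
import math

# Assumption: roughly 3.5 English characters per token, as quoted above.
CHARS_PER_TOKEN = 3.5


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate for English text (heuristic only)."""
    if not text:
        return 0
    return math.ceil(len(text) / CHARS_PER_TOKEN)


prompt = "Hi there! Please write me a haiku about a pet chicken"
print(f"{len(prompt)} characters, roughly {estimate_tokens(prompt)} tokens")
```

An estimate like this is mainly useful for picking a sensible `max_tokens` value before sending a request. It should not be relied on for billing or hard limits.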

#### __Working with `max_tokens`__

The `max_tokens` parameter lets us set an upper limit on how many tokens Claude generates in its output. If we ask Claude to write us a poem and set `max_tokens` to 10, Claude will stop generating as soon as it hits 10 tokens, even in the middle of a word.

Increasing `max_tokens` does not make Claude generate that many tokens: the value is only a ceiling, and Claude may stop earlier once its answer is complete.

*(Source: https://github.com/anthropics/courses/blob/master/anthropic_api_fundamentals/04_parameters.ipynb)*

In [72]:
# let's try it with an example
truncated_response = client.messages.create(
    model="claude-3-haiku-20240307",
    max_tokens=10, # limiting the tokens for the answer
    messages=[
        {"role": "user", "content": "Write me a poem"}
    ]
)

print(truncated_response.content[0].text)

Here is a poem for you:

Whis


In [73]:
truncated_response

Message(id='msg_01CEPWj157H6WSnUc6wa8vnB', content=[TextBlock(citations=None, text='Here is a poem for you:\n\nWhis', type='text')], model='claude-3-haiku-20240307', role='assistant', stop_reason='max_tokens', stop_sequence=None, type='message', usage=Usage(cache_creation_input_tokens=0, cache_read_input_tokens=0, input_tokens=11, output_tokens=10))

If we want to check why the model stopped generating output, we can have a look at the `stop_reason` property on the response Message object:

In [74]:
truncated_response.stop_reason # stop_reason='max_tokens'

'max_tokens'

We can see that the model has stopped generating output because it hit the `max_tokens` limit.
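
In application code, this check can be wrapped in a small helper so that truncated answers are not silently passed on. The following is a minimal sketch. `was_truncated` is a hypothetical helper name, and the `SimpleNamespace` objects are stand-ins for real response objects so that the example runs without an API call.

```python
from types import SimpleNamespace


def was_truncated(response) -> bool:
    """Return True if generation stopped because the max_tokens limit was hit."""
    return response.stop_reason == "max_tokens"


# Stand-ins that only mimic the stop_reason attribute of a real Message object.
cut_off = SimpleNamespace(stop_reason="max_tokens")
finished = SimpleNamespace(stop_reason="end_turn")

for name, response in [("cut_off", cut_off), ("finished", finished)]:
    status = "truncated" if was_truncated(response) else "complete"
    print(f"{name}: {status}")
```

With a real client, the same function could be called on the object returned by `client.messages.create(...)`, for example to decide whether to retry with a higher `max_tokens`.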

In [75]:
# setting max_tokens to 500 to get an entire poem
longer_poem_response = client.messages.create(
    model="claude-3-haiku-20240307",
    max_tokens=500,
    messages=[
        {"role": "user", "content": "Write me a poem"}
    ]
)

print(longer_poem_response.content[0].text)

Here is a poem for you:

Whispers of the Wind

The breeze, a gentle caress,
Dances across the open field.
Caressing the flowers, the trees,
Nature's symphony it wields.

Soft murmurs, a soothing lullaby,
Carried on the zephyr's breath.
Rustling leaves, a tranquil reply,
As the wind doth gently tread.

Amidst the verdant landscape's grace,
The breeze weaves its wondrous spell.
A moment of serene embrace,
Where the wind's sweet secrets dwell.


In [76]:
longer_poem_response

Message(id='msg_01X4fgwg6wscRPZjFTkuYxVg', content=[TextBlock(citations=None, text="Here is a poem for you:\n\nWhispers of the Wind\n\nThe breeze, a gentle caress,\nDances across the open field.\nCaressing the flowers, the trees,\nNature's symphony it wields.\n\nSoft murmurs, a soothing lullaby,\nCarried on the zephyr's breath.\nRustling leaves, a tranquil reply,\nAs the wind doth gently tread.\n\nAmidst the verdant landscape's grace,\nThe breeze weaves its wondrous spell.\nA moment of serene embrace,\nWhere the wind's sweet secrets dwell.", type='text')], model='claude-3-haiku-20240307', role='assistant', stop_reason='end_turn', stop_sequence=None, type='message', usage=Usage(cache_creation_input_tokens=0, cache_read_input_tokens=0, input_tokens=11, output_tokens=145))

In [77]:
longer_poem_response.stop_reason # "end_turn" tells us that the model naturally finished generating

'end_turn'

#### __Reasons to alter max tokens:__

- __API limits__: The number of tokens in your input text and the generated response count towards the API usage limits. Each API request has a maximum limit on the number of tokens it can process. Being aware of tokens helps you stay within the API limits and manage your usage efficiently.
- __Performance__: The number of tokens Claude generates directly impacts the processing time and memory usage of the API. Longer input texts and higher `max_tokens` values require more computational resources. Understanding tokens helps you optimize your API requests for better performance.
- __Response quality__: Setting an appropriate `max_tokens` value ensures that the generated response is of sufficient length and contains the necessary information. If the `max_tokens` value is too low, the response may be truncated or incomplete. Experimenting with different `max_tokens` values can help you find the optimal balance for your specific use case.

In [78]:
# function that asks Claude to generate a very long dialogue between two characters three different times, 
# each with a different value for max_tokens
# it then prints out how many tokens were actually generated and how long the generation took
import time
def compare_num_tokens_speed():
    token_counts = [100,1000,4096]
    task = """
        Create a long, detailed dialogue that is at least 5000 words long between two characters discussing the impact of social media on mental health. 
        The characters should have differing opinions and engage in a respectful thorough debate.
    """

    for num_tokens in token_counts:
        start_time = time.time()

        response = client.messages.create(
            # model="claude-3-haiku-20240307", # Error code: 529 - {'type': 'error', 'error': {'type': 'overloaded_error', 'message': 'Overloaded'}}
            model="claude-3-5-sonnet-20241022",
            max_tokens=num_tokens,
            messages=[{"role": "user", "content": task}]
        )

        end_time = time.time()
        execution_time = end_time - start_time

        print(f"Number of tokens generated: {response.usage.output_tokens}")
        print(f"Execution Time: {execution_time:.2f} seconds\n")

In [79]:
compare_num_tokens_speed() # the more tokens Claude generates, the longer it takes

Number of tokens generated: 100
Execution Time: 1.91 seconds

Number of tokens generated: 1000
Execution Time: 17.31 seconds

Number of tokens generated: 2809
Execution Time: 45.57 seconds



### Stop sequences

`stop_sequences` allows us to provide the Claude model with a set of strings that, when encountered in the generated response, cause the generation to stop.
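
The effect can be imitated locally on an ordinary string. The sketch below is not how the API is implemented. It is a plain-Python illustration, with a made-up helper `cut_at_stop_sequence`, of the visible behaviour: the text ends just before the earliest stop string, and the string that matched is reported separately. Case-sensitive matching is an assumption of this sketch.

```python
from typing import Optional, Sequence, Tuple


def cut_at_stop_sequence(
    text: str, stop_sequences: Sequence[str]
) -> Tuple[str, Optional[str]]:
    """Return (text before the earliest stop sequence, the sequence that matched)."""
    best_index, best_stop = len(text), None
    for stop in stop_sequences:
        if not stop:
            continue  # ignore empty strings, which would match everywhere
        index = text.find(stop)  # case-sensitive search (assumption)
        if index != -1 and index < best_index:
            best_index, best_stop = index, stop
    return text[:best_index], best_stop


kept, matched = cut_at_stop_sequence("Carried on the breeze", ["b", "c"])
print(repr(kept), matched)
```

If none of the strings occurs, the helper returns the full text and `None`, mirroring a response whose `stop_sequence` property is `None`.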

Example of a request that does not include `stop_sequences`:

```
response = client.messages.create(
    model="claude-3-haiku-20240307",
    max_tokens=500,
    messages=[{"role": "user", "content": "Generate a JSON object representing a person with a name, email, and phone number ."}],
)
print(response.content[0].text)

```

In [80]:
# let's run it
response = client.messages.create(
    model="claude-3-haiku-20240307",
    max_tokens=500,
    messages=[{"role": "user", "content": "Generate a JSON object representing a person with a name, email, and phone number ."}],
)
print(response.content[0].text)

Here's a JSON object representing a person with a name, email, and phone number:

{
  "name": "John Doe",
  "email": "john.doe@example.com",
  "phoneNumber": "555-1234"
}


In [81]:
# now with the stop_sequences parameter:
# we want Claude to stop generating as soon as it produces the closing "}" of the JSON object
response = client.messages.create(
    model="claude-3-haiku-20240307",
    max_tokens=500,
    messages=[{"role": "user", "content": "Generate a JSON object representing a person with a name, email, and phone number ."}],
    stop_sequences=["}"]
)
print(response.content[0].text)

Here's a JSON object representing a person with a name, email, and phone number:

{
  "name": "John Doe",
  "email": "johndoe@example.com",
  "phoneNumber": "555-1234"



In [82]:
response.stop_reason # why did Claude stop?

'stop_sequence'

In [83]:
response.stop_sequence 

'}'
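
Because the stop sequence itself is not part of the returned text, the output above is missing its closing brace and is not yet valid JSON. One way to handle this is sketched below, assuming a flat object with no nested braces (generation stops at the first `}`). `parse_json_stopped_at_brace` is a hypothetical helper, and `raw` reproduces the response text shown above.

```python
import json


def parse_json_stopped_at_brace(text: str) -> dict:
    """Parse a flat JSON object whose closing "}" was consumed as a stop sequence."""
    start = text.index("{")  # skip any preamble before the object
    return json.loads(text[start:] + "}")  # re-append the missing brace


raw = (
    "Here's a JSON object representing a person with a name, email, and phone number:\n\n"
    '{\n  "name": "John Doe",\n  "email": "johndoe@example.com",\n  "phoneNumber": "555-1234"\n'
)

person = parse_json_stopped_at_brace(raw)
print(person["name"], person["email"])
```

For nested objects this approach breaks down, since the first inner `}` would already end the generation.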

In [84]:
# providing multiple stop sequences
# the model will stop generating as soon as it encounters any of the stop sequences
# the resulting stop_sequence property on the response Message will tell us which exact stop_sequence was encountered
def generate_random_letters_3_times():
    for i in range(3):
        response = client.messages.create(
            model="claude-3-haiku-20240307",
            max_tokens=500,
            messages=[{"role": "user", "content": "generate a poem"}],
            stop_sequences=["b", "c"]
        )
        print(response.content[0].text)
        print(f">>> Response {i+1} stopped because {response.stop_reason}.  The stop sequence was {response.stop_sequence}\n")

In [85]:
generate_random_letters_3_times()

Certainly! Here is a poem for you:

Whispers of the Winds

Carried on the 
>>> Response 1 stopped because stop_sequence.  The stop sequence was b

Here is a poem I generated:

Whispers on the wind,
E
>>> Response 2 stopped because stop_sequence.  The stop sequence was c

Here is a poem I generated:

The Quiet Whisper

In the stillness of the night,
A gentle 
>>> Response 3 stopped because stop_sequence.  The stop sequence was b



### Temperature

- The temperature parameter is used to control the "randomness" and "creativity" of the generated responses.
- It ranges from 0 to 1, with higher values resulting in more diverse and unpredictable responses with variations in phrasing.
- Lower temperatures can result in more deterministic outputs that stick to the most probable phrasing and answers.
- Temperature has a default value of 1.
- __Use temperature closer to 0.0 for analytical tasks, and closer to 1.0 for creative and generative tasks.__

<img src="../../assets/images/temperature.png" width="70%">

*(Source: https://github.com/anthropics/courses/blob/master/anthropic_api_fundamentals/04_parameters.ipynb)*
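
A common textbook way to picture temperature is as a divisor applied to the model's raw token scores (logits) before they are turned into probabilities. The sketch below shows that general formulation. The source does not say how Claude applies temperature internally, the logits are made-up numbers, and treating temperature 0 as "always pick the top-scoring token" is an assumption of this example.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        # Assumption: temperature 0 means greedy choice of the highest score.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(score - peak) for score in scaled]
    total = sum(exps)
    return [value / total for value in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0, 0.5, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

Lower temperatures concentrate the probability on the top candidate, while a temperature of 1 leaves more room for the alternatives to be sampled.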

In [86]:
# demonstration: we make three requests to Claude, asking it to "Come up with a name for an alien planet. Respond with a single word."
# first using a temperature of 0 and then a temperature of 1
def demonstrate_temperatures():
    temperatures = [0, 1]
    for temperature in temperatures:
        print(f"Prompting Claude three times with a temperature of {temperature}:")
        print("================")
        for i in range(3):
            response = client.messages.create(
                model=MODEL_NAME,
                max_tokens=100,
                messages=[{"role": "user", "content": "Come up with a name for an alien planet. Respond with a single word."}],
                # messages=[{"role": "user", "content": "Come up with a name for a cat. Respond with a single word."}],
                temperature=temperature
            )
            print(f"Response {i+1}: {response.content[0].text}")
        print("================")


In [87]:
demonstrate_temperatures()

Prompting Claude three times with a temperature of 0:
Response 1: Kestrax
Response 2: Kestrax
Response 3: Kestrax
Prompting Claude three times with a temperature of 1:
Response 1: Kestrax
Response 2: Kelaryx
Response 3: Xelara


Another example, illustrated with a chart. As an experiment, Claude was queried 100 times with the prompt "Pick any animal in the world. Respond with only a single word: the name of the animal."

When queried with a temperature of 0, Claude responded with "Giraffe" every single time.

With a temperature of 1 the responses also include different types of animals (although giraffe is chosen more than half the time).

<img src="../../assets/images/temperature_plot.png" width="70%" />

*(Source: https://github.com/anthropics/courses/blob/master/anthropic_api_fundamentals/04_parameters.ipynb)*
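
An experiment like this can be reproduced by sampling repeatedly and counting the answers. The sketch below shows only the tallying part. `tally_answers` is a hypothetical helper, and `ask` is any zero-argument function that returns one answer as a string. With a real client, it would wrap a `client.messages.create(...)` call and return `response.content[0].text`. Here a canned list of answers stands in for the API so the example runs offline.

```python
import itertools
from collections import Counter
from typing import Callable


def tally_answers(ask: Callable[[], str], n: int) -> Counter:
    """Call ask() n times and count how often each normalized answer occurs."""
    return Counter(ask().strip().rstrip(".").lower() for _ in range(n))


# Canned answers standing in for real model responses (made-up data).
canned = itertools.cycle(["Giraffe", "giraffe.", "Elephant"])
counts = tally_answers(lambda: next(canned), 6)
print(counts.most_common())
```

Normalizing case and trailing punctuation before counting keeps "Giraffe" and "giraffe." from being treated as different answers.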

### System prompt

The system prompt, passed via the optional `system` parameter, sets the stage for the conversation by giving Claude high-level instructions like defining its role or providing background information that should inform its responses.

Key points about the system prompt:

- It's optional but can be useful for setting the tone and context of the conversation.
- It's applied at the conversation level, affecting all of Claude's responses in that exchange.
- It can help steer Claude's behavior without needing to include instructions in every user message.
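
The conversation-level nature of the system prompt shows up in how a request is assembled: `system` is a top-level parameter that is sent alongside the message history, rather than being one of the messages. The helper below is a hypothetical sketch that only builds the keyword arguments. The name `build_request` and the example history are made up, and the result would be passed on as `client.messages.create(**request)`.

```python
from typing import Dict, List


def build_request(
    system_prompt: str, history: List[Dict[str, str]], user_text: str
) -> dict:
    """Assemble keyword arguments for client.messages.create (hypothetical helper)."""
    return {
        "model": "claude-3-haiku-20240307",
        "max_tokens": 1000,
        "system": system_prompt,  # top-level parameter, not a message
        # A new list is created so the caller's history is left untouched.
        "messages": history + [{"role": "user", "content": user_text}],
    }


history = [
    {"role": "user", "content": "Hey there, how are you?!"},
    {"role": "assistant", "content": "Hallo! Wie kann ich Ihnen heute helfen?"},
]
request = build_request(
    "You are a helpful foreign language tutor that always responds in German.",
    history,
    "Can you teach me a greeting?",
)
print(request["system"])
print(len(request["messages"]), "messages")
```

Every follow-up turn reuses the same `system` string, so the instruction does not need to be repeated inside the user messages.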

In [88]:
# example
message = client.messages.create(
    model="claude-3-haiku-20240307",
    max_tokens=1000,
    system="You are a helpful foreign language tutor that always responds in German.",
    messages=[
        {"role": "user", "content": "Hey there, how are you?!"}
    ]
)

print(message.content[0].text)

Hallo! Ich freue mich, Ihnen heute zu begegnen. Wie kann ich Ihnen heute helfen?


### Exercises

In [89]:
def generate_questions(topic, num_questions):
    message = client.messages.create(
        model=MODEL_NAME,
        max_tokens=999,
        system=f"You are a helpful expert on the topic {topic} and you return your response as a numbered list.",
        messages=[
            {"role": "user", "content": f"Please generate thought-provoking questions about the topic {topic}"}
        ],
        stop_sequences=[f"{num_questions + 1}."]  # stop before the next numbered item would begin
    )
    
    print(message.content[0].text)

In [90]:
generate_questions(topic="free will", num_questions=3)

1. If all our actions are influenced by prior causes (genetics, upbringing, circumstances), can any choice truly be "free"?

2. How does the existence of unconscious brain activity that precedes conscious decisions affect our understanding of free will?

3. If we could perfectly predict someone's actions based on brain activity, would that disprove free will?




In [91]:
generate_questions(topic="free money", num_questions=3)

1. Is Universal Basic Income (UBI) a sustainable solution for economic inequality, or would it lead to inflation?

2. How does the concept of "free money" through government stimulus checks impact individual work ethic?

3. What are the psychological effects of receiving unexpected windfall money versus earned income?




In [92]:
generate_questions(topic="jupyter notebook", num_questions=3)

1. How has Jupyter Notebook revolutionized the way data scientists collaborate and share their work compared to traditional programming environments?

2. What are the potential security implications of running Jupyter Notebooks in enterprise environments, and how can these risks be mitigated?

3. How does the integration of markdown and code cells in Jupyter Notebook influence the way we approach documentation and literate programming?


