# Appendix: Explore Hyperparameters

This appendix notebook explores the impact of the hyperparameters `temperature`, `top_p`, and `top_k` on LLM token sampling.





---

## Imports

In [None]:
import random  # random.choices samples the next word using weights

---

## Top K

The `top_k` parameter limits the model's choices to the `k` most likely next tokens. When `top_k` is set to 1, the model always selects the single most probable token, so given exactly the same prompt it produces the same output every time. We refer to this as **greedy decoding**.

When `top_k` is greater than 1, the model samples from several candidate next tokens instead of always taking the one it evaluates as most likely.

So far we have been working with `top_k` set to its default value of 1.
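Greedy decoding amounts to taking the argmax of the next-token distribution. Here is a minimal sketch using a hypothetical helper, `greedy_next_word`, over the same kind of `(word, probability)` pairs used in the toy code below:

```python
def greedy_next_word(candidates):
    """Return the word with the highest probability (greedy decoding)."""
    # max() with a key picks the pair with the largest probability.
    # Nothing here is random, so repeated calls always return the same word.
    word, _ = max(candidates, key=lambda pair: pair[1])
    return word


candidates = [('apple', 0.4), ('dragonfruit', 0.2), ('marita', 0.1)]
print(greedy_next_word(candidates))  # apple
```

This has the same effect as `top_k = 1` in the toy function below, which keeps only the first entry of a list already sorted by likelihood.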

Here's some toy code to help build your intuition for `top_k`. Imagine these are the candidate next words generated by the LLM, each paired with its probability of coming next.

In [None]:
# List of words with their likelihood of being the next word, sorted by likelihood.
words_and_likelihoods_of_being_next_sorted = [
    ('apple', 0.4),
    ('dragonfruit', 0.2),
    ('marita', 0.1)
]

The following toy function picks the next word for the generation, taking `top_k` into account.

In [None]:
def get_next_word(words_sorted_by_likelihood_of_being_next, top_k):
    # Limit the choices to the top_k words.
    words_available_to_be_next = words_sorted_by_likelihood_of_being_next[:top_k]
    # Separate the words and their probabilities.
    words, probabilities = zip(*words_available_to_be_next)
    # Choose one word based on the probabilities as weights.
    return random.choices(words, weights=probabilities, k=1)[0]
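
To see why `top_k = 2` still mostly produces `apple`, it helps to compute the distribution the toy function actually samples from. `random.choices` treats its `weights` as relative, so the kept probabilities are effectively rescaled to sum to 1. A minimal sketch with a hypothetical helper, `top_k_distribution`:

```python
def top_k_distribution(words_sorted_by_likelihood_of_being_next, top_k):
    """Return {word: probability} that get_next_word effectively samples from."""
    # Keep only the top_k most likely words, exactly as get_next_word does.
    kept = words_sorted_by_likelihood_of_being_next[:top_k]
    # random.choices uses relative weights, so each effective probability
    # is that word's weight divided by the total weight of the kept words.
    total = sum(likelihood for _, likelihood in kept)
    return {word: likelihood / total for word, likelihood in kept}


candidates = [('apple', 0.4), ('dragonfruit', 0.2), ('marita', 0.1)]
for k in [1, 2, 3]:
    print(k, {w: round(p, 3) for w, p in top_k_distribution(candidates, k).items()})
# 1 {'apple': 1.0}
# 2 {'apple': 0.667, 'dragonfruit': 0.333}
# 3 {'apple': 0.571, 'dragonfruit': 0.286, 'marita': 0.143}
```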

Iterating through several values of `top_k`, we can see how it limits which words can come next in the generation.

In [None]:
top_ks = [1, 2, 3]

for top_k in top_ks:
    print(f'Setting top_k to {top_k}.')
    for _ in range(10):
        next_word = get_next_word(words_and_likelihoods_of_being_next_sorted, top_k)
        print(next_word)
    print()

Setting top_k to 1.
apple
apple
apple
apple
apple
apple
apple
apple
apple
apple

Setting top_k to 2.
apple
apple
dragonfruit
dragonfruit
apple
apple
apple
apple
apple
apple

Setting top_k to 3.
apple
dragonfruit
dragonfruit
apple
dragonfruit
apple
marita
apple
apple
marita
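
Ten draws per setting are too few to reveal the underlying proportions. Drawing many more samples shows the observed frequencies settling near the probabilities rescaled over the kept words (for `top_k = 3`, `apple` is about 0.4 / 0.7 ≈ 0.57). The seed and the sample count of 10,000 below are arbitrary choices for this sketch:

```python
import random
from collections import Counter


def get_next_word(words_sorted_by_likelihood_of_being_next, top_k):
    # Same toy sampler as above: keep the top_k words, then sample by weight.
    words, probabilities = zip(*words_sorted_by_likelihood_of_being_next[:top_k])
    return random.choices(words, weights=probabilities, k=1)[0]


candidates = [('apple', 0.4), ('dragonfruit', 0.2), ('marita', 0.1)]
random.seed(0)  # arbitrary fixed seed so the run is reproducible
n_samples = 10_000  # arbitrary; large enough for proportions to stabilize

frequencies = {}
for top_k in [1, 2, 3]:
    counts = Counter(get_next_word(candidates, top_k) for _ in range(n_samples))
    # Convert raw counts into observed proportions.
    frequencies[top_k] = {word: count / n_samples for word, count in counts.items()}
    print(top_k, frequencies[top_k])
```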



---

## Temperature

When `top_k` is set to 1, the `temperature` parameter has no effect. However, when `top_k` is greater than 1, we can also pass in a value for the model's `temperature` (in this notebook, a value between `0.0` and `1.0`; the valid range varies by model and API).

**Temperature** influences the randomness of word selection: a higher **temperature** increases the likelihood of choosing less probable words, adding variability to the text. Lower **temperatures** make the model's choices more predictable.

For example, when `top_k` is set to 2, the model selects from the two most likely next tokens. As the temperature increases, the probability distribution becomes more uniform, giving the second-most likely token a higher chance of being chosen; a lower temperature biases the model more strongly towards the most likely of the two.
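For the two-word case just described, the effect can be worked out by hand. In the formulation used by `apply_temperature` below, a temperature $T$ turns each probability $p_i$ into a weight proportional to $p_i^{1/T}$. Writing $z_i = \log p_i$, this is the same as the common softmax-with-temperature formulation over logits:

$$
\tilde{p}_i(T) = \frac{p_i^{1/T}}{\sum_j p_j^{1/T}} = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}, \qquad z_i = \log p_i .
$$

With `apple` ($p = 0.4$) and `dragonfruit` ($p = 0.2$):

$$
T = 1:\quad \tilde{p}_{\text{apple}} = \frac{0.4}{0.4 + 0.2} \approx 0.667, \qquad
T = 0.5:\quad \tilde{p}_{\text{apple}} = \frac{0.4^{2}}{0.4^{2} + 0.2^{2}} = \frac{0.16}{0.20} = 0.8 .
$$

As $T \to 0$, the ratio $(0.2 / 0.4)^{1/T}$ vanishes and `apple` is chosen almost surely; as $T$ increases, the ratio moves toward 1 and the distribution becomes more uniform. With a single candidate ($k = 1$), $\tilde{p} = 1$ for every $T$, which is why temperature has no effect when `top_k` is 1.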

Here's a toy code example to help with your intuition. Once again, we provide a set of words with their probabilities of coming next in the text generation.

In [None]:
# List of words with their likelihood of being the next word, sorted by likelihood.
words_and_likelihoods_of_being_next_sorted = [
    ('apple', 0.4),
    ('dragonfruit', 0.2),
    ('marita', 0.1)
]

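The function `apply_temperature` reweights the probabilities of being next according to the temperature setting: each weight becomes proportional to the original probability raised to the power `1 / temperature`. A temperature below 1 widens the gap between likely and unlikely words, and a temperature of exactly 1 leaves the relative weights unchanged.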

In [None]:
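def apply_temperature(probabilities, temperature):
    # Ensure temperature is within the valid range used in this notebook: (0, 1].
    if temperature <= 0 or temperature > 1:
        raise ValueError("Temperature must be greater than 0 and less than or equal to 1")
    # Raise each probability to the power 1 / temperature. Dividing by the largest
    # probability first keeps the top weight at 1.0, so very small temperatures
    # cannot underflow every weight to 0.0 (random.choices rejects all-zero weights).
    max_probability = max(probabilities)
    adjusted_probabilities = [pow(p / max_probability, 1 / temperature) for p in probabilities]
    return adjusted_probabilities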
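
Because `random.choices` rescales its weights to sum to 1, the values returned by `apply_temperature` act as relative weights. The sketch below uses two hypothetical helpers, `temperature_distribution` and `softmax_with_temperature`, to normalize those weights into the probabilities actually used, and to check that raising probabilities to `1 / temperature` matches the common approach of dividing log-probabilities (treated as logits) by the temperature before a softmax:

```python
import math


def temperature_distribution(probabilities, temperature):
    """Normalize the probabilities after raising each one to 1 / temperature."""
    weights = [p ** (1 / temperature) for p in probabilities]
    total = sum(weights)
    return [w / total for w in weights]


def softmax_with_temperature(logits, temperature):
    """Softmax over logits divided by temperature."""
    scaled = [z / temperature for z in logits]
    # Subtract the largest value before exponentiating to avoid overflow/underflow.
    largest = max(scaled)
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


probabilities = [0.4, 0.2, 0.1]  # apple, dragonfruit, marita
logits = [math.log(p) for p in probabilities]  # treat log-probabilities as logits

results = {}
for temperature in [0.01, 0.5, 1.0]:
    via_power = temperature_distribution(probabilities, temperature)
    via_softmax = softmax_with_temperature(logits, temperature)
    results[temperature] = (via_power, via_softmax)
    print(temperature, [round(p, 4) for p in via_power])
# 0.01 [1.0, 0.0, 0.0]
# 0.5 [0.7619, 0.1905, 0.0476]
# 1.0 [0.5714, 0.2857, 0.1429]
```

These normalized values explain the outputs below: at `0.01` only `apple` appears, while at `1.0` `marita` shows up roughly one time in seven.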

The function `get_next_word_temperature` selects the next word taking a temperature value into account.

In [None]:
def get_next_word_temperature(words_sorted_by_likelihood_of_being_next, temperature):
    # Separate the words and their original probabilities.
    words, probabilities = zip(*words_sorted_by_likelihood_of_being_next)
    # Adjust the probabilities by applying temperature.
    adjusted_probabilities = apply_temperature(probabilities, temperature)
    # Choose one word based on the adjusted probabilities as weights.
    return random.choices(words, weights=adjusted_probabilities, k=1)[0]

Iterating through several values of `temperature`, we can see how it changes which word is likely to come next in the generation.

In [None]:
temperatures = [0.01, 0.5, 1.0]

for temperature in temperatures:
    print(f'Setting temperature to {temperature}.')
    for _ in range(10):
        next_word = get_next_word_temperature(words_and_likelihoods_of_being_next_sorted, temperature)
        print(next_word)
    print()

Setting temperature to 0.01.
apple
apple
apple
apple
apple
apple
apple
apple
apple
apple

Setting temperature to 0.5.
apple
apple
apple
apple
apple
apple
dragonfruit
dragonfruit
apple
apple

Setting temperature to 1.0.
apple
dragonfruit
marita
apple
apple
apple
dragonfruit
apple
marita
apple



---

## Top P

In the context of text generation with language models, `top_p`, also known as **nucleus sampling**, restricts the choice to the smallest set of most likely next tokens whose cumulative probability reaches the threshold `top_p`, a floating-point value between 0.0 and 1.0. Here's how it works:
- The model calculates the probability of each possible next token and sorts the tokens in descending order of probability.
- Starting from the most probable token, it adds tokens to the subset until the sum of their probabilities reaches or exceeds the `top_p` threshold.
- The model then samples the next token only from this subset, in proportion to the tokens' probabilities.
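The first two steps above can be sketched as a function that returns only the candidate set (the "nucleus"), without sampling. The helper name `nucleus` is hypothetical, and its `>=` comparison mirrors the toy function further down:

```python
def nucleus(words_sorted_by_likelihood_of_being_next, p):
    """Return the words kept by top-p (nucleus) filtering, most likely first."""
    kept = []
    cumulative = 0.0
    for word, likelihood in words_sorted_by_likelihood_of_being_next:
        kept.append(word)
        cumulative += likelihood
        # Stop as soon as the kept words cover at least p of the probability mass.
        if cumulative >= p:
            break
    return kept


candidates = [('apple', 0.4), ('dragonfruit', 0.2), ('marita', 0.1)]
for p in [0.1, 0.6, 1.0]:
    print(p, nucleus(candidates, p))
# 0.1 ['apple']
# 0.6 ['apple', 'dragonfruit']
# 1.0 ['apple', 'dragonfruit', 'marita']
```

With `p = 1.0` every candidate is kept, because these three toy probabilities sum to only 0.7, so the threshold is never reached. Comparisons exactly at the threshold depend on floating-point rounding: here `0.4 + 0.2` evaluates to slightly more than `0.6`.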

Here's a toy code example to help with your intuition. Once again, we provide a set of words with their probabilities of coming next in the text generation.

In [None]:
# List of words with their likelihood of being the next word, sorted by likelihood.
words_and_likelihoods_of_being_next_sorted = [
    ('apple', 0.4),
    ('dragonfruit', 0.2),
    ('marita', 0.1)
]

The function `get_next_word_top_p` chooses the next word, taking the threshold `p` into account.

In [None]:
def get_next_word_top_p(words_sorted_by_likelihood_of_being_next, p):
    # Initialize the cumulative probability.
    cumulative = 0
    # List to hold words and probabilities up to the cumulative probability p.
    words_available_to_be_next = []
    # Add words and their probabilities to the list until the cumulative probability reaches p.
    for word, likelihood in words_sorted_by_likelihood_of_being_next:
        cumulative += likelihood
        words_available_to_be_next.append((word, likelihood))
        if cumulative >= p:
            break
    # Separate the words and their probabilities.
    words, probabilities = zip(*words_available_to_be_next)
    # Choose one word based on the probabilities as weights.
    return random.choices(words, weights=probabilities, k=1)[0]

Iterating through several values of `top_p`, we can see how it limits which words can come next in the generation.

In [None]:
top_ps = [0.1, 0.6, 1.0]

for top_p in top_ps:
    print(f'Setting top_p to {top_p}.')
    for _ in range(10):
        next_word = get_next_word_top_p(words_and_likelihoods_of_being_next_sorted, top_p)
        print(next_word)
    print()

Setting top_p to 0.1.
apple
apple
apple
apple
apple
apple
apple
apple
apple
apple

Setting top_p to 0.6.
apple
dragonfruit
apple
dragonfruit
apple
dragonfruit
apple
apple
apple
apple

Setting top_p to 1.0.
apple
apple
apple
apple
apple
marita
dragonfruit
apple
apple
apple



---

## Combined

Below we reuse the same toy list of candidate words to run a small experiment on how combining the parameters affects next-word selection.

In [None]:
# List of words with their likelihood of being the next word, sorted by likelihood.
words_and_likelihoods_of_being_next_sorted = [
    ('apple', 0.4),
    ('dragonfruit', 0.2),
    ('marita', 0.1)
]

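The function `get_next_word_combined` takes `top_k`, `p`, and `temperature` into account: it first keeps the `top_k` most likely words, then applies the `top_p` cutoff to those words using their original probabilities, and finally reweights the survivors with `apply_temperature` before sampling.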

In [None]:
def get_next_word_combined(words_sorted_by_likelihood_of_being_next, top_k, p, temperature):
    # Apply top_k to limit the choices.
    words_available_to_be_next = words_sorted_by_likelihood_of_being_next[:top_k]

    # Initialize the cumulative probability.
    cumulative = 0
    # List to hold words and probabilities after applying top_p.
    top_p_words = []
    # Add words to the list based on top_p criteria.
    for word, likelihood in words_available_to_be_next:
        cumulative += likelihood
        top_p_words.append((word, likelihood))
        if cumulative >= p:
            break

    # Separate the words and their probabilities after applying top_p.
    words, probabilities = zip(*top_p_words)
    # Adjust the probabilities by applying temperature.
    adjusted_probabilities = apply_temperature(probabilities, temperature)

    # Choose one word based on the adjusted probabilities as weights.
    return random.choices(words, weights=adjusted_probabilities, k=1)[0]

Iterating through combinations of `top_k`, `top_p`, and `temperature`, we can see how the three parameters together limit which word might come next in the generation.

In [None]:
top_ks = [2, 3]
top_ps = [0.6, 1]
temperatures = [0.1, 1]

for top_k in top_ks:
    for top_p in top_ps:
        for temperature in temperatures:
            print(f'Setting top_k to {top_k}, top_p to {top_p}, temperature to {temperature}.')
            for _ in range(10):
                next_word = get_next_word_combined(
                    words_and_likelihoods_of_being_next_sorted,
                    top_k,
                    top_p,
                    temperature
                )
                print(next_word)
            print()

Setting top_k to 2, top_p to 0.6, temperature to 0.1.
apple
apple
apple
apple
apple
apple
apple
apple
apple
apple

Setting top_k to 2, top_p to 0.6, temperature to 1.
apple
apple
apple
dragonfruit
dragonfruit
apple
apple
apple
apple
dragonfruit

Setting top_k to 2, top_p to 1, temperature to 0.1.
apple
apple
apple
apple
apple
apple
apple
apple
apple
apple

Setting top_k to 2, top_p to 1, temperature to 1.
apple
apple
apple
dragonfruit
apple
apple
dragonfruit
dragonfruit
dragonfruit
apple

Setting top_k to 3, top_p to 0.6, temperature to 0.1.
apple
apple
apple
apple
apple
apple
apple
apple
apple
apple

Setting top_k to 3, top_p to 0.6, temperature to 1.
dragonfruit
apple
apple
dragonfruit
apple
dragonfruit
apple
apple
apple
dragonfruit

Setting top_k to 3, top_p to 1, temperature to 0.1.
apple
apple
apple
apple
apple
apple
apple
apple
apple
apple

Setting top_k to 3, top_p to 1, temperature to 1.
apple
dragonfruit
apple
marita
apple
dragonfruit
apple
dragonfruit
apple
dragonfruit
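
With `top_k = 2`, `top_p = 0.6`, and `temperature = 1`, the toy function sampled both `apple` and `dragonfruit`, because its `top_p` cutoff is computed on the original probabilities (`apple` alone covers only 0.4). Implementations differ on this point: some apply temperature first and compute the `top_p` cutoff on probabilities renormalized after the `top_k` step. The sketch below assumes that alternative ordering; `candidate_set` is a hypothetical helper that returns the surviving words with their renormalized probabilities. Under this ordering, `apple` alone has probability 2/3 after the `top_k` step, which already meets `top_p = 0.6`:

```python
def candidate_set(words_sorted_by_likelihood_of_being_next, top_k, p, temperature):
    """Assumed alternative ordering: temperature -> top_k -> top_p on renormalized probs."""
    words, probabilities = zip(*words_sorted_by_likelihood_of_being_next)
    # 1. Temperature: raise each probability to 1 / temperature, then renormalize.
    weights = [pr ** (1 / temperature) for pr in probabilities]
    total = sum(weights)
    scaled = [w / total for w in weights]
    # 2. Top-k: keep the k most likely words (input is sorted) and renormalize.
    kept = list(zip(words, scaled))[:top_k]
    total = sum(pr for _, pr in kept)
    kept = [(w, pr / total) for w, pr in kept]
    # 3. Top-p: keep words until the renormalized cumulative probability reaches p.
    survivors = []
    cumulative = 0.0
    for word, pr in kept:
        survivors.append((word, pr))
        cumulative += pr
        if cumulative >= p:
            break
    # Renormalize the survivors so they form a proper distribution to sample from.
    total = sum(pr for _, pr in survivors)
    return [(w, pr / total) for w, pr in survivors]


candidates = [('apple', 0.4), ('dragonfruit', 0.2), ('marita', 0.1)]
for top_k in [2, 3]:
    survivors = candidate_set(candidates, top_k=top_k, p=0.6, temperature=1)
    print(top_k, [word for word, _ in survivors])
# 2 ['apple']
# 3 ['apple', 'dragonfruit']
```

Neither ordering is more "correct" for this toy; the point is that the order in which the filters run can change which words remain eligible.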

