# Text Generation Basics

This tutorial shows you how to control the text generation process and fine-tune the diversity, creativity, and quality of the generated text according to your needs. By adjusting these options and experimenting with different combinations of values, you can find the best settings for your specific use case.

In [None]:
MODEL = "../models/gemma-1.1-7b-it.Q4_K_M.gguf"
PROMPT_FILE = "../prompts/engaging-twitter-thread.txt"


## Random Number Generator (RNG) Seed

The RNG seed initializes the random number generator that drives token sampling during text generation. By setting a specific value for `--seed`, you can obtain consistent and reproducible results across multiple runs with the same input and settings. This is helpful for testing, debugging, or comparing the effects of different options on the generated text to see where the outputs diverge. If the seed is set to a value less than 0, a random seed is used, which produces different output on each run. The default value is -1, which selects a random seed.
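
The effect of a fixed seed can be illustrated outside of `llama-cli` with a minimal Python sketch. The function `sample_tokens` and the 10-token vocabulary are hypothetical and only stand in for the sampler; the point is that the same seed reproduces the same sequence of draws.

```python
import random

def sample_tokens(seed, n=5):
    """Draw n token ids from a toy 10-token vocabulary with a seeded RNG."""
    rng = random.Random(seed)  # a fixed seed fixes the whole sequence of draws
    return [rng.randrange(10) for _ in range(n)]

# Same seed, same draws: this is what makes a run reproducible.
print(sample_tokens(42) == sample_tokens(42))

# A different seed starts an independent sequence of draws.
print(sample_tokens(42), sample_tokens(7))
```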

### Random `--seed` example

In [None]:
%%bash -s "$MODEL"

llama-cli \
    --model "$1" \
    --prompt "Why is the sky blue?" \
    --seed -1


### Fixed `--seed` example

In [None]:
%%bash -s "$MODEL"

llama-cli \
    --model "$1" \
    --prompt "Why is the sky blue?" \
    --seed 42


## Number of Tokens to Predict

The `--predict N` option (default: -1) controls the number of tokens the model generates in response to the input prompt. By adjusting this value, you can influence the length of the generated text. A higher value will result in longer text, while a lower value will produce shorter text.

Even though all models have a finite context window, a value of -1 will enable *infinite* text generation. How? When the context window is full, half of the tokens that follow the first `--keep N` tokens are discarded. The context must then be re-evaluated before generation can resume. On large models and/or large context windows, this can result in a significant pause in output. If the output delay is undesirable, a value of -2 will stop generation immediately when the context is filled.
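
The discard step described above can be sketched on a plain Python list. This is an illustration only: `shift_context` is a hypothetical helper, and it assumes that the oldest half of the unprotected tokens is the half that gets dropped.

```python
def shift_context(tokens, n_keep):
    """Drop the first half of the tokens that follow the n_keep protected ones."""
    n_left = len(tokens) - n_keep      # tokens that are allowed to be discarded
    n_discard = n_left // 2            # half of them are dropped
    return tokens[:n_keep] + tokens[n_keep + n_discard:]

context = list(range(16))              # a full, toy 16-token context
shifted = shift_context(context, n_keep=4)
print(shifted)  # [0, 1, 2, 3, 10, 11, 12, 13, 14, 15]
```

After the shift, 6 of the 16 slots are free again, so generation can continue until the window fills up once more.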

It is important to note that the generated text may be shorter than the specified number of tokens if an End-of-Sequence (EOS) token or a reverse prompt is encountered. In interactive mode, text generation will pause and control will be returned to the user. In non-interactive mode, the program will end. In both cases, the text generation may stop before reaching the specified `--predict` value. If you want the model to keep going without ever producing End-of-Sequence on its own, you can use the `--ignore-eos` parameter.

### Default `--predict` example

In [None]:
%%bash -s "$MODEL"

llama-cli \
    --model "$1" \
    --prompt "What is the meaning of life?" \
    --predict -1


### "until context filled" `--predict` example

In [None]:
%%bash -s "$MODEL"

llama-cli \
    --model "$1" \
    --prompt "What is the meaning of life?" \
    --ctx-size 256 \
    --predict -2


## Temperature

Temperature, `--temp N`, is a hyperparameter that controls the randomness of the generated text. It rescales the probability distribution over the model's output tokens. A higher temperature makes the output more random and creative, while a lower temperature makes the output more focused, deterministic, and conservative.
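
As a rough illustration of what temperature does to the distribution, the following sketch divides a set of made-up logits by the temperature before applying a softmax. The function name and the logits are assumptions for this example; a temperature of 0 cannot be used in the division and is treated as a special case that simply picks the most likely token.

```python
import math

def softmax_with_temperature(logits, temp):
    """Convert logits to probabilities after dividing them by temp (temp > 0)."""
    scaled = [x / temp for x in logits]
    peak = max(scaled)                        # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]                      # made-up scores for three tokens
for temp in (0.2, 0.8, 10.0):
    probs = softmax_with_temperature(logits, temp)
    print(temp, [round(p, 3) for p in probs])
```

Low temperatures concentrate almost all of the probability on the top token, while high temperatures push the distribution toward uniform.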

### Default `--temp` example

The default value is `--temp 0.8` which provides a balance between randomness and determinism.

In [None]:
%%bash -s "$MODEL" "$PROMPT_FILE"

llama-cli \
    --model "$1" \
    --file "$2" \
    --seed 42 \
    --temp 0.8


### Low `--temp` example

At the extreme, a temperature of 0 will always pick the most likely next token.

In [None]:
%%bash -s "$MODEL" "$PROMPT_FILE"

llama-cli \
    --model "$1" \
    --file "$2" \
    --seed 42 \
    --temp 0


### Exercise

Explore higher values for `--temp`. Is there a value of `--temp` which is "too high" for your model and prompt?

#### Solution

In [None]:
%%bash -s "$MODEL" "$PROMPT_FILE"

llama-cli \
    --model "$1" \
    --file "$2" \
    --seed 42 \
    --temp 10


## Repeat Penalty

The `--repeat-penalty` option helps prevent the model from generating repetitive or monotonous text. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value will be more lenient; a value below 1 (e.g., 0.9) goes past neutral and makes repeated tokens more likely. The default value is 1 (which means no penalty).

The `--repeat-last-n` option controls the number of tokens in the history to consider for penalizing repetition. A larger value will look further back in the generated text to prevent repetitions, while a smaller value will only consider recent tokens. A value of 0 disables the penalty, and a value of -1 sets the number of tokens considered equal to the context size, `--ctx-size`. The default value is 64. 
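
The interaction between the two options can be sketched as follows. This is a simplified illustration, not the exact implementation: `apply_repeat_penalty` is a hypothetical helper, the logits are made up, and a negative `last_n` is taken to mean the whole history because this toy example has no context size.

```python
def apply_repeat_penalty(logits, history, penalty, last_n):
    """Return a copy of logits with recently seen token ids penalized."""
    if last_n == 0:                    # 0 disables the penalty
        return list(logits)
    window = history if last_n < 0 else history[-last_n:]
    out = list(logits)
    for tok in set(window):
        # With penalty > 1, dividing a positive logit or multiplying a
        # negative one both make the token less likely.
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out

logits = [3.0, -1.0, 2.0, 0.5]         # made-up scores for token ids 0..3
history = [0, 1, 0]                    # token ids generated so far

print(apply_repeat_penalty(logits, history, penalty=1.5, last_n=2))  # [2.0, -1.5, 2.0, 0.5]
print(apply_repeat_penalty(logits, history, penalty=1.5, last_n=1))  # [2.0, -1.0, 2.0, 0.5]
```

With `last_n=1` only the most recent token (id 0) is penalized; widening the window to 2 also catches token id 1.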


### Example

In [None]:
%%bash -s "$MODEL" "$PROMPT_FILE"

llama-cli \
    --model "$1" \
    --file "$2" \
    --repeat-penalty 1.5 \
    --repeat-last-n 128


## Top-K Sampling

Top-k sampling is a text generation method that selects the next token only from the `--top-k` most likely tokens predicted by the model. It helps reduce the risk of generating low-probability or nonsensical tokens, but it may also limit the diversity of the output. A higher value for top-k (e.g., 100) will consider more tokens and lead to more diverse text, while a lower value (e.g., 10) will focus on the most probable tokens and generate more conservative text. The default value is 40.
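
A minimal sketch of the filtering step, using a made-up four-token distribution. The helper `top_k_filter` is hypothetical and works on probabilities for readability; it keeps the `k` most likely tokens and renormalizes them so they sum to 1 again.

```python
def top_k_filter(probs, k):
    """Keep the k most likely tokens and renormalize their probabilities."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in ranked)
    return {tok: p / total for tok, p in ranked}

# Made-up next-token probabilities for illustration.
probs = {"blue": 0.5, "clear": 0.3, "grey": 0.15, "purple": 0.05}
print(top_k_filter(probs, 2))
```

With `k=2` only "blue" and "clear" remain as candidates, so the two unlikely tokens can never be sampled.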


### Default `--top-k` example

In [None]:
%%bash -s "$MODEL" "$PROMPT_FILE"

llama-cli \
    --model "$1" \
    --file "$2" \
    --top-k 40


### Low `--top-k` example

In [None]:
%%bash -s "$MODEL" "$PROMPT_FILE"

llama-cli \
    --model "$1" \
    --file "$2" \
    --top-k 10


### High `--top-k` example

In [None]:
%%bash -s "$MODEL" "$PROMPT_FILE"

llama-cli \
    --model "$1" \
    --file "$2" \
    --top-k 100


## Top-P Sampling

Top-p sampling, `--top-p`, also known as nucleus sampling, is another text generation method that selects the next token from the smallest set of most likely tokens whose cumulative probability is at least p. This method provides a balance between diversity and quality because the number of candidate tokens adapts to the shape of the distribution. A higher value for top-p (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. The default value is 0.9.
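
The selection rule can be sketched on a made-up distribution. The helper `top_p_filter` is hypothetical; it walks the tokens from most to least likely and stops as soon as the accumulated probability reaches `p`.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= p:            # the nucleus is complete
            break
    total = sum(prob for _, prob in kept)
    return {tok: prob / total for tok, prob in kept}

# Made-up next-token probabilities for illustration.
probs = {"blue": 0.5, "clear": 0.3, "grey": 0.15, "purple": 0.05}
print(sorted(top_p_filter(probs, 0.5)))   # ['blue']
print(sorted(top_p_filter(probs, 0.9)))   # ['blue', 'clear', 'grey']
```

Unlike top-k, the number of surviving tokens is not fixed: a confident distribution yields a small nucleus and a flat one yields a large nucleus.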


### Default `--top-p` example

In [None]:
%%bash -s "$MODEL" "$PROMPT_FILE"

llama-cli \
    --model "$1" \
    --file "$2" \
    --top-p 0.9


### Low `--top-p` example

In [None]:
%%bash -s "$MODEL" "$PROMPT_FILE"

llama-cli \
    --model "$1" \
    --file "$2" \
    --top-p 0.5


### High `--top-p` example

In [None]:
%%bash -s "$MODEL" "$PROMPT_FILE"

llama-cli \
    --model "$1" \
    --file "$2" \
    --top-p 0.99


## Min-P Sampling

The `--min-p` sampling method sets a minimum base probability threshold for token selection and aims to ensure a balance of quality and variety in the generated text. The `--min-p` method was designed as an alternative to `--top-p`. The parameter $p$ represents the minimum probability for a token to be considered, relative to the probability of the most likely token. For example, with $p=0.05$ and the most likely token having a probability of 0.9, tokens with a probability less than $0.05 \times 0.9 = 0.045$ are filtered out. The default value is 0.1.
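
The worked example above can be reproduced with a short sketch. The helper `min_p_filter` and the token probabilities are made up for illustration; the threshold is the `min_p` value scaled by the probability of the most likely token.

```python
def min_p_filter(probs, min_p):
    """Drop tokens whose probability is below min_p times the top probability."""
    threshold = min_p * max(probs.values())
    kept = {tok: prob for tok, prob in probs.items() if prob >= threshold}
    total = sum(kept.values())
    return {tok: prob / total for tok, prob in kept.items()}

# Made-up distribution whose most likely token has probability 0.9.
probs = {"blue": 0.9, "azure": 0.05, "grey": 0.04, "purple": 0.01}

print(sorted(min_p_filter(probs, 0.05)))  # ['azure', 'blue']
print(sorted(min_p_filter(probs, 0.1)))   # ['blue']
```

Because the threshold scales with the top probability, the filter is strict when the model is confident and permissive when the distribution is flat.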


### Default `--min-p` example

In [None]:
%%bash -s "$MODEL" "$PROMPT_FILE"

llama-cli \
    --model "$1" \
    --file "$2" \
    --top-p 0.9 \
    --min-p 0.1


### Low `--min-p` example

In [None]:
%%bash -s "$MODEL" "$PROMPT_FILE"

llama-cli \
    --model "$1" \
    --file "$2" \
    --top-p 0.9 \
    --min-p 0.05


### High `--min-p` example

In [None]:
%%bash -s "$MODEL" "$PROMPT_FILE"

llama-cli \
    --model "$1" \
    --file "$2" \
    --top-p 0.9 \
    --min-p 0.2


## Locally Typical Sampling

Locally typical sampling, `--typical`, promotes the generation of contextually coherent and diverse text by sampling tokens that are typical or expected based on the surrounding context. By setting the parameter $p$ between 0 and 1, you can control the balance between producing text that is locally coherent and diverse.
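
One common formulation of locally typical sampling ranks tokens by how close their surprise ($-\log p$) is to the entropy of the whole distribution, then keeps the closest tokens until their cumulative probability reaches $p$. The sketch below follows that formulation; `typical_filter` is a hypothetical helper, the probabilities are made up, and the actual implementation may differ in detail.

```python
import math

def typical_filter(probs, p):
    """Keep tokens whose surprise is closest to the entropy until mass p is reached."""
    entropy = -sum(q * math.log(q) for q in probs.values())
    # Rank tokens by the distance between their surprise and the entropy.
    ranked = sorted(probs.items(), key=lambda kv: abs(-math.log(kv[1]) - entropy))
    kept, cumulative = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {tok: prob / total for tok, prob in kept}

# Made-up next-token probabilities (all strictly positive).
probs = {"blue": 0.5, "clear": 0.3, "grey": 0.15, "purple": 0.05}

print(sorted(typical_filter(probs, 0.2)))  # ['clear']
print(sorted(typical_filter(probs, 0.5)))  # ['blue', 'clear']
print(len(typical_filter(probs, 1.0)))     # 4
```

In this toy distribution the most typical token is "clear" rather than the most likely token "blue", which is what distinguishes this method from top-p. A value of 1 keeps every token, which matches the option being disabled.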

### Default `--typical` example

The default value of 1 disables locally typical sampling.


In [None]:
%%bash -s "$MODEL" "$PROMPT_FILE"

llama-cli \
    --model "$1" \
    --file "$2" \
    --typical 1
    

### Typical `--typical` example

A value closer to 1 will promote more contextually coherent tokens.

In [None]:
%%bash -s "$MODEL" "$PROMPT_FILE"

llama-cli \
    --model "$1" \
    --file "$2" \
    --typical 0.9
    

### Low `--typical` example

A value closer to 0 will promote more diverse tokens. 

In [None]:
%%bash -s "$MODEL" "$PROMPT_FILE"

llama-cli \
    --model "$1" \
    --file "$2" \
    --typical 0.05
    

## Mirostat Sampling

Mirostat is an algorithm that actively maintains the quality of generated text within a desired range during text generation. It aims to strike a balance between coherence and diversity, avoiding low-quality output caused by excessive repetition (boredom traps) or incoherence (confusion traps). To enable Mirostat sampling, set `--mirostat` to 1 (Mirostat 1.0) or 2 (Mirostat 2.0). By default, Mirostat sampling is disabled (`--mirostat 0`).

The `--mirostat-lr` option sets the Mirostat learning rate (eta). The learning rate influences how quickly the algorithm responds to feedback from the generated text. A lower learning rate will result in slower adjustments, while a higher learning rate will make the algorithm more responsive. The default value is `0.1`.

The `--mirostat-ent` option sets the Mirostat target entropy (tau), which represents the desired average surprise of the generated tokens and therefore the target perplexity of the text. Adjusting the target entropy allows you to control the balance between coherence and diversity in the generated text. A lower value will result in more focused and coherent text, while a higher value will lead to more diverse and potentially less coherent text. The default value is `5.0`.
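
The feedback loop behind Mirostat 2.0 can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the actual implementation: `mirostat_v2_step` is a hypothetical helper, the distribution is made up and held constant (a real model produces a new one at every step), and the surprise cap `mu` is assumed to start at twice the target entropy.

```python
import math
import random

def mirostat_v2_step(probs, mu, tau, eta, rng):
    """Sample one token and return it with the updated surprise cap mu."""
    # Drop tokens whose surprise (-log2 p) exceeds the current cap mu.
    allowed = {t: p for t, p in probs.items() if -math.log2(p) <= mu}
    if not allowed:                        # always keep at least the top token
        top = max(probs, key=probs.get)
        allowed = {top: probs[top]}
    tokens = list(allowed)
    token = rng.choices(tokens, weights=[allowed[t] for t in tokens])[0]
    # Surprise of the chosen token under the renormalized distribution.
    surprise = -math.log2(allowed[token] / sum(allowed.values()))
    mu -= eta * (surprise - tau)           # nudge mu toward the target tau
    return token, mu

rng = random.Random(42)
tau, eta = 3.0, 0.05                       # target entropy and learning rate
mu = 2 * tau                               # assumed starting value for the cap
probs = {"blue": 0.5, "clear": 0.3, "grey": 0.15, "purple": 0.05}

for _ in range(5):
    token, mu = mirostat_v2_step(probs, mu, tau, eta, rng)
    print(token, round(mu, 3))
```

When the sampled tokens are less surprising than `tau`, the cap `mu` rises and admits more candidates; when they are more surprising, it falls and tightens the filter. The learning rate `eta` sets the size of each correction.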

### Example

In [None]:
%%bash -s "$MODEL" "$PROMPT_FILE"

llama-cli \
    --model "$1" \
    --file "$2" \
    --mirostat 2 \
    --mirostat-lr 0.05 \
    --mirostat-ent 3.0
