In [101]:
import json
import requests
from IPython.display import JSON, Markdown


# Completion API Endpoints

## Start `llama-server`

Open a new terminal and run the following command to start a new server.

```bash
MODEL="./models/gemma-1.1-7b-it.Q4_K_M.gguf"
llama-server \
    --model "$MODEL" \
    --host localhost \
    --port 8080
```

## Basic usage


    `prompt`: Provide the prompt for this completion as a string or as an array of strings or numbers representing tokens. Internally, if `cache_prompt` is `true`, the prompt is compared to the previous completion and only the "unseen" suffix is evaluated. A `BOS` token is inserted at the start, if all of the following conditions are true:

      - The prompt is a string or an array with the first element given as a string
      - The model's `tokenizer.ggml.add_bos_token` metadata is `true`

    `cache_prompt`: Re-use KV cache from a previous request if possible. This way the common prefix does not have to be re-processed, only the suffix that differs between the requests. Because (depending on the backend) the logits are **not** guaranteed to be bit-for-bit identical for different batch sizes (prompt processing vs. token generation) enabling this option can cause nondeterministic results. Default: `false`

    `stream`: Allows receiving each predicted token in real time instead of waiting for the completion to finish. To enable this, set it to `true`.

    `stop`: Specify a JSON array of stopping strings.
    These words will not be included in the completion, so make sure to add them to the prompt for the next iteration. Default: `[]`
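When `stream` is `true`, the response has to be read piece by piece rather than parsed as a single JSON document. The sketch below assumes the server sends server-sent-event style lines of the form `data: {...}`, each carrying a `content` fragment and a `stop` flag. The helper names `parse_stream_line` and `collect_content` and the sample lines are hypothetical and only illustrate the parsing step.

```python
import json


def parse_stream_line(line: bytes):
    """Decode one streamed line; return None for blank or non-data lines."""
    text = line.decode("utf-8").strip()
    if not text.startswith("data: "):
        return None
    return json.loads(text[len("data: "):])


def collect_content(lines):
    """Concatenate the `content` fragments until a chunk reports `stop`."""
    parts = []
    for line in lines:
        chunk = parse_stream_line(line)
        if chunk is None:
            continue
        parts.append(chunk.get("content", ""))
        if chunk.get("stop"):
            break
    return "".join(parts)


# Hypothetical sample of streamed lines (assumed format, not captured output).
sample_lines = [
    b'data: {"content": "The", "stop": false}',
    b"",
    b'data: {"content": " sky", "stop": false}',
    b'data: {"content": "", "stop": true}',
]
print(collect_content(sample_lines))
```

Against a running server, the same helper could be fed from `requests.post(url, json={..., "stream": True}, stream=True).iter_lines()`.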

In [110]:
response = requests.post(
    url="http://localhost:8080/completion",
    json={
        "prompt": "Why is the sky blue?",
        "cache_prompt": False,
        "stream": False,
        "stop": [],
    }
)

In [113]:
print(response)

<Response [200]>


In [116]:
json_data = json.loads(response.content)
JSON(json_data)

<IPython.core.display.JSON object>

In [None]:
_generated_text = json_data["content"]
Markdown(_generated_text)

## Options

### Random Number Generator (RNG) Seed

The RNG seed initializes the random number generator that drives token sampling. Setting `seed` to a specific value gives consistent, reproducible results across runs with the same input and settings, which helps when testing, debugging, or comparing how different options change the generated text. If `seed` is less than 0, a random seed is used and each run produces different output. The default is `-1`.
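The effect of a fixed seed can be illustrated without a server. The sketch below uses Python's `random` module as a stand-in for the sampler; the vocabulary, the weights, and the helper name `sample_tokens` are made up for illustration and say nothing about how the server implements sampling.

```python
import random


def sample_tokens(seed, n=5):
    """Draw n tokens from a toy distribution using a seeded generator."""
    rng = random.Random(seed)
    vocab = ["blue", "sky", "light", "air", "sun"]
    weights = [0.4, 0.3, 0.15, 0.1, 0.05]
    return rng.choices(vocab, weights=weights, k=n)


run_a = sample_tokens(seed=42)
run_b = sample_tokens(seed=42)
print(run_a == run_b)  # the same seed reproduces the same draws
```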

In [128]:
response = requests.post(
    url="http://localhost:8080/completion",
    json={
        "prompt": "Why is the sky blue?",
        "cache_prompt": False,
        "stream": False,
        "stop": [],
        "seed": 42
    }
)

In [129]:
_json_data = json.loads(response.content)
_generated_text = _json_data["content"]
Markdown(_generated_text)



**Answer:**

The sky is blue due to a phenomenon called **Rayleigh scattering**. 

* Sunlight is composed of all the colors of the rainbow, each with a specific wavelength. 
* When sunlight interacts with molecules in the atmosphere, such as nitrogen and oxygen, the different wavelengths are scattered in different directions. 
* Shorter wavelengths of light, like blue light, are scattered more efficiently than longer wavelengths. 
* Since our eyes are most sensitive to blue light, we perceive the sky as blue.

### Number of Tokens to Predict

The `n_predict` option (default: `-1`) controls the number of tokens the model generates in response to the input prompt. By adjusting this value, you can influence the length of the generated text. A higher value will result in longer text, while a lower value will produce shorter text.

Even though all models have a finite context window, a value of `-1` enables *infinite* text generation. How? When the context window is full, half of the tokens after the first `n_keep` are discarded. The context must then be re-evaluated before generation can resume. On large models and/or large context windows, this can result in a significant pause in output. If the output delay is undesirable, a value of `-2` stops generation immediately when the context is filled.

It is important to note that the generated text may be shorter than the specified number of tokens if an End-of-Sequence (EOS) token or a reverse prompt is encountered. In interactive mode, text generation will pause and control will be returned to the user. In non-interactive mode, the program will end. In both cases, the text generation may stop before reaching the specified `n_predict` value. If you want the model to keep going without ever producing End-of-Sequence on its own, you can use the `ignore_eos` parameter.


    `n_keep`: Specify the number of tokens from the prompt to retain when the context size is exceeded and tokens need to be discarded. The number excludes the BOS token.
    By default, this value is set to `0`, meaning no tokens are kept. Use `-1` to retain all tokens from the prompt.

    `min_keep`: If greater than 0, force samplers to return N possible tokens at minimum. Default: `0`

    `t_max_predict_ms`: Set a time limit in milliseconds for the prediction (a.k.a. text-generation) phase. The timeout will trigger if the generation takes more than the specified time (measured since the first token was generated) and if a new-line character has already been generated. Useful for FIM applications. Default: `0`, which is disabled.
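The context shift described above (keep the first `n_keep` tokens, drop half of what follows) can be sketched on a plain list. This is a simplified illustration: the function name `shift_context` is hypothetical, BOS handling is ignored, and the real server operates on its KV cache rather than on a Python list.

```python
def shift_context(tokens, n_keep):
    """Keep the first n_keep tokens and discard the older half of the rest."""
    kept = tokens[:n_keep]
    rest = tokens[n_keep:]
    n_discard = len(rest) // 2
    return kept + rest[n_discard:]


tokens = list(range(10))                 # a "full" context of 10 token ids
print(shift_context(tokens, n_keep=2))   # [0, 1, 6, 7, 8, 9]
```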

In [130]:
response = requests.post(
    url="http://localhost:8080/completion",
    json={
        "prompt": "Why is the sky blue?",
        "cache_prompt": False,
        "stream": False,
        "stop": [],
        "seed": 42,
        "n_predict": -1,
        "n_keep": 0,
        "min_keep": 0,
        "ignore_eos": False
    }
)

In [131]:
_json_data = json.loads(response.content)
_generated_text = _json_data["content"]
Markdown(_generated_text)



**Answer:**

The sky is blue due to a phenomenon called **Rayleigh scattering**. 

* Sunlight is composed of all the colors of the rainbow, each with a specific wavelength. 
* When sunlight interacts with molecules in the atmosphere, such as nitrogen and oxygen, the different wavelengths are scattered in different directions. 
* Shorter wavelengths of light, like blue light, are scattered more efficiently than longer wavelengths. 
* Since our eyes are most sensitive to blue light, we perceive the sky as blue.

### Temperature

The `temperature` hyperparameter controls the randomness of the generated text. It affects the probability distribution of the model's output tokens. A higher temperature makes the output more random and creative, while a lower temperature makes the output more focused, deterministic, and conservative.

    `dynatemp_range`: Dynamic temperature range. The final temperature will be in the range of `[temperature - dynatemp_range; temperature + dynatemp_range]`. Default: `0.0`, which is disabled.

    `dynatemp_exponent`: Dynamic temperature exponent. Default: `1.0`
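To make the effect of `temperature` concrete, the sketch below applies the standard temperature-scaled softmax to a toy set of logits. The logit values and the function name are illustrative, and the dynamic temperature options are not modelled here.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]
for t in (0.5, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the highest logit, while higher temperatures flatten the distribution.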

In [132]:
response = requests.post(
    url="http://localhost:8080/completion",
    json={
        "prompt": "Why is the sky blue?",
        "cache_prompt": False,
        "stream": False,
        "stop": [],
        "seed": 42,
        "n_predict": -1,
        "n_keep": 0,
        "min_keep": 0,
        "ignore_eos": False,
        "temperature": 0.8,
        "dynatemp_range": 0.1,
        "dynatemp_exponent": 1.0
    }
)

In [133]:
_json_data = json.loads(response.content)
_generated_text = _json_data["content"]

Markdown(_generated_text)



**Answer:**

The sky is blue due to a phenomenon called **Rayleigh scattering**. 

* Sunlight is composed of all the colors of the rainbow, each with a specific wavelength. 
* When sunlight interacts with molecules in the atmosphere, such as nitrogen and oxygen, the different wavelengths are scattered in different directions. 
* Shorter wavelengths of light, such as blue light, are scattered more efficiently than longer wavelengths. 
* As a result, more blue light is scattered in all directions, reaching our eyes and making the sky appear blue.

### Repeat Penalty

The `repeat_penalty` option helps prevent the model from generating repetitive or monotonous text. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. The default value is 1.1.

The `repeat_last_n` option controls the number of tokens in the history to consider for penalizing repetition. A larger value will look further back in the generated text to prevent repetitions, while a smaller value will only consider recent tokens. A value of 0 disables the penalty, and a value of -1 sets the number of tokens considered equal to the context size, `ctx_size`. The default value is 64. 

    `penalize_nl`: Penalize newline tokens when applying the repeat penalty. Default: `true`
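One common way to implement a repeat penalty is to shrink the logits of tokens that appeared in the last `repeat_last_n` positions: positive logits are divided by the penalty and negative logits are multiplied by it. The sketch below follows that scheme under the assumption that logits are given as a token-to-value dictionary; the function name is hypothetical and `penalize_nl` is not modelled.

```python
def apply_repeat_penalty(logits, history, repeat_penalty=1.1, repeat_last_n=64):
    """Return a copy of logits with recently seen tokens made less likely."""
    penalized = dict(logits)
    if repeat_last_n == 0:
        return penalized  # a window of 0 disables the penalty
    # A negative window means "look at the whole history".
    window = history if repeat_last_n < 0 else history[-repeat_last_n:]
    for token in set(window):
        if token not in penalized:
            continue
        value = penalized[token]
        if value > 0:
            penalized[token] = value / repeat_penalty
        else:
            penalized[token] = value * repeat_penalty
    return penalized


logits = {"sky": 2.0, "blue": -1.0, "light": 0.5}
print(apply_repeat_penalty(logits, history=["sky", "blue"], repeat_penalty=2.0))
```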


In [134]:
response = requests.post(
    url="http://localhost:8080/completion",
    json={
        "prompt": "Why is the sky blue?",
        "cache_prompt": False,
        "stream": False,
        "stop": [],
        "seed": 42,
        "n_predict": -1,
        "n_keep": 0,
        "min_keep": 0,
        "ignore_eos": False,
        "temperature": 0.8,
        "dynatemp_range": 0.1,
        "dynatemp_exponent": 1.0,
        "repeat_penalty": 1.1,
        "repeat_last_n": 64,
        "penalize_nl": True
    }
)

In [135]:
_json_data = json.loads(response.content)
_generated_text = _json_data["content"]

Markdown(_generated_text)



**Answer:** 
The sky is blue due to the process of **Rayleigh scattering**. Sunlight consists mainly  of all wavelengths in a range from violet through red. When sunlight interacts with molecules like nitrogen and oxygen, shorter wavelength light (violet &blue) gets scattered more efficiently than longer wave length lights(orange&red).

* Shorter waves have higher frequency/energy per unit area which results into greater scattering power against the particles of air .
**The blue wavelengths are dispersed in all directions by these gases.** 


- Most  of this diffused light reaches our eyes from a relatively small portion (about one degree) directly overhead. That's why we see mainly skyblue during clear weather when there isn’t much dust or cloud to absorb the scattered sunlight .

**Factors influencing colour of Sky:**
* Time & Latitude - different time zones experience differently coloured skies due variations in sun elevation and composition  of atmosphere 


- Cloud Coverage/ Dust particles – clouds block direct contact between Sunlight& air molecules thereby limiting scattering. Similarly, large dust particle can scatter all wavelengths equally leading to a less blue sky .

### Top-K Sampling

Top-k sampling is a text generation method that selects the next token only from the `top_k` most likely tokens predicted by the model. It helps reduce the risk of generating low-probability or nonsensical tokens, but it may also limit the diversity of the output. A higher value for top-k (e.g., 100) will consider more tokens and lead to more diverse text, while a lower value (e.g., 10) will focus on the most probable tokens and generate more conservative text. The default value is 40.

    `top_k`: Limit the next token selection to the K most probable tokens.  Default: `40`
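A minimal sketch of top-k filtering on a toy distribution is shown here. The probabilities and the function name `top_k_filter` are illustrative; the surviving probabilities are renormalized so they sum to 1.

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize their probabilities."""
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    return {tok: p / total for tok, p in top}


probs = {"blue": 0.5, "red": 0.3, "green": 0.15, "pink": 0.05}
print(top_k_filter(probs, k=2))
```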


In [136]:
response = requests.post(
    url="http://localhost:8080/completion",
    json={
        "prompt": "Why is the sky blue?",
        "cache_prompt": False,
        "stream": False,
        "stop": [],
        "seed": 42,
        "n_predict": -1,
        "n_keep": 0,
        "min_keep": 0,
        "ignore_eos": False,
        "temperature": 0.8,
        "dynatemp_range": 0.1,
        "dynatemp_exponent": 1.0,
        "repeat_penalty": 1.1,
        "repeat_last_n": 64,
        "penalize_nl": True,
        "top_k": 40,
    }
)

In [137]:
json_data = json.loads(response.content)
_generated_text = json_data["content"]

Markdown(_generated_text)



**Answer:** 
The sky is blue due to the process of **Rayleigh scattering**. Sunlight consists mainly  of all wavelengths in a range from violet through red. When sunlight interacts with molecules like nitrogen and oxygen, shorter wavelength light (violet &blue) gets scattered more efficiently than longer wave length lights(orange&red).

* Shorter waves have higher frequency/energy per unit area which results into greater scattering power against the particles of air .
**The blue wavelengths are dispersed in all directions by these gases.** 


- Most  of this diffused light reaches our eyes from a relatively small portion (about one degree) directly overhead. That's why we see mainly skyblue during clear weather when there isn’t much dust or cloud to absorb the scattered sunlight .

**Factors influencing colour of Sky:**
* Time & Latitude - different time zones experience differently coloured skies due variations in sun elevation and composition  of atmosphere 


- Cloud Coverage/ Dust particles – clouds block direct contact between Sunlight& air molecules thereby limiting scattering. Similarly, large dust particle can scatter all wavelengths equally leading to a less blue sky .

### Top-P Sampling

Top-p sampling (`top_p`), also known as nucleus sampling, is another text generation method that selects the next token from the smallest set of most likely tokens whose cumulative probability is at least p. This method provides a balance between diversity and quality by considering both the probabilities of tokens and the number of tokens to sample from. A higher value for top-p (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. The default value is 0.95.

    `top_p`: Limit the next token selection to a subset of tokens with a cumulative probability above a threshold P. Default: `0.95`
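The nucleus selection can be sketched as a walk down the tokens in order of probability that stops once the running total reaches p. The distribution and the function name `top_p_filter` are illustrative.

```python
def top_p_filter(probs, p):
    """Keep the most probable tokens until their cumulative mass reaches p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {tok: prob / total for tok, prob in kept}


probs = {"blue": 0.5, "red": 0.3, "green": 0.15, "pink": 0.05}
print(sorted(top_p_filter(probs, p=0.7)))  # ['blue', 'red']
print(sorted(top_p_filter(probs, p=0.9)))  # ['blue', 'green', 'red']
```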

In [138]:
response = requests.post(
    url="http://localhost:8080/completion",
    json={
        "prompt": "Why is the sky blue?",
        "cache_prompt": False,
        "stream": False,
        "stop": [],
        "seed": 42,
        "n_predict": -1,
        "n_keep": 0,
        "min_keep": 0,
        "ignore_eos": False,
        "temperature": 0.8,
        "dynatemp_range": 0.1,
        "dynatemp_exponent": 1.0,
        "repeat_penalty": 1.1,
        "repeat_last_n": 64,
        "penalize_nl": True,
        "top_k": 40,
        "top_p": 0.95,
    }
)

In [139]:
json_data = json.loads(response.content)
_generated_text = json_data["content"]

Markdown(_generated_text)



**Answer:** 
The sky is blue due to the process of **Rayleigh scattering**. Sunlight consists mainly  of all wavelengths in a range from violet through red. When sunlight interacts with molecules like nitrogen and oxygen, shorter wavelength light (violet &blue) gets scattered more efficiently than longer wave length lights(orange&red).

* Shorter waves have higher frequency/energy per unit area which results into greater scattering power against the particles of air .
**The blue wavelengths are dispersed in all directions by these gases.** 


- Most  of this diffused light reaches our eyes from a relatively small portion (about one degree) directly overhead. That's why we see mainly skyblue during clear weather when there isn’t much dust or cloud to absorb the scattered sunlight .

**Factors influencing colour of Sky:**
* Time & Latitude - different time zones experience differently coloured skies due variations in sun elevation and composition  of atmosphere 


- Cloud Coverage/ Dust particles – clouds block direct contact between Sunlight& air molecules thereby limiting scattering. Similarly, large dust particle can scatter all wavelengths equally leading to a less blue sky .

### Min-P Sampling

The `min_p` sampling method sets a minimum probability threshold for token selection and aims to balance quality and variety in the generated text. It was designed as an alternative to `top_p`. The parameter $p$ represents the minimum probability for a token to be considered, relative to the probability of the most likely token. For example, with $p=0.05$ and the most likely token having a probability of 0.9, tokens with a probability below $0.05 \times 0.9 = 0.045$ are filtered out. The default value is 0.05.

    `min_p`: The minimum probability for a token to be considered, relative to the probability of the most likely token. Default: `0.05`
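The worked example above (most likely token at 0.9, relative threshold 0.05) can be reproduced with a few lines. The token names and the function name `min_p_filter` are illustrative.

```python
def min_p_filter(probs, min_p):
    """Drop tokens whose probability is below min_p times the top probability."""
    threshold = min_p * max(probs.values())
    kept = {tok: prob for tok, prob in probs.items() if prob >= threshold}
    total = sum(kept.values())
    return {tok: prob / total for tok, prob in kept.items()}


probs = {"blue": 0.9, "azure": 0.06, "red": 0.03, "green": 0.01}
# Threshold is about 0.045, so only "blue" and "azure" survive.
print(sorted(min_p_filter(probs, min_p=0.05)))  # ['azure', 'blue']
```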
    

In [140]:
response = requests.post(
    url="http://localhost:8080/completion",
    json={
        "prompt": "Why is the sky blue?",
        "cache_prompt": False,
        "stream": False,
        "stop": [],
        "seed": 42,
        "n_predict": -1,
        "n_keep": 0,
        "min_keep": 0,
        "ignore_eos": False,
        "temperature": 0.8,
        "dynatemp_range": 0.1,
        "dynatemp_exponent": 1.0,
        "repeat_penalty": 1.1,
        "repeat_last_n": 64,
        "penalize_nl": True,
        "top_k": 40,
        "top_p": 0.95,
        "min_p": 0.05,
    }
)

In [141]:
json_data = json.loads(response.content)
_generated_text = json_data["content"]

Markdown(_generated_text)



**Answer:** 
The sky is blue due to the process of **Rayleigh scattering**. Sunlight consists mainly  of all wavelengths in a range from violet through red. When sunlight interacts with molecules like nitrogen and oxygen, shorter wavelength light (violet &blue) gets scattered more efficiently than longer wave length lights(orange&red).

* Shorter waves have higher frequency/energy per unit area which results into greater scattering power against the particles of air .
**The blue wavelengths are dispersed in all directions by these gases.** 


- Most  of this diffused light reaches our eyes from a relatively small portion (about one degree) directly overhead. That's why we see mainly skyblue during clear weather when there isn’t much dust or cloud to absorb the scattered sunlight .

**Factors influencing colour of Sky:**
* Time & Latitude - different time zones experience differently coloured skies due variations in sun elevation and composition  of atmosphere 


- Cloud Coverage/ Dust particles – clouds block direct contact between Sunlight& air molecules thereby limiting scattering. Similarly, large dust particle can scatter all wavelengths equally leading to a less blue sky .

### Locally Typical Sampling

Locally typical sampling, `typical_p`, promotes the generation of contextually coherent and diverse text by sampling tokens that are typical or expected based on the surrounding context. By setting the parameter $p$ between 0 and 1, you can control the balance between producing text that is locally coherent and diverse. The default setting is $p=1.0$, which disables locally typical sampling.
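As a rough sketch of the idea, locally typical sampling ranks tokens by how close their surprisal ($-\log p$) is to the entropy of the whole distribution, then keeps the closest tokens until their cumulative probability reaches `typical_p`. The code below is a simplified illustration with a made-up distribution and a hypothetical function name, not the server's implementation.

```python
import math


def typical_filter(probs, typical_p):
    """Keep tokens whose surprisal is closest to the distribution's entropy."""
    entropy = -sum(p * math.log(p) for p in probs.values() if p > 0)
    ranked = sorted(
        ((tok, p) for tok, p in probs.items() if p > 0),
        key=lambda kv: abs(-math.log(kv[1]) - entropy),
    )
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= typical_p:
            break
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}


probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
print(typical_filter(probs, typical_p=0.2))  # {'b': 1.0}
```

In this toy case the most probable token `a` is not the most typical one: the surprisal of `b` is closer to the entropy, so a small `typical_p` keeps `b` first.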


In [142]:
response = requests.post(
    url="http://localhost:8080/completion",
    json={
        "prompt": "Why is the sky blue?",
        "cache_prompt": False,
        "stream": False,
        "stop": [],
        "seed": 42,
        "n_predict": -1,
        "n_keep": 0,
        "min_keep": 0,
        "ignore_eos": False,
        "temperature": 0.8,
        "dynatemp_range": 0.0,
        "dynatemp_exponent": 1.0,
        "repeat_penalty": 1.1,
        "repeat_last_n": 64,
        "penalize_nl": True,
        "top_k": 40,
        "top_p": 0.95,
        "min_p": 0.05,
        "typical_p": 1.0,
    }
)

In [143]:
json_data = json.loads(response.content)
_generated_text = json_data["content"]

Markdown(_generated_text)



**Answer:** 
The sky is blue due to the process of **Rayleigh scattering**. Sunlight consists mainly  of all wavelengths or colors. When sunlight interacts with molecules in Earth's atmosphere, like nitrogen and oxygen atoms these particles scatter light rays randomly without altering their direction significantly . But different color lights are scattered differently based on wavelength:

- Shorter Wavelength (blue/violet) - gets dispersed more efficiently
_LongerWavelength_(red / infra red)_ is less effectively diffused. 


**The blue wavelengths of sunlight:**  travel in straight lines until they reach our eyes, giving us the impression that sky appears **BLUE**.

### Mirostat Sampling

Mirostat is an algorithm that actively maintains the quality of generated text within a desired range during text generation. It aims to strike a balance between coherence and diversity, avoiding low-quality output caused by excessive repetition (boredom traps) or incoherence (confusion traps). To enable Mirostat sampling, set `mirostat` to `1` (Mirostat 1.0) or `2` (Mirostat 2.0). By default, `mirostat` is `0`, which disables Mirostat sampling.

The `mirostat_eta` option sets the Mirostat learning rate (eta). The learning rate influences how quickly the algorithm responds to feedback from the generated text. A lower learning rate will result in slower adjustments, while a higher learning rate will make the algorithm more responsive. The default value is `0.1`.

The `mirostat_tau` option sets the Mirostat target entropy (tau), which represents the desired perplexity value for the generated text. Adjusting the target entropy allows you to control the balance between coherence and diversity in the generated text. A lower value will result in more focused and coherent text, while a higher value will lead to more diverse and potentially less coherent text. The default value is `5.0`.
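The feedback loop behind tau and eta can be sketched with a single step in the style of Mirostat 2.0: tokens whose surprise exceeds a running limit `mu` are removed, one token is sampled, and `mu` is nudged toward the target by the learning rate. This is a simplified illustration under those assumptions; the function name and the toy distribution are hypothetical, and the server's implementation may differ in detail.

```python
import math
import random


def mirostat_v2_step(probs, mu, tau=5.0, eta=0.1, rng=None):
    """Sample one token and return it with the updated surprise limit mu."""
    rng = rng or random.Random()
    # Keep only tokens whose surprise (in bits) does not exceed mu.
    allowed = {t: p for t, p in probs.items() if p > 0 and -math.log2(p) <= mu}
    if not allowed:  # always keep at least the most likely token
        best = max(probs, key=probs.get)
        allowed = {best: probs[best]}
    total = sum(allowed.values())
    allowed = {t: p / total for t, p in allowed.items()}
    token = rng.choices(list(allowed), weights=list(allowed.values()), k=1)[0]
    observed = -math.log2(allowed[token])
    # Move mu so that the observed surprise drifts toward the target tau.
    new_mu = mu - eta * (observed - tau)
    return token, new_mu


probs = {"a": 0.5, "b": 0.25, "c": 0.25}
print(mirostat_v2_step(probs, mu=1.5, rng=random.Random(0)))  # ('a', 2.0)
```

A common choice is to start with `mu = 2 * tau` and call the step once per generated token.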


In [125]:
response = requests.post(
    url="http://localhost:8080/completion",
    json={
        "prompt": "Why is the sky blue?",
        "cache_prompt": False,
        "stream": False,
        "stop": [],
        "seed": 42,
        "n_predict": -1,
        "n_keep": 0,
        "min_keep": 0,
        "ignore_eos": False,
        "temperature": 0.8,
        "dynatemp_range": 0.0,
        "dynatemp_exponent": 1.0,
        "repeat_penalty": 1.1,
        "repeat_last_n": 64,
        "penalize_nl": True,
        "top_k": 40,
        "top_p": 0.95,
        "min_p": 0.05,
        "typical_p": 1.0,
        "mirostat": 0,
        "mirostat_tau": 5.0,
        "mirostat_eta": 0.1,
    }
)

In [127]:
json_data = json.loads(response.content)
_generated_text = json_data["content"]

Markdown(_generated_text)



**Answer:**

The sky is blue due to a phenomenon called **Rayleigh scattering**. 

* Sunlight is composed of all the colors of the rainbow, each with a specific wavelength.
* When sunlight interacts with molecules in the atmosphere, such as nitrogen and oxygen, the molecules scatter the light.
* Different wavelengths of light scatter differently. 
* Shorter wavelengths of light, such as blue light, scatter more efficiently than longer wavelengths.
* Since blue light is scattered more evenly across the sky, we see a predominance of blue light when looking up at the sky.

**Additional factors influencing the color of the sky:**

* **Time of day:** The sun is higher in the sky during the day, resulting in more direct sunlight and a brighter blue sky.
* **Altitude:** Higher altitudes have thinner atmospheres, leading to less scattering and a clearer blue sky.
* **Cloud coverage:** Clouds block the sunlight and scatter less light, resulting in a less blue sky.
* **Pollution:** Pollution in the atmosphere can scatter different wavelengths of light, affecting the color of the sky.

### Samplers

`samplers`: The order the samplers should be applied in. An array of strings representing sampler type names. If a sampler is not set, it will not be used. If a sampler is specified more than once, it will be applied multiple times. Default: `["top_k", "tfs_z", "typical_p", "top_p", "min_p", "temperature"]` - these are all the available values.
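Applying samplers "in order" amounts to passing the token distribution through a chain of filters. The sketch below models that with two toy filters looked up by name; the registry `SAMPLERS`, the fixed settings inside each filter, and the function names are hypothetical and only illustrate the ordering idea, including the fact that a name listed twice is applied twice.

```python
def _normalize(probs):
    total = sum(probs.values())
    return {tok: p / total for tok, p in probs.items()}


def top_k(probs, k=2):
    """Keep the k most probable tokens."""
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    return _normalize(dict(top))


def min_p(probs, p=0.6):
    """Keep tokens with probability at least p times the top probability."""
    threshold = p * max(probs.values())
    return _normalize({t: v for t, v in probs.items() if v >= threshold})


# Hypothetical registry mapping sampler names to filter functions.
SAMPLERS = {"top_k": top_k, "min_p": min_p}


def apply_samplers(probs, order):
    """Run the distribution through each named sampler in the given order."""
    for name in order:
        probs = SAMPLERS[name](probs)
    return probs


probs = {"blue": 0.6, "red": 0.3, "green": 0.1}
print(apply_samplers(probs, ["top_k", "min_p"]))  # {'blue': 1.0}
```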

In [144]:
response = requests.post(
    url="http://localhost:8080/completion",
    json={
        "prompt": "Why is the sky blue?",
        "cache_prompt": False,
        "stream": False,
        "stop": [],
        "seed": 42,
        "n_predict": -1,
        "n_keep": 0,
        "min_keep": 0,
        "ignore_eos": False,
        "temperature": 0.8,
        "dynatemp_range": 0.0,
        "dynatemp_exponent": 1.0,
        "repeat_penalty": 1.1,
        "repeat_last_n": 64,
        "penalize_nl": True,
        "top_k": 40,
        "top_p": 0.95,
        "min_p": 0.05,
        "typical_p": 1.0,
        "mirostat": 0,
        "mirostat_tau": 5.0,
        "mirostat_eta": 0.1,
        "samplers": [
            "top_k",
            "tfs_z",
            "typical_p",
            "top_p",
            "min_p",
            "temperature",
        ]
    }
)

In [145]:
json_data = json.loads(response.content)
_generated_text = json_data["content"]

Markdown(_generated_text)



**Answer:** 
The sky is blue due to the process of **Rayleigh scattering**. Sunlight consists mainly  of all wavelengths or colors. When sunlight interacts with molecules in Earth's atmosphere, like nitrogen and oxygen atoms these particles scatter light rays randomly without altering their direction significantly . But different color lights are scattered differently based on wavelength:

- Shorter Wavelength (blue/violet) - gets dispersed more efficiently
_LongerWavelength_(red / infra red)_ is less effectively diffused. 


**The blue wavelengths of sunlight:**  travel in straight lines until they reach our eyes, giving us the impression that sky appears **BLUE**.