# temperature

Temperature controls the randomness of the model's next-token predictions.

- A lower temperature (e.g., close to 0) results in more deterministic outputs: the model almost always picks the most probable next token under its learned distribution.
- At T = 1, the logits are divided by 1, so they are unchanged and the softmax probabilities are exactly the model's original distribution.
- A higher temperature (e.g., above 1) increases randomness, allowing more diverse and creative outputs at the risk of less coherent text.

### How Temperature Works

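The temperature value we provide does not scale the probabilities directly. Instead, the model's logits (raw scores) $z_i$ for each candidate token are divided by the temperature $T$ before the softmax is applied: $p_i = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}$.

With a higher temperature, the scaled logits move closer together, so we get a flatter distribution. With a lower temperature, the differences between logits are amplified, giving a much more peaked distribution. As the temperature approaches 0, nearly all of the probability mass lands on the single most likely token.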


![image.png](attachment:image.png)
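A minimal Python sketch of temperature-scaled softmax, assuming we already have the raw logits for a handful of candidate tokens (the logit values and the function name `softmax_with_temperature` are made up for illustration). It divides every logit by T and then applies a numerically stable softmax.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Divide each logit by the temperature, then apply softmax."""
    if temperature <= 0:
        # T = 0 would divide by zero; greedy (argmax) decoding is the usual limit case.
        raise ValueError("temperature must be > 0")
    scaled = [z / temperature for z in logits]
    # Subtracting the max does not change the result but avoids overflow in exp().
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three candidate tokens.
logits = [2.0, 1.0, 0.1]
for t in (0.1, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: {[round(p, 3) for p in probs]}")
```

Running it shows the top token's probability shrinking as T grows: close to 1.0 at T=0.1 and roughly 0.5 at T=2.0, while T=1.0 reproduces the plain softmax.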



# top p


The parameter 'p' in large language models (LLMs) is often referred to as the "top-p" sampling or nucleus sampling parameter. 

- When generating text, 'p' determines the cumulative probability threshold for selecting the next token. 

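- Instead of considering all possible tokens, the model ranks tokens by probability and keeps only the smallest set whose cumulative probability reaches or exceeds 'p'. The probabilities of the kept tokens are then renormalized, and the next token is sampled from that set.

- For example, if 'p' is set to 0.9, the model will sample from the top tokens that **together account for at least 90%** of the probability mass.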

- This approach allows for a balance between randomness and coherence, as it enables the model to explore diverse outputs while still focusing on the most likely candidates.
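A minimal sketch of the top-p filtering step in Python, assuming the next-token distribution is already available as a dictionary of token probabilities (the tokens, values, and the name `top_p_filter` are hypothetical). It ranks tokens, keeps the smallest prefix whose cumulative probability reaches p, and renormalizes.

```python
from typing import Dict


def top_p_filter(probs: Dict[str, float], p: float) -> Dict[str, float]:
    """Nucleus (top-p) filtering: keep the smallest set of most-probable tokens
    whose cumulative probability reaches p, then renormalize them to sum to 1."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept = []
    cumulative = 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}


# Hypothetical next-token distribution.
probs = {"cat": 0.5, "dog": 0.3, "fish": 0.15, "rock": 0.05}
print(top_p_filter(probs, 0.9))  # "rock" falls outside the 90% nucleus
```

In a full sampler, the filtered dictionary would then be passed to something like `random.choices(list(d), weights=list(d.values()))` to draw the actual token.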


# top k


- Top K is similar to Top P, but instead of a probability threshold it sets a fixed number of the most probable tokens to consider. For example, a Top K of 3 would instruct the LLM to only consider the three most likely tokens. A Top K of 1 would force the LLM to always pick the most likely token (greedy decoding).

- A low Top K has an effect similar to a low temperature. However, Top K is a cruder control because it ignores the relative probabilities between the options: it keeps exactly K tokens whether the distribution is sharply peaked or nearly flat. It’s also less widely supported, notably missing from OpenAI’s API.
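A minimal Python sketch of top-k filtering, assuming the same dictionary-of-probabilities representation (the name `top_k_filter` and both example distributions are hypothetical). Comparing a peaked and a nearly flat distribution illustrates why Top K is crude: it keeps exactly k tokens in both cases, even when the dropped tokens are almost as likely as the kept ones.

```python
from typing import Dict


def top_k_filter(probs: Dict[str, float], k: int) -> Dict[str, float]:
    """Keep only the k most probable tokens and renormalize them to sum to 1."""
    if k < 1:
        raise ValueError("k must be >= 1")
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(prob for _, prob in ranked)
    return {token: prob / total for token, prob in ranked}


# Hypothetical next-token distributions: one sharply peaked, one nearly flat.
peaked = {"cat": 0.90, "dog": 0.05, "fish": 0.03, "rock": 0.02}
flat = {"cat": 0.27, "dog": 0.26, "fish": 0.25, "rock": 0.22}
print(top_k_filter(peaked, 2))
print(top_k_filter(flat, 2))  # "fish" is dropped despite being nearly as likely
```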

# max-tokens


The 'max-tokens' parameter in large language models (LLMs) specifies the maximum number of tokens that the model is allowed to generate in a single output. 

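- Tokens typically range from a single character to a whole word (common words are often a single token, while rarer words are split into several sub-word pieces), depending on the tokenizer used.

- Setting a limit on the number of tokens caps the length of the generated text. Generation simply stops once the limit is reached, so a response can be cut off mid-sentence; the limit does not by itself make outputs more concise or more relevant to the prompt.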

- For instance, if 'max-tokens' is set to 50, the model will generate up to 50 tokens in response to a given input. This is particularly useful in applications where brevity is important, such as chatbots or summarization tasks.

- Additionally, the 'max-tokens' parameter can help manage computational resources, as generating longer texts requires more processing power and time. By capping the output length, developers can optimize performance and reduce latency in real-time applications.

- It's important to note that while 'max-tokens' controls the length of the output, it does not change how each token is chosen, so it does not improve the quality or coherence of the generated text. Therefore, it should be used in conjunction with other parameters like temperature and top-p to achieve the desired balance between creativity and relevance.
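A toy decoding loop in Python showing where a max-tokens cap sits, assuming a hypothetical `next_token` function that stands in for the model (real inference libraries and APIs expose this differently). The `"length"` and `"stop"` labels are our own, chosen to echo the finish reasons some hosted APIs report.

```python
from typing import Callable, List, Tuple


def generate(
    next_token: Callable[[List[str]], str],
    prompt: List[str],
    max_tokens: int,
    eos: str = "<eos>",
) -> Tuple[List[str], str]:
    """Generate until the model emits `eos` or `max_tokens` tokens are produced.
    Returns the generated tokens and why generation ended."""
    output: List[str] = []
    while len(output) < max_tokens:
        token = next_token(prompt + output)
        if token == eos:
            return output, "stop"  # the model finished on its own
        output.append(token)
    return output, "length"  # cut off by the cap, possibly mid-sentence


# Hypothetical "model" that never ends on its own: it just counts the context.
def counting_model(context: List[str]) -> str:
    return f"t{len(context)}"


tokens, reason = generate(counting_model, ["hello"], max_tokens=5)
print(tokens, reason)
```

The cap only decides when the loop ends; each token is still chosen exactly as it would be without it.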


# stop sequence

When you provide a stop sequence, the model generates text as usual but halts as soon as its output contains one of the stop sequences. This keeps responses concise and prevents the model from drifting off into excessive output.

benefits:

1/ Cost management: Because hosted LLM APIs typically charge per token, stop sequences cut the output short and so reduce token usage and cost.

2/ Structured outputs: For structured formats like XML or JSON, stop sequences keep the model from appending unnecessary text once the structure is complete. This is particularly helpful for API responses, where extra text might break the integration.
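A minimal Python sketch of stop-sequence handling on a stream of generated tokens (the token stream and the name `stream_until_stop` are hypothetical). Because a stop sequence can span several tokens, it searches the text accumulated so far rather than each token alone. Whether the stop sequence itself appears in the returned text varies by provider; this sketch assumes it is excluded.

```python
from typing import Iterable, List, Tuple


def stream_until_stop(tokens: Iterable[str], stop: List[str]) -> Tuple[str, bool]:
    """Consume generated tokens one at a time and halt as soon as the text so far
    contains any stop sequence. The stop sequence and anything after it are
    dropped. Returns (text, hit_stop)."""
    text = ""
    for token in tokens:
        text += token
        # Search the accumulated text, since a stop sequence may span tokens.
        positions = [text.find(s) for s in stop if s in text]
        if positions:
            return text[: min(positions)], True
    return text, False


# Hypothetical token stream: a JSON object followed by chatty extra text.
stream = ['{"name": ', '"Ada"}', "\n", "\nHope this helps!"]
print(stream_until_stop(stream, ["\n\n"]))
```

If `tokens` were a live generator, returning early would also stop pulling further tokens, which is where the cost savings come from.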

# additional parameters

https://www.vellum.ai/llm-parameters/stop-sequence