--------------------------
#### Frequency penalty
- default = 0
------------------------

| **Aspect**                        | **Without Frequency Penalty**                                                                                                     | **With Frequency Penalty**                                                                                                         | **Observation**                                                                                                                    |
|-----------------------------------|-------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------|
| **Example 1**                     | "A sunny day at the beach is a great day. The sun shines brightly, and the beach is filled with people. People love to sunbathe on the beach. The beach has beautiful sand, and the waves are calm. A sunny day is perfect for swimming in the ocean. The sun is shining, and everyone enjoys the sunny day." | "A sunny day at the beach is delightful. The bright sun casts warmth on the sand, and visitors flock to the shore. Many choose to relax under colorful umbrellas, while others engage in beach games. The coastline boasts soft sand, and the gentle waves invite swimmers. This beautiful weather creates a perfect atmosphere for making memories." | In the first example (without a frequency penalty), the model repeatedly uses phrases like "sunny day," "beach," and "sun" multiple times. This repetition may lead to less engaging and monotonous text. In the second example (with a frequency penalty), the output features more varied vocabulary and different expressions. Words like "delightful," "warmth," "visitors," and "coastline" are used, resulting in a more interesting and engaging description. |
| **Example 2**                     | "A dog is a loyal pet. Dogs are great companions. Many people love dogs. Dogs can be trained to do tricks. A dog needs exercise and love. Dogs come in various breeds, and each breed has unique characteristics. A dog can be a friend and a protector." | "A dog is known for its loyalty and companionship. These animals are beloved by many for their playful nature. Each breed offers different traits and characteristics. Regular exercise and affection are essential for a dog's well-being. A dog can also serve as a guardian, providing both friendship and protection." | In the first example (without a frequency penalty), the model generates repetitive content, which can detract from the overall quality and engagement of the text. In the second example (with a frequency penalty), the model produces more varied and dynamic language, making the text more appealing and coherent. |
| **Summary**                       | Without frequency penalty, the model generates repetitive content that can detract from quality and engagement.                     | With frequency penalty, the model produces varied and dynamic language, enhancing the appeal and coherence of the text.            | In contexts where creativity and engagement are essential, using a frequency penalty can significantly enhance the quality of the generated output.                         |


#### How frequency penalty works?

- The frequency penalty in OpenAI's GPT models works by `modifying the probability distribution` of tokens during text generation to `reduce` the likelihood of repeating tokens that have already been used.

##### Token Probability Distribution:
- When the model generates text, it calculates probabilities for each potential next token (word or phrase) based on the context provided by the prompt and the tokens that have already been generated.
- These probabilities determine which token the model will select as the next word in the sequence.

##### Applying the Frequency Penalty:
- The frequency penalty is applied to tokens that have already been generated. The `penalty reduces the probabilities of these tokens`.
- The `amount of reduction` depends on the `frequency penalty` value set by the user:
  - **Value of 0**: No penalty is applied, meaning tokens can be repeated freely.
  - **Higher Values (1 to 2)**: The penalty is incrementally increased, making previously used tokens less likely to be selected again.

##### Effect on Token Selection:
- As a result of applying the frequency penalty, the model is incentivized to choose less frequent tokens. This leads to:
  - **Reduced Repetition**: Common words or phrases that have already appeared in the text are less likely to be repeated.
  - **Increased Vocabulary Diversity**: The model may opt for synonyms or alternative expressions to maintain fluency and engagement in the generated text.

##### Adjusting Output Quality:
- By tuning the frequency penalty, users can control the style and creativity of the output. A higher penalty generally leads to:
  - More varied and engaging text.
  - A decrease in repetitive phrases, which can improve the overall quality of creative writing, storytelling, or any context where diverse language is beneficial.


In [1]:
import openai

In [2]:
from openai import OpenAI

In [3]:
client = OpenAI(
    # defaults to os.environ.get("OPENAI_API_KEY")
    # api_key = openai_api_key
)

In [5]:
# Function to generate text with different temperatures
def generate_story(prompt, freq_penalty=0, max_tokens=100):
    response = client.chat.completions.create(
        model      = "gpt-3.5-turbo",  # You can use other engines like gpt-3.5-turbo
        messages   = [
        {"role": "user", "content": f'{prompt}'}
        ],
        max_tokens       = max_tokens,
        frequency_penalty= freq_penalty 
    )
    return response.choices[0].message.content

In [6]:
# Prompt
max_tokens = 1000
prompt = f'''Describe a cat using {max_tokens} words'''
prompt

'Describe a cat using 1000 words'

In [7]:
# Low freq penalty
freq_penalty = 0

print(f"Frequency Penalty = {freq_penalty}")
#print(generate_story(prompt, freq_penalty=freq_penalty, max_tokens=max_tokens))

text = generate_story(prompt, freq_penalty=freq_penalty, max_tokens=max_tokens)

Frequency Penalty = 0


In [8]:
text

"A cat is a fascinating creature, a domesticated feline that has been a beloved companion to humans for centuries. With its sleek body, graceful movements, and piercing eyes, the cat is a symbol of elegance and mystery.\n\nThe cat's coat is a thing of beauty, often soft and luxurious to the touch. It can come in a variety of colors and patterns, from solid black to striped tabby to calico with patches of white, black, and orange. Some cats have short fur that lies close to their body, while others have long, flowing coats that require regular grooming to keep them looking their best. No matter the type of fur, a cat's coat is always a source of pride for both the cat and its owner.\n\nA cat's eyes are another striking feature, windows to its soul that seem to hold a world of wisdom and secrets. They can be any color under the sun, from golden yellow to bright green to deep blue. When a cat stares at you with those mesmerizing eyes, it's as if it can see right through you, reading your 

In [9]:
import re

In [10]:
def count_cat_variations(text):
    # Define a pattern to match 'cat', 'cats', and 'cat's' (case insensitive)
    pattern = r'\b(cat|cats|cat\'s)\b'
    
    # Find all matches in the text
    matches = re.findall(pattern, text, flags=re.IGNORECASE)
    
    # Count occurrences
    counts = {
        'cat': matches.count('cat'),
        'cats': matches.count('cats'),
        "cat's": matches.count("cat's")
    }
    
    return counts

In [11]:
count_cat_variations(text)

{'cat': 22, 'cats': 3, "cat's": 0}

In [12]:
# High freq penalty
freq_penalty = 2

print(f"Frequency Penalty = {freq_penalty}")
#print(generate_story(prompt, freq_penalty=freq_penalty, max_tokens=max_tokens))

text = generate_story(prompt, freq_penalty=freq_penalty, max_tokens=max_tokens)

Frequency Penalty = 2


In [13]:
count_cat_variations(text)

{'cat': 3, 'cats': 1, "cat's": 0}

#### Maths behind freq penalty

#### Basic Concepts

**1. Token Probability Distribution**
- When generating text, the model calculates the probability $P(t_i)$ of the next token $t_i$ based on the context and previous tokens.

**2. Softmax Function**
- The model uses a softmax function to convert raw logits (the output scores for each token) into probabilities. 
- This can be represented as:
$$ \Large P(t_i) = \frac {e^{(z_i)}} { ∑e^{(z_j)}} $$

 where $z_i$ is the logit for token $t_i$ and the sum is over all possible tokens $j$.

#### Applying the Frequency Penalty

When a `frequency penalty` is applied, the model modifies the logits for tokens that have already been generated. The adjustment can be represented as follows:

**1. Define the Penalty**
- Let $f(t)$ be the frequency of token $t$ in the generated text so far. 
- The frequency penalty $FP(t)$ for a token can be defined as:
  
$$ \Large FP(t) = e^{(-λf(t))}$$

where λ is a scaling factor (which can vary based on the frequency penalty value, e.g., 0 to 2).

**2. Modify the Logits**
- The adjusted logit for a token $t_i$ after applying the `frequency penalty` can be expressed as:

$$ \Large z'_i = z_i + log(FP(t_i)) = z_i - λ.f(t_i) $$

Here, $z'_i$ is the modified logit that incorporates the penalty for previously used tokens.

**3. Recalculate Probabilities**
- After adjusting the logits, the new probabilities P'(t_i) for the tokens are recalculated using the softmax function:
  
$$ \Large P'(t_i) = e^{(z'_i)} / ∑(e^{(z'_j)}) 
           = e^{(z_i - λf(t_i))} / ∑(e^{(z_j - λf(t_j))})$$