## Masking out continuations that do not fit generation goals

We have two lists of phrases: forbidden phrases that must not appear in the output, and brand phrases that we would like to see.
During generation, we keep the continuations that avoid the forbidden phrases and use as many brand phrases as possible.

In [None]:
#%pip install --upgrade --quiet transformers torch fbgemm-gpu accelerate

In [1]:
# CHANGE this to the Llama model for which you have applied for access via Hugging Face
# See: https://www.llama.com/docs/getting-the-models/hugging-face/
MODEL_ID = "meta-llama/Llama-3.2-3B-Instruct"

import os
from dotenv import load_dotenv
load_dotenv("../keys.env")
assert os.environ["HF_TOKEN"][:2] == "hf",\
       "Please sign up for access to the specific Llama model via HuggingFace and provide access token in keys.env file"

## Word lists

Set up the two word lists.

The banned phrases come from https://channelkey.com/amazon-content-seo-and-optimization/400-restricted-amazon-keywords-the-most-comprehensive-list-youll-ever-need/

In [2]:
# From https://channelkey.com/amazon-content-seo-and-optimization/400-restricted-amazon-keywords-the-most-comprehensive-list-youll-ever-need/
with open("banned_phrases.txt") as ifp:
    banned_phrases = [line.strip().lower() for line in ifp.readlines()]
banned_phrases[100:105]

['decomposable', 'definitive', 'degradable', 'dementia', 'depression']

In [3]:
# Based on https://marketkeep.com/seo-keywords-for-nutrition/
with open("desired_phrases.txt") as ifp:
    desired_phrases = [line.strip().lower() for line in ifp.readlines()]
desired_phrases[:5]

['nutrition', 'calorie deficit', 'diet', 'protein shake', 'paleo diet']

In [4]:
# Convert to sets so that each phrase is counted only once
banned_phrases = set(banned_phrases)
desired_phrases = set(desired_phrases)
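
One thing to watch for when loading the lists: the scoring code later in this notebook tests `phrase in descr`, and in Python the empty string is a substring of every string. A blank line in either text file would therefore count as a match in every description. The following is a minimal sketch of a loader that drops empty entries; `clean_phrases` and `sample_lines` are hypothetical names used only for illustration, and the sample contents are made up.

```python
def clean_phrases(lines):
    """Lowercase, strip, and de-duplicate phrases, dropping empty entries."""
    return {line.strip().lower() for line in lines if line.strip()}

# Hypothetical file contents: a duplicate (in different case) and a blank line.
sample_lines = ["Nutrition\n", "protein shake\n", "\n", "nutrition\n"]
cleaned = clean_phrases(sample_lines)
print(sorted(cleaned))
```

The same function could be applied to the lines read from `banned_phrases.txt` and `desired_phrases.txt`, assuming those files hold one phrase per line.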

## Zero-shot generation

First, generate a description without any logits processing to establish a baseline.

In [5]:
from transformers import pipeline

pipe = pipeline(
    task="text-generation", 
    model=MODEL_ID,
    use_fast=True,
    kwargs={
        "return_full_text": False,
    },
    model_kwargs={}
)


Device set to use cuda:0


In [6]:
def generate_product_description(item: str) -> str:
    system_prompt = f"""
        You are a product marketer for a company that makes nutrition supplements.
        Balance your product descriptions to attract customers, optimize SEO, and
        stay within accurate advertising guidelines.
        Product descriptions have to be 3-5 sentences.
        Provide only the product description with no preamble.
    """
    user_prompt = f"""
        Write a product description for a {item}.
    """

    input_message = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}   
    ]
    
    results = pipe(input_message, 
                   max_new_tokens=512)
    return results[0]['generated_text'][-1]['content'].strip()

prod = generate_product_description("protein drink")
print(prod)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


**Optimize Your Fitness with Our High-Quality Protein Drink**

Stay focused and energized with our premium protein drink, designed to support muscle growth and recovery. Made with 20 grams of whey protein isolate, 5 grams of branched-chain amino acids (BCAAs), and essential vitamins, this drink helps to fuel your active lifestyle. With no artificial flavors or sweeteners, you can trust that you're getting a pure and effective supplement to support your fitness goals. Enjoy the taste and benefits of our protein drink as part of your daily routine.



In [16]:
import numpy as np

def evaluate(descr: str, positives, negatives) -> int:
    # go through and count the number of desired phrases and banned phrases
    descr = descr.lower()
    num_positive = np.sum([1 for phrase in positives if phrase in descr])
    num_negative = np.sum([1 for phrase in negatives if phrase in descr])
    return int(num_positive - num_negative)

def evaluate_verbose(descr: str, positives, negatives) -> int:
    # go through and count the number of desired phrases and banned phrases
    descr = descr.lower()
    
    num_positive = num_negative = 0
    for phrase in positives:
        if phrase in descr:
            num_positive += 1
            print(f"Good: {phrase}")
    for phrase in negatives:
        if phrase in descr:
            num_negative += 1
            print(f"Bad: {phrase}")
    print(f"Good: {num_positive}   Bad: {num_negative}")
    return num_positive - num_negative

evaluate_verbose(prod, desired_phrases, banned_phrases)

Good: whey
Good: whey protein
Bad: quality
Bad: growth
Bad: pure
Bad: used
Good: 2   Bad: 4


-2

In [17]:
evaluate(prod, desired_phrases, banned_phrases)

-2
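
The `Bad: used` line in the verbose output appears to come from the word "focused", since `phrase in descr` is a plain substring test. If whole-word matching is what is wanted, one option is a regular expression with word boundaries. This is a sketch of that alternative, not the scoring used in the rest of this notebook; `count_whole_phrase_matches` is a hypothetical helper name.

```python
import re

def count_whole_phrase_matches(descr, phrases):
    """Count phrases that occur in descr as whole words, not as substrings."""
    descr = descr.lower()
    count = 0
    for phrase in phrases:
        # \b anchors the phrase at word boundaries; re.escape guards
        # against regex metacharacters inside the phrase itself.
        pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
        if re.search(pattern, descr):
            count += 1
    return count

text = "Stay focused and energized with our premium protein drink."
substring_hit = "used" in text.lower()
whole_word_hits = count_whole_phrase_matches(text, {"used"})
phrase_hits = count_whole_phrase_matches(text, {"protein drink"})
print(substring_hit, whole_word_hits, phrase_hits)
```

Note that `\b` only works as intended when the phrase starts and ends with a letter, digit, or underscore; phrases that begin or end with punctuation would need different handling.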

## Use logits processing to enhance the product description

We'll use the same prompt, but add a logits processor that scores each candidate sequence against the two word lists and masks out the candidates that score below the best one.

In [18]:
import torch
import numpy as np
from transformers.generation.logits_process import (
    LogitsProcessor,
    LOGITS_PROCESSOR_INPUTS_DOCSTRING,
)
from transformers.utils import add_start_docstrings

class BrandLogitsProcessor(LogitsProcessor):
    def __init__(self, tokenizer, positives, negatives):
        self.tokenizer = tokenizer
        self.positives = positives
        self.negatives = negatives
      
    @add_start_docstrings(LOGITS_PROCESSOR_INPUTS_DOCSTRING)
    def __call__(
        self, input_ids: torch.LongTensor, input_logits: torch.FloatTensor
    ) -> torch.FloatTensor:
        output_logits = input_logits.clone()
            
        num_matches = [0] * len(input_ids)
        for idx, seq in enumerate(input_ids):
            # decode the sequence
            decoded = self.tokenizer.decode(seq)
            num_matches[idx] = evaluate(decoded, self.positives, self.negatives)
        max_matches = np.max(num_matches)
          
        # Log-probabilities range from -inf to zero. Mask out every sequence that scores below the best one.
        # Use a large negative value instead, because torch doesn't like -np.inf here.
        for idx in range(len(input_ids)):
            if num_matches[idx] != max_matches:
                output_logits[idx] = -10000
                  
        return output_logits
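
To make the masking step in `__call__` easier to follow, here is a toy version that uses plain Python lists in place of a tensor of shape `(num_sequences, vocab_size)`. The helper name `mask_non_max_rows` and all of the numbers are made up for illustration; the sketch only mirrors the "keep the best-scoring rows, fill the rest" logic and is not a replacement for the processor.

```python
def mask_non_max_rows(logits, scores, fill_value=-10000.0):
    """Return a copy of logits in which every row whose score is below
    the best score is replaced by fill_value."""
    best = max(scores)
    return [
        list(row) if score == best else [fill_value] * len(row)
        for row, score in zip(logits, scores)
    ]

# Three candidate sequences over a hypothetical 3-token vocabulary.
toy_logits = [[-0.1, -2.3, -4.0], [-1.2, -0.4, -3.1], [-0.7, -0.9, -2.2]]
# Hypothetical evaluate() results for the text decoded from each candidate.
toy_scores = [1, -2, 1]

masked = mask_non_max_rows(toy_logits, toy_scores)
for row in masked:
    print(row)
```

Candidates that tie for the best score are all left untouched, so the model's own probabilities still decide among them.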

In [21]:
def generate_product_description_v2(item: str) -> str:
    system_prompt = f"""
        You are a product marketer for a company that makes nutrition supplements.
        Balance your product descriptions to attract customers, optimize SEO, and
        stay within accurate advertising guidelines.
        Product descriptions have to be 3-5 sentences.
        Provide only the product description with no preamble.
    """
    user_prompt = f"""
        Write a product description for a {item}.
    """
    
    input_message = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}   
    ]
    
    # Score each candidate by desired vs. banned phrases and mask out the lower-scoring ones
    brand_processor = BrandLogitsProcessor(pipe.tokenizer, desired_phrases, banned_phrases)
    
    results = pipe(input_message, 
                   max_new_tokens=512,
                   do_sample=True,
                   temperature=0.8,
                   num_beams=10,
                   use_cache=True, # default is True
                   logits_processor=[brand_processor])
    return results[0]['generated_text'][-1]['content'].strip()

prod = generate_product_description_v2("protein drink")
print(prod)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


"Fuel your active lifestyle with our premium protein drink, packed with 20 grams of whey protein, 10 grams of branched-chain amino acids (BCAAs), and essential vitamins and minerals to support muscle recovery and overall well-being. Our unique blend of whey protein isolate and micellar casein provides a sustained release of nutrients, helping to build and repair muscle tissue. With no artificial flavors or sweeteners, our protein drink is a guilt-free way to support your fitness goals. Enjoy the taste of a refreshing beverage while nourishing your body with the nutrients it needs to thrive."



In [23]:
evaluate_verbose(prod, desired_phrases, banned_phrases)

Good: whey
Good: nutrients
Good: whey protein
Good: 3   Bad: 0


3