# Newsletter generation from special products & current date

In this notebook, we demonstrate how to use the free Cohere API to generate a date-appropriate newsletter for the retail chain Aldi UK, based on the products currently sold in its stores. The goal is a marketing email that reads like a newsletter and links to products that customers can buy.

We perform this task using a sequence of steps:

1. Scrape the special products currently on offer (using Selenium)
2. Generate holidays and celebrations relevant to the current month (using Cohere Generate)
3. Generate gift ideas for these special events (using Cohere Generate)
4. Select a relevant subset of products on offer based on the gift ideas (using Sentence Embeddings)
5. Generate a newsletter based on that subset of products (using Cohere Chat)

## Step 1: Scrape the special products currently on offer

In this step, we will use Selenium to scrape all the special products from the Aldi UK website.

We do so by:

1. navigating to the "https://www.aldi.co.uk/c/specialbuys/specialbuyscategories" webpage
2. clicking away the cookies banner
3. repeatedly clicking the "load more" button until all items are visible
4. finding all elements of class "gtm-product-data" on the page, each of which already contains JSON info about a product
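
The last item in the list above is what keeps the scraping simple: each "gtm-product-data" element carries a JSON payload, so no HTML parsing is needed. Below is a minimal sketch of that parsing step. It assumes a payload with the `name` and `id` fields that later cells rely on. The sample strings and the `parse_product_payloads` helper are illustrative, and the real payload probably has more fields.

```python
import json

def parse_product_payloads(payloads):
    """Parse JSON payload strings into dicts, skipping malformed ones."""
    products = []
    for payload in payloads:
        try:
            products.append(json.loads(payload))
        except json.JSONDecodeError:
            # skip payloads that are not valid JSON instead of aborting the scrape
            continue
    return products

# hypothetical payloads, shaped like the text content of a "gtm-product-data" element
sample_payloads = [
    '{"name": "microwave cookware", "id": "730353770272600"}',
    'not json',
]
products = parse_product_payloads(sample_payloads)
print(products)
```

Skipping malformed payloads is a design choice for this sketch; the notebook cell below lets `json.loads` raise instead.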

In [None]:
import json
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import (
    ElementNotInteractableException,
    NoSuchElementException,
)
from webdriver_manager.chrome import ChromeDriverManager

# download a matching ChromeDriver in case it isn't available already
ChromeDriverManager().install()

def fetch_all_products(url_address):
    """
    This function returns a list of all special products found on the given page.
    """
    all_products = []  # a list where all found products will be added
    # open Chrome in headless mode
    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        # navigate to the given address
        driver.get(url_address)
        time.sleep(1)
        # find the accept all cookies button
        accept_cookies_button = driver.find_element(By.ID, "onetrust-accept-btn-handler")
        # click on the accept all cookies button
        accept_cookies_button.click()
        time.sleep(1)
        # load more items until all are loaded
        try:
            # find the loadmore button
            loadmore_button = driver.find_element(By.CLASS_NAME, 'category-loadmore-cta')
            while loadmore_button is not None:
                # click on the button
                loadmore_button.click()
                # wait until the new items have been downloaded
                time.sleep(3)
                loadmore_button = driver.find_element(By.CLASS_NAME, 'category-loadmore-cta')
        except (ElementNotInteractableException, NoSuchElementException):
            # the button is hidden or gone: all items have been loaded
            loadmore_button = None
        # find all elements describing a product
        product_data = driver.find_elements(By.CLASS_NAME, 'gtm-product-data')
        # load the JSON contained in all found elements
        for element in product_data:
            product = element.get_attribute("textContent")
            product_dict = json.loads(product)
            all_products.append(product_dict)
    finally:
        # always shut down the browser before exiting
        driver.quit()
    return all_products

all_products = fetch_all_products("https://www.aldi.co.uk/c/specialbuys/specialbuyscategories")

In [2]:
# Save the scraped products to all_products.json
with open("all_products.json", "w") as outfile:
    outfile.write(json.dumps(all_products, indent=4))

## Step 2: Generate holidays and celebrations relevant to the current month

In this step, we will create a list of holidays and special events in the UK for every month of the year.

To do this:
1. We create an LLM prompt with a month name as argument, and ask Cohere Generate to provide us with event suggestions
2. This prompt contains an example, so that Cohere returns its answer as a JSON list
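
Even with an example in the prompt, the model may wrap the list in extra prose, so the answer is searched for JSON lists rather than parsed as a whole. The sketch below shows that extraction on a made-up answer, using the same regular expression as the cells that follow. The `extract_string_lists` helper and the sample text are illustrative only. Note that the pattern only matches lists of double-quoted strings without embedded quotes.

```python
import json
import re

# a JSON array of double-quoted strings, possibly spread over several lines
json_list_pattern = r'\[\s*(?:"[^"]*"\s*,\s*)*"[^"]*"\s*\]'

def extract_string_lists(answer_text):
    """Return the items of every JSON string list found in the text."""
    items = []
    for json_list in re.findall(json_list_pattern, answer_text, flags=re.S):
        items.extend(json.loads(json_list))
    return items

# made-up model answer, for illustration only
sample_answer = (
    'Sure! Here are some events observed in December:\n'
    '[\n    "Christmas Day",\n    "Boxing Day"\n]\n'
    'I hope this helps!'
)
events = extract_string_lists(sample_answer)
print(events)
```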

In [3]:
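import os
import cohere

# read the API key from the environment rather than hard-coding it in the notebook
# (the variable name COHERE_API_KEY is our own choice)
client_key = os.environ["COHERE_API_KEY"]
co = cohere.Client(client_key)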

In [4]:
months = ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"]

In [8]:
import re
json_list_pattern = r'\[\s*(?:"[^"]*"\s*,\s*)*"[^"]*"\s*\]'

def ask_holidays(name_of_month):
    answer = co.generate(prompt=f'Please tell me what holidays and special events are observed in {name_of_month} in the UK? store the answer in a list of the following format'+""": 
    
    ```json
    [
        "Celebration 1",
        "Celebration 2",
        ...
    ]
    ```""")

    answer_text = answer[0].text

    events_for_given_month = []
    
    json_lists = re.findall(json_list_pattern, answer_text, flags=re.S)
    for json_list in json_lists:
        event_list = json.loads(json_list)
        events_for_given_month.extend(event_list)

    return events_for_given_month

In [None]:
holidays_per_month = []
for month in months:
    events_for_month = ask_holidays(month)
    holidays_per_month.append(events_for_month)
    time.sleep(13) # only 5 free queries per minute

In [None]:
import json
with open('holidays_per_month.json', 'w') as f:
    json.dump(holidays_per_month, f)

In [12]:
import json
with open('holidays_per_month.json', 'r') as f:
    holidays_per_month = json.loads(f.read())

## Step 3: Generate gift ideas for these special events

In this step, we will create a list of gift ideas for each holiday and special event found in the previous step.

To do this:
1. We create an LLM prompt with a holiday name as argument, and ask Cohere Generate to provide us with gift suggestions (about 10 gifts)
2. This prompt contains an example, so that Cohere returns its answer as a JSON list
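
This step, like the previous one, calls the API once per item. The loops pause 13 seconds between calls because, according to the comments in the code, the free tier allows only 5 queries per minute. The sketch below factors that pause into a helper. `map_with_rate_limit` is a hypothetical name, and the `sleep` parameter exists only so the example can run without actually waiting.

```python
import time

def map_with_rate_limit(func, items, calls_per_minute=5, sleep=time.sleep):
    """Call func on each item, pausing between calls to respect a per-minute quota."""
    pause = 60.0 / calls_per_minute + 1.0  # one second of safety margin
    results = []
    for index, item in enumerate(items):
        if index > 0:
            sleep(pause)  # no need to wait before the first call
        results.append(func(item))
    return results

# usage with a stand-in for ask_goods and a fake sleep, so the example runs instantly
pauses = []
results = map_with_rate_limit(str.upper, ["easter", "may day"], sleep=pauses.append)
print(results, pauses)
```

With the default of 5 calls per minute, the pause works out to the same 13 seconds used in the loops of this notebook.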

In [15]:
def ask_goods(name_of_holiday):
    answer = co.generate(prompt=f'What 10 items one can buy for the {name_of_holiday} in the UK? Store the answer in a list of strings of the following format'+""": 
    
    ```json
    [
        "gift 1",
        "gift 2",
        ...
    ]
    ```""")

    presents_matches = re.findall(json_list_pattern, answer[0].text, flags=re.S)
    presents = [match.replace('\n', "") for match in presents_matches]
    presents = [json.loads(present) for present in presents]
    presents = [present for present_list in presents for present in present_list]
    
    return presents

In [None]:
from tqdm import tqdm
presents_for_holidays = dict()

for holidays_in_month in tqdm(holidays_per_month):
    for holiday in holidays_in_month:
        goods_for_holiday = ask_goods(holiday)
        presents_for_holidays[holiday] = goods_for_holiday
        # we write the file in the loop to see progress
        with open('presents_for_holidays.json', 'w') as f:
            f.write(json.dumps(presents_for_holidays, indent='\t'))
        time.sleep(13) # only 5 free queries per minute

In [None]:
with open('presents_for_holidays.json', 'w') as f:
    f.write(json.dumps(presents_for_holidays, indent='\t'))

In [17]:
with open("presents_for_holidays.json", "r") as f:
    presents_for_holidays = json.loads(f.read())

## Step 4: Select a relevant subset of products on offer based on the gift ideas

In this step, we will create a list of the 15 best products to include in the newsletter.

To do this:
1. We compute embeddings for all the product names we have collected so far
2. We compute embeddings for all the gift suggestions made by the model
3. We compute the cosine similarities between all combinations of products and gifts
4. We sort the products by their maximum similarity score with any of the gift ideas
5. We keep only the 15 best ones, or fewer if not enough products are at least somewhat similar (score > 0.5)
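
The selection logic in the list above can be sketched without any model. The hand-written 3-dimensional vectors below are stand-ins for real sentence embeddings, and the `cosine` and `best_matches` helpers are illustrative names, not part of the notebook.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def best_matches(product_vecs, gift_vecs, threshold=0.5, top_k=15):
    """For each product keep its most similar gift, filter by threshold, sort, truncate."""
    matches = []
    for product, p_vec in product_vecs.items():
        gift, score = max(
            ((g, cosine(p_vec, g_vec)) for g, g_vec in gift_vecs.items()),
            key=lambda pair: pair[1],
        )
        if score > threshold:
            matches.append((product, gift, score))
    matches.sort(key=lambda m: m[2], reverse=True)
    return matches[:top_k]

# toy vectors standing in for embeddings (assumption, for illustration only)
product_vecs = {"scented candle set": [0.9, 0.1, 0.0], "garden hose": [0.0, 0.1, 0.9]}
gift_vecs = {"Scented candle": [1.0, 0.0, 0.0], "Bottle of wine": [0.0, 1.0, 0.0]}
matches = best_matches(product_vecs, gift_vecs)
print(matches)
```

Here the garden hose is dropped because its best score stays below the 0.5 threshold, which is how unrelated products are kept out of the newsletter.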

In [18]:
from datetime import datetime

# get the current date
current_date = datetime.now()
current_daynumber = current_date.day
current_month = current_date.month - 1  # zero-based index into `months` and `holidays_per_month`
current_year = current_date.year

In [24]:
from sentence_transformers import SentenceTransformer, util
model = SentenceTransformer('all-MiniLM-L6-v2')

holidays_in_current_month = holidays_per_month[current_month]

presents_in_current_month = []
for holiday in holidays_in_current_month:
    presents_in_current_month.append("gift for " + holiday)
    presents_in_current_month.extend(presents_for_holidays[holiday])

sentences1 = [product["name"] for product in all_products]
sentences2 = list(presents_in_current_month)

# Compute embeddings for both lists
embeddings1 = model.encode(sentences1, convert_to_tensor=True)
embeddings2 = model.encode(sentences2, convert_to_tensor=True)

#Compute cosine-similarities
cosine_scores = util.cos_sim(embeddings1, embeddings2)

# For each product, keep its best-matching gift idea if the score exceeds 0.5
good_matches = []
for i in range(len(sentences1)):
    best_score = 0.0
    best_j = -1
    for j in range(len(sentences2)):
        score = cosine_scores[i][j].item()
        if score > best_score:
            best_score = score
            best_j = j
    if best_score > 0.5:
        good_matches.append((sentences1[i], sentences2[best_j], best_score))

good_matches.sort(key=lambda x:x[2], reverse=True)

# print the 15 best matches
for product_name, type_of_gift, score in good_matches[0:15]:
    print("{} \t\t {} \t\t Score: {:.4f}".format(product_name, type_of_gift, score))


microwave cookware 		 Cookware 		 Score: 0.7709
christmas rose bouquet 		 Winter bouquet 		 Score: 0.7456
christmas wishes bouquet 		 Winter bouquet 		 Score: 0.7215
purewick aromaguard candle 		 Scented candle 		 Score: 0.7180
fitness accessories 		 Fashion Accessories 		 Score: 0.6950
barrel bottle 		 Bottle of wine 		 Score: 0.6820
edition blanc candle 		 winter candle collection 		 Score: 0.6810
smart phone holder 		 Smartphone 		 Score: 0.6592
edition blanc reed diffuser 		 Ceramic diffuser 		 Score: 0.6456
purewick aromaguard reed diffuser 		 Ceramic diffuser 		 Score: 0.6262
festive gift wrapped plant 		 gift for Winter Solstice 		 Score: 0.6003
crofton brunch boards 		 Cheese Boards 		 Score: 0.5903
3 pack fitness-socks 		 fleece-lined socks 		 Score: 0.5893
adult fitness trainers 		 Sneakers/Trainers 		 Score: 0.5883
3-in-1 cordeless vacuum 		 Cordless vacuum cleaner 		 Score: 0.5652


## Step 5: Generate a newsletter based on that subset of products

In this step, we generate the text of the newsletter using Cohere Chat, which produced better output for this task than Cohere Generate.

To do this, we:
1. determine the holidays for which we have gift suggestions
2. compute the URLs of the selected products
3. fill in the prompt used to make the newsletter
4. include in the chat history an example for another month and a different set of products, to help the LLM use the right tone and format
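
As a rough illustration of the second item, the sketch below builds a product URL and the HTML link that goes into the prompt. The URL pattern is the one this notebook assumes (product name with spaces replaced by hyphens, followed by `/p/` and the product id); it is not a documented Aldi scheme. The `product_url` and `html_link` helpers are hypothetical names.

```python
import html

def product_url(name, product_id):
    """Build a product URL following the pattern assumed in this notebook."""
    return f"https://www.aldi.co.uk/{name.replace(' ', '-')}/p/{product_id}"

def html_link(name, url):
    """Render an HTML anchor, escaping characters that are special in HTML."""
    return f'<a href="{html.escape(url, quote=True)}">{html.escape(name)}</a>'

url = product_url("microwave cookware", "730353770272600")
print(html_link("microwave cookware", url))
```

Escaping is not done in the notebook cells below; it is added here as a precaution for product names containing characters such as `&`.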

In [26]:
# make a list of the holidays for which we have matching products
holidays_with_presents = set()
for match in good_matches[0:15]:
    for key, value in presents_for_holidays.items():
        if "gift for " + key == match[1]:
            holidays_with_presents.add(key)
        for gift in value:
            if gift == match[1]:
                holidays_with_presents.add(key)

print(holidays_with_presents)

{'Christmas Day', 'Saturnalia', 'Christmas Markets', 'Christmas Eve', 'Hanukkah', 'Teen Tech', 'Boxing Day', 'Winter Solstice', "St. Stephen's Day"}
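
The nested loops in the cell above compare every matched gift against every gift of every holiday. An equivalent approach, sketched here under the assumption that `presents_for_holidays` maps each holiday to a list of gift strings, is to build a reverse index from gift to holidays once and then look each match up. `holidays_for_gifts` is a hypothetical helper and the toy data is made up.

```python
def holidays_for_gifts(presents_for_holidays, matched_gifts):
    """Return the set of holidays associated with any of the matched gift ideas."""
    # reverse index: gift idea -> set of holidays that suggested it
    holidays_by_gift = {}
    for holiday, gifts in presents_for_holidays.items():
        for gift in ["gift for " + holiday, *gifts]:
            holidays_by_gift.setdefault(gift, set()).add(holiday)
    found = set()
    for gift in matched_gifts:
        found |= holidays_by_gift.get(gift, set())
    return found

# toy data, for illustration only
toy_presents = {
    "Christmas Day": ["Scented candle", "Bottle of wine"],
    "Winter Solstice": ["Scented candle"],
    "Boxing Day": ["Board game"],
}
found = holidays_for_gifts(toy_presents, ["Scented candle", "gift for Boxing Day"])
print(sorted(found))
```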


In [27]:
# compute the URLs for the products
def create_url(prod_info, matches_list):
    products_and_links = dict()
    # names of the products that were matched to a gift idea
    matched_names = {m[0] for m in matches_list}
    for prod in prod_info:
        if prod["name"] in matched_names:
            name_for_link = prod["name"].replace(" ", "-")
            product_id = prod["id"]  # avoid shadowing the built-in `id`
            prod_address = f"https://www.aldi.co.uk/{name_for_link}/p/{product_id}"
            products_and_links[prod["name"]] = prod_address
    return products_and_links

gifts_with_adresses = create_url(all_products, good_matches[0:15])
print(gifts_with_adresses)

{'christmas wishes bouquet': 'https://www.aldi.co.uk/christmas-wishes-bouquet/p/732746763384800', '3-in-1 cordeless vacuum': 'https://www.aldi.co.uk/3-in-1-cordeless-vacuum/p/729810767558000', 'christmas rose bouquet': 'https://www.aldi.co.uk/christmas-rose-bouquet/p/732750763361600', 'adult fitness trainers': 'https://www.aldi.co.uk/adult-fitness-trainers/p/828606752395800', 'festive gift wrapped plant': 'https://www.aldi.co.uk/festive-gift-wrapped-plant/p/716326560794700', 'edition blanc reed diffuser': 'https://www.aldi.co.uk/edition-blanc-reed-diffuser/p/732248768358300', 'microwave cookware': 'https://www.aldi.co.uk/microwave-cookware/p/730353770272600', 'smart phone holder': 'https://www.aldi.co.uk/smart-phone-holder/p/727845770067200', 'crofton brunch boards': 'https://www.aldi.co.uk/crofton-brunch-boards/p/730467770494900', 'edition blanc candle': 'https://www.aldi.co.uk/edition-blanc-candle/p/732247768656400', '3 pack fitness-socks': 'https://www.aldi.co.uk/3-pack-fitness-sock

In [28]:
with open("gifts_with_adresses.json", "w") as outfile:
    outfile.write(json.dumps(gifts_with_adresses))

In [29]:
with open("holidays_with_presents.json", "w") as outfile:
    outfile.write(json.dumps(list(holidays_with_presents)))


In [30]:
list_h = ", ".join(holidays_with_presents)
list_p = ", ".join([f'<a href="{url}">{product_name}</a>' for product_name, url in gifts_with_adresses.items()])
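
The prompt built in the next cell always appends "th" to the day number, which yields "1th" or "22th" on some days. A small helper, which is not part of the original notebook, could pick the right English suffix. `ordinal` is a hypothetical name.

```python
def ordinal(day):
    """Return the day number with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 11 <= day % 100 <= 13:
        suffix = "th"  # 11th, 12th and 13th are exceptions to the last-digit rule
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(day % 10, "th")
    return f"{day}{suffix}"

print([ordinal(day) for day in (1, 2, 3, 11, 22, 29)])
```

In the prompt, `{current_daynumber}th` could then be written as `{ordinal(current_daynumber)}`.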

In [31]:
# build the newsletter prompt for the current month, plus an example prompt and answer
prompt = f"""Please, write a newsletter of about 150 words for the Aldi UK chain store.
Today's date is the {current_daynumber}th of {months[current_month]}, {current_year}.
Sign the newsletter by ending with "Happy shopping!"
The newsletter should mention about some of the holidays and special events of the month that are provided in the following list: {list_h}.
The newsletter should link using HTML links (e.g. checkout our <a href="https://www.aldi.co.uk/product-name/p/1234567">product</a>) to some items you can buy in the store, based on the following list: {list_p}.
Never mention a gift twice, and be creative. 
"""

example_prompt = """
Please, write a newsletter of about 150 words for the Aldi UK chain store.
Today's date is the 29th of June, 2024.
Sign the newsletter by ending with "Happy shopping!"
The newsletter should mention about some of the holidays and special events of the month that are provided in the following list: Wimbledon, Pride in London.
The newsletter should link using HTML links (e.g. checkout our <a href="https://www.aldi.co.uk/product-name/p/1234567">product</a>) to some items you can buy in the store, based on the following list: <a href="https://www.aldi.co.uk/tennis-balls-set-of-6/p/52176">tennis balls (set of 6)</a>, <a href="https://www.aldi.co.uk/rainbow-umbrella/p/69420">rainbow umbrella</a>.
Never mention a gift twice, and be creative. 
"""

example_answer = """Hey there! 

As we dive into the vibrant month of June, Aldi has your summer essentials ready to shine! 🌈 Gear up for Wimbledon with our <a href="https://www.aldi.co.uk/tennis-balls-set-of-6/p/52176">set of tennis balls</a>, perfect for your match-winning serves. Don't let the unpredictable British weather rain on your parade – grab our stylish <a href="https://www.aldi.co.uk/rainbow-umbrella/p/69420">rainbow umbrella</a> to stay dry and fabulous!

In the spirit of inclusivity, we celebrate Pride in London with a range of diverse products. Embrace the colors of love and show your support!

Stay tuned for more exciting offers and surprises at your nearest Aldi store.

Happy shopping! 
"""

In [33]:
answer = co.chat(
    chat_history=[
        {"role": "USER", "message": example_prompt},
        {"role": "CHATBOT", "message": example_answer}
    ],
    message=prompt
)
print(answer.text)

Hello there! 

As we approach the festive season, Aldi has plenty of offers to make your holidays magical! This month, we're celebrating Christmas, Saturnalia, Christmas Markets, Christmas Eve, Hanukkah, Teen Tech, Boxing Day, Winter Solstice, and St. Stephen's Day. 

Get into the holiday spirit with our selection of gifts, including the <a href="https://www.aldi.co.uk/christmas-wishes-bouquet/p/732746763384800">Christmas Wishes Bouquet</a>, the <a href="https://www.aldi.co.uk/3-in-1-cordeless-vacuum/p/729810767558000">3-in-1 Cordless Vacuum</a> for effortless cleaning, and our <a href="https://www.aldi.co.uk/christmas-rose-bouquet/p/732750763361600">Christmas Rose Bouquet</a> to add a touch of elegance to your gatherings. 

We know you love staying active, so why not check out our <a href="https://www.aldi.co.uk/adult-fitness-trainers/p/828606752395800">Adult Fitness Trainers</a> or our <a href="https://www.aldi.co.uk/3-pack-fitness-socks/p/829785761713600">3 Pack Fitness Socks</a>? F