This notebook returns offers for the retailers that Fetch has in its database.

Approach:

- If a retailer has offers in the final DataFrame (created by merging brands_df, category_df, and retail_df), we return all of its non-NaN offers.

- If a retailer has only NaN offers, we follow a two-step process:

  - Create a dictionary whose keys are the retailers with non-NaN offers and whose values are the sets of products associated with those retailers.

  - Compute the Jaccard similarity (size of the intersection divided by size of the union) between the product set of the retailer with all NaN offers and the product set of each retailer with non-NaN offers. We then rank the retailers by similarity score, drop those with a score of 0, and return the offers of the remaining retailers, most similar first.
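
The ranking step described above can be sketched on toy data. This is a minimal illustration, not the notebook's actual function: the `jaccard` helper, the store names, and the product sets are all hypothetical and are not taken from the Fetch dataset.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b|; treated as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical product sets for retailers that do have offers.
products_with_offers = {
    "store_a": {"chips", "beer", "candy"},
    "store_b": {"wine", "beer"},
    "store_c": {"dog supplies"},
}

# Hypothetical product set for a retailer whose offers are all NaN.
target_products = {"beer", "wine", "water"}

# Score every candidate, sort from most to least similar, then drop zero scores.
ranked = sorted(
    ((name, jaccard(target_products, products))
     for name, products in products_with_offers.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
ranked = [(name, score) for name, score in ranked if score > 0]

print(ranked)
```

Here store_b ranks first because it shares two of the three products in the combined set, store_a follows with one shared product out of five, and store_c is dropped because it shares none.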

In [1]:
#Import libraries
import numpy as np
import pandas as pd

In [2]:
#Mount drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
#Load dataset
brands_df = pd.read_csv('/content/drive/MyDrive/fetch/brand_category.csv')
category_df = pd.read_csv('/content/drive/MyDrive/fetch/categories.csv')
retail_df = pd.read_csv('/content/drive/MyDrive/fetch/offer_retailer.csv')

In [4]:
#brands_df
brands_df['BRAND'] = brands_df['BRAND'].str.lower()
brands_df['BRAND_BELONGS_TO_CATEGORY'] = brands_df['BRAND_BELONGS_TO_CATEGORY'].str.lower()

In [5]:
#categories_df
category_df['PRODUCT_CATEGORY'] = category_df['PRODUCT_CATEGORY'].str.lower()
category_df['IS_CHILD_CATEGORY_TO'] = category_df['IS_CHILD_CATEGORY_TO'].str.lower()

In [6]:
#retail_df
retail_df['RETAILER'] = retail_df['RETAILER'].str.lower()
retail_df['BRAND'] = retail_df['BRAND'].str.lower()

In [7]:
# Attach each product's parent category to the brand table (left join on the product name)
merged_df = brands_df.merge(category_df[['PRODUCT_CATEGORY', 'IS_CHILD_CATEGORY_TO']],
                            left_on='BRAND_BELONGS_TO_CATEGORY',
                            right_on='PRODUCT_CATEGORY',
                            how='left')

# Drop the columns that are not needed
merged_df.drop(columns=['PRODUCT_CATEGORY', 'RECEIPTS'], inplace=True)

# Rename the remaining columns to clearer names
merged_df.rename(columns={'IS_CHILD_CATEGORY_TO': 'Product_category',
                          'BRAND_BELONGS_TO_CATEGORY': 'Product'}, inplace=True)


# Save the merged dataframe to a new CSV file if needed
merged_df.to_csv('merged_brands.csv', index=False)

In [8]:
# Align the brand spelling with the one used in retail_df so the join on BRAND matches
merged_df['BRAND'] = merged_df['BRAND'].replace({'caseys gen store': 'caseys general store'})
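
Spelling mismatches like the one fixed above can be hard to spot by eye. One possible way to surface them is a set difference between the brand names of the two tables. The sketch below uses small hypothetical frames (`brands` and `offers`) in place of merged_df and retail_df.

```python
import pandas as pd

# Hypothetical toy frames standing in for merged_df and retail_df.
brands = pd.DataFrame({"BRAND": ["caseys gen store", "pepsi", "gatorade"]})
offers = pd.DataFrame({"BRAND": ["caseys general store", "pepsi", None]})

# Brand names present in the offers table that have no exact match in the brand table.
unmatched = sorted(set(offers["BRAND"].dropna()) - set(brands["BRAND"].dropna()))

print(unmatched)
```

Each name in `unmatched` is a candidate for manual review; whether it is a spelling variant or a brand that is simply missing has to be decided case by case.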

In [9]:
# Merge the dataframes based on the common 'BRAND' column using a left join
merged_with_offers_df = merged_df.merge(retail_df, on='BRAND', how='left')

# Save the merged dataframe with offers to a new CSV file if needed
merged_with_offers_df.to_csv('merged_with_offers.csv', index=False)
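
Because this is a left join, brands without any offer stay in the result with NaN in RETAILER and OFFER, and brands with several offers are repeated once per offer. A minimal sketch on hypothetical toy data, using the `indicator` option of `DataFrame.merge` to count the unmatched rows:

```python
import pandas as pd

# Hypothetical toy frames; the values are illustrative only.
brands = pd.DataFrame({
    "BRAND": ["pepsi", "wibby brewing"],
    "Product": ["carbonated soft drinks", "beer"],
})
offers = pd.DataFrame({
    "BRAND": ["pepsi", "pepsi"],
    "RETAILER": ["amazon", "walmart"],
    "OFFER": ["offer 1", "offer 2"],
})

# indicator=True adds a _merge column: 'both' for matched rows, 'left_only' for brands without offers.
merged = brands.merge(offers, on="BRAND", how="left", indicator=True)

n_without_offer = int((merged["_merge"] == "left_only").sum())
print(len(merged), n_without_offer)
```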

In [10]:
ordered_df = merged_with_offers_df[['RETAILER', 'BRAND', 'Product', 'Product_category', 'OFFER']]
ordered_df

Unnamed: 0,RETAILER,BRAND,Product,Product_category,OFFER
0,caseys general store,caseys general store,tobacco products,mature,Order from Casey's app or Caseys.com
1,caseys general store,caseys general store,tobacco products,mature,Spend $25 at Casey's
2,caseys general store,caseys general store,tobacco products,mature,Spend $5 in-store at Casey's
3,caseys general store,caseys general store,tobacco products,mature,Select beverages AND prepared food items at Ca...
4,caseys general store,caseys general store,tobacco products,mature,Visit OR order online from Casey's 7 times
...,...,...,...,...,...
10849,,wibby brewing,beer,alcohol,
10850,,la fete du rose,wine,alcohol,
10851,,big island brewhaus,beer,alcohol,
10852,,bridge lane,wine,alcohol,


In [11]:
import pandas as pd

# Retailers that have at least one row with a NaN offer
# (retailers whose offers are ALL NaN are identified in the next cell)
retailers_with_nan_offers = []

# Retailers that have at least one non-NaN offer
retailers_with_non_nan_offers = []

# Iterate through the rows in the dataframe
for _, row in ordered_df.iterrows():
    retailer = row['RETAILER']
    offer = row['OFFER']

    if not pd.isna(offer):
        # The row carries an offer: record the retailer once
        if retailer not in retailers_with_non_nan_offers:
            retailers_with_non_nan_offers.append(retailer)
    else:
        # The row has no offer: record the retailer once
        if retailer not in retailers_with_nan_offers:
            retailers_with_nan_offers.append(retailer)

# A retailer can appear in both lists if it has some rows with offers and some without

print("Retailers with at least one NaN offer:")
print(retailers_with_nan_offers)

print("\nRetailers with at least one non-NaN offer:")
print(retailers_with_non_nan_offers)


Retailers with at least one NaN offer:
[nan]

Retailers with at least one non-NaN offer:
['caseys general store', 'united supermarkets', 'amazon', 'cvs', nan, 'mcalisters deli', 'fred meyer', 'dillons grocery', 'food4less', 'walmart', 'martins foods', 'target', 'lowes home improvement', 'giant food', 'sams club', 'dillons food store', 'ralphs', 'ruler foods', 'king soopers', 'kroger', 'smiths', 'frys food store', 'qfc', 'marianos', 'pick n save', 'h-e-b', 'whole foods market', 'the giant co', 'subway', 'sprouts farmers market', 'safeway', 'acme', 'tgi fridays', 'dollar general store', 'burger king', 'gallo.com', 'albertsons', 'the home depot', 'shop rite', 'stop & shop', 'chewy', 'pet supplies plus', 'bjs wholesale', 'vons', 'star market', 'shaws', 'aldi', 'blue apron', 'pavilions']
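
The difference between "has at least one NaN offer" and "has only NaN offers" can also be computed without a row loop. The following is a sketch on hypothetical toy data using `groupby` with `any` and `all`. Note that pandas `groupby` drops NaN group keys by default, so rows with a missing RETAILER are excluded here unless `dropna=False` is passed.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; store names and offers are illustrative only.
toy = pd.DataFrame({
    "RETAILER": ["store_a", "store_a", "store_b", "store_b", "store_c"],
    "OFFER": ["offer 1", np.nan, np.nan, np.nan, "offer 2"],
})

# Boolean "offer is missing" flag, grouped by retailer.
offer_missing = toy["OFFER"].isna().groupby(toy["RETAILER"])
any_nan = offer_missing.any()   # True if at least one offer is NaN
all_nan = offer_missing.all()   # True only if every offer is NaN

retailers_any_nan = any_nan[any_nan].index.tolist()
retailers_all_nan = all_nan[all_nan].index.tolist()

print(retailers_any_nan)
print(retailers_all_nan)
```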


In [12]:
# Create an empty dictionary to store retailers and their associated sets of unique products
retailer_product_dict = {}

# Create a dictionary to keep track of offers for each retailer
retailer_offers = {}

# Iterate through the rows in the dataframe
for _, row in ordered_df.iterrows():
    retailer = row['RETAILER']
    product = row['Product']
    offer = row['OFFER']

    # Check if an offer exists for the retailer (ignore rows where offer is NaN)
    if not pd.isna(offer):
        # Check if the retailer has an entry in the dictionary
        if retailer not in retailer_product_dict:
            retailer_product_dict[retailer] = set()  # Use a set to store unique products

        # Add the product to the retailer's set of unique products
        retailer_product_dict[retailer].add(product)

    # Keep track of offers for each retailer
    if retailer not in retailer_offers:
        retailer_offers[retailer] = []

    # Add the offer to the retailer's list of offers
    retailer_offers[retailer].append(offer)

# Filter retailers where all offers are NaN
retailers_with_nan_offers = [retailer for retailer, offers in retailer_offers.items() if all(pd.isna(offer) for offer in offers)]

print("Retailer Product Dictionary:")
print(retailer_product_dict)

print("\nRetailers with All NaN Offers:")
print(retailers_with_nan_offers)


Retailer Product Dictionary:
{'caseys general store': {'mature', 'puffed snacks', 'chips', 'household supplies', 'cookies', 'candy', 'beer', 'bakery', 'tea', 'cooking & baking', 'jerky & dried meat', 'frozen desserts', 'carbonated soft drinks', 'frozen pizza & pizza snacks', 'energy drinks', 'water', 'packaged vegetables', 'tobacco products', 'nuts & seeds', 'fruit juices'}, 'united supermarkets': {'cooking & baking', 'carbonated soft drinks'}, 'amazon': {'cooking & baking', 'meal replacement beverages', 'sports drinks', 'dog supplies', 'nuts & seeds', 'fruit & vegetable snacks', 'carbonated soft drinks', 'medicines & treatments'}, 'cvs': {'cooking & baking', 'medicines & treatments', 'skin care', 'fruit juices'}, nan: {'meal replacement beverages', 'skin care', 'red pasta sauce', 'frozen appetizers', 'sports drinks', 'coffee', 'deli counter', 'frozen breakfast', 'sauces & marinades', 'fruit & vegetable snacks', 'dressings', 'chips', 'food storage', 'plant-based meat', 'household suppl

In [13]:
# Return a retailer's own offers, or the offers of similar retailers if it has none
def get_similar_retailers_and_offers(retailer_name):
    # If the retailer has at least one non-NaN offer, return its own non-NaN offers
    if retailer_name in retailer_product_dict:
        own_offers = ordered_df.loc[ordered_df['RETAILER'] == retailer_name, 'OFFER'].dropna().unique()
        return [(retailer_name, own_offers)]

    # Otherwise, score every retailer that has offers by product overlap
    similarity_scores = {}

    # Products associated with the retailer (rows with NaN offers included)
    products_for_retailer = set(ordered_df.loc[ordered_df['RETAILER'] == retailer_name, 'Product'].unique())

    for other_retailer, other_products in retailer_product_dict.items():
        # Jaccard similarity: |intersection| / |union|
        intersection = len(products_for_retailer & other_products)
        union = len(products_for_retailer | other_products)

        # Guard against division by zero when both product sets are empty
        similarity_scores[other_retailer] = intersection / union if union else 0.0

    # Sort similar retailers by similarity score in descending order
    sorted_similar_retailers = sorted(similarity_scores.items(), key=lambda x: x[1], reverse=True)

    # Keep retailers with a non-zero score, together with their non-NaN offers
    similar_retailer_offers = [
        (similar_retailer,
         ordered_df.loc[ordered_df['RETAILER'] == similar_retailer, 'OFFER'].dropna().unique())
        for similar_retailer, similarity_score in sorted_similar_retailers
        if similarity_score != 0
    ]

    # If there are similar retailer offers, return them
    if similar_retailer_offers:
        return similar_retailer_offers

    # No retailer shares any product with the input retailer
    return [(retailer_name, ["No offers"])]


In [14]:
# Set retailer_name to the (lowercase) retailer whose offers you want to look up
retailer_name = 'amazon'
similar_offers = get_similar_retailers_and_offers(retailer_name)

# Print the results
print("Similar Offers for Retailer:", retailer_name)
for similar_retailer, offers in similar_offers:
    print(similar_retailer, offers)

Similar Offers for Retailer: amazon
amazon ['MTN DEW® Kickstart, 16-ounce 12 count, select varieties, at Amazon Storefront*'
 'PepsiCo® Beverage, 7.5-ounce 10 pack, select varieties, at Amazon Storefront*'
 'PepsiCo® Variety Pack, select varieties, at Amazon Storefront*'
 'GATORADE® Fast Twitch®, 12-ounce 12 pack, at Amazon Storefront*'
 "Welch's® Zero Sugar Fruity Bites, 6 pack+, online at Amazon"
 "Welch's® Fruit Snacks, 250 count, online at Amazon"
 'Starry™ Lemon Lime Soda, 7.5-ounce 10 pack, at Amazon Storefront*'
 'Perfect Keto Barista Blend Collagen & MCTs, online at Amazon'
 'Perfect Keto Beauty + Sleep Collagen, online at Amazon'
 'Perfect Keto Bars, online at Amazon'
 'Perfect Keto Collagen Peptides, online at Amazon'
 'Perfect Keto MCT Oil Powder, online at Amazon'
 'Perfect Keto Super Reds, online at Amazon'
 'Perfect Keto Exogenous Ketones Powder, online at Amazon'
 'Kradle, select varieties, online at Amazon'
 'Stubborn Soda OR Bundaberg Ginger Beer, select varieties, at 