<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Sponsored-Search-Adword" data-toc-modified-id="Sponsored-Search-Adword-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Sponsored Search Adword</a></span><ul class="toc-item"><li><span><a href="#Dataset" data-toc-modified-id="Dataset-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Dataset</a></span></li><li><span><a href="#Greedy" data-toc-modified-id="Greedy-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Greedy</a></span></li><li><span><a href="#Balance" data-toc-modified-id="Balance-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Balance</a></span></li><li><span><a href="#MSVV" data-toc-modified-id="MSVV-1.4"><span class="toc-item-num">1.4&nbsp;&nbsp;</span>MSVV</a></span></li><li><span><a href="#Comparison" data-toc-modified-id="Comparison-1.5"><span class="toc-item-num">1.5&nbsp;&nbsp;</span>Comparison</a></span></li></ul></li><li><span><a href="#Reference" data-toc-modified-id="Reference-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Reference</a></span></li></ul></div>

In [1]:
import random
import pandas as pd
from copy import deepcopy
from operator import attrgetter
from typing import List, Dict

# Sponsored Search Adword

Whenever we go on favorite search engine, and enters our search query, often times we'll see ads appearing at the top of the search results page. Whenever that happens, it is basically saying an advertiser is willing to pay the search engine additional dollars for his/her website to be shown at a more prominent position.

At a high level the way it works is, advertisers bid on keywords for their items/websites, as well as budget. Bid indicates the maximum amount of money advertiser is willing to pay whenever a user clicks on their advertisement, and budget denotes the total maximum amount of momeny advertiser is willing to spend at some time duration, e.g. daily. When a user searches for that keyword, all the advertisers bidding on that keyword gets retrieved and the final result are displayed in some ranking order. The ranking order is typically determined by the product of the ad and keyword's click through rate and bid.

The sponsored search adword problem describes the task of assigning these incoming queries to advertisers so we maximize overall revenue while not exceeding advertiser's budget. We'll make a couple of assumptions to simplify the discussion:

- There will be one ad shown for each query.
- All advertisers have the same budget.
- All click-through rates are the same.

We'll walk through a toy example and see what happens when we solve for this type of problem using greedy algorithm. Suppose there are two advertisers A and B, and only two possible queries, query1 and query2. Advertiser A bids only on query1 for 0.99, while B bids on both query1 and query2 for 1. The budget for each advertiser is 2.

Let the sequence of incoming queries be query1, query1, query2, query2. A greedy algorithm is able to allocate
the first two query1 to advertiser B, whereas for subsequent two query2, where will be no ads to allocate, as advertiser A doesn't bid on it, and advertiser B has ran out of budget. The revenue for a greedy algorithm in this case is thus 2. An optimum offline algorithm, however, will allocate query1 to advertiser A and query2 to advertiser B, resulting in an optimal revenue of 3.98.

In the following section, we will experiment with 3 different algorithms on the adword problem and compare their performances.

## Dataset

Our dataset consists of two parts, one depicts advertisers' keyword bidding strategy as well as budget. The other being queries coming into the system.

In [2]:
df_bidder = pd.read_csv('bidder_dataset.csv')
df_bidder

Unnamed: 0,Advertiser,Keyword,Bid Value,Budget
0,0,lucius review,0.2,103.0
1,0,houston rockets,0.7,
2,0,mockingbird lane,0.5,
3,0,60 minutes,0.2,
4,1,new york nanny agencies,0.2,343.0
...,...,...,...,...
658,99,nexus 4,0.2,
659,99,eerie crate,0.3,
660,99,it's a mad king's world,0.4,
661,99,frankenstorm,0.9,


In [3]:
class BidderBid:

    __slots__ = ('advertiser_id', 'bid_value')

    def __init__(self, advertiser_id, bid_value):
        self.advertiser_id = advertiser_id
        self.bid_value = bid_value

    def __repr__(self):
        return f'Advertiser(advertiser_id: {self.advertiser_id}, bid_value: {self.bid_value})'
    
    
class BidderBudget:

    __slots__ = ('advertiser_id', 'budget')

    def __init__(self, advertiser_id, budget):
        self.advertiser_id = advertiser_id
        self.budget = budget

    def __repr__(self):
        return f'Advertiser(advertiser_id: {self.advertiser_id}, budget: {self.budget})'

In [4]:
# for each keyword, we store advertisers that are bidding for that keyword
keyword_bidder_bid_dict = {}
for keyword, advertiser, bid in zip(df_bidder['Keyword'], df_bidder['Advertiser'], df_bidder['Bid Value']):
    advertiser_bid = BidderBid(advertiser, bid)
    if keyword in keyword_bidder_bid_dict:
        keyword_bidder_bid_dict[keyword].append(advertiser_bid)
    else:
        keyword_bidder_bid_dict[keyword] = [advertiser_bid]

keyword_bidder_bid_dict

{'lucius review': [Advertiser(advertiser_id: 0, bid_value: 0.2),
  Advertiser(advertiser_id: 5, bid_value: 0.3),
  Advertiser(advertiser_id: 16, bid_value: 0.4),
  Advertiser(advertiser_id: 52, bid_value: 0.2),
  Advertiser(advertiser_id: 82, bid_value: 0.8),
  Advertiser(advertiser_id: 96, bid_value: 0.9)],
 'houston rockets': [Advertiser(advertiser_id: 0, bid_value: 0.7),
  Advertiser(advertiser_id: 1, bid_value: 0.8),
  Advertiser(advertiser_id: 15, bid_value: 0.9),
  Advertiser(advertiser_id: 19, bid_value: 0.2),
  Advertiser(advertiser_id: 27, bid_value: 0.8),
  Advertiser(advertiser_id: 47, bid_value: 0.9),
  Advertiser(advertiser_id: 62, bid_value: 0.5),
  Advertiser(advertiser_id: 82, bid_value: 0.5),
  Advertiser(advertiser_id: 95, bid_value: 0.9)],
 'mockingbird lane': [Advertiser(advertiser_id: 0, bid_value: 0.5),
  Advertiser(advertiser_id: 1, bid_value: 0.9),
  Advertiser(advertiser_id: 2, bid_value: 0.7),
  Advertiser(advertiser_id: 45, bid_value: 0.9),
  Advertiser(adver

In [5]:
# for each advertiser, we store its budget
bidder_budget_dict = {}
df_budget = df_bidder[df_bidder['Budget'] > 0]
for advertiser, budget in  zip(df_budget['Advertiser'], df_budget['Budget']):
    bidder_budget = BidderBudget(advertiser, budget)
    bidder_budget_dict[advertiser] = bidder_budget
    
bidder_budget_dict[0]

Advertiser(advertiser_id: 0, budget: 103.0)

In [6]:
with open('queries.txt') as f:
    queries = f.read().splitlines()

print('number of queries', len(queries))
queries[:5]

number of queries 23945


['ihsa football scores',
 'storm',
 'benghazi attack',
 'nba preseason stats',
 'macbook air']

## Greedy

In [7]:
def greedy(bidder_bids: List[BidderBid], remaining_budget_dict: Dict[int, BidderBudget]) -> BidderBid:
    sorted_bidder_bids = sorted(bidder_bids, key=attrgetter('bid_value'), reverse=True)
    for bidder_bid in sorted_bidder_bids:
        budget = remaining_budget_dict[bidder_bid.advertiser_id].budget
        if budget > bidder_bid.bid_value:
            return bidder_bid

    bidder_bid = BidderBid(advertiser_id=-1, bid_value=0)
    return bidder_bid

In [8]:
query = queries[0]
interested_bidders = keyword_bidder_bid_dict[query]
print('query:', query)
print('interested bidders: ', interested_bidders)
greedy(interested_bidders, bidder_budget_dict)

query: ihsa football scores
interested bidders:  [Advertiser(advertiser_id: 1, bid_value: 0.8), Advertiser(advertiser_id: 3, bid_value: 0.7), Advertiser(advertiser_id: 18, bid_value: 0.9), Advertiser(advertiser_id: 28, bid_value: 0.6), Advertiser(advertiser_id: 44, bid_value: 0.4), Advertiser(advertiser_id: 49, bid_value: 0.4), Advertiser(advertiser_id: 56, bid_value: 0.8), Advertiser(advertiser_id: 66, bid_value: 0.2)]


Advertiser(advertiser_id: 18, bid_value: 0.9)

In [9]:
random.seed(0)
random_queries = random.sample(queries, len(queries))

In [10]:
def adword(queries, keyword_bidder_bid_dict, bidder_budget_dict, algo):
    # we'll be modify the bidder's budget to keep track of remaining budget,
    # make a copy so we don't touch the original values
    remaining_budget_dict = deepcopy(bidder_budget_dict)
    total_revenue = 0
    for query in queries:
        interested_bidders = keyword_bidder_bid_dict[query]
        if algo == 'greedy':
            bidder_bid = greedy(interested_bidders, remaining_budget_dict)
        elif algo == 'balance':
            bidder_bid = balance(interested_bidders, remaining_budget_dict)
        elif algo == 'msvv':
            bidder_bid = msvv(interested_bidders, remaining_budget_dict, bidder_budget_dict)
        else:
            raise ValueError(f'not supported algo {algo}, use one of [greedy, balance, msvv]')

        if bidder_bid.advertiser_id > 0:
            bidder_budget = remaining_budget_dict[bidder_bid.advertiser_id]
            new_budget = bidder_budget.budget - bidder_bid.bid_value
            bidder_new_budget = BidderBudget(bidder_budget.advertiser_id, new_budget)
            remaining_budget_dict[bidder_bid.advertiser_id] = bidder_new_budget
            total_revenue += bidder_bid.bid_value

    return round(total_revenue, 1)

In [11]:
algo = 'greedy'
adword(random_queries, keyword_bidder_bid_dict, bidder_budget_dict, algo)

16724.9

## Balance

In [12]:
def balance(bidder_bids: List[BidderBid], remaining_budget_dict: Dict[int, BidderBudget]) -> BidderBid:
    bidder_bids_dict = {}
    bidder_budgets = []
    for bidder_bid in bidder_bids:
        bidder_bids_dict[bidder_bid.advertiser_id] = bidder_bid
        bidder_budget = remaining_budget_dict[bidder_bid.advertiser_id]
        bidder_budgets.append(bidder_budget)

    sorted_bidder_budget = sorted(bidder_budgets, key=attrgetter('budget'), reverse=True)
    for bidder_budget in sorted_bidder_budget:
        bidder_bid = bidder_bids_dict[bidder_budget.advertiser_id]
        if bidder_budget.budget > 0:
            return bidder_bids_dict[bidder_budget.advertiser_id]

    bidder_bid = BidderBid(advertiser_id=-1, bid_value=0)
    return bidder_bid

In [13]:
algo = 'balance'
adword(random_queries, keyword_bidder_bid_dict, bidder_budget_dict, algo)

12222.8

## MSVV

Allocate the next query to the bidder that maximizes the product of bid and 

ψ(x) = 1 − e
−(1−x)
Algorithm: Allocate the next query to the bidder i maximizing the product of his bid and
ψ(T(i)), where T(i) is the fraction of the bidder’s budget which has been spent so far, i.e.,
T(i) = mi
bi
, where bi
is the total budget of bidder i, mi
is the amount of money spent by bidder
i when the query arrives.

In [14]:
from math import exp
from operator import itemgetter

In [15]:
def msvv(
    bidder_bids: List[BidderBid],
    remaining_budget_dict: Dict[int, BidderBudget],
    bidder_budget_dict: Dict[int, BidderBudget]
) -> BidderBid:
    """
    """
    bidder_bid_scores = []
    for bidder_bid in bidder_bids:
        remaining_budget = remaining_budget_dict[bidder_bid.advertiser_id].budget
        budget = bidder_budget_dict[bidder_bid.advertiser_id].budget
        budget_spent_fraction = (budget - remaining_budget) / budget
        score = bidder_bid.bid_value * (1 - exp(budget_spent_fraction - 1))
        bidder_bid_score = (bidder_bid, score)
        bidder_bid_scores.append(bidder_bid_score)

    sorted_bidder_bid_scores = sorted(bidder_bid_scores, key=itemgetter(1), reverse=True)
    for bidder_bid, score in sorted_bidder_bid_scores:
        bidder_budget = remaining_budget_dict[bidder_bid.advertiser_id]
        if bidder_budget.budget > 0:
            return bidder_bid

    bidder_bid = BidderBid(advertiser_id=-1, bid_value=0)
    return bidder_bid

In [16]:
algo = 'msvv'
adword(random_queries, keyword_bidder_bid_dict, bidder_budget_dict, algo)

17439.2

## Comparison

In [17]:
algo = 'msvv'

samples = 100
total_revenue = 0.0
for _ in range(samples):
    random_queries = random.sample(queries, len(queries))
    revenue = adword(random_queries, keyword_bidder_bid_dict, bidder_budget_dict, algo)
    total_revenue += revenue

avg_revenue = total_revenue / samples
avg_revenue

17431.421000000002

In [18]:
algo = 'greedy'

samples = 100
total_revenue = 0.0
for _ in range(samples):
    random_queries = random.sample(queries, len(queries))
    revenue = adword(random_queries, keyword_bidder_bid_dict, bidder_budget_dict, algo)
    total_revenue += revenue

avg_revenue = total_revenue / samples
avg_revenue

16716.779000000002

# Reference

- http://infolab.stanford.edu/~ullman/mmds/ch8.pdf
- https://web.stanford.edu/~saberi/adwords.pdf