In [1]:
import polars as pl

# Problem 1

The goal here is to estimate the expected win rate of a bid response at a given price.

The problem description states that outcomes are aggregated by price and application, so each pair of rows sharing a price, one with win=0 and one with win=1, covers all events recorded at that price.

The expected win rate for a price is therefore the number of won events divided by the total number of events at that price.

For example, the rows with ids [2, 3] (price=0.1) show 10000 bids, of which 3000 were won and 7000 were lost.

Therefore, the expected win rate is 3000/(3000 + 7000) = 0.3.

The cells below implement this calculation.

In [2]:
bids = pl.read_json("/home/iomallach/Desktop/verve.json").lazy()

In [3]:
bids.fetch(20)

app,bid_price,events,win
str,f64,i64,i64
"""A""",0.01,100000,0
"""A""",0.01,0,1
"""A""",0.1,7000,0
"""A""",0.1,3000,1
"""A""",0.2,8000000,0
"""A""",0.2,2000000,1
"""A""",0.4,700000,0
"""A""",0.4,300000,1
"""A""",0.5,80000,0
"""A""",0.5,20000,1


Here we take the win and loss counts for each price and compute n_event_1/(n_event_0 + n_event_1), which is the empirical probability of winning at that price.
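As a cross-check, the same ratio can be sketched in plain Python. This is only an illustration: the helper name `win_rates` is hypothetical, the rows are copied from the preview above, and the counts are keyed on the `win` flag rather than on row order.

```python
from collections import defaultdict

# Rows copied from the preview above: (app, bid_price, events, win)
rows = [
    ("A", 0.01, 100000, 0),
    ("A", 0.01, 0, 1),
    ("A", 0.1, 7000, 0),
    ("A", 0.1, 3000, 1),
    ("A", 0.2, 8000000, 0),
    ("A", 0.2, 2000000, 1),
    ("A", 0.4, 700000, 0),
    ("A", 0.4, 300000, 1),
    ("A", 0.5, 80000, 0),
    ("A", 0.5, 20000, 1),
]


def win_rates(rows):
    """Return {bid_price: wins / (losses + wins)} (hypothetical helper)."""
    counts = defaultdict(lambda: [0, 0])  # bid_price -> [losses, wins]
    for _app, price, events, win in rows:
        # The win flag (0 or 1) selects the slot, so row order does not matter.
        counts[price][win] += events
    return {
        price: wins / (losses + wins)
        for price, (losses, wins) in counts.items()
        if losses + wins > 0  # skip prices without any events
    }


rates = win_rates(rows)
for price, rate in rates.items():
    print(price, rate)
```

Keying on the flag keeps the result independent of how the rows happen to be sorted.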

If more features were available, this conditional probability could instead be estimated with a model such as logistic regression.

In [4]:
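# Scale float prices to integers so they can serve as group keys.
FLOAT_MULTIPLIER = 100
bid_stats = (bids
        .with_column((pl.col("bid_price") * FLOAT_MULTIPLIER).alias("dummy_group").cast(int))
        .groupby(by="dummy_group", maintain_order=True)
        # Collect the event counts of each price into a list.
        .agg(pl.list("events"))
        # Assumes every price has its loss row (win=0) first and its win row (win=1) last.
        .select([(pl.col("dummy_group") / FLOAT_MULTIPLIER).alias("bid_price"),
                 pl.col("events").arr.first().alias("event_0"),
                 pl.col("events").arr.last().alias("event_1")])
        # Win rate = wins / (losses + wins).
        .with_column((pl.col("event_1") / (pl.col("event_0") + pl.col("event_1"))).alias("expected_win_rate")))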

The table below shows the expected win rate for each price.

In [5]:
bid_stats.collect()

bid_price,event_0,event_1,expected_win_rate
f64,i64,i64,f64
0.01,100000,0,0.0
0.1,7000,3000,0.3
0.2,8000000,2000000,0.2
0.4,700000,300000,0.3
0.5,80000,20000,0.2
0.75,7000,3000,0.3
1.0,400,600,0.6
2.0,30,70,0.7
5.0,2,8,0.8
9.0,0,1,1.0


# Problem 2

Here the goal is to find the bid that maximizes expected net revenue under the given constraints.

The table from the first problem is enough to find this bid.

First, filter out every bid greater than or equal to what the advertiser is willing to pay, as well as every bid with zero expected win rate, since neither can be profitable.

That leaves only the bids with a positive expected net revenue.

For each remaining bid, the expected net revenue is the margin, i.e. the win value (0.5 in this case) minus the bid price, multiplied by the expected win rate.
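Written out, with $v$ the amount the advertiser pays per win, $b$ the bid price and $\hat{p}(b)$ the expected win rate from Problem 1 (symbols introduced here only for illustration, and assuming, as the calculation in the text does, that a winning bidder pays its own bid):

$$
\mathbb{E}[\text{net revenue} \mid b] = (v - b)\,\hat{p}(b)
$$

For $v = 0.5$ and $b = 0.1$, where $\hat{p}(0.1) = 0.3$, this gives $(0.5 - 0.1) \times 0.3 = 0.12$.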

In [7]:
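# Amount the advertiser pays for a win.
ADV_PAYMENT = 0.5
bid_candidates = (
    bid_stats
        # Keep only bids below the advertiser payment that have a non-zero win rate.
        .filter((pl.col("bid_price") < ADV_PAYMENT) & (pl.col("expected_win_rate") > 0))
        # Expected net revenue per bid: margin * win probability.
        .with_column(((pl.lit(ADV_PAYMENT) - pl.col("bid_price")) * pl.col("expected_win_rate")).alias("expected_net_revenue"))
        # Net revenue over the observed wins: margin * number of wins.
        .with_column(((pl.lit(ADV_PAYMENT) - pl.col("bid_price")) * pl.col("event_1")).alias("history_net_revenue"))
)
bid_candidates.collect()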

bid_price,event_0,event_1,expected_win_rate,expected_net_revenue,history_net_revenue
f64,i64,i64,f64,f64,f64
0.1,7000,3000,0.3,0.12,1200.0
0.2,8000000,2000000,0.2,0.06,600000.0
0.4,700000,300000,0.3,0.03,30000.0


The snippet below yields the bid with maximum expected net revenue

In [8]:
(
    bid_candidates
        .with_column(pl.max("expected_net_revenue").alias("max_exp_net_revenue"))
        .filter(pl.col("expected_net_revenue") == pl.col("max_exp_net_revenue"))
        .select(["bid_price", "expected_net_revenue"])
).collect()

bid_price,expected_net_revenue
f64,f64
0.1,0.12


For this problem it makes sense to always bid 0.1, since this bid has the highest expected net revenue.

There is a caveat, though: the data contains millions of events for some prices and only thousands for others.

The 10000 events at price=0.1 may not be enough to treat the estimated win rate as representative, so this result should be treated with care.
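One way to gauge how much sampling noise matters is to put an interval around each estimated win rate and carry it through to the expected net revenue. The sketch below uses the Wilson score interval, which is a choice made here and not something the problem prescribes. The helper `wilson_interval` and the `candidates` list are illustrative names, with the counts copied from the candidate table above.

```python
import math


def wilson_interval(wins, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = wins / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


ADV_PAYMENT = 0.5
# (bid_price, losses, wins) copied from the candidate table above
candidates = [(0.1, 7000, 3000), (0.2, 8000000, 2000000), (0.4, 700000, 300000)]

revenue_bounds = {}
for price, losses, wins in candidates:
    low, high = wilson_interval(wins, losses + wins)
    margin = ADV_PAYMENT - price
    # Scale the win-rate bounds by the margin to bound expected net revenue.
    revenue_bounds[price] = (margin * low, margin * high)
    print(f"price={price}: win rate in [{low:.4f}, {high:.4f}], "
          f"expected net revenue in [{margin * low:.4f}, {margin * high:.4f}]")
```

Such an interval reflects binomial sampling noise only. It does not show whether win rates observed at very different volumes are comparable.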