# Deep Learning Exploration for Recurrent Shading Estimator

This file analyzes how deep learning could be implemented in this codebase for the
recurrent shading estimator model. Deep learning is mentioned in our writeup; here we
explore it in more detail, covering different model representations and interpretations.

Our objective for the shader is to find a coefficient that best shades our bids
as defined by the waterfall algorithm. That is, we aim to bid high enough to outbid
the other bidders and attain the reach for a given campaign, but low enough that we
do not overpay and give up profit.

We formalize the problem as follows. Let $f(x)$ be a convex, decreasing, piecewise
function that maps a bid shading to the index of the impression claimed, taken over
all bidders other than the player agent. Let $f^{-1}(x)$ be the inverse mapping from
the index of the impression claimed to a bid shading. Let $H$ be the hypothesis class
(the set of RNN parameter configurations, in our implementation) whose members $h$
map an impression index and state information $S$ to a bid shading.
For a given segment, we assume that there are $n$ total impressions and that our campaign
has a reach target of $R$.

Our objective function is as follows:

$$\text{argmin}_{h \in H} \; h(x, S)$$
$$\text{s.t.} \;\; h(n - R, S) > f^{-1}(n - R)$$

That is, we aim to bid just above what the other players would have bid, so that
we win at least the last $R$ impressions. This ensures that we achieve our reach
without overpaying. Note that we do not have access to the true function $f(x)$ and
instead treat $h(x, S)$ as an intermediate approximator for the function $f^{-1}(x)$.
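
To make the constraint concrete, the following is a minimal sketch under a strong simplifying assumption: the competing bids are known, with one competing bid per impression. Sorting them in decreasing order gives a toy stand-in for $f^{-1}$. The function name `threshold_bid` and the `epsilon` margin are hypothetical and are not part of our agent.

```python
def threshold_bid(competing_bids: list[float], reach: int, epsilon: float = 1e-6) -> float:
    """Smallest bid that still wins the last `reach` impressions.

    Assumption: one competing bid per impression. Sorted in decreasing
    order, the list plays the role of f^{-1}(index).
    """
    n = len(competing_bids)
    if not 0 < reach <= n:
        raise ValueError("reach must be between 1 and the number of impressions")
    f_inverse = sorted(competing_bids, reverse=True)
    # f_inverse[n - reach] is the highest competing bid among the last `reach` impressions.
    return f_inverse[n - reach] + epsilon


competing = [0.9, 0.7, 0.6, 0.4, 0.2]
bid = threshold_bid(competing, reach=2)
# Count the impressions on which our bid beats the competing bid.
won = sum(1 for b in competing if bid > b)
```

In practice the competing bids are unobserved, which is why the threshold has to be estimated by $h$ rather than read off directly.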

A question common to both approaches considered below is state featurization. Our
proposed method was to store known information (notably quality score, budget, and
known campaigns) in a vector. An autoencoder could then be trained to compress this
vector into a low-dimensional representation, which would serve as an input to the
recurrent network.
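
As a rough illustration of the raw vector that such an autoencoder would take as input, here is a minimal sketch. The `CampaignInfo` class, the `featurize_state` function, the chosen fields, and the zero-padding scheme are all assumptions made for this example; the autoencoder itself is not shown.

```python
from dataclasses import dataclass


@dataclass
class CampaignInfo:
    """Hypothetical summary of one known campaign."""
    reach: int
    budget: float
    end_day: int


def featurize_state(quality_score: float, campaigns: list[CampaignInfo],
                    day: int, max_campaigns: int = 3) -> list[float]:
    """Flatten known information into a fixed-length vector."""
    features = [quality_score, float(day)]
    for campaign in campaigns[:max_campaigns]:
        features.extend([
            float(campaign.reach),
            campaign.budget,
            float(campaign.end_day - day),  # days remaining
        ])
    # Zero-pad so the vector length does not depend on the campaign count.
    missing = max_campaigns - min(len(campaigns), max_campaigns)
    features.extend([0.0] * (3 * missing))
    return features


state = featurize_state(1.0, [CampaignInfo(reach=500, budget=400.0, end_day=3)], day=2)
```

A fixed length matters because both the autoencoder and the recurrent network expect inputs of constant dimension.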

We now consider two approaches that drastically affect how we use the
objective function: supervised learning and reinforcement learning.

## Section 1: Supervised Learning

In supervised learning (here regression, since our goal is to approximate a
continuous value), we fit a function of our features to approximate the true labels.
This assumes that we have some knowledge of the true labels. We can generate these
labels by running simulations of TA bots against each other and recording where each
threshold value falls.

However, this presupposes access to a simulator and the ability to run other bots
against each other in a training loop. Since we do not have direct access to other
users' implementations, we could only train on features and labels from training
loops with the provided TA bots.

The adversaries' policies heavily affect the training process. Supervised learning
attempts to fit a model of the true $f^{-1}$, but labels generated from TA bots reflect
the TA bots' bidding rather than that of the opponents we will actually face. If the
two differ, empirical risk minimization will not yield meaningful results, because we
are training on the wrong distribution.

This implies that as it stands, the supervised learning approach is limited by
our access to other bots as training examples. Nevertheless, we detail an approach
to training against other bots. 

In [None]:
from adx.tier1_ndays_ncampaign_agent import Tier1NDaysNCampaignsAgent
from adx.adx_game_simulator import AdXGameSimulator
import random
from adx.agents import NDaysNCampaignsAgent
from math import atan
from adx.structures import MarketSegment

# We first modify the AdXGameSimulator class with a new version of run_simulation.
# This method runs the simulation a given number of times and collects the bids
# of each agent on each day for the given market segment.
class ModifiedAdXGameSimulator(AdXGameSimulator):
    def calculate_effective_reach(self, x: int, R: int) -> float:
        return (2.0 / 4.08577) * (atan(4.08577 * ((x + 0.0) / R) - 3.08577) - atan(-3.08577))

    # Run the simulation num_simulations times and return, for each agent, the bids
    # recorded on each day on which the agent holds a campaign in target_segment.
    def run_simulation(self, agents: list[NDaysNCampaignsAgent], num_simulations: int, target_segment: MarketSegment) -> dict[NDaysNCampaignsAgent, list]:
        total_agent_bids = {agent : [] for agent in agents} # ADDED, THESE ARE THE BIDS FOR EACH AGENT
        total_profits = {agent : 0.0 for agent in agents}
        for i in range(num_simulations):
            self.states = self.init_agents(agents)
            self.campaigns = dict()
            # Initialize campaigns
            for agent in self.agents:
                agent.current_game = i + 1
                agent.my_campaigns = set()
                random_campaign = self.generate_campaign(start_day=1)
                agent_state = self.states[agent]
                random_campaign.budget = random_campaign.reach
                agent_state.add_campaign(random_campaign)
                agent.my_campaigns.add(random_campaign)
                self.campaigns[random_campaign.uid] = random_campaign

            for day in range(1, self.num_days + 1):
                for agent in self.agents:
                    agent.current_day = day

                # Generate new campaigns and filter
                if day + 1 < self.num_days + 1:
                    new_campaigns = [self.generate_campaign(start_day=day + 1) for _ in range(self.campaigns_per_day)]
                    new_campaigns = [c for c in new_campaigns if c.end_day <= self.num_days]
                    # Solicit campaign bids and run campaign auctions
                    agent_bids = dict()
                    for agent in self.agents:
                        agent_bids[agent] = agent.get_campaign_bids(new_campaigns)

                # Solicit ad bids from agents and run ad auctions
                ad_bids = []
                for agent in self.agents:
                    ad_bids.extend(agent.get_ad_bids())
                users = self.generate_auction_items(10000)
                self.run_ad_auctions(ad_bids, users, day)

                # Update campaign states, quality scores, and profits
                for agent in self.agents:
                    agent_state = self.states[agent]
                    todays_profit = 0.0
                    new_qs_count = 0
                    new_qs_val = 0.0

                    for campaign in agent_state.campaigns.values():
                        if campaign.start_day <= day <= campaign.end_day:
                            if day == campaign.end_day:
                                impressions = agent_state.impressions[campaign.uid]
                                total_cost = agent_state.spend[campaign.uid]
                                effective_reach = self.calculate_effective_reach(impressions, campaign.reach)
                                todays_profit += (effective_reach) * agent_state.budgets[campaign.uid] - total_cost

                                new_qs_count += 1
                                new_qs_val += effective_reach

                    if new_qs_count > 0:
                        new_qs_val /= new_qs_count
                        self.states[agent].quality_score = (1 - self.α) * self.states[agent].quality_score + self.α * new_qs_val
                        agent.quality_score = self.states[agent].quality_score

                    agent_state.profits += todays_profit
                    agent.profit += todays_profit
                
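                ## ADDED: WE SAVE THE BIDS FOR EACH AGENT WITH A CAMPAIGN IN THE TARGET SEGMENT
                # campaigns is a dict (see the .values() call above), so it has no
                # .intersection method; we inspect its values instead. This assumes each
                # campaign exposes its segment as `target_segment`.
                for agent in self.agents:
                    agent_campaigns = self.states[agent].campaigns.values()
                    if any(c.target_segment == target_segment for c in agent_campaigns):
                        total_agent_bids[agent].append(agent_bids[agent])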

                # Run campaign auctions
                self.run_campaign_auctions(agent_bids, new_campaigns)
                # Run campaign endowments
                for agent in self.agents:
                    if random.random() < min(1, agent.quality_score):
                        random_campaign = self.generate_campaign(start_day=day)
                        agent_state = self.states[agent]
                        random_campaign.budget = random_campaign.reach
                        agent_state.add_campaign(random_campaign)
                        agent.my_campaigns.add(random_campaign)
                        self.campaigns[random_campaign.uid] = random_campaign

        # PRINTING OMITTED
        #     for agent in self.agents:
        #         total_profits[agent] += self.states[agent].profits 
        #     self.print_game_results()
        # self.print_final_results(total_profits, num_simulations)

        return total_agent_bids

# 10 sample agents. 
test_agents = [Tier1NDaysNCampaignsAgent(name=f"Agent {i + 1}") for i in range(10)]
simulator = ModifiedAdXGameSimulator()

# We arbitrarily pick the Male, Young, LowIncome segment. The same procedure could be
# applied to all market segments and generalized accordingly.
sampleSegment = MarketSegment(("Male", "Young", "LowIncome"))
bids = simulator.run_simulation(test_agents, 10, sampleSegment)

In [None]:
from adx.adx_game_simulator import CONFIG

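# Optimal shading based on the aforementioned waterfall algorithm.
# `bids` is treated as one sequence of numeric bids per recorded day: we sum
# position-wise across days and normalize by the segment's expected population.
def optimal_shading(bids: list[list[float]], segment: MarketSegment) -> list[float]:
    expected_population = CONFIG["market_segment_pop"][segment]
    total_bids = [sum(bid) for bid in zip(*bids)]
    return [bid / expected_population for bid in total_bids]

# The loop variable is named agent_bids so that it does not shadow the simulation output.
bids = [optimal_shading(agent_bids, sampleSegment) for agent_bids in bids.values()]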

With the derived bids and populations, we can then run a training loop that trains
our recurrent network on this information.
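
The shape of such a training loop can be sketched as follows. This is a toy illustration only: it uses a scalar Elman-style RNN with five parameters, finite-difference gradients instead of backpropagation, and placeholder labels invented for the example rather than labels from the simulation. All names (`rnn_forward`, `mse`, `train`) are hypothetical. A real implementation would use an automatic differentiation library and the state vector described earlier.

```python
import math
import random


def rnn_forward(params: list[float], xs: list[float]) -> list[float]:
    """Scalar RNN: h_t = tanh(w_x*x_t + w_h*h_{t-1} + b), y_t = w_o*h_t + b_o."""
    w_x, w_h, b, w_o, b_o = params
    h, outputs = 0.0, []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h + b)
        outputs.append(w_o * h + b_o)
    return outputs


def mse(params: list[float], xs: list[float], ys: list[float]) -> float:
    """Mean squared error between predicted and target shadings."""
    preds = rnn_forward(params, xs)
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


def train(xs: list[float], ys: list[float], epochs: int = 200,
          lr: float = 0.1, delta: float = 1e-5, seed: int = 0):
    """Gradient descent with forward finite-difference gradients."""
    rng = random.Random(seed)
    params = [rng.uniform(-0.5, 0.5) for _ in range(5)]
    history = []
    for _ in range(epochs):
        base = mse(params, xs, ys)
        history.append(base)
        grads = []
        for i in range(len(params)):
            bumped = params.copy()
            bumped[i] += delta
            grads.append((mse(bumped, xs, ys) - base) / delta)
        params = [p - lr * g for p, g in zip(params, grads)]
    return params, history


xs = [i / 10 for i in range(10)]   # normalized impression index
ys = [1.0 - 0.8 * x for x in xs]   # placeholder decreasing shading labels
params, history = train(xs, ys)
```

Here `history` records the loss at the start of each epoch, so comparing its first and last entries gives a quick check that the loop is learning.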

## Section 2: Reinforcement Learning

Reinforcement learning addresses some issues raised by the supervised learning approach,
but raises problems of its own. We treat the problem as a POMDP, since other agents'
valuations and overall campaign information are unknown to us. Accordingly, we can
only use information from previous days, notably how many impressions we were able
to win in any given MarketSegment.
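
One way to picture the partial observability is to write down what the agent sees and what it is rewarded for. The sketch below is an assumption-laden illustration: the `Observation` layout and the `campaign_reward` helper are hypothetical, while the effective reach formula and the profit expression mirror `calculate_effective_reach` and the `todays_profit` update in the simulator cell above.

```python
from dataclasses import dataclass
from math import atan


@dataclass
class Observation:
    """What the agent can see at the start of a day (hypothetical layout)."""
    day: int
    quality_score: float
    impressions_won: dict[str, int]  # segment name -> impressions won so far


def effective_reach(impressions: int, reach: int) -> float:
    # Same sigmoidal form as calculate_effective_reach in the simulator cell.
    return (2.0 / 4.08577) * (atan(4.08577 * (impressions / reach) - 3.08577) - atan(-3.08577))


def campaign_reward(impressions: int, reach: int, budget: float, spend: float) -> float:
    """End-of-campaign reward: effective reach times budget, minus spend."""
    return effective_reach(impressions, reach) * budget - spend


obs = Observation(day=2, quality_score=1.0, impressions_won={"Male_Young_LowIncome": 120})
reward = campaign_reward(impressions=100, reach=100, budget=50.0, spend=20.0)
```

Other agents' bids and campaigns do not appear in `Observation`, which is what makes the process partially observable.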

We considered self-play as an option. 