# Step#1

## Assumptions
* Every subcampaign is associated with a different class of customer (i.e. context)
* ...

## Variables and notation

| Variable | Description |
|---------:|:------------|
| $t$ | Time/Day  |
| $T$ | Time horizon |
| $j$ | A customer class/subcampaign |
| $N$ | Number of customer classes/subcampaigns |
| $p_{j,t}$ | Price at time $t$ for the customer class/subcampaign $j$ |
| $c_{j}$ | Conversion rate at price $p$ for the customer class/subcampaign $j$ |
| $m$ | Margin obtained by the sale at price $p$ |
| $x_{j,t}$ | Bid of subcampaign $j$ at time $t$ |
| $n_{j}$ | Number of clicks of new users of subcampaign $j$, given the value of the bid $x_{j,t}$ |
| $\tau_{j}$ | Number of times the user buys the item again within 30 days of the first purchase |
| $CPC_{j}$| Cost per click for the subcampaign $j$, given the value of the bid $x_{j,t}$ |

## Optimization problem
Objective function (maximization of the profit): 
* $ \underset{p_{j,t} , x_{j,t}} {\textrm{max}} \sum \limits _{t} ^{T} \sum \limits _{j} ^{N} n_{j}(x_{j,t}) \text{  } [ \text{  } c_{j}(p_{j,t}) \text{  } m(p_{j,t}) \text{  } (\tau_{j} + 1) - CPC_{j}(x_{j,t}) ] $

## Algorithm

***
$\mathbf{\text{Joint bidding/pricing optimization algorithm}}$<br>
***
For $t = 1$ to $T$, and for $j = 1$ to $N$ set:
* $p_{j,t}^{*}, x_{j,t}^{*} = \underset{p_{j,t} , x_{j,t}}{\textrm{arg max}} \text{  } n_{j}(x_{j,t}) \text{  } [ \text{  } c_{j}(p_{j,t}) \text{  } m(p_{j,t}) \text{  } (\tau_{j} + 1) - CPC_{j}(x_{j,t}) ]$

The values of all the parameters are known and available in $\Theta(1)$.  
Let $X$ be the number of candidate bids and $P$ the number of candidate prices. Evaluating every price/bid pair, the algorithm finds the optimal values $p_{j,t}^{*}, x_{j,t}^{*}$ with time complexity: $$ \Theta(T \text{ } N \text{ } X \text{ } P) $$
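
The exhaustive search for a single subcampaign can be sketched as follows. This is a minimal illustration, not the project's implementation: the function name `best_price_and_bid` and all the toy curves are hypothetical stand-ins for the known functions $n_j$, $c_j$, $m$ and $CPC_j$.

```python
from itertools import product

def best_price_and_bid(prices, bids, n, c, m, cpc, tau):
    """Exhaustively search the price/bid grid for one subcampaign.

    n, c, m and cpc are callables and tau is a number; all of them are
    hypothetical stand-ins for the known quantities of Step 1.
    """
    def profit(p, x):
        # Same expression as the objective: n(x) [ c(p) m(p) (tau + 1) - CPC(x) ]
        return n(x) * (c(p) * m(p) * (tau + 1) - cpc(x))

    # X * P evaluations: one per (price, bid) pair.
    return max(product(prices, bids), key=lambda px: profit(*px))

# Toy curves, invented purely for illustration.
prices = [10.0, 15.0, 20.0]
bids = [0.5, 1.0, 1.5]
n = lambda x: 100 * x          # clicks grow with the bid
c = lambda p: 1.0 - p / 25.0   # conversion rate falls with the price
m = lambda p: p - 5.0          # margin = price minus an assumed unit cost of 5
cpc = lambda x: 0.8 * x        # cost per click grows with the bid
tau = 1.0

p_star, x_star = best_price_and_bid(prices, bids, n, c, m, cpc, tau)
print(p_star, x_star)
```

Repeating this search for every $t$ and every $j$ gives the $T \cdot N \cdot X \cdot P$ evaluations counted above.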

# Step#2



In step 2 we consider the online optimization version of the problem we described in step 1. <br>
We have to find the best values for the price and bid without having full knowledge of the distributions governing our variables. To approximate these variables we have to sample values from the environment and build an increasingly accurate estimate of the underlying distributions. <br>
Each day is considered a different "round" of our problem and we consider a time horizon of 1 year.


$\mathbf{\text{Random Variables}}$<br>
In this section we list the variables of which we do not have full knowledge. These variables will be sampled at each round from a distribution in the environment. 

| Random Variable | Motivation |
|---------:|:------------|
| $CPC_{j}$| The cost per click is randomly extracted from a distribution |
| $c_{j}$ | The conversion (buying an item after visiting the site) is sampled from a distribution |
| $n_{j}$| The number of new users is randomly extracted at the 'start' of each day |
| $\tau_{j}$| The number of times the user buys again is sampled from a distribution after the first purchase |


$\mathbf{\text{Random Variable Models}}$<br> 
The random variables follow probability distributions which are set in the environment, and at each round a value is sampled from each of them.<br>
As each class has a different behaviour, the underlying distributions vary from class to class.<br>
In this section we present the distributions which govern the behaviour of each class.

##### Sampling
Each point in the graph represents the mean of the distribution the variable is associated with. <br>
Sampling from a variable means extracting a value from a distribution with the chosen parameter. <br>
In particular, when sampling the conversion rate, we sample from a Bernoulli distribution.
Meanwhile, when sampling the CPC, the number of new users and the future purchases, we sample from a Gaussian distribution with the chosen mean parameter and variance ??. 
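
The two sampling schemes can be sketched as follows. The helper names are hypothetical, and since the variance is not fixed in the text, the standard deviation `sigma` is an assumed parameter with an arbitrary value.

```python
import random

def sample_conversion(rate, rng):
    """Bernoulli draw: 1 if the visitor buys, 0 otherwise."""
    return 1 if rng.random() < rate else 0

def sample_gaussian(mean, sigma, rng):
    """Gaussian draw around the curve value; sigma is an assumed parameter."""
    return rng.gauss(mean, sigma)

rng = random.Random(0)  # fixed seed so the run is reproducible

# Hypothetical means read off the curves: conversion rate 0.3, CPC 0.8.
purchases = [sample_conversion(0.3, rng) for _ in range(1000)]
cpc_draws = [sample_gaussian(0.8, 0.1, rng) for _ in range(1000)]

conv_estimate = sum(purchases) / len(purchases)  # close to 0.3
cpc_estimate = sum(cpc_draws) / len(cpc_draws)   # close to 0.8
print(conv_estimate, cpc_estimate)
```

A Gaussian draw can be negative or non-integer, so an environment that samples counts this way would probably need to clip or round them; the text does not specify how this is handled.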


$\mathbf{\text{Conversion Rate}}$<br> 


The conversion rates differ for each class and have been modeled to follow the different behaviour of the users of each class.
In particular, C1 users tend to have a higher conversion rate as they are the most prone to buy our product, while C2 and C3 users follow a different behaviour with lower conversion rates.


<img src="Graphs/Conv_rates.png" width="300"/> <img src="Graphs/Conv_rates_agg.png" width="300"/>



$\mathbf{\text{Cost-Per-Click}}$<br> 



We modeled the CPC as a monotonically increasing function of the bid.

<img src="Graphs/CPC.png" width="300"/> <img src="Graphs/CPC_agg.png" width="300"/>



$\mathbf{\text{Daily Clicks}}$<br> 



Daily clicks are the number of users that visit our shop every day.
We modeled them as an increasing function of the bid which grows towards a different limit for each class.

<img src="Graphs/Daily_clicks.png" width="300"/> <img src="Graphs/Daily_clicks_agg.png" width="300"/>
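
One curve with the shape described above is a saturating exponential. The sketch below assumes this functional form; the function name `daily_clicks`, the limits `n_max` and the growth rates `k` are hypothetical values chosen for illustration, not the ones used to draw the graphs.

```python
import math

def daily_clicks(bid, n_max, k):
    """Mean daily clicks: increasing in the bid, saturating at n_max."""
    return n_max * (1.0 - math.exp(-k * bid))

# Hypothetical per-class limits and growth rates.
classes = {"C1": (100.0, 2.0), "C2": (60.0, 1.0), "C3": (40.0, 0.5)}

for name, (n_max, k) in classes.items():
    curve = [round(daily_clicks(b, n_max, k), 1) for b in (0.5, 1.0, 2.0)]
    print(name, curve)
```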



$\mathbf{\text{Future Purchases}}$<br> 



Future purchases are modeled with a decreasing function of the product price, different for each class.

<img src="Graphs/Future_purchases.png" width="300"/> <img src="Graphs/Future_purchases_agg.png" width="300"/>



$\mathbf{\text{Delays in the feedback}}$<br> 
In this step we can also introduce potential delays in the feedback. <br>
The delay we consider is that subsequent purchases by the same user are not observed instantaneously but after a certain time. For the sake of simplicity, we considered this time to be fixed, but it can also be implemented as a random variable where the delay is sampled from a probability distribution. 

| Delay | Description |
|---------:|:------------|
| $\alpha_{j}$| Delay in acquiring item again  | 
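
A fixed delay can be sketched as follows: the reward of the first purchases is observed on the same day, while the reward of the repeat purchases becomes visible `delay` days later. The function name and the split into `immediate` and `future` rewards are assumptions made for this illustration.

```python
from collections import defaultdict

def delayed_rewards(immediate, future, delay):
    """Return the reward observed on each day with a fixed feedback delay.

    immediate[t]: reward from first purchases made on day t
    future[t]:    reward from the repeat purchases of those same buyers
    delay:        fixed number of days before future[t] becomes visible
    """
    observed = defaultdict(float)
    for t, (now, later) in enumerate(zip(immediate, future)):
        observed[t] += now
        observed[t + delay] += later
    # Feedback that would arrive after the horizon is never observed.
    return [observed[t] for t in range(len(immediate))]

observed = delayed_rewards([10.0] * 4, [4.0] * 4, delay=2)
print(observed)  # [10.0, 10.0, 14.0, 14.0]
```

Replacing the constant `delay` with a value sampled for each day would give the random-delay variant mentioned above.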

$\mathbf{\text{Regret}}$<br> 

To solve the online optimization problem we use a multi-armed bandit (MAB) approach.
In the MAB approach the objective is to minimize the regret, defined as the cumulative difference between the reward $\mu^*$ of the clairvoyant algorithm, which always chooses the optimal arm, and the reward $\mu_t$ of the arm we choose at round $t$.







$ \underset{p_{j,t} , x_{j,t}} \min \rho $
<br>
<br>
$ \rho = T \cdot \mu^* - \sum_{t=1}^T \mu_t $
<br>
<br>
The rewards use the values sampled from the distributions described in the previous section and follow the formula:
<br>
<br>
$ \mu^* = {n_{j}^*} (x_{j,t}) \text{  } [ \text{  } c_{j}^*(p_{j,t}) \text{  } m(p_{j,t}) \text{  } (\tau_{j}^* + 1) - CPC^*_{j}(x_{j,t}) ] $
<br>
<br>
$ \mu_t = {n_{j,t}} (x_{j,t}) \text{  } [ \text{  } c_{j,t}(p_{j,t}) \text{  } m(p_{j,t}) \text{  } (\tau_{j,t} + 1) - CPC_{j,t}(x_{j,t}) ] $
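
Given the clairvoyant reward $\mu^*$ and the rewards $\mu_t$ collected round by round, the regret can be tracked with a running sum. This is a minimal sketch with invented numbers; `cumulative_regret` is a hypothetical helper name.

```python
from itertools import accumulate

def cumulative_regret(mu_star, rewards):
    """Running regret: after t rounds, t * mu_star minus the rewards collected."""
    return list(accumulate(mu_star - r for r in rewards))

rewards = [6.0, 8.0, 10.0, 10.0]           # invented per-round rewards mu_t
regret = cumulative_regret(10.0, rewards)  # mu_star = 10.0 is also invented
print(regret)  # [4.0, 6.0, 6.0, 6.0]
```

The last entry equals $T \cdot \mu^* - \sum_{t} \mu_t$ for the rounds played; a curve that flattens, as here, means the learner has started choosing the optimal arm.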