# Table of contents
1. [Purpose](#purpose)
2. [Update log](#update)
3. [Formula](#formula)
4. [Code](#code)
5. [Verifications](#verification)
6. [Sampling](#sampling)
7. [Information loss](#time_random)


![](https://i.ibb.co/VwzVbgh/pexels-nadi-lindsay-3521963.jpg)

Picture source (free license): https://www.pexels.com/photo/gingerbread-man-near-coffee-mug-3521963/

# Purpose <a id='purpose'></a>

The purpose of this notebook is to provide a sloppy baseline estimate of $\mathbb{P}(p_i|D)$. Many ideas were tested here; __none of them__ is a silver bullet for this competition. Let's collaborate and have some candies together during this holiday season.

# Update log: <a id='update'></a>

12/14/2020: Formula typo fix.

12/15/2020: 1. Added sampling from a custom distribution. 2. Changed the example from 100 to 20 trials. 3. Bug fix (introduced when reducing the trial number). 4. Added DF_t to the sampling example.

12/16/2020: Added examples of information loss, and explained that it may come from samples being generated by a strategy rather than at random.






# Formula <a id='formula'></a>

Posterior calculation

For the i-th slot machine, estimate $\mathbb{P}(t=0, i)=p_{i}$, its reward probability at $t=0$.

Suppose you pull the machine several times and observe $(d_k, r_k)$, where $d_{k}$ is the discounting factor and $r_{k}$ is the reward of the $k$-th pull.

The prior is trivial (uniform on $[0, 1]$) and the likelihood of each observation is $\mathbb{P}(D_{k}|D_{k-1}, p_i)=(d_k p_{i})^{r_{k}}(1 - d_k p_{i})^{1 - r_{k}}$, so we have:

$$\mathbb{P}(p_i|D) \propto \prod_{k}\left[(d_k p_{i})^{r_{k}}(1 - d_k p_{i})^{1 - r_{k}}\right].$$



**Step 1:** Normalization constant:
$$N = \int_{0}^1 \prod_{k}\left[(d_k p_{i})^{r_{k}}(1 - d_k p_{i})^{1 - r_{k}}\right] dp_i.$$


**Step 2:** Compute the posterior mean and std from the moments of order $m = 1, 2$:
$$\mathbb{E}(p_i^m|D) = \int_{0}^1 p_i^m\mathbb{P}(p_i|D) dp_i=\int_{0}^1 p_i^m \frac{\prod_{k}\left[(d_k p_{i})^{r_{k}}(1 - d_k p_{i})^{1 - r_{k}}\right]}{N} dp_i.$$
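
As a worked step (a sketch under the definitions above, not an extra result from the notebook): the std follows from the first two moments, and when every $d_k = 1$ the posterior reduces to a Beta distribution. The symbols $n$ (number of pulls) and $s=\sum_k r_k$ (number of rewards) are shorthand introduced only for this example.

$$\mathrm{std}(p_i|D) = \sqrt{\mathbb{E}(p_i^2|D) - \mathbb{E}(p_i|D)^2}$$

$$d_k = 1 \text{ for all } k: \quad \mathbb{P}(p_i|D) \propto p_i^{s}(1 - p_i)^{n - s} \quad\Rightarrow\quad p_i|D \sim \mathrm{Beta}(1 + s,\ 1 + n - s), \quad \mathbb{E}(p_i|D) = \frac{1 + s}{n + 2}.$$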

# Code <a id='code'></a>

In [None]:
from scipy.stats import beta
from scipy import integrate

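def pdf(p, weights, rewards, normalization=1):
    """Posterior density of p (unnormalized unless `normalization` is given)."""
    s = 1
    for weight, reward in zip(weights, rewards):
        if reward == 1:
            s *= (weight * p)      # reward: probability d_k * p
        else:
            s *= (1 - weight * p)  # no reward: probability 1 - d_k * p
    return s / normalization

def get_expected_mean_std(weights, rewards):
    """Return the posterior (mean, std) of p by numerical integration on [0, 1]."""
    # Step 1: normalization constant N.
    normalization = integrate.quad(
        lambda x: pdf(x, weights, rewards), 0, 1
    )[0]
    # Step 2: first and second moments of the normalized posterior.
    first_order = integrate.quad(
        lambda x: x * pdf(x, weights, rewards, normalization), 0, 1
    )[0]
    second_order = integrate.quad(
        lambda x: x * x * pdf(x, weights, rewards, normalization), 0, 1
    )[0]
    # Guard against a tiny negative variance caused by rounding error.
    variance = max(second_order - first_order ** 2, 0.0)
    return first_order, variance ** 0.5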
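
With many observations, the product inside `pdf` can become very small, so it may help to work with log-likelihoods instead. Below is a minimal sketch of that idea, assuming a uniform grid on $[0, 1]$ is accurate enough and that all weights are at most 1. The names `grid_posterior_mean_std` and `grid_size` are hypothetical and not part of the original notebook.

```python
import numpy as np


def grid_posterior_mean_std(weights, rewards, grid_size=2001):
    """Approximate the posterior mean and std of p on a uniform grid (sketch)."""
    p = np.linspace(0.0, 1.0, grid_size)
    w = np.asarray(weights, dtype=float).reshape(-1, 1)
    r = np.asarray(rewards).reshape(-1, 1)
    success = w * p  # shape: (number of pulls, grid_size)

    # Log-likelihood of each pull at each grid point; log(0) = -inf is expected
    # at the grid edges, so the divide-by-zero warning is silenced.
    with np.errstate(divide="ignore"):
        log_terms = np.where(r == 1, np.log(success), np.log1p(-success))
    log_like = log_terms.sum(axis=0)

    # Subtract the maximum before exponentiating to avoid underflow.
    density = np.exp(log_like - log_like.max())
    mass = density / density.sum()  # discrete approximation of the posterior

    mean = float((p * mass).sum())
    variance = float((p ** 2 * mass).sum()) - mean ** 2
    return mean, max(variance, 0.0) ** 0.5


print(grid_posterior_mean_std([1, 1, 1], [0, 1, 0]))
```

For the three undiscounted pulls above, the result should be close to the mean and std of Beta(2, 3), which gives a quick sanity check against `get_expected_mean_std`.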

# Matching the Beta distribution when there is no discounting factor <a id='verification'></a>

In [None]:
import numpy as np
trials = 20
discountings = [1 for _ in range(trials)]
rewards = [int(np.random.choice([0, 1], p=(0.3, 0.7))) for _ in range(trials)]

In [None]:
get_expected_mean_std(discountings, rewards)

In [None]:
(beta.mean(1 + sum(rewards), trials + 1 - sum(rewards)),
 beta.std(1 + sum(rewards), trials + 1 - sum(rewards)))

In [None]:
# Thanks Daniel for your post. 

for it in range(trials):
    discountings[it] = 0.97**it
    p_k = 0.7 * discountings[it]
    rewards[it] = int(np.random.choice([0, 1], p=(1-p_k, p_k)))

# Use the weighted (discounted) formula
print("Weighted approach gives mean, std:")
print(get_expected_mean_std(discountings, rewards))

# For comparison, do the same calculation with the Beta distribution (no weights):
print("The Beta distribution gives mean, std:")
print(
    (
        beta.mean(1 + sum(rewards), trials+1 - sum(rewards)),
        beta.std(1 + sum(rewards), trials+1 - sum(rewards))
    )
)

# Inverse Transform Sampling <a id='sampling'></a>

People may want to sample from this distribution; for example, see a nice post here: https://www.kaggle.com/ilialar/simple-multi-armed-bandit. Here is a function to sample from $p_i|D$. More information about inverse transform sampling: https://en.wikipedia.org/wiki/Inverse_transform_sampling. (The function below has no performance optimization at all. It takes ~500 ms to draw 1000 samples. -.-|||)

In [None]:
%%time
import numpy as np
from scipy.optimize import newton
import matplotlib.pyplot as plt
def sample_one(weights, rewards):
    normalization = integrate.quad(
        lambda x: pdf(x, weights, rewards), 0, 1
    )[0]
    cdf = lambda x: integrate.quad(
        lambda p: pdf(p, weights, rewards, normalization), 0, x
    )[0]
    x = float(np.random.uniform())
    return newton(lambda p: cdf(p) - x, x0=0.5)


DF_t = 0.5
samples = [sample_one([1, 1, 1], [0, 1, 0]) * DF_t for _ in range(1000)]
plt.hist(samples)
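
If speed matters, one possible alternative is to tabulate the CDF once on a grid and invert it by interpolation, instead of integrating and root-finding for every draw. The sketch below assumes that a uniform grid with linear interpolation is accurate enough. `sample_posterior_grid`, `grid_size` and `rng` are hypothetical names, not part of the original notebook.

```python
import numpy as np


def sample_posterior_grid(weights, rewards, size=1000, grid_size=2001, rng=None):
    """Draw `size` samples of p|D by inverting a tabulated CDF (sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    p = np.linspace(0.0, 1.0, grid_size)

    # Unnormalized posterior density on the grid (same product as pdf above).
    density = np.ones_like(p)
    for weight, reward in zip(weights, rewards):
        density *= weight * p if reward == 1 else 1.0 - weight * p

    # Cumulative trapezoid rule, then normalize so that cdf[-1] == 1.
    cdf = np.concatenate(([0.0], np.cumsum(0.5 * (density[1:] + density[:-1]))))
    cdf /= cdf[-1]

    # Inverse transform: map uniform draws through the inverse CDF.
    u = rng.uniform(size=size)
    return np.interp(u, cdf, p)


samples_grid = sample_posterior_grid([1, 1, 1], [0, 1, 0], size=1000)
```

As in the cell above, the samples can be multiplied by `DF_t` to express them at the current discounting factor.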

# Information loss <a id='time_random'></a>

What people observe is not a random sample; it is generated by their own strategy and their opponent's moves.

Now consider one oversimplified case:
* Your opponent pulled several times before you.
* You start with 5 exploration steps.
* You then keep pulling until you get one zero reward, at time $\tau$.
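
To get a feel for the first bullet: the simulation in this section represents the opponent's earlier pulls by a factor between 0.4 and 0.5 and uses a decay of 0.97 per pull. Assuming the factor is simply the decay raised to the number of opponent pulls (an interpretation, not something the notebook states), the sketch below converts a factor into a pull count. `pulls_for_factor` is a hypothetical helper.

```python
import math

DECAY = 0.97  # per-pull decay used in this notebook's examples


def pulls_for_factor(factor, decay=DECAY):
    """Number of pulls n such that decay ** n == factor (hypothetical helper)."""
    return math.log(factor) / math.log(decay)


for factor in (0.5, 0.4):
    print(f"factor {factor}: about {pulls_for_factor(factor):.1f} pulls")
```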

In [None]:
import numpy as np


def sample_oversimplified_strategy(p):
    sample = []
    for i in range(5):
        sample.append(int(np.random.uniform() < p))
    while np.random.uniform() < p:
        sample.append(1)
    sample.append(0)
    return sample
        

def generate_oversimplified_strategy_estimation(random_arrival_time=True):

    initial_probability = np.linspace(0, 0.95, 200)
    if random_arrival_time:
        priority = [
            float(np.random.uniform(0.4, 0.5)) for _ in range(200)
        ]
    else:
        priority = [
            1 for _ in range(200)
        ]

    observations = [
        sample_oversimplified_strategy(
            priority[i]* p
        ) for i, p in enumerate(initial_probability)
    ]

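    # Discount for pull k of machine i: the opponent's earlier pulls
    # (priority[i]) times the 0.97 decay per own pull. The inner index is
    # named k so that it does not shadow the machine index i.
    discountings = [
        [
            priority[i] * 0.97 ** k for k in range(len(ob))
        ] for i, ob in enumerate(observations)
    ]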


    return [
        get_expected_mean_std(df, ob)[0] for df, ob in zip(
            discountings, observations
        )
    ]

In [None]:
import matplotlib.pyplot as plt
plt.plot(generate_oversimplified_strategy_estimation(random_arrival_time=True), label="random arrival time")
plt.plot(generate_oversimplified_strategy_estimation(random_arrival_time=False), label="no opponent")
plt.legend()