In [None]:
!rm -fr img
!rm -fr data
!git clone https://github.com/mciprian/ml_class_content.git
!mv ml_class_content/notebooks/img img
!mv ml_class_content/notebooks/data data
!rm data/creditcard.zip
!rm -fr ml_class_content/

# Monte Carlo Simulation

From Wikipedia:

"Monte Carlo methods, or Monte Carlo experiments, are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results. The underlying concept is to use randomness to solve problems that might be deterministic in principle. The name comes from the Monte Carlo Casino in Monaco, where the primary developer of the method, physicist Stanislaw Ulam, was inspired by his uncle's gambling habits."

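Let's estimate $\pi$ using a Monte Carlo approach. If points are drawn uniformly in the unit square, the fraction that falls inside the quarter circle of radius 1 approximates $\pi/4$: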



<img src="https://upload.wikimedia.org/wikipedia/commons/d/d4/Pi_monte_carlo_all.gif" alt="drawing" width="400"/>

In [None]:
import numpy as np

def is_inside_circle(x,y):
  return np.sqrt(x**2 + y**2) < 1

n_simulations = 100
points_in_circle = 0

x = np.random.uniform(0, 1, n_simulations)
y = np.random.uniform(0, 1, n_simulations)

# Complete the code:
# count the points that fall inside the circle and
# store the resulting estimate in the pi_estimate variable



pi_estimate
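
One possible way to complete the exercise above is sketched here. It is only an illustration, not the reference solution: the helper name `estimate_pi` and its `seed` argument are assumptions introduced for this example. It relies on the fact that the quarter circle covers $\pi/4$ of the unit square, so four times the fraction of points inside it approximates $\pi$.

```python
import numpy as np

def estimate_pi(n_simulations, seed=None):
    """Estimate pi from uniform points in the unit square (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0, 1, n_simulations)
    y = rng.uniform(0, 1, n_simulations)
    # A point lies inside the quarter circle when x^2 + y^2 < 1
    points_in_circle = np.count_nonzero(x**2 + y**2 < 1)
    # Area of quarter circle / area of square = pi / 4
    return 4 * points_in_circle / n_simulations

pi_estimate = estimate_pi(100_000, seed=0)
print(pi_estimate)
```

The estimate gets closer to $\pi$ as `n_simulations` grows, since the error shrinks roughly like $1/\sqrt{n}$.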

# Reinforcement Learning

<img src="https://miro.medium.com/v2/resize:fit:1400/format:webp/1*7cuAqjQ97x1H_sBIeAVVZg.png" alt="drawing" width="600"/>

From "[Reinforcement Learning 101](https://towardsdatascience.com/reinforcement-learning-101-e24b50e1d292)".



# A/B Testing and Multi-Armed Bandits

A/B testing is a primary tool in science for testing hypotheses by experimenting with variants. Depending on the explore-exploit trade-off and on how automated the application is, either A/B testing or a multi-armed bandit (MAB) should be used.

In A/B testing the trial period is fixed, whereas in a MAB every version competes in "real time" to win. As a result, a MAB usually costs less in industrial applications.

(Images from <a href="https://persona.ly/glossary/performance-metrics/creative-testing-multi-armed-bandit-vs-a-b-testing/">here</a> and <a href="https://www.optimizely.com/optimization-glossary/ab-testing/"> here</a>)
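
The cost difference between a fixed-split A/B test and a bandit can be illustrated with a small simulation. The sketch below is an assumption-laden toy: the two conversion rates (0.3 and 0.5), the number of rounds, and the function names `run_ab_test` and `run_thompson` are all made up for illustration. The bandit uses Thompson sampling with Beta(1, 1) priors, which is one common MAB strategy among several.

```python
import numpy as np

def run_ab_test(probs, n_rounds, rng):
    """Fixed design: rotate through the variants so each gets an equal share."""
    arms = np.arange(n_rounds) % len(probs)
    return int(np.sum(rng.random(n_rounds) < probs[arms]))

def run_thompson(probs, n_rounds, rng):
    """Bandit design: Thompson sampling with a Beta(1, 1) prior per arm."""
    success = np.zeros(len(probs))
    failure = np.zeros(len(probs))
    for _ in range(n_rounds):
        # Sample a plausible success rate for every arm and play the best one
        theta = rng.beta(1 + success, 1 + failure)
        arm = int(np.argmax(theta))
        if rng.random() < probs[arm]:
            success[arm] += 1
        else:
            failure[arm] += 1
    return int(success.sum())

rng = np.random.default_rng(0)
probs = np.array([0.3, 0.5])   # assumed true success rates (illustrative)
n_rounds, n_repeats = 1000, 50

ab_mean = np.mean([run_ab_test(probs, n_rounds, rng) for _ in range(n_repeats)])
ts_mean = np.mean([run_thompson(probs, n_rounds, rng) for _ in range(n_repeats)])
print("Average successes, A/B split:", ab_mean)
print("Average successes, Thompson sampling:", ts_mean)
```

The A/B design keeps sending half of the traffic to the weaker variant for the whole trial period, while the bandit shifts traffic toward the better variant as evidence accumulates.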

In [None]:
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure
from scipy.stats import beta
from IPython.display import YouTubeVideo

The next image represents an experiment configuration:

In [None]:
from IPython.display import Image
Image(filename="img/abtest.png")

How much time do you need to be confident in the test result?
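
A rough way to reason about this question is a sample-size calculation for comparing two proportions. The sketch below uses the standard normal-approximation formula for a two-sided test; the function name `sample_size_per_variant`, the rates 0.3 and 0.5, the significance level and the power are illustrative assumptions, not values taken from the course material.

```python
from math import ceil
from scipy.stats import norm

def sample_size_per_variant(p_a, p_b, alpha=0.05, power=0.8):
    """Approximate samples needed per variant to detect p_a vs p_b."""
    z_alpha = norm.ppf(1 - alpha / 2)   # two-sided significance threshold
    z_power = norm.ppf(power)           # quantile for the desired power
    variance = p_a * (1 - p_a) + p_b * (1 - p_b)
    return ceil((z_alpha + z_power) ** 2 * variance / (p_a - p_b) ** 2)

n_per_variant = sample_size_per_variant(0.3, 0.5)
print(n_per_variant)
```

The smaller the difference between the variants, the more samples (and therefore the more time) the test needs: the required size grows with the inverse square of the difference.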

In [None]:
Image(filename="img/abtest_cont.png")

### Let's study a financial example

<a href="https://www.bbvaaifactory.com/odile/">Hurricane Odile</a> (<a href="https://github.com/bbvadata/bbvadata_papers/raw/master/20160925-d4gx-bbva-ungp.pdf">paper</a>)

In [None]:
Image(filename="img/odile.png")

### Multi-armed bandit

Suppose you have five discounts (prices) that you can offer to your customers.

* Which price is best for **continuously** maximizing revenue?
* How much does each experiment cost over the time it runs?

In [None]:
Image(filename="img/mab.png")

In [None]:
YouTubeVideo('nkyDGGQ5h60',width=640, height=480)

The density function of a random variable $X$ that follows a Beta distribution (with parameters $\alpha, \beta$) is:

\begin{align}
f_X(x | \; \alpha, \beta ) = \frac{ x^{(\alpha - 1)}(1-x)^{ (\beta - 1) } }{B(\alpha, \beta) }
\end{align}

Note:

* With $\alpha=1, \beta=1$ we have the Uniform distribution on $[0, 1]$
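
The density and the note above can be checked numerically. In this sketch, $B(\alpha, \beta)$ is taken to be the Beta function (the normalizing constant), computed with `scipy.special.beta`. The helper name `beta_pdf` and the parameter values 3 and 8 are arbitrary choices for illustration.

```python
import numpy as np
from scipy.special import beta as beta_function
from scipy.stats import beta

def beta_pdf(x, a, b):
    """Beta density written out term by term, as in the formula above."""
    return x ** (a - 1) * (1 - x) ** (b - 1) / beta_function(a, b)

x = np.linspace(0.01, 0.99, 99)

# With alpha = beta = 1 the density is constant and equal to 1
uniform_case = beta_pdf(x, 1, 1)

# For other parameters the formula should match scipy's implementation
manual = beta_pdf(x, 3, 8)
reference = beta(3, 8).pdf(x)

print(np.allclose(uniform_case, 1.0), np.allclose(manual, reference))
```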

In [None]:
YouTubeVideo('juF3r12nM5A',width=640, height=480)

In [None]:
# Define the number of arms (or options)
n_arms = 10

# Define one beta and one alpha parameter for every arm (Beta(1, 1) is a uniform prior)
betas = np.ones(n_arms)
alphas = np.ones(n_arms)

# Track the number of trials and successes per arm
trials = np.zeros(n_arms, dtype=np.int16)
success = np.zeros(n_arms, dtype=np.int16)

best_arms = []  # history of the arm selected in each round
for round_idx in range(150):
    # Draw one sample from the current Beta distribution of every arm
    theta = np.random.beta(alphas + success, betas + trials - success)
    # Select the arm with the largest sampled value
    best_arm = np.argmax(theta)
    best_arms.append(best_arm)
    # print("Best arm %d in round %d" % (best_arm, round_idx))
    # Update results
    trials[best_arm] += 1

    # Simulate the feedback: arm 5 succeeds with probability 0.5, all others with 0.3
    if best_arm == 5:
        feedback = np.random.rand() < 0.5
    else:
        feedback = np.random.rand() < 0.3
    # If positive, success +1 in the given arm
    if feedback:
        success[best_arm] += 1

plt.bar(range(n_arms), success)
plt.title("Number of successes per arm")
plt.show()

x = np.linspace(0.001, 0.999, 200)
for arm in range(n_arms):
    y = beta(alphas[arm] + success[arm], betas[arm] + trials[arm] - success[arm]).pdf(x)
    plt.plot(x, y, label="arm {}".format(arm))
plt.legend(loc="upper right")
plt.title("Beta distribution of each arm")
plt.show()
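
Besides looking at the plots, the evidence collected for each arm can be summarized as the probability that the arm is the best one. The sketch below estimates it by Monte Carlo sampling from the Beta distributions, assuming Beta(1, 1) priors as in the experiment above. The function name `prob_best_arm` and the example counts are hypothetical and only serve to show the call. With the experiment's own arrays it would be called as `prob_best_arm(success, trials)`.

```python
import numpy as np

def prob_best_arm(success, trials, n_samples=20_000, seed=None):
    """Monte Carlo estimate of P(arm has the highest success rate)."""
    rng = np.random.default_rng(seed)
    success = np.asarray(success)
    failure = np.asarray(trials) - success
    # One row per simulated scenario, one column per arm
    samples = rng.beta(1 + success, 1 + failure, size=(n_samples, len(success)))
    winners = np.argmax(samples, axis=1)
    return np.bincount(winners, minlength=len(success)) / n_samples

# Made-up counts for three arms, for illustration only
example_success = np.array([3, 40, 2])
example_trials = np.array([12, 80, 10])

p_best = prob_best_arm(example_success, example_trials, seed=0)
print(p_best.round(3))
```

Arms with few trials keep wide distributions, so they retain some probability of being the best even when their observed success rate is low.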


  <span style="color:blue">Which are the best arms?</span>

  <span style="color:blue">Execute the experiment cell and plot the Beta distributions. Discuss the results you find in each execution.</span>

  <span style="color:blue">How many tests (number of trials) do you need to be confident?</span>

  <span style="color:blue">If there is a best arm, how much does exploring all the other arms cost?</span>

  <span style="color:blue">Exercise: Propose a 500-trial experiment in which the arms' success probabilities switch at iteration 250. Code it and analyze the results.</span>