## AB Testing Simulation

Now that we've finished the AB testing calculation given the data, how do we get the data in the first place?

The goal of this exercise is to create data like the data you saw in the previous lesson. Here are some parameters:
- There are 14 days and 1000 users
- Assume user ids range from 0 to 999, hint: `list(range(5))`
- Each user has a 50% chance of visiting each day. If they don't visit, no views are created.
- A user should be assigned to the same treatment/control group across the days!
- Only 5% of the users should be assigned to treatment 
- A user's views follow a $Geometric(p=0.1)$ distribution. Hint: `numpy.random.geometric`
- A user's clicks **given** their views follow a $Binomial(n=views, p_*)$ distribution, where $p_a$ and $p_b$ are the probabilities of a click given a view for groups A and B. Both should be parametrized, i.e. coded so we can change them later. Hint: `numpy.random.binomial`
- Compile all the data into 2 matrices, each 1000 by 14, called `clicks` and `views`. Hint: `numpy.vstack()` or `numpy.hstack()`
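
The hinted NumPy calls can be tried out on their own before building the full simulation. The snippet below is only a sketch: the seed and the click probability `p_click` are placeholder values chosen for illustration, not part of the exercise.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only so the sketch is repeatable

# Views for one visit: Geometric(p=0.1) counts trials up to the first success,
# so every draw is at least 1 and the long-run mean is 1 / 0.1 = 10.
views_sample = np.random.geometric(0.1, size=5)

# Clicks given views: one Binomial(n=views, p) draw per entry.
p_click = 0.05  # placeholder click probability
clicks_sample = np.random.binomial(views_sample, p_click)

# vstack puts 1-D arrays on top of each other as rows,
# hstack joins them end to end.
stacked_rows = np.vstack([views_sample, clicks_sample])  # shape (2, 5)
joined = np.hstack([views_sample, clicks_sample])        # shape (10,)
```

Because clicks are drawn with `n=views`, a user can never have more clicks than views.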


## Task 1

Create a variable `assign` that contains the treatment/control status for each user.
- What data type can this be? Later we will need to pull up user X's treatment status quickly!


In [33]:
import numpy as np

# assign can be a list, an array, or a dictionary keyed by user id
n = 1000

In [36]:
assign = np.random.choice(['Treat', 'Control'], size=n, p=[0.05, 0.95])

In [37]:
assign[:10]

array(['Control', 'Control', 'Control', 'Control', 'Control', 'Treat',
       'Control', 'Control', 'Control', 'Control'], dtype='<U7')
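
Since user ids run from 0 to 999, indexing the array by id is already a fast lookup. If the ids were not consecutive integers, a dictionary would be one alternative. The sketch below assumes the same labels as the cell above; `assign_dict` and `treat_share` are illustrative names.

```python
import numpy as np

n = 1000
labels = np.random.choice(['Treat', 'Control'], size=n, p=[0.05, 0.95])

# Map each user id (0 to n-1) to its group label.
assign_dict = dict(zip(range(n), labels))

# Share of users that landed in treatment. Each user is assigned independently,
# so this varies around 0.05 from run to run rather than being exactly 5%.
treat_share = np.mean(labels == 'Treat')
```

Either structure is built once, before any day is simulated, which is what keeps a user in the same group across all 14 days.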

## Task 2

Simulate a single user's visit on a single day then wrap this inside a function.
- Given a user's id, return two numbers: the number of views and the number of clicks
  - hint: you can return 2 values like
  ```
  def my_func(x):
      return x, x + 1
  ```
  - Question: do we need to pass the assignment variable into the function?

In [39]:
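import numpy as np

def sim_click_views(ctr_prob, view_p=0.1, login_rate=0.5):
    # With probability 1 - login_rate the user does not visit: no views, no clicks
    if np.random.uniform() >= login_rate:
        return 0, 0
    # Views on a visiting day follow Geometric(view_p)
    views = np.random.geometric(view_p)
    # Clicks given views follow Binomial(views, ctr_prob)
    clicks = np.random.binomial(views, ctr_prob)
    return views, clicks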

In [40]:
sim_click_views(0.05)

(3, 0)
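
One possible answer to the question above is to pass the assignment into a thin wrapper that looks up the user's group and then picks the click probability. This is a sketch only: `sim_user_day`, `p_control`, and `p_treat` are hypothetical names, and the default probabilities are just the values used later in this notebook.

```python
import numpy as np

def sim_click_views(ctr_prob, view_p=0.1, login_rate=0.5):
    # Same logic as the cell above: no visit means no views and no clicks.
    if np.random.uniform() >= login_rate:
        return 0, 0
    views = np.random.geometric(view_p)
    clicks = np.random.binomial(views, ctr_prob)
    return views, clicks

def sim_user_day(uid, assign, p_control=0.05, p_treat=0.1):
    """Hypothetical wrapper: look up the user's group, then simulate one day."""
    ctr = p_treat if assign[uid] == 'Treat' else p_control
    return sim_click_views(ctr)

assign = np.random.choice(['Treat', 'Control'], size=1000, p=[0.05, 0.95])
views, clicks = sim_user_day(7, assign)
```

Keeping the group lookup outside `sim_click_views` means the sampling function only needs a probability, which makes it easy to test on its own.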

## Task 3

Finish the whole task! Try writing a loop within another loop, e.g.
```
for i in range(3):
    for j in range(4):
        print(i + j)
```

In [38]:
days = 14
views = np.zeros((n, days))
clicks = views.copy()

In [41]:
for day in range(days):
    for uid in range(n):
        # Labels are 'Treat' and 'Control' (capitalized), so the comparison must match
        ctr = 0.05 if assign[uid] == 'Control' else 0.1
        view, click = sim_click_views(ctr)
        views[uid, day] = view
        clicks[uid, day] = click
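
The nested loop is the approach the task asks for. As an aside, the same simulation can be sketched without loops by drawing whole matrices at once. This is an alternative under the same parameters, not the intended solution, and the names `p_control`, `p_treat`, `ctr`, and `visited` are illustrative.

```python
import numpy as np

n, days = 1000, 14
p_control, p_treat = 0.05, 0.1  # same click probabilities as the loop version

assign = np.random.choice(['Treat', 'Control'], size=n, p=[0.05, 0.95])

# One click probability per user, shaped (n, 1) so it broadcasts across days.
ctr = np.where(assign == 'Treat', p_treat, p_control)[:, None]

# visited[u, d] is True when user u shows up on day d (50% chance).
visited = np.random.uniform(size=(n, days)) < 0.5

# Draw views for every user and day, then zero out the days without a visit.
views = np.random.geometric(0.1, size=(n, days)) * visited

# A binomial draw with n=0 is always 0, so non-visit days get 0 clicks.
clicks = np.random.binomial(views, ctr)
```

Both matrices come out with shape `(1000, 14)`, one row per user and one column per day.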

## Task 4

Write the data to a file then try to download it!
Hint: `numpy.savetxt`

In [43]:
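# fmt='%d' writes the counts as whole numbers instead of the default scientific notation
np.savetxt("sim_ab_views.csv", views, delimiter=',', fmt='%d')
np.savetxt("sim_ab_clicks.csv", clicks, delimiter=',', fmt='%d')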
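
To check that a file was written as expected, it can be read back with `numpy.loadtxt` and compared with the original array. The sketch below uses a tiny stand-in matrix and a hypothetical file name, `demo_views.csv`, so it does not overwrite the files above.

```python
import numpy as np

demo = np.array([[3, 0, 12], [0, 7, 1]])  # tiny stand-in for the `views` matrix

# fmt='%d' writes whole numbers instead of the default scientific notation.
np.savetxt("demo_views.csv", demo, delimiter=',', fmt='%d')

# Read the file back and confirm the round trip preserved every value.
loaded = np.loadtxt("demo_views.csv", delimiter=',', dtype=int)
same = np.array_equal(demo, loaded)
```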