# Scaling

We have our first implementation! I can use this to run one experiment.

Now let's look at running more than one experiment at a time. First, I'll generalize my rules a bit:

* ① Given an experimental unit, return the value of the variant to show
* ② The same experimental unit is assigned to the same variant
* ③ Different experimental units are randomly assigned
* ④ The proportion of experimental units assigned to each variant matches the experiment's configured proportion.
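
Rules ② and ④ can be checked mechanically. This is a minimal sketch: `check_rules` is a hypothetical helper, and `hash_func` is an MD5-based stand-in for `spoilers.hash_func` (an assumption, since that helper isn't shown here). Rule ③ needs a statistical test, so this sketch doesn't cover it.

```python
import hashlib
from collections import Counter

def hash_func(key):
    # Assumption: stand-in for spoilers.hash_func, a stable hash of a string key.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

def choose_color_assignment(user_id):
    # Rule 1: given an experimental unit, return the variant to show.
    key = "{}|color".format(user_id)
    return ["red", "blue"][hash_func(key) % 2]

def check_rules(assign, expected, n=10000, tolerance=0.03):
    """Hypothetical checker for rules 2 and 4."""
    user_ids = range(n)
    first = [assign(u) for u in user_ids]
    second = [assign(u) for u in user_ids]
    # Rule 2: the same unit gets the same variant on every call.
    deterministic = first == second
    # Rule 4: each variant's observed share is within `tolerance` of its expected share.
    counts = Counter(first)
    proportions_ok = all(
        abs(counts[variant] / n - share) <= tolerance
        for variant, share in expected.items()
    )
    return deterministic, proportions_ok

print(check_rules(choose_color_assignment, {"red": 0.5, "blue": 0.5}))
```

An assignment that always returns `"red"` would pass rule ② but fail rule ④, which is why both checks are reported separately.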

Now I'll run two experiments:

* The color experiment, with `user_id` as the experimental unit and a split of 50% red and 50% blue.
* The size experiment, with `user_id` as the experimental unit and a split of 50% big and 50% small.

There are a few approaches I can take for scaling.

- **Scheduling**: Coordinate with whoever is running the other experiment, and only run one at a time.
- **Independent**: Pretend the other experiment doesn't exist and treat its effects as another unknown variable. 
- **Splitting traffic**: Split traffic between experiments. I won't demonstrate that in these notebooks, but the idea is to take the assignment groups introduced in [4.Rollout](4.Rollout.html) and use them to divide traffic between experiments.
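
The notebooks don't implement traffic splitting, but here is a minimal sketch of the general idea under my own assumptions. It doesn't use the assignment groups from 4.Rollout; instead it hashes each user into 100 hypothetical buckets using a shared, hypothetical `traffic-split` salt, gives half the buckets to each experiment, and then uses a per-experiment salt to pick the variant. `hash_func` is an MD5-based stand-in for `spoilers.hash_func`.

```python
import hashlib

def hash_func(key):
    # Assumption: stand-in for spoilers.hash_func.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

N_BUCKETS = 100
# Hypothetical split: buckets 0-49 run the color experiment, 50-99 run size.
EXPERIMENT_BUCKETS = {"color": range(0, 50), "size": range(50, 100)}
VARIANTS = {"color": ["red", "blue"], "size": ["big", "small"]}

def choose_experiment(user_id):
    # One shared salt decides which experiment (if any) a user belongs to.
    bucket = hash_func("{}|traffic-split".format(user_id)) % N_BUCKETS
    for name, buckets in EXPERIMENT_BUCKETS.items():
        if bucket in buckets:
            return name
    return None  # users in unassigned buckets would see no experiment

def choose_assignment(user_id):
    experiment = choose_experiment(user_id)
    if experiment is None:
        return None, None
    # Inside the experiment, a per-experiment salt picks the variant.
    variants = VARIANTS[experiment]
    i = hash_func("{}|{}".format(user_id, experiment)) % len(variants)
    return experiment, variants[i]

for user_id in range(5):
    print(user_id, choose_assignment(user_id))
```

Each user lands in at most one experiment, so the experiments can't interact, at the cost of each one getting only part of the traffic.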


Picturing the last two:
![two experiments](files/2experiments.png)

![split traffic](files/split-traffic.png)

```python
import pandas as pd

from utils import spoilers
from utils.pretty import pp
from utils.simulate import n_different_users
```

## Independent experiments

Here's an example of independent experiments. I'll implement `choose_size_assignment` much like `choose_color_assignment`, but with its own salt (`size` instead of `color`) in the hash key.

```python
def choose_color_assignment(user_id):
    # The "color" salt ties this hash to the color experiment.
    key = "{}|color".format(user_id)
    assignment_i = spoilers.hash_func(key) % 2
    return ['red', 'blue'][assignment_i]

def choose_size_assignment(user_id):
    # A different salt ("size") shuffles users differently for the size experiment.
    key = "{}|size".format(user_id)
    assignment_i = spoilers.hash_func(key) % 2
    return ['big', 'small'][assignment_i]
```
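
One way to avoid copy-pasting the salt (my suggestion, not how these notebooks do it) is to build each assignment function from the experiment's salt and variants, so the salt is written exactly once. `make_assignment_func` is hypothetical, and `hash_func` is an MD5-based stand-in for `spoilers.hash_func`, so the actual variants may differ from the notebook's.

```python
import hashlib

def hash_func(key):
    # Assumption: stand-in for spoilers.hash_func, stable across runs and machines.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

def make_assignment_func(salt, variants):
    """Hypothetical factory: the salt is bound once, so it can't be mistyped later."""
    def choose_assignment(user_id):
        key = "{}|{}".format(user_id, salt)
        return variants[hash_func(key) % len(variants)]
    return choose_assignment

choose_color_assignment = make_assignment_func("color", ["red", "blue"])
choose_size_assignment = make_assignment_func("size", ["big", "small"])

print(choose_color_assignment(42), choose_size_assignment(42))
```

This still depends on every experiment getting a unique salt; it only removes the chance of a salt and its experiment drifting apart through copy-paste.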

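`choose_color_assignment` is the same function from last time, which I already showed meets the rules above, and `choose_size_assignment` differs only in its salt and variants. What I still want to show is that the two assignments are independent.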

## `user_id` assignments to the size and color experiments are independent

I'll look at a few users and compare how each was assigned to the color and size experiments.

The color assignment shouldn't influence the size assignment.

```python
pp(
    pd.merge(
        n_different_users(choose_color_assignment, n=20),
        n_different_users(choose_size_assignment, n=20, key='size'),
        on='user_id'
    )
)
```

| user_id | color | size |
|---|---|---|
| 0 | blue | big |
| 1 | blue | big |
| 2 | red | small |
| 3 | blue | small |
| 4 | blue | small |
| 5 | blue | big |
| 6 | red | big |
| 7 | blue | small |
| 8 | red | big |
| 9 | red | big |


Across many users, each of the four color and size combinations should get about 25% of users.

```python
pd.merge(
    n_different_users(choose_color_assignment, n=10000),
    n_different_users(choose_size_assignment, n=10000, key='size'),
    on='user_id'
).groupby(['color', 'size']).count()
```

| color | size | user_id (count) |
|---|---|---|
| blue | big | 2480 |
| blue | small | 2524 |
| red | big | 2504 |
| red | small | 2492 |

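To put a number on "about 25% each", one option (my choice, not something the notebook does) is a Pearson chi-square goodness-of-fit statistic against an even 25% split, using the counts from the table above. With four cells there are 3 degrees of freedom, and the 5% critical value is about 7.815.

```python
def chi_square_statistic(observed, expected_share):
    """Pearson chi-square statistic for observed counts vs. expected shares."""
    total = sum(observed.values())
    stat = 0.0
    for cell, count in observed.items():
        expected = total * expected_share[cell]
        stat += (count - expected) ** 2 / expected
    return stat

# Counts copied from the color x size table above.
observed = {
    ("blue", "big"): 2480,
    ("blue", "small"): 2524,
    ("red", "big"): 2504,
    ("red", "small"): 2492,
}
even_split = {cell: 0.25 for cell in observed}

stat = chi_square_statistic(observed, even_split)
print(round(stat, 4))  # 0.4224
```

A statistic of 0.4224 is far below 7.815, so these counts are consistent with an even split across all four combinations.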

## Independence done wrong

I haven't shown a bad example for a while. Let's see what happens if I forget to update the salt for the size experiment.

`bad_choose_size_assignment` uses the same salt as `choose_color_assignment`.

```python
def bad_choose_size_assignment(user_id):
    # Bug: copy-pasted the "color" salt instead of using "size".
    key = "{}|color".format(user_id)
    assignment_i = spoilers.hash_func(key) % 2
    return ['big', 'small'][assignment_i]
```

```python
pp(
    pd.merge(
        n_different_users(choose_color_assignment, n=100),
        n_different_users(bad_choose_size_assignment, n=100, key='size'),
        on='user_id'
    )
)
```

| user_id | color | size |
|---|---|---|
| 0 | blue | small |
| 1 | blue | small |
| 2 | red | big |
| 3 | blue | small |
| 4 | blue | small |
| 5 | blue | small |
| 6 | red | big |
| 7 | blue | small |
| 8 | red | big |
| 9 | red | big |


Awkward! The size assignment is completely determined by the color assignment: every blue user is small, and every red user is big.
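
The split is this extreme because `bad_choose_size_assignment` hashes exactly the same key as `choose_color_assignment`, so both compute the same index: index 0 means red and big, index 1 means blue and small. A quick way to spot this is to check whether one assignment determines the other. This is a sketch with an MD5-based `hash_func` standing in for `spoilers.hash_func`; `sizes_seen_per_color` is a hypothetical helper.

```python
import hashlib
from collections import defaultdict

def hash_func(key):
    # Assumption: stand-in for spoilers.hash_func.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

def choose_color_assignment(user_id):
    return ["red", "blue"][hash_func("{}|color".format(user_id)) % 2]

def choose_size_assignment(user_id):
    return ["big", "small"][hash_func("{}|size".format(user_id)) % 2]

def bad_choose_size_assignment(user_id):
    # Same salt as the color experiment: this is the bug.
    return ["big", "small"][hash_func("{}|color".format(user_id)) % 2]

def sizes_seen_per_color(size_func, n=10000):
    """Map each color to the set of sizes its users were assigned."""
    seen = defaultdict(set)
    for user_id in range(n):
        seen[choose_color_assignment(user_id)].add(size_func(user_id))
    return dict(seen)

print(sizes_seen_per_color(choose_size_assignment))      # expect both sizes under each color
print(sizes_seen_per_color(bad_choose_size_assignment))  # each color maps to exactly one size
```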

```python
pd.merge(
    n_different_users(choose_color_assignment, n=10000),
    n_different_users(bad_choose_size_assignment, n=10000, key='size'),
    on='user_id'
).groupby(['color', 'size']).count()
```

| color | size | user_id (count) |
|---|---|---|
| blue | small | 5004 |
| red | big | 4996 |


# Summary

This notebook covered a few ways to run more than one experiment at a time, and showed how salting the hash key shuffles users differently for each experiment, which is what lets experiments run independently.

Dependent assignment is a fun failure mode: reusing a salt silently turns two experiments into one. I might think I can analyze them independently, but I'd get the wrong results, because the effect of color can't be separated from the effect of size.
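
One possible guard against this mistake is to register every experiment's salt in one place and refuse duplicates, so a copy-pasted salt fails loudly instead of silently coupling two experiments. `SaltRegistry` is a hypothetical helper, not part of these notebooks' `utils`.

```python
class SaltRegistry:
    """Hypothetical registry that rejects salts reused across experiments."""

    def __init__(self):
        self._owners = {}  # salt -> name of the experiment that owns it

    def register(self, experiment, salt):
        owner = self._owners.get(salt)
        if owner is not None and owner != experiment:
            raise ValueError(
                "salt {!r} is already used by experiment {!r}".format(salt, owner)
            )
        self._owners[salt] = experiment
        return salt

registry = SaltRegistry()
registry.register("color", "color")
registry.register("size", "size")

try:
    registry.register("size-v2", "color")  # copy-pasted salt
except ValueError as err:
    print(err)  # salt 'color' is already used by experiment 'color'
```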


# [Next: Rollout](4.Rollout.ipynb)

With experiment rollout, we can gradually roll out new features.

# TOC
- **[0. Introduction](0.Introduction.ipynb)**: What a good `choose_color_assignment` function looks like.
- **[1. Experimental Units](1.ExperimentalUnits.ipynb)**: What happens when I don't pay attention to experimental units.
- **[2. Deterministic Assignment](2.DeterministicAssignment.ipynb)**: What it looks like to deterministically assign experimental units to variants.
- **[3. Scaling](3.Scaling.ipynb)**: How not to run two experiments at the same time.
- **[4. Rollout](4.Rollout.ipynb)**: How to gradually show users a new experiment.