# A/B Testing {#sec-ab-testing}

## Overview

In this chapter we review one of the most commonly used testing methodologies: <a href="https://en.wikipedia.org/wiki/A/B_testing">A/B testing</a>.
The aim of A/B testing is to drive business decisions based on experimentation. It employs many of the elements of
hypothesis testing that we have seen so far, in particular two-sample hypothesis testing.

A/B tests are useful when we want to understand user engagement and/or satisfaction with online features, such as a new feature or product. For example, large social media sites like LinkedIn and Facebook use A/B testing to make user experiences more successful and as a way to streamline their services [1].

## A/B testing

A/B testing is, in principle, a randomized controlled experiment in which a number of samples, e.g. A and B, of a single variable are compared [1]. Typically, the two samples represent different variations of the variable. Our aim is to determine
which variation of the variable is more efficient or better optimizes a given metric.
Consider for example an e-shop like Amazon. One of our aims is to increase user interaction. How can we do this?
Suppose the marketing team prepares two flavours of the e-shop. Which one do we choose? A/B testing provides a framework that allows
us to make such decisions.

### A/B testing stages

A/B testing typically involves three stages as shown in the figure below.

| ![Stages of A/B testing](../../../imgs/statistics/hypothesis_testing/ab_testing_stages.png) |
|:--:|
| **Figure 1: Stages of A/B testing.**|

Let's briefly outline what is happening in every stage.

- **Design stage**: In this stage we decide how many measurements to record and which metric or metrics we want to evaluate. Typically, we want to take as many measurements as is practical, a process called **replication**, in order to reduce the effect of **natural variation** on the final estimate of the selected business metrics.
- **Take measurements stage**: In this stage we measure the metrics we agreed upon in the previous stage. Notice however that we need to be careful to measure only the effect of switching from version A to version B. This can be done by using randomization. Randomization is essential in order to avoid inadvertently including the impact of other factors that may affect the business metric, e.g. time of day, user demographics, location of a data center, etc.
- **Analysis stage**: In this stage we analyse the measurements using various statistical techniques, including hypothesis testing.

Let's try to work out an example in order to better understand what all these imply.

#### A/B testing example

Suppose we need to test two versions of a component that is to be integrated into our overall system.
Let's call these versions A and B respectively.

```python
import numpy as np


def sim_system(sys_name: str, error_mean: float = 0.0, error_std: float = 1.0) -> float:
    """Simulate a system by adding normally distributed noise to the system cost."""
    # Noise-free cost of each version
    sys_cost = {'A': 25.0, 'B': 28.0}
    total_cost = sys_cost[sys_name] + np.random.normal(error_mean, error_std)
    return total_cost
```

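To measure only the effect of the version, we randomly assign each measurement to A or B:

```python
def randomized_measurement():
    """Randomly assign each measurement to version A or B and return the two mean costs."""
    a_measurements = []
    b_measurements = []
    # sim_system does not model time of day; the loop only mirrors measuring in both periods
    for tod in ["morning", "afternoon"]:
        for _ in range(100):
            # A fair coin flip decides which version is measured
            if np.random.randint(2) == 0:
                a_measurements.append(sim_system("A"))
            else:
                b_measurements.append(sim_system("B"))
    return (np.array(a_measurements).mean(),
            np.array(b_measurements).mean())
```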
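
Once the measurements are collected, the analysis stage compares the two samples. The following is a minimal sketch of one possible analysis, a two-sample z-test under the normal approximation. The helper name `two_sample_z`, the sample size of 100 and the fixed seed are assumptions made for illustration; the costs 25.0 and 28.0 and the unit noise mirror `sim_system`.

```python
import math

import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed, chosen only for reproducibility

# Hypothetical measurements: costs 25.0 (A) and 28.0 (B) plus standard normal noise
a = 25.0 + rng.normal(0.0, 1.0, size=100)
b = 28.0 + rng.normal(0.0, 1.0, size=100)


def two_sample_z(x: np.ndarray, y: np.ndarray) -> tuple:
    """Return the z statistic and two-sided p-value for mean(y) - mean(x)."""
    diff = y.mean() - x.mean()
    # Standard error of the difference between two independent sample means
    se = math.sqrt(x.var(ddof=1) / x.size + y.var(ddof=1) / y.size)
    z = diff / se
    # Two-sided p-value under the normal approximation: P(|Z| > |z|)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


z, p_value = two_sample_z(a, b)
print(f"z = {z:.2f}, p-value = {p_value:.3g}")
```

A small p-value suggests that the observed difference in mean cost is unlikely to be due to natural variation alone. For small samples a t-test would usually be the more appropriate choice.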

### A/B testing details

Now that we have an understanding of what A/B testing is, let's look into some important details. 

**Variation**

We mentioned that within the A/B testing framework we need to take measurements of the metrics we are interested in for both
systems. Individual measurements vary from one to the next, so we need enough of them for the variation in our final estimate to be acceptably small.
Variation is both unpredictable and out of our control. We can mitigate this problem by using replication, meaning
taking multiple measurements and averaging over them.

Variation can therefore be reduced but not eliminated. That doesn't mean we can't make a meaningful decision about which version to use.
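
The effect of replication can be illustrated with a short simulation. This is a sketch under stated assumptions: the true cost of 25.0 and the unit noise mirror version A in `sim_system`, while the helper names `mean_of_replicates` and `spread_of_estimate`, the number of trials and the seed are hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # fixed seed, chosen only for reproducibility


def mean_of_replicates(n: int) -> float:
    """Average n noisy measurements of a system whose true cost is 25.0."""
    return float((25.0 + rng.normal(0.0, 1.0, size=n)).mean())


def spread_of_estimate(n: int, trials: int = 2000) -> float:
    """Empirical standard deviation of the averaged estimate over repeated experiments."""
    estimates = np.array([mean_of_replicates(n) for _ in range(trials)])
    return float(estimates.std(ddof=1))


for n in (1, 10, 100):
    print(f"n = {n:3d}: spread = {spread_of_estimate(n):.3f}, "
          f"1/sqrt(n) = {1 / np.sqrt(n):.3f}")
```

For independent measurements with unit noise, the spread of the averaged estimate should be close to $1/\sqrt{n}$, so taking more measurements gives a more stable estimate.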

**Bias**

Another problem related to taking measurements is sampling bias, i.e. taking measurements under different conditions consistently yields different results.
When the bias is applied differently and consistently to the two versions of the system being compared, we call it **confounder bias**.
Confounder bias can lead to incorrect decisions about whether to make changes to our system. Randomization is a method
we can use in order to remove confounder bias.
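
The following sketch contrasts a confounded measurement scheme with a randomized one. Everything about the time-of-day effect is an assumption made for illustration: the function `sim_system_tod`, the `TOD_EFFECT` values and the sample sizes are hypothetical, and only the base costs of 25.0 and 28.0 come from `sim_system`.

```python
import numpy as np

rng = np.random.default_rng(seed=2)  # fixed seed, chosen only for reproducibility

BASE_COST = {"A": 25.0, "B": 28.0}               # same costs as sim_system
TOD_EFFECT = {"morning": 0.0, "afternoon": 5.0}  # hypothetical confounder


def sim_system_tod(sys_name: str, tod: str) -> float:
    """Simulated cost that also depends on the time of day (hypothetical)."""
    return BASE_COST[sys_name] + TOD_EFFECT[tod] + rng.normal(0.0, 1.0)


def confounded_difference(n: int = 1000) -> float:
    """A is always measured in the afternoon and B always in the morning."""
    a = np.mean([sim_system_tod("A", "afternoon") for _ in range(n)])
    b = np.mean([sim_system_tod("B", "morning") for _ in range(n)])
    return float(b - a)


def randomized_difference(n: int = 1000) -> float:
    """For every measurement, a coin flip decides which version is measured."""
    a, b = [], []
    for tod in ("morning", "afternoon"):
        for _ in range(n):
            if rng.integers(2) == 0:
                a.append(sim_system_tod("A", tod))
            else:
                b.append(sim_system_tod("B", tod))
    return float(np.mean(b) - np.mean(a))


print(f"confounded estimate of B - A: {confounded_difference():.2f}")
print(f"randomized estimate of B - A: {randomized_difference():.2f}")
```

By construction the true difference in cost is 3.0. In the confounded scheme the time-of-day effect is attributed to the versions themselves, whereas under randomization it is spread roughly evenly over both versions and largely cancels in the difference.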

### Pitfalls with A/B testing

Testing an unclear hypothesis

## Summary

In this chapter, we looked into the details of A/B testing. An A/B test aims at deciding whether to use version A or version B of
a given system, approach or implementation; basically anything we can take measurements of. It has three main stages:


- Design stage
- Take measurements stage
- Analysis stage

Given that we need to take measurements of the two systems, we need to be aware of both variation and bias. We address the former by taking
multiple measurements and averaging them. Averaging reduces the variance of the estimate, which is how we tackle the variation of the
individual measurements. We address bias by using randomization, i.e. randomly choosing which version each measurement is taken from.

## References

1. <a href="https://en.wikipedia.org/wiki/A/B_testing">A/B testing</a>.