Suppose you work at a video game company as a data scientist, and your company recently rolled out a new update to your game. It's been a few weeks since the update, and you want to test if the average user spends a different amount of time per game session after the update. 

Previous to the update, the average game session was **93 minutes** with a standard deviation of 20 minutes. 

You take a random sample of 500 users out of 50,000 that play your game.

You want to conduct an A/B test to decide if there is statistically significant evidence of a change in the average time spent on one game session.

Before doing anything, check that this experimental design meets the conditions for inference:
1. Random sampling
2. Normal sampling distribution
    * Parent population is normal or $n \geq 30$
3. Independent samples
    * Bootstrap (uncommon)
    * $n \leq \text{10% of parent population}$

Next, we need to set an $\alpha$. This is typically .05 in business applications.

Now we need to write our null and alternative hypotheses:

$$
H_0: \mu = 93\\
H_1: \mu \neq 93
$$

Now read in the data.

In [1]:
import numpy as np
import pandas as pd

In [2]:
df = pd.read_csv('player_times.csv', index_col=0)
df = df.rename({'0': 'session_time'}, axis=1)

In [3]:
df.head()

Unnamed: 0,session_time
0,63.869841
1,92.926903
2,70.104841
3,128.727831
4,74.937354


In [4]:
sample_mean = df['session_time'].mean()
sample_mean

87.17229134011139

In [5]:
sample_standard_deviation = df['session_time'].std()
sample_standard_deviation

18.269319276017807

Now generate the distribution of our entire player base under the assumption that the null hypothesis is true.

In [6]:
null_dist = np.random.normal(loc = 93, scale = 20, size=500)

Calculate our two-sided p-value.

In [10]:
null_dist.mean()

92.85066667161641

In [9]:
p_value = (null_dist > (93 + sample_mean)).mean()
p_value

0.0

Because $p > \alpha$ we fail to reject the null hypothesis.

This does **not** mean that the game update made no difference! It did not make a statistically significant difference under our z-test, but it may still have made a *practically significant* difference.