# Effect of effect_size, variance, and sample_size on p-values
>Day03 – Assignment – Part 2

As we discussed in class, apart from the null hypothesis (which need not always be equivalent to random chance), the p-value of a statistical test depends on:
1. Effect size,
2. Sample size, and
3. Variance within each group
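
As a quick sanity check before running the full simulation, the three dependencies can be illustrated without any random sampling. The sketch below is not part of the assignment code: the helper `pvalue_from_summary` is a hypothetical name, and it assumes that the observed difference in group means equals `effect_size` exactly and that both groups have the same sample standard deviation and size. It uses `scipy.stats.ttest_ind_from_stats`, which runs the same two-sample t-test from summary statistics alone.

```python
from scipy import stats


def pvalue_from_summary(effect_size, std_deviation, sample_size):
    """Two-sided p-value of an equal-variance two-sample t-test.

    Hypothetical helper for illustration: it assumes the observed group
    means differ by exactly `effect_size` and that both groups share the
    same sample standard deviation and the same size.
    """
    result = stats.ttest_ind_from_stats(
        mean1=0.0, std1=std_deviation, nobs1=sample_size,
        mean2=effect_size, std2=std_deviation, nobs2=sample_size,
    )
    return result.pvalue


# Change one quantity at a time relative to a baseline scenario.
baseline = pvalue_from_summary(effect_size=0.5, std_deviation=1.0, sample_size=20)
larger_effect = pvalue_from_summary(effect_size=1.0, std_deviation=1.0, sample_size=20)
larger_sample = pvalue_from_summary(effect_size=0.5, std_deviation=1.0, sample_size=80)
larger_spread = pvalue_from_summary(effect_size=0.5, std_deviation=2.0, sample_size=20)

print("baseline:      ", baseline)
print("larger effect: ", larger_effect)
print("larger sample: ", larger_sample)
print("larger spread: ", larger_spread)
```

Because the inputs are fixed summary statistics, the output is the same on every run. Real samples add noise on top of this, which is what the simulation in this notebook explores.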

To explore how these factors influence the p-value, I have written the code below to simulate data for two groups multiple times (just like the exercise we did in class), each time varying the `effect_size`, `std_deviation`, and `sample_size`, and calculating the p-value using a t-test.

Your tasks are the following:
1. Carefully examine and annotate the code by writing detailed comments at each step.
2. Examine the figure that is produced and write a short paragraph about your observations on how these three quantities influence the p-value.

In [None]:
## Imports
import random
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from scipy import stats
import pandas as pd

In [None]:
# Add comments next to each code chunk to describe the data analysis steps

sample_data     = []
es_sd_ss_pvalue = []

for effect_size in np.arange(0.0, 1.05, 0.05):
    for stddev in [0.5, 1, 2]:
        for sample_size in [5, 10, 20, 50, 100, 200, 500, 1000]:
            group1 = np.random.normal(size = sample_size, loc = 0, scale = stddev)
            group2 = np.random.normal(size = sample_size, loc = effect_size, scale = stddev)

            sample_data.append([[effect_size, stddev, sample_size], group1, group2])

            ttest_out = stats.ttest_ind(group1, group2).pvalue
            es_sd_ss_pvalue.append([effect_size, stddev, sample_size, ttest_out])

_[ Add your notes here ]_

In [None]:
es_sd_ss_pvalue = pd.DataFrame.from_records(es_sd_ss_pvalue, columns = ["effect_size", "std_deviation", "sample_size", "pvalue"])
es_sd_ss_pvalue

_[ Add your notes here ]_

In [None]:
# Add comments next to each code chunk to describe the data analysis steps

es_sd_ss_pvalue["below_threshold"] = es_sd_ss_pvalue["pvalue"] <= 0.05
es_sd_ss_pvalue

_[ Add your notes here ]_

In [None]:
# No need to add comments to this chunk; it just makes the plot.
# However, it is useful to go through it to make sure you understand what is being plotted in the figure.

fig, axarr = plt.subplots(nrows = 1, ncols = 3, figsize = (12,5), sharey = True)
cols = [0.5, 1, 2]
for j in range(3):
    subset = es_sd_ss_pvalue[es_sd_ss_pvalue["std_deviation"] == cols[j]]
    
    axarr[j].set_title("std_deviation = " + str(cols[j]), fontsize = 12)
    axarr[j].set_xscale("log")
    
    below_threshold = subset[subset["below_threshold"] == True]
    above_threshold = subset[subset["below_threshold"] == False]
    
    axarr[j].plot(below_threshold["sample_size"], below_threshold["effect_size"], "ro")
    axarr[j].plot(above_threshold["sample_size"], above_threshold["effect_size"], "bx")
    
axarr[1].set_xlabel("Sample Size", fontsize = 16)
axarr[0].set_ylabel("Effect Size", fontsize = 16)

plt.savefig("effectsize_variance_samplesize_pvalue.pdf")
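
The figure shows a single random draw for each parameter combination, so individual points can change from run to run. One optional way to cross-check what you see is to tabulate, for each `sample_size` and `std_deviation`, the fraction of `effect_size` values whose p-value falls at or below 0.05. The sketch below is only a suggestion, not part of the assignment: it re-runs the simulation on its own so that it is self-contained, and the fixed seed and the names `rng`, `records`, `summary`, and `fraction_significant` are assumptions introduced here.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Fixed seed so the table is reproducible (an assumption; the original
# simulation is unseeded and changes on every run).
rng = np.random.default_rng(0)

records = []
for effect_size in np.arange(0.0, 1.05, 0.05):
    for std_deviation in [0.5, 1, 2]:
        for sample_size in [5, 10, 20, 50, 100, 200, 500, 1000]:
            group1 = rng.normal(loc=0, scale=std_deviation, size=sample_size)
            group2 = rng.normal(loc=effect_size, scale=std_deviation, size=sample_size)
            pvalue = stats.ttest_ind(group1, group2).pvalue
            # Store 1 if the test is significant at the 0.05 level, else 0.
            records.append([effect_size, std_deviation, sample_size, int(pvalue <= 0.05)])

summary = pd.DataFrame(
    records,
    columns=["effect_size", "std_deviation", "sample_size", "below_threshold"],
)

# Rows: sample_size; columns: std_deviation; cells: fraction of effect sizes
# for which the p-value was at or below 0.05.
fraction_significant = (
    summary.groupby(["sample_size", "std_deviation"])["below_threshold"]
    .mean()
    .unstack("std_deviation")
)
print(fraction_significant.round(2))
```

Each cell corresponds to one column of points in one panel of the figure: the share of red dots among all points at that sample size.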

**Question:**  
The figure contains 3 plots next to each other, one for each value of `std_deviation`. In each plot, the `sample_size` values are along the x-axis (on a log scale) and the `effect_size` values are along the y-axis. Each point is either a red dot or a blue cross, depending on whether the p-value for the t-test between the two groups (for that combination of `effect_size`, `std_deviation`, and `sample_size`) is at or below 0.05 (red dot) or above it (blue cross).

Examine the figure and write a short paragraph about your observations on how these three quantities influence the p-value.

_[ Write your answer here. ]_