## Provide numbers whose average and standard error are given

By [Serena Bonaretti](https://sbonaretti.github.io/)  
Developed with the support of ChatGPT4o  
Content license: CC0    
Code license: Unlicense 

---

- The aim of this notebook is to **create a list of potential observations** given their **average** and **standard error**
- The intended use is to recreate bar plots with error bars from publications  

---



**1. Problem formulation** 

Task: Provide 6 numbers whose average is 18 and whose standard error is 20

- The mean is $\mu = 18$
- The standard error (SE) is $SE=\frac{\sigma}{\sqrt{n}}$, where:
  - $\sigma$ is the standard deviation (unknown)
  - $n$ is the number of observations (that is, 6)

**2. Calculate the standard deviation**

- Formula: $SE=\frac{\sigma}{\sqrt{n}}$
- Rearranging for $\sigma$: $\sigma = SE \times \sqrt{n} =  20 \times \sqrt{6} \approx 48.99$
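
The rearrangement above can be checked numerically. This is a minimal sketch using only the standard library; the variable names `n`, `se`, and `sigma` are chosen here for illustration.

```python
import math

# Values from the problem formulation
n = 6       # number of observations
se = 20.0   # given standard error

# sigma = SE * sqrt(n)
sigma = se * math.sqrt(n)
print(round(sigma, 2))
```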


**3. Compute the numbers** 

- One way to do this is to start with the mean and distribute the numbers symmetrically around it so that they yield the required standard error (SE). Because the SE is derived from the standard deviation, the calculations below work directly with the standard deviation.

- Numbers to calculate:   
$x_1 = \mu + 2a = 18 + 2a$   
$x_2 = \mu + a = 18 + a$   
$x_3 = \mu = 18$   
$x_4 = \mu = 18$   
$x_5 = \mu - a = 18 - a$   
$x_6 = \mu - 2a = 18 - 2a$   
where $a$ is the unit distance from the mean (to be determined)  
*Note*: $x_3$ and $x_4$ are the same number

- Standard deviation formula (population form, dividing by $n$):   
$\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2}$
- Squaring the formula:   
$\sigma^{2} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2$
- Inserting the numbers in the squared formula:    
$\sigma^2 = \frac{(x_1 - 18)^2 + (x_2 - 18)^2 + (x_3 - 18)^2 + (x_4 - 18)^2 + (x_5 - 18)^2 + (x_6 - 18)^2}{6}$  
$\sigma^2 = \frac{(18 + 2a - 18)^2 + (18 + a - 18)^2 + (18 - 18)^2 + (18 - 18)^2 + (18 - a - 18)^2 + (18 - 2a - 18)^2}{6}$  
$48.99^2 = \frac{(2a)^2 + (a)^2 + 0 + 0 + (a)^2 + (2a)^2}{6}$  
$48.99^2 = \frac{4a^2 + a^2 + a^2 + 4a^2}{6}$  
$48.99^2 = \frac{10a^2}{6}$  (1)  
$48.99^2 = \frac{5a^2}{3}$    
$a^2 = \frac{48.99^2 \times 3}{5}$  
$a = \sqrt{\frac{48.99^2 \times 3}{5}} \approx \sqrt{1440.01} \approx 37.95$  
*Note*: the computational implementation below needs $a$ expressed in terms of $\sigma$ and $n$.  
From equation (1):  
$a^2 = \frac{\sigma^2 \times n}{10}$, thus $a = \sqrt{\frac{\sigma^2 \times n}{10}}$


- Calculating the numbers:  
$x_1 = \mu + 2a = 18 + 2a = 18 + 75.90 = 93.90$   
$x_2 = \mu + a = 18 + a = 18 + 37.95 = 55.95$   
$x_3 = \mu = 18$   
$x_4 = \mu = 18$   
$x_5 = \mu - a = 18 - a = 18 - 37.95 = -19.95$   
$x_6 = \mu - 2a = 18 - 2a = 18 - 75.90 = -57.90$  

**4. Verifying the results**
- Inserting numbers in the mean formula:  
  $\frac{x_1 + x_2 + x_3 + x_4 + x_5 + x_6}{6} = 18$  
  $\frac{93.90 + 55.95 + 18 + 18 + (-19.95) + (-57.90)}{6} = \frac{108}{6} = 18$
- Inserting numbers in the standard deviation formula:  
  $\sigma = \sqrt{\frac{(93.90 - 18)^2 + (55.95 - 18)^2 + (18 - 18)^2 + (18 - 18)^2 + (-19.95 - 18)^2 + (-57.90 - 18)^2}{6}} \approx 48.99$
- Inserting numbers in the standard error formula:  
  $SE=\frac{\sigma}{\sqrt{n}} = \frac{48.99}{\sqrt{6}} \approx 20.00$ 
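
The same verification can be repeated without intermediate rounding. The sketch below uses only the standard library; `six_numbers` is a hypothetical helper written for this check (it is not part of the notebook's implementation), and it assumes the population standard deviation used in the derivation.

```python
import math
import statistics

def six_numbers(mean, se, n=6):
    """Hypothetical helper: symmetric pattern mean +/- a, mean +/- 2a, mean, mean."""
    sigma = se * math.sqrt(n)           # population standard deviation
    a = math.sqrt(sigma**2 * n / 10)    # from sigma^2 = 10 * a^2 / n
    return [mean + 2 * a, mean + a, mean, mean, mean - a, mean - 2 * a]

values = six_numbers(18, 20)

check_mean = statistics.fmean(values)
check_sigma = statistics.pstdev(values)            # population SD (divides by n)
check_se = check_sigma / math.sqrt(len(values))

print("numbers:", [round(v, 2) for v in values])
print("mean:", round(check_mean, 2), "| sigma:", round(check_sigma, 2), "| SE:", round(check_se, 2))
```

Rounding only at the printing stage keeps small errors in $a$ from propagating into the recovered standard error.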

---
## Implementation

In [1]:
import numpy as np
from scipy.stats import sem
import pandas as pd

In [2]:
def compute_numbers(n, mean, SE):

    # calculating the standard deviation
    sigma = SE * np.sqrt(n)
    
    # calculating the step a for the symmetric pattern (from sigma^2 = 10 * a^2 / n);
    # the pattern below is hard-coded for six numbers
    a = np.sqrt((sigma**2 * n) / 10)
    
    # computing the six numbers
    x1 = np.round(mean + 2 * a, 2)
    x2 = np.round(mean + a, 2)
    x3 = np.round(mean, 2)
    x4 = np.round(mean, 2)
    x5 = np.round(mean - a, 2)
    x6 = np.round(mean - 2 * a, 2)
    
    # creating the list of numbers to return
    numbers = [x1, x2, x3, x4, x5, x6]

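    # printing for check
    # note: scipy's sem defaults to ddof=1 (sample standard deviation), whereas sigma above
    # is the population standard deviation, so the printed value is about SE * sqrt(n / (n - 1))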
    print("Computed numbers: ", numbers)
    print("Given mean:", mean, "| calculated mean: ", np.round(np.mean(numbers), 2))
    print("Given standard error:", SE, "| calculated standard error: ",  np.round(sem(numbers), 2))

    return numbers
    
numbers = compute_numbers(6, 28, 10)

Computed numbers:  [65.95, 46.97, 28, 28, 9.03, -9.95]
Given mean: 28 | calculated mean:  28.0
Given standard error: 10 | calculated standard error:  10.95
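
The gap between the given standard error (10) and the calculated one (10.95) is likely a matter of convention rather than a numerical error: `scipy.stats.sem` divides by $n - 1$ by default (`ddof=1`), while the derivation above divides by $n$. The two differ by a factor of $\sqrt{n/(n-1)} = \sqrt{6/5} \approx 1.095$. The sketch below reproduces both conventions with the standard library only; `standard_error` and `step_for_sample_sd` are hypothetical helpers, and the variant for the sample convention assumes the published error bars were computed with the sample standard deviation.

```python
import math

def standard_error(values, ddof=1):
    """Standard error of the mean; ddof=0 uses the population SD, ddof=1 the sample SD."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / (n - ddof)
    return math.sqrt(variance / n)

numbers = [65.95, 46.97, 28, 28, 9.03, -9.95]   # output of compute_numbers(6, 28, 10)

se_population = standard_error(numbers, ddof=0)  # same convention as sem(numbers, ddof=0)
se_sample = standard_error(numbers, ddof=1)      # same convention as sem(numbers), the default

# Variant (assumption: the target SE is based on the sample SD):
# s^2 = 10 * a^2 / (n - 1) and SE = s / sqrt(n) give a = SE * sqrt(n * (n - 1) / 10)
def step_for_sample_sd(se, n=6):
    return se * math.sqrt(n * (n - 1) / 10)

a = step_for_sample_sd(10)
sample_based = [28 + 2 * a, 28 + a, 28, 28, 28 - a, 28 - 2 * a]
se_sample_based = standard_error(sample_based, ddof=1)

print(round(se_population, 2), round(se_sample, 2), round(se_sample_based, 2))
```

Which convention to use depends on how the source publication computed its error bars; if that is unknown, it is worth stating the assumption next to the recreated figure.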


In [3]:
# each entry: [n, mean, SE]
graph_8_values = {"ant_lat" : [6, 18, 18],
                  "ant_med" : [6, 27, 10],
                  "mid_ant_lat" : [6, 29, 25],
                  "mid_ant_med" : [6, 25, 12],
                  "mid_pos_lat" : [6, 14, 8],
                  "mid_pos_med" : [6, 13, 7],
                  "pos_lat" : [6, 38, 55],
                  "pos_med" : [6, 33, 2]}

In [4]:
table_dict = {}
for name, (n, mean, SE) in graph_8_values.items():
    # one column of six numbers per region
    table_dict[name] = compute_numbers(n, mean, SE)
print(table_dict)

Computed numbers:  [86.31, 52.15, 18, 18, -16.15, -50.31]
Given mean: 18 | calculated mean:  18.0
Given standard error: 18 | calculated standard error:  19.72
Computed numbers:  [64.95, 45.97, 27, 27, 8.03, -10.95]
Given mean: 27 | calculated mean:  27.0
Given standard error: 10 | calculated standard error:  10.95
Computed numbers:  [123.87, 76.43, 29, 29, -18.43, -65.87]
Given mean: 29 | calculated mean:  29.0
Given standard error: 25 | calculated standard error:  27.39
Computed numbers:  [70.54, 47.77, 25, 25, 2.23, -20.54]
Given mean: 25 | calculated mean:  25.0
Given standard error: 12 | calculated standard error:  13.15
Computed numbers:  [44.36, 29.18, 14, 14, -1.18, -16.36]
Given mean: 14 | calculated mean:  14.0
Given standard error: 8 | calculated standard error:  8.76
Computed numbers:  [39.56, 26.28, 13, 13, -0.28, -13.56]
Given mean: 13 | calculated mean:  13.0
Given standard error: 7 | calculated standard error:  7.67
Computed numbers:  [246.71, 142.36, 38, 38, -66.36, -17

In [5]:
df = pd.DataFrame(table_dict)
df

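|   | ant_lat | ant_med | mid_ant_lat | mid_ant_med | mid_pos_lat | mid_pos_med | pos_lat | pos_med |
|---|---------|---------|-------------|-------------|-------------|-------------|---------|---------|
| 0 | 86.31 | 64.95 | 123.87 | 70.54 | 44.36 | 39.56 | 246.71 | 40.59 |
| 1 | 52.15 | 45.97 | 76.43 | 47.77 | 29.18 | 26.28 | 142.36 | 36.79 |
| 2 | 18.0 | 27.0 | 29.0 | 25.0 | 14.0 | 13.0 | 38.0 | 33.0 |
| 3 | 18.0 | 27.0 | 29.0 | 25.0 | 14.0 | 13.0 | 38.0 | 33.0 |
| 4 | -16.15 | 8.03 | -18.43 | 2.23 | -1.18 | -0.28 | -66.36 | 29.21 |
| 5 | -50.31 | -10.95 | -65.87 | -20.54 | -16.36 | -13.56 | -170.71 | 25.41 |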


In [6]:
df.to_csv("data_fig_8.csv", index=False)