# Descriptive statistics

*Last updated: 2023-10-02*

- A set of m elements (integers between 0 and 99)
    - Randomly generated
- Evaluate the distribution of the generated values
- Compare the results with the documentation of the function that generates the random values
- Calculate various descriptive statistics

Python's random module generates pseudo-random values with the Mersenne Twister algorithm, developed in 1997 by Matsumoto and Nishimura. The most commonly used version of the algorithm is based on the Mersenne prime $2^{19937}-1$. A Mersenne prime is a prime number that is one less than a power of two, $M_n = 2^n - 1$ for some integer $n$.
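As a small illustration of the definition, the sketch below computes $2^n - 1$ for a few small exponents and keeps only the values that are prime. The helper name is_prime and the list of exponents are assumptions made for this example; they are not part of the random module.

```python
def is_prime(k):
    """Trial-division primality test, adequate for small numbers only."""
    if k < 2:
        return False
    i = 2
    while i * i <= k:
        if k % i == 0:
            return False
        i += 1
    return True


# Candidate exponents, chosen only for illustration
exponents = [2, 3, 5, 7, 11, 13]

# Keep the values 2**n - 1 that are actually prime
mersenne_primes = {n: 2**n - 1 for n in exponents if is_prime(2**n - 1)}
print(mersenne_primes)  # {2: 3, 3: 7, 5: 31, 7: 127, 13: 8191}
```

The exponent 11 drops out because $2^{11} - 1 = 2047 = 23 \times 89$, so not every number of the form $2^n - 1$ is prime.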

Article: M. Matsumoto and T. Nishimura, “Mersenne Twister: A 623-dimensionally equidistributed uniform pseudorandom number generator”, ACM Transactions on Modeling and Computer Simulation, Vol. 8, No. 1, January 1998, pp. 3–30.

The seed lets the user fix the starting state of the pseudo-random number generator, so that the same sequence is produced on every run and the analysis is reproducible.
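A minimal sketch of this behaviour, assuming two independent generator objects (gen_a and gen_b are names chosen for this example) so that the module-level generator used in the rest of the notebook is left untouched:

```python
import random as rd

# Two independent generators created with the same seed
gen_a = rd.Random(1)
gen_b = rd.Random(1)

seq_a = [gen_a.random() for _ in range(5)]
seq_b = [gen_b.random() for _ in range(5)]

print(seq_a == seq_b)  # True: same seed, same sequence
```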

random() returns the next pseudo-random float, uniformly distributed in the semi-open interval $0.0 \le x < 1.0$.

In [None]:
import math as mt
import statistics as st
import random as rd

In [26]:
rd.seed(1)
rd.random()

0.13436424411240122

In [27]:
m = 20
sample = [int(rd.random()*100) for _ in range(m)]
sample

[84, 76, 25, 49, 44, 65, 78, 9, 2, 83, 43, 76, 0, 44, 72, 22, 94, 90, 3, 2]
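One rough way to look at how the generated values are distributed is to count them in bins of width 10. The sketch below is only an illustration: the bin width and the text histogram are choices made here, and the sample is copied from the output above.

```python
from collections import Counter

# Sample copied from the output above
sample = [84, 76, 25, 49, 44, 65, 78, 9, 2, 83,
          43, 76, 0, 44, 72, 22, 94, 90, 3, 2]

# Count how many values fall in each bin of width 10 (0-9, 10-19, ...)
bins = Counter(value // 10 for value in sample)

# Print one row of stars per bin; empty bins print no stars
for b in range(10):
    low = b * 10
    print(f"{low:2d}-{low + 9:2d}: {'*' * bins[b]}")
```

With only 20 values the bins are visibly uneven, and some are empty. A uniform generator is expected to give a flatter histogram only as the number of values grows.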

## Mean

In [28]:
mean = sum(sample) / len(sample)
mean

48.05

### Checking

In [29]:
st.mean(sample)

48.05

## Median

To find the median, we need to:

1. Sort the sample
2. Locate the value in the middle of the sorted sample

Then:

- If the sample has an odd number of observations, the middle value is the median
- Otherwise, the median is the mean of the two middle values

Example:

- [3, 5, 1, 4, 2], sort: [1, 2, 3, 4, 5], median: 3

- [1, 2, 3, 4, 5, 6], median: (3 + 4) / 2 = 3.5
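The steps above can be sketched as a small function. The name median_sketch is hypothetical, and the function assumes a non-empty list of numbers.

```python
def median_sketch(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        # Odd length: the single middle value
        return ordered[mid]
    # Even length: mean of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_sketch([3, 5, 1, 4, 2]))     # 3
print(median_sketch([1, 2, 3, 4, 5, 6]))  # 3.5
```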

### Median of the sample

In [30]:
sor = sorted(sample)
sor

[0, 2, 2, 3, 9, 22, 25, 43, 44, 44, 49, 65, 72, 76, 76, 78, 83, 84, 90, 94]

    "//" = integer division
    "%" = remainder

In [31]:
mid = len(sor) // 2
mid

10

In [32]:
odd = len(sor) % 2
odd

0

In [33]:
if odd:
    # Odd length: the middle value is the median
    med = sor[mid]
else:
    # Even length: mean of the two middle values
    med = sum(sor[mid-1:mid+1]) / 2

med

46.5

### Checking

In [34]:
st.median(sample)

46.5

## Mode

The mode is the most frequent observation in the sample. A sample can have more than one mode.

In [35]:
from collections import Counter

"Counter" counts how many times each distinct value occurs

In [36]:
c = Counter(sample)
c

Counter({84: 1,
         76: 2,
         25: 1,
         49: 1,
         44: 2,
         65: 1,
         78: 1,
         9: 1,
         2: 2,
         83: 1,
         43: 1,
         0: 1,
         72: 1,
         22: 1,
         94: 1,
         90: 1,
         3: 1})

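most_common() returns a list of (element, count) pairs ordered from most common to least common. With an argument n it returns only the top n pairs; without it, all elements are returned.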

In [37]:
c.most_common()

[(76, 2),
 (44, 2),
 (2, 2),
 (84, 1),
 (25, 1),
 (49, 1),
 (65, 1),
 (78, 1),
 (9, 1),
 (83, 1),
 (43, 1),
 (0, 1),
 (72, 1),
 (22, 1),
 (94, 1),
 (90, 1),
 (3, 1)]

In [38]:
c.most_common(1)

[(76, 2)]

In [39]:
# Highest count among all values (note: this reuses the name m, which held the sample size above)
m = c.most_common(1)[0][1]
m

2

In [40]:
d = c.items()
d

dict_items([(84, 1), (76, 2), (25, 1), (49, 1), (44, 2), (65, 1), (78, 1), (9, 1), (2, 2), (83, 1), (43, 1), (0, 1), (72, 1), (22, 1), (94, 1), (90, 1), (3, 1)])

Mode (every value whose count equals the highest count):

In [42]:
mde = [k for (k, v) in d if v == m]
mde

[76, 44, 2]
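As a check in the style of the other sections, the standard library offers statistics.multimode (Python 3.8 or later), which returns all modes in the order they first appear in the data. The sample below is copied from the output above so that the sketch is self-contained.

```python
import statistics as st

# Sample copied from the output above
sample = [84, 76, 25, 49, 44, 65, 78, 9, 2, 83,
          43, 76, 0, 44, 72, 22, 94, 90, 3, 2]

modes = st.multimode(sample)  # every value tied for the highest count
first_mode = st.mode(sample)  # Python 3.8+: the first mode encountered

print(modes)       # [76, 44, 2]
print(first_mode)  # 76
```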

## Variance

Variance measures how far the values in a data set are spread from their mean. The population variance is the average of the squared deviations from the mean:

$\sigma^2 = \frac{1}{n}{\sum_{i=0}^{n-1}{(x_i - \mu)^2}}$

where $\mu$ is the mean and $n$ is the number of values.

Example:

In [151]:
d = [3, 5, 2, 7, 1, 3]

In [152]:
u = sum(d) / len(d)
u

3.5

In [153]:
a = [(i-u)**2 for i in d]
a

[0.25, 2.25, 2.25, 12.25, 6.25, 0.25]

In [154]:
s = sum(a) / len(a)
f"{s:.4f}"

'3.9167'
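Written out for this example, with $n = 6$ and $\mu = 3.5$, the arithmetic is:

$\sigma^2 = \frac{0.25 + 2.25 + 2.25 + 12.25 + 6.25 + 0.25}{6} = \frac{23.5}{6} \approx 3.9167$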

### Applying to sample

In [43]:
u = sum(sample) / len(sample)
a = [(i-u)**2 for i in sample]
s = sum(a) / len(a)
f"{s:.4f}"

'1053.9475'

### Checking

In [44]:
f'{st.pvariance(sample):.4f}'

'1053.9475'
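The formula in this section divides by $n$, which corresponds to statistics.pvariance (population variance). The statistics module also provides variance, which divides by $n - 1$ (sample variance) and therefore gives a slightly larger value. A minimal sketch of the difference, reusing the small example data under the assumed name data:

```python
import statistics as st

data = [3, 5, 2, 7, 1, 3]

population_var = st.pvariance(data)  # divides by n
sample_var = st.variance(data)       # divides by n - 1

print(f"{population_var:.4f}")  # 3.9167
print(f"{sample_var:.4f}")      # 4.7000
```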

## Standard deviation

Measures the amount of variation or dispersion of a set of numeric values.

Unlike the variance, which is expressed in squared units, the standard deviation has the same unit of measurement as the data itself, which makes it easier to interpret.

$\sigma = \sqrt{\sigma^2}$

In [47]:
# Keep the numeric value in sd and format it only for display
sd = mt.sqrt(s)
f'{sd:.4f}'

'32.4646'

### Checking

In [46]:
f'{st.pstdev(sample):.4f}'

'32.4646'
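Because the standard deviation is in the same unit as the data, it can be added to or subtracted from the mean directly. As a minimal sketch, the code below counts how many sample values lie within one standard deviation of the mean. The names mu, sigma and within are chosen for this example, and the sample is copied from the output earlier in the notebook.

```python
import statistics as st

# Sample copied from the output earlier in the notebook
sample = [84, 76, 25, 49, 44, 65, 78, 9, 2, 83,
          43, 76, 0, 44, 72, 22, 94, 90, 3, 2]

mu = st.mean(sample)
sigma = st.pstdev(sample)

# Interval of one standard deviation around the mean
low, high = mu - sigma, mu + sigma
within = [x for x in sample if low <= x <= high]

print(f"{low:.2f} to {high:.2f}")      # 15.59 to 80.51
print(len(within), "of", len(sample))  # 11 of 20
```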

## Comments