## **Basic probability and statistics**

### Probability
 - Event
 - Random variable
   - Discrete
   - Continuous
 - Entropy (information content of a distribution)
 - Kullback-Leibler (KL) divergence (how far one distribution is from another; asymmetric, so not a true distance)

### Statistics: Relationship between population and samples
- Parameters (unobserved): Expectation and Variance of Population
- Statistics (observed): Mean & Variance of Samples
- Expectation and Variance of Population
- Mean and Variance of Samples

### Probability
 - Event
 - Random variable
   - Discrete
   - Continuous

In [None]:
import numpy as np

# Discrete variable: a fair (normal) die
# Each of the six faces has probability 1/6.

dice_1 = np.ones(6)
dice_1 /= np.sum(dice_1)  # normalise so the probabilities sum to 1
print(dice_1)

In [None]:
# Biased (abnormal) die: faces 1..6 with unequal probabilities

dice_2 = np.array([0.05, 0.4, 0.25, 0.15, 0.05, 0.1])
print(np.sum(dice_2))  # a valid distribution must sum to 1
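Printing the sum is a quick sanity check. A slightly stricter check, sketched below as a hypothetical helper `is_valid_pmf` (not part of the original notebook), also requires every probability to be non-negative and compares the total with 1 using a tolerance, because adding decimals such as 0.05 and 0.4 in floating point need not give exactly 1.0.

```python
import numpy as np

def is_valid_pmf(p, atol=1e-9):
    """Hypothetical helper: True if p is a valid probability mass function."""
    p = np.asarray(p, dtype=float)
    # Every probability must be non-negative ...
    if np.any(p < 0):
        return False
    # ... and the total must be 1, up to floating-point rounding.
    return bool(np.isclose(p.sum(), 1.0, atol=atol))

fair = np.ones(6) / 6
biased = np.array([0.05, 0.4, 0.25, 0.15, 0.05, 0.1])

print(is_valid_pmf(fair))        # True
print(is_valid_pmf(biased))      # True
print(is_valid_pmf([0.5, 0.6]))  # False: the total is 1.1
```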

In [None]:
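# Continuous variable: 5000 draws from a normal distribution
# with mean 3 and standard deviation 1

from matplotlib import pyplot as plt

x = np.random.normal(3, 1, size=5000)

# Histogram of the samples
plt.figure(3)
plt.hist(x, bins=100)
plt.grid()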
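The histogram should look like a bell curve centred at 3. One way to make that concrete, sketched below assuming NumPy's `default_rng` Generator API (NumPy 1.17 or later), is to check that the sample mean and standard deviation are close to $\mu = 3$ and $\sigma = 1$, and to compare a density-normalised histogram with the normal density $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2 / (2\sigma^2)}$ at the bin centres. The seed is arbitrary and only makes the run reproducible.

```python
import numpy as np

# Seeded generator so the run is reproducible (the seed value is arbitrary).
rng = np.random.default_rng(0)
mu, sigma = 3.0, 1.0
x = rng.normal(mu, sigma, size=5000)

# Sample statistics should be close to the population parameters.
sample_mean = x.mean()
sample_std = x.std(ddof=1)
print('sample mean:', sample_mean)
print('sample std :', sample_std)

# density=True rescales bar heights so the histogram integrates to 1.
heights, edges = np.histogram(x, bins=100, density=True)
centres = (edges[:-1] + edges[1:]) / 2

# Normal probability density evaluated at each bin centre.
pdf = np.exp(-(centres - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

print('largest |histogram - pdf|:', np.max(np.abs(heights - pdf)))
```

With more samples (and the same number of bins) the histogram heights track the density more closely.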

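#### Kullback-Leibler divergence
- Continuous: $D_{KL}(P \| Q) = \int P(x) \log \frac{P(x)}{Q(x)} \, dx$
- Discrete (used in the code below): $D_{KL}(P \| Q) = \sum_x P(x) \log \frac{P(x)}{Q(x)}$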

In [None]:
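def kl_div(dist_1, dist_2):
    # Discrete KL divergence D_KL(dist_1 || dist_2) in nats (natural log).
    # Assumes every probability in both distributions is strictly positive.
    return np.sum(dist_1 * np.log(dist_1 / dist_2))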

In [None]:
print(kl_div(dice_1,dice_2))
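KL divergence is zero when the two distributions are identical and positive otherwise, but it is not symmetric: $D_{KL}(P \| Q)$ and $D_{KL}(Q \| P)$ generally differ, which is why it is not a true distance. The sketch below repeats `kl_div` so the cell runs on its own and evaluates it in both directions. As a cross-check, `scipy.stats.entropy(pk, qk)` computes the same sum (natural log by default), though it is not used here.

```python
import numpy as np

def kl_div(p, q):
    """Discrete KL divergence D_KL(p || q) in nats; assumes all entries are > 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return np.sum(p * np.log(p / q))

fair = np.ones(6) / 6
biased = np.array([0.05, 0.4, 0.25, 0.15, 0.05, 0.1])

print(kl_div(fair, biased))  # D(fair || biased)
print(kl_div(biased, fair))  # D(biased || fair): a different number
print(kl_div(fair, fair))    # 0.0: identical distributions
```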

In [None]:
def entropy(dist_1):
    # Shannon entropy H = -sum p * log(p), in nats
    a = 0
    for i in dist_1:
        a += -i * np.log(i)
    return a  # total over all outcomes

In [None]:
print(entropy(dice_1))
print(entropy(dice_2))
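Entropy is $H(P) = -\sum_x P(x) \log P(x)$. A vectorised sketch follows; the `base` parameter is an addition for illustration (the natural log gives nats, base 2 gives bits), and zero-probability outcomes are dropped using the convention $0 \log 0 = 0$. Among distributions over six faces, entropy is largest for the fair die, where it equals $\log 6$; any bias lowers it.

```python
import numpy as np

def entropy(p, base=None):
    """Shannon entropy of a discrete distribution: nats by default, else in `base`."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # convention: 0 * log 0 = 0, so zero-probability outcomes contribute nothing
    h = -np.sum(p * np.log(p))
    return h if base is None else h / np.log(base)

fair = np.ones(6) / 6
biased = np.array([0.05, 0.4, 0.25, 0.15, 0.05, 0.1])

print(entropy(fair), np.log(6))  # equal: the uniform distribution has maximal entropy
print(entropy(biased))           # smaller than log(6)
print(entropy(fair, base=2))     # about 2.585 bits, i.e. log2(6)
```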

### Statistics: Relationship between population and samples
- Parameters (unobserved): Expectation and Variance of Population
- Statistics (observed): Mean & Variance of Samples

#### Expectation and Variance of Population

For continuous variables, the sum $\sum$ is replaced by an integral.

- $E(X) = \mu = \sum_{i} x_i P(x_i)$

- $Var(X) = \sigma^2 = E(X^2) - \{E(X)\}^2 = E(X^2) - \mu^2$
    
    where $E(X^2) = \sum_{i} x_i^2 P(x_i)$
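As a worked check of these formulas, here is the arithmetic for both dice defined earlier (fair: $P(x_i) = 1/6$; biased: probabilities 0.05, 0.4, 0.25, 0.15, 0.05, 0.1 for faces 1 to 6). The code cell below should print the same values, up to floating-point rounding.

$$
\begin{aligned}
\text{Fair die:}\quad E(X) &= \tfrac{1}{6}(1+2+3+4+5+6) = \tfrac{21}{6} = 3.5 \\
E(X^2) &= \tfrac{1}{6}(1+4+9+16+25+36) = \tfrac{91}{6} \\
Var(X) &= \tfrac{91}{6} - 3.5^2 = \tfrac{182}{12} - \tfrac{147}{12} = \tfrac{35}{12} \approx 2.917 \\[4pt]
\text{Biased die:}\quad E(X) &= 1(0.05) + 2(0.4) + 3(0.25) + 4(0.15) + 5(0.05) + 6(0.1) = 3.05 \\
E(X^2) &= 1(0.05) + 4(0.4) + 9(0.25) + 16(0.15) + 25(0.05) + 36(0.1) = 11.15 \\
Var(X) &= 11.15 - 3.05^2 = 11.15 - 9.3025 = 1.8475
\end{aligned}
$$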


In [None]:
# Expectation & variance of each die: E(X) = sum x P(x), Var(X) = E(X^2) - E(X)^2
# Face values are idx + 1 because enumerate starts counting at 0.

exp = 0
for idx, i in enumerate(dice_1):
    exp += (idx + 1) * i
print('Expectation of fair die: ', exp)

var = 0  # accumulates E(X^2)
for idx, i in enumerate(dice_1):
    var += ((idx + 1) ** 2) * i
print('Variance of fair die: ', var - (exp ** 2))

exp = 0
for idx, i in enumerate(dice_2):
    exp += (idx + 1) * i
print('Expectation of biased die: ', exp)

var = 0  # accumulates E(X^2)
for idx, i in enumerate(dice_2):
    var += ((idx + 1) ** 2) * i
print('Variance of biased die: ', var - (exp ** 2))
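The loops above can also be written as dot products between the face values and the probability vector, which avoids repeating the same loop for each die. The sketch below uses a hypothetical helper `expectation_and_variance`; it assumes the faces are numbered 1 to `len(p)`, as in the notebook.

```python
import numpy as np

def expectation_and_variance(p):
    """Hypothetical helper: E(X) and Var(X) for a die with faces 1..len(p)."""
    p = np.asarray(p, dtype=float)
    faces = np.arange(1, len(p) + 1)
    mean = np.dot(faces, p)                # E(X)   = sum_i x_i P(x_i)
    second_moment = np.dot(faces ** 2, p)  # E(X^2) = sum_i x_i^2 P(x_i)
    return mean, second_moment - mean ** 2

fair = np.ones(6) / 6
biased = np.array([0.05, 0.4, 0.25, 0.15, 0.05, 0.1])

# Expected: fair 3.5000 / 2.9167, biased 3.0500 / 1.8475
for name, p in [('fair', fair), ('biased', biased)]:
    m, v = expectation_and_variance(p)
    print(f'{name}: E(X) = {m:.4f}, Var(X) = {v:.4f}')
```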

In [None]:
# Sampling from dice_1 and dice_2

from matplotlib import pyplot as plt

num = dice_1.shape[0]
x = np.arange(num)                # bar positions 0..5
x_axis = list(range(1, num + 1))  # face labels 1..6

trials = 100
# Counts of each face in `trials` rolls (a 1-D array of length 6)
y_1 = np.random.multinomial(trials, dice_1)

plt.figure(1)
plt.bar(x, y_1)
plt.xticks(x, x_axis)  # label the bars of this figure with face values

y_2 = np.random.multinomial(trials, dice_2)

plt.figure(2)
plt.bar(x, y_2, color='orange')
plt.xticks(x, x_axis)
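With only `trials = 100` rolls, the bar charts change noticeably from run to run. The law of large numbers says the sample mean of the rolls converges to the population expectation $E(X)$ as the number of rolls grows. The sketch below illustrates this for the biased die (population mean 3.05), assuming NumPy's Generator API; the seed and the roll counts are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility
faces = np.arange(1, 7)
biased = np.array([0.05, 0.4, 0.25, 0.15, 0.05, 0.1])

sample_means = {}
for n in [100, 10_000, 1_000_000]:
    counts = rng.multinomial(n, biased)  # how often each face came up in n rolls
    sample_means[n] = np.dot(faces, counts) / n
    print(f'n = {n:>9,}: sample mean = {sample_means[n]:.4f} (population mean 3.05)')
```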

#### Mean and Variance of Samples

- $\hat{\mu} = \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

- $\hat{\sigma}^2 = s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$
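Dividing by $n-1$ instead of $n$ (Bessel's correction) makes $s^2$ an unbiased estimator: averaged over many samples it equals the population variance $\sigma^2$, whereas dividing by $n$ underestimates it by a factor of $(n-1)/n$. The simulation below sketches this with the fair die ($\sigma^2 = 35/12$), using the `ddof` argument of `np.var` (`ddof=0` divides by $n$, `ddof=1` by $n-1$). The sample size, number of repeats, and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
n = 5                # small sample size makes the bias easy to see
repeats = 200_000    # number of independent samples of size n

# Each row is one sample of n rolls of a fair die (faces 1..6; high is exclusive).
rolls = rng.integers(1, 7, size=(repeats, n))

avg_biased = np.var(rolls, axis=1, ddof=0).mean()    # divide by n
avg_unbiased = np.var(rolls, axis=1, ddof=1).mean()  # divide by n - 1

print('population variance :', 35 / 12)
print('average with 1/n    :', avg_biased)    # close to (n-1)/n * 35/12
print('average with 1/(n-1):', avg_unbiased)  # close to 35/12
```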

In [None]:
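# Mean of samples: each face value weighted by how often it was rolled,
# divided by the number of rolls

exp = 0
for idx, i in enumerate(y_1):
    print(idx + 1, i)  # face value, count
    exp += (idx + 1) * i
print('Sample mean of fair die: ', exp / trials)

exp = 0
for idx, i in enumerate(y_2):
    exp += (idx + 1) * i
print('Sample mean of biased die: ', exp / trials)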
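Dividing the weighted sum by `trials` works because the counts add up to the number of rolls. NumPy's `np.average` with a `weights` argument computes the same weighted mean. The sketch below uses made-up counts (not output from the cells above) so the result is fixed.

```python
import numpy as np

faces = np.arange(1, 7)
counts = np.array([10, 20, 30, 20, 10, 10])  # hypothetical counts from 100 rolls

# Sample mean from counts: sum(face * count) / number of rolls
mean_dot = np.dot(faces, counts) / counts.sum()
# np.average with weights computes the same weighted mean
mean_avg = np.average(faces, weights=counts)

print(mean_dot, mean_avg)  # 3.3 3.3
```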

## **Q. Variance of Samples**