# Probability Distributions, Gaussian and Uniform
stough 202-

In the [matplotlib tutorial](./matplotlib_tutorial.ipynb) we saw how to produce a histogram of color distribution in an image. These histograms represent the [probability distribution](https://en.wikipedia.org/wiki/Probability_distribution) on color; that is, the probability of a particular color in an image corresponds to the height of the histogram at that color (given the histogram of that image). 

In this demo we'll look at Uniform and Gaussian probability distributions, and what corresponding "random" images look like.

- [Uniform](#uniform)
- [Gaussian](#gaussian)

## Imports
We'll produce random numbers using numpy's updated random-number-generation techniques. Read more about it [here](https://numpy.org/doc/stable/reference/random/index.html).

In [None]:
%matplotlib widget
import matplotlib.pyplot as plt
import numpy as np

# For importing from alternative directory sources
import sys  
sys.path.insert(0, '../dip_utils')

import matrix_utils
from vis_utils import vis_rgb_cube

In [None]:
from numpy.random import default_rng
rng = default_rng() # could add parameter "seed="

In [None]:
rng

At this point the object `rng` is our random number generator. Because we passed no seed, it has been seeded with fresh entropy from the operating system, so it produces a different random sequence on each run through this notebook. You could instead pass a seed to `default_rng` to get the same random sequence every time (good for debugging).
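
As a small sketch of seeding, two generators built from the same seed produce the same sequence. The seed value 42 and the names `rng_a`, `rng_b` are arbitrary choices for illustration.

```python
import numpy as np
from numpy.random import default_rng

# Two generators built from the same (arbitrary) seed.
rng_a = default_rng(seed=42)
rng_b = default_rng(seed=42)

a = rng_a.integers(0, high=256, size=5)
b = rng_b.integers(0, high=256, size=5)

# Same seed, same sequence.
print(a, b, np.array_equal(a, b))

# An unseeded generator draws fresh entropy, so this will almost surely
# differ from run to run.
c = default_rng().integers(0, high=256, size=5)
print(c)
```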

<a id='uniform'></a>
## Uniform Random Distribution 
When you think of the word "random," you probably mean that any outcome of the "random" process is equally likely. Let's say we're generating integers between 0 and 255 (inclusive). We would expect over many samples that no number is a lot more likely than any other. 

In [None]:
X = rng.integers(0, high=256, size=2**20)

In [None]:
plt.figure(figsize=(6,3))
unif_counts, _, _ = plt.hist(X, bins = np.arange(257));

In [None]:
matrix_utils.arr_info(X)

In [None]:
X.mean()

In the above we generate about 1M samples in the range $[0,255]$. In fact we generate $2^{20} = 1048576$ elements. Since these random numbers are split among $256 = 2^{8}$ possibilities, on average we should get $2^{20}/2^{8} = 2^{12} = 4096$ hits for each output integer. What makes the randomness *uniform* is that all of the bins have about the same number of outcomes, hovering near 4K as expected.
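
The arithmetic above can be checked directly. This sketch uses its own seeded generator (`rng_demo`, with an arbitrary seed) and `np.bincount` to count hits per value. The binomial standard deviation is included only as a rough yardstick for how much "hovering" to expect.

```python
import numpy as np
from numpy.random import default_rng

rng_demo = default_rng(seed=0)   # arbitrary seed, for repeatability only
n_samples, n_values = 2**20, 2**8

samples = rng_demo.integers(0, high=n_values, size=n_samples)
counts = np.bincount(samples, minlength=n_values)  # hits for each of 0..255

expected = n_samples / n_values                    # 2**12 = 4096
# Each count is Binomial(n, p) with p = 1/256, so its spread is
# roughly sqrt(n * p * (1 - p)).
p = 1 / n_values
spread = np.sqrt(n_samples * p * (1 - p))

print(expected, counts.mean(), counts.min(), counts.max())
print(f"expected spread ~ {spread:.1f}, observed std = {counts.std():.1f}")
```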

- A small complication in generating the plot: [`plt.hist`](https://matplotlib.org/3.3.3/api/_as_gen/matplotlib.pyplot.hist.html) expects the `bins` argument to include both the left edge of the first bin *and* the right edge of the last bin. Since [`arange`](https://numpy.org/doc/stable/reference/generated/numpy.arange.html) provides the sequence $[0, N)$, excluding $N$, we call it with 257 so that the edges run over $[0,...,256]$, giving 256 bins.
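
To see why the extra edge matters, here is a sketch using `np.histogram`, which follows the same bin-edge convention as `plt.hist`: every bin is half-open except the last, which includes its right edge. The generator `rng_demo` and its seed are arbitrary.

```python
import numpy as np
from numpy.random import default_rng

rng_demo = default_rng(seed=0)   # arbitrary seed
samples = rng_demo.integers(0, high=256, size=2**16)

# 257 edges -> 256 unit-width bins, one per integer value 0..255.
counts_ok, edges_ok = np.histogram(samples, bins=np.arange(257))

# 256 edges -> only 255 bins; the last bin [254, 255] is closed on the
# right, so the values 254 and 255 are lumped together.
counts_short, edges_short = np.histogram(samples, bins=np.arange(256))

print(len(edges_ok), len(counts_ok))        # 257 256
print(len(edges_short), len(counts_short))  # 256 255
print(counts_short[-1] == counts_ok[254] + counts_ok[255])  # True
```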

Now let's generate an image of uniform random colors.

In [None]:
I_uniform = np.stack([rng.integers(0,256,2**14).reshape(2**7,2**7) for i in range(3)], axis=2)

In [None]:
plt.figure(figsize=(3,3))
plt.imshow(I_uniform);

In [None]:
matrix_utils.arr_info(I_uniform.ravel())

The above is what a truly random image looks like. 

- Is it anything like the images we ourselves capture, communicate to each other, or attempt to study? 
- What is missing from it?
- What does this say about the kinds of images that we actually care about?

**Viewing in the Color Cube Space**

If any possible color is equally likely to occur, as in the image above, what would that look like in the color cube visualization we saw in the [matplotlib_tutorial](./matplotlib_tutorial.ipynb#3D-Plotting)? Think about it, then execute the next cell.

In [None]:
vis_rgb_cube(I_uniform)

<a id='gaussian'></a>
## Gaussian Random Distribution

There is a different kind of random that is more practically interesting in the world: the [Gaussian, or Normal, Distribution](https://en.wikipedia.org/wiki/Normal_distribution), also called the Bell Curve. Many phenomena we deal with every day [closely follow the normal distribution](https://www.mathsisfun.com/data/standard-normal-distribution.html), including heights, measurement errors, blood pressure, and test scores. Here we'll take a look at a couple of Gaussian-distributed samples.

As an aside, most everything that we try to model with a Gaussian is better modelled with some sort of skew normal, [Weibull](https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.weibull.html), or [Pareto](https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.pareto.html) distribution. These distributions have a heavier tail on one side than the other, which is common when there is some physical limit on the measurement. This is a complication we'll concern ourselves with only when relevant.

In [None]:
X = rng.standard_normal(100000)

In [None]:
plt.figure(figsize=(6,3))
plt.hist(X, bins=100);

In [None]:
X.mean(), X.var()

Above we sample from a [`standard_normal`](https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.standard_normal.html), which you can see has a mean of about 0 and a variance (average squared distance from the mean) of about 1, matching the distribution symbolized with $\mathcal{N}(\mu = 0,\sigma^{2} = 1)$. We can reposition this kind of distribution anywhere we want by:

- adding a constant, to change the mean: $C + \mathcal{N}(\mu,\sigma^{2}) \sim \mathcal{N}(C + \mu,\sigma^{2})$
- multiplying by a constant, to change the variance (the mean scales along with it): $K*\mathcal{N}(\mu,\sigma^{2}) \sim \mathcal{N}(K\mu,K^2\sigma^{2})$. For the standard normal $\mu = 0$, so only the variance changes.
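
A quick numerical check of these rules, starting from standard normal samples (so $\mu = 0$ and $\sigma^{2} = 1$). The constants `C = 128` and `K = 30` are illustrative values for an 8-bit intensity range, and the seed is arbitrary.

```python
import numpy as np
from numpy.random import default_rng

rng_demo = default_rng(seed=0)   # arbitrary seed
X = rng_demo.standard_normal(100_000)

C, K = 128, 30
Y = C + K * X          # scale by K, then shift by C

print(X.mean(), X.var())   # near 0 and 1
print(Y.mean(), Y.var())   # near C = 128 and K**2 = 900
```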

Using the above rules, we could make a Gaussian distributed image in the range $[0,255]$. We can also use numpy's [`normal`](https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.normal.html) to do the same thing.

In [None]:
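# Similar idea using our rules.
# X_im = (128 + 30*X).clip(0,255) 
I_normal = np.stack([rng.normal(128,30,2**14).reshape(2**7,2**7) for i in range(3)], axis=2)
# Clip to [0,255] *before* casting, so out-of-range values saturate instead of wrapping around.
I_normal = I_normal.clip(0,255).astype(np.uint8)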

In [None]:
plt.figure(figsize=(6,3))
norm_counts, _, _ = plt.hist(I_normal.ravel(), np.arange(257));

In [None]:
f, ax = plt.subplots(1,2, figsize=(6,3), sharex=True, sharey=True)
ax[0].imshow(I_uniform)
ax[0].set_title('Uniform Random')
ax[1].imshow(I_normal)
ax[1].set_title('Gauss Random');

# Hide the tick marks and labels on both images.
for a in ax:
    a.get_xaxis().set_visible(False)
    a.get_yaxis().set_visible(False)
plt.tight_layout()

In the above cells, we plot the distribution of the Gaussian normal image (`I_normal`) and then show both the uniform (`I_uniform`) and normal images side by side. 

You should immediately notice that the "Gauss Random" image is much more bland, or muted in its tones, than the "Uniform Random" image. This kind of difference is related to [*contrast*](https://en.wikipedia.org/wiki/Contrast_(vision)). Contrast measures how distinguishable different image parts are from one another. In `I_normal`, most pixel intensities in red, green, and blue hover around 128. This results in a middling gray tone to the whole image, with neighboring pixels only mildly distinguishable from one another, especially compared to `I_uniform`.
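
One rough way to put a number on this difference is the standard deviation of the pixel intensities (sometimes called RMS contrast). That choice of measure is an assumption of this sketch, not something defined earlier in this notebook, and the images are regenerated here with an arbitrary seed so the cell stands alone.

```python
import numpy as np
from numpy.random import default_rng

rng_demo = default_rng(seed=0)   # arbitrary seed
shape = (2**7, 2**7, 3)

img_uniform = rng_demo.integers(0, 256, size=shape)
img_normal = rng_demo.normal(128, 30, size=shape).clip(0, 255)

# Standard deviation of intensities: larger means pixels differ more
# from one another on average.
contrast_uniform = img_uniform.std()
contrast_normal = img_normal.std()

print(f"uniform: {contrast_uniform:.1f}, normal: {contrast_normal:.1f}")
```

For integers drawn uniformly from 0 to 255 the theoretical value is $\sqrt{(256^2 - 1)/12} \approx 73.9$, while the Gaussian image stays near the 30 we asked for.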

**Color Cube Again**

So what would `I_normal` look like in the color cube space, with more pixels hovering in the middle of their component ranges rather than equally all over the place? Think about it, then execute the next cell.

In [None]:
vis_rgb_cube(I_normal)

### Plotting the two distributions together
A little bit of matplotlib-ary to show the two distributions together without a y-axis.

In [None]:
#Normalize the heights.
norm_counts = norm_counts/norm_counts.sum()
unif_counts = unif_counts/unif_counts.sum()
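
As a cross-check on this kind of normalization, `np.histogram` with `density=True` gives the same heights when the bins have unit width, since the density is the count divided by both the total and the bin width. The sketch below uses fresh samples (`samples`, arbitrary seed) rather than the counts computed above.

```python
import numpy as np
from numpy.random import default_rng

rng_demo = default_rng(seed=0)   # arbitrary seed
samples = rng_demo.integers(0, high=256, size=2**16)
edges = np.arange(257)           # unit-width bins

counts, _ = np.histogram(samples, bins=edges)
manual = counts / counts.sum()   # heights now sum to 1
density, _ = np.histogram(samples, bins=edges, density=True)

print(manual.sum(), np.allclose(manual, density))
```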

In [None]:
f, ax1 = plt.subplots(figsize=(7,3))
ax1.bar(np.arange(256), unif_counts, alpha=.5)

ax1.bar(np.arange(256), norm_counts, alpha=.5);
ax1.get_yaxis().set_visible(False)
plt.tight_layout()