Probability theory allows us to make uncertain statements and to reason
in the presence of uncertainty. Information theory allows us to
quantify the amount of uncertainty in a probability distribution.

# 3.1 Why Probability?

Machine learning must deal with uncertainty and stochasticity.

Frequentist probability relates directly to the rates at which events
occur. Bayesian probability relates to qualitative levels of
certainty (degrees of belief).
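
As a rough illustration of the frequentist reading, the sketch below simulates repeated trials and reports the relative frequency of an event, which should settle near the event's rate as the number of trials grows. The rate `p = 0.3`, the seed, and the helper name `relative_frequency` are assumptions made for this example.

```julia
using Random

# Hypothetical helper: fraction of successes among n simulated trials,
# each succeeding with probability p.
function relative_frequency(rng, p, n)
    hits = count(_ -> rand(rng) < p, 1:n)
    return hits / n
end

rng = MersenneTwister(1234)   # assumed seed, for reproducibility only
p = 0.3                       # assumed "true" rate of the event

for n in (10, 1_000, 100_000)
    println("n = ", n, "  relative frequency = ", relative_frequency(rng, p, n))
end
```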

# 3.2 Random Variables

In [1]:
using Pkg
Pkg.add("Distributions")

[32m[1m  Updating[22m[39m registry at `~/.julia/registries/General`


[32m[1m  Updating[22m[39m git-repo `https://github.com/JuliaRegistries/General.git`
[?25l[2K

[?25h[32m[1m Resolving[22m[39m

 package versions...


[32m[1m  Updating[22m[39m `~/.julia/environments/v1.0/Project.toml`


[90m [no changes][39m
[32m[1m  Updating[22m[39m `~/.julia/environments/v1.0/Manifest.toml`
[90m [no changes][39m


In [1]:
# https://juliastats.github.io/Distributions.jl/latest/starting.html
using Random, Distributions
Random.seed!(666)

MersenneTwister(UInt32[0x0000029a], …)

In [2]:
d = Normal()
x = rand(d, 10)

10-element Array{Float64,1}:
  0.0076177742633394985
  0.05702524809146323  
 -0.5484010117038763   
  0.3515255484089337   
 -0.784929798992992    
  0.24501763591278475  
  0.30611493207760276  
 -1.0304943902767967   
 -0.21629784327601548  
 -2.132446790112065    

In [3]:
rand(Cauchy(1), 10)

10-element Array{Float64,1}:
 -0.4237134662306563 
  1.9982078834292953 
 -6.925610565091822  
 -0.42168711834161665
  0.9555525938474783 
  1.7599938921654399 
  1.7950911590950587 
  2.49918091211192   
 -1.6661992395358336 
 -0.7560595129104921 

In [4]:
# draw 10 samples from a discrete distribution: Binomial(n = 100, p = 0.25)
rand(Binomial(100, 0.25), 10)

10-element Array{Int64,1}:
 18
 25
 22
 18
 27
 21
 27
 29
 21
 24

# 3.3 Probability Distributions

## 3.3.1 Discrete Variables and Probability Mass Functions

A probability distribution over discrete variables may be described
using a probability mass function (PMF). In Distributions.jl, `pdf`
applied to a discrete distribution evaluates this mass function.

In [5]:
pdf(Bernoulli(0.2), 0)

0.8
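
A PMF must give every state a value between 0 and 1, and the values over all states must sum to 1. The sketch below checks both properties numerically for a binomial distribution. The parameters `n = 100` and `p = 0.25` are reused from the sampling cell above, and the variable names are illustrative.

```julia
using Distributions

n, p = 100, 0.25
d = Binomial(n, p)

# For discrete distributions, pdf(d, k) returns the probability mass P(x = k).
masses = [pdf(d, k) for k in 0:n]

# A valid PMF assigns each state a value in [0, 1] ...
all_in_range = all(m -> 0 <= m <= 1, masses)

# ... and the masses over all states sum to 1 (up to floating-point error).
total = sum(masses)

println("all in [0, 1]: ", all_in_range)
println("sum of masses ≈ 1: ", isapprox(total, 1.0; atol = 1e-10))
```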

## 3.3.2 Continuous Variables and Probability Density Functions
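A probability density function (PDF) $p(x)$ describes a continuous random variable. It must be non-negative and integrate to 1:
$\int p(x) \, \mathrm{d}x = 1$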

In [6]:
# A discrete variable x with k states has a uniform distribution when
# each state is equally likely, i.e. P(x = x_i) = 1/k. The continuous
# analogue on [a, b] has constant density 1/(b - a); for Uniform(0, 1)
# that density is 1 everywhere inside the interval.
pdf(Uniform(0,1), 0.5)

1.0

In [7]:
# https://en.wikipedia.org/wiki/Von_Mises_distribution
pdf(VonMises(0, .2), 3)

0.1292701761195721
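
The normalization condition for a density can be checked numerically. The following is a minimal sketch that approximates the integral of the standard normal density with a midpoint rule. The helper `midpoint_integral`, the interval `[-8, 8]`, and the number of cells are assumptions chosen for illustration. It also shows that a density value, unlike a probability mass, may exceed 1.

```julia
using Distributions

# Midpoint-rule approximation of the integral of f over [a, b] with n cells.
function midpoint_integral(f, a, b, n)
    h = (b - a) / n
    return h * sum(f(a + (i - 0.5) * h) for i in 1:n)
end

# The standard normal has almost all of its mass inside [-8, 8].
area = midpoint_integral(x -> pdf(Normal(), x), -8.0, 8.0, 10_000)
println("integral of the Normal(0, 1) density ≈ ", area)

# A density is not a probability: it may exceed 1 at a point,
# as long as the total area is still 1.
println("pdf(Uniform(0, 0.5), 0.25) = ", pdf(Uniform(0, 0.5), 0.25))
```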

# 3.4 Marginal Probability

The marginal probability distribution is the probability distribution over a subset of the variables.

In [21]:
# Trying to make an example showing the sum rule
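
One possible way to show the sum rule, $P(x) = \sum_y P(x, y)$, is with a small joint table. The numbers in `joint` below are made up for illustration and are not taken from any data set; only base Julia is used.

```julia
# Hypothetical joint distribution P(x, y): rows index x (3 states),
# columns index y (2 states). The entries are made up and sum to 1.
joint = [0.10 0.20;
         0.25 0.15;
         0.05 0.25]

# Sum rule: P(x) = sum over y of P(x, y), and likewise for P(y).
p_x = vec(sum(joint, dims = 2))   # marginal over x
p_y = vec(sum(joint, dims = 1))   # marginal over y

println("P(x) = ", p_x)
println("P(y) = ", p_y)
```

Each marginal is itself a distribution, so `sum(p_x)` and `sum(p_y)` should both be 1 up to rounding.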

# 3.5 Conditional Probability

Conditional probability is the probability of some event given that some other event has happened: $P(y \mid x) = P(x, y) / P(x)$, defined only when $P(x) > 0$.
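
As a small sketch of this definition, a conditional distribution can be computed from a joint table by dividing each entry by the matching marginal. The table `joint` is a made-up example, and the variable names are illustrative.

```julia
# Hypothetical joint table P(x, y); rows index x, columns index y.
joint = [0.10 0.20;
         0.25 0.15;
         0.05 0.25]

p_x = sum(joint, dims = 2)        # 3×1 column of marginals P(x)

# P(y | x) = P(x, y) / P(x): divide each row by its marginal.
# Only valid because every P(x) here is strictly positive.
p_y_given_x = joint ./ p_x

println(p_y_given_x)
# Each row is a distribution over y, so each row sums to 1.
println(sum(p_y_given_x, dims = 2))
```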

In [35]:
using Pkg
Pkg.add("SpecialFunctions")

[32m[1m  Updating[22m[39m registry at `~/.julia/registries/General`


[32m[1m  Updating[22m[39m git-repo `https://github.com/JuliaRegistries/General.git`
[?25l    





[32m[1m Resolving[22m[39m package versions...


[32m[1m  Updating[22m[39m `~/.julia/environments/v1.0/Project.toml`


 [90m [276daf66][39m[92m + SpecialFunctions v0.7.2[39m
[32m[1m  Updating[22m[39m `~/.julia/environments/v1.0/Manifest.toml`
[90m [no changes][39m


In [37]:
using SpecialFunctions

# example from p66: https://people.smp.uq.edu.au/YoniNazarathy/julia-stats/exploring-julia-a-statistical-primer-DRAFT.pdf
n = 2000

# Let A be the event of a manufacturing failure, assumed to depend on the
# number of dust particles k: P(A | B = k)
probAGivenB(k) = 1 - 1 / (k+1)

# Probability distribution of the number of dust particles: P(B = k)
probB(k) = 6 / (pi * (k + 1)) ^ 2

# Approximate the infinite series by truncating it at k = n = 2000
numerical = sum([ probAGivenB(k) * probB(k) for k in 0:n])

analytical = (pi^2 - 6 * zeta(3)) / pi ^ 2

# Compare the numerical and analytical values of the probability of a manufacturing failure
numerical, analytical

(0.26893337073278945, 0.2692370305985609)
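
The closed form used for `analytical` follows from the law of total probability together with $\zeta(2) = \pi^2 / 6$. A sketch of the derivation, writing $B = k$ for the event that there are $k$ dust particles:

```latex
\begin{aligned}
P(A) &= \sum_{k=0}^{\infty} P(A \mid B = k)\, P(B = k) \\
     &= \sum_{k=0}^{\infty} \left(1 - \frac{1}{k+1}\right) \frac{6}{\pi^2 (k+1)^2} \\
     &= \frac{6}{\pi^2} \left( \sum_{k=0}^{\infty} \frac{1}{(k+1)^2} - \sum_{k=0}^{\infty} \frac{1}{(k+1)^3} \right) \\
     &= \frac{6}{\pi^2} \bigl( \zeta(2) - \zeta(3) \bigr)
      = \frac{\pi^2 - 6\,\zeta(3)}{\pi^2}
\end{aligned}
```

The numerical value is slightly smaller than the analytical one because the series is truncated at $k = 2000$. The neglected tail is roughly $6 / (\pi^2 \cdot 2001) \approx 0.0003$, which matches the gap between the two printed numbers.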

# 3.6 The Chain Rule of Conditional Probabilities
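
The chain rule factorizes a joint distribution into conditionals, for example $P(a, b, c) = P(a \mid b, c)\, P(b \mid c)\, P(c)$. The sketch below checks this identity numerically on a made-up joint distribution over three binary variables. The weights and variable names are assumptions for illustration.

```julia
# Hypothetical joint distribution P(a, b, c) over three binary variables,
# stored as a 2×2×2 array indexed [a, b, c]. The weights are made up.
weights = reshape(Float64[1, 2, 3, 4, 5, 6, 7, 8], 2, 2, 2)
joint = weights ./ sum(weights)

p_c  = sum(joint, dims = (1, 2))   # P(c),    size 1×1×2
p_bc = sum(joint, dims = 1)        # P(b, c), size 1×2×2

p_a_given_bc = joint ./ p_bc       # P(a | b, c)
p_b_given_c  = p_bc ./ p_c         # P(b | c)

# Chain rule: P(a, b, c) = P(a | b, c) * P(b | c) * P(c)
rebuilt = p_a_given_bc .* p_b_given_c .* p_c

println("chain rule reproduces the joint: ", rebuilt ≈ joint)
```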
# 3.7 Independence and Conditional Independence
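
Two variables are independent when their joint distribution factorizes as $P(x, y) = P(x)\, P(y)$. A minimal sketch of a numerical check follows. The tables and the helper `is_independent` are hypothetical and written only for this example; conditional independence is not covered by it.

```julia
# Two hypothetical marginals (made-up numbers).
p_x = [0.2, 0.5, 0.3]
p_y = [0.4, 0.6]

# If x and y are independent, the joint is the outer product P(x) * P(y).
independent_joint = p_x * p_y'

# A hypothetical joint that does not factorize.
dependent_joint = [0.10 0.20;
                   0.25 0.15;
                   0.05 0.25]

# Hypothetical helper: test whether a joint table equals the outer
# product of its own marginals.
function is_independent(joint; atol = 1e-12)
    px = sum(joint, dims = 2)   # column of P(x)
    py = sum(joint, dims = 1)   # row of P(y)
    return isapprox(joint, px * py; atol = atol)
end

println(is_independent(independent_joint))
println(is_independent(dependent_joint))
```

The first table is built as an outer product, so the check should succeed for it and fail for the second table.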
