# Learning Gaussian Mixture Models with RPCircuits

In [1]:
using RPCircuits, Random, Distributions

Random.seed!(42)

TaskLocalRNG()

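First, we create a mixture of three Gaussians using the `Distributions` package. The components have means `-2.0`, `0.0`, and `3.0`, standard deviations `1.2`, `1.0`, and `2.5` (the second argument of `Normal` is the standard deviation, not the variance), and mixture weights `0.2`, `0.5`, and `0.3`.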

In [2]:
gmm = MixtureModel(Normal[
   Normal(-2.0, 1.2),
   Normal(0.0, 1.0),
   Normal(3.0, 2.5)], [0.2, 0.5, 0.3])

MixtureModel{Normal}(K = 3)
components[1] (prior = 0.2000): Normal{Float64}(μ=-2.0, σ=1.2)
components[2] (prior = 0.5000): Normal{Float64}(μ=0.0, σ=1.0)
components[3] (prior = 0.3000): Normal{Float64}(μ=3.0, σ=2.5)
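
For reference, the density of this mixture can be written out from the weights and the `Normal(μ, σ)` parameters printed above:

```latex
p(x) = \sum_{k=1}^{3} w_k \, \mathcal{N}(x \mid \mu_k, \sigma_k^2)
     = 0.2\,\mathcal{N}(x \mid -2.0, 1.2^2)
     + 0.5\,\mathcal{N}(x \mid 0.0, 1.0^2)
     + 0.3\,\mathcal{N}(x \mid 3.0, 2.5^2)
```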


Then, we generate a dataset `D` with `N` samples drawn from this mixture, reshaped into an `N×1` matrix (one sample per row).

In [3]:
N = 100_000

samples = rand(gmm, N)

D = reshape(samples, length(samples), 1)

100000×1 Matrix{Float64}:
 -0.873793482209626
  2.818545329885325
  1.4386832757114134
 -1.175371133217587
  0.9523284631212727
 -2.4491221222621453
  0.20092508863895137
 -1.5308390628427373
  0.6489473612718025
  0.4860050240607374
 -2.247935737377431
  0.10477079178892111
  1.3110173557334994
  ⋮
  0.8594637451805943
  1.3584033067137333
  0.7605620037727235
 -1.2762137902064086
  2.679433979161
 -1.3598523470083919
  5.188015849234176
  4.3694021361772934
  4.6825452918826915
 -0.544051909308209
 -2.169728584197887
 -0.3504598914029526
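
Sampling from a mixture can be thought of as a two-step process: pick a component according to the weights, then sample from that component. The sketch below illustrates this in plain Julia. `sample_mixture` is a hypothetical helper written for this note, not the routine `Distributions` uses internally.

```julia
using Random

# Hypothetical helper: draw one sample from a 1-D Gaussian mixture by
# (1) picking a component k with probability weights[k], then
# (2) sampling from a Gaussian with mean means[k] and standard deviation stds[k].
function sample_mixture(rng, weights, means, stds)
    u = rand(rng)
    # Fall back to the last component if rounding leaves cumsum slightly below 1.
    k = something(findfirst(>=(u), cumsum(weights)), length(weights))
    return means[k] + stds[k] * randn(rng)
end

rng = MersenneTwister(42)
weights, means, stds = [0.2, 0.5, 0.3], [-2.0, 0.0, 3.0], [1.2, 1.0, 2.5]
xs = [sample_mixture(rng, weights, means, stds) for _ in 1:100_000]

# The sample mean should be close to the mixture mean sum(weights .* means) = 0.5.
println("sample mean = ", sum(xs) / length(xs))
```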

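Using `RPCircuits`, we create three Gaussian nodes `G1`, `G2`, and `G3` with the same parameter values as the components of `gmm`, and combine them in a sum node `S` with the same weights. Then, we apply the `NLL` function to compute the negative log-likelihood of `S` w.r.t. `D`. Note that `Normal` is parameterized by the standard deviation; if the third argument of `Gaussian` is a variance, the values `1.2` and `2.5` make the components of `S` narrower than those of `gmm` rather than identical to them.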

In [4]:
G1, G2, G3 = Gaussian(1, -2.0, 1.2), Gaussian(1, 0.0, 1.0), Gaussian(1, 3.0, 2.5)
S = Sum([G1, G2, G3], [0.2, 0.5, 0.3])
println("Original model NLL = ", NLL(S, D))

Original model NLL = 2.2722570690152324
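
The sketch below shows one way to compute an average negative log-likelihood for a 1-D Gaussian mixture by hand. It assumes that `NLL` averages over the rows of `D` (the magnitude of the value printed above is consistent with a per-sample average, but this is not confirmed here). `gauss_logpdf`, `mixture_logpdf`, and `avg_nll` are hypothetical helper names, not part of `RPCircuits`.

```julia
# Log-density of a 1-D Gaussian with mean μ and standard deviation σ.
gauss_logpdf(x, μ, σ) = -0.5 * log(2π * σ^2) - (x - μ)^2 / (2σ^2)

# Log-density of a mixture, using the log-sum-exp trick for numerical stability.
function mixture_logpdf(x, weights, means, stds)
    terms = log.(weights) .+ gauss_logpdf.(x, means, stds)
    m = maximum(terms)
    return m + log(sum(exp.(terms .- m)))
end

# Average negative log-likelihood of the data under the mixture.
avg_nll(data, weights, means, stds) =
    -sum(mixture_logpdf(x, weights, means, stds) for x in data) / length(data)

weights, means, stds = [0.2, 0.5, 0.3], [-2.0, 0.0, 3.0], [1.2, 1.0, 2.5]
println("average NLL on three points = ", avg_nll([-0.87, 2.82, 1.44], weights, means, stds))
```

Working in log space avoids underflow when a sample lies far from every component.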


Now, we create a sum node `Sem` over three Gaussian nodes whose means, variances, and weights differ from those of `gmm`. Then, we apply the `EM` algorithm for 50 iterations to learn a better fit to `D`.

In [5]:
G1em, G2em, G3em = Gaussian(1, -1.0, 1.0), Gaussian(1, 0.0, 1.0), Gaussian(1, 1.0, 1.0)
Sem = Sum([G1em, G2em, G3em], [1/3, 1/3, 1/3])

Lem = SEM(Sem; gauss=true)

println("EM initial NLL = ", NLL(Sem, D))

for i = 1:50
    update(Lem, D; learngaussians=true, verbose=false)
end

println("EM final NLL = ", NLL(Sem, D))

println("Sem = $Sem")
println("G1em = $G1em")
println("G2em = $G2em")
println("G3em = $G3em")

EM initial NLL = 3.3272542282069226
EM final NLL = 2.2201693104383793
Sem = + 1 0.3336793753362203 2 0.3477707524950897 3 0.31854987216868996
G1em = gaussian 1 -1.34337243854077 1.9032462644722008
G2em = gaussian 1 0.14718316991449149 0.8082957002863795
G3em = gaussian 1 2.8565047848037755 6.603907557624629
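
To make the idea behind the updates concrete, here is a minimal sketch of the textbook EM iteration for a 1-D Gaussian mixture. It is not the `RPCircuits` implementation: `gauss_pdf`, `em_step`, and `run_em` are hypothetical helpers, and the toy data is made up for illustration.

```julia
# Density of a 1-D Gaussian with mean μ and variance v.
gauss_pdf(x, μ, v) = exp(-(x - μ)^2 / (2v)) / sqrt(2π * v)

# One EM iteration for a 1-D Gaussian mixture.
function em_step(data, weights, means, vars)
    K, N = length(weights), length(data)
    # E-step: resp[n, k] is the posterior probability that sample n came from component k.
    resp = [weights[k] * gauss_pdf(data[n], means[k], vars[k]) for n in 1:N, k in 1:K]
    resp ./= sum(resp; dims = 2)
    # M-step: re-estimate the parameters from the responsibility-weighted samples.
    Nk = vec(sum(resp; dims = 1))
    new_weights = Nk ./ N
    new_means = [sum(resp[:, k] .* data) / Nk[k] for k in 1:K]
    new_vars = [sum(resp[:, k] .* (data .- new_means[k]) .^ 2) / Nk[k] for k in 1:K]
    return new_weights, new_means, new_vars
end

# Repeat the EM iteration a fixed number of times.
function run_em(data, w, μ, v; iters = 20)
    for _ in 1:iters
        w, μ, v = em_step(data, w, μ, v)
    end
    return w, μ, v
end

# Toy data: two tight clusters around -2 and 2.
data = [-2.1, -1.9, -2.0, 1.9, 2.0, 2.1]
w, μ, v = run_em(data, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
println("weights = $w, means = $μ, variances = $v")
```

Each iteration cannot decrease the likelihood of the data, which matches the drop in NLL seen above, but EM only finds a local optimum, so the learned components need not match the generating ones.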


As in the example above, we create a sum node `Sgrad` over three Gaussian nodes whose parameters differ from those of `gmm`. This time, however, we learn the parameters with gradient descent, running 1,000 updates with a learning rate of `0.01`, and normalize the circuit before evaluating the final NLL.

In [None]:
G1grad, G2grad, G3grad = Gaussian(1, -1.0, 1.0), Gaussian(1, 0.0, 1.0), Gaussian(1, 1.0, 1.0)
# Initialize the weights with small positive values (truncated below at 0.1)
w = rand(truncated(Normal(0, 0.1), 0.1, Inf), 3)
Sgrad = Sum([G1grad, G2grad, G3grad], w)

Lgrad = Gradient(Sgrad, gauss=true)

println("Grad initial NLL = ", NLL(Sgrad, D))

for i = 1:1_000
    update(Lgrad, D; learningrate=0.01, learngaussians=true, verbose=false)
end

#norm_V = Vector{Float64}(undef, length(Sgrad))
# TODO: remove this and normalize the circuit
#norm_const = RPCircuits.log_norm_const!(norm_V,Lgrad.circ.C)

RPCircuits.normalize_circuit!(Sgrad; gauss=true)

println("Grad final NLL = ", NLL(Sgrad, D))
println("Sgrad = $Sgrad")
println("G1grad = $G1grad")
println("G2grad = $G2grad")
println("G3grad = $G3grad")
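
The weights `w` sampled above are not constrained to sum to one, which is why the circuit is normalized before the final NLL is computed. The sketch below shows the effect on a single sum node, assuming normalization amounts to dividing the weights by their sum. What `normalize_circuit!` does internally is not shown in this notebook, and `normalize_weights` and the weight values are hypothetical.

```julia
# Rescale sum-node weights so that they form a probability distribution.
normalize_weights(w) = w ./ sum(w)

w = [0.13, 0.21, 0.16]  # hypothetical unnormalized weights
w_norm = normalize_weights(w)
println("normalized weights = ", w_norm, " (sum = ", sum(w_norm), ")")

# Dividing all weights by Z = sum(w) divides the mixture density by Z,
# so every per-sample log-likelihood changes by -log(Z).
println("log-likelihood shift = ", -log(sum(w)))
```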

In [None]:
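using PGFPlotsX  # provides @pgf, Axis, Plot, and Table

# Rebuild the EM-learned mixture as a `Distributions` object.
# Assumption: the nodes expose `mean`, `variance`, and `weights` fields.
d = MixtureModel([Normal(g.mean, sqrt(g.variance)) for g in (G1em, G2em, G3em)], Sem.weights)
# Plot over a range covering the 1%-99% quantiles of both the true and the learned mixture.
lo, hi = quantile.(gmm, [0.01, 0.99])
dlo, dhi = quantile.(d, [0.01, 0.99])
x = range(min(lo, dlo), max(hi, dhi); length = 1_000)
@pgf Axis(Plot({thick, blue}, Table(x, pdf.(d, x))))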