## Clustering and collaborative filtering (via clustering) algorithms

- [Importable source code (most up-to-date version)](https://github.com/sylvaticus/lmlj.jl/blob/master/src/clusters.jl) - [Julia Package](https://github.com/sylvaticus/lmlj.jl)
- [Demonstrative static notebook](https://github.com/sylvaticus/lmlj.jl/blob/master/notebooks/clusters.ipynb)
- [Demonstrative live notebook](https://mybinder.org/v2/gh/sylvaticus/lmlj.jl/master?filepath=notebooks%2Fclusters.ipynb) (temporary personal online computational environment on myBinder) - it can take minutes to start!
- Theory based on [MITx 6.86x - Machine Learning with Python: from Linear Models to Deep Learning](https://github.com/sylvaticus/MITx_6.86x) ([Unit 4](https://github.com/sylvaticus/MITx_6.86x/blob/master/Unit%2004%20-%20Unsupervised%20Learning/Unit%2004%20-%20Unsupervised%20Learning.md))
- New to Julia? [A concise Julia tutorial](https://github.com/sylvaticus/juliatutorial) - [Julia Quick Syntax Reference book](https://julia-book.com)

In [1]:
using LinearAlgebra
using Random
using Distributions
using Statistics
using DelimitedFiles

In [2]:
K = 3
X = [1 10.5;1.5 10.8; 1.8 8; 1.7 15; 3.2 40; 3.6 32; 3.3 38; 5.1 -2.3; 5.2 -2.4]

9×2 Array{Float64,2}:
 1.0  10.5
 1.5  10.8
 1.8   8.0
 1.7  15.0
 3.2  40.0
 3.6  32.0
 3.3  38.0
 5.1  -2.3
 5.2  -2.4

In [3]:
"""
  initRepresentatives(X,K;initStrategy,Z₀)

Initialise the representatives for a K-Means or K-Medoids algorithm

# Parameters:
* `X`: a (N x D) matrix of data to cluster
* `K`: Number of clusters wanted
* `initStrategy`: How to select the initial representative vectors:
  * `random`: randomly in the X space
  * `grid`: using a grid approach [default]
  * `shuffle`: selecting randomly within the available points
  * `given`: using the set of initial representatives provided in the `Z₀` parameter
* `Z₀`: Provided (K x D) matrix of initial representatives (used only together with the `given` initStrategy) [default: `nothing`]

# Returns:
* A (K x D) matrix of initial representatives

# Example:
```julia
julia> Z₀ = initRepresentatives([1 10.5;1.5 10.8; 1.8 8; 1.7 15; 3.2 40; 3.6 32; 3.6 38],2,initStrategy="given",Z₀=[1.7 15; 3.6 40])
```
"""
function initRepresentatives(X,K;initStrategy="grid",Z₀=nothing)
    (N,D) = size(X)
    # Bounds of the data in each dimension (used by the `random` and `grid` strategies)
    minX = minimum(X,dims=1)
    maxX = maximum(X,dims=1)
    Z = zeros(K,D)
    if initStrategy == "random"
        for i in 1:K
            for j in 1:D
                Z[i,j] = rand(Uniform(minX[j],maxX[j]))
            end
        end
    elseif initStrategy == "grid"
        for d in 1:D
            Z[:,d] = collect(range(minX[d], stop=maxX[d], length=K))
        end
    elseif initStrategy == "given"
        if isnothing(Z₀) error("With the `given` strategy you need to provide the initial set of representatives in the Z₀ parameter.") end
        Z = copy(Z₀) # copy, so that later in-place updates don't modify the caller's matrix
    elseif initStrategy == "shuffle"
        zIdx = shuffle(1:N)[1:K] # K distinct row indices of X
        Z = X[zIdx, :]
    else
        error("initStrategy \"$initStrategy\" not implemented")
    end
    return Z
end


initRepresentatives

In [4]:
Z₀ = initRepresentatives([1 10.5;1.5 10.8; 1.8 8; 1.7 15; 3.2 40; 3.6 32; 3.6 38],2,initStrategy="grid")

2×2 Array{Float64,2}:
 1.0   8.0
 3.6  40.0
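
The `grid` result above can be checked by hand: each column of the representatives is an evenly spaced `range` between that column's minimum and maximum. The following is a minimal sketch that reproduces it without calling `initRepresentatives` (the names `Xdemo`, `Kdemo` and `Zgrid` are only illustrative):

```julia
# Evenly spaced initial representatives, built one dimension at a time
Xdemo = [1 10.5; 1.5 10.8; 1.8 8; 1.7 15; 3.2 40; 3.6 32; 3.6 38]
Kdemo = 2
minXd = minimum(Xdemo, dims=1)
maxXd = maximum(Xdemo, dims=1)
Zgrid = zeros(Kdemo, size(Xdemo, 2))
for d in 1:size(Xdemo, 2)
    Zgrid[:, d] = collect(range(minXd[d], stop=maxXd[d], length=Kdemo))
end
Zgrid   # 2×2 matrix with rows (1.0, 8.0) and (3.6, 40.0)
```

Because every dimension is filled in the same order, the grid points lie along the diagonal of the data's bounding box.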

In [5]:
# Basic K-Means Algorithm (Lecture/segment 13.7 of https://www.edx.org/course/machine-learning-with-python-from-linear-models-to)

"""
  kmeans(X,K;initStrategy,Z₀)

Run the K-Means algorithm to identify K clusters of X using the Euclidean distance

# Parameters:
* `X`: a (N x D) matrix of data to cluster
* `K`: Number of clusters wanted
* `initStrategy`: How to select the initial representative vectors:
  * `random`: randomly in the X space
  * `grid`: using a grid approach [default]
  * `shuffle`: selecting randomly within the available points
  * `given`: using the set of initial representatives provided in the `Z₀` parameter
* `Z₀`: Provided (K x D) matrix of initial representatives (used only together with the `given` initStrategy) [default: `nothing`]

# Returns:
* A tuple of two items, the first one being a vector of size N of ids of the clusters associated to each point and the second one the (K x D) matrix of representatives

# Notes:
* Some returned clusters could be empty

# Example:
```julia
julia> (clIdx,Z) = kmeans([1 10.5;1.5 10.8; 1.8 8; 1.7 15; 3.2 40; 3.6 32; 3.3 38; 5.1 -2.3; 5.2 -2.4],3)
```
"""
function kmeans(X,K;initStrategy="grid",Z₀=nothing)
    (N,D) = size(X)
    # Initial representative vectors, chosen according to `initStrategy`
    Z₀ = initRepresentatives(X,K,initStrategy=initStrategy,Z₀=Z₀)
    Z  = Z₀
    cIdx_prev = zeros(Int64,N)

    # Looping
    while true
        # Determining the constituency of each cluster
        cIdx      = zeros(Int64,N)
        for (i,x) in enumerate(eachrow(X))
            cost = Inf
            for (j,z) in enumerate(eachrow(Z))
               if (norm(x-z)^2  < cost)
                   cost    =  norm(x-z)^2
                   cIdx[i] = j
               end
            end
        end

        # Determining the new representative of each cluster
        for j in  1:K
            Cⱼ = X[cIdx .== j,:] # Selecting the constituency by boolean selection
            if size(Cⱼ)[1] == 0 continue end # empty cluster: keep its representative (avoids a 0/0 division giving NaN)
            Z[j,:] = sum(Cⱼ,dims=1) ./ size(Cⱼ)[1]
        end

        # Checking termination condition: clusters didn't move any more
        if cIdx == cIdx_prev
            return (cIdx,Z)
        else
            cIdx_prev = cIdx
        end

    end
end

kmeans

In [6]:
(clIdx,Z) = kmeans([1 10.5;1.5 10.8; 1.8 8; 1.7 15; 3.2 40; 3.6 32; 3.3 38; 5.1 -2.3; 5.2 -2.4],3)

([2, 2, 2, 2, 3, 3, 3, 1, 1], [5.15 -2.3499999999999996; 1.5 11.075; 3.366666666666667 36.666666666666664])
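
As a sanity check on the result above, each K-Means representative should be the mean of the points assigned to it. The sketch below recomputes the means from the reported assignments and also evaluates the K-Means objective (sum of squared distances to the representatives). The names `Xk`, `idx`, `Zmean` and `cost` are illustrative and not part of the notebook's functions:

```julia
using Statistics

Xk    = [1 10.5; 1.5 10.8; 1.8 8; 1.7 15; 3.2 40; 3.6 32; 3.3 38; 5.1 -2.3; 5.2 -2.4]
idx   = [2, 2, 2, 2, 3, 3, 3, 1, 1]   # assignments reported above
# Representative of each cluster = mean of its member points
Zmean = vcat([mean(Xk[idx .== j, :], dims=1) for j in 1:3]...)
# K-Means objective: sum of squared distances of each point to its representative
cost  = sum(sum(abs2, Xk[i, :] .- Zmean[idx[i], :]) for i in 1:size(Xk, 1))
```

`Zmean` matches the matrix of representatives returned by `kmeans` up to floating-point rounding.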

In [7]:
# Basic K-Medoids Algorithm (Lecture/segment 14.3 of https://www.edx.org/course/machine-learning-with-python-from-linear-models-to)

"""Square Euclidean distance"""
square_euclidean(x,y) = norm(x-y)^2

"""Cosine distance (one minus the cosine similarity)"""
cos_distance(x,y) = 1 - dot(x,y)/(norm(x)*norm(y))

cos_distance
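
The two measures react differently to scale and direction. The following sketch compares them on three small vectors. The helper names `sqEuclid`, `cosSim` and `cosDist` are illustrative, and the cosine distance is taken here as one minus the cosine similarity, which is the usual convention (a distance should be small for similar items, since K-Medoids minimises it):

```julia
using LinearAlgebra

sqEuclid(x, y) = norm(x - y)^2
cosSim(x, y)   = dot(x, y) / (norm(x) * norm(y))   # similarity: 1 for parallel vectors
cosDist(x, y)  = 1 - cosSim(x, y)                  # distance: 0 for parallel vectors

a = [1.0, 2.0]
b = [2.0, 4.0]    # same direction as a, twice the length
c = [-2.0, 1.0]   # orthogonal to a

sqEuclid(a, b)    # ≈ 5: the vectors differ in length
cosDist(a, b)     # ≈ 0: the vectors point in the same direction
cosDist(a, c)     # 1.0: the vectors are orthogonal
```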

In [8]:
"""
  kmedoids(X,K;dist,initStrategy,Z₀)

Compute K-Medoids algorithm to identify K clusters of X using distance definition `dist`

# Parameters:
* `X`: a (n x d) matrix of data to cluster
* `K`: Number of clusters wanted
* `dist`: Function to employ as distance (must accept two vectors). Defaults to the squared Euclidean distance.
* `initStrategy`: How to select the initial representative vectors:
  * `random`: randomly in the X space
  * `grid`: using a grid approach
  * `shuffle`: selecting randomly within the available points [default]
  * `given`: using the set of initial representatives provided in the `Z₀` parameter
* `Z₀`: Provided (K x d) matrix of initial representatives (used only together with the `given` initStrategy) [default: `nothing`]

# Returns:
* A tuple of two items, the first one being a vector of size N of ids of the clusters associated to each point and the second one the (K x D) matrix of representatives

# Notes:
* Some returned clusters could be empty

# Example:
```julia
julia> (clIdx,Z) = kmedoids([1 10.5;1.5 10.8; 1.8 8; 1.7 15; 3.2 40; 3.6 32; 3.3 38; 5.1 -2.3; 5.2 -2.4],3,dist = (x,y) -> norm(x-y)^2,initStrategy="grid")
```
"""
function kmedoids(X,K;dist=(x,y) -> norm(x-y)^2,initStrategy="shuffle",Z₀=nothing)
    (n,d) = size(X)
    # Initial representative vectors, chosen according to `initStrategy`
    Z₀ = initRepresentatives(X,K,initStrategy=initStrategy,Z₀=Z₀)
    Z = Z₀
    cIdx_prev = zeros(Int64,n)

    # Looping
    while true
        # Determining the constituency of each cluster
        cIdx      = zeros(Int64,n)
        for (i,x) in enumerate(eachrow(X))
            cost = Inf
            for (j,z) in enumerate(eachrow(Z))
               if (dist(x,z) < cost)
                   cost =  dist(x,z)
                   cIdx[i] = j
               end
            end
        end

        # Determining the new representative of each cluster (among its member points)
        for j in  1:K
            Cⱼ = X[cIdx .== j,:] # Selecting the constituency by boolean selection
            nⱼ = size(Cⱼ)[1]     # Size of the cluster
            if nⱼ == 0 continue end # empty constituency: do nothing. In a later iteration other representatives could move away and points could enter this cluster
            bestCost = Inf
            bestCIdx = 0
            for candIdx in 1:nⱼ   # candidate index
                 candidateCost = 0.0
                 for tIdx in 1:nⱼ # target index
                     candidateCost += dist(Cⱼ[candIdx,:],Cⱼ[tIdx,:])
                 end
                 if candidateCost < bestCost
                     bestCost = candidateCost
                     bestCIdx = candIdx
                 end
            end
            Z[j,:] = Cⱼ[bestCIdx,:]
        end

        # Checking termination condition: clusters didn't move any more
        if cIdx == cIdx_prev
            return (cIdx,Z)
        else
            cIdx_prev = cIdx
        end

    end
end

kmedoids

In [9]:
(clIdx,Z) = kmedoids([1 10.5;1.5 10.8; 1.8 8; 1.7 15; 3.2 40; 3.6 32; 3.3 38; 5.1 -2.3; 5.2 -2.4],3,dist = (x,y) -> norm(x-y)^2,initStrategy="grid")

([2, 2, 2, 2, 3, 3, 3, 1, 1], [5.1 -2.3; 1.5 10.8; 3.3 38.0])
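
Unlike K-Means, the representative of a K-Medoids cluster must be one of its own points: the one with the smallest total distance to the other members. A minimal sketch of that selection for the four points of cluster 2 above, using the squared Euclidean distance (the names `C`, `d2`, `costs` and `medoid` are illustrative):

```julia
C = [1 10.5; 1.5 10.8; 1.8 8; 1.7 15]   # members of cluster 2 above
d2(x, y) = sum(abs2, x .- y)            # squared Euclidean distance
# Total distance from each candidate to every member of the cluster
costs  = [sum(d2(C[i, :], C[t, :]) for t in 1:size(C, 1)) for i in 1:size(C, 1)]
medoid = C[argmin(costs), :]            # → [1.5, 10.8]
```

The selected point is the second row of `Z` in the output above.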

In [10]:
# The EM algorithm (Lecture/segment 16.5 of https://www.edx.org/course/machine-learning-with-python-from-linear-models-to)

""" log-PDF of a multidimensional normal with no covariance and shared variance across dimensions"""
logNormalFixedSd(x,μ,σ²) = - (length(x)/2) * log(2π*σ²)  -  norm(x-μ)^2/(2σ²)

""" LogSumExp for efficiently computing log(sum(exp.(x))) """
myLSE(x) = maximum(x)+log(sum(exp.(x .- maximum(x))))

"""Transform an Array{T,1} in an Array{T,2} and leave unchanged Array{T,2}."""
make_matrix(x::Array) = ndims(x) == 1 ? reshape(x, (size(x)...,1)) : x


make_matrix
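
The reason for the LogSumExp helper is numerical: exponentiating large log-values overflows, while subtracting the maximum first keeps every exponent at or below zero. A small sketch of the difference (`lse` repeats the same formula as `myLSE` so that the snippet is self-contained; `v`, `naive` and `stable` are illustrative names):

```julia
lse(x) = maximum(x) + log(sum(exp.(x .- maximum(x))))   # same formula as myLSE

v      = [1000.0, 1000.0]
naive  = log(sum(exp.(v)))   # exp(1000) overflows to Inf, so the result is Inf
stable = lse(v)              # 1000 + log(2)
```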

In [11]:
"""
  em(X,K;p₀,μ₀,σ²₀,tol,msgStep,minVariance,missingValue)

Compute Expectation-Maximisation algorithm to identify K clusters of X data assuming a Gaussian Mixture probabilistic Model.

X can contain missing values in some or all of its dimensions. In such case the learning is done only with the available data.
Implemented in the log-domain for better numerical accuracy with many dimensions.

# Parameters:
* `X`  :          A (n x d) matrix of data to cluster
* `K`  :          Number of clusters wanted
* `p₀` :          Initial probabilities of the categorical distribution (K x 1) [default: `nothing`]
* `μ₀` :          Initial means (K x d) of the Gaussian [default: `nothing`]
* `σ²₀`:          Initial variance of the gaussian (K x 1). We assume here that the gaussian has the same variance across all the dimensions [default: `nothing`]
* `tol`:          Tolerance to stop the algorithm [default: 10^(-6)]
* `msgStep` :     Iterations between update messages. Use 0 for no updates [default: 10]
* `minVariance`:  Minimum variance for the mixtures [default: 0.25]
* `missingValue`: Value to be considered as missing in the X [default: `missing`]

# Returns:
* A named tuple of:
  * `pⱼₓ`: Matrix of size (N x K) of the probabilities of each point i to belong to cluster j
  * `pⱼ` : Probabilities of the categorical distribution (K x 1)
  * `μ`  : Means (K x d) of the Gaussian
  * `σ²` : Variance of the gaussian (K x 1). We assume here that the gaussian has the same variance across all the dimensions
  * `ϵ`  : Vector of the discrepancy (matrix norm) between pⱼₓ and the lagged pⱼₓ at each iteration
  * `lL` : The log-likelihood (without considering the last mixture optimisation)
  * `BIC` : The Bayesian Information Criterion

# Example:
```julia
julia> clusters = em([1 10.5;1.5 0; 1.8 8; 1.7 15; 3.2 40; 0 0; 3.3 38; 0 -2.3; 5.2 -2.4],3,msgStep=1,missingValue=0)
```
"""
function em(X,K;p₀=nothing,μ₀=nothing,σ²₀=nothing,tol=10^(-6),msgStep=10,minVariance=0.25,missingValue=missing)
    # debug:
    #X = [1 10.5;1.5 0; 1.8 8; 1.7 15; 3.2 40; 0 0; 3.3 38; 0 -2.3; 5.2 -2.4]
    #K = 3
    #p₀=nothing; μ₀=nothing; σ²₀=nothing; tol=0.0001; msgStep=1; minVariance=0.25; missingValue = 0

    X     = make_matrix(X)
    (N,D) = size(X)

    # Initialisation of the parameters if not provided
    minX = fill(-Inf,D)
    maxX = fill(Inf,D)
    varX_byD = fill(0,D)
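    for d in 1:D
      @inbounds minX[d]  = minimum(skipmissing(X[:,d]))
      @inbounds maxX[d]  = maximum(skipmissing(X[:,d]))
      # Note: this rebinds `varX_byD` to a scalar at each iteration, so after the loop it holds
      # the (floored) variance of the last dimension only
      varX_byD = max(minVariance, var(skipmissing(X[:,d])))
    end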
    varX = mean(varX_byD)/K^2

    pⱼ = isnothing(p₀) ? fill(1/K,K) : p₀
    if !isnothing(μ₀)
        μ₀  = make_matrix(μ₀)
        μ = μ₀
    else
        μ = zeros(Float64,K,D)
        for d in 1:D
                μ[:,d] = collect(range(minX[d], stop=maxX[d], length=K))
        end
    end
    σ² = isnothing(σ²₀) ? fill(varX,K) : σ²₀
    pⱼₓ = zeros(Float64,N,K) # The posteriors, i.e. the probability that item n belongs to cluster k
    ϵ = Float64[]

    # Checking the dimensions only once. Adding `@inbounds` afterwards doesn't change anything,
    # but it is still good to provide an informative message
    if size(pⱼ) != (K,) || size(μ) != (K,D) || size(σ²) != (K,)
        error("Error in the dimensions of the inputs. Please check them.")
    end

    # finding empty/non_empty values
    XMask = ismissing(missingValue) ?  .! ismissing.(X)  : (X .!= missingValue)
    XdimCount = sum(XMask, dims=2)

    lL = -Inf
    while(true)
        oldlL = lL
        # E Step: assigning the posterior prob p(j|xi) and computing the log-Likelihood of the parameters given the set of data
        # (this last one for informative purposes and terminating the algorithm)
        pⱼₓlagged = copy(pⱼₓ)
        logpⱼₓ = log.(pⱼₓ)
        lL = 0
        for n in 1:N
            if any(XMask[n,:]) # if at least one true
                Xu = X[n,XMask[n,:]]
                logpx = myLSE([log(pⱼ[k] + 1e-16) + logNormalFixedSd(Xu,μ[k,XMask[n,:]],σ²[k]) for k in 1:K])
                lL += logpx
                #px = sum([pⱼ[k]*normalFixedSd(Xu,μ[k,XMask[n,:]],σ²[k]) for k in 1:K])
                for k in 1:K
                    logpⱼₓ[n,k] = log(pⱼ[k] + 1e-16)+logNormalFixedSd(Xu,μ[k,XMask[n,:]],σ²[k])-logpx
                end
            else
                logpⱼₓ[n,:] = log.(pⱼ)
            end
        end
        pⱼₓ = exp.(logpⱼₓ)

        push!(ϵ,norm(pⱼₓlagged - pⱼₓ))

        # M step: find parameters that maximise the likelihood
        nⱼ = sum(pⱼₓ,dims=1)'
        n  = sum(nⱼ)
        pⱼ = nⱼ ./ n

        #μ  = (pⱼₓ' * X) ./ nⱼ
        for d in 1:D
            for k in 1:K
                nᵢⱼ = sum(pⱼₓ[XMask[:,d],k])
                if nᵢⱼ > 1
                    μ[k,d] = sum(pⱼₓ[XMask[:,d],k] .* X[XMask[:,d],d])/nᵢⱼ
                end
            end
        end

        #σ² = [sum([pⱼₓ[n,j] * norm(X[n,:]-μ[j,:])^2 for n in 1:N]) for j in 1:K ] ./ (nⱼ .* D)
        for k in 1:K
            den = dot(XdimCount,pⱼₓ[:,k])
            nom = 0.0
            for n in 1:N
                if any(XMask[n,:])
                    nom += pⱼₓ[n,k] * norm(X[n,XMask[n,:]]-μ[k,XMask[n,:]])^2
                end
            end
            if(den> 0 && (nom/den) > minVariance)
                σ²[k] = nom/den
            else
                σ²[k] = minVariance
            end
        end

        # Information. Note that the likelihood does not account for the new mu, sigma
        if msgStep != 0 && (length(ϵ) % msgStep == 0 || length(ϵ) == 1)
            println("Iter. $(length(ϵ)):\tVar. of the post  $(ϵ[end]) \t  Log-likelihood $(lL)")
        end

        # Closing conditions. Note that the log-likelihood is the one computed before the new mu, sigma
        #if (ϵ[end] < tol)
        if (lL - oldlL) <= (tol * abs(lL))
            npar = K * D + K + (K-1) # free parameters: K*D means, K variances, K-1 mixture weights
            BIC  = lL - (1/2) * npar * log(N)
            return (pⱼₓ=pⱼₓ,pⱼ=pⱼ,μ=μ,σ²=σ²,ϵ=ϵ,lL=lL,BIC=BIC)
        end
    end # end while loop
end # end function

em
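
The key point of the E step with missing data is that each observation is evaluated only on the dimensions it actually has. The sketch below does this for a single row whose second entry is coded as missing. The means, variances and weights are hypothetical values chosen for illustration, and `logpdfIso` and `lse` repeat the formulas of `logNormalFixedSd` and `myLSE` so that the snippet runs on its own:

```julia
using LinearAlgebra

logpdfIso(x, μ, σ²) = -(length(x)/2) * log(2π*σ²) - norm(x - μ)^2 / (2σ²)
lse(x) = maximum(x) + log(sum(exp.(x .- maximum(x))))

xrow  = [1.5, 0.0]                      # second entry coded as missing (missingValue = 0)
mask  = xrow .!= 0                      # keep the observed dimensions only
μh    = [1.0 10.0; 3.0 35.0; 5.0 -2.0]  # hypothetical means (K x D)
σ²h   = [1.0, 1.0, 1.0]                 # hypothetical variances
ph    = fill(1/3, 3)                    # hypothetical mixture weights

# Log of p(j) * N(x_observed | μ_j restricted to the observed dimensions, σ²_j)
logJoint  = [log(ph[k]) + logpdfIso(xrow[mask], μh[k, mask], σ²h[k]) for k in 1:3]
posterior = exp.(logJoint .- lse(logJoint))   # sums to 1; largest for the first mixture
```

Only the first coordinate enters the computation, so the first mixture (mean 1.0 in that dimension) receives the highest posterior probability.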

In [12]:
clusters = em([1 10.5;1.5 10.8; 1.8 8; 1.7 15; 3.2 40; 3.6 32; 3.3 38; 5.1 -2.3; 5.2 -2.4],3,msgStep=1)
clusters.pⱼₓ

Iter. 1:	Var. of the post  2.7798403823788407 	  Log-likelihood -62.16435618142972
Iter. 2:	Var. of the post  0.5606080950482362 	  Log-likelihood -51.82785452710985
Iter. 3:	Var. of the post  0.3047407377931759 	  Log-likelihood -47.21642564372429
Iter. 4:	Var. of the post  0.003227835533034755 	  Log-likelihood -40.26621217189606
Iter. 5:	Var. of the post  5.609004006879284e-16 	  Log-likelihood -40.26558370139746
Iter. 6:	Var. of the post  1.7801324102294862e-27 	  Log-likelihood -40.26558370139746


9×3 Array{Float64,2}:
 2.90709e-158  1.0          6.00753e-27
 1.10092e-161  1.0          2.5648e-26 
 4.57484e-102  1.0          2.31584e-31
 1.12191e-270  1.0          9.14102e-18
 0.0           8.60227e-57  1.0        
 0.0           1.7555e-29   1.0        
 0.0           1.33625e-49  1.0        
 1.0           1.59504e-14  6.0099e-59 
 1.0           9.36404e-15  2.97135e-59
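
The `BIC` field returned by `em` penalises the log-likelihood by the number of free parameters. As a worked sketch for the run above (3 clusters, 9 points, 2 dimensions), using the final log-likelihood printed in the log and the same formula as in the function (the variable names are illustrative):

```julia
Kc, Dc, Nc = 3, 2, 9
lLc   = -40.26558370139746             # final log-likelihood printed above
nparc = Kc*Dc + Kc + (Kc - 1)          # means + variances + free mixture weights = 11
BICc  = lLc - (1/2) * nparc * log(Nc)  # same convention as in `em`: higher is better
```

With this sign convention, a model with more clusters is preferred only if its gain in log-likelihood exceeds the extra penalty.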

In [13]:
cd(@__DIR__)
K = [1,12]
seeds = [0,1,2,3,4]

5-element Array{Int64,1}:
 0
 1
 2
 3
 4

In [14]:
# Test data
baseDir = "assets/netflix/toy_data/"
X = readdlm(joinpath(baseDir,"toy_data.txt"))
for k in K
    ulL = -Inf
    bestSeed = -1
    bestOut = nothing
    for s in seeds
        println("[INFO] Working with (k,seed) = ($(k), $(s))")
        μ₀ = readdlm(joinpath(baseDir,"init_mu_$(k)_$(s).csv"), ' ')
        σ²₀ = dropdims(readdlm(joinpath(baseDir,"init_var_$(k)_$(s).csv"), ' '),dims=2)
        p₀ = dropdims(readdlm(joinpath(baseDir,"init_p_$(k)_$(s).csv"), ' '),dims=2)
        emOut = em(X,k;p₀=p₀,μ₀=μ₀,σ²₀=σ²₀,msgStep=0,missingValue=0)
        lL  = emOut.lL
        if lL > ulL
            ulL = lL
            bestSeed = s
            bestOut = emOut
        end
    end
    println("Upper logLikelihood with $(k) clusters: $(ulL) (seed $(bestSeed))")
end

[INFO] Working with (k,seed) = (1, 0)
[INFO] Working with (k,seed) = (1, 1)
[INFO] Working with (k,seed) = (1, 2)
[INFO] Working with (k,seed) = (1, 3)
[INFO] Working with (k,seed) = (1, 4)
Upper logLikelihood with 1 clusters: -1307.2234317600933 (seed 0)
[INFO] Working with (k,seed) = (12, 0)
[INFO] Working with (k,seed) = (12, 1)
[INFO] Working with (k,seed) = (12, 2)
[INFO] Working with (k,seed) = (12, 3)
[INFO] Working with (k,seed) = (12, 4)
Upper logLikelihood with 12 clusters: -1118.6190434326675 (seed 2)


In [15]:
# Full NetFlix dataset
baseDir = "assets/netflix/full/"
X = convert(Array{Int64,2},readdlm(joinpath(baseDir,"netflix_incomplete.txt")))
for k in K
    ulL = -Inf
    bestSeed = -1
    bestOut = nothing
    for s in seeds
        println("[INFO] Working with (k,seed) = ($(k), $(s))")
        μ₀  = readdlm(joinpath(baseDir,"init_mu_$(k)_$(s).csv"), ' ')
        σ²₀ = dropdims(readdlm(joinpath(baseDir,"init_var_$(k)_$(s).csv"), ' '),dims=2)
        p₀  = dropdims(readdlm(joinpath(baseDir,"init_p_$(k)_$(s).csv"), ' '),dims=2)
        emOut = em(X,k;p₀=p₀,μ₀=μ₀,σ²₀=σ²₀,msgStep=0,missingValue=0)
        lL  = emOut.lL
        if lL > ulL
            ulL = lL
            bestSeed = s
            bestOut = emOut
        end
    end
    println("Upper logLikelihood with $(k) clusters: $(ulL) (seed $(bestSeed))")
end

[INFO] Working with (k,seed) = (1, 0)
[INFO] Working with (k,seed) = (1, 1)
[INFO] Working with (k,seed) = (1, 2)
[INFO] Working with (k,seed) = (1, 3)
[INFO] Working with (k,seed) = (1, 4)
Upper logLikelihood with 1 clusters: -1.5210609539852452e6 (seed 0)
[INFO] Working with (k,seed) = (12, 0)
[INFO] Working with (k,seed) = (12, 1)
[INFO] Working with (k,seed) = (12, 2)
[INFO] Working with (k,seed) = (12, 3)
[INFO] Working with (k,seed) = (12, 4)
Upper logLikelihood with 12 clusters: -1.3902809991574623e6 (seed 1)


In [16]:
"""
  collFilteringGMM(X,K;p₀,μ₀,σ²₀,tol,msgStep,minVariance,missingValue)

Fill missing entries in a sparse matrix assuming an underlying Gaussian Mixture probabilistic Model (GMM) and implementing
an Expectation-Maximisation algorithm.

Implemented in the log-domain for better numerical accuracy with many dimensions.

# Parameters:
* `X`  :          A (N x D) sparse matrix of data to fill according to a GMM model
* `K`  :          Number of mixtures desired
* `p₀` :          Initial probabilities of the categorical distribution (K x 1) [default: `nothing`]
* `μ₀` :          Initial means (K x D) of the Gaussian [default: `nothing`]
* `σ²₀`:          Initial variance of the gaussian (K x 1). We assume here that the gaussian has the same variance across all the dimensions [default: `nothing`]
* `tol`:          Tolerance to stop the algorithm [default: 10^(-6)]
* `msgStep` :     Iterations between update messages. Use 0 for no updates [default: 10]
* `minVariance`:  Minimum variance for the mixtures [default: 0.25]
* `missingValue`: Value to be considered as missing in the X [default: `missing`]

# Returns:
* A named tuple of:
  * `X̂`    : The filled matrix of size (N x D)
  * `nFill`: The number of items filled
  * `lL`   : The log-likelihood (without considering the last mixture optimisation)
  * `BIC`  : The Bayesian Information Criterion

# Example:
```julia
julia>  cFOut = collFilteringGMM([1 10.5;1.5 0; 1.8 8; 1.7 15; 3.2 40; 0 0; 3.3 38; 0 -2.3; 5.2 -2.4],3,msgStep=1,missingValue=0)
```
"""
function collFilteringGMM(X,K;p₀=nothing,μ₀=nothing,σ²₀=nothing,tol=10^(-6),msgStep=10,minVariance=0.25,missingValue=missing)
    emOut = em(X,K;p₀=p₀,μ₀=μ₀,σ²₀=σ²₀,tol=tol,msgStep=msgStep,minVariance=minVariance,missingValue=missingValue)
    (N,D) = size(X)
    XMask = ismissing(missingValue) ?  .! ismissing.(X)  : (X .!= missingValue)
    nFill = (N * D) - sum(XMask)
    X̂ = copy(X)

    for n in 1:N
        for d in 1:D
            if !XMask[n,d]
                 X̂[n,d] = dot(emOut.μ[:,d],emOut.pⱼₓ[n,:])
            end
        end
    end
    return (X̂=X̂,nFill=nFill,lL=emOut.lL,BIC=emOut.BIC)
end

collFilteringGMM

In [17]:
X = [1 10.5;1.5 0; 1.8 8; 1.7 15; 3.2 40; 0 0; 3.3 38; 0 -2.3; 5.2 -2.4]

9×2 Array{Float64,2}:
 1.0  10.5
 1.5   0.0
 1.8   8.0
 1.7  15.0
 3.2  40.0
 0.0   0.0
 3.3  38.0
 0.0  -2.3
 5.2  -2.4

In [18]:
cFOut = collFilteringGMM(X,3,msgStep=1,missingValue=0)
cFOut.X̂

Iter. 1:	Var. of the post  2.61937747932065 	  Log-likelihood -47.59140596017498
Iter. 2:	Var. of the post  0.5226030386857065 	  Log-likelihood -34.55184066668723
Iter. 3:	Var. of the post  0.3500981768393402 	  Log-likelihood -32.92185047653772
Iter. 4:	Var. of the post  0.32940171779360017 	  Log-likelihood -30.01085600946215
Iter. 5:	Var. of the post  0.05092179105118827 	  Log-likelihood -27.686896657600293
Iter. 6:	Var. of the post  0.01144416282455234 	  Log-likelihood -27.681990100476558
Iter. 7:	Var. of the post  0.004605091358874689 	  Log-likelihood -27.681832719530703
Iter. 8:	Var. of the post  0.0022110716618263934 	  Log-likelihood -27.68179603140188
Iter. 9:	Var. of the post  0.0010765120575048945 	  Log-likelihood -27.68178722759999


9×2 Array{Float64,2}:
 1.0     10.5   
 1.5     14.1779
 1.8      8.0   
 1.7     15.0   
 3.2     40.0   
 2.8627  15.1255
 3.3     38.0   
 5.2     -2.3   
 5.2     -2.4   
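
Each missing entry is replaced by the average of the mixture means in that dimension, weighted by the row's posterior probabilities. A minimal sketch of that single step, with hypothetical means and posteriors (`μd`, `post` and `fillValue` are illustrative names, not values taken from the run above):

```julia
using LinearAlgebra

μd        = [5.0, 1.5, 3.3]   # hypothetical means of one dimension for K = 3 mixtures
post      = [0.0, 0.5, 0.5]   # hypothetical posterior of one row over the 3 mixtures
fillValue = dot(μd, post)     # 0.5*1.5 + 0.5*3.3 = 2.4
```

A row assigned almost entirely to one mixture therefore receives that mixture's mean, which is what happens to the first entry of row 8 in the output above.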