Expectation–Maximization (EM) algorithm implementation in R and Python, and a comparison with K-means.

EM-algorithm - Academic project

An academic study and implementation of the expectation–maximization (EM) algorithm in Python and R, with a comparison against K-means.

To start off, clone the project:

git clone https://github.com/Samashi47/EM-algorithm.git

Then:

cd EM-algorithm

Python implementation

After cloning the project, go to the Python-implementation folder:

cd Python-implementation

Then, create your virtual environment:

Windows

py -3 -m venv .venv

macOS/Linux

python3 -m venv .venv

And, activate it:

Windows

.venv\Scripts\activate

macOS/Linux

. .venv/bin/activate

You can run the following command to install the dependencies:

pip3 install -r requirements.txt

To run the code:

  1. Select the kernel in the Jupyter notebook in the Python-implementation folder.
  2. Run the cells.
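To give a sense of what the notebook implements, here is a minimal, self-contained sketch of EM for a Gaussian mixture in NumPy. This is an illustration of the algorithm, not the repository's code; the function name `em_gmm` and the synthetic data are ours:

```python
import numpy as np

def em_gmm(X, k, n_iter=200, tol=1e-8):
    """Fit a k-component Gaussian mixture to X with EM (illustrative sketch)."""
    n, d = X.shape
    # Spread the initial means across the data, sorted along the first coordinate
    idx = np.argsort(X[:, 0])[np.linspace(0, n - 1, k).astype(int)]
    mu = X[idx].copy()                                     # (k, d) means
    cov = np.stack([np.cov(X.T) + 1e-6 * np.eye(d)] * k)   # (k, d, d) covariances
    w = np.full(k, 1.0 / k)                                # mixing probabilities
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: responsibilities r[i, j] proportional to w_j * N(x_i | mu_j, cov_j)
        dens = np.empty((n, k))
        for j in range(k):
            diff = X - mu[j]
            quad = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov[j]), diff)
            norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov[j]))
            dens[:, j] = w[j] * np.exp(-0.5 * quad) / norm
        ll = np.log(dens.sum(axis=1)).sum()   # observed-data log-likelihood
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and covariances from responsibilities
        nk = r.sum(axis=0)
        w = nk / n
        mu = (r.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - mu[j]
            cov[j] = (r[:, j, None] * diff).T @ diff / nk[j] + 1e-6 * np.eye(d)
        if abs(ll - prev_ll) < tol:           # stop once the log-likelihood stabilizes
            break
        prev_ll = ll
    return mu, cov, w

# Two well-separated 2-D Gaussian blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (100, 2)), rng.normal(5, 0.5, (100, 2))])
mu, cov, w = em_gmm(X, k=2)
```

On this data the fitted means land near (0, 0) and (5, 5), and the mixing weights sum to 1 by construction.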

R implementation

Note

This section assumes R is already installed and configured on your machine.

R Markdown doesn't require any further configuration to run in RStudio or VS Code, but for a richer experience in VS Code (live preview, generating HTML, LaTeX, and PDF output) you need a TeX distribution and Pandoc on your computer. You can install Pandoc from https://pandoc.org/installing.html

To start with the R implementation, install the required packages first. In the R console, run:

install.packages(c("plyr", "mvtnorm", "ggplot2"))

The remaining packages the code loads (base, methods, datasets, utils, grDevices, graphics, stats) ship with R and do not need to be installed.

Then you are ready to run the implementations in the .rmd files chunk by chunk.

Use

To use the implementation, you first need to initialize starting values for the means, covariances, and mixing probabilities.

  1. The mean is a matrix of dimensions (number of wanted clusters, number of columns used to generate clusters), holding a mean for each column and each cluster.
  2. The cov is a tensor of dimensions (number of columns used to generate clusters, number of columns used to generate clusters, number of wanted clusters), holding one covariance matrix between the dataset's columns per cluster.
  3. The probs is a vector of length (number of wanted clusters), holding the probability that a given data point belongs to each cluster.
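As a cross-check, the same shapes can be expressed in Python with NumPy. This is an illustrative sketch only; the random matrix here is a stand-in for the four numeric iris columns used in the R example:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))   # stand-in for the 4 numeric iris columns

k, d = 3, X.shape[1]
# (k, d): column means plus a little noise so the cluster rows differ
mu = np.tile(X.mean(axis=0), (k, 1)) + rng.uniform(0, 0.5, size=(k, d))
# (d, d, k): one covariance matrix per cluster
cov = np.dstack([np.cov(X.T)] * k)
# (k,): mixing probabilities, summing to 1
probs = np.full(k, 1.0 / k)
```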

To do that in code, we first generate a list of means for each column, and a covariance matrix between columns:

library(plyr)

# Create starting values
Mu = daply(iris2, NULL, function(x) colMeans(x)) + runif(4, 0, 0.5)
Cov = dlply(iris2, NULL, function(x) var(x) + diag(runif(4, 0, 0.5)))
column.names <- colnames(iris2)
row.names <- c("Cluster 1", "Cluster 2", "Cluster 3")

Then we create a 2D array of means for the number of wanted clusters, with noise added so that no two rows are identical, and a tensor of covariance matrices for the number of wanted clusters:

initMu = array(
  c(Mu[1] + 0.1, Mu[1] + 0.2, Mu[1] + 0.3,
    Mu[2] + 0.1, Mu[2] + 0.2, Mu[2] + 0.3,
    Mu[3] + 0.1, Mu[3] + 0.2, Mu[3] + 0.3,
    Mu[4] + 0.1, Mu[4] + 0.2, Mu[4] + 0.4),
  dim = c(3, 4),
  dimnames = list(row.names, column.names)
)
initCov <- list('Cluster 1' = Cov[[1]], 'Cluster 2' = Cov[[1]], 'Cluster 3' = Cov[[1]])

For probabilities, we can initiate them manually:

initProbs = c(.1, .2, .7)

Or randomly, normalized so that the mixing probabilities sum to 1:

p <- runif(3, min = 0.1, max = 0.9)
initProbs = p / sum(p)

Finally, we encapsulate the initialized parameters in a variable called initParams:

initParams <- list(mu = initMu, var = initCov, probs = initProbs)

And run the algorithm with:

results = gaussmixEM(params=initParams, X=as.matrix(iris2), clusters = 3, tol=1e-10, maxits=1500, showits=T)
print(results)
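The project also compares EM against K-means. The essential difference is that K-means makes hard cluster assignments, whereas EM's E-step assigns soft responsibilities. A minimal NumPy sketch of Lloyd's K-means (ours, not the repository's code) makes the hard-assignment loop explicit:

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm: hard assignments, unlike EM's soft responsibilities."""
    n, _ = X.shape
    # Spread the initial centers across the data, sorted along the first coordinate
    idx = np.argsort(X[:, 0])[np.linspace(0, n - 1, k).astype(int)]
    centers = X[idx].copy()
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center (hard 0/1 weights)
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Update step: each center becomes the mean of its assigned points
        new = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, labels

# Two well-separated 2-D blobs
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.5, (100, 2)), rng.normal(4, 0.5, (100, 2))])
centers, labels = kmeans(X, k=2)
```

On well-separated spherical blobs like these, K-means and a Gaussian-mixture EM recover essentially the same clusters; the methods diverge when clusters are elongated, unequal in size, or overlapping, since EM models full covariances and mixing weights.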
