ekinakyurek/DPMM.jl

DPMM.jl

This repository contains research code for parallel Dirichlet process mixture models (DPMMs) and clustering in Julia, written by Ekin Akyürek under the supervision of John W. Fischer III.

Getting Started

Demo:

  using DPMM
  gm = GridMixture(2)
  X, clabels = rand_with_label(gm, 100000)
  fit(X; ncpu=3) # runs the parallel split-merge algorithm

Visual Demo (requires OpenGL):

  gm = GridMixture(2)
  X, clabels = rand_with_label(gm,100000)
  scene = setup_scene(X)
  fit(X; ncpu=3, scene=scene) # visualize parallel split-merge algorithm

For details, see the function documentation.

Technical Report

Algorithms

  1. Collapsed Gibbs Sampler
       labels = fit(X; algorithm=CollapsedAlgorithm) # collapsed, serial
  2. Quasi-Collapsed Gibbs Sampler
       labels = fit(X; algorithm=CollapsedAlgorithm, quasi=true) # quasi-collapsed, serial
       labels = fit(X; algorithm=CollapsedAlgorithm, quasi=true, ncpu=4) # quasi-collapsed, parallel
  3. Direct Gibbs Sampler
       labels = fit(X; algorithm=DirectAlgorithm) # direct, serial
       labels = fit(X; algorithm=DirectAlgorithm, ncpu=4) # direct, parallel
  4. Quasi-Direct Gibbs Sampler
       labels = fit(X; algorithm=DirectAlgorithm, quasi=true) # quasi-direct, serial
       labels = fit(X; algorithm=DirectAlgorithm, quasi=true, ncpu=4) # quasi-direct, parallel
  5. Split-Merge Gibbs Sampler
       labels = fit(X; algorithm=SplitMergeAlgorithm) # split-merge, serial
       labels = fit(X; algorithm=SplitMergeAlgorithm, ncpu=4) # split-merge, parallel
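To make the collapsed sampler concrete, here is a minimal, self-contained sketch of the standard conjugate collapsed Gibbs update for a 1-D DP mixture of Gaussians. It is not DPMM.jl's `CollapsedAlgorithm`. It assumes a known observation variance `σ2`, a conjugate Normal prior `N(μ0, τ02)` on each cluster mean, and concentration `α`. The names `collapsed_gibbs`, `lognormpdf`, and `sample_logweights` are hypothetical. Each point is removed from its cluster and then reassigned with probability proportional to the Chinese restaurant process prior (cluster size, or `α` for a new cluster) times the posterior predictive density of the point.

```julia
using Random

# Log-density of a univariate normal with mean m and variance v, evaluated at x.
lognormpdf(x, m, v) = -0.5 * (log(2π * v) + (x - m)^2 / v)

# Draw an index with probability proportional to exp.(logw), shifting by the max for stability.
function sample_logweights(rng, logw)
    w = exp.(logw .- maximum(logw))
    u = rand(rng) * sum(w)
    acc = 0.0
    for k in eachindex(w)
        acc += w[k]
        u <= acc && return k
    end
    return lastindex(w)
end

# Hypothetical helper, not part of DPMM.jl: collapsed Gibbs for a 1-D DP mixture of
# Gaussians with known variance σ2 and a conjugate N(μ0, τ02) prior on each cluster mean.
function collapsed_gibbs(x::Vector{Float64}; α=1.0, σ2=1.0, μ0=0.0, τ02=10.0,
                         iters=50, rng=Random.default_rng())
    N = length(x)
    z = ones(Int, N)              # start with every point in cluster 1
    counts = Dict(1 => N)         # cluster id => number of points
    sums = Dict(1 => sum(x))      # cluster id => sum of its points (sufficient statistic)
    nextid = 2
    for _ in 1:iters, i in 1:N
        # Remove point i from its cluster; drop the cluster if it becomes empty.
        k = z[i]
        counts[k] -= 1
        sums[k] -= x[i]
        if counts[k] == 0
            delete!(counts, k)
            delete!(sums, k)
        end
        ids = collect(keys(counts))
        logw = Vector{Float64}(undef, length(ids) + 1)
        for (j, c) in enumerate(ids)
            prec = 1 / τ02 + counts[c] / σ2         # posterior precision of the cluster mean
            m = (μ0 / τ02 + sums[c] / σ2) / prec    # posterior mean of the cluster mean
            # CRP prior (proportional to cluster size) times posterior predictive of x[i].
            logw[j] = log(counts[c]) + lognormpdf(x[i], m, 1 / prec + σ2)
        end
        # New cluster: CRP prior (proportional to α) times prior predictive of x[i].
        logw[end] = log(α) + lognormpdf(x[i], μ0, τ02 + σ2)
        j = sample_logweights(rng, logw)
        if j == length(logw)
            newk = nextid
            nextid += 1
            counts[newk] = 1
            sums[newk] = x[i]
        else
            newk = ids[j]
            counts[newk] += 1
            sums[newk] += x[i]
        end
        z[i] = newk
    end
    return z
end
```

On data drawn from two well-separated groups, such as `vcat(randn(100) .- 10, randn(100) .+ 10)`, the two groups should end up with disjoint labels. Because every reassignment reads the counts left by the previous one, this update is inherently sequential. That helps explain why the list above also offers quasi, direct, and split-merge variants with an `ncpu` option.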

Parallel Benchmarking

Run the following command:

  julia --project test/parallel_benchmark.jl --N 1000000 --K 6 --Kinit 1 --ncpu 4
  • Results-I: Time (sec) to run 100 DP-GMM iterations for d=2, N=1e6, K=6.

      Implementation   ncpu=1   ncpu=2   ncpu=4   ncpu=8
      C++               76.94    40.57    22.23    13.01
      DPMM.jl           75.71    41.54    20.86    12.77
      Julia-BNP       1101.97   572.50   345.58   172.30

  • Results-II: Time (sec) to run 100 DP-MNMM iterations for d=100, N=1e6, K=6.

      Implementation   ncpu=1   ncpu=2   ncpu=4   ncpu=8
      C++              134.25    77.55    40.97    23.60
      DPMM.jl          113.131   68.46    45.55    30.79
      Julia-BNP        234.40   136.43    87.34    55.10
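The tables report raw wall-clock times. Speedup, T(1)/T(p), and parallel efficiency, speedup/p, make the scaling of each implementation easier to compare. The sketch below recomputes both quantities from the numbers above. The names `speedup`, `efficiency`, `NCPUS`, `results_I`, and `results_II` are illustrative and are not part of DPMM.jl or its benchmark script.

```julia
# Wall-clock times (sec) copied from the tables above; columns are ncpu = 1, 2, 4, 8.
const NCPUS = [1, 2, 4, 8]

results_I = Dict(                       # DP-GMM, d=2, N=1e6, K=6
    "C++"       => [76.94, 40.57, 22.23, 13.01],
    "DPMM.jl"   => [75.71, 41.54, 20.86, 12.77],
    "Julia-BNP" => [1101.97, 572.50, 345.58, 172.30],
)
results_II = Dict(                      # DP-MNMM, d=100, N=1e6, K=6
    "C++"       => [134.25, 77.55, 40.97, 23.60],
    "DPMM.jl"   => [113.131, 68.46, 45.55, 30.79],
    "Julia-BNP" => [234.40, 136.43, 87.34, 55.10],
)

# Speedup relative to the single-CPU run of the same implementation: T(1) / T(p).
speedup(times) = times[1] ./ times

# Parallel efficiency: speedup divided by the number of CPUs (1.0 means perfect scaling).
efficiency(times) = speedup(times) ./ NCPUS

for (name, table) in (("Results-I", results_I), ("Results-II", results_II))
    println(name)
    for impl in sort(collect(keys(table)))
        t = table[impl]
        println("  ", rpad(impl, 10), "speedup = ", round.(speedup(t); digits=2),
                "  efficiency = ", round.(efficiency(t); digits=2))
    end
end
```

On these numbers, DPMM.jl drops from 75.71 s to 12.77 s on 8 CPUs for d=2, a speedup of about 5.9×. For d=100 it drops from 113.131 s to 30.79 s, a speedup of about 3.7×.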
