Implement wrapper for diagonally-constrained GMM HMMs. #1666

Merged
merged 52 commits into from
Apr 8, 2019
Changes from 51 commits
Commits
52 commits
404ffe6
Added DiagonalGMM
KimSangYeon-DGU Jan 28, 2019
1bd1b35
Added test cases for DiagonalGMM
KimSangYeon-DGU Jan 28, 2019
890323e
Edited HMMs for DiagonalGMM
KimSangYeon-DGU Jan 28, 2019
a12ccbd
Edited HMM test
KimSangYeon-DGU Jan 28, 2019
7f3b93f
Edited for serialization
KimSangYeon-DGU Feb 5, 2019
0189a20
Edited the codes according to the reviews
KimSangYeon-DGU Feb 5, 2019
301d365
Added Diagonal Covariation Gaussian Distribution class
KimSangYeon-DGU Feb 5, 2019
210a3c6
Added test cases for DiagCovGaussianDistribution
KimSangYeon-DGU Feb 5, 2019
c55f7b1
Fixed the bugs in Random()
KimSangYeon-DGU Feb 5, 2019
ad886a8
Applied DiagCovGaussianDistribution class to DiagonalGMM
KimSangYeon-DGU Feb 6, 2019
1f68ddb
Edit DiagCovGaussianDistribution to use a vector for covariance
KimSangYeon-DGU Feb 16, 2019
b2a4770
Edit DiagonalGMM to compute diagonal covariance efficiently
KimSangYeon-DGU Feb 16, 2019
5eb2c79
Revert changes in EMFit class
KimSangYeon-DGU Feb 16, 2019
f5f3619
Edit to support vector covariance
KimSangYeon-DGU Feb 16, 2019
e383ff8
Add comments about sources of the value
KimSangYeon-DGU Feb 16, 2019
09cff12
Edit for style and static code analysis checks
KimSangYeon-DGU Feb 16, 2019
81e77d8
Edit to support diagonal covariance vector
KimSangYeon-DGU Feb 16, 2019
7d4d071
Change the file name gmm_diag* to diagonal_gmm*
KimSangYeon-DGU Feb 16, 2019
07c95cb
Edit for supporting vector for diagonal covariance
KimSangYeon-DGU Feb 17, 2019
c259671
Edit Estimate() for windows
KimSangYeon-DGU Feb 18, 2019
fc1c0d8
Add some GMM test cases for DiagonalGMM
KimSangYeon-DGU Feb 18, 2019
74fa23f
Fix for style and static code anlysis checks
KimSangYeon-DGU Feb 18, 2019
45c60fb
Edit for faster computation
KimSangYeon-DGU Feb 21, 2019
83aa602
Add some tests for DiagonalGMM HMMs
KimSangYeon-DGU Feb 22, 2019
8633ac9
Fix bug and Use arma::clamp for preventing zero weights
KimSangYeon-DGU Feb 24, 2019
3e52b58
Remove unnecessary header
KimSangYeon-DGU Feb 24, 2019
60784b7
Add a note about changes in HISTORY.md
KimSangYeon-DGU Feb 24, 2019
5e1d0da
Edit for faster performance
KimSangYeon-DGU Feb 24, 2019
2ad4d40
Add header
KimSangYeon-DGU Feb 24, 2019
78ef233
Improve comments on HMM's test cases
KimSangYeon-DGU Feb 28, 2019
433b672
Improve HMM tests for DiagonalGMM
KimSangYeon-DGU Feb 28, 2019
7011853
Change DiagCovGaussianDistribution to DiagonalGaussianDistribution
KimSangYeon-DGU Mar 2, 2019
0557b1a
Add base covariance when clustering initially
KimSangYeon-DGU Mar 2, 2019
9667729
Make DiagonalEMFit class
KimSangYeon-DGU Mar 3, 2019
92d8d4a
Fix style checks
KimSangYeon-DGU Mar 3, 2019
1a039c7
Check Windows build
KimSangYeon-DGU Mar 8, 2019
5527fed
Disable OpenMP on Windows
KimSangYeon-DGU Mar 9, 2019
b7cb1d1
Edit comment in CMakeLists.txt
KimSangYeon-DGU Mar 9, 2019
1bfd448
Edit according to the reviews
KimSangYeon-DGU Mar 9, 2019
ddd9bda
Merge DiagonalEMFit into EMFit
KimSangYeon-DGU Mar 9, 2019
4b967c0
Fix for style checks and Remove manually written RandomSeed in test c…
KimSangYeon-DGU Mar 9, 2019
71ffc34
Edit according to the Marcus's review
KimSangYeon-DGU Mar 13, 2019
57fa8aa
Edit according to the Ryan's review
KimSangYeon-DGU Mar 13, 2019
cdd0179
Apply constraint, change condition, and add warning
KimSangYeon-DGU Mar 14, 2019
53b38ac
Add constraint for diagonal covariance represented as a vector
KimSangYeon-DGU Mar 14, 2019
3a7b1b3
Revert the change
KimSangYeon-DGU Mar 14, 2019
9aa6aef
Remove the parenthesis
KimSangYeon-DGU Mar 14, 2019
bfba93b
Fix a bug in the PredictTest
KimSangYeon-DGU Mar 16, 2019
622954b
Fix ambiguous error
KimSangYeon-DGU Mar 17, 2019
db5dc64
Edit comments and move inline to class itself
KimSangYeon-DGU Mar 23, 2019
073bb9d
Improve and edit comments
KimSangYeon-DGU Apr 4, 2019
b61547f
Edit comment
KimSangYeon-DGU Apr 7, 2019
4 changes: 2 additions & 2 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -412,8 +412,8 @@ set(MLPACK_LIBRARY_DIRS ${MLPACK_LIBRARY_DIRS} ${Boost_LIBRARY_DIRS})
add_definitions(-DBOOST_TEST_DYN_LINK)

# Detect OpenMP support in a compiler. If the compiler supports OpenMP, flags
-# to compile with OpenMP are returned and added and the HAS_OPENMP definition is
-# added for compilation.
+# to compile with OpenMP are returned and added and the HAS_OPENMP definition
+# is added for compilation.
#
# This way we can skip calls to functions defined in omp.h with code like:
# #ifdef HAS_OPENMP
3 changes: 3 additions & 0 deletions HISTORY.md
@@ -1,5 +1,8 @@
### mlpack 3.1.0
###### ????-??-??
* Add DiagonalGaussianDistribution and DiagonalGMM classes to speed up the
diagonal covariance computation and deprecate DiagonalConstraint (#1666).

* Add kernel density estimation (KDE) implementation with bindings to other
languages (#1301).

1 change: 1 addition & 0 deletions src/mlpack/core.hpp
@@ -281,6 +281,7 @@
#include <mlpack/core/dists/gaussian_distribution.hpp>
#include <mlpack/core/dists/laplace_distribution.hpp>
#include <mlpack/core/dists/gamma_distribution.hpp>
#include <mlpack/core/dists/diagonal_gaussian_distribution.hpp>

// mlpack::backtrace only for linux
#ifdef HAS_BFD_DL
2 changes: 2 additions & 0 deletions src/mlpack/core/dists/CMakeLists.txt
@@ -11,6 +11,8 @@ set(SOURCES
regression_distribution.cpp
gamma_distribution.hpp
gamma_distribution.cpp
diagonal_gaussian_distribution.hpp
diagonal_gaussian_distribution.cpp
)

# add directory name to sources
148 changes: 148 additions & 0 deletions src/mlpack/core/dists/diagonal_gaussian_distribution.cpp
@@ -0,0 +1,148 @@
/**
* @file diagonal_gaussian_distribution.cpp
* @author Kim SangYeon
*
* Implementation of Gaussian distribution class with diagonal covariance.
*
* mlpack is free software; you may redistribute it and/or modify it under the
* terms of the 3-clause BSD license. You should have received a copy of the
* 3-clause BSD license along with mlpack. If not, see
* http://www.opensource.org/licenses/BSD-3-Clause for more information.
*/
#include "diagonal_gaussian_distribution.hpp"
#include <mlpack/methods/gmm/diagonal_constraint.hpp>

using namespace mlpack;
using namespace mlpack::distribution;

DiagonalGaussianDistribution::DiagonalGaussianDistribution(
const arma::vec& mean,
const arma::vec& covariance) :
mean(mean)
{
Covariance(covariance);
}

void DiagonalGaussianDistribution::Covariance(const arma::vec& covariance)
{
this->invCov = 1 / covariance;
this->logDetCov = arma::accu(log(covariance));
this->covariance = covariance;
}

void DiagonalGaussianDistribution::Covariance(arma::vec&& covariance)
{
this->invCov = 1 / covariance;
this->logDetCov = arma::accu(log(covariance));
this->covariance = std::move(covariance);
}

double DiagonalGaussianDistribution::LogProbability(
const arma::vec& observation) const
{
const size_t k = observation.n_elem;
const arma::vec diff = observation - mean;
const arma::vec logExponent = diff.t() * arma::diagmat(invCov) * diff;
return -0.5 * k * log2pi - 0.5 * logDetCov - 0.5 * logExponent(0);
}

void DiagonalGaussianDistribution::LogProbability(
const arma::mat& observations,
arma::vec& logProbabilities) const
{
const size_t k = observations.n_rows;

// Column i of 'diffs' is the difference between observations.col(i) and
// the mean.
arma::mat diffs = observations.each_col() - mean;

// Calculate the exponent term of the multivariate Gaussian density for each
// observation. Only the diagonal of the covariance is used, which is faster.
arma::vec logExponents = -0.5 * arma::trans(diffs % diffs) * invCov;

logProbabilities = -0.5 * k * log2pi - 0.5 * logDetCov + logExponents;
}

arma::vec DiagonalGaussianDistribution::Random() const
{
return (arma::sqrt(covariance) % arma::randn<arma::vec>(mean.n_elem)) + mean;
}

void DiagonalGaussianDistribution::Train(const arma::mat& observations)
{
if (observations.n_cols > 1)
{
covariance.zeros(observations.n_rows);
}
else
{
mean.zeros(0);
covariance.zeros(0);
return;
}

// Calculate and normalize the mean.
mean = arma::sum(observations, 1) / observations.n_cols;

// Now calculate the covariance.
const arma::mat diffs = observations.each_col() - mean;
covariance += arma::sum(diffs % diffs, 1);

// Finish estimating the covariance by normalizing, with the (1 / (n - 1))
// to make the estimator unbiased.
covariance /= (observations.n_cols - 1);
invCov = 1 / covariance;
logDetCov = arma::accu(log(covariance));
}

void DiagonalGaussianDistribution::Train(const arma::mat& observations,
const arma::vec& probabilities)
{
if (observations.n_cols > 0)
{
covariance.zeros(observations.n_rows);
}
else
{
mean.zeros(0);
covariance.zeros(0);
return;
}

// Normalize the covariance with (v1 - (v2 / v1)), the correction that makes
// the weighted estimator unbiased. Here v1 is the sum of the weights and v2
// is the sum of the squared weights. For a more detailed description, see
// https://en.wikipedia.org/wiki/Weighted_arithmetic_mean.
double v1 = arma::accu(probabilities);

// If their sum is 0, there is nothing in this Gaussian.
// At least, set the covariance so that it's invertible.
if (v1 == 0)
{
invCov = 1 / (covariance += 1e-50);
logDetCov = arma::accu(log(covariance));
return;
}

// Normalize the probabilities.
arma::vec normalizedProbs = probabilities / v1;

// Calculate the mean.
mean = observations * normalizedProbs;

// Now calculate the covariance.
const arma::mat diffs = observations.each_col() - mean;
covariance += (diffs % diffs) * normalizedProbs;

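// Calculate the sum of the squared normalized weights.
const double v2 = arma::accu(normalizedProbs % normalizedProbs);

// Finish estimating the covariance by dividing by (v1 - (v2 / v1)) to make
// the estimator unbiased. The weights are already normalized, so v1 is 1 and
// v2 is the sum of the squared normalized weights; the divisor is (1 - v2).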
if (v2 != 1)
covariance /= (1 - v2);

invCov = 1 / covariance;
logDetCov = arma::accu(log(covariance));
}
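The weighted `Train()` overload above normalizes the weights and then divides the weighted squared deviations by `1 - v2`. The following is a minimal, dependency-free sketch of that estimate, not the code from this PR. It uses `std::vector` instead of Armadillo, and the names `DiagonalEstimate` and `WeightedDiagonalFit` are hypothetical. It assumes at least one observation, equal-length observations, and a positive weight sum, so the zero-weight fallback is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type for this sketch (not an mlpack class).
struct DiagonalEstimate
{
  std::vector<double> mean;
  std::vector<double> variances;
};

// Weighted mean and bias-corrected diagonal variance. observations[j] is the
// j-th point and weights[j] is its non-negative weight. All observations are
// assumed to have the same length.
DiagonalEstimate WeightedDiagonalFit(
    const std::vector<std::vector<double>>& observations,
    const std::vector<double>& weights)
{
  assert(!observations.empty() && observations.size() == weights.size());
  const std::size_t dim = observations[0].size();

  // v1 is the sum of the raw weights.
  double v1 = 0.0;
  for (const double w : weights)
    v1 += w;
  assert(v1 > 0.0);

  DiagonalEstimate est;
  est.mean.assign(dim, 0.0);
  est.variances.assign(dim, 0.0);

  // Weighted mean using normalized weights; v2 accumulates their squares.
  double v2 = 0.0;
  for (std::size_t j = 0; j < observations.size(); ++j)
  {
    const double w = weights[j] / v1;
    v2 += w * w;
    for (std::size_t i = 0; i < dim; ++i)
      est.mean[i] += w * observations[j][i];
  }

  // Weighted squared deviations from the mean, one value per dimension.
  for (std::size_t j = 0; j < observations.size(); ++j)
  {
    const double w = weights[j] / v1;
    for (std::size_t i = 0; i < dim; ++i)
    {
      const double diff = observations[j][i] - est.mean[i];
      est.variances[i] += w * diff * diff;
    }
  }

  // Bias correction. With normalized weights the divisor is (1 - v2); it is
  // skipped when v2 == 1, which happens when one point carries all the weight.
  if (v2 != 1.0)
  {
    for (double& v : est.variances)
      v /= (1.0 - v2);
  }

  return est;
}
```

With equal weights the correction reduces to the familiar `1 / (n - 1)` factor: for the one-dimensional points 0 and 2 the sketch returns a mean of 1 and a variance of 2.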
156 changes: 156 additions & 0 deletions src/mlpack/core/dists/diagonal_gaussian_distribution.hpp
@@ -0,0 +1,156 @@
/**
* @file diagonal_gaussian_distribution.hpp
* @author Kim SangYeon
*
* Implementation of the Gaussian distribution with diagonal covariance.
*
* mlpack is free software; you may redistribute it and/or modify it under the
* terms of the 3-clause BSD license. You should have received a copy of the
* 3-clause BSD license along with mlpack. If not, see
* http://www.opensource.org/licenses/BSD-3-Clause for more information.
*/
#ifndef MLPACK_CORE_DISTRIBUTIONS_DIAGONAL_GAUSSIAN_DISTRIBUTION_HPP
#define MLPACK_CORE_DISTRIBUTIONS_DIAGONAL_GAUSSIAN_DISTRIBUTION_HPP

#include <mlpack/prereqs.hpp>

namespace mlpack {
namespace distribution {

//! A single multivariate Gaussian distribution with diagonal covariance.
class DiagonalGaussianDistribution
{
private:
//! Mean of the distribution.
arma::vec mean;
//! Diagonal covariance of the distribution.
arma::vec covariance;
//! Cached inverse of covariance.
arma::vec invCov;
//! Cached logdet(cov).
double logDetCov;

//! log(2pi)
static const constexpr double log2pi = 1.83787706640934533908193770912475883;

public:
//! Default constructor, which creates a Gaussian with zero dimension.
DiagonalGaussianDistribution() : logDetCov(0.0) { /* nothing to do. */ }

/**
* Create a Gaussian Distribution with zero mean and diagonal covariance
* with the given dimensionality.
*
* @param dimension Number of dimensions.
*/
DiagonalGaussianDistribution(const size_t dimension) :
mean(arma::zeros<arma::vec>(dimension)),
covariance(arma::ones<arma::vec>(dimension)),
invCov(arma::ones<arma::vec>(dimension)),
logDetCov(0)
{ /* Nothing to do. */ }

/**
* Create a Gaussian distribution with the given mean and diagonal
* covariance.
*
* @param mean Mean of distribution.
* @param covariance Diagonal of the covariance, as a vector.
*/
DiagonalGaussianDistribution(const arma::vec& mean,
const arma::vec& covariance);

//! Return the dimensionality of this distribution.
size_t Dimensionality() const { return mean.n_elem; }

//! Return the probability of the given observation.
double Probability(const arma::vec& observation) const
{
return exp(LogProbability(observation));
}

//! Return the log probability of the given observation.
double LogProbability(const arma::vec& observation) const;

/**
* Calculate the multivariate Gaussian probability density function for each
* data point (column) in the given matrix.
*
* @param x Matrix of observations.
* @param probabilities Output probabilities for each input observation.
*/
void Probability(const arma::mat& x, arma::vec& probabilities) const
{
arma::vec logProbabilities;
LogProbability(x, logProbabilities);
probabilities = arma::exp(logProbabilities);
}

/**
* Calculate the multivariate Gaussian log probability density function for
* each data point (column) in the given matrix.
*
* @param observations Matrix of observations.
* @param logProbabilities Output log probabilities for each input
* observation.
*/
void LogProbability(const arma::mat& observations,
arma::vec& logProbabilities) const;

/**
* Return a randomly generated observation according to the probability
* distribution defined by this object.
*
* @return Random observation from this Diagonal Gaussian distribution.
*/
arma::vec Random() const;

/**
* Estimate the Gaussian distribution directly from the given observations.
*
* @param observations Matrix of observations.
*/
void Train(const arma::mat& observations);

/**
* Estimate the Gaussian distribution from the given observations,
* taking into account the probability of each observation actually being
* from this distribution.
*
* @param observations Matrix of observations.
* @param probabilities Probability of each observation being from this
* distribution.
*/
void Train(const arma::mat& observations,
const arma::vec& probabilities);

//! Return the mean.
const arma::vec& Mean() const { return mean; }

//! Return a modifiable reference to the mean.
arma::vec& Mean() { return mean; }

//! Return the diagonal covariance, stored as a vector.
const arma::vec& Covariance() const { return covariance; }

//! Set the diagonal covariance from a vector.
void Covariance(const arma::vec& covariance);

//! Set the diagonal covariance from a vector using move assignment.
void Covariance(arma::vec&& covariance);

//! Serialize the distribution.
template<typename Archive>
void serialize(Archive& ar, const unsigned int /* version */)
{
// We just need to serialize each of the members.
ar & BOOST_SERIALIZATION_NVP(mean);
ar & BOOST_SERIALIZATION_NVP(covariance);
ar & BOOST_SERIALIZATION_NVP(invCov);
ar & BOOST_SERIALIZATION_NVP(logDetCov);
}
};

} // namespace distribution
} // namespace mlpack

#endif
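Because the covariance is stored as a vector, the log-density needs only per-dimension sums: the log-determinant is the sum of the log variances, and the quadratic form is a sum of squared differences divided by the variances. The following is a minimal sketch of that computation under stated assumptions, not the class from this PR. It uses `std::vector` instead of Armadillo, and the function name `DiagonalLogDensity` is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Log-density of a Gaussian with diagonal covariance. `variances` holds the
// diagonal of the covariance matrix; every entry is assumed to be positive.
double DiagonalLogDensity(const std::vector<double>& x,
                          const std::vector<double>& mean,
                          const std::vector<double>& variances)
{
  assert(x.size() == mean.size() && x.size() == variances.size());

  // log(2 * pi), the same constant the header defines as log2pi.
  const double log2pi = 1.83787706640934533908193770912475883;

  double logDet = 0.0;       // log|Sigma| = sum of the log variances.
  double mahalanobis = 0.0;  // sum of (x_i - mean_i)^2 / variance_i.
  for (std::size_t i = 0; i < x.size(); ++i)
  {
    const double diff = x[i] - mean[i];
    logDet += std::log(variances[i]);
    mahalanobis += diff * diff / variances[i];
  }

  return -0.5 * (x.size() * log2pi + logDet + mahalanobis);
}
```

The loop touches each dimension once, so the cost is linear in the dimensionality, whereas a full covariance matrix requires a matrix inverse and determinant.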
3 changes: 3 additions & 0 deletions src/mlpack/methods/gmm/CMakeLists.txt
@@ -4,6 +4,9 @@ set(SOURCES
gmm.hpp
gmm.cpp
gmm_impl.hpp
diagonal_gmm.hpp
diagonal_gmm.cpp
diagonal_gmm_impl.hpp
em_fit.hpp
em_fit_impl.hpp
no_constraint.hpp
12 changes: 12 additions & 0 deletions src/mlpack/methods/gmm/diagonal_constraint.hpp
@@ -30,6 +30,18 @@ class DiagonalConstraint
covariance = arma::diagmat(arma::clamp(covariance.diag(), 1e-10, DBL_MAX));
}

/**
* Apply the diagonal constraint to the given diagonal covariance matrix
* (which is represented as a vector), and ensure each value on the diagonal
* is at least 1e-10.
*/
static void ApplyConstraint(arma::vec& diagCovariance)
{
// Although the covariance is already diagonal, clamp it to ensure each
// value is at least 1e-10.
diagCovariance = arma::clamp(diagCovariance, 1e-10, DBL_MAX);
}

//! Serialize the constraint (which holds nothing, so, nothing to do).
template<typename Archive>
static void serialize(Archive& /* ar */, const unsigned int /* version */) { }
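The vector overload of `ApplyConstraint()` only needs to enforce a lower bound, since a covariance stored as a vector is diagonal by construction. The following is a minimal sketch of the same floor on a plain `std::vector`, not the mlpack implementation. The function name `ClampVariances` and its `floorValue` parameter are hypothetical; the default of `1e-10` mirrors the bound used in the diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cfloat>
#include <vector>

// Raise every variance below `floorValue` up to `floorValue`, and cap values
// at DBL_MAX, so that 1 / variance and log(variance) stay finite.
void ClampVariances(std::vector<double>& variances,
                    const double floorValue = 1e-10)
{
  assert(floorValue > 0.0);
  for (double& v : variances)
    v = std::min(std::max(v, floorValue), DBL_MAX);
}
```

Values already at or above the floor are left unchanged, so applying the constraint repeatedly has no further effect.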