add dp_mixgauss related functions #19
Conversation
resolves probml/pyprobml#863
""" | ||
Evaluating the logarithm of probability of the posterior predictive multivariate T distribution. | ||
The likelihood of the observation given the parameter is Gaussian distribution. | ||
The prior distribution is Normal Inverse Wishart (NIW) with parameters given by hyper_params. |
Please add a comment with all the math details spelled out (in latex form).
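For reference, the requested comment could spell out the standard NIW conjugate-update result along these lines (a sketch only; the hyperparameter symbols below are illustrative, not the names stored in hyper_params):

```latex
% Prior: (\mu, \Sigma) \sim \mathrm{NIW}(m_0, \kappa_0, \nu_0, S_0),
% likelihood: x_i \mid \mu, \Sigma \sim \mathcal{N}(\mu, \Sigma), with x_i \in \mathbb{R}^d.
% Posterior updates after n observations with sample mean \bar{x}:
\kappa_n = \kappa_0 + n, \qquad \nu_n = \nu_0 + n, \qquad
m_n = \frac{\kappa_0 m_0 + n \bar{x}}{\kappa_n},
\qquad
S_n = S_0 + \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^\top
    + \frac{\kappa_0 n}{\kappa_n} (\bar{x} - m_0)(\bar{x} - m_0)^\top .
% The posterior predictive is then a multivariate Student-t:
p(x_\ast \mid x_{1:n}) = \mathcal{T}\!\left( x_\ast \;\middle|\;
    m_n,\ \frac{\kappa_n + 1}{\kappa_n (\nu_n - d + 1)}\, S_n,\ \nu_n - d + 1 \right).
```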
key: jax.random.PRNGKey
    Seed of initial random cluster
--------------------------------------------
* array(N):
Specify that these are return parameters, and give the variable names (e.g. Z: array(N): ...).
from multivariate_t_utils import log_predic_t

def dp_mixture_simu(N, alpha, H, key):
Rename to dp_mixgauss_ancestral_sample
    Number of samples to be generated from the mixture model
alpha: float
    Concentration parameter of the Dirichlet process
H: object of NormalInverseWishart
It is better to avoid short, ambiguous variable names. Replace H with niw_prior.
Z = jnp.full(N, 0)
# Sample cluster assignment from the Chinese restaurant process prior
CR = []
for i in range(N):
A fun (optional!) exercise would be to figure out how to vectorize this (e.g. with lax.scan). Might be tricky because the shapes need to be of fixed size. I think you could pre-allocate CR to a fixed-size vector and then use a binary mask to select the 'valid' prefix.
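A minimal sketch of that suggestion, assuming a standalone CRP prior sampler (the name crp_sample and its interface are hypothetical, not part of this PR):

```python
import jax
import jax.numpy as jnp
from jax import lax

def crp_sample(N, alpha, key):
    # counts[k] = number of customers at table k; zeros mark unused tables.
    # Pre-allocating to length N (worst case: every customer opens a table)
    # keeps all shapes static, so lax.scan can trace the loop.
    def step(counts, key_i):
        n_tables = jnp.sum(counts > 0)  # occupied tables so far
        # Existing tables are weighted by their counts; the first empty slot
        # gets weight alpha (opening a new table). Remaining empty slots keep
        # weight 0, which becomes -inf under log and is never sampled.
        weights = counts.at[n_tables].set(alpha)
        z = jax.random.categorical(key_i, jnp.log(weights))
        return counts.at[z].add(1.0), z

    keys = jax.random.split(key, N)
    _, Z = lax.scan(step, jnp.zeros(N), keys)
    return Z
```

For example, Z = crp_sample(100, 1.0, jax.random.PRNGKey(0)) draws 100 assignments; jnp.sum(Z[:, None] == jnp.arange(100), axis=0) then recovers the masked table counts.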
return Z, jnp.array(X), Mu, Sigma

def dp_cluster(T, X, alpha, hyper_params, key):
Give this function a more descriptive name, e.g. dp_mixgauss_gibbs_sample.
new_label = 1
for t in range(T):
    # Update the cluster assignment for every observation
    for i in range(n):
Can this be vectorized?
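The sweep over observations is inherently sequential in a collapsed Gibbs sampler (each assignment changes the sufficient statistics seen by the next draw), but the inner loop over candidate clusters can be vectorized with vmap. A rough sketch, assuming log_predic_t(x, params) evaluates the log posterior predictive for one cluster; the name sample_assignment, the stacked cluster_stats container, and the fixed K_max shapes are hypothetical stand-ins:

```python
import jax
import jax.numpy as jnp
from multivariate_t_utils import log_predic_t

def sample_assignment(key, x_i, counts, cluster_stats, alpha):
    # counts: (K_max,) pre-allocated cluster sizes, already excluding
    # observation i; zeros mark empty slots.
    # cluster_stats: per-cluster NIW posterior hyperparameters stacked along a
    # leading K_max axis; empty slots are assumed to hold the prior, so the
    # predictive there is the prior predictive for a brand-new cluster.
    log_pred = jax.vmap(lambda stats: log_predic_t(x_i, stats))(cluster_stats)
    n_used = jnp.sum(counts > 0)
    crp_weights = counts.at[n_used].set(alpha)  # first empty slot = new cluster
    logits = jnp.where(crp_weights > 0,
                       jnp.log(crp_weights) + log_pred,
                       -jnp.inf)                # other empty slots never chosen
    return jax.random.categorical(key, logits)
```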
from multivariate_t_utils import log_predic_t

def gibbs_gmm(T, X, alpha, K, hyper_params, key):
Rename this to mixgauss_gibbs_sample.
Thank you very much for these detailed comments. I'll try to fix these issues and take more care in my future code.
I did find it hard to vectorize the DP mixture model, since the sizes of many arrays are not fixed. In fact, even for the finite mixture model, the size of the cluster-assignment arrays is not fixed.
As you mentioned, one possible solution is to pre-allocate a large enough vector and then use binary masks, gaining time efficiency at the cost of some space. I'm just a little concerned about whether this approach remains viable when the data is huge.
Pull Request Test Coverage Report for Build 2452176098
💛 - Coveralls
add gauss_inv_wishart_utils.py for the Gaussian inverse Wishart distribution;
add multivariate_t_utils.py computing log_pdf and log_prob_of_pos_predict for the multivariate T distribution;
add gibbs_finite_mix_gauss_utils.py implementing Gibbs sampling for the finite Gaussian mixture model;
add dp_mixgauss_utils.py implementing forward simulation of the DP mixture model and clustering analysis using the DP mixture model.