
Commit

a
jmschrei committed Nov 8, 2016
1 parent 56c423f commit 000e0e0
Showing 2 changed files with 13 additions and 11 deletions.
16 changes: 5 additions & 11 deletions docs/GeneralMixtureModel.rst
@@ -18,7 +18,7 @@ General Mixture Models can be initialized in two ways depending on if you know t
>>> gmm = GeneralMixtureModel([NormalDistribution(5, 2), NormalDistribution(1, 2)], weights=[0.33, 0.67])
If you do not know the initial parameters, then the components can be initialized using kmeans++. This algorithm picks the first center randomly from the data, and then selects each remaining center randomly with probability proportional to its squared distance from the centers chosen so far. kmeans is then run until convergence, and the initial parameters are taken as MLE estimates on the points assigned to each cluster. This is done in the following manner:
If you do not know the initial parameters, then the components can be initialized using kmeans to find initial clusters. Initial parameters for the components are then estimated from these clusters, and EM is used to fine-tune the model.

.. code-block:: python
@@ -30,27 +30,21 @@ This allows any distribution in pomegranate to be natively used in GMMs.
Log Probability
---------------

The probability of a point is the sum of its probability under each of the components, multiplied by the weight of each component c, $P(D|M) = \sum\limits_{c \in M} P(D|c)$. This is easily calculated by summing the probability under each distribution in the mixture model and multiplying by the appropriate weights, and then taking the log.
The probability of a point is the sum of its probability under each of the components, each multiplied by the weight of that component :math:`c`: :math:`P(D|M) = \sum\limits_{c \in M} P(D|c)P(c)`. This is easily calculated by taking the probability under each distribution in the mixture, multiplying by the appropriate weight, summing, and then taking the log.
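
As a quick sketch of this arithmetic, independent of any particular pomegranate call, the log probability of a single point under the two-component mixture from the initialization example above can be computed directly with numpy and scipy:

.. code-block:: python

    import numpy as np
    from scipy.stats import norm

    # Mixture from the initialization example: weights 0.33 and 0.67
    # over Normal(5, 2) and Normal(1, 2).
    weights = np.array([0.33, 0.67])
    x = 3.0

    # P(D|M) = sum_c P(D|c) P(c), then take the log.
    component_probs = np.array([norm.pdf(x, 5, 2), norm.pdf(x, 1, 2)])
    log_prob = np.log(np.dot(component_probs, weights))

In practice the model computes this for you; the sketch is only meant to make the formula concrete.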

Prediction
----------

The common prediction tasks involve predicting which component a new point falls under. This is done using Bayes rule to determine
The common prediction tasks involve predicting which component a new point falls under. This is done using Bayes rule :math:`P(M|D) = \frac{P(D|M)P(M)}{P(D)}` to determine the posterior probability :math:`P(M|D)`, as opposed to simply the likelihood :math:`P(D|M)`. Bayes rule indicates that the prediction is made not by the likelihood alone but by the likelihood multiplied by the prior probability that the component generated the sample. For example, if one component generated 100x as many samples as the other, you would naively think that there is a ~99% chance that any random point was drawn from it. That belief is then updated based on how well the point fits each distribution, but the proportion of points generated by each component remains important.

.. math::
P(M|D)
, not the more likely $P(D|M)$. This means that it is not simply which component gives the highest log probability when producing the point. Bayes Rule is as follows: $P(M|D) = \frac{P(D|M)P(M)}{P(D)}$. Since we're looking for a maximum and $P(D)$ is a constant we can cross that off, and we're left with $P(M|D) = P(D|M)P(M)$. $P(D|M)$ is just the probability of the point under the distribution and $P(M)$ are the prior model weights passed in upon initialization or learned from data. This adds a regularization term, meaning that components with fewer samples corresponding to them are less likely to have given this point its label.
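
A minimal sketch of this posterior calculation for a single point, again using numpy and scipy directly rather than pomegranate's own methods:

.. code-block:: python

    import numpy as np
    from scipy.stats import norm

    # Priors P(M) and likelihoods P(D|M) for the two-component example above.
    priors = np.array([0.33, 0.67])
    x = 3.0
    likelihoods = np.array([norm.pdf(x, 5, 2), norm.pdf(x, 1, 2)])

    # Bayes rule: P(M|D) is proportional to P(D|M) P(M); dividing by the
    # total probability P(D) normalizes the result to sum to one.
    unnormalized = likelihoods * priors
    posterior = unnormalized / unnormalized.sum()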

We can get the component label assignments using `model.predict(data)`, which will return an array of indexes corresponding to the maximally likely component. If what we want is the full matrix of $P(M|D)$, then we can use `model.predict_proba(data)`, which will return a matrix with each row being a sample, each column being a component, and each cell being the probability that that model generated that data. If we want log probabilities instead we can use `model.predict_log_proba(data)` instead.
We can get the component label assignments using ``model.predict(data)``, which will return an array of indexes corresponding to the maximally likely component for each sample. If what we want is the full matrix of :math:`P(M|D)`, then we can use ``model.predict_proba(data)``, which will return a matrix with each row being a sample, each column being a component, and each cell being the probability that that component generated that sample. If we want log probabilities, we can use ``model.predict_log_proba(data)`` instead.
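
A short sketch of these calls; the exact input shape expected can vary between pomegranate versions, and a column of univariate samples is assumed here:

.. code-block:: python

    import numpy as np
    from pomegranate import GeneralMixtureModel, NormalDistribution

    model = GeneralMixtureModel([NormalDistribution(5, 2), NormalDistribution(1, 2)],
                                weights=[0.33, 0.67])
    data = np.array([[0.5], [6.2], [1.1]])

    model.predict(data)            # index of the most likely component per sample
    model.predict_proba(data)      # posterior P(M|D), one row per sample
    model.predict_log_proba(data)  # the same matrix in log space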

Fitting
-------

Training GMMs faces the classic chicken-and-egg problem shared by most unsupervised learning algorithms. If we knew which component each sample belonged to, we could use MLE estimates to update the components. And if we knew the parameters of the components, we could predict which component each sample belonged to. This problem is solved using expectation-maximization (EM), which iterates between the two steps until convergence. In essence, an initialization point is chosen which usually is not a very good start, but through successive iterations the parameters converge to a good solution.
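
To make the iteration concrete, here is a from-scratch sketch of EM for a two-component univariate Gaussian mixture. It is purely illustrative of the alternation described above, not how pomegranate is implemented internally:

.. code-block:: python

    import numpy as np
    from scipy.stats import norm

    np.random.seed(0)
    data = np.concatenate([np.random.normal(5, 2, 100), np.random.normal(1, 2, 100)])

    # A deliberately poor starting point.
    means, stds, weights = np.array([0.0, 0.5]), np.array([1.0, 1.0]), np.array([0.5, 0.5])

    for _ in range(50):
        # E-step: the posterior responsibility of each component for each point.
        resp = weights * np.vstack([norm.pdf(data, m, s) for m, s in zip(means, stds)]).T
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: weighted MLE updates of the means, standard deviations, and weights.
        totals = resp.sum(axis=0)
        means = (resp * data[:, None]).sum(axis=0) / totals
        stds = np.sqrt((resp * (data[:, None] - means) ** 2).sum(axis=0) / totals)
        weights = totals / totals.sum()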

These models are fit using `model.fit(data)`. A maximum number of iterations can be specified as well as a stopping threshold for the improvement ratio. See the API reference for full documentation.
These models are fit using ``model.fit(data)``. A maximum number of iterations can be specified as well as a stopping threshold for the improvement ratio. See the API reference for full documentation.
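
For example, a minimal fitting sketch; the keyword names used here are assumptions, so see the API reference for the exact signature:

.. code-block:: python

    import numpy as np
    from pomegranate import GeneralMixtureModel, NormalDistribution

    model = GeneralMixtureModel([NormalDistribution(5, 2), NormalDistribution(1, 2)])
    data = np.concatenate([np.random.normal(5, 2, (100, 1)),
                           np.random.normal(1, 2, (100, 1))])

    # Keyword names are assumed here; adjust to your installed version.
    model.fit(data, max_iterations=100, stop_threshold=0.01)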


API Reference
8 changes: 8 additions & 0 deletions docs/conf.py
@@ -15,8 +15,16 @@
import sys
import os
import subprocess
import mock

MOCK_MODULES = ['joblib', 'networkx', 'scipy', 'scipy.special']
for mod in MOCK_MODULES:
    sys.modules[mod] = mock.Mock()

subprocess.call('pip install numpydoc', shell=True)
subprocess.call('pip install pomegranate', shell=True)

import pomegranate

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
