Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MARKOV: More efficient implementation for computing stationary distributions #79

Merged
merged 10 commits into from
Oct 29, 2014

Conversation

oyamad
Copy link
Member

@oyamad oyamad commented Sep 10, 2014

Hi,
related to #18, I wrote some code that implements an algorithm specialized in computing stationary distributions (instead of using a general routine to solve the eigenvalue problem), called the Grassmann-Taksar-Heyman (GTH) algorithm (see for example these slides, pages 46-81). It is more straightforward than the current procedure, and it turned out that it is accurate enough to give (almost) correct answers even for KMR matrices (without mpmath).

stochmatrix.py:

  • gth_solve solves for a nonzero solution to x (P - I) = 0 by the GTH algorithm for an irreducible stochastic matrix P.
  • For a possibly reducible matrix P, StochMatrix decomposes P into irreducible components (where scipy.sparse.csgraph.connected_components is used to find communication classes).
  • stationary_dists passes these irreducible components to gth_solve to obtain all the stationary distributions, one for each irreducible component. (They are contained in the output array as its rows.)

mc_tools.py:

  • mc_compute_stationary is replaced with the above stationary_dists.
  • Attributes/methods of StochMatrix are made available in DMarkov, such as comm_classes and rec_classes.
  • In passing, I changed the names of methods of DMarkov from "mc_compute_stationary" and "mc_sample_path" to "compute_stationary" and "simulate".

Please see stochmatrix_demo01.ipynb and dmarkov_demo01.ipynb for some more details.

I also prepared test_stochmatrix.py (and edited test_mc_tools.py).

There are some issues/questions:

  1. I am not very experienced in python (and in programming in general). In particular, in StochMatrix I use decorators, but I am not sure if they are properly used, as well as if my choices between attribute and method in implementing each functionality are appropriate.
  2. stationary_dists is slower than the current mc_compute_stationary, but I hope it is acceptable.
  3. In DMarkov:
    1. The argument pi_0 is not used. Is it for any future use?
    2. It has an attribute stationary_dists in addition to the method to compute stationary distributions. Is it only for the __repr__ method?
    3. It checks if the rows of the input matrix sum exactly to one. I am afraid it is too strict, and it might be better to allow some tolerance.
  4. Test is missing for mc_sample_path. I could add one.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.23%) when pulling 71551e2 on oyamad:mc_tools into 3d00d7a on jstac:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.05%) when pulling fef5769 on oyamad:mc_tools into 3d00d7a on jstac:master.

@cc7768
Copy link
Member

cc7768 commented Sep 11, 2014

@oyamad Thanks for the PR. I don't think we've had a chance look through this carefully yet as some of us have just started the school year, but will try and get to this ASAP

With respect to your questions:

  1. When I get a chance, I will take a closer look. Full disclosure: I'm not very well versed in decorators either, but will do some reading and make sure everything is in order.
  2. We would like to keep things as fast as possible, but perhaps the generality of the algorithm outweighs the performance penalty. Another possible option is to do the same kind of thing that we are currently doing, where it first tries to use the unit eigenvectors and if there is a negative value or other difficulty then it falls back on the mpmath eigenvalues with increased precision. I suspect the GTH algorithm is faster than the calculations with mpmath, but will know more when I get a minute to go through and use the code.
  3. DMarkov
  4. I initially used it because I intended to use pi_0 as a default argument to the mc_sample_path method within the DMarkov class. It hasn't happened yet, and I don't know that it is particularly advantageous.
  5. Computing the stationary distributions isn't particularly time intensive, but once we have computed them, there is no reason to compute them again so the plan was to store them within the class. I should tell the method mc_compute_stationary to only compute them if they aren't already computed (and if they are computed then just return them). Thanks for pointing this out.
  6. You are exactly right. This should also be changed.
    1. We should get a test for this. If you would like to write it, that would be great. Otherwise, one of us can do it in the next couple weeks.

-- Couldn't help myself. I just glanced through the notebook and this looks very cool. It did seem like we should be able to leverage the fact that we know that we have a unit eigenvalue and this looks like an answer to that.

@jstac
Copy link
Contributor

jstac commented Sep 11, 2014

This looks very promising. I think the test of irreducibility followed by the specialist algorithm is the way to go. I had a different test of irreducibility in mind but this one looks better.

Speed is something of an issue because computing stationary distributions falls inside loops of certain equilibrium problems --- but accuracy is probably more important. I'm going to run some tests on all of this in a couple of days.

@jstac
Copy link
Contributor

jstac commented Sep 16, 2014

General Thoughts

I've been playing around with the new tools supplied by @oyamad in this PR and I think they're excellent. The code is very well written. I particularly like the following:

  • StochMatrix is a subclass of np.ndarray so we can apply all the usual array methods
  • We can decompose the state space into recurrent classes and match stationary distributions to those classes
  • We can immediately detect whether the process is irreducible and the underlying algorithms (from scipy) are efficient
  • The GTH algorithm seems robust and precise, and also reasonably fast. The speed difference with the earlier implementation is not large.

Requests for Comments 1

It's difficult to understand the difference between "communicating" and "recurrent" classes from the code and documentation. Moreover definitions change slightly from source to source so it would be helpful to be precise here. Evidently we are only interested in communicating classes that are maximal in some sense. For example, if every element of P is strictly positive then any nonempty subset of the state space consists of points that communicate with each other. But we only care about the maximal ergodic set, which is the whole state space. Presumably this is what's meant by a "recurrent class" in the code. But then what is a communicating class?

I'm away from my books right now but my understanding is that for a finite Markov chain there's always a decomposition of the state space into a finite number of nonempty maximal recurrent/ergodic sets. (In fact there's a theorem in Stokey, Lucas and Prescott to this effect.) If I'm not mistaken, these exactly correspond to the "recurrent classes" in @oyamad 's code. So my suggestion is that

#. We adopt the notation and terminology of Stokey et al. (since it's well known in econ), and cite their theorem for the decomposition.

#. We compute only these maximal classes and get rid of the "communicating classes" (or explain the difference very clearly).

Requests for Comments 2

Some reorganization would be helpful to accommodate this code. In particular, it seems to me that we don't really need both DMarkov and StochMatrix --- particularly if we add some kind of stationary distributions attribute to the latter. At the same time, we should be mindful of trying not to break backwards compatability. Hence functions should not be renamed or relocated unless necessary.

What do you think about the following:

#. The class DMarkov is eliminated.

#. The function stationary_dists in stochmatrix.py becomes a method of StochMatrix, and the set of stationary distributions becomes an attribute (that is only populated when the attribute is referenced)

#. The functions mc_compute_stationary and mc_sample_path remain in mc_tools. A call to mc_compute_stationary just queries the attribute mentioned in the previous dot point (after converting an ndarray or array_like object to a StochMatrix if necessary).

Further Suggestions

#. The name StochMatrix sounds a bit awkward. I suggest renaming it either StochasticMatrix or MarkovMatrix or MarkovChain instead.

#. It should perhaps be mentioned somewhere in the docs that we don't return all the stationary distributions, since that would be impossible. We return only the extremal/ergodic distributions, one for each recurrent class.

@oyamad
Copy link
Member Author

oyamad commented Sep 16, 2014

@jstac Thank you very much for your comments.

Communication Classes/Recurrent Classes

I will consult Stokey-Lucas tomorrow (I don't have it with me now).
Here are the definitions I followed:

  1. Given a nonnegative matrix P (or more generally a Metzler matrix, a matrix whose off-diagonal entries are nonnegative), let us define the binary relation "->" on the set of indices (states) as follows:
    i -> j if P_{ij}^k > 0 for some k = 1, 2, ... or if j = i. (Corrected 9/17)
    Note that by definition, -> is reflexive (i -> i for all i) and transitive.
  2. Next define "<->" by
    i <-> j if i -> j and j -> i.
    Note that <-> is an equivalent relation.
    A communication class is defined to an equivalent class for <->.
    (So maximality is already there in communication class.)
    A communication class is called a strongly connected component in graph theory.
    scipy.sparse.csgraph.connected_components with connection='strong' finds the communication classes.
    P is irreducible if it has only one communication class.
  3. A state i is recurrent if i -> j implies j -> i.
    A state is transient if it is not recurrent.
    Fact: For any i, j contained in a communication class, i is recurrent if and only if j is recurrent.
    Therefore, recurrence is a property of a communication class.
    Thus: a communication class is a recurrent class if it contains a recurrent state.
  4. The algorithm in my code (_find_rec_classes) to detect whether a communication class is a recurrent class:
    Define the binary relation "=>" on the quotient set of <-> (the set of communication classes) by [i] => [j] if i' -> j' for some i' in [i] and j' in [j] and if [i] not equal [j], where [i] is the equivalent class that contains i.
    Then [i] is a recurrent class if and only if there is no [j] such that [i] => [j].
  5. For example, if all the entries of P are strictly positive, then the whole state space is a communication class as well as a recurrent class. (More generally, if there is only one communication class, then it is a recurrent class.)
    As another example, consider the matrix [[1, 0], [0.5 0.5]]. This has two communication classes, {0} and {1}, and {0} is the only recurrent class.
  6. Maybe there is almost no loss in getting rid of the communication classes. The only information lost is how transient states are classified into different communication classes, which is not very important.
    On the other hand, many math books seem to talk about communication classes, mentioning the relationship between a Markov chain (or more generally a nonnegative matrix) and the directed graph it defines.

Other Requests/Suggestions

  1. Regarding the Requests for Comments 2, I agree. Once other lead developers agree I will reorganize the code.
  2. Regarding the name, I would keep "Matrix" rather than MarkovChain as it is subclassing np.ndarray. Both of StochasticMatrix and MarkovMatrix are fine.
    In passing, the code actually also works for Markov transition rate matrices (or generator matrices) of continuous time Markov chains.
  3. You right, I have been imprecise, it is not all stationary distributions that are returned...

Some More

It could be too much, but I have some more code which computes the period of an irreducible stochastic matrix (and thus detects whether it is periodic or aperiodic). It uses scipy.sparse.csgraph.breadth_first_order.
It is to some extent useful, since it detects whether an irreducible stochastic matrix is uniformly ergodic (where uniform ergodicity is discussed in quant-econ.net); a stochastic matrix is uniformly ergodic if and only if it is irreducible and aperiodic.
I will prepare an ipython notebook to illustrate the point.

@jstac
Copy link
Contributor

jstac commented Sep 16, 2014

@oyamad These explanations are 100% clear. So my suggestion now is that we keep both communication classes and recurrence classes and add the explanation above to the docstring, including the examples. (If it seems long it can also go into the docstring at the top of the module --- either is fine.)

In addition, it's very useful to know if the Markov matrix is aperiodic since, as you say, it means that we have uniform ergodicity and therefore convergence of the marginal distribution of X_t from every initial condition. I think it would be great to have that code as well.

@cc7768 Please let us know your thoughts.

@cc7768
Copy link
Member

cc7768 commented Sep 17, 2014

@jstac @oyamad

Lots of info to process. I will try and make my response as organized as possible (The summarized version is that I think this will be a good change):

  1. Speed wise. I don't think this is an issue at all. The difference is relatively small, and I think you undersold your addition in the speed category because for larger matrices GTH seems to be faster. Furthermore, your algorithm seems accessible to other speed up tools such as cython and numba (all very basic functions and loops).

  2. Subclass of np.array. This is a very good idea. I think it feels natural to treat the StochasticMatrix just like we would an array, but allow some additional information/structure.

  3. At one point in time for the DMarkov class (in a personal version), I kept track of the ergodic sets and believe that it is information that is worth having. Theory/DocString wise, I found your comment that explained the communication/recurrent classes very clear. I think it could be too much to include in the doc string for the subclass (but it would obviously be useful to reference some theorems and provide some basic information), but I definitely think the explanation that you gave should be included in its entirety at the top of the file.

  4. On backwards compatibility. It might take some work up front to keep things updated, but I don't think we should concern ourselves too much with backwards compatibility yet. That being said, I think that mc_sample_path and mc_compute_stationary are probably fine to use to query the new functions though.

  5. DMarkov provides no added utility, I am in favor or getting rid of it.

  6. When we update things, just worth observing, stationary_dists from stochmatrix.py returns the stationary distributions as rows instead of columns. I don't think this will make much of a difference, but it is probably better practice since numpy stores arrays as row major. Just be aware of this when we update code.

  7. With regard to the name, I am in favor of StochasticMatrix of MarkovMatrix. If I had to choose between the two, I'd lean towards MarkovMatrix. Full disclosure: I am a really bad variable/function namer, so take my opinion with a grain of salt.

That is more or less what I think right now. If I think of something else, I'll add it.

@sglyon
Copy link
Member

sglyon commented Sep 17, 2014

Quick comment about subclassing np.ndarray.

Suppose we have Q and P are both StochasticMatrices. What does Q + P mean? The rows will not sum to one any more, so this will by definition not be a StocasticMatrix anymore.

How about Q.std()? well this will return the standard deviation of the elements in Q, but again, that doesn't make sense for us. A more natural (though still probably not what we want) interpretation of std would be the asymptotic standard deviation approximated by the sample standard deviation from a long simulation.

I see the appeal of having things like Q.shape and Q.size be well defined, but for me there are far fewer of those types of functions (i.e. ones that would be well defined on this type) than there are methods of np.ndarray that just don't make any sense.

Any thoughts or comments about this?

@oyamad
Copy link
Member Author

oyamad commented Sep 17, 2014

@spencerlyon2 Very good point. I didn't think about it.

Actually, finding the communication/recurrent classes as well as computing the period are all purely graph theoretic operations. So another possible implementation strategy is to implement these functions in a class, say, Digraph (directed graph), and keeping DMakorv alive, put the other operations, such as calling the methods of Digraph to obtain the recurrent classes and passing them to gth_solve, in DMarkov. (The Digraph class might be of independent use in the future, for models that involve networks.)

(By the way, Networkx has these graph theoretic functions, but it doesn't seem to use functions from scipy such as scipy.sparse.csgraph.connected_components. I guess, although I haven't checked, the code that uses those scipy functions is faster than Networkx.)

@jstac
Copy link
Contributor

jstac commented Sep 17, 2014

@oyamad I like this suggestion of keeping DMarkov alive and setting up a Digraph class that computes communication classes and so forth. As you say, such methods might also be useful for network models --- which would be good to include --- and Digraph would be a good place to park additional graph related functionality as it comes up.

On top of that, the distinction between DMarkov and Digraph is then clear, so there's no problem having both. The only thing I would say on that point is that I think DMarkov should be renamed to something a little clearer. MarkovMatrix seems a popular choice.

Finally, while I was enthusiastic about subclassing ndarrays before, I found Spencer's argument quite convincing. A related point is that if you query the methods associated with a StochMatrix you get a huge list of methods, many of which, as Spencer pointed out, are not relevant. This makes it harder to find the methods that we do care about --- those written by Daisuke.

Do we all have agreement on the way forward now?

@cc7768
Copy link
Member

cc7768 commented Sep 17, 2014

Good point about not needing to subclass it as an array; I liked the idea that it would just be an array with some additional information, but I agree with @spencerlyon2 @jstac that it adds too many things that won't be applicable.

@albop
Copy link
Contributor

albop commented Sep 17, 2014

Very interesting thread !

It also appears to me that subclassing ndarray introduces many unpredictable operations for the few ones that would be useful. One option would be to have a is_stochastic() method that returns True or False depending on whether all rows sum to 1. Then you would have (P+Q).is_stochastic()==False but ( (1-lam)_P+lam_P).is_stochastic()==True ). There is also this intimidating page here (http://docs.scipy.org/doc/numpy/user/basics.subclassing.html) but I not sure that a lot is relevant for us here (maybe horizontal slices)

About the use of properties, there could be one simple rule of thumb: if you call mmat.ergodic_dist() you can expect the computation to take time but if you call mmat.ergodic_dist it is either an attribute or a property that is only computed once with the result cached. In this particular case, I assume that the stochastic matrix would be immutable, since there aren't small modifications that preserve the markov property (not P[0,0]=0.5 for instance). Then it would make sense to either cache the properties, or to simply compute all of them the first time the matrix is created.

On the terminology aspect, I was wondering whether we could use StochasticMatrix for a matrix whose rows sum to 1 and something like DMarkovChain for the conjunction of a StochasticMatrix, and a list of nodes. This would essentially be the object we would get as a result of the discretization routines.

@albop
Copy link
Contributor

albop commented Sep 17, 2014

Somewhat related: we still lack experience here, but it is becoming apparent that objects used for computational codes should be isomorphic to simple data structures. We are not even sure that we can optimize object methods as well as an (unbound) function using numba. In my view that would be a good reason for avoiding any complication due to object inheritance.

@jstac
Copy link
Contributor

jstac commented Sep 18, 2014

@albop Many thanks for these thoughts. One interesting piont is on the use of Numba. As you say, Numba seems happier working with less nested structures. On the other hand there are probably ways around this ex post. For example, we can probably outsource computationally intensive tasks from methods to stand alone functions later on if we really need to (i.e., method remains but hot loops are outsourced).

@oyamad I really liked the path you suggested above, particularly the idea of separating graph theoretic and Markov centric operations, and I think you can take this discussion as advice, to which free disposal applies.

@oyamad
Copy link
Member Author

oyamad commented Oct 6, 2014

I finally updated the previous PR.

Graph theoretic operations are implemented in DiGraph. DMarkov is kept and renamed to MarkovChain. Now, DiGraph is a graph theoretic object, and MarkovChain is a probability theoretic object.

  • Deleted stochmatrix.py.
  • Created graph_tools.py and gth_solve.py.
  • Modified mc_tools.py.

graph_tools.py:

  • DiGraph replaces the previous StochMatrix class. It specializes in graph theoretic operations.
  • Routines regarding periodicity are added (as attributes), which are designed to work only for strongly connected digraphs.

gth_solve.py:

  • Contains the function gth_solve.

mc_tools.py:

  • the definitions are explained as the docstring at the top of the file.
  • MarkovChain creates an instance of DiGraph and then calls its attributes. MarkovChain.stationary_distributions passes the recurrent classes to gth_solve to obtained stationary distributions one for each recurrent class.
  • mc_compute_stationary remains, which simply calls MarkovChain(P).stationary_distributions.

Issues:

  • All the functionalities, except MarkovChain.simulate, are implemented as attributes; they are actually methods, decorated with @property.
  • In the present version, the period and cyclic classes are defined only for strongly connected digraphs/irreducible Markov chains. If the user has a non-strongly connected digraph/non-irreducible chain (a NotImplementedError is raised if any of is_aperiodic, period, and cyclic_classes is called), then it is left as the user's responsibility to decompose it into strongly connected components/communication classes. If there is any well-accepted definition, I could extend these to general digraphs/chains.
  • Test for mc_sample_path is still missing.

Please see digraph_demo01.ipynb and markovchain_demo01.ipynb for some more details.

@jstac
Copy link
Contributor

jstac commented Oct 8, 2014

@oyamad I think this looks really great. I'm delighted to get this kind of functionality into quantecon.

I want to spend a bit more time with this and I'd like to get another set of eyes to look it over before we merge. I'll be in contact about that very soon.

@albop
Copy link
Contributor

albop commented Oct 15, 2014

This is very interesting. I can have a look at it later this week if you want.

@sanguineturtle
Copy link
Contributor

@oyamad You have done a great job with this. Reading through the PR thread - @jstac comments are excellent - I also agree with @spencerlyon2 re: inheritance. I initially wanted to inherit a pandas DataFrame in my own work - and I decided having control of the base level object is a good thing (and used self.data = pd.DataFrame instead). It also means that if ndarray names something in the future that is the same as one of methods then it will have a name collision. The good news is that it would get overwritten by the QuantEcon method when inheriting down stream.

There are a few minor things I would probably change - which are largely stylistic - like:

  1. all classes inherit (object) unless inheriting a different class. This just enforces that all of Python 2 considers it as the latest type of python class. But not really a big deal.
  2. I see the use of _ This is useful for hiding those attributes of the class. But do any of the attributes need to be protected (such as an internal state) that shouldn't be externally settable? (then __ can be useful - but also impacts on the readability of code. It also then requires setter methods if we do want to interact with them externally)

I do have two questions for the group:

  1. DiGraph - this is the same class name as used by networkx. Do we see QuantEcon using networkx down the road? This might be confusing down the track - having said that it is an excellent name for a Directed Graph Class.
  2. If we see the network infrastructure as important to be implemented locally within QuantEcon (and there are some good reasons for this) - perhaps we should start a subpackage for holding network classes and related infrastructure? In this case the class is essentially a wrapper around scipy tools - which is a good thing to do as it can allow the use of economics-specific attributes rather than more generic graph theoretic terms.

@oyamad
Copy link
Member Author

oyamad commented Oct 28, 2014

@sanguineturtle Thank you for your comments.

  1. I didn't know the difference between classes with and without "(object)". I am going to add "(object)". Thanks.
  2. I found somewhere on the internet this strategy, to initially define _attribute_name as None and, if attribute_name is accessed, then check whether _attribute_name is None, and if it is True, then the relevant algorithm is called to return the result. The user does not know a priori whether _attribute_name is None or has a value (it may have been populated when some other method is called). In this sense, I would like to "protect" it.
    But I agree too many _'s reduce the readability of the code, and I am open to any suggestion.

I am also open to any suggestion on the class/method names for DiGraph.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.53%) when pulling 222b8ca on oyamad:mc_tools into 24e5e0f on QuantEcon:master.

@jstac
Copy link
Contributor

jstac commented Oct 28, 2014

@oyamad Please tell me when you've added the period attribute for non-irreducible chains and I'll accept the PR.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.55%) when pulling 323eecc on oyamad:mc_tools into 24e5e0f on QuantEcon:master.

@oyamad
Copy link
Member Author

oyamad commented Oct 29, 2014

I finally made a final revision.

mc_tools.py:

  1. For MarkovChain, period and is_aperiodic are extended to general, possibly reducible, Markov chains. The period of a Markov chain is now defined to be the least common multiple of the periods of the recurrent classes.
  2. cyclic_classes is left undefined for reducible chains.

I suppose this is ready to be merged.

Just one thing about implementation of some test code:
I was struggling with assertion of the equality between two (pure Python) lists of numpy.ndarrays (of different sizes). communication_classes, recurrent_classes, and cyclic_classes each return a list of ndarrays, and I want to check that the output list is equal to the known solution list. (In the previous version, it was not an issue, probably because the previous test code only contains cases with lists of classes of the same size; now I added a test instance with classes of different sizes.) I ended up with a non-elegant strategy, which is the function list_of_array_equal in test_graph_tools.py and test_mc_tools.py. The code works and I believe it correctly tests the correctness of the outputs, but a more elegant implementation would be desirable, if any.

jstac added a commit that referenced this pull request Oct 29, 2014
MARKOV: More efficient implementation for computing stationary distributions
@jstac jstac merged commit 895ddb8 into QuantEcon:master Oct 29, 2014
@jstac
Copy link
Contributor

jstac commented Oct 29, 2014

Great work, many thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants