Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speeding up congruence-related lattice functions #23383

Closed
jm58660 mannequin opened this issue Jul 7, 2017 · 30 comments
Closed

Speeding up congruence-related lattice functions #23383

jm58660 mannequin opened this issue Jul 7, 2017 · 30 comments

Comments

@jm58660
Copy link
Mannequin

jm58660 mannequin commented Jul 7, 2017

This patch adds precomputing intervals in the HasseDiagram to speed up some congruence-related lattice functions. If L = Posets.BooleanLattice(7), then we have

sage: timeit("list(L._hasse_diagram.congruences_iterator())")
5 loops, best of 3: 4.63 s per loop

and after L._hasse_diagram._precompute_intervals()

sage: timeit("list(L._hasse_diagram.congruences_iterator())")
5 loops, best of 3: 656 ms per loop

OTOH this will eat memory.

Currently this feature is well hidden, only available as a _-function of a _-variable. Later we should think about interface. A global "save-cpu-eat-memory" -switch or "aux functions" to lattices (posets?) or something else?

CC: @tscrim @fchapoton

Component: combinatorics

Author: Jori Mäntysalo

Branch/Commit: 6f8c510

Reviewer: Travis Scrimshaw

Issue created by migration from https://trac.sagemath.org/ticket/23383

@jm58660 jm58660 mannequin added this to the sage-8.0 milestone Jul 7, 2017
@jm58660 jm58660 mannequin added c: combinatorics labels Jul 7, 2017
@jm58660
Copy link
Mannequin Author

jm58660 mannequin commented Jul 7, 2017

Branch: u/jmantysalo/precompute-intervals

@jm58660
Copy link
Mannequin Author

jm58660 mannequin commented Jul 7, 2017

Changed branch from u/jmantysalo/precompute-intervals to none

@jm58660 jm58660 mannequin added the s: needs info label Jul 7, 2017
@jm58660
Copy link
Mannequin Author

jm58660 mannequin commented Jul 7, 2017

Branch: u/jmantysalo/precompute-intervals

@tscrim
Copy link
Collaborator

tscrim commented Jul 7, 2017

Commit: 4b183a2

@tscrim
Copy link
Collaborator

tscrim commented Jul 7, 2017

comment:4

I consider Hasse diagrams not as part of the true API, but an implementation detail of posets. So I don't see any problem with changing the output type. Although it is part of the API for HasseDiagram, thus, it technically is a backwards incompatible change.

Also, it might be faster to have the dict indexed by a pair (x,y) as it only needs to do 1 hash and table lookup instead of 2. Also, a better way to populate _intervals:

self._intervals = {u: {v: v_up[u].intersection(v_down[v]) for v in range(n)}
                   for u in range(n)}

There is no need to create the empty object and you save a lot of Python calls by doing the dict comprehension.


New commits:

4b183a2Precomputing intervals of HasseDiagram.

@jm58660
Copy link
Mannequin Author

jm58660 mannequin commented Jul 7, 2017

comment:5

Replying to @tscrim:

Also, it might be faster to have the dict indexed by a pair (x,y) as it only needs to do 1 hash and table lookup instead of 2.

Actually, even faster should be a list of lists, that needs no lookup at all.

There is no need to create the empty object and you save a lot of Python calls by doing the dict comprehension.

True. I was thinking about lists but then somehow ended up to dict in this PoC.

And now the most important:

I consider Hasse diagrams not as part of the true API, but an implementation detail of posets. So I don't see any problem with changing the output type.

OK. I tested by adding precomputing to __init__ of poset. Only errors I got from long test of combinat/posets where from .open_interval in posets.py. But I think currently intervals are given as in the linear extension order, and that is propably a good thing.

@tscrim
Copy link
Collaborator

tscrim commented Jul 9, 2017

comment:6

Replying to @jm58660:

Replying to @tscrim:

Also, it might be faster to have the dict indexed by a pair (x,y) as it only needs to do 1 hash and table lookup instead of 2.

Actually, even faster should be a list of lists, that needs no lookup at all.

Well, there still is a lookup, but now it is (relatively) very fast. +1 for list of lists.

I consider Hasse diagrams not as part of the true API, but an implementation detail of posets. So I don't see any problem with changing the output type.

OK. I tested by adding precomputing to __init__ of poset. Only errors I got from long test of combinat/posets where from .open_interval in posets.py. But I think currently intervals are given as in the linear extension order, and that is propably a good thing.

I think it would be best if the numbering was done increasing (as that is assumed to be a linear extension IIRC).

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 10, 2017

Changed commit from 4b183a2 to 0cafe94

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 10, 2017

Branch pushed to git repo; I updated commit sha1. New commits:

0cafe94Use list of lists.

@jm58660
Copy link
Mannequin Author

jm58660 mannequin commented Jul 10, 2017

comment:8

Here it is. The speedup is quite big:

sage: L = Posets.BooleanLattice(7)
sage: timeit("list(L._hasse_diagram.congruences_iterator())")
5 loops, best of 3: 4.63 s per loop
sage: timeit("L._hasse_diagram._precompute_intervals()")
25 loops, best of 3: 23.6 ms per loop
sage: timeit("list(L._hasse_diagram.congruences_iterator())")
5 loops, best of 3: 656 ms per loop

@jm58660 jm58660 mannequin added s: needs review and removed s: needs info labels Jul 10, 2017
@jm58660 jm58660 mannequin modified the milestones: sage-8.0, sage-8.1 Jul 10, 2017
@tscrim
Copy link
Collaborator

tscrim commented Jul 10, 2017

comment:9

If you want even more speed:

-        v_up = [None] * n
-        v_down = [None] * n
-        for v in range(n):
-            v_up[v] = frozenset(self.depth_first_search(v))
-            v_down[v] = frozenset(self.depth_first_search(v, neighbors=self.neighbors_in))
+        v_up = [frozenset(self.depth_first_search(v)) for v in range(n)]
+        v_down = [frozenset(self.depth_first_search(v, neighbors=self.neighbors_in))
+                  for v in range(n)]
 
-        self._intervals = [[None] * n for _ in range(n)]
-        for u in range(n):
-            for v in range(n):
-                self._intervals[u][v] = sorted(v_up[u].intersection(v_down[v]))
+        self._intervals = [[sorted(up.intersection(down)) for down in v_down]
+                           for up in v_up]

I've seen using list comprehension have a significant effect for things like this because of the extra Python overhead.

I do not like the documentation of _alt_interval. It should explain what it is doing (well, really it should be similar/same as interval. Moreover, you should put somewhere in some documentation about this feature and how to use it. It probably is good to make this part of the API for posets somewhere with a warning about the memory usage.

You also should update the ticket description.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 11, 2017

Changed commit from 0cafe94 to 5c3e75c

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 11, 2017

Branch pushed to git repo; I updated commit sha1. New commits:

5c3e75cUse list comprehension.

@jm58660

This comment has been minimized.

@jm58660
Copy link
Mannequin Author

jm58660 mannequin commented Jul 11, 2017

comment:11

Replying to @tscrim:

If you want even more speed:

Done that. And yes, it gives some milliseconds: for Posets.BooleanLattice(10) it took 1,65 seconds, now it takes 1,41 seconds. (Of course that means nothing when compared to time to compute congruences.) And the code is shorted and more pythonic.

Much more speedup should be possible with better algorithm: an interval from a with covers a1 and a2 to b is union of intervals (a1, b) and (a2, b).

I do not like the documentation of _alt_interval. It should explain what it is doing (well, really it should be similar/same as interval. Moreover, you should put somewhere in some documentation about this feature and how to use it. It probably is good to make this part of the API for posets somewhere with a warning about the memory usage.

You also should update the ticket description.

I added a note about this to the description. Needs to think for a while.

@jm58660 jm58660 mannequin added s: needs review and removed s: needs work labels Jul 11, 2017
@tscrim
Copy link
Collaborator

tscrim commented Jul 11, 2017

Reviewer: Travis Scrimshaw

@tscrim
Copy link
Collaborator

tscrim commented Jul 11, 2017

comment:12

However, I still would like to see a bit better of documentation for _alt_interval describing the algorithm (i.e., by precomputing) and what you need to do to make it work. In some ways, I don't like the implementation because you need to explicitly call another function first instead of it just working. I am see this through the eyes of someone browsing through the code and seeing this who did not read this specific ticket, which will almost certainly happen at some point.

@jm58660
Copy link
Mannequin Author

jm58660 mannequin commented Jul 11, 2017

comment:13

First, a technical question: A Python function can return an internal function, i.e. this will print "Hello":

def f():
    def g(): print("Hello")
    return g
f()()

Would it be meaningfull to use this feature, and remove _alt_interval from the top level of Hasse diagram?

Now to the main question.

The problem is, IMO, UniqueRepresentation. No created poset will ever be freed from the memory, and so making test with random posets will eventually eat all the memory. And if we compute lequal_matrix() or _meet etc, those will be saved too. That's why I am worried about this.

When should this precomputing be called? In interval()? In congruence()? In, say, is_isoform()?

I will formulate longer description for _atl_interval.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 11, 2017

Branch pushed to git repo; I updated commit sha1. New commits:

99412feMore documentation.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 11, 2017

Changed commit from 5c3e75c to 99412fe

@jm58660
Copy link
Mannequin Author

jm58660 mannequin commented Jul 21, 2017

comment:15

More comments on this? Specially for this:

When should this precomputing be called? In interval()? In congruence()? In, say, is_isoform()?

@jm58660 jm58660 mannequin added s: needs review and removed s: needs work labels Jul 21, 2017
@tscrim
Copy link
Collaborator

tscrim commented Jul 21, 2017

comment:16

Replying to @jm58660:

First, a technical question: A Python function can return an internal function, i.e. this will print "Hello":

def f():
    def g(): print("Hello")
    return g
f()()

Yes, that is correct. We return functions in a number of places within Sage too.

Would it be meaningfull to use this feature, and remove _alt_interval from the top level of Hasse diagram?

I do not think so by default because of how much memory this would use. Although it could be something we do for small posets (less that 5 vertices?).

The problem is, IMO, UniqueRepresentation. No created poset will ever be freed from the memory, and so making test with random posets will eventually eat all the memory. And if we compute lequal_matrix() or _meet etc, those will be saved too. That's why I am worried about this.

That should not be true. Once you delete all references to a poset, it should be freeable as it is a weak reference (otherwise it is a memory leak bug).

When should this precomputing be called? In interval()? In congruence()? In, say, is_isoform()?

Probably in none of these places, but instead we should document it as an alternative method for increasing the speed.

@jm58660
Copy link
Mannequin Author

jm58660 mannequin commented Jul 22, 2017

comment:17

Replying to @tscrim:

That should not be true. Once you delete all references to a poset, it should be freeable as it is a weak reference (otherwise it is a memory leak bug).

If that would be true, then for example this should print the same number again and again:

i = 0
j = 0
for P in Posets(8):
    j += 1
    if j % 1000 == 0:
        print(sage.misc.getusage.get_memory_usage())
    if P.is_connected():
        i += 1
print(i)

(Compare this with same except digraphs(8) instead of Posets(8).)

Currently there is no way to for example search for counter-example with random posets to unlimited time.

@tscrim
Copy link
Collaborator

tscrim commented Jul 22, 2017

comment:18

I know UniqueRepresentation is a weak cache, and I do not see anything that would explicitly nail it in the memory. What is probably happening is with the iterator populating the cache and moving around all of the memory is not giving the garbage collector time to work. If I force a garbage collection, then the memory usage (basically) does not grow:

import gc
i = 0
j = 0
for P in Posets(8):
    j += 1
    if j % 100 == 0:
        gc.collect()
        print(sage.misc.getusage.get_memory_usage())
    if P.is_connected():
        i += 1
print(i)

IIRC, Python will force a garbage collection to see if it can free up the memory before giving an out-of-memory exception. Although I've never really done anything that would cause that to happen.

@jm58660
Copy link
Mannequin Author

jm58660 mannequin commented Jul 23, 2017

comment:19

Replying to @tscrim:

First, thanks for gc.collect()! I think I have read it before, and totally forget.

IIRC, Python will force a garbage collection to see if it can free up the memory before giving an out-of-memory exception. Although I've never really done anything that would cause that to happen.

The problem is propably memory overcommit or usage of every bit of memory. When almost every byte is used, Linux is extremely slow, and so never goes to the limit where Python would give out-of-memory exception.

This should be handled in Python level or in OS level. Sage can only add some temporary workarounds.

@tscrim
Copy link
Collaborator

tscrim commented Jul 25, 2017

comment:20

One last trivial thing: code format ``x`` and ``y`` in _alternate_interval. Once done, you can set a positive review on my behalf.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 25, 2017

Branch pushed to git repo; I updated commit sha1. New commits:

6f8c510Docstring formatting.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 25, 2017

Changed commit from 99412fe to 6f8c510

@jm58660
Copy link
Mannequin Author

jm58660 mannequin commented Jul 25, 2017

comment:22

Replying to @tscrim:

One last trivial thing: code format ``x`` and ``y`` in _alternate_interval. Once done, you can set a positive review on my behalf.

Done that.

@vbraun
Copy link
Member

vbraun commented Jul 29, 2017

Changed branch from u/jmantysalo/precompute-intervals to 6f8c510

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants