Speeding up congruence-related lattice functions #23383

jm58660 · 2017-07-07T08:13:56Z

This patch adds precomputing intervals in the HasseDiagram to speed up some congruence-related lattice functions. If L = Posets.BooleanLattice(7), then we have

sage: timeit("list(L._hasse_diagram.congruences_iterator())")
5 loops, best of 3: 4.63 s per loop

and after L._hasse_diagram._precompute_intervals()

sage: timeit("list(L._hasse_diagram.congruences_iterator())")
5 loops, best of 3: 656 ms per loop

OTOH this will eat memory.

Currently this feature is well hidden, only available as a _-function of a _-variable. Later we should think about interface. A global "save-cpu-eat-memory" -switch or "aux functions" to lattices (posets?) or something else?

CC: @tscrim @fchapoton

Component: combinatorics

Author: Jori Mäntysalo

Branch/Commit: 6f8c510

Reviewer: Travis Scrimshaw

Issue created by migration from https://trac.sagemath.org/ticket/23383

The text was updated successfully, but these errors were encountered:

jm58660 · 2017-07-07T08:14:27Z

Branch: u/jmantysalo/precompute-intervals

jm58660 · 2017-07-07T08:14:48Z

Changed branch from u/jmantysalo/precompute-intervals to none

jm58660 · 2017-07-07T08:15:07Z

Branch: u/jmantysalo/precompute-intervals

tscrim · 2017-07-07T14:32:08Z

Commit: 4b183a2

tscrim · 2017-07-07T14:32:08Z

comment:4

I consider Hasse diagrams not as part of the true API, but an implementation detail of posets. So I don't see any problem with changing the output type. Although it is part of the API for HasseDiagram, thus, it technically is a backwards incompatible change.

Also, it might be faster to have the dict indexed by a pair (x,y) as it only needs to do 1 hash and table lookup instead of 2. Also, a better way to populate _intervals:

self._intervals = {u: {v: v_up[u].intersection(v_down[v]) for v in range(n)}
                   for u in range(n)}

There is no need to create the empty object and you save a lot of Python calls by doing the dict comprehension.

New commits:

`4b183a2`	`Precomputing intervals of HasseDiagram.`

jm58660 · 2017-07-07T16:33:00Z

comment:5

Replying to @tscrim:

Also, it might be faster to have the dict indexed by a pair (x,y) as it only needs to do 1 hash and table lookup instead of 2.

Actually, even faster should be a list of lists, that needs no lookup at all.

There is no need to create the empty object and you save a lot of Python calls by doing the dict comprehension.

True. I was thinking about lists but then somehow ended up to dict in this PoC.

And now the most important:

I consider Hasse diagrams not as part of the true API, but an implementation detail of posets. So I don't see any problem with changing the output type.

OK. I tested by adding precomputing to __init__ of poset. Only errors I got from long test of combinat/posets where from .open_interval in posets.py. But I think currently intervals are given as in the linear extension order, and that is propably a good thing.

tscrim · 2017-07-09T06:15:34Z

comment:6

Replying to @jm58660:

Replying to @tscrim:

Also, it might be faster to have the dict indexed by a pair (x,y) as it only needs to do 1 hash and table lookup instead of 2.

Actually, even faster should be a list of lists, that needs no lookup at all.

Well, there still is a lookup, but now it is (relatively) very fast. +1 for list of lists.

I consider Hasse diagrams not as part of the true API, but an implementation detail of posets. So I don't see any problem with changing the output type.

OK. I tested by adding precomputing to __init__ of poset. Only errors I got from long test of combinat/posets where from .open_interval in posets.py. But I think currently intervals are given as in the linear extension order, and that is propably a good thing.

I think it would be best if the numbering was done increasing (as that is assumed to be a linear extension IIRC).

sagetrac-git · 2017-07-10T09:21:29Z

Changed commit from 4b183a2 to 0cafe94

sagetrac-git · 2017-07-10T09:21:29Z

Branch pushed to git repo; I updated commit sha1. New commits:

`0cafe94`	`Use list of lists.`

jm58660 · 2017-07-10T09:24:07Z

comment:8

Here it is. The speedup is quite big:

sage: L = Posets.BooleanLattice(7)
sage: timeit("list(L._hasse_diagram.congruences_iterator())")
5 loops, best of 3: 4.63 s per loop
sage: timeit("L._hasse_diagram._precompute_intervals()")
25 loops, best of 3: 23.6 ms per loop
sage: timeit("list(L._hasse_diagram.congruences_iterator())")
5 loops, best of 3: 656 ms per loop

tscrim · 2017-07-10T15:55:51Z

comment:9

If you want even more speed:

-        v_up = [None] * n
-        v_down = [None] * n
-        for v in range(n):
-            v_up[v] = frozenset(self.depth_first_search(v))
-            v_down[v] = frozenset(self.depth_first_search(v, neighbors=self.neighbors_in))
+        v_up = [frozenset(self.depth_first_search(v)) for v in range(n)]
+        v_down = [frozenset(self.depth_first_search(v, neighbors=self.neighbors_in))
+                  for v in range(n)]
 
-        self._intervals = [[None] * n for _ in range(n)]
-        for u in range(n):
-            for v in range(n):
-                self._intervals[u][v] = sorted(v_up[u].intersection(v_down[v]))
+        self._intervals = [[sorted(up.intersection(down)) for down in v_down]
+                           for up in v_up]

I've seen using list comprehension have a significant effect for things like this because of the extra Python overhead.

I do not like the documentation of _alt_interval. It should explain what it is doing (well, really it should be similar/same as interval. Moreover, you should put somewhere in some documentation about this feature and how to use it. It probably is good to make this part of the API for posets somewhere with a warning about the memory usage.

You also should update the ticket description.

sagetrac-git · 2017-07-11T05:33:50Z

Changed commit from 0cafe94 to 5c3e75c

sagetrac-git · 2017-07-11T05:33:50Z

Branch pushed to git repo; I updated commit sha1. New commits:

`5c3e75c`	`Use list comprehension.`

jm58660 · 2017-07-11T05:48:20Z

comment:11

Replying to @tscrim:

If you want even more speed:

Done that. And yes, it gives some milliseconds: for Posets.BooleanLattice(10) it took 1,65 seconds, now it takes 1,41 seconds. (Of course that means nothing when compared to time to compute congruences.) And the code is shorted and more pythonic.

Much more speedup should be possible with better algorithm: an interval from a with covers a1 and a2 to b is union of intervals (a1, b) and (a2, b).

I do not like the documentation of _alt_interval. It should explain what it is doing (well, really it should be similar/same as interval. Moreover, you should put somewhere in some documentation about this feature and how to use it. It probably is good to make this part of the API for posets somewhere with a warning about the memory usage.

You also should update the ticket description.

I added a note about this to the description. Needs to think for a while.

tscrim · 2017-07-11T07:16:34Z

Reviewer: Travis Scrimshaw

tscrim · 2017-07-11T07:16:34Z

comment:12

However, I still would like to see a bit better of documentation for _alt_interval describing the algorithm (i.e., by precomputing) and what you need to do to make it work. In some ways, I don't like the implementation because you need to explicitly call another function first instead of it just working. I am see this through the eyes of someone browsing through the code and seeing this who did not read this specific ticket, which will almost certainly happen at some point.

jm58660 · 2017-07-11T08:29:53Z

comment:13

First, a technical question: A Python function can return an internal function, i.e. this will print "Hello":

def f():
    def g(): print("Hello")
    return g
f()()

Would it be meaningfull to use this feature, and remove _alt_interval from the top level of Hasse diagram?

Now to the main question.

The problem is, IMO, UniqueRepresentation. No created poset will ever be freed from the memory, and so making test with random posets will eventually eat all the memory. And if we compute lequal_matrix() or _meet etc, those will be saved too. That's why I am worried about this.

When should this precomputing be called? In interval()? In congruence()? In, say, is_isoform()?

I will formulate longer description for _atl_interval.

sagetrac-git · 2017-07-11T08:39:37Z

Branch pushed to git repo; I updated commit sha1. New commits:

`99412fe`	`More documentation.`

sagetrac-git · 2017-07-11T08:39:37Z

Changed commit from 5c3e75c to 99412fe

jm58660 · 2017-07-21T07:39:15Z

comment:15

More comments on this? Specially for this:

When should this precomputing be called? In interval()? In congruence()? In, say, is_isoform()?

tscrim · 2017-07-21T23:11:37Z

comment:16

Replying to @jm58660:

First, a technical question: A Python function can return an internal function, i.e. this will print "Hello":
def f():
    def g(): print("Hello")
    return g
f()()

Yes, that is correct. We return functions in a number of places within Sage too.

Would it be meaningfull to use this feature, and remove _alt_interval from the top level of Hasse diagram?

I do not think so by default because of how much memory this would use. Although it could be something we do for small posets (less that 5 vertices?).

The problem is, IMO, UniqueRepresentation. No created poset will ever be freed from the memory, and so making test with random posets will eventually eat all the memory. And if we compute lequal_matrix() or _meet etc, those will be saved too. That's why I am worried about this.

That should not be true. Once you delete all references to a poset, it should be freeable as it is a weak reference (otherwise it is a memory leak bug).

When should this precomputing be called? In interval()? In congruence()? In, say, is_isoform()?

Probably in none of these places, but instead we should document it as an alternative method for increasing the speed.

jm58660 · 2017-07-22T06:57:50Z

comment:17

Replying to @tscrim:

That should not be true. Once you delete all references to a poset, it should be freeable as it is a weak reference (otherwise it is a memory leak bug).

If that would be true, then for example this should print the same number again and again:

i = 0
j = 0
for P in Posets(8):
    j += 1
    if j % 1000 == 0:
        print(sage.misc.getusage.get_memory_usage())
    if P.is_connected():
        i += 1
print(i)

(Compare this with same except digraphs(8) instead of Posets(8).)

Currently there is no way to for example search for counter-example with random posets to unlimited time.

tscrim · 2017-07-22T11:42:15Z

comment:18

I know UniqueRepresentation is a weak cache, and I do not see anything that would explicitly nail it in the memory. What is probably happening is with the iterator populating the cache and moving around all of the memory is not giving the garbage collector time to work. If I force a garbage collection, then the memory usage (basically) does not grow:

import gc
i = 0
j = 0
for P in Posets(8):
    j += 1
    if j % 100 == 0:
        gc.collect()
        print(sage.misc.getusage.get_memory_usage())
    if P.is_connected():
        i += 1
print(i)

IIRC, Python will force a garbage collection to see if it can free up the memory before giving an out-of-memory exception. Although I've never really done anything that would cause that to happen.

jm58660 · 2017-07-23T05:20:34Z

comment:19

Replying to @tscrim:

First, thanks for gc.collect()! I think I have read it before, and totally forget.

IIRC, Python will force a garbage collection to see if it can free up the memory before giving an out-of-memory exception. Although I've never really done anything that would cause that to happen.

The problem is propably memory overcommit or usage of every bit of memory. When almost every byte is used, Linux is extremely slow, and so never goes to the limit where Python would give out-of-memory exception.

This should be handled in Python level or in OS level. Sage can only add some temporary workarounds.

tscrim · 2017-07-25T00:48:45Z

comment:20

One last trivial thing: code format ``x`` and ``y`` in _alternate_interval. Once done, you can set a positive review on my behalf.

sagetrac-git · 2017-07-25T05:49:15Z

Branch pushed to git repo; I updated commit sha1. New commits:

`6f8c510`	`Docstring formatting.`

sagetrac-git · 2017-07-25T05:49:15Z

Changed commit from 99412fe to 6f8c510

jm58660 · 2017-07-25T05:49:37Z

comment:22

Replying to @tscrim:

One last trivial thing: code format ``x`` and ``y`` in _alternate_interval. Once done, you can set a positive review on my behalf.

Done that.

vbraun · 2017-07-29T15:30:50Z

Changed branch from u/jmantysalo/precompute-intervals to 6f8c510

jm58660 mannequin added this to the sage-8.0 milestone Jul 7, 2017

jm58660 mannequin added c: combinatorics labels Jul 7, 2017

jm58660 mannequin added the s: needs info label Jul 7, 2017

jm58660 mannequin added s: needs review and removed s: needs info labels Jul 10, 2017

jm58660 mannequin modified the milestones: sage-8.0, sage-8.1 Jul 10, 2017

tscrim added s: needs work and removed s: needs review labels Jul 10, 2017

This comment has been minimized.

Sign in to view

jm58660 mannequin added s: needs review and removed s: needs work labels Jul 11, 2017

tscrim added s: needs work and removed s: needs review labels Jul 11, 2017

jm58660 mannequin added s: needs review and removed s: needs work labels Jul 21, 2017

jm58660 mannequin added s: positive review and removed s: needs review labels Jul 25, 2017

vbraun removed the s: positive review label Jul 29, 2017

vbraun closed this as completed in f15aa1f Jul 29, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Speeding up congruence-related lattice functions #23383

Speeding up congruence-related lattice functions #23383

jm58660 mannequin commented Jul 7, 2017

jm58660 mannequin commented Jul 7, 2017

jm58660 mannequin commented Jul 7, 2017

jm58660 mannequin commented Jul 7, 2017

tscrim commented Jul 7, 2017

tscrim commented Jul 7, 2017

jm58660 mannequin commented Jul 7, 2017

tscrim commented Jul 9, 2017

sagetrac-git mannequin commented Jul 10, 2017

sagetrac-git mannequin commented Jul 10, 2017

jm58660 mannequin commented Jul 10, 2017

tscrim commented Jul 10, 2017

sagetrac-git mannequin commented Jul 11, 2017

sagetrac-git mannequin commented Jul 11, 2017

This comment has been minimized.

jm58660 mannequin commented Jul 11, 2017

tscrim commented Jul 11, 2017

tscrim commented Jul 11, 2017

jm58660 mannequin commented Jul 11, 2017

sagetrac-git mannequin commented Jul 11, 2017

sagetrac-git mannequin commented Jul 11, 2017

jm58660 mannequin commented Jul 21, 2017

tscrim commented Jul 21, 2017

jm58660 mannequin commented Jul 22, 2017

tscrim commented Jul 22, 2017

jm58660 mannequin commented Jul 23, 2017

tscrim commented Jul 25, 2017

sagetrac-git mannequin commented Jul 25, 2017

sagetrac-git mannequin commented Jul 25, 2017

jm58660 mannequin commented Jul 25, 2017

vbraun commented Jul 29, 2017

Speeding up congruence-related lattice functions #23383

Speeding up congruence-related lattice functions #23383

Comments

jm58660 mannequin commented Jul 7, 2017

jm58660 mannequin commented Jul 7, 2017

jm58660 mannequin commented Jul 7, 2017

jm58660 mannequin commented Jul 7, 2017

tscrim commented Jul 7, 2017

tscrim commented Jul 7, 2017

jm58660 mannequin commented Jul 7, 2017

tscrim commented Jul 9, 2017

sagetrac-git mannequin commented Jul 10, 2017

sagetrac-git mannequin commented Jul 10, 2017

jm58660 mannequin commented Jul 10, 2017

tscrim commented Jul 10, 2017

sagetrac-git mannequin commented Jul 11, 2017

sagetrac-git mannequin commented Jul 11, 2017

This comment has been minimized.

jm58660 mannequin commented Jul 11, 2017

tscrim commented Jul 11, 2017

tscrim commented Jul 11, 2017

jm58660 mannequin commented Jul 11, 2017

sagetrac-git mannequin commented Jul 11, 2017

sagetrac-git mannequin commented Jul 11, 2017

jm58660 mannequin commented Jul 21, 2017

tscrim commented Jul 21, 2017

jm58660 mannequin commented Jul 22, 2017

tscrim commented Jul 22, 2017

jm58660 mannequin commented Jul 23, 2017

tscrim commented Jul 25, 2017

sagetrac-git mannequin commented Jul 25, 2017

sagetrac-git mannequin commented Jul 25, 2017

jm58660 mannequin commented Jul 25, 2017

vbraun commented Jul 29, 2017