Improve PolynomialSequence.connected_components() #35518

dcoudert · 2023-04-15T16:48:35Z

As discussed in #35512, we propose an implementation of method PolynomialSequence.connected_components() that avoids the effective construction of the graph connecting variables that appear in a same polynomial.

📚 Description

We use a union-find data structure to encode the connected components.

📝 Checklist

The title is concise, informative, and self-explanatory.
The description explains in detail what this PR is about.
I have linked a relevant issue or discussion.
I have created tests covering the changes.
I have updated the documentation accordingly.

⌛ Dependencies

See discussion in #35512

videlec · 2023-04-15T18:47:24Z

Couldn't you use the variable indices rather than the variables themselves? That does require some additional code to polynomials that do not provide such method. But it is straightforward given the code of variables

class MPolynomial_libsingular:
    def variables(self):
        cdef poly *p
        cdef poly *v
        cdef ring *r = self._parent_ring
        if(r != currRing): rChangeCurrRing(r)
        cdef int i
        l = list()
        si = set()
        p = self._poly
        while p:
            for i from 1 <= i <= r.N:
                if i not in si and p_GetExp(p,i,r):
                    v = p_ISet(1,r)
                    p_SetExp(v, i, 1, r)
                    p_Setm(v, r)
                    l.append(new_MP(self._parent, v))
                    si.add(i)
            p = pNext(p)
        return tuple(sorted(l,reverse=True))

The code above builds the set si we are interested in and just drops it.

tornaria · 2023-04-15T19:21:52Z

src/sage/rings/polynomial/multi_polynomial_sequence.py

+        DS = DisjointSet(self.variables())
+        L = []  # to avoid calling twice f.variables()
        for f in self:
-            for i,c in enumerate(C):
-                if len(set(f.variables()).difference(c)) == 0:
+            var = f.variables()


This is not worse than what it was before, but this is calling self.variables() and later f.variables() for each f in self.

Actually, I do one call less to self.variables(). You forgot the calls that were done in connection_graph.

The proposal of @videlec is certainly doable but much more involved and cannot be done in the same file. I don't have enough knowledge in this part of the code to do it.

We can definitely postpone the usage of variable indices rather than variables themselves.

I made a more concrete proposal for an interface at #35523. Comments welcome.

tornaria · 2023-04-15T20:29:08Z

src/sage/rings/polynomial/multi_polynomial_sequence.py

+        # Use a union-find data structure to encode relationships between
+        # variables, i.e., that they belong to a same polynomial
+        from sage.sets.disjoint_set import DisjointSet
+        DS = DisjointSet(self.variables())


Suggested change

DS = DisjointSet(self.variables())

vs = [f.variables() for f in self]

DS = DisjointSet(set().union(*vs))

Rather than self.variables() I think than one should use self.universe().variables(). The point being that the vertices should be all variables in the underlying polynomial ring (and not just the ones that appear in some polynomial in self). Doing this change will also be better in terms of efficiency.

Actually, self.universe().variables() is only available for boolean polynomial rings. One should use self.universe().gens() for other polynomial ring (and possibly fix this discrepency between boolean and other polynomial rings).

tornaria · 2023-04-15T20:29:32Z

src/sage/rings/polynomial/multi_polynomial_sequence.py

+        # variables, i.e., that they belong to a same polynomial
+        from sage.sets.disjoint_set import DisjointSet
+        DS = DisjointSet(self.variables())
+        L = []  # to avoid calling twice f.variables()
        for f in self:


Suggested change

for f in self:

for f, var in zip(self, vs):

tornaria · 2023-04-15T20:29:47Z

src/sage/rings/polynomial/multi_polynomial_sequence.py

        for f in self:
-            for i,c in enumerate(C):
-                if len(set(f.variables()).difference(c)) == 0:
+            var = f.variables()


Suggested change

var = f.variables()

tornaria · 2023-04-15T20:55:01Z

This is the best I came about:

def connected_components(self):
    from sage.sets.disjoint_set import DisjointSet
    from collections import defaultdict

    # precompute the list of variables in each polynomial
    vss = [f.variables() for f in self]

    # Use a union-find data structure to encode relationships between
    # variables, i.e., that they belong to a same polynomial
    DS = DisjointSet(set().union(*vss))
    for u, *vs in vss:
        for v in vs:
            DS.union(u, v)

    # dict mapping each root element to the polynomials in this component
    Ps = defaultdict(lambda:[])
    for f, vs in zip(self, vss):
        r = DS.find(vs[0])
        Ps[r].append(f)

    return sorted([PolynomialSequence(sorted(p)) for p in Ps.values()])

It seems 30% faster than yours, mostly because it avoids calling f.variables() twice. The minimal changes I posted in review get almost the same.

I wonder what is the need of those sorted... The inner one costs about half the time (for the Fz example in the doctest).

Note that the code is completely deterministic even removing the inner sorted: the polynomials will appear in each component in the same order they were in the original sequence, which I would argue is a better behaviour (IOW, if the input sequence is sorted, the output components will be sorted).

As for the outer sorted: the code as is now it will be non-deterministic as it depends on the order of values() which is undefined in new python. However, with not much work one could get the order of the components to also match the order of the original sequence in such a way that for a sorted input sequence the output will be sorted.

tornaria · 2023-04-15T21:10:23Z

Something like this:

def connected_components(self):

    # precompute the list of variables in each polynomial
    vss = [f.variables() for f in self]

    # Use a union-find data structure to encode relationships between
    # variables, i.e., that they belong to a same polynomial
    from sage.sets.disjoint_set import DisjointSet
    DS = DisjointSet(set().union(*vss))
    for u, *vs in vss:
        for v in vs:
            DS.union(u, v)

    Ps = {} # map root element -> polynomials in this component
    rs = [] # sorted list of root elements
    for f, vs in zip(self, vss):
        r = DS.find(vs[0])
        try:
            Ps[r].append(f)
        except KeyError:
            rs.append(r)
            Ps[r] = [f]

    return [PolynomialSequence(self.ring(), Ps[r]) for r in rs]

Simplify the code; also, the previous code has to iterate over variables of the sequence twice (this is really bad before sagemath#35510) Moreover, now we add a clique between the variables of each polynomial, so it agrees with the description (the code before used to add just a spanning tree of the clique -- a star). This makes this method a little bit slower for the purposes of `connected_components()` (for which adding a star is equivalent). However, sagemath#35518 will rewrite `connected_components()` without using `connection_graph()` so this is not really a problem.

dcoudert · 2023-04-16T07:56:06Z

I followed the proposal of @tornaria but I let the final sorted(...) statements before returning the solution. This is to satisfy the doctest of line 79, and so to let the behavior of the method unchanged.

tornaria · 2023-04-16T11:03:35Z

I followed the proposal of @tornaria but I let the final sorted(...) statements before returning the solution. This is to satisfy the doctest of line 79, and so to let the behavior of the method unchanged.

IMO what's important is that the function is deterministic, not that it sorts. The way I proposed, the order of the output is determined by the order of the input. The doctest in line 79 can be fixed by

--- a/src/sage/rings/polynomial/multi_polynomial_sequence.py
+++ b/src/sage/rings/polynomial/multi_polynomial_sequence.py
@@ -76,7 +76,7 @@ pair and study it::
 
 We separate the system in independent subsystems::
 
-    sage: C = Sequence(r2).connected_components(); C
+    sage: C = Sequence(sorted(r2)).connected_components(); C
     [[w213 + k113 + x111 + x112 + x113,
      w212 + k112 + x110 + x111 + x112 + 1,
      w211 + k111 + x110 + x111 + x113 + 1,

It may be preferable just not so sort here but fix the output in the doctest to match whatever comes out. If you look at the output of r2 in line 43, you'll see that the order of the connected components is different... Much nicer IMO if the order of the input system is preserved.

dcoudert · 2023-04-16T11:49:42Z

Now we don't sort anymore in connected_components. The ordering of the result is the order in which keys are inserted in Ps.

github-actions · 2023-04-16T12:56:01Z

Documentation preview for this PR is ready! 🎉
Built with commit: 9bc14b3

tornaria · 2023-04-17T16:42:33Z

Now we don't sort anymore in connected_components. The ordering of the result is the order in which keys are inserted in Ps.

I had to look this up: "Changed in version 3.7: Dictionary order is guaranteed to be insertion order." https://docs.python.org/3/library/stdtypes.html#dictionary-view-objects

May I suggest the following tests to check the order of the output:

sage: R.<x,y,z> = PolynomialRing(ZZ)
sage: Sequence([x,z,y]).connected_components()
[[x], [z], [y]]
sage: Sequence([x,z,x*y*z,y]).connected_components()
[[x, z, x*y*z, y]]

Also, my preference would be to not sort in line 79 and change the output, the sequences will be in the same order as in the output of line 43, so it's arguably more natural. But whatever you prefer is ok.

dcoudert · 2023-04-18T08:08:24Z

I did the proposed changes.

Simplify the code; also, the previous code has to iterate over variables of the sequence twice (this is really bad before #35510). This affects mainly the method `connected_components()`. EDIT: Moreover, now we add a clique between the variables of each polynomial, so it agrees with the description (the code before used to add just a spanning tree of the clique -- a star). This makes this method a little bit slower for the purposes of `connected_components()` (for which adding a star is equivalent). However, #35518 will rewrite `connected_components()` without using `connection_graph()` so this is not really a problem. ### 📝 Checklist - [x] The title is concise, informative, and self-explanatory. - [x] The description explains in detail what this PR is about. - [x] New tests added to check that a clique is added for the variables in each polynomial. ### Dependencies - #35511 URL: #35512 Reported by: Gonzalo Tornaría Reviewer(s): David Coudert, Gonzalo Tornaría, Vincent Delecroix

connected components without graph in multi_polinomial_sequence.py

6eda76a

dcoudert requested review from tornaria and videlec April 15, 2023 16:49

dcoudert self-assigned this Apr 15, 2023

dcoudert added t: enhancement s: needs review c: algebra labels Apr 15, 2023

tornaria reviewed Apr 15, 2023

View reviewed changes

tornaria mentioned this pull request Apr 15, 2023

Improve PolynomialSequence.connection_graph() implementation #35512

Merged

3 tasks

videlec mentioned this pull request Apr 16, 2023

Variables versus variable indices #35523

Open

1 task

PR sagemath#35518: avoid call to self.variables()

2b2e9f4

dcoudert added 2 commits April 16, 2023 13:41

PR sagemath#35518: avoid sorting in connected_components

1f0371a

PR sagemath#35518: some care

9bc14b3

videlec approved these changes Apr 17, 2023

View reviewed changes

PR sagemath#35518: doctest to check the order of the output

bb61d57

tornaria added s: positive review and removed s: needs review labels Apr 18, 2023

tornaria approved these changes Apr 18, 2023

View reviewed changes

vbraun merged commit cc0ea4d into sagemath:develop Apr 23, 2023
7 of 8 checks passed

mkoeppe added this to the sage-10.0 milestone Apr 23, 2023

mkoeppe removed the s: positive review label Apr 23, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve PolynomialSequence.connected_components() #35518

Improve PolynomialSequence.connected_components() #35518

dcoudert commented Apr 15, 2023

videlec commented Apr 15, 2023

tornaria Apr 15, 2023

dcoudert Apr 15, 2023

videlec Apr 16, 2023

videlec Apr 16, 2023

tornaria Apr 15, 2023

videlec Apr 16, 2023

videlec Apr 16, 2023

tornaria Apr 15, 2023

tornaria Apr 15, 2023

tornaria commented Apr 15, 2023

tornaria commented Apr 15, 2023

dcoudert commented Apr 16, 2023

tornaria commented Apr 16, 2023

dcoudert commented Apr 16, 2023

github-actions bot commented Apr 16, 2023

tornaria commented Apr 17, 2023

dcoudert commented Apr 18, 2023

	DS = DisjointSet(self.variables())
	vs = [f.variables() for f in self]
	DS = DisjointSet(set().union(*vs))

Improve PolynomialSequence.connected_components() #35518

Improve PolynomialSequence.connected_components() #35518

Conversation

dcoudert commented Apr 15, 2023

📚 Description

📝 Checklist

⌛ Dependencies

videlec commented Apr 15, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tornaria commented Apr 15, 2023

tornaria commented Apr 15, 2023

dcoudert commented Apr 16, 2023

tornaria commented Apr 16, 2023

dcoudert commented Apr 16, 2023

github-actions bot commented Apr 16, 2023

tornaria commented Apr 17, 2023

dcoudert commented Apr 18, 2023