Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve PolynomialSequence.connected_components() #35518

Conversation

dcoudert
Copy link
Contributor

As discussed in #35512, we propose an implementation of method PolynomialSequence.connected_components() that avoids the effective construction of the graph connecting variables that appear in a same polynomial.

📚 Description

We use a union-find data structure to encode the connected components.

📝 Checklist

  • The title is concise, informative, and self-explanatory.
  • The description explains in detail what this PR is about.
  • I have linked a relevant issue or discussion.
  • I have created tests covering the changes.
  • I have updated the documentation accordingly.

⌛ Dependencies

See discussion in #35512

@videlec
Copy link
Contributor

videlec commented Apr 15, 2023

Couldn't you use the variable indices rather than the variables themselves? That does require some additional code to polynomials that do not provide such method. But it is straightforward given the code of variables

class MPolynomial_libsingular:
    def variables(self):
        cdef poly *p
        cdef poly *v
        cdef ring *r = self._parent_ring
        if(r != currRing): rChangeCurrRing(r)
        cdef int i
        l = list()
        si = set()
        p = self._poly
        while p:
            for i from 1 <= i <= r.N:
                if i not in si and p_GetExp(p,i,r):
                    v = p_ISet(1,r)
                    p_SetExp(v, i, 1, r)
                    p_Setm(v, r)
                    l.append(new_MP(self._parent, v))
                    si.add(i)
            p = pNext(p)
        return tuple(sorted(l,reverse=True))

The code above builds the set si we are interested in and just drops it.

Comment on lines 949 to 952
DS = DisjointSet(self.variables())
L = [] # to avoid calling twice f.variables()
for f in self:
for i,c in enumerate(C):
if len(set(f.variables()).difference(c)) == 0:
var = f.variables()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not worse than what it was before, but this is calling self.variables() and later f.variables() for each f in self.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I do one call less to self.variables(). You forgot the calls that were done in connection_graph.

The proposal of @videlec is certainly doable but much more involved and cannot be done in the same file. I don't have enough knowledge in this part of the code to do it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can definitely postpone the usage of variable indices rather than variables themselves.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a more concrete proposal for an interface at #35523. Comments welcome.

# Use a union-find data structure to encode relationships between
# variables, i.e., that they belong to a same polynomial
from sage.sets.disjoint_set import DisjointSet
DS = DisjointSet(self.variables())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
DS = DisjointSet(self.variables())
vs = [f.variables() for f in self]
DS = DisjointSet(set().union(*vs))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than self.variables() I think than one should use self.universe().variables(). The point being that the vertices should be all variables in the underlying polynomial ring (and not just the ones that appear in some polynomial in self). Doing this change will also be better in terms of efficiency.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, self.universe().variables() is only available for boolean polynomial rings. One should use self.universe().gens() for other polynomial ring (and possibly fix this discrepency between boolean and other polynomial rings).

# variables, i.e., that they belong to a same polynomial
from sage.sets.disjoint_set import DisjointSet
DS = DisjointSet(self.variables())
L = [] # to avoid calling twice f.variables()
for f in self:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for f in self:
for f, var in zip(self, vs):

for f in self:
for i,c in enumerate(C):
if len(set(f.variables()).difference(c)) == 0:
var = f.variables()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
var = f.variables()

@tornaria
Copy link
Contributor

This is the best I came about:

def connected_components(self):
    from sage.sets.disjoint_set import DisjointSet
    from collections import defaultdict

    # precompute the list of variables in each polynomial
    vss = [f.variables() for f in self]

    # Use a union-find data structure to encode relationships between
    # variables, i.e., that they belong to a same polynomial
    DS = DisjointSet(set().union(*vss))
    for u, *vs in vss:
        for v in vs:
            DS.union(u, v)

    # dict mapping each root element to the polynomials in this component
    Ps = defaultdict(lambda:[])
    for f, vs in zip(self, vss):
        r = DS.find(vs[0])
        Ps[r].append(f)

    return sorted([PolynomialSequence(sorted(p)) for p in Ps.values()])

It seems 30% faster than yours, mostly because it avoids calling f.variables() twice. The minimal changes I posted in review get almost the same.

I wonder what is the need of those sorted... The inner one costs about half the time (for the Fz example in the doctest).

Note that the code is completely deterministic even removing the inner sorted: the polynomials will appear in each component in the same order they were in the original sequence, which I would argue is a better behaviour (IOW, if the input sequence is sorted, the output components will be sorted).

As for the outer sorted: the code as is now it will be non-deterministic as it depends on the order of values() which is undefined in new python. However, with not much work one could get the order of the components to also match the order of the original sequence in such a way that for a sorted input sequence the output will be sorted.

@tornaria
Copy link
Contributor

Something like this:

def connected_components(self):

    # precompute the list of variables in each polynomial
    vss = [f.variables() for f in self]

    # Use a union-find data structure to encode relationships between
    # variables, i.e., that they belong to a same polynomial
    from sage.sets.disjoint_set import DisjointSet
    DS = DisjointSet(set().union(*vss))
    for u, *vs in vss:
        for v in vs:
            DS.union(u, v)

    Ps = {} # map root element -> polynomials in this component
    rs = [] # sorted list of root elements
    for f, vs in zip(self, vss):
        r = DS.find(vs[0])
        try:
            Ps[r].append(f)
        except KeyError:
            rs.append(r)
            Ps[r] = [f]

    return [PolynomialSequence(self.ring(), Ps[r]) for r in rs]

tornaria added a commit to tornaria/sage that referenced this pull request Apr 15, 2023
Simplify the code; also, the previous code has to iterate over variables
of the sequence twice (this is really bad before sagemath#35510)

Moreover, now we add a clique between the variables of each polynomial,
so it agrees with the description (the code before used to add just a
spanning tree of the clique -- a star).

This makes this method a little bit slower for the purposes of
`connected_components()` (for which adding a star is equivalent).

However, sagemath#35518 will rewrite `connected_components()` without using
`connection_graph()` so this is not really a problem.
@dcoudert
Copy link
Contributor Author

I followed the proposal of @tornaria but I let the final sorted(...) statements before returning the solution. This is to satisfy the doctest of line 79, and so to let the behavior of the method unchanged.

@tornaria
Copy link
Contributor

I followed the proposal of @tornaria but I let the final sorted(...) statements before returning the solution. This is to satisfy the doctest of line 79, and so to let the behavior of the method unchanged.

IMO what's important is that the function is deterministic, not that it sorts. The way I proposed, the order of the output is determined by the order of the input. The doctest in line 79 can be fixed by

--- a/src/sage/rings/polynomial/multi_polynomial_sequence.py
+++ b/src/sage/rings/polynomial/multi_polynomial_sequence.py
@@ -76,7 +76,7 @@ pair and study it::
 
 We separate the system in independent subsystems::
 
-    sage: C = Sequence(r2).connected_components(); C
+    sage: C = Sequence(sorted(r2)).connected_components(); C
     [[w213 + k113 + x111 + x112 + x113,
      w212 + k112 + x110 + x111 + x112 + 1,
      w211 + k111 + x110 + x111 + x113 + 1,

It may be preferable just not so sort here but fix the output in the doctest to match whatever comes out. If you look at the output of r2 in line 43, you'll see that the order of the connected components is different... Much nicer IMO if the order of the input system is preserved.

@dcoudert
Copy link
Contributor Author

Now we don't sort anymore in connected_components. The ordering of the result is the order in which keys are inserted in Ps.

@github-actions
Copy link

Documentation preview for this PR is ready! 🎉
Built with commit: 9bc14b3

@tornaria
Copy link
Contributor

Now we don't sort anymore in connected_components. The ordering of the result is the order in which keys are inserted in Ps.

I had to look this up: "Changed in version 3.7: Dictionary order is guaranteed to be insertion order." https://docs.python.org/3/library/stdtypes.html#dictionary-view-objects

May I suggest the following tests to check the order of the output:

sage: R.<x,y,z> = PolynomialRing(ZZ)
sage: Sequence([x,z,y]).connected_components()
[[x], [z], [y]]
sage: Sequence([x,z,x*y*z,y]).connected_components()
[[x, z, x*y*z, y]]

Also, my preference would be to not sort in line 79 and change the output, the sequences will be in the same order as in the output of line 43, so it's arguably more natural. But whatever you prefer is ok.

@dcoudert
Copy link
Contributor Author

I did the proposed changes.

@vbraun vbraun merged commit cc0ea4d into sagemath:develop Apr 23, 2023
7 of 8 checks passed
vbraun pushed a commit that referenced this pull request Apr 23, 2023
    
Simplify the code; also, the previous code has to iterate over variables
of the sequence twice (this is really bad before #35510).

This affects mainly the method `connected_components()`.

EDIT:

Moreover, now we add a clique between the variables of each polynomial,
so it agrees with the description (the code before used to add just a
spanning tree of the clique -- a star).

This makes this method a little bit slower for the purposes of
`connected_components()` (for which adding a star is equivalent).

However, #35518 will rewrite `connected_components()` without using
`connection_graph()` so this is not really a problem.


### 📝 Checklist

- [x] The title is concise, informative, and self-explanatory.
- [x] The description explains in detail what this PR is about.
- [x] New tests added to check that a clique is added for the variables
in each polynomial.

### Dependencies
 - #35511
    
URL: #35512
Reported by: Gonzalo Tornaría
Reviewer(s): David Coudert, Gonzalo Tornaría, Vincent Delecroix
@mkoeppe mkoeppe added this to the sage-10.0 milestone Apr 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants