Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up IntegerMatrix #26237

Closed
tscrim opened this issue Sep 11, 2018 · 57 comments
Closed

Speed up IntegerMatrix #26237

tscrim opened this issue Sep 11, 2018 · 57 comments

Comments

@tscrim
Copy link
Collaborator

tscrim commented Sep 11, 2018

We explicitly declare things to be int type so Cython can generate better C code.

CC: @sagetrac-Stefan @sagetrac-Rudi @sagetrac-yomcat @sagetrac-msaaltink

Component: matroid theory

Author: Travis Scrimshaw

Branch/Commit: 3c0195b

Reviewer: Daniel Krenn, Jeroen Demeyer, Stefan Van Zwam, Rudi Pendavingh

Issue created by migration from https://trac.sagemath.org/ticket/26237

@tscrim tscrim added this to the sage-8.4 milestone Sep 11, 2018
@jdemeyer
Copy link

comment:1

Works for me:

sage: from sage.matroids.linear_matroid import LinearMatroid
sage: from sage.matroids.lean_matrix import IntegerMatrix
sage: M = LinearMatroid(matrix=IntegerMatrix(4,3,matrix([[0,0,0],[0,0,0],[0,1,1],[1,0,1]])))
sage: M.circuits()
Iterator over a system of subsets

@tscrim
Copy link
Collaborator Author

tscrim commented Sep 11, 2018

comment:2

Hmm...strange. Let me try from a fresh session to see if I somehow unintentionally corrupted things.

@tscrim
Copy link
Collaborator Author

tscrim commented Sep 11, 2018

comment:3

Okay, I had one little seemingly innocuous change: I added an explicit return type int on the corresponding get method of IntegerMatrix. Once I removed that, it worked.

@tscrim
Copy link
Collaborator Author

tscrim commented Sep 11, 2018

comment:4

So what was going wrong is that there is a bint is_nonzero method which simply was return self.get(r, c), and Cython could somehow not automatically convert that to a boolean check. Would that count as a Cython bug?

If it is not, I might also recycle this ticket with some Cython tweaks to IntegerMatrix to try and squeeze a bit more speed out of it.

@tscrim
Copy link
Collaborator Author

tscrim commented Sep 11, 2018

comment:5

Well, there is a bug somewhere (on vanilla 8.4.beta4):

sage: M = LinearMatroid(matrix=IntegerMatrix(4,4,matrix([[0,0,0,0],[1,1,1,1],[0,1,1,2],[0,0,1,0]])))
sage: list(M.bases())
[frozenset({0, 1, 2}), frozenset({0, 2, 3})]

sage: Mp = LinearMatroid(matrix=GenericMatrix(4,4,matrix([[0,0,0,0],[1,1,1,1],[0,1,1,2],[0,0,1,0]])))
sage: list(Mp.bases())
[frozenset({0, 1, 2}), frozenset({0, 2, 3}), frozenset({1, 2, 3})]

@jdemeyer
Copy link

comment:6

Replying to @tscrim:

So what was going wrong is that there is a bint is_nonzero method which simply was return self.get(r, c), and Cython could somehow not automatically convert that to a boolean check.

I'm not following here... what do you mean with "Cython could somehow not automatically convert that to a boolean check"?

@tscrim
Copy link
Collaborator Author

tscrim commented Sep 11, 2018

comment:7

Yes, that is correct (well, an int to a bint to be specific).

@tscrim
Copy link
Collaborator Author

tscrim commented Sep 11, 2018

comment:8

At least, once I changed that to an explicit check != 0, it fixed the problem.

@sagetrac-Stefan
Copy link
Mannequin

sagetrac-Stefan mannequin commented Sep 11, 2018

comment:9

Replying to @tscrim:

Well, there is a bug somewhere (on vanilla 8.4.beta4):

sage: M = LinearMatroid(matrix=IntegerMatrix(4,4,matrix([[0,0,0,0],[1,1,1,1],[0,1,1,2],[0,0,1,0]])))
sage: list(M.bases())
[frozenset({0, 1, 2}), frozenset({0, 2, 3})]

sage: Mp = LinearMatroid(matrix=GenericMatrix(4,4,matrix([[0,0,0,0],[1,1,1,1],[0,1,1,2],[0,0,1,0]])))
sage: list(Mp.bases())
[frozenset({0, 1, 2}), frozenset({0, 2, 3}), frozenset({1, 2, 3})]

Most (all?) LinearMatroid methods assume that we are allowed to pivot in the matrix, and do so without leaving the ring. Maybe that's what went wrong?

@jdemeyer
Copy link

comment:10

Replying to @tscrim:

At least, once I changed that to an explicit check != 0, it fixed the problem.

It's still not clear which check you mean and what the problem really is...

@sagetrac-Rudi
Copy link
Mannequin

sagetrac-Rudi mannequin commented Sep 11, 2018

comment:11

The method GenericMatrix.is_nonzero seems to be called only by LeanMatrix.gauss_jordan_reduce() and LeanMatrix.nonzero_positions_in row().

LeanMatrix.pivot() does not even use is_nonzero(), but ‘s=self.get_unsafe(i,y)’ followed by ‘if s and ..’ (line 303 of lean_matrix.pyx). Apparently the conversion to a bool implicit in ‘if s’ does work in that context. It may be more effcient too, since depending on the base ring evaluating s!=0 may involve casting 0 as a ring element. I’m quite sure that for finite fields ‘if s’ is about 5 times faster than ‘if (s!=0)’.

So it may also be a solution to rewrite LeanMatrix.gauss_jordan_reduce() and LeanMatrix.nonzero_positions_in row() in the more efficient way used in pivot(), completely avoiding the use of is_nonzero().

@tscrim
Copy link
Collaborator Author

tscrim commented Sep 11, 2018

comment:12

Replying to @jdemeyer:

Replying to @tscrim:

At least, once I changed that to an explicit check != 0, it fixed the problem.

It's still not clear which check you mean and what the problem really is...

Sorry, I was trying to answer quickly while I was out.

So when I explicitly tell Cython that get should return an int, then I have to do this change

     cdef bint is_nonzero(self, long r, long c) except -2:   # Not a Sage matrix operation
-        return self.get(r, c)
+        return self.get(r, c) != 0

otherwise I get the original error message in the ticket description.

@tscrim
Copy link
Collaborator Author

tscrim commented Sep 11, 2018

comment:13

Replying to @sagetrac-Stefan:

Replying to @tscrim:

Well, there is a bug somewhere (on vanilla 8.4.beta4):

sage: M = LinearMatroid(matrix=IntegerMatrix(4,4,matrix([[0,0,0,0],[1,1,1,1],[0,1,1,2],[0,0,1,0]])))
sage: list(M.bases())
[frozenset({0, 1, 2}), frozenset({0, 2, 3})]

sage: Mp = LinearMatroid(matrix=GenericMatrix(4,4,matrix([[0,0,0,0],[1,1,1,1],[0,1,1,2],[0,0,1,0]])))
sage: list(Mp.bases())
[frozenset({0, 1, 2}), frozenset({0, 2, 3}), frozenset({1, 2, 3})]

Most (all?) LinearMatroid methods assume that we are allowed to pivot in the matrix, and do so without leaving the ring. Maybe that's what went wrong?

But in the latter case, the matrix is over ZZ, so I would not think that is the problem.

@tscrim
Copy link
Collaborator Author

tscrim commented Sep 11, 2018

comment:14

Replying to @sagetrac-Rudi:

The method GenericMatrix.is_nonzero seems to be called only by LeanMatrix.gauss_jordan_reduce() and LeanMatrix.nonzero_positions_in row().

LeanMatrix.pivot() does not even use is_nonzero(), but ‘s=self.get_unsafe(i,y)’ followed by ‘if s and ..’ (line 303 of lean_matrix.pyx). Apparently the conversion to a bool implicit in ‘if s’ does work in that context. It may be more effcient too, since depending on the base ring evaluating s!=0 may involve casting 0 as a ring element. I’m quite sure that for finite fields ‘if s’ is about 5 times faster than ‘if (s!=0)’.

Yes, as you surmised, the s != 0 is slower because invokes the coercion framework. So if s is the best thing to do for speed in general. It is just in this case that Cython does not seem to be converting an int (from an inline function) into a bint.

So it may also be a solution to rewrite LeanMatrix.gauss_jordan_reduce() and LeanMatrix.nonzero_positions_in row() in the more efficient way used in pivot(), completely avoiding the use of is_nonzero().

What I was hoping to do was take advantage of the known return types in the integer matrix to improve the Cython code. With the explicit casting, my computation goes from 13.6s to 3s.

diff --git a/src/sage/matroids/lean_matrix.pxd b/src/sage/matroids/lean_matrix.pxd
index c284f6a..122dab1 100644
--- a/src/sage/matroids/lean_matrix.pxd
+++ b/src/sage/matroids/lean_matrix.pxd
@@ -102,7 +102,7 @@ cdef class QuaternaryMatrix(LeanMatrix):
 cdef class IntegerMatrix(LeanMatrix):
     cdef int* _entries
 
-    cdef inline get(self, long r, long c)   # Not a Sage matrix operation
+    cdef inline int get(self, long r, long c)   # Not a Sage matrix operation
     cdef inline void set(self, long r, long c, int x)   # Not a Sage matrix operation
 
     cdef inline long row_len(self, long i) except -1   # Not a Sage matrix operation
diff --git a/src/sage/matroids/lean_matrix.pyx b/src/sage/matroids/lean_matrix.pyx
index 93a32bc..d6022c2 100644
--- a/src/sage/matroids/lean_matrix.pyx
+++ b/src/sage/matroids/lean_matrix.pyx
@@ -2844,7 +2844,7 @@ cdef class IntegerMatrix(LeanMatrix):
         """
         return "IntegerMatrix instance with " + str(self._nrows) + " rows and " + str(self._ncols) + " columns"
 
-    cdef inline get(self, long r, long c):   # Not a Sage matrix operation
+    cdef inline int get(self, long r, long c):   # Not a Sage matrix operation
         return self._entries[r * self._ncols + c]
 
     cdef inline void set(self, long r, long c, int x):   # Not a Sage matrix operation
@@ -2877,7 +2877,7 @@ cdef class IntegerMatrix(LeanMatrix):
         return 0
 
     cdef bint is_nonzero(self, long r, long c) except -2:   # Not a Sage matrix operation
-        return self.get(r, c)
+        return self.get(r, c) != 0
 
     cdef LeanMatrix copy(self):   # Deprecated Sage matrix operation
         cdef IntegerMatrix M = IntegerMatrix(self._nrows, self._ncols)
@@ -2982,12 +2982,14 @@ cdef class IntegerMatrix(LeanMatrix):
         ignored.
         """
         cdef long i
+        cdef int sval
         if s is None:
             for i from 0 <= i < self._ncols:
                 self.set(x, i, self.get(x, i) + self.get(y, i))
         else:
+            sval = int(s)
             for i from 0 <= i < self._ncols:
-                self.set(x, i, self.get(x, i) + s * self.get(y, i))
+                self.set(x, i, self.get(x, i) + sval * self.get(y, i))
         return 0
 
     cdef int swap_rows_c(self, long x, long y) except -1:
@@ -3010,8 +3012,9 @@ cdef class IntegerMatrix(LeanMatrix):
         compatibility, and is ignored.
         """
         cdef long i
+        cdef int sval = int(s)
         for i from 0 <= i < self._ncols:
-            self.set(x, i, s * self.get(x, i))
+            self.set(x, i, sval * self.get(x, i))
         return 0
 
     cdef int rescale_column_c(self, long y, s, bint start_row) except -1:
@@ -3020,8 +3023,9 @@ cdef class IntegerMatrix(LeanMatrix):
         compatibility, and is ignored.
         """
         cdef long j
+        cdef int sval = int(s)
         for j from 0 <= j < self._nrows:
-            self.set(j, y, self.get(j, y) * s)
+            self.set(j, y, self.get(j, y) * sval)
         return 0
 
     cdef int pivot(self, long x, long y) except -1:   # Not a Sage matrix operation

@sagetrac-Rudi
Copy link
Mannequin

sagetrac-Rudi mannequin commented Sep 12, 2018

comment:15

Replying to @tscrim:

Replying to @sagetrac-Rudi:
What I was hoping to do was take advantage of the known return types in the integer matrix to improve the Cython code. With the explicit casting, my computation goes from 13.6s to 3s.

That looks like a good solution. Great!

Could GenericMatrix perhaps also benefit from such explicit casting? Anything that improves the efficiency of row operations should have an immediate effect on overall efficiency of linear matroid code.

@tscrim
Copy link
Collaborator Author

tscrim commented Sep 12, 2018

comment:16

Replying to @sagetrac-Rudi:

Replying to @tscrim:

Replying to @sagetrac-Rudi:
What I was hoping to do was take advantage of the known return types in the integer matrix to improve the Cython code. With the explicit casting, my computation goes from 13.6s to 3s.

That looks like a good solution. Great!

Could GenericMatrix perhaps also benefit from such explicit casting? Anything that improves the efficiency of row operations should have an immediate effect on overall efficiency of linear matroid code.

You cannot explicitly cast to something you don't know. Plus the arithmetic operations being done for IntegerMatrix are explicitly as C integers and not going through Python (much less Sage) at all. So I don't see how.

@tscrim
Copy link
Collaborator Author

tscrim commented Sep 12, 2018

comment:17

I guess something you could do is also special case QQ and go directly with the C data using the mpq commands. It looks like the GF(2) and GF(3) matrices are already quite specialized. At least I would think that is the next most common field to work over (other than maybe GF(4)).

@jdemeyer
Copy link

comment:18

Replying to @tscrim:

So when I explicitly tell Cython that get should return an int

First of all, that's a bad idea: a C int has limited range (typically 32 bits) and you cannot guarantee that all entries fit in that.

then I have to do this change

     cdef bint is_nonzero(self, long r, long c) except -2:   # Not a Sage matrix operation
-        return self.get(r, c)
+        return self.get(r, c) != 0

otherwise I get the original error message in the ticket description.

I don't consider that a bug. The reason is that the conversion of a Python object to a bint essentially translates to int(bool(obj)) while the conversion to an int translates to int(obj). These are obviously different. But once on the C level, the types int and bint are exactly the same, so no conversion is done when casting int to bint.

@jdemeyer
Copy link

comment:19

So the solution for really converting an int to a bint in Cython is indeed adding an explicit != 0.

@jdemeyer
Copy link

comment:20

Should I close this ticket or do you want to recycle it?

@tscrim
Copy link
Collaborator Author

tscrim commented Sep 12, 2018

comment:21

I will recycle this, at least for the speedups to IntegerMatrix, but maybe with some more tweaks or improvements or an implementation of a RationalMatrix.

However, the fact that bint and int are the same, then why do I get the error I am getting when I do not explicitly make it a bint with the != 0? From what you're saying, it should not result in an error. Unless there is some code elsewhere secretly expecting the result to be 0 or 1, which seems unlikely (or is bint allowed to take on different values)?

I am guessing instead of int, you would say we should have them all be Py_ssize_t?

@jdemeyer

This comment has been minimized.

@jdemeyer jdemeyer changed the title SystemError with LinearMatroid with an IntegerMatrix Speed up IntegerMatrix Sep 12, 2018
@jdemeyer
Copy link

comment:23

Replying to @tscrim:

However, the fact that bint and int are the same, then why do I get the error I am getting when I do not explicitly make it a bint with the != 0? From what you're saying, it should not result in an error. Unless there is some code elsewhere secretly expecting the result to be 0 or 1, which seems unlikely (or is bint allowed to take on different values)?

Suppose that an entry in the matrix equals the Python integer -2. Currently (without applying any changes): when calling is_nonzero() on that -2, the return value is int(bool(-2)) = 1. With your changes, the return value for the same is_nonzero() call is just -2. The Cython wrapper interprets this -2 as an exception return value (that is what except -2 does). But there hasn't been an exception raised, which gives the SystemError.

@jdemeyer
Copy link

comment:24

Replying to @tscrim:

I am guessing instead of int, you would say we should have them all be Py_ssize_t?

I just looked at the Cython sources for lean_matrix.pyx and IntegerMatrix does indeed use int internally. Therefore, returning int in get() is correct in that sense.

However... this just shows that using IntegerMatrix in general looks dangerous: operations can overflow. This is actually documented in the docstring of IntegerMatrix:

        This class is mainly intended for use with the RegularMatroid class,
        so entries are assumed to be small integers. No
        overflow checking takes place!

@sagetrac-Stefan
Copy link
Mannequin

sagetrac-Stefan mannequin commented Sep 12, 2018

comment:25

Thing is, the LeanMatrix classes are internal datatypes. Regular matroids have matrices with entries equal to -1, 0, 1, and are such that pivoting preserves that property. For a potentially regular matroid on which you want to run is_valid(), we check this condition through pivoting, which means we get to a "bad subdeterminant" of size at most 2x2, hardly enough to cause overflows if the entries are 0,1, or -1.

We took some effort to keep the LeanMatrix away from the end user (LinearMatroid.Representation returns a Sage matrix, for instance, and the datatypes don't get imported into the Sage namespace). Speed is of the essence here, since we create and destroy tons of these (Sage matrices have a lot of overhead at creation time), and we do tons of row operations on them. If you dig deep enough to use IntegerMatrix at this spot in the code, you'll have some idea of what you're getting yourself into.

Again, since you want to use IntegerMatrix in the LinearMatroid subclasses, and since you need a matrix where pivoting is safe, really the only use case is Regular Matroids where overflows are never an issue.

@tscrim
Copy link
Collaborator Author

tscrim commented Sep 13, 2018

comment:33

Done. I didn't know that. Thanks.

@jdemeyer
Copy link

comment:34

You will see that no function ever returns an mpq_t (or mpz_t or whatever). The typical calling convention is passing a return argument, for example the declaration of mpq_add:

void mpq_add(mpq_t sum, mpq_t addend1, mpq_t addend2)

@tscrim
Copy link
Collaborator Author

tscrim commented Sep 13, 2018

comment:35

Yea, I noticed that and so I scrapped the get in place of the index for better maintainability should the internal ordering change. However, I do have some functions where I am passing an mpq_t, e.g., in pivot, and this is not working as I would expect. Also, I had tried to do something like

cdef mpq_t s = self._entries[self.index(i,j)]
mpq_add(s, s, t)

where t is some other mpq_t, but that wasn't working like I expected either.

@jdemeyer
Copy link

comment:36

Can you move "Implement RationalMatrix" to a new ticket instead?

The commits that you added here to improve IntegerMatrix make sense independently.

@jdemeyer
Copy link

comment:37

Replying to @tscrim:

I think this needs to be documented at the very least (which I will do here). I might also like to change the name of IntegerMatrix to perhaps UnitIntegerMatrix or PlusMinusOneMatrix. Thoughts?

+1 to PlusMinusOneMatrix.

@tscrim
Copy link
Collaborator Author

tscrim commented Sep 13, 2018

comment:38

Replying to @jdemeyer:

Can you move "Implement RationalMatrix" to a new ticket instead?

The commits that you added here to improve IntegerMatrix make sense independently.

Yep, I can do that. I will do that when I get into my office tomorrow.

@tscrim

This comment has been minimized.

@tscrim
Copy link
Collaborator Author

tscrim commented Sep 13, 2018

comment:39

#26269 for the RationalMatrix implementation.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Sep 14, 2018

Changed commit from 7effef0 to fb64e43

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Sep 14, 2018

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

fb64e43Removing superfluous int's.

@tscrim
Copy link
Collaborator Author

tscrim commented Sep 14, 2018

comment:41

I've split the ticket into the two parts. This part is now ready for review.

@tscrim
Copy link
Collaborator Author

tscrim commented Sep 14, 2018

comment:42

(Essentially) Green patchbot.

@jdemeyer
Copy link

comment:43

So how about renaming to PlusMinusOneMatrix? I'd like the opinion of Stefan on that.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Sep 16, 2018

Branch pushed to git repo; I updated commit sha1. New commits:

3c0195bRenaming IntegerMatrix to PlusMinusOneMatrix.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Sep 16, 2018

Changed commit from fb64e43 to 3c0195b

@tscrim
Copy link
Collaborator Author

tscrim commented Sep 16, 2018

comment:45

I did the refactoring. Stefan, Rudi, any objections?

@sagetrac-Stefan
Copy link
Mannequin

sagetrac-Stefan mannequin commented Sep 17, 2018

comment:46

I am happy with the new name. Don’t have time for a long review.

@dkrenn
Copy link
Contributor

dkrenn commented Mar 27, 2019

comment:47

I've looked through the code and I am happy with it, so LGTM.

However contributed to this ticket (mainly review), please insert your name in the reviewer field.

@dkrenn
Copy link
Contributor

dkrenn commented Mar 27, 2019

Reviewer: Daniel Krenn

@tscrim
Copy link
Collaborator Author

tscrim commented Mar 27, 2019

comment:48

Stefan, Rudi I added you as reviewers based on our conversions above. Jeroen is obviously a reviewer for looking at the Cython code (as well as his additional useful insights).

@tscrim
Copy link
Collaborator Author

tscrim commented Mar 27, 2019

Changed reviewer from Daniel Krenn to Daniel Krenn, Jeroen Demeyer, Stefan Van Zwam, Rudi Pendavingh

@tscrim tscrim modified the milestones: sage-8.4, sage-8.8 Mar 27, 2019
@vbraun
Copy link
Member

vbraun commented Mar 29, 2019

Changed branch from public/matroids/speedup_integer_matrix-26237 to 3c0195b

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants