Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Misleading maximum n value in docstring of matrix_modn_dense_float.pyx #35365

Closed
crnsh opened this issue Mar 27, 2023 · 6 comments · Fixed by #35752
Closed

Misleading maximum n value in docstring of matrix_modn_dense_float.pyx #35365

crnsh opened this issue Mar 27, 2023 · 6 comments · Fixed by #35752

Comments

@crnsh
Copy link
Contributor

crnsh commented Mar 27, 2023

In matrix_modn_dense_float.pyx, MAX_MODULUS is set as 2**8 in line 41, and MAX_MODULUS is used to create a Matrix_modn_dense_template object. An overflow error is raised if n >= MAX_MODULUS. However, the docstring states that n values upto 2**11 are supported. The documentation should clearly mention that because LinBox runs really low on values greater than 2**8, it is set as a hard limit.

Interestingly, this is only the case for matrix_modn_dense_float.pyx and not matrix_modn_dense_double.pyx. In the double variant, MAX_MODULUS = 2**23 and is consistent with the docstring and LinBox's upper limit. Is the double variant not slow? Does LinBox only bottleneck on the float variant? I'll do some experiments tomorrow and try to figure this out, but if somebody else has worked with this, do tell.

@vneiger
Copy link
Contributor

vneiger commented Apr 10, 2023

Any new insight from the experiments?

Having a reduced performance is expected when approaching the maximum acceptable modulus value for the used techniques to be applicable. This would not happen for the double variant because 23 is actually already a bit lower than the limit, which must be between 26 and 27 in this case.

The documentation should be improved indeed. Would you have an explicit code fragment which raises an issue (overflow error)?

@crnsh
Copy link
Contributor Author

crnsh commented Apr 11, 2023

Here are some preliminary experiments I ran. I'll run some more experiments with other functions soon.

sage: %timeit A = random_matrix(GF(2**5), 1000, 1000); A.echelonize()
26.6 ms ± 984 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

sage: %timeit A = random_matrix(GF(2**6), 1000, 1000); A.echelonize()
29.5 ms ± 562 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

sage: %timeit A = random_matrix(GF(2**7), 1000, 1000); A.echelonize()
33.6 ms ± 295 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

sage: %timeit A = random_matrix(GF(2**8), 1000, 1000); A.echelonize()
50.6 ms ± 183 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

sage: %timeit A = random_matrix(GF(2**9), 1000, 1000); A.echelonize()
436 ms ± 2.27 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

sage: %timeit A = random_matrix(GF(2**10), 1000, 1000); A.echelonize()
463 ms ± 1.18 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

sage: %timeit A = random_matrix(GF(2**11), 1000, 1000); A.echelonize()
511 ms ± 1.65 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

There's a big spike after 2**8 which is consistent with what we've been warned about. Nothing seems to be giving an overflow error yet. Nothing new yet, but I'll keep looking.

@vneiger
Copy link
Contributor

vneiger commented Apr 11, 2023

Although it is indeed the expected threshold phenomenon that occurs, these cases are over non-prime fields and so are in fact not relevant for matrix_modn_dense_* classes (they would rather rely on M4RIE as indicated in the doc).

Doing similar experiments but with prime fields (and taking the random generation out of the timings):

# mats[k] is over prime field with k-bit modulus
sage: mats = [0]*5 + [random_matrix(GF(previous_prime(2**k)), 1000, 1000) for k in range(5,21)]

sage: %timeit B = copy(mats[7]); B.echelonize()
16 ms ± 87.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

sage: %timeit B = copy(mats[8]); B.echelonize()
16.9 ms ± 173 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

sage: %timeit B = copy(mats[9]); B.echelonize()
43.7 ms ± 273 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

sage: %timeit B = copy(mats[10]); B.echelonize()
50.2 ms ± 755 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

sage: %timeit B = copy(mats[11]); B.echelonize()
50.2 ms ± 275 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

sage: %timeit B = copy(mats[12]); B.echelonize()
50.5 ms ± 694 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

So we do see the expected behaviour change when going from 8-bit modulus to 9-bit and more. But pushing a bit further...

sage: %timeit B = copy(mats[13]); B.echelonize()
49.7 ms ± 484 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

sage: %timeit B = copy(mats[20]); B.echelonize()
50.1 ms ± 483 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Timings are the same up to 20-bit moduli (and about the same up to 23, in fact), for a good reason:

sage: type(mats[8])
<class 'sage.matrix.matrix_modn_dense_float.Matrix_modn_dense_float'>

sage: type(mats[9])
<class 'sage.matrix.matrix_modn_dense_double.Matrix_modn_dense_double'>

as soon as we exceed 8 bits, we switch to the double variant.

This is why I was asking about any actual detected issue related to MAX_MODULUS when running code. If there is none, then fixing the doc should be enough.

@crnsh
Copy link
Contributor Author

crnsh commented Apr 23, 2023

Ah okay. So in essence matrix_modn_dense_float switches to the double variant for any prime field above 2**8. The double variant goes upto 2**23 even though the maximum limit is higher. Would it be wise to mention in the documentation of the double variant that the maximum limit is higher but is restricted due to severe performance issues?

In this particular case, the documentation should mention that because LinBox runs really slow on values greater than 2**8 despite supporting upto 2**11, it uses the double variant which is faster than the float variant in the cases 2**8 < n < 2**11.

Also, since matrices can be over any arbitrary ring and Z/nZ is a ring when n is not prime, shouldn't the documentation for matrix_modn_dense_* mention that that the class is used only when n is prime? Currently the documentation says "Dense matrices over Z/nZ for n < 2**23", and this implies both prime and non-prime n.

@vneiger
Copy link
Contributor

vneiger commented Apr 23, 2023

Ah okay. So in essence matrix_modn_dense_float switches to the double variant for any prime field above 2**8. The double variant goes upto 2**23 even though the maximum limit is higher.

Yes, exactly. Beyond 23 bits, SageMath switches to "generic dense matrix".

Would it be wise to mention in the documentation of the double variant that the maximum limit is higher but is restricted due to severe performance issues?

The documentation is really about SageMath, not about details of the underlying libraries, so I don't think that the fact that LinBox can support primes that are more than 23 bits (but this is not offered through SageMath) is relevant information in SageMath's doc. However it is something that could be raised among developers: is this 23-bit limit still justified today? (I assume it was at the time this choice was made)

In this particular case, the documentation should mention that because LinBox runs really slow on values greater than 2**8 despite supporting upto 2**11, it uses the double variant which is faster than the float variant in the cases 2**8 < n < 2**11.

Yes, the documentation should be clearer on this point. I would suggest to make it simple, saying that the switch from float to double variant is chosen for performance reasons (in other words I would avoid negative statements like "runs really slow", since there are very good reasons behind the fact that the float variant is slower for e.g. 10 bit primes, and this phenomenon is general, it has nothing to do with LinBox in particular).

Also, since matrices can be over any arbitrary ring and Z/nZ is a ring when n is not prime, shouldn't the documentation for matrix_modn_dense_* mention that that the class is used only when n is prime? Currently the documentation says "Dense matrices over Z/nZ for n < 2**23", and this implies both prime and non-prime n.

Oh, I see my previous comment was probably misleading on this point; it focused on prime fields because this was in response to your experiments which were over a field. The current documentation is correct and non-prime integer moduli n are indeed supported. The problem in your earlier message (mentioning experiments) is that matrices were not over some Z/nZ but over a finite field with cardinality 2**e for some e > 1.

@vneiger vneiger assigned vneiger and unassigned vneiger May 16, 2023
@vneiger
Copy link
Contributor

vneiger commented May 16, 2023

@marizee

vbraun pushed a commit that referenced this issue Jun 21, 2023
    
<!-- Please provide a concise, informative and self-explanatory title.
-->
<!-- Don't put issue numbers in the title. Put it in the Description
below. -->
<!-- For example, instead of "Fixes #12345", use "Add a new method to
multiply two integers" -->

### 📚 Description

<!-- Describe your changes here in detail. -->
Improves the documentation to give more precisions on the upper limit
MAX_MODULUS of `matrix_modn_dense_float.pyx` for performances sake.
<!-- Why is this change required? What problem does it solve? -->
<!-- If this PR resolves an open issue, please link to it here. For
example "Fixes #12345". --> Fixes #35365
<!-- If your change requires a documentation PR, please link it
appropriately. -->

### 📝 Checklist

<!-- Put an `x` in all the boxes that apply. It should be `[x]` not `[x
]`. -->

- [x] The title is concise, informative, and self-explanatory.
- [x] The description explains in detail what this PR is about.
- [x] I have linked a relevant issue or discussion.
- [ ] I have created tests covering the changes.
- [x] I have updated the documentation accordingly.

### ⌛ Dependencies

<!-- List all open PRs that this PR logically depends on
- #12345: short description why this is a dependency
- #34567: ...
-->

<!-- If you're unsure about any of these, don't hesitate to ask. We're
here to help! -->
    
URL: #35752
Reported by: Marie Bonboire
Reviewer(s): Vincent Neiger
@mkoeppe mkoeppe added this to the sage-10.1 milestone Jun 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants