
Fix slow doctests or mark # long time #35443

Merged: 12 commits, Apr 23, 2023
Conversation

@tornaria (Contributor) commented Apr 5, 2023

📚 Description

A test is supposed to take < 1s or else be marked # long time.

Here we consider slow tests taking >> 10s. When possible we fix or change the test so that it takes less time; otherwise we just mark the test # long time. Occasionally we create a new, smaller test and keep the original one marked # long time.

After this and #35442, the slowest remaining tests are a few taking ~ 10s.
The total time to doctest everything goes down from 880 to 806 seconds (using -tp 8 --all).
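For readers unfamiliar with the convention, here is a minimal sketch (plain Python, not Sage's actual doctest machinery; LONG_TIME and run_example are made-up names for illustration) of how a # long time gate behaves: marked examples only run when long mode is enabled.

```python
import time

LONG_TIME = False  # stands in for sage -t's --long flag

def run_example(fn, marked_long):
    """Run one doctest-like example; skip it when it is marked
    '# long time' and long mode is off. Returns (result, seconds)
    or None when skipped."""
    if marked_long and not LONG_TIME:
        return None  # skipped, like an unselected # long time test
    t0 = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - t0
```

Sage's real runner does this selection while parsing doctest annotations; the sketch only shows the skip-unless-long behavior that makes the # long time marker cheap for normal runs.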

NOTE: there's a minor merge conflict with #35314 which I will resolve once that PR is merged.

📝 Checklist

  • The title is concise, informative, and self-explanatory.
  • The description explains in detail what this PR is about.

@tornaria (Contributor Author) commented Apr 6, 2023

  • rebased to 10.0.beta8
  • added a new commit which should fix some more tests taking > 5s

@tornaria (Contributor Author) commented:

Rebased to 10.0.beta9 and added more commits.

This is long, but it should be easy to review since it's mostly adding # long time labels here and there. In the gh "files changed" tab it's very easy to see these lines.

A few changes either reduce the size of a test, or fix it so that it takes less time and doesn't need to be marked # long time.

If it is easier to review, I could either
(a) separate the changes just adding # long time from the few other changes.
(b) go through the review process myself and add a comment explaining each change that is not just adding # long time.

With this PR + a few more changes that I will PR separately, I have no tests taking more than ~ 5s. This saves ~10-15% of total test time (from 215s to 187s with -tp 32). Bear in mind that some tests are much faster with -tp1 than with -tp32.

There are still ~ 700 tests taking more than ~ 1s, but I will stop here.

@@ -298,6 +298,7 @@ def __init__(self, n, q, D, secret_dist='uniform', m=None):

sage: from numpy import std
sage: while abs(std([e if e <= 200 else e-401 for e in S()]) - 3.0) > 0.01:
....: L = [] # reset L to avoid quadratic behaviour
Member:

isn't the idea of this test that by increasing the number of samples, the error bound will be hit?

Contributor Author:

I'm not sure. To be honest, I'm not sure what the role of this test is, but the previous implementation exhibits quadratic behavior, which is why this test is usually ok but sometimes very slow:

sage -t --warn-long --random-seed=110988274722243807127083377606682083581 src/sage/crypto/lwe.py
**********************************************************************
File "src/sage/crypto/lwe.py", line 300, in sage.crypto.lwe.LWE.__init__
Warning, slow doctest:
    while abs(std([e if e <= 200 else e-401 for e in S()]) - 3.0) > 0.01:
        add_samples()
Test ran for 16.66 s, check ran for 0.00 s
    [112 tests, 17.29 s]

vs.

sage -t --warn-long 0.4 --random-seed=1 src/sage/crypto/lwe.py
**********************************************************************
File "src/sage/crypto/lwe.py", line 300, in sage.crypto.lwe.LWE.__init__
Warning, slow doctest:
    while abs(std([e if e <= 200 else e-401 for e in S()]) - 3.0) > 0.01:
        add_samples()
Test ran for 0.47 s, check ran for 0.00 s
    [112 tests, 1.37 s]

As far as I understand, they want to show that these samples indeed have a normal distribution with standard deviation 3.0. They take 1000 samples and want the standard deviation of these to be close to 3.0. Otherwise they keep adding samples, etc. until the standard deviation of the samples is indeed close to 3.0.

However, the way this is implemented, it becomes O(n^2) when 1000n samples have to be tried.

With my change, instead of adding more samples, we take a new set of 1000 samples. This way, trying 1000n samples is O(n). So, even if we have to retry more times, this is better.

Moreover, now that this is linear instead of quadratic, it's even faster to try samples of 100.

Summary: the way the test is done now, it keeps computing the standard deviation of a fresh set of 100 samples until it's really close to 3.0.
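The accumulate-versus-resample difference can be sketched in plain Python (a rounded Gaussian stands in for the LWE error sampler S(); the batch size of 1000 and the 0.05 tolerance are arbitrary choices for illustration):

```python
import random
from statistics import pstdev

random.seed(0)

def sample(n=1000, sigma=3.0):
    # fresh batch of n draws; a rounded Gaussian stands in for the
    # discrete error distribution sampled in the real doctest
    return [round(random.gauss(0, sigma)) for _ in range(n)]

# Old approach (quadratic): keep one ever-growing list, so each retry
# recomputes the std over everything drawn so far -- trying 1000*n
# samples costs O(n^2) total work.
L = sample()
while abs(pstdev(L) - 3.0) > 0.05:
    L.extend(sample())

# New approach (linear): discard the batch and redraw, so each retry
# costs O(batch) no matter how many retries are needed.
L = sample()
while abs(pstdev(L) - 3.0) > 0.05:
    L = sample()
```

Each pass of the second loop does a constant amount of work, so even when more retries are needed the total cost stays linear in the number of samples tried.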

Contributor Author:

Let me know if you are happy with my explanation. Otherwise, I'll revert and place a # long time label (although I'd be more inclined to just nuke the test).

I think this was the only non-cosmetic objection you had (and the cosmetic ones are more or less all addressed).

Member:

I think I would be more comfortable if we just get rid of the while loop, compute and print the std, and mark the result as random.

Member:

That said, your solution is of course fine.

src/sage/rings/tests.py: review thread (outdated, resolved)
@orlitzky (Contributor) commented:

With this PR + a few more changes that I will PR separately, I have no tests taking more than ~ 5s. This saves ~10-15% of total test time (from 215s to 187s with -tp 32). Bear in mind that some tests are much faster with -tp1 than with -tp32.

It takes several hours for me to run the test suite without --long, which really emphasizes how much of a losing battle this is while the threshold is measured in wall time and not cpu time. And that's with many files timing out completely (#32973).

NB: now that we've moved to Github, our notifications are once again being sent through SendGrid who regularly and intentionally violate the mail RFCs to delete my notifications (https://www.mail-archive.com/sage-devel@googlegroups.com/msg88600.html). Please keep that in mind if you ever want to draw my attention to a ticket.

@tornaria (Contributor Author) commented:

@mkoeppe Thanks for your review; I added your suggestions. There is also a minor change to a doctest, suggested by codecov: it turns out I changed one line of code in src/sage/plot/animate.py because the method apng() was using an incorrect filename (tmp_filename('.png') instead of the correct tmp_filename(ext='.png')). As a matter of fact, the doctest in line 46 tests for this, but that doesn't seem to satisfy codecov, so I modified a doctest in line 1046 to test this change.

@tornaria (Contributor Author) commented:

With this PR + a few more changes that I will PR separately, I have no tests taking more than ~ 5s. This saves ~10-15% of total test time (from 215s to 187s with -tp 32). Bear in mind that some tests are much faster with -tp1 than with -tp32.

It takes several hours for me to run the test suite without --long, which really emphasizes how much of a losing battle this is while the threshold is measured in wall time and not cpu time. And that's with many files timing out completely (#32973).

For me it is now taking 4786 cpu seconds, or 187 seconds wall time (using -tp 32 on a 36-core / 72-thread box).
This is down from 5362 cpu seconds (211 seconds wall time) on a clean 10.0.beta8 checkout.

NB: now that we've moved to Github, our notifications are once again being sent through SendGrid who regularly and intentionally violate the mail RFCs to delete my notifications (https://www.mail-archive.com/sage-devel@googlegroups.com/msg88600.html). Please keep that in mind if you ever want to draw my attention to a ticket.

I'm sorry about that. EEE at work.

sage: L.<b> = K.extension(x^2 + 26) # optional - sage.rings.number_field
sage: EL = E.change_ring(L) # optional - sage.rings.number_field
sage: iso2 = EL.isogenies_prime_degree(2); len(iso2) # optional - sage.rings.number_field
sage: pol = NumberField(pol26,'a').optimized_representation()[0].polynomial() # optional - sage.rings.number_field, long time
Member:

Also here

Contributor Author:

Oh, thanks... I can do a grep and find all of those. Is there a style guide about this? I was often unsure which column to place the first # in, what separation to use between labels, etc. The only rule I know is that the first # needs to be preceded by two spaces. Other than that, every convention I could think of is represented in some part of the code...

E.g. some places do # optional - A # optional - B but other places do # optional - A B, etc.

I'm not even sure about the "legal" syntax, much less about the "preferred" style.

Member:

We should definitely add a style guide for this; and a linter/fixer for these would probably also be useful (see #35401).

Both forms are correct. The style # optional - A # optional - B in many places comes from using my simple editor macros.
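A sketch of what the first step of such a linter might look like (illustrative only; optional_tags is a made-up helper, not Sage's doctest parser), normalizing both annotation styles to one tag set:

```python
import re

def optional_tags(line):
    """Extract package tags from a doctest line, accepting both the
    '# optional - A B' and '# optional - A # optional - B' styles."""
    tags = []
    # each match captures the text after 'optional -' up to the next '#'
    for chunk in re.findall(r'#\s*optional\s*-\s*([^#]*)', line):
        tags.extend(chunk.split())
    return sorted(tags)
```

Both "sage: f()  # optional - A B" and "sage: f()  # optional - A # optional - B" yield ['A', 'B'], so a fixer could rewrite one style into the other mechanically.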

src/sage/plot/animate.py: review thread (outdated, resolved)
@tornaria (Contributor Author) commented:

@mkoeppe I did reorder almost all # long time labels as you suggested.

These are all the exceptions:

$ git diff upstream/develop -- | grep '^+.*#.*# long'
+            sage: sum(FM.plot({}, srange(-2, 2, 0.1), srange(-2, 2, 0.1), opacity=0.2)  # not tested    # long time     # optional - sage.symbolic  # optional - sage.plot  # optional - sage.rings.number_field
+            sage: for j in M.irange():  # check on M's default frame  # long time
+            sage: for j in M.irange():  # check on frame e  # long time
+            sage: F.relative_error(asy[0], alpha, [1, 2, 4, 8, 16], asy[1])  # abs tol 1e-10  # long time
+            sage: rho.non_surjective() # See Section 5.10 of [Ser1972].  # long time
+            sage: rho.isogeny_bound() # See Section 5.10 of [Ser1972].  # long time
+            sage: rho.isogeny_bound()  # No 7-isogeny, but...   # long time
+            sage: rho.reducible_primes() # See Section 5.10 of [Ser1972].  # long time
+            sage: rho.isogeny_bound()  # No 7-isogeny, but...   # long time
+        sage: sage.schemes.elliptic_curves.gal_reps_number_field._non_surjective(E) # See Section 5.10 of [Ser1972].  # long time
+        sage: (out, err, ret) = test_executable([  # optional - gdb # long time
+        sage: out.find('(gdb) ') >= 0              # optional - gdb # long time
+        sage: ret                                  # optional - gdb # long time

Only the last three seem like they could be reordered; however, the # long time labels in those three lines are aligned with other # long time lines in the same file, and it looks ok this way.

@tornaria (Contributor Author) commented:

I think I'm done here; unless something else is really necessary, I'd rather finish this PR.

Aside: this "codecov" check is quite annoying, since I don't know how to make it happy. It seems all of my latest PRs are marked as check failures because of this. A couple had actual errors, but since all of them have red crosses, it's not immediately obvious which ones.

I think we should be really serious about CI passing, with PRs reworked if some check fails. But it's quite frustrating to aim at a moving target whose workings we don't know.

Is it possible for the codecov checks to run and report their findings without being taken into account in the global "pass/fail" decision for a PR?

Also, maybe it's better to have a separate repo/branch where CI experiments are carried out before being pushed to develop?

Comment on lines 428 to 433
sage: K = NumberField(x**2 - 29, 'a'); a = K.gen()
sage: E = EllipticCurve([1, 0, ((5 + a)/2)**2, 0, 0])
sage: sage.schemes.elliptic_curves.gal_reps_number_field._non_surjective(E) # See Section 5.10 of [Ser1972].
sage: sage.schemes.elliptic_curves.gal_reps_number_field._non_surjective(E) # See Section 5.10 of [Ser1972]. # long time
[3, 5, 29]
sage: E = EllipticCurve_from_j(1728).change_ring(K) # CM
sage: sage.schemes.elliptic_curves.gal_reps_number_field._non_surjective(E)
Contributor Author:

This one is really far right (column 130). However, it seems the comment before may be more important.

Member:

In situations like this, I have often rewritten the test as from sage.schemes.elliptic_curves.gal_reps_number_field import _non_surjective (even if this import line is very long).

src/sage/tests/cmdline.py: review thread (outdated, resolved)
@mkoeppe (Member) commented Apr 15, 2023

this "codecov" check is quite annoying, since I don't know how to make it happy. It seems all of my latest PRs are marked check failure because of this. A couple had actual errors, but since all of them have red crosses, it's not immediate to tell which ones.

I haven't followed the recent work on codecov; maybe @tobiasdiez or @kwankyu can comment on this

@mkoeppe (Member) commented Apr 15, 2023

@mkoeppe I did reorder almost all # long time labels as you suggested.

These are all the exceptions:

$ git diff upstream/develop -- | grep '^+.*#.*# long'
+            sage: sum(FM.plot({}, srange(-2, 2, 0.1), srange(-2, 2, 0.1), opacity=0.2)  # not tested    # long time     # optional - sage.symbolic  # optional - sage.plot  # optional - sage.rings.number_field
+            sage: for j in M.irange():  # check on M's default frame  # long time
+            sage: for j in M.irange():  # check on frame e  # long time
+            sage: F.relative_error(asy[0], alpha, [1, 2, 4, 8, 16], asy[1])  # abs tol 1e-10  # long time
+            sage: rho.non_surjective() # See Section 5.10 of [Ser1972].  # long time
+            sage: rho.isogeny_bound() # See Section 5.10 of [Ser1972].  # long time
+            sage: rho.isogeny_bound()  # No 7-isogeny, but...   # long time
+            sage: rho.reducible_primes() # See Section 5.10 of [Ser1972].  # long time
+            sage: rho.isogeny_bound()  # No 7-isogeny, but...   # long time
+        sage: sage.schemes.elliptic_curves.gal_reps_number_field._non_surjective(E) # See Section 5.10 of [Ser1972].  # long time
+        sage: (out, err, ret) = test_executable([  # optional - gdb # long time
+        sage: out.find('(gdb) ') >= 0              # optional - gdb # long time
+        sage: ret                                  # optional - gdb # long time

Only the last three seem like they could be reordered, however, the # long time labels in those three lines are aligned with other # long time lines in the same file and it looks ok this way.

I don't really have a strong preference for the order of # long time and the # optional annotations that correspond to the traditional "optional packages"; just the new modularization annotations # optional - sage... should not start before column 88 (and preferably exactly at column 88) to avoid being too distracting.

In any case, aligning the annotations in a column certainly reduces the visual clutter and is a good thing to do when one makes changes to these lines anyway.

@tobiasdiez (Contributor) commented:

this "codecov" check is quite annoying, since I don't know how to make it happy. It seems all of my latest PRs are marked check failure because of this. A couple had actual errors, but since all of them have red crosses, it's not immediate to tell which ones.

From what I observed, it's actually not an issue with codecov, but rather that some tests have random input and thus trigger different code paths. I've opened #35522 for this. If you experience any other problems, please open a new issue and I'll have a look.

@tornaria (Contributor Author) commented:

this "codecov" check is quite annoying, since I don't know how to make it happy. It seems all of my latest PRs are marked check failure because of this. A couple had actual errors, but since all of them have red crosses, it's not immediate to tell which ones.

From what I observed its actually not an issue with codecov but that some tests have random input and thus trigger different code paths. I've opened #35522 for this. If you experience any other problems, please open a new issue and I'll have a look.

Please have a look at the codecov/patch check: it's very specific

Check warning on line 1064 in src/sage/plot/animate.py

Codecov / codecov/patch

src/sage/plot/animate.py#L1064

Added line #L1064 was not covered by tests

However, if you look at the diff, I added a test in lines 1046 and 1047 that would fail without the change I made to line 1064. If that is not covering this line, please tell me what would cover that change.

As for the other issue: maybe codecov could be run with --random-seed=0 so it is more deterministic.

@tornaria (Contributor Author) commented:

Looking at https://app.codecov.io/gh/sagemath/sage/pull/35443/blob/src/sage/plot/animate.py#L1064 I think I understand what is going on here. The codecov/patch test only runs the testsuite in normal (not long) mode. In this case, the method apng() is never called in a normal-mode test.

Maybe an option is to run long tests at least just for those files that are changed in the patch. In fact, it'd be nice to run the whole testsuite in "normal" mode and the changed files a second time in "long" mode. This could also catch cases when doctesting works in "long" mode but it doesn't in "normal" mode because of a missing # long time label.
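A hedged sketch of what that could look like from the command line (sage -t, --long, and --random-seed are real Sage options; the git-based selection of changed files is an assumption about how a CI job could pick them, and upstream/develop is assumed to be the comparison branch):

```shell
# Doctest only the Python files this PR touches, in long mode,
# with a fixed seed for reproducibility.
git diff --name-only upstream/develop -- 'src/sage/*.py' \
    | xargs sage -t --long --random-seed=0
```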


In this particular case, maybe there could be a test that calls apng() for a trivial animation, so it's fast but still tests that the filename is set correctly.

But before worrying about that, we should be clear about the expectation: do we aim for 100% code coverage with normal tests? With long tests? Is this a goal for the whole codebase, or just for lines that change?

Whatever the answers to those questions, IMO we must stick to them: either do not merge PRs that fail the coverage check (with few reasonable exceptions), or else don't make coverage failure part of PR failure.

Otherwise, we risk making the whole CI check useless.

@tornaria (Contributor Author) commented:

As per my previous comment, I added a small quick test that should satisfy codecov/patch.

@github-actions commented:
Documentation preview for this PR is ready! 🎉
Built with commit: 1021032

@tobiasdiez (Contributor) commented:

Maybe an option is to run long tests at least just for those files that are changed in the patch. In fact, it'd be nice to run the whole testsuite in "normal" mode and the changed files a second time in "long" mode. This could also catch cases when doctesting works in "long" mode but it doesn't in "normal" mode because of a missing # long time label.

I don't think such a hybrid mode is supported (yet) by our doctest framework, or is it? Maybe we can always run all long tests in CI, or would this take too long?

But before worrying about that, we should be clear about what is the expectation: do we aim for 100% code coverage on normal test? on long test? Is this an aim for the whole codebase, or just for lines that change?

As far as I understand it, Sage places a high priority on writing tests with high coverage. Striving for 100% coverage, however, is usually not a good idea, since the additional tests created to cover "trivial branches" add maintenance overhead without providing real value.

Maybe we should move this discussion to a new issue?

@mkoeppe (Member) commented Apr 18, 2023

Maybe an option is to run long tests at least just for those files that are changed in the patch. In fact, it'd be nice to run the whole testsuite in "normal" mode and the changed files a second time in "long" mode. This could also catch cases when doctesting works in "long" mode but it doesn't in "normal" mode because of a missing # long time label.

I don't think such a hybrid mode is supported (yet) by our doctest framework, or is it? Maybe we can always run all long tests in CI, or would this take too long?

I'd be +1 on running the long tests in CI.

And before running the long tests, perhaps we can run the changed files of the PR first (similar to sage -t --new) for a quick turnaround.

@vbraun vbraun merged commit ef68bee into sagemath:develop Apr 23, 2023
6 of 7 checks passed
@mkoeppe mkoeppe added this to the sage-10.0 milestone Apr 23, 2023
@tornaria tornaria deleted the slow_doctests branch November 27, 2023 02:14