
Fix a very slow doctest in sage/data_structures/stream.py #35127

Merged

Conversation

tornaria (Contributor)

This single test used to take ~200s.
Now it takes ~1.5s, and I marked it `# long time`.

This was introduced in 94219d4 (#34413).

πŸ“š Description

πŸ“ Checklist

  • I have made sure that the title is self-explanatory and the description concisely explains the PR.
  • I have linked an issue or discussion.

βŒ› Dependencies

@tscrim (Collaborator) commented Feb 14, 2023

This is slightly worrisome to me because I recall testing this (repeatedly) and not noticing such a slowdown, although it is purely the plethysm code that is taking its time. Anyway, this is fine with me; it still tests what it needs to test.

@mantepse (Collaborator)

This is extremely strange. Could you check which commit caused the slowdown, @saraedum?

@codecov-commenter

Codecov Report

Base: 88.60% // Head: 88.58% // Decreases project coverage by -0.02% ⚠️

Coverage data is based on head (a3b1dcd) compared to base (c000c95).
Patch has no changes to coverable lines.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop   #35127      +/-   ##
===========================================
- Coverage    88.60%   88.58%   -0.02%     
===========================================
  Files         2136     2140       +4     
  Lines       396142   396961     +819     
===========================================
+ Hits        350990   351663     +673     
- Misses       45152    45298     +146     
Impacted Files Coverage Ξ”
src/sage/data_structures/stream.py 97.47% <ΓΈ> (ΓΈ)
src/sage/groups/affine_gps/euclidean_group.py 88.88% <0.00%> (-11.12%) ⬇️
src/sage/interfaces/qsieve.py 71.30% <0.00%> (-2.61%) ⬇️
src/sage/groups/affine_gps/affine_group.py 96.62% <0.00%> (-2.25%) ⬇️
src/sage/groups/additive_abelian/qmodnz.py 91.48% <0.00%> (-2.13%) ⬇️
src/sage/modular/local_comp/liftings.py 98.33% <0.00%> (-1.67%) ⬇️
src/sage/schemes/elliptic_curves/hom_velusqrt.py 94.09% <0.00%> (-1.58%) ⬇️
src/sage/modular/hecke/algebra.py 94.65% <0.00%> (-1.07%) ⬇️
src/sage/graphs/generators/random.py 91.95% <0.00%> (-0.69%) ⬇️
src/sage/groups/generic.py 88.34% <0.00%> (-0.68%) ⬇️
... and 65 more


β˜” View full report at Codecov.
πŸ“’ Do you have feedback about the report comment? Let us know in this issue.

@mantepse (Collaborator)

So, this is an extreme regression. I am very much against tuning down the doctest. In SageMath version 9.8.beta2 (Release Date: 2022-10-16), testing the file takes about 1 second.

@mantepse (Collaborator)

9.8.beta7 takes 125.00s

@mantepse (Collaborator)

Ah, it might have been #27652. In sage 10.0.beta0:

sage: %time B = p[2, 2, 1](sum((s[i]) for i in range(6)))
CPU times: user 29.2 s, sys: 23.6 ms, total: 29.2 s
Wall time: 29.4 s

whereas

sage: %time B = p[2, 2, 1](sum(p(s[i]) for i in range(6)))
CPU times: user 210 ms, sys: 14 Β΅s, total: 210 ms
Wall time: 227 ms
sage: %time B = p[2, 2, 1](sum(p(s[i]) for i in range(7)))
CPU times: user 849 ms, sys: 0 ns, total: 849 ms
Wall time: 855 ms

In the following line,

            sage: all(h.compute_product(k, Partition([2, 2, 1])) == B.restrict_degree(k) for k in range(7))
            True

the left-hand side is unaffected by the basis that B is computed in. I indicated the correct fix as an inline comment in the PR; I did not manage to copy it here, still new to GitHub.

@tscrim (Collaborator) commented Feb 14, 2023

I am unable to see the suggestion. Can you post it directly here?

From what I remember, when we want to propose an alternative fix, we are supposed to do a separate PR. Hopefully we will still be able to have a coherent discussion about the issue if we do that...

@tscrim self-requested a review February 14, 2023 12:06
@mantepse (Collaborator)

OK, the suggestion is lost; I misunderstood GitHub. I'll try again, differently.

@mantepse (Collaborator)

I am unable to see the suggestion. Can you post it directly here?

From what I remember, when we want to propose an alternative fix, we are supposed to do a separate PR. Hopefully we will still be able to have a coherent discussion about the issue if we do that...

I guess it would be wise to keep the discussion on the issue page then. Or perhaps separate discussion of the issue from discussion of the specific implementation of the fix?

This single test used to take ~ 200s, now it takes ~ 0.5s.
Marked long time since the other 769 tests together take < 1s.
@tornaria force-pushed the slow_doctest-sage/data_structures/stream.py branch from a3b1dcd to 2bdb515 on February 14, 2023 12:38
@mantepse (Collaborator)

@tornaria, I'm afraid something didn't work out right. The force-push I see is the same fix as before.

@mantepse (Collaborator)

Ah, no, now I see: you introduced the p but forgot to remove the `# long time` markers, it seems.

@tornaria (Contributor, Author) commented Feb 14, 2023

I am unable to see the suggestion. Can you post it directly here?

From what I remember, when we want to propose an alternative fix, we are supposed to do a separate PR. Hopefully we will still be able to have a coherent discussion about the issue if we do that...

You can comment in the review; you can also make small suggested changes that I can then apply to the PR.

Another way is to open a PR against my branch in my repo; then I can incorporate it into this PR.

In any case, this PR is just to fix the slow doctest and IMO should stay at that, which makes it trivial to merge (and very important, since some automatic testing fails due to timeouts; e.g. https://github.com/void-linux/void-packages/actions/runs/4169062901/jobs/7216578712 failed only because of this slow test).

If you want to fix the speed regression in the code (which would be great), I say make a separate PR. Doing that is outside my expertise.

Edit: the run I referenced failed for a different reason, but I did get a timeout somewhere because of this.

@mantepse (Collaborator)

@tornaria, I don't get it. There is no code to fix. The fix is to introduce the additional p, as you did. Now you can remove the markers. The test now takes about 400ms on my machine, which is certainly OK.
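
Concretely, with the p inserted, the relevant doctest lines are roughly the following (with h being the Stream_plethysm set up earlier in that doctest):

    sage: B = p[2, 2, 1](sum(p(s[i]) for i in range(7)))
    sage: all(h.compute_product(k, Partition([2, 2, 1])) == B.restrict_degree(k) for k in range(7))
    True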

@saraedum (Member)

This is extremely strange. Could you check which commit caused the slowdown, @saraedum?

Do you mean whether we can check this with asv benchmarks? asv supports this but we'd need to teach it how to rebuild sage after switching branches. That is not implemented currently.
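
To give an idea, the benchmark itself would only be a few lines; a rough sketch (the suite is hypothetical, nothing like it exists in the repository yet):

    # rough sketch of an asv benchmark suite; hypothetical, not in the repo
    from sage.all import QQ, SymmetricFunctions

    class PlethysmSuite:
        def setup(self):
            S = SymmetricFunctions(QQ)
            self.s = S.s()
            self.p = S.p()

        # asv times every method whose name starts with time_
        def time_plethysm_sum_of_schurs(self):
            self.p[2, 2, 1](sum(self.s[i] for i in range(6)))

The hard part is, as said, teaching asv to rebuild sage for each commit it benchmarks.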

@mantepse (Collaborator)

This is extremely strange. Could you check which commit caused the slowdown, @saraedum?

Do you mean whether we can check this with asv benchmarks? asv supports this but we'd need to teach it how to rebuild sage after switching branches. That is not implemented currently.

asv would have caught this, because it was a switch of convention in a method other than the method which was doctested, and the time went from 500ms to 120s on my computer.

@tornaria, I'm not sure whether my other comment is visible: the reason I put in this doctest was to catch speed regressions, so I am very much against marking this as long. I am quite sure it would not have been noticed if it had the long marker before.

@tornaria (Contributor, Author)

@tornaria, I don't get it. There is no code to fix. The fix is to introduce the additional p, as you did. Now you can remove the markers. The test now takes about 400ms on my machine, which is certainly OK.

The original test was fast when you introduced it, but now it's slow. That's a speed regression.

The test takes 0.6 to 1.0 s depending on the computer. That is long time.

As I said in the review comment: another option would be to always run the test for range(5) (it's very fast) and then have a long time test to check 6 and 7 (and maybe 8). It all depends on what you are testing here (the compute_product() function).
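
Sketching that option (reusing the names from the existing doctest, so purely illustrative):

    sage: B = p[2, 2, 1](sum(p(s[i]) for i in range(5)))
    sage: all(h.compute_product(k, Partition([2, 2, 1])) == B.restrict_degree(k) for k in range(5))
    True
    sage: B = p[2, 2, 1](sum(p(s[i]) for i in range(8)))  # long time
    sage: all(h.compute_product(k, Partition([2, 2, 1])) == B.restrict_degree(k) for k in range(8))  # long time
    True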


If you want to test for speed regressions, I'm not sure this is the right place. Note that here you are doctesting compute_product(), but the speed regression is not in that function; it is in the alternative way of computing the same thing. Also, using p(s[i]) is actually hiding the speed regression. I'd say the proper thing to do is to try to fix the speed regression and then add back a speed regression test for the computation of B = p[2, 2, 1](sum(s[i] for i in range(7))) in the place where it belongs.

The speed regression seems unrelated to sage/data_structures/stream.py:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ SageMath version 9.7, Release Date: 2022-09-19                     β”‚
β”‚ Using Python 3.11.2. Type "help()" for help.                       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
sage: s = SymmetricFunctions(QQ).s()
sage: p = SymmetricFunctions(QQ).p()
sage: %time B = p[2, 2, 1](sum(s[i] for i in range(6)))
CPU times: user 277 ms, sys: 3.79 ms, total: 280 ms
Wall time: 285 ms
sage: %time B = p[2, 2, 1](sum(s[i] for i in range(7)))
CPU times: user 438 ms, sys: 8.08 ms, total: 446 ms
Wall time: 446 ms
sage: %time B = p[2, 2, 1](sum(s[i] for i in range(8)))
CPU times: user 1.41 s, sys: 14.1 ms, total: 1.42 s
Wall time: 1.43 s
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ SageMath version 9.8, Release Date: 2023-02-11                     β”‚
β”‚ Using Python 3.11.2. Type "help()" for help.                       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
sage: s = SymmetricFunctions(QQ).s()
sage: p = SymmetricFunctions(QQ).p()
sage: %time B = p[2, 2, 1](sum(s[i] for i in range(6)))
CPU times: user 16.6 s, sys: 2.99 ms, total: 16.6 s
Wall time: 16.7 s
sage: %time B = p[2, 2, 1](sum(s[i] for i in range(7)))
CPU times: user 2min 39s, sys: 16 ms, total: 2min 39s
Wall time: 2min 40s

Feel free to copy paste in a separate issue.

@mantepse (Collaborator)

@tornaria, I'm afraid I did not communicate clearly enough. Let me say it again: there is no speed regression in the code.

The parent of plethysm changed (see #27652); that's why adding the p fixes the apparent speed regression. There is no way that p[2, 2, 1](sum(s[i] for i in range(7))) could be even close to the speed of p[2, 2, 1](sum(p(s[i]) for i in range(7))): one computes the result in the Schur basis, the other in the powersum basis.

The policy on long doctests in Sage is specified at https://doc.sagemath.org/html/en/developer/doctesting.html#section-options; it clearly says "anything beyond about one second". This test takes less than half a second on my very ordinary computer. There are many tests in Sage which take much longer.

The reason I put in range(7) is that it was the cutoff for the limit of 1 second. Thus, any speed regression in the plethysm code of sfa.py or in the lazy plethysm code in stream.py would trigger this. There is currently no other way - at least that I know of - to catch speed regressions.

@github-actions

Documentation preview for this PR is ready! πŸŽ‰
Built with commit: 2bdb515

@mantepse (Collaborator)

Ping?

@tscrim (Collaborator) commented Feb 19, 2023

@mantepse Are you suggesting to remove the added # long time from the current branch and wait on a response from @tornaria? I am not sure that adding this changes too much, since we don't (automatically) check when a regular doctest starts taking a long time (there are a few tests that necessarily break this rule because they need to do some initialization, but fundamentally they are fast IIRC).

The current PR keeps the test at range(7) but adds the p and # long time. I am willing to set a positive review based on this, but I want to confirm with you @mantepse that this is acceptable.

@mantepse (Collaborator)

@tscrim, I am very surprised. With the p inserted, the doctest takes less than half a second. Why would you mark it as long then?

Yes, I am against adding the long.

I know that there is currently no automatic tool that checks the timings. There is an effort underway to change this, but it is not clear how well it will work. However, this doctest actually was detected (and quite early at that), which should be a sign that it's well designed.

@tscrim (Collaborator) commented Feb 19, 2023

It takes close to half a second on my fairly new laptop. So it isn't exactly "fast", and on slower machines, it might take longer than a second.

This doctest would also have been detected even if it had been marked # long time, so I don't see that as a valid argument. Even long tests are supposed to be around 1-5s with a maximum of 30s (of course, we do often break this guideline).

I don't see any harm in marking it # long time, since it is a bit longer and tests more for (large) speed regressions.

@mantepse (Collaborator)

Actually it wouldn't have been detected if it had had the long marker. It was detected because of a timeout. At least, I don't know of any mechanism that would have detected it as long.

Moreover, I doubt it would have hit the timeout limit with 6 instead of 7.

Finally, we have very many regular doctests which take roughly the same amount of time. Why is this one different, especially, since it serves a purpose?

@tscrim (Collaborator) commented Feb 19, 2023

Actually it wouldn't have been detected if it had had the long marker. It was detected because of a timeout. At least, I don't know of any mechanism that would have detected it as long.

Are timeouts for long tests any different than for those not marked as long? I don't recall this being the case. It would have appeared via the --warn-long 30 warning, and we could very easily implement something that checks for new doctests that raise such a warning.
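
For instance, something as simple as this (a quick sketch, not an existing tool) piped after sage -t would already do:

    # quick sketch (not an existing tool): read `sage -t --warn-long ...`
    # output from stdin and fail if any slow-doctest warning shows up
    import sys

    slow = [line.rstrip() for line in sys.stdin if "Warning, slow doctest" in line]
    if slow:
        print("%d slow doctest warning(s) found" % len(slow))
        sys.exit(1)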

I don't know about the present tester, but previously the patchbots ran --long, so adding # long time did not change anything. (I also strongly hope the tester does actually run all the long tests, otherwise we are more likely to have errors slip through to Volker.)

I still do not see the harm in adding it.

Moreover, I doubt it would have hit the timeout limit with 6 instead of 7.

I don't see the relevance of this here. It is about the duration of a doctest, not any specific one.

Finally, we have very many regular doctests which take roughly the same amount of time. Why is this one different, especially, since it serves a purpose?

Not that many, and most of those don't have other tests that already cover them. Even if they are marked as # long time, that would be fine (although potentially more burdensome for the developers). On the other hand, we also almost certainly have tests marked as # long time that no longer need it.

@mantepse (Collaborator)

If I understand correctly, the timeout was not within sage but elsewhere. In any case, neither the patchbot nor Volker noticed it.

@mantepse (Collaborator)

That said, if you are convinced that long is better, please go ahead. I don't see the reasons I gave addressed, but possibly this all becomes irrelevant once a proper benchmark tool is integrated.

I do vaguely remember fixing speed regressions detected by FindStat several times, and I must say that this is not only annoying but also time-consuming.

@tscrim (Collaborator) commented Feb 19, 2023

As I understand @tornaria, it occurred from one of the testers. I didn't see it on my desktop or laptop either, likely because they weren't slow enough to detect the timeout (on develop on my laptop):

Doctesting 1 file using 16 threads.
sage -t --random-seed=46345267998674379276588152833005580169 stream.py
    [771 tests, 111.64 s]
----------------------------------------------------------------------
All tests passed!
----------------------------------------------------------------------
Total time for all tests: 111.8 seconds
    cpu time: 103.0 seconds
    cumulative wall time: 111.6 seconds                                      

I understand what you want to protect against (and +1 for trying to find ways to detect it), but I don't think it will work at this point without someone manually noticing something. There are definite ways we could improve this, but (in principle) it should equally be detected by # long time tests.

Indeed, finding speed regressions is a painful process (even more so in compiled code, which I had to do recently). Having a proper benchmarking tool will definitely be the way forward, if for nothing else to detect major regressions. We've tried a few times over the years as I recall, but getting a good robust framework has been the sticking point IIRC.

Anyway, I should state that I am not opposed to removing the # long time, but I would also be okay with leaving it in. Perhaps @tornaria you could weigh in on how much you want it marked as such. (I am just guessing that you fairly strongly want it.) I am trying to find the quickest way we can all agree on to get this fix in.

@tornaria (Contributor, Author)

Some comments:

  • I wouldn't say this speed regression was detected: it wasn't. By the time this is in a released version of sage, it is too late.
  • This test corresponds to the function compute_product(), which is reasonably fast. I wouldn't mind at all running this function up to 7 or even more here, although I'm not sure it's really adding anything.
  • The slow part of the test is to compute the same "in a different way" to compare the results. But that belongs to a different module; this is definitely not the place to test that.
  • It seems whoever changed the semantics of SymmetricFunctions(QQ).p() didn't think of testing this file. That adds to my opinion that this is not the place to test it.
  • Anyway, when I develop, I always run long tests, even more so in the files that I am developing. I might even run more tests. For instance, if you are worried about speed regressions in your part of the code, it may be a good idea to have some test suite that you can run thoroughly when you make significant changes to the code, for peace of mind.
  • Here's one idea: have a TestSuite() function which takes some parameters. In the doctests you run the test suite with mild parameters that run a quick version of it (this is important to make sure the test suite is not broken by other changes). But when you are developing, you can run the same function with larger parameters so more tests are run. The point is that you have a one-liner that you run often (see the sketch after this list).
  • Again, about normal vs. long tests: for development the long tests should always be run!!! But from a systems integration point of view, we want to run the doctests a lot of times in different OS / distributions / library version / etc, and for that it is quite useful to have a faster subset of the doctests.
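
A rough sketch of that idea (a hypothetical helper, not Sage's existing TestSuite; it just compares the two computations discussed in this thread):

    # hypothetical helper: cheap parameters in the doctest, larger ones by hand
    from sage.all import QQ, SymmetricFunctions

    def plethysm_consistency_check(max_degree=5):
        s = SymmetricFunctions(QQ).s()
        p = SymmetricFunctions(QQ).p()
        # the two sides compute the same plethysm via different bases;
        # larger max_degree exercises (much) slower cases
        lhs = p[2, 2, 1](sum(s[i] for i in range(max_degree)))
        rhs = p[2, 2, 1](sum(p(s[i]) for i in range(max_degree)))
        return lhs == rhs

A doctest would run plethysm_consistency_check() with the cheap default; while touching the plethysm code you run plethysm_consistency_check(8) by hand.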

BTW, on my (not-shiny-and-new) server that I use for Sage testing it takes more than 1 second:

sage -t --long --warn-long 0.5 --random-seed=338250697636905235235876997205634301766 src/sage/data_structures/stream.py
**********************************************************************
File "src/sage/data_structures/stream.py", line 2093, in sage.data_structures.stream.Stream_plethysm.compute_product
Warning, slow doctest:
    B = p[2, 2, 1](sum(p(s[i]) for i in range(7)))  # long time
Test ran for 1.22 s, check ran for 0.00 s
**********************************************************************
File "src/sage/data_structures/stream.py", line 2143, in sage.data_structures.stream.Stream_plethysm.stretched_power_restrict_degree
Warning, slow doctest:
    B = p[2,2,2](sum(g[n] for n in range(7)))  # long time
Test ran for 2.85 s, check ran for 0.00 s
    [774 tests, 6.16 s]
----------------------------------------------------------------------
All tests passed!
----------------------------------------------------------------------

These two are the only tests marked long time; everything else is very fast.

@mantepse (Collaborator)

@tornaria, would you have spotted the regression if the test had been marked as long?

@tornaria (Contributor, Author)

@tornaria, would you have spotted the regression if the test had been marked as long?

Yes, I run the normal and long test suites often. For the former I use --warn-long 30.0 and for the latter --warn-long 60.0. Your test was taking 200 seconds on my fast computer.

You can also run the long test suite when developing. It takes time, but you don't have to run the whole thing: you could keep a list of files that are influenced by the code you care about, and run those more often when developing.

As a starting point, use the output of git grep -l SymmetricFunctions. E.g.

$ ./sage -tp 8 --long --warn-long 30.0  $(git grep -l SymmetricFunctions)
...
real    3m0.333s
user    15m23.331s
sys     0m12.504s

@mantepse (Collaborator) commented Feb 19, 2023

Ok, then leave it as it is.

I know how to run the tests, but I do not run the long ones outside of the module, because they take too long. I don't know why this regression was not noticed immediately when I fixed the semantics of plethysm; neither did anybody else, except you. So I take your word that you will also catch similar regressions in the future.

Apart from that, I'm sure that help on #35046 is highly welcome.

@tornaria (Contributor, Author)

I know how to run the tests, but I do not run the long ones outside of the module, because they take too long.

It takes me 3 minutes to run all long tests in the 111 files that match the word SymmetricFunction.

I don't know why this regression was not noticed immediately when I fixed the semantics of plethysm...

Because you didn't run the long tests outside the module. That's why I said that if you intend this test to catch regressions in the plethysm code, then it doesn't belong here. It seems better to place the test in the module/function whose changes might cause a failure.

That being said, when changing semantics in some module, it seems very desirable to thoroughly check all users of the module, not just the module itself... For instance, it takes 3 minutes to run long tests in all 111 files. That's 180 seconds, so a single test taking 200 seconds does stand out (you are developing and you run tests often: if suddenly a single test takes 200 seconds, this will piss you off enough that you will spot the problem).

neither did anybody else, except you. So I take your word that you will also catch similar regressions in the future.

Again, I'd appreciate it if you didn't depend on me for this. Rather, figure out strategies to spot regressions early. Realize that once this test gets into a beta release, it will be run thousands of times and burden all CI; worse for stable releases. I am still glad to catch and report stuff that slips through the cracks (there will always be some), but please do your best and learn from the things that escape your testing so that you catch similar issues next time.

Apart from that, I'm sure that help on #35046 is highly welcome.

@mantepse (Collaborator)

Thank you for your great advice on how to develop code for sage.

@vbraun merged commit 53362d1 into sagemath:develop Mar 26, 2023
@mkoeppe added this to the sage-10.0 milestone Mar 26, 2023
vbraun pushed a commit that referenced this pull request Apr 1, 2023
    
### πŸ“š Description

This addresses the arity problem noted in #35261. It also removes a `TODO`
in a `revert()` implementation by making the behavior the same across
different implementations. Warnings are added for the user for the
assumptions made by the code.

### πŸ“ Checklist


- [x] I have made sure that the title is self-explanatory and the
description concisely explains the PR.
- [x] I have linked an issue or discussion.
- [x] I have created tests covering the changes.
- [x] I have updated the documentation accordingly.

### βŒ› Dependencies

- #35127 Avoiding a long test.
- #35254 Avoiding merge conflicts.
- #35291 Making sure these changes work together.
- #35265 For some slight simplification of the new stream's
`__getitem__`.
    
URL: #35293
Reported by: Travis Scrimshaw
Reviewer(s): Martin Rubey, Travis Scrimshaw
@tornaria deleted the slow_doctest-sage/data_structures/stream.py branch April 4, 2023 21:46