
providing different number of checkruns in a test suite #59

Closed
FrankMittelbach opened this issue Mar 3, 2018 · 33 comments

@FrankMittelbach
Member

I just wrote a bunch of tests that require 2 or even 3 checkruns before they produce the right result. Thus in the config I have to set checkruns=3, with the result that if I put them into a larger suite of tests the processing time goes up drastically (as 99% of the tests would come out right after a single run).

Now on fast machines that is not so much of an issue, but on this rather old desktop, for example, the 2e suite already runs for 20 minutes, so I really don't need another 10 or 40 minutes.

Proposal:

checkruns =  <number> | *

If *, then do a loop in the check target (up to 3 times maybe), comparing the log against the .tlg and rerunning if there are differences. Fail after 3 runs. This way the majority of tests would finish after 1 run, and only real failures would be rerun 1 or 2 times unnecessarily (but that's the exceptional case).

Of course that also requires a change in the save target, either by always running it 3 times (if * is set above) or by offering something like

texlua build.lua save --checkruns=2 tlb-foo 

The same could in principle also be offered in check, for locally overriding the config value for testing purposes.
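
To make the idea concrete, here is a rough sketch of the loop the check target could run for checkruns = * (illustration only, in Lua; runSingleCheck and logDiffersFromTlg are placeholder names, not actual l3build internals):

local maxruns = 3  -- upper bound when checkruns = "*"
local function checkWithReruns(name)
  for run = 1, maxruns do
    runSingleCheck(name)                  -- one engine run plus log normalisation
    if not logDiffersFromTlg(name) then
      return 0                            -- stable result: stop early
    end
  end
  return 1                                -- still differs after maxruns: a real failure
end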

@josephwright
Member

Perhaps the easiest way is if I add the ability to pick .lvt files in a config. One can then have two configs which differ only in checkruns: easier than trying to extend what is at present a simple syntax ...
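
For illustration, the second config could look like this (a sketch only: the means of selecting test files per config did not exist at this point, so the variable name includetests below is an assumption, and checkruns = 1 is assumed to remain the default elsewhere):

-- config-multiruns.lua (sketch)
checkruns    = 3
includetests = {"tlb-foo"}  -- only the tests that genuinely need several runs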

@josephwright josephwright self-assigned this Mar 3, 2018
@wspr
Contributor

wspr commented Mar 3, 2018 via email

@josephwright
Member

@wspr Could just be an appropriately-named .lua file, doesn't have to have an 'odd' extension.

@wspr
Contributor

wspr commented Mar 3, 2018

@josephwright Sure; it makes little difference whether you have l3test01.lvt alongside l3test01.l3b or l3test01-l3b.lua, but having a separate extension kind of ‘fits’ nicely.

@FrankMittelbach
Member Author

FrankMittelbach commented Mar 3, 2018 via email

@josephwright
Member

@FrankMittelbach At present, our configs are simple .lua files with no odd parsing. In the current set-up, checkruns is an integer. If we want to use the suggested syntax, we'd have to parse the entire config file separately to pick out the value. Assuming we don't want to do that, we might make it a string so we can parse just the value itself, but that still feels awkward to me ...

@davidcarlisle
Member

davidcarlisle commented Mar 3, 2018 via email

@blefloch
Member

blefloch commented Mar 3, 2018 via email

@FrankMittelbach
Member Author

@josephwright hmm. I wonder whether this approach would then be valid (and not dangerous):

 checkruns = <max-number>

runs the checks up to that maximum number of times, comparing the result after each run. It stops if either the results match or we have tried <max-number> times. For save, always do the maximum number of runs.

Offhand I can't really see a case where that is going to fail, i.e. matching the first time but not the second or third. I know one can construct such cases, but that would have to be done rather deliberately in my opinion.

But I think that @davidcarlisle's suggestion is even better. 0 doesn't quite work, as you may want to say how often to try.
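
In config terms that proposal is simply (a sketch of the intended semantics, not of existing behaviour):

checkruns = 3  -- run each test at most 3 times; stop as soon as the log matches the .tlg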

@josephwright
Member

I'll have to adjust the code to let us 'break out' of the loop to get to the .tlg comparison stage. I'd like to do #50 first (it's a big-ish job), then I guess I can look at this and #6. I'll flag for TL'18.

@josephwright josephwright added this to the TL'18 milestone Mar 5, 2018
@wspr
Contributor

wspr commented Mar 5, 2018

Hmmmm, what if you were testing that a certain condition didn't converge after a number of runs? I'm thinking something cross-ref related where you might expect a value after two runs but due to some misbehaving package still have ??. Possibly too contrived an example...

Secondly, I guess this isn't a major problem but this does slow down all legitimately failing tests, right? (Especially if not running --halt.) Unless you subdivide all tests that need multiple runs, but if you were willing to do that you wouldn't need the variable number of runs to start with.

I'm sorry to be contrary but if it were up to a vote I think I'd prefer a simple mechanism to indicate how many runs a given test should use. (Even if it were embedded somehow in the .lvt file?)

@FrankMittelbach
Member Author

FrankMittelbach commented Mar 5, 2018 via email

@wspr
Contributor

wspr commented Mar 5, 2018 via email

@blefloch
Member

blefloch commented Mar 6, 2018 via email

@wspr
Contributor

wspr commented Mar 6, 2018 via email

@FrankMittelbach
Member Author

FrankMittelbach commented Mar 6, 2018 via email

@FrankMittelbach
Member Author

FrankMittelbach commented Mar 6, 2018 via email

josephwright added a commit that referenced this issue Mar 21, 2018
At present this just covers .tlg-based tests.
@josephwright
Member

I've added some code for this but not documented it yet. I wonder if we should just enable this approach all of the time: typically the TeX run is slower than the comparison, and we are normally running only one check run, so there would be no impact. That avoids the entire need for a specialised interface.

I also need to get it working with PDF-based tests, but to be honest that entire area needs re-doing so I've not worried at present.

@blefloch
Member

blefloch commented Mar 21, 2018 via email

@FrankMittelbach
Member Author

my approach would have been to use the max number of reruns at save time (ie no breaking out earlier)

@josephwright
Member

On saving, I think the current logic should be OK. When you save, there are two cases:

  • There is no existing .tlg file: the comparison will always fail and the maximum number of runs will be applied
  • There is an existing .tlg, which will only match if we can safely break out of the loop

That said, I've not tried this out just yet: I thought I'd first see if the entire plan sounded any good. I can look at forcing 'no break out', but it's a bit tricky as the runtest() function doesn't actually know if the required target is check or save ...
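
One way to handle that (purely a sketch; the real runtest() signature and internals may differ) would be to pass the decision down as an extra argument, with logMatchesTlg as a placeholder for the comparison step:

-- Hypothetical: the caller decides whether early break-out is allowed.
local function runtest(name, engine, breakout)
  for run = 1, checkruns do
    -- ... run the engine and normalise the log ...
    if breakout and logMatchesTlg(name, engine) then
      break  -- "check": the result is already stable, stop here
    end
  end
end

-- check target: runtest(name, engine, true)
-- save target:  runtest(name, engine, false)  -- always run in full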

@josephwright
Member

I've done some (simple) tests and I'm reasonably confident that there should be no issue with the save target ...

@blefloch
Member

blefloch commented Mar 21, 2018 via email

@josephwright
Member

@blefloch I'd not thought of that case: I guess I will need to work out how best to handle it. All doable of course.

More general thoughts on leaving the interface unchanged? I wondered if we should have a boolean to turn this on-and-off: optimisecheckruns or similar?

@FrankMittelbach
Member Author

FrankMittelbach commented Mar 21, 2018 via email

@blefloch
Member

blefloch commented Mar 21, 2018 via email

@FrankMittelbach
Member Author

FrankMittelbach commented Mar 21, 2018 via email

josephwright added a commit that referenced this issue Mar 22, 2018
This means that the loop always runs in full for "save".
@josephwright
Member

The latest change means that treating checkruns as a maximum only applies when checking, not when saving. If people could test it out, or at least have a look over the commits, that would be great! I'll document once it's clear this is the desired behaviour.
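
In other words, with checkruns = 3 in a config the two targets now behave differently (assuming the behaviour described above):

texlua build.lua check         -- each test is rerun only until its log matches the .tlg
texlua build.lua save tlb-foo  -- all 3 runs are always performed before the .tlg is written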

@josephwright
Member

I'm calling this fixed ...

@FrankMittelbach
Member Author

I think so ... but my machine is still too slow :-(

@josephwright
Member

@FrankMittelbach Comes down to how many tests one decides to run: we could only pick 'obvious' ones for multiple engines, but in the past it's been the non-obvious ones that have been problematic.

@FrankMittelbach
Member Author

No, it comes down to this being an inexpensive machine that by now is really old, so ... it is simply slow. It is always the non-obvious tests that show errors, so there is a good reason to run them all (I'm not complaining).

@josephwright
Member

@FrankMittelbach One reason people use a branch-and-pull-request workflow: you check stuff in on a branch, the CI does the tests, you only merge when they pass ;)
