Memory model tests #11004
Conversation
Looks like a good idea to me.
- I made some optional comments with proposals to use the standard library instead of writing your own traversals, if you want.
- To me the test names are opaque (AP, LB...); you could consider adding a comment to expand the acronyms in some places, but I guess only memory model experts will look at this file and they know the lingo already?
Great job! Two of the CI jobs (linux-debug and testsuite/normal) fail because the new test runs for more than 10 minutes and is killed on a timeout. Either it's looping or it's taking too long.
You are welcome
This surprises me, I thought I had experienced decent runtimes on Raspberry Pi. How many cores does the CI machine have? Confirmed: about 40 seconds on Raspberry Pi.
I don't know! These are virtual machines provided by GitHub Actions... Note that they are x86-64 machines; maybe you want to test on a PC too.
Tests run much faster on my PC (no virtual machine...). I'll try with less demanding parameters.
The GitHub Actions runners have 2 cores, apparently: https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners
Thanks, this drastically limits memory model testing... I have adapted the parameters to use 2 cores only.
Sounds like the right thing to do, at least for the "standard" runs of the test suite. Some of the machines used in the Jenkins CI at Inria have a fair number of cores (x86 Linux and macOS, ARM 64 Linux and macOS, PowerPC) but others are 2-core virtual machines (x86 Windows, System Z). We could find a way to communicate the number of usable cores to the tests, e.g. via an environment variable...
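The environment-variable idea could be sketched as follows; the variable name `OCAML_TEST_CORES` and the default of 2 are assumptions for illustration, not an agreed convention.

```ocaml
(* Hypothetical sketch: read the usable core count from an
   environment variable (the name is an assumption), defaulting to 2.
   A missing or malformed value falls back to the default. *)
let available_cores =
  match Sys.getenv_opt "OCAML_TEST_CORES" with
  | Some s -> (try max 1 (int_of_string s) with Failure _ -> 2)
  | None -> 2

let () = Printf.printf "running with %d core(s)\n" available_cores
```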
I have followed most of your suggestions, except when there is a risk of building large intermediate lists.
Those names are conventional. It would be quite verbose to document them properly in the code.
The `forbidden` test gathers forbidden and allowed tests in a single `forbidden.ml` source file. All tests are 2-thread tests. By default (ocamltest mode), only forbidden litmus tests with minimal output are run. One behaviour of such litmus tests is illegal according to the OCaml memory model but can show up on relaxed hardware if the implementation is too weak. The test can also be run manually, passing parameters on the command line. In particular, option `-v` also runs the _allowed_ tests. One relaxed behaviour of those tests is legal according to the OCaml memory model and may show up on relaxed hardware or due to legal compiler optimisations. Observing those tests gives some insight into the underlying hardware. Because the CI limits the number of available cores, tests assume that no more than 2 cores are available.
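The shape of such a 2-thread forbidden test can be pictured with a minimal harness; this sketch uses `Domain.spawn` from OCaml 5 and is an illustration of the technique, not the actual code in `forbidden.ml`.

```ocaml
(* Minimal litmus-style sketch: a writer domain and a reader domain
   race on an atomic x and a plain ref y. The outcome
   r0 = 1 && r1 = 0 is forbidden by the OCaml memory model. *)
let run_once () =
  let x = Atomic.make 0 and y = ref 0 in
  let writer = Domain.spawn (fun () -> Atomic.set x 1 ; y := 1) in
  let reader =
    Domain.spawn (fun () ->
        let r0 = !y in
        let r1 = Atomic.get x in
        (r0, r1))
  in
  Domain.join writer ;
  Domain.join reader

let () =
  (* Repeat many times: a single run rarely exhibits any
     interesting interleaving. *)
  for _ = 1 to 10_000 do
    let r0, r1 = run_once () in
    assert (not (r0 = 1 && r1 = 0))
  done
```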
Split `forbidden` test source into several source files which are now shared with the new `publish` test. Also add a Makefile to compile the tests, for direct execution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fully in favour of test coverage for the memory model code, so this is great to see.
I have run these tests (295046e) merged with the Arm64 support (#10972 at e7b3839) and they passed on an M1 Mac mini (8 cores), an AWS Graviton2 (32 cores, ARMv8) and a dual-socket Intel Xeon E5-2695 (72 cores).
Is it easy to add some coverage for the other 'types' of atomic write (specifically `Atomic.compare_and_set` and `Atomic.fetch_and_add`)? The motivation is how best to test commits that alter the C-side memory barriers (such as d7eced9).
You mean replacing `Atomic.set`?
I'm suggesting to parameterise over the kind of atomic write. Could be 'cut-and-paste' if it is only a couple of variants that make sense, rather than trying to restructure the code; not sure what is best.
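One way such a parameterisation could look is sketched below; the variant type and its names are invented for illustration, not taken from the patch.

```ocaml
(* Hypothetical parameterisation over the atomic write used on the
   writer side of a litmus test. *)
type atomic_write = Set | Cas | Faa

let do_write kind x v =
  match kind with
  | Set -> Atomic.set x v
  | Cas ->
      (* succeeds here since there is no concurrent writer to x *)
      ignore (Atomic.compare_and_set x (Atomic.get x) v)
  | Faa ->
      (* adds v; equivalent to a set when x starts at 0 and v = 1 *)
      ignore (Atomic.fetch_and_add x v)

let writer kind (x, y) =
  do_write kind x 1 ;
  y := 1
```

Cut-and-pasting a couple of variants, as suggested, avoids this indirection at the cost of some duplication.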
I have written a few variants, starting from tests MP+AP:

```ocaml
let x = Atomic.make 0 and y = ref 0

let code0 (x, y) =
  Atomic.set x 1 ;
  y := 1

let code1 (x, y) =
  let r0 = !y in
  let r1 = Atomic.get x in
  (r0, r1)
```

exists (r0 = 1 && r1 = 0)

I then replace `Atomic.set`. I have also added two variations of MP+PA (in the code above, the atomic and plain accesses are exchanged).
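An MP+PA variation (atomic and plain accesses exchanged with respect to MP+AP) might look roughly like this; it is a hedged sketch of the shape described above, not the exact test source.

```ocaml
(* MP+PA sketch: the writer does the plain write first and the
   atomic write second; the reader reads the atomic first and the
   plain second. *)
let make () = (ref 0, Atomic.make 0)

let code0 (x, y) =
  x := 1 ;           (* plain write *)
  Atomic.set y 1     (* atomic write *)

let code1 (x, y) =
  let r0 = Atomic.get y in  (* atomic read *)
  let r1 = !x in            (* plain read *)
  (r0, r1)

(* Forbidden outcome: exists (r0 = 1 && r1 = 0) *)
```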
The new primitives replace `Atomic.set` in a few tests. In particular, one aims at testing their "fence effect" when they occur first on the write side of MP tests.
I'm delighted to see this working. These tests will be very useful for the ARMv8 and the Power ports.
I haven't read all the code, but I trust @maranget on getting the tests right.
@maranget and I discussed the interface of the test yesterday. With `ocamltest`, the test is lightweight (2 cores, runs in 10 seconds on my Linux PC) and should run fine on all our CI machines. More demanding parameters can be selected by building the test programs with `make` and running them manually.
I'm sure other approaches than `make` can be considered, but this should suffice for the time being. We can already consider a special job on the Jenkins CI at Inria that would run these tests, and maybe some other tests, with higher degrees of parallelism on CI machines that have enough cores.
I just observed a CI run where forbidden.ml fails with a timeout:
Could we consider tuning the test to make sure that this does not happen? (Otherwise, if it fails every N builds, many people in the future are going to spend some useless time figuring out why their CI build fails.)
Most of the parallel tests fail once in a while during CI, even now that they've been scaled down. This may be because they are sensitive to scheduling, and the VMs used for CI (GitHub Actions as well as Jenkins @ Inria) exhibit bizarre schedules... One possible solution would be to run these parallel tests only on real hardware with enough cores, but not in GitHub Actions nor on the main Jenkins jobs.
I don't know what should be done, but I think that we should strongly avoid tests that fail once in a while from marking the CI as "failed". This has a cost, even to people working on completely unrelated things; these costs quickly add up into making the development environment feel unpleasant. So yes, if we don't have another idea, disabling certain tests on "CI machines" could be a choice. I think it's better than doing nothing at least, or than removing the test altogether of course.
Note: I just observed another test failure due to this test failing (in native, but not in bytecode) with a timeout: https://github.com/ocaml/ocaml/runs/5962530485
A new sub-directory of `testsuite` is created for testing the memory model. A first test `forbidden.ml` checks 2-thread forbidden tests.