
Memory model tests #11004

Merged — 6 commits merged into ocaml:trunk on Feb 18, 2022
Conversation

@maranget (Contributor)

A new sub-directory of the testsuite is created for testing the memory model. A first test, forbidden.ml, checks two-thread forbidden tests.

@gasche (Member) left a comment

Looks like a good idea to me.

  • I made some optional comments proposing to use the standard library instead of writing your own traversals, if you want.
  • To me the test names are opaque (AP, LB...); you could consider adding a comment to expand the acronyms in some places, but I guess only memory-model experts will look at this file and they already know the lingo?

Review comments on testsuite/tests/memory-model/forbidden.ml (outdated, resolved).
@xavierleroy (Contributor)

Great job! There's not only a function named zyva, but also a module named Zyva; this is so nice of you :-)

Two of the CI jobs (linux-debug and testsuite/normal) fail because the new test runs for more than 10 minutes and is killed on a timeout. Either it's looping or it's taking too long.

@maranget (Contributor, Author) commented Feb 11, 2022

Great job! There's not only a function named zyva, but also a module named Zyva; this is so nice of you :-)

You are welcome

Two of the CI jobs (linux-debug and testsuite/normal) fail because the new test runs for more than 10 minutes and is killed on a timeout. Either it's looping or it's taking too long.

This surprises me; I thought I had observed decent runtimes on a Raspberry Pi. How many cores does the CI machine have?

Confirmed: about 40 seconds on a Raspberry Pi.

@xavierleroy (Contributor)

How many cores does the CI machine have?

I don't know! These are virtual machines provided by Github Actions... Note that they are x86-64 machines; maybe you want to test on a PC too.

@maranget (Contributor, Author)

I don't know! These are virtual machines provided by Github Actions... Note that they are x86-64 machines; maybe you want to test on a PC too.

Tests run much faster on my PC (no virtual machine...). I'll try with less demanding parameters.

@xavierleroy (Contributor)

The Github Actions runners have 2 cores, apparently: https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners

@maranget (Contributor, Author)

Thanks, this drastically limits memory-model testing... I will adapt the parameters to use 2 cores only.

@xavierleroy (Contributor)

I will adapt the parameters to use 2 cores only.

Sounds like the right thing to do, at least for the "standard" runs of the test suite.

Some of the machines used in the Jenkins CI at Inria have a fair number of cores (x86 Linux and macOS, ARM 64 Linux and macOS, PowerPC) but others are 2-core virtual machines (x86 Windows, System Z). We could find a way to communicate the number of usable cores to the tests, e.g. via an environment variable...
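Such an environment-variable convention could look like the following sketch (the variable name `OCAML_TEST_CORES` and the 2-core default are assumptions for illustration, not something agreed in this thread):

```ocaml
(* Hypothetical sketch: read the number of usable cores from an
   environment variable, falling back to the 2-core CI default.
   The name OCAML_TEST_CORES is an assumption, not a convention. *)
let available_cores =
  match Sys.getenv_opt "OCAML_TEST_CORES" with
  | Some s -> (try max 1 (int_of_string s) with Failure _ -> 2)
  | None -> 2

let () = Printf.printf "running with %d core(s)\n" available_cores
```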

@maranget (Contributor, Author) commented Feb 11, 2022

  • I made some optional comments proposing to use the standard library instead of writing your own traversals, if you want.

I have followed most of your suggestions, except where there is a risk of building large intermediate lists.

  • To me the test names are opaque (AP, LB...); you could consider adding a comment to expand the acronyms in some places, but I guess only memory-model experts will look at this file and they already know the lingo?

Those names are conventional. It would be quite verbose to document them properly in the code.

The forbidden test gathers forbidden and allowed tests in a
single `forbidden.ml` source file.

All tests are two-thread tests. By default (ocamltest mode),
only forbidden litmus tests with minimal output are run.

One behaviour of each such litmus test is illegal according
to the OCaml memory model but can show up on relaxed
hardware if the implementation is too weak.

The test can also be run manually, changing
parameters on the command line. In particular,
option `-v` commands also running _allowed_ tests.
One relaxed behaviour of those tests is legal according to
the OCaml memory model but will show up on relaxed hardware
or due to legal compiler optimisations. Observing those tests
gives some insight into the underlying hardware.

Due to CI limiting the number of available cores,
tests assume that no more than 2 cores are available.
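For readers unfamiliar with the setup, one iteration of the forbidden "message passing" shape described above can be sketched as follows (a minimal illustration using OCaml 5 domains, not the actual harness in `forbidden.ml`):

```ocaml
(* Minimal sketch of one two-thread litmus iteration: the writer
   publishes through an atomic location, the reader checks that the
   forbidden outcome (saw the later write, missed the earlier one)
   never materialises. *)
let run_once () =
  let x = Atomic.make 0 and y = ref 0 in
  let writer () =
    Atomic.set x 1;  (* atomic write first... *)
    y := 1           (* ...then plain write *)
  in
  let reader () =
    let r0 = !y in
    let r1 = Atomic.get x in
    (r0, r1)
  in
  let d = Domain.spawn reader in
  writer ();
  Domain.join d

let () =
  for _ = 1 to 1000 do
    let (r0, r1) = run_once () in
    (* Forbidden by the OCaml memory model: r0 = 1 && r1 = 0. *)
    assert (not (r0 = 1 && r1 = 0))
  done
```

A real harness runs many more iterations, randomises scheduling, and tallies the observed outcomes instead of asserting.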
@maranget maranget marked this pull request as ready for review February 15, 2022 21:26
Split `forbidden` test source into several source files which
are now shared with the new `publish` test.

Also add a Makefile to compile the tests, for direct execution.
@ctk21 (Contributor) left a comment

I'm fully in favour of test coverage for the memory model code, so this is great to see.

I have run these tests (295046e) merged with the Arm64 support (#10972 at e7b3839) and they passed on an M1 Mac mini (8 cores), an AWS Graviton2 (32 cores, ARMv8) and a dual-socket Intel Xeon E5-2695 (72 cores).

Is it easy to add some coverage for the other 'types' of atomic write (specifically Atomic.compare_and_set and Atomic.fetch_and_add)? The motivation is how best to test commits that alter the C side memory barriers (such as d7eced9).

@maranget (Contributor, Author)

Is it easy to add some coverage for the other 'types' of atomic write (specifically Atomic.compare_and_set and Atomic.fetch_and_add)? The motivation is how best to test commits that alter the C side memory barriers (such as d7eced9).

You mean replacing Atomic.set by the other atomic functions? Or something more sophisticated?

@ctk21 (Contributor) commented Feb 17, 2022

You mean replacing Atomic.set by the other atomic functions? Or something more sophisticated?

I'm suggesting parameterising over Atomic.set, Atomic.compare_and_set and Atomic.fetch_and_add, so that we get executions over all three of the underlying (C) implementations and the barriers in them.

It could be 'cut-and-paste' if only a couple of variants make sense, rather than trying to restructure the code; I'm not sure what is best.
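One way to run the same litmus shape over all three primitives is to abstract the writer over the atomic-write operation; a rough sketch (the names `write_kind`, `atomic_write` and `writer` are illustrative, not the PR's actual structure):

```ocaml
(* Illustrative sketch: one writer parameterised over the atomic-write
   primitive, so the same test shape exercises all three C-side
   implementations and their barriers. *)
type write_kind = Set | Cas | Faa

let atomic_write kind (a : int Atomic.t) v =
  match kind with
  | Set -> Atomic.set a v
  | Cas -> ignore (Atomic.compare_and_set a (Atomic.get a) v)
  | Faa -> ignore (Atomic.fetch_and_add a v)  (* adds v; from 0 this yields v *)

let writer kind x y =
  atomic_write kind x 1;  (* atomic write (store, CAS or RMW) first *)
  y := 1                  (* plain write second *)
```

The semantics of the three primitives differ, but starting from 0 and writing/adding 1 they all leave the location at 1, so the reader side needs no change.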

@maranget (Contributor, Author)

I have written a few variants, starting from test MP+AP:

let x = Atomic.make 0 and y = ref 0

let code0 (x, y) =
  Atomic.set x 1;
  y := 1

let code1 (x, y) =
  let r0 = !y in
  let r1 = Atomic.get x in
  {r0; r1}

exists (r0=1 && r1=0)

I then replace Atomic.set in code0 by Atomic.fetch_and_add and Atomic.compare_and_set. This should check the "fence effect" of atomic write primitives (a.k.a. C-side memory barriers in d7eced9).

I have also added two variants of MP+PA (the code above with roles swapped: x is a reference and y is an atomic int).
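The MP+PA shape mentioned here, with the roles of the two locations swapped, would look roughly like this (a sketch, not the exact source of the test):

```ocaml
(* Sketch of MP+PA: x is a plain reference, y is the atomic location. *)
let x = ref 0 and y = Atomic.make 0

let code0 (x, y) =
  x := 1;            (* plain write first *)
  Atomic.set y 1     (* atomic write publishes it *)

let code1 (x, y) =
  let r0 = Atomic.get y in
  let r1 = !x in
  (r0, r1)

(* exists (r0=1 && r1=0) is the forbidden outcome: a read that observes
   the atomic write must also observe the plain write before it. *)
```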

The new primitives replace `Atomic.set` in a few tests. In particular,
one aims at testing their "fence effect" when occurring first
in the write side of MP tests.
@xavierleroy (Contributor) left a comment

I'm delighted to see this working. These tests will be very useful for the ARMv8 and the Power ports.

I haven't read all the code, but I trust @maranget on getting the tests right.

@maranget and I discussed the interface of the test yesterday. With ocamltest, the test is lightweight (2 cores, runs in 10 seconds on my Linux PC) and should run fine on all our CI machines. More demanding parameters can be selected by building the test programs with make and running them manually.

I'm sure other approaches than make can be considered, but this should suffice for the time being. We can already consider a special job on the Jenkins CI at Inria that would run these tests and maybe some other tests with higher degrees of parallelism on CI machines that have enough cores.

@xavierleroy xavierleroy merged commit dc79f48 into ocaml:trunk Feb 18, 2022
@gasche (Member)

gasche commented Mar 3, 2022

I just observed a CI run where forbidden.ml fails with a timeout:

2022-03-03T15:53:19.6746434Z Running tests from 'tests/memory-model' ...
2022-03-03T15:53:19.6747276Z  ... testing 'forbidden.ml' with 1 (native) => Timeout expired, killing all child processes
2022-03-03T15:53:19.6747762Z Process 17185 got signal 9(Killed), no core dumped
2022-03-03T15:53:19.6748541Z failed (Running program /home/runner/work/ocaml/ocaml/testsuite/tests/memory-model/_ocamltest/tests/memory-model/forbidden/ocamlopt.byte/forbidden.opt without any argument: command
2022-03-03T15:53:19.6749376Z /home/runner/work/ocaml/ocaml/testsuite/tests/memory-model/_ocamltest/tests/memory-model/forbidden/ocamlopt.byte/forbidden.opt 
2022-03-03T15:53:19.6749903Z failed with exit code -9)

Could we consider tuning the test to make sure that this does not happen? (Otherwise if it fails every N builds, many people in the future are going to spend some useless time figuring out why their CI build fails.)

@xavierleroy (Contributor)

Most of the parallel tests fail once in a while during CI, even now that they've been scaled down. This may be because they are sensitive to scheduling, and the VMs used for CI (Github Actions as well as Jenkins @ Inria) exhibit bizarre schedules... One possible solution would be to run these parallel tests only on real hardware with enough cores, but not in Github Actions nor on the main Jenkins jobs.

@gasche (Member)

gasche commented Mar 3, 2022

I don't know what should be done, but I think we should strongly avoid tests that fail once in a while marking the CI as "failed". This has a cost, even for people working on completely unrelated things; these costs quickly add up and make the development environment feel unpleasant.

So yes, if we don't have another idea, disabling certain tests on "CI machines" could be a choice. It's better than doing nothing, at least, and certainly better than removing the test altogether.

@gasche (Member)

gasche commented Apr 10, 2022

Note: I just observed another test failure due to this test failing (in native, but not in bytecode) with a timeout: https://github.com/ocaml/ocaml/runs/5962530485 .

4 participants