Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lkmm making rmw barriers explicit #865

Conversation

hernanponcedeleon
Copy link
Contributor

This PR implements the changes discussed in [1,2]

The idea is to

  • change herd7's source code to avoid explicitly creating Mb fences before and after RMWs
  • propagate Mb tags from RMW to the corresponding read and writes
  • update the lkmm model to enforce the ordering that is now missing due to the removed fences.

I tested the changes using the scripts from the kernel (tools/memory-model/scripts) and I do get the expected results.

I also added a test case showing that the first two cases alone are not enough and indeed the model needs to be updated.

I have a few open questions and thus the DRAFT status

  • do my changes affect other parts of the source code I am not aware of?
  • do we need to update the bell file to allow mb in RMW instructions as mentioned in [3]? I would expect so, but I still get the correct results without this change
  • does the code looks ok? can we do further simplifications? This is my first experience with OCaml. While the changes have the impact I expect, I am not sure this is the best way to achieve the points from above.

NOTE: the proposed changes in the cat file are not final. I just used one of the alternatives discussed in the mailing list.

[1] https://lkml.org/lkml/2024/4/4/1236
[2] https://lkml.org/lkml/2024/5/15/1026
[3] https://lkml.org/lkml/2024/5/21/357

@JonasOberhauser
Copy link

JonasOberhauser commented May 23, 2024

Do you really need all of those matches? It seems to me most of those are just identity now, e.g.,

                 (match a with ["acquire"] -> MOorAN.AN a | ["mb"] -> MOorAN.AN a | _ -> an_once)

would be

                 MOorAN.AN a

and

                (match a with ["release"] -> a | ["mb"] -> a | _ -> a_once)

would be

               a

@hernanponcedeleon
Copy link
Contributor Author

It seems to me most of those are just identity now

I think this is not true. Otherwise, we would end up we R[release] and W[acquire] events.

@JonasOberhauser
Copy link

JonasOberhauser commented May 24, 2024

Good point, I forgot about those.

I would exclude those in the .bell, although perhaps there ought to be a way to either state what tags apply to each read/write or how to "decay" barriers where they don't belong - one could say that if one declares R[x,y,z] then every tag other than x y z on a R decays to x.

@akiyks
Copy link
Contributor

akiyks commented May 26, 2024

It looks to me like what your are proposing here ends up in backward incompatible changes in herd7.
I mean herd7 should keep its behavior with regard to existing CAT/BELL code.

Please note that you can not expect every existing out-of-tree CAT/BELL code be updated in sync with herd7.

So I think you need to come up with a more considerate approach which won't need changes in existing code.

Or am saying something unreasonable?

@hernanponcedeleon
Copy link
Contributor Author

It looks to me like what your are proposing here ends up in backward incompatible changes in herd7. I mean herd7 should keep its behavior with regard to existing CAT/BELL code.

It is true that if we change herd7, but the cat model is not updated, one gets different results.

Please note that you can not expect every existing out-of-tree CAT/BELL code be updated in sync with herd7.

Backward compatibility is probably a must for previous lkmm versions in the kernel, but I assume the changes in the cat model can be backported to all stable kernel versions since lkmm was introduced.

My (probably naive) question is: do we really have a backward compatibility requirement for cat/bell code outside the kernel?

Notice that lkmm has evolved since its first version and it definitely has changes that modify the expected results on some litmus tests. Thus, if you are using an old cat file, you anyway get "wrong" results.

Wouldn't it be sufficient to have this backward incompatibility clearly documented somewhere? E.g., not sure if herd7 uses semantic versioning, but there are proper ways of documenting such kind of changes.

@JonasOberhauser
Copy link

It looks to me like what your are proposing here ends up in backward incompatible changes in herd7. I mean herd7 should keep its behavior with regard to existing CAT/BELL code.

Please note that you can not expect every existing out-of-tree CAT/BELL code be updated in sync with herd7.

There's two ways, one is to:

  • update the LKMM first in a way that works with either version of herd
  • wait a long time so that people can catch up
  • update the herd model in a way that doesn't break the new LKMM code, but can break the old one
  • wait a long time
  • update the LKMM with everything that relies on the new herd

The other is not to update herd at all, and just hide its hardcoded behaviors as much as possible.

In my opinion, all of this hardcoded behavior ought to be removed as soon as possible, independently of whether we add a fuller macro language and define the representations explicitly in .def, or change the representations and define analogous semantics in .cat and .bell.

@hernanponcedeleon hernanponcedeleon force-pushed the lkmm-making-rmw-barriers-explicit branch 2 times, most recently from 3a3d387 to d462f13 Compare July 12, 2024 05:01
@hernanponcedeleon
Copy link
Contributor Author

The latest version of the code solves the backward compatibility mentioned by @akiyks.

I added an option that allows to skip the hardcoded barriers, but by default herd7 still behaves the same.
When the flag is set, using all the patches mentioned here and here, I get correct results for all litmus tests from this repo.

@hernanponcedeleon hernanponcedeleon changed the title [DRAFT] Lkmm making rmw barriers explicit Lkmm making rmw barriers explicit Jul 15, 2024
@akiyks
Copy link
Contributor

akiyks commented Jul 16, 2024

The latest version of the code solves the backward compatibility mentioned by @akiyks.

I added an option that allows to skip the hardcoded barriers, but by default herd7 still behaves the same. When the flag is set, using all the patches mentioned here and here, I get correct results for all litmus tests from this repo.

Thank you for taking your time for backward compatibility!

That said, if I mistakenly put the -lkmm-legacy false option when using a legacy LKMM model, I would get the slightly different results WRT atomic ops. This looks to me as another mode of backward incompatibility.

Wouldn't it possible to put a version info somewhere in your new LKMM model, and make herd7 -lkmm-legacy false error out if the version info is missing? Latest release of herd7 doesn't know of -lkmm-legarcy and it won't able to run the new LKMM model, so there is no compatibility issue in this combination.

@hernanponcedeleon
Copy link
Contributor Author

Wouldn't it possible to put a version info somewhere in your new LKMM model, and make herd7 -lkmm-legacy false error out if the version info is missing? Latest release of herd7 doesn't know of -lkmm-legarcy and it won't able to run the new LKMM model, so there is no compatibility issue in this combination.

One possibility would be to implement the legacy mode as an "architecture variant" rather than as an option and then use something like this in the new model

flag ~empty (if "lkmm-legacy" then 0 else _)
  as new-lkmm-models-does-not-support-variant-legacy

Some arm models use something like this.

Strictly speaking, this would not make herd7 to error out, but at least it would flag that the model is not being used as intended.

@hernanponcedeleon
Copy link
Contributor Author

The code is fully functional with the changes proposed in the LKML and backwards compatible by default.

@maranget would you mind taking a look to this PR?

@hernanponcedeleon
Copy link
Contributor Author

There is a new patch series in the linux kernel mailing list depending on these changes. It would be great if somebody can take a look.

@maranget
Copy link
Member

Hi @hernanponcedeleon. I am willing to merge. Is the default behaviour still the intended one ?

@hernanponcedeleon
Copy link
Contributor Author

hernanponcedeleon commented Sep 20, 2024

Hi @hernanponcedeleon. I am willing to merge.

Thanks @maranget for taking a look. Can we wait a bit to see if someone requests changes on the patch series? If those need further changes in herd, I would prefer to have those integrated in this PR as a single commit.

Is the default behaviour still the intended one ?

Yes, this is to guarantee backwards compatibility with all versions of the model as requested by Akira.

@maranget
Copy link
Member

Ok, let us wait. I have checked that the PR does not impact C11 tests. Everything is fine.

…xplicit Mb fences and adding noreturn tags

Signed-off-by: Hernan Ponce de Leon <zeta96@gmail.com>
@hernanponcedeleon
Copy link
Contributor Author

@maranget the last commit just renames the variants naming to start from v1 instead of v0 (as requested by @akiyks). There are no functional changes in that commit.

@akiyks
Copy link
Contributor

akiyks commented Oct 2, 2024

@maranget the last commit just renames the variants naming to start from v1 instead of v0 (as requested by @akiyks). There are no functional changes in that commit.

Yes. This works as expected against Jonas' v4 patch series at https://lore.kernel.org/lkmm/20240930105710.383284-1-jonas.oberhauser@huaweicloud.com/.

@maranget, this should be good for merging.

Thanks!

maranget added a commit that referenced this pull request Oct 2, 2024
…king-rmw-barriers-explicit'

LKMM upgrade
@maranget
Copy link
Member

maranget commented Oct 2, 2024

Merged by hand, thanks @hernanponcedeleon, @akiyks, @JonasOberhauser.

@maranget maranget closed this Oct 2, 2024
maranget added a commit that referenced this pull request Oct 10, 2024
[LKMM] Fix regression of atomic_add_unless 

@akiyks noticed I introduced a regression in c0ce66d (PR #865). In the version previous to that commit, `atomic_add_unless` could be used even if not properly defined in the *.bell file.

This PR restores the previous behavior, but it also allows to "redefine" the `atomic_add_unless` macro in the *.bell file.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants