Join GitHub today
GitHub is home to over 31 million developers working together to host and review code, manage projects, and build software together.
Sign upA Formal C Memory Model Supporting Integer-Pointer Casts #30
Comments
nikomatsakis
added
the
K-Related-Work
label
Sep 21, 2016
This comment has been minimized.
This comment has been minimized.
|
My notes: First, keep in mind that most of the examples are equally valid when Second, in Rust we probably want to have the physical addresses only be legal for limited periods of time. In fact, that basically results in a variant of the ACA model where all mallocs are asserted.
It might not be that important for performance, but LLVM will happily do the optimization for us, so it has to be a part of our memory model. |
This comment has been minimized.
This comment has been minimized.
Ericson2314
commented
Sep 21, 2016
|
I need to read the paper more closely, but I wonder whether it was considered to do dataflow analysis on cast(ed) pointers---i.e. once a pointer is casted, only functions that get the cast-result integer or the original pointer are allowed/assumed to make arbitrary reads/writes to the block in which the pointer points. (On the Rust side of things, I've similar thought about doing dataflow analysis on "poisoned" unsafe values.) |
This comment has been minimized.
This comment has been minimized.
|
Memory models don't do dataflow. Also, any attempt at interesting analysis runs into the problem that optimizers will optimize |
This comment has been minimized.
This comment has been minimized.
jeehoonkang
commented
Sep 22, 2016
•
|
Hello, I am an author of the paper, and would like to participate in this discussion. I am really glad that the Rust community is interested in the paper!
|
This comment has been minimized.
This comment has been minimized.
|
Thanks for coming to talk to us @jeehoonkang. We'll take a look at the relaxed-memory paper, but currently we have enough trouble getting the single-threaded semantics right. @orilahav's previous taming-acquire-release paper was excellent, so I hope this one is similarly good. First, the ACA and capability models are our currently very-informal-and-incomplete candidate models (see #26 and #28). It would be nice if someone in the know came to look and them. I don't like a "dataflow"-style model for the same reason as you. However, C has its odd "counterfactual" definition for restrict - a pointer expression
That's the entire point of the memory model: If you keep to hereditarily-safe Rust, you could just have the "fully symbolic" memory model. Correctly-written unsafe Rust is supposed to be UB-free, and we are trying to find a good enough definition of UB that programmers, sanitizers and optimizers can agree on. Also, in Rust, the widely-used raw pointers are pretty much interconvertible with integers (at the very least, int<->raw-ptr casts are safe and therefore must never be UB). The ACA model's idea is to handle raw pointers just like integers, which works moderately well. If we can tame LLVM (it's not much use having a memory model that your optimizer misoptimizes by design!), I think that we can have something that handles raw pointers and integers separately. |
This comment has been minimized.
This comment has been minimized.
|
I feel like we have to treat raw pointers and integers separately, because LLVM does. Yes, you can cast them, but for example LLVM feels free to reorder accesses to pointers it deems "noaliasing" even if, after casting them to integers, they compare equal. |
This comment has been minimized.
This comment has been minimized.
jugglerchris
commented
Sep 22, 2016
|
I'm a bit of an outsider, but I think it would be a shame if the rules for writing unsafe code ended up being compromised by the current LLVM (which I consider an implementation detail) at the expense of ease of understanding (for "unsafe" programmers) or possible future optimisations (for future Rust implementations, or updated LLVM). |
This comment has been minimized.
This comment has been minimized.
gilhur
commented
Sep 22, 2016
|
@arielb1 We have a formal model for "noalis" in LLVM that can be very well explained in our pointer-integer casting model (as long as our understanding of no alias is correct). I or Jeehoon will get back with more details tmr because it is midnight in Korea :) |
This comment has been minimized.
This comment has been minimized.
You can compare pointers even without casting them to integers. |
This comment has been minimized.
This comment has been minimized.
Our memory model does need to be compatible with LLVM's or it won't be worth much. |
This comment has been minimized.
This comment has been minimized.
jugglerchris
commented
Sep 22, 2016
Yes, of course. Are changes/enhancements to LLVM to support a better overall Rust memory model an option that would be considered? I understand that resources are limited for such work. |
This comment has been minimized.
This comment has been minimized.
sanxiyn
commented
Sep 28, 2016
|
@gilhur Can you tell us more about this formal model for noalias in LLVM? |
This comment has been minimized.
This comment has been minimized.
|
Just FYI, I took a stab at summarizing the "promising semantics" paper in #32, though I don't think I did it justice. =) Still digesting. |
This comment has been minimized.
This comment has been minimized.
burdges
commented
Sep 30, 2016
•
|
Interesting SOPS'13 paper on analyzing optimization-unstable code : Towards Optimization-Safe Systems: Analyzing the Impact of Undefined Behavior (SOSP'13) by Xi Wang, Nickolai Zeldovich, M. Frans Kaashoek, and Armando Solar-Lezama. |
This comment has been minimized.
This comment has been minimized.
gilhur
commented
Oct 4, 2016
|
@sanxiyn The formal model for noalias in LLVM essentially requires the notion of logical memory, which is the underlying model for CompCert and also for our ptr-int casting model. I am a bit hesitant to explain the idea here because we haven't published it yet. I will write a short tech report about it and give a link to it here. But please wait until the PLDI deadline in Nov because I am too busy until then. |
This comment has been minimized.
This comment has been minimized.
Zoxc
commented
Jul 23, 2017
|
@gilhur I would like some information about your model for noalias/pointer aliasing in LLVM, especially for things relating to pointer equality and pointer comparisons. |
This comment has been minimized.
This comment has been minimized.
gilhur
commented
Jul 28, 2017
|
@Zoxc We are currently working on developing an LLVM model for handling pointers including casting between pointers and integers, and pointer equality/comparison. We also update the LLVM compiler accordingly including the alias analysis module. Probably we can make some written documents available within a few months. I will notify you here when it is out. |
nikomatsakis commentedSep 21, 2016
•
edited by arielb1
Links: PDF, ACM
This is a memory model targeting C. The C specification itself has very loose rules that don't support a lot of common things people do in C programs, primarily around using pointers as integers (e.g., bitmasking, etc).
The key ideas of the model are as follows:
ito a pointerp, you have to assume that accesses throughpcould affect any block with a physical address, but you know they can't affect anything that doesn't have a physical address.Some examples from the paper:
Keeping local variables private (supported)
Consider this question:
In this model, the answer is no: the allocation of
ais private, and at no point did the code do something like(int) &ato convert it into a concrete block.Keeping local allocations private (supported)
Consider this question:
In this model, the answer is no: the allocation of
pis private, and at no point did the code do something like(int) pto convert it into a concrete block.The paper calls this "ownership transfer". I've avoided this term because it's so different from what Rust means by this term. It's more like "retaining" or "respecting" ownership (of
p).The paper says that
clang -O2does this kind of optimization, but notgcc -O2, and claims therefore that it may not be that important.Dead Cast Elimination (unsupported in the model)
Consider this question (not from paper):
In this model, we cannot eliminate
(int) peven if the result is not used because it has a side-effect of (potentially) assigningpa concrete value.The paper addresses this by doing this optimization later in the pipeline. Basically at the point where we drop this model and convert to something more concrete.
Example limitation
Here is an example of an optimization prohibited by this approach.
The answer is no:
pwas made "concrete", and hence may be affected bybar(). Interestingly, a slight reordering would enable the optimization: