C++: Better IR for varargs #3092

dbartol · 2020-03-19T23:49:24Z

This PR changes the IR we generate for functions that accept a variable argument list. Rather than simply using BuiltInOperationInstruction to model the various va_* macros as mysterious function-like operations, we now model them in more detail. The intent is to enable better alias analysis and taint flow through varargs.

The va_start macro now generates a unary VarArgsStart instruction that takes the address of the ellipsis pseudo-parameter as its operand, and returns a value of type std::va_list. This value is then stored into the actual std::va_list variable via a regular Store.

The va_arg macro now loads the std::va_list argument, then emits a VarArg instruction on the result. This returns the address of the vararg argument to be loaded. That address is later used as the address operand of a regular Load to return the value of the argument. To model the side effect of moving to the next argument, we emit a NextVarArg instruction that takes the previous std::va_list value and returns an updated one, which is then stored back into the std::va_list variable.

The va_end macro just emits a VarArgsEnd unary instruction that takes the address of the std::va_list argument and does nothing, since va_end doesn't really do anything on most compiler implementations anyway.

The va_copy macro is just modeled as a plain copy.

Next up: Vararg-aware alias analysis (at least enough to get us to the point where we can thread taint flow).

This PR changes the IR we generate for functions that accept a variable argument list. Rather than simply using `BuiltInOperationInstruction` to model the various `va_*` macros as mysterious function-like operations, we now model them in more detail. The intent is to enable better alias analysis and taint flow through varargs. The `va_start` macro now generates a unary `VarArgsStart` instruction that takes the address of the ellipsis pseudo-parameter as its operand, and returns a value of type `std::va_list`. This value is then stored into the actual `std::va_list` variable via a regular `Store`. The `va_arg` macro now loads the `std::va_list` argument, then emits a `VarArg` instruction on the result. This returns the address of the vararg argument to be loaded. That address is later used as the address operand of a regular `Load` to return the value of the argument. To model the side effect of moving to the next argument, we emit a `NextVarArg` instruction that takes the previous `std::va_list` value and returns an updated one, which is then stored back into the `std::va_list` variable. The `va_end` macro just emits a `VarArgsEnd` unary instruction that takes the address of the `std::va_list` argument and does nothing, since `va_end` doesn't really do anything on most compiler implementations anyway. The `va_copy` macro is just modeled as a plain copy.

jbj · 2020-03-20T11:12:29Z

The std::va_list type can effectively be a pointer or an array, depending on how it's used. I've only looked at the pretty-printed IR diff so far, but it looks like the va_arg macro previously let the array decay to a pointer before computing the address of the vararg to be loaded. That means it should behave the same (equally bad) with a va_list local var and a va_list parameter. But the new code generation for va_arg doesn't do the array-to-pointer conversion, so does that mean it loads and stores to a pointer instead of an array when va_arg is used as a parameter? Is that desired?

The same considerations apply to va_copy (see the discussion at https://linux.die.net/man/3/va_copy), but va_copy is of course not a major priority.

In any case, please add (or point me to) a test where va_start is in one function, and va_arg is in another.

dbartol · 2020-03-20T14:55:40Z

@jbj I thought I had the whole pointer-vs-array thing figured out, but clearly I should have added the test case you suggest, which would have shown me what I was missing. I'll add that test case and update the codegen accordingly.

In the Unix ABI, `std::va_list` is defined as `typedef struct __va_list_tag { ... } va_list[1];`, which means that any `std::va_list` used as a function parameter decays to `struct __va_list_tag*`. Handling this actually made the QL code slightly cleaner. The only tricky bit is that we have to determine what type to use as the actual `va_list` type when loading, storing, or modifying a `std::va_list`. To do this, we look at the type of the argument to the `va_*` macro. A detailed QLDoc comment explains the details. I added a test case for passing a `va_list` as an argument, and then manipulating that `va_list` in the callee.

dbartol · 2020-03-20T16:55:18Z

Fixing the va_list parameter case actually managed to make the code cleaner. Go figure.

jbj

Looks good. The new TranslatedElements have a daunting amount of detail in them, but it looks mostly routine, and the test output looks as I'd expect.

dbartol added enhancement New feature or request C++ labels Mar 19, 2020

dbartol requested review from jbj and rdmarsh2 March 19, 2020 23:49

dbartol requested review from a team as code owners March 19, 2020 23:49

C++: Fix formatting

fb71f78

jbj approved these changes Mar 23, 2020

View reviewed changes

jbj merged commit 1346592 into github:master Mar 23, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

C++: Better IR for varargs #3092

C++: Better IR for varargs #3092

Uh oh!

dbartol commented Mar 19, 2020

Uh oh!

jbj commented Mar 20, 2020

Uh oh!

dbartol commented Mar 20, 2020

Uh oh!

dbartol commented Mar 20, 2020

Uh oh!

jbj left a comment

Uh oh!

Uh oh!

C++: Better IR for varargs #3092

C++: Better IR for varargs #3092

Uh oh!

Conversation

dbartol commented Mar 19, 2020

Uh oh!

jbj commented Mar 20, 2020

Uh oh!

dbartol commented Mar 20, 2020

Uh oh!

dbartol commented Mar 20, 2020

Uh oh!

jbj left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!