Skip to content

Conversation

dbartol
Copy link

@dbartol dbartol commented Mar 19, 2020

This PR changes the IR we generate for functions that accept a variable argument list. Rather than simply using BuiltInOperationInstruction to model the various va_* macros as mysterious function-like operations, we now model them in more detail. The intent is to enable better alias analysis and taint flow through varargs.

The va_start macro now generates a unary VarArgsStart instruction that takes the address of the ellipsis pseudo-parameter as its operand, and returns a value of type std::va_list. This value is then stored into the actual std::va_list variable via a regular Store.

The va_arg macro now loads the std::va_list argument, then emits a VarArg instruction on the result. This returns the address of the vararg argument to be loaded. That address is later used as the address operand of a regular Load to return the value of the argument. To model the side effect of moving to the next argument, we emit a NextVarArg instruction that takes the previous std::va_list value and returns an updated one, which is then stored back into the std::va_list variable.

The va_end macro just emits a VarArgsEnd unary instruction that takes the address of the std::va_list argument and does nothing, since va_end doesn't really do anything on most compiler implementations anyway.

The va_copy macro is just modeled as a plain copy.

Next up: Vararg-aware alias analysis (at least enough to get us to the point where we can thread taint flow).

This PR changes the IR we generate for functions that accept a variable argument list. Rather than simply using `BuiltInOperationInstruction` to model the various `va_*` macros as mysterious function-like operations, we now model them in more detail. The intent is to enable better alias analysis and taint flow through varargs.

The `va_start` macro now generates a unary `VarArgsStart` instruction that takes the address of the ellipsis pseudo-parameter as its operand, and returns a value of type `std::va_list`. This value is then stored into the actual `std::va_list` variable via a regular `Store`.

The `va_arg` macro now loads the `std::va_list` argument, then emits a `VarArg` instruction on the result. This returns the address of the vararg argument to be loaded. That address is later used as the address operand of a regular `Load` to return the value of the argument. To model the side effect of moving to the next argument, we emit a `NextVarArg` instruction that takes the previous `std::va_list` value and returns an updated one, which is then stored back into the `std::va_list` variable.

The `va_end` macro just emits a `VarArgsEnd` unary instruction that takes the address of the `std::va_list` argument and does nothing, since `va_end` doesn't really do anything on most compiler implementations anyway.

The `va_copy` macro is just modeled as a plain copy.
@dbartol dbartol added enhancement New feature or request C++ labels Mar 19, 2020
@dbartol dbartol requested review from jbj and rdmarsh2 March 19, 2020 23:49
@dbartol dbartol requested review from a team as code owners March 19, 2020 23:49
@jbj
Copy link
Contributor

jbj commented Mar 20, 2020

The std::va_list type can effectively be a pointer or an array, depending on how it's used. I've only looked at the pretty-printed IR diff so far, but it looks like the va_arg macro previously let the array decay to a pointer before computing the address of the vararg to be loaded. That means it should behave the same (equally bad) with a va_list local var and a va_list parameter. But the new code generation for va_arg doesn't do the array-to-pointer conversion, so does that mean it loads and stores to a pointer instead of an array when va_arg is used as a parameter? Is that desired?

The same considerations apply to va_copy (see the discussion at https://linux.die.net/man/3/va_copy), but va_copy is of course not a major priority.

In any case, please add (or point me to) a test where va_start is in one function, and va_arg is in another.

@dbartol
Copy link
Author

dbartol commented Mar 20, 2020

@jbj I thought I had the whole pointer-vs-array thing figured out, but clearly I should have added the test case you suggest, which would have shown me what I was missing. I'll add that test case and update the codegen accordingly.

In the Unix ABI, `std::va_list` is defined as `typedef struct __va_list_tag { ... } va_list[1];`, which means that any `std::va_list` used as a function parameter decays to `struct __va_list_tag*`. Handling this actually made the QL code slightly cleaner. The only tricky bit is that we have to determine what type to use as the actual `va_list` type when loading, storing, or modifying a `std::va_list`. To do this, we look at the type of the argument to the `va_*` macro. A detailed QLDoc comment explains the details.

I added a test case for passing a `va_list` as an argument, and then manipulating that `va_list` in the callee.
@dbartol
Copy link
Author

dbartol commented Mar 20, 2020

Fixing the va_list parameter case actually managed to make the code cleaner. Go figure.

Copy link
Contributor

@jbj jbj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. The new TranslatedElements have a daunting amount of detail in them, but it looks mostly routine, and the test output looks as I'd expect.

@jbj jbj merged commit 1346592 into github:master Mar 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C++ enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants