Skip to content

C++: Better IR for varargs #3092

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Mar 23, 2020

Conversation

dbartol
Copy link

@dbartol dbartol commented Mar 19, 2020

This PR changes the IR we generate for functions that accept a variable argument list. Rather than simply using BuiltInOperationInstruction to model the various va_* macros as mysterious function-like operations, we now model them in more detail. The intent is to enable better alias analysis and taint flow through varargs.

The va_start macro now generates a unary VarArgsStart instruction that takes the address of the ellipsis pseudo-parameter as its operand, and returns a value of type std::va_list. This value is then stored into the actual std::va_list variable via a regular Store.

The va_arg macro now loads the std::va_list argument, then emits a VarArg instruction on the result. This returns the address of the vararg argument to be loaded. That address is later used as the address operand of a regular Load to return the value of the argument. To model the side effect of moving to the next argument, we emit a NextVarArg instruction that takes the previous std::va_list value and returns an updated one, which is then stored back into the std::va_list variable.

The va_end macro just emits a VarArgsEnd unary instruction that takes the address of the std::va_list argument and does nothing, since va_end doesn't really do anything on most compiler implementations anyway.

The va_copy macro is just modeled as a plain copy.

Next up: Vararg-aware alias analysis (at least enough to get us to the point where we can thread taint flow).

This PR changes the IR we generate for functions that accept a variable argument list. Rather than simply using `BuiltInOperationInstruction` to model the various `va_*` macros as mysterious function-like operations, we now model them in more detail. The intent is to enable better alias analysis and taint flow through varargs.

The `va_start` macro now generates a unary `VarArgsStart` instruction that takes the address of the ellipsis pseudo-parameter as its operand, and returns a value of type `std::va_list`. This value is then stored into the actual `std::va_list` variable via a regular `Store`.

The `va_arg` macro now loads the `std::va_list` argument, then emits a `VarArg` instruction on the result. This returns the address of the vararg argument to be loaded. That address is later used as the address operand of a regular `Load` to return the value of the argument. To model the side effect of moving to the next argument, we emit a `NextVarArg` instruction that takes the previous `std::va_list` value and returns an updated one, which is then stored back into the `std::va_list` variable.

The `va_end` macro just emits a `VarArgsEnd` unary instruction that takes the address of the `std::va_list` argument and does nothing, since `va_end` doesn't really do anything on most compiler implementations anyway.

The `va_copy` macro is just modeled as a plain copy.
@dbartol dbartol added enhancement New feature or request C++ labels Mar 19, 2020
@dbartol dbartol requested review from jbj and rdmarsh2 March 19, 2020 23:49
@dbartol dbartol requested review from a team as code owners March 19, 2020 23:49
@jbj
Copy link
Contributor

jbj commented Mar 20, 2020

The std::va_list type can effectively be a pointer or an array, depending on how it's used. I've only looked at the pretty-printed IR diff so far, but it looks like the va_arg macro previously let the array decay to a pointer before computing the address of the vararg to be loaded. That means it should behave the same (equally bad) with a va_list local var and a va_list parameter. But the new code generation for va_arg doesn't do the array-to-pointer conversion, so does that mean it loads and stores to a pointer instead of an array when va_arg is used as a parameter? Is that desired?

The same considerations apply to va_copy (see the discussion at https://linux.die.net/man/3/va_copy), but va_copy is of course not a major priority.

In any case, please add (or point me to) a test where va_start is in one function, and va_arg is in another.

@dbartol
Copy link
Author

dbartol commented Mar 20, 2020

@jbj I thought I had the whole pointer-vs-array thing figured out, but clearly I should have added the test case you suggest, which would have shown me what I was missing. I'll add that test case and update the codegen accordingly.

In the Unix ABI, `std::va_list` is defined as `typedef struct __va_list_tag { ... } va_list[1];`, which means that any `std::va_list` used as a function parameter decays to `struct __va_list_tag*`. Handling this actually made the QL code slightly cleaner. The only tricky bit is that we have to determine what type to use as the actual `va_list` type when loading, storing, or modifying a `std::va_list`. To do this, we look at the type of the argument to the `va_*` macro. A detailed QLDoc comment explains the details.

I added a test case for passing a `va_list` as an argument, and then manipulating that `va_list` in the callee.
@dbartol
Copy link
Author

dbartol commented Mar 20, 2020

Fixing the va_list parameter case actually managed to make the code cleaner. Go figure.

Copy link
Contributor

@jbj jbj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. The new TranslatedElements have a daunting amount of detail in them, but it looks mostly routine, and the test output looks as I'd expect.

@jbj jbj merged commit 1346592 into github:master Mar 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C++ enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants