Skip to content

Conversation

hvitved
Copy link
Contributor

@hvitved hvitved commented Nov 29, 2021

This PR generalizes argument-to-parameter matching in the shared data-flow library from simple integer position based matching to per-language defined matching.

Why

For static languages, integer based matching has worked just fine, since each call knows the signature of the callee(s), and adjustments of argument positions can be made accordingly at the call sites. For example, if we have a C# call

Foo(snd: x, fst: y)

which targets a Foo(int fst, int snd) method, but swaps the order using named arguments, then we know that x must have position 1 and y must have position 0. Similarly, for calls to methods with variadic arguments

Bar(1, 2, 3, 4)

where the signature is Bar(params int[] args), we can treat the call as syntactic sugar for

Bar(new int[] {1, 2, 3, 4})

that is, 1..4 are no longer ArgumentNodes, but instead there is a synthesized array argument node with position 0.

However, for dynamic languages we do not know the signature of the callee(s), so we may have a Ruby call

foo(1, 2, 3, 4)

that targets both a foo(a, b, c, d) function and a foo(first, *mid, last) function, so whether 2 and 3 should be wrapped in an array depends on the callee, and what gets passed into last depends on the number of arguments at the call sites.

How

This PR introduces two new classes ParameterPosition and ArgumentPosition, and a predicate parameterMatch(ParameterPosition ppos, ArgumentPosition apos) to the interface of the shared data-flow library. Each ParameterNode has a ParameterPosition, each ArgumentNode has an ArgumentPosition, and an argument is compatible with a parameter as dictated by parameterMatch. (Note that parameterMatch does not rely on the call graph, we already have viableCallable for that.)

Defining ParameterPosition = ArgumentPosition = int and parameterMatch(ParameterPosition pp, ArgumentPosition ap) { pp = ap } yields the existing behaviour, and that is how all languages are currently implemented (except for C#, where I decided to use different IPA wrappers to catch type errors).

Follow-up work

For the named argument example, assume we have two Ruby calls

foo(fst: a, snd: b)
foo(snd: c, fst: d)

both targeting foo(fst: , snd: ). We can assign to fst, a, and d the positions Named(fst), assign to snd, b, and c the positions Named(snd), and have parameterMatch be the identity.

For the variadic arguments example, consider the Ruby call

foo(1, 2, 3, 4)

that targets foo(a, b, c, d) and foo(first, *mid, last). Assuming we have PositionalArg(x, y) mean positional argument x of y, PositionalParam(x) mean positional parameter x, SplatParam(x, y) mean splat parameter at position x skipping the last y arguments,LastParam(x) mean the xth last parameter, and mid_synth be a synthesized parameter node, we can assign the following positions

Element Position
1 PositionalArg(0, 4)
2 PositionalArg(1, 4)
3 PositionalArg(2, 4)
4 PositionalArg(3, 4)
a PositionalParam(0)
b PositionalParam(1)
c PositionalParam(2)
d PositionalParam(3)
first PositionalParam(0)
mid_synth SplatParam(1, 1)
last LastParam(0)

and define

parameterMatch(ParameterPosition ppos, ArgumentPosition apos) {
  exists(int x, int y |
    apos = PositionalArg(x, y) |
    ppos = PositionalParam(x)
    or
    exists(int from, int except |
      ppos = SplatParam(from, except) and
      x >= from and
      y - x > except
    )
    or
    exists(int last |
      ppos = LastParam(last) and
      y - x - 1 = last
    )
  )
}

mid will then no longer be a ParameterNode, but instead we will have an array store-step from mid_synth to mid, so we are effectively moving the implicit array creation from the caller to the callee, but in the context of the callee we know exactly which arguments to include.

@hvitved hvitved force-pushed the dataflow/argument-parameter-matching branch 2 times, most recently from 1faf063 to b1bb13d Compare November 30, 2021 13:17
@hvitved hvitved force-pushed the dataflow/argument-parameter-matching branch from b1bb13d to d54ae4e Compare November 30, 2021 15:13
@hvitved hvitved force-pushed the dataflow/argument-parameter-matching branch from d54ae4e to 31374b4 Compare December 1, 2021 08:01
@hvitved hvitved marked this pull request as ready for review December 1, 2021 09:23
@hvitved hvitved requested review from a team as code owners December 1, 2021 09:23
private predicate viableParamNonLambda(DataFlowCall call, ArgumentPosition apos, ParamNode p) {
exists(ParameterPosition ppos |
p.isParameterOf(viableCallable(call), ppos) and
parameterMatch(ppos, apos)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is the right join-order, when parameterMatch isn't the identity.
I'd expect the following to do better:

private predicate argumentPosition(DataFlowCall call, ArgNode arg, ParameterPosition ppos) {
  exists(ArgumentPosition apos |
    arg.argumentOf(call, apos) and
    parameterMatch(ppos, apos)
  )
}
private predicate viableParamNonLambda(DataFlowCall call, ParameterPosition ppos, ParamNode p) {
  p.isParameterOf(viableCallable(call), ppos)
}
private predicate viableParamArgNonLambda(DataFlowCall call, ParamNode p, ArgNode arg) {
  exists(ParameterPosition ppos |
    viableParamNonLambda(call, ppos, p) and
    argumentPosition(call, arg, ppos)
  )
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this i more safe; I have pushed updates.

Comment on lines 1145 to 1150
override predicate argumentOf(DataFlowCall call, int pos) {
FlowSummaryImpl::Private::summaryArgumentNode(call, this, pos)
exists(ArgumentPosition apos |
FlowSummaryImpl::Private::summaryArgumentNode(call, this, apos) and
apos.getPosition() = pos
)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this still be using an int for pos?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is purely an internal C# implementation predicate.

yoff
yoff previously approved these changes Dec 7, 2021
Copy link
Contributor

@yoff yoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels good, I am looking forward to seeing how much of ArgumentPassing it will be able to replace :-)

@hvitved
Copy link
Contributor Author

hvitved commented Dec 8, 2021

This feels good, I am looking forward to seeing how much of ArgumentPassing it will be able to replace :-)

Yeah, I was expecting that we should be able to improve that code. One of the key points is that we will no longer need the call graph (connects in ArgumentPassing, I believe) in order to map argument positions to parameter positions, which means we will be able to also handle the case where a call might target different functions with very different signatures (see examples in the PR description).

@aschackmull
Copy link
Contributor

A few more inline comments, otherwise LGTM.

@aschackmull aschackmull merged commit 38d0bb4 into github:main Dec 8, 2021
@hvitved hvitved deleted the dataflow/argument-parameter-matching branch December 8, 2021 11:49
owen-mc added a commit to owen-mc/codeql that referenced this pull request Nov 9, 2022
Corresponds to github#7260, though some
of those changes had already been made.
owen-mc added a commit to owen-mc/codeql that referenced this pull request Nov 9, 2022
Corresponds to github#7260, though some
of those changes had already been made.
owen-mc added a commit to owen-mc/codeql that referenced this pull request Nov 9, 2022
Corresponds to github#7260, though some
of those changes had already been made.
owen-mc added a commit to owen-mc/codeql that referenced this pull request Nov 17, 2022
Corresponds to github#7260, though some
of those changes had already been made.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants