Skip to content

Conversation

asgerf
Copy link
Contributor

@asgerf asgerf commented Jun 2, 2020

Adds the SharedTaintStep class as discussed in https://github.com/github/codeql-javascript-team/issues/110.

This PR first bases it on a unit type, and then towards the end changes it to a string type. The categorization of steps is then evolved a bit more at the end.

Performance is OK.

Some previous experimental results that are not quite up-to-date with this PR or the performance PRs that landed separately in the meantime.

  • The unit type was slightly faster (without changes to the optimizer)
  • The FlowLabel.isSharedStep trick reduced tuple counts but caused a regression in real time anyway.

Some points to discuss:

  • Is this the right degree of categorization? I have feeling that I went slightly overboard with it.
  • Is it worth all the boilerplate?

@asgerf asgerf added the JS label Jun 2, 2020
@erik-krogh
Copy link
Contributor

What was the point of the Cache in different order commit?
Because if we go back to the previous approach, then I don't think we need the heapStep(), arrayStep() etc. predicates.

@asgerf
Copy link
Contributor Author

asgerf commented Jun 4, 2020

That was for performance reasons. It seemed to be faster and my theory is:

  • Projecting away a suffix of columns (in this case the last column) preserves the ordering of tuples, so the projection is cheaper. (In fact, it shouldn't even need to materialize the projection at all as it can be done during the join, but our optimizier materializes it anyway.)
  • Projecting away the first column requires a re-sorting of tuples even if the input relation was sorted.

The heapStep predicates etc are boilerplate on the implementation side, but serve a real purpose. I'd like to keep separate the predicate used to access steps from the one that must be overridden to contribute steps. If you use the same predicate for both, there is no way to ever change that predicate without breaking changes.

A concrete example of this is InvokeNode.getACallee(int imprecision). This can be overridden to augment the call graph, or called to read the call graph. If we wanted to deprecate that predicate there is no way to make it have the correct value (for callers of the predicate) while also ensuring that the data-flow library sees all call edges contributed through that predicate (without introducing spurious recursion, that is).

To support people overriding InvokeNode.getACallee(int) the flow of information goes like this:

FlowSteps::calls <--- InvokeNode.getACallee(int) <---- CallGraphs::getACallee
                                            ^--------- Overriders

But as far as I can tell, there is no way to change this without breaking either the callers or the overriders of InvokeNode.getACallee(int).

@asgerf
Copy link
Contributor Author

asgerf commented Mar 15, 2021

Superseded by the backlinked PR above.

@asgerf asgerf closed this Mar 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants