Ruby: Use taint tracking instead of type tracking to define `regExpSource` #8332

hvitved · 2022-03-04T13:42:44Z

Follow-up to #8293.

aibaars

This looks fine to me. How does this approach compare to using a type tracker in terms of precision and efficiency?

hvitved · 2022-03-09T08:40:33Z

> How does this approach compare to using a type tracker in terms of precision and efficiency?

Precision: The data-flow library has much higher precision; for example, it tracks contents (fields) and call contexts precisely.
Performance: While it may be a bit slower, it doesn't matter much in practice (as witnessed by DCA). Unlike type tracking, the data-flow library applies sophisticated pruning of nodes before tracking flow from a given source to a given node, which helps performance a lot.
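To make the pruning idea concrete, here is a minimal toy sketch in Python. It is not how the CodeQL data-flow library is implemented; the graph, node names, and helper functions (`reachable`, `reverse`, `prune`) are all hypothetical. It only shows the general technique of restricting the search to nodes that are both reachable from a source and able to reach a sink.

```python
from collections import deque


def reachable(starts, edges):
    """Return every node reachable from `starts` by following `edges`."""
    seen = set(starts)
    queue = deque(starts)
    while queue:
        node = queue.popleft()
        for succ in edges.get(node, ()):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return seen


def reverse(edges):
    """Flip every edge so that reachability can be computed backwards."""
    flipped = {}
    for node, succs in edges.items():
        for succ in succs:
            flipped.setdefault(succ, []).append(node)
    return flipped


def prune(edges, sources, sinks):
    """Keep only nodes that lie on some path from a source to a sink."""
    forward = reachable(sources, edges)
    backward = reachable(sinks, reverse(edges))
    return forward & backward


# Hypothetical flow graph: only the path a -> b -> c connects source to sink.
EDGES = {"a": ["b", "x"], "b": ["c"], "x": ["y"], "z": ["c"]}
RELEVANT = prune(EDGES, sources={"a"}, sinks={"c"})
```

In this toy graph, `x`, `y`, and `z` are discarded before any more expensive analysis would run, since none of them can be part of a source-to-sink path.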

RasmusWL · 2022-03-09T09:56:24Z

Since we discussed these two approaches in the architecture meeting, I would be very interested to see whether you have any concrete examples where the new taint-tracking based modeling is able to give a result for regExpSource where the old type-tracking based approach wasn't. I couldn't see any changes to tests, or any new results in alerts/alert-meta in the performance test.

hmac · 2022-03-09T20:14:11Z

As I understand it, we have to be careful to not re-use the same dataflow library multiple times (hence DataFlow2,3 etc.) If we want to use taint tracking for another library concept, will we have to duplicate this library again in case someone writes a query that uses both regExpSource and the other library concept? In other words, would this copy be better named something like DataFlowImplForRegExpSource, if we can't re-use it for anything else?

hvitved · 2022-03-10T15:41:47Z

> Since we discussed these two approaches in the architecture meeting, I would be very interested to see whether you have any concrete examples where the new taint-tracking based modeling is able to give a result for regExpSource where the old type-tracking based approach wasn't. I couldn't see any changes to tests, or any new results in alerts/alert-meta in the performance test.

There is, in my opinion, really no reason to use type tracking if data-flow can be used. Even though no tests change, it doesn't mean that using taint-tracking instead of type tracking will have no effect in the future. In addition to precise tracking of fields and call contexts, data-flow also takes flow summaries into account, and applies the data-flow library's more precise call graph for callbacks/lambdas. I would be interested to see examples where using data-flow/taint-tracking instead of type tracking yields performance problems.
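As a rough illustration of why precise call contexts matter, the following toy Python model compares a tracker that ignores call contexts with one that matches returns to their call sites. This is a sketch under stated assumptions, not the actual implementation of either CodeQL library: the edge list, node names, and the functions `flows_insensitive` and `flows_sensitive` are hypothetical. The modeled program has one identity-like function called from two sites, `site1` (with a tainted argument) and `site2` (with a clean one).

```python
# Each edge is (from_node, to_node, kind, call_site).
# kind is "call" (argument to parameter), "return" (result back to caller),
# or "local" (a step inside one function).
EDGES = [
    ("tainted", "param", "call", "site1"),
    ("clean", "param", "call", "site2"),
    ("param", "ret", "local", None),
    ("ret", "out1", "return", "site1"),
    ("ret", "out2", "return", "site2"),
]


def flows_insensitive(source):
    """Plain reachability: call and return edges are treated like any other."""
    seen = {source}
    work = [source]
    while work:
        node = work.pop()
        for frm, to, _kind, _site in EDGES:
            if frm == node and to not in seen:
                seen.add(to)
                work.append(to)
    return seen


def flows_sensitive(source):
    """Reachability that only returns to the call site it entered through."""
    start = (source, ())
    seen = {start}
    work = [start]
    while work:
        node, stack = work.pop()
        for frm, to, kind, site in EDGES:
            if frm != node:
                continue
            if kind == "call":
                state = (to, stack + (site,))
            elif kind == "return":
                # Skip returns that do not match the innermost pending call.
                if stack and stack[-1] != site:
                    continue
                state = (to, stack[:-1])
            else:
                state = (to, stack)
            if state not in seen:
                seen.add(state)
                work.append(state)
    return {node for node, _stack in seen}
```

Here `flows_insensitive("tainted")` includes `out2`, a spurious result, because flow entering the function from `site1` is allowed to leave through `site2`. `flows_sensitive("tainted")` reaches `out1` only.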

hvitved · 2022-03-10T15:44:49Z

> As I understand it, we have to be careful to not re-use the same dataflow library multiple times (hence DataFlow2,3 etc.) If we want to use taint tracking for another library concept, will we have to duplicate this library again in case someone writes a query that uses both regExpSource and the other library concept? In other words, would this copy be better named something like DataFlowImplForRegExpSource, if we can't re-use it for anything else?

That is correct. There are cases where reusing the same copy of the data-flow library in multiple places in other libraries is OK (specifically when those two libraries are known to always be in scope at the same time), but in general it is safer to always just use a separate copy. However, in the not-too-distant future we will be able to get rid of all the copies using a parameterized module instead.

asgerf

Overall I approve of switching to taint-tracking here. In JS I believe it was a case of "all we have is a hammer" so we used type-tracking because we don't have multiple copies of the data-flow configuration.

ruby/ql/lib/codeql/ruby/security/performance/ParseRegExp.qll

nickrolfe

The changes in this PR look good, and I think it should be merged.

But I think there is more work to do in this area. What I want to be able to do is take a DataFlow::Node for some arbitrary expression (that our analysis says has a regexp value), and get the resulting root RegExpTerm. It's not clear to me how to do that. That's what motivated me to make #7985, but I'm not sure if that change is still needed after this PR and #8293.

aibaars · 2022-03-18T11:56:25Z

> The changes in this PR look good, and I think it should be merged.
>
> But I think there is more work to do in this area. What I want to be able to do is take a DataFlow::Node for some arbitrary expression (that our analysis says has a regexp value), and get the resulting root RegExpTerm. It's not clear to me how to do that. That's what motivated me to make #7985, but I'm not sure if that change is still needed after this PR and #8293.

Would that be solved by the RegExpPatternSource defined in #7917 ?

hvitved · 2022-03-18T13:48:53Z

Rebased again to sync in latest DataFlowImpl.qll/TaintTrackingImpl.qll changes.

github-actions bot added the Ruby label Mar 4, 2022

hvitved force-pushed the ruby/regexp-taint-flow branch from 549194c to 85415c9 Compare March 7, 2022 11:52

hvitved marked this pull request as ready for review March 8, 2022 08:05

hvitved requested a review from a team as a code owner March 8, 2022 08:05

hvitved added the no-change-note-required This PR does not need a change note label Mar 8, 2022

aibaars reviewed Mar 8, 2022

hvitved closed this Mar 9, 2022

hvitved reopened this Mar 9, 2022

asgerf reviewed Mar 15, 2022

hvitved force-pushed the ruby/regexp-taint-flow branch 3 times, most recently from da66e1f to 8b9efd4 Compare March 16, 2022 12:35

nickrolfe previously approved these changes Mar 18, 2022

hvitved added 2 commits March 18, 2022 14:48

Ruby: Add dataflow/taintracking copies for use in libraries

d97eaba

Ruby: Use taint tracking instead of type tracking to define `regExpSource`

1437aef

hvitved dismissed nickrolfe’s stale review via 1437aef March 18, 2022 13:48

hvitved force-pushed the ruby/regexp-taint-flow branch from 8b9efd4 to 1437aef Compare March 18, 2022 13:48

nickrolfe approved these changes Mar 18, 2022

aibaars merged commit beef8e2 into github:main Mar 18, 2022

hvitved deleted the ruby/regexp-taint-flow branch March 22, 2022 09:12
