Ruby: Use taint tracking instead of type tracking to define regExpSource
#8332
549194c to 85415c9
This looks fine to me. How does this approach compare to using a type tracker in terms of precision and efficiency?
Precision: The data-flow library has much higher precision, e.g. we are tracking contents (fields) and call contexts precisely.
Since we discussed these two approaches in architecture meeting, I would be very interested to see whether you have any concrete examples where the new taint-tracking based modeling is able to give a result for |
As I understand it, we have to be careful to not re-use the same dataflow library multiple times (hence DataFlow2,3 etc.) If we want to use taint tracking for another library concept, will we have to duplicate this library again in case someone writes a query that uses both |
There is, in my opinion, really no reason to use type tracking if data-flow can be used. Even though no tests change, it doesn't mean that using taint-tracking instead of type tracking will have no effect in the future. In addition to precise tracking of fields and call contexts, data-flow also takes flow summaries into account, and applies the data-flow library's more precise call graph for callbacks/lambdas. I would be interested to see examples where using data-flow/taint-tracking instead of type tracking yields performance problems.
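To make the precision point concrete, here is a minimal Ruby sketch of the kind of program where field contents and call contexts matter. The names `Matcher` and `identity` are hypothetical and do not come from the PR or its tests, and whether either analysis actually resolves this case depends on the library implementation, so treat it as an illustration rather than a confirmed test case.

```ruby
# Hypothetical example (not from the PR): a regexp source string that reaches
# Regexp.new only after passing through a shared helper and an instance field.

class Matcher
  def initialize(pattern)
    @pattern = pattern      # the string is stored in a field (content)
  end

  def regexp
    Regexp.new(@pattern)    # the string is read back and interpreted as a regexp
  end
end

# A helper called from two sites. An analysis that ignores call contexts may
# merge both arguments and conclude that either string can be returned at
# either call site.
def identity(value)
  value
end

plain  = identity("not a pattern")  # never used as a regexp
source = identity("(a+)+$")         # the only string that becomes a regexp

matcher = Matcher.new(source)
puts matcher.regexp.match?("aaaa")
puts plain.length
```

Under these assumptions, an analysis that tracks fields and call contexts precisely would be expected to report only `"(a+)+$"` as a regexp source, while a coarser analysis could also report `"not a pattern"`.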
That is correct. There are cases where reusing the same copy of the data-flow library in multiple places in other libraries is OK (specifically when those two libraries are known to always be in scope at the same time), but in general it is safer to always just use a separate copy. However, in the not-too-distant future we will be able to get rid of all the copies using a parameterized module instead.
Overall I approve of switching to taint-tracking here. In JS I believe it was a case of "all we have is a hammer" so we used type-tracking because we don't have multiple copies of the data-flow configuration.
da66e1f to 8b9efd4
The changes in this PR look good, and I think it should be merged.
But I think there is more work to do in this area. What I want to be able to do is take a `DataFlow::Node` for some arbitrary expression (that our analysis says has a regexp value), and get the resulting root `RegExpTerm`. It's not clear to me how to do that. That's what motivated me to make #7985, but I'm not sure if that change is still needed after this PR and #8293.
Would that be solved by the |
8b9efd4 to 1437aef
Rebased again to sync in latest |
Follow-up to #8293.