Skip to content

Swift: Add CollectionContent to defaultImplicitTaintRead #14521

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Oct 17, 2023

Conversation

geoffw0
Copy link
Contributor

@geoffw0 geoffw0 commented Oct 16, 2023

It's been observed that quite a lot of Swift queries have an allowImplicitRead rule for reading out of CollectionContent; and it's been further pointed out that we could put that rule in defaultImplicitTaintRead so that it applies to all queries. This behaviour is not expected to ever be undesirable in the context of taint flow queries, so having it there simplifies queries and helps avoid mistakes*. This PR does this change.

* - it turns out we do indeed find some new good results in some of the encryption queries as a result of this change - though new barriers were also required to prevent result duplication.


Performance is a slight concern, though we've seen no problems in the queries that were already doing this via allowImplicitRead. I want to call out that we're matching CollectionContent without restricting the node, whereas the Java equivalent case makes some effort to narrow the DataFlow nodes it applies to. I believe matching all nodes is what we want - quite a broad range of nodes could contain CollectionContent and I would not want to miss some of them out. There is a restriction to (roughly) sink nodes that comes into defaultImplicitTaintRead through the bindingset[node] annotation, so we're not really matching all nodes anyway. There's also only one CollectionContent type in Swift, so a cartesian product is impossible. In any case I've briefly tested performance locally (with no clear result), and DCA should confirm no slowdown.


Note to self: there's another case of allowImplicitRead in #14383 . I can remove it after this PR is merged.

@geoffw0 geoffw0 added the Swift label Oct 16, 2023
@geoffw0 geoffw0 requested a review from a team as a code owner October 16, 2023 17:50
Copy link
Contributor

@MathiasVP MathiasVP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

// We often expect taint to reach a sink inside `CollectionContent`, for example an array element
// or pointer contents. It is convenient to have a default implicit read step for these cases rather
// than implementing this step in a lot of separate `allowImplicitRead`s.
cs.getAReadContent() instanceof DataFlow::Content::CollectionContent
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we ever run into performance problems here a natural thing to do would be to add a conjunct that further restricts node (for example: it's of a type that can actually carry CollectionContent). But since we're not observing any performance issues here yet I don't think we need to go down that rabbit hole for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, indeed Java already does this. My worry is that we miss out some nodes that could hold collection content and thus end up missing results that we could have got for less effort. Absolutely an option if we see any problems though.

The DCA run looks fine, aside from a bit of wobble.

@MathiasVP MathiasVP merged commit 0ad338f into github:main Oct 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants