Skip to content

Swift: Adopt the shared sensitive data library #13190

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Jun 2, 2023

Conversation

geoffw0
Copy link
Contributor

@geoffw0 geoffw0 commented May 16, 2023

Adopt the shared sensitive data library SensitiveDataHeuristics.qll, that is shared across language
teams. Unfortunately it turns out this only replaces about half of the logic in SensitiveExprs.qll (the rest being handled by another set of heuristics PrivateData.qll in some languages). Thus, things aren't as clean as I'd hoped at this stage, but this PR is a start...

  • we find more sensitive expressions, because the heuristics are more thorough, and as a result we find quite a lot of new (mostly correct) real world results for all four of the swift/cleartext-* queries 🎉
  • but as you'd expect from a heuristic approach, not every source is ideal. In particular, we often match sources that look good at a glance but aren't quite what the heuristic probably had in mind, and on occasion a dubious source is linked to a sink and we get a dubious result.
    • I've already fixed some of the less than ideal results I found in Swift: Fix some FPs from the sensitive data library #13167 .
    • I'm not sure about the userId and username patterns; some of these results will be valuable TPs, some FP because the username is public anyway. A clear example: URL(string: "https://socialwebsite.com/\(username)") --- addressed by removing the id() category of heuristics.
    • I'm not confident about uuid either (.uuid, .uuidString, taskUUID and stuff like that); these mostly come from https://developer.apple.com/documentation/foundation/uuid , and I don't see any reason to mark them as sensitive. --- addressed by removing the id() category of heuristics.
    • "count", as in SecTrustGetCertificateCount is clearly not a certificate. --- exclude by "count" but not "account"?
    • "image", as in .gitAccountImage is clearly not an account number. --- addressed by removing the id() category of heuristics; I'm also considering the word "image" as a "probably safe" indicator.
    • "useridle" is not a "userid". --- addressed by removing the id() category of heuristics
    • accountVC (VC = ViewController); the old version avoided this by only matching accountid / account.?key / bank.?account so I'm wondering if all uses of account is too wide. --- this seems to be a one-off, most matches of "account" LGTM.
    • we could potentially also avoid some FPs by excluding sources that are constant, or NSLocalizedString(constant), regardless of the name of that constant. e.g. something like let passwordPrompt = NSLocalizedString(passwordPrompt). And NSLocalizedString(:tableName:comment:). I've seen this a couple of times so it's probably worth doing at some point. --- planned as follow-up.
  • I had to keep a few of our original regexps as extra cases to avoid losing some good results; these should probably be pushed up to the shared library (but that will require review from the other language teams so I will deal with it separately).
  • performance is a little bit worse (tested locally), probably due to evaluating a number of regexps per name. Not drastic, and I plan to fix it.
  • this is a draft PR for now, pending discussion and/or my exploring this a bit deeper.
  • DCA run to follow...

TODO: when this is ready, it will need a change note as well. --- done.

@geoffw0 geoffw0 added no-change-note-required This PR does not need a change note Swift and removed no-change-note-required This PR does not need a change note labels May 16, 2023
@geoffw0
Copy link
Contributor Author

geoffw0 commented May 26, 2023

I've just pushed changes so that we no longer use the id() category from the shared library, which appears to have a reputation for being inaccurate and has caused many of the FPs discussed in my PR description. I'm still running some more MRVA tests before I bring this PR out of draft...

@geoffw0 geoffw0 marked this pull request as ready for review May 30, 2023 08:16
@geoffw0 geoffw0 requested a review from a team as a code owner May 30, 2023 08:16
@geoffw0
Copy link
Contributor Author

geoffw0 commented May 30, 2023

Ready for review. The vast majority of false positive concerns have been addressed by removing the id() class of results, which at least one other language team has admitted to excluding for the same reasons. I will start a DCA run shortly.

@geoffw0 geoffw0 force-pushed the sharedsensitive branch from 63a61b3 to c546e7c Compare May 30, 2023 08:31
@geoffw0 geoffw0 force-pushed the sharedsensitive branch from c546e7c to d506172 Compare May 30, 2023 08:41
@geoffw0
Copy link
Contributor Author

geoffw0 commented May 30, 2023

I should add, I've finished the MRVA testing. We're not completely free of false positives sources (we never were, it's heuristic), e.g.

introducerToken (nicklockwood/SwiftFormat)
       ^^^^
kSecReturnData (uber/rides-ios-sdk)
 ^^^^^^

but after a few iterations I'm confident that actual results (i.e. sources that connect to sinks) are very good.

Most of the testing actually included some additional heuristic regexps as well, which I don't want to pollute this PR with. i.e. there will be a second PR. It was more efficient to test both sets of changes together.

@geoffw0
Copy link
Contributor Author

geoffw0 commented May 30, 2023

DCA shows three result changes:

  • 1 new result for swift/cleartext-storage-database in SideStore__SideStore
    • writing something called certificateSerialNumber in a Core Data DB
    • looks good as far as I understand this
  • 2 lost results for swift/cleartext-storage-database in signalapp__Signal-iOS
    • variables called accountId, with a GRDB sink
    • it looks like I missed patching this case when I removed the id() class of results
      • I've pushed a fix, but I'll run DCA again...

@geoffw0
Copy link
Contributor Author

geoffw0 commented May 30, 2023

DCA still shows three result changes, but now they're all new results:

  • 1 new result for swift/cleartext-storage-database in SideStore__SideStore
    • writing something called certificateSerialNumber in a Core Data DB
  • 2 new results for swift/cleartext-storage-database in signalapp__Signal-iOS
    • certificateDataValue is being stored in a GRDB (I think) DB

I think these are all good, or at least plausibly good results.

Copy link
Contributor

@MathiasVP MathiasVP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except for a small comment.

swift/schema.py Outdated
Comment on lines 231 to 232
name: optional[string] | doc("name of this callable") | desc("The name includes any arguments "
"of the callable, for example `myFunction(arg:)`.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To stick with the right Swift terminology we should probably be more precise what we mean by "argument" here. Swift has both argument labels, and parameter names (see here), and I think the name we produce for a function declaration uses the argument label.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's the argument labels here. Parameter names are internal implementation details of the function and nobody else should care about them.

I've reworded slightly (and fixed the merge conflict).

Copy link
Contributor

@MathiasVP MathiasVP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@MathiasVP MathiasVP merged commit 0adff53 into github:main Jun 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants