Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a non-deprecated API for Keychain access #7780

Closed
NickBorgersOnLowSecurityNode opened this issue Sep 29, 2022 · 13 comments · Fixed by #8192
Closed

Use a non-deprecated API for Keychain access #7780

NickBorgersOnLowSecurityNode opened this issue Sep 29, 2022 · 13 comments · Fixed by #8192
Labels
macOS triage Issue needs to be verified, reproduced and prioritized

Comments

@NickBorgersOnLowSecurityNode
Copy link
Contributor

Bug report

What operating system and version are you using?

 version = 12.6
   build = 21G115
platform = darwin

What version of osquery are you using?

Latest, the issue is in master.

What steps did you take to reproduce the issue?

We are seeing occasional corruption of Login Keychains across a fleet of 1000s of Mac OS devices. We have not observed any correlation (or significant occurence) of System Keychain corruption.

Our baseline was to be running SELECT * from certificates every 60 seconds as part of a device health compliance system. With this in-place as a query_pack we see ~0.3% of assets experience Login Keychain corruption every week, with the systems experiencing the bad event tending to experience it on an ongoing recurrent basis. There is a correlation with older hardware experiencing the brunt of the impact; newer systems are less likely to be affected.

What did you expect to see?

We did not expect osquery interacting with the keychain at any frequency to be correlated with corruption of the keychain.

What did you see instead?

When osquery interacts with the keychain frequently, we see occasional corruption of user keychains.

Attempt at RCA

Now that we have determined a correlation between osquery interacting with the keychain and corruption, we went looking at osquery source code and identified function calls like this:

auto status = SecKeychainOpen(keychain_path.c_str(), &keychain);

status = SecKeychainGetPath(keychain, &path_size, keychain_path);

Both of those functions, SecKeychainOpen and SecKeychainGetPath, are part of a set of MacOS APIs for Keychain management that were deprecated by MacOS 12.0 at the latest.

There do appear to be currently supported MacOS APIs for interacting with the keychain, but I'm not currently prepared to build and test those changes and haven't touched C/C++ since high school. We have a support contract with a vendor who supports our deployment and will be contacting them as well.

Sharing here so that:

  • Someone on the project can tell me I've misinterpreted the code
  • Someone else who's encountered this issue can find this and know they're not alone
  • Collect facts in a shared place if/when this can be addressed
@mike-myers-tob mike-myers-tob added macOS triage Issue needs to be verified, reproduced and prioritized labels Oct 3, 2022
@directionless
Copy link
Member

@sharvilshah you tend to be the most knowledgable here, any thoughts?

@directionless
Copy link
Member

Hi @NickBorgersOnLowSecurityNode Thanks for reporting this.

My gut sense, is that the older APIs and the corruption are separate issues. While it would be nice for osquery to use the current APIs, I would expect any shipped apple API to cause corruption.

What kind of query are you running, and at what frequency?

Do you see this on any particular macOS versions?

Have you reported this to Apple? (no worries if not, I will eventually, and I'm trying to understand what's up)

@sharvilshah
Copy link
Member

The certificates table uses OpenSSL APIs to parse the X509/ASN1 data from the keychain. AFAIK the only Keychain API that table uses is to enumerate and get the path I believe.

We updated that table to use OpenSSL APIs because there was a leak in the Keychain API, and it was slow (this was I believe 5 years ago), and also to encourage porting this to Linux.

@NickBorgersOnLowSecurityNode what does a corrupted keychain look like? Does it not open with Keychain access?

@directionless
Copy link
Member

One of Kolide's customers is bumping this. I got a bit more info from them that I can share:

They see corruption on the login keychain. It is corrupt in macOS. The user gets weird errors, and on reboot macOS tells them Your login keychain is damaged and cannot be used. A new keychain has been created for you. This causes loss of any stored passwords.
IMG_6589

They report some unified logs like:

osqueryd: (Security) [com.apple.securityd:security_exception] CSSM Exception: -2147413737 CSSMERR_DL_DATASTORE_DOESNOT_EXIST
osqueryd: (Security) [com.apple.securityd:security_exception] CSSM Exception: -2147413759 CSSMERR_DL_DATABASE_CORRUPT
osqueryd: (Security) [com.apple.securityd:integrity] dbBlobVersion() failed for a CssmError: -2147413759 CSSMERR_DL_DATABASE_CORRUPT

This appears on both Big Sur and Monterey. It is sporadic, but appears more often if they have minutely queries that involved the certificates table.

They do have an issue open with apple. And somewhat worrying is that they report Apple saying that "an abnormal amount of reads may lead to corruption of the keychain"

@NickBorgersOnLowSecurityNode
Copy link
Contributor Author

What @directionless has shared is exactly what we have been seeing. In addition to losing saved passwords, the private keys for certificates that were stored in the Login Keychain are lost. This interferes with some authentication use cases where we would like to be using the Login Keychain of the user to provide maximum protection to credentials stored on the device.

What kind of query are you running, and at what frequency?

The part I can share now is SELECT * from certificates and we are running it every 60 seconds. I need to double-check appropriateness before I share the full query pack. The certificate information is used so the results of the query we collect include the identifying information of the certificates which makes lookups easier.

Do you see this on any particular macOS versions?

This appears on both Big Sur and Monterey. Across thousands of assets no clear version association pattern has emerged, though older hardware manifests the problem more frequently.

Have you reported this to Apple?

Yes, we have an open ticket with Apple. All we have gotten back is aligned with

"an abnormal amount of reads may lead to corruption of the keychain"

Probably silly idea based on this mention

We updated that table to use OpenSSL APIs because there was a leak in the Keychain API, and it was slow (this was I believe 5 years ago), and also to encourage porting this to Linux.

Would it make any difference to create a copy of the keychain and then interact with that copy?
Another layer would be whether you could inspect the "real" keychain to see if it has been changed, and keep the copy up to date based on that check. Then, regardless of how often you read from the copy you aren't needing to interact with the real keychain.

Quite possible this is anti pattern for how you want osquery to work, and I have no idea if there's a reliable way to see if the keychain has been updated. Perhaps as simple as checking a filesystem timestamp, but I'd generally use a checksum. I am still new to Mac so am not clear on whether the keychain is really a file or something that just gets represented that way a la /proc in Linux.

@directionless
Copy link
Member

directionless commented Oct 12, 2022

If we accept these apple comments at face value, I think we don't have good options. We could implement a caching layer, or inspect mod times. But that won't eliminate the possibility, just reduce it.

I'm not sure if we can copy the files -- I feel like there's a lot of weird apple frameworks? Maybe? TBD I guess.

I think Sharvil is trying to reproduce.

@mike-myers-tob mike-myers-tob changed the title On MacOS/Darwin a deprecated API is used for Keychain access Use a non-deprecated API for Keychain access Oct 31, 2022
@zwass
Copy link
Member

zwass commented Apr 25, 2023

@directionless
Copy link
Member

Chatting a little in office hours today, we wondered:

  • what is the security tool doing?
  • Can we parse these files directly? seph thought they were sqlite somewhere. (see Zach's links above)
  • What if we did the silly thing and copied the files to a tmp dir first? (Or even in memory)

@getvictor
Copy link
Contributor

Did some research into this issue today. Simply using SecItemCopyMatching without the deprecated API calls does not return certificates from SystemRootCertificates.keychain, which is most of the certificates on a typical system. I believe this is Apple's intention -- to not have API access to root certificates.

@zwass @directionless I will attempt to fix this by copying the files to tmp dir first.

@directionless
Copy link
Member

Did some research into this issue today. Simply using SecItemCopyMatching without the deprecated API calls does not return certificates from SystemRootCertificates.keychain, which is most of the certificates on a typical system. I believe this is Apple's intention -- to not have API access to root certificates.

@zwass @directionless I will attempt to fix this by copying the files to tmp dir first.

I tried implementing with the new apis (#7822), but we had access to significantly less data. I don't think anyone has tried the copy to tmp approach.

@getvictor
Copy link
Contributor

@zwass @directionless I opened a PR #8189 that fixes certificates and keychain_items by coping keychains.

The keychain_acls table doesn't seem to work with copies, so we'll need another approach there.

This is my first osquery PR. Early feedback will be appreciated.

zwass pushed a commit that referenced this issue Dec 15, 2023
…keychain_items` tables. (#8192)

Fixes #7780 
Related issue: fleetdm/fleet#13065

Adding a cache for macOS keychain file accesses.

The cache checks whether a keychain file has been modified by comparing the file's SHA256 hash. If the file has been modified, the cache also checks whether the file has been accessed recently. If it has been accessed within the configured interval, the old cached results are returned.

The cache works independently for each table. This means that multiple tables can access the keychain files within the interval, but each one of them can only do so once.

The following feature flags have been added:
```
    --keychain_access_cache                          Use a cache for keychain accesses (default true)
    --keychain_access_interval VALUE                 Minimum minutes required between keychain accesses. Keychain cache must be enabled to use
```

Default `keychain_access_interval` is 5 minutes.

Old table results exactly match new table results when ordered by primary key.

Performance results for `certificates` for 10 rounds (old vs new):
```
 U:1  C:0  M:3  F:0  D:0  manual: utilization: 13.940000000000001 cpu_time: 0.14197365480000002 memory: 35979264.0 fds: 4.0 duration: 0.5259468078613281
 U:1  C:0  M:3  F:0  D:0  manual: utilization: 13.236666666666668 cpu_time: 0.13929492440000002 memory: 35386163.2 fds: 4.0 duration: 0.5783782720565795
```

Performance results for `certificates` for 10 counts (new is faster and less memory due to cache):
```
 U:2  C:1  M:3  F:0  D:2  manual: utilization: 44.29999999999999 cpu_time: 0.669130928 memory: 50331648.0 fds: 4.0 duration: 1.5303008556365967
 U:2  C:1  M:3  F:0  D:2  manual: utilization: 28.366666666666664 cpu_time: 0.42878788 memory: 40534016.0 fds: 4.0 duration: 1.0357069969177246
```

Performance results for `keychain_acls` for 10 rounds (old vs new):
```
 U:1  C:0  M:3  F:0  D:0  manual: utilization: 10.57 cpu_time: 0.10779758619999999 memory: 29374873.6 fds: 4.0 duration: 0.5254422664642334
 U:1  C:0  M:3  F:0  D:0  manual: utilization: 11.366666666666669 cpu_time: 0.1201387166 memory: 29420748.8 fds: 4.0 duration: 0.5800627708435059
```

Performance results for `keychain_acls` for 10 counts (new is faster and less memory due to cache):
```
 U:2  C:1  M:3  F:0  D:2  manual: utilization: 22.059999999999995 cpu_time: 0.557502384 memory: 77463552.0 fds: 4.0 duration: 2.0405030250549316
 U:2  C:1  M:3  F:0  D:2  manual: utilization: 26.733333333333334 cpu_time: 0.405785928 memory: 36782080.0 fds: 4.0 duration: 1.0386288166046143
```

Performance results for `keychain_items` for 10 rounds (old vs new). **New performance is better.** This is likely because the new code only opens each keychain file once, while old code opened each keychain file multiple times -- once for each keychain item type (password, certificate, etc.)
```
 U:2  C:1  M:3  F:0  D:1  manual: utilization: 30.510833333333334 cpu_time: 0.45226203760000006 memory: 29961420.8 fds: 4.0 duration: 0.9804916620254517
 U:2  C:0  M:3  F:0  D:0  manual: utilization: 20.805 cpu_time: 0.2112246234 memory: 26363494.4 fds: 4.0 duration: 0.5286885976791382
```

Performance results for `keychain_items` for 10 counts (new is way faster and less memory):
```
 U:3  C:2  M:3  F:0  D:3  manual: utilization: 78.25999999999999 cpu_time: 3.946559488 memory: 41320448.0 fds: 4.0 duration: 4.564563989639282
 U:2  C:1  M:3  F:0  D:2  manual: utilization: 25.474999999999998 cpu_time: 0.511153552 memory: 33996800.0 fds: 4.0 duration: 1.5302011966705322
```
@directionless
Copy link
Member

Worth noting that we're still using the deprecated APIs -- we don't see any other way to get at the data. But #8192 adds a cache layer in which, hopefully, alleviates the corruption

@NickBorgersOnLowSecurityNode
Copy link
Contributor Author

Worth noting that we're still using the deprecated APIs -- we don't see any other way to get at the data. But #8192 adds a cache layer in which, hopefully, alleviates the corruption

As original reporter, I believe my organization's pain point will be resolved with the caching solution. Thank you to all involved in the project and @getvictor for writing the fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
macOS triage Issue needs to be verified, reproduced and prioritized
Projects
None yet
6 participants