
Added cache and throttling for certificates, keychain_acls, and keychain_items tables. #8192

Merged (15 commits) on Dec 15, 2023

Conversation

@getvictor (Contributor) commented Nov 14, 2023

Fixes #7780
Related issue: fleetdm/fleet#13065

Adding a cache for macOS keychain file accesses.

The cache checks whether a keychain file has been modified by comparing the file's SHA256 hash. If the file is unchanged, the cached results are returned. If the file has been modified, the cache also checks whether it was accessed within the configured interval; if so, the old cached results are still returned, and otherwise the file is re-read.

The cache works independently for each table: multiple tables can each access the keychain files within the interval, but each table can only do so once per interval.
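
A minimal C++ sketch of that decision, with hypothetical names (CacheEntrySketch, canReuse) rather than the PR's actual identifiers:

    // Illustrative sketch only -- not the PR's actual code. One cache entry per
    // table; cached rows are reused unless the file hash changed AND enough time
    // has passed since the last real read.
    #include <chrono>
    #include <map>
    #include <string>
    #include <vector>

    using Row = std::map<std::string, std::string>;
    using QueryData = std::vector<Row>;

    struct CacheEntrySketch {
      std::string sha256;                                   // hash at last read
      std::chrono::system_clock::time_point last_access{};  // time of last read
      QueryData results;                                    // cached table rows
    };

    // Returns true when the cached results may be returned without re-reading.
    bool canReuse(const CacheEntrySketch& entry,
                  const std::string& current_sha256,
                  std::chrono::minutes interval) {
      if (entry.sha256 == current_sha256) {
        return true;  // file unchanged: always reuse the cached rows
      }
      // File changed, but the last read was recent: throttle and reuse old rows.
      return (std::chrono::system_clock::now() - entry.last_access) < interval;
    }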

The following feature flags have been added:

    --keychain_access_cache                          Use a cache for keychain accesses (default true)
    --keychain_access_interval VALUE                 Minimum minutes required between keychain accesses. Keychain cache must be enabled to use

Default keychain_access_interval is 5 minutes.
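
For example, the throttling interval could be raised to 10 minutes (a hypothetical value; the cache itself defaults to enabled) with something like:

    osqueryd --keychain_access_interval=10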

Old table results exactly match new table results when ordered by primary key.

Performance results for certificates for 10 rounds (old vs new):

 U:1  C:0  M:3  F:0  D:0  manual: utilization: 13.940000000000001 cpu_time: 0.14197365480000002 memory: 35979264.0 fds: 4.0 duration: 0.5259468078613281
 U:1  C:0  M:3  F:0  D:0  manual: utilization: 13.236666666666668 cpu_time: 0.13929492440000002 memory: 35386163.2 fds: 4.0 duration: 0.5783782720565795

Performance results for certificates for 10 counts (new is faster and uses less memory due to the cache):

 U:2  C:1  M:3  F:0  D:2  manual: utilization: 44.29999999999999 cpu_time: 0.669130928 memory: 50331648.0 fds: 4.0 duration: 1.5303008556365967
 U:2  C:1  M:3  F:0  D:2  manual: utilization: 28.366666666666664 cpu_time: 0.42878788 memory: 40534016.0 fds: 4.0 duration: 1.0357069969177246

Performance results for keychain_acls for 10 rounds (old vs new):

 U:1  C:0  M:3  F:0  D:0  manual: utilization: 10.57 cpu_time: 0.10779758619999999 memory: 29374873.6 fds: 4.0 duration: 0.5254422664642334
 U:1  C:0  M:3  F:0  D:0  manual: utilization: 11.366666666666669 cpu_time: 0.1201387166 memory: 29420748.8 fds: 4.0 duration: 0.5800627708435059

Performance results for keychain_acls for 10 counts (new is faster and uses less memory due to the cache):

 U:2  C:1  M:3  F:0  D:2  manual: utilization: 22.059999999999995 cpu_time: 0.557502384 memory: 77463552.0 fds: 4.0 duration: 2.0405030250549316
 U:2  C:1  M:3  F:0  D:2  manual: utilization: 26.733333333333334 cpu_time: 0.405785928 memory: 36782080.0 fds: 4.0 duration: 1.0386288166046143

Performance results for keychain_items for 10 rounds (old vs new). New performance is better. This is likely because the new code only opens each keychain file once, while the old code opened each keychain file multiple times -- once for each keychain item type (password, certificate, etc.).

 U:2  C:1  M:3  F:0  D:1  manual: utilization: 30.510833333333334 cpu_time: 0.45226203760000006 memory: 29961420.8 fds: 4.0 duration: 0.9804916620254517
 U:2  C:0  M:3  F:0  D:0  manual: utilization: 20.805 cpu_time: 0.2112246234 memory: 26363494.4 fds: 4.0 duration: 0.5286885976791382

Performance results for keychain_items for 10 counts (new is much faster and uses less memory):

 U:3  C:2  M:3  F:0  D:3  manual: utilization: 78.25999999999999 cpu_time: 3.946559488 memory: 41320448.0 fds: 4.0 duration: 4.564563989639282
 U:2  C:1  M:3  F:0  D:2  manual: utilization: 25.474999999999998 cpu_time: 0.511153552 memory: 33996800.0 fds: 4.0 duration: 1.5302011966705322

@getvictor changed the title from "Added cache for keychain_acls table." to "Added cache for certificates and keychain_acls tables." on Nov 14, 2023
@getvictor mentioned this pull request on Nov 14, 2023
@getvictor changed the title from "Added cache for certificates and keychain_acls tables." to "Added cache and throttling for certificates, keychain_acls, and keychain_items tables." on Nov 15, 2023
@getvictor marked this pull request as ready for review on November 15, 2023 17:20
@getvictor requested review from a team as code owners on November 15, 2023 17:20
@zwass (Member) left a comment

This is mostly looking good to me. Minor changes requested.

I would like to manually verify I get the same results before and after this change but I'm currently blocked by #8196 (and working on resolving that).

Review comments (outdated, resolved) on:
osquery/tables/system/darwin/keychain_items.cpp
osquery/tables/system/darwin/keychain.h
osquery/tables/system/darwin/keychain_utils.cpp
@Smjert (Member) left a comment

Something to be careful about is that every table can be called concurrently, e.g. by a query via the scheduler and a query via the distributed/ad-hoc system.

Here the cache is global, so it would need to be thread-safe.

@directionless added this to the 5.11.0 milestone on Nov 25, 2023
@getvictor (Contributor, Author) commented

@Smjert I added mutex locks to make the Keychain Cache thread-safe.

Should I also update table documentation for these tables? For example:

table_name("keychain_acls")
description("Applications that have ACL entries in the keychain. NOTE: osquery limits frequent access to keychain files. This limit is controlled by the keychain_access_interval flag.")
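
A minimal sketch of the kind of mutex-guarded cache access described above, using hypothetical names rather than the PR's actual code:

    // Illustrative sketch only -- not the PR's actual code. Every read and write
    // of the shared cache holds a mutex, so a scheduled query and a distributed
    // query hitting the same table concurrently cannot corrupt the cache.
    #include <map>
    #include <mutex>
    #include <string>
    #include <vector>

    using Row = std::map<std::string, std::string>;
    using QueryData = std::vector<Row>;

    class KeychainCacheSketch {
     public:
      QueryData read(const std::string& table) {
        std::lock_guard<std::mutex> lock(mutex_);
        return results_[table];  // copy out while holding the lock
      }

      void write(const std::string& table, std::string sha256, QueryData rows) {
        std::lock_guard<std::mutex> lock(mutex_);
        hashes_[table] = std::move(sha256);
        results_[table] = std::move(rows);
      }

     private:
      std::mutex mutex_;  // protects hashes_ and results_
      std::map<std::string, std::string> hashes_;
      std::map<std::string, QueryData> results_;
    };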

@zwass (Member) commented Nov 27, 2023

I definitely like the idea of updating the table documentation. The mutex work lgtm but I'll let @Smjert weigh in.

@Smjert (Member) commented Nov 28, 2023

This is a bit of a drive-by review: one thing I noticed is that, while putting the lock inside the cache itself protects the cache data from corruption or half-reads, the read-then-write operation done in the table is not a transaction. You have to consider that threads can be preempted at any point between the read and the write, and their order can swap.

What could happen is that two concurrent queries both initially read the cache, miss, and go read the keychain data from the file. The one that read the keychain first (with the less up-to-date data) then gets preempted, and the one that came second is the first to write into the cache; when the first thread wakes up, it overwrites the most recent data with an older copy.

That said, it's unlikely that anything changes in the time between the preemption of the first thread, the keychain file read by the second thread, and the first thread's later wake-up.

I don't know if you have already considered this; again, it doesn't necessarily seem to be an issue from this quick review, but I thought I'd mention it.

@Smjert previously approved these changes Nov 28, 2023
@directionless (Member) commented
I see that Stefano approved this, but I find myself confused about how the cache works. Do entries persist between queries? If so, do they expire? There's an access interval, which isn't quite the same as cache expiration.

If you have a reasonable story for those, I'm not sure this PR needs to change. But a follow-up PR with some clear docs in the "wiki" would help.

@Smjert (Member) commented Nov 28, 2023

I see that Stefano approved this, but I find myself confused about how the cache works. Do entries persist between queries? If so, do they expire? There's an access interval, which isn't quite the same as cache expiration.

If you have a reasonable story for those, I'm not sure this PR needs to change. But a follow-up PR with some clear docs in the "wiki" would help.

Yes, the cache is global and persists. Basically the cache has a copy of all the results (as if no constraint was applied) for each of the categories specified here: https://github.com/osquery/osquery/pull/8192/files#diff-cbf1c8c0ddce99dc2309a7852b01cdd527797a808a0c1ad0aa4a231729bb8213R32

Each osquery table writes into one of the categories, always replacing everything in that category.
So there's no way to continuously grow the cache size by keeping both old and new results.

But there's still the problem of deciding how long to reuse the data (when to invalidate), and currently that's implemented with the table checking the hash of the file; if the hash has changed and enough time has passed (keychain_access_interval) from the last keychain read, then we invalidate the cache and re-read the data from disk, otherwise we return the cached data.

This is my understanding of how this works.

EDIT: sorry I forgot to mention the hashing.

@Smjert (Member) commented Nov 28, 2023

if the hash has changed and enough time has passed (keychain_access_interval) from the last keychain read, then we invalidate the cache and re-read the data from disk, otherwise we return the cached data.

With the further caveat that I expressed here: #8192 (comment): two queries running concurrently could both find the cache empty, or both find the file hash changed with enough time having passed, and both decide they need to access the keychain in close succession, not respecting the interval after the new, first access.

@getvictor (Contributor, Author) commented

@Smjert Yes, if two threads miss in the cache at the same time, they will both access the keychain files and both write to the cache. If the keychain file was also modified during this window, we can end up with:

  1. Old hash pointing to old data.
  2. Old hash pointing to new data.

In both cases, the data will remain there for the keychain_access_interval, and the hash/data will be updated by a subsequent access. Since osquery will recover from this corner case, it doesn't seem like a huge concern.

@directionless I will add additional documentation to the Command Line Flags wiki in a separate PR.

@getvictor (Contributor, Author) commented

Additional documentation added in PR #8205

@getvictor (Contributor, Author) commented

@Smjert After thinking some more, there is a chance of cache corruption with my current approach:

  1. Thread1 reads cache, creates hash1, and reads keychain file
  2. Keychain file modified
  3. Thread2 reads cache and creates hash2
  4. Thread2 writes cache
  5. Thread1 writes cache <-- bad data!

I will make a change to use 1 mutex across all 3 keychain tables, so that only 1 keychain table can be accessed at a time. This will have the side benefit of guaranteeing that multiple threads will not be using Apple keychain APIs at the same time. Perhaps that was actually the root cause of the bug in the first place.
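
A minimal sketch of that single shared mutex, with hypothetical names rather than the PR's actual code:

    // Illustrative sketch only -- not the PR's actual code. A single mutex shared
    // by the certificates, keychain_acls, and keychain_items implementations
    // serializes cache access and Apple keychain API usage, so only one of the
    // three table generators runs at a time.
    #include <map>
    #include <mutex>
    #include <string>
    #include <vector>

    using Row = std::map<std::string, std::string>;
    using QueryData = std::vector<Row>;

    static std::mutex keychainTableMutex;  // shared by all three keychain tables

    QueryData genCertificatesSketch() {
      std::lock_guard<std::mutex> lock(keychainTableMutex);
      QueryData results;
      // ... check the cache; on a miss, read the keychain files via the Apple
      //     Security framework and replace this table's cache entry ...
      return results;
    }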

@Smjert (Member) commented Nov 29, 2023

@Smjert After thinking some more, there is a chance of cache corruption with my current approach:

  1. Thread1 reads cache, creates hash1, and reads keychain file
  2. Keychain file modified
  3. Thread2 reads cache and creates hash2
  4. Thread2 writes cache
  5. Thread1 writes cache <-- bad data!

I will make a change to use 1 mutex across all 3 keychain tables, so that only 1 keychain table can be accessed at a time. This will have the side benefit of guaranteeing that multiple threads will not be using Apple keychain APIs at the same time. Perhaps that was actually the root cause of the bug in the first place.

I'm all for the more correct approach, although I don't follow where the corruption is in that example.

From what I see, Thread2 writes the cache with hash2 and the new data (post keychain modification), and Thread1 later overwrites that with hash1 and the old data (pre keychain modification), which is undesirable but still a complete result that will recover after an interval.
What am I missing?

@getvictor (Contributor, Author) commented

You're right -- I don't see how we can get a new hash with old data unless the keychain file is modified twice. I think it is still worthwhile to limit access to one keychain table at a time.

@Smjert (Member) commented Nov 29, 2023

EDIT: ah saw your response now; yeah I guess this is further reason to simplify this ^^'. Thanks!

@directionless (Member) commented

@Smjert @zwass How does this look?

@zwass (Member) left a comment

This lgtm. Thank you Victor! @Smjert please have a look soon otherwise I propose we merge this week.

@zwass merged commit 6f380fc into osquery:master on Dec 15, 2023
16 checks passed
@great944 commented Jan 7, 2024

Hello, while I understand the value of a cache for osquery when running as a service, I am curious whether we have considered scenarios where the main use case is the command line. Do we have a way to opt out of caching? Our primary use case is to execute a query on the command line, i.e. osqueryd.exe -S --json "select * from xyz", and we do not wish to activate any caching. In other words, all caching should be inactive by default for non-interactive command execution.

@directionless (Member) commented

Hi @great944

The cache in this PR is specific to the macOS keychain-related tables, and given the underlying corruption issues from the OS it would be a bad idea to disable it. That said, this PR does include an option to disable it.

Can you share what your concerns are, and why you wouldn't want it for straight CLI usage?
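
For reference, a run with the cache disabled via the flag from the PR description might look like the following (the boolean flag syntax shown is an assumption based on osquery's usual flag handling):

    osqueryd -S --json --keychain_access_cache=false "select * from certificates"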

@Smjert (Member) commented Jan 8, 2024

Also keep in mind that the cache only helps when multiple queries hit the keychain tables within the same command-line execution, given that the cache is in-memory only.
So sure, there might be a slight slowdown to create the cache at each execution, but if you run a single query against a keychain table, each execution will not use the cache and will not be subject to the interval.

Merging this pull request may close: Use a non-deprecated API for Keychain access
5 participants