kcm: improve performance #1165

pbrezina · 2021-03-02T10:47:31Z

This is based on the following krbdev thread:
https://mailman.mit.edu/pipermail/krbdev/2021-February/013424.html

@greghudson Is this something MIT Kerberos would accept?
@nicowilliams Is this something that could be accepted for KCM protocol?

kcm: use GET_CRED_LIST where available

This KCM operation will returned all credentials of the selected
ccache. This change will remove unnecessary IO operations that
are a bottleneck for large ccaches, mostly for GSSAPI and
applications that iterates over credentials often. The implementation
is backwards compatible - if the call is not available or fails, it
will fallback to GET_CRED_UUID_LIST.

cc: add krb5_cc_notification_path

This adds new API function krb5_cc_notification_path. The purpose
of this call is to create a file that can be watched for events
using inotify or similar tools.

It is currently documented only in the header file and implemented only
for KCM. I will finish it once we agree on the API.

The KCM implementation introduces new operation
GET_CACHE_NOTIFICATION_PATH (ccache) -> (path).

pbrezina · 2021-03-02T10:58:22Z

About the performance improvements. I used GSSAPI to acquire and establish security context with 200 services. SSSD KCM took 33 minutes as a starting point. When I addressed bottlenecks on our side, I got to 2 minutes. When addressing IO bottleneck with this patch I got to 15 seconds. For comparison, keyring took 2 seconds.

frozencemetery · 2021-03-02T17:16:18Z

FWIW: this would be more interesting as a proof of concept if you showed it implemented for a daemon as well. Best would probably be kcmserver.py, while showing it for SSSD is probably better than nothing.

greghudson · 2021-03-02T21:43:52Z

Apple has added twelve KCM opcodes in sequential order (per their latest Heimdal snapshot from opensource.apple.com). Any opcodes we add will need to start at some offset to avoid conflicts. opcodes are unsigned 16-bit values; I suggest a base of 13000 ('M' for 'MIT' being the 13th letter).

It seems like FILE could easily support notification path by returning a copy of the residual.

pbrezina · 2021-03-03T08:04:55Z

FWIW: this would be more interesting as a proof of concept if you showed it implemented for a daemon as well. Best would probably be kcmserver.py, while showing it for SSSD is probably better than nothing.

SSSD part is in https://github.com/pbrezina/sssd/tree/kcm. GET_CRED_LIST is finished, GET_NOTIFICATIONS_PATH is just PoC so far.

greghudson · 2021-03-03T17:12:06Z

I'd like to see the notification change split into a separate PR, since it might be difficult to follow the discussion of both changes at once. For the iteration performance improvements:

I'd like to see us follow Apple's lead and use KCM_OP_RETRIEVE for the retrieve operation. We can make the client pass the KRB5_GC_CACHED flag to prevent the Heimdal KCM from going out and getting tickets. I think that would provide a significant speedup to the 200-service scenario, with or without the iteration changes. But it's probably not enough by itself, due to scan_ccache().

It is possible that we might find that storing the entire ccache contents in memory within libkrb5 causes problems for large caches. At that point we might address the scan_ccache() problem and go back to slow one-cred-per-request iteration, or we might implement a more complicated buffered solution. This might look like:

(name, size, token) -> (token, cred, ...)

where "token" is a continuation token, initially empty. In either of those scenarios, we'd be abandoning KCM_OP_GET_CRED_LIST. I think I'm okay with that because we could drop the client logic immediately; sssd could decide whether to drop the server-side logic or phase it out.

pbrezina · 2021-03-04T14:09:18Z

I'd like to see the notification change split into a separate PR, since it might be difficult to follow the discussion of both changes at once. For the iteration performance improvements:

Done. Notifications PR: #1167

I'd like to see us follow Apple's lead and use KCM_OP_RETRIEVE for the retrieve operation. We can make the client pass the KRB5_GC_CACHED flag to prevent the Heimdal KCM from going out and getting tickets. I think that would provide a significant speedup to the 200-service scenario, with or without the iteration changes. But it's probably not enough by itself, due to scan_ccache().

What is this operation supposed to do?

It is possible that we might find that storing the entire ccache contents in memory within libkrb5 causes problems for large caches. At that point we might address the scan_ccache() problem and go back to slow one-cred-per-request iteration, or we might implement a more complicated buffered solution. This might look like:
(name, size, token) -> (token, cred, ...)
where "token" is a continuation token, initially empty. In either of those scenarios, we'd be abandoning KCM_OP_GET_CRED_LIST. I think I'm okay with that because we could drop the client logic immediately; sssd could decide whether to drop the server-side logic or phase it out.

I'm fine with this. However, my knowledge is not good enough to understand your proposal. Could you please elaborate on that more?

greghudson · 2021-03-04T16:51:55Z

KCM_OP_RETRIEVE accepts a cache name, a 32-bit flags value, and a cred tag (the same inputs as KCM_OP_REMOVE_CRED). On success, it responds with a credential matching the tag. In essence, it matches the semantics of krb5_cc_retrieve_cred().

(Forget what I said about KRB5_GC_CACHED. Heimdal's KCM does look for that flag, but that's a botch; the flags should be from the KRB5_TC number space, not the KRB5_GC number space, and KRB5_GC_CACHED has the same value as KRB5_TC_MATCH_IS_SKEY. So we should just send the flags we got from the caller.)

I'm not suggesting that anyone implement the (name, size, token) -> (token, cred, ...) proposal right now. I'm just thinking ahead to what it would mean for this code if we later find that we need a compromise between minimizing IPC and limiting memory usage.

pbrezina · 2021-03-05T13:05:36Z

Do I understand it correctly that you suggest to implement KCM_OP_RETRIEVE in KCM and then use it in krb5 gssapi instead of iteration? And do you suggest to drop GET_CRED_LIST or keep it?

greghudson · 2021-03-05T16:49:40Z

I am suggesting to implement KCM_OP_RETRIEVE in KCM and see how that affects the performance of the 200-service test, with and without GET_CRED_LIST. I'm not suggesting a change to the higher-level code at this time.

pbrezina · 2021-03-08T10:55:03Z

Ah, I see that KCM code calls k5_cc_retrieve_cred_default which does the iteration. I was not aware of this before. I will implement RETRIEVE and see what happens. Thank you.

pbrezina · 2021-03-10T13:11:08Z

Establishing security context if the ticket is not yet available takes in my tests two iterations (GET_CRED_UUID_LIST) and three RETRIEVE (the first two yields no match). This is the full op flow:

KCM operation GET_CACHE_UUID_LIST
KCM operation GET_DEFAULT_CACHE
KCM operation GET_PRINCIPAL
KCM operation GET_KDC_OFFSET
KCM operation GET_CRED_UUID_LIST
KCM operation GET_CRED_BY_UUID * n
KCM operation ...
KCM operation GET_CACHE_UUID_LIST
KCM operation GET_DEFAULT_CACHE
KCM operation GET_PRINCIPAL
KCM operation GET_PRINCIPAL
KCM operation GET_PRINCIPAL
KCM operation GET_KDC_OFFSET
KCM operation GET_CRED_UUID_LIST
KCM operation GET_CRED_BY_UUID * n
KCM operation ...
KCM operation GET_PRINCIPAL
KCM operation RETRIEVE
KCM operation RETRIEVE
KCM operation RETRIEVE
KCM operation STORE

200 ticket tests result:

Code changes	Duration
none	1m51.725s
RETRIEVE	0m31.744s
GET_CRED_LIST	0m9.992s
GET_CRED_LIST + RETRIEVE	0m9.012s

Using RETRIEVE makes it a lot faster, but not enough. Using GET_CRED_LIST makes the two iterations way faster and it is the same performance as using GET_CRED_LIST + RETRIEVE because the RETRIEVE operation only brings the same logic from libkrb5 to kcm so we just avoid sending all credentials through the socket, otherwise it is the same.

This tells me that we can not avoid making the iteration more performant.

greghudson · 2021-03-10T16:05:54Z

RETRIEVE should be able to perform a fast lookup within the KCM daemon, at least in principle. But perhaps that's not important. There is a seven-second performance gap between KCM+iteration and KEYRING for the 200-service scenario. If those seven seconds are spent mostly on IPC overhead not related to iteration (GET_PRINCIPAL, GET_KDC_OFFSET, etc.) then it won't matter how much retrievals are sped up (although as the value 200 increases, iteration cost should eventually dominate).

While investigating possible reasons for there being two non-retrieval iterations, I was reminded of krb5_cccol_have_content(), which stops at the first non-config cred it sees. Unfortunately, neither the old nor the new iteration design can make this operation fast for very large ccaches, because the old GET_CACHE_UUID_LIST is O(n). To eliminate this bottleneck we would have to introduce a new ccache operation and KCM opcode to check for the presence of non-config creds in a cache. (I don't know whether your test program is running into this operation; it depends on how the GSSAPI caller is written.)

greghudson · 2021-03-11T00:52:57Z

I pushed some changes, primarily to fix potential memory issues.

pbrezina · 2021-03-11T11:19:18Z

RETRIEVE should be able to perform a fast lookup within the KCM daemon, at least in principle. But perhaps that's not important.

RETRIEVE in KCM is fast, all operations are fast on their own. The problem is that you need to run iteration 5 times
(without RETRIEVE in KCM) or 2 times (with RETRIEVE implemented), which means 5 * (n + 1) or 2 * (n + 1) requests where n is the number of existing credentials which makes the whole process slow. The more credentials you have, the slower it gets. This is kind of expected, but we need to bring it to a point where it is usable. Original KCM was 900 times slower then KEYRING, with KCM bottlenecks fixed it is 60 times slower.

There is a seven-second performance gap between KCM+iteration and KEYRING for the 200-service scenario. If those seven seconds are spent mostly on IPC overhead not related to iteration (GET_PRINCIPAL, GET_KDC_OFFSET, etc.) then it won't matter how much retrievals are sped up (although as the value 200 increases, iteration cost should eventually dominate).

Constant number of requests is fine. The iteration is indeed the problem. There might be still space to reduce the number of requests, e.g. we probably can avoid calling the same operations multiple times since it should yield the same result anyway. But this does not matter if we keep iterating over the cache by sending one request per credential.

While investigating possible reasons for there being two non-retrieval iterations, I was reminded of krb5_cccol_have_content(), which stops at the first non-config cred it sees. Unfortunately, neither the old nor the new iteration design can make this operation fast for very large ccaches, because the old GET_CACHE_UUID_LIST is O(n).

Iteration in single process is fast. Iteration using IO is slow. GET_CRED_UUID_LIST is called from three places:

GET_CRED_UUID_LIST -> krb5_cccol_have_content -> krb5_gss_acquire_cred
- IIUC checks if there is at least one non-config ticket
GET_CRED_UUID_LIST -> krb5_cc_start_seq_get -> scan_ccache -> kg_cred_resolve -> krb5_gss_acquire_cred
- This seems to lookup TGT but there seems to be more to it that I don't fully understand from a quick look.
The rest of the GET_CRED_UUID_LIST calls are in fact RETRIEVE operations so we can already eliminate them

To eliminate this bottleneck we would have to introduce a new ccache operation and KCM opcode to check for the presence of non-config creds in a cache. (I don't know whether your test program is running into this operation; it depends on how the GSSAPI caller is written.)

Yes, it hits the operation, see above. But since control creds are stored at the end in my scenario it iterates basically over every ticket.

I think what we have with GET_CRED_LIST (current patch) is the top performance we can achieve without further reducing the number of requests and I think it is good enough: its 5 times slower then keyring, it is still a lot but way better then 60 times slower what we have now. The advantage is that also klist or other programs that needs iteration will be fast as well. Disadvantage is bigger memory consumption but we just move it from KCM to the target program so total memory usage is the same.

If you prefer to avoid this, I think we can move 1 (krb5_cccol_have_content), 2 (whatever iteration in scan_ccache does), 3 (retrieve) to KCM. It will however require to publish lots of internal API so we don't reimplement the logic there. The RETRIEVE already requires three functions - mcred (un)marshall and (modified) krb5_cc_retrieve_cred_match.

I pushed some changes, primarily to fix potential memory issues.

Thank you. Does this mean that you would like to continue to implement this GET_CRED_LIST operation?

greghudson · 2021-03-11T15:05:48Z

But since control creds are stored at the end in my scenario it [krb5_cccol_have_content] iterates basically over every ticket.

It should stop at the first non-config credential. It's not looking for anything specific, just one real cred.

Does this mean that you would like to continue to implement this GET_CRED_LIST operation?

I think it's a significant improvement, and (as I said before) we can always remove the client code later if we make better improvements for large ccaches. The old iteration method is also O(n) in memory usage due to the cred UUID list, although the constant factor isn't as large.

pbrezina · 2021-03-15T12:05:46Z

Thank you. Are there any changes you'd like me to do or is this ready to be merged?

greghudson · 2021-03-17T20:45:30Z

I added test coverage of the non-fallback iteration path. I think this can be merged, but perhaps @frozencemetery can have a look at my changes first.

frozencemetery

Thanks, this looks good by me.

(I considered suggesting caching the fallback result, but I don't think that's worth doing.)

For large caches, one IPC operation per credential dominates the cost of iteration. Instead transfer the whole list of credentials to the client in one IPC operation. Add optional support for the new opcode to the test KCM server to allow testing of the main and fallback code paths. [ghudson@mit.edu: fixed memory leaks and potential memory errors; adjusted code style and comments; rewrote commit message; added kcmserver.py support and tests] ticket: 8990 (new)

pbrezina force-pushed the kcm branch from 161726a to feb2b08 Compare March 4, 2021 14:04

pbrezina changed the title ~~kcm: improve performance and support notifications~~ kcm: improve performance Mar 8, 2021

greghudson force-pushed the kcm branch 2 times, most recently from 7ed2916 to c92f417 Compare March 11, 2021 00:51

greghudson force-pushed the kcm branch from c92f417 to 494b4ae Compare March 17, 2021 20:44

greghudson force-pushed the kcm branch from 494b4ae to 9348b14 Compare March 18, 2021 01:46

frozencemetery approved these changes Mar 18, 2021

View reviewed changes

greghudson force-pushed the kcm branch 3 times, most recently from be33ff2 to f53609f Compare March 22, 2021 05:59

greghudson force-pushed the kcm branch from f53609f to 81bdb47 Compare March 22, 2021 06:10

greghudson merged commit 81bdb47 into krb5:master Mar 22, 2021

This was referenced Mar 22, 2021

kcm: implement GET_CRED_LIST for faster iteration SSSD/sssd#5545

Closed

kcm: add GET_CRED_LIST for faster iteration SSSD/sssd#5546

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

kcm: improve performance #1165

kcm: improve performance #1165

pbrezina commented Mar 2, 2021

pbrezina commented Mar 2, 2021

frozencemetery commented Mar 2, 2021

greghudson commented Mar 2, 2021

pbrezina commented Mar 3, 2021

greghudson commented Mar 3, 2021

pbrezina commented Mar 4, 2021

greghudson commented Mar 4, 2021

pbrezina commented Mar 5, 2021

greghudson commented Mar 5, 2021 •

edited

Loading

pbrezina commented Mar 8, 2021

pbrezina commented Mar 10, 2021

greghudson commented Mar 10, 2021 •

edited

Loading

greghudson commented Mar 11, 2021

pbrezina commented Mar 11, 2021

greghudson commented Mar 11, 2021

pbrezina commented Mar 15, 2021

greghudson commented Mar 17, 2021 •

edited

Loading

frozencemetery left a comment

kcm: improve performance #1165

kcm: improve performance #1165

Conversation

pbrezina commented Mar 2, 2021

kcm: use GET_CRED_LIST where available

cc: add krb5_cc_notification_path

pbrezina commented Mar 2, 2021

frozencemetery commented Mar 2, 2021

greghudson commented Mar 2, 2021

pbrezina commented Mar 3, 2021

greghudson commented Mar 3, 2021

pbrezina commented Mar 4, 2021

greghudson commented Mar 4, 2021

pbrezina commented Mar 5, 2021

greghudson commented Mar 5, 2021 • edited Loading

pbrezina commented Mar 8, 2021

pbrezina commented Mar 10, 2021

greghudson commented Mar 10, 2021 • edited Loading

greghudson commented Mar 11, 2021

pbrezina commented Mar 11, 2021

greghudson commented Mar 11, 2021

pbrezina commented Mar 15, 2021

greghudson commented Mar 17, 2021 • edited Loading

frozencemetery left a comment

Choose a reason for hiding this comment

greghudson commented Mar 5, 2021 •

edited

Loading

greghudson commented Mar 10, 2021 •

edited

Loading

greghudson commented Mar 17, 2021 •

edited

Loading