Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kcm: improve performance #1165

Merged
merged 1 commit into from
Mar 22, 2021
Merged

kcm: improve performance #1165

merged 1 commit into from
Mar 22, 2021

Conversation

pbrezina
Copy link
Contributor

@pbrezina pbrezina commented Mar 2, 2021

This is based on the following krbdev thread:
https://mailman.mit.edu/pipermail/krbdev/2021-February/013424.html

  • @greghudson Is this something MIT Kerberos would accept?
  • @nicowilliams Is this something that could be accepted for KCM protocol?

kcm: use GET_CRED_LIST where available

This KCM operation will returned all credentials of the selected
ccache. This change will remove unnecessary IO operations that
are a bottleneck for large ccaches, mostly for GSSAPI and
applications that iterates over credentials often. The implementation
is backwards compatible - if the call is not available or fails, it
will fallback to GET_CRED_UUID_LIST.

cc: add krb5_cc_notification_path

This adds new API function krb5_cc_notification_path. The purpose
of this call is to create a file that can be watched for events
using inotify or similar tools.

It is currently documented only in the header file and implemented only
for KCM. I will finish it once we agree on the API.

The KCM implementation introduces new operation
GET_CACHE_NOTIFICATION_PATH (ccache) -> (path).

@pbrezina
Copy link
Contributor Author

pbrezina commented Mar 2, 2021

About the performance improvements. I used GSSAPI to acquire and establish security context with 200 services. SSSD KCM took 33 minutes as a starting point. When I addressed bottlenecks on our side, I got to 2 minutes. When addressing IO bottleneck with this patch I got to 15 seconds. For comparison, keyring took 2 seconds.

@frozencemetery
Copy link
Contributor

FWIW: this would be more interesting as a proof of concept if you showed it implemented for a daemon as well. Best would probably be kcmserver.py, while showing it for SSSD is probably better than nothing.

@greghudson
Copy link
Member

Apple has added twelve KCM opcodes in sequential order (per their latest Heimdal snapshot from opensource.apple.com). Any opcodes we add will need to start at some offset to avoid conflicts. opcodes are unsigned 16-bit values; I suggest a base of 13000 ('M' for 'MIT' being the 13th letter).

It seems like FILE could easily support notification path by returning a copy of the residual.

@pbrezina
Copy link
Contributor Author

pbrezina commented Mar 3, 2021

FWIW: this would be more interesting as a proof of concept if you showed it implemented for a daemon as well. Best would probably be kcmserver.py, while showing it for SSSD is probably better than nothing.

SSSD part is in https://github.com/pbrezina/sssd/tree/kcm. GET_CRED_LIST is finished, GET_NOTIFICATIONS_PATH is just PoC so far.

@greghudson
Copy link
Member

I'd like to see the notification change split into a separate PR, since it might be difficult to follow the discussion of both changes at once. For the iteration performance improvements:

I'd like to see us follow Apple's lead and use KCM_OP_RETRIEVE for the retrieve operation. We can make the client pass the KRB5_GC_CACHED flag to prevent the Heimdal KCM from going out and getting tickets. I think that would provide a significant speedup to the 200-service scenario, with or without the iteration changes. But it's probably not enough by itself, due to scan_ccache().

It is possible that we might find that storing the entire ccache contents in memory within libkrb5 causes problems for large caches. At that point we might address the scan_ccache() problem and go back to slow one-cred-per-request iteration, or we might implement a more complicated buffered solution. This might look like:

(name, size, token) -> (token, cred, ...)

where "token" is a continuation token, initially empty. In either of those scenarios, we'd be abandoning KCM_OP_GET_CRED_LIST. I think I'm okay with that because we could drop the client logic immediately; sssd could decide whether to drop the server-side logic or phase it out.

@pbrezina
Copy link
Contributor Author

pbrezina commented Mar 4, 2021

I'd like to see the notification change split into a separate PR, since it might be difficult to follow the discussion of both changes at once. For the iteration performance improvements:

Done. Notifications PR: #1167

I'd like to see us follow Apple's lead and use KCM_OP_RETRIEVE for the retrieve operation. We can make the client pass the KRB5_GC_CACHED flag to prevent the Heimdal KCM from going out and getting tickets. I think that would provide a significant speedup to the 200-service scenario, with or without the iteration changes. But it's probably not enough by itself, due to scan_ccache().

What is this operation supposed to do?

It is possible that we might find that storing the entire ccache contents in memory within libkrb5 causes problems for large caches. At that point we might address the scan_ccache() problem and go back to slow one-cred-per-request iteration, or we might implement a more complicated buffered solution. This might look like:

(name, size, token) -> (token, cred, ...)

where "token" is a continuation token, initially empty. In either of those scenarios, we'd be abandoning KCM_OP_GET_CRED_LIST. I think I'm okay with that because we could drop the client logic immediately; sssd could decide whether to drop the server-side logic or phase it out.

I'm fine with this. However, my knowledge is not good enough to understand your proposal. Could you please elaborate on that more?

@greghudson
Copy link
Member

KCM_OP_RETRIEVE accepts a cache name, a 32-bit flags value, and a cred tag (the same inputs as KCM_OP_REMOVE_CRED). On success, it responds with a credential matching the tag. In essence, it matches the semantics of krb5_cc_retrieve_cred().

(Forget what I said about KRB5_GC_CACHED. Heimdal's KCM does look for that flag, but that's a botch; the flags should be from the KRB5_TC number space, not the KRB5_GC number space, and KRB5_GC_CACHED has the same value as KRB5_TC_MATCH_IS_SKEY. So we should just send the flags we got from the caller.)

I'm not suggesting that anyone implement the (name, size, token) -> (token, cred, ...) proposal right now. I'm just thinking ahead to what it would mean for this code if we later find that we need a compromise between minimizing IPC and limiting memory usage.

@pbrezina
Copy link
Contributor Author

pbrezina commented Mar 5, 2021

Do I understand it correctly that you suggest to implement KCM_OP_RETRIEVE in KCM and then use it in krb5 gssapi instead of iteration? And do you suggest to drop GET_CRED_LIST or keep it?

@greghudson
Copy link
Member

greghudson commented Mar 5, 2021

I am suggesting to implement KCM_OP_RETRIEVE in KCM and see how that affects the performance of the 200-service test, with and without GET_CRED_LIST. I'm not suggesting a change to the higher-level code at this time.

@pbrezina pbrezina changed the title kcm: improve performance and support notifications kcm: improve performance Mar 8, 2021
@pbrezina
Copy link
Contributor Author

pbrezina commented Mar 8, 2021

Ah, I see that KCM code calls k5_cc_retrieve_cred_default which does the iteration. I was not aware of this before. I will implement RETRIEVE and see what happens. Thank you.

@pbrezina
Copy link
Contributor Author

Establishing security context if the ticket is not yet available takes in my tests two iterations (GET_CRED_UUID_LIST) and three RETRIEVE (the first two yields no match). This is the full op flow:

KCM operation GET_CACHE_UUID_LIST
KCM operation GET_DEFAULT_CACHE
KCM operation GET_PRINCIPAL
KCM operation GET_KDC_OFFSET
KCM operation GET_CRED_UUID_LIST
KCM operation GET_CRED_BY_UUID * n
KCM operation ...
KCM operation GET_CACHE_UUID_LIST
KCM operation GET_DEFAULT_CACHE
KCM operation GET_PRINCIPAL
KCM operation GET_PRINCIPAL
KCM operation GET_PRINCIPAL
KCM operation GET_KDC_OFFSET
KCM operation GET_CRED_UUID_LIST
KCM operation GET_CRED_BY_UUID * n
KCM operation ...
KCM operation GET_PRINCIPAL
KCM operation RETRIEVE
KCM operation RETRIEVE
KCM operation RETRIEVE
KCM operation STORE

200 ticket tests result:

Code changes Duration
none 1m51.725s
RETRIEVE 0m31.744s
GET_CRED_LIST 0m9.992s
GET_CRED_LIST + RETRIEVE 0m9.012s

Using RETRIEVE makes it a lot faster, but not enough. Using GET_CRED_LIST makes the two iterations way faster and it is the same performance as using GET_CRED_LIST + RETRIEVE because the RETRIEVE operation only brings the same logic from libkrb5 to kcm so we just avoid sending all credentials through the socket, otherwise it is the same.

This tells me that we can not avoid making the iteration more performant.

@greghudson
Copy link
Member

greghudson commented Mar 10, 2021

RETRIEVE should be able to perform a fast lookup within the KCM daemon, at least in principle. But perhaps that's not important. There is a seven-second performance gap between KCM+iteration and KEYRING for the 200-service scenario. If those seven seconds are spent mostly on IPC overhead not related to iteration (GET_PRINCIPAL, GET_KDC_OFFSET, etc.) then it won't matter how much retrievals are sped up (although as the value 200 increases, iteration cost should eventually dominate).

While investigating possible reasons for there being two non-retrieval iterations, I was reminded of krb5_cccol_have_content(), which stops at the first non-config cred it sees. Unfortunately, neither the old nor the new iteration design can make this operation fast for very large ccaches, because the old GET_CACHE_UUID_LIST is O(n). To eliminate this bottleneck we would have to introduce a new ccache operation and KCM opcode to check for the presence of non-config creds in a cache. (I don't know whether your test program is running into this operation; it depends on how the GSSAPI caller is written.)

@greghudson
Copy link
Member

I pushed some changes, primarily to fix potential memory issues.

@pbrezina
Copy link
Contributor Author

RETRIEVE should be able to perform a fast lookup within the KCM daemon, at least in principle. But perhaps that's not important.

RETRIEVE in KCM is fast, all operations are fast on their own. The problem is that you need to run iteration 5 times
(without RETRIEVE in KCM) or 2 times (with RETRIEVE implemented), which means 5 * (n + 1) or 2 * (n + 1) requests where n is the number of existing credentials which makes the whole process slow. The more credentials you have, the slower it gets. This is kind of expected, but we need to bring it to a point where it is usable. Original KCM was 900 times slower then KEYRING, with KCM bottlenecks fixed it is 60 times slower.

There is a seven-second performance gap between KCM+iteration and KEYRING for the 200-service scenario. If those seven seconds are spent mostly on IPC overhead not related to iteration (GET_PRINCIPAL, GET_KDC_OFFSET, etc.) then it won't matter how much retrievals are sped up (although as the value 200 increases, iteration cost should eventually dominate).

Constant number of requests is fine. The iteration is indeed the problem. There might be still space to reduce the number of requests, e.g. we probably can avoid calling the same operations multiple times since it should yield the same result anyway. But this does not matter if we keep iterating over the cache by sending one request per credential.

While investigating possible reasons for there being two non-retrieval iterations, I was reminded of krb5_cccol_have_content(), which stops at the first non-config cred it sees. Unfortunately, neither the old nor the new iteration design can make this operation fast for very large ccaches, because the old GET_CACHE_UUID_LIST is O(n).

Iteration in single process is fast. Iteration using IO is slow. GET_CRED_UUID_LIST is called from three places:

  1. GET_CRED_UUID_LIST -> krb5_cccol_have_content -> krb5_gss_acquire_cred
    • IIUC checks if there is at least one non-config ticket
  2. GET_CRED_UUID_LIST -> krb5_cc_start_seq_get -> scan_ccache -> kg_cred_resolve -> krb5_gss_acquire_cred
    • This seems to lookup TGT but there seems to be more to it that I don't fully understand from a quick look.
  3. The rest of the GET_CRED_UUID_LIST calls are in fact RETRIEVE operations so we can already eliminate them

To eliminate this bottleneck we would have to introduce a new ccache operation and KCM opcode to check for the presence of non-config creds in a cache. (I don't know whether your test program is running into this operation; it depends on how the GSSAPI caller is written.)

Yes, it hits the operation, see above. But since control creds are stored at the end in my scenario it iterates basically over every ticket.

I think what we have with GET_CRED_LIST (current patch) is the top performance we can achieve without further reducing the number of requests and I think it is good enough: its 5 times slower then keyring, it is still a lot but way better then 60 times slower what we have now. The advantage is that also klist or other programs that needs iteration will be fast as well. Disadvantage is bigger memory consumption but we just move it from KCM to the target program so total memory usage is the same.

If you prefer to avoid this, I think we can move 1 (krb5_cccol_have_content), 2 (whatever iteration in scan_ccache does), 3 (retrieve) to KCM. It will however require to publish lots of internal API so we don't reimplement the logic there. The RETRIEVE already requires three functions - mcred (un)marshall and (modified) krb5_cc_retrieve_cred_match.

I pushed some changes, primarily to fix potential memory issues.

Thank you. Does this mean that you would like to continue to implement this GET_CRED_LIST operation?

@greghudson
Copy link
Member

But since control creds are stored at the end in my scenario it [krb5_cccol_have_content] iterates basically over every ticket.

It should stop at the first non-config credential. It's not looking for anything specific, just one real cred.

Does this mean that you would like to continue to implement this GET_CRED_LIST operation?

I think it's a significant improvement, and (as I said before) we can always remove the client code later if we make better improvements for large ccaches. The old iteration method is also O(n) in memory usage due to the cred UUID list, although the constant factor isn't as large.

@pbrezina
Copy link
Contributor Author

Thank you. Are there any changes you'd like me to do or is this ready to be merged?

@greghudson
Copy link
Member

greghudson commented Mar 17, 2021

I added test coverage of the non-fallback iteration path. I think this can be merged, but perhaps @frozencemetery can have a look at my changes first.

Copy link
Contributor

@frozencemetery frozencemetery left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks good by me.

(I considered suggesting caching the fallback result, but I don't think that's worth doing.)

For large caches, one IPC operation per credential dominates the cost
of iteration.  Instead transfer the whole list of credentials to the
client in one IPC operation.

Add optional support for the new opcode to the test KCM server to
allow testing of the main and fallback code paths.

[ghudson@mit.edu: fixed memory leaks and potential memory errors;
adjusted code style and comments; rewrote commit message; added
kcmserver.py support and tests]

ticket: 8990 (new)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants