Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(k8s): Improve performance of kubeconfig module #6032

Merged
merged 2 commits into from
Jul 16, 2024

Conversation

alvaroaleman
Copy link
Contributor

@alvaroaleman alvaroaleman commented Jun 11, 2024

This module currently takes about 200 ms when using our ~10MiB kubeconfig. This change improves its performance by:

  • Only parsing the file once: Reduces time to around 130 ms
  • (Naively) checking if the content is yaml or json and potentially parse as the latter, as that seems to be much faster, reducing the runtime to ~30ms

All timings with warm page cache.

Description

Motivation and Context

Closes #

Screenshots (if appropriate):

How Has This Been Tested?

  • I have tested using MacOS
  • I have tested using Linux
  • I have tested using Windows

Checklist:

  • I have updated the documentation accordingly.
  • I have updated the tests accordingly.

@alvaroaleman alvaroaleman marked this pull request as draft June 11, 2024 02:18
@alvaroaleman
Copy link
Contributor Author

alvaroaleman commented Jun 11, 2024

Main workflow / Check if config schema is up to date (pull_request) Failing after 33s

How can this be fixed?

@alvaroaleman alvaroaleman marked this pull request as ready for review June 11, 2024 03:42
.map(String::from),
namespace: ctx_yaml["context"]["namespace"]
.as_str()
fn get_current_kube_context_name(document: &JsonOrYaml) -> Option<String> {
Copy link

@alexpovel alexpovel Jun 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fn get_current_kube_context_name(document: &JsonOrYaml) -> Option<String> {
fn get_current_kube_context_name(document: &JsonOrYaml) -> Option<&str> {

This avoids cloning, and thanks to lifetime elision, there's no need to specify lifetime annotations (Rust infers that the return value borrows from document, as there's no other option). The function return value seems to be used in places where read-only access/&str suffices (except for one place, where you then need to clone).

So this change doesn't improve performance in the sense of avoiding clones (before: 1, after: 1), but it's neat to lean on the side of 'least capability necessary'. The effect would start kicking in if this function were called more than once.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 115 to 93
enum JsonOrYaml {
Json(JsonValue),
Yaml(Yaml),
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
enum JsonOrYaml {
Json(JsonValue),
Yaml(Yaml),
}
#[derive(Debug, Clone)]
enum Document {
Json(JsonValue),
Yaml(Yaml),
}
  • AorB is not a great abstraction, I feel like Document is a better name; the variable bound to this value is often called document in this file as well, makes sense!
  • it's good practice to derive the most common traits. Downstream crates wouldn't be able to otherwise. Debug is important for a baseline string representation (for logging, ...). Cloning is convenient. More is possible, but better to implement as-needed (this is subjective). More context: https://rust-lang.github.io/api-guidelines/interoperability.html#types-eagerly-implement-common-traits-c-common-traits (that book is a great resource anyway)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 34 to 30
fn get_kube_ctx_components(
document: &JsonOrYaml,
current_ctx_name: &str,
) -> Option<KubeCtxComponents> {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also a great candidate for borrowing. Something along the lines of... (just as an example)

Suggested change
fn get_kube_ctx_components(
document: &JsonOrYaml,
current_ctx_name: &str,
) -> Option<KubeCtxComponents> {
fn get_kube_ctx_components<'doc>(
document: &'doc Document,
current_ctx_name: &str,
) -> Option<KubeCtxComponents<'doc>> {

with something like

#[derive(Default)]
struct KubeCtxComponents<'doc> {
    user: Option<&'doc str>,
    namespace: Option<&'doc str>,
    cluster: Option<&'doc str>,
}

This:

  • gets rid of clones (String::from)
  • is unlikely to improve performance in meaningful ways, while complicating (== adding more constraints to) the code... but if this PR is about performance, why not give it a shot...
  • might not work (depending on how this change affects other parts of the code)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the tiny, likely not even measurable performance improvement is worth the added complexity

@@ -97,6 +108,12 @@ fn get_aliased_name<'a>(
}
}

#[derive(Debug)]
enum Document {
Copy link
Contributor

@jankatins jankatins Jun 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really happy that this ends up in two different codepath everywhere, just to get a bit more speed in an edge case (That's at least how I perceive this: how often do you have such big kube configs?) :-(

Personally, I would also have expected to get a common data structure from both parser.

If this is acceptable, then I would expect an explanation in the code why this solution was chosen, so one does not have to look up the original commits to understand why two different parsers are used, either here where the data structure is defined or in the parser function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly, the yaml-rust2 crate doesn't support serde yet and serde-yaml is unmaintained, but there are plans for this at the more actively developed branch of yaml-rust2.

Nevertheless, it might be possible to reduce duplication a bit with macro_rules! at least.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried to make this better by using generics to not need to duplicate the logic that finds the relevant fields for the yaml and json case and also left a comment as to why to have the two codepaths.

I do agree kubeconfigs of this size are unusual but it is not "a bit" more speed, its about 77% faster so this makes a really huge difference once you do have such a kubeconfig.

Some(value) => match value.chars().next() {
// Parsing as json is about an order of magnitude faster than parsing
// as yaml, so do that if possible.
Some('{') => serde_json::from_str(&value).ok().map(Document::Json),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to fall back to YamlLoader::load_from_str if serde_json::from_str fails, even if the first character is {.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a fallback and test, ptal

This module currently takes about 200 ms when using our ~10MiB
kubeconfig. This change improves its performance by:
* Only parsing the file once
* (Naively) checking if the content is yaml or json and potentially
  parse as the latter, as that seems to be much faster
@andytom andytom changed the title perf: Improve performance of kubeconfig module perf(k8s): Improve performance of kubeconfig module Jun 16, 2024
@alvaroaleman
Copy link
Contributor Author

@davidkna gentle ping, any chance you could give this another look, please? Really appreciate it!

Copy link
Member

@davidkna davidkna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@alvaroaleman
Copy link
Contributor Author

@davidkna what is the process to get a change like this actually merged?

@andytom andytom merged commit fae92b2 into starship:master Jul 16, 2024
21 checks passed
@andytom
Copy link
Member

andytom commented Jul 16, 2024

Thank you for your contribution @alvaroaleman and thanks for reviewing @davidkna, @jankatins and @alexpovel.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants