Bug 1829836: Fix performance issues when deploying Metering on Ansible 2.9.6+ versions. #1200
@timflannagan1: This pull request references Bugzilla bug 1829836, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
Not too sure what's happening: I was able to get this working locally on a 4.5 cluster, but I'm having a hard time getting it working on a 4.4 one.
…h k8s_info. The `k8s_facts` module is being deprecated in 2.9 in favor of the [`k8s_info` module](https://docs.ansible.com/ansible/latest/modules/k8s_info_module.html)
In Ansible 2.9.6, changes were made to how the template caching mechanism works. In previous versions, all variables (i.e. Jinja2 expressions) were being cached, instead of the intended behavior, which is that only a single variable gets cached. This was obviously problematic, especially when generating a password more than once: the first password result was cached, and any subsequent call to generate a password would assign the value of the first, cached password to the corresponding variable.

In the case of Metering, our Ansible role relies heavily on lazy evaluation: Ansible evaluates variables at the last possible second. When this change landed in 2.9.6, Metering saw a significant performance degradation, with role execution going from an average of 3-7 minutes to 45+ minutes.

This was because we were relying on the buggy implementation of the template caching mechanism. In our current implementation, we pass a variable to the `name` field of the `k8s_info` module. This meant that this module needed to make a filesystem lookup call many, many times for a particular value stored in the `meteringconfig_spec` variable, which is a large dictionary and essentially serves as the single source of truth that Metering references.
A more concrete example:

```yaml
TASK [meteringconfig : Check for the existence of the Presto TLS secret] *******
task path: /opt/ansible/roles/meteringconfig/tasks/configure_presto_tls.yml:15
Wednesday 29 April 2020 21:43:56 +0000 (0:00:00.380) 0:00:38.695 *******
File lookup using /opt/ansible/charts/openshift-metering/values.yaml as file
File lookup using /opt/ansible/charts/openshift-metering/values.yaml as file
File lookup using /opt/ansible/charts/openshift-metering/values.yaml as file
File lookup using /opt/ansible/charts/openshift-metering/values.yaml as file
File lookup using /opt/ansible/charts/openshift-metering/values.yaml as file
File lookup using /opt/ansible/charts/openshift-metering/values.yaml as file
File lookup using /opt/ansible/charts/openshift-metering/values.yaml as file
File lookup using /opt/ansible/charts/openshift-metering/values.yaml as file
File lookup using /opt/ansible/charts/openshift-metering/values.yaml as file
File lookup using /opt/ansible/charts/openshift-metering/values.yaml as file
File lookup using /opt/ansible/charts/openshift-metering/values.yaml as file
...
```

That variable in turn relies on the value of `meteringconfig_default_values`, which is a dictionary containing the default Helm chart values. Previously, the result of that expression was cached. Now, due to how lazy evaluation works, `meteringconfig_default_values` is re-evaluated every time it's used, causing the aforementioned performance issues.

The workaround is to cache or finalize, for lack of a better phrase, the result of the `meteringconfig_default_values` expression. This helps performance as we're no longer re-evaluating this variable every time we want to template something, and all changes go through the `meteringconfig_spec` dictionary, so there's no risk in creating this variable early in the role.
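For illustration, here is a minimal sketch of what "finalizing" the variable could look like. It assumes the defaults are loaded with the `file` lookup (which the "File lookup using ..." log lines suggest) and parsed with `from_yaml`; the task name and the exact expression are assumptions, not necessarily what the role does.

```yaml
# Sketch only: the real role may define and load this variable differently.
#
# Before (assumed): a lazily evaluated role variable. The file lookup can
# re-run every time something that depends on it is templated.
#
#   meteringconfig_default_values: "{{ lookup('file', '/opt/ansible/charts/openshift-metering/values.yaml') | from_yaml }}"
#
# After: evaluate the expression once, early in the role, and store the
# result as a fact so later references reuse the stored dictionary.
- name: Finalize the default chart values  # hypothetical task name
  set_fact:
    meteringconfig_default_values: "{{ lookup('file', '/opt/ansible/charts/openshift-metering/values.yaml') | from_yaml }}"
```

`set_fact` templates its arguments when the task runs and stores the resulting value, so anything that later references `meteringconfig_default_values` gets the already-parsed dictionary rather than triggering another lookup.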
Also included in the changeset is the switch to the [`k8s_info` module](https://docs.ansible.com/ansible/latest/modules/k8s_info_module.html), as the `k8s_facts` module was deprecated in 2.9 and `k8s_info` has the same usage.
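As a rough sketch of that rename, using the task name from the log output above: only the module name changes, while the parameters stay the same. The `meteringconfig_spec` key path, the `meta.namespace` variable, and the registered variable name are assumptions made for the example.

```yaml
# Sketch only: key path, namespace variable, and register name are hypothetical.
- name: Check for the existence of the Presto TLS secret
  k8s_info:  # previously: k8s_facts
    api_version: v1
    kind: Secret
    name: "{{ meteringconfig_spec.presto.spec.config.tls.secretName }}"  # hypothetical key path
    namespace: "{{ meta.namespace }}"  # assumed operator-provided variable
  register: presto_tls_secret  # hypothetical name; matches are under .resources
```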
/hold cancel
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: EmilyM1, timflannagan1
/cherry-pick release-4.4
@timflannagan1: once the present PR merges, I will cherry-pick it on top of release-4.4 in a new PR and assign it to you.
@timflannagan1: All pull requests linked via external trackers have merged: kube-reporting/metering-operator#1200. Bugzilla bug 1829836 has been moved to the MODIFIED state.
/cherry-pick release-4.4
This PR was merged in the time period when ci-operator was mistakenly reporting failed tests as passing. If this repository has ci-operator jobs, please inspect their test results even if passing, and consider the need for fixing or even reverting.