Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 1894267: Remove the rollbackcopier #494

Merged

Conversation

ironcladlou
Copy link
Contributor

The rollbackcopier container exists to support a 4.4->4.5 upgrade. Once
4.5 is running, the container is no longer needed provided a 4.4 backup
exists prior to the upgrade, which the rollbackcopier basically assures.

The copier itself incurs a performance penalty that 4.5 users shouldn't
be subject to once the upgrade is complete.

The rollbackcopier container exists to support a 4.4->4.5 upgrade. Once
4.5 is running, the container is no longer needed provided a 4.4 backup
exists prior to the upgrade, which the rollbackcopier basically assures.

The copier itself incurs a performance penalty that 4.5 users shouldn't
be subject to once the upgrade is complete.
@openshift-ci-robot
Copy link

@ironcladlou: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

In response to this:

Remove the rollbackcopier

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 10, 2020
@ironcladlou
Copy link
Contributor Author

@hexfusion
Copy link
Contributor

/retest

Copy link
Contributor

@hexfusion hexfusion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

This should be safe as we still have the command in 4.5.

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Nov 10, 2020
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hexfusion, ironcladlou

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [hexfusion,ironcladlou]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

15 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@hexfusion
Copy link
Contributor

/override ci/prow/e2e-aws-disruptive

@openshift-ci-robot
Copy link

@hexfusion: Overrode contexts on behalf of hexfusion: ci/prow/e2e-aws-disruptive

In response to this:

/override ci/prow/e2e-aws-disruptive

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/bugzilla refresh

Recalculating validity in case the underlying Bugzilla bug has changed.

@openshift-ci-robot
Copy link

@openshift-bot: This pull request references Bugzilla bug 1894267, which is invalid:

  • expected dependent Bugzilla bug 1896894 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but it is ON_QA instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/bugzilla refresh

Recalculating validity in case the underlying Bugzilla bug has changed.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@hexfusion
Copy link
Contributor

@sdodson this PR removes rollbackcopier from 4.5. The net gain is that the cluster does not perform snapshots every hour without users consent. We have seen some indications that these snapshots can lead to throttling events on certain providers.

As this is already removed from 4.6 I am going to manually give greens on valid Bugzilla.

@hexfusion hexfusion added bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Nov 18, 2020
@ecordell
Copy link

ecordell commented Nov 18, 2020

Could there not be users that are relying on this behavior? It seems strange to completely remove a component in a z-stream.

@hexfusion / @ironcladlou

@hexfusion
Copy link
Contributor

hexfusion commented Nov 18, 2020

@ecordell this was undocumented thus not a supported feature so I would not expect folks to be relying on it. We implemented it as a fail-safe for 4.4 -> 4.5 upgrades.

@ironcladlou
Copy link
Contributor Author

Could there not be users that are relying on this behavior? It seems strange to completely remove a component in a z-stream.

Anything's possible, but the facts are we have substantial evidence the function is causing performance problems for real users today, and we have no evidence of any users relying on it. If we became aware of users relying on it, we would be instructing them to cease relying upon it, and would feel justified in doing so because we never intended or communicated support for such uses in the first place. It was a single purpose tool to mitigate risk in a specific version upgrade, and beyond that it has only deleterious effects on a cluster.

Hope that helps justify the change.

@ecordell
Copy link

4.4 -> 4.5 upgrades are still supported - is there some reason why this is no longer needed this as a fail-safe when upgrading to newer 4.5.z?

@ecordell
Copy link

@hexfusion and I spoke about this some, recording the discussion here:

  • we don't think users are depending on this, but we don't have information to know for sure
  • the worst case is that someone was relying on this for etcd backups in 4.5, updates to a new 4.5.z with this removed, and sometime later has a disaster that requires etcd recovery only to find they don't have them.
  • the bug is currently low / medium, but the impact is really quite high/expensive for users that are hitting the io limits in their cloud provider

a couple of options to move forward:

  • gather some evidence that this isn't something a customer could reasonably have relied upon, and go ahead and remove
    or
  • look into options to disable this for any user that is having problems due to large etcd + small io budgets, with full removal in 4.6

@sdodson sdodson added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Nov 20, 2020
@sdodson
Copy link
Member

sdodson commented Nov 20, 2020

cherry-pick-approved on @ecordell behalf as he's got permissions issues on this repo

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

15 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit f1b7301 into openshift:release-4.5 Nov 21, 2020
@openshift-ci-robot
Copy link

@ironcladlou: All pull requests linked via external trackers have merged:

Bugzilla bug 1894267 has been moved to the MODIFIED state.

In response to this:

Bug 1894267: Remove the rollbackcopier

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants