Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

status missed even if backup job succeeds (#3648) #3667

Merged
merged 1 commit into from Dec 30, 2020

Conversation

ti-srebot
Copy link
Contributor

cherry-pick #3648 to release-1.1


What problem does this PR solve?

Fix the issue that in some cases some fields, e.g. BACKUPPATH, BACKUPSIZE, COMMITTS, etc. in the status of Backup CR or Restore CR are missed even if the backup or restore job succeed.
Fix #3635

What is changed and how does it work?

Root cause:
Take the Backup CR for example, and the Restore CR has a similar logic.
Currently, two controllers will update the status of Backup CR, one is the backup controller in the tidb-controller-manager Pod, which only updates the conditions slice in the status. The other is the controller in the backup-manager Pod which is the real backup job and will update all the fields in the status.
With the current status update logic, if something wrong with the update API call, the status updater will retrieve the latest Backup resource from the lister and replace the status field totally with the old status, so if the backup-manager Pod updates all the fields in the status and then the tidb-controller-manager Pod updates the conditions slice in the status, all the other fields updated by the backup-manager will be lost.
So we should update the status based on the latest status instead of the old status.

Code changes

  • Has Go code change
  • Has CI related scripts change

Tests

  • Unit test
  • E2E test
  • Manual test
    • Run BackupSchedule with BR
      • The clean of the Backup CRs works as expected
      • The status of the Backup is updated correctly even if an error occurs during the update:
E1230 07:06:10.482472       1 backup_status_updater.go:87] Failed to update backup [ns0/minio-br-2020-12-30t07-06-00], error: Operation cannot be fulfilled on backups.pingcap.com "minio-br-2020-12-30t07-06-00": the object has been modified; please apply your changes to the latest version and try again
I1230 07:06:10.498200       1 backup_status_updater.go:84] Backup: [ns0/minio-br-2020-12-30t07-06-00] updated successfully
# k get bk -n ns0
NAME                                  STATUS     BACKUPPATH                                                     BACKUPSIZE   COMMITTS             AGE
minio-br-2020-12-30t06-26-00          Complete   s3://backup/ns0/br/ns0-pd.ns0-2379-2020-12-30t06-26-00         278 kB       421870849284636673   38m
minio-br-2020-12-30t06-28-00          Complete   s3://backup/ns0/br/ns0-pd.ns0-2379-2020-12-30t06-28-00         278 kB       421870881108918275   36m
minio-br-2020-12-30t06-30-00          Complete   s3://backup/ns0/br/ns0-pd.ns0-2379-2020-12-30t06-30-00         278 kB       421870912146767873   34m
minio-br-2020-12-30t06-32-00          Complete   s3://backup/ns0/br/ns0-pd.ns0-2379-2020-12-30t06-32-00         278 kB       421870943866191875   32m
  • Run BackupSchedule with Dumpling
    • The clean of the Backup CRs works as expected
    • The status of the Backup is updated correctly even if an error occurs during the update:
E1230 07:08:09.489023       1 backup_status_updater.go:87] Failed to update backup [ns0/minio-dump-2020-12-30t07-08-00], error: Operation cannot be fulfilled on backups.pingcap.com "minio-dump-2020-12-30t07-08-00": the object has been modified; please apply your changes to the latest version and try again
I1230 07:08:09.890044       1 backup_status_updater.go:84] Backup: [ns0/minio-dump-2020-12-30t07-08-00] updated successfully
# k get bk -n ns0
NAME                                  STATUS     BACKUPPATH                                                     BACKUPSIZE   COMMITTS             AGE
minio-dump-2020-12-30t06-26-00        Complete   s3://backup/ns0/dumpling/backup-2020-12-30T06:26:16Z.tgz       721 B        421870849520566278   38m
minio-dump-2020-12-30t06-28-00        Complete   s3://backup/ns0/dumpling/backup-2020-12-30T06:28:15Z.tgz       721 B        421870880728809476   36m
minio-dump-2020-12-30t06-30-00        Complete   s3://backup/ns0/dumpling/backup-2020-12-30T06:30:15Z.tgz       721 B        421870912382697476   34m
  • Run Restore with BR and the status of the Restore is updated correctly even if an error occurs during the update:
E1230 06:45:05.777648       1 restore_status_updater.go:79] Failed to update resotre [ns1/br-minio0], error: Operation cannot be fulfilled on restores.pingcap.com "br-minio0": the object has been modified; please apply your changes to the latest version and try again
I1230 06:45:05.794604       1 restore_status_updater.go:76] Restore: [ns1/br-minio0] updated successfully
# k get rt -n ns1
NAME                 STATUS     STARTED                COMPLETED              COMMITTS             AGE
br-minio0            Complete   2020-12-30T06:44:53Z   2020-12-30T06:45:05Z   421870503280771073   28m
  • Run Restore with Lightning and the status of the Restore is updated correctly even if an error occurs during the update:
E1230 06:41:13.551646       1 restore_status_updater.go:79] Failed to update resotre [ns1/light-minio0], error: Operation cannot be fulfilled on restores.pingcap.com "light-minio0": the object has been modified; please apply your changes to the latest version and try again
I1230 06:41:13.568063       1 restore_status_updater.go:76] Restore: [ns1/light-minio0] updated successfully
# k get rt -n ns1
NAME                 STATUS     STARTED                COMPLETED              COMMITTS             AGE
light-minio0         Complete   2020-12-30T06:41:12Z   2020-12-30T06:41:13Z   421870441493430276   34m

Side effects

  • Breaking backward compatibility
  • Other side effects:

Related changes

  • Need to cherry-pick to the release branch
  • Need to update the documentation

Release Notes

Please refer to Release Notes Language Style Guide before writing the release note.

Fix the issue that `commitTS` or other fields in status may be lost even if the backup job succeeds

Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
@ti-srebot
Copy link
Contributor Author

@DanielZhangQD please accept the invitation then you can push to the cherry-pick pull requests.
https://github.com/ti-srebot/tidb-operator/invitations

Copy link
Contributor

@DanielZhangQD DanielZhangQD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@DanielZhangQD
Copy link
Contributor

/merge

@ti-chi-bot
Copy link
Member

This pull request has been accepted and is ready to merge.

Commit hash: 7a6478e

@handlerww
Copy link
Contributor

/merge

@ti-chi-bot
Copy link
Member

@handlerww: /merge is only allowed for the committers in list.

In response to this:

/merge

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/ti-community-prow repository.

@codecov-io
Copy link

Codecov Report

❗ No coverage uploaded for pull request base (release-1.1@489e89b). Click here to learn what that means.
The diff coverage is n/a.

@@              Coverage Diff               @@
##             release-1.1    #3667   +/-   ##
==============================================
  Coverage               ?   52.09%           
==============================================
  Files                  ?      153           
  Lines                  ?    14920           
  Branches               ?        0           
==============================================
  Hits                   ?     7773           
  Misses                 ?     6411           
  Partials               ?      736           
Flag Coverage Δ
unittest 52.09% <0.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

@ti-chi-bot ti-chi-bot merged commit 90ea4e5 into pingcap:release-1.1 Dec 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants