Failed to take backup when there were about 500 secrets in local cluster #16

Closed
sowmyav27 opened this issue Aug 29, 2020 · 4 comments
@sowmyav27

What kind of request is this (question/bug/enhancement/feature request): bug

Steps to reproduce (least amount of steps as possible):

  • Create about 500 secrets in local cluster.
  • Create a backup CR.
  • Backup fails with error
  • Backup CR error:
status:
  conditions:
  - lastUpdateTime: "2020-08-29T22:34:25Z"
    message: the server was unable to return a response in the time allotted, but
      may still be processing the request
    reason: Error
    status: "False"
    type: Reconciling
  • backup-restore operator logs:
INFO[2020/08/29 22:33:25] Processing backup b2                         
INFO[2020/08/29 22:33:25] For backup CR b2, filename: default-b2-28e36ff0-c48a-41fd-9d71-5b760c5748e7-2020-08-29T22#33#25Z 
INFO[2020/08/29 22:33:25] Temporary backup path for storing all contents for backup CR b2 is /tmp/default-b2-28e36ff0-c48a-41fd-9d71-5b760c5748e7-2020-08-29T22#33#25Z680481891 
INFO[2020/08/29 22:33:25] Using resourceSet ecm-resource-set for gathering resources for backup CR b2 
INFO[2020/08/29 22:33:25] Gathering resources for backup CR b2         
INFO[2020/08/29 22:33:25] Gathering resources for groupVersion: v1     
INFO[2020/08/29 22:33:25] resource kind namespaces, matched regex ^namespaces$ 
INFO[2020/08/29 22:33:25] Gathering resources for groupVersion: v1     
INFO[2020/08/29 22:33:25] resource kind secrets, matched regex ^Secret$|^serviceaccounts$ 
INFO[2020/08/29 22:33:25] resource kind serviceaccounts, matched regex ^Secret$|^serviceaccounts$ 
ERRO[2020/08/29 22:34:25] error syncing 'default/b2': handler backups: the server was unable to return a response in the time allotted, but may still be processing the request, requeuing 
INFO[2020/08/29 22:34:25] Processing backup b2                         

Expected Result:
Backup should happen with NO error.

Other details that may be helpful:
After this the rancher server also crashed.

Environment information

  • Rancher version (rancher/rancher or rancher/server image tag, or shown bottom left in the UI): master-head - commit id: 8d9cedde9
  • Installation option (single install/HA): HA
@mrajashree
Contributor

I had slightly more than 500 secrets in a cluster and I was able to do a backup of them. Is this always reproducible on your setup?

@sowmyav27
Author

@mrajashree I think these secrets had big values.

@mrajashree
Contributor

mrajashree commented Sep 1, 2020

With the fix, we paginate the list call: instead of one large request, we make multiple calls to the apiserver and fetch a fixed number of resources (currently 200) per call. The overall listing may take a while, but no single call should time out.
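
For readers unfamiliar with the pattern, here is a minimal sketch of paginated listing. It is not the operator's actual code (the operator is written in Go); it only illustrates the limit/continue-token loop that the Kubernetes list API supports. `make_fake_list_call` and `list_all` are hypothetical names, and the fake server below stands in for the apiserver so that the example runs without a cluster.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# A page is (items, continue_token); the token is None on the last page.
Page = Tuple[List[str], Optional[str]]
ListCall = Callable[[int, Optional[str]], Page]


def make_fake_list_call(items: List[str]) -> ListCall:
    """Hypothetical stand-in for an apiserver list endpoint.

    It mimics the limit/continue contract: return at most `limit` items,
    plus a token to resume from, or None when nothing is left.
    """
    def list_call(limit: int, continue_token: Optional[str] = None) -> Page:
        start = int(continue_token) if continue_token else 0
        end = start + limit
        next_token = str(end) if end < len(items) else None
        return items[start:end], next_token

    return list_call


def list_all(list_call: ListCall, page_size: int = 200) -> Iterator[str]:
    """Fetch every item by issuing one bounded request per page."""
    token: Optional[str] = None
    while True:
        page, token = list_call(page_size, token)
        yield from page
        if not token:  # no continue token means this was the last page
            break


# Usage: 500 secrets fetched with the page size mentioned above (200).
secrets = [f"secret-{i}" for i in range(500)]
fake_server = make_fake_list_call(secrets)
request_sizes: List[int] = []


def counting_list_call(limit: int, continue_token: Optional[str] = None) -> Page:
    request_sizes.append(limit)  # record each request made to the fake server
    return fake_server(limit, continue_token)


collected = list(list_all(counting_list_call))
```

Each request returns a bounded amount of data, so a large total (many secrets, or secrets with big values) is spread across several short calls rather than one long one.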

@sowmyav27
Author

Verified on master-head with backup restore operator tag: rc7

  • Create about 500 secrets in local cluster.
  • Create a backup CR (resource set modified to include all namespaces/resources created ^^^).
  • Backup has been created successfully.
  • Backup is available in S3.
  • User is able to restore from this backup successfully.

@maggieliu maggieliu transferred this issue from rancher/rancher Sep 2, 2020

3 participants