Pb 3707 large resource count support #1345

Merged
merged 4 commits into from
Apr 3, 2023

Conversation

lalat-das

@lalat-das lalat-das commented Mar 26, 2023

What type of PR is this? feature


What this PR does / why we need it: This adds support for backups that involve a large number of resources. pb-3696 explains the root cause and its solution in detail.

Does this PR change a user-facing CRD or CLI?: nope

Is a release note needed?: yes

Issue: Backups with a large resource count fail because of gRPC and other payload size checks.
User Impact: A backup with a large resource count will fail.
Resolution: Stripped the resource info from the backup and restore CRs so that they stay within the gRPC and etcd payload limits.

Does this change need to be cherry-picked to a release branch?: no

Unit test detail
Note: the stork changes are vendored into px-backup, and some changes were made in px-backup to exercise these unit tests.

Backup of 10K resources in a single namespace
Create backup
The size of the backup is around 1.7MB
Pasted Graphic 1
Backup succeeded
Pasted Graphic 2
Backup info
Pasted Graphic 6
The file resource.json shows 10000 resources
Pasted Graphic 7

Multiple namespaces with 35000 resources. The CR size is almost 7MB.

Pasted Graphic 4
Pasted Graphic 8
Pasted Graphic 9

Restore path verification was done for the following test cases:

  1. Restore with retain enabled; the restore ended in partial success, which is expected (10K ConfigMaps)
  2. Restore with replace enabled (13K ConfigMaps)
  3. Restore of a namespace that has both volumes and ConfigMaps

Pasted Graphic 11
Pasted Graphic 10
image
image

@cnbu-jenkins

Can one of the admins verify this patch?

@lalat-das lalat-das force-pushed the pb-3707-largeResourceCountSupport branch from 7ed0ed9 to bd618ee Compare March 29, 2023 07:20
@lalat-das lalat-das force-pushed the pb-3707-largeResourceCountSupport branch 3 times, most recently from 9100ec9 to a9e5ccc Compare March 29, 2023 11:17
@lalat-das lalat-das force-pushed the pb-3707-largeResourceCountSupport branch from a9e5ccc to f860e56 Compare March 30, 2023 02:01
@lalat-das lalat-das added do-not-merge unit-test The change adds or updates a unit test labels Mar 30, 2023
@lalat-das lalat-das force-pushed the pb-3707-largeResourceCountSupport branch from bf0caff to ef5b6eb Compare April 1, 2023 21:04
@lalat-das lalat-das removed do-not-merge unit-test The change adds or updates a unit test labels Apr 1, 2023
@lalat-das lalat-das force-pushed the pb-3707-largeResourceCountSupport branch from 0a462a6 to d29b7f9 Compare April 2, 2023 09:43
Added two fields to applicationbackup application restore CR spec for

Signed-off-by: Lalatendu Das <ldas@purestorage.com>
- if the application backup CR size grows beyond the threshold, then
  strip the resource info
- update the large resource flag to true
- update the resource count so that mongo can refer to it on the px-backup side

Signed-off-by: Lalatendu Das <ldas@purestorage.com>
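
The size check described in this commit could look roughly like the sketch below. It is only an illustration: `ResourceInfo`, `BackupStatus`, their field names, `stripIfTooLarge`, and the threshold values are all hypothetical and are not taken from the stork code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ResourceInfo and BackupStatus are hypothetical, simplified stand-ins for
// the CR types. The field names are assumptions, not stork's actual API.
type ResourceInfo struct {
	Name      string `json:"name"`
	Namespace string `json:"namespace"`
	Kind      string `json:"kind"`
}

type BackupStatus struct {
	Resources            []ResourceInfo `json:"resources,omitempty"`
	ResourceCount        int            `json:"resourceCount"`
	LargeResourceEnabled bool           `json:"largeResourceEnabled"`
}

// stripIfTooLarge serializes the status and, if it exceeds thresholdBytes,
// drops the per-resource list while recording the count and setting the flag.
// It reports whether the list was stripped.
func stripIfTooLarge(s *BackupStatus, thresholdBytes int) (bool, error) {
	raw, err := json.Marshal(s)
	if err != nil {
		return false, err
	}
	if len(raw) <= thresholdBytes {
		return false, nil
	}
	s.ResourceCount = len(s.Resources)
	s.Resources = nil
	s.LargeResourceEnabled = true
	return true, nil
}

func main() {
	s := &BackupStatus{}
	for i := 0; i < 10000; i++ {
		s.Resources = append(s.Resources, ResourceInfo{
			Name:      fmt.Sprintf("cm-%d", i),
			Namespace: "default",
			Kind:      "ConfigMap",
		})
	}
	// 256 KiB is an arbitrary threshold chosen for this demo.
	stripped, err := stripIfTooLarge(s, 256*1024)
	if err != nil {
		panic(err)
	}
	fmt.Println("stripped:", stripped, "count:", s.ResourceCount, "listed:", len(s.Resources))
}
```

Keeping only the count and a flag means the CR payload stays small no matter how many resources the backup covers; the full list then has to live somewhere else, such as the resource.json file mentioned in the test notes.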
- if the application restore CR size grows beyond the threshold,
  then strip the resource info
- update the large resource enable flag to true
- update the resource count so that mongo can refer to it on the px-backup
  side
- Update the timestamp of the restore CR when more than 3000 resources are
  applied, so that it won't be considered a stale CR and get cleaned up.
- Used a temporary list to gather all the resources, so that the timestamp
  update of the CR doesn't overwhelm the gRPC/etcd channel

Signed-off-by: Lalatendu Das <ldas@purestorage.com>
@lalat-das lalat-das force-pushed the pb-3707-largeResourceCountSupport branch 2 times, most recently from e909c8a to 2396bbd Compare April 3, 2023 06:36
@lalat-das

@siva-portworx Can you please verify the goroutine deletion logic that we discussed this morning?

- added a goroutine to update the restore CR
- in the restore resource path, send a signal on the channel to the
  goroutine every 15 mins
- altered the caller of DeleteResources() to pass a nil channel

Signed-off-by: Lalatendu Das <ldas@purestorage.com>
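
The goroutine-and-channel pattern from this commit can be sketched as below. This is an assumption-laden illustration, not the merged code: `startUpdater` and `processResources` are hypothetical names, and the nil check shows one way a caller such as the `DeleteResources()` path could pass a nil channel safely, since an unguarded send on a nil channel would block forever.

```go
package main

import (
	"fmt"
	"time"
)

// signalInterval mirrors the 15 minutes from the commit message.
// The demo in main uses 0 so that it finishes immediately.
const signalInterval = 15 * time.Minute

// startUpdater launches a goroutine that calls update once per signal and
// exits when ch is closed. The returned channel is closed on exit.
func startUpdater(ch <-chan struct{}, update func()) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		for range ch {
			update()
		}
	}()
	return done
}

// processResources is a hypothetical worker loop. It sends a signal on ch
// whenever interval has elapsed since the previous signal. A nil ch
// disables signaling entirely.
func processResources(n int, work func(int), ch chan<- struct{}, interval time.Duration) {
	last := time.Now()
	for i := 0; i < n; i++ {
		work(i)
		if ch != nil && time.Since(last) >= interval {
			ch <- struct{}{}
			last = time.Now()
		}
	}
}

func main() {
	ch := make(chan struct{})
	updates := 0
	done := startUpdater(ch, func() { updates++ }) // stand-in for the restore CR update

	processResources(5, func(int) {}, ch, 0)
	close(ch) // closing the channel lets the updater goroutine exit
	<-done
	fmt.Println("updates:", updates)

	// A caller that does not need CR updates passes a nil channel.
	processResources(5, func(int) {}, nil, signalInterval)
}
```

Closing the channel once the work is finished gives the updater goroutine a clean exit, which avoids leaking it after the restore completes.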
@lalat-das lalat-das force-pushed the pb-3707-largeResourceCountSupport branch from 2396bbd to 010705f Compare April 3, 2023 08:52
@lalat-das lalat-das merged commit 7cc7baa into master Apr 3, 2023
4 participants