Optimise the restoration process #57

Merged
merged 5 commits into gardener:master from restore-parallelization on Nov 14, 2018

Conversation

shreyas-s-rao
Collaborator

@shreyas-s-rao shreyas-s-rao commented Oct 10, 2018

What this PR does / why we need it:
This PR optimises the restoration process by parallelising the fetching of delta snapshots.

Which issue(s) this PR fixes:
Fixes #41

Special notes for your reviewer:

Release note:

Restoration time optimised by parallelising the fetching of delta snapshots. Added the --max-fetchers flag to the etcdbrctl command to specify the maximum number of fetcher threads that are allowed to run in parallel.

@CLAassistant

CLAassistant commented Oct 10, 2018

CLA assistant check
All committers have signed the CLA.

@swapnilgm swapnilgm added labels on Oct 10, 2018: kind/enhancement (Enhancement, improvement, extension); status/in-progress (Issue is in progress/work); needs/review (Needs review); size/s (Size of pull request is small; see gardener-robot robot/bots/size.py); component/etcd-backup-restore (ETCD Backup & Restore); exp/intermediate (Issue that requires some project experience); platform/all; priority/normal; area/performance (Performance-related, across all domains such as control plane, networking, storage, etc.)
@shreyas-s-rao shreyas-s-rao changed the title Optimise the restoration process Optimise the restoration process (work-in-progress) Oct 10, 2018
@amshuman-kr
Collaborator

As discussed last week, can we include some tests please? ;-)

Contributor

@georgekuruvillak georgekuruvillak left a comment

Nice work!! I have some suggestions. Please check them out.

pkg/snapshot/restorer/restorer.go: 4 resolved review threads (all outdated)
@swapnilgm swapnilgm added the reviewed/do-not-merge Has no approval for merging as it may break things, be of poor quality or have (ext.) dependencies label Oct 12, 2018
pkg/snapshot/restorer/restorer.go: 9 resolved review threads (8 outdated)
@swapnilgm swapnilgm added this to the 0.4.0 milestone Oct 24, 2018
@shreyas-s-rao
Collaborator Author

Code changes are finished and ready for review. Will push the unit tests shortly.

Contributor

@swapnilgm swapnilgm left a comment

Thanks for the PR, Shreyas. I have suggested some changes; PTAL. I haven't traced the channel handling closely yet, so I'll do another round of review once you address the comments. Also, please rebase the branch onto master and continue from there.

pkg/snapshot/restorer/restorer.go: 10 resolved review threads (9 outdated)
@swapnilgm swapnilgm added the needs/rebase Needs git rebase label Oct 26, 2018
@shreyas-s-rao
Collaborator Author

Will push unit tests shortly.

@shreyas-s-rao shreyas-s-rao changed the title Optimise the restoration process (work-in-progress) Optimise the restoration process Nov 5, 2018
Collaborator

@amshuman-kr amshuman-kr left a comment

Tests look good too! If you can address the couple of questions, we can go to LGTM :-)

		errCh <- fmt.Errorf("snap index mismatch for delta snapshot %d; expected snap index to be atleast %d", fetchedSnapIndex, nextSnapIndexToApply)
		return
	}
	if fetchedSnapIndex == nextSnapIndexToApply {
Collaborator

What if fetchedSnapIndex > nextSnapIndexToApply?

Collaborator Author

In that case it just continues to the next iteration, and the fetched snap is kept until its turn comes. The logic is: when fetchedSnapIndex == nextSnapIndexToApply, it applies that snap and any subsequent snaps that have already been fetched, stopping at the first unfetched snap, and then moves to the next iteration. This way, all snaps are applied in the right order.
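The ordering logic described here can be sketched roughly as follows. This is a simplified illustration, not the PR's applySnaps: the `fetched` type, the `pending` map, and `applyInOrder` are hypothetical names, and the real code applies etcd events rather than strings.

```go
package main

import "fmt"

// fetched is a hypothetical message from a fetcher: the snapshot's position
// in the delta list plus its payload.
type fetched struct {
	index int
	data  string
}

// applyInOrder consumes snapshots in arrival order but applies them strictly
// by index, buffering any that arrive early.
func applyInOrder(in <-chan fetched, total int, apply func(string)) {
	pending := make(map[int]string)
	next := 0 // index of the next snapshot that may be applied
	for next < total {
		f, ok := <-in
		if !ok {
			return // channel closed before all snapshots arrived
		}
		pending[f.index] = f.data
		// Apply every consecutive snapshot that is already available;
		// stop at the first one that has not been fetched yet.
		for {
			data, ready := pending[next]
			if !ready {
				break
			}
			apply(data)
			delete(pending, next)
			next++
		}
	}
}

func main() {
	in := make(chan fetched, 4)
	for _, f := range []fetched{{2, "c"}, {0, "a"}, {3, "d"}, {1, "b"}} {
		in <- f
	}
	close(in)
	applyInOrder(in, 4, func(s string) { fmt.Print(s) })
	fmt.Println()
}
```

Snapshots with an index greater than `next` simply wait in `pending`, which corresponds to the "continue to the next iteration" case in the reply above.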

		DeltaSnapList: deltaSnapList,
	}
	err = rstr.Restore(restoreOptions)
	Expect(err).Should(HaveOccurred())
Collaborator

Might be good to validate the error message also.

Collaborator Author

Sure, will definitely look into it.

@swapnilgm swapnilgm removed needs/rebase Needs git rebase reviewed/do-not-merge Has no approval for merging as it may break things, be of poor quality or have (ext.) dependencies labels Nov 12, 2018
@amshuman-kr
Collaborator

LGTM

Contributor

@swapnilgm swapnilgm left a comment

Overall LGTM. Thanks for writing test cases as well. I have a concern regarding the applier logic. Could you please address it?

	r.logger.Infof("Cleanup complete")
}

// fetch fetches delta snapshots as events and persists them onto disk.
Contributor

s/fetch/fetchSnaps/


// fetch fetches delta snapshots as events and persists them onto disk.
func (r *Restorer) fetchSnaps(fetcherIndex int, fetcherInfoCh <-chan fetcherInfo, applierInfoCh chan<- applierInfo, snapLocationsCh chan<- string, errCh chan<- error, stopCh chan bool, wg *sync.WaitGroup) {
	defer func() {
Contributor

Refactor:

defer wg.Done()

	}
}

// apply applies delta snapshot events to the embedded etcd sequentially, in the right order of snapshots, regardless of the order in which they were fetched.
Contributor

s/apply/applySnaps/


// apply applies delta snapshot events to the embedded etcd sequentially, in the right order of snapshots, regardless of the order in which they were fetched.
func (r *Restorer) applySnaps(client *clientv3.Client, remainingSnaps snapstore.SnapList, applierInfoCh <-chan applierInfo, errCh chan<- error, stopCh <-chan bool, wg *sync.WaitGroup) {
	defer func() {
Contributor

See comment above.


	wg.Add(1)

	eventsList := make([][]event, len(remainingSnaps))
Contributor

@swapnilgm swapnilgm Nov 13, 2018

Do we really need eventsList? It rather defeats the purpose of persisting the fetched delta snapshots locally, doesn't it? With it, at some point we might hold the data from all delta snapshots in memory, which leads to unnecessary memory consumption and possible OOM issues. Please read the persisted snapshots into memory in index order and apply them one at a time. If the snapshot for the current index has not been fetched yet, wait until it is.

@swapnilgm swapnilgm added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Nov 13, 2018
@gardener-robot-ci-1 gardener-robot-ci-1 added needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Nov 13, 2018
@swapnilgm swapnilgm added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) reviewed/lgtm Has approval for merging and removed needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) needs/review Needs review labels Nov 14, 2018
@gardener-robot-ci-1 gardener-robot-ci-1 added needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Nov 14, 2018
@swapnilgm swapnilgm merged commit 9d149af into gardener:master Nov 14, 2018
@shreyas-s-rao shreyas-s-rao deleted the restore-parallelization branch January 23, 2019 10:37
Labels
area/performance Performance (across all domains, such as control plane, networking, storage, etc.) related component/etcd-backup-restore ETCD Backup & Restore exp/intermediate Issue that requires some project experience kind/enhancement Enhancement, improvement, extension needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) platform/all reviewed/lgtm Has approval for merging size/s Size of pull request is small (see gardener-robot robot/bots/size.py) status/in-progress Issue is in progress/work
6 participants