Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GEP-7] Adds shoot component interfaces for Migration and Restoration. #2511

Merged

Conversation

plkokanov
Copy link
Contributor

How to categorize this PR?

/area control-plane-migration
/kind enhancement
/priority normal

What this PR does / why we need it:
This PR adds interfaces which can be used by shoot components (namely for extension resources) if they need to be Migrated and then Restored during control plane migration.
The PR also implements these interfaces for the Network Component and fixes an issue which prevented the network resource from having its Restore annotation applied due to always running into conflict when update is called.

Which issue(s) this PR fixes:
Part of #1631

Special notes for your reviewer:
In the future we should do this for all botanist shoot components responsible for extension resources.

Release note:

Adds Migrator/Restorer interfaces to the botanist shoot components

@plkokanov plkokanov requested a review from a team as a code owner June 25, 2020 12:30
@gardener-robot gardener-robot added area/control-plane-migration Control plane migration related kind/enhancement Enhancement, improvement, extension priority/normal labels Jun 25, 2020
@plkokanov plkokanov changed the title Adds shoot component interfaces for Migration and Restoration. [WIP] Adds shoot component interfaces for Migration and Restoration. Jun 25, 2020
@plkokanov plkokanov force-pushed the cp-migration/migration-interfaces branch from 8bfd577 to 0a69bf1 Compare June 25, 2020 13:09
@plkokanov plkokanov changed the title [WIP] Adds shoot component interfaces for Migration and Restoration. Adds shoot component interfaces for Migration and Restoration. Jun 25, 2020
@plkokanov plkokanov mentioned this pull request Jun 25, 2020
@swilen-iwanow
Copy link
Contributor

Quick question, because I am starting to get conceptually confused with these interfaces. I thought the idea was to change the implementation of the deployer depending on the use case. If not what is the idea of the NewDefaultNetwork constructor about. In the end migrate or restore are either destroying the component or deploying it, couldn't we fit in the Deploy and Destroy methods instead?

@plkokanov
Copy link
Contributor Author

Quick question, because I am starting to get conceptually confused with these interfaces. I thought the idea was to change the implementation of the deployer depending on the use case. If not what is the idea of the NewDefaultNetwork constructor about. In the end migrate or restore are either destroying the component or deploying it, couldn't we fit in the Deploy and Destroy methods instead?

I managed to come up with a couple of options to add the ShootState from the botanist to the network component since we can't pass it in the component's constructor (the ShootState ptr is modifed during reconciliation):

  1. We add it as an additional parameter to the Deploy function, but then all components need to have this (really bad idea)
  2. We add a function argument to the Network component constructor - func() *ShootState and create a method botanist.GetShootState() *ShootState which will fetch the shoot state and pass it to the network
  3. Add additional interfaces as is in this PR.

Migrate will generally be the same for extension resources (just adding an annotation), but it could also be implement it for the ETCD (if that becomes a component) where it will make sure that a snapshot is taken.

I prefer the additional interfaces a bit more than option 2. (or adding some type of Injector interface)

@swilen-iwanow
Copy link
Contributor

Quick question, because I am starting to get conceptually confused with these interfaces. I thought the idea was to change the implementation of the deployer depending on the use case. If not what is the idea of the NewDefaultNetwork constructor about. In the end migrate or restore are either destroying the component or deploying it, couldn't we fit in the Deploy and Destroy methods instead?

I managed to come up with a couple of options to add the ShootState from the botanist to the network component since we can't pass it in the component's constructor (the ShootState ptr is modifed during reconciliation):

  1. We add it as an additional parameter to the Deploy function, but then all components need to have this (really bad idea)
  2. We add a function argument to the Network component constructor - func() *ShootState and create a method botanist.GetShootState() *ShootState which will fetch the shoot state and pass it to the network
  3. Add additional interfaces as is in this PR.

Migrate will generally be the same for extension resources (just adding an annotation), but it could also be implement it for the ETCD (if that becomes a component) where it will make sure that a snapshot is taken.

I prefer the additional interfaces a bit more than option 2. (or adding some type of Injector interface)

Exactly, and this is weird because in future, ShootState would be deployed as a component as well right. So possibly we are going to have multiple such dependencies between components. Are we going to expand these interfaces then, or what? I know that @mvladev had a similar case with the external DNS if I am not wrong, where he had to set the the service IP later in the flow.

func (b *Botanist) setAPIServerServiceClusterIP(clusterIP string) {
. Could it be done likewise

@rfranzke
Copy link
Member

TBH, I'm also confused why this PR is needed. I haven't really understood it out of the above explanation. As @swilen-iwanow suggested, can't we use the existing Deploy functions to keep it simple? I'm afraid such interfaces add additional complexity.

@rfranzke
Copy link
Member

/assign @swilen-iwanow @rfranzke

@plkokanov
Copy link
Contributor Author

The initial problem was because of this:

func (b *Botanist) DeployNetwork(ctx context.Context) error {
	if err := b.Shoot.Components.Network.Deploy(ctx); err != nil {
		return err
	}

	if b.isRestorePhase() {
		return b.restoreExtensionObject(ctx, b.K8sSeedClient.Client(), &extensionsv1alpha1.Network{
			ObjectMeta: metav1.ObjectMeta{
				Name:      b.Shoot.Info.Name,
				Namespace: b.Shoot.SeedNamespace,
			},
		}, extensionsv1alpha1.NetworkResource)
	}
	return nil
}

Since tryUpdate was removed from b.restoreExtensionObject() we do not try to Get the resource before updating it's status and therefore just operate on basically an empty resource (just with metadata). Generally, for the network this isn't a problem since we could add a check if there's no state to skip the update (the check is currently missing) but for the worker and infra resources we would run in the same problem so we have the following options:

We could re-add the Get method in b.restoreExtensionObject() and simply leave it as it is.

The other option is to make b.restoreExtensionObject() not rely on the botanist and add it to the network.Deploy() method. Then we would just need a simple method like setAPIServerServiceClusterIP(clusterIP string) to add the ShootState to the network component during reconciliation, right before Deploy() is called. Which was the initial idea.

But after discussing it really briefly with @mvladev and @danielfoehrKn I went with the Restore and Migrate methods/interfaces.

Anyway, maybe the new interfaces really add more complexity and aren't really necessary. @mvladev @danielfoehrKn wdyt?

@danielfoehrKn
Copy link
Contributor

Having Restore and Migrate interfaces that can optionally be implemented by components that have a state (all extension resources), does not sound like a bad idea to me.
In general I think we need to decouple the restore & migrate functionality from the botanist.

I ll have to look more into the details here tomorrow.

@rfranzke rfranzke changed the title Adds shoot component interfaces for Migration and Restoration. [GEP-7] Adds shoot component interfaces for Migration and Restoration. Jun 26, 2020
Copy link
Contributor

@danielfoehrKn danielfoehrKn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me adding shoot component interfaces for Migration and Restoration makes sense.
This way the restore and migrate operations are also decoupled from the botanist - easier to write unit tests.

@mvladev WDYT

pkg/operation/common/extensions.go Outdated Show resolved Hide resolved
pkg/operation/common/extensions.go Outdated Show resolved Hide resolved
pkg/operation/common/extensions.go Show resolved Hide resolved
pkg/operation/common/extensions.go Outdated Show resolved Hide resolved
pkg/operation/common/extensions.go Show resolved Hide resolved
Copy link
Contributor

@stoyanr stoyanr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think the new interfaces make sense, and I find the new structure of the code cleaner than before. I only have a minor remark on code duplication. I haven't tested the migration itself though. Will try to do it and post an update if I find anything.

if err := b.Shoot.Components.Network.WaitMigrate(ctx); err != nil {
return err
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a lot of duplication between the code that follows and the WaitUntilExtensionCRMigrated method - could this method be used below? Or could perhaps the common code be extracted into a helper function to avoid duplication?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep I should've extracted it in the common package similar to the RestoreExtensionObjectState method.

@stoyanr
Copy link
Contributor

stoyanr commented Jun 26, 2020

I encountered the following issue when trying to test this PR during the preparation phase:

msg="Failed to prepare Shoot \"i500152-gcp\" for migration: flow \"Shoot's control plane preparation for migration\" encountered task errors: [task \"Annotating BackupEntry in Seed with operation - migration\" failed: resource name may not be empty]" operation=reconcile shoot=garden-dev/i500152-gcp

I noticed that the backupentry I have is called shoot--dev--i500152-az--983b53da-7b98-4cac-a729-15b46ae32cd4, but the UID in the shoot status is 5f3e7eb9-ac3c-4880-a5bd-727118558a76. This causes getting the BackupEntry to fail with not found, but the not found is ignored by the code, so the subsequent Update fails with the above error.

Not sure if this is related to this PR but I guess it should be investigated.

@plkokanov
Copy link
Contributor Author

I encountered the following issue when trying to test this PR during the preparation phase:

msg="Failed to prepare Shoot \"i500152-gcp\" for migration: flow \"Shoot's control plane preparation for migration\" encountered task errors: [task \"Annotating BackupEntry in Seed with operation - migration\" failed: resource name may not be empty]" operation=reconcile shoot=garden-dev/i500152-gcp

I noticed that the backupentry I have is called shoot--dev--i500152-az--983b53da-7b98-4cac-a729-15b46ae32cd4, but the UID in the shoot status is 5f3e7eb9-ac3c-4880-a5bd-727118558a76. This causes getting the BackupEntry to fail with not found, but the not found is ignored by the code, so the subsequent Update fails with the above error.

Not sure if this is related to this PR but I guess it should be investigated.

Hmm, most likely isn't related. I didn't hit this error, but I'll see if I can reproduce it somehow.

@rfranzke
Copy link
Member

rfranzke commented Jul 3, 2020

/unhold as v1.7.0 is released
/needs rebase

@plkokanov
Copy link
Contributor Author

I executed the test a few times successfully. One issue that I came across is that sometimes (in one of my tests) the virtual machines for the nodes are recreated instead of preserved. Not sure if this could be related to this PR or not.

@swilen-iwanow ran into the same issue without this PR, so I guess it's not related but we should still investigate (however, add the fix with a different PR)

Anyway, I'll rebase and update asap.

@stoyanr
Copy link
Contributor

stoyanr commented Jul 9, 2020

/lgtm

@timebertt
Copy link
Member

/needs rebase

@timuthy
Copy link
Contributor

timuthy commented Jul 15, 2020

@danielfoehrKn can you have another look?

@danielfoehrKn
Copy link
Contributor

/lgtm - did not test the migration & restoration aspects of it.
After a rebase it is ready from my perspective.

@plkokanov
Copy link
Contributor Author

Rebased

@danielfoehrKn
Copy link
Contributor

@rfranzke if you want to have another look, otherwise looks good to me

@rfranzke
Copy link
Member

Thanks for checking, as it got approved already by you and Stoyan I'm okay with /merge.
Thanks again!

@rfranzke rfranzke merged commit 07c5488 into gardener:master Jul 20, 2020
@plkokanov plkokanov deleted the cp-migration/migration-interfaces branch July 29, 2020 06:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/control-plane-migration Control plane migration related kind/enhancement Enhancement, improvement, extension
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet