Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alibaba: support existing VPC, VSwitchs and PrivateZone #5379

Merged

Conversation

bd233
Copy link
Contributor

@bd233 bd233 commented Nov 12, 2021

This PR is based on #5378
If the user wants the Installer to install the cluster in an existing network, it is now supported.

@openshift-ci openshift-ci bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Nov 12, 2021
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Nov 12, 2021

Hi @bd233. Thanks for your PR.

I'm waiting for a openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 14, 2021
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 16, 2021
Copy link
Contributor

@staebler staebler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future, please refrain from adding so many separate parts in one commit.
For example, it would have been better to have one commit each for the existing VPC, the existing VSwitches, and the existing private zones. They could also be done as separate PRs, which would make it much faster to get approval.

data/data/alibabacloud/cluster/outputs.tf Outdated Show resolved Hide resolved
pkg/asset/installconfig/alibabacloud/alibabacloud.go Outdated Show resolved Hide resolved
pkg/types/alibabacloud/platform.go Outdated Show resolved Hide resolved
data/data/alibabacloud/cluster/outputs.tf Outdated Show resolved Hide resolved
@@ -12,6 +12,27 @@ type Platform struct {
// +optional
ResourceGroupID string `json:"resourceGroupID"`

// VpcID is the ID of an already existing VPC where the cluster should be installed.
// If empty, a new VPC will created for the cluster.
// Destroying the cluster using installer will delete this VPC.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would the cluster destroy delete the VPC? Is that because the VPC must be in the existing resource group? If so, then we are missing some validation. If not, then I don't think that the cluster destroy should delete the existing VPC.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is no clear description here. If VpcID is empty, a new VPC will be created, and released when the cluster is destroyed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestion is to remove the language around deleting the VPC.

	// VpcID is the ID of the VPC into which the cluster's machines will be created.
	// Leave the VPC ID unset to have the installer create a VPC on your behalf.

pkg/types/alibabacloud/platform.go Outdated Show resolved Hide resolved
func validateVSwitchs(client *Client, ic *types.InstallConfig, path *field.Path) field.ErrorList {
allErrs := field.ErrorList{}

if ic.AlibabaCloud.VpcID == "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be checked via static validation in pkg/types/alibabacloud/validation/platform.go.

// +optional
VpcID string `json:"vpcID,omitempty"`

// VSwitchIDs is the ID list of already existing VSwitchs where the master should be created.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Globally, replace VSwitchs with VSwitches.

Suggested change
// VSwitchIDs is the ID list of already existing VSwitchs where the master should be created.
// VSwitchIDs is the ID list of already existing VSwitches where the masters should be created.

// +optional
VpcID string `json:"vpcID,omitempty"`

// VSwitchIDs is the ID list of already existing VSwitchs where the master should be created.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The vswitches are not just for the masters.

@bd233
Copy link
Contributor Author

bd233 commented Nov 16, 2021

@staebler I have updated, please continue to review

@jstuever
Copy link
Contributor

/uncc

@openshift-ci openshift-ci bot removed the request for review from jstuever November 16, 2021 22:03
@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 21, 2021
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 22, 2021
@patrickdillon
Copy link
Contributor

/ok-to-test

@openshift-ci openshift-ci bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 22, 2021
data/data/alibabacloud/cluster/dns/privatezone.tf Outdated Show resolved Hide resolved
type = list(string)
description = "The VSwitch IDs of the master ECS. Example: [vsw-xxx1, vsw-xxx2, vsw-xxx3]"
description = "The availability zones in which to create the masters. The length of this list must match master_count."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
description = "The availability zones in which to create the masters. The length of this list must match master_count."
description = "The availability zones in which to create the masters. The length of this list must match instance_count."

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bd233 Please update, thanks.

data/data/alibabacloud/cluster/dns/privatezone.tf Outdated Show resolved Hide resolved
resource "alicloud_vswitch" "vswitchs" {
count = length(var.zone_ids)
resource "alicloud_vswitch" "vswitches" {
count = length(var.vswitch_ids) == 0 ? length(var.zone_ids) : 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This count is not correct when the installer is creating the vSwitches. Using length(var.zone_ids) assumes that the zones in var.zone_ids are unique. If the master replica count is greater than the number of availability zones used, then this will create multiple vSwitches in some of the zones, and those extras will be omitted from the mapping from zone ID to vSwitch ID.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can solve this by length(toset(var.zone_ids)), but I think this may not be the most suitable solution. I should ensure that the zones in var_zone_ids is unique, at the same time, the length of ali_zone_ids need not be equal master_count. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, var.zone_ids should be a distinct list of availability zones. It should also include zones for the workers, which appears to be missing currently.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thank you for your reminder, the worker zones is missing currently, I will create a new PR to update.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, var.zone_ids should be a distinct list of availability zones. It should also include zones for the workers, which appears to be missing currently.

@staebler #5438

if ic.AlibabaCloud.VpcID != "" {
allErrs = append(allErrs, validateVpc(client, ic, path)...)
}
if ic.AlibabaCloud.VSwitchIDs != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if ic.AlibabaCloud.VSwitchIDs != nil {
if len(ic.AlibabaCloud.VSwitchIDs) != 0 {


// VSwitchIDs is the ID list of already existing VSwitches where cluster resources will be created.
// If empty, the new VSwitches will created for the cluster,
// and destroying the cluster using installer will delete these VSwitches.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above for the comment regarding deleting the VPC.

// +optional
VpcID string `json:"vpcID,omitempty"`

// VSwitchIDs is the ID list of already existing VSwitches where cluster resources will be created.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume that since the vSwitch is part of the VPC, that the vSwitches can only be user-provided if the VPC is also user-provided. If so, then there should (1) be text to that affect in this description, (2) be validation that the VPC ID is specified when the vSwitch IDs are specified, and (3) be validation that the vSwitches are part of the VPC.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding (2) and (3), relevant validation have already been made, and I will add text in this description

that the vSwitches can only be user-provided if the VPC is also user-provided.

I disagree with this point. I think it’s okay for the user to only provide the VPC ID, and the installer will create the VSwitches

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I thought that I had deleted parts (2) and (3) after re-checking that the validation was indeed already there.

that the vSwitches can only be user-provided if the VPC is also user-provided.

I disagree with this point. I think it’s okay for the user to only provide the VPC ID, and the installer will create the VSwitches

Yes, I agree that it is OK to use a user-provided VPC and installer-generated VSwitches. My point is the converse. If the user provides the VSwitches, then they must all provide the VPC. I would like to call that out specifically in the description for the field so that the user does not assume that the installer will deduce the VPC to use from the VSwitches (which the installer could presumably do).

@@ -4,9 +4,17 @@ locals {
prefix = var.cluster_id
newbits = tonumber(split("/", var.vpc_cidr_block)[1]) < 16 ? 20 - tonumber(split("/", var.vpc_cidr_block)[1]) : 4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit not related to this PR: This would be a lot easier to read if it used the max function.

Suggested change
newbits = tonumber(split("/", var.vpc_cidr_block)[1]) < 16 ? 20 - tonumber(split("/", var.vpc_cidr_block)[1]) : 4
newbits = max(4, 20 - tonumber(split("/", var.vpc_cidr_block)[1]))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okey, thanks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR #5414

allErrs = append(allErrs, field.Required(fldPath.Child("vpcID"), "when using existing VSwitches, an existing VPC must be used"))
}

vswitchIDs := []string{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a map instead of a list.

}
}
if isRepeated {
allErrs = append(allErrs, field.Duplicate(fldPath.Child("vswitchIDs"), vswitchID))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Include the index to the duplicate in the field path.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still need to include the index in the field path.

Suggested change
allErrs = append(allErrs, field.Duplicate(fldPath.Child("vswitchIDs"), vswitchID))
allErrs = append(allErrs, field.Duplicate(fldPath.Child("vswitchIDs").Index(i), vswitchID))

@@ -7,11 +7,32 @@ type Platform struct {

// ResourceGroupID is the ID of an already existing resource group where the cluster should be installed.
// This resource group must be empty with no other resources when trying to use it for creating a cluster.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this strictly true? Or could the user-supplied VPC be in the resource group?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, it has no effect if it is not empty. we will not query and destroy resources through resource group. So this description is incorrect, I will delete it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the destroyer delete the resource group if it is user-supplied?

Comment on lines 7 to 9
VpcID string `json:"vpcID"`
VSwitchIDs []string `json:"vswitchIDs"`
PrivateZoneID string `json:"privateZoneID"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what way are these values going to be used while destroying the cluster?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to admit that these are not used, I will delete it.

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 24, 2021
@openshift-ci openshift-ci bot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Nov 24, 2021
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 26, 2021
@bd233
Copy link
Contributor Author

bd233 commented Nov 29, 2021

@staebler I should have updated everything, please continue to review

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

9 similar comments
@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@kwoodson
Copy link
Contributor

@staebler Looks like an error in the AWS terraform generation. Any ideas on this one?

level=error
level=error msg=Error: Provider produced inconsistent result after apply
level=error
level=error msg=When applying changes to module.vpc.aws_vpc_dhcp_options_association.main[0],
level=error msg=provider "registry.terraform.io/-/aws" produced an unexpected new value for
level=error msg=was present, but now absent.
level=error
level=error msg=This is a bug in the provider, which should be reported in the provider's own
level=error msg=issue tracker.
level=error
level=fatal msg=failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to apply Terraform: failed to complete the change

@mtulio
Copy link
Contributor

mtulio commented Dec 15, 2021

@kwoodson this looks like a possible flake similar to what was reported in hashicorp/terraform-provider-aws#16142
/retest-required

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

1 similar comment
@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@mtulio
Copy link
Contributor

mtulio commented Dec 15, 2021

/test e2e-aws

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

5 similar comments
@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 15, 2021

@bd233: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-libvirt 4be9a0b link false /test e2e-libvirt
ci/prow/e2e-aws-single-node 4be9a0b link false /test e2e-aws-single-node
ci/prow/e2e-aws-workers-rhel7 4be9a0b link false /test e2e-aws-workers-rhel7
ci/prow/okd-e2e-aws 4be9a0b link false /test okd-e2e-aws
ci/prow/e2e-aws-fips 4be9a0b link false /test e2e-aws-fips
ci/prow/e2e-crc 4be9a0b link false /test e2e-crc

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

6 similar comments
@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 1ecfddc into openshift:master Dec 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants