Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gke provisioning #11025

Merged

Conversation

mantis-toboggan-md
Copy link
Member

@mantis-toboggan-md mantis-toboggan-md commented May 14, 2024

Summary

Fixes #8967 #10621

Occurred changes and/or fixed issues

Functionality should be on par with existing GKE provisioning in the Ember UI. Some Ember validation happens at same time: it's all form validation in this port. There are a handful of areas where I have deviated from Ember:

  • logging and monitoring services are hardcoded lists with two options, one of which is none: I'm using a checkbox instead of a dropdown.
  • node pool 'local ssd count/local ssd disks' is a GB input in Ember - this is inaccurate, the google ui describes the field as: "For every node in a node-pool, the number of local SSD disks are attached."
  • node pool preemptible is labeled as beta - it appears to be GA now https://cloud.google.com/kubernetes-engine/docs/how-to/preemptible-vms
  • node pool has only 1 thing under 'show advanced options' so I just got rid of the show advanced toggle - seems excessive
  • (autoscaling) node minimum is not configurable in Ember. The input component has a max value of...itself. The input is now validated so that the minimum number of nodes specified can't exceed the maximum number of nodes specified
  • Ember had two 'description' inputs - one for the norman cluster object and one for the GKE cluster. I combined them in the same way that the 'name' input actually sets a norman cluster name and gkeconfig name (aks and eks have the same thing with names)
  • image type options list is shorter and simpler - my excuses justification for this is detailed in inline comments
  • Ember UI had a section called 'config' with what appears to be entirely networking stuff, plus a kubernetes version dropdown. I did not care for this, particularly given that the networking stuff is the most complicated part of the form, and re-arranged the sections so that the k8s version dropdown appears with location and labels ('general stuff' I guess)
  • in the Ember location can't be changed once you start configuring the rest of the form...I allow it to be changed and re-fetch relevant GCP data when it does
  • unlike EKS and AKS I've left projectid/cloud credential un-changeable once the user starts configuring the rest of the cluster (more in line with the Ember UX). This is because the project id field is an input, while EKS/AKS are able to rely entirely on dropdowns...the ux where we just automatically verify the new cred selection and re-fetch any data from azure/aws does not work well here

Areas or cases that should be tested

  • GKE provisioning generally
  • editing clusters when the underlying GKE cluster was provisioned successfully - most fields can't be edited, though new node pools can be added/fully configured
  • editing clusters when GKE provisioning fails (one way to achieve this state consistently is to use one of the spoofed 'shared subnetwork' options) - the user should be able to edit gke config fields they otherwise wouldn't be able to
  • validation: the nodepool name validation, min/max node validation is the most deviant and probably worthy of double-checking
  • adding/removing nodepools on edit
  • adding cluster members on create or on edit
  • in Ember I see a request for GKE service accounts but I don't see them used anywhere in the UI so I left it out. if you see something say something.

Areas which could experience regressions

This PR is pretty self-contained and should not impact areas outside of GKE provisioning. There is a minor stylistic update to ArrayList so it matches KeyValue (both are used next to eachother in node pools and the difference in spacing was apparent, and ugly)

Screenshot/Video

Checklist

  • The PR is linked to an issue and the linked issue has a Milestone, or no issue is needed
  • The PR has a Milestone
  • The PR template has been filled out
  • The PR has been self reviewed
  • The PR has a reviewer assigned
  • The PR has automated tests or clear instructions for manual tests and the linked issue has appropriate QA labels, or tests are not needed
  • The PR has reviewed with UX and tested in light and dark mode, or there are no UX changes

@@ -339,7 +339,7 @@ export default {
</div>
<div
v-if="showAdd && !isView"
class="footer"
class="footer mt-20"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added to make this consistent with keyvalue styling

export function getGKESharedSubnetworks(store: Store<any>, cloudCredentialId: string, projectId: string, location: {zone?: string, region?: string}): Promise<getGKESharedSubnetworksResponse> {
// return getGKEOptions('gkeSharedSubnets', store, cloudCredentialId, projectId, location);
// TODO nb remove this test code
return Promise.resolve({
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be removed before the PR is merged. Our dev GCP setup doesn't have permission to create shared subnets - this data was borrowed from rancher/ui#4790

@mantis-toboggan-md mantis-toboggan-md added this to the v2.9.0 milestone May 14, 2024
@mantis-toboggan-md mantis-toboggan-md marked this pull request as ready for review May 15, 2024 14:08
Copy link
Member

@richard-cox richard-cox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've given the functional side a first pass, findings below

  • Cloud Credentials
    • When creating a Google Cloud Credential directly via the Cloud Credential page no project id is requested, however whilst on the create cloud cred steps when creating a cluster (when no cred found) it is
      • Is the project id something only storable / discernible in the cluster context?
    • When selecting a Google Cloud Credential via Cluster Create page
      • clicking the Create button without going through Authenticate gives the error notification cluster kubernetes version cannot be empty string. Can we disable the button if the user hasn't authed?
      • clicking the Create button without going through Authenticate, and then supplying the project id --> authenticate results in the rest of the form showing however the node pools section is empty
    • the red notification at top of the create cluster create cloud cred page saying cloud credential or project id missing pops up in odd ways
      • on a system with no cloud creds, when going to the cluster create page it's shown before the user hasn't had a chance to enter a project id, and will then blip in/out when the user enters/clears field
    • After selecting a Cloud Cred and clicking on Authenticate the button width changes to show the spinner which wiggles the two input field widths
  • Create
  • Create (form)
    • logging and monitoring services are hardcoded lists with two options, one of which is none: I'm using a checkbox instead of a dropdown.
      • Not sure about this suggestion, but should we inform the user of the service names in a tooltip?
    • Old UI Cluster Options --> Pod secondary Range Name is missing (but Pod Secondary CIDR Block is shown). Is this related to the getGKESharedSubnetworks comment?
      • Same with missing Services Secondary Range Name
    • New UI Networking --> Advanced Options
      • Can the checkboxes be vertical instead of horizontal? They all seem to affect each other in some way, and it would tie together CIDR Block key value component exposed via Master Authorized Network.
    • After filling in the form and clicking Create the contents of the Node Pool tabs is briefly shown with no content
    • Node Pools
      • Old UI Image Type has an additional option Windows Long Term Service Channel with Containerd (zonal, us-central1-c)
      • API Access Scopes --> Set access for each API. We've lost horizontal space in the UI given the tab bar. For me full screen this means only one column of select controls are shown. Given the content of these is small can we shrink them width wise to allow more column/s? Think maybe 2, 3 if they don't look squished?
  • Detail View - Config
    • This shows a red banner Network Error and the cloud creds + authenticate button. Pressing Authenticate shows another network error banner. For this use case is there an easy way to avoid the user having to press Authenticate?
  • Edit
    • Same as detail view config (red banner network error)

Bonus Point (not related to PR) - Create EKS --> Cluster Options --> shows a Kube Version of [1.29. Not sure where this comes from in hosted world, but it also happens in master

Edit: I've added a sections for Detail View - Config and Edit

pkg/gke/util/validators.ts Show resolved Hide resolved
pkg/gke/types/gcp.d.ts Outdated Show resolved Hide resolved
pkg/gke/util/oauth.ts Show resolved Hide resolved
Copy link
Member

@richard-cox richard-cox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the code a once over, just minor things.

pkg/gke/components/CruGKE.vue Show resolved Hide resolved
pkg/gke/components/CruGKE.vue Show resolved Hide resolved
pkg/gke/components/CruGKE.vue Outdated Show resolved Hide resolved
pkg/gke/components/Networking.vue Outdated Show resolved Hide resolved
pkg/gke/components/Networking.vue Outdated Show resolved Hide resolved
pkg/gke/components/Networking.vue Outdated Show resolved Hide resolved
pkg/gke/components/GKENodePool.vue Outdated Show resolved Hide resolved
pkg/gke/components/Config.vue Outdated Show resolved Hide resolved
pkg/gke/components/AdvancedOptions.vue Outdated Show resolved Hide resolved
pkg/gke/components/AccountAccess.vue Outdated Show resolved Hide resolved
@mantis-toboggan-md
Copy link
Member Author

mantis-toboggan-md commented Jun 10, 2024

  • Cloud Credentials

    • When creating a Google Cloud Credential directly via the Cloud Credential page no project id is requested, however whilst on the create cloud cred steps when creating a cluster (when no cred found) it is

That's an oversight on my part - the project id isn't required for cloud credential creation, just cluster creation. I've hidden the input while the user is creating cloud credentials.

  • Is the project id something only storable / discernible in the cluster context?

The cloud credential json has its project id in it, so when the user creates one inline, I've updated the form to automatically populate the project id input with the value from that JSON data (we can't do this in other scenarios because that data is not saved on the cloud credential itself). I opted not to automatically authenticate as well in the event that the user wants to use a different project id (honestly don't know how valid that is, our dev gcp access only has one project, but the old ui allows it)

  • When selecting a Google Cloud Credential via Cluster Create page

    • clicking the Create button without going through Authenticate gives the error notification cluster kubernetes version cannot be empty string. Can we disable the button if the user hasn't authed?
    • clicking the Create button without going through Authenticate, and then supplying the project id --> authenticate results in the rest of the form showing however the node pools section is empty
  • the red notification at top of the create cluster create cloud cred page saying cloud credential or project id missing pops up in odd ways

    • on a system with no cloud creds, when going to the cluster create page it's shown before the user hasn't had a chance to enter a project id, and will then blip in/out when the user enters/clears field

Above bugs should all be resolved now

  • After selecting a Cloud Cred and clicking on Authenticate the button width changes to show the spinner which wiggles the two input field widths

Honestly, I'm not sure what a good alternative would be here. Putting the button on a new line when the input widths are already huge looks weird, too.

  • Create
  • I created a cluster by adding just a name, no other changes, and it fails to come up
* rancher status `Waiting on gke crd to be initialized`, i don't see any clusters in https://console.cloud.google.com/kubernetes/list/overview?authuser=1&project= (is that the right place?)

discussed above offline

  • Create (form)

    • logging and monitoring services are hardcoded lists with two options, one of which is none: I'm using a checkbox instead of a dropdown.

      • Not sure about this suggestion, but should we inform the user of the service names in a tooltip?

I wasn't sure either so I did a little investigating. Googling those service names doesn't turn up much helpful information (nor many direct hits at all) so I'm not sure there would be any value added in showing them to the user. I did however notice that there are a handful of ways that logging and monitoring can be configured in GKE, so I added an info banner clarifying which aspects are enabled with these Rancher options.

Screenshot 2024-06-10 at 4 27 41 PM

  • Old UI Cluster Options --> Pod secondary Range Name is missing (but Pod Secondary CIDR Block is shown). Is this related to the getGKESharedSubnetworks comment?

    • Same with missing Services Secondary Range Name

My form logic was a bit off here. The Pod/Service Secondary Range Name inputs should be shown whenever a subnet is selected; I was showing them only when the subnet also had secondary ranges defined.

  • New UI Networking --> Advanced Options

    • Can the checkboxes be vertical instead of horizontal? They all seem to affect each other in some way, and it would tie together CIDR Block key value component exposed via Master Authorized Network.

Done - also changed the other networking checkboxes to be vertical for consistency:

Screenshot 2024-06-10 at 4 29 21 PM

  • After filling in the form and clicking Create the contents of the Node Pool tabs is briefly shown with no content

Should be fixed now

  • Node Pools

    • Old UI Image Type has an additional option Windows Long Term Service Channel with Containerd (zonal, us-central1-c)

I see this option in the new UI as well - do you perhaps mean Windows Semi-Annual Channel with Containerd ? I didn't include that in the new UI as it is deprecated in all versions of kubernetes supported in Rancher 2.9.x ( I did include the translation in in the event that someone edits an ooold GKE cluster in an up-to-date Rancher instance)

* API Access Scopes --> `Set access for each API`. We've lost horizontal space in the UI given the tab bar. For me full screen this means only one column of select controls are shown. Given the content of these is small can we shrink them width wise to allow more column/s? Think maybe 2, 3 if they don't look squished?

I've opted for two columns - I found a 2x9 matrix easier to scan through than 3x6

Screenshot 2024-06-10 at 4 37 09 PM

  • Detail View - Config

    • This shows a red banner Network Error and the cloud creds + authenticate button. Pressing Authenticate shows another network error banner. For this use case is there an easy way to avoid the user having to press Authenticate?

I'll admit, I forgot about the 'view config' mode in general. I've made a handful of small improvements around this: in addition to hiding the authenticate button, I've prevented any of the gcp api calls from running in view mode (no need to show loading spinners for disabled inputs). I also corrected up the display mode of a few inputs when they're disabled, and a few other minor tweaks related to loading gcp data.

  • Edit

    • Same as detail view config (red banner network error)

I think I fixed this as part of my view config fixes.

Bonus Point (not related to PR) - Create EKS --> Cluster Options --> shows a Kube Version of [1.29. Not sure where this comes from in hosted world, but it also happens in master

There have already been two issues filed about this :( I took a peek and know what's going on, but will add to an EKS-scoped PR.

Copy link
Member

@richard-cox richard-cox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the functionality another pass. Almost all issues resolved (i'll start removing them in the next pass)

  • Cloud Credentials

    • When creating a Google Cloud Credential directly via the Cloud Credential page no project id is requested, however whilst on the create cloud cred steps when creating a cluster (when no cred found) it is
      • The cloud credential json has its project id in it, so when the user creates one inline, I've updated the form to automatically populate the project id input with the value from that JSON data

      • Nice!
    • When selecting a Google Cloud Credential via Cluster Create page
      • clicking the Create button without going through Authenticate gives the error notification cluster kubernetes version cannot be empty string. Can we disable the button if the user hasn't authed?
      • clicking the Create button without going through Authenticate, and then supplying the project id --> authenticate results in the rest of the form showing however the node pools section is empty
    • the red notification at top of the create cluster create cloud cred page saying cloud credential or project id missing pops up in odd ways
      • on a system with no cloud creds, when going to the cluster create page it's shown before the user hasn't had a chance to enter a project id, and will then blip in/out when the user enters/clears field
    • After selecting a Cloud Cred and clicking on Authenticate the button width changes to show the spinner which wiggles the two input field widths
      • Honestly, I'm not sure what a good alternative would be here. Putting the button on a new line when the input widths are already huge looks weird, too.

      • We could add something like this below, which gives the area enough room so only the button changes and not forms
        min-width: 150px;
        display: flex;
        align-items: center;
        justify-content: center;
        
    • After adding junk creds the button shows an error state, but with incorrect text colour. Is this something within this PR or a general regression?
      • image
  • Create

    • I created a cluster by adding just a name, no other changes, and it fails to come up
  • Create (form)

    • logging and monitoring services are hardcoded lists with two options, one of which is none: I'm using a checkbox instead of a dropdown.
    • Old UI Cluster Options --> Pod secondary Range Name is missing (but Pod Secondary CIDR Block is shown). Is this related to the getGKESharedSubnetworks comment?
    • New UI Networking --> Advanced Options
      • Can the checkboxes be vertical instead of horizontal? They all seem to affect each other in some way, and it would tie together CIDR Block key value component exposed via Master Authorized Network.
      • Can the Master Authorized Network CIDR Block control go underneath the checkbox? just to keep them tied together
    • After filling in the form and clicking Create the contents of the Node Pool tabs is briefly shown with no content
    • Node Pools
      • Old UI Image Type has an additional option Windows Long Term Service Channel with Containerd (zonal, us-central1-c)
      • API Access Scopes --> Set access for each API. We've lost horizontal space in the UI given the tab bar. For me full screen this means only one column of select controls are shown. Given the content of these is small can we shrink them width wise to allow more column/s? Think maybe 2, 3 if they don't look squished?
  • Detail View - Config

    • This shows a red banner Network Error and the cloud creds + authenticate button. Pressing Authenticate shows another network error banner. For this use case is there an easy way to avoid the user having to press Authenticate?
    • Console in error when going to page
      • entry-helpers.js:53 TypeError: Cannot read properties of undefined (reading 'createSubnetwork')
        at Proxy.render (CruGKE.vue:465:1)
        
  • Detail View - Node list

    • Scale down failed with Cannot read properties of undefined (reading 'includes'). This points to somewhere that isn't hit when adding a breakpoint to and trying again (handleSpoofedRequest
  • Edit

    • Same as detail view config (red banner network error)
    • On going to Edit Config the first page has some floating buttons
      • image
    • Think this has a similar issue to edit EKS. Existing node pools cannot have their name set (disabled - expected) however new node pools also can't (no node pool name)

Bonus Point (not related to PR) - Create EKS --> Cluster Options --> shows a Kube Version of [1.29. Not sure where this comes from in hosted world, but it also happens in master

Edit: Added scale down issue

@mantis-toboggan-md
Copy link
Member Author

We could add something like this below, which gives the area enough room so only the button changes and not forms

...why didn't I think of that? Done.

After adding junk creds the button shows an error state, but with incorrect text colour. Is this something within this PR or a general regression?

This was caused by using therole-secondary class on the authenticate button; I've updated it to use the default 'role' styling (blue with white text).

Can the Master Authorized Network CIDR Block control go underneath the checkbox? just to keep them tied together

Sure. I played around with a few different arrangements and landed on this (screen recordings are ~1080p window, for context):

Screen.Recording.2024-06-14.at.1.06.13.PM.mov

I don't have strong feelings on these conditionally rendered inputs being to the side vs below the checkboxes, but I will note that they don't work well stacked below the checkboxes; one or the other will be fully out out view when enabling private cluster:

Screen.Recording.2024-06-14.at.12.43.08.PM.mov

I also updated the text on the add button to align better with what that keyvalue component corresponds to in the GCP ui.

Console in error when going to page

entry-helpers.js:53 TypeError: Cannot read properties of undefined (reading 'createSubnetwork')
at Proxy.render (CruGKE.vue:465:1)

Fixed by adding the Loading component

On going to Edit Config the first page has some floating buttons

I've added a fix to all 3 hosted providers. SelectCredential rendered before a value prop was provided and did not update itself accordingly...I 'fixed' this by adding a loading component to CruEKS CruGKE CruAKS, so SelectCredential doesn't render until cluster credential data has been fetched. I think SelectCredential is not handling the value prop correctly, but these 3 components should use the loading indicator anyway.

Think this has a similar issue to edit EKS. Existing node pools cannot have their name set (disabled - expected) however new node pools also can't (no node pool name)

I'm not seeing this issue in GKE - new node pools in a provisioned cluster can have their name altered; existing pools can have their name altered only if the cluster has not finished provisioning. I have however moved 'group details' to the top of the node tab, to be consistent with the other two providers.

@mantis-toboggan-md
Copy link
Member Author

Scale down failed with Cannot read properties of undefined (reading 'includes'). This points to somewhere that isn't hit when adding a breakpoint to and trying again (handleSpoofedRequest

I don't think that button should be shown at all for kontainer driver clusters: these are node groups but they're being shown as individual nodes that can be scaled down individually. I don't think this PR introduced the bug; I see the issue on master as well. I dont see the bug in 2.8-head however so it is some kinda regression. I'll keep investigating next week, but would prefer to tackle this in a separate PR since it's a shared component and the problem exists across kontainer driver clusters. #11246

@richard-cox richard-cox self-requested a review June 17, 2024 08:00
Copy link
Member

@richard-cox richard-cox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some new (in review, not regressions given pr changes) edit items remaining

  • Cloud Credentials
    • Nowt
  • Create (form)
    • Nowt
  • Detail View - Config
    • Nowt
  • Detail View - Node list
    • Nowt
  • Edit
    • Think this has a similar issue to edit EKS. Existing node pools cannot have their name set (disabled - expected) however new node pools also can't (no node pool name)
      • No longer seeing, or just dreamt it
    • Copying in the project id every time get's a bit tiresome. Is there anything we can do to improve this?
      • does it exist anywhere in the cluster (i think you confirmed already it's not in the cloud cred)?
      • would we be able to provide a list for the user to select?
    • Going to edit a single node pool cluster (with default/highest kube version), add a new node pool, Node Details kube version is replaced by check box with missing data
      • Upgrade the node pool version from to 1.29.5-gke.1121000
      • Not sure on the mechanism, but for the no-op world we should hide the checkbox (but show an empty section?)
    • existing values of gkeConfig.locations are ignored
      • cluster in region europe-west1. in edit page only europe-west1-b is shown selected (even when gkeConfig.locations contains 3 entries "europe-west1-b", "europe-west1-c", "europe-west1-d"). 'detail view - config' no extra zones are shown.

@richard-cox richard-cox self-requested a review June 19, 2024 07:26
richard-cox
richard-cox previously approved these changes Jun 20, 2024
Copy link
Member

@richard-cox richard-cox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some new (in review, not regressions given pr changes) edit items remaining

  • Edit
    • Copying in the project id every time get's a bit tiresome. Is there anything we can do to improve this?
      • Not sure what was happening before, but i see this there on edit now. for create it's fine leaving out for the mo
    • Going to edit a single node pool cluster (with default/highest kube version), add a new node pool, Node Details kube version is replaced by check box with missing data
    • existing values of gkeConfig.locations are ignored

});
}
if (this.region) {
const out = this.zonesByRegion[this.region] || [];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only applicable if making another change, this can just be straight return d

@mantis-toboggan-md
Copy link
Member Author

mantis-toboggan-md commented Jun 20, 2024

  • Edit

    • Copying in the project id every time get's a bit tiresome. Is there anything we can do to improve this?

      • does it exist anywhere in the cluster (i think you confirmed already it's not in the cloud cred)?
      • would we be able to provide a list for the user to select?
    • Going to edit a single node pool cluster (with default/highest kube version), add a new node pool, Node Details kube version is replaced by check box with missing data

Thanks for reminding me; I've filed a backend issue rancher/rancher#45880 hopefully this is something we can add for 2.10... It should be possible to list projects through the gcp API, but we're using a subset of available gcp api options through the norman UI (similar to AKS) so we need support added there. Auto-filling when they createa a credential or edit a cluster is the most we can do on our side without that additional api call.

@mantis-toboggan-md mantis-toboggan-md merged commit ae9d1cb into rancher:master Jun 21, 2024
26 checks passed
nwmac pushed a commit to nwmac/dashboard that referenced this pull request Oct 24, 2024
* gke provisioning without validation or tests

* move version from config tab to location tab, rename config networking and location config

* add form validation

* sort en-us

mock gcp data

networking tests

config tests

fix upgrade np version  checkbox

fix account access on edit

typescript errors

authscopes util tests

* add private nodes warning banner and fix accountaccess on create

* lint

* update yarn lock to include ipaddr.js

* fix e2e test

* testing credential bug

* fix selectcredential not allowing new credential to be created

* automatically use project id when a new gcp credential is created; block saving cluster until project id is authenticated

* logging banner

* fix networking when subnet ha sno secondary ip ranges

* add svc account fetch and some node pool tests

* fix auth scopes styling and view config mode

* review feedback and ts errors

* no auto-authenticate

* fix lint

* update networking input placement

* add loading component to crugke, crueks, cruaks; shuffle gke node pool component inputs

* fix lint and loading import

* fix extra zone checkboxes on edit and add unit test

* fix gke node pool version checkbox when adding pools

* fix lint

* fix extrazoneoptions display in view config mode

* clean up extraZoneOptions
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Move GKE from Ember to Vue
2 participants