Skip to content

Commit

Permalink
fix(k8s): avoid issues with NFS provisioner on node/pod eviction
Browse files Browse the repository at this point in the history
Enabling persistence hould avoid problems when pods
are evicted, e.g. due to resource constraints or node upgrades.

We're making the transition relatively seamless by creating a new
storage class, and switching from StatefulSet to Deployment (which
makes the underlying volume easier to manage and manipulate).

This needed a fork of the Helm chart, so we now need to track that
manually.
  • Loading branch information
edvald committed Oct 10, 2019
1 parent 698af0b commit 2f2eef8
Show file tree
Hide file tree
Showing 25 changed files with 767 additions and 33 deletions.
33 changes: 29 additions & 4 deletions docs/reference/providers/kubernetes.md
Original file line number Diff line number Diff line change
Expand Up @@ -536,9 +536,9 @@ Storage parameters to set for the in-cluster builder, container registry and cod
These are all shared cluster-wide across all users and builds, so they should be resourced accordingly,
factoring in how many concurrent builds you expect and how large your images and build contexts tend to be.

| Type | Required | Default |
| -------- | -------- | ---------------------------------------------------------------------------------------------------------------------------------------- |
| `object` | No | `{"builder":{"size":20480,"storageClass":null},"registry":{"size":20480,"storageClass":null},"sync":{"size":10240,"storageClass":null}}` |
| Type | Required | Default |
| -------- | -------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `object` | No | `{"builder":{"size":20480,"storageClass":null},"nfs":{"storageClass":null},"registry":{"size":20480,"storageClass":null},"sync":{"size":10240,"storageClass":null}}` |

### `providers[].storage.builder`

Expand Down Expand Up @@ -572,6 +572,29 @@ Storage class to use for the volume.
| -------- | -------- | ------- |
| `string` | No | `null` |

### `providers[].storage.nfs`

[providers](#providers) > [storage](#providersstorage) > nfs

Storage parameters for the NFS provisioner, which we automatically create for the sync volume, _unless_
you specify a `storageClass` for the sync volume. See the below `sync` parameter for more.

Only applies when `buildMode` is set to `cluster-docker` or `kaniko`, ignored otherwise.

| Type | Required | Default |
| -------- | -------- | ----------------------- |
| `object` | No | `{"storageClass":null}` |

### `providers[].storage.nfs.storageClass`

[providers](#providers) > [storage](#providersstorage) > [nfs](#providersstoragenfs) > storageClass

Storage class to use as backing storage for NFS .

| Type | Required | Default |
| -------- | -------- | ------- |
| `string` | No | `null` |

### `providers[].storage.registry`

[providers](#providers) > [storage](#providersstorage) > registry
Expand Down Expand Up @@ -613,7 +636,7 @@ Storage parameters for the code sync volume, which build contexts are synced to
in-cluster builds.

Important: The storage class configured here has to support _ReadWriteMany_ access.
If you don't specify a storage class, Garden creates an NFS provisioner and provisions an ephemeral
If you don't specify a storage class, Garden creates an NFS provisioner and provisions an
NFS volume for the sync data volume.

Only applies when `buildMode` is set to `cluster-docker` or `kaniko`, ignored otherwise.
Expand Down Expand Up @@ -936,6 +959,8 @@ providers:
builder:
size: 20480
storageClass: null
nfs:
storageClass: null
registry:
size: 20480
storageClass: null
Expand Down
33 changes: 29 additions & 4 deletions docs/reference/providers/local-kubernetes.md
Original file line number Diff line number Diff line change
Expand Up @@ -536,9 +536,9 @@ Storage parameters to set for the in-cluster builder, container registry and cod
These are all shared cluster-wide across all users and builds, so they should be resourced accordingly,
factoring in how many concurrent builds you expect and how large your images and build contexts tend to be.

| Type | Required | Default |
| -------- | -------- | ---------------------------------------------------------------------------------------------------------------------------------------- |
| `object` | No | `{"builder":{"size":20480,"storageClass":null},"registry":{"size":20480,"storageClass":null},"sync":{"size":10240,"storageClass":null}}` |
| Type | Required | Default |
| -------- | -------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `object` | No | `{"builder":{"size":20480,"storageClass":null},"nfs":{"storageClass":null},"registry":{"size":20480,"storageClass":null},"sync":{"size":10240,"storageClass":null}}` |

### `providers[].storage.builder`

Expand Down Expand Up @@ -572,6 +572,29 @@ Storage class to use for the volume.
| -------- | -------- | ------- |
| `string` | No | `null` |

### `providers[].storage.nfs`

[providers](#providers) > [storage](#providersstorage) > nfs

Storage parameters for the NFS provisioner, which we automatically create for the sync volume, _unless_
you specify a `storageClass` for the sync volume. See the below `sync` parameter for more.

Only applies when `buildMode` is set to `cluster-docker` or `kaniko`, ignored otherwise.

| Type | Required | Default |
| -------- | -------- | ----------------------- |
| `object` | No | `{"storageClass":null}` |

### `providers[].storage.nfs.storageClass`

[providers](#providers) > [storage](#providersstorage) > [nfs](#providersstoragenfs) > storageClass

Storage class to use as backing storage for NFS .

| Type | Required | Default |
| -------- | -------- | ------- |
| `string` | No | `null` |

### `providers[].storage.registry`

[providers](#providers) > [storage](#providersstorage) > registry
Expand Down Expand Up @@ -613,7 +636,7 @@ Storage parameters for the code sync volume, which build contexts are synced to
in-cluster builds.

Important: The storage class configured here has to support _ReadWriteMany_ access.
If you don't specify a storage class, Garden creates an NFS provisioner and provisions an ephemeral
If you don't specify a storage class, Garden creates an NFS provisioner and provisions an
NFS volume for the sync data volume.

Only applies when `buildMode` is set to `cluster-docker` or `kaniko`, ignored otherwise.
Expand Down Expand Up @@ -837,6 +860,8 @@ providers:
builder:
size: 20480
storageClass: null
nfs:
storageClass: null
registry:
size: 20480
storageClass: null
Expand Down
15 changes: 15 additions & 0 deletions garden-service/src/plugins/kubernetes/commands/cluster-init.ts
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,8 @@
import { PluginCommand } from "../../../types/plugin/command"
import { prepareSystem, getEnvironmentStatus } from "../init"
import chalk from "chalk"
import { helm } from "../helm/helm-cli"
import { KubernetesPluginContext } from "../config"

export const clusterInit: PluginCommand = {
name: "cluster-init",
Expand All @@ -34,6 +36,19 @@ export const clusterInit: PluginCommand = {
})
}

const k8sCtx = ctx as KubernetesPluginContext

log.info("Cleaning up old resources...")

try {
await helm({
ctx: k8sCtx,
log,
namespace: "garden-system",
args: ["delete", "--purge", "garden-nfs-provisioner"],
})
} catch (_) { }

log.info(chalk.green("\nDone!"))

return { result }
Expand Down
24 changes: 21 additions & 3 deletions garden-service/src/plugins/kubernetes/config.ts
Original file line number Diff line number Diff line change
Expand Up @@ -45,12 +45,13 @@ interface KubernetesResources {
}

interface KubernetesStorageSpec {
size: number
size?: number
storageClass: string | null
}

interface KubernetesStorage {
builder: KubernetesStorageSpec
nfs: KubernetesStorageSpec
registry: KubernetesStorageSpec
sync: KubernetesStorageSpec
}
Expand Down Expand Up @@ -125,6 +126,9 @@ export const defaultStorage: KubernetesStorage = {
size: 20 * 1024,
storageClass: null,
},
nfs: {
storageClass: null,
},
registry: {
size: 20 * 1024,
storageClass: null,
Expand Down Expand Up @@ -176,7 +180,7 @@ const storageSchema = (defaults: KubernetesStorageSpec) => joi.object()
.description("Volume size in megabytes."),
storageClass: joi.string()
.allow(null)
.default(null)
.default(defaults.storageClass)
.description("Storage class to use for the volume."),
})
.default(defaults)
Expand Down Expand Up @@ -312,6 +316,20 @@ export const kubernetesConfigBase = providerConfigBaseSchema
Only applies when \`buildMode\` is set to \`cluster-docker\`, ignored otherwise.
`),
nfs: joi.object()
.keys({
storageClass: joi.string()
.allow(null)
.default(null)
.description("Storage class to use as backing storage for NFS ."),
})
.default({ storageClass: null })
.description(dedent`
Storage parameters for the NFS provisioner, which we automatically create for the sync volume, _unless_
you specify a \`storageClass\` for the sync volume. See the below \`sync\` parameter for more.
Only applies when \`buildMode\` is set to \`cluster-docker\` or \`kaniko\`, ignored otherwise.
`),
registry: storageSchema(defaultStorage.registry)
.description(dedent`
Storage parameters for the in-cluster Docker registry volume. Built images are stored here, so that they
Expand All @@ -325,7 +343,7 @@ export const kubernetesConfigBase = providerConfigBaseSchema
in-cluster builds.
Important: The storage class configured here has to support _ReadWriteMany_ access.
If you don't specify a storage class, Garden creates an NFS provisioner and provisions an ephemeral
If you don't specify a storage class, Garden creates an NFS provisioner and provisions an
NFS volume for the sync data volume.
Only applies when \`buildMode\` is set to \`cluster-docker\` or \`kaniko\`, ignored otherwise.
Expand Down
32 changes: 20 additions & 12 deletions garden-service/src/plugins/kubernetes/init.ts
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,8 @@ import chalk from "chalk"
import { deline } from "../../util/string"
import { combineStates, ServiceStatusMap } from "../../types/service"

const nfsStorageClass = "garden-system-nfs"
// Note: We need to increment a version number here if we ever make breaking changes to the NFS provisioner StatefulSet
const nfsStorageClass = "garden-system-nfs-v2"

/**
* Performs the following actions to check environment status:
Expand Down Expand Up @@ -162,7 +163,11 @@ export async function prepareSystem(
// in the prepareEnvironment handler, instead of flagging as not ready here. This avoids blocking users where
// there's variance in configuration between users of the same cluster, that often doesn't affect usage.
if (!clusterInit && remoteCluster) {
if (combinedState === "outdated" && !serviceStates.includes("missing")) {
if (
combinedState === "outdated" &&
!serviceStates.includes("missing") &&
!(ctx.command && ctx.command.name === "plugins" && ctx.command.args.command === "cluster-init")
) {
log.warn({
symbol: "warning",
msg: chalk.yellow(deline`
Expand All @@ -178,12 +183,6 @@ export async function prepareSystem(
// We require manual init if we're installing any system services to remote clusters, to avoid conflicts
// between users or unnecessary work.
if (!clusterInit && remoteCluster && !systemReady) {
// Special-case so that this doesn't error when attempting to run the cluster init
const initCommandName = `plugins ${ctx.provider.name} cluster-init`
if (ctx.command && ctx.command.name === initCommandName) {
return {}
}

throw new KubernetesError(deline`
One or more cluster-wide system services are missing or not ready. You need to run
\`garden --env=${ctx.environmentName} plugins kubernetes cluster-init\`
Expand All @@ -198,6 +197,8 @@ export async function prepareSystem(
const sysProvider = await sysGarden.resolveProvider(k8sCtx.provider.name)
const sysCtx = <KubernetesPluginContext>await sysGarden.getPluginContext(sysProvider)

await sysGarden.clearBuilds()

await installTiller({ ctx: sysCtx, provider: sysCtx.provider, log, force })

// Install system services
Expand Down Expand Up @@ -231,6 +232,8 @@ export async function cleanupEnvironment({ ctx, log }: CleanupEnvironmentParams)
}

export function getKubernetesSystemVariables(config: KubernetesConfig) {
const syncStorageClass = config.storage.sync.storageClass || nfsStorageClass

return {
"namespace": systemNamespace,

Expand All @@ -241,22 +244,27 @@ export function getKubernetesSystemVariables(config: KubernetesConfig) {
"builder-limits-memory": megabytesToString(config.resources.builder.limits.memory),
"builder-requests-cpu": millicpuToString(config.resources.builder.requests.cpu),
"builder-requests-memory": megabytesToString(config.resources.builder.requests.memory),
"builder-storage-size": megabytesToString(config.storage.builder.size),
"builder-storage-size": megabytesToString(config.storage.builder.size!),
"builder-storage-class": config.storage.builder.storageClass,

// We only use NFS for the build-sync volume, so we allocate the space we need for that plus 1GB for margin.
"nfs-storage-size": megabytesToString(config.storage.sync.size! + 1024),
"nfs-storage-class": config.storage.nfs.storageClass,

"registry-limits-cpu": millicpuToString(config.resources.registry.limits.cpu),
"registry-limits-memory": megabytesToString(config.resources.registry.limits.memory),
"registry-requests-cpu": millicpuToString(config.resources.registry.requests.cpu),
"registry-requests-memory": megabytesToString(config.resources.registry.requests.memory),
"registry-storage-size": megabytesToString(config.storage.registry.size),
"registry-storage-size": megabytesToString(config.storage.registry.size!),
"registry-storage-class": config.storage.registry.storageClass,

"sync-limits-cpu": millicpuToString(config.resources.sync.limits.cpu),
"sync-limits-memory": megabytesToString(config.resources.sync.limits.memory),
"sync-requests-cpu": millicpuToString(config.resources.sync.requests.cpu),
"sync-requests-memory": megabytesToString(config.resources.sync.requests.memory),
"sync-storage-size": megabytesToString(config.storage.sync.size),
"sync-storage-class": config.storage.sync.storageClass || nfsStorageClass,
"sync-storage-size": megabytesToString(config.storage.sync.size!),
"sync-storage-class": syncStorageClass,
"sync-volume-name": `garden-sync-${syncStorageClass}`,
}
}

Expand Down
2 changes: 2 additions & 0 deletions garden-service/static/kubernetes/system/build-sync/garden.yml
Original file line number Diff line number Diff line change
Expand Up @@ -16,3 +16,5 @@ values:
storage:
request: ${var.sync-storage-size}
storageClass: ${var.sync-storage-class}
pvc:
name: ${var.sync-volume-name}
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,7 @@ spec:
volumes:
- name: garden-build-sync
persistentVolumeClaim:
claimName: garden-build-sync
claimName: {{ .Values.pvc.name }}
containers:
- name: sync
image: "gardendev/rsync:0.1"
Expand Down
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: garden-build-sync
name: {{ .Values.pvc.name }}
spec:
accessModes:
- ReadWriteMany
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,9 @@ resources:
storage:
request: 2Gi

pvc:
name: garden-build-sync

nodeSelector: {}

tolerations: []
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -20,3 +20,6 @@ values:
storage:
size: ${var.builder-storage-size}
storageClass: ${var.builder-storage-class}
buildSync:
volume:
name: ${var.sync-volume-name}
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,7 @@ spec:
claimName: garden-docker-data
- name: garden-build-sync
persistentVolumeClaim:
claimName: garden-build-sync
claimName: {{ .Values.buildSync.volume.name }}
# - name: garden-registry-tls
# secret:
# secretName: foo
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,10 @@ storage:
registry:
hostname: garden-docker-registry

buildSync:
pvc:
name: garden-build-sync

nodeSelector: {}

tolerations: []
Expand Down
17 changes: 17 additions & 0 deletions garden-service/static/kubernetes/system/nfs-provisioner/Chart.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
apiVersion: v1
appVersion: 2.2.3-0
description: nfs-server-provisioner is an out-of-tree dynamic provisioner for Kubernetes.
You can use it to quickly & easily deploy shared storage that works almost anywhere.
home: https://github.com/kubernetes/charts/tree/master/stable/nfs-server-provisioner
keywords:
- nfs
- storage
maintainers:
- email: kiall@macinnes.ie
name: kiall
- email: joaocc-dev@live.com
name: joaocc
name: nfs-server-provisioner
sources:
- https://github.com/kubernetes-incubator/external-storage/tree/master/nfs
version: 0.4.0-0
Loading

0 comments on commit 2f2eef8

Please sign in to comment.