diff --git a/docs/website/content/docs/v0.2/Configuration/environments.md b/docs/website/content/docs/v0.2/Configuration/environments.md new file mode 100644 index 000000000..be9309df7 --- /dev/null +++ b/docs/website/content/docs/v0.2/Configuration/environments.md @@ -0,0 +1,78 @@ +--- +description: "" +weight: 1 +--- + +# Environments + +Environments are a custom resource provided by the Metal Controller Manager. +An environment is a codified description of what should be returned by the PXE server when a physical server attempts to PXE boot. + +Especially important in the environment types are the kernel args. +From here, one can tweak the IP to the metadata server as well as various other kernel options that [Talos](https://www.talos.dev/docs/v0.8/introduction/getting-started/#kernel-parameters) and/or the Linux kernel supports. + +Environments can be supplied to a given server either at the Server or the ServerClass level. +The hierarchy from most to least respected is: + +- `.spec.environmentRef` provided at `Server` level +- `.spec.environmentRef` provided at `ServerClass` level +- `"default"` `Environment` created by administrator + +A sample environment definition looks like this: + +```yaml +apiVersion: metal.sidero.dev/v1alpha1 +kind: Environment +metadata: + name: default +spec: + kernel: + url: "https://github.com/talos-systems/talos/releases/download/v0.8.1/vmlinuz-amd64" + sha512: "" + args: + - init_on_alloc=1 + - init_on_free=1 + - slab_nomerge + - pti=on + - consoleblank=0 + - random.trust_cpu=on + - ima_template=ima-ng + - ima_appraise=fix + - ima_hash=sha512 + - console=tty0 + - console=ttyS1,115200n8 + - earlyprintk=ttyS1,115200n8 + - panic=0 + - printk.devkmsg=on + - talos.platform=metal + - talos.config=http://$PUBLIC_IP:9091/configdata?uuid= + initrd: + url: "https://github.com/talos-systems/talos/releases/download/v0.8.1/initramfs-amd64.xz" + sha512: "" +``` + +Example of overriding `"default"` `Environment` at the `Server` level: + +```yaml +apiVersion: metal.sidero.dev/v1alpha1 +kind: Server +... +spec: + environmentRef: + namespace: default + name: boot + ... +``` + +Example of overriding `"default"` `Environment` at the `ServerClass` level: + +```yaml +apiVersion: metal.sidero.dev/v1alpha1 +kind: ServerClass +... +spec: + environmentRef: + namespace: default + name: boot + ... +``` diff --git a/docs/website/content/docs/v0.2/Configuration/metadata.md b/docs/website/content/docs/v0.2/Configuration/metadata.md new file mode 100644 index 000000000..5ea473a16 --- /dev/null +++ b/docs/website/content/docs/v0.2/Configuration/metadata.md @@ -0,0 +1,30 @@ +--- +description: "" +weight: 4 +--- + +# Metadata + +The Metadata server manages the Machine metadata. +In terms of Talos (the OS on which the Kubernetes cluster is formed), this is the +"[machine config](https://www.talos.dev/docs/v0.8/reference/configuration/)", +which is used during the automated installation. + +## Talos Machine Configuration + +The configuration of each machine is constructed from a number of sources: + +- The Talos bootstrap provider. +- The `Cluster` of which the `Machine` is a member. +- The `ServerClass` which was used to select the `Server` into the `Cluster`. +- Any `Server`-specific patches. + +The base template is constructed from the Talos bootstrap provider, using data from the associated `Cluster` manifest. +Then, any configuration patches are applied from the `ServerClass` and `Server`. + +Only configuration patches are allowed in the `ServerClass` and `Server` resources. 
+These patches take the form of an [RFC 6902](https://tools.ietf.org/html/rfc6902) JSON (or YAML) patch.
+An example of the use of this patch method can be found in [Patching Guide](../../guides/patching/).
+
+Also note that while a `Server` can be a member of any number of `ServerClass`es, only the `ServerClass` which is used to select the `Server` into the `Cluster` will be used for the generation of the configuration of the `Machine`.
+In this way, `Servers` may have a number of different configuration patch sets based on which `Cluster` they are in at any given time.
diff --git a/docs/website/content/docs/v0.2/Configuration/serverclasses.md b/docs/website/content/docs/v0.2/Configuration/serverclasses.md
new file mode 100644
index 000000000..f605aaa02
--- /dev/null
+++ b/docs/website/content/docs/v0.2/Configuration/serverclasses.md
@@ -0,0 +1,33 @@
+---
+description: ""
+weight: 3
+---
+
+# Server Classes
+
+Server classes are a way to group distinct server resources.
+The "qualifiers" key allows the administrator to specify criteria upon which to group these servers.
+There are currently three keys: `cpu`, `systemInformation`, and `labelSelectors`.
+Each of these keys accepts a list of entries.
+The top level keys are a "logical AND", while the lists under each key are a "logical OR".
+Qualifiers that are not specified are not evaluated.
+
+An example:
+
+```yaml
+apiVersion: metal.sidero.dev/v1alpha1
+kind: ServerClass
+metadata:
+  name: default
+spec:
+  qualifiers:
+    cpu:
+      - manufacturer: Intel(R) Corporation
+        version: Intel(R) Atom(TM) CPU C3558 @ 2.20GHz
+      - manufacturer: Advanced Micro Devices, Inc.
+        version: AMD Ryzen 7 2700X Eight-Core Processor
+    labelSelectors:
+      - "my-server-label": "true"
+```
+
+Servers are added to the above class only if they have _either_ of the listed CPUs _and_ the label associated with the server resource.
diff --git a/docs/website/content/docs/v0.2/Configuration/servers.md b/docs/website/content/docs/v0.2/Configuration/servers.md
new file mode 100644
index 000000000..addbbede4
--- /dev/null
+++ b/docs/website/content/docs/v0.2/Configuration/servers.md
@@ -0,0 +1,111 @@
+---
+description: ""
+weight: 2
+---
+
+# Servers
+
+Servers are the basic resource of bare metal in the Metal Controller Manager.
+These are created by PXE booting the servers and allowing them to send a registration request to the management plane.
+
+An example server may look like the following:
+
+```yaml
+apiVersion: metal.sidero.dev/v1alpha1
+kind: Server
+metadata:
+  name: 00000000-0000-0000-0000-d05099d333e0
+spec:
+  accepted: false
+  configPatches:
+    - op: replace
+      path: /cluster/network/cni
+      value:
+        name: custom
+        urls:
+          - http://192.168.1.199/assets/cilium.yaml
+  cpu:
+    manufacturer: Intel(R) Corporation
+    version: Intel(R) Atom(TM) CPU C3558 @ 2.20GHz
+  system:
+    family: Unknown
+    manufacturer: Unknown
+    productName: Unknown
+    serialNumber: Unknown
+    skuNumber: Unknown
+    version: Unknown
+```
+
+## Installation Disk
+
+An installation disk is required by Talos on bare metal.
+This can be specified in a `configPatch`:
+
+```yaml
+apiVersion: metal.sidero.dev/v1alpha1
+kind: Server
+...
+spec:
+  accepted: false
+  configPatches:
+    - op: replace
+      path: /machine/install/disk
+      value: /dev/sda1
+```
+
+The install disk patch can also be set on the `ServerClass`:
+
+```yaml
+apiVersion: metal.sidero.dev/v1alpha1
+kind: ServerClass
+...
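+# Note: configPatches set at the ServerClass level are applied to every Server that is selected into a cluster through this class.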
+spec:
+  configPatches:
+    - op: replace
+      path: /machine/install/disk
+      value: /dev/sda1
+```
+
+## Server Acceptance
+
+In order for a server to be eligible for consideration, it _must_ be `accepted`.
+This is an important separation point which all `Server`s must pass.
+Before a `Server` is accepted, no write action will be performed against it.
+Thus, it is safe for a computer to be added to a network on which Sidero is operating.
+Sidero will never write to or wipe any disk on a computer which is not marked as `accepted`.
+
+This can be tedious for systems in which all attached computers should be considered to be under the control of Sidero.
+Thus, you may also choose to automatically accept any machine into Sidero on its discovery.
+Please keep in mind that this means that any newly-connected computer **WILL BE WIPED** automatically.
+You can enable auto-acceptance by passing the `--auto-accept-servers=true` flag to `sidero-controller-manager`.
+
+Once accepted, a server will be reset (all disks wiped) and then made available to Sidero.
+
+You should never change an accepted `Server` to be _not_ accepted while it is in use.
+Because servers which are not accepted will not be modified, if a server which
+_was_ accepted is changed to _not_ accepted, the disk will _not_ be wiped upon
+its exit.
+
+## IPMI
+
+Sidero can use IPMI information to control `Server` power state, reboot servers, and set the boot order.
+
+IPMI connection information can be set in the `Server` spec after initial registration:
+
+```yaml
+apiVersion: metal.sidero.dev/v1alpha1
+kind: Server
+...
+spec:
+  bmc:
+    endpoint: 10.0.0.25
+    user: admin
+    pass: password
+```
+
+If IPMI information is set, the server boot order can be set to boot from disk, then network; Sidero will switch servers
+to PXE boot when that is required.
+
+Without IPMI information, Sidero can still register servers, wipe them, and provision clusters, but Sidero won't be able to
+reboot servers once they are removed from the cluster. If IPMI information is not set, servers should be configured to boot first from network,
+then from disk.
diff --git a/docs/website/content/docs/v0.2/Getting Started/architecture.md b/docs/website/content/docs/v0.2/Getting Started/architecture.md
new file mode 100644
index 000000000..fa8a0560a
--- /dev/null
+++ b/docs/website/content/docs/v0.2/Getting Started/architecture.md
@@ -0,0 +1,12 @@
+---
+description: ""
+weight: 3
+---
+
+# Architecture
+
+The overarching architecture of Sidero centers around a "management plane".
+This plane is expected to serve as a single interface upon which administrators can create, scale, upgrade, and delete Kubernetes clusters.
+At a high level, the management plane and the clusters it creates should look something like:
+
+![Sidero architecture overview](./images/dc-view.png)
diff --git a/docs/website/content/docs/v0.2/Getting Started/images/dc-view.png b/docs/website/content/docs/v0.2/Getting Started/images/dc-view.png
new file mode 100644
index 000000000..6b27997ac
Binary files /dev/null and b/docs/website/content/docs/v0.2/Getting Started/images/dc-view.png differ
diff --git a/docs/website/content/docs/v0.2/Getting Started/installation.md b/docs/website/content/docs/v0.2/Getting Started/installation.md
new file mode 100644
index 000000000..1f5c17c26
--- /dev/null
+++ b/docs/website/content/docs/v0.2/Getting Started/installation.md
@@ -0,0 +1,14 @@
+---
+description: ""
+weight: 2
+---
+
+# Installation
+
+As of Cluster API version 0.3.9, Sidero is included as a default infrastructure provider in clusterctl.
+
+To install Sidero and the other Talos providers, simply issue:
+
+```bash
+clusterctl init -b talos -c talos -i sidero
+```
diff --git a/docs/website/content/docs/v0.2/Getting Started/introduction.md b/docs/website/content/docs/v0.2/Getting Started/introduction.md
new file mode 100755
index 000000000..dc90dc46d
--- /dev/null
+++ b/docs/website/content/docs/v0.2/Getting Started/introduction.md
@@ -0,0 +1,32 @@
+---
+description: ""
+weight: 1
+---
+
+# Introduction
+
+Sidero ("Iron" in Greek) is a project created by the [Talos Systems](https://www.talos-systems.com/) team.
+The goal of this project is to provide lightweight, composable tools that can be used to create bare-metal Talos + Kubernetes clusters.
+These tools are built around the Cluster API project.
+Sidero is also a subproject of Talos Systems' [Arges](https://github.com/talos-systems/arges) project, which will publish known-good versions of these components (along with others) with each release.
+
+## Overview
+
+Sidero is currently made up of three components:
+
+- Metal Metadata Server: Provides a Cluster API (CAPI)-aware metadata server
+- Metal Controller Manager: Provides custom resources and controllers for managing the lifecycle of metal machines
+- Cluster API Provider Sidero (CAPS): A Cluster API infrastructure provider that makes use of the pieces above to spin up Kubernetes clusters
+
+Sidero also needs these co-requisites in order to be useful:
+
+- [Cluster API](https://github.com/kubernetes-sigs/cluster-api)
+- [Cluster API Control Plane Provider Talos](https://github.com/talos-systems/cluster-api-control-plane-provider-talos)
+- [Cluster API Bootstrap Provider Talos](https://github.com/talos-systems/cluster-api-bootstrap-provider-talos)
+
+All components mentioned above can be installed using Cluster API's `clusterctl` tool.
+
+Because of the design of Cluster API, there is inherently a "chicken and egg" problem with needing an existing Kubernetes cluster in order to provision the management plane.
+Talos Systems and the Cluster API community have created tools to help make this transition easier.
+That being said, the management plane cluster does not have to be based on Talos.
+If you would, however, like to use Talos as the OS of choice for the Sidero management plane, you can find a number of ways to deploy Talos in the [documentation](https://www.talos.dev).
diff --git a/docs/website/content/docs/v0.2/Getting Started/resources.md b/docs/website/content/docs/v0.2/Getting Started/resources.md
new file mode 100644
index 000000000..f48a98caf
--- /dev/null
+++ b/docs/website/content/docs/v0.2/Getting Started/resources.md
@@ -0,0 +1,119 @@
+---
+description: ""
+weight: 4
+---
+
+# Resources
+
+Sidero, the Talos bootstrap/controlplane providers, and Cluster API each provide several custom resources (CRDs) to Kubernetes.
+These CRDs are crucial to understanding the connections between each provider and in troubleshooting problems.
+It may also help to look at the [cluster template](https://github.com/talos-systems/sidero/blob/master/templates/cluster-template.yaml) to get an idea of the relationships between these.
+
+---
+
+## Cluster API (CAPI)
+
+It's worth defining the most basic resources that CAPI provides first, as they are related to several subsequent resources below.
+
+### `Cluster`
+
+`Cluster` is the highest level CAPI resource.
+It allows users to specify things like the network layout of the cluster and contains references to the infrastructure and control plane resources that will be used to create the cluster.
+
+### `Machines`
+
+A `Machine` represents an infrastructure component hosting a Kubernetes node.
+It allows for specification of things like the Kubernetes version and contains a reference to the infrastructure resource that relates to this machine.
+
+### `MachineDeployments`
+
+`MachineDeployments` relate to `Machines` in the same way that `Deployments` relate to `Pods` in core Kubernetes.
+A `MachineDeployment` allows for the specification of a number of `Machine` replicas created from a given specification.
+
+---
+
+## Cluster API Bootstrap Provider Talos (CABPT)
+
+### `TalosConfigs`
+
+The `TalosConfig` resource allows a user to specify the type (init, controlplane, join) for a given machine.
+The bootstrap provider will then generate a Talos machine configuration for that machine.
+This resource also provides the ability to pass a full, pre-generated machine configuration.
+Finally, users have the ability to pass `configPatches`, which are applied to edit a generated machine configuration with user-defined settings.
+The `TalosConfig` corresponds to the `bootstrap` sections of `Machines` and `MachineDeployments`, and to the `controlPlaneConfig` section of `TalosControlPlanes`.
+
+### `TalosConfigTemplates`
+
+`TalosConfigTemplates` are similar to the `TalosConfig` above, but used when specifying a bootstrap reference in a `MachineDeployment`.
+
+---
+
+## Cluster API Control Plane Provider Talos (CACPPT)
+
+### `TalosControlPlanes`
+
+The control plane provider presents a single CRD, the `TalosControlPlane`.
+This resource is similar to `MachineDeployments`, but is targeted exclusively at the Kubernetes control plane nodes.
+The `TalosControlPlane` allows for specification of the number of replicas, the version of Kubernetes for the control plane nodes, references to the infrastructure resource to use (`infrastructureTemplate` section), as well as the configuration of the bootstrap data via the `controlPlaneConfig` section.
+This resource is referred to by the CAPI `Cluster` resource via the `controlPlaneRef` section.
+
+---
+
+## Sidero
+
+### Cluster API Provider Sidero (CAPS)
+
+#### `MetalClusters`
+
+A `MetalCluster` is Sidero's view of the cluster resource.
+This resource allows users to define the control plane endpoint that corresponds to the Kubernetes API server.
+This resource corresponds to the `infrastructureRef` section of Cluster API's `Cluster` resource.
+
+#### `MetalMachines`
+
+A `MetalMachine` is Sidero's view of a machine.
+It allows for a reference to a single server, or to a server class from which a physical server will be picked to bootstrap.
+
+#### `MetalMachineTemplates`
+
+A `MetalMachineTemplate` is similar to a `MetalMachine` above, but serves as a template that is reused for resources like `MachineDeployments` or `TalosControlPlanes` that allocate multiple `Machines` at once.
+
+#### `ServerBindings`
+
+`ServerBindings` represent a one-to-one mapping between a `Server` resource and a `MetalMachine` resource.
+A `ServerBinding` is used internally to keep track of servers that are allocated to a Kubernetes cluster and is used to make decisions about cleaning and returning servers to a `ServerClass` upon deallocation.
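+
+Because `ServerBindings` are ordinary resources in the management plane, a quick way to see which `Servers` are currently allocated, and to which clusters, is to list them with `kubectl`.
+The commands below are only a sketch; the exact columns displayed depend on your Sidero version, and the binding name shown is illustrative:
+
+```bash
+# List all server bindings in the management plane
+kubectl get serverbindings -o wide
+
+# Inspect a single binding to see the MetalMachine and cluster it is tied to
+kubectl describe serverbinding 00000000-0000-0000-0000-d05099d33360
+```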
+ +### Metal Controller Manager + +#### `Environments` + +These define a desired deployment environment for Talos, including things like which kernel to use, kernel args to pass, and the initrd to use. +Sidero allows you to define a default environment, as well as other environments that may be specific to a subset of nodes. +Users can override the environment at the `ServerClass` or `Server` level, if you have requirements for different kernels or kernel parameters. + +See the [Environments](/docs/v0.1/configuration/environments/) section of our Configuration docs for examples and more detail. + +#### `Servers` + +These represent physical machines as resources in the management plane. +These `Servers` are created when the physical machine PXE boots and completes a "discovery" process in which it registers with the management plane and provides SMBIOS information such as the CPU manufacturer and version, and memory information. + +See the [Servers](/docs/v0.1/configuration/servers/) section of our Configuration docs for examples and more detail. + +#### `ServerClasses` + +`ServerClasses` are a grouping of the `Servers` mentioned above, grouped to create classes of servers based on Memory, CPU or other attributes. +These can be used to compose a bank of `Servers` that are eligible for provisioning. + +See the [ServerClasses](/docs/v0.1/configuration/serverclasses/) section of our Configuration docs for examples and more detail. + +### Metal Metadata Server + +While the metadata server does not present unique CRDs within Kubernetes, it's important to understand the metadata resources that are returned to physical servers during the boot process. + +#### Metadata + +The metadata server may be familiar to you if you have used cloud environments previously. +Using Talos machine configurations created by the Talos Cluster API bootstrap provider, along with patches specified by editing `Server`/`ServerClass` resources or `TalosConfig`/`TalosControlPlane` resources, metadata is returned to servers who query the metadata server at boot time. + +See the [Metadata](/docs/v0.1/configuration/metadata/) section of our Configuration docs for examples and more detail. diff --git a/docs/website/content/docs/v0.2/Guides/bootstrapping.md b/docs/website/content/docs/v0.2/Guides/bootstrapping.md new file mode 100644 index 000000000..afef85528 --- /dev/null +++ b/docs/website/content/docs/v0.2/Guides/bootstrapping.md @@ -0,0 +1,339 @@ +--- +description: "A guide for bootstrapping Sidero management plane" +weight: 1 +--- + +# Bootstrapping + +## Introduction + +Imagine a scenario in which you have shown up to a datacenter with only a laptop and your task is to transition a rack of bare metal machines into an HA management plane and multiple Kubernetes clusters created by that management plane. +In this guide, we will go through how to create a bootstrap cluster using a Docker-based Talos cluster, provision the management plane, and pivot over to it. +Guides around post-pivoting setup and subsequent cluster creation should also be found in the "Guides" section of the sidebar. + +Because of the design of Cluster API, there is inherently a "chicken and egg" problem with needing a Kubernetes cluster in order to provision the management plane. +Talos Systems and the Cluster API community have created tools to help make this transition easier. 
+ +## Prerequisites + +First, you need to install the latest `talosctl` by running the following script: + +```bash +curl -Lo /usr/local/bin/talosctl https://github.com/talos-systems/talos/releases/latest/download/talosctl-$(uname -s | tr "[:upper:]" "[:lower:]")-amd64 +chmod +x /usr/local/bin/talosctl +``` + +You can read more about Talos and `talosctl` at [talos.dev](https://www.talos.dev/docs/latest). + +Next, there are two big prerequisites involved with bootstrapping Sidero: routing and DHCP setup. + +From the routing side, the laptop from which you are bootstrapping _must_ be accessible by the bare metal machines that we will be booting. +In the datacenter scenario described above, the easiest way to achieve this is probably to hook the laptop onto the server rack's subnet by plugging it into the top-of-rack switch. +This is needed for TFTP, PXE booting, and for the ability to register machines with the bootstrap plane. + +DHCP configuration is needed to tell the metal servers what their "next server" is when PXE booting. +The configuration of this is different for each environment and each DHCP server, thus it's impossible to give an easy guide. +However, here is an example of the configuration for an Ubiquti EdgeRouter that uses vyatta-dhcpd as the DHCP service: + +This block shows the subnet setup, as well as the extra "subnet-parameters" that tell the DHCP server to include the ipxe-metal.conf file. + +```bash +$ show service dhcp-server shared-network-name MetalDHCP + + authoritative enable + subnet 192.168.254.0/24 { + default-router 192.168.254.1 + dns-server 192.168.1.200 + lease 86400 + start 192.168.254.2 { + stop 192.168.254.252 + } + subnet-parameters "include "/etc/dhcp/ipxe-metal.conf";" + } +``` + +Here is the ipxe-metal.conf file. + +```bash +$ cat /etc/dhcp/ipxe-metal.conf + +allow bootp; +allow booting; + +next-server 192.168.1.150; +if exists user-class and option user-class = "iPXE" { + filename "http://192.168.1.150:8081/boot.ipxe"; +} elsif substring (option vendor-class-identifier, 0, 10) = "HTTPClient" { + option vendor-class-identifier "HTTPClient"; + filename "http://192.168.1.150:8081/tftp/ipxe.efi"; +} else { + filename "ipxe.efi"; +} + +host talos-mgmt-0 { + fixed-address 192.168.254.2; + hardware ethernet d0:50:99:d3:33:60; +} +``` + +Notice that it sets a static address for the management node that I'll be booting, in addition to providing the "next server" info. +This "next server" IP address will match references to `PUBLIC_IP` found below in this guide. + +## Create a Local Cluster + +The `talosctl` CLI tool has built-in support for spinning up Talos in docker containers. +Let's use this to our advantage as an easy Kubernetes cluster to start from. + +Set an environment variable called `PUBLIC_IP` which is the "public" IP of your machine. +Note that "public" is a bit of a misnomer. +We're really looking for the IP of your machine, not the IP of the node on the docker bridge (ex: `192.168.1.150`). + +```bash +export PUBLIC_IP="192.168.1.150" +``` + +We can now create our Docker cluster. +Issue the following to create a single-node cluster: + +```bash +talosctl cluster create \ + -p 69:69/udp,8081:8081/tcp,9091:9091/tcp,50100:50100/tcp \ + --workers 0 \ + --endpoint $PUBLIC_IP +``` + +Note that there are several ports mentioned in the command above. +These allow us to access the services that will get deployed on this node. 
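+
+For reference, a rough mapping of these forwarded ports to the Sidero services they expose is sketched below; the roles of ports 8081 and 9091 follow from the DHCP and kernel-argument examples in this guide, while the role given for 50100 is an assumption that should be verified against your Sidero version:
+
+```bash
+# 69/udp    - TFTP, which serves the iPXE binaries to PXE-booting machines
+# 8081/tcp  - HTTP, which serves boot.ipxe and the other boot assets
+# 9091/tcp  - the metadata server, which serves machine configs at /configdata
+# 50100/tcp - internal Sidero API used during server registration (assumption)
+```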
+ +Once the cluster create command is complete, issue `talosctl kubeconfig /desired/path` to fetch the kubeconfig for this cluster. +You should then set your `KUBECONFIG` environment variable to the path of this file. + +## Untaint Control Plane + +Because this is a single node cluster, we need to remove the "NoSchedule" taint on the node to make sure non-controlplane components can be scheduled. + +```bash +kubectl taint node talos-default-master-1 node-role.kubernetes.io/master:NoSchedule- +``` + +## Install Sidero + +As of Cluster API version 0.3.9, Sidero is included as a default infrastructure provider in clusterctl. + +To install Sidero and the other Talos providers, simply issue: + +```bash +clusterctl init -b talos -c talos -i sidero +``` + +## Patch Components + +We will now want to ensure that the Sidero services that got created are publicly accessible across our subnet. +This will allow the metal machines to speak to these services later. + +### Patch the Metadata Server + +Update the metadata server component with the following patches: + +```bash +## Update args to use 9091 for port +kubectl patch deploy -n sidero-system sidero-metadata-server --type='json' -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args", "value": ["--port=9091"]}]' + +## Tweak container port to match +kubectl patch deploy -n sidero-system sidero-metadata-server --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/ports", "value": [{"containerPort": 9091,"name": "http"}]}]' + +## Use host networking +kubectl patch deploy -n sidero-system sidero-metadata-server --type='json' -p='[{"op": "add", "path": "/spec/template/spec/hostNetwork", "value": true}]' +``` + +### Patch the Metal Controller Manager + +```bash +## Update args to specify the api endpoint to use for registration +kubectl patch deploy -n sidero-system sidero-controller-manager --type='json' -p='[{"op": "add", "path": "/spec/template/spec/containers/1/args", "value": ["--api-endpoint='$PUBLIC_IP'","--metrics-addr=127.0.0.1:8080","--enable-leader-election"]}]' + +## Use host networking +kubectl patch deploy -n sidero-system sidero-controller-manager --type='json' -p='[{"op": "add", "path": "/spec/template/spec/hostNetwork", "value": true}]' +``` + +## Register the Servers + +At this point, any servers on the same network as Sidero should PXE boot using the Sidero PXE service. +To register a server with Sidero, simply turn it on and Sidero will do the rest. +Once the registration is complete, you should see the servers registered with `kubectl get servers`: + +```bash +$ kubectl get servers -o wide +NAME HOSTNAME ACCEPTED ALLOCATED CLEAN +00000000-0000-0000-0000-d05099d33360 192.168.254.2 false false false +``` + +## Accept the Servers + +Note in the output above that the newly registered servers are not `accepted`. +In order for a server to be eligible for consideration, it _must_ be marked as `accepted`. +Before a `Server` is accepted, no write action will be performed against it. +Servers can be accepted by issuing a patch command like: + +```bash +kubectl patch server 00000000-0000-0000-0000-d05099d33360 --type='json' -p='[{"op": "replace", "path": "/spec/accepted", "value": true}]' +``` + +For more information on server acceptance, see the [server docs](/docs/v0.1/configuration/servers). + +## Create the Default Environment + +We must now create an `Environment` in our bootstrap cluster. 
+An environment is a CRD that tells the PXE component of Sidero what information to return to nodes that request a PXE boot after completing the registration process above. +Things that can be controlled here are kernel flags and the kernel and init images to use. + +To create a default environment that will use the latest published Talos release, issue the following: + +```bash +cat < management-plane.yaml +``` + +Note that there are several variables that should be set in order for the templating to work properly: + +- `CONTROL_PLANE_ENDPOINT`: The endpoint used for the Kubernetes API server (e.g. `https://1.2.3.4:6443`). + This is the equivalent of the `endpoint` you would specify in `talosctl gen config`. + There are a variety of ways to configure a control plane endpoint. + Some common ways for an HA setup are to use DNS, a load balancer, or BGP. + A simpler method is to use the IP of a single node. + This has the disadvantage of being a single point of failure, but it can be a simple way to get running. +- `CONTROL_PLANE_SERVERCLASS`: The server class to use for control plane nodes. +- `WORKER_SERVERCLASS`: The server class to use for worker nodes. +- `KUBERNETES_VERSION`: The version of Kubernetes to deploy (e.g. `v1.19.4`). +- `CONTROL_PLANE_PORT`: The port used for the Kubernetes API server (port 6443) +- `TALOS_VERSION`: This should correspond to the minor version of Talos that you will be deploying (e.g. `v0.9`). + This value is used in determining the fields present in the machine configuration that gets generated for Talos nodes. + +For instance: +```bash +export CONTROL_PLANE_SERVERCLASS=master +export WORKER_SERVERCLASS=worker +export KUBERNETES_VERSION=v1.20.1 +export CONTROL_PLANE_PORT=6443 +export CONTROL_PLANE_ENDPOINT=1.2.3.4 +clusterctl config cluster management-plane -i sidero > management-plane.yaml +``` + +In addition, you can specify the replicas for control-plane & worker nodes in management-plane.yaml manifest for TalosControlPlane and MachineDeployment objects. Also, they can be scaled if needed: + +```bash +kubectl get taloscontrolplane +kubectl get machinedeployment +kubectl scale taloscontrolplane management-plane-cp --replicas=3 +``` + +Now that we have the manifest, we can simply apply it: + +```bash +kubectl apply -f management-plane.yaml +``` + +**NOTE: The templated manifest above is meant to act as a starting point. If customizations are needed to ensure proper setup of your Talos cluster, they should be added before applying.** + +Once the management plane is setup, you can fetch the talosconfig by using the cluster label. +Be sure to update the cluster name and issue the following command: + +```bash +kubectl get talosconfig \ + -l cluster.x-k8s.io/cluster-name= \ + -o yaml -o jsonpath='{.items[0].status.talosConfig}' > management-plane-talosconfig.yaml +``` + +With the talosconfig in hand, the management plane's kubeconfig can be fetched with `talosctl --talosconfig management-plane-talosconfig.yaml kubeconfig` + +## Pivoting + +Once we have the kubeconfig for the management cluster, we now have the ability to pivot the cluster from our bootstrap. 
+Using clusterctl, issue: + +```bash +clusterctl init --kubeconfig=/path/to/management-plane/kubeconfig -i sidero -b talos -c talos +``` + +Followed by: + +```bash +clusterctl move --to-kubeconfig=/path/to/management-plane/kubeconfig +``` + +Upon completion of this command, we can now tear down our bootstrap cluster with `talosctl cluster destroy` and begin using our management plane as our point of creation for all future clusters! diff --git a/docs/website/content/docs/v0.2/Guides/first-cluster.md b/docs/website/content/docs/v0.2/Guides/first-cluster.md new file mode 100644 index 000000000..cb0169c96 --- /dev/null +++ b/docs/website/content/docs/v0.2/Guides/first-cluster.md @@ -0,0 +1,148 @@ +--- +description: "A guide for creating your first cluster with the Sidero management plane" +weight: 2 +--- + +# Creating Your First Cluster + +## Introduction + +This guide will detail the steps needed to provision your first bare metal Talos cluster after completing the bootstrap and pivot steps detailed in the previous guide. +There will be two main steps in this guide: reconfiguring the Sidero components now that they have been pivoted and the actual cluster creation. + +## Reconfigure Sidero + +### Patch Services + +In this guide, we will convert the metadata service to a NodePort service and the other services to use host networking. +This is also necessary because some protocols like TFTP don't allow for port configuration. +Along with some nodeSelectors and a scale up of the metal controller manager deployment, creating the services this way allows for the creation of DNS names that point to all management plane nodes and provide an HA experience if desired. +It should also be noted, however, that there are many options for acheiving this functionality. +Users can look into projects like MetalLB or KubeRouter with BGP and ECMP if they desire something else. + +Metal Controller Manager: + +```bash +## Use host networking +kubectl patch deploy -n sidero-system sidero-controller-manager --type='json' -p='[{"op": "add", "path": "/spec/template/spec/hostNetwork", "value": true}]' +``` + +Metadata Server: + +```bash +# Convert metadata server service to nodeport +kubectl patch service -n sidero-system sidero-metadata-server --type='json' -p='[{"op": "replace", "path": "/spec/type", "value": "NodePort"}]' + +## Set a known nodeport for metadata server +kubectl patch service -n sidero-system sidero-metadata-server --type='json' -p='[{"op": "replace", "path": "/spec/ports", "value": [{"port": 80, "protocol": "TCP", "targetPort": "http", "nodePort": 30005}]}]' +``` + +#### Update Environment + +The metadata server's information needs to be updated in the default environment. +Edit the environment with `kubectl edit environment default` and update the `talos.config` kernel arg with the IP of one of the management plane nodes (or the DNS entry you created) and the nodeport we specified above (30005). + +### Update DHCP + +The DHCP options configured in the previous guide should now be updated to point to your new management plane IP or to the DNS name if it was created. 
+ +A revised ipxe-metal.conf file looks like: + +```bash +allow bootp; +allow booting; + +next-server 192.168.254.2; +if exists user-class and option user-class = "iPXE" { + filename "http://192.168.254.2:8081/boot.ipxe"; +} else { + if substring (option vendor-class-identifier, 15, 5) = "00000" { + # BIOS + if substring (option vendor-class-identifier, 0, 10) = "HTTPClient" { + option vendor-class-identifier "HTTPClient"; + filename "http://192.168.254.2:8081/tftp/undionly.kpxe"; + } else { + filename "undionly.kpxe"; + } + } else { + # UEFI + if substring (option vendor-class-identifier, 0, 10) = "HTTPClient" { + option vendor-class-identifier "HTTPClient"; + filename "http://192.168.254.2:8081/tftp/ipxe.efi"; + } else { + filename "ipxe.efi"; + } + } +} + +host talos-mgmt-0 { + fixed-address 192.168.254.2; + hardware ethernet d0:50:99:d3:33:60; +} +``` + +## Register the Servers + +At this point, any servers on the same network as Sidero should PXE boot using the Sidero PXE service. +To register a server with Sidero, simply turn it on and Sidero will do the rest. +Once the registration is complete, you should see the servers registered with `kubectl get servers`: + +```bash +$ kubectl get servers -o wide +NAME HOSTNAME ACCEPTED ALLOCATED CLEAN +00000000-0000-0000-0000-d05099d33360 192.168.254.2 false false false +``` + +## Accept the Servers + +Note in the output above that the newly registered servers are not `accepted`. +In order for a server to be eligible for consideration, it _must_ be marked as `accepted`. +Before a `Server` is accepted, no write action will be performed against it. +Servers can be accepted by issuing a patch command like: + +```bash +kubectl patch server 00000000-0000-0000-0000-d05099d33360 --type='json' -p='[{"op": "replace", "path": "/spec/accepted", "value": true}]' +``` + +For more information on server acceptance, see the [server docs](/docs/v0.1/configuration/servers). + +## Create the Cluster + +The cluster creation process should be identical to what was detailed in the previous guide. +Note that, for this example, the same "default" serverclass that we used in the previous guide is used again. +Using clusterctl, we can create a cluster manifest with: + +```bash +clusterctl config cluster workload-cluster -i sidero > workload-cluster.yaml +``` + +Note that there are several variables that should be set in order for the templating to work properly: + +- `CONTROL_PLANE_ENDPOINT`: The endpoint used for the Kubernetes API server (e.g. `https://1.2.3.4:6443`). + This is the equivalent of the `endpoint` you would specify in `talosctl gen config`. + There are a variety of ways to configure a control plane endpoint. + Some common ways for an HA setup are to use DNS, a load balancer, or BGP. + A simpler method is to use the IP of a single node. + This has the disadvantage of being a single point of failure, but it can be a simple way to get running. +- `CONTROL_PLANE_SERVERCLASS`: The server class to use for control plane nodes. +- `WORKER_SERVERCLASS`: The server class to use for worker nodes. +- `KUBERNETES_VERSION`: The version of Kubernetes to deploy (e.g. `v1.19.4`). +- `TALOS_VERSION`: This should correspond to the minor version of Talos that you will be deploying (e.g. `v0.9`). + This value is used in determining the fields present in the machine configuration that gets generated for Talos nodes. + Note that the default is currently `v0.8`. 
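+
+For instance, mirroring the example in the bootstrapping guide, the variables above might be exported before generating the manifest (all values below are illustrative and should be adjusted for your environment):
+
+```bash
+export CONTROL_PLANE_ENDPOINT=1.2.3.4
+export CONTROL_PLANE_SERVERCLASS=master
+export WORKER_SERVERCLASS=worker
+export KUBERNETES_VERSION=v1.20.1
+export TALOS_VERSION=v0.8
+clusterctl config cluster workload-cluster -i sidero > workload-cluster.yaml
+```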
+
+Now that we have the manifest, we can simply apply it:
+
+```bash
+kubectl apply -f workload-cluster.yaml
+```
+
+**NOTE: The templated manifest above is meant to act as a starting point. If customizations are needed to ensure proper setup of your Talos cluster, they should be added before applying.**
+
+Once the workload cluster is set up, you can fetch the talosconfig with a command like:
+
+```bash
+kubectl get talosconfig workload-cluster-cp-xxx -o jsonpath='{.status.talosConfig}' > workload-cluster-talosconfig.yaml
+```
+
+Then the workload cluster's kubeconfig can be fetched with `talosctl --talosconfig workload-cluster-talosconfig.yaml kubeconfig /desired/path`.
diff --git a/docs/website/content/docs/v0.2/Guides/flow.md b/docs/website/content/docs/v0.2/Guides/flow.md
new file mode 100644
index 000000000..ba5024f90
--- /dev/null
+++ b/docs/website/content/docs/v0.2/Guides/flow.md
@@ -0,0 +1,82 @@
+---
+description: "Diagrams for various flows in Sidero."
+weight: 4
+---
+
+## Provisioning Flow
+
+```mermaid
+graph TD;
+    Start(Start);
+    End(End);
+
+    %% Decisions
+
+    IsOn{Is server powered on?};
+    IsRegistered{Is server registered?};
+    IsAccepted{Is server accepted?};
+    IsClean{Is server clean?};
+    IsAllocated{Is server allocated?};
+
+    %% Actions
+
+    DoPowerOn[Power server on];
+    DoPowerOff[Power server off];
+    DoBootAgentEnvironment[Boot agent];
+    DoBootEnvironment[Boot environment];
+    DoRegister[Register server];
+    DoWipe[Wipe server];
+
+    %% Chart
+
+    Start-->IsOn;
+    IsOn--Yes-->End;
+    IsOn--No-->DoPowerOn;
+
+    DoPowerOn--->IsRegistered;
+
+    IsRegistered--Yes--->IsAccepted;
+    IsRegistered--No--->DoBootAgentEnvironment-->DoRegister;
+
+    DoRegister-->IsRegistered;
+
+    IsAccepted--Yes--->IsAllocated;
+    IsAccepted--No--->End;
+
+    IsAllocated--Yes--->DoBootEnvironment;
+    IsAllocated--No--->IsClean;
+    IsClean--No--->DoWipe-->DoPowerOff;
+
+    IsClean--Yes--->DoPowerOff;
+
+    DoBootEnvironment-->End;
+
+    DoPowerOff-->End;
+```
+
+## Installation Flow
+
+```mermaid
+graph TD;
+    Start(Start);
+    End(End);
+
+    %% Decisions
+
+    IsInstalled{Is installed?};
+
+    %% Actions
+
+    DoInstall[Install];
+    DoReboot[Reboot];
+
+    %% Chart
+
+    Start-->IsInstalled;
+    IsInstalled--Yes-->End;
+    IsInstalled--No-->DoInstall;
+
+    DoInstall-->DoReboot;
+
+    DoReboot-->IsInstalled;
+```
diff --git a/docs/website/content/docs/v0.2/Guides/patching.md b/docs/website/content/docs/v0.2/Guides/patching.md
new file mode 100644
index 000000000..d2efce468
--- /dev/null
+++ b/docs/website/content/docs/v0.2/Guides/patching.md
@@ -0,0 +1,58 @@
+---
+description: "A guide describing patching"
+weight: 3
+---
+
+# Patching
+
+Server resources can be updated by using the `configPatches` section of the custom resource.
+Any field of the [Talos machine config](https://www.talos.dev/docs/v0.8/reference/configuration/)
+can be overridden on a per-machine basis using this method.
+The format of these patches is based on [RFC 6902 (JSON Patch)](http://jsonpatch.com/), which you may be used to from tools like kustomize.
+
+Any patches specified in the server resource are processed by the Metal Metadata Server before it returns a Talos machine config for a given server at boot time.
+ +A set of patches may look like this: + +```yaml +apiVersion: metal.sidero.dev/v1alpha1 +kind: Server +metadata: + name: 00000000-0000-0000-0000-d05099d33360 +spec: + configPatches: + - op: replace + path: /machine/install + value: + disk: /dev/sda + - op: replace + path: /cluster/network/cni + value: + name: "custom" + urls: + - "http://192.168.1.199/assets/cilium.yaml" +``` + +## Testing Configuration Patches + +While developing config patches it is usually convenient to test generated config with patches +before actual server is provisioned with the config. + +This can be achieved by querying the metadata server endpoint directly: + +```sh +$ curl http://$PUBLIC_IP:9091/configdata?uuid=$SERVER_UUID +version: v1alpha1 +... +``` + +Replace `$PUBLIC_IP` with the Sidero IP address and `$SERVER_UUID` with the name of the `Server` to test +against. + +If metadata endpoint returns an error on applying JSON patches, make sure config subtree being patched +exists in the config. If it doesn't exist, create it with the `op: add` above the `op: replace` patch. + +## Combining Patches from Multiple Sources + +Config patches might be combined from multiple sources (`Server`, `ServerClass`), which is explained in details +in [Metadata](../../configuration/metadata/) section. diff --git a/docs/website/content/docs/v0.2/index.md b/docs/website/content/docs/v0.2/index.md new file mode 100644 index 000000000..bbe463500 --- /dev/null +++ b/docs/website/content/docs/v0.2/index.md @@ -0,0 +1,20 @@ +# Documentation + +Welcome to the Sidero documentation. + +## Community + +- Slack: Join our [slack channel](https://slack.dev.talos-systems.io) +- Forum: [community](https://groups.google.com/a/talos-systems.com/forum/#!forum/community) +- Twitter: [@talossystems](https://twitter.com/talossystems) +- Email: [info@talos-systems.com](mailto:info@talos-systems.com) + +If you're interested in this project and would like to help in engineering efforts, or have general usage questions, we are happy to have you! +We hold a weekly meeting that all audiences are welcome to attend. + +### Office Hours + +- When: Mondays at 17:00 UTC. +- Where: [Google Meet](https://meet.google.com/day-pxhv-zky). + +You can subscribe to this meeting by joining the community forum above. diff --git a/docs/website/gridsome.config.js b/docs/website/gridsome.config.js index f1e19dadf..4d75fa641 100644 --- a/docs/website/gridsome.config.js +++ b/docs/website/gridsome.config.js @@ -20,7 +20,7 @@ module.exports = { github: "https://github.com/talos-systems/sidero", nav: { links: [ - { path: "/docs/v0.1/", title: "Docs" }, + { path: "/docs/v0.2/", title: "Docs" }, { path: "/releases/", title: "Releases" }, ], }, @@ -28,6 +28,12 @@ module.exports = { { version: "v0.1", url: "/docs/v0.1/", + latest: false, + prerelease: false, + }, + { + version: "v0.2", + url: "/docs/v0.2/", latest: true, prerelease: false, }, @@ -52,6 +58,7 @@ module.exports = { pathPrefix: "/docs", sidebarOrder: { "v0.1": ["Getting Started", "Configuration", "Guides"], + "v0.2": ["Getting Started", "Configuration", "Guides"], }, remark: { externalLinksTarget: "_blank",