# Deploying a Simple Kubernetes Cluster on FABRIC

This notebook demonstrates how to deploy a basic Kubernetes (K8s) cluster on the [FABRIC](https://fabric-testbed.net) testbed using the FABlib API.  
The cluster is built from scratch by provisioning virtual machines (VMs), configuring dataplane networking, installing Kubernetes components, and setting up the Flannel CNI (Container Network Interface) plugin.

The main steps include:

- Creating a slice and adding controller and worker nodes
- Connecting nodes over a Layer 3 FABNetv4 network (easily extendable to Layer 2)
- Verifying connectivity between all nodes
- Uploading configuration and helper scripts to all nodes
- Installing K8s prerequisites and components (`kubeadm`, `kubelet`, `kubectl`)
- Bootstrapping the controller node and configuring it to use the dataplane network
- Deploying the Flannel CNI and patching it for compatibility with IPv6 management networks
- Joining worker nodes to the cluster
- Deploying a test application (nginx) and validating the cluster

This setup enables users to launch and manage a minimal, functional K8s cluster suitable for experimentation and learning on FABRIC.

## FABRIC Library Initialization

We begin by importing and configuring the FABRIC library to manage slice and node provisioning within the testbed.

In [None]:
from fabrictestbed_extensions.fablib.fablib import FablibManager as fablib_manager
fablib = fablib_manager()
fablib.show_config();

## Create Slice and Nodes

This section provisions a new FABRIC slice that includes:

- A **controller node**
- One or more **worker nodes**

These nodes are placed on two randomly selected FABRIC sites. During provisioning, you can specify:
- The **disk size**
- The **OS image**
- The **network configuration**

By default, all nodes are connected to the **Layer 3 FABNetv4 network**, enabling routed connectivity across sites.

> **Note:** This setup can be easily extended to use a **Layer 2 network** by modifying the network attachment configuration during node creation.


In [None]:
slice_name = 'MySlice-k8s'
[site1, site2] = fablib.get_random_sites(count=2)
print(f"Sites: {site1} {site2}")
ctrlr_name  = "ctrlr"
node1_name = 'node1'
node2_name = 'node2'
image = 'default_ubuntu_22'
ctrlr_nic_name = 'NIC1'
node1_nic_name = 'NIC1'
node2_nic_name = 'NIC1'
net1_name = "net1"
net2_name = "net2"
net3_name = "net3"

In [None]:
try:
    #Create Slice
    slice = fablib.new_slice(slice_name)

    # Networks
    net1 = slice.add_l3network(name=net1_name, type='IPv4')
    net2 = slice.add_l3network(name=net2_name, type='IPv4')

    # Add node
    ctrlr = slice.add_node(name=ctrlr_name, site=site1, image=image)
    iface1 = ctrlr.add_component(model='NIC_Basic', name=ctrlr_nic_name).get_interfaces()[0]
    iface1.set_mode('auto')
    net1.add_interface(iface1)
    ctrlr.add_route(subnet=fablib.FABNETV4_SUBNET, next_hop=net1.get_gateway())

    # Add node
    node1 = slice.add_node(name=node1_name, site=site2, image=image)
    iface2 = node1.add_component(model='NIC_Basic', name=node1_nic_name).get_interfaces()[0]
    iface2.set_mode('auto')
    net2.add_interface(iface2)
    node1.add_route(subnet=fablib.FABNETV4_SUBNET, next_hop=net2.get_gateway())
    
    # Add node
    node2 = slice.add_node(name=node2_name, site=site2, image=image)
    iface3 = node2.add_component(model='NIC_Basic', name=node2_nic_name).get_interfaces()[0]
    iface3.set_mode('auto')
    net2.add_interface(iface3)
    node2.add_route(subnet=fablib.FABNETV4_SUBNET, next_hop=net2.get_gateway())
    
    #Submit Slice Request
    slice_id = slice.submit()
    
except Exception as e:
    print(f"{e}")

## Verify Connectivity

Before proceeding, verify that all the nodes in the slice are reachable via SSH. This ensures that the provisioning and network configuration were successful.

Typical checks include:
- Verifying that each node responds to basic `hostname` or `uptime` commands
- Ensuring network connectivity between nodes (e.g., via `ping`)


In [None]:
slice = fablib.get_slice(slice_name)
ctrlr = slice.get_node(name=ctrlr_name)        
node1 = slice.get_node(name=node1_name)        
node2 = slice.get_node(name=node2_name)           

ctrlr_iface = ctrlr.get_interface(network_name=net1_name)
ctrlr_addr = ctrlr.get_interface(network_name=net1_name).get_ip_addr()
node1_addr = node1.get_interface(network_name=net2_name).get_ip_addr()
node2_addr = node2.get_interface(network_name=net2_name).get_ip_addr()

stdout, stderr = ctrlr.execute(f'ping -c 5 {node1_addr}')
stdout, stderr = ctrlr.execute(f'ping -c 5 {node2_addr}')
stdout, stderr = node1.execute(f'ping -c 5 {node2_addr}')


node_ips = {
    node1.get_name() : node1_addr,
    node2.get_name() : node2_addr
}
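
The `ping` calls above only display their output. As a minimal sketch, the packet-loss figure can be pulled out of the captured `stdout` and checked programmatically. The helper name `packet_loss_percent` and the sample text are illustrative assumptions; the regular expression targets the summary line printed by the Linux `ping` utility.

```python
import re


def packet_loss_percent(ping_output: str) -> float:
    """Return the packet-loss percentage from a Linux `ping` summary line."""
    match = re.search(r"(\d+(?:\.\d+)?)% packet loss", ping_output)
    if match is None:
        raise ValueError("no packet-loss summary found in ping output")
    return float(match.group(1))


# Hypothetical sample of the summary line that `ping -c 5` prints.
sample = "5 packets transmitted, 5 received, 0% packet loss, time 4006ms\n"
loss = packet_loss_percent(sample)
print(f"packet loss: {loss}%")
```

In the notebook, the same helper could be applied to the `stdout` returned by each `execute` call, for example `assert packet_loss_percent(stdout) == 0`.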

## Configure SSH Key-Based Access

Generate SSH key pairs for both the `ubuntu` and `root` users.  
Distribute the corresponding public keys to all nodes by appending them to the appropriate `authorized_keys` files, enabling passwordless SSH access between nodes.


In [None]:
for n in slice.get_nodes():
    n.execute('ssh-keygen -t rsa -N "" -f /home/ubuntu/.ssh/id_rsa', quiet=True)
    n.execute('sudo ssh-keygen -t rsa -N "" -f /root/.ssh/id_rsa', quiet=True)

In [None]:
keys = {}
# Step 1: Collect public keys from each node
for n in slice.get_nodes():
    ubuntu_key, _ = n.execute("cat /home/ubuntu/.ssh/id_rsa.pub", quiet=True)
    root_key, _ = n.execute("sudo cat /root/.ssh/id_rsa.pub", quiet=True)
    keys[n.get_name()] = {
        "ubuntu": ubuntu_key.strip(),
        "root": root_key.strip()
    }

# Step 2: Distribute public keys to all other nodes
for n in slice.get_nodes():
    for other_node_name, node_keys in keys.items():
        if other_node_name == n.get_name():
            continue
        n.execute(f'echo "{node_keys["ubuntu"]}" >> /home/ubuntu/.ssh/authorized_keys')
        n.execute(f'sudo sh -c \'echo "{node_keys["root"]}" >> /root/.ssh/authorized_keys\'')
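
Re-running the distribution cell appends the same keys again. One possible way to make it idempotent, sketched below, is to append a key only when `grep` does not already find it in the file. The function name `append_key_command` and the sample key are hypothetical; `shlex.quote` is used so the key text is passed to the shell as a single literal argument.

```python
import shlex


def append_key_command(public_key: str, authorized_keys_path: str) -> str:
    """Build a shell command that appends public_key only if it is missing."""
    key = shlex.quote(public_key.strip())
    path = shlex.quote(authorized_keys_path)
    # grep -qxF: quiet, match the whole line, treat the pattern as a fixed string.
    return f"grep -qxF -- {key} {path} 2>/dev/null || echo {key} >> {path}"


# Hypothetical key material, for illustration only.
command = append_key_command(
    "ssh-rsa AAAA user@host", "/home/ubuntu/.ssh/authorized_keys"
)
print(command)
```

For the `root` user, the resulting command would need to run inside `sudo sh -c`, for example `f"sudo sh -c {shlex.quote(command)}"`, so that the redirection happens with root privileges.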


## Setup `/etc/hosts`

To enable seamless hostname-based communication over the **data plane network** (FABNetv4 in this example), this section configures the `/etc/hosts` file on each node.

Each node’s `/etc/hosts` entry includes:
- The **data plane IP address** of every other node
- The **corresponding hostname** (e.g., `ctrlr`, `node1`, etc.)

This setup allows nodes to resolve and connect to each other using hostnames over the FABNetv4 network.

> This is particularly useful when deploying Kubernetes or other distributed services where consistent host resolution is needed.


In [None]:
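for n in slice.get_nodes():
    for node_name, ip in node_ips.items():
        if n.get_name() == node_name:
            # A worker skips its own entry and records the controller instead
            # (the controller is not listed in node_ips).
            n.execute(f'sudo sh -c \'echo "{ctrlr_addr} {ctrlr.get_name()}" >> /etc/hosts\'')
            continue
        # Every other worker is added by data plane IP and hostname.
        n.execute(f'sudo sh -c \'echo "{ip} {node_name}" >> /etc/hosts\'')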
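
The rule "every node lists every other node" can also be written as a small pure function, which is easier to check than the nested loop. This is only a sketch: the function name `hosts_entries` and the addresses are made up for illustration and are not values from a real slice.

```python
def hosts_entries(addresses: dict[str, str], local_name: str) -> list[str]:
    """Return /etc/hosts lines for every node except local_name."""
    return [f"{ip} {name}" for name, ip in addresses.items() if name != local_name]


# Hypothetical data plane addresses, including the controller.
addresses = {
    "ctrlr": "10.128.3.2",
    "node1": "10.129.1.2",
    "node2": "10.129.1.3",
}

for line in hosts_entries(addresses, "node1"):
    print(line)
```

Building one mapping that includes the controller removes the need for the special case in the loop above.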

## Upload Scripts

This step uploads the necessary helper scripts to all nodes in the slice (both controller and workers).

These scripts typically include:
- Kubernetes installation and initialization scripts
- Flannel CNI configuration and patch scripts
- Utility or setup helpers (e.g., interface detection, health checks)

> Automating script distribution ensures consistency across nodes and simplifies the deployment process.


In [None]:
for n in slice.get_nodes():
    n.upload_directory(local_directory_path="./node_tools", remote_directory_path=".")

## Install K8s Pre-requisites

This step installs the required software components necessary for setting up Kubernetes on each node, such as:

- `containerd`
- Additional utilities (e.g., `curl`, `apt-transport-https`)

After installation, the nodes are rebooted to ensure changes take effect cleanly. 

In [None]:
for n in slice.get_nodes():
    n.execute("sudo ./node_tools/k8s_pre_install.sh", quiet=True, output_file=f"logs/{n.get_name()}_pre_install.log")
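
The cell above writes to a local `logs/` directory. Assuming `output_file` expects that directory to exist already (an assumption; the notebook does not say either way), a small helper can create it before building the path. The name `log_path` is hypothetical.

```python
from pathlib import Path


def log_path(node_name: str, stage: str, log_dir: str = "logs") -> str:
    """Create log_dir if needed and return the log file path for a node and stage."""
    directory = Path(log_dir)
    directory.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return str(directory / f"{node_name}_{stage}.log")


# Example use inside the loop (n is a FABlib node):
#   n.execute(cmd, quiet=True, output_file=log_path(n.get_name(), "pre_install"))
```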

## Wait for the Servers to be Back Up and Re-apply Network Configuration

After the reboot, the next cell waits for each node to become reachable via SSH.

Once the nodes are accessible, the network configuration (e.g., interface addresses and routes) is re-applied to ensure proper connectivity on the data plane (FABNetv4 in this example).

In [None]:
slice.wait_ssh()
for n in slice.get_nodes():
    n.config()

## Install K8s Components

Install the core Kubernetes components on each node:

- `kubelet`: The primary "node agent" that runs on each node.
- `kubeadm`: A tool to bootstrap the cluster.
- `kubectl`: The command-line tool for interacting with the Kubernetes API.

These components are installed in preparation for initializing the control plane and joining worker nodes.

In [None]:
for n in slice.get_nodes():
    n.execute("sudo ./node_tools/k8s_install.sh", quiet=True, output_file=f"logs/{n.get_name()}_install.log")

## Setup the Controller

This step initializes the Kubernetes control plane on the designated controller node using `kubeadm init`.

Key configuration options include:

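- `--control-plane-endpoint`: A stable IP address or DNS name through which the control plane is reached.
- `--apiserver-advertise-address`: The IP address the API server advertises to the rest of the cluster.
- `--pod-network-cidr`: The CIDR range for pod IP addresses; it must match the range the CNI plugin is configured to use.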

After initialization, the Kubernetes admin config (`admin.conf`) is copied to the user's home directory to enable use of `kubectl`.
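
The contents of `k8s_init_control_plane.sh` are not shown here, so the following is only a sketch of how the three options might be combined into a `kubeadm init` command. The function name is hypothetical, and the default of `10.244.0.0/16` is an assumption based on the network range used in the upstream Flannel manifest.

```python
import ipaddress


def kubeadm_init_command(api_ip: str, pod_cidr: str = "10.244.0.0/16") -> str:
    """Build a kubeadm init command line for the given data plane address."""
    ipaddress.ip_address(api_ip)    # raises ValueError on a malformed address
    ipaddress.ip_network(pod_cidr)  # raises ValueError on a malformed CIDR
    return (
        "sudo kubeadm init"
        f" --control-plane-endpoint={api_ip}"
        f" --apiserver-advertise-address={api_ip}"
        f" --pod-network-cidr={pod_cidr}"
    )


# Hypothetical controller address on the data plane network.
init_command = kubeadm_init_command("10.128.3.2")
print(init_command)
```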

### Kubelet Config

Configure the kubelet to prefer the dataplane network (FABNetv4 in this example) for node IP assignment instead of the default management network.

This is done by setting the `--node-ip` flag in the kubelet systemd configuration to the dataplane IP address. This ensures that all Kubernetes components use the correct interface for intra-cluster communication.
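
As an illustration of what such a setting can look like: with the Debian/Ubuntu Kubernetes packages, the kubelet systemd drop-in reads extra flags from `KUBELET_EXTRA_ARGS` in `/etc/default/kubelet`. Whether `k8s_kubelet.sh` uses exactly this file is an assumption, and the helper name below is hypothetical.

```python
import ipaddress


def kubelet_extra_args(node_ip: str) -> str:
    """Return the KUBELET_EXTRA_ARGS line that pins the kubelet to node_ip."""
    ipaddress.ip_address(node_ip)  # raises ValueError on a malformed address
    return f"KUBELET_EXTRA_ARGS=--node-ip={node_ip}"


# Hypothetical data plane address of the controller.
line = kubelet_extra_args("10.128.3.2")
print(line)
```

After writing the line, the kubelet would typically need to be restarted (`sudo systemctl restart kubelet`) for the flag to take effect.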

In [None]:
stdout, stderr = ctrlr.execute(f"sudo node_tools/k8s_kubelet.sh {ctrlr_addr}", output_file=f"logs/{ctrlr.get_name()}_kubelet.log")

In [None]:
stdout, stderr = ctrlr.execute(f"node_tools/k8s_init_control_plane.sh {ctrlr_addr}", output_file=f"logs/{ctrlr.get_name()}_init.log")

### Check Cluster State

At this stage, the control plane should be up and running. Use `kubectl get nodes -o wide` and `kubectl get pods -A` to verify the state of the cluster.

**Note:** It's expected that the DNS pods (e.g., CoreDNS) may show a `CrashLoopBackOff` or `Pending` status until the network plugin (e.g., Flannel or Calico) is deployed and functional.

In [None]:
stdout, stderr = ctrlr.execute("node_tools/k8s_status.sh")

### Deploy Network Plugin

Deploy the Flannel CNI plugin to enable pod networking:

```
kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml
```
In environments where the management network uses IPv6, Flannel may fail to reach the Kubernetes API server via the default route.
To address this, patch the Flannel DaemonSet to:

- Explicitly specify the data plane interface (e.g., `--iface=<INTERFACE_NAME>` or `--iface-can-reach=<API_SERVER_IP>`)
- Set the following environment variables:
  - `KUBERNETES_SERVICE_HOST`: Set to the IPv4 API server address
  - `KUBERNETES_SERVICE_PORT`: Set to `6443`

This ensures that Flannel uses the correct interface and successfully connects to the Kubernetes control plane.
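
A patch of this kind might be expressed as a strategic merge patch, sketched below as a Python dictionary. This is not the content of `k8s_flannel.sh`. The container name `kube-flannel` and the default arguments `--ip-masq` and `--kube-subnet-mgr` are assumptions taken from the upstream manifest and should be checked against the version you deploy; the arguments are repeated because a patch replaces the whole `args` list.

```python
import json


def flannel_patch(api_server_ip: str, api_server_port: int = 6443) -> dict:
    """Build a strategic merge patch for the Flannel DaemonSet (sketch)."""
    container = {
        "name": "kube-flannel",  # assumed container name
        "args": [
            "--ip-masq",          # assumed upstream default
            "--kube-subnet-mgr",  # assumed upstream default
            f"--iface-can-reach={api_server_ip}",
        ],
        "env": [
            {"name": "KUBERNETES_SERVICE_HOST", "value": api_server_ip},
            # Environment values must be strings in a pod spec.
            {"name": "KUBERNETES_SERVICE_PORT", "value": str(api_server_port)},
        ],
    }
    return {"spec": {"template": {"spec": {"containers": [container]}}}}


# Hypothetical API server address on the data plane network.
patch = flannel_patch("10.128.3.2")
print(json.dumps(patch, indent=2))
```

The printed JSON could then be passed to `kubectl patch daemonset` with the `--patch` option, in the namespace where Flannel is deployed.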

In [None]:
stdout, stderr = ctrlr.execute(f"node_tools/k8s_flannel.sh {ctrlr_addr}")

### Check Cluster State

After the network plugin (Flannel) has been deployed and patched, verify the health of the cluster.

Run the following commands:

```
kubectl get nodes -o wide
kubectl get pods -A -o wide
```

All system pods, including CoreDNS, should now be in a `Running` state.
The cluster is considered healthy once all components are operational and no pods are stuck in `Pending`, `CrashLoopBackOff`, or `Error` states.

In [None]:
stdout, stderr = ctrlr.execute("node_tools/k8s_status.sh")

## Add Nodes

To join additional worker nodes to the Kubernetes cluster:

1. **Retrieve the Join Command from the Controller Node:**

   On the controller, run:

   ```
   kubeadm token create --print-join-command
   ```
   This will output a command like:
   ```
   kubeadm join <controller-ip>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>
   ```
---
2. **Run the Join Command on Each Worker Node:**
   ```
   sudo kubeadm join 10.128.3.2:6443 --token abcdef.0123456789abcdef --discovery-token-ca-cert-hash sha256:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
   ```
---
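3. **Verify Node Addition:** once the workers have joined, confirm on the controller that they are listed by `kubectl get nodes`.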

In [None]:
join_command, stderr = ctrlr.execute("kubeadm token create --print-join-command")

In [None]:
for n in slice.get_nodes():
    if n.get_name() == ctrlr.get_name():
        continue
    n.execute(f"sudo {join_command}")
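
Because the captured output is passed straight to `sudo`, it can be worth checking that it really is a join command first. The sketch below normalizes whitespace (the captured text ends with a newline) and validates the shape shown earlier. The names `JOIN_PATTERN` and `clean_join_command` are hypothetical, and the pattern assumes the default `kubeadm` token and hash formats.

```python
import re

# kubeadm join <host>:<port> --token <6>.<16> --discovery-token-ca-cert-hash sha256:<64 hex>
JOIN_PATTERN = re.compile(
    r"^kubeadm join \S+:\d+ "
    r"--token [a-z0-9]{6}\.[a-z0-9]{16} "
    r"--discovery-token-ca-cert-hash sha256:[0-9a-f]{64}$"
)


def clean_join_command(raw_output: str) -> str:
    """Collapse whitespace and verify the text looks like a kubeadm join command."""
    command = " ".join(raw_output.split())
    if not JOIN_PATTERN.match(command):
        raise ValueError(f"unexpected kubeadm output: {command!r}")
    return command


# Hypothetical output with placeholder token and hash values.
sample = (
    "kubeadm join 10.128.3.2:6443 --token abcdef.0123456789abcdef "
    "--discovery-token-ca-cert-hash sha256:" + "a" * 64 + " \n"
)
print(clean_join_command(sample))
```

In the notebook, the workers could then run `f"sudo {clean_join_command(join_command)}"`.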

## Check Cluster State

With all nodes added, verify that the cluster is fully operational:

1. **Check Node Readiness:**

   ```bash
   kubectl get nodes -o wide
   ```
   All nodes should show a `STATUS` of `Ready`. It may take a few minutes after joining for nodes to transition to the ready state.
---
2. **Check Pod Health:**
   ```
   kubectl get pods -A -o wide
   ```
   All system and network plugin pods should be in the `Running` state.

A healthy cluster with multiple nodes in `Ready` state confirms successful setup and node integration.

In [None]:
stdout, stderr = ctrlr.execute("node_tools/k8s_status.sh")
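
Readiness can also be checked in code rather than by eye. The sketch below parses the plain table printed by `kubectl get nodes` and returns the nodes whose status does not include `Ready`. It assumes the default column layout (`NAME` first, `STATUS` second); the output of `k8s_status.sh` may contain more than this table, and the sample text, including the version numbers, is made up.

```python
def not_ready_nodes(kubectl_output: str) -> list[str]:
    """Return names of nodes whose STATUS column does not include Ready."""
    lines = [line for line in kubectl_output.splitlines() if line.strip()]
    pending = []
    for line in lines[1:]:  # skip the header row
        name, status = line.split()[:2]
        # STATUS may be a comma-separated list, e.g. Ready,SchedulingDisabled.
        if "Ready" not in status.split(","):
            pending.append(name)
    return pending


# Hypothetical `kubectl get nodes` output.
sample = """\
NAME    STATUS     ROLES           AGE   VERSION
ctrlr   Ready      control-plane   10m   v1.30.0
node1   Ready      <none>          2m    v1.30.0
node2   NotReady   <none>          1m    v1.30.0
"""
print(not_ready_nodes(sample))
```

An empty list means every node has reached the ready state; a non-empty list can be used to decide whether to wait and check again.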

## Deploy an Application

Deploy a simple `nginx` web application to validate the Kubernetes cluster setup.

### 1. Create the Deployment

```
kubectl apply -f node_tools/pod.yml
```

### 2. Expose the Deployment via NodePort

```
kubectl apply -f node_tools/service_nodeport.yml
```

This creates a service that maps a port on each node (the NodePort, `30080` in this example) to port 80 of the `nginx` pod.

### 3. Wait for the Service and Pod to be Ready

Check the service and pod status:

```
node_tools/k8s_status.sh
```

Note:

* The **NodePort** assigned (e.g., `30080`)
* The **node IP** where the `nginx` pod is running
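
The NodePort appears in the `PORT(S)` column of `kubectl get svc` in the form `80:30080/TCP`. As a rough sketch, it can be extracted and checked against the default Kubernetes NodePort range of 30000 to 32767. The function names are hypothetical, and the range check assumes the cluster has not been configured with a custom range.

```python
import re


def node_ports(ports_column: str) -> list[int]:
    """Extract NodePort numbers from a PORT(S) value such as '80:30080/TCP'."""
    return [int(port) for port in re.findall(r":(\d+)/", ports_column)]


def is_valid_node_port(port: int, low: int = 30000, high: int = 32767) -> bool:
    """Check that port lies inside the default NodePort range."""
    return low <= port <= high


ports = node_ports("80:30080/TCP")
print(ports, all(is_valid_node_port(p) for p in ports))
```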


In [None]:
stdout, stderr = ctrlr.execute("kubectl apply -f node_tools/pod.yml")

In [None]:
stdout, stderr = ctrlr.execute("kubectl apply -f node_tools/service_nodeport.yml")

In [None]:
stdout, stderr = ctrlr.execute("node_tools/k8s_status.sh")

## Start the SSH Tunnel

- Create SSH Tunnel Configuration `fabric_ssh_tunnel_tools.zip`
- Download your custom `fabric_ssh_tunnel_tools.zip` archive from the `fabric_config` folder.
- Unzip the archive and put the resulting folder (`fabric_ssh_tunnel_tools`) somewhere you can access it from the command line.
- Open a terminal window. (Windows: use `powershell`) 
- Use `cd` to navigate to the `fabric_ssh_tunnel_tools` folder.
- In your terminal, run the command that results from running the following cell (leave the terminal window open).

In [None]:
fablib.create_ssh_tunnel_config(overwrite=True)

In [None]:
import os
# Port on your local machine that you want to map the Nginx service to.
local_port='30080'
# Local interface to map the Nginx service to (can be `localhost`)
local_host='127.0.0.1'

# NodePort on the node used by the Nginx service
target_port='30080'

# Username/node on FABRIC
target_host=f'{ctrlr.get_username()}@{ctrlr.get_management_ip()}'

print(f'ssh  -L {local_host}:{local_port}:127.0.0.1:{target_port} -i {os.path.basename(fablib.get_default_slice_public_key_file())[:-4]} -F ssh_config {target_host}')

## Connect to the Nginx Server

The Nginx service running on the K8s cluster is now mapped to 127.0.0.1:30080 on your local machine. You can open a browser and navigate to the following address (or just click the link):

[http://127.0.0.1:30080](http://127.0.0.1:30080)

## Delete Slice

Please delete your slice when you are done with your experiment.

In [None]:
#slice = fablib.get_slice(slice_name)
#slice.delete()