# Set up the Kubernetes environment for DYNAMOS in FABRIC

This Jupyter notebook will create the Kubernetes environment in FABRIC after the slice and corresponding nodes have been created.

FABRIC API docs: https://fabric-fablib.readthedocs.io/en/latest/index.html


## Step 1: Configure the Environment (has to be done once in the Jupyter Hub environment) & Create Slice

Before running this notebook, configure your environment using the [Configure Environment](../configure_and_validate.ipynb) notebook. Please stop here, open and run that notebook, then return to this one. Note: this only has to be done once in the JupyterHub environment (unless the configuration is removed or deleted).

If you are using the FABRIC JupyterHub, many of the environment variables will be configured for you automatically. You will still need to set your bastion username, upload your bastion private key, and set the path to where you put that key. Your bastion username and private key should already be in your possession.

After following all steps of the Configure Environment notebook, you should be able to run this notebook without additional steps.

Next, you will need to have set up the slice in FABRIC using the [Create Slice](../create_slice.ipynb) notebook.

More information about accessing your experiments through the FABRIC bastion hosts can be found [here](https://learn.fabric-testbed.net/knowledge-base/logging-into-fabric-vms/).
 

## Step 2: Setup the Environment for this Notebook

### Step 2.1: Import FABRIC API and other libraries

In [2]:
import json
import traceback
import re

from fabrictestbed_extensions.fablib.fablib import FablibManager as fablib_manager

fablib = fablib_manager()

fablib.show_config();



### Step 2.2: Configure the parameters and variables
Set the slice name, node names, and network name used by the subsequent cells.

In [None]:
slice_name = 'DYNAMOS_EnergyEfficiency'
# Nodes:
node1_name = 'k8s-control-plane'
node2_name = 'dynamos-core'
node3_name = 'vu'
node4_name = 'uva'
node5_name = 'surf'
# Network:
network_name = 'NET1'

### Step 2.3: Set Network IPs for nodes
This step sets the network and IPs of the nodes for later usage.

In [None]:
# Get slice and slice components
# Get slice by name: https://fabric-fablib.readthedocs.io/en/latest/fablib.html#fabrictestbed_extensions.fablib.fablib.FablibManager.get_slice
slice = fablib.get_slice(name=slice_name)
# Get the network
network = slice.get_network(name=network_name)
# Get nodes
nodes = slice.get_nodes()

# Define function to get the ips for each node
def get_ip(node):
    interface = node.get_interface(network_name=network_name)
    return interface.get_ip_addr()
# Define function to get interface of the node
def get_interface(node):
    interface = node.get_interface(network_name=network_name)
    return interface.get_device_name()

# Create a dictionary to store node names and their IPs and interfaces
node_ips = {}
node_interfaces = {}

# Populate the dictionary with node names and their corresponding IPs
for node in nodes:
    node_name = node.get_name()
    # Get IPs for each node from the network and set as variables for later usage
    node_ips[node_name] = get_ip(node)
    # Get the interface device name for each node and store it for later use
    node_interfaces[node_name] = get_interface(node)

# Print the IPs and interfaces for each node
for node_name, ip in node_ips.items():
    print(f"{node_name} IP: {ip}")
for node_name, interface in node_interfaces.items():
    print(f"{node_name} interface device for Kubernetes network plugin (should be the same for each node): {interface}")
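
The printed message above expects every node to report the same interface device. A minimal sketch of how that expectation could be checked in code instead of by eye is shown below. The helper name `check_single_interface` and the sample device name `enp7s0` are hypothetical and not part of the notebook; in the notebook you would pass the `node_interfaces` dictionary built above.

```python
def check_single_interface(node_interfaces):
    """Return the interface device shared by all nodes, or raise if they differ."""
    devices = set(node_interfaces.values())
    if len(devices) != 1:
        found = sorted(map(str, devices))
        raise ValueError(f"Expected one interface device across all nodes, found: {found}")
    return devices.pop()


# Hypothetical sample data; replace with the node_interfaces dictionary from the cell above.
sample_interfaces = {
    "k8s-control-plane": "enp7s0",
    "dynamos-core": "enp7s0",
    "vu": "enp7s0",
}
print(check_single_interface(sample_interfaces))  # prints: enp7s0
```

Raising early here is a design choice: a mismatch would otherwise only surface later, when the network plugin is configured with the wrong device on some nodes.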

### Step 2.4: Display required information

#### Get SSH Commands
This step prints the SSH commands that can be used with the guide below to SSH into the VMs.

In [None]:
# Print the necessary information
try:
    # Get slice nodes
    for node in slice.get_nodes():
        print(f"Node: {node.get_name()}")
        # Get the original SSH command
        original_ssh_command = node.get_ssh_command()
        # Print SSH commands to get into the nodes
        print(f"  SSH Command from FABRIC: {original_ssh_command}")
        # Replace the file paths in the SSH command
        updated_ssh_command = original_ssh_command.replace(
            "/home/fabric/work/fabric_config/slice_key", "~/.ssh/slice_key"
        ).replace(
            "/home/fabric/work/fabric_config/ssh_config", "ssh_config"
        )
        # Print the updated SSH command
        print(f"  SSH Command locally (ensuring it is saved according to below steps): {updated_ssh_command}")
    
except Exception as e:
    print(f"Fail: {e}")
    traceback.print_exc()

#### Run the SSH Commands
To run the SSH commands, follow these steps (this only needs to be done once; the slice_key and fabric_bastion_key can be reused across slices):
1. From the FABRIC JupyterHub, download the /fabric_config/fabric_bastion_key, /fabric_config/slice_key and /fabric_config/ssh_config files.
2. Add the ssh_config file to this project under /fabric/fabric_config, and change its "IdentityFile" entry to "~/.ssh/fabric_bastion_key". This is the new path to the FABRIC bastion key on the machine from which you will run the SSH commands.
3. Execute these steps to store the SSH files safely on your local machine and avoid problems:
```sh
# Open a Linux terminal, such as WSL after opening a CMD in Windows:
wsl
# Navigate to the Downloads directory
cd Downloads
# Create the ~/.ssh directory (if it does not exist yet) to store the files
mkdir -p ~/.ssh

# Copy the key files to the SSH directory
cp slice_key ~/.ssh/slice_key
cp fabric_bastion_key ~/.ssh/fabric_bastion_key
# Update permissions
chmod 600 ~/.ssh/slice_key
chmod 600 ~/.ssh/fabric_bastion_key
# Navigate to the SSH directory to verify the files
cd ~/.ssh
# List files including permissions (-l)
ls -l

# Navigate to the fabric_config folder of this project, such as:
cd /mnt/c/Users/cpoet/VSC_Projs/EnergyEfficiency_DYNAMOS/fabric/fabric_config
# Then run the command from the previous step, such as:
ssh -i ~/.ssh/slice_key -F ssh_config ubuntu@2001:610:2d0:fabc:f816:3eff:fe65:a464
# To exit SSH access, type "exit" and press Enter
```
4. Now you can SSH into the nodes using the printed commands.
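
The "IdentityFile" change from step 2 can also be scripted. The following is a minimal sketch, assuming the downloaded ssh_config uses the common `IdentityFile <path>` form separated by whitespace. The function name `set_identity_file`, the host name, the user name, and the original key path in the sample are hypothetical placeholders.

```python
import re


def set_identity_file(ssh_config_text, new_path="~/.ssh/fabric_bastion_key"):
    """Point every IdentityFile entry at new_path, keeping the original indentation."""
    return re.sub(
        r"^([ \t]*IdentityFile[ \t]+)\S.*$",
        lambda m: m.group(1) + new_path,  # a function avoids escape handling in the replacement
        ssh_config_text,
        flags=re.MULTILINE,
    )


# Hypothetical ssh_config content; the real file comes from the FABRIC JupyterHub.
sample_config = """Host bastion.example.org
    User example_user
    IdentityFile /path/on/jupyterhub/fabric_bastion_key
"""
print(set_identity_file(sample_config))
```

To apply it to the real file, read /fabric/fabric_config/ssh_config into a string, pass it through the function, and write the result back.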

## Step 4: Configure Kubernetes Cluster with Kubeadm

This step configures the Kubernetes cluster.

### Step 4.1: Configure Cluster with Kubeadm
This step configures every node, starts the control plane and the worker nodes, and runs the post-install script on the control plane, all using kubeadm.

Note: it may take some time for every node in the cluster to become ready, so keep running "kubectl get nodes -o wide" until all nodes are ready. Once they are, also check each node with a describe command, such as "kubectl describe node dynamos-core" (or use k9s by pressing d on the pod), and look at the events. This avoids problems later, such as pod deployments that keep being scheduled and completed in an endless loop when they should not (see fabric/dynamos/Troubleshooting.md for that problem).
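
The readiness check described in the note can be automated by parsing the text output of "kubectl get nodes". Below is a minimal sketch, assuming the default table layout in which the first two columns are NAME and STATUS (this also holds with "-o wide"). The helper name `not_ready_nodes`, the ages, and the version strings in the sample are hypothetical.

```python
def not_ready_nodes(kubectl_output):
    """Return the names of nodes whose STATUS column does not include 'Ready'."""
    pending = []
    for line in kubectl_output.strip().splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0] == "NAME":
            continue  # skip the header row and malformed lines
        name, status = fields[0], fields[1]
        # STATUS can be a comma-separated list, e.g. "Ready,SchedulingDisabled"
        if "Ready" not in status.split(","):
            pending.append(name)
    return pending


# Hypothetical output of "kubectl get nodes"
sample_output = """
NAME                STATUS     ROLES           AGE   VERSION
k8s-control-plane   Ready      control-plane   10m   v1.31.0
dynamos-core        NotReady   <none>          2m    v1.31.0
"""
print(not_ready_nodes(sample_output))  # prints: ['dynamos-core']
```

In the notebook, the input could come from the stdout returned by `node.execute("kubectl get nodes")` on the control plane node, repeated with a short pause until the returned list is empty.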

In [None]:
try:
    # ========== Step 1: Configure Nodes with Kubeadm ==========
    print(f"========== Step 1: Configure Nodes with Kubeadm ==========")
    for node in slice.get_nodes():
        print(f"Configuring {node.get_name()}...")
        # Upload script file to the node
        file_attributes = node.upload_file(local_file_path="config_k8s_node.sh", remote_file_path="config_k8s_node.sh")
        # Add necessary permissions and execute the script
        stdout, stderr = node.execute(f"chmod +x config_k8s_node.sh && ./config_k8s_node.sh")
    
    # ========== Step 2: Start nodes with Kubeadm ==========
    print(f"========== Step 2: Start nodes with Kubeadm ==========")
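    # Process the control plane first so the join token and CA cert hash exist before any worker joins
    # (sorted() is stable and False sorts before True, so the remaining order is unchanged)
    for node in sorted(slice.get_nodes(), key=lambda n: n.get_name() != node1_name):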
        temp_node_name = node.get_name()
        print(f"Starting Kubernetes node {temp_node_name}...")

        # Check if the node is the control plane (node1)
        if temp_node_name == node1_name:
            # Upload the Flannel network addon file
            print(f"Uploading kube-flannel.yml on {temp_node_name}...")
            file_attributes = node.upload_file(local_file_path="kube-flannel.yml", remote_file_path="kube-flannel.yml")
            # Upload and execute the control plane script
            print(f"Uploading and running start_control_plane.sh on {temp_node_name}...")
            file_attributes = node.upload_file(local_file_path="start_control_plane.sh", remote_file_path="start_control_plane.sh")
            # Execute start control plane
            stdout, stderr = node.execute(
                f"chmod +x start_control_plane.sh && ./start_control_plane.sh {node_ips.get(temp_node_name)} {node_interfaces.get(temp_node_name)} {temp_node_name}"
            )

            # Extract join command from marked block
            print("Extracting join command and credentials...")
            try:
                join_cmd_block = stdout.split("=====BEGIN_JOIN_COMMAND=====")[1].split("=====END_JOIN_COMMAND=====")[0].strip()
            except IndexError:
                raise Exception("Join command block not found in control plane output.")

            # Parse token and CA cert hash
            match = re.search(
                r'--token\s+(\S+).*?--discovery-token-ca-cert-hash\s+(\S+)', join_cmd_block, re.DOTALL
            )
            if not match:
                raise Exception("Failed to extract token and CA cert hash from join command.")

            token, ca_cert_hash = match.groups()
            print(f"Token: {token}")
            print(f"CA Cert Hash: {ca_cert_hash}")
        # Otherwise, use the worker node script
        else:
            # Upload and execute the worker script
            print(f"Uploading and running start_worker.sh on {temp_node_name}...")
            file_attributes = node.upload_file(local_file_path="start_worker.sh", remote_file_path="start_worker.sh")
            stdout, stderr = node.execute(
                f"chmod +x start_worker.sh && ./start_worker.sh {node_ips.get(temp_node_name)} {node_interfaces.get(temp_node_name)} {node_ips.get(node1_name)} {token} {ca_cert_hash} {temp_node_name}"
            )

    # ========== Step 3: Post install for control plane node ==========
    print(f"========== Step 3: Post install for control plane node ==========")
    # Get the control plane node (node1)
    node1 = slice.get_node(name=node1_name)
    # Upload script file to the node
    file_attributes = node1.upload_file(local_file_path="post_install_control_plane.sh", remote_file_path="post_install_control_plane.sh")
    # Add necessary permissions and execute the script
    stdout, stderr = node1.execute(f"chmod +x post_install_control_plane.sh && ./post_install_control_plane.sh")

    # # ========== Debug: execute something only for one node for example ==========
    # # print(f"========== Step Debug: Debugging and testing specific things ==========")
    # # Get the node
    # node_test = slice.get_node(name=node1_name)
    # node_name_test = node_test.get_name()
    # # Config node
    # file_attributes = node_test.upload_file(local_file_path="config_k8s_node.sh", remote_file_path="config_k8s_node.sh")
    # stdout, stderr = node_test.execute(f"chmod +x config_k8s_node.sh && ./config_k8s_node.sh")
    # # Start control plane node
    # file_attributes = node_test.upload_file(local_file_path="kube-flannel.yml", remote_file_path="kube-flannel.yml")
    # file_attributes = node_test.upload_file(local_file_path="start_control_plane.sh", remote_file_path="start_control_plane.sh")
    # stdout, stderr = node_test.execute(f"chmod +x start_control_plane.sh && ./start_control_plane.sh {network.get_subnet()} {node_ips.get(node1_name)} {node_interfaces.get(node1_name)} {temp_node_name}")
    # # Post install for dependencies
    # file_attributes = node_test.upload_file(local_file_path="post_install_control_plane.sh", remote_file_path="post_install_control_plane.sh")
    # stdout, stderr = node_test.execute(f"chmod +x post_install_control_plane.sh && ./post_install_control_plane.sh")

except Exception as e:
    print(f"Exception: {e}")
    traceback.print_exc()
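
The join command extraction in the cell above can be tried in isolation, without a running slice. The sketch below uses the same markers and regular expression as the cell and applies them to a hypothetical sample of control plane output. The function name `parse_join_command`, the IP address, the token, and the hash are placeholders.

```python
import re

BEGIN_MARKER = "=====BEGIN_JOIN_COMMAND====="
END_MARKER = "=====END_JOIN_COMMAND====="


def parse_join_command(stdout):
    """Extract (token, ca_cert_hash) from the marked join command block."""
    try:
        block = stdout.split(BEGIN_MARKER)[1].split(END_MARKER)[0].strip()
    except IndexError:
        raise ValueError("Join command block not found in control plane output.") from None
    # re.DOTALL lets ".*?" span the line continuation between the two flags
    match = re.search(
        r"--token\s+(\S+).*?--discovery-token-ca-cert-hash\s+(\S+)", block, re.DOTALL
    )
    if not match:
        raise ValueError("Failed to extract token and CA cert hash from join command.")
    return match.groups()


# Hypothetical control plane output with placeholder values
sample_stdout = """Your Kubernetes control-plane has initialized successfully!
=====BEGIN_JOIN_COMMAND=====
kubeadm join 10.0.0.1:6443 --token abcdef.0123456789abcdef \\
    --discovery-token-ca-cert-hash sha256:1234abcd
=====END_JOIN_COMMAND=====
"""
token, ca_cert_hash = parse_join_command(sample_stdout)
print(token)         # prints: abcdef.0123456789abcdef
print(ca_cert_hash)  # prints: sha256:1234abcd
```

Wrapping the command in explicit markers keeps the parsing independent of whatever else the script prints before or after it.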

### Step 4.2: Post-install checks and configurations
After completing the step above, you can check the configuration with the following steps. After the script has run, make sure to run "source ~/.bashrc" in any open SSH session to reload the PATH variables so that the installed tools can be used.

Now you can move on to the next step to work with DYNAMOS in Kubernetes. You can also use the Kubernetes tools you used locally, such as k9s and etcd, but now from the control plane node (in this case node1) after logging into the VM over SSH.

For example, test k9s by running "k9s" in the SSH connection to node1, then press "0" to see the pods in all namespaces.

If problems occur, you can, for example, uninstall brew by running this command in an SSH session on the node and then try again:
```sh
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/uninstall.sh)"
```
For example, I got this error once:
```sh
ubuntu@Node1:~/kubespray$ brew install derailed/k9s/k9s
==> Fetching derailed/k9s/k9s
==> Downloading https://github.com/derailed/k9s/releases/download/v0.40.10/k9s_Linux_amd64.tar.gz
Already downloaded: /home/ubuntu/.cache/Homebrew/downloads/79755f2b953f2b69637da790d4716219532b891325b5195297379a592b50e86d--k9s_Linux_amd64.tar.gz
==> Installing k9s from derailed/k9s
Error: The following formula cannot be installed from bottle and must be
built from source.
  k9s
Install Clang or run `brew install gcc`.

# This was fixed by ensuring this was done:
sudo apt-get install build-essential
brew install gcc
# After that it worked; build-essential had not been installed at the time.
# The cause was a missing -y flag (to answer the prompt with yes), which made the install abort in the script.
``` 