This project provides Exclusive Cache Allocation Technology (ExCAT) in the context of Ansible. The ExCAT feature is used by integrating the component into an existing Ansible project's playbook (see Section Integration into existing project). Everything needed to do so is contained within the `excat` directory.
For demonstration purposes, the `example` directory contains a minimal example project that integrates ExCAT; it is described in Section Example project.
ExCAT for Ansible runs a given workload on one of the provided hosts and assigns an exclusive cache buffer to the workload. The cache is assigned based on Cache Allocation Technology (CAT), which is part of the Intel® Resource Director Technology (RDT) feature set. This project uses the Linux RDT kernel driver, which exposes the RDT features through a pseudo-filesystem at `/sys/fs/resctrl`.
For this to work, the CPU has to support CAT and the feature has to be enabled via the kernel config option `CONFIG_X86_CPU_RESCTRL`. Support can be checked by means of the flags `cat_l2` and `cat_l3` in `/proc/cpuinfo`, which indicate CAT support for cache levels 2 and 3, respectively:

```shell
grep -E 'cat_l2|cat_l3' /proc/cpuinfo
```

The workload can either be an executable file, or a containerized workload provided as an image packaged in a tarball.
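The flag check can also be wrapped in a small helper, e.g. for a pre-flight script. This is a minimal sketch; the sample `flags` lines below are illustrative, not read from a real host:

```shell
# Minimal sketch: scan a cpuinfo-style "flags" line for the CAT flags.
# The sample inputs are illustrative, not taken from real hardware.
check_cat() {
    if printf '%s\n' "$1" | grep -qE '(^| )cat_l[23]( |$)'; then
        echo "CAT supported"
    else
        echo "CAT not supported"
    fi
}

with_cat=$(check_cat "flags : fpu vme sse2 cat_l3 rdt_a")
without_cat=$(check_cat "flags : fpu vme sse2")
echo "$with_cat"      # CAT supported
echo "$without_cat"   # CAT not supported
```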
The feature is implemented according to the following flow chart:
Details of the colored boxes are shown here:
The general procedure, as shown in the diagrams above, is:
- Check CAT support, a mounted `resctrl` filesystem, and existing buffers
- Remove unused buffers
- Collect all possible slots for the buffer to be created
- Grade the possible slots depending on how good a slot is
- Choose the best slot and create the buffer there
- Start the workload on the respective node and assign the created buffer to it
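Collecting slots and creating a buffer boil down to finding a contiguous run of unused cache ways and turning it into a capacity bitmask (CBM). A toy sketch of that arithmetic (the way size and requested size are made-up numbers; real values come from `/sys/fs/resctrl/info/`):

```shell
# Toy sketch: compute how many cache ways a requested buffer needs and
# build a contiguous capacity bitmask (CBM) at the low end of the cache.
# way_size and request are assumed values, not read from a real host.
way_size=1536000   # bytes per cache way (assumed)
request=3000000    # requested buffer size in bytes (assumed)

# Number of ways needed, rounded up:
ways=$(( (request + way_size - 1) / way_size ))

# Contiguous bitmask with that many bits set:
mask=$(( (1 << ways) - 1 ))

schemata=$(printf 'L3:0=%x' "$mask")
echo "$schemata"   # L3:0=3  (two ways, lowest two bits)
```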
When checking for possible slots within a host's cache, there are two possible locations for the required buffer to be created:
- within unused space (resulting from cache buffers that are no longer used and are removed by the playbook)
- within `cos0`, which is the default class of service (cos)
The terms class, cos, and buffer are used interchangeably within this project. `cos0` is the cache space that all processes use by default and that usually occupies the whole cache after startup. The more exclusive buffers we create, the smaller `cos0` gets, and thus all remaining processes have less and less cache available.
To ensure a minimum size is available for the default cos, one of the variables that can be specified is `min_defaultclass_size` (see the 3rd step in Section Integration into existing project). It defines the minimum amount of cache reserved for `cos0`, as a percentage of the whole cache size at the specified level.
In case of a containerized workload, the container is started using podman, and all processes of the container are assigned to the created buffer via an OCI hook. For this to work, runc is used as the underlying container runtime.
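Conceptually, the hook appends each container PID to the buffer's `tasks` file under `/sys/fs/resctrl`. The toy sketch below mimics that flow with temporary files so it can run without CAT hardware; all paths and PIDs are stand-ins:

```shell
# Toy sketch of the OCI hook's job: write every container PID into the
# buffer's tasks file. Temp files stand in for the real resctrl paths.
tasks_file=$(mktemp)        # stand-in for /sys/fs/resctrl/<buffer>/tasks
pid_list=$(mktemp)          # stand-in for the container's PID list

printf '%s\n' 4711 4712 > "$pid_list"   # made-up PIDs

while read -r pid; do
    echo "$pid" >> "$tasks_file"        # the hook writes each PID to tasks
done < "$pid_list"

assigned=$(wc -l < "$tasks_file")
echo "assigned $assigned processes to the buffer"
rm -f "$tasks_file" "$pid_list"
```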
The following steps describe how to integrate ExCAT into your project:
- Import ExCAT within your parent playbook by adding the `import_playbook` statement like so:

  ```yaml
  ---
  # Parent playbook
  - hosts: someHosts
    tasks:
      # some other tasks
      ...
      ...

  # Import the ExCAT feature
  - import_playbook: path/to/excat/main.yaml
  ```
- Add an `ExCAT_hosts` group to your inventory and add all hosts that should be considered for running your workload. Example `hosts.yaml`:

  ```yaml
  ---
  all:
    children:
      Other_hosts:
        hosts:
          ...
      ExCAT_hosts:
        hosts:
          192.168.0.1:
          192.168.0.2:
          192.168.0.3:
  ```
- Adapt your workload requirements within `excat/group_vars/all/workload.yaml`:

  ```yaml
  ---
  cache_level_request: 2
  buffer_size_request: 128000
  path2workload: /path/to/workload/myContainerImage.tar
  containerized: true
  podman_run_flags: ""
  min_defaultclass_size: 50
  path2logfile: ""
  debug: false
  ```
  with

  - `cache_level_request`: cache level that the cache buffer should be created in
  - `buffer_size_request`: the size of the exclusive cache buffer
  - `path2workload`: path to the executable (or image tarball if `containerized: true`) that shall be started utilizing ExCAT
  - `containerized`: set to `true` if the workload is provided as a tar-ed container image; set to `false` for an executable file
  - `podman_run_flags`: additional flags to add to the `podman run` command for containerized workloads
  - `min_defaultclass_size`: minimum size of the default class (as a percentage of the whole cache) that all processes run in by default
  - `path2logfile`: path to a file on the host where the workload will run; the stdout and stderr of your workload will be written to this file; if empty, `/tmp/<buffer_name>` will be used, with `<buffer_name>` being the name of the cache buffer that will be created in `/sys/fs/resctrl/`; not applicable for containerized workloads
  - `debug`: set to `true` to get more verbose output
  If you need to assign one of these variables dynamically, you can do so by adding it to a `vars` section below the `import_playbook` like so:

  ```yaml
  ...
  # Import the ExCAT feature
  - import_playbook: path/to/excat/main.yaml
    vars:
      path2workload: "{{ path2workload_fact }}"
  ```

  The value within `excat/group_vars/all/workload.yaml` will then be ignored. `path2workload_fact` in this example is a variable that holds the path to the workload and that has been set earlier in your parent playbook with the `set_fact` module.
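To illustrate `min_defaultclass_size`: with `min_defaultclass_size: 50` and a (made-up) 30 MB L3 cache, at least half the cache stays with `cos0`, so at most the other half can be carved into exclusive buffers:

```shell
# Illustrative arithmetic only; cache_size_kb is an assumed value,
# not read from a real host.
cache_size_kb=30720          # total L3 size in KB (assumed: 30 MB)
min_defaultclass_size=50     # percent of the cache reserved for cos0

reserved_kb=$(( cache_size_kb * min_defaultclass_size / 100 ))
available_kb=$(( cache_size_kb - reserved_kb ))
echo "cos0 keeps at least ${reserved_kb} KB; at most ${available_kb} KB for exclusive buffers"
# → cos0 keeps at least 15360 KB; at most 15360 KB for exclusive buffers
```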
Note:
- Some of the tasks within ExCAT require root privileges. Make sure to add the required access rights to your playbook.
- Be aware that whatever executable (or image, if `containerized: true`) `path2workload` points to will be started and run on one of the nodes in the inventory. Make sure this cannot be misused.
Make sure to have the following dependencies installed on your hosts. Installation instruction examples are given for Debian 12 and Ubuntu 22.04.
- python3: `apt install python3`
- pyyaml: `apt install python3-yaml`
- for containerized workloads:
  - iptables: `apt install iptables`
  - jq (will be installed by the playbook)
  - runc: version 1.1.9 or later
The small example project in `example` demonstrates how to integrate ExCAT as described in the previous section. For the example to work, it provides credentials for root access and sets up passwordless SSH. This is explained in the following sections.
To provide credentials for root access, we add the `user` and `password` for each host in the `example/hosts.yaml` file to `example/group_vars/all/credentials.yaml`:
```yaml
---
credentials:
  192.168.0.1:
    user: myUser1
    password: "myPwd1"
  192.168.0.2:
    user: myUser2
    password: "myPwd2"
  192.168.0.3:
    user: myUser2
    password: "myPwd2"
```
Note: The user should have root privileges.
Once the file is complete, it is encrypted by means of

```shell
ansible-vault encrypt --vault-id <vaultID>@prompt example/group_vars/all/credentials.yaml
```

with `<vaultID>` being any ID you can remember; it denotes the password you will be asked to provide, which is used to encrypt the file. This ID and the password have to be provided whenever the playbook is run.
To edit the file after it has been encrypted, use

```shell
ansible-vault edit example/group_vars/all/credentials.yaml
```

To change the password and/or the ID, see the Ansible docs.
To allow the playbook to connect to the nodes without entering passwords, a key pair is used: a private key on the Ansible host (the host from which the playbook is run) and the public key on each host. For this, generate a key pair with

```shell
ssh-keygen -b 2048 -t rsa -N "" -f ~/.ssh/id_rsa
```

and then copy the public key content in `~/.ssh/id_rsa.pub` to `~/.ssh/authorized_keys` on all hosts. This can be done with

```shell
ssh-copy-id -i ~/.ssh/id_rsa.pub <user>@<ip>
```

with `<user>` being the user on the appropriate host and `<ip>` the host's IP address.
The playbook `example/copy_keys.yaml` automates this last copy step. It works based on the credentials encrypted earlier and can be used like so:

```shell
ansible-playbook --vault-id <vaultID>@prompt copy_keys.yaml
```

with `<vaultID>` being the ID that you used when encrypting the file `example/group_vars/all/credentials.yaml`.
To integrate ExCAT as described in Section Integration into existing project, the workload requirements are adapted in `excat/group_vars/all/workload.yaml`. Furthermore, an `ExCAT_hosts` group is added to the `example/hosts.yaml` file, and the ExCAT playbook is imported into the example playbook `example/main.yaml`.
Have a look at the files within the `example` directory for further details and adapt them to your needs.
To run the playbook, from within the `example` directory do

```shell
ansible-playbook --vault-id <vaultID>@prompt main.yaml
```

with `<vaultID>` being the ID that you used when encrypting the file `example/group_vars/all/credentials.yaml`.