This guide is the result of an independent study performed by Nathan H. Silverman, a Computer Science student at Kalamazoo College, under the supervision of Dr. Sandino Vargas-Pérez.
The guide presents the list of components needed to create a low-cost Beowulf-type cluster, helpful pictures, instructions for assembling the hardware, installing the software, commands for proper configuration, and more.
For questions, please contact Dr. Vargas-Pérez via e-mail.
The guide was designed around a cluster of 4 Raspberry Pi (RPi) computers, but it can be used to build a cluster of any size. The step-by-step instructions detail how to assemble the hardware, configure the software, and complete the setup needed to have a fully functioning RPi cluster. The Raspberry Pi 3 B+ model was used for this project, although with a few tweaks, any modern RPi should work.
Below are instructions to assemble a 4 RPi cluster. One RPi will be used as a head node, and the rest will be designated as worker nodes. The following diagram shows the layout for the cluster and its components:
The materials needed might change depending on their availability.
- 4 Raspberry Pi (RPi) computers
- 4 MicroSD cards (32 or 64 GB w/adapter)
- 5 Ethernet cables (cat 5e or 6, 1 foot in length or more)
- 4 Micro USB Cords
- 1 Ethernet switch (with 5 ports or more)
- 1 USB power adapter (with 5 ports or more, 2.4 amps per port)
- 1 Stackable acrylic case for RPis
- Make sure this item includes 4 layers and heatsinks, and that it is compatible with the RPi model chosen.
- Recommended tools:
- Tweezers (Optional)
- Screwdriver
- Monitor (or Ethernet to USB adapter to connect via VNC)
-
Installing Heat Sinks: The stackable acrylic case includes heat sinks. Install three heat sinks per RPi: two larger ones and one smaller one. The two larger heat sinks go on the Broadcom CPU and the RAM chip; the smaller one goes on the Ethernet and USB controller. Install each heat sink by removing the film on the bottom and pushing firmly until it is attached securely. Repeat for all RPis.
-
Place the RPi on an acrylic plate. For all RPis:
- Review any included instructions for the case.
- Remove protective films from each side of the plate. (This can be difficult; use a screwdriver or tweezers).
- Place screws in the plate such that the screw heads are against the plate and the threads are pointing up, making sure that it is the correct orientation. The square cutout in the plate should line up with the heatsink on the bottom of the RPi.
- Put spacers on the screws on top of the plate: screw -> plate -> spacer.
- Place RPi through the screws such that the heat sink on the bottom fits through the square cutout: screw → plate → spacer → Pi.
- Place nuts on the screws on top of the RPi and tighten them: screw → plate → spacer → RPi → nut.
-
Stack RPis to form a tower:
- Put the screw end of the nub through the plate into the open end of the leg to keep it in place. Repeat for all four of the first layers: nub → plate → leg
- Place the next plate on top of the screw end of the leg: leg → plate → leg
- Place the final plate on legs then and secure it with the silver nut on top: leg → plate w/o Pi → silver nut
The next sections walk through setting up the microSD cards for the cluster. These examples use macOS, but all the software needed is also available for Windows.
- Latest version of Raspberry Pi OS with desktop for the head node and Raspberry Pi OS Lite for the worker nodes.
- Raspberry Pi Imager to install (flash) Raspberry Pi OS images onto the microSD cards.
- VNC Viewer to connect to the head node from an external computer (optional, but recommended).
- Formatting MicroSD Card:
- Plug microSD card into a computer. Use a USB to microSD adapter if the computer doesn't have microSD or SD card slots.
- On macOS: open Disk Utility and navigate to the microSD card. Select Erase in the top center of the window. In the "Format" dropdown menu, select "MS-DOS (FAT)". Click Erase (the name of the device does not matter).
- On Windows: use Windows Explorer to locate the microSD card, right-click on it, and select Format. Select Quick Format and click start.
- Open Raspberry Pi Imager to flash the microSD card:
  - Click CHOOSE OS and select the last option, Use custom. Navigate to where the Raspberry Pi OS image was downloaded. The files should end in `.img` or `.img.xz`. Select Raspberry Pi OS with desktop if flashing the head node's microSD card, or Raspberry Pi OS Lite if flashing a worker node's card.
  - Click CHOOSE STORAGE and choose the microSD card.
  - Click the gear symbol to open the Advanced options.
    - Check Set hostname and use the default hostname `raspberrypi.local`.
    - Check Enable SSH and then check Use password authentication.
    - Check Set Username and password and use `pi` as the Username and `raspberry` as the Password.
    - Click Save.
  - Click WRITE.
- Remove the flashed microSD card from the computer. Most of the time, Raspberry Pi Imager will automatically eject the card when finished.
Next are instructions for logging into the RPis and editing some configuration files to make them work as a cluster. Accessing the RPis can be done with or without an external monitor. Follow 1. Access RPi with external monitor if using an external monitor, or 2. Access RPi without external monitor otherwise (the most common way). As a side note, perform 3. Configuring Hostnames and 4. RPi's OS Configuration in sequence while connected to the RPis, to avoid having to connect and disconnect each time.
-
Access RPi with external monitor:
- Insert the microSD card into the RPi.
- Connect the RPi to the Internet using an Ethernet cable (optional if the RPi has WiFi).
- Connect external monitor through HDMI.
- Connect the RPi to the USB power adapter using one of the micro USB cables, and then turn the RPi on. A boot up screen should show up in the external monitor.
- Once booted up, some versions of Raspberry Pi OS will have a prompt to change password, set time zone, and update the OS. Follow these instructions, or do it manually later.
If the RPi has started but the external monitor is not getting a signal, unplug the RPi and remove its microSD card. Insert the card into a computer and navigate to its contents. Locate and open the file `config.txt`. If the following line is commented out, uncomment it by removing the `#` in front (if it doesn't exist, add it):

```
hdmi_safe=1
```

This setting tries to boot the RPi with maximum HDMI compatibility. Re-insert the microSD card in the RPi and turn it on. If the previous modification didn't fix the problem, check that the monitor is connected to the correct input and that the HDMI cable is functional.
-
Access RPi without external monitor (most common way): This step will require an Ethernet to USB adapter to connect the RPi via Ethernet cable to a computer using VNC viewer. If the computer has an Ethernet port, the adapter is not needed.
- Start with the RPi designated as the head node, and then repeat these steps with the rest of the RPis. Insert the microSD card into the RPi (for the head node, this is the card with Raspberry Pi OS with desktop on it).
- Connect the RPi to a computer using an Ethernet cable. Use an Ethernet to USB adapter if the computer doesn't have an Ethernet port. You might also power up the RPi by connecting the micro USB power cable to one of the computer's USB ports, or into the USB power adapter.
- Be sure to turn on "Internet Sharing".
- On macOS: Select System Preferences → Sharing, and make sure "Internet Sharing" is checked and the WiFi connection is being shared with the ethernet LAN connection.
- On Windows: ⧉ link Windows 10.
- Open the Terminal and run the command `ping raspberrypi.local`. Alternatively, you can type the command `arp -a` and find the IP address associated with `raspberrypi.local`.
- Copy the IP address associated with the RPi, then press ctrl + c (^ + c on macOS) to stop ping.
- Then type the command `ssh pi@192.168.2.8`, where `192.168.2.8` is the IP address you copied in the previous step.
- Type `Yes` and press enter (return on macOS).
- Enter the RPi username and password. By default the username is `pi` and the password is `raspberry`.
- If working with an RPi designated as a worker node, skip the next steps and proceed to: 3. Configuring Hostnames
- Head node only: Start the VNC virtual desktop on the RPi by opening a terminal and typing `vncserver`
- Copy the IP of the RPi:
  - Open the VNC Viewer application installed on the computer and paste the IP in the top search bar. It should prompt for the RPi username and password (by default, username `pi`, password `raspberry`).
-
Configuring Hostnames: Choose the host names for your RPis carefully. Later, a scheduler called Slurm will be used to manage the workload and run code on the cluster. Slurm requires nodes to be named a certain way, such as `node001` for the head node, and `node002`, `node003`, and `node004` for the worker nodes.
  - Open a Terminal on the RPi and type the following commands to change the host name:
    - `sudo hostname headnode001`, where `headnode001` is the host name you chose for the RPi.
    - `sudo nano /etc/hostname`: this opens the file `/etc/hostname` in the text editor nano. Delete everything and enter the same host name, i.e., `headnode001` or the host name chosen for the RPi. To save, press ctrl + x (^ + x on macOS), then press Y followed by enter (return on macOS).
    - `sudo nano /etc/hosts`, and replace `raspberrypi` with your chosen host name (e.g., `headnode001`). Save and exit the same way.
  - Repeat for the rest of the RPis, using an appropriate host name for each of them (`headnode001`, `workernode002`, `workernode003`, and `workernode004`, or simply `node001`, `node002`, `node003`, and `node004`).
-
RPi's OS Configuration:
  - On a Terminal in the RPi, type the command `sudo raspi-config` to bring up the software configuration tool.
  - Select "Change User Password" and follow the instructions. It is recommended (for the purposes of this guide) to make the password the same for each RPi.
  - Next, open "Localisation Options" → "Change Locale". Scroll down until the `en_US.UTF-8 UTF-8` option is highlighted and press enter, then enter again.
    - If it cannot be set by default, exit the menu and type `sudo nano /etc/locale.gen`, then uncomment the line containing `en_US.UTF-8 UTF-8` (if it is commented out). Then, in the Terminal, type `locale-gen` and try again. If that still does not work, try `sudo dpkg-reconfigure locales` and select the correct locale.
  - Once back in the software configuration tool window, select "Localisation Options" again and change the timezone.
  - Next, choose "Advanced Options" → "Expand Filesystem" and press enter.
  - Finally, select the option "Update".
  - To activate all these changes, type `sudo reboot`. To shut down a node, the command `sudo shutdown -h now` can be used.
  - Repeat the steps from sections 2. to 4. for all of your RPis, making sure to choose appropriate host names.
- Update the existing packages by running the following commands. This should be performed on all the RPis, so connect to each node (either from the head node, or individually using a computer and VNC Viewer):

```shell
sudo apt-get update
sudo apt-get upgrade -y
sudo reboot
```
- Install MPICH, a free, open-source implementation of the Message Passing Interface (MPI). It is one of the main components needed for a high-performance computing (HPC) cluster because it is used for parallel programming. (When building MPICH from source, three directories are typically created: `mpi-dl` to hold the source code, `mpi` as the installation path, and `mpi-build` to build the code; here MPICH is installed through the package manager instead.) Install it along with some other dependencies like Gfortran:

```shell
sudo apt install gfortran -y
sudo apt install ntpdate -y
sudo apt install mpich -y
```
- Install MPI4PY, which allows Python programs to communicate across the cluster using MPI. First install the needed dependencies:

```shell
sudo apt install python3-pip python3-dev libopenmpi-dev -y
```

  - Now, install MPI4PY with:

```shell
sudo pip install mpi4py
```
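To confirm the installation, a minimal MPI4PY "hello world" can be run. This sketch is not from the original guide; the filename is illustrative, and the fallback lets the script run even on a machine where mpi4py is not installed:

```python
# hello_mpi.py -- minimal mpi4py check (illustrative).
# Falls back to a single "rank" if mpi4py is missing so it always runs.
import socket

try:
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
except ImportError:
    rank, size = 0, 1  # no MPI available: behave like a 1-process job

print(f"Hello from rank {rank} of {size} on {socket.gethostname()}")
```

Running it with `mpiexec -n 4 python3 hello_mpi.py` should print one line per rank.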
-
Setting Static IPs: We need to set static IP addresses for the nodes so the cluster can communicate over an interconnection network. We do this by creating a local area network (LAN) using a switch (or router) and Ethernet cables, and setting a static IP address in each node so that they all belong to the same network. We recommend making these IP addresses correlate with the hostnames you chose for each node, like:

| Node | Hostname | IP Address |
| --- | --- | --- |
| head | node001 | 10.0.0.10 |
| worker 1 | node002 | 10.0.0.20 |
| worker 2 | node003 | 10.0.0.30 |
| worker 3 | node004 | 10.0.0.40 |

You should reserve the gateway (or router) IP address `10.0.0.1`, so do not use this address for any node. If you want to continue using VNC to access the cluster after changing the IP addresses, you will have to manually change the settings of the Ethernet adapter on the VNC computer to an address that belongs to the network, for example `10.0.0.50`, with a subnet mask of `255.255.255.0` and a router (or gateway) IP address of `10.0.0.1`. On Windows, you will probably also need to set a DNS server IP address; you can use `8.8.8.8`.
-
Now it is time to connect all the RPis to the network using the Ethernet switch. Connect each Ethernet cable to an RPi's network interface and an available port on the switch. There should be an available port (if you have a switch with five or more ports) to connect your computer (the one using VNC to access the cluster). You will continue accessing each RPi in the same fashion as before, but once the IP addresses are changed, you will need the new addresses to connect via VNC.
- While connected to the RPis, set the static IP by editing the `dhcpcd.conf` file: `sudo nano /etc/dhcpcd.conf`
  - For the head node, add these lines at the bottom of the file:

```
interface eth0
static ip_address=10.0.0.10
static routers=10.0.0.1
static domain_name_servers=
static domain_search=
noipv6
```

  - For worker nodes, add the following lines, where `WORKER_NODE_IP` is one of the IP addresses you chose:

```
interface eth0
static ip_address=WORKER_NODE_IP
static routers=10.0.0.1
static domain_name_servers=
static domain_search=
noipv6
```

  - After saving the changes and closing the `dhcpcd.conf` file, type `sudo reboot` to restart the RPi and apply the changes. Remember that the RPi will now have its new IP address.
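As a quick sanity check of the addressing plan, every node address and the gateway should fall inside one network, with no duplicates. A small sketch using Python's `ipaddress` module (the addresses are the ones chosen above; the /24 prefix is an assumption matching the `255.255.255.0` mask):

```python
import ipaddress

# Network implied by mask 255.255.255.0 and gateway 10.0.0.1 (assumed /24)
lan = ipaddress.ip_network("10.0.0.0/24")

nodes = {
    "node001": "10.0.0.10",  # head
    "node002": "10.0.0.20",
    "node003": "10.0.0.30",
    "node004": "10.0.0.40",
}
gateway = "10.0.0.1"

# Every node and the gateway must be inside the LAN, with no duplicates
addresses = [gateway, *nodes.values()]
assert len(set(addresses)) == len(addresses), "duplicate IP address"
for ip in addresses:
    assert ipaddress.ip_address(ip) in lan, f"{ip} is outside {lan}"
print("addressing plan OK")
```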
-
Configuring Auto Login: These next steps will enable passwordless login between the head node and the worker nodes. You will generate an ssh key on the head node and copy the public key to each of the worker nodes.
  - On the head node, open a Terminal and move to the pi home directory: `cd /home/pi/`.
  - Run `ssh-keygen -t rsa`. Make sure to press enter three times, accepting the default path and creating no password.
  - To copy the head node's public ssh key to a worker node, type `scp /home/pi/.ssh/id_rsa.pub pi@WORKER_NODE_IP:/home/pi/master.pub`, where `WORKER_NODE_IP` is the IP address of one of your worker nodes (remember you will do this for each worker node).
  - Then, ssh into the worker node you just copied the public key to: `ssh pi@WORKER_NODE_IP`. You should be in the worker node's home directory; if not, type `cd /home/pi/`.
  - Run `ssh-keygen -t rsa` on the worker node. Make sure you hit enter three times, accepting the default path and creating no password.
  - Type the command `cat master.pub >> .ssh/authorized_keys` on the worker node to add the head node's public key to the list of authorized keys that can connect to this node via ssh.
  - Type `exit` to get back to the head node.
  - Repeat steps iii to vii (3 to 7) for all of your worker nodes.
  - To test if the process worked, ssh into one of your worker nodes. You should not have to enter a password.
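The copy-and-append steps above amount to making sure the head node's public key appears as a line in each worker's `~/.ssh/authorized_keys`. A minimal sketch of that check in Python (the key string is made up for illustration):

```python
def key_authorized(pubkey_text, authorized_keys_text):
    """Return True if the public key appears as a line in authorized_keys."""
    key = pubkey_text.strip()
    lines = [line.strip() for line in authorized_keys_text.splitlines()]
    return key in lines

# Illustrative: the head node's key (contents of master.pub) versus a
# worker's ~/.ssh/authorized_keys after the `cat master.pub >> ...` step
master_pub = "ssh-rsa AAAAB3Nza-example pi@node001"
authorized = "ssh-rsa AAAAB3Nza-example pi@node001\n"
print(key_authorized(master_pub, authorized))  # True
```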
-
Configuring Shared Storage: The shared storage is a hard drive that can be accessed by all the nodes. We can do this by attaching a flash drive or external hard drive to the head node and configuring it as a network file system (NFS).
  - To set up shared storage (on the head node only), connect the hard drive to an available USB port. Start up or ssh into your head node.
    - You can ssh into your head node by typing `ssh pi@10.0.0.10` from a computer connected to the switch. If you have a monitor, keyboard, and mouse, just open a terminal.
  - Now, we need the location of the flash drive. Find it by typing `lsblk`. It should be `sda1` most of the time. Note that the TYPE should be `part`.
  - Reformat the flash drive using `sudo mkfs.ext4 /dev/sda1`. Be very careful that you have the correct device path; otherwise you might erase the wrong drive (such as your microSD card).
  - Now let's create a shared directory for this hard drive. The name used throughout this guide will be `sharedfiles`. Type `sudo mkdir /sharedfiles` to create a folder named `sharedfiles` inside the root directory of your head node.
  - Change the ownership of the folder by typing `sudo chown nobody.nogroup -R /sharedfiles`
  - Make the folder readable, writable, and executable by everyone (all the nodes): `sudo chmod 777 -R /sharedfiles`
  - Type `sudo reboot` to activate all the changes.
-
Automatic Mounting on the Head Node: Nodes should automatically mount the shared storage when they start up. The UUID uniquely identifies the device acting as the shared storage; the `fstab` entry needs it to know which drive to mount.
  - From the head node, type `blkid` to find the UUID.
  - Copy the UUID for `/dev/sda1`.
  - To have the shared storage mounted on boot, we need to edit the `fstab` file: `sudo nano /etc/fstab`
  - Add the line `UUID=THE_UUID /sharedfiles ext4 defaults 0 2` at the bottom of the file.
  - Save and exit. Then type `sudo mount -a` to mount the drive.
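The UUID lookup in steps i–ii can also be scripted. This sketch is not part of the original guide; it assumes the usual `blkid` output format (`/dev/sda1: UUID="..." TYPE="ext4" ...`), and the sample UUID is made up:

```python
import re

def uuid_for(device, blkid_output):
    """Extract the UUID for a device from `blkid` output, or None."""
    for line in blkid_output.splitlines():
        if line.startswith(device + ":"):
            m = re.search(r'\bUUID="([^"]+)"', line)
            if m:
                return m.group(1)
    return None

# Example with a made-up blkid line:
sample = '/dev/sda1: UUID="1234-abcd" TYPE="ext4" PARTUUID="0a1b2c3d-01"'
uuid = uuid_for("/dev/sda1", sample)
print(f"UUID={uuid} /sharedfiles ext4 defaults 0 2")  # the fstab line to add
```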
-
We now need to enable the worker nodes to access the shared files. A network file system (NFS) server will let us do this.
  - Install the server on the head node by typing: `sudo apt install nfs-kernel-server -y`.
  - Open the `exports` file with `sudo nano /etc/exports`
  - Add the following line at the bottom: `/sharedfiles 10.0.0.0/8(rw,sync,no_root_squash,no_subtree_check)`, where `10.0.0.0/8` is the network IP address identifier and `/8` represents a subnet mask of `255.0.0.0`.
  - After saving, apply the new export with `sudo exportfs -ra` (or reboot the head node).
-
Setting up the Shared Storage on Worker Nodes: Now we will set up each worker node's NFS client and configure the shared storage. Ssh into each worker node and complete the following steps:
-
First, install the NFS client tools: `sudo apt install nfs-common -y`
-
As with the head node, we want to create a folder that can be edited by anyone: `sudo mkdir /sharedfiles`. Make sure to use the same folder name as before.
-
Run `sudo chown nobody.nogroup -R /sharedfiles` and then `sudo chmod 777 -R /sharedfiles` to fix permissions and ownership.
-
Then, edit the `fstab` file (`sudo nano /etc/fstab`) by adding the following line, where `HEAD_NODE_IP` is the IP address of the head node: `HEAD_NODE_IP:/sharedfiles /sharedfiles nfs defaults 0 0`
-
Run `sudo mount -a` to finalize the changes.
-
Now you should be able to create a file in the shared storage directory and see it on all the nodes. You can test it by doing:
  - On the head node: `cd /sharedfiles/`, then type `sudo touch test_file`.
  - Access any of the worker nodes and type `cd /sharedfiles/`, then `ls -l`; a list of files should be returned including `test_file` (if you recently formatted the hard drive, this will be the only file).
  - If after a reboot a worker node does not see the shared folder, try `sudo mount -a` to remount it.
-
-
Configuring Slurm: Slurm is a job scheduler. It is the software that determines which node should execute the next process, giving a more managed parallel computing experience. The head node sends processes to be executed to all the other nodes (in this particular configuration, the head node does not execute any parallel code itself).
-
To install on the head node only, ssh into the head node: `ssh pi@10.0.0.10`, where `10.0.0.10` is the IP address of your head node.
-
Edit the `hosts` file: `sudo nano /etc/hosts`. We are going to add the hostnames and IPs of the other nodes at the bottom of the file, in this format:

```
WORKER_NODE_IP WORKER_HOSTNAME
WORKER_NODE_IP WORKER_HOSTNAME
WORKER_NODE_IP WORKER_HOSTNAME
```
-
Save and exit nano (ctrl + x, ^ + x on macOS, then press Y followed by enter).
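The entries added to `/etc/hosts` follow the IP-then-hostname format. For the addressing plan chosen earlier in this guide, they can be generated like this (a sketch, using this guide's example names and addresses):

```python
# Worker nodes as named and addressed earlier in this guide
nodes = {
    "node002": "10.0.0.20",
    "node003": "10.0.0.30",
    "node004": "10.0.0.40",
}

# One "/etc/hosts"-style line per worker node: IP, whitespace, hostname
hosts_lines = [f"{ip}\t{name}" for name, ip in nodes.items()]
print("\n".join(hosts_lines))
```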
-
Install the Slurm workload manager using `sudo apt install slurm-wlm -y`
-
Slurm comes with default settings, but we will need to edit them. Move into the Slurm folder: `cd /etc/slurm-llnl`.
-
First, set the right permissions on the file we will use as our base configuration: `sudo chmod 777 /usr/share/doc/slurm-client/examples/slurm.conf.simple.gz`
-
Second, copy the file into our Slurm folder: `cp /usr/share/doc/slurm-client/examples/slurm.conf.simple.gz .`
-
Unzip the file: `gzip -d slurm.conf.simple.gz`
-
Rename the unzipped file so it becomes the active Slurm configuration: `mv slurm.conf.simple slurm.conf`
-
-
More Slurm configuration (Head Node Only):
-
Open the Slurm configuration file: `sudo nano /etc/slurm-llnl/slurm.conf`.
-
First, comment out the existing `SlurmctldHost=` line and add a new one containing the head node's hostname and IP: `SlurmctldHost=node001(10.0.0.10)`
-
Next, double-check that the lines `SelectType=select/cons_res` and `SelectTypeParameters=CR_Core` are uncommented (or exist) in the file.
-
Change the name of your cluster by editing the `ClusterName=` line.
-
Now, we have to add the nodes under the "COMPUTE NODES" section of the file:

```
NodeName=node001 NodeAddr=10.0.0.10 CPUs=4 State=UNKNOWN
NodeName=node002 NodeAddr=10.0.0.20 CPUs=4 State=UNKNOWN
NodeName=node003 NodeAddr=10.0.0.30 CPUs=4 State=UNKNOWN
NodeName=node004 NodeAddr=10.0.0.40 CPUs=4 State=UNKNOWN
```

-
Comment out any pre-existing lines starting with `NodeName=` or `PartitionName=` by putting a `#` in front.
-
Now, add your own partition: `PartitionName=YOUR_CLUSTER_NAME Nodes=node[002-004] Default=YES MaxTime=INFINITE State=UP`
  - For the `Nodes=` portion, make sure it matches the names you chose for your worker nodes. For this guide, the names are `node` followed by a number (`[002-004]` represents node002, node003, and node004, respectively).
  - For `PartitionName=` you can choose the name of your cluster.
-
Make sure to save and exit (ctrl + x, then press Y, then enter).
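As an aside, the bracketed range in `Nodes=node[002-004]` is Slurm's compact "hostlist" notation. A rough sketch of how a simple single range expands (this mimics only that one case, not Slurm's full hostlist syntax):

```python
import re

def expand_hostlist(spec):
    """Expand a simple Slurm-style range like 'node[002-004]'.

    Handles only a single bracketed numeric range; zero-padding is kept.
    """
    m = re.fullmatch(r"(\w+)\[(\d+)-(\d+)\]", spec)
    if not m:
        return [spec]  # plain hostname, nothing to expand
    prefix, lo, hi = m.groups()
    width = len(lo)
    return [f"{prefix}{i:0{width}d}" for i in range(int(lo), int(hi) + 1)]

print(expand_hostlist("node[002-004]"))  # ['node002', 'node003', 'node004']
```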
-
We need to create a file that tells Slurm what resources it has access to, so navigate to Slurm's directory: `cd /etc/slurm-llnl`.
-
Open the configuration file with `sudo nano cgroup.conf` and add:

```
CgroupMountpoint="/sys/fs/cgroup"
CgroupAutomount=yes
CgroupReleaseAgentDir="/etc/slurm-llnl/cgroup"
AllowedDevicesFile="/etc/slurm-llnl/cgroup_allowed_devices_file.conf"
ConstrainCores=no
TaskAffinity=no
ConstrainRAMSpace=yes
ConstrainSwapSpace=no
ConstrainDevices=no
AllowedRamSpace=100
AllowedSwapSpace=0
MaxRAMPercent=100
MaxSwapPercent=100
MinRAMSpace=30
```
-
Save and exit.
-
Now, create a file to tell Slurm about allowed devices: `sudo nano /etc/slurm-llnl/cgroup_allowed_devices_file.conf` and add the following lines:

```
/dev/null
/dev/urandom
/dev/zero
/dev/sda*
/dev/cpu/*/*
/dev/pts/*
/sharedfiles*
```
-
Save and exit.
-
-
Sharing Configuration (Head Node Only): For Slurm to control the other nodes, they need to have the same configuration and munge key. Munge provides the authentication service Slurm uses between nodes. You can easily share your configuration and munge key using the shared storage.
  - First, navigate to the `slurm-llnl` folder: `cd /etc/slurm-llnl/`
  - Then, copy the configuration files created previously into the shared storage directory: `sudo cp slurm.conf cgroup.conf cgroup_allowed_devices_file.conf /sharedfiles`, where `/sharedfiles` is the shared folder.
  - Copy the munge key into the shared storage as well: `sudo cp /etc/munge/munge.key /sharedfiles`
  - Finally, we need to enable and start munge. In the terminal, type the following commands (one by one):

```shell
sudo systemctl enable munge
sudo systemctl start munge
```

  - Also, enable and start the Slurm daemon and controller:

```shell
sudo systemctl enable slurmd
sudo systemctl start slurmd
sudo systemctl enable slurmctld
sudo systemctl start slurmctld
```
-
Slurm Set-Up for Worker Nodes (All Worker Nodes): This section walks through setting up all of your worker nodes. Make sure your head node is still running to allow access to the shared storage. Once again, perform the following steps on each node individually (you can ssh into each node via the head node, or from an external computer).
-
Install the Slurm client on the worker node: `sudo apt install slurmd slurm-client -y`
-
As before, update the hosts file: `sudo nano /etc/hosts`, and add all of the other nodes and their IPs (excluding the node itself):

```
HEAD_NODE_IP HEAD_NODE_HOSTNAME
WORKER_NODE_IP WORKER_HOSTNAME
WORKER_NODE_IP WORKER_HOSTNAME
```
-
Save and exit.
-
Now, copy the munge key from the shared storage to the munge key folder: `sudo cp /sharedfiles/munge.key /etc/munge/munge.key`
-
Copy the Slurm configuration file: `sudo cp /sharedfiles/slurm.conf /etc/slurm-llnl/slurm.conf`
-
And copy the cgroup configuration files: `sudo cp /sharedfiles/cgroup* /etc/slurm-llnl`
-
Enable and start munge:

```shell
sudo systemctl enable munge
sudo systemctl start munge
```
-
Test that munge is working on this node by trying to connect to the head node (or one of the other worker nodes): `ssh pi@HEAD_NODE_IP munge -n | unmunge`, where `HEAD_NODE_IP` is the head node's IP address. Check the output and make sure it was successful.
  - If munge is not working properly, try rebooting all the RPis and trying again.
-
Start Slurm:

```shell
sudo systemctl enable slurmd
sudo systemctl start slurmd
```

-
Reboot all the nodes (once you have configured all of them) to activate the changes: `sudo reboot`.
-
Let's test the cluster to check that all our configurations are working.
  - Starting with Slurm: the head node should now control all the other nodes. On the head node, run `sinfo` to get Slurm information about the nodes and partitions.
  - To test that Slurm is working correctly, issue the following command to tell Slurm to run `hostname` on two nodes: `srun --nodes=2 hostname`. You should see the host names of the two nodes selected by Slurm printed in the terminal.
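The `srun` check succeeds when the printed hostnames are two distinct worker names. A small sketch of that check in Python (the sample output string is made up; on the real cluster you would feed it the actual `srun` output):

```python
def distinct_hosts(srun_output):
    """Return the set of distinct hostnames in srun's output."""
    return {line.strip() for line in srun_output.splitlines() if line.strip()}

# Made-up example of what `srun --nodes=2 hostname` might print
sample = "node002\nnode003\n"
hosts = distinct_hosts(sample)
print(hosts)  # two distinct worker hostnames means the test passed
assert len(hosts) == 2
```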
THIS SECTION IS UNDER CONSTRUCTION 👷