<a href="https://colab.research.google.com/github/groda/big_data/blob/master/SparkOnSLURM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<a href="https://github.com/groda/big_data"><div><img src="https://github.com/groda/big_data/blob/master/logo_bdb.png?raw=true" align=right width="90" alt="Logo Big Data for Beginners"></div></a>
# Single-Node Spark on SLURM: A Hands-On Colab Demo

**SLURM** is an acronym for **S**imple **L**inux **U**tility for **R**esource **M**anagement. The name reflects its original design goal of being a straightforward yet powerful tool for managing Linux cluster resources (see [https://github.com/groda/big_data/blob/master/SLURM.ipynb](https://github.com/groda/big_data/blob/master/SLURM.ipynb) for more details).

After setting up, configuring, and using SLURM on a single Ubuntu virtual machine (VM), this demonstration walks you through running a Spark job.

After setting up, configuring, and using SLURM on a single Ubuntu virtual machine (VM), this Colab demonstration walks you through running a Spark job on top of SLURM.
While running SLURM and Spark on a single node naturally hides the complexities of a true multi-node environment, it still allows you to leverage parallelism across multiple CPU cores.

Despite its simplicity, this setup provides a clear, minimal blueprint that can later be extended to real multi-node SLURM clusters and distributed Spark deployments.





>[Single-Node Spark on SLURM: A Hands-On Colab Demo](#scrollTo=ukIg5iRf3yoD)

>>[Install and launch the SLURM services](#scrollTo=R5Uy2zngnChj)

>>[Install Spark](#scrollTo=anYYDKthjvSn)

>>[Launch Spark on SLURM and execute job](#scrollTo=BGZH6AWeFvlG)



## Install and launch the SLURM services

In [1]:
!apt install slurm-wlm -y

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  fonts-dejavu-core freeipmi-common libb64-0d libdbi1 libfreeipmi17
  libipmimonitoring6 libjansson4 libjwt0 liblua5.1-0 libmunge2 librrd8 munge
  slurm-client slurm-wlm-basic-plugins slurmctld slurmd
Suggested packages:
  freeipmi-tools
The following NEW packages will be installed:
  fonts-dejavu-core freeipmi-common libb64-0d libdbi1 libfreeipmi17
  libipmimonitoring6 libjansson4 libjwt0 liblua5.1-0 libmunge2 librrd8 munge
  slurm-client slurm-wlm slurm-wlm-basic-plugins slurmctld slurmd
0 upgraded, 17 newly installed, 0 to remove and 41 not upgraded.
Need to get 6,419 kB of archives.
After this operation, 22.6 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy/main amd64 libjansson4 amd64 2.13.1-1.1build3 [32.4 kB]
Get:2 http://archive.ubuntu.com/ubuntu jammy/main amd64 fonts-dejavu-core all 2.37-2bu

Configuration file.

In [2]:
%%writefile /etc/slurm/slurm.conf
# Minimal slurm.conf for single-node testing
ClusterName=mylocalcluster
SlurmctldHost=localhost
AuthType=auth/munge
MpiDefault=none
ProctrackType=proctrack/linuxproc
ReturnToService=2
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
StateSaveLocation=/var/spool/slurmctld
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
SlurmdLogFile=/var/log/slurmd.log
SlurmctldLogFile=/var/log/slurmctld.log
# Node and partition configuration
NodeName=localhost CPUs=2 RealMemory=7923 State=UNKNOWN
PartitionName=LocalQ Nodes=localhost Default=YES MaxTime=INFINITE State=UP

Writing /etc/slurm/slurm.conf


Munge key

In [3]:
%%bash
sudo dd if=/dev/urandom of=/etc/munge/munge.key bs=1 count=1024 >/dev/null 2>&1
sudo chown munge:munge /etc/munge/munge.key
sudo chmod 400 /etc/munge/munge.key


Spool Directories

In [4]:
%%bash
sudo mkdir -p /var/spool/slurmctld /var/spool/slurmd /var/lib/munge
sudo chown slurm:slurm /var/spool/slurm{ctld,d} /var/lib/munge
sudo chown munge:munge /var/lib/munge
sudo chmod 755 /var/spool/slurm* /var/lib/munge

Start the Services

In [5]:
%%bash
sudo service munge stop
sudo service slurmctld stop
sudo service slurmd stop

 * Stopping MUNGE munged
   ...done.
 * Stopping slurm central management daemon slurmctld
   ...done.
No /usr/sbin/slurmctld found running; none killed.
slurmctld is stopped
 * Stopping slurm compute node daemon slurmd
   ...done.
No /usr/sbin/slurmd found running; none killed.
slurmd is stopped


In [6]:
%%bash
sudo service munge start
sudo service slurmctld start
sudo service slurmd start

 * Starting MUNGE munged
   ...done.
 * Starting slurm central management daemon slurmctld
   ...done.
 * Starting slurm compute node daemon slurmd
   ...done.


Verify Cluster Status

In [7]:
!sinfo

PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
LocalQ*      up   infinite      1   idle localhost


If the node is not idle, set it:

In [8]:
!sudo scontrol update NodeName=6d91971d1d4a State=IDLE

## Install Spark

In [9]:
%%bash
wget --no-clobber https://downloads.apache.org/spark/spark-3.5.7/spark-3.5.7-bin-hadoop3.tgz
tar xzf spark-3.5.7-bin-hadoop3.tgz
mv spark-3.5.7-bin-hadoop3 ~/spark

--2025-11-17 09:32:09--  https://downloads.apache.org/spark/spark-3.5.7/spark-3.5.7-bin-hadoop3.tgz
Resolving downloads.apache.org (downloads.apache.org)... 88.99.208.237, 135.181.214.104, 2a01:4f9:3a:2c57::2, ...
Connecting to downloads.apache.org (downloads.apache.org)|88.99.208.237|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 400914067 (382M) [application/x-gzip]
Saving to: ‘spark-3.5.7-bin-hadoop3.tgz’

     0K .......... .......... .......... .......... ..........  0%  119K 55m0s
    50K .......... .......... .......... .......... ..........  0% 1.20M 30m9s
   100K .......... .......... .......... .......... ..........  0%  290K 27m36s
   150K .......... .......... .......... .......... ..........  0% 1.28M 21m56s
   200K .......... .......... .......... .......... ..........  0%  292K 22m1s
   250K .......... .......... .......... .......... ..........  0% 39.8M 18m22s
   300K .......... .......... .......... .......... ..........  0% 58.9M 15m46s
   

Set `SPARK_HOME`

In [10]:
import os
os.environ["SPARK_HOME"] = os.environ["HOME"] + "/spark"
os.environ["PATH"] = os.environ["SPARK_HOME"] + ":" + os.environ["PATH"]

In [11]:
%%bash
echo $SPARK_HOME

/root/spark


## Launch Spark on SLURM and execute job

The script `spark_local.slurm`
- launches a Spark cluster consisting of one master and two workers on SLURM
- executes a Spark job (`org.apache.spark.examples.SparkPi` from the examples suite) on the cluster
- when the job finishes, the Spark cluster is shut down

In [12]:
%%writefile spark_local.slurm
#!/bin/bash
#SBATCH --job-name=spark-local
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --time=00:05:00
#SBATCH --output=spark_%j.out

# Point to your Spark install
SPARK_HOME=$HOME/spark

# Basic Spark env (optional tuning)
export SPARK_WORKER_CORES=1
export SPARK_WORKER_MEMORY=1G

HOSTNAME=$(hostname)
MASTER_URL="spark://$HOSTNAME:7077"

echo "Starting Spark master and 2 workers on $HOSTNAME"
echo "Master URL: $MASTER_URL"

# Start master
$SPARK_HOME/sbin/start-master.sh --host $HOSTNAME &
sleep 5

# Start 2 workers (each uses 1 core)
$SPARK_HOME/sbin/start-worker.sh $MASTER_URL &
$SPARK_HOME/sbin/start-worker.sh $MASTER_URL &
sleep 5

# Submit example job
$SPARK_HOME/bin/spark-submit \
  --master $MASTER_URL \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_*.jar 100

# Stop Spark
$SPARK_HOME/sbin/stop-worker.sh
$SPARK_HOME/sbin/stop-master.sh


Writing spark_local.slurm


In [13]:
%%bash
JOB_ID=$(sbatch --wait spark_local.slurm | awk '{print $NF}')
JOB_EXIT_CODE=$?
if [ "$JOB_EXIT_CODE" -eq 0 ]; then
  echo "✅ Slurm job $JOB_ID finished successfully."
else
  echo "❌ Slurm job $JOB_ID failed with exit code $JOB_EXIT_CODE."
fi

✅ Slurm job 1 finished successfully.


In [14]:
!cat spark_7.out

cat: spark_7.out: No such file or directory


In [15]:
%%writefile spark_local2.sh
#!/bin/bash
#SBATCH --job-name=spark-local
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --time=00:05:00
#SBATCH --output=spark_%j.out

# Point to your Spark install
SPARK_HOME=$HOME/spark

# Basic Spark env (optional tuning)
export SPARK_WORKER_CORES=1
export SPARK_WORKER_MEMORY=1G

HOSTNAME=$(hostname)
MASTER_URL="spark://$HOSTNAME:7077"

echo "Starting Spark master and 2 workers on $HOSTNAME"
echo "Master URL: $MASTER_URL"


MASTER_NODE=$(scontrol show hostnames $SLURM_JOB_NODELIST | head -n 1)
MASTER_URL="spark://$MASTER_NODE:7077"

# Start master
srun -N1 -w $MASTER_NODE $SPARK_HOME/sbin/start-master.sh --host $MASTER_NODE &
sleep 5

# Start 2 workers (each uses 1 core)
srun -N2 $SPARK_HOME/sbin/start-worker.sh $MASTER_URL &
sleep 5

# Submit example job
$SPARK_HOME/bin/spark-submit \
  --master $MASTER_URL \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_*.jar 10

# Stop Spark
$SPARK_HOME/sbin/stop-worker.sh
$SPARK_HOME/sbin/stop-master.sh


Writing spark_local2.sh


In [16]:
!sbatch spark_local2.sh

Submitted batch job 2


In [17]:
%%bash
latest_job_id=$(ls spark_*.out | grep -oP '\d+' | sort -nr | head -n 1)
if [ -n "$latest_job_id" ]; then
  cat spark_"$latest_job_id".out
else
  echo "No spark_*.out files found."
fi

Starting Spark master and 2 workers on b6f77691a41f
Master URL: spark://b6f77691a41f:7077
starting org.apache.spark.deploy.master.Master, logging to /root/spark/logs/spark--org.apache.spark.deploy.master.Master-1-b6f77691a41f.out
starting org.apache.spark.deploy.worker.Worker, logging to /root/spark/logs/spark--org.apache.spark.deploy.worker.Worker-1-b6f77691a41f.out
starting org.apache.spark.deploy.worker.Worker, logging to /root/spark/logs/spark--org.apache.spark.deploy.worker.Worker-1-b6f77691a41f.out
25/11/17 09:33:12 INFO SparkContext: Running Spark version 3.5.7
25/11/17 09:33:12 INFO SparkContext: OS info Linux, 6.6.105+, amd64
25/11/17 09:33:12 INFO SparkContext: Java version 17.0.16
25/11/17 09:33:12 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
25/11/17 09:33:12 INFO ResourceUtils: No custom resources configured for spark.driver.
25/11/17 09:33:12 INFO SparkContext: Submitted application: Spark Pi
