![title](img/this-is-fine-spark.jpeg)

## 🔥 Spark fires 🔥 - Hot node (stragglers) scenario

### Overview

In this scenario we create a situation in which one of our executors is processing data much more slowly than the other executors. This is often seen in multi-tenant clusters where load is rarely, if ever, perfectly balanced across the cluster.

You might also hit this scenario if you are running multiple Spark applications on a single-tenant cluster where the application load is unevenly distributed.

### Bootstrap

In [None]:
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = (
    SparkSession
    .builder.master("spark://spark:7077")
    # .config("spark.eventLog.enabled", "true")
    # .config("spark.eventLog.dir", "/data/tmp/spark-events")
    .config("spark.locality.wait", "0")  # -- change 2 - improves speed of task redistribution
    .appName("spark-fires-hot-node")
    .getOrCreate()
)

spark.version

### Create some fake data to process

Note: we can tweak `num_partitions` to help reduce the impact of our hot node.

Try running with the original setting and then with the adjusted setting, and perhaps play around with your own values. You should see a ~2x speed-up. Whoop, whoop!

In [None]:
%%time 

num_partitions = 6
# num_partitions = 18  # -- change 1 - increase no. of partitions to allow Spark to redistribute partition tasks across executors

df = spark.range(0, 7200).repartition(num_partitions).cache()
df.show(10)

### Starting the fire

_Note, we use `sleep` as a cheat to simulate our slower processing on our hot node._

In [None]:
%%time

from time import sleep
import os

def process_partition(iterator):
    for item in iterator:
        if 'HOT_NODE' in os.environ:
            sleep(0.05)
        else:
            sleep(0.01)
            
        yield item

mapped = df.rdd.mapPartitions(process_partition).toDF()
mapped.write.format("parquet").mode('overwrite').save("/data/range_nums")
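
To get a feel for the slowdown without a cluster, the generator above can be timed locally on a plain Python iterator. This is only a sketch: `time_partition` is a hypothetical helper, and it assumes `HOT_NODE` is simply present in the environment of the hot worker (the notebook does not show where it is set).

```python
import os
import time
from time import sleep

def process_partition(iterator):
    # Same logic as the notebook cell above
    for item in iterator:
        if 'HOT_NODE' in os.environ:
            sleep(0.05)
        else:
            sleep(0.01)
        yield item

def time_partition(rows, hot):
    """Hypothetical helper: run process_partition over `rows` items, return (count, seconds)."""
    if hot:
        os.environ['HOT_NODE'] = '1'  # value is arbitrary; only presence is checked
    else:
        os.environ.pop('HOT_NODE', None)
    start = time.perf_counter()
    result = list(process_partition(iter(range(rows))))
    elapsed = time.perf_counter() - start
    os.environ.pop('HOT_NODE', None)  # leave the environment clean
    return len(result), elapsed

cool_count, cool_secs = time_partition(20, hot=False)
hot_count, hot_secs = time_partition(20, hot=True)
print(f"cool: {cool_secs:.2f}s, hot: {hot_secs:.2f}s")
```

The hot run should take roughly five times as long as the cool one, which is the per-row gap the Spark job inherits.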

### Putting the fire out  🔥🔥🔥 🚒 🚒 🚒 🧯🧯🧯

On the first run-through you should see an execution time of ~60 seconds (roughly 1,200 rows per partition at 0.05 seconds per row on the hot node). If you open the Spark UI at http://localhost:4040/ and dig into the stage's tasks, you can see two straggling tasks on our 'hot node'.

Oof, what can we do? Let's take a look at some levers we can pull:
1. First, we can increase the number of partitions in our data (see `-- change 1` above). This gives Spark more, smaller tasks to play with, so the executors that finish their work early, our 'cool' executors, can pick up the remaining tasks. This change should give us a **30% speed improvement** straight off the bat.
2. Next, we can tune the Spark configuration by setting `spark.locality.wait` to 0 seconds instead of the default 3 seconds (see `-- change 2`). Make this change and restart the kernel. Spark will then hand pending tasks to any free executor straight away rather than waiting a bit longer for a chance to process the data locally. Data locality is an important concept and trade-off, but in cloud data processing it tends to be less relevant than in Hadoop environments. This final change should give us a **~3x speed improvement**, which is not too shabby.
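
To see why smaller tasks help, here is a rough back-of-the-envelope model in plain Python. It is a sketch under loud assumptions: a hypothetical cluster of 3 executors with 2 task slots each (one executor hot), evenly sized partitions, and a scheduler that hands the next pending task to whichever slot frees up first. It ignores locality waits, scheduling overhead and I/O, so treat the output as a best case, not a prediction.

```python
import heapq

def simulate_makespan_ms(num_rows, num_partitions, slot_costs_ms):
    """Greedy scheduling: each free slot takes the next pending task.

    slot_costs_ms holds the per-row cost in milliseconds for every task slot.
    """
    rows_per_task = num_rows // num_partitions  # assumes an even split
    # Min-heap of (time the slot becomes free, slot index)
    free_at = [(0, slot) for slot in range(len(slot_costs_ms))]
    heapq.heapify(free_at)
    makespan = 0
    for _ in range(num_partitions):
        now, slot = heapq.heappop(free_at)
        done = now + rows_per_task * slot_costs_ms[slot]
        makespan = max(makespan, done)
        heapq.heappush(free_at, (done, slot))
    return makespan

# Assumed layout: slots 0-1 sit on the hot executor (50 ms/row), the rest are cool (10 ms/row)
slots = [50, 50, 10, 10, 10, 10]
six = simulate_makespan_ms(7200, 6, slots)
eighteen = simulate_makespan_ms(7200, 18, slots)
print(six / 1000, eighteen / 1000)  # seconds for 6 and 18 partitions
```

Under these assumptions the model gives 60 seconds for 6 partitions and 20 seconds for 18: the hot slots are still the bottleneck, but each now holds only 400 rows while the cool slots absorb the rest. Real runs land above this floor whenever Spark waits for a local slot.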

_One caveat to keep in mind: the Docker setup runs entirely on a single machine, so you are not seeing the network latency and transfer speeds of a real cluster, which would add a small amount of overhead._

In [None]:
# spark.stop()