### Putting it Together

Let's now put together our understanding of transformations and actions with what we see in the Spark UI.

1. Spark Jobs 

Remember that Spark transformations do not actually act on our data, whereas actions do.  This means that if we look at our Spark UI, the number of Spark jobs will generally match the number of actions that we execute.  So if we simply call `movies_rdd.take(1)`, this kicks off a Spark job.
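To make the link between actions and jobs concrete, here is a minimal sketch in plain Python.  The `LazyDataset` class and its `jobs_run` counter are hypothetical stand-ins for an RDD, not part of Spark: `map` only records a step, and `collect` is the point where work actually runs.

```python
class LazyDataset:
    """Toy stand-in for an RDD (hypothetical, not Spark's API)."""

    jobs_run = 0  # how many actions have been executed so far

    def __init__(self, data, steps=()):
        self.data = list(data)
        self.steps = tuple(steps)  # recorded transformations, not yet applied

    def map(self, fn):
        # Transformation: return a new dataset with one more recorded step.
        return LazyDataset(self.data, self.steps + (fn,))

    def collect(self):
        # Action: only now do the recorded steps run, which counts as one job.
        LazyDataset.jobs_run += 1
        result = self.data
        for fn in self.steps:
            result = [fn(item) for item in result]
        return result


movies = LazyDataset(['dark knight', 'dunkirk', 'pulp fiction', 'avatar'])

titled = movies.map(str.title)             # transformation only, no job yet
jobs_after_transformation = LazyDataset.jobs_run

titles = titled.collect()                  # first action, first job
lengths = titled.map(len).collect()        # second action, second job
jobs_after_actions = LazyDataset.jobs_run

print(jobs_after_transformation, jobs_after_actions)
```

In this toy model the job count stays at zero until the first `collect`, and each later action adds one more job, which mirrors what the Jobs tab of the Spark UI typically shows.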

2. Spark Stages

We saw that a single job may have multiple stages.  We can think of a stage as a logical group of steps that can be completed together, without moving data between partitions.  For us, the stages divide into the steps that can be performed before a shuffle and the steps that can be performed after it.

For example, let's create a Spark context and perform a `groupBy`.

In [1]:
from pyspark import SparkContext, SparkConf
conf = SparkConf().setAppName("films").setMaster("local[2]")
sc = SparkContext.getOrCreate(conf=conf)

> And then create an RDD.

In [2]:
movies = ['dark knight', 'dunkirk', 'pulp fiction', 'avatar']
movies_rdd = sc.parallelize(movies)

In [7]:
movies_rdd.map(lambda word: word.title()). \
groupBy(lambda title: title[0]). \
map(lambda group: (group[0], len(group[1]))).collect()

[('D', 2), ('P', 1), ('A', 1)]

We can see that the first stage involved reading the data and preparing it to be grouped.  This is often called the `ShuffleMap` stage, because it's where the mapping and the shuffling of our data occur.

<img src="./preshuffle.png" width="30%">

And then, once our data was grouped together properly, Spark could prepare the final result: the number of movies for each first letter.

<img src="./resultstage.png" width="35%">
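The way a shuffle divides a job into stages can be sketched in plain Python.  This is a toy model rather than Spark's scheduler: the `split_into_stages` function and the step labels are hypothetical, and the sketch simply assumes that any step needing a shuffle ends the current stage.

```python
def split_into_stages(steps):
    """Group (name, needs_shuffle) steps into stages, ending a stage at each shuffle."""
    stages, current = [], []
    for name, needs_shuffle in steps:
        current.append(name)
        if needs_shuffle:
            # The shuffle closes this stage; later steps wait for the regrouped data.
            stages.append(current)
            current = []
    if current:
        stages.append(current)
    return stages


# The same three steps as the pipeline above, labeled by hand.
pipeline = [
    ("map: title-case each movie", False),
    ("groupBy: first letter", True),
    ("map: count movies per group", False),
]

stages = split_into_stages(pipeline)
for number, stage in enumerate(stages):
    print(number, stage)
```

Under these assumptions the pipeline yields two stages: the first ends with the `groupBy`, and the second holds the final count, matching the two stages shown in the Spark UI.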

3. Spark Tasks

Each task is the work of a stage performed on a single partition of the data.  So a stage can have multiple tasks, because it generally runs in parallel across multiple data partitions.  When we look at the event timeline in the Spark UI, we see that the same steps occurred across multiple partitions, with one task per partition.

<img src="./tasks.png" width="100%">
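The idea of one task per partition can also be sketched without Spark.  In the plain-Python example below, the `partition` and `run_task` helpers are hypothetical, and the even split into two slices is an assumption chosen to echo `local[2]`; Spark's own partitioning may place records differently.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items, num_partitions):
    """Split items into num_partitions contiguous slices of near-equal size."""
    size, extra = divmod(len(items), num_partitions)
    parts, start = [], 0
    for i in range(num_partitions):
        end = start + size + (1 if i < extra else 0)
        parts.append(items[start:end])
        start = end
    return parts


def run_task(part):
    # One task: apply the stage's step to every record in a single partition.
    return [movie.title() for movie in part]


movies = ['dark knight', 'dunkirk', 'pulp fiction', 'avatar']
partitions = partition(movies, 2)

# Run the same step on each partition in parallel, one task per partition.
with ThreadPoolExecutor(max_workers=2) as pool:
    task_results = list(pool.map(run_task, partitions))

print(task_results)
```

Each entry in `task_results` corresponds to one task, so the number of tasks in this sketch equals the number of partitions.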

### Summary