## Intermediate Parallel Computing
### Segment 1 of 5

### Haste Does NOT Always Make Waste, Indeed!!

#### In this segment we will answer:
* Why parallel processing?
* What are latency and throughput?
* What are some of the available tools for parallel computing?


*Lesson Developer: Mohsen Ahmadkhani, ahmad178@umn.edu*

## Thank you for helping our study


<a href="#/slide-1-0" class="navigate-right" style="background-color:blue;color:white;padding:8px;margin:2px;font-weight:bold;">Continue with the lesson</a>

Throughout this lesson you will see reminders, like the one below, to ensure that all participants understand that they are in a voluntary research study.

### Reminder

<font size="+1">

By continuing with this lesson you are granting your permission to take part in this research study for the Hour of Cyberinfrastructure: Developing Cyber Literacy for GIScience project. In this study, you will be learning about cyberinfrastructure and related concepts using a web-based platform that will take approximately one hour per lesson. Participation in this study is voluntary.

Participants in this research must be 18 years or older. If you are under the age of 18 then please exit this webpage or navigate to another website such as the Hour of Code at https://hourofcode.com, which is designed for K-12 students.

If you are not interested in participating please exit the browser or navigate to this website: http://www.umn.edu. Your participation is voluntary and you are free to stop the lesson at any time.

For the full description please navigate to this website: <a href="../../gateway-lesson/gateway/gateway-1.ipynb">Gateway Lesson Research Study Permission</a>.

</font>

In [None]:
# This code cell starts the necessary setup for Hour of CI lesson notebooks.
# First, it enables users to hide and unhide code by producing a 'Toggle raw code' button below.
# Second, it imports the hourofci package, which is necessary for lessons and interactive Jupyter Widgets.
# Third, it helps hide/control other aspects of Jupyter Notebooks to improve the user experience
# This is an initialization cell
# It is not displayed because the Slide Type is 'Skip'

from IPython.display import HTML, IFrame, Javascript, display
from ipywidgets import interactive
import ipywidgets as widgets
from ipywidgets import Layout

import getpass # This library allows us to get the username (User agent string)

# import package for hourofci project
import sys
sys.path.append('../../supplementary') # relative path (may change depending on the location of the lesson notebook)
import hourofci

import warnings
warnings.filterwarnings('ignore') # Hide warnings

# load javascript to initialize/hide cells, get user agent string, and hide output indicator
# hide code by introducing a toggle button "Toggle raw code"
# HTML(''' 
#     <script type="text/javascript" src=\"../../supplementary/js/custom.js\"></script>
    
#     <input id="toggle_code" type="button" value="Toggle raw code">
# ''')

HTML(''' 
    <script type="text/javascript" src=\"../../supplementary/js/custom.js\"></script>
    
    <style>
        .output_prompt{opacity:0;}
    </style>
    
    <input id="toggle_code" type="button" value="Toggle raw code">
''')


# Why Parallel Computing?

* **Running out of memory**

If you regularly work with large datasets or run heavy analyses on your computer, you have likely seen the following error message pop up abruptly:

<center><img src="supplementary/memoryerror.jpg" width=300></center>
Needless to say, it is frustrating to have your computation cancelled after hours or even weeks of processing!<br/><br/>

* **Seeing our results FASTER!**

Parallel computing can reduce the time needed for a (big) computation on (big) data from weeks to a few hours! 


## Recap from the beginners' lesson

In the <a href="../../beginner-lessons/parallel-computing/pc-1.ipynb">beginner parallel computing lesson</a> we saw how employing two gardeners instead of one increased the speed of the planting work significantly. 

Now, in this intermediate lesson, we will look at parallelizing computation across multiple cores of a computer's central processing unit (CPU). 



<center><img src="supplementary/parall.png" width=400></center>

## What is a CPU?

Before we talk about parallelization, we need to know what a **CPU** is. <br>
In simple words, a CPU is an electronic chip made of billions of tiny components called **transistors**. 
Transistors are the building blocks of a CPU; how many of them fit on the chip (which depends on their size) and how fast they are driven (the clock frequency) largely determine the CPU's overall performance. 

<center><img src="supplementary/single_cpu.png" width=100></center>
<center>A schematic CPU</center>




## What to optimize to get the work done FASTER?

Hardware designers have two major ways to increase computation speed: 
1. Producing stronger CPUs = **increasing** the **number** of transistors and **decreasing** their **size** in a single CPU. 
2. Producing simpler CPUs and assembling them to work in parallel. 

In later slides, we will see why tech companies have been leaning toward the second approach and going parallel!


## Moore's Law
In the last few decades, computing hardware has grown exponentially more capable. About half a century ago (1965), Gordon Moore, who later co-founded Intel, made an interesting observation about this growth: the number of transistors on a chip was doubling at a regular pace (every year in his original estimate, revised in 1975 to about every two years), while the cost per transistor kept falling. His observation attracted so much attention that it is now known as **Moore's law**.

The following figure shows the trend in the transistor counts of CPUs, which is consistent with Moore's law. But how far can this trend continue?!
<center><img src="supplementary/moore.png" width=600></center>
<center>Moore's Law <a href=https://en.wikipedia.org/wiki/Moore%27s_law>source</a>.</center>
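The doubling rule can be turned into a small calculation. The sketch below is only an illustration: the function name `projected_transistors` and its parameters are assumptions made here, and the starting point (the Intel 4004 of 1971, with roughly 2,300 transistors) is used simply as a convenient reference chip. It projects what a strict "double every two years" rule would predict, not what any real CPU contains.

```python
def projected_transistors(start_count, start_year, year, doubling_period=2.0):
    """Transistor count predicted by a pure doubling rule.

    The count doubles once every `doubling_period` years after `start_year`.
    """
    doublings = (year - start_year) / doubling_period
    return start_count * 2 ** doublings


# Reference point: the Intel 4004 (1971) had roughly 2,300 transistors.
for year in (1971, 1981, 1991, 2001, 2011, 2021):
    predicted = projected_transistors(2300, 1971, year)
    print(f"{year}: about {predicted:,.0f} transistors")
```

Comparing these projected numbers with the figure above gives a feel for how closely real chips have followed the rule.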

## Why is the world getting parallel?!

Although manufacturers have been successful in fitting more transistors into smaller CPUs over time, this no longer seems to be the best way to increase the CPUs' clock frequency (speed).

This is because driving ever more densely packed transistors at higher clock frequencies increases power consumption and heat, which in turn requires much stronger cooling systems. This means higher cost and a larger environmental footprint, which makes the approach less favorable. 

You guessed it right! Moore's law is coming to an end!
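The power argument above can be made a little more concrete with a first-order approximation that is commonly used for the dynamic (switching) power of a chip. This formula is not part of the original lesson and is only a rough model:

$$
P_{dynamic} \approx \alpha \, C \, V^{2} \, f
$$

Here $\alpha$ is the fraction of transistors switching in each cycle, $C$ is the total switched capacitance (which grows as more transistors are added), $V$ is the supply voltage, and $f$ is the clock frequency. Because a higher clock frequency has typically also required a higher voltage, power tends to grow faster than linearly with speed, which is why several slower cores can be more power-efficient than one very fast core.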



## Technology Trend: Scaling

For a better understanding, click the link below and study the figure carefully. <br>
This figure was created by Stanford University and shows the trend of transistor sizes since the 1970s. 

Please note that in this figure, "feature size" refers to the size of transistors in micrometers (μm). The micrometer is a unit of length in the International System of Units (SI) equal to 10<sup>-6</sup> meters. 
<center>
<table>
    <tr>
        <td style="border:solid;width:100%;font-size:20px;background:white">
            <center><a href=http://cpudb.stanford.edu/visualize/technology_scaling>Transistor Sizes Over Time</a></center>
        </td>
    </tr>
</table>
</center>



## Technology Trend: Speed

Now, click the following link to see the changes in CPU speeds over time. <br>
In this figure, the Y-axis indicates the clock frequency (speed) of CPUs in megahertz (MHz). One megahertz is a unit of frequency equal to one million cycles per second. Also, each color indicates the manufacturer of the CPU.
<center>
<table>
    <tr>
        <td style="border:solid;width:100%;font-size:20px;background:white">
            <center><a href= http://cpudb.stanford.edu/visualize/clock_frequency>Clock Speed Trend</a></center>
        </td>
    </tr>
</table>
</center>  



## Technology Trend: Scaling Vs. Speed 
Looking at these figures, do you see a constant improvement in shrinking the sizes? What about speed?<br>
How do these two relate to each other?<br>
How do you relate these figures to Moore's law?<br>
Do these figures justify the necessity of parallelism? 

In the textbox below, let us know what you think. 



In [None]:

w = widgets.Textarea(
            value='',
            placeholder='Write your thoughts here',
            description='',
            disabled=False,
            layout=Layout( height='200px', min_height='100px', width='900px')
            )


def out3():
    print('Submitted!')
    
display(w)
hourofci.SubmitBtn2(w, out3)

## Why is the world getting parallel?!

As you might have concluded, the industry has been leaning toward making **smaller** and more **power-efficient** processing units, and putting **more of them** to work in parallel. 

In this lesson, we will focus on employing multiple computational cores at the same time! 
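As a first taste of using several cores at once, here is a minimal sketch based only on Python's standard library (`concurrent.futures`), not on the frameworks introduced later in this lesson. The helper names `count_primes` and `split_range` and the workload itself are assumptions chosen for illustration. The work is split into chunks and each chunk is handed to a separate worker process.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def count_primes(bounds):
    """Count the primes in the half-open range [lo, hi) by trial division."""
    lo, hi = bounds
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def split_range(lo, hi, n_chunks):
    """Split [lo, hi) into at most n_chunks contiguous sub-ranges (requires hi > lo)."""
    step = -(-(hi - lo) // n_chunks)  # ceiling division
    return [(start, min(start + step, hi)) for start in range(lo, hi, step)]


if __name__ == "__main__":
    n_cores = os.cpu_count() or 1           # number of logical cores reported by the OS
    chunks = split_range(0, 200_000, n_cores)
    # Each chunk is processed in its own worker process, so chunks can run on different cores.
    with ProcessPoolExecutor(max_workers=n_cores) as pool:
        total = sum(pool.map(count_primes, chunks))
    print(f"{total} primes found using {n_cores} worker processes")
```

Depending on the platform, process pools may need the worker function to live in an importable module rather than in a notebook cell, so this sketch is best run as a standalone script.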


## What are we optimizing to get faster processing through parallelism?

There are two approaches to increasing processing performance:

<ol>
    <li>
        <b>Latency Optimization</b>
        <ul>
            <li>
                Minimizing the time it takes for a processor to complete a computational task. <br><br>
            </li>
        </ul>
    </li>
    <li>
        <b>Throughput Optimization</b>
        <ul>
            <li>
                Maximizing the number of computational tasks per time unit. 
            </li>
        </ul>
    </li>
</ol>
    

Although these two seem similar, they are indeed different. Let's see how! 

## What do we mean by latency and throughput?!
Imagine you are standing in line for checkout at a grocery store.

It is so boring to wait in a long checkout queue, especially if it's a Saturday! Ugh! 
<center><img src = "supplementary/queue.png" width = 200></center>

Unfortunately, this problem might persist because your goal and the goal of the grocery store manager do not align. You try to minimize your waiting time in the checkout line, while the manager tries to maximize the number of customers each cashier processes, so that all employees always have work to do. In other words, you are optimizing **latency** while the grocery store is optimizing **throughput**. 
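The checkout analogy can be sketched as a tiny simulation. Everything in it is an assumption made for illustration: the function name `checkout`, the idea that all customers are already in line at time zero, that each one takes the same `service_time`, and that customers are dealt out evenly to the open cashiers.

```python
def checkout(n_customers, n_cashiers, service_time=2.0):
    """Simulate a simple checkout line.

    Returns (mean latency per customer, store throughput, throughput per cashier).
    Latency is the time from joining the line until the customer has paid.
    """
    # Customer i is served in "round" i // n_cashiers and finishes at the end of that round.
    finish_times = [(i // n_cashiers + 1) * service_time for i in range(n_customers)]
    mean_latency = sum(finish_times) / n_customers
    store_throughput = n_customers / max(finish_times)   # customers per time unit
    per_cashier = store_throughput / n_cashiers
    return mean_latency, store_throughput, per_cashier


for cashiers in (1, 3):
    latency, throughput, per_cashier = checkout(12, cashiers)
    print(f"{cashiers} cashier(s): mean latency = {latency}, "
          f"throughput = {throughput}, per cashier = {per_cashier}")
```

With 12 customers, opening three cashiers instead of one cuts the mean latency from 13 to 5 time units, while each cashier still processes 0.5 customers per time unit. The customer and the manager are looking at different numbers.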



## Why do we care about latency and throughput?!

OK, so far we have learned that there are two different ways to optimize processor performance: how quickly a task finishes (latency) and how much work is completed per unit of time (throughput). 

But why does learning about throughput and latency matter? To answer this question, let us introduce you to the Hippo company! 



## Hippo Company Example


Assume that Hippo Inc. is a newly founded food corporation located in Wayzata, MN. The managers of Hippo have decided to ship six boxes to the Port of Los Angeles (~1865 miles away) for export. Because the company is newly established, it has only two types of vehicles at its disposal: fast sports cars and a single, slow heavy-duty truck. 

This leaves them with a choice between the two scenarios on the next slide!




## Hippo Company Example 

**Scenario 1:** Use a fast sports car for the transportation. It takes 3 time units to get to the port in LA; however, it can carry only 2 boxes per trip.  
**Scenario 2:** Use a truck that takes 6 time units to get to the destination but can take all 6 boxes in one go.
<br>
<br>
<center><img src = "supplementary/hippo_inc.gif" width=600></center>




Now, let's do the math and calculate the latency and the throughput for each scenario:
<center>
$$
  Latency = Time\ to\ finish\ one\ trip\   (processing\ time)
$$

$$
  Throughput = \frac{Number\ of\ boxes\ (tasks)}{Latency}
$$
</center>

**Sports car:**
- Latency: 3 time units
- Throughput: 2/3 ≈ 0.67 boxes/time unit

**Truck:**
- Latency: 6 time units
- Throughput: 6/6 = 1 box/time unit



We can see that the first scenario has a much lower latency (good!) but lower throughput (bad!), while the second scenario has a much higher latency (bad!) but higher throughput (good!). <br/> 
If throughput is the goal, the second scenario wins. But what if they use three sports cars at the same time?
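The question above can be explored with the same throughput formula. The sketch below is a minimal illustration; the function name `throughput` and the `n_vehicles` parameter are assumptions introduced here. It also assumes that the cars travel at the same time and ignores return trips.

```python
def throughput(boxes_per_trip, latency, n_vehicles=1):
    """Boxes delivered per time unit when n_vehicles travel in parallel."""
    return n_vehicles * boxes_per_trip / latency


sports_car = throughput(boxes_per_trip=2, latency=3)                     # one car
truck = throughput(boxes_per_trip=6, latency=6)                          # one truck
three_sports_cars = throughput(boxes_per_trip=2, latency=3, n_vehicles=3)

print(f"One sports car:    {sports_car:.2f} boxes/time unit")
print(f"Truck:             {truck:.2f} boxes/time unit")
print(f"Three sports cars: {three_sports_cars:.2f} boxes/time unit")
```

Under these assumptions, three sports cars working in parallel keep the low latency of 3 time units and reach a throughput of 2 boxes per time unit, beating the truck on both measures.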

Using these measures, we can design a more efficient processing framework according to our problem. <br>

Ok, now that we have a sense of parallel processing, let's see how we can implement it in cyberspace. 


# Choosing a parallel computing tool

Multiple tools have been developed in recent decades that provide a parallel computing framework. In this lesson we introduce **Apache Spark**, **Dask**, and **Ray**. <br><br>
<center>
    <table>
        <tr style="background-color:white;border:solid">
            <td style="border:solid;width:30%;">
                <img src="supplementary/spark_logo.png" width=200>
            </td>
            <td style="border:solid;width:30%;">
                <img src = "supplementary/dask_logo.png" width = 250>
            </td>
            <td style="border:solid;width:30%;">
                <img src = "supplementary/ray_logo2.png" width = 250>
            </td>
        </tr>
    </table>
</center>
   





## Comparing the Three Parallel Computing Frameworks: Origin

<br/>

<table style="background-color:white;width:50%">
    <tr style="background-color:white;width:50%">
        <td style="text-align:left;">
            <center><img src = "supplementary/spark_logo.png" width = 100 height = 100></center>
            <ul>
                <li>
                    Based on MapReduce<sup>*</sup>
                </li>
                <li>
Developed by: U.C. Berkeley, 2010
                </li>
                <li>
Created for big data and analytics
                </li>
                <li>
Successor of: Hadoop ecosystem
                </li>
                <li>
                    Available in: (initially in) Scala, Java, SQL, Python, R, C#, F#
                </li>
                <li>
                    Official Website: <a href=https://spark.apache.org>https://spark.apache.org</a>
                </li>
            </ul>
        </td>
    </tr>
</table>

<sup>*</sup><i style="font-size:70%;">Don't worry if you have no idea what MapReduce is! You will learn about it in the upcoming segments!</i>


## Comparing the Three Parallel Computing Frameworks: Origin

<br/>

<table style="background-color:white;width:50%">
    <tr style="background-color:white;width:50%">
        <td style="text-align:left;">
            <center><img src = "supplementary/dask_logo.png" width = 100 height = 100></center>
            <ul>
                <li>
Based on task scheduling
                </li>
                <li>
Developed by: Anaconda, 2015
                </li>
                <li>
Developed for scaling Python code/packages
                </li>
                <li>
                    Available in: Python
                </li>
                <li>
                    Official Website: <a href=https://dask.org>https://dask.org</a>
                </li>
            </ul>
        </td>
    </tr>
</table>



## Comparing the Three Parallel Computing Frameworks: Origin
<br/>

<table style="background-color:white;width:50%">
    <tr style="background-color:white;width:50%">
        <td style="text-align:left;">
            <center><img src = "supplementary/ray_logo2.png" width = 100 height = 100></center>
            <ul>
                <li>
Based on tasks/actors
                </li>
                <li>
Developed by: U.C. Berkeley, 2016
                </li>
                <li>
Initially focused on deep learning
                </li>
                <li>
                    APIs: (initially in) Python, later Java and C++
                </li>
                <li>
                    Official Website: <a href=https://www.ray.io>https://www.ray.io</a>
                </li>
            </ul>
        </td>
    </tr>
</table>



## Comparing the Three Parallel Computing Frameworks: Popularity
<br/>
To get a sense of the popularity of these three packages, we will look at the number of times each has been installed using the pip command (<a href=https://pypistats.org>source</a>). Please note that, as mentioned in the previous slides, these frameworks are available in several languages, but here we only use their Python versions to gauge popularity. The statistics shown below are as of September 13, 2022. 

<center>
    <table style="background-color:white;border:solid;">
        <tr style="background-color:white;border:solid;">
            <td style="border:solid;width:30%;text-align:left;">
                <center><img src = "supplementary/spark_logo.png" width = 100 height = 100></center>
                <ul>
                    <li>
Downloads last day: 893K
                    </li>
                    <li>
Downloads last week: 5.8M
                    </li>
                    <li>
Downloads last month: 24M
                    </li>
                </ul>
            </td>
            <td style="border:solid;width:30%;text-align:left;">
                <center><img src = "supplementary/dask_logo.png" width = 100 height = 100></center>
                <ul>
                    <li>
Downloads last day: 236K
                    </li>
                    <li>
Downloads last week: 1.4M
                    </li>
                    <li>
Downloads last month: 7M
                    </li>
                </ul>
            </td>
            <td style="border:solid;width:30%;text-align:left;">
                <center><img src = "supplementary/ray_logo2.png" width = 100 height = 100></center>
                <ul>
                    <li>
Downloads last day: 66K
                    </li>
                    <li>
Downloads last week: 477K
                    </li>
                    <li>
Downloads last month: 1.8M
                    </li>
                </ul>
            </td>
        </tr>
    </table>
</center>



## Apache Spark

As shown in the previous slide, Apache Spark is the most downloaded parallel computing framework among the three. Therefore, we will focus on this package for the rest of this lesson. Please note that popularity is not its only advantage; Apache Spark comes with many others. You can review some of them <a href=https://towardsdatascience.com/the-what-why-and-when-of-apache-spark-6c27abc19527>here</a>.  



Great! Now, let's delve into Apache Spark in the next segment! <br><br>


<font size="+1"><a style="background-color:blue;color:white;padding:12px;margin:10px;font-weight:bold;" href="pc-3.ipynb">Click here to go to the next segment.</a></font>