## Introduction to Parallel Computing
### Segment 1 of 5

### Haste Does NOT Always Make Waste, Indeed!!

### In this segment we will answer:
* Why parallel processing?
* What are latency and throughput?
* What are some of the available tools for parallel computing?

## Thank you for helping our study


<a href="#/slide-1-0" class="navigate-right" style="background-color:blue;color:white;padding:8px;margin:2px;font-weight:bold;">Continue with the lesson</a>

Throughout this lesson you will see reminders, like the one below, to ensure that all participants understand that they are in a voluntary research study.

### Reminder

<font size="+1">

By continuing with this lesson you are granting your permission to take part in this research study for the Hour of Cyberinfrastructure: Developing Cyber Literacy for GIScience project. In this study, you will be learning about cyberinfrastructure and related concepts using a web-based platform that will take approximately one hour per lesson. Participation in this study is voluntary.

Participants in this research must be 18 years or older. If you are under the age of 18 then please exit this webpage or navigate to another website such as the Hour of Code at https://hourofcode.com, which is designed for K-12 students.

If you are not interested in participating please exit the browser or navigate to this website: http://www.umn.edu. Your participation is voluntary and you are free to stop the lesson at any time.

For the full description please navigate to this website: <a href="../../gateway-lesson/gateway/gateway-1.ipynb">Gateway Lesson Research Study Permission</a>.

</font>

In [None]:
# This code cell starts the necessary setup for Hour of CI lesson notebooks.
# First, it enables users to hide and unhide code by producing a 'Toggle raw code' button below.
# Second, it imports the hourofci package, which is necessary for lessons and interactive Jupyter Widgets.
# Third, it helps hide/control other aspects of Jupyter Notebooks to improve the user experience
# This is an initialization cell
# It is not displayed because the Slide Type is 'Skip'

from IPython.display import HTML, IFrame, Javascript, display
from ipywidgets import interactive
import ipywidgets as widgets
from ipywidgets import Layout

import getpass # This library allows us to get the username (User agent string)

# import package for hourofci project
import sys
sys.path.append('../../supplementary') # relative path (may change depending on the location of the lesson notebook)
import hourofci

import warnings
warnings.filterwarnings('ignore') # Hide warnings

# load javascript to initialize/hide cells, get user agent string, and hide output indicator
# hide code by introducing a toggle button "Toggle raw code"
# HTML(''' 
#     <script type="text/javascript" src=\"../../supplementary/js/custom.js\"></script>
    
#     <input id="toggle_code" type="button" value="Toggle raw code">
# ''')

HTML(''' 
    <script type="text/javascript" src=\"../../supplementary/js/custom.js\"></script>
    
    <style>
        .output_prompt{opacity:0;}
    </style>
    
    <input id="toggle_code" type="button" value="Toggle raw code">
''')


# Why Parallel Computing?

* **Running out of memory**

If you work with large datasets or regularly run heavy analyses on your computer, you have likely seen the error below pop up:

<center><img src="supplementary/memoryerror.jpg" width=300></center>
Needless to say, it is frustrating to have your computation cancelled after hours or even weeks of processing!<br/><br/>

* **Seeing our results FASTER!**

Parallel computing can reduce the time of a (big) computation on (big) data from weeks to a few hours!


## Recap from the beginners' lesson

In the beginner parallel computing lesson we saw how employing two workers instead of one increased the speed of the planting work significantly. 

Now, in this intermediate lesson, we will look at how to parallelize a computation across the central processing units (CPUs) of a computer.


<!-- <img src = https://i.makeagif.com/media/11-30-2015/qqZDNa.gif> -->


<center><img src=https://thumbs.gfycat.com/DependableOldfashionedGibbon-size_restricted.gif></center>

## What to optimize to get the work done FASTER?

There are two major approaches that hardware designers take to increase computation speed:
1. Producing stronger CPUs with a faster clock, which means a shorter processing time for each computation.
2. Employing multiple simpler processors that work in parallel.




## Why is the world getting parallel?!

Although technology has been successful in reducing the size of CPUs over time, the clock frequency (processing speed) of CPUs no longer seems likely to get much faster. That is because increasing the clock speed increases power consumption and, consequently, the need for much stronger cooling systems. This means higher cost and a less environmentally friendly design, and therefore a less favorable one!

<table>
    <tr>
        <td>
<img src="supplementary/cpu_size.png" width=700>
            <center>CPU size over time</center>
        </td>
        <td>
<img src="supplementary/clock_freq.png" width=700>
            <center>CPU clock speed over time</center>
        </td>
    </tr>
</table>
    

Therefore, technology has been leaning toward making **smaller** and **more power-efficient** processing units, but **more of them**, to work in parallel. That's the philosophy of Graphics Processing Units (GPUs)!

So, in this lesson we will focus on employing multiple computational cores at the same time! 


Image source: http://cpudb.stanford.edu/visualize
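
As a rough illustration of employing multiple cores at the same time, the sketch below asks the operating system how many logical CPUs it reports and then hands a few independent tasks to a small pool of worker processes using Python's standard `concurrent.futures` module. The names `sum_of_squares` and `run_on_cores` are made up for this illustration and are not part of the lesson's `hourofci` package. Depending on the platform, worker processes may not be able to find functions defined directly inside a notebook cell, so this sketch is written as a standalone script.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def sum_of_squares(n):
    """Toy CPU-bound task (hypothetical): add up i*i for i in range(n)."""
    return sum(i * i for i in range(n))


def run_on_cores(inputs, max_workers=None):
    """Send each input to a pool of worker processes; results come back in input order."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(sum_of_squares, inputs))


if __name__ == "__main__":
    # Number of logical CPUs the operating system reports (may be None if unknown).
    print("Logical CPUs reported:", os.cpu_count())
    # Three independent tasks shared between two worker processes.
    print(run_on_cores([100_000, 200_000, 300_000], max_workers=2))
```

Each task here is independent of the others, which is what makes it easy to spread them over several cores.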


## What are we optimizing for faster performance?

There are two approaches to increasing processing performance:
   1. **Latency Optimization**
   2. **Throughput Optimization**
   <br/>
    <br/>
    
- **Latency Optimization** is minimizing the time it takes for a processor to complete a computation task. 
- **Throughput Optimization** is maximizing the number of computation tasks per time unit. 
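
To make the two definitions concrete, here is a minimal sketch that measures both quantities for a toy task with the standard `time.perf_counter` timer. The task `toy_task` and the helper `measure` are hypothetical names chosen for this illustration, and the actual numbers will differ from machine to machine.

```python
import time


def toy_task(n=50_000):
    """Hypothetical stand-in for one computation task."""
    return sum(i * i for i in range(n))


def measure(task, repeats=20):
    """Run `task` back to back and return (average latency, throughput)."""
    start = time.perf_counter()
    for _ in range(repeats):
        task()
    elapsed = time.perf_counter() - start
    latency = elapsed / repeats      # average seconds needed per task
    throughput = repeats / elapsed   # tasks completed per second
    return latency, throughput


latency, throughput = measure(toy_task)
print(f"average latency: {latency:.6f} seconds per task")
print(f"throughput:      {throughput:.1f} tasks per second")
```

With a single worker running tasks one after another, throughput is simply one divided by latency. The two measures only start to tell different stories once several tasks can be in progress at the same time.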

Although these two seem similar, they are indeed different. Let's see why! 

## What do we mean by latency and throughput?!
Imagine you are standing in line for checkout at a grocery store.

Waiting in a long checkout queue is boring, especially on a Saturday! Ugh!
<center><img src = "supplementary/queue.png" width = 200></center>

Unfortunately, this problem might persist because your optimization target (as the customer) and that of the grocery store managers do not align! You try to minimize your waiting time in the checkout line, while the store managers try to maximize the number of customers each cashier processes, so that all employees always have work to do! In other words, you are optimizing **latency** and the grocery store is optimizing **throughput**.



## Why do we care about throughput and latency?!

OK, so far we have learned that there are two different approaches to optimizing processor performance: the time taken per task (latency) and the amount of data processed per time unit (throughput).

But why does learning about throughput and latency matter? Well, to answer this question, let's look at an example! <br/>
Assume that the Cargill company in Wayzata, MN has 6 boxes to ship to the Los Angeles Port (~1865 miles).

**Scenario 1:** Use a fast sports car. It takes 3 time units to get to the LA Port, but it can carry only 2 boxes per trip.  
**Scenario 2:** Use a truck that takes 6 time units to get to the destination but can take all 6 boxes in one go.
<center><img src = "supplementary/cargill.gif" width=600></center>





Now, let's do the math and calculate the latency and the throughput for each scenario:

Sports car:
- Latency: 3 time units
- Throughput: 2/3 ≈ 0.67 boxes/time unit

Truck:
- Latency: 6 time units
- Throughput: 6/6 = 1 box/time unit
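
The arithmetic above can be reproduced in a few lines of Python. This is only a sketch: the helper name `trip_metrics` is made up for this illustration, and it looks at a single one-way trip, just as the numbers above do.

```python
def trip_metrics(travel_time, boxes_per_trip):
    """Return (latency, throughput) for one one-way trip.

    latency    = time units until the boxes arrive
    throughput = boxes delivered per time unit
    """
    latency = travel_time
    throughput = boxes_per_trip / travel_time
    return latency, throughput


# (travel time in time units, boxes carried per trip), taken from the two scenarios
scenarios = {"sports car": (3, 2), "truck": (6, 6)}

for name, (travel_time, boxes) in scenarios.items():
    latency, throughput = trip_metrics(travel_time, boxes)
    print(f"{name}: latency = {latency} time units, "
          f"throughput = {throughput:.2f} boxes/time unit")
```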

We can see that the first scenario has a much smaller latency (good!) but also a smaller throughput (bad!), while the second scenario has a much larger latency (bad!) but a larger throughput (good!). <br/>
When the goal is to get all 6 boxes delivered, as in the Cargill example, the second scenario wins. But what if we use 3 sports cars at the same time?
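
One way to explore that question is to extend the same per-trip arithmetic to several vehicles driving side by side. The sketch below assumes that all vehicles leave at the same time and do not slow each other down, and it ignores return trips and cost. The helper name `fleet_metrics` is hypothetical.

```python
def fleet_metrics(travel_time, boxes_per_vehicle, vehicles):
    """Return (latency, throughput) for identical vehicles travelling in parallel."""
    latency = travel_time  # every vehicle arrives after the same travel time
    throughput = vehicles * boxes_per_vehicle / travel_time  # boxes per time unit
    return latency, throughput


three_cars = fleet_metrics(travel_time=3, boxes_per_vehicle=2, vehicles=3)
one_truck = fleet_metrics(travel_time=6, boxes_per_vehicle=6, vehicles=1)

print("3 sports cars:", three_cars)  # 6 boxes arrive together after 3 time units
print("1 truck:      ", one_truck)
```

Under these assumptions, three sports cars keep the low latency of a single car (3 time units) while delivering 6/3 = 2 boxes per time unit, which is more than the truck. This is the basic idea behind adding more workers in parallel.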

Using these measures, we can design a processing framework that is better suited to our problem.


# The rest yet to be developed

# Choosing a parallel computing tool: 

Dask, Ray, Apache Spark

# Comparing the origin of the three parallel computing tools:
Origin

Popularity:

The number of `pip install` downloads for each package (another popularity metric):
https://pypistats.org 

Images of popularity ranking


# Capabilities and usages

Spark, Dask, Ray

# Why Spark?
https://towardsdatascience.com/the-what-why-and-when-of-apache-spark-6c27abc19527






<font size="+1"><a style="background-color:blue;color:white;padding:12px;margin:10px;font-weight:bold;" href="pc-3.ipynb">Click here to go to the next notebook.</a></font>