![CS 4000](images/cs4000-title.png)

## **Course Description:**


CS 4000 - Intro to Distributed, Parallel, and Web-Centric Computing is one of the most interesting courses in the CS core curriculum here at Ohio University. In this course you will learn about parallel and distributed computing topics such as threading, synchronization, and the potential speedup of programs. CS 4000 is a very project-based course, so throughout the semester you will get plenty of hands-on practice with the topics covered in lecture. Because you will be developing parallel and distributed programs on your own, another essential topic is how to calculate and optimize the efficiency of your programs. Altogether, this course will definitely make you think about optimization more than you ever have before, and it will let you take advantage of the hardware behind the scenes to create lightning-fast programs.

## **Course Topics:**
1. <u>Basic Terminology and Concepts</u> <br>
    Parallel vs. Distributed vs. Concurrent vs. Web-Centric Computing; Sequential and Parallel Models of Computation (Turing Machine and RAM vs. PRAM, Boolean Circuits); Introduction to Networking <br>
2. <u>Parallel Algorithmic Techniques</u> <br>
    Divide and Conquer; Parallel Prefix Computation; Map/Reduce; Cache Oblivious and Communication Avoiding Algorithms <br>
3. <u>Parallel/Distributed Performance (Amdahl’s Law and Gustafson’s Law)</u> <br>
4. <u>Data Dependencies and Critical Paths</u> <br>
5. <u>Threading, Synchronization, and Multi-Core Programming</u> <br>
    Race conditions; Critical sections; OpenMP; PThreads <br>
6. <u>Distributed and Cluster Computing</u> <br>
    OpenMPI; Hadoop / Spark; Sockets and Client Server Programming; Related Security Issues. <br>
7. <u>Advanced Topics/ Web Centric Computing</u> <br>
    Web-Centric Computing/Web Services, Web-Programming Languages; Web-Security Issues <br>

## **Examples of What You'll Learn:**

In this section, we show you one way of developing a parallel program in C++ using the standard library's threading support, C++ Threads. From there we cover how to calculate the speed-up and the efficiency of a parallel program. Speed-up has certain limitations, however, as described by Amdahl’s Law and Gustafson’s Law.

### **Parallel Programming with C++ Threads** 

C++ Threads are part of the C++ standard library (the `<thread>` header, available since C++11) and support multi-platform, shared-memory multithreaded programming. In practice, this means we can use `std::thread` to write C++ programs that run in parallel. A program that runs in parallel uses multiple processors to carry out its calculations simultaneously. Instead of using only one of your computer's processors, as a normal program does, a parallel program can use several at the same time! Each thread we create is scheduled by the operating system, which can run different threads on different processors. Check out the example below!

In [None]:
#include <iostream>
#include <thread>

void call_from_thread(int tid) {
    std::cout << "Launched by thread " << tid << std::endl;
}
static const int num_threads = 10;
std::thread t[num_threads];

//Launch a group of threads
for (int i = 0; i < num_threads; ++i) {
    t[i] = std::thread(call_from_thread, i);
}

//Join the threads with the main thread
for (int i = 0; i < num_threads; ++i) {
    t[i].join();
}

If you run the above code block multiple times, you will see that the order in which the threads execute varies from run to run (and the output lines may even interleave). This is because the operating system's scheduler, not your program, decides when each thread runs. If you go further and break up a set of computations, you can assign a specific portion of those computations to each thread and bring the results all together at the end, so that the program performs the computations in parallel. The beauty of parallelism is in breaking up work into sections, one per thread, so they can execute at the same time and ultimately speed up the program by a substantial amount.

### **Speed-Up and Efficiency**

In this class, you'll be asked to write parallel programs in C++ that achieve a certain efficiency. A well-written parallel program is considered to be about 75% efficient.

The speedup of a parallel program is defined as **S = Time of sequential / Time of parallel**, that is, the ratio of the time taken by the original (sequential) program to the time taken by the parallel version. In general, this number is bounded by the number of processors on the system.

The efficiency of the parallel execution is **E = S / p, where S is the speedup, and where p is the number of processors being used.**


#### **Speed-up Question #1: Compute the following value:**

Parallel time = 25 seconds, sequential time = 2 minutes 15 seconds.  What is the speedup of your parallel code? 

Check your answer by running the code below


In [None]:
double parallel_time = 25;
double sequential_time = 135;
int num_of_processors = 1;
double speedup = sequential_time / parallel_time;
double efficiency = speedup / num_of_processors;

printf("Answer: Speedup = %.2f",speedup);

	
#### **Speed-up Question #2: Compute the following value:**

Parallel time = 20 seconds, sequential time = 4 minutes 20 seconds.  What is the speedup of your parallel code? 

Run the code below to check your answer

In [None]:
double parallel_time = 20;
double sequential_time = 260;
int num_of_processors = 1;
double speedup = sequential_time / parallel_time;
double efficiency = speedup / num_of_processors;

printf("Answer: Speedup = %.2f",speedup);

#### **Efficiency Question #1: Compute the following value:**

Parallel time = 20 seconds, sequential time = 2 minutes 15 seconds.  Number of processors = 8.  What is the efficiency of your parallel code?

Run the code below to check your answer

In [None]:
double parallel_time = 20;
double sequential_time = 135;
int num_of_processors = 8;
double speedup = sequential_time / parallel_time;
double efficiency = speedup / num_of_processors;

printf("Answer: Speedup = %.2f",speedup);
printf("\n");
printf("Answer: Efficiency = %.2f, or %.2f%%",efficiency,efficiency * 100);

#### **Efficiency Question #2: Compute the following value:**

Parallel time = 20 seconds, sequential time = 4 minutes 20 seconds.  Number of processors = 16.  What is the efficiency of your parallel code?

Run the code below to check your answer

In [None]:
double parallel_time = 20;
double sequential_time = 260;
int num_of_processors = 16;
double speedup = sequential_time / parallel_time;
double efficiency = speedup / num_of_processors;

printf("Answer: Speedup = %.2f",speedup);
printf("\n");
printf("Answer: Efficiency = %.2f, or %.2f%%",efficiency,efficiency * 100);

### **Amdahl’s Law and Gustafson’s Law**

#### Amdahl's Law:

In the 1960s, Gene Amdahl observed that, in any software system, part of the system must be executed sequentially due to dependency constraints, while other parts may be executed in parallel. Amdahl suggested a simple model in which some fraction (S) of a software system must be executed sequentially. The value of S then limits the performance gains that can be made by applying multiple processors to the same system.

According to Amdahl's Law, the maximum speedup that can be achieved using N processors, when a fraction S of a software system must be executed sequentially, is:

**Maximum Speedup = 1 / ( S + (1 - S) / N ), where S is the fraction of the program that must be executed sequentially (not the speedup from the previous section), and N is the number of processors.**

#### Gustafson's Law: 

John Gustafson, in a technical note in 1988, argued that there were problems with Amdahl’s Law. Gustafson's Law assumes that, on a sequential machine, there is a fixed sequential component (s) and a component that can be executed in parallel (p), where s + p = 1. When the program scales to N processors, s remains the same, whereas p is performed on all N processors. So, the scaled speedup according to Gustafson's Law is: 

**Scaled Speedup = s + (p * N), where s is the sequential component, p is the parallel component, and N is the number of processors**

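Overall, both Amdahl's Law and Gustafson's Law are good approximations of the limitations of parallel computing, and each has its place. Which law to use depends on the problem you are looking at: Amdahl's Law applies when the problem size stays fixed as processors are added, while Gustafson's Law applies when the problem size grows with the number of processors.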

#### **Amdahl's Law Question #1: Compute the following value:**

You have a program where reading the input and producing the output takes 5 seconds.    The complete sequential version of your code takes 2 minutes to execute.    According to Amdahl's law, what is the maximum speedup that you could achieve on this code using 10 processors, assuming that reading the input and producing the output cannot be run in parallel?

Run the code below to see the answer

In [None]:
double total_time = 120;   // complete sequential run: 2 minutes
double serial_time = 5;    // input/output that cannot be run in parallel
int num_of_processors = 10;
double serial_fraction = serial_time / total_time;   // S in Amdahl's Law
double max_speedup = 1 / ( serial_fraction + (1 - serial_fraction) / num_of_processors );
printf("Answer: Sequential fraction S = %.4f",serial_fraction);
printf("\n");
printf("Answer: Max Speedup %.2f",max_speedup);

#### **Gustafson's Law Question #1: Compute the following value:**

Assume that the sequential portion of your code takes 5% of the time, whereas the rest of the code (which could be run in parallel) takes 95% of the time.   According to Gustafson's law, what is the maximum scaled speedup for this problem on a machine with 100 processors?

Run the code below to find the answer

In [None]:
double parallel = .95;
double sequential = .05;
int num_of_processors = 100;
double scaled_speedup = sequential + (parallel * num_of_processors);

printf("Answer: Scaled Speedup %.2f", scaled_speedup);

## **Conclusion:**
CS 4000 will provide you with an introduction to distributed, parallel, and web-centric computing. This class will introduce you to some important concepts within the computer science field, including distributed and parallel models of computation; distributed and parallel computer architectures; multi-core designs; potential speed-up; threading, synchronization, and multi-core programming; parallel and distributed algorithms; sockets and client-server based software; web programming; accessing databases across the web; and web security.