# DS107 Big Data: Lesson Six Companion Notebook

### Table of Contents <a class="anchor" id="DS107L6_toc"></a>

* [Table of Contents](#DS107L6_toc)
    * [Page 1 - Introduction](#DS107L6_page_1)
    * [Page 2 - Streaming Data](#DS107L6_page_2)
    * [Page 3 - Kafka](#DS107L6_page_3)
    * [Page 4 - Apache Flume](#DS107L6_page_4)
    * [Page 5 - Spark Streaming](#DS107L6_page_5)
    * [Page 6 - Apache Storm](#DS107L6_page_6)
    * [Page 7 - Flink](#DS107L6_page_7)
    * [Page 8 - Cluster Redundancy](#DS107L6_page_8)
    * [Page 9 - Zookeeper](#DS107L6_page_9)
    * [Page 10 - Key Terms](#DS107L6_page_10)
    * [Page 11 - Lesson 6 Hands-On](#DS107L6_page_11)
    

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 1 - Overview of this Module<a class="anchor" id="DS107L6_page_1"></a>

[Back to Top](#DS107L6_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

In [3]:
from IPython.display import VimeoVideo
# Tutorial Video Name: Working with Real Time Data
VimeoVideo('388631697', width=720, height=480)

The transcript for the above overview video **[is located here](https://repo.exeterlms.com/documents/V2/DataScience/Video-Transcripts/DSO107L07overview.zip)**.

# Introduction

One of the up-and-coming needs in big data is dealing with real-time data. That is, how do you handle data that is not collected first and analyzed later, but is continuously collected, and maybe even continuously analyzed? This lesson will give you the theoretical foundations of the big data programs that work with data in real time. By the end of this lesson, you should have a general understanding of the following programs:

* Kafka
* Flume
* Spark Streaming
* Storm
* Flink

You will also learn about cluster redundancy and management, including:

* Understanding the basic operations necessary for cluster redundancy
* Understanding the logistics and architecture of Zookeeper

This lesson will culminate with a hands-on in which you compare and contrast the different real-time data programs.

<div class="panel panel-success">
    <div class="panel-heading">
        <h3 class="panel-title">Additional Info!</h3>
    </div>
    <div class="panel-body">
        <p>You may want to watch this <a href="https://vimeo.com/459422096"> recorded live workshop on the concepts in this lesson and the previous one if you haven't already. </a> </p>
    </div>
</div>


<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 2 - Streaming Data<a class="anchor" id="DS107L6_page_2"></a>

[Back to Top](#DS107L6_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">


# Streaming Data

Thus far, the primary way you have gotten data into your cluster is by uploading it through `File View` in HDFS.  But what if your data is too large to be stored on your computer? What if you can't just drag and drop it in? There are *streaming* technologies that interface with Hadoop to automatically bring in data from its source. These sources can be anything, but to give you some examples, you could have data streaming into Hadoop from web server logs, stock market transactions, or sensors from the *internet of things (IoT)*. Notice how you have "smart" everything these days? From personal assistants like Amazon's Alexa to fitness trackers to security systems to refrigerators that can notify you when you're out of ice, your world is becoming inundated with machines that are generating data; this is often called *machine data*. The internet of things is the network upon which these systems run. No matter what the data source, all these examples have something in common: they generate thousands of records per minute. 

There are two issues with such an abundance of *live* or *real-time* data.  First, how in the heck do you get that data into your cluster? And second, how do you process it when it arrives at your cluster? You'll learn the theory and technologies that are meant to handle data streaming issues.



<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 3 - Kafka<a class="anchor" id="DS107L6_page_3"></a>

[Back to Top](#DS107L6_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">


# Kafka

*Kafka* is a big data program that is meant to help solve the problem of how to get streamed data into your cluster.  Although it interfaces nicely with Hadoop, it is not just meant for Hadoop - it can operate independently. Basically, Kafka acts as a post office. Kafka servers store all incoming messages from *publishers* (think data generating things) and then allow them to be picked up and read by *consumers* (think data using things). It is up to the consumers to subscribe to the data they need, and they can pick it up when it is convenient.  This is often in real time, but it does not have to be.

---

## Kafka Architecture

Kafka is built around its own cluster at the center (which may consist of many servers, depending on the volume of data it is processing). Incoming data comes from the publishers, also known as the *producers*, and is stored on that Kafka cluster. Consumers can receive the data as it comes out in a wide variety of ways, including database connectors. You can also add in an optional *stream processor*, which will transform unstructured data as it comes in and then send it along to the connectors/consumers.

Kafka is very scalable, because you can keep adding additional servers to your Kafka cluster, and you can even distribute the system of consumers as well if you need to.
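
The post office idea can be sketched in a few lines of plain Python. This is only a toy model and not the Kafka API: the `TinyBroker` class, the `thermostat` topic, and the consumer names are made up for illustration. It shows two ideas from the text: messages are stored rather than handed off, and each consumer picks up its mail whenever it is convenient.

```python
from collections import defaultdict


class TinyBroker:
    """In-memory stand-in for a Kafka cluster: one append-only log per topic."""

    def __init__(self):
        self.logs = defaultdict(list)      # topic -> list of stored messages
        self.offsets = defaultdict(int)    # (consumer, topic) -> next index to read

    def publish(self, topic, message):
        # Producers only ever append; nothing is removed when it is read.
        self.logs[topic].append(message)

    def poll(self, consumer, topic, max_messages=10):
        # Each consumer tracks its own position, so it can read whenever it likes.
        start = self.offsets[(consumer, topic)]
        batch = self.logs[topic][start:start + max_messages]
        self.offsets[(consumer, topic)] = start + len(batch)
        return batch


broker = TinyBroker()
for reading in (71, 72, 74):
    broker.publish("thermostat", reading)

print(broker.poll("dashboard", "thermostat"))   # the three stored readings
broker.publish("thermostat", 75)
print(broker.poll("dashboard", "thermostat"))   # only the new reading
print(broker.poll("archiver", "thermostat"))    # a late subscriber still gets all four
```

Because the broker keeps the full log, the `archiver` consumer can subscribe late and still read everything from the beginning.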

---



<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 4 - Apache Flume<a class="anchor" id="DS107L6_page_4"></a>

[Back to Top](#DS107L6_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">


# Apache Flume

*Apache Flume* is yet another way to receive streamed data, and unlike Kafka, which can function outside Hadoop, Apache Flume was made with Hadoop in mind.  Since it was truly built to integrate with Hadoop, it has built-in data sinks for HDFS and for HBase.  This is particularly nice because neither HDFS nor HBase likes having a lot of connections to it, so you can use Flume as a buffer between where your data is coming from and where you are storing it.  This buffer also provides a nice little safety margin, so that you don't end up adding in more data than HDFS or HBase can handle at one time. 

---

## Flume Architecture

In Flume, the *source* is where the data is coming from, and data flows into a *channel*, which is a file or memory transfer area where data is temporarily stored. From the channel, data goes into the *sink*, which feeds it into HBase or other data storage. Once the sink has transferred the data over, Flume deletes it from the channel.  The deletion of stored data is one of the differences between Flume and Kafka - Flume basically funnels data from one place to another, while Kafka keeps it all and allows you to reach in and scoop out the data you need. 
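
To make the funnel idea concrete, here is a minimal sketch in plain Python. It is not Flume code or Flume configuration: `run_agent`, `channel_capacity`, and `hdfs_stand_in` are hypothetical names chosen for this example. The channel is modeled as a small buffer that events pass through on their way to the destination.

```python
from collections import deque


def run_agent(source_events, destination, channel_capacity=3):
    """Move events source -> channel -> sink, dropping them from the channel once delivered."""
    channel = deque()                      # stand-in for a memory channel
    pending = iter(source_events)
    exhausted = False
    while not exhausted or channel:
        # Source side: fill the channel, but never beyond its capacity.
        while not exhausted and len(channel) < channel_capacity:
            try:
                channel.append(next(pending))
            except StopIteration:
                exhausted = True
        # Sink side: deliver one event, which also removes it from the channel.
        if channel:
            destination.append(channel.popleft())
    return len(channel)


hdfs_stand_in = []
left_in_channel = run_agent(["log line %d" % i for i in range(5)], hdfs_stand_in)
print(hdfs_stand_in)     # all five lines, in arrival order
print(left_in_channel)   # 0: nothing is kept once it has been handed off
```

The capacity limit is what lets the channel act as a buffer: the source can only get a few events ahead of what the sink has delivered.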

---

### Source

On the source end, Flume allows you to add some logic in, so that you can sort the data and place it in the right places.  It also allows you to add *interceptors*, so that you can transform the data before you send it on. There are a whole slew of source types built into Flume, including, but not limited to:

* Kafka
* **Exec:** Linux command prompt output
* **Thrift:** A data connection interface
* **Netcat:** A program that allows you to listen to streamed-in data
* **HTTP:** Listen to web ports

You can also set up custom links. Hopefully, a developer or database engineer will help you with these things should they be required.

---

### Sink

There are also many built-in sink types.  They include, but are not limited to: 

* HDFS
* Hive
* HBase
* Thrift
* Kafka

As with sources, you can also build custom sinks if necessary.

---

### Avro

*Avro* allows you to connect one agent to another, so it is used to set up multi-tiered Flume systems - kind of like a set of waterfalls in a stream.  The Avro sink of one agent sends data to the Avro source of the next, so the data hops down one channel, into a first sink, and then into a second channel and a second sink, etc.  This is one of the ways that Flume is scalable - you can add multiple data sources, and the tiers allow you to limit the amount of traffic that you end up sending to your final sink.

---

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 5 - Spark Streaming<a class="anchor" id="DS107L6_page_5"></a>

[Back to Top](#DS107L6_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">


# Spark Streaming

*Spark Streaming* is a component of Spark that allows you to process and analyze data either in real time or in very small *batches*.  A batch is a chunk of data for a certain time period. On the large side, you can have batches of data where you look at all the data for a year, or maybe a quarter.  That level is easy to manage without fancy big data software, though.  When you're talking batches in Spark Streaming, you're talking about batching data as small as by the second! Sometimes that is referred to as a *micro batch* because of the nature of its size.

---

## Spark Streaming Architecture

Basically, data streams on in, straight to a *receiver*.  Then the receiver sends each micro batch of data into an RDD. Next, the *DStream*, or *Discretized Stream*, which is the ongoing sequence of those RDDs, performs whatever actions you want Spark to take on each one.  The actions are those available in Spark 1.0, which you will encounter soon. The results can also be used to maintain *stateful* data, or data that is long-lived and persists beyond your batch.  Spark Streaming can create aggregate sessions this way and thus maintain your data over time. 
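
The flow of micro batches and stateful results can be imitated in plain Python. This sketch does not use Spark at all: the `micro_batches` helper and the sample `events` are invented for illustration. Each one-second list stands in for one RDD, and `running_totals` plays the role of state that outlives any single batch.

```python
from collections import Counter, defaultdict


def micro_batches(events, batch_seconds=1):
    """Group (timestamp, word) events by the batch interval they fall into."""
    batches = defaultdict(list)
    for timestamp, word in events:
        batches[int(timestamp // batch_seconds)].append(word)
    # Intervals with no events are simply skipped in this sketch.
    return [batches[key] for key in sorted(batches)]


events = [(0.2, "error"), (0.7, "ok"), (1.1, "error"), (1.9, "error"), (2.4, "ok")]

running_totals = Counter()          # state that persists beyond each batch
for batch in micro_batches(events):
    batch_counts = Counter(batch)   # result for this batch only
    running_totals.update(batch_counts)
    print(dict(batch_counts), "->", dict(running_totals))
```

The per-batch counts are thrown away after each loop iteration, while the running totals keep growing, which is the difference between a batch result and stateful data.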

---

## Windows and Data Intervals

A *window* is a glimpse in time at your streamed data, and typically comprises multiple batches.  For instance, you could have batches coming every 2 seconds, but have a window set for an hour.  That window will include 1,800 batches! You can think of a window like a snapshot - it only shows you the last hour as it is right now.  In another minute, that window will look different, because now that window has slid across time.  A window at 12:00pm will look different from the window at 12:01pm, because now you have gained one new minute and lost one old one.

There are three different interval components to windowing:

* **Batch Interval:** How often data comes into your stream.
* **Slide Interval:** How often you aggregate or transform data in a window.
* **Window Interval:** How far back in time you gather data.
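
The three intervals above can be tried out with a small plain-Python sketch. It is not Spark code: `windowed_sums` and the sample `counts` are hypothetical, and the sketch assumes the window and slide intervals are whole multiples of the batch interval. Each number in `counts` stands for the record count of one batch.

```python
def windowed_sums(batches, batch_interval, window_interval, slide_interval):
    """Sum the batches inside each window; all intervals are in seconds."""
    per_window = window_interval // batch_interval   # batches that fit in one window
    per_slide = slide_interval // batch_interval     # batches between two evaluations
    results = []
    for end in range(per_slide, len(batches) + 1, per_slide):
        start = max(0, end - per_window)              # early windows are only partly full
        results.append(sum(batches[start:end]))
    return results


# The example from the text: 2-second batches inside a one-hour window.
print(3600 // 2)   # 1800 batches per window

# A smaller case: 2-second batches, a 6-second window, evaluated every 4 seconds.
counts = [5, 1, 4, 2, 3, 6]
print(windowed_sums(counts, batch_interval=2, window_interval=6, slide_interval=4))
# [6, 7, 11]
```

Notice that consecutive windows overlap: the batch holding 2 records is counted in both the second and the third result, because the window (6 seconds) is longer than the slide (4 seconds).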

---

## Structured Streaming

There is also something called *structured streaming*, which was introduced in Spark 2.0 as an experimental feature and has since been marked stable.  In structured streaming, you use DataSets as the primary structure, and you can build tables that just append new rows as new data comes in. Structured streaming is advantageous, because the code looks pretty darn similar to what you would use if you weren't using real-time data, and you can pass your streamed data directly into Spark MLlib if that is required, without further processing.

---

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 6 - Apache Storm<a class="anchor" id="DS107L6_page_6"></a>

[Back to Top](#DS107L6_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">


# Apache Storm

*Apache Storm* is yet another program that will allow you to process data in real time. While Spark Streaming works with incredibly small micro-batches, small enough that streaming feels like it is real time, Storm really is real-time processing, since it works off of each individual data event you have. You basically have less than a second lag time in processing, which is cool, but a little overkill for most data situations.

---

## Storm Architecture

Like every other big data program out there, Storm uses its own terminology.  You have the *stream*, which is the data flowing into your cluster.  The stream is formatted continuously as tuples. This stream comes from the *spout* or spouts, which are the sources of data.  Spouts can be things such as Kafka or pretty much any other data source you can imagine. The stream then flows through *bolts*, which process, transform, and/or aggregate the data as it arrives.  Because this really is true real-time data, there are no final results - the bolts just keep going on and on and on. In fact, Storm will keep running until you explicitly tell it to stop!

You can create a *topology*, or a graph similar to the DAGs constructed in Tez, that assembles the spouts and bolts and chains them together. Unlike Tez, however, you have to create it yourself - Storm will not automatically optimize. 
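
A hand-wired chain of spouts and bolts can be imitated with Python generators. This is only an analogy and not the Storm API: the function names are invented, and each bolt is treated as a processing step that takes tuples in and emits tuples out, one event at a time.

```python
def sentence_spout():
    """Spout: emits one tuple at a time (a real spout would never run dry)."""
    for sentence in ("storm is fast", "flink is fast"):
        yield (sentence,)


def split_bolt(stream):
    """Bolt: turns each sentence tuple into one tuple per word."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word,)


def count_bolt(stream):
    """Bolt: keeps a running count and emits an updated tuple after every word."""
    counts = {}
    for (word,) in stream:
        counts[word] = counts.get(word, 0) + 1
        yield (word, counts[word])


# The "topology" is just the wiring, written by hand: spout -> split -> count.
topology = count_bolt(split_bolt(sentence_spout()))
emitted = list(topology)
print(emitted)
```

There is no final answer at the end of the chain. The counting step emits an updated tuple after every single word, which mirrors how results keep flowing for as long as the stream does.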

Storm has a *Supervisor* that keeps track of the workers, and a *Nimbus Node* that tracks the jobs for each worker.  You can also make use of Zookeeper with Storm to reduce the chances that the Nimbus Node will fail.

---

## How to Work in Storm

Honestly, working in Storm will most likely take you outside the role of data scientist and into the role of developer or database architect.  Although you can run Storm in any language you want, by and large, folks interact with it in Java.  And you typically need to interact with one of Storm's two APIs: *Storm Core* and *Trident*.  Although both work, Storm Core provides you with data at a lower, less-processed level.  When using Storm Core, you have a high probability of receiving individual pieces of data more than once.  Trident is a higher-level API that puts an extra layer between you and the data, so that it is processed enough to reduce the chances of accidental duplication.
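
To see why receiving the same piece of data twice matters, consider this plain-Python sketch. It is not how Storm Core or Trident is implemented. The `payments` data, the `id` field, and the helper functions are assumptions made for the example. It only contrasts a total computed straight from a stream with repeats against one that skips repeated ids.

```python
def deliver_with_repeats(messages, redeliver_ids):
    """Yield every message, repeating those whose ids are marked for redelivery."""
    for message in messages:
        yield message
        if message["id"] in redeliver_ids:
            yield message            # the same message arrives a second time


def total_naive(stream):
    # Adds up whatever arrives, duplicates included.
    return sum(message["amount"] for message in stream)


def total_deduplicated(stream):
    seen_ids = set()                 # remember which ids were already processed
    total = 0
    for message in stream:
        if message["id"] in seen_ids:
            continue                 # skip the repeated message
        seen_ids.add(message["id"])
        total += message["amount"]
    return total


payments = [{"id": 1, "amount": 10}, {"id": 2, "amount": 25}, {"id": 3, "amount": 5}]

print(total_naive(deliver_with_repeats(payments, redeliver_ids={2})))         # 65
print(total_deduplicated(deliver_with_repeats(payments, redeliver_ids={2})))  # 40
```

The true total is 40. The naive version overstates it by counting payment 2 twice, which is the kind of accidental duplication a higher-level API is meant to protect you from.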

The choice of gathering data using Kafka and then processing in real-time with Storm is a popular one.

---

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 7 - Flink<a class="anchor" id="DS107L6_page_7"></a>

[Back to Top](#DS107L6_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">


# Flink

*Flink* is another fully real-time data streaming framework, similar to Storm. No microbatches here! But it is faster than Storm, and in big data, being fast can save you not only time, but also money. It is also fully scalable - you can add thousands of nodes if you want and need them. It also has very strong fault tolerance and is built to ensure that you only process each data point once (sometimes called *exactly-once processing*). You can also utilize Scala with Flink, similar to Spark Streaming - but again with event-based real-time data rather than microbatches.

<div class="panel panel-success">
    <div class="panel-heading">
        <h3 class="panel-title">Fun Fact!</h3>
    </div>
    <div class="panel-body">
        <p>Flink is a German word meaning "quick and nimble!"</p>
    </div>
</div>

---

## Flink Architecture

Flink is just a run-time engine that can be run on top of a variety of different platforms, including Hadoop/YARN, AWS, and Google Cloud. It has two APIs that you can use to process either streaming or batch data.

* **DataStream API:** This includes something for event processing called *CEP* and something for querying, called *Table*. 
* **DataSet API:** This helps deal with batch data, and includes *FlinkML* for machine learning, *Gelly* for graph processing, and *Table* for querying.

Flink connects to just about anything you can think of, including:

* HDFS
* Cassandra
* Kafka

---

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 8 - Cluster Redundancy<a class="anchor" id="DS107L6_page_8"></a>

[Back to Top](#DS107L6_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">


In [4]:
from IPython.display import VimeoVideo
# Tutorial Video Name: Working with Real Time Data
VimeoVideo('388357657', width=720, height=480)

The transcript for the above overview video **[is located here](https://repo.exeterlms.com/documents/V2/DataScience/Video-Transcripts/DSO107L06overview.zip)**.

# Cluster Redundancy

One of the most important things about Hadoop is that it is resilient to failure.  When you have so much data, you would **really** feel the loss if something happened to it. It's never good to lose any data due to hardware failure or software glitches, but with a smaller dataset, you may be able to recreate what you've lost or soldier on without it.  If you have all the data your company has ever collected tied up in one system, and it consists of billions of records...well, no amount of overtime is going to help you recollect that data. That data is probably the product of a decade or more of data collection and the definitive record for the company.

Since you don't want anything to happen to your data, most big data systems have at least some built-in redundancy mechanisms.

<div class="panel panel-success">
    <div class="panel-heading">
        <h3 class="panel-title">Additional Info!</h3>
    </div>
    <div class="panel-body">
        <p>You may want to watch this <a href="https://vimeo.com/459422096"> recorded live workshop on the concepts in this lesson. </a> </p>
    </div>
</div>

---

## Essential Operations in a Distributed System

Whenever you have data distributed across multiple nodes, like in Hadoop, there are some essential things that must be done in order to make sure things run smoothly. Hadoop takes care of all this in the background, so you don't have to worry about it, but it is important to know. The following are redundancy operations that must take place:

* Election of a Master Node
* Detection of crashes and communication failures
* Group management - determine which nodes are available when
* Creation of *metadata* that tracks outstanding tasks and task assignments

*Metadata* is basically data about data. 

---

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 9 - Zookeeper<a class="anchor" id="DS107L6_page_9"></a>

[Back to Top](#DS107L6_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">


# Zookeeper

Hadoop already is pretty resilient, but it has one place in which it is not *redundant*, meaning it does not have a backup system - the Master Node. If you remember back a few lessons, the Master Node keeps track of the rest of the cluster.  It has a list of where data is located and what is being done with that data. With such an important job, you can imagine how much trouble your cluster would be in if the Master Node failed. 

Luckily, there is a solution to this single point of failure in the Master Node: *Zookeeper*. Although you can only have one Master Node at a time (ever have the confusion of reporting to two bosses?) Zookeeper keeps a backup Master Node in reserve and keeps track of the processes that are taking place on your active Master Node, so that the backup can hopefully seamlessly take its place in the event of failure.  Zookeeper logs the following information: 

* Which node is the Master
* Which tasks are assigned to each worker
* Which workers are currently available

Those are the essentials that it would need to take over operations in case the Master Node fails.

---

## Recovering from Partial Failure

Zookeeper isn't just helpful when the entirety of the Master Node has failed; it can also assist when you have had partial failure throughout your cluster.  Some of the partial failure situations that Zookeeper can help your cluster recover from include:

* Hard drive failure on a node(s)
* Loss of power from storms that temporarily knock out a node(s)
* Nodes getting out of sync with each other (also called *drift*)
* Issues with time changes that leave node(s) out of sync in time

Zookeeper does not just help the core Hadoop system either.  It can also provide backup for the following applications:

* HBase
* High-Availability MapReduce
* Apache Drill
* Apache Storm
* Solr

So when partial or total failure in a node/worker takes place, what does Zookeeper do? Well, it will detect the issue, notify the system, and then the system can redistribute the work appropriately.

---

## Zookeeper Architecture

As with many other big data programs, it is useful to understand a little bit about the architecture behind Zookeeper. The Master and Workers in Hadoop each maintain a connection to Zookeeper through a *ZK Client*.  Zookeeper itself cannot run on a single server, or it won't do any good - what if that server went down? Therefore, Zookeeper runs on several servers and uses a *quorum* system, in which you have an odd number of Zookeeper servers and a majority of them have to agree about the situation.  Otherwise, what would happen if there's a communication error, and Zookeeper gets sucked into it? With a quorum, the majority rules and you reduce the chances of having a split brain scenario where different parts of the cluster are working out of sync with the others. 

How many Zookeeper servers do you need? At least three, which lets the quorum survive one failure. But if you want to prepare for the possibility of more than one server failing at once, you'll need at least five.
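
The "three versus five" rule follows from simple majority arithmetic, sketched below in plain Python. The function names are made up for this example, and the sketch assumes that a quorum means a strict majority of the members.

```python
def quorum_size(member_count):
    """Smallest group that is a strict majority of all members."""
    return member_count // 2 + 1


def tolerated_failures(member_count):
    """How many members can be lost while a majority can still agree."""
    return member_count - quorum_size(member_count)


for size in (3, 4, 5):
    print(size, quorum_size(size), tolerated_failures(size))
```

With three members, two must agree, so one failure is survivable. With five, three must agree, so two failures are survivable. Four members still only survive one failure, which is why odd numbers are the usual choice.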

---

### Znodes

Zookeeper stores its information in something called a *znode*, and your applications can ask to be notified when a znode changes, so that you get updates about the status of your cluster. This removes the need for you to continuously ping your cluster for updates to see that all is going well (you don't want to be that annoying helicopter parent, do you?) and thus can save resources. There are two types of znodes:

* **Ephemeral znodes:** Ephemeral means fleeting, so these go away automatically when the client that created them crashes or loses its connection.  Anything watching that znode is notified immediately.
* **Persistent znodes:** Always around until explicitly deleted. Helps assign tasks to workers in the event that the Master Node goes down and can help the new Master pick up from where the old one left off. 
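
In Zookeeper itself, an ephemeral znode is removed automatically when the session that created it ends, which is what makes it useful for noticing a failed Master. The toy model below imitates that behavior in plain Python. It is not the real Zookeeper API: `TinyZooKeeper`, the paths, and the node names are all hypothetical.

```python
class TinyZooKeeper:
    """Toy model of znodes and one-time notifications; not the real ZooKeeper API."""

    def __init__(self):
        self.znodes = {}       # path -> (data, owner session, or None if persistent)
        self.watchers = {}     # path -> callbacks to run when the znode disappears

    def create(self, path, data, session=None):
        # session=None means persistent; otherwise the znode is ephemeral.
        if path in self.znodes:
            return False
        self.znodes[path] = (data, session)
        return True

    def watch(self, path, callback):
        self.watchers.setdefault(path, []).append(callback)

    def end_session(self, session):
        # Ephemeral znodes vanish with their owner; persistent ones stay.
        for path, (_, owner) in list(self.znodes.items()):
            if owner == session:
                del self.znodes[path]
                for callback in self.watchers.pop(path, []):
                    callback(path)


zk = TinyZooKeeper()
zk.create("/tasks/job-1", "assigned to worker-3")      # persistent
zk.create("/master", "node-a", session="node-a")       # ephemeral
events = []
zk.watch("/master", lambda path: events.append(path + " is gone"))

zk.end_session("node-a")                                # node-a crashes
zk.create("/master", "node-b", session="node-b")       # the backup takes over
print(events, zk.znodes["/master"][0], "/tasks/job-1" in zk.znodes)
```

The watcher is told about the failure without ever polling, and the persistent task assignment is still there for the new Master to pick up.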

<div class="panel panel-success">
    <div class="panel-heading">
        <h3 class="panel-title">Fun Fact!</h3>
    </div>
    <div class="panel-body">
        <p>Zookeeper starts automatically with your Hortonworks Sandbox!</p>
    </div>
</div>

---

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 10 - Key Terms<a class="anchor" id="DS107L6_page_10"></a>

[Back to Top](#DS107L6_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Key Terms

Below is a list and short description of the important keywords learned in this lesson. Please read through and go back and review any concepts you do not fully understand. Great Work!

<table class="table table-striped">
    <tr>
        <th>Keyword</th>
        <th>Description</th>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Streaming</td>
        <td>Working with real-time data.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Internet of Things</td>
        <td>The network of "smart" devices.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Machine Data</td>
        <td>Data generated by devices.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Real-time Data</td>
        <td>Data that is collected and/or analyzed live.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Kafka</td>
        <td>Big data program to stream data into your cluster.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Publishers/Producers</td>
        <td>Things generating data in Kafka.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Consumers</td>
        <td>Things using data in Kafka.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Stream Processor</td>
        <td>Transforms unstructured data as it comes into Kafka.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Flume</td>
        <td>Big data program to stream data into your cluster that was built for Hadoop.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Source</td>
        <td>Where the data comes from in Flume.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Sink</td>
        <td>Where the data flows to in Flume.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Channel</td>
        <td>File or memory transfer area in Flume.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Interceptors</td>
        <td>Allow you to transform data before it gets to the sink in Flume.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Avro</td>
        <td>Allows for multi-tiered Flume systems.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Spark Streaming</td>
        <td>Spark for processing real-time data.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Batch</td>
        <td>Chunk of data from a certain time period.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Microbatch</td>
        <td>Batch that can span a second or less.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Discretized Stream (DStream)</td>
        <td>Transforms your microbatch into an RDD in Spark.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Stateful Data</td>
        <td>Data that persists beyond the batch.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Window</td>
        <td>Glimpse in time at streamed data.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Batch Interval</td>
        <td>How often data comes into your stream.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Slide Interval</td>
        <td>How often you aggregate/transform data in a window.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Window Interval</td>
        <td>How far back in time you gather data.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Structured Streaming</td>
        <td>Append new rows to DataSets in Spark as data comes in.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Apache Storm</td>
        <td>Big data program to process data in real-time.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Stream</td>
        <td>Data flowing into your cluster in Storm.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Spout</td>
        <td>Data source in Storm.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Bolt</td>
        <td>Processing step that transforms and/or aggregates stream data in Storm.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Topology</td>
        <td>Graph that you build yourself to chain spouts and bolts together in Storm.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Supervisor</td>
        <td>Keeps track of Storm workers.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Nimbus Node</td>
        <td>Keeps track of Storm jobs on the workers.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Storm Core</td>
        <td>A Storm API that provides data at a less processed level.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Trident</td>
        <td>A Storm API that provides data at a more processed level.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Flink</td>
        <td>Big data program to process data in real time.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>Exactly-Once Processing</td>
        <td>Ensure you don't have accidental duplicates in your data.</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>DataStream API</td>
        <td>A Flink API that does event processing (truly real-time).</td>
    </tr>
    <tr>
        <td style="font-weight: bold;" nowrap>DataSet API</td>
        <td>A Flink API that processes batch data.</td>
    </tr>
</table>

<hr style="height:10px;border-width:0;color:gray;background-color:gray">

# Page 11 - Lesson 6 Hands-On<a class="anchor" id="DS107L6_page_11"></a>

[Back to Top](#DS107L6_toc)

<hr style="height:10px;border-width:0;color:gray;background-color:gray">


In this lesson, you've learned all about how to work with data in real time. Now it's time to make sense of all that knowledge with a Hands-On project! This Hands-On **will** be graded, so make sure you complete each part.

<div class="panel panel-danger">
    <div class="panel-heading">
        <h3 class="panel-title">Caution!</h3>
    </div>
    <div class="panel-body">
        <p>Do not submit your project until you have completed all requirements, as you will not be able to resubmit.</p>
    </div>
</div>

---

# Description

Please compare and contrast the various programs available for working with real-time data.  You should **at least** be able to answer the following questions in your written response:

* Which programs behave similarly?
* How do these programs differ from each other?
* If you could only pick one program, which one would you choose and why?

<div class="panel panel-danger">
    <div class="panel-heading">
        <h3 class="panel-title">Caution!</h3>
    </div>
    <div class="panel-body">
        <p>Be sure to zip and submit your entire directory when finished!</p>
    </div>
</div>