# Table of Contents
* [Assignment 3: Visualization](#Assignment-3:-Visualization)
	* [Objective](#Objective)
		* [Visualization](#Visualization)
	* [Part 1: Dynamic Visualization with D3](#Part-1:-Dynamic-Visualization-with-D3)
		* [Prerequisite: D3 Basics](#Prerequisite:-D3-Basics)
		* [Task B. Dynamic Visualization using "transition"](#Task-B.-Dynamic-Visualization-using-"transition")
		* [Task C. Dynamic Visualization using "selection.exit"](#Task-C.-Dynamic-Visualization-using-"selection.exit")
		* [Task D. Dynamic Visualization using "selection.on"](#Task-D.-Dynamic-Visualization-using-"selection.on")
		* [Where To Go From Here (Optional)](#Where-To-Go-From-Here-%28Optional%29)
	* [Part 2: Data analysis with Matplotlib](#Part-2:-Data-analysis-with-Matplotlib)
	* [Submission](#Submission)


# Assignment 3: Visualization

## Objective

The main focus of [our course](https://courses.cs.sfu.ca/2018sp-cmpt-733-g1) is data analytics. There are, however, many other exciting Big Data topics that we cannot cover due to time constraints. Lecture 3 gave you a brief overview of visualization, and Assignment 3 is designed to deepen your understanding. After completing this assignment, you should be able to answer the following questions:

### Visualization

1. Why Visualization? 
2. Why D3?
3. How to create a static visualization using D3?  
4. How to create a dynamic visualization using D3?  
5. How to create an interactive visualization using D3?
6. How to perform visual data analysis using Python?

## Part 1: Dynamic Visualization with D3

Data visualization (a.k.a. dataviz) is an important skill for data scientists. It not only helps data scientists tell a more vivid [story](https://www.youtube.com/watch?feature=player_embedded&v=jbkSRLYSojo) about their findings, but also reveals [interesting patterns](http://www.qualia.hr/the-power-of-data-visualization-anscombes-story/) that cannot be found through typical summary statistics.

There are a large number of [dataviz tools](http://www.computerworld.com/article/2506820/business-intelligence/business-intelligence-chart-and-image-gallery-30-free-tools-for-data-visualization-and-analysis.html?nsdr=true) available. In this assignment, we are going to learn Data-Driven Documents (D3), one of the most popular ones. D3 is a JavaScript library for manipulating documents based on data. People choose D3 for various reasons. What I like most about D3 is that it makes creating __dynamic__ data visualizations on the Web much easier. In Part 1, you will learn three methods, __"transition", "selection.exit", and "selection.on"__, for turning a static visualization into a dynamic one.

### Prerequisite: D3 Basics

The key idea behind D3 is to manipulate DOM elements in a webpage based on input data. Thus, before you create visualizations using D3, the first question is to ask what your data is; the second one is to ask what DOM elements you want to bind your data to; the third one is to ask how to update the elements to reflect the changes of data. 
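
The three questions above correspond to what D3 calls a data join. As a rough illustration only (plain Python, not D3 code), the sketch below splits keys into the groups that D3 names enter, update, and exit. The function name `data_join` and the string keys are assumptions made for this example; D3 itself matches data to elements by index unless a key function is supplied.

```python
def data_join(bound_keys, new_keys):
    """Split keys into the three groups of a keyed data join.

    bound_keys: keys currently attached to elements on the page.
    new_keys:   keys of the incoming data.
    """
    bound = set(bound_keys)
    new = set(new_keys)
    enter = [k for k in new_keys if k not in bound]    # data with no element yet
    update = [k for k in new_keys if k in bound]       # data matched to an existing element
    exit_ = [k for k in bound_keys if k not in new]    # elements whose data is gone
    return enter, update, exit_


enter, update, exit_ = data_join(["a", "b", "c"], ["b", "c", "d"])
print("enter:", enter)    # new elements to create
print("update:", update)  # existing elements to modify
print("exit:", exit_)     # stale elements to remove
```

In D3 terms, the enter group is where elements get appended, the update group is where attributes get refreshed, and the exit group is what `selection.exit` hands you for removal.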

Here is a pretty cool tutorial. Please watch the first hour.


```python
from IPython.display import YouTubeVideo
YouTubeVideo("8jvoTV54nXw")
```

I created a static visualization ([base.html](files/base.html)) using D3. Please download it and read the source code. You should have no trouble understanding the code if you have watched the video.


"base.html" will be your starting point. The goal of the following tasks is to make "base.html" dynamic using three different methods. While these tasks may look of little use from a data analysis point of view, you can easily extend them to more realistic charts (e.g., replacing each character with a bar).

### Task B. Dynamic Visualization using "transition"

In Task B, you need to modify the "base.html" file to make the visualization behave as follows.

<img src="img/b.gif"/>

At the beginning, the text is on the left side and is black. After waiting for 1 sec, it moves 100px from left to right over a duration of 1 sec. Once it arrives at the right side, the color of each character changes from black to its original color. Please note that the above GIF repeats the move, but you only need to move it ONCE.
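
To make the timing concrete, here is a small Python model of the behaviour described above (it is not D3 code). The function name `task_b_state` and the linear interpolation are assumptions for illustration; D3 transitions apply an easing curve by default, so the intermediate positions in your page will differ from this linear sketch.

```python
def task_b_state(t, delay=1.0, duration=1.0, distance=100.0):
    """Return (x offset in px, coloured?) at t seconds after the page loads.

    Assumes a linear move; the text is coloured only once the move has finished.
    """
    # Fraction of the move completed, clamped to the range [0, 1].
    progress = min(max((t - delay) / duration, 0.0), 1.0)
    return distance * progress, progress >= 1.0


for t in (0.0, 1.0, 1.5, 2.0, 3.0):
    offset, coloured = task_b_state(t)
    print(f"t={t:.1f}s  offset={offset:.0f}px  coloured={coloured}")
```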

__Hints:__
* Please take a look at [Transitions](https://www.youtube.com/watch?v=EpeOzq8eDYk&list=PL6il2r9i3BqH9PmbOf5wA5E1wOG3FT22p&index=8)

__Submission:__
* Name your file B.html and submit it to the CourSys activity Assignment 3

### Task C. Dynamic Visualization using "selection.exit"

In Task C, you need to modify the "base.html" file to make the visualization behave as follows.

<img src="img/c-new.gif"/>

At the beginning, there is no text. Every 0.5 sec, there is a new character (including space) showing up from left to right. Once all the characters show up, each character will disappear one by one from right to left (every 0.5 sec). Please note that the above gif figure will repeat this process, but you only need to do it ONCE. That is, once all the characters disappear, there is no more change to the visualization. 
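
The expected sequence of frames can be written down in a few lines of Python (again a model of the behaviour, not D3 code). The helper name `text_frames` and the sample string are made up for this sketch, and it assumes the first character disappears one tick after the last one appears.

```python
def text_frames(text):
    """Return the visible text at each 0.5 s tick.

    The text grows from left to right, then shrinks from right to left,
    ending with an empty string.
    """
    grow = [text[:i] for i in range(1, len(text) + 1)]
    shrink = [text[:i] for i in range(len(text) - 1, -1, -1)]
    return grow + shrink


frames = text_frames("D3!")
for tick, frame in enumerate(frames, start=1):
    print(f"t={tick * 0.5:.1f}s  {frame!r}")
```

Each frame is the data array you would bind at that tick: growing frames produce an enter selection, shrinking frames produce an exit selection.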

__Hints:__
* Please take a look at [Examples 65-67](https://youtu.be/8jvoTV54nXw?t=55m52s) in the above tutorial.

__Submission:__
* Name your file C.html and submit it to the CourSys activity Assignment 3

### Task D. Dynamic Visualization using "selection.on"

In Task D, you need to modify the "base.html" file to make the visualization behave as follows.

<img src="img/d.gif"/>

At the beginning, all characters are black. When you move the mouse over a character, its color changes from black to its original color. When you move the mouse out of the character, its color changes back to black after waiting for 1 sec.


__Submission:__
* Name your file D.html and submit it to the CourSys activity Assignment 3

### Where To Go From Here (Optional)

Here are some good resources to continue your study on D3:

* Learning by books: [D3 Tutorial](http://alignedleft.com/tutorials/d3/), [Interactive Data Visualization for the Web](http://chimera.labs.oreilly.com/books/1230000000345/index.html)
* Learning by examples: [D3 Gallery](https://github.com/mbostock/d3/wiki/Gallery)
* D3 is a low-level visualization tool (think of it as MapReduce). There are many high-level tools built upon D3 (think of them as Spark, Hive, etc.). Here are some good ones: [Vega](http://vega.github.io), [d3.chart](http://misoproject.com/d3-chart/), and [mpld3](http://mpld3.github.io/).

## Part 2: Data analysis with Matplotlib

Revisit [Assignment 5B from CMPT 732 - Weather prediction](https://coursys.sfu.ca/2017fa-cmpt-732-g2/pages/Assignment5B#h-predicting-weather-how-hard-can-it-be) and show a deeper analysis of the same temperature data utilizing the code you already have.

**Data**

The weather data on HDFS `/courses/732/tmax-{1,2,3}` spans the period from 2000 to 2016 and covers many stations around the globe. There are many possible questions to study. Use a Python plotting library of your choice, such as matplotlib.

**Task**

**a)** Produce at least **two figures** that illustrate the **max. temperature distribution over the entire globe** and enable a comparison of **different non-overlapping time periods**. Only show temperatures where you have data available. Here is an example from the web:
<img src="http://c3headlines.typepad.com/.a/6a010536b58035970c013486e5c5e6970c-pi"/>
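
One possible way to structure (a), sketched under assumptions: the records below are made-up `(latitude, longitude, year, tmax)` tuples rather than the real HDFS data, and the helper names `mean_tmax_by_station` and `plot_periods` are hypothetical. The sketch averages tmax per location within each period and draws one scatter panel per period on a shared colour scale so the panels are directly comparable.

```python
from collections import defaultdict


def mean_tmax_by_station(records, first_year, last_year):
    """Average tmax per (lat, lon) over first_year..last_year inclusive."""
    sums = defaultdict(lambda: [0.0, 0])  # location -> [running total, count]
    for lat, lon, year, tmax in records:
        if first_year <= year <= last_year:
            entry = sums[(lat, lon)]
            entry[0] += tmax
            entry[1] += 1
    return {loc: total / count for loc, (total, count) in sums.items()}


def plot_periods(period_means, titles, filename):
    """Draw one scatter panel per period, all sharing one colour scale."""
    # Imported here so the aggregation above also runs without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    all_values = [v for means in period_means for v in means.values()]
    vmin, vmax = min(all_values), max(all_values)
    fig, axes = plt.subplots(1, len(period_means), figsize=(12, 4),
                             sharex=True, sharey=True, squeeze=False)
    for ax, means, title in zip(axes[0], period_means, titles):
        lats = [lat for lat, _ in means]
        lons = [lon for _, lon in means]
        points = ax.scatter(lons, lats, c=list(means.values()), s=4,
                            cmap="coolwarm", vmin=vmin, vmax=vmax)
        ax.set_title(title)
        ax.set_xlabel("longitude")
    axes[0][0].set_ylabel("latitude")
    fig.colorbar(points, ax=list(axes[0]), label="mean tmax")
    fig.savefig(filename, dpi=150)
    plt.close(fig)


# Made-up sample records: (lat, lon, year, tmax).
records = [
    (49.3, -123.1, 2001, 12.0),
    (49.3, -123.1, 2002, 14.0),
    (49.3, -123.1, 2012, 15.0),
    (-33.9, 151.2, 2003, 22.0),
    (-33.9, 151.2, 2013, 24.0),
]
early = mean_tmax_by_station(records, 2000, 2007)
late = mean_tmax_by_station(records, 2008, 2016)
print(early)
print(late)
```

Calling `plot_periods([early, late], ["2000-2007", "2008-2016"], "tmax_periods.png")` would then write both panels to one image. With the full dataset you would likely do the aggregation in Spark and only collect the per-station means for plotting.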

**b)** Produce two or more figures that show the result of your trained regression model from CMPT 732-A5b:

**(b1)** Evaluate your model at a grid of latitude, longitude positions around the globe leading to a dense plot of temperatures that includes oceans. This could, for instance, look something like the following and may have some gaps, if you decide not to interpolate between grid points:
<img src="http://www.physicalgeography.net/fundamentals/images/jan_temp.gif"/>
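
A minimal sketch of the grid evaluation in (b1), under stated assumptions: `toy_model` is a hypothetical stand-in for your trained regressor (its formula and feature list are invented here), elevation is fixed at sea level for brevity, and the grid spacing is far coarser than you would use for a dense plot.

```python
import math


def make_grid(lat_step=10.0, lon_step=10.0):
    """Return (lat, lon) pairs covering the globe at the given spacing in degrees."""
    n_lat = int(round(180.0 / lat_step)) + 1   # include both poles
    n_lon = int(round(360.0 / lon_step))       # skip +180, it repeats -180
    lats = [-90.0 + i * lat_step for i in range(n_lat)]
    lons = [-180.0 + j * lon_step for j in range(n_lon)]
    return [(lat, lon) for lat in lats for lon in lons]


def toy_model(lat, lon, elevation):
    """Hypothetical stand-in for a trained model: warm at the equator, cooler with height.

    lon is accepted but unused; a real model would use all of its features.
    """
    return 30.0 * math.cos(math.radians(lat)) - 0.0065 * elevation


grid = make_grid(lat_step=30.0, lon_step=60.0)
# Elevation is set to 0.0 here; real values would come from the elevation data.
predictions = [toy_model(lat, lon, elevation=0.0) for lat, lon in grid]
print(len(grid), "grid points, first:", grid[0], "last:", grid[-1])
```

Because the grid is built row by row (all longitudes for one latitude, then the next), `predictions` can be reshaped into a latitude-by-longitude array and drawn with, for example, matplotlib's `pcolormesh` or `scatter`.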
  
**(b2)** In a separate plot show the regression error of your model predictions against test data. In this case only use locations where data is given, i.e. you may reuse your plotting method from Part 2 (a).
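
For (b2), the quantity to plot can be computed as below. This is a sketch on made-up numbers: the tuples in `test_points` and the helper names `residuals` and `rmse` are assumptions for illustration, and the sign convention (prediction minus actual) is one choice among several.

```python
import math


def residuals(actual, predicted):
    """Signed error per point: positive means the model predicted too warm."""
    return [p - a for a, p in zip(actual, predicted)]


def rmse(errors):
    """Root mean squared error, a single-number summary for the caption."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Made-up test points: (lat, lon, actual tmax, predicted tmax).
test_points = [
    (49.3, -123.1, 12.0, 14.0),
    (-33.9, 151.2, 24.0, 22.0),
    (0.0, 0.0, 30.0, 30.0),
]
errors = residuals([a for _, _, a, _ in test_points],
                   [p for _, _, _, p in test_points])
# Symmetric colour limits keep zero error at the centre of a diverging colormap.
limit = max(abs(e) for e in errors)
print("errors:", errors)
print("RMSE:", round(rmse(errors), 3), "colour range:", (-limit, limit))
```

Passing `errors` as the colour values to a scatter plot over the station coordinates, with `vmin=-limit` and `vmax=limit` and a diverging colormap such as `"coolwarm"`, shows where the model runs too warm or too cold.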

**Comments and Hints**

Note that any imperfections of your trained model that show up are fine. You are not marked for your previous model again, but rather for the methods you create to investigate it. 

Do not worry about overlaying continent or country borders on your map, but make sure you use enough points such that the shape of some continents roughly emerges from the data distribution. You may train and visualize on any data subset with at least 100k rows.

For (b1) you will need elevation information for the points you produce. Have a look at [`elevation_grid.py`](elevation_grid.py) for a possible way to add this info to your choice of coordinates. If you place the accompanying [elevation data](elevations_latlon.npy.gz) in the same folder as the script, you can import the module and see `help(elevation_grid)` for example usage.
<img src="img/elevations.png"/>
`elevation_grid.py` internally stores elevation data as an array at 5 times the resolution of the figure shown here. Use the `get_elevations` function to access it.

**Submission** 

Combine the plots into a PDF document `weather_report.pdf` along with **brief captions explaining the figures**.
Please provide your code as `weather_plot.py`. Your code may import [`weather_hint.py`](weather_hint.py) and [`weather_tools.py`](weather_tools.py), which were part of the old assignment CMPT 732-5B. However, ensure that all new code relevant for marking is in `weather_plot.py`. A Jupyter notebook `weather_report.ipynb` that contains the markdown to render and discuss the figures saved by `weather_plot.py` can also be submitted.

## Submission

In summary, you need to complete three tasks using D3 and one task using a Python data plotting library. Please submit <font color="blue">B.html</font>, <font color="blue">C.html</font>, <font color="blue">D.html</font>, <font color="blue">weather_plot.py</font>, and <font color="blue">weather_report.pdf</font> to the CourSys activity [Assignment 3](https://courses.cs.sfu.ca/2018sp-cmpt-733-g1/+a3/).