# Data Viz Intro

Welcome to our presentation! We will cover some of the theory and principles of data visualization so that you can make better use of the visualization libraries available in Python and R.

## Data Visualization in Python

Visualization tools like Voyager and Raw Graphs are great for "Thinking Visually" and prototyping. For most real work, though, it helps to use a programming language: data often needs to be cleaned or reshaped before it can be plotted, and those steps are easy to apply programmatically. A language's visualization libraries also offer far more options than a single web tool can provide. Let's get started using Python to plot some of our data.

In this example, we will be plotting temperature data for the first day of summer in Tucson, Arizona.

We start by importing additional packages necessary for this visualization.

In [None]:
import pandas
import seaborn

print("Packages loaded!")

Next, we load data and look at the first few rows.

In [None]:
tucson = pandas.read_csv("data/tucson-summer.csv")
tucson.head()

Our first plot shows the daily minimum temperature on the summer solstice over time. The seaborn function `regplot` draws a scatterplot and automatically adds the line from a linear regression.

In [None]:
seaborn.regplot(x = "year", y = "tmin", data = tucson)
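To make the regression line less of a black box, here is a minimal sketch of an ordinary least-squares fit for a straight line, written in plain Python. The years and temperatures are invented for illustration and are not taken from `data/tucson-summer.csv`. The helper name `least_squares_line` is also hypothetical, and seaborn's own implementation differs in detail (for example, it also bootstraps a confidence interval).

```python
# Made-up example values (NOT from data/tucson-summer.csv)
years = [2000, 2001, 2002, 2003, 2004]
tmin = [70.0, 72.0, 71.0, 74.0, 73.0]

def least_squares_line(x, y):
    """Return (slope, intercept) of the ordinary least-squares line through x, y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = co-variation of x and y divided by the variation of x
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = least_squares_line(years, tmin)
print(f"Estimated change: {slope:.2f} degrees F per year")
```

The slope is the quantity to read off the plot: it estimates how much the temperature changes per year.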

We should update our axis labels using the `set` method. Note that here we also assign the plot to a variable (`tucson_plot`) so that we can manipulate it.

In [None]:
tucson_plot = seaborn.regplot(x = "year", y = "tmin", data = tucson)
tucson_plot.set(ylabel = "Minimum temperature (F)")

Next, we will modify our code to change what we plot on the y-axis. In this case we want to plot the maximum temperature (`tmax`) on the y-axis. Update the code below to change the values we are plotting on the y-axis, and update the axis label to match.

In [None]:
tucson_plot = seaborn.regplot(x = "year", y = "tmin", data = tucson)
tucson_plot.set(ylabel = "Minimum temperature (F)")

By default, `regplot` adds a linear regression line and a confidence interval. The relationship may not be linear, so try a polynomial fit by adding `order = 2` to the `regplot` call (immediately following the data specification).

In [None]:
# Paste your code from above here, and update
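According to the seaborn documentation, `regplot` uses `numpy.polyfit` to estimate the curve when `order` is greater than 1. The sketch below shows that kind of fit on its own, without any plotting. The data values are invented so that the answer is easy to check, and they are not taken from `data/tucson-summer.csv`.

```python
import numpy

# Made-up example values (NOT from data/tucson-summer.csv)
years = numpy.array([0.0, 1.0, 2.0, 3.0, 4.0])
temps = numpy.array([1.0, 2.0, 5.0, 10.0, 17.0])  # exactly years**2 + 1

# Degree-2 fit; coefficients come back highest power first: a*x**2 + b*x + c
a, b, c = numpy.polyfit(years, temps, 2)

# Evaluate the fitted curve at the original x values
fitted = numpy.polyval([a, b, c], years)
print(numpy.round([a, b, c], 3))
```

With real, noisy data the fitted curve will not pass through every point. Comparing it with the straight-line fit is one way to judge whether the extra curvature is worth keeping.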

To finish off this plot, we want to write it to a PNG file.

In [None]:
tucson_plot = # Copy and paste the plot code from the code block above

# Leave this line as is
tucson_plot.get_figure().savefig("output/tucson-plot.png")

After updating and running this last block of code, you can click the Jupyter logo in the top-left part of this notebook to see files that you can download (including the one we just saved). You can open a new tab with this view by right-clicking or control-clicking the icon.

## What How Why
### Important general principles in visualization

The following section of the presentation is based on slides from [Dr. Joshua Levine](https://jalevine.bitbucket.io/) (University of Arizona), and ideas from [Dr. Tamara Munzner](https://www.cs.ubc.ca/~tmm/) (University of British Columbia).

### Practice Identifying Data Attribute types

In the cells below, please load a dataframe of our Arizona city temperatures. Then form guesses about which columns are:

* Categorical
* Ordinal
* Quantitative

Identifying the attribute type is often the starting point for the visualization process.

If you don't see a text editor space below, make sure to execute the cell.


In [3]:
%%html
<iframe src="https://etherpad.wikimedia.org/p/dsf-dvr" frameborder="0" width="100%" height="569" allowfullscreen="true" mozallowfullscreen="true" webkitallowfullscreen="true"></iframe>

In [None]:
import pandas as pd
csv_path = "data/arizona-heat.csv"
df = pd.read_csv(csv_path)
df

Write your guesses as comma-separated strings in each list below. For example, `"id"` is already listed as a categorical column.

In [None]:
categorical_columns = ["id"]
ordinal_columns = []
quantitative_columns = []

print("categorical data")
print(df[categorical_columns])

print("ordinal data")
print(df[ordinal_columns])

print("quantitative data")
print(df[quantitative_columns])
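One way to sanity-check guesses like these is to look at each column's dtype. The sketch below is a rough heuristic rather than a definitive rule. The helper `guess_attribute_type` and the miniature table, including its column names, are hypothetical and are not taken from `data/arizona-heat.csv`.

```python
import pandas as pd

# Hypothetical miniature table; the column names are made up for illustration
# and are NOT taken from data/arizona-heat.csv.
example = pd.DataFrame({
    "city": ["Tucson", "Phoenix", "Flagstaff"],
    "heat_level": pd.Categorical(
        ["high", "extreme", "low"],
        categories=["low", "high", "extreme"],
        ordered=True,
    ),
    "tmax": [104.0, 110.0, 82.0],
})

def guess_attribute_type(column):
    """Return a rough first guess at a column's attribute type from its dtype."""
    if isinstance(column.dtype, pd.CategoricalDtype):
        # An ordered categorical carries a ranking, so treat it as ordinal
        return "ordinal" if column.dtype.ordered else "categorical"
    if pd.api.types.is_numeric_dtype(column):
        return "quantitative"
    # Text and other non-numeric columns default to categorical
    return "categorical"

guesses = {name: guess_attribute_type(example[name]) for name in example.columns}
print(guesses)
```

Treat the result only as a starting point. For instance, an identifier stored as an integer would be guessed as quantitative even though it is really categorical, so the final call still depends on what the column means.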


### Conceptual Models & Data Models

In the space below, fill out your guesses about the data model and the conceptual model in play for slide 23.

If you don't see a text editor space below, make sure to execute the cell, then scroll down to the Conceptual Models & Data Models exercise.


In [4]:
%%html
<iframe src="https://etherpad.wikimedia.org/p/dsf-dvr" frameborder="0" width="100%" height="569" allowfullscreen="true" mozallowfullscreen="true" webkitallowfullscreen="true"></iframe>

## How: Visual Encoding

Here we will discuss, in general terms, the options available for pairing your data and its attribute types with the visual primitives common to all plotting libraries.
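As a concrete illustration, the sketch below writes down one possible mark-and-channel breakdown for the `regplot` figure from the first section. The dictionary layout and its key names are hypothetical, chosen only for this example. They are not part of seaborn or any other plotting library.

```python
# Hypothetical description of the earlier regplot figure; the structure and
# key names are illustrative only, not an API of seaborn or any other library.
regplot_encoding = {
    "points": {  # mark: one point per row of the data
        "horizontal position": "year",
        "vertical position": "tmin",
    },
    "line": {  # mark: the fitted regression line
        "horizontal position": "year",
        "vertical position": "fitted tmin",
    },
}

# List which data attribute each channel of each mark encodes
for mark, channels in regplot_encoding.items():
    for channel, attribute in channels.items():
        print(f"mark={mark}: {channel} encodes {attribute}")
```

Writing an encoding down this way makes it easy to spot channels that are still unused, such as color or size, and that could carry another attribute.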

For each plot (1-4) on slide 29, list the marks and channels you believe are in use.

If you don't see a text editor space below, make sure to execute the cell, then scroll down to the Visual Encoding exercise.

* 1
    * Marks
    * Channels
* 2
    * Marks
    * Channels
* 3
    * Marks
    * Channels
* 4
    * Marks
    * Channels
  

In [5]:
%%html
<iframe src="https://etherpad.wikimedia.org/p/dsf-dvr" frameborder="0" width="100%" height="569" allowfullscreen="true" mozallowfullscreen="true" webkitallowfullscreen="true"></iframe>

## Why: Task Abstraction and Action Target Pairs

Now that we have an understanding of the general categories of data and how to connect them with visual primitives, let's talk about how to frame the objectives that a visualization is meant to support. These objectives are often referred to as **"Tasks"** that a viewer is meant to accomplish via the visualization.

Dr. Tamara Munzner proposed an abstraction in which each Task is made up of two parts: an Action and a Target. Together these form a Pair, and giving them some thought often helps organize our approach when developing a visualization.

See if you can identify some of the tasks that the visualization on slide 36 was made to accomplish. Write them out as Action Target pairs. One example is provided below to get you started. If you're feeling constrained, just write the task normally, and then see whether an action and a target come to you afterward.

* Task 1:
    * A viewer might be interested in identifying the high and low temperatures for a day in Tucson
    * Action: Explore
    * Target: Outliers

In [6]:
%%html
<iframe src="https://etherpad.wikimedia.org/p/dsf-dvr" frameborder="0" width="100%" height="569" allowfullscreen="true" mozallowfullscreen="true" webkitallowfullscreen="true"></iframe>

## Getting more hands on with the data 

At this point in the presentation, we would like you to go through the visualization design process using either of the CSV files provided.

Try to form a statement about both the **what** and the **why** (in either order).

Then brainstorm with hand drawings, Voyager, or Raw Graphs. Once you have completed this, attempt to break the visualization down into the marks and channels that are at play.