# CDS-301/501 - Scientific Information and Data Visualization

<img src="./images/intro-0.png" alt="drawing" align="left" style="width: 800px;"/>

**From the course catalog:** The techniques and software used to visualize scientific simulations, complex information, and data visualization for knowledge discovery. Includes examples and exercises to help students develop their
understanding of the role visualization plays in computational science
and provides a foundation for applications in their careers. 


In this course our goal will be to allow our viewers to "Look at Data".  To do this effectively we have to (paraphrasing Edward Tufte and others...):

- Show the data
- Keep the focus on the data rather than the context, the technology, the production, etc...
- Avoid distorting the data in our visualizations
- Present the (perhaps large set of) data in as compact and concise a manner as possible
- Invite comparison between different groups of data
- Understand our own purpose - are we describing the data, exploring the data, trying to convey insight to others, etc...

**"Excellence in statistical graphics consists of complex ideas communicated with clarity, precision, and efficiency"**
    -- Edward R. Tufte from *The Visual Display of Quantitative Information (1983)*

# Objectives for today

- Review syllabus
- Introduce term project
- Discuss software tools / environment needed for the course
- Understand why visualization is important
- Understand Tableau basics and exercise it on a few examples
- Understand moving from data to visualization
- Understand the basics of visual perception and color vision deficiencies (CVD)
- Understand where visualization falls within the Data Science Lifecycle
- Review a basic visualization checklist
- Use Tableau to build a choropleth 
- Discuss reading assignment and homework

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

# Section - Why is Visualization Important?

# <img src="./images/why-viz-0.png" alt="Drawing" align="left" style="width: 800px"/>

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

<img src="./images/why-viz-1.png" alt="drawing" align="left" style="width: 800px"/>

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

<img src="./images/why-viz-2.png" alt="Drawing" align="left" style="width: 800px"/>

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

<img src="./images/why-viz-3.png" alt="Drawing" align="left" style="width: 800px"/>

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

<img src="./images/why-viz-4.png" alt="Drawing" align="left" style="width: 800px"/>

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

# Famous (Constructed) Example - Anscombe's Quartet (1973)

Anscombe's Quartet consists of 4 separate, two-variable (x and y here) datasets that have the same mean and standard deviation (to two decimal places)

<img src="./images/anscombes-0.png" alt="drawing" align="left" style="width: 400px;"/>

With nearly identical descriptive statistics

<img src="./images/anscombes-2.png" alt="drawing" align="left" style="width: 800px;"/>

But when visualized:

<img src="./images/anscombes-1.png" alt="drawing" align="left" style="width: 600px;"/>
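The summary statistics can also be checked numerically. The following is a minimal Python sketch that assumes the published Anscombe (1973) values, typed in by hand; the names `QUARTET`, `pearson_r` and `summarize` are illustrative and not part of the course materials.

```python
from statistics import mean, stdev

# Anscombe's quartet (Anscombe, 1973). Sets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from its definition."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def summarize(xs, ys):
    """Descriptive statistics for one two-variable dataset."""
    return {
        "mean_x": mean(xs),
        "mean_y": mean(ys),
        "sd_x": stdev(xs),  # sample standard deviation
        "sd_y": stdev(ys),
        "r": pearson_r(xs, ys),
    }


SUMMARY = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}

for name, stats in SUMMARY.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

The four rows printed should agree to two decimal places, even though the scatterplots look nothing alike. That is the point of the example: the numbers alone do not reveal the shape of the data.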


<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

# Visualizing Voter Fraud

<img src="./images/voter-fraud-0.jpg" alt="Drawing" align="left" style="width: 400px;"/>

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

# Measles Vaccine

<img src="./images/measles-0.png" alt="drawing" align="left" style="width: 1000px;"/>

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

# Cholera - London 1854 (Dr. John Snow) 

<img src="./images/snow-0.png" alt="drawing" align="left" style="width:800px;"/>

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

# Napoleon's March on Moscow 1812 (Minard)

<img src="./images/minard-0.png" alt="Drawing" align="left" style="width: 1000px"/>

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

# Small Multiples 1878 (Eadweard Muybridge)

Small multiples repeat the chart using the same x and y axes, which allows the panels to be compared easily.  This is also an example of using photographic data.

<img src="./images/small-multiples-0.jpg" alt="Drawing" align="left" style="width: 1000px"/>

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

# Summary - Why Visualize?

- It comes naturally to us and is fast, relying on the visual cortex to quickly integrate/assess information 
- Allows us to "see" patterns, relationships, trends and outliers in data
- Overcomes limitations of our working memory by mapping large amounts of data onto visual channels (aesthetics)
- Allows us to effectively communicate insights to others  

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

# Section - Introduction to Tableau

## Structured Data

<img src="./images/structured-data-0.png" alt="drawing" align="left" style="width:700px" />

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

Quantitative vs. Qualitative Scales

<img src="./images/var-to-scale-map-0.png" alt="drawing" align="left" style="width:800px" />

**Note:** When you connect to a data source in Tableau, Tableau will automatically classify each attribute in the data set as either a ***Dimension*** or a ***Measure***.  Dimensions are Qualitative data while Measures are Quantitative data. Dates are treated as Dimensions by default, but can also be treated as Measures.  Text is typically treated only as a Dimension and, in practice, is often used just to label Marks in a graph.
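To make the Dimension/Measure split concrete, here is a deliberately simplified Python sketch of the idea. It is an assumption-laden illustration and not Tableau's actual classification logic: numeric columns are labeled Measures and everything else (text, dates) is labeled a Dimension. The function `classify_field`, the sample table and the `Recorded` column are hypothetical.

```python
from datetime import date
from numbers import Number


def classify_field(values):
    """Label a column 'Measure' if every non-null value is numeric, else 'Dimension'.

    Simplified heuristic for illustration only; not Tableau's real rules.
    """
    non_null = [v for v in values if v is not None]
    is_numeric = bool(non_null) and all(
        isinstance(v, Number) and not isinstance(v, bool) for v in non_null
    )
    return "Measure" if is_numeric else "Dimension"


# Hypothetical extract shaped like the Anscombe data, plus a made-up date column.
table = {
    "Set": ["I", "I", "II"],
    "X": [10.0, 8.0, 10.0],
    "Y": [8.04, 6.95, 9.14],
    "Recorded": [date(1973, 2, 1), date(1973, 2, 1), date(1973, 2, 2)],
}

roles = {name: classify_field(column) for name, column in table.items()}
print(roles)
```

In this sketch `Set` and `Recorded` come out as Dimensions while `X` and `Y` come out as Measures, mirroring the default behavior described in the note.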

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

## Introduction to Tableau - A simple example...

We will visualize Anscombe's Quartet using Tableau as a simple introduction

### High Level Steps To Follow:
- Install and bring up Tableau (See notes below on installing Tableau)
- Connect to the anscombe.csv dataset
- Go to the first "Sheet" (sheets are where you build visualizations) 
- Drag "X" measure to columns shelf
- Drag "Y" measure to rows shelf
- Note: Tableau, by default, aggregates measures
- Deselect "Aggregate Measures" under the "Analysis" menu
- Change the mark type to a Circle in the drop down at the top of the Marks card
- Drag the Set dimension to the Color shelf on the Marks card
- Drag the Set dimension to the Filter shelf
- Select to Show the Filter (drop down on the attribute sitting in the Filter shelf)
- Fit a line to the dataset by selecting the Analytics tab and dragging a linear trend line onto the sheet
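The final step, fitting a linear model, can be sketched outside Tableau as well. The Python below assumes the model is an ordinary least-squares line, which is what a linear trend line typically computes. It is fitted to two of the four sets, with values typed in from Anscombe (1973); `fit_line` and the variable names are illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Set I: roughly linear with scatter.
x1 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]

# Set IV: every x is 8 except for a single point at x = 19.
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

slope1, intercept1 = fit_line(x1, y1)
slope4, intercept4 = fit_line(x4, y4)

print(f"Set I : y = {slope1:.2f}x + {intercept1:.2f}")
print(f"Set IV: y = {slope4:.2f}x + {intercept4:.2f}")
```

Both sets yield nearly the same fitted line, even though in Set IV the line is determined entirely by one point. Filtering by Set in Tableau shows the same thing visually.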

### Tableau Desktop - Installation Notes

Tableau Desktop is available for Windows and Mac computers and is free for students through Tableau's academic program.  I recommend using the following link to install Tableau if you do not already have a license: https://www.tableau.com/academic/students. Alternatively, searching for "Tableau for students" in your browser should take you to this link.

Click on the button labeled "Get Tableau for Free" and enter the required information.  A license is sent to the e-mail account specified, and you should enter this license into Tableau the first time you start the app.

Note that you do not need to establish an account with Tableau (i.e. an account via the www.tableau.com website) to get the desktop version of Tableau.  However, doing so will give you access to free training, videos, etc., so it's recommended that you do create an account.

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

# Section - Moving from Data to Visualization 

# Visual Perception and Data Visualization 

Our visual system is complex and tries to *actively* construct a representation of what it is looking at.  We will look at the structure of our visual system in more detail in later lectures. For now, we want to look at 3 important aspects of our visual perception that can have both positive and negative effects on the act of seeing data:

1. Low Level Visual Effects - Edges, contrast and color and optical illusions
2. Preattentive Attributes - what "pops out" at you in your visual search
3. Gestalt Rules - how our brains constantly try to "find" structure in what we are looking at

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

# Low Level Visual Effects 

Hermann Grid Effect (1870) - note "ghosts" at intersections unless you focus on the intersection

<img src="./images/grid-effect-0.png" alt="Drawing" align="left" style="width: 300px" />


<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

Mach Bands

<img src="./images/mach-bands-0.png" alt="drawing" align="left" style="width: 400px" />

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

Adelson Checkerboard Illusion - Are A & B the same grayscale brightness?

<img src="./images/checkerboard-0.png" alt="drawing" align="left" style="width: 400px" />

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

To see it...block out everything that surrounds A & B (I used Visio...I added some transparency so you can see what I did)

<img src="./images/checkerboard-1.png" alt="Drawing" align="left" style="width: 400px" />

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

We see edge contrast better in monochrome than in color:

<img src="./images/ware-contrast-0.png" alt="Drawing" align="left" style="width: 200px" />

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

# Preattentive Attributes

- Preattentive Attributes are visual elements or properties that we notice without conscious effort.  
- Rapidly processed by our brains (on the order of a couple hundred milliseconds)
- Form important building blocks for visualizations

<img src="./images/preattentive-0.png" alt="Drawing" align="left" style="width: 600px" />

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

# Gestalt Rules 

- Our brain is constantly trying to find structure in what we are looking at
- We make strong and subconscious inferences regarding the visual components in a scene
- These inferences, made when relatively sparse visual information is available, are called ***Gestalt Rules***
- These are not purely perceptual effects - we infer relationships between objects that go beyond what is strictly visible
- Gestalt rules are applicable to related fields such as User Interface (UI) design, User Experience (UX) design, and visual design in general.


Consider the figures below - which has more structure?

<img src="./images/gestalt-structure-0.png" alt="Drawing" align="left" style="width: 600px" />


<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

# Gestalt Rules 

1. Proximity - Things that are spatially near to one another seem to be related
2. Similarity - Things that look alike seem to be related
3. Connection - Things that are visually tied to one another seem to be related
4. Continuity - Partially hidden objects are completed into familiar shapes
5. Closure - Incomplete shapes are perceived as complete
6. Figure and Ground - Visual elements are taken to be either in the foreground or in the background
7. Common Fate - Elements sharing a direction of movement are perceived as a unit

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

# Gestalt Rules - Some Examples... 

Proximity - Things that are spatially near to one another seem to be related:

<img src="./images/gestalt-proximity-0.png" alt="drawing" align="left" style="width: 400px" />

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

Similarity - Things that look alike seem to be related:

<img src="./images/gestalt-similarity-0.png" alt="drawing" align="left" style="width: 300px" />


<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

Connection - Things that are visually tied to one another seem to be related:

<img src="./images/gestalt-connectivity-0.png" alt="drawing" align="left" style="width: 300px" />

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

Continuity - Partially hidden objects are completed into familiar shapes:

<img src="./images/gestalt-continuity-0.png" alt="Drawing" align="left" style="width: 300px" />


<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

Closure - Incomplete shapes are perceived as complete:

<img src="./images/gestalt-closure-2.png" alt="drawing" align="left" style="width: 300px" />


<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

Figure and Ground - Visual elements are taken to be either in the foreground or in the background:

<img src="./images/gestalt-fg-0.png" alt="drawing" align="left" style="width: 300px" />


<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

Common Fate - Elements sharing a direction of movement are perceived as a unit:

<img src="./images/gestalt-common-fate-0.png" alt="drawing" align="left" style="width: 300px" />


<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

# Mapping Data onto Aesthetics

# Common Aesthetics

<img src="./images/aesthetics-0.png" alt="drawing" align="left" style="width: 600px" />

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

# Coordinate Systems

# Cartesian Coordinate System

<img src="./images/cartesian-0.png" alt="drawing" align="left" style="width: 600px" />

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

Generally it is ok to stretch an axis if the attributes on each axis are using different scales

<img src="./images/stretch-0.png" alt="drawing" align="left" style="width: 600px" />

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />


Here is an example where stretching an axis is probably not appropriate:

<img src="./images/stretch-1.png" alt="drawing" align="left" style="width: 600px" />

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

<img src="./images/linear-log-scales-0.png" alt="drawing" align="left" style="width: 600px" />

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

## In class Demo: Virginia County Population (Example of Log Scales in Tableau) 
- Open Tableau and connect to va-pop-2018-extract-0.xlsx (Excel)
- Note: Median value for VA counties is 26575
- Follow in class demo...(Wilke shows a similar approach for Texas counties - see textbook)
- Build map of VA counties and color by population
- Build a line chart of VA counties - define a calculated field using median value - display on a log y axis
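The calculated field in the last step can be sketched as follows, assuming it is the ratio of each county's population to the median. The median (26575) comes from the note above, while the three county populations are hypothetical placeholders rather than real Virginia values. The log10 of the ratio shows why a log axis suits this kind of data: equal multiplicative steps map to equal distances.

```python
import math

MEDIAN_POPULATION = 26575  # median for VA counties, from the demo notes

# Hypothetical populations for illustration only; not real Virginia counties.
populations = {
    "Small County": 2_200,
    "Median County": 26_575,
    "Large County": 1_150_000,
}

# Calculated field: population relative to the median county.
ratio_to_median = {
    name: pop / MEDIAN_POPULATION for name, pop in populations.items()
}

# Position on a log10 axis: 0 is the median, -1 is a tenth, +1 is ten times.
log_position = {name: math.log10(r) for name, r in ratio_to_median.items()}

for name in populations:
    print(f"{name:14s} ratio={ratio_to_median[name]:8.3f} "
          f"log10={log_position[name]:6.2f}")
```

On a linear axis the large county would compress every other county into a thin band near zero. On the log axis, counties below and above the median spread out symmetrically around it.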

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

<img src="./images/polar-0.png" alt="drawing" align="left" style="width: 600px" />

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

<img src="./images/polar-ex-0.png" alt="drawing" align="left" style="width: 600px" />

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

# Color Scales

## Uses of Color
- Color to Distinguish
- Color to Represent Data Values
    - Sequential Scales
    - Diverging Scales
- Color to Highlight
- Color Deficiencies

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

# Color to Distinguish

<img src="./images/qualitative-scales-0.png" alt="drawing" align="left" style="width: 600px" />  

# Color to Distinguish (Tableau)

<img src="./images/tableau-qualitative-color-picker-0.png" alt="drawing" align="left" style="width: 400px" />


<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

# Color to Represent Data - Sequential Scales

<img src="./images/sequential-scales-0.png" alt="drawing" align="left" style="width: 600px" />

# Color to Represent Data - Sequential Scales (Tableau)

<img src="./images/tableau-sequential-color-picker-0.png" alt="drawing" align="left" style="width: 300px" />

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

# Color to Represent Data - Diverging Scales


<img src="./images/diverging-scales-0.png" alt="drawing" align="left" style="width: 600px" />

# Color to Represent Data - Diverging Scales (Tableau)


<img src="./images/tableau-divergen-color-picker-0.png" alt="drawing" align="left" style="width: 300px" />

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

# Color to Highlight

<img src="./images/accent-scales-0.png" alt="drawing" align="left" style="width: 600px" />

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

# How many variables are mapped in this visualization?

<img src="./images/scales-ex-0.png" alt="drawing" align="left" style="width: 600px" />

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

## Student Exercise 1

Using the mtcars.csv dataset, try to replicate the visualization above using Tableau

Note: For this exercise, you can leave the weight as a float variable - no need to convert to an integer as is done above

- Hints:
    - Start by selecting the Marks type as "Shape" from the Marks card
    - This is basically a scatterplot, so turn off aggregate measures
    - Then drag the appropriate variable to the columns and rows shelves
    - Then drag the other variables to the correct shelves on the Marks card (color, size, symbol)
    - Double click on each axis to modify the origin and tick marks to match the diagram
    - Then, do further refinements (Add title, change color scheme, etc...)

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

# Color Vision Deficiencies (CVD)

- Approximately 8% of males and 0.5% of females suffer from some form of color-vision deficiency.  Percentages can go up in certain regions (e.g. Scandinavia, 10-11% of men).

- Note: Suffix "-anomaly" refers to some impairment, while "-anopia" means complete absence

- Definitions:
    - deuteranomaly (most common form of CVD) - means green weak
    - deuteranopia - means green blind
    - protanomaly - means red weak
    - protanopia - means red blind
    - tritanomaly (very rare) - means blue weak
    - tritanopia (very rare) - means blue blind
    - Achromatopsia/Monochromacy (very rare) - complete color blindness 
    - Trichromacy - normal color vision   


- There are also acquired forms of CVD due to aging

- The eye contains two types of photoreceptors - Rods and Cones
- Rods are responsible for reception at low light levels and have low spatial resolution
- Cones are responsible for color vision and have high spatial resolution 

- There are 3 types of cones, and each has a frequency response curve:
<img src="./images/cones-response-0.png" alt="Drawing" align="left" style="width: 600px" />

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

# Color Scales for CVD

Sequential scales (if properly designed) generally do not cause a problem w.r.t CVD 

<img src="./images/cvd-0.png" alt="drawing" align="left" style="width: 500px" />
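One rough way to check whether a sequential scale is "properly designed" is to confirm that its colors change steadily in lightness, since lightness differences tend to survive most forms of CVD better than hue differences. The Python sketch below uses the standard sRGB relative-luminance formula as a proxy for lightness. The five hex values are an illustrative light-to-dark blue ramp chosen for this example, not a palette taken from the course materials, and the check does not replace a CVD simulation.

```python
def srgb_to_linear(channel):
    """Convert one 0-255 sRGB channel to linear light (sRGB transfer function)."""
    c = channel / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    """Relative luminance (0 = black, 1 = white) of a '#RRGGBB' color."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return (0.2126 * srgb_to_linear(r)
            + 0.7152 * srgb_to_linear(g)
            + 0.0722 * srgb_to_linear(b))


def is_monotonic(values):
    """True if the values are strictly increasing or strictly decreasing."""
    pairs = list(zip(values, values[1:]))
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)


# Illustrative light-to-dark blue ramp (assumed values, not from the slides).
ramp = ["#F7FBFF", "#C6DBEF", "#6BAED6", "#2171B5", "#08306B"]
luminances = [relative_luminance(color) for color in ramp]

for color, lum in zip(ramp, luminances):
    print(color, round(lum, 3))
print("monotonic in luminance:", is_monotonic(luminances))
```

A ramp that fails this check relies on hue alone somewhere along its length, which is where viewers with CVD are most likely to lose the ordering.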

Divergent and (especially) qualitative scales are more difficult.  Note here how the divergent scale completely disappears for deuteranomaly CVD:


<img src="./images/cvd-1.png" alt="drawing" align="left" style="width: 500px" />

And in this case, a blue-green contrast divergent scale becomes indistinguishable under tritanomaly:


<img src="./images/cvd-2.png" alt="drawing" align="left" style="width: 500px" />

An example of a divergent scale that works for most CVD:

<img src="./images/cvd-3.png" alt="drawing" align="left" style="width: 500px" />

Qualitative scales are the most challenging since all colors need to be distinguished from one another.  A qualitative scale developed for CVD:

<img src="./images/cvd-5.png" alt="drawing" align="left" style="width:500px" />

But because color perception is also dependent on the size of the visual elements shown, it's best to run your visualization through a CVD simulation

<img src="./images/cvd-4.png" alt="drawing" align="left" style="width:500px" />

An online drag and drop simulator: [Coblis Color Blindness Simulator](https://www.color-blindness.com/coblis-color-blindness-simulator/) 

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

# Section - Data Science and Visualization

# <img src="./images/ds-0.png" alt="drawing" align="left" style="width:500px" />

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

<img src="./images/ds-1.png" alt="drawing" align="left" style="width:800px" />

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

<img src="./images/ds-2.png" alt="drawing" align="left" style="width:800px" />

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

# Section -  Visualization Checklist (The Basics)

- Is the title present?

- Can the subject and insights be rapidly understood? (5 Second Test) (Explanatory Visualizations)
    - Note that you should ask others to review your visualizations regarding this point
   
   
- Is the data being visualized relevant to the intended subject?

- Are the axes correct and clearly labeled (if they are not implied)?
    - Examples where axes are often strongly implied are geographic and time series contexts  
   
   
- Are the fonts used consistent and of appropriate size to be readable?

    - Be sure to evaluate this in the context of the document you are producing - not the graphical tool used  
  
     
- Is there a good balance of context and data?

- Has the use of gradient or highly saturated backgrounds been avoided?

- Are the color scales used appropriate and have they been reviewed via CVD simulation?

- Has the principle of proportional ink been met?

- Is there a high data-to-ink ratio?

- Has the use of gratuitous 3D been avoided?

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

# Critique 1

<img src="./images/critique-0.png" alt="Drawing" align="left" style="width: 600px" />

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

# Critique 2

<img src="./images/critique-1.png" alt="Drawing" align="left" style="width: 800px" />

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

# Critique 3

<img src="./images/critique-2.jpg" alt="Drawing" align="left" style="width: 600px" />

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

# Critique 4

<img src="./images/critique-3.png" alt="Drawing" align="left" style="width: 600px" />

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

# A Tableau Geospatial Example (Per Country Metric) (Choropleth)

<img src="./images/latency-0.png" alt="Drawing" align="left" style="width:1000px" />

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

# References

Title: Fundamentals of Data Visualization  
Author: Claus O. Wilke   
Publisher: O'Reilly Media Inc.  
Edition: First Edition  
ISBN: 9781492031086  
<img src="./images/wilke-0.png" alt="Drawing" align="left" style="width:200px" />    

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />

# Reading Assignment 



- Read Wilke - Chapters 2, 3, and 4 (E-book available via the Library)
- And: The Good, the Bad, and the Biased (In reading folder)

<img src="./images/szafir-0.png" alt="drawing" align="left" style="width:200px" />

<img src="./images/div-0.png" alt="Drawing" align="left" style="width: 800px" />