# Graphical Excellence
In this notebook we will look at some Python code that illustrates key concepts from chapter 1 of *The Visual Display of Quantitative Information* (VDQI) by Edward Tufte.


In [None]:
# We will be making use of the following libraries
import pandas as pd # similar to "spreadsheets" 
import numpy as np # operations on lists of numbers
import matplotlib.pyplot as plt # render charts in the notebook

#### First we will need to import some data
Below we load files into pandas 'data frame' objects, which are like virtual spreadsheets.

In [None]:
# There are several small datasets in the /data folder
# they need to be named so we can refer to them later on
# we will call them dataset_a, dataset_b, etc...

dataset_a = pd.read_csv('data/mysterydata.txt', sep="\t", header=0)
dataset_b = pd.read_csv('data/causation-data.csv', sep=",", header=0)
# ^variable ^function from the pd module      ^argument 1       ^arg 2    ^arg 3

# here we declare variables (dataset_*) and
# use a pandas library function that reads a file in,
# then creates a virtual spreadsheet we can inspect and even manipulate
# the sep="\t" tells pandas that the columns are separated by tabs
# (sep="," does the same for comma-separated files)
# header=0 tells pandas that the first row of the file holds the column labels
# (header=None would instead mean that the columns aren't labelled)
# the function returns a "dataframe", which we store in dataset_*
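
To see how `sep` and `header` change the result, here is a minimal sketch that reads a tiny in-memory table instead of the files in `data/`. The text, column names and values are made up for this illustration and are not taken from the notebook's datasets.

```python
import io
import pandas as pd

# Hypothetical tab-separated text whose first row holds the column labels
tsv_text = "A\tB\n1\t10\n2\t20\n3\t30\n"

# header=0: row 0 of the text becomes the column labels
with_labels = pd.read_csv(io.StringIO(tsv_text), sep="\t", header=0)

# header=None: every row is treated as data and the columns are numbered 0, 1, ...
without_labels = pd.read_csv(io.StringIO(tsv_text), sep="\t", header=None)

print(with_labels.columns.tolist())     # labels taken from the first row
print(with_labels.shape)                # 3 data rows, 2 columns
print(without_labels.columns.tolist())  # automatic integer labels
print(without_labels.shape)             # the label row is now counted as data
```

With `header=None` the label row is read as ordinary data, so the frame gains one row and loses its column names. That is why the option should match what the file actually contains.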

#### Now our data is loaded into memory (RAM)!
Let's print some of it out to make sure...

In [None]:
# To see what's "inside" a variable you can just type its name
dataset_b

In [None]:
# There are however more informative items available

# the "shape" of the data is its number of rows and columns
dataset_a.shape

In [None]:
# we can also look at some more technical information

dataset_a.info()
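
A few other inspection tools are worth knowing. The sketch below uses a small made-up data frame (`demo` is a hypothetical stand-in, not one of the notebook's datasets) to show how `shape`, `head`, `dtypes` and `describe` report on the same table.

```python
import pandas as pd

# Hypothetical stand-in for one of the loaded datasets
demo = pd.DataFrame({"A": [1.0, 2.0, 4.0, 8.0], "B": [3, 1, 4, 1]})

# shape is a (rows, columns) tuple, so it can be unpacked into two names
n_rows, n_cols = demo.shape
print(n_rows, n_cols)

print(demo.head(2))     # only the first two rows
print(demo.dtypes)      # the data type of each column
print(demo.describe())  # count, mean, std, min, quartiles and max per column
```

The same calls should work on `dataset_a` and `dataset_b`, since they are ordinary pandas data frames.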

## Why Graphics?
- What *exactly* happens when we make graphics from numbers?

In [None]:
dataset_a

In [None]:
plt.plot(dataset_a.index, dataset_a['A'])
plt.xlabel("Time")
plt.ylabel("Mystery")
plt.title("One advantage of Data Viz")
plt.show()

## Correlation is not Causation

In [None]:
plt.plot(dataset_b.index, dataset_b['quaternion-2'])
#plt.plot(dataset_b.index, dataset_b['quaternion-1'])
plt.xlabel("Time")
plt.ylabel("Mystery")
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
plt.xticks(range(0,2800,250), months)
plt.title("Correlation is not Causation")
plt.show()
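
One way to make the heading concrete is to build two series that have nothing to do with each other except a shared trend over time. The sketch below uses synthetic data only (the names `series_1`, `series_2` and all values are assumptions for this illustration, not columns of `dataset_b`) and computes the Pearson correlation with pandas' `Series.corr`.

```python
import numpy as np
import pandas as pd

# Fixed seed so the synthetic example is repeatable
rng = np.random.default_rng(seed=0)
t = np.arange(100)

# Two independent noise sources: neither one influences the other
noise_1 = rng.normal(0, 5, size=t.size)
noise_2 = rng.normal(0, 5, size=t.size)

# Both series simply grow with time, at different rates
demo = pd.DataFrame({
    "series_1": 1.0 * t + noise_1,
    "series_2": 0.5 * t + noise_2,
})

# Pearson correlation of the two trending series
r_trend = demo["series_1"].corr(demo["series_2"])

# Correlation of what is left once the shared time trend is removed
r_noise = pd.Series(noise_1).corr(pd.Series(noise_2))

print(f"with shared trend:    r = {r_trend:.2f}")
print(f"without shared trend: r = {r_noise:.2f}")
```

The trending series correlate strongly even though neither causes the other; time drives both. Once the trend is taken out, the correlation largely disappears.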