<h1>Table of Contents: ggplot &amp; Altair<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#ggplot" data-toc-modified-id="ggplot-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>ggplot</a></span><ul class="toc-item"><li><span><a href="#The-Grammar-of-Graphics" data-toc-modified-id="The-Grammar-of-Graphics-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>The Grammar of Graphics</a></span></li><li><span><a href="#Imports" data-toc-modified-id="Imports-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Imports</a></span></li><li><span><a href="#Sample-Code" data-toc-modified-id="Sample-Code-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Sample Code</a></span><ul class="toc-item"><li><span><a href="#Datasets-&amp;-Aes-(variables)" data-toc-modified-id="Datasets-&amp;-Aes-(variables)-1.3.1"><span class="toc-item-num">1.3.1&nbsp;&nbsp;</span>Datasets &amp; Aes (variables)</a></span></li><li><span><a href="#Geometries" data-toc-modified-id="Geometries-1.3.2"><span class="toc-item-num">1.3.2&nbsp;&nbsp;</span>Geometries</a></span></li><li><span><a href="#Adding-statistics" data-toc-modified-id="Adding-statistics-1.3.3"><span class="toc-item-num">1.3.3&nbsp;&nbsp;</span>Adding statistics</a></span></li><li><span><a href="#Coordinates" data-toc-modified-id="Coordinates-1.3.4"><span class="toc-item-num">1.3.4&nbsp;&nbsp;</span>Coordinates</a></span></li><li><span><a href="#Facets" data-toc-modified-id="Facets-1.3.5"><span class="toc-item-num">1.3.5&nbsp;&nbsp;</span>Facets</a></span></li><li><span><a href="#Themes" data-toc-modified-id="Themes-1.3.6"><span class="toc-item-num">1.3.6&nbsp;&nbsp;</span>Themes</a></span></li></ul></li><li><span><a href="#Saving-a-plot-to-a-file" data-toc-modified-id="Saving-a-plot-to-a-file-1.4"><span class="toc-item-num">1.4&nbsp;&nbsp;</span>Saving a plot to a file</a></span></li><li><span><a href="#ggplot-Exercise" data-toc-modified-id="ggplot-Exercise-1.5"><span class="toc-item-num">1.5&nbsp;&nbsp;</span>ggplot Exercise</a></span></li></ul></li><li><span><a 
href="#Altair" data-toc-modified-id="Altair-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Altair</a></span><ul class="toc-item"><li><span><a href="#Altair-Exercise" data-toc-modified-id="Altair-Exercise-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Altair Exercise</a></span></li></ul></li></ul></div>

# ggplot

## The Grammar of Graphics

An introduction to the concept: https://www.science-craft.com/2014/07/08/introducing-the-grammar-of-graphics-plotting-concept/

The original paper: https://vita.had.co.nz/papers/layered-grammar.pdf

<img src="https://drive.google.com/uc?id=19ZTLGWjlInppscQxq7XrtRGX2fQxEVm2" width=600 />

![grammar.png](images/grammar.png)

1. **Data**: the dataset used to create the plot.

2. **Aesthetics** (aes): the visual variables used by the underlying drawing system. Columns of the dataset are mapped to aesthetics such as the x- and y-axis positions.

3. **Geometric objects** (geoms): the type of geometric object to draw. You can use points, lines, bars, and many others.

<hr>

4. **Facets**: allow data to be divided into groups, with each group plotted on a separate panel in the same graphic.

5. **Statistical transformations**: specify computations and aggregations to be applied to the data before plotting it.

6. **Coordinate systems**: map the position of objects to a 2D graphical location in the plot.

7. **Themes**: control visual properties like colors, fonts, and shapes (aka non-data ink).

## Imports

In [None]:
#import sys
import random

from plotnine import ggplot, geom_point, aes, geom_line, geom_bar 
from plotnine import stat_bin, theme, theme_538, theme_xkcd, geom_histogram # on 2 lines for clarity
import pandas as pd
import numpy as np

#import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib as mpl

from plotnine.data import mpg, huron, economics, diamonds

## Sample Code

In [None]:
dpi = 72
size_inches = (11, 8)                                       # size in inches (for the plot)
size_px = int(size_inches[0]*dpi), int(size_inches[1]*dpi)  # For the canvas


n = 100
x = np.linspace(0, 2 * np.pi, n)
df = pd.DataFrame({
    'x': x,
    'y1': np.random.rand(n),
    'y2': np.sin(x),
    'y3': np.cos(x) * np.sin(x)
    })

# Pick a different dependent variable and color each time this cell is run
y = random.choice(['y1', 'y2', 'y3'])
color = random.choice(['blue', 'red', 'green'])

# Specify the plot and get the matplotlib figure object
ff = (ggplot(df, aes('x', y))
    + geom_point(color=color)
    + geom_line()
    + theme(figure_size=size_inches,dpi=dpi))
fig = ff.draw()

### Datasets & Aes (variables)
- diamonds
- economics
- mpg
- huron

In [None]:
diamonds

In [None]:
mpg

In [None]:
huron

### Geometries
- geom_point
- geom_bar
- geom_histogram
- geom_boxplot


In [None]:
(
    ggplot(df)  # The data to use
    + aes(x="x", y="y3")  # The variables to use
    + geom_line()  # The geometric objects to use
    )

In [None]:
# Data

ggplot(diamonds)

In [None]:
# Data & aesthetics

ggplot(diamonds) + aes(x="cut", y="price")

In [None]:
# Data, aesthetics & geometries

ggplot(mpg) + aes(x="class", y="hwy") + geom_point()

In [None]:
ggplot(mpg) + aes(x="class") + geom_bar()

### Adding statistics

In [None]:
ggplot(diamonds) + aes(x="price") + stat_bin(bins=20) + geom_bar()

In [None]:
ggplot(diamonds) + aes(x="price") + geom_histogram(bins=20)

In [None]:
# This import is here to show that additional geoms, stats, etc. can be imported when needed.

from plotnine import ggplot, aes, geom_boxplot

(
  ggplot(huron)
  + aes(x="factor(decade)", y="level")
  + geom_boxplot()
)

In [None]:
from plotnine import ggplot, aes, scale_x_timedelta, labs, geom_line

(
    ggplot(economics)
    + aes(x="date", y="pop")
    + scale_x_timedelta(name="Years since 1970")
    + labs(title="Population Evolution", y="Population")
    + geom_line()
)


### Coordinates

In [None]:
# Default coordinates are used.

ggplot(diamonds) + aes(x="color") + geom_bar()

In [None]:
from plotnine import ggplot, aes, geom_bar, coord_flip

ggplot(diamonds)  + aes(x="color") + geom_bar() + coord_flip()

### Facets

In [None]:

from plotnine import facet_grid, labs

(
    ggplot(mpg)
    + facet_grid(facets="year~class")
    + aes(x="displ", y="hwy")
    + labs(
        x="Engine Size",
        y="Miles per Gallon",
        title="Miles per Gallon for Each Year and Vehicle Class",
    )
    + geom_point()

)

### Themes

In [None]:
(
    ggplot(mpg)
    + facet_grid(facets="year~class")
    + aes(x="displ", y="hwy")
    + labs(
        x="Engine Size",
        y="Miles per Gallon",
        title="Miles per Gallon for Each Year and Vehicle Class",
    )
    + geom_point()
    + theme_538()
)

In [None]:
(
    ggplot(mpg)
    + aes(x="cyl", y="hwy", color="class")
    + labs(
        x="Engine Cylinders",
        y="Miles per Gallon",
        color="Vehicle Class",
        title="Miles per Gallon for Engine Cylinders and Vehicle Classes",
    )
    + geom_point()
)

## Saving a plot to a file

In [None]:
myPlot = ggplot(economics) + aes(x="date", y="pop") + geom_line()
myPlot.save("myplot.png", dpi=600)

## ggplot Exercise

1. Use the built-in dataset 'midwest'.
2. Create a simple bar chart showing the number of rows by state.
3. Create a visualization showing a plot for each state (along the y-axis). On that plot show a jitter plot (a version of a scatter plot) of the number of adults by percent professional for each county.

In [None]:
# Put bar chart here

In [None]:
# Put state plots here

# Altair

Altair is a Python library designed for statistical visualization. Its API is declarative rather than the more common imperative style. According to the Altair project, this lets developers 'declare' what they want to do, whereas an imperative API focuses on how to do it.

Altair is built around pandas DataFrames.

https://altair-viz.github.io/

In [1]:
import altair as alt
from plotnine.data import midwest

In [2]:
midwest.head()

| | PID | county | state | area | poptotal | popdensity | popwhite | popblack | popamerindian | popasian | ... | percollege | percprof | poppovertyknown | percpovertyknown | percbelowpoverty | percchildbelowpovert | percadultpoverty | percelderlypoverty | inmetro | category |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 561 | ADAMS | IL | 0.052 | 66090 | 1270.96154 | 63917 | 1702 | 98 | 249 | ... | 19.631392 | 4.355859 | 63628 | 96.274777 | 13.151443 | 18.011717 | 11.009776 | 12.443812 | 0 | AAR |
| 1 | 562 | ALEXANDER | IL | 0.014 | 10626 | 759.0 | 7054 | 3496 | 19 | 48 | ... | 11.243308 | 2.870315 | 10529 | 99.087145 | 32.244278 | 45.826514 | 27.385647 | 25.228976 | 0 | LHR |
| 2 | 563 | BOND | IL | 0.022 | 14991 | 681.409091 | 14477 | 429 | 35 | 16 | ... | 17.033819 | 4.488572 | 14235 | 94.956974 | 12.068844 | 14.036061 | 10.85209 | 12.69741 | 0 | AAR |
| 3 | 564 | BOONE | IL | 0.017 | 30806 | 1812.11765 | 29344 | 127 | 46 | 150 | ... | 17.278954 | 4.1978 | 30337 | 98.477569 | 7.209019 | 11.179536 | 5.536013 | 6.217047 | 1 | ALU |
| 4 | 565 | BROWN | IL | 0.018 | 5836 | 324.222222 | 5264 | 547 | 14 | 5 | ... | 14.475999 | 3.36768 | 4815 | 82.50514 | 13.520249 | 13.022889 | 11.143211 | 19.2 | 0 | AAR |


In [3]:
alt.Chart(midwest).mark_bar().encode(
    alt.X('state'),
    y='count()'
)

In [4]:
IL = midwest[midwest['state'] == 'IL']

In [None]:
alt.Chart(IL).mark_point().encode(
    alt.X('percollege'), # percent college
    alt.Y('percprof')  # percent professional
)

In [None]:
alt.Chart(IL).mark_point(filled=False).encode(
    alt.X('percollege'), # percent college
    alt.Y('percprof'),  # percent professional
    alt.Size('poptotal')
)

In [None]:
alt.Chart(IL).mark_point(filled=True).encode(
    alt.X('percollege'), # percent college
    alt.Y('percprof'),  # percent professional
    alt.Size('poptotal'),
    alt.Color('popdensity'),
    alt.OpacityValue(0.7)
)

In [5]:
alt.Chart(IL).mark_point(filled=True).encode(
    alt.X('percollege'), # percent college
    alt.Y('percprof'),  # percent professional
    alt.Size('poptotal'),
    alt.Color('popdensity'),
    alt.OpacityValue(0.7),
    tooltip=[
        alt.Tooltip('county'),
        alt.Tooltip('percwhite'),
        alt.Tooltip('percblack'),
        alt.Tooltip('percother'),
    ]
)

In [6]:
alt.Chart(IL).mark_point(filled=True).encode(
    alt.X('percollege'), # percent college
    alt.Y('percprof'),  # percent professional
    alt.Size('poptotal'),
    alt.Color('popdensity'),
    alt.OpacityValue(0.7),
    tooltip=[
        alt.Tooltip('county'),
        alt.Tooltip('percwhite'),
        alt.Tooltip('percblack'),
        alt.Tooltip('percother'),
    ]
).interactive()



## Altair Exercise

- Modify the plot shown above.
- Use the full midwest dataset
- Instead of just circles, change the mark so that each state is represented with a different shape
- Add the state abbreviation to the tooltip
- Try to change the size of the plot

In [None]:
# Altair exercise here.