# Welcome to Data Visualizations with Jupyter Notebooks!

This is a Jupyter Notebook - an interactive environment designed for data exploration and visualization. Each gray textbox you see is editable - double-click this textbox to change what this text says. To "finalize" the text, press Ctrl-Enter.

In addition to textboxes, a Notebook also supports "codeboxes". Here is one example: double-click it to edit what gets printed, then press Ctrl-Enter to run the code in the codebox. The first line (the one that begins with `#`) is a comment: it tells the reader what the code does, but is ignored when the code runs.

In [None]:
# print hello
print("Hello, World!")

The last feature we will use is the "Run All" menu item at the top of the Notebook, under the Cell menu. (Jupyter Notebook calls the textboxes and codeboxes "cells".) It's an easy way to run every codebox in order, from top to bottom, without having to Ctrl-Enter through them manually. You can also change a textbox to a codebox and vice versa through the "Cell Type" menu item, but that won't be necessary for this walkthrough.

Jupyter Notebooks have a lot more features, but that's enough to get us started with data visualization. We will need some code written by other people (the pandas and Bokeh libraries), and the following codebox loads it and defines two small helper functions that turn pasted text into data for plotting. You don't need to worry about understanding it.

In [None]:
from io import StringIO

import pandas as pd
from bokeh.plotting import output_notebook, Figure, ColumnDataSource, show


def dataframe_from_text(text):
    return pd.read_csv(StringIO(text), sep='\t')


def text_data_source(text):
    return ColumnDataSource(dataframe_from_text(text))


output_notebook()
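
If you are curious what the helper does, here is a minimal sketch, assuming only that pandas is installed. `SAMPLE` is a made-up two-row table (not part of this walkthrough's data), and `\t` stands for the tab character that separates columns when you paste from Excel.

```python
from io import StringIO

import pandas as pd


def dataframe_from_text(text):
    # Same helper as in the codebox above: read tab-separated text into a table.
    return pd.read_csv(StringIO(text), sep='\t')


# Hypothetical sample data, only for illustration.
SAMPLE = '''
name\tscore
Ada\t3
Grace\t5
'''

frame = dataframe_from_text(SAMPLE)
# The column names come from the first non-blank line of the text.
print(list(frame.columns))
# Every remaining line becomes one row of the table.
print(len(frame))
```

The blank line right after the opening `'''` is harmless, because pandas skips blank lines by default.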

If you see something like "BokehJS successfully loaded", that means the code ran properly, and we can continue with our data visualization. The next thing we will need is the data we are going to visualize. To make this easy, we are just going to paste the data from Excel directly into a codebox. Here we are creating a `WAGE_DATA` variable, which contains the wages from 1565 to 1813. The name of the variable is important, as we will use it to tell our plots where to get the data from.

In [None]:
WAGE_DATA = '''
year	wages
1565	4.2
1566	4.2
1567	4.2
1568	4.2
1569	4.2
1570	4.3
1571	4.3
1572	4.3
1573	4.3
1574	4.3
1575	4.3
1576	4.36
1577	4.42
1578	4.48
1579	4.54
1580	4.6
1581	4.6
1582	4.6
1583	4.6
1584	4.6
1585	4.6
1586	4.68
1587	4.76
1588	4.84
1589	4.92
1590	5
1591	5
1592	5
1593	5
1594	5
1595	5
1596	5
1597	5
1598	5
1599	5
1600	5
1601	5.04
1602	5.08
1603	5.12
1604	5.16
1605	5.2
1606	5.18
1607	5.16
1608	5.14
1609	5.12
1610	5.1
1611	5.16
1612	5.22
1613	5.28
1614	5.34
1615	5.4
1616	5.42
1617	5.44
1618	5.46
1619	5.48
1620	5.5
1621	5.48
1622	5.46
1623	5.44
1624	5.42
1625	5.4
1626	5.44
1627	5.48
1628	5.52
1629	5.56
1630	5.6
1631	5.6
1632	5.6
1633	5.6
1634	5.6
1635	5.6
1636	5.6
1637	5.6
1638	5.6
1639	5.6
1640	5.6
1641	5.6
1642	5.6
1643	5.6
1644	5.6
1645	5.6
1646	5.72
1647	5.84
1648	5.96
1649	6.08
1650	6.2
1651	6.26
1652	6.32
1653	6.38
1654	6.44
1655	6.5
1656	6.46
1657	6.42
1658	6.38
1659	6.34
1660	6.3
1661	6.38
1662	6.46
1663	6.54
1664	6.62
1665	6.7
1666	6.7
1667	6.7
1668	6.7
1669	6.7
1670	6.7
1671	6.76
1672	6.82
1673	6.88
1674	6.94
1675	7
1676	7.02
1677	7.04
1678	7.06
1679	7.08
1680	7.1
1681	7.22
1682	7.34
1683	7.46
1684	7.58
1685	7.7
1686	7.7
1687	7.7
1688	7.7
1689	7.7
1690	7.7
1691	7.82
1692	7.94
1693	8.06
1694	8.18
1695	8.3
1696	8.42
1697	8.54
1698	8.66
1699	8.78
1700	8.9
1701	9.06
1702	9.22
1703	9.38
1704	9.54
1705	9.7
1706	9.86
1707	10.02
1708	10.18
1709	10.34
1710	10.5
1711	10.7
1712	10.9
1713	11.1
1714	11.3
1715	11.5
1716	11.58
1717	11.66
1718	11.74
1719	11.82
1720	11.9
1721	12.02
1722	12.14
1723	12.26
1724	12.38
1725	12.5
1726	12.54
1727	12.58
1728	12.62
1729	12.66
1730	12.7
1731	12.72
1732	12.74
1733	12.76
1734	12.78
1735	12.8
1736	12.88
1737	12.96
1738	13.04
1739	13.12
1740	13.2
1741	13.34
1742	13.48
1743	13.62
1744	13.76
1745	13.9
1746	14
1747	14.1
1748	14.2
1749	14.3
1750	14.4
1751	14.54
1752	14.68
1753	14.82
1754	14.96
1755	15.1
1756	15.2
1757	15.3
1758	15.4
1759	15.5
1760	15.6
1761	15.7
1762	15.8
1763	15.9
1764	16
1765	16.1
1766	16.36
1767	16.62
1768	16.88
1769	17.14
1770	17.4
1771	17.7
1772	18
1773	18.3
1774	18.6
1775	18.9
1776	19.24
1777	19.58
1778	19.92
1779	20.26
1780	20.6
1781	20.9
1782	21.2
1783	21.5
1784	21.8
1785	22.1
1786	22.56
1787	23.02
1788	23.48
1789	23.94
1790	24.4
1791	24.9
1792	25.4
1793	25.9
1794	26.4
1795	26.9
1796	27.18
1797	27.46
1798	27.74
1799	28.02
1800	28.3
1801	28.46
1802	28.62
1803	28.78
1804	28.94
1805	29.1
1806	29.3
1807	29.5
1808	29.7
1809	29.9
1810	30.1
1811	30.1
1812	30.2
1813	30.3
'''

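We will do something similar with the price of wheat, by storing it in a `WHEAT_DATA` variable. Unlike the wages, each wheat price covers a five-year interval, so every row has a `start_year` and an `end_year`.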

In [None]:
WHEAT_DATA = '''
start_year	end_year	wheat
1565	1570	40.7
1570	1575	44.85
1575	1580	41.94
1580	1585	48.55
1585	1590	41.23
1590	1595	46.34
1595	1600	64.14
1600	1605	27.32
1605	1610	33.45
1610	1615	32.57
1615	1620	33.36
1620	1625	35.29
1625	1630	33.1
1630	1635	45.18
1635	1640	33.1
1640	1645	39.32
1645	1650	52.98
1650	1655	41.59
1655	1660	40.72
1660	1665	46.23
1665	1670	31.61
1670	1675	38.44
1675	1680	42.21
1680	1685	35.2
1685	1690	27.85
1690	1695	40.28
1695	1700	50.35
1700	1705	30.21
1705	1710	32.57
1710	1715	44.31
1715	1720	32.84
1720	1725	28.9
1725	1730	39.23
1730	1735	25.74
1735	1740	31.52
1740	1745	27.58
1745	1750	28.28
1750	1755	31
1755	1760	35.64
1760	1765	31
1765	1770	42.56
1770	1775	47.37
1775	1780	43.43
1780	1785	46.67
1785	1790	41.59
1790	1795	47.81
1795	1800	75.74
1800	1805	78.55
1805	1810	81.17
1810	1815	99.56
1815	1820	78.81
'''
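
Before plotting, it can be worth checking that pasted data was read the way you expect. The following is only a sketch of one such check, assuming pandas is installed: `WHEAT_SAMPLE` is a three-row excerpt of the table above, and `intervals_are_contiguous` is an illustrative helper written for this example, not something provided by pandas or Bokeh.

```python
from io import StringIO

import pandas as pd

# Three-row excerpt of WHEAT_DATA so this example stands alone ("\t" is a tab).
WHEAT_SAMPLE = '''
start_year\tend_year\twheat
1565\t1570\t40.7
1570\t1575\t44.85
1575\t1580\t41.94
'''


def intervals_are_contiguous(frame):
    # Each interval should start in the year the previous one ended.
    starts = frame['start_year'].iloc[1:].reset_index(drop=True)
    ends = frame['end_year'].iloc[:-1].reset_index(drop=True)
    return bool((starts == ends).all())


wheat = pd.read_csv(StringIO(WHEAT_SAMPLE), sep='\t')
# Numeric columns mean the values were read as numbers, not as text.
print(wheat.dtypes)
print(intervals_are_contiguous(wheat))
```

If a column shows up as `object` instead of a number type, a stray character probably slipped in while pasting.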

Finally, the fun part! We will be visualizing data using the Bokeh library, which maps well onto the Grammar of Graphics approach. All Bokeh plots follow the same general pattern:

1. First we will create a `Figure`, which is the plot itself.
2. Then we will add one or more graphical objects, such as lines, bars, x's, and so on.
3. Finally, we will `show` the figure that we've created.

Using our data from before, here is the simplest possible plot we can create.

In [None]:
# create the figure
figure = Figure()
# add a line to the figure, using WAGE_DATA, with the x-axis being the year and the y-axis being the wages
figure.line(
    x='year',
    y='wages',
    source=text_data_source(WAGE_DATA),
)
# show the figure
show(figure)

That's the basic idea, but the Bokeh library offers a lot of options for customizing your visualization. Many of the options change visual variables associated with each geometric object, and you can set each one to either a specific value (e.g. `0`, `'red'`) or a named column from the data, so that it varies from row to row (our datasets don't contain columns suited to this). You can find all the possible options for each geometric object at <https://docs.bokeh.org/en/latest/docs/reference/plotting/figure.html#bokeh.plotting.Figure>. You can also change other attributes of the graphic, such as adding titles, changing the x- and y-axes, and so on. Here is an example that starts with what we had before and adds a lot of color, texture, and visual attributes. (I didn't say these are *good* colors or textures...)

In [None]:
# create the figure
figure = Figure(
    title='Wheat Price versus Wages in England from 1565 to 1830',
    width=800,
    height=600,
    x_range=(1565, 1830),
    y_range=(0, 100),
    x_axis_label='Year',
    y_axis_label='Wheat Price / Wages (shillings)',
)
# add a line to the figure, using WAGE_DATA, with the x-axis being the year and the y-axis being the wages
figure.line(
    x='year',
    y='wages',
    source=text_data_source(WAGE_DATA),
    color='red',
    line_dash='dotdash',
    line_width=3,
)
# add a bar chart to the figure, using WHEAT_DATA, with one bar per five-year interval
figure.quad(
    left='start_year',
    right='end_year',
    top='wheat',
    bottom=0,
    source=text_data_source(WHEAT_DATA),
    alpha=0.3,
    line_color='black',
    fill_color='chartreuse',
    hatch_pattern='spiral',
    hatch_color='yellow',
)
# show the figure
show(figure)
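
Visual variables can also come from a named column instead of a specific value. Here is a rough sketch of that idea, assuming Bokeh is installed. The `bar_color` column is invented for this example (our wheat data has no such column), and the sketch uses the lowercase `figure` spelling, which I believe works across Bokeh releases; swap in `Figure` if you want to match the codeboxes above.

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Hypothetical data: 'bar_color' is an assumed extra column, added only
# to illustrate the idea. It is not part of WHEAT_DATA.
source = ColumnDataSource(data={
    'start_year': [1565, 1570, 1575],
    'end_year': [1570, 1575, 1580],
    'wheat': [40.7, 44.85, 41.94],
    'bar_color': ['gray', 'gray', 'orange'],
})

plot = figure()
bars = plot.quad(
    left='start_year',
    right='end_year',
    top='wheat',
    bottom=0,                # a specific value, the same for every bar
    fill_color='bar_color',  # a named column, so each row gets its own color
    line_color='black',      # a specific value again
    source=source,
)
```

In the Notebook, calling `show(plot)` afterwards would display the bars, with the third one in orange.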