# Run-through of the Programming Historian [Visualizing Data with Bokeh and Pandas](https://programminghistorian.org/en/lessons/visualizing-with-bokeh) tutorial

These instructions are adapted from the [Visualizing Data with Bokeh and Pandas](https://programminghistorian.org/en/lessons/visualizing-with-bokeh) tutorial by Charlie Harper, licensed under [CC-BY 4.0](https://creativecommons.org/licenses/by/4.0). Please follow along with the tutorial!

### Important note!
Follow the setup instructions [here](https://programminghistorian.org/en/lessons/visualizing-with-bokeh) and remember to download ```thor_wwii.csv``` if it doesn't appear in your directory.

Import some basic functions from bokeh

In [1]:
from bokeh.plotting import figure, output_file, output_notebook, show
import pandas as pd
pd.set_option('display.max_rows', 15) # I just set this to only show 15 rows per dataframe

Call output_notebook to show plots in-line in the notebook

In [6]:
output_notebook()

Set up some test data, one list for each axis

In [8]:
x = [1, 3, 5, 7]
y = [2, 4, 6, 8]

Instantiate a figure and add the data to it

In [16]:
p = figure()

# legend_label replaces the older legend= keyword (Bokeh 1.4 and later)
p.circle(x, y, size=10, color='red', legend_label='circle')
p.line(x, y, color='blue', legend_label='line')
p.triangle(y, x, color='gold', size=10, legend_label='triangle')

p.legend.click_policy='hide'


In [17]:
show(p)

### Loading data from pandas - refresher

In [26]:
df = pd.read_csv('thor_wwii.csv')
df

,MSNDATE,THEATER,COUNTRY_FLYING_MISSION,NAF,UNIT_ID,AIRCRAFT_NAME,AC_ATTACKING,TAKEOFF_BASE,TAKEOFF_COUNTRY,TAKEOFF_LATITUDE,TAKEOFF_LONGITUDE,TGT_COUNTRY,TGT_LOCATION,TGT_LATITUDE,TGT_LONGITUDE,TONS_HE,TONS_IC,TONS_FRAG,TOTAL_TONS
0,03/30/1941,ETO,GREAT BRITAIN,RAF,84 SQDN,BLENHEIM,10.0,,,,,ALBANIA,ELBASAN,41.100000,20.070000,0.0,0.0,0.0,0.0
1,11/24/1940,ETO,GREAT BRITAIN,RAF,211 SQDN,BLENHEIM,9.0,,,,,ALBANIA,DURAZZO,41.320000,19.450000,0.0,0.0,0.0,0.0
2,12/04/1940,ETO,GREAT BRITAIN,RAF,211 SQDN,BLENHEIM,9.0,,,,,ALBANIA,TEPELENE,40.300000,20.020000,0.0,0.0,0.0,0.0
3,12/31/1940,ETO,GREAT BRITAIN,RAF,211 SQDN,BLENHEIM,9.0,,,,,ALBANIA,VALONA,40.470000,19.490000,0.0,0.0,0.0,0.0
4,01/06/1941,ETO,GREAT BRITAIN,RAF,211 SQDN,BLENHEIM,9.0,,,,,ALBANIA,VALONA,40.470000,19.490000,0.0,0.0,0.0,0.0
5,02/12/1941,ETO,GREAT BRITAIN,RAF,84 SQDN,BLENHEIM,9.0,,,,,ALBANIA,ELBASAN,41.100000,20.070000,0.0,0.0,0.0,0.0
6,02/12/1941,ETO,GREAT BRITAIN,RAF,11 SQDN,BLENHEIM,9.0,,,,,ALBANIA,ELBASAN - DUKAJ AREA,41.100000,20.070000,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
178274,06/14/1945,PTO,USA,20 AF,73 BW,B29,99.0,,,,,JAPAN,AMAGASAKI,34.700000,135.433333,0.0,999.0,0.0,999.0
178275,08/01/1945,PTO,USA,20 AF,58 BW,B29,99.0,,,,,JAPAN,HACHIOJI,35.666667,139.333333,0.0,999.0,0.0,999.0


Show column names

In [28]:
df.columns

Index(['MSNDATE', 'THEATER', 'COUNTRY_FLYING_MISSION', 'NAF', 'UNIT_ID',
       'AIRCRAFT_NAME', 'AC_ATTACKING', 'TAKEOFF_BASE', 'TAKEOFF_COUNTRY',
       'TAKEOFF_LATITUDE', 'TAKEOFF_LONGITUDE', 'TGT_COUNTRY', 'TGT_LOCATION',
       'TGT_LATITUDE', 'TGT_LONGITUDE', 'TONS_HE', 'TONS_IC', 'TONS_FRAG',
       'TOTAL_TONS'],
      dtype='object')

To access a single column we pass a string to our dataframe’s indexer: e.g. ```df['MSNDATE']```. To access multiple columns, we pass a list of names to our dataframe’s indexer: e.g. ```df[['MSNDATE', 'THEATER']]```.

In [34]:
df[['MSNDATE','THEATER']]

,MSNDATE,THEATER
0,03/30/1941,ETO
1,11/24/1940,ETO
2,12/04/1940,ETO
3,12/31/1940,ETO
4,01/06/1941,ETO
5,02/12/1941,ETO
6,02/12/1941,ETO
...,...,...
178274,06/14/1945,PTO
178275,08/01/1945,PTO
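
The two indexing forms return different types, which is easy to miss in the rendered output. The sketch below uses a hypothetical three-row DataFrame called `toy` (stand-in values, not read from ```thor_wwii.csv```) to show the difference.

```python
import pandas as pd

# Hypothetical stand-in rows, not read from thor_wwii.csv
toy = pd.DataFrame({
    'MSNDATE': ['03/30/1941', '11/24/1940', '06/14/1945'],
    'THEATER': ['ETO', 'ETO', 'PTO'],
    'TOTAL_TONS': [0.0, 0.0, 999.0],
})

single = toy['MSNDATE']                # a string selects one column as a Series
several = toy[['MSNDATE', 'THEATER']]  # a list selects columns as a DataFrame

print(type(single).__name__)
print(type(several).__name__)
print(list(several.columns))
```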


### The Bokeh ColumnDataSource object

- accepts a Pandas DataFrame as an argument 
- can be passed to glyph methods via the ```source``` parameter
- other parameters, such as ```x``` and ```y```, can then reference column names within our source
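
As a minimal sketch of the points above, the snippet below wraps a small hypothetical DataFrame (`toy`, with made-up values) in a ```ColumnDataSource``` and inspects what it holds. The names `toy` and `toy_source` are illustrative only.

```python
import pandas as pd
from bokeh.models import ColumnDataSource

# Hypothetical stand-in rows, not read from thor_wwii.csv
toy = pd.DataFrame({
    'AC_ATTACKING': [10.0, 9.0, 99.0],
    'TOTAL_TONS': [0.0, 0.0, 999.0],
})

toy_source = ColumnDataSource(toy)  # the DataFrame is the only argument

# The DataFrame columns (plus its index) become named columns in the source
print(toy_source.column_names)
print(list(toy_source.data['TOTAL_TONS']))
```

Glyph methods given `source=toy_source` can then refer to `'AC_ATTACKING'` or `'TOTAL_TONS'` by name.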

Create a scatter plot of the number of attacking aircraft versus the tons of munitions dropped:

In [38]:
# get all the bokeh imports we need
from bokeh.plotting import figure, output_file, output_notebook, show
from bokeh.models import ColumnDataSource
from bokeh.models.tools import HoverTool

output_notebook()
df = pd.read_csv('thor_wwii.csv')

 - don’t want to plot all 170,000+ rows in our scatterplot (messy, time consuming), so...
 - randomly sample 50 rows using ```DataFrame.sample```
 - pass this sample to the ```ColumnDataSource``` constructor and store this in a variable called ```source```
 - (optional extra) print out ```source``` to see what it looks like
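
Because the sample is random, the plot changes on every run. If a repeatable plot is wanted, ```DataFrame.sample``` accepts a `random_state` seed. The sketch below assumes a hypothetical 200-row DataFrame (`toy`) in place of the real data, and the seed value 42 is arbitrary.

```python
import pandas as pd

# Hypothetical stand-in for the full dataset: 200 numbered rows
toy = pd.DataFrame({'AC_ATTACKING': range(200), 'TOTAL_TONS': range(200)})

# The same seed gives the same 50 rows each time
first = toy.sample(50, random_state=42)
second = toy.sample(50, random_state=42)

print(len(first))
print(first.equals(second))
```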

In [45]:
sample = df.sample(50) # draw a random sample of 50 rows from the dataframe
source = ColumnDataSource(sample) # create ColumnDataSource object
print(source)

ColumnDataSource(id='2304', ...)


- create our ```figure``` object and call the ```circle``` glyph method to plot our data
- pass our ```source``` variable to the ```source``` argument
- pass the column names holding the tons of munitions dropped (```TOTAL_TONS```) and the number of attacking aircraft (```AC_ATTACKING```) to the ```x``` and ```y``` arguments respectively

- With ```ColumnDataSource``` we’re not limited to just using column names for ```x``` and ```y``` parameters
- Can also pass a column name for other parameters such as ```size```, ```line_color```, or ```fill_color```
- Allows styling options to be determined by columns in the datasource itself! 
- e.g. change ```size=10``` to ```size='TONS_HE'``` - size of each dot will then reflect the tons of high explosives used.
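
A minimal sketch of column-driven sizing, under some assumptions: it uses a hypothetical three-row source (`toy_source`, made-up values) and the `scatter` glyph method, which draws circles by default, rather than `circle`.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Hypothetical stand-in rows, not read from thor_wwii.csv
toy = pd.DataFrame({
    'AC_ATTACKING': [10.0, 9.0, 99.0],
    'TOTAL_TONS': [12.0, 8.0, 40.0],
    'TONS_HE': [6.0, 8.0, 25.0],
})
toy_source = ColumnDataSource(toy)

p_sized = figure()
# size is given a column name, so each marker is sized by its TONS_HE value
renderer = p_sized.scatter(x='TOTAL_TONS', y='AC_ATTACKING',
                           source=toy_source,
                           size='TONS_HE', color='green')
```

Calling `show(p_sized)` in a notebook would render it. With the real data, very large tonnage values may need scaling into a separate column first.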

In [51]:
p = figure()
p.circle(x='TOTAL_TONS', y='AC_ATTACKING',
         source=source,
         size=10, color='green')

In [52]:
show(p)

Label things up

In [53]:
p.title.text = 'Attacking Aircraft and Munitions Dropped'
p.xaxis.axis_label = 'Tons of Munitions Dropped'
p.yaxis.axis_label = 'Number of Attacking Aircraft'

Add some interactive hover-over functionality ([other tools](https://bokeh.pydata.org/en/latest/docs/user_guide/tools.html) are available)

In [54]:
hover = HoverTool()
hover.tooltips=[
    ('Attack Date', '@MSNDATE'),
    ('Attacking Aircraft', '@AC_ATTACKING'),
    ('Tons of Munitions', '@TOTAL_TONS'),
    ('Type of Aircraft', '@AIRCRAFT_NAME')
]

p.add_tools(hover)

show(p)

- `HoverTool` has a `tooltips` property, which takes a list of tuples
- The first part of each tuple is a display name; the second is a column name from your `ColumnDataSource`, prefixed with `@`
- Add the tool to the plot with the `add_tools` method
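
The tooltips can also be passed straight to the `HoverTool` constructor, and a column reference can carry a number format in braces. The sketch below shows both, assuming a hypothetical two-row source (`toy_source`) and illustrative format strings.

```python
from bokeh.models import ColumnDataSource
from bokeh.models.tools import HoverTool
from bokeh.plotting import figure

# Hypothetical stand-in rows, not read from thor_wwii.csv
toy_source = ColumnDataSource(data={
    'MSNDATE': ['03/30/1941', '06/14/1945'],
    'AC_ATTACKING': [10.0, 99.0],
    'TOTAL_TONS': [0.0, 999.0],
})

p_hover = figure()
p_hover.scatter(x='TOTAL_TONS', y='AC_ATTACKING', source=toy_source, size=10)

# (display name, '@column') pairs; {0} and {0.0} are optional number formats
hover = HoverTool(tooltips=[
    ('Attack Date', '@MSNDATE'),
    ('Attacking Aircraft', '@AC_ATTACKING{0}'),
    ('Tons of Munitions', '@TOTAL_TONS{0.0}'),
])
p_hover.add_tools(hover)
```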

## Categorical Data and Bar Charts: Munitions Dropped by Country

In [1]:
#munitions_by_country.py
import pandas as pd
from bokeh.plotting import figure, output_file, output_notebook, show
from bokeh.models import ColumnDataSource
from bokeh.models.tools import HoverTool

from bokeh.palettes import Spectral5
from bokeh.transform import factor_cmap

output_notebook()

df = pd.read_csv('thor_wwii.csv')

- `Spectral5` is a pre-made five-colour palette
- `factor_cmap` is a helper that maps each category (factor) in a column to a colour from a palette
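
A rough sketch of how the two fit together, assuming a hypothetical five-category source with made-up tonnage values (the category names and numbers are placeholders, not results from ```thor_wwii.csv```):

```python
from bokeh.models import ColumnDataSource
from bokeh.palettes import Spectral5
from bokeh.plotting import figure
from bokeh.transform import factor_cmap

# Hypothetical categories and made-up values, for illustration only
countries = ['AUSTRALIA', 'GREAT BRITAIN', 'NEW ZEALAND', 'SOUTH AFRICA', 'USA']
toy_source = ColumnDataSource(data={
    'COUNTRY_FLYING_MISSION': countries,
    'TOTAL_TONS': [1.0, 5.0, 0.5, 0.2, 9.0],
})

# Map each category in the column to one of the five palette colours
country_cmap = factor_cmap('COUNTRY_FLYING_MISSION',
                           palette=Spectral5, factors=countries)

# A categorical x-axis is created by passing the categories as x_range
p_bar = figure(x_range=countries)
p_bar.vbar(x='COUNTRY_FLYING_MISSION', top='TOTAL_TONS', width=0.7,
           source=toy_source, fill_color=country_cmap)
```

`Spectral5` has five colours, so it suits a column with at most five distinct categories.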