# More with Bokeh

## First, set up your environment!
Make sure you've activated the 'data' environment.
You can tell it is active when your command line shows (data) to the left of the prompt. If it isn't, run ```source activate data``` in a regular terminal.

Now, open Python in interactive mode.

We're going to need sample data.
To get it, ```import bokeh``` and run the ```bokeh.sampledata.download()``` function.

In [4]:
import bokeh
bokeh.sampledata.download()

Using data directory: C:\Users\gerst\.bokeh\data
Downloading: CGM.csv (1589982 bytes)

## Getting some data
Let's import the autompg data from the sample data.
It loads as a DataFrame, so try looking at it in a variety of ways. Can you find a way to view only the cars that have mpg > 20? How about sorting by weight?

In [2]:
from bokeh.sampledata.autompg import autompg as df
df

Unnamed: 0,mpg,cyl,displ,hp,weight,accel,yr,origin,name
0,18.0,8,307.0,130,3504,12.0,70,1,chevrolet chevelle malibu
1,15.0,8,350.0,165,3693,11.5,70,1,buick skylark 320
2,18.0,8,318.0,150,3436,11.0,70,1,plymouth satellite
3,16.0,8,304.0,150,3433,12.0,70,1,amc rebel sst
4,17.0,8,302.0,140,3449,10.5,70,1,ford torino
5,15.0,8,429.0,198,4341,10.0,70,1,ford galaxie 500
6,14.0,8,454.0,220,4354,9.0,70,1,chevrolet impala
7,14.0,8,440.0,215,4312,8.5,70,1,plymouth fury iii
8,14.0,8,455.0,225,4425,10.0,70,1,pontiac catalina
9,15.0,8,390.0,190,3850,8.5,70,1,amc ambassador dpl


In [63]:
df.sort_values(by='hp')

Unnamed: 0,mpg,cyl,displ,hp,weight,accel,yr,origin,name
19,26.0,4,97.0,46,1835,20.5,70,2,volkswagen 1131 deluxe sedan
101,26.0,4,97.0,46,1950,21.0,73,2,volkswagen super beetle
324,43.4,4,90.0,48,2335,23.7,80,2,vw dasher (diesel)
323,44.3,4,90.0,48,2085,21.7,80,2,vw rabbit c (diesel)
242,43.1,4,90.0,48,1985,21.5,78,2,volkswagen rabbit custom diesel
116,29.0,4,68.0,49,1867,19.5,73,2,fiat 128
193,29.0,4,85.0,52,2035,22.2,76,1,chevrolet chevette
244,32.8,4,78.0,52,1985,19.4,78,3,mazda glc deluxe
388,44.0,4,97.0,52,2130,24.6,82,2,vw pickup
142,31.0,4,76.0,52,1649,16.5,74,3,toyota corona
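
The two questions asked earlier (cars with mpg > 20, and sorting by weight) can be answered with a boolean mask and ```sort_values```. This is only a minimal sketch: the ```cars``` name and the five rows (hand-copied from the outputs above) are assumptions made so the example runs without the downloaded sample data. The same expressions should work on the full ```df```.

```python
import pandas as pd

# A few rows copied from the autompg outputs above (illustrative subset only)
cars = pd.DataFrame({
    'name': ['chevrolet chevelle malibu', 'ford torino',
             'volkswagen 1131 deluxe sedan', 'fiat 128', 'vw pickup'],
    'mpg': [18.0, 17.0, 26.0, 29.0, 44.0],
    'weight': [3504, 3449, 1835, 1867, 2130],
})

# Boolean mask: keep only the rows where mpg is above 20
efficient = cars[cars['mpg'] > 20]

# Sort by weight, lightest first (pass ascending=False for heaviest first)
by_weight = cars.sort_values(by='weight')

print(efficient)
print(by_weight)
```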


## Bar Charts

To build a bar chart, pass ```Bar()``` a DataFrame and the name of the column to group by, which labels the x-axis. The ```values``` argument names the column that sets the bar heights.

In [68]:
from bokeh.charts import Bar, output_notebook, show
# Remember that your imports will be different
# from bokeh.charts import Bar, output_file, save

# Create a chart named p
# x-axis is cyl
# y-axis is hp
p = Bar(df, 'cyl', values='hp', title='Horsepower by number of cylinders')

# output_file('yourfile.html')
output_notebook()
# Remember that you'll want output_file('filename.html') instead

show(p)
# Remember that you'll want save(p) instead

In [69]:
# Find the total mpg by cylinder

# Create a chart named p
p = Bar(df, 'cyl', values='mpg', title="Total MPG by Cylinders")

show(p)

We can also specify how to combine data. Totals are probably not very useful here. How about averages?

In [74]:
# We don't need to reimport - imports are already taken care of

# use Bar() to create a new chart
p = Bar(df, 'cyl', values='mpg', agg='mean', title='Average mpg by Cylinders', bar_width=1.1, color='teal')

# No need to use output_file() again unless you want to create a different file

show(p)
# Remember that you're probably using save(p)
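
To see why averages are more meaningful than totals here, the same grouping can be computed directly in pandas. This is a sketch under assumptions: the ```sample``` frame is a four-row subset copied from the outputs above, and ```groupby``` is used as a stand-in to illustrate the arithmetic, not as a description of how ```Bar()``` works internally.

```python
import pandas as pd

# Two 8-cylinder and two 4-cylinder cars copied from the outputs above
sample = pd.DataFrame({
    'cyl': [8, 8, 4, 4],
    'mpg': [18.0, 15.0, 26.0, 29.0],
})

# Total mpg per cylinder count: grows with the number of cars in each group
totals = sample.groupby('cyl')['mpg'].sum()

# Average mpg per cylinder count: comparable across groups of any size
averages = sample.groupby('cyl')['mpg'].mean()

print(totals)
print(averages)
```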

In [81]:
# What if we wanted to group by year?
# Create a chart that shows the average mpg by year
# you may need to make the bars narrower if they don't fit

# Create a chart named p
p = Bar(df, 'yr', values='mpg', agg='mean', title="Average MPG by Year", color='orange')
show(p)

In [90]:
from bokeh.palettes import Spectral3 as pal
# Create a chart named p
p = Bar(df, 'yr', values='mpg', group='origin', agg='mean', title='Average MPG by year, grouped by origin', palette=pal, background_fill_color='black')
show(p)
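
Grouping by two columns at once can be checked numerically with a pivot table. The sketch below assumes a five-row ```sample``` frame copied from the outputs above; it illustrates the year-by-origin averages and is not a claim about the chart's internals.

```python
import pandas as pd

# Five rows copied from the outputs above: model year, origin code, mpg
sample = pd.DataFrame({
    'yr': [70, 70, 70, 73, 73],
    'origin': [1, 1, 2, 2, 2],
    'mpg': [18.0, 15.0, 26.0, 26.0, 29.0],
})

# One row per year, one column per origin, each cell the mean mpg
table = sample.pivot_table(index='yr', columns='origin',
                           values='mpg', aggfunc='mean')

print(table)
```

Year/origin combinations with no cars (here, origin 1 in year 73) show up as NaN.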

In [22]:
from bokeh.charts import BoxPlot
p = BoxPlot(df, values='mpg', label='cyl', title='MPG Summary, grouped by cylinders')

show(p)
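
A box plot summarizes each group by its quartiles. As a rough numerical counterpart, pandas can report the same kind of summary per group. The ```sample``` frame below is an assumed eight-row subset copied from the outputs above, and the chart's whisker rule may differ from the plain min/max shown here.

```python
import pandas as pd

# Five 8-cylinder and three 4-cylinder cars copied from the outputs above
sample = pd.DataFrame({
    'cyl': [8, 8, 8, 8, 8, 4, 4, 4],
    'mpg': [18.0, 15.0, 18.0, 16.0, 17.0, 26.0, 26.0, 29.0],
})

# count, mean, std, min, quartiles (25%, 50%, 75%) and max for each group
summary = sample.groupby('cyl')['mpg'].describe()

print(summary[['min', '25%', '50%', '75%', 'max']])
```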

In [32]:
from bokeh.charts import Histogram

p = Histogram(df['hp'], title='Distribution of Horsepower')
show(p)
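
A histogram counts how many values fall into each bin. The sketch below does that counting with NumPy for the ten horsepower values in the first output above. The bin edges are an assumption chosen for illustration; ```Histogram``` picks its own bins.

```python
import numpy as np

# Horsepower of the first ten cars shown in the first output above
hp = np.array([130, 165, 150, 150, 140, 198, 220, 215, 225, 190])

# Hypothetical bin edges: [100, 150), [150, 200), [200, 250]
edges = [100, 150, 200, 250]

# counts[i] is the number of values that fall into bin i
counts, bin_edges = np.histogram(hp, bins=edges)

print(counts)
```

A value that sits exactly on an interior edge (150 here) is counted in the bin to its right.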

In [34]:
p = Histogram(df, values='mpg', color='origin', title='MPG Distribution grouped by origin')
show(p)

In [36]:
from bokeh.charts import Scatter
p = Scatter(df, x='mpg', y='hp', title='HP vs MPG', xlabel='MPG', ylabel='HP')
show(p)

In [46]:
from bokeh.palettes import BrBG5 as pal
p = Scatter(df, x='mpg', y='hp', title='HP vs MPG', xlabel='MPG', ylabel='HP', color='cyl', palette=pal)
show(p)

In [50]:
from bokeh.charts import Area
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'sin': [np.sin(np.radians(n)) for n in range(360)], 'cos': [np.cos(np.radians(n)) for n in range(360)]})
df2

p = Area(df2)
show(p)
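
The list comprehensions used to build ```df2``` can also be written with NumPy's vectorized functions, which apply ```sin``` and ```cos``` to a whole array at once. This is an optional alternative sketch; the ```angles``` and ```df2_vectorized``` names are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Angles 0..359 degrees, converted to radians in one step
angles = np.radians(np.arange(360))

# np.sin and np.cos operate on the whole array, so no loop is needed
df2_vectorized = pd.DataFrame({'sin': np.sin(angles), 'cos': np.cos(angles)})

print(df2_vectorized.head())
```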