## Many Options

There are many options when it comes to producing data visualizations:

* ggplot - R, Python
* matplotlib - Python
* bokeh - Python, R, Scala

We are going to look at bokeh since it has an API for Python, R and Scala. It is modern, based on the grammar of graphics, and designed for the web. I think it has a higher ROL (return on learning) than the alternatives, and a lovely logo:

![Bokeh](http://bokeh.pydata.org/en/latest/_static/images/logo.png)

There are lots of resources for bokeh online - start [here](http://bokeh.pydata.org/en/latest/).

The majority of the content in this section is a mash-up of content from the Bokeh site.


## Bokeh Charts

Bokeh has 3 levels of interface for different users:

* a low-level [bokeh.models](http://bokeh.pydata.org/en/latest/docs/reference/models.html#bokeh-models) interface that provides the most flexibility to application developers.
* an intermediate-level [bokeh.plotting](http://bokeh.pydata.org/en/latest/docs/reference/plotting.html#bokeh-plotting) interface centered around composing visual glyphs.
* a high-level [bokeh.charts](http://bokeh.pydata.org/en/latest/docs/reference/charts.html#bokeh-charts) interface to build complex statistical plots quickly and simply.

We are going to take a very quick look at bokeh.charts - creating a bar chart, histogram, line chart, scatter chart and a boxplot.  Note we are only scratching the surface - I recommend learning bokeh.plotting when you have time. 

For many more details on bokeh.charts read the great documentation [here](http://bokeh.pydata.org/en/latest/docs/user_guide/charts.html).

### Input Data

bokeh.charts uses a [Pandas DataFrame](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html#pandas.DataFrame) internally, so any inputs provided will be coerced into this format.

The input types accepted are:

* Array-like: one or more (1..*) list, tuple, numpy.ndarray or pandas.Series
* Table-like:
    * records: a list(dict)
    * columns: a dict(list), pandas.DataFrame, or blaze resource
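
As a rough illustration of the input layouts listed above, the sketch below builds DataFrames from an array-like input and from the two table-like layouts using pandas directly. The variable names and the tiny sample values are made up for illustration, and this only mirrors the idea of the coercion; it is not the code bokeh.charts runs internally.

```python
import pandas as pd

# Array-like input: a single sequence of values becomes one column.
mpg = [18, 15, 18]
from_array = pd.DataFrame({"mpg": mpg})

# Table-like "records" input: a list of dicts, one dict per row.
records = [
    {"mpg": 18, "cyl": 8},
    {"mpg": 15, "cyl": 8},
    {"mpg": 18, "cyl": 8},
]
from_records = pd.DataFrame(records)

# Table-like "columns" input: a dict of lists, one list per column.
columns = {"mpg": [18, 15, 18], "cyl": [8, 8, 8]}
from_columns = pd.DataFrame(columns)

# Both table-like layouts describe the same table. The column order is
# fixed explicitly so the comparison does not depend on dict ordering.
order = ["mpg", "cyl"]
same_table = from_records[order].equals(from_columns[order])
print(same_table)
```

Whichever layout you start from, the chart functions end up working with a single data frame, which is why column names can be passed as plain strings.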

Additional information - [10 Minute Intro to the Pandas DataFrame](http://pandas.pydata.org/pandas-docs/stable/10min.html).


### Bar Charts

We are using bokeh.charts as well as a sample data set containing auto mpg data.

We also set the chart size defaults and redirect plots to show up in the notebook:

In [1]:
from bokeh.charts import defaults, Bar, output_notebook, show
from bokeh.sampledata.autompg import autompg as df

defaults.plot_width = 500
defaults.plot_height = 300

output_notebook(hide_banner=True)

df.head()

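|   | mpg | cyl | displ | hp | weight | accel | yr | origin | name |
|---|-----|-----|-------|----|--------|-------|----|--------|------|
| 0 | 18 | 8 | 307 | 130 | 3504 | 12.0 | 70 | 1 | chevrolet chevelle malibu |
| 1 | 15 | 8 | 350 | 165 | 3693 | 11.5 | 70 | 1 | buick skylark 320 |
| 2 | 18 | 8 | 318 | 150 | 3436 | 11.0 | 70 | 1 | plymouth satellite |
| 3 | 16 | 8 | 304 | 150 | 3433 | 12.0 | 70 | 1 | amc rebel sst |
| 4 | 17 | 8 | 302 | 140 | 3449 | 10.5 | 70 | 1 | ford torino |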


### Simple Bar Chart

In [7]:
p = Bar(df, 'cyl', values='mpg', title="Total MPG by CYL")
show(p)
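
To make the aggregation concrete, here is a minimal pandas sketch of what the title "Total MPG by CYL" describes: one bar per cyl value, with a height equal to the sum of mpg for that value. The frame named toy and its values are made up for illustration. This is the equivalent pandas computation, not the code bokeh.charts runs internally.

```python
import pandas as pd

# Made-up sample with the same column names as the auto mpg data.
toy = pd.DataFrame({
    "cyl": [4, 4, 6, 8, 8],
    "mpg": [30, 28, 20, 15, 17],
})

# One value per bar: total mpg for each cylinder count.
bar_heights = toy.groupby("cyl")["mpg"].sum()
print(bar_heights)
```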


<img src='files/resources/ic_assignment_black_24dp_2x.png' align='left'>Try changing the title of the plot.  Bring up the help and find out how to compute the Mean mpg rather than Sum.  Can you find a way to change the color of the bars to blue?

### Grouped Bars

In [31]:
p = Bar(df, label='yr', values='mpg', agg='mean', group='origin',
        title="Mean MPG by YR, grouped by ORIGIN", legend='top_right')

show(p)
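
The numbers behind a grouped bar chart can be sketched in pandas as a mean over two keys. The frame below is made up for illustration (it only borrows the column names yr, origin and mpg), and the pivot into rows and columns is just one assumed way to picture the layout: one row per group of bars, one column per bar within a group.

```python
import pandas as pd

# Made-up sample with the same column names as the auto mpg data.
toy = pd.DataFrame({
    "yr":     [70, 70, 70, 71, 71, 71],
    "origin": [1, 1, 2, 1, 2, 2],
    "mpg":    [18.0, 16.0, 26.0, 20.0, 30.0, 28.0],
})

# Mean mpg for every (yr, origin) pair; each pair is one bar.
means = toy.groupby(["yr", "origin"])["mpg"].mean()

# Pivot origin into columns: one row per yr (a group of bars),
# one column per origin (a bar within the group).
grouped = means.unstack("origin")
print(grouped)
```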


### Stacked Bars

In [42]:
p = Bar(df, label='cyl', values='mpg', agg='count', stack='origin',
        title="Count of observations by CYL, stacked by ORIGIN", legend='top_right')
show(p)
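
A stacked count can be pictured as a cross-tabulation: each row is one bar, each column is one segment of the stack, and the row total is the full bar height. The sketch below uses pandas.crosstab on a made-up frame; it illustrates the counting only and is not how bokeh.charts computes it internally.

```python
import pandas as pd

# Made-up sample with the same column names as the auto mpg data.
toy = pd.DataFrame({
    "cyl":    [4, 4, 4, 6, 8, 8],
    "origin": [1, 2, 3, 1, 1, 1],
})

# Rows are bars (cyl), columns are stack segments (origin),
# cells are the number of observations in each segment.
counts = pd.crosstab(toy["cyl"], toy["origin"])

# The total height of each stacked bar.
stack_totals = counts.sum(axis=1)
print(counts)
```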


### Line Chart

A simple line chart - first we group by yr and compute the mean of each column.

In [13]:
from bokeh.charts import Line

df_mean = df.groupby("yr", as_index=False).mean()

line = Line(data=df_mean, x='yr', y='mpg')
show(line)
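
The as_index=False argument matters here: it keeps yr as an ordinary column in the result, so the column can still be referenced by name when plotting. A small sketch on made-up data shows the shape of the result. Selecting the numeric columns explicitly is a choice made for this sketch so that it does not depend on how a given pandas version treats text columns such as name.

```python
import pandas as pd

# Made-up sample with the same column names as the auto mpg data.
toy = pd.DataFrame({
    "yr":  [70, 70, 71, 71],
    "mpg": [18.0, 16.0, 25.0, 27.0],
    "cyl": [8, 8, 4, 6],
})

# One row per year, holding the mean of each selected column.
# as_index=False keeps yr as a regular column instead of the index.
toy_mean = toy.groupby("yr", as_index=False)[["mpg", "cyl"]].mean()
print(toy_mean)
```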


<img src='files/resources/ic_assignment_black_24dp_2x.png' align='left'>Can you find out how to add a second line showing the mean number of cylinders to the plot? The data frame already includes the value - you just need a change to the bokeh Line() call.

### Scatter Chart

A simple scatter chart with the marker color mapped to a category (here cyl). The marker shape can be mapped to a category in the same way.

In [18]:
from bokeh.charts import Scatter

scatter = Scatter(data=df, x='mpg', y='weight', color='cyl', legend='top_left')
show(scatter)
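
Mapping color to a category means that every distinct value in the column gets its own color, and each point is drawn in the color of its category. The plain Python sketch below shows the idea. The palette values and the sorted assignment order are assumptions made for illustration; bokeh.charts picks its own palette and may assign colors differently.

```python
from itertools import cycle

# Hypothetical palette, deliberately shorter than the number of categories.
palette = ["#1f77b4", "#ff7f0e", "#2ca02c"]

# Made-up cylinder counts, one per point.
cyl = [8, 4, 6, 4, 8, 3]

# Assign one color per distinct category, in sorted order,
# cycling through the palette if it runs out of colors.
color_for = dict(zip(sorted(set(cyl)), cycle(palette)))

# Every point takes the color of its category.
point_colors = [color_for[c] for c in cyl]
print(color_for)
```

Note that with more categories than colors the palette repeats, which is one reason to check the legend when a column has many distinct values.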


<img src='files/resources/ic_assignment_black_24dp_2x.png' align='left'>Can you fix the legend placement problem?

### Boxplot

Show the spread of the data.

In [22]:
from bokeh.charts import BoxPlot

boxplot = BoxPlot(df, label='cyl', values='mpg')

show(boxplot)
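
For readers new to boxplots, the sketch below computes the usual ingredients for a single group: the quartiles that bound the box, the median line, and the points flagged as outliers. It assumes the common 1.5 x IQR convention for outliers; the source does not say which rule bokeh.charts applies, so treat that threshold as an assumption. The values are made up.

```python
import pandas as pd

# Made-up mpg values for a single group.
values = pd.Series([10, 12, 13, 14, 15, 16, 18, 40])

# The box spans the first to the third quartile, with a line at the median.
q1 = values.quantile(0.25)
median = values.quantile(0.50)
q3 = values.quantile(0.75)
iqr = q3 - q1

# Assumed convention: points beyond 1.5 * IQR from the box are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]
print(outliers.tolist())
```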


<img src='files/resources/ic_assignment_black_24dp_2x.png' align='left'>Find a way to remove the outliers from the plot.  
Can you change the color of the box?

### Where now?  

From here, browse over to the [bokeh docs](http://bokeh.pydata.org/en/latest/docs/user_guide.html) and start exploring the lower-level bokeh.plotting interface, and visit the [gallery](http://bokeh.pydata.org/en/latest/docs/gallery.html) for inspiration.

Remember - [Pandas](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html#pandas.DataFrame) is your friend.  Get that data into Pandas and Bokeh will play very nicely.  Plus Pandas is a great API for working with data frames.  It is worth exploring and learning.

Most of the time the default plots will suffice. It can take many days to get publication-ready plots - make sure you really need that tweak to the plot before losing days of your life!