This coursebook is part of the Tokopedia Python for Data Analytics course prepared by the team at [Algoritma](https://algorit.ma/). Algoritma is a data science education center based in Jakarta. We organize workshops and training programs to help working professionals and students gain mastery in various data science sub-fields: data visualization, machine learning, data modeling, statistical inference, etc. This coursebook is intended for a restricted audience only, i.e. the individuals and organizations that received this coursebook directly from Algoritma. It may not be reproduced, distributed, translated or adapted in any form outside these individuals and organizations without permission.

## Training Objectives

The primary objective of this coursebook is to provide a full hands-on experience in using visual exploratory techniques, helping participants gain proficiency with data visualization tools in Python. The library we'll be using is Bokeh, an interactive visualization library.

The objectives of this training are divided into the following focus areas:

- **Introduction to Bokeh**
  - Grammar of graphics
  - Plotting parameters
  - Adding annotations
- **Plotting Essentials**
  - Basic plotting
  - Statistical plots: boxplot and histogram
  - Choosing an appropriate plot
- **Enhancing Plot**

At the end of this course, you'll work on a **Learn by Building** module as your graded assignment. You'll be given a dataset to apply what you've learned: create a visualization with appropriate annotations and aesthetics that draws out a powerful insight and communicates a story.

## Introduction to Bokeh

### Why Learn Bokeh?

Visualizing a dataset with a programming language has a steeper learning curve than most point-and-click tools. Some of you might wonder: why use Python to visualize data when you can use drag-and-drop tools such as Tableau or Power BI? There are 2 main reasons to learn this:
- Reproducibility
- Customization

Since we'll be creating our visualizations from scratch, we have nearly limitless freedom to build any kind of visualization we can imagine. We can also try many variations of a plot; iterating through plot types can surface insights we would not otherwise have thought of. This process is called **visual exploratory data analysis**. Once we're satisfied with the result, the same script can always reproduce the plot, telling the same story even when the underlying data changes.

In this hands-on exercise, we'll be going through a visual exploratory data analysis process drawing out all available insights within our dataset.

### Grammar of Graphics

Bokeh adopts a versatile system in which users can alter graphic components layer by layer, with a high level of modularity. On top of that, its plots are rendered as HTML, so they can be presented in any standard browser.
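
As a minimal sketch of both ideas, the snippet below stacks two glyph layers on one figure and then renders the result as an HTML string. The `x` and `y` values are made up for illustration, and `scatter`, `line` and `file_html` are just one possible choice of calls; a reasonably recent Bokeh release is assumed.

```python
from bokeh.embed import file_html
from bokeh.plotting import figure
from bokeh.resources import CDN

# Hypothetical toy values, used only to keep the sketch self-contained
x = [1, 2, 3, 4, 5]
y = [2, 4, 3, 6, 5]

# Start from an empty canvas (axes, grid and toolbar, but no data yet)
p = figure(title="Layered plot")

# Each glyph method adds one layer on top of the canvas
p.line(x, y, line_width=2)
p.scatter(x, y, size=8)

# The figure keeps track of the glyph layers added so far
print(len(p.renderers))

# Render the figure as a standalone HTML document (a plain string)
html = file_html(p, CDN, "Layered plot")
```

The `html` string can be written to a file and opened in any browser, which is what makes Bokeh plots easy to share outside a notebook.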

To understand the grammar of graphics, we'll use one of the sample datasets provided by the Bokeh library:

In [33]:
from bokeh.sampledata.autompg import autompg

autompg.dtypes

mpg       float64
cyl         int64
displ     float64
hp          int64
weight      int64
accel     float64
yr          int64
origin      int64
name       object
dtype: object

To display a Bokeh plot in a notebook, we first need to call the `output_notebook()` function, which loads the JavaScript that Bokeh needs to render plots inside a Jupyter notebook.

In [20]:
from bokeh.io import output_notebook

output_notebook()

Now working with the grammar:

In [43]:
from bokeh.plotting import figure, show

p = figure(plot_height = 200)  # newer Bokeh releases (3.0+) use `height` instead
type(p)

bokeh.plotting.figure.Figure

In [44]:
show(p)

Now we can use figure methods to add graphic layers on top of the plot:

In [48]:
p.circle(x = autompg['hp'], y= autompg['mpg'])
show(p)

Notice how we created a scatter plot that visualizes the relationship between miles per gallon (`mpg`) and horsepower (`hp`), which we passed in as the `y` and `x` parameters of the `circle` method. The figure object provides various other methods you can use to add graphic layers on top of the plot.
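
The same kind of plot can also carry a title and axis labels, and can read its data from a `ColumnDataSource`. The sketch below is one way to do this, not the only one: the `cars` data frame holds made-up illustrative values standing in for the `hp` and `mpg` columns, and `scatter` is used here in place of `circle` as an assumption about what works across Bokeh versions.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Hypothetical toy values standing in for the `hp` and `mpg` columns
cars = pd.DataFrame({"hp": [130, 165, 150, 95, 88],
                     "mpg": [18.0, 15.0, 18.0, 24.0, 27.0]})
source = ColumnDataSource(cars)

# Title and axis labels are set when the figure is created
p = figure(title="Fuel efficiency vs. horsepower",
           x_axis_label="Horsepower",
           y_axis_label="Miles per gallon")

# With a ColumnDataSource, glyph coordinates are given as column names
renderer = p.scatter(x="hp", y="mpg", size=8, source=source)
```

In a notebook, calling `show(p)` after `output_notebook()` would display the labeled plot.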

Now let's try to create a bar plot:

In [55]:
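from bokeh.models import ColumnDataSource

# Assumption: the bars are meant to show the number of cars per model year
yr_count = autompg.groupby('yr').size().reset_index(name = 'count')
source = ColumnDataSource(yr_count)

p = figure(plot_height = 200)

# Reference the columns of `source` by name
p.vbar(x = 'yr', top = 'count', width = 0.9, bottom = 0, source = source)
show(p)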

In [22]:
retail.columns

Index(['InvoiceNo', 'StockCode', 'Description', 'Quantity', 'InvoiceDate',
       'UnitPrice', 'CustomerID', 'Country'],
      dtype='object')

In [23]:
retail['UnitPrice'].

0    2.55
1    3.39
2    2.75
3    3.39
4    3.39
Name: UnitPrice, dtype: float64