## Reference to convert functions from R ggplot to Python bokeh

I hope this guide helps you convert a useful graph developed in R ggplot2 to Python [bokeh](https://bokeh.pydata.org/). Bokeh is a library for creating interactive plots. However, bokeh has limited built-in statistics functionality.   
This guide converts some of the sections in [the ggplot2 reference](https://ggplot2.tidyverse.org/reference/index.html).

### Loading the library
We can import several bokeh modules, much as we would load library(ggplot2) in R. To use statistical functions such as count, sum and mean, we use the pandas library. Both bokeh and pandas are included in Anaconda.
Also, to create plots similar to those in the ggplot2 reference guide, we use the ggplot2 datasets from [scilab](https://forge.scilab.org/index.php/p/rdataset/source/tree/master/csv/ggplot2).

In [1]:
from math import atan, degrees
import pandas as pd
import numpy as np
from bokeh.plotting import figure, show
from bokeh.io import output_notebook
from bokeh.palettes import Spectral6, Blues9
from bokeh.transform import linear_cmap
from bokeh.models import ColumnDataSource, Range1d, LinearColorMapper, ColorBar, BasicTicker

mtcars = pd.read_csv('mtcars.csv', index_col=0)
mpg =  pd.read_csv('mpg.csv', index_col=0)
diamonds =  pd.read_csv('diamonds.csv', index_col=0)
for var in ['x','y','carat','price']:
    diamonds[var] = pd.to_numeric(diamonds[var])
output_notebook()

### Layer:geom
A geom layer in ggplot2 combines data, aesthetic mapping, geometric objects and a position adjustment. In bokeh, a rendering level is similar to a ggplot2 layer, and geometric objects are called glyphs. The options for glyphs are similar to those in ggplot2: 

| ggplot2    | bokeh       |
|-----------|-------------|
| x         | x           |
| y         | y           |
| color     | line_color       |
| alpha     | line_alpha       |
| fill      | fill_color  |

A major difference between ggplot2 and bokeh is that in bokeh, layers are not added with the `+` operator, but by calling a plotting method on the plotting object created with the figure function. This will become clear in the following examples.
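
As a minimal sketch of adding layers by method call, the cell below puts a line layer and a point layer on one figure. The `cars` table holds toy values made up for illustration (not the real mtcars data), and the colors are arbitrary choices.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Toy data for illustration only (not the real mtcars values)
cars = pd.DataFrame({"wt": [2.0, 2.5, 3.0, 3.5],
                     "mpg": [30.0, 25.0, 21.0, 17.0]})
source = ColumnDataSource(cars)

# ggplot2: ggplot(cars, aes(wt, mpg)) + geom_line() + geom_point()
p = figure(title="Two layers on one figure")

# Each glyph method adds one layer and returns its renderer
line_layer = p.line(x="wt", y="mpg", source=source,
                    line_color="gray", line_alpha=0.6)
point_layer = p.scatter(x="wt", y="mpg", source=source,
                        line_color="navy", fill_color="orange")
```

Calling `show(p)` afterwards renders both layers, in the order in which they were added.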

In [2]:
#p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
def create_p():
    p = figure(title="My plot", plot_width=800, plot_height=400)
    p.circle(x='wt',y='mpg',source=mtcars)
    return p

#### Layer: geoms


#### geom_abline() geom_hline() geom_vline() 
Reference lines: horizontal, vertical, and diagonal. Unfortunately, this functionality is not directly accessible in bokeh. geom_vline() and geom_hline() can be replicated with the ray glyph and special options. For diagonal lines, a ray keeps its angle even after the axis limits change, so it stops following the data; geom_abline() should be replicated with the segment function instead.
Notes: The default angle_units are "rad" but can be changed to "deg". To get an “infinite” ray that always extends to the edge of the plot, specify 0 for the length.
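
Because a segment needs explicit endpoints, it helps to compute both ends from the same intercept and slope. The helper below is a small sketch; `abline_segment` is a hypothetical name, not part of bokeh.

```python
def abline_segment(intercept, slope, x_start, x_end):
    """Return (x0, y0, x1, y1) for the line y = intercept + slope * x."""
    y_start = intercept + slope * x_start
    y_end = intercept + slope * x_end
    return x_start, y_start, x_end, y_end

# Endpoints for geom_abline(intercept = 37, slope = -5) drawn from x = -1 to x = 10
x0, y0, x1, y1 = abline_segment(intercept=37, slope=-5, x_start=-1, x_end=10)
print(x0, y0, x1, y1)  # -1 42 10 -13
```

The result can be passed straight to the glyph, for example `p.segment(x0=x0, y0=y0, x1=x1, y1=y1)`. Choose `x_start` and `x_end` wide enough to cover the visible x range.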

In [3]:
p = create_p()
#p + geom_vline(xintercept = 5)
p.ray(x=5, y=0, length=0, angle=90, angle_units="deg")
show(p)

In [4]:
p = create_p()
#p + geom_vline(xintercept = 1:5)
p.ray(x=list(range(1,6)), y=[0]*5, length=0, angle=90, angle_units="deg")
show(p)

In [5]:
p = create_p()
#p + geom_hline(yintercept = 20)
p.ray(x=0, y=20, length=0, angle=0, angle_units="deg")
show(p)

In [6]:
p = create_p()
#p + geom_abline(intercept = 37, slope = -5)
def abline(xx, intercept, slope): return intercept + slope*xx
# Evaluate the line at the same x values that are used as the segment endpoints
p.segment(x0 = -1, x1 = 10, y0 = abline(-1,37,-5), y1 = abline(10,37,-5))
show(p)

#### geom_bar() geom_col() stat_count()
ggplot2 excels at plotting stats. For example, geom_bar() uses stat_count() by default while geom_col() uses stat_identity().
In bokeh, stats functions are not included yet, so the data is aggregated with pandas prior to plotting.  
Note: The bar `y` variable in ggplot2 is called `top` in bokeh. The bottom and width of the bars must also be specified. 
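
The aggregation step can be sketched with pandas alone. The `cars` table below holds toy values for illustration (not the real mpg dataset). The first result plays the role of stat_count(), the second the role of `aes(weight = displ)`.

```python
import pandas as pd

# Toy data for illustration only (not the real mpg dataset)
cars = pd.DataFrame({
    "class": ["suv", "compact", "suv", "midsize", "compact", "suv"],
    "displ": [4.0, 1.8, 5.3, 2.4, 2.0, 4.7],
})

# stat_count(): number of rows per class
counts = cars.groupby("class").size().reset_index(name="counts")

# aes(weight = displ): sum of displ per class instead of a row count
weighted = cars.groupby("class")["displ"].sum().reset_index()

print(counts)
print(weighted)
```

Either table can then be passed as `source` to `vbar`, with `top` set to the aggregated column.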

In [7]:
#g <- ggplot(mpg, aes(class))
mpg['counts'] = 0
mpg_plot = mpg[['class','counts']].groupby(by='class',as_index=False).count()
def create_gv():
    g = figure(title="My plot", plot_width=800, plot_height=400, x_range = mpg_plot['class'])
    return g
def create_gh():
    g = figure(title="My plot", plot_width=800, plot_height=400, y_range = mpg_plot['class'])
    return g


In [8]:
g = create_gv()
#g + geom_bar()
g.vbar(x='class', top='counts', width=0.9, bottom=0, source = mpg_plot)
show(g)

In [9]:
g = create_gv()
#g + geom_bar(aes(weight = displ))
mpg_plot = mpg[['class','displ']].groupby(by='class',as_index=False).sum()
g.vbar(x='class', top='displ', width=0.9, bottom=0, source = mpg_plot)
show(g)

In [10]:
from bokeh.core.properties import value
mycolors = [Spectral6[0],Spectral6[1],Spectral6[5]]
mpg_plot = mpg[['class','drv','counts']].groupby(by=['class','drv']).count().unstack(fill_value=0)
mylegend = [colname[1] for colname in mpg_plot.columns]
mpg_plot.columns = mylegend
mpg_plot['class'] = mpg_plot.index
g = create_gv()
#g + geom_bar(aes(fill = drv))
g.vbar_stack(mylegend, x='class', width=0.9, source = mpg_plot,
             color = mycolors,
             legend = [value(colname) for colname in mylegend])
g.legend.location = "top_left"
show(g)

In [11]:
g = create_gh()
#g +
# geom_bar(aes(fill = drv), position = position_stack(reverse = TRUE)) +
# coord_flip() +
# theme(legend.position = "top")
g.hbar_stack(mylegend, y='class', height=0.9, source = mpg_plot,
             color = mycolors,
             legend = [value(colname) for colname in mylegend])
g.legend.location = "bottom_right"
show(g)

#### geom_bin2d() stat_bin_2d() 
Again, ggplot2 excels at plotting stats. geom_bin2d() divides the plane into rectangles and counts the number of cases in each rectangle. In bokeh, stats functions are not included yet, so the counts are computed with numpy and reshaped with pandas prior to plotting.  
Note: Fortunately, `hexbin` is available! 
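
The counting step can be wrapped in a helper so that only the number of bins changes between plots. This is a sketch under stated assumptions: `bin2d_counts` is a hypothetical name, the input points are toy values, and the helper returns bin centres because bokeh's `rect` glyph is positioned by its centre.

```python
import numpy as np
import pandas as pd

def bin2d_counts(x, y, bins, x_range, y_range):
    """Count points per rectangular bin and return the non-empty bins in long form."""
    H, xedges, yedges = np.histogram2d(x, y, bins=bins, range=[x_range, y_range])
    bw = xedges[1] - xedges[0]   # bin width
    bh = yedges[1] - yedges[0]   # bin height
    xc = xedges[:-1] + bw / 2    # bin centres along x
    yc = yedges[:-1] + bh / 2    # bin centres along y
    # H[i, j] is the count for x bin i and y bin j, so x varies slowest
    table = pd.DataFrame({
        "x": np.repeat(xc, len(yc)),
        "y": np.tile(yc, len(xc)),
        "counts": H.ravel(),
    })
    nonempty = table[table["counts"] > 0].reset_index(drop=True)
    return nonempty, bw, bh

# Toy points for illustration only
binned, bw, bh = bin2d_counts([0.1, 0.1, 0.9], [0.1, 0.1, 0.9],
                              bins=2, x_range=(0, 1), y_range=(0, 1))
print(binned)
```

The returned table can be used as `source` for `rect`, with `width=bw` and `height=bh`.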

In [12]:
#d <- ggplot(diamonds, aes(x, y)) + xlim(4, 10) + ylim(4, 10)
x_range = (4, 10); y_range = (4, 10)
def create_d():
    d = figure(title="My plot", plot_width=800, plot_height=400,
              x_range = x_range, y_range = y_range)
    return d

In [13]:
d = create_d()
#Prepare 2d count data
H, xedges, yedges = np.histogram2d(diamonds.x, diamonds.y, bins=36, range = [x_range, y_range])
bw = xedges[1]-xedges[0]; bh = yedges[1]-yedges[0]
# rect() is positioned by its centre, so shift the bin edges by half a bin
diamonds_plot = pd.DataFrame(data=H, index=xedges[:-1]+bw/2, columns=yedges[:-1]+bh/2).stack().reset_index()
diamonds_plot.columns = ['x','y','counts']
diamonds_plot.drop(diamonds_plot[diamonds_plot['counts']==0].index, inplace = True)
mapper = LinearColorMapper(palette=Blues9, low=diamonds_plot['counts'].min(), high=diamonds_plot['counts'].max())

#d + geom_bin2d()
d.rect(x="x", y="y", width=bw, height=bh,
       source=diamonds_plot,
       fill_color={'field': 'counts', 'transform': mapper},
       line_color=None)

color_bar = ColorBar(color_mapper=mapper, major_label_text_font_size="10pt",
                     ticker=BasicTicker(desired_num_ticks=5),
                     label_standoff=12, border_line_color=None, location=(0, 0))
d.add_layout(color_bar, 'right')

show(d)

In [14]:
#d + geom_bin2d(bins = 10)
d = create_d()
#Prepare 2d count data
H, xedges, yedges = np.histogram2d(diamonds.x, diamonds.y, bins=10, range = [x_range, y_range])
bw = xedges[1]-xedges[0]; bh = yedges[1]-yedges[0]
# rect() is positioned by its centre, so shift the bin edges by half a bin
diamonds_plot = pd.DataFrame(data=H, index=xedges[:-1]+bw/2, columns=yedges[:-1]+bh/2).stack().reset_index()
diamonds_plot.columns = ['x','y','counts']
diamonds_plot.drop(diamonds_plot[diamonds_plot['counts']==0].index, inplace = True)
mapper = LinearColorMapper(palette=Blues9, low=diamonds_plot['counts'].min(), high=diamonds_plot['counts'].max())

#d + geom_bin2d(bins = 10)
d.rect(x="x", y="y", width=bw, height=bh,
       source=diamonds_plot,
       fill_color={'field': 'counts', 'transform': mapper},
       line_color=None)

color_bar = ColorBar(color_mapper=mapper, major_label_text_font_size="10pt",
                     ticker=BasicTicker(desired_num_ticks=5),
                     label_standoff=12, border_line_color=None, location=(0, 0))
d.add_layout(color_bar, 'right')

show(d)

In [15]:
d = figure(title="My plot", plot_width=400, plot_height=400)
#d <- ggplot(diamonds, aes(carat, price)); d + geom_hex()
d.hexbin(x=diamonds['carat'], y=diamonds['price'], size=0.5)
show(d)

In [16]:
import numpy as np
from bokeh.transform import linear_cmap
from bokeh.util.hex import hexbin

n = 50000
x = np.random.standard_normal(n)
y = np.random.standard_normal(n)

bins = hexbin(x, y, 0.1)

p = figure(title="Manual hex bin for 50000 points", tools="wheel_zoom,pan,reset",
           match_aspect=True, background_fill_color='#ffffff')
p.grid.visible = False

p.hex_tile(q="q", r="r", size=0.1, line_color=None, source=bins,
           fill_color=linear_cmap('counts', 'Viridis256', 0, max(bins.counts)))


show(p)

geom_abline() geom_hline() geom_vline() 
Reference lines: horizontal, vertical, and diagonal


geom_bar() geom_col() stat_count() 
Bar charts

geom_bin2d() stat_bin_2d() 
Heatmap of 2d bin counts

geom_blank() 
Draw nothing

geom_boxplot() stat_boxplot() 
A box and whiskers plot (in the style of Tukey)

geom_contour() stat_contour() 
2d contours of a 3d surface

geom_count() stat_sum() 
Count overlapping points

geom_density() stat_density() 
Smoothed density estimates

geom_density_2d() stat_density_2d() 
Contours of a 2d density estimate

geom_dotplot() 
Dot plot

geom_errorbarh() 
Horizontal error bars

geom_hex() stat_bin_hex() 
Hexagonal heatmap of 2d bin counts

geom_freqpoly() geom_histogram() stat_bin() 
Histograms and frequency polygons

geom_jitter() 
Jittered points

geom_crossbar() geom_errorbar() geom_linerange() geom_pointrange() 
Vertical intervals: lines, crossbars & errorbars

geom_map() 
Polygons from a reference map

geom_path() geom_line() geom_step() 
Connect observations

geom_point() 
Points

geom_polygon() 
Polygons

geom_qq_line() stat_qq_line() geom_qq() stat_qq() 
A quantile-quantile plot

geom_quantile() stat_quantile() 
Quantile regression

geom_ribbon() geom_area() 
Ribbons and area plots

geom_rug() 
Rug plots in the margins

geom_segment() geom_curve() 
Line segments and curves

geom_smooth() stat_smooth() 
Smoothed conditional means

geom_spoke() 
Line segments parameterised by location, direction and distance

geom_label() geom_text() 
Text

geom_raster() geom_rect() geom_tile() 
Rectangles

geom_violin() stat_ydensity() 
Violin plot

stat_sf() geom_sf() geom_sf_label() geom_sf_text() coord_sf() 
Visualise sf objects