## Explanation of Bokeh Packages
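The Bokeh library provides the following functions and classes, which we use throughout this notebook:
* output_file: saves the figure to a file with the .html extension
* show: displays the figure
* figure: creates an empty figure
* ColumnDataSource: the data source object that Bokeh glyphs read from
* HoverTool: shows information about the data point under the cursor
* row and column: arrange plots side by side (row) or stacked vertically (column)
* gridplot: arranges plots in a grid
* Tabs and Panel: a Panel wraps one plot, and Tabs lets you switch between Panels like buttons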
    


In [None]:
!pip install bokeh



In [None]:

import numpy as np 
import pandas as pd

def get_path(dataset_name,env_name='colab'):
    """
    This function is used to return the path of the dataset you want to use. 
    
    @params:
    dataset_name: the name of the dataset. 
    env_name: it has two values either local, or colab the default is colab
    """
    prefix = 'https://raw.githubusercontent.com/mohamed-ashry7/Data-Engineering-Lab/main/Datasets/'
    if env_name == 'colab':
        return prefix+dataset_name
    else:
        return f'../Datasets/{dataset_name}'

In [None]:
# bokeh packages
from bokeh.io import output_file,show,output_notebook,push_notebook
from bokeh.plotting import figure
from bokeh.models import ColumnDataSource,HoverTool,BoxSelectTool
from bokeh.layouts import row,column,gridplot
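try:
    from bokeh.models.widgets import Tabs,Panel
except ImportError:  # Bokeh 3.x renamed Panel to TabPanel
    from bokeh.models import Tabs, TabPanel as Panel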
output_notebook()

## Plotting with Glyphs
* Glyphs: visual shapes such as circles, squares, rectangles or diamonds
* figure: creates a figure
    * x_axis_label: label of the x axis
    * y_axis_label: label of the y axis
    * tools: tools for moving around or zooming the plot
        * pan: drags the plot around
        * box_zoom: zooms in on a selected rectangle
* output_file: saves the figure to a file with the .html extension
* show: displays the figure
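
As a minimal sketch of the options listed above, a figure with labelled axes and a restricted toolbar might be built as follows. The data values and the `build_plot` helper name are illustrative and not part of the original notebook.

```python
from bokeh.plotting import figure
from bokeh.models import PanTool, BoxZoomTool

def build_plot():
    """Return a figure with axis labels and only the pan and box-zoom tools."""
    p = figure(x_axis_label="x", y_axis_label="y", tools="pan,box_zoom")
    # Square marker glyphs at five illustrative points
    p.scatter(x=[1, 2, 3, 4, 5], y=[2, 4, 1, 5, 3], marker="square", size=12)
    return p

plot = build_plot()
# show(plot) would render it; if output_file("name.html") was called first,
# the result is also saved to that HTML file.
```

Passing `tools` as a comma-separated string replaces the default toolbar, so only the listed tools appear.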


# Line Graph

* line: line plot
    * line_width: width of the line
* circle: circle markers drawn at each data point
    * fill_color: color that fills the inside of each circle

In [None]:
plot = figure(x_axis_label = "x",y_axis_label = "y")
plot.line(x=[4,5,2,3,1],y=[1,2,3,4,5],line_color = "black",legend_label='Line',line_width=5)
plot.circle(x=[4,5,2,3,1],y=[1,2,3,4,5],fill_color = "yellow",size = 10,alpha = 0.7,legend_label='Line')
output_file("my_first_bokeh_plot.html")
show(plot)


In [None]:
google = pd.read_csv(get_path('GOOGL_data.csv'))
fb     = pd.read_csv(get_path('FB_data.csv'))
apple = pd.read_csv(get_path('AAPL_data.csv'))
amazon = pd.read_csv(get_path('AMZN_data.csv'))
microsoft = pd.read_csv(get_path('MSFT_data.csv'))
google.head()

Unnamed: 0,date,open,high,low,close,volume,Name
0,2013-02-08,390.4551,393.7283,390.1698,393.0777,6031199,GOOGL
1,2013-02-11,389.5892,391.8915,387.2619,391.6012,4330781,GOOGL
2,2013-02-12,391.2659,394.344,390.0747,390.7403,3714176,GOOGL
3,2013-02-13,390.4551,393.0677,390.375,391.8214,2393946,GOOGL
4,2013-02-14,390.2549,394.7644,389.2739,394.3039,3466971,GOOGL


In [None]:
f = figure()
# Use integer positions on the x axis, one per trading day
positions = np.arange(len(google.close))
f.line(positions,google.close,legend_label='Google')
f.line(positions,fb.close,legend_label='Facebook')
f.line(positions,amazon.close,legend_label='Amazon')
f.line(positions,microsoft.close,legend_label='Microsoft')
f.line(positions,apple.close,legend_label='Apple')
# Replace the integer tick labels with the matching dates
f.xaxis.major_label_overrides = {i:date for (i,date) in enumerate(google.date)}
show(f)
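
The cell above relabels integer positions with date strings. An alternative sketch, assuming the `date` column parses cleanly with `pd.to_datetime`, is to let Bokeh draw a datetime axis through `x_axis_type="datetime"`. The three rows below are copied from the `google.head()` output; `prices` and `p` are illustrative names.

```python
import pandas as pd
from bokeh.plotting import figure
from bokeh.models import DatetimeAxis

# Three rows taken from the google.head() output
prices = pd.DataFrame({
    "date": ["2013-02-08", "2013-02-11", "2013-02-12"],
    "close": [393.0777, 391.6012, 390.7403],
})
prices["date"] = pd.to_datetime(prices["date"])

# x_axis_type="datetime" formats the tick labels as dates
p = figure(x_axis_type="datetime", x_axis_label="date", y_axis_label="close")
p.line(prices["date"], prices["close"], legend_label="Google")
```

Unlike integer positions, a datetime axis keeps the real spacing between dates, so days without trading data appear as gaps along the line.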

# Scatter Graphs

* circle: like scatter in matplotlib
    * size: size of circles
    * color: color
    * alpha: opacity
* Other markers: 
    * asterisk() 
    * circle() 
    * circle_cross() 
    * circle_x() 
    * cross() 
    * diamond() 
    * diamond_cross() 
    * inverted_triangle() 
    * square() 
    * square_cross() 
    * square_x() 
    * triangle() 
    * x()
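
The marker methods listed above can also be reached through the generic `scatter` method by passing a `marker` name, which, to the best of my knowledge, is the form newer Bokeh releases prefer. A small sketch with made-up points:

```python
from bokeh.plotting import figure

xs = [1, 2, 3, 4, 5]
p = figure(x_axis_label="x", y_axis_label="y")

# One renderer per marker type; the y offsets only keep the rows apart
markers = [("asterisk", "black"), ("diamond", "red"), ("triangle", "blue")]
for offset, (marker, color) in enumerate(markers):
    ys = [y + offset for y in xs]
    p.scatter(xs, ys, marker=marker, color=color, size=10, alpha=0.7)
```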

In [None]:
plot = figure(x_axis_label = "x",y_axis_label = "y",tools = "pan,box_zoom")
plot.add_tools(BoxSelectTool())
plot.asterisk(x=[5,4,3,2,1],y=[1,2,3,4,5],color = "black",size = 10,alpha = 0.7)
show(plot)


In [None]:
# There are other types of glyphs
plot = figure()
plot.diamond(x=[5,4,3,2,1],y=[1,2,3,4,5],size = 10,color = "black",alpha = 0.7)
plot.cross(x=[1,2,3,4,5],y=[1,2,3,4,5],size = 10,color = "red",alpha = 0.7)
show(plot)

In [None]:
# line
plot = figure()
plot.line(x=[1,2,3,4,5,6,7],y = [1,2,3,4,5,5,5],line_width = 2)
plot.circle(x=[1,2,3,4,5,6,7],y = [1,2,3,4,5,5,5],fill_color = "white",size = 10)
show(plot)

# Bar Charts

In [None]:
movie_scores = pd.read_csv(get_path('movie_scores.csv'))
movie_scores

Unnamed: 0.1,Unnamed: 0,MovieTitle,Tomatometer,AudienceScore
0,0,The Shape of Water,91,73
1,1,Black Panther,97,79
2,2,Dunkirk,92,81
3,3,The Martian,91,91
4,4,The Hobbit: An Unexpected Journey,64,83


In [None]:
f = figure(tools=['box_select'])
x = np.arange(1,len(movie_scores.Tomatometer)+1)
width = 0.2
f.vbar(x=x-width/2, top= movie_scores.Tomatometer,bottom=0 , width=width,color='Green')
f.vbar(x=x+width/2, top= movie_scores.AudienceScore,bottom=0 , width=width)
f.xaxis.major_label_overrides = {x+1:y for (x,y) in enumerate(movie_scores.MovieTitle.values)}
show(f)
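
The cell above places the bars at numeric positions and then overrides the tick labels. A sketch of another approach, assuming a categorical x axis is acceptable, uses the movie titles as the `x_range` and `dodge` from `bokeh.transform` to offset the two series. The three rows are copied from the `movie_scores` output; `movie_source` and `p` are illustrative names.

```python
from bokeh.plotting import figure
from bokeh.models import ColumnDataSource
from bokeh.transform import dodge

# Three rows copied from the movie_scores output
data = {
    "MovieTitle": ["The Shape of Water", "Black Panther", "Dunkirk"],
    "Tomatometer": [91, 97, 92],
    "AudienceScore": [73, 79, 81],
}
movie_source = ColumnDataSource(data=data)

# A list of strings as x_range makes the x axis categorical
p = figure(x_range=data["MovieTitle"], y_axis_label="score")

# dodge shifts each bar series sideways within its category
p.vbar(x=dodge("MovieTitle", -0.15, range=p.x_range), top="Tomatometer",
       width=0.25, source=movie_source, color="green", legend_label="Tomatometer")
p.vbar(x=dodge("MovieTitle", 0.15, range=p.x_range), top="AudienceScore",
       width=0.25, source=movie_source, legend_label="AudienceScore")
```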

## Data Formats
Bokeh accepts Python lists, NumPy arrays and pandas objects as data sources. This tutorial uses pandas DataFrames.
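
A short sketch of the three input styles follows. The numbers are the first three `open` and `close` values from the `google.head()` output; `df_source` is an illustrative name.

```python
import numpy as np
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

p = figure()
p.line([1, 2, 3], [4, 5, 6])                       # plain Python lists
p.line(np.array([1, 2, 3]), np.array([6, 5, 4]))   # NumPy arrays

# A DataFrame is wrapped in a ColumnDataSource; glyphs then refer to columns by name
df = pd.DataFrame({
    "open": [390.4551, 389.5892, 391.2659],
    "close": [393.0777, 391.6012, 390.7403],
})
df_source = ColumnDataSource(df)
p.scatter(x="open", y="close", source=df_source, size=8)
```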

## Customizing Glyphs
* Selection appearance: when you select some points, they are highlighted and the others are dimmed
    * tools:
        * box_select and lasso_select: selection tools
    * selection_color: color applied to the selected points
    * nonselection_fill_alpha: fill opacity applied to the points that are not selected
    * nonselection_fill_color: fill color applied to the points that are not selected
* HoverTool: reacts to the point under the cursor
    * crosshair: draws horizontal and vertical lines through the cursor position
    * hover_color: color of a point while the cursor is over it
* Color mapping: colors the glyphs by the values of a chosen field (like hue in seaborn)
    * factors: the category values to map to colors
    * palette: the colors assigned to those factors
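
The color-mapping entry above can be sketched with `CategoricalColorMapper`, which takes exactly the `factors` and `palette` arguments described. The points are made up, and the two ticker names are borrowed from the stock files used in this notebook.

```python
from bokeh.plotting import figure
from bokeh.models import ColumnDataSource, CategoricalColorMapper

# Made-up points tagged with two ticker names
points = ColumnDataSource(data=dict(
    x=[1, 2, 3, 4],
    y=[4, 3, 2, 1],
    Name=["GOOGL", "AAPL", "GOOGL", "AAPL"],
))

# factors are the category values; palette gives one color per factor, in order
mapper = CategoricalColorMapper(factors=["GOOGL", "AAPL"], palette=["red", "blue"])

p = figure()
p.scatter(x="x", y="y", source=points, size=12,
          color={"field": "Name", "transform": mapper})
```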



In [None]:
x = np.linspace(0, 10, 1000)
y = np.sin(x) + np.random.random(1000) * 0.2
plot = figure()
plot.line(x, y)
show(plot)

In [None]:
hover_tool = HoverTool()
f = figure(tools=[hover_tool,'crosshair'])
f.line(x,y)
show(f)

In [None]:
#Hover appearance
from bokeh.models import HoverTool
hover = HoverTool(tooltips=None)
plot = figure(tools=[hover, 'crosshair','box_select','lasso_select'])
# x and y are the noisy sine data created above
plot.circle(x, y, size=15, hover_color='green',selection_color = "orange")
show(plot)

In [None]:
# Selection appearance
source= ColumnDataSource(google)
plot = figure(tools="box_select,lasso_select")
plot.circle(x="close",y = "open",source=source,color = "black",
            selection_color = "orange",
            nonselection_fill_alpha = 0.2,
           nonselection_fill_color = "blue")
show(plot)

In [None]:
# Hover appearance
hover = HoverTool(tooltips = [("The closing price","@close"),("The opening price","@open")], mode="hline")
plot = figure(tools=[hover,"crosshair"])
plot.circle(x= "close",y = "open",source=source,color ="black",hover_color ="red")
show(plot)

## Layouts
Arranging multiple plots, like subplots in matplotlib.
* row and column: arrange plots side by side (row) or stacked vertically (column)
* Grid arrangement: a nested list of plots, one inner list per grid row
    * toolbar_location: position of the toolbar: "below", "above", "left", "right" or None
* Tabbed layout
    * Panel: wraps one plot, like a figure
    * Tabs: lets you switch between panels, like buttons


In [None]:
# Row and column
p1 = figure()
p1.circle(x = "close",y= "open",source = source,color="red")
p2 = figure()
p2.circle(x = "open",y= "high",source = source,color="black")
p3 = figure()
p3.circle(x = "close",y= "low",source = source,color="blue")
p4 = figure()
p4.circle(x = "open",y= "volume",source = source,color="orange")
layout1 = row(p1,p2)
layout2 = row(p3,p4)
layout3= column(layout1,layout2)
show(layout3)

In [None]:
#nested
# Reuses p1, p2 and p3 created above
layout = row(column(p1,p2),p3)
show(layout)

In [None]:
# Grid plot 
layout = gridplot([[p1,p2],[p3,None]],toolbar_location="above")
show(layout)

In [None]:
#Tabbed layout
# Reuses p1 and p2 created above
tab1 = Panel(child = p1,title = "close vs. open")
tab2 = Panel(child = p2,title = "open vs. high")
tabs = Tabs(tabs=[tab1,tab2])
show(tabs)
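
The tabbed layout above imports `Panel`, which, as far as I am aware, was renamed `TabPanel` in Bokeh 3.0. The sketch below is one way to write the same layout so that it runs on either side of that rename; the figures and titles are illustrative.

```python
from bokeh.plotting import figure
from bokeh.models import Tabs

try:
    from bokeh.models import TabPanel            # name in Bokeh 3.x
except ImportError:
    from bokeh.models import Panel as TabPanel   # name in older releases

left = figure(title="first")
left.line([1, 2, 3], [1, 4, 9])
right = figure(title="second")
right.line([1, 2, 3], [9, 4, 1])

# Each panel wraps one figure; Tabs switches between them
tabs = Tabs(tabs=[
    TabPanel(child=left, title="first"),
    TabPanel(child=right, title="second"),
])
```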

 ## ***Exercises***


Ex1:

Solve exercise 3.1 from the last notebook using Bokeh, and add the hover and crosshair tools to the existing default tools.


In [None]:
df_titanic = pd.read_csv(get_path("titanic.csv"),index_col=0)
df_titanic.head()

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


In [None]:
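from bokeh.models import HoverTool, CrosshairTool

# For each age: number of survivors and mean fare paid
ages_group = df_titanic.groupby('Age')['Survived'].sum()
fare_group = df_titanic.groupby('Age')['Fare'].mean()

ages = ages_group.index
survivors = ages_group.values
fares = fare_group.values

# Keep the real mean fare in its own column so the tooltip does not show the scaled marker size
titanic_source = ColumnDataSource(data=dict(x=ages.values, y=survivors, fare=fares, size=fares/6))

hover_tool = HoverTool(tooltips=[("Age","@x"),("Survivors","@y"),("Mean Fare","@fare")])
cross_tool = CrosshairTool()

# figure() keeps the default tools; add_tools appends to them
f = figure()
f.add_tools(hover_tool)
f.add_tools(cross_tool)
f.scatter(x='x', y='y', size='size', source=titanic_source)

show(f)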

In [None]:
# Here we color the markers by the number of survivors

from bokeh.models import LinearColorMapper

# Separate fare column so the tooltip shows the mean fare, not the scaled marker size
color_source = ColumnDataSource(data=dict(x=ages.values, y=survivors, fare=fares, size=fares/6))

hover_tool = HoverTool(tooltips=[("Age","@x"),("Survivors","@y"),("Mean Fare","@fare")])
cross_tool = CrosshairTool()
# Magma256 is one of Bokeh's built-in palettes; see the Bokeh palettes reference for others
color_mapper = LinearColorMapper(palette='Magma256', low=min(survivors), high=max(survivors))

f = figure()
f.add_tools(hover_tool)
f.add_tools(cross_tool)
f.scatter(x='x', y='y', size='size', source=color_source,
          color={'field':'y','transform': color_mapper}, alpha=0.7)
show(f)

Ex2 :

Solve exercise 3.4, but plot the top ten and the bottom ten nationalities using a Bokeh layout, and add the hover and box select tools to the existing default tools.

In [None]:
fifa_df= pd.read_csv(get_path('fifa.csv'))
fifa_df.head()

Unnamed: 0.1,Unnamed: 0,ID,Name,Age,Photo,Nationality,Flag,Overall,Potential,Club,...,Composure,Marking,StandingTackle,SlidingTackle,GKDiving,GKHandling,GKKicking,GKPositioning,GKReflexes,Release Clause
0,0,158023,L. Messi,31,https://cdn.sofifa.org/players/4/19/158023.png,Argentina,https://cdn.sofifa.org/flags/52.png,94,94,FC Barcelona,...,96.0,33.0,28.0,26.0,6.0,11.0,15.0,14.0,8.0,€226.5M
1,1,20801,Cristiano Ronaldo,33,https://cdn.sofifa.org/players/4/19/20801.png,Portugal,https://cdn.sofifa.org/flags/38.png,94,94,Juventus,...,95.0,28.0,31.0,23.0,7.0,11.0,15.0,14.0,11.0,€127.1M
2,2,190871,Neymar Jr,26,https://cdn.sofifa.org/players/4/19/190871.png,Brazil,https://cdn.sofifa.org/flags/54.png,92,93,Paris Saint-Germain,...,94.0,27.0,24.0,33.0,9.0,9.0,15.0,15.0,11.0,€228.1M
3,3,193080,De Gea,27,https://cdn.sofifa.org/players/4/19/193080.png,Spain,https://cdn.sofifa.org/flags/45.png,91,93,Manchester United,...,68.0,15.0,21.0,13.0,90.0,85.0,87.0,88.0,94.0,€138.6M
4,4,192985,K. De Bruyne,27,https://cdn.sofifa.org/players/4/19/192985.png,Belgium,https://cdn.sofifa.org/flags/7.png,91,92,Manchester City,...,88.0,68.0,58.0,51.0,15.0,13.0,5.0,10.0,13.0,€196.4M


In [None]:
top_groups = fifa_df.groupby('Nationality')['Overall'].sum().sort_values()[-10:]
bottom_groups = fifa_df.groupby('Nationality')['Overall'].sum().sort_values()[:10]
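
The two slices above can also be written with the pandas `nlargest` and `nsmallest` methods. This is only a sketch on a six-row Series whose totals are copied from the outputs in this section; `totals`, `top` and `bottom` are illustrative names.

```python
import pandas as pd

# Totals copied from the top_groups and bottom_groups outputs
totals = pd.Series(
    {"England": 105420, "Germany": 79172, "Spain": 74717,
     "Botswana": 56, "Belize": 60, "Malta": 61},
    name="Overall",
)

top = totals.nlargest(2)      # largest first
bottom = totals.nsmallest(2)  # smallest first
```

Note that `nlargest` returns the values in descending order, whereas `sort_values()[-10:]` keeps them ascending, so the bar order in a plot would be reversed.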

In [None]:
top_groups

Nationality
Japan           29933
Netherlands     30647
Colombia        40526
Italy           47847
Brazil          58925
France          61968
Argentina       64252
Spain           74717
Germany         79172
England        105420
Name: Overall, dtype: int64

In [None]:
bottom_groups

Nationality
Botswana       56
Indonesia      56
South Sudan    60
Belize         60
Malta          61
Andorra        62
Puerto Rico    63
Rwanda         63
Grenada        63
Jordan         63
Name: Overall, dtype: int64

In [None]:
from bokeh.models import BoxSelectTool
# The top teams figure
top_source = ColumnDataSource(pd.DataFrame(top_groups))

top_figure = figure(x_range = list(top_groups.index))
# With a ColumnDataSource, tooltips refer to columns by name (@Nationality, @Overall) rather than @x or @y
hover_tool = HoverTool(tooltips=[("Nationality","@Nationality") , ("Overall Sum","@Overall")])
box_tool = BoxSelectTool()
top_figure.add_tools(hover_tool)
top_figure.add_tools(box_tool)

top_figure.vbar(
    x='Nationality',
    top='Overall',
    source= top_source,
    width = 0.5,
    bottom=0,
    color='Green',
    selection_color = "orange", # These three parameters style selected and unselected bars (box select tool)
    nonselection_fill_alpha = 0.4,
    nonselection_fill_color = "blue"
)
show(top_figure)

In [None]:
# We can customize it further by mapping the bar colors to the values

top_source = ColumnDataSource(pd.DataFrame(top_groups))

top_figure = figure(x_range = list(top_groups.index))
# With a ColumnDataSource, tooltips refer to columns by name (@Nationality, @Overall) rather than @x or @y
hover_tool = HoverTool(tooltips=[("Nationality","@Nationality") , ("Overall Sum","@Overall")])
box_tool = BoxSelectTool()
top_figure.add_tools(hover_tool)
top_figure.add_tools(box_tool)
color_mapper = LinearColorMapper(palette='Inferno256', low=max(top_groups.values), high=min(top_groups.values))

top_figure.vbar(
    x='Nationality',
    top='Overall',
    source= top_source,
    width = 0.5,
    bottom=0,
    selection_color = "orange", # These three parameters style selected and unselected bars (box select tool)
    nonselection_fill_alpha = 0.4,
    nonselection_fill_color = "blue",
    color={'field':'Overall','transform': color_mapper} # Map the bar color to the Overall value
)
show(top_figure)

In [None]:
# The figure for the bottom groups
bottom_source = ColumnDataSource(pd.DataFrame(bottom_groups))

bottom_figure = figure(x_range = list(bottom_groups.index))
# With a ColumnDataSource, tooltips refer to columns by name (@Nationality, @Overall) rather than @x or @y
hover_tool = HoverTool(tooltips=[("Nationality","@Nationality") , ("Overall Sum","@Overall")])
box_tool = BoxSelectTool()
bottom_figure.add_tools(hover_tool)
bottom_figure.add_tools(box_tool)
color_mapper = LinearColorMapper(palette='Inferno256', low=max(bottom_groups.values), high=min(bottom_groups.values))

bottom_figure.vbar(
    x='Nationality',
    top='Overall',
    source= bottom_source,
    width = 0.5,
    bottom=0,
    selection_color = "orange", # These three parameters style selected and unselected bars (box select tool)
    nonselection_fill_alpha = 0.4,
    nonselection_fill_color = "blue",
    color={'field':'Overall','transform': color_mapper} # Map the bar color to the Overall value
)
show(bottom_figure)

In [None]:
# Finally, put both figures in a tabbed layout
tab1 = Panel(child = top_figure,title = "Top Teams")
tab2 = Panel(child = bottom_figure,title = "Bottom Teams")
tabs = Tabs(tabs=[tab1,tab2])
show(tabs)