## Data Visualization using Bokeh

In [1]:
# Standard imports
import numpy as np
from bokeh.io import output_notebook, show
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure, output_file

# Render plots inline in the notebook
output_notebook()

In [2]:
import bokeh.sampledata
bokeh.sampledata.download()

Using data directory: /Users/harichandanachintha/.bokeh/data
Downloading: CGM.csv (1589982 bytes)
   1589982 [100.00%]
Downloading: US_Counties.zip (3182088 bytes)
   3182088 [100.00%]
Unpacking: US_Counties.csv
Downloading: us_cities.json (713565 bytes)
    713565 [100.00%]
Downloading: unemployment09.csv (253301 bytes)
    253301 [100.00%]
Downloading: AAPL.csv (166698 bytes)
    166698 [100.00%]
Downloading: FB.csv (9706 bytes)
      9706 [100.00%]
Downloading: GOOG.csv (113894 bytes)
    113894 [100.00%]
Downloading: IBM.csv (165625 bytes)
    165625 [100.00%]
Downloading: MSFT.csv (161614 bytes)
    161614 [100.00%]
Downloading: WPP2012_SA_DB03_POPULATION_QUINQUENNIAL.zip (5148539 bytes)
   5148539 [100.00%]
Unpacking: WPP2012_SA_DB03_POPULATION_QUINQUENNIAL.csv
Downloading: gapminder_fertility.csv (64346 bytes)
     64346 [100.00%]
Downloading: gapminder_population.csv (94509 bytes)
     94509 [100.00%]
Downloading: gapminder_life_expectancy.csv (73243 bytes)
     73243 [100.00%]

In [3]:
# create data for line plot
def f(t):
    return np.exp(-t) * np.cos(2*np.pi*t)

t1 = np.arange(0.0, 5.0, 0.02)
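
Before plotting, it can help to confirm what `f` produces. The sketch below repeats the definitions from the cell above and checks two properties of a damped cosine: the curve starts at 1, and it stays inside the envelope ±exp(-t). The name `y1` is introduced here only for illustration.

```python
import numpy as np

def f(t):
    # Damped cosine: an exponential decay envelope times a cosine
    return np.exp(-t) * np.cos(2 * np.pi * t)

t1 = np.arange(0.0, 5.0, 0.02)
y1 = f(t1)  # hypothetical name, not used elsewhere in the notebook

# The first sample is exp(0) * cos(0)
print(y1[0])

# Every sample lies within the decaying envelope +/- exp(-t)
print(np.all(np.abs(y1) <= np.exp(-t1) + 1e-12))
```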

In [4]:
p = figure(plot_width=400, plot_height=400)
p.line(t1, f(t1), line_width=2)

show(p)

Exercise: Generate cosine data and plot it as a Bokeh line plot (see the matplotlib notebook for how to generate cosine data).

In [5]:
p = figure(plot_width=400, plot_height=400)
p.line(t1, np.cos(2*np.pi*t1), line_width=2)

show(p)

In [8]:
p = figure(plot_width=400, plot_height=400)
p.line(t1, f(t1), line_width=2, color='red', alpha=0.2)
p.line(t1, np.cos(2*np.pi*t1), line_width=2)

show(p)

In [16]:
from bokeh.sampledata.stocks import AAPL, IBM, MSFT, GOOG
from bokeh.palettes import Spectral4
import pandas as pd

In [17]:
type(AAPL)

dict
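
Because the sample data arrives as a plain dict, it has to be converted before it can be plotted on a datetime axis. The sketch below shows that conversion on a small stand-in dict. The keys `date` and `close` match the ones used in this notebook, but the values are made up, and `sample` is a hypothetical name.

```python
import pandas as pd

# Hypothetical stand-in with the same layout as the sample data:
# column names mapped to equal-length sequences (values are made up).
sample = {
    "date": ["2000-03-01", "2000-03-02", "2000-03-03"],
    "close": [10.0, 10.5, 10.2],
}

df = pd.DataFrame(sample)

# Parse the date strings so Bokeh can place them on a datetime axis
df["date"] = pd.to_datetime(df["date"])

print(df.dtypes)
```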

In [18]:
p = figure(plot_width=800, plot_height=250, x_axis_type="datetime")
p.title.text = 'Click on legend entries to hide the corresponding lines'

for data, name, color in zip([AAPL, IBM, MSFT, GOOG], ["AAPL", "IBM", "MSFT", "GOOG"], Spectral4):
    df = pd.DataFrame(data)
    df['date'] = pd.to_datetime(df['date'])
    p.line(df['date'], df['close'], line_width=2, color=color, alpha=0.8, legend=name)

p.legend.location = "top_left"
p.legend.click_policy="hide"

output_file("interactive_legend.html", title="interactive_legend.py example")

show(p)

Exercise: Use pandas_datareader to download three stocks: NUE, X, and STLD. You can download them individually. Create a plot similar to the one above for the three stocks.

In [5]:
from bokeh.sampledata.stocks import NUE, X, STLD
from bokeh.palettes import Spectral4
import pandas as pd

ImportError: cannot import name 'NUE'

In [42]:
from datetime import datetime
import pandas_datareader.data as web
from bokeh.palettes import Spectral4
import pandas as pd

start = datetime(2017, 10, 16)
end = datetime(2017, 10, 27)
NUE = web.DataReader('NUE', 'yahoo', start, end)
X = web.DataReader('X', 'yahoo', start, end)
STLD = web.DataReader('STLD', 'yahoo', start, end)

p = figure(plot_width=800, plot_height=250, x_axis_type="datetime")
p.title.text = 'Click on legend entries to hide the corresponding lines'

for data, name, color in zip([NUE, X, STLD], ["NUE", "X", "STLD"], Spectral4):
    df = pd.DataFrame(data)
    df['date'] = df.index.values  # DataReader indexes the rows by date
    p.line(df['date'], df['Close'], line_width=2, color=color, alpha=0.8, legend=name)

p.legend.location = "top_left"
p.legend.click_policy="hide"

output_file("interactive_legend.html", title="interactive_legend.py example")
show(p)

#### Hover Tool

In [76]:
# create data using python dictionary
source = ColumnDataSource(data=dict(
    x=[1, 2, 3, 4, 5],
    y=[2, 5, 8, 2, 7],
    desc=['A', 'b', 'C', 'd', 'E'],
))

In [77]:
hover = HoverTool(tooltips=[
    ("index", "$index"),
    ("(x,y)", "($x, $y)"),
    ("desc", "@desc"),
])

Field names that begin with $ are “special fields”. These often correspond to values that are intrinsic to the plot, such as the coordinates of the mouse in data or screen space. These special fields are listed here:

\$index:	index of selected point in the data source

\$x:	x-coordinate under the cursor in data space

\$y:	y-coordinate under the cursor in data space

Field names that begin with @ are associated with columns in a ColumnDataSource. If a column name contains spaces, it must be surrounded by curly braces, e.g. @{adjusted close} will display values from a column named "adjusted close".
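
As a minimal sketch of both kinds of field, the example below builds a small data source with a column whose name contains a space and refers to it with curly braces. The data values and the plot title are made up for illustration, and the figure is left at its default size.

```python
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# Made-up values; the column name with a space is the point of the example
source = ColumnDataSource(data={
    "x": [1, 2, 3],
    "y": [4, 6, 5],
    "adjusted close": [10.1, 10.4, 10.2],
})

hover = HoverTool(tooltips=[
    ("index", "$index"),                      # special field: row index
    ("(x, y)", "($x, $y)"),                   # special fields: cursor position
    ("y value", "@y"),                        # column field with a plain name
    ("adjusted close", "@{adjusted close}"),  # column field with a space
])

p = figure(title="Hover over the points", tools=[hover])
p.scatter("x", "y", size=20, source=source)
```

Calling `show(p)` then renders the plot with the tooltips attached.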

In [78]:
p = figure(plot_width=400, plot_height=400, tools=[hover],
           title="Mouse over the dots")

# Pass the column names as strings so the glyph reads x and y from `source`
p.circle('x', 'y', size=20, source=source)

show(p)

Exercise: Load the tips data from the seaborn package and draw a scatter plot of total_bill and tip, with a hover tool that lists the sex, smoker, day, and time fields for each data point.

In [68]:
import seaborn as sns
sns.set(color_codes=True)

source = ColumnDataSource(data=sns.load_dataset("tips"))

In [72]:
hover = HoverTool(tooltips=[
    ("index", "$index"),
    ("sex","@sex" ),
    ("smoker","@smoker" ),
    ("day", "@day"),
    ("time", "@time")
])

In [73]:
#titanic = sns.load_dataset("titanic")

p = figure(plot_width=400, plot_height=400,tools=[hover],
           title="Mouse over the dots")

p.circle(x='total_bill',y='tip', size=20, source=source)

show(p)

E-1001 (BAD_COLUMN_NAME): Glyph refers to nonexistent column name: x, y [renderer: GlyphRenderer(id='e7975e4e-6e0d-4413-91cd-63a329232775', ...)]
E-1001 (BAD_COLUMN_NAME): Glyph refers to nonexistent column name: x, y [renderer: GlyphRenderer(id='7bc20eb6-df42-4eb6-9c6d-f2f9411c69ba', ...)]
E-1001 (BAD_COLUMN_NAME): Glyph refers to nonexistent column name: x, y [renderer: GlyphRenderer(id='7a80163c-7db9-487a-8146-df1625a3fc88', ...)]
E-1001 (BAD_COLUMN_NAME): Glyph refers to nonexistent column name: tip, total_bill [renderer: GlyphRenderer(id='1e253894-cd8e-42ea-b72d-9877320fab9a', ...)]
E-1001 (BAD_COLUMN_NAME): Glyph refers to nonexistent column name: tip, total_bill [renderer: GlyphRenderer(id='de5ed626-07da-4011-b0a5-eca359cb197a', ...)]
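
The E-1001 messages report a glyph that refers to a column its data source does not contain. One way to catch this before plotting is to compare the names a glyph will use against the columns in the source. The sketch below does that; `missing_columns` is a hypothetical helper, not part of Bokeh, and the rows are illustrative.

```python
from bokeh.models import ColumnDataSource

# Illustrative rows that reuse two column names from the tips data
source = ColumnDataSource(data={
    "total_bill": [16.99, 10.34],
    "tip": [1.01, 1.66],
})

def missing_columns(source, *names):
    """Return the requested column names that the source does not contain."""
    return [name for name in names if name not in source.data]

print(missing_columns(source, "total_bill", "tip"))  # []
print(missing_columns(source, "x", "y"))             # ['x', 'y']
```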


Exercise: Use Bokeh to visualize the housing data. Select the fields you would like to explore and see how Bokeh features such as hovering can help make a better visualization.

In [84]:
ds=pd.read_csv("resources/train_house.csv")
ds.head()

Unnamed: 0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,...,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
0,1,60,RL,65.0,8450,Pave,,Reg,Lvl,AllPub,...,0,,,,0,2,2008,WD,Normal,208500
1,2,20,RL,80.0,9600,Pave,,Reg,Lvl,AllPub,...,0,,,,0,5,2007,WD,Normal,181500
2,3,60,RL,68.0,11250,Pave,,IR1,Lvl,AllPub,...,0,,,,0,9,2008,WD,Normal,223500
3,4,70,RL,60.0,9550,Pave,,IR1,Lvl,AllPub,...,0,,,,0,2,2006,WD,Abnorml,140000
4,5,60,RL,84.0,14260,Pave,,IR1,Lvl,AllPub,...,0,,,,0,12,2008,WD,Normal,250000
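
One possible starting point for this exercise is a scatter plot of two numeric fields with the remaining fields shown on hover. The sketch below uses only the five rows printed by `ds.head()` above, typed in by hand so that it runs without the CSV file. The choice of fields, the name `sample`, and the plot title are assumptions made for illustration.

```python
import pandas as pd
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# The five rows from ds.head(), restricted to a few columns.
# With the full file loaded, ds[[...]] would replace this hand-built frame.
sample = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550, 14260],
    "SalePrice": [208500, 181500, 223500, 140000, 250000],
    "YrSold": [2008, 2007, 2008, 2006, 2008],
    "SaleCondition": ["Normal", "Normal", "Normal", "Abnorml", "Normal"],
})
source = ColumnDataSource(sample)

# Show the plotted values plus two extra fields for each point
hover = HoverTool(tooltips=[
    ("LotArea", "@LotArea"),
    ("SalePrice", "@SalePrice"),
    ("YrSold", "@YrSold"),
    ("SaleCondition", "@SaleCondition"),
])

p = figure(title="Sale price vs. lot area", tools=[hover])
p.scatter("LotArea", "SalePrice", size=12, color="navy", alpha=0.5,
          source=source)
```

Both axes are numeric here, so the default ranges work. Calling `show(p)` renders the result.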


In [87]:
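# Street is categorical, so the x-axis needs a categorical range built from
# its values (this assumes the column has no missing entries)
plot = figure(x_range=list(ds.Street.unique()))
plot.scatter(x=ds.Street, y=ds.LotArea)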

output_file('test.html')
show(plot)



In [88]:
output_file("house.html")

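# Street is categorical, so give the figure a categorical x-range
# (this assumes the column has no missing entries)
p = figure(plot_width=400, plot_height=400,
           x_range=list(ds.Street.unique()))

p.circle(x=ds.Street, y=ds.LotArea, size=20, color="navy", alpha=0.5)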

# show the results
show(p)



In [85]:
# read housing data
ds=pd.read_csv("resources/train_house.csv")


In [83]:
# Create and deploy interactive data applications

from IPython.display import IFrame
IFrame('https://demo.bokehplots.com/apps/sliders', width=900, height=500)