# Visualisation

You are required to produce a **visualisation** of food hygiene ratings for different establishments across London.
 
Your visualisation should contain a map-based display of the ratings, placed according to their geolocation data. A user should be able to intuitively see which businesses are ‘safe’ to eat at, and those which have not scored so well.
 
You will supplement the map with additional charts and graphics that you deem appropriate to tell a coherent story from the data available.

## Question 0

In this assignment, you will use Bokeh, in particular the `WMTSTileSource` class, to draw a map background onto which points are added. You will use the convenience module [`bokeh.tile_providers`](http://bokeh.pydata.org/en/latest/docs/reference/tile_providers.html), which creates `WMTSTileSource` instances (like the one used in the guided exercise) with the `url` and `attribution` already set. So, instead of creating a tile source manually, you can use one of the variables it already defines. See [the Bokeh source code](https://github.com/bokeh/bokeh/blob/master/bokeh/tile_providers.py) for how this is done.

The tiles supported by Bokeh use the [_Web Mercator_](https://en.wikipedia.org/wiki/Web_Mercator) projection to represent location, so a function `wgs84_to_web_mercator` is provided to convert decimal WGS84 longitude/latitude into Web Mercator `x`/`y` coordinates.
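As a quick check of the projection, here is a minimal pure-Python sketch of the same formula for a single point, together with its inverse. The helper names `lonlat_to_mercator` and `mercator_to_lonlat` are hypothetical and not part of the notebook:

```python
import math

# WGS84 equatorial radius in metres; the same k used in wgs84_to_web_mercator.
EARTH_RADIUS = 6378137


def lonlat_to_mercator(lon, lat):
    """Project one WGS84 longitude/latitude pair (degrees) to Web Mercator x/y (metres)."""
    x = lon * (EARTH_RADIUS * math.pi / 180.0)
    y = math.log(math.tan((90 + lat) * math.pi / 360.0)) * EARTH_RADIUS
    return x, y


def mercator_to_lonlat(x, y):
    """Inverse projection: Web Mercator x/y (metres) back to longitude/latitude (degrees)."""
    lon = x / (EARTH_RADIUS * math.pi / 180.0)
    lat = math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS))) - 90.0
    return lon, lat


# 101 Thai Kitchen on King Street, one of the businesses returned by get_data()
x, y = lonlat_to_mercator(-0.244311, 51.493618)
print(round(x, 3))               # -27196.576
print(mercator_to_lonlat(x, y))  # approximately (-0.244311, 51.493618)
```

The inverse can be handy for turning a map range back into readable coordinates when choosing a zoom level.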

Run the code in the cell below to set up the Notebook.

```python
# You don't need to write anything here
# Set up MongoDB
from pymongo import MongoClient

client = MongoClient('mongodb://cpduser:M13pV5woDW@mongodb/health_data', 27017)
db = client.health_data

from nose.tools import *

# Numpy, Pandas and Bokeh imports
import numpy as np
import pandas as pd
from bokeh.palettes import Spectral6
from bokeh.io import output_notebook, show
from bokeh.models.sources import ColumnDataSource
from bokeh.models import *
from bokeh.io import curdoc
from bokeh.tile_providers import *
from bokeh.models.tiles import WMTSTileSource
import ipywidgets
from ipywidgets import interact, interactive
# Dropdown for 3(b)
from ipywidgets import HBox, Label, IntSlider, Dropdown

from bokeh.plotting import figure
from bokeh.models import TapTool, CustomJS


def wgs84_to_web_mercator(df, lon="lon", lat="lat"):
    """
    Converts decimal longitude/latitude to Web Mercator format
    Source https://github.com/bokeh/bokeh-notebooks/blob/master/tutorial/11%20-%20geo.ipynb
    """
    k = 6378137  # WGS84 equatorial radius in metres
    # Adds the x/y columns to df in place and returns it
    df["x"] = df[lon] * (k * np.pi/180.0)
    df["y"] = np.log(np.tan((90 + df[lat]) * np.pi/360.0)) * k
    return df

# from ipywidgets import *
# from bokeh.layouts import *
from IPython.display import display
from bokeh.io import output_file, output_notebook, show, push_notebook

output_notebook()
```


```python
# You don't need to write anything here
# Check it's set up correctly
try:
    imports = [MongoClient, db, np, pd, output_notebook, show, ColumnDataSource,
               output_notebook, show, ColumnDataSource, STAMEN_TERRAIN, figure
              ]
    assert True
    print('Successfully imported required libraries')
except NameError as e:
    print(e)
    assert False
```

```
Successfully imported required libraries
```



## Question 1: Create Map

In this question, you will create functions that **return** the different objects required to visualise the data on a map: a [`DataFrame`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html), a `ColumnDataSource` and a `Figure`.

### Question 1(a) [4 marks]

Create a function **`get_data()`** to extract data from the MongoDB database for all institutions which are in the **London** region with the following constraints:
- The results should include: `Lat`, `Lng`, `BusinessType`, `AddressLine1`, `BusinessName`, `RatingValue` but **NOT** the `_id` field
- The results should **only include businesses which have a RatingValue** (N.B. A value of 0 is a RatingValue)
- The results returned should **only include businesses which have a Geocode**
- The returned values should be **limited to 200** institutions
- **Add fields `x` and `y` in _Web Mercator_ format** to specify co-ordinates on the map
- **Return** the result as a [`DataFrame`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html).

*Hint: Week 3, Guided Exercise 4, Cursors*  
*Hint: Week 4, Guided Exercise 2, Importing Data*
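The N.B. that 0 counts as a RatingValue matters because a truthiness check would silently drop zero-rated businesses. A small pure-Python sketch over hypothetical in-memory documents (no database needed) mirrors the query conditions and contrasts them with such a check. In MongoDB, `{'$ne': None}` on its own already excludes documents where the field is missing, so `'$exists': True` is redundant but harmless:

```python
# Hypothetical documents mimicking the fields used in Question 1(a).
docs = [
    {'BusinessName': 'A', 'Region': 'london', 'RatingValue': 0,
     'Geocode': {'Latitude': 51.50, 'Longitude': -0.10}},
    {'BusinessName': 'B', 'Region': 'london', 'RatingValue': None,
     'Geocode': {'Latitude': 51.50, 'Longitude': -0.20}},
    {'BusinessName': 'C', 'Region': 'london', 'RatingValue': 5},  # no Geocode
    {'BusinessName': 'D', 'Region': 'london', 'RatingValue': 3,
     'Geocode': {'Latitude': 51.40, 'Longitude': -0.30}},
    {'BusinessName': 'E', 'Region': 'bristol', 'RatingValue': 4,
     'Geocode': {'Latitude': 51.45, 'Longitude': -2.60}},
]


def matches_query(doc):
    """Mirror {'Region': 'london', 'RatingValue': {'$exists': True, '$ne': None},
    'Geocode': {'$exists': True, '$ne': None}}: dict.get returns None for a
    missing key, so `is not None` rejects both missing and null values."""
    return (doc.get('Region') == 'london'
            and doc.get('RatingValue') is not None
            and doc.get('Geocode') is not None)


def matches_truthy(doc):
    """A tempting but wrong check: bool(0) is False, so zero ratings are lost."""
    return (doc.get('Region') == 'london'
            and bool(doc.get('RatingValue'))
            and bool(doc.get('Geocode')))


limit = 200
print([d['BusinessName'] for d in docs if matches_query(d)][:limit])   # ['A', 'D']
print([d['BusinessName'] for d in docs if matches_truthy(d)][:limit])  # ['D']
```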

```python
def get_data():
    """
    Return up to 200 London businesses that have both a RatingValue (0 counts)
    and a Geocode, with Web Mercator `x`/`y` columns added.
    """
    conditions = {
        'Region': 'london',
        'RatingValue': {'$exists': True, '$ne': None},
        'Geocode': {'$exists': True, '$ne': None},
    }
    # Keep only the required fields and drop _id.
    projection = {'Lat': 1, 'Lng': 1, 'BusinessType': 1, 'AddressLine1': 1,
                  'BusinessName': 1, 'RatingValue': 1, '_id': 0}
    limit = 200

    # Collect matching documents from every collection until 200 are found.
    results = []
    for name in db.list_collection_names():
        remaining = limit - len(results)
        if remaining <= 0:
            break  # also avoids limit(0), which pymongo treats as "no limit"
        results.extend(db[name].find(conditions, projection).limit(remaining))

    result_df = pd.DataFrame(results)
    # Add the x/y columns in Web Mercator format for plotting on map tiles.
    return wgs84_to_web_mercator(result_df, lon="Lng", lat="Lat")

get_data().head()
```

```
ServerSelectionTimeoutError: mongodb:27017: [Errno 11001] getaddrinfo failed
```

```python
# You don't need to write anything here
test_data = get_data()
assert_equal(type(test_data), pd.DataFrame)

for index, row in test_data.iterrows():
    try:
        i = int(row['RatingValue'])
    except (TypeError, ValueError):
        raise AssertionError('There is a row which is not an integer.  '
                             'Make sure you exclude all those without a RatingValue')

assert_equal(len(test_data.index), 200)
print('All tests passed successfully')
```

### Question 1(b)  [2 marks]

Create a function **`get_source`** which takes a **`DataFrame`** as a parameter and prepares it for adding to the plot. The function should:

- **Add a column `Colour`** holding the hex string of the colour used to display each establishment on the map, e.g., #d53e4f. The colour should distinguish the businesses' different `RatingValue` values.
- **Display RatingValues in different colours**, using an appropriate palette such as **`Spectral6`** from the standard [Bokeh palettes](http://bokeh.pydata.org/en/latest/docs/reference/palettes.html)
- **Accept an integer** by which to filter the businesses on `RatingValue`.
- If the rating value is **`-1`**, **include all businesses**. Otherwise, filter the data to include only businesses whose `RatingValue` equals the value passed.
- The function should **return** the result as a **`DataFrame`**  

*Hint: Week 5, Guided Exercise 2, Data Sources*
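Indexing `Spectral6` directly by rating draws the best score (5) in red (`#d53e4f`) and 0 in blue, which can read backwards to users who associate red with danger. Below is a minimal sketch of the same colour-then-filter logic with the palette reversed. The hex codes are copied from `bokeh.palettes.Spectral6` so it runs without Bokeh, and `colour_and_filter` is a hypothetical name:

```python
import pandas as pd

# The six hex codes of bokeh.palettes.Spectral6, from blue (index 0) to red (index 5).
SPECTRAL6 = ['#3288bd', '#99d594', '#e6f598', '#fee08b', '#fc8d59', '#d53e4f']

# Reversed, so RatingValue 0 (worst) is red and 5 (best) is blue.
RATING_COLOURS = dict(enumerate(reversed(SPECTRAL6)))


def colour_and_filter(df, data_filter=-1):
    """Hypothetical get_source variant: add a Colour column, then filter by rating (-1 = all)."""
    out = df.copy()
    out['Colour'] = out['RatingValue'].map(RATING_COLOURS)
    if data_filter == -1:
        return out
    return out[out['RatingValue'] == data_filter]


demo = pd.DataFrame({'BusinessName': ['Cafe A', 'Cafe B', 'Cafe C'],
                     'RatingValue': [0, 3, 5]})
print(colour_and_filter(demo)['Colour'].tolist())            # ['#d53e4f', '#e6f598', '#3288bd']
print(colour_and_filter(demo, 5)['BusinessName'].tolist())   # ['Cafe C']
```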

```python
def get_source(df, data_filter=-1):
    """
    Return a copy of df with a Colour column (hex string per RatingValue).
    If data_filter is between 0 and 5, keep only businesses with that RatingValue;
    otherwise (e.g. -1) keep all businesses.
    """
    df = df.copy()  # don't modify the caller's DataFrame
    # RatingValue ranges from 0 to 5, so it can index Spectral6 directly.
    df['Colour'] = df['RatingValue'].map(dict(enumerate(Spectral6)))

    if 0 <= data_filter <= 5:
        return df[df['RatingValue'] == data_filter]
    return df

# get_source(get_data(), -1)
```

```python
# You don't need to write anything here
test_source = get_source(get_data())
assert_equal(type(test_source),pd.DataFrame)
# Check the colours are different
rating_value_1 = test_source.loc[test_source['RatingValue'] == 1]
rating_value_2 = test_source.loc[test_source['RatingValue'] == 2]
rating_value_3 = test_source.loc[test_source['RatingValue'] == 3]
rating_value_4 = test_source.loc[test_source['RatingValue'] == 4]
rating_value_5 = test_source.loc[test_source['RatingValue'] == 5]

colour_list = [rating_value_1['Colour'].values[0],
               rating_value_2['Colour'].values[0],
               rating_value_3['Colour'].values[0],
               rating_value_4['Colour'].values[0],
               rating_value_5['Colour'].values[0]]
# If they are all different, then the length of the set should be 5
assert_equal(len(set(colour_list)), 5)

fields = test_source.columns.values
assert 'x' in fields
assert 'y' in fields

# Test that the x and y are the correct web mercator format
test_lng = rating_value_1['Lng'].values[0]
test_lat = rating_value_1['Lat'].values[0]

test_x = rating_value_1['x'].values[0]
test_y = rating_value_1['y'].values[0]

k = 6378137

assert_equal(test_x, test_lng * (k * np.pi/180.0))
assert_equal(test_y, np.log(np.tan((90 + test_lat) * np.pi/360.0)) * k)
print('All tests passed successfully')
```

```
All tests passed successfully
```


### Question 1(c) [2 marks]

Create a function **`get_map`**, which **returns** a map of London using the [`STAMEN_TERRAIN`](http://bokeh.pydata.org/en/latest/docs/reference/tile_providers.html#bokeh.tile_providers.STAMEN_TERRAIN) tile.

- The function should **return** an object of **type** `Figure`
- The figure should include **all** the available **[Pan/Drag tools](http://bokeh.pydata.org/en/latest/docs/user_guide/tools.html#pan-drag-tools)**, the **reset** tool, and the **mouse wheel zoom** tool
- The map should display London at an appropriate zoom level for the data
 
*Hint: To get the `x,y` values surrounding London, look at the smallest and largest `x` and `y` values in the data*  
*Hint: Week 4, Guided Exercise 3, Residual Analysis (attributes)*  
*Hint: Week 5, Guided Exercise 2, Data Sources (Bokeh Map Tiling)*
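With the raw minimum and maximum from the hint, businesses at the edge of the data sit exactly on the plot border and their dots are cut in half. Here is a minimal sketch of adding a margin to each range. The function name and the 5% default are assumptions, not part of the assignment:

```python
def padded_range(values, pad_fraction=0.05):
    """Return (low, high) spanning the values plus a margin on each side.

    pad_fraction is an arbitrary choice: 0.05 adds 5% of the span at each end.
    """
    low, high = min(values), max(values)
    margin = (high - low) * pad_fraction
    return low - margin, high + margin


# x values (Web Mercator metres) of three businesses from the notebook output
xs = [-27196.576115, -25918.628361, -21169.182286]
print(padded_range(xs))
```

In `get_map`, something like `figure(x_range=padded_range(data['x']), y_range=padded_range(data['y']), ...)` could replace the raw minimum and maximum. The built-in `min()` and `max()` also accept a pandas Series.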

```python
from bokeh.io import show
from bokeh.plotting import figure
# Map of London using the STAMEN_TERRAIN tile
from bokeh.tile_providers import STAMEN_TERRAIN


def getLondonRange(data):
    """
    Return ((min_x, max_x), (min_y, max_y)) of the Web Mercator coordinates in data.
    Also used in the answer to Question 1(d).
    """
    return ((data['x'].min(), data['x'].max()),
            (data['y'].min(), data['y'].max()))


def get_map(data):
    """
    In this function you return a figure with a map background.  The background should be centred
    on London at an appropriate zoom level.
    """
    xRange, yRange = getLondonRange(data)
    # All Pan/Drag tools plus reset and mouse-wheel zoom, see
    # https://docs.bokeh.org/en/latest/docs/user_guide/tools.html#lassoselecttool
    # Original note: when pan tools overlap (xpan, ypan), an error occurs.
    fig = figure(tools='box_select,box_zoom,lasso_select,pan,xpan,ypan,reset,wheel_zoom',
                 x_range=xRange, y_range=yRange)
    # Web Mercator metres are not meaningful to readers, so hide the axes.
    fig.axis.visible = False

    # Alternative tile:
    # url = 'http://a.basemaps.cartocdn.com/dark_all/{Z}/{X}/{Y}.png'
    # attr = "Map tiles by Carto, under CC BY 3.0. Data by OpenStreetMap, under ODbL"
    fig.add_tile(STAMEN_TERRAIN)
    return fig

show(get_map(get_source(get_data())))
```



### Question 1(d) [3 marks]

Write code which creates and shows a figure `london_map` using the `get_map()` function, obtains a data source for the figure using `get_data()`, then uses the `circle` method to add the data to the map.  

- You should call the output of the `circle` function `data_points`.
- The dots you add to the map should have a size of 10, no border, and a `fill_alpha` of 0.8
- You should call your map `london_map`
- Your code should contain a variable name for the dots added to the map

N.B. You are not being asked to create a function for this question  

*Hint: Week 4, Guided Exercise 3, Fitting a Model - Residual Analysis*  
*Hint: Week 5, Guided Exercise 2, Widgets*

```python
# Load the data and build the source with every rating (-1 = no filter).
data = get_data()
source = get_source(data, -1)

# Create the map, fitted to the extent of the data.
london_map = get_map(source)

# One dot per business: size 10, no border, fill_alpha 0.8, coloured by RatingValue.
# https://docs.bokeh.org/en/0.12.16/docs/user_guide/plotting.html
data_points = london_map.circle(source=source, x='x', y='y', size=10,
                                fill_color='Colour', fill_alpha=0.8, line_color=None)
show(london_map)
```




```python
fig_source = london_map.select(GlyphRenderer)[0].data_source

assert_equal(fig_source.data['x'][0], source['x'][0])
assert_equal(fig_source.data['y'][0], source['y'][0])
assert_equal(fig_source.data['Colour'][0], source['Colour'][0])
#assert_equal(fig_source.data['fill_color'][0], ds.data['Colour'][0])


glyph = london_map.select(GlyphRenderer)[0].glyph
assert_equal(glyph.line_color, None)
assert_equal(glyph.size, 10)
assert_equal(glyph.fill_alpha, 0.8)
#glyph.fill_color == (ds.data['Colour'][0])
print('All tests passed successfully')
```

```
All tests passed successfully
```


## Question 2: Make it Interactive

### Question 2(a) [2 marks]

Create a function **`callback`** for later use, which updates the businesses visible on the map according to their **`RatingValue`**. The function should take a parameter **`rating`**, the value to filter by, and call **`get_source`** with it. Use the **`source`** variable from Question 1(d) to update the map.

*Hint: Week 4, Guided Exercise 2, Bokeh Charts*  
*Hint: Week 5, Guided Exercise 2, Data Sources - Widgets (update figure)*
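Returning a filtered DataFrame does not by itself change the dots already drawn, because Bokeh copied the data into the renderer's `ColumnDataSource` when `circle` was called. Here is a minimal sketch of one way to push the filtered rows back to the plot, assuming the figure was shown with `notebook_handle=True`. The names `make_rating_callback` and `update_plot` are hypothetical. The plot update is passed in as a function so that the filtering can run without Bokeh:

```python
import pandas as pd


def make_rating_callback(df, update_plot):
    """Return a callback(rating) that filters df by RatingValue (-1 = all) and
    passes the filtered columns, as a dict of lists, to update_plot."""
    def callback(rating):
        filtered = df if rating == -1 else df[df['RatingValue'] == rating]
        update_plot({col: filtered[col].tolist() for col in filtered.columns})
        return filtered
    return callback


# In the notebook, update_plot could be (assuming `dots` is the circle renderer and
# `handle` the value returned by show(mappy, notebook_handle=True)):
#
#     def update_plot(new_data):
#         dots.data_source.data = new_data   # replace the glyph's columns
#         push_notebook(handle=handle)       # re-render the figure already shown

# Stand-alone demo: record what would be sent to the plot.
sent = []
demo = pd.DataFrame({'x': [1.0, 2.0, 3.0], 'y': [4.0, 5.0, 6.0],
                     'RatingValue': [5, 4, 5]})
cb = make_rating_callback(demo, sent.append)
cb(5)
print(sent[-1]['x'])  # [1.0, 3.0]
```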

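```python
def callback(rating):
    """Filter the businesses by rating (-1 = all) and store the result in the global source."""
    # Rebind the global `source` from Question 1(d). Without `global`, the
    # assignment only creates a local variable, and the rest of the notebook
    # (e.g. the business-type dropdown) never sees the rating filter.
    global source
    print("looking for rating::", rating)
    source = get_source(data, rating)
    return source

# callback(3)
```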

### Question 2(b) [2 marks]

Using **`ipywidgets`**, create an [interactive](http://ipywidgets.readthedocs.io/en/latest/examples/Using%20Interact.html) `IntSlider` widget, which calls the **`callback`** function when it updates.

Return the interactive widget from the function **`set_interactive()`**.

*Hint: Week 5, Guided Exercise 2, Widgets*

```python
def set_interactive():
    # RatingValue ranges from 0 to 5; -1 shows all businesses.
    slider = IntSlider(min=-1, max=5, step=1, value=-1)
    # Build the interactive box: moving the slider calls callback(rating=...).
    interactBox = interactive(callback, rating=slider)
    return interactBox

set_interactive()
```


```
looking for rating:: 4
```

| | AddressLine1 | BusinessName | BusinessType | Lat | Lng | RatingValue | x | y | Colour |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 352 King Street | 101 Thai Kitchen | Restaurant/Cafe/Canteen | 51.493618 | -0.244311 | 4 | -27196.576115 | 6709078.0 | #fc8d59 |
| 9 | 156A King Street | Acorn Metro | Retailers - other | 51.492834 | -0.232831 | 4 | -25918.628361 | 6708938.0 | #fc8d59 |
| 13 | 77 Askew Road | Adam's Cafe | Restaurant/Cafe/Canteen | 51.504197 | -0.243386 | 4 | -27093.605586 | 6710970.0 | #fc8d59 |
| 18 | 105 Greyhound Road | Age UK Hammersmith & Fulham | Other catering premises | 51.486045 | -0.215608 | 4 | -24001.372771 | 6707724.0 | #fc8d59 |
| 21 | 351 North End Road | Al Ghazal | Retailers - other | 51.484918 | -0.201912 | 4 | -22476.741025 | 6707523.0 | #fc8d59 |
| 39 | Amuse Bouche | Amuse Bouche/Claudes Kitchen | Restaurant/Cafe/Canteen | 51.474888 | -0.200477 | 4 | -22316.997556 | 6705730.0 | #fc8d59 |
| 44 | 35 Wingate Road | Anglesea Arms | Pub/bar/nightclub | 51.499203 | -0.235942 | 4 | -26264.943297 | 6710077.0 | #fc8d59 |
| 50 | 298 Uxbridge Road | Arbil Halal Meat | Retailers - other | 51.506573 | -0.231579 | 4 | -25779.256358 | 6711395.0 | #fc8d59 |
| 53 | Unit 8 | Archies Kitchen | Other catering premises | 51.46874 | -0.190166 | 4 | -21169.182286 | 6704631.0 | #fc8d59 |
| 54 | 87 Hammersmith Road | Argentine Steak House El Toro | Restaurant/Cafe/Canteen | 51.494642 | -0.21206 | 4 | -23606.411218 | 6709261.0 | #fc8d59 |


```python
# You don't need to write anything here
old_callback = callback
del callback

try:
    set_interactive()
except NameError as e:
    pass
else:
    raise AssertionError('You have not called the callback function in your code')
finally:
    callback = old_callback
    del old_callback
assert_equal(type(set_interactive()), ipywidgets.widgets.widget_box.Box)
```

## Question 3: Extend the Visualisation

*Applying question 2 solutions*  
  
Now that you have created an initial visualisation, you are going to add the following components to it:
- **Hover** text, so that each dot gives information about the business when you hover over it
- A **drop-down menu** to limit the type of business shown

You will also be asked to explain a possible use case for this chart, and to suggest how it could be improved.

NOTE: There are discretionary marks available for good visualisation practice.

### Question 3(a) [5 marks]

Create a function **`get_hover`**, which returns a [HoverTool](http://bokeh.pydata.org/en/latest/docs/user_guide/tools.html#hover-tool) to be added to the map.  When the cursor hovers over any circle, the following information should be displayed:
- The **name** of the establishment
- The **type** of the establishment
- The **`RatingValue`** of the establishment

Your function should **return** the [HoverTool](http://bokeh.pydata.org/en/latest/docs/user_guide/tools.html#hover-tool).

N.B. You may need to read the documentation to ensure that you pair a suitable title for each **field** with its **field value**, using **`@`**.

*Hint: Week 5, Guided Exercise 2, Tools*

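```python
def get_hover():
    """Return a HoverTool showing the name, type and RatingValue of the hovered business."""
    hover = HoverTool()
    # Each tuple is (label shown to the user, "@column" in the data source).
    hover.tooltips = [
        ("Name", "@BusinessName"),
        ("Type", "@BusinessType"),
        ("Rating", "@RatingValue"),
    ]
    return hover
```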

### Question 3(b) [5 marks]

Using [`interactive`](http://ipywidgets.readthedocs.io/en/latest/examples/Using%20Interact.html), create a **dropdown menu** which allows the user to choose between different **types of business**.  The map should update automatically as you select.

- You should create two functions: **`filter_business_types`**, which the dropdown menu will call when it changes and which should update the data source, and **`get_dropdown_list`**, which builds the dropdown
- You will need to obtain a list of the different types of business in the database
- You should **return** an **`interactive`** object that calls **`filter_business_types`**, which updates the **`source`** variable accordingly

N.B. Should you use the original data set?  

*Hint: Week 5, Guided Exercise 2, Data Sources - Tools*
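On the N.B.: if `filter_business_types` rebinds `source` to its own result, each selection narrows the previous one, and a type missing from the current subset can never come back. One option, sketched below, is to always filter the original `data` by both criteria at once. This assumes the current slider value is available as `rating`. The function names and demo rows are hypothetical:

```python
import pandas as pd


def business_type_options(df):
    """Sorted, de-duplicated business types, ignoring missing values."""
    return sorted(df['BusinessType'].dropna().unique().tolist())


def filter_by_type_and_rating(df, business_type, rating=-1):
    """Filter the *original* data by both widgets at once (rating -1 = all),
    so successive selections do not compound on an already-filtered result."""
    mask = df['BusinessType'] == business_type
    if rating != -1:
        mask &= df['RatingValue'] == rating
    return df[mask]


original = pd.DataFrame({
    'BusinessName': ['Cafe A', 'Shop B', 'Pub C', 'Cafe D'],
    'BusinessType': ['Restaurant/Cafe/Canteen', 'Retailers - other',
                     'Pub/bar/nightclub', 'Restaurant/Cafe/Canteen'],
    'RatingValue': [4, 4, 5, 3],
})
print(business_type_options(original))
print(filter_by_type_and_rating(original, 'Restaurant/Cafe/Canteen')['BusinessName'].tolist())
print(filter_by_type_and_rating(original, 'Restaurant/Cafe/Canteen', 4)['BusinessName'].tolist())
```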

```python
def filter_business_types(business_type):
    """Return the businesses in `source` (already filtered by rating) of the chosen type."""
    print("looking for business_type::", business_type)
    return source[source['BusinessType'] == business_type]


def get_dropdown_list():
    """Return an interactive dropdown of the business types in `source`."""
    # BusinessType list (reflecting the current rating filter), sorted for easier scanning.
    dropDownList = sorted(source['BusinessType'].dropna().unique())

    dropdown = Dropdown(
        options=dropDownList,
        value=dropDownList[0],
        description='BusinessType:',
        disabled=False
    )
    # Build the interactive box: changing the dropdown calls filter_business_types.
    interactBox = interactive(filter_business_types, business_type=dropdown)
    return interactBox

# DISPLAY THE MAP
data = get_data()
source = get_source(data)
mappy = get_map(data)
dots = mappy.circle(source=source, x='x', y='y', fill_color='Colour',
                    size=10, fill_alpha=0.8, line_color=None)
mappy.add_tools(get_hover())
dropdown = get_dropdown_list()
slider = set_interactive()
show(mappy, notebook_handle=True)
```





```python
# You don't have to write anything here
# Display the widgets
HBox([slider, dropdown])

# Why do you think we might get a Bokeh warning displayed when we use our data?
```

```
looking for business_type:: Other catering premises
```

| | AddressLine1 | BusinessName | BusinessType | Lat | Lng | RatingValue | x | y | Colour |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Charing Cross Hospital | 15th Floor Private Patients | Other catering premises | 51.48581 | -0.219089 | 5 | -24388.875918 | 6707682.0 | #d53e4f |
| 16 | The Queens Club | Aegon Championship | Other catering premises | 51.486866 | -0.212263 | 5 | -23629.009074 | 6707871.0 | #d53e4f |
| 18 | 105 Greyhound Road | Age UK Hammersmith & Fulham | Other catering premises | 51.486045 | -0.215608 | 4 | -24001.372771 | 6707724.0 | #fc8d59 |
| 28 | 51 Townmead Road | All About Canteen | Other catering premises | 51.467772 | -0.185122 | 5 | -20607.686775 | 6704458.0 | #d53e4f |
| 53 | Unit 8 | Archies Kitchen | Other catering premises | 51.46874 | -0.190166 | 4 | -21169.182286 | 6704631.0 | #fc8d59 |
| 69 | Rangers Stadium | Azure Support Services | Other catering premises | 51.512103 | -0.226725 | 5 | -25238.91155 | 6712384.0 | #d53e4f |
| 178 | 31 Richmond Way | Butlers Catering | Other catering premises | 51.501564 | -0.217264 | 4 | -24185.717848 | 6710499.0 | #fc8d59 |


### Question 3(c) [5 marks]

Describe a use case for which an application like this would be useful, and suggest one way in which it could be improved.

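### [Distribution of property prices by district and owner age]

Over the four-week "Data Science Fundamentals" course, we learned Bokeh as a powerful visualisation tool, including map tiles and interactive filtering of the data.
<br/>If I were to carry out an analysis with this tool, I would look at the property market, which is currently attracting the most attention.

I would like to analyse how average property prices are distributed by owner age and by district of Seoul.
<br/>My hypothesis is that the average property price varies by district and rises in proportion to the owner's age.

Besides the Bokeh map tiling we learned, histograms for each category (age, district name) along the right-hand side and bottom of the map, similar to the sample below, would make the patterns easier to read at a glance.
If possible, adding interactive filtering by district, age and property price, as covered in the course, would let the viewer inspect the detailed data and support the analysis.

Visualising the data in this way would give an intuitive view of my hypothesis, that average property price rises with owner age and varies by district, and would play a large part in the storytelling of the analysis.

Image source: http://demo.bokeh.org/selection_histogram
<br/>Source code on GitHub: https://github.com/bokeh/bokeh/blob/master/examples/app/selection_histogram.py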


<a href="https://imgur.com/9aY8QJm"><img src="https://i.imgur.com/9aY8QJm.png" title="source: imgur.com" /></a>


<a href="https://imgur.com/VilEF1S"><img src="https://i.imgur.com/VilEF1S.png" title="source: imgur.com" /></a>

