# Programming Project - Unit 1,3
*by Igor A. Brandão and Leandro Max*

**Goals**
- Analyse the data and generate visualizations of **Life Expectancy vs Fertility**
- Tell a story with the data using Bokeh & Python
- Practise data visualization skills

In [17]:
# Cell 01
# Import pandas as pd
import pandas as pd

# Import the gapminder dataset to: data
data = pd.read_csv("gapminder_tidy.csv", encoding = 'latin2', index_col = 0)

It is always a good idea to begin with some Exploratory Data Analysis. Pandas has a number of built-in methods and attributes that help with this. For example, **data.head()** displays the first five rows of data, while **data.tail()** displays the last five. The **data.shape** attribute gives the number of rows and columns in the data set. Another particularly useful method is **data.info()**, which provides a concise summary of data, including the number of entries, the columns, the data type of each column, and the number of non-null entries in each column.



In [18]:
# Cell 02
data.head()

Year,Country,fertility,life,population,child_mortality,gdp,region
1964,Afghanistan,7.671,33.639,10474903.0,339.7,1182.0,South Asia
1965,Afghanistan,7.671,34.152,10697983.0,334.1,1182.0,South Asia
1966,Afghanistan,7.671,34.662,10927724.0,328.7,1168.0,South Asia
1967,Afghanistan,7.671,35.17,11163656.0,323.3,1173.0,South Asia
1968,Afghanistan,7.671,35.674,11411022.0,318.1,1187.0,South Asia


In [20]:
# Cell 03
data.tail()

Year,Country,fertility,life,population,child_mortality,gdp,region
2002,Aland,,81.8,26257.0,,,Europe & Central Asia
2003,Aland,,80.63,26347.0,,,Europe & Central Asia
2004,Aland,,79.88,26530.0,,,Europe & Central Asia
2005,Aland,,80.0,26766.0,,,Europe & Central Asia
2006,Aland,,80.1,26923.0,,,Europe & Central Asia


In [21]:
# Cell 04
data.shape

(10111, 7)

In [22]:
# Cell 05
data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10111 entries, 1964 to 2006
Data columns (total 7 columns):
Country            10111 non-null object
fertility          10100 non-null float64
life               10111 non-null float64
population         10108 non-null float64
child_mortality    9210 non-null float64
gdp                9000 non-null float64
region             10111 non-null object
dtypes: float64(5), object(2)
memory usage: 631.9+ KB
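
The non-null counts above imply missing values in several columns (for example, 10111 - 9000 = 1111 missing `gdp` entries). A minimal sketch of counting them directly follows; the frame `toy` and its values are made up for illustration and are not the gapminder data.

```python
import pandas as pd

# Hypothetical toy frame standing in for `data`; values are made up.
toy = pd.DataFrame(
    {
        "Country": ["A", "A", "B", "B"],
        "fertility": [7.1, None, 2.3, 2.2],
        "gdp": [None, None, 1500.0, 1600.0],
    },
    index=pd.Index([1964, 1965, 1964, 1965], name="Year"),
)

# Missing entries per column: the complement of the non-null
# counts that info() reports.
missing = toy.isna().sum()
print(missing)
```

Running the same `isna().sum()` call on `data` should give the per-column gaps without any manual subtraction.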


# Some exploratory plots of the data

Now that we have the data ready, let's make a simple plot of Life Expectancy vs Fertility for the year 1970.

Your job is to import the relevant Bokeh modules and then prepare a **ColumnDataSource** object with the **fertility**, **life** and **Country** columns, where you only select the rows with the index value **1970**.


### Instructions

- Import **output_notebook** and **show** from **bokeh.io**, **figure** from **bokeh.plotting**, and **HoverTool** and **ColumnDataSource** from **bokeh.models**.
- Make a **ColumnDataSource** called **source** with **x** set to the fertility column, **y** set to the life column, **country** set to the Country column, **pop** set to the population column and **region** set to the region column. For all columns, select the rows with index value **1970**. This can be done using **data.loc[1970].column_name**.
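
Because the year is the index and each year appears once per country, selecting a label with `.loc` returns every row for that year. A small sketch of that behaviour, using a made-up frame (`toy` is hypothetical, not the gapminder data):

```python
import pandas as pd

# Hypothetical toy frame indexed by year; values are made up.
toy = pd.DataFrame(
    {
        "Country": ["A", "B", "A", "B"],
        "fertility": [6.0, 3.0, 5.5, 2.8],
        "life": [40.0, 65.0, 42.0, 66.0],
    },
    index=pd.Index([1970, 1970, 1971, 1971], name="Year"),
)

# The label 1970 appears on two rows, so .loc returns both of them.
rows_1970 = toy.loc[1970]

# Attribute access then gives one value per country for that year.
fertility_1970 = rows_1970.fertility

print(rows_1970)
print(fertility_1970.tolist())
```

If a label matched only one row, `.loc` would return that single row as a Series instead of a DataFrame.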

In [23]:
# Cell 05

# Import figure from bokeh.plotting
from bokeh.plotting import figure

# Import output_notebook and show from bokeh.io
from bokeh.io import output_notebook, show

# Import the ColumnDataSource and HoverTool class from bokeh.models
from bokeh.models import ColumnDataSource, HoverTool

# Make the ColumnDataSource: source
source = ColumnDataSource(data={
    'x'       : data.loc[1970].fertility,
    'y'       : data.loc[1970].life,
    'country' : data.loc[1970].Country,
    'pop'     : data.loc[1970].population,
    'region'  : data.loc[1970].region,
})

# Define the chart TOOLS
TOOLS = 'box_zoom,box_select,crosshair,resize,reset'

# Create the figure: p
p = figure(title='1970', x_axis_label='Fertility (children per woman)', y_axis_label='Life Expectancy (years)',
           plot_height=400, plot_width=700,
           tools=[TOOLS, HoverTool(tooltips='@country')])

# Add a circle glyph to the figure p
p.circle(x='x', y='y', fill_alpha=0.8, source=source)

# Enable notebook output and show the figure
output_notebook()
show(p)

# Enhancing the plot with some shading

Now that we have the base plot ready, let's color each circle glyph by region.

Your job is to make a list of the unique regions from the data frame, prepare a **CategoricalColorMapper**, and add it to the **circle** glyph.

### Instructions

- Make a list of the unique values from the **region** column. You can use the **unique()** and **tolist()** methods on **data.region** to do this.
- Import **CategoricalColorMapper** from **bokeh.models** and the **Spectral6** palette from **bokeh.palettes**.
- Use the **CategoricalColorMapper** class to make a color mapper called **color_mapper** with **factors=regions_list** and **palette=Spectral6**.
- Add the color mapper to the circle glyph by passing **dict(field='region', transform=color_mapper)** to the **color** parameter of **p.circle()**. Also set the **legend** parameter to **'region'**.
- Set the **legend.location** attribute of the plot to **'bottom_left'**.
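
Conceptually, a categorical mapper pairs each factor with the palette colour at the same position. The sketch below illustrates that idea in plain Python. It is not Bokeh's implementation, and the hex values are placeholders rather than the real Spectral6 colours.

```python
# Conceptual sketch only: this is not Bokeh's implementation.
# Placeholder hex colours stand in for a real palette such as Spectral6.
palette = ["#111111", "#222222", "#333333"]
factors = ["South Asia", "Europe & Central Asia"]

# Pair each factor with the palette entry at the same position.
colour_for = dict(zip(factors, palette))

# Look up the colour for each row's region value.
regions = ["South Asia", "Europe & Central Asia", "South Asia"]
colours = [colour_for[r] for r in regions]
print(colours)
```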


In [24]:
# Cell 06

# Make a list of the unique values from the region column: regions_list
regions_list = data['region'].unique().tolist()

# Import CategoricalColorMapper from bokeh.models and the Spectral6 palette from bokeh.palettes
from bokeh.palettes import Spectral6
from bokeh.models import CategoricalColorMapper

# Make a color mapper: color_mapper
color_mapper = CategoricalColorMapper(factors=regions_list, palette=Spectral6)

# Add the color mapper to the circle glyph
p.circle(x='x', y='y', fill_alpha=0.8, source=source,
         color=dict(field='region', transform=color_mapper), legend='region')

# Set the legend.location attribute of the plot to 'bottom_left'
p.legend.location = 'bottom_left'

# Enable notebook output and show the figure
output_notebook()
show(p)

# All in one

Until now, we've been plotting data only for 1970. In this exercise, you'll add a slider to your plot to change the year being plotted. To do this, you'll create an **update_plot()** function and associate it with a slider to select values between 1970 and 2013.

After you are done, you may have to scroll to the right to view the entire plot.

In [32]:
# Cell 07 - This cell is the same as cell 05; the same steps are repeated here

# Import figure from bokeh.plotting
from bokeh.plotting import figure

# Import output_notebook and show from bokeh.io
from bokeh.io import output_notebook, show

# Import the ColumnDataSource and HoverTool class from bokeh.models
from bokeh.models import ColumnDataSource, HoverTool

# Make the ColumnDataSource: source
source = ColumnDataSource(data={
    'x'       : data.loc[1970].fertility,
    'y'       : data.loc[1970].life,
    'country' : data.loc[1970].Country,
    'pop'     : data.loc[1970].population,
    'region'  : data.loc[1970].region,
})

# Define the chart TOOLS
TOOLS = 'box_zoom,box_select,crosshair,resize,reset'

# Create the figure: p
p = figure(title='1970', x_axis_label='Fertility (children per woman)', y_axis_label='Life Expectancy (years)',
           plot_height=400, plot_width=700,
           tools=[TOOLS, HoverTool(tooltips='@country')])

# Add a circle glyph to the figure p
p.circle(x='x', y='y', fill_alpha=0.8, source=source)

# Enable notebook output and show the figure
output_notebook()
show(p)

In [33]:
# Cell 08 - This cell is the same as cell 06; the same steps are repeated here

# Make a list of the unique values from the region column: regions_list
regions_list = data["region"].unique().tolist()

# Import CategoricalColorMapper from bokeh.models and the Spectral6 palette from bokeh.palettes
from bokeh.models import CategoricalColorMapper
from bokeh.palettes import Spectral6

# Make a color mapper: color_mapper
color_mapper = CategoricalColorMapper(factors=regions_list, palette=Spectral6)

# Add the color mapper to the circle glyph
p.circle(x='x', y='y', fill_alpha=0.8, source=source,
            color=dict(field="region", transform=color_mapper), legend="region")

# Set the legend.location attribute of the plot to 'bottom_left'
p.legend.location = 'bottom_left'

### Instructions

- Import the **widgetbox** and **row** functions from **bokeh.layouts**, and the **Slider** class from **bokeh.models**.
- Import **HoverTool** from **bokeh.models**.
- Define the **update_plot** callback function with parameters **attr**, **old** and **new**.
- Set the **yr** name to **slider.value** and set **source.data = new_data**.
- Create a **HoverTool** object called **hover** with **tooltips=[('Country', '@country')]**.
- Add the **HoverTool** object you created to the plot using **add_tools()**.
- Make a slider object called **slider** using the **Slider** class with a **start** year of 1970, **end** year of 2013, **step** of 1, **value** of 1970, and **title** of **'Year'**.
- Attach the callback to the **'value'** property of slider. This can be done using **on_change()** and passing in **'value'** and **update_plot**.
- Make a **row** layout of **widgetbox(slider)** and **p** and add it to the current document.
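
The data-building step of the callback can be tried on its own, without a Bokeh server. The sketch below assumes a hypothetical helper, `make_year_data`, and a made-up frame `toy`. The empty-list fallback for a year missing from the index is an assumption added for illustration, not part of the exercise.

```python
import pandas as pd

# Hypothetical toy frame indexed by year; values are made up.
toy = pd.DataFrame(
    {
        "Country": ["A", "B", "A", "B"],
        "fertility": [6.0, 3.0, 5.5, 2.8],
        "life": [40.0, 65.0, 42.0, 66.0],
        "population": [100.0, 200.0, 110.0, 205.0],
        "region": ["R1", "R2", "R1", "R2"],
    },
    index=pd.Index([1970, 1970, 1971, 1971], name="Year"),
)

KEYS = {"x": "fertility", "y": "life", "country": "Country",
        "pop": "population", "region": "region"}


def make_year_data(df, yr):
    """Hypothetical helper: build the column dict for one year."""
    if yr not in df.index:
        # Assumption: return empty columns when the year has no rows.
        return {key: [] for key in KEYS}
    rows = df.loc[[yr]]  # list selector always yields a DataFrame
    return {key: rows[col].tolist() for key, col in KEYS.items()}


new_data = make_year_data(toy, 1971)
empty_data = make_year_data(toy, 2013)
print(new_data["country"], new_data["x"])
```

Inside `update_plot`, the returned dict would play the role of `new_data` before it is assigned to `source.data`.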

In [34]:
# Cell 09

# Import the necessary modules
from bokeh.layouts import widgetbox, row
from bokeh.models import Slider

# Import HoverTool from bokeh.models
from bokeh.models import HoverTool

def modify_doc(doc):   
    # Define the callback function: update_plot
    def update_plot(attr, old, new):
        # set the `yr` name to `slider.value` and `source.data = new_data`
        yr = slider.value
        new_data = {
            'x'       : data.loc[yr].fertility,
            'y'       : data.loc[yr].life,
            'country' : data.loc[yr].Country,
            'pop'     : data.loc[yr].population,
            'region'  : data.loc[yr].region,
        }
        source.data = new_data
        
        # Update the figure title: p.title.text
        p.title.text = 'Gapminder data for %d' % yr
        
    # Create a HoverTool: hover
    hover = HoverTool(tooltips=[("Country", "@country")])

    # Add the HoverTool to the plot
    p.add_tools(hover)

    # Make a slider object: slider
    slider = Slider(title="Year", start=1970, end=2013, step=1, value=1970)


    # Attach the callback to the 'value' property of slider
    slider.on_change('value', update_plot)
    
    # Add the color mapper to the circle glyph
    p.circle(x='x', y='y', fill_alpha=0.8, source=source,
            color=dict(field='region', transform=color_mapper), legend='region')

    # Set the legend.location attribute of the plot to 'bottom_left'
    p.legend.location = 'bottom_left'

    # Make a row layout of widgetbox(slider) and p and add it to the current document
    layout = row(widgetbox(slider),p)
    
    doc.add_root(layout)

In [35]:
# Cell 10

from bokeh.application.handlers import FunctionHandler
from bokeh.application import Application

handler = FunctionHandler(modify_doc)
app = Application(handler)

from tornado.ioloop import IOLoop
loop = IOLoop.current()

In [36]:
# Cell 11

def show_app(app, notebook_url="127.0.0.1:8888"):
    from IPython.display import HTML, display
    from bokeh.embed import autoload_server
    from bokeh.server.server import Server
    
    server = Server({'/': app}, io_loop=loop, port=0, host='*', allow_websocket_origin=[notebook_url])
    server.start()
    
    script = autoload_server(model=None, url='http://127.0.0.1:%d' % server.port)
    
    display(HTML(script))

In [37]:
# Cell 12

show_app(app)

INFO:bokeh.server.server:Starting Bokeh server version 0.12.4


ERROR:tornado.application:Uncaught exception GET /autoload.js?bokeh-autoload-element=c777c77d-7970-4d2e-96b4-644e8c5f4f00&_=1491749623516 (127.0.0.1)
HTTPServerRequest(protocol='http', host='127.0.0.1:49455', method='GET', uri='/autoload.js?bokeh-autoload-element=c777c77d-7970-4d2e-96b4-644e8c5f4f00&_=1491749623516', version='HTTP/1.1', remote_ip='127.0.0.1', headers={'Host': '127.0.0.1:49455', 'Connection': 'keep-alive', 'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.98 Safari/537.36', 'Accept': '*/*', 'Referer': 'http://localhost:8888/notebooks/Dev/life-expectancy-vs-fertility/LifeExpectancy_x_Fertility.ipynb', 'Accept-Encoding': 'gzip, deflate, sdch, br', 'Accept-Language': 'pt-BR,pt;q=0.8,en-US;q=0.6,en;q=0.4'})
Traceback (most recent call last):
  File "/home/leandromax/anaconda3/lib/python3.6/site-packages/tornado/web.py", line 1469, in _execute
    result = yield result
  File "/home/leandromax/anaconda3/lib/python3.6/site-

As a final step in enhancing your application, in this exercise you will add dropdowns for interactively selecting different data features. In combination with the hover tool you added in the previous exercise, as well as the slider to change the year, you will have a powerful app that allows you to interactively and quickly extract some great insights from the dataset!

```Python
# Select is the dropdown widget class used below
from bokeh.models import Select

# Create a dropdown Select widget for the x data: x_select
x_select = Select(
    options=['fertility', 'life', 'child_mortality', 'gdp'],
    value='fertility',
    title='x-axis data'
)
```
```Python
# Create a dropdown Select widget for the y data: y_select
y_select = Select(
    options=['fertility', 'life', 'child_mortality', 'gdp'],
    value='life',
    title='y-axis data'
)
```
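
The `value` of each dropdown is a plain column-name string, so one way the callback might use it is to index the frame by that name. The sketch below assumes a hypothetical helper, `make_axis_data`, and a made-up frame `toy`. The string arguments stand in for `x_select.value` and `y_select.value`.

```python
import pandas as pd

# Hypothetical toy frame indexed by year; values are made up.
toy = pd.DataFrame(
    {
        "Country": ["A", "B"],
        "fertility": [6.0, 3.0],
        "life": [40.0, 65.0],
        "gdp": [1000.0, 9000.0],
    },
    index=pd.Index([1970, 1970], name="Year"),
)


def make_axis_data(df, yr, x_col, y_col):
    """Hypothetical helper: pick the plotted columns by name."""
    rows = df.loc[[yr]]  # all rows for the chosen year
    return {
        "x": rows[x_col].tolist(),
        "y": rows[y_col].tolist(),
        "country": rows["Country"].tolist(),
    }


# "gdp" and "life" stand in for x_select.value and y_select.value.
chosen = make_axis_data(toy, 1970, "gdp", "life")
print(chosen)
```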
![TODO](https://drive.google.com/uc?export=view&id=0BxhVm1REqwr0VnFsU0NxVEZKVk0)