# Python for Data Visualization - Part 2

**Goal:** The goal of this project is to construct more advanced bar/column plots in Python using Bokeh.

**Description:** You are the manager of three schools: Elementary, Middle, and High. Together they cover grade levels 1-12 (Elementary: 1-5, Middle: 6-8, High: 9-12), and each grade level has recorded Male, Female, and Total enrollment. You want to build two charts, plus an optional challenge:
 - Plot a stacked bar chart comparing Male and Female enrollment by Grade
 - Plot a grouped bar chart comparing Male and Female enrollment by Grade
 - (Challenge) Plot a grouped bar chart comparing Male and Female enrollment by Grade

## Preparation

Similar to the last tutorial, we want to import our CSV data into a Pandas DataFrame.

In [1]:
import pandas as pd                 # Tell Python we will be using the Pandas set of tools, and nickname it pd so we can type it quicker
df = pd.read_csv("ClassData.csv")   # Create a DataFrame, call it df, and set its value to the content of our CSV data
df.index += 1                       # Tells the DataFrame to start index labels at 1
print(df)                           # Display our dataframe

        School  Grade  Male  Female  Total
1   Elementary      1    16      15     31
2   Elementary      2    12      15     27
3   Elementary      3    10      18     28
4   Elementary      4    17      13     30
5   Elementary      5    15      15     30
6       Middle      6    11      12     23
7       Middle      7    14      12     26
8       Middle      8    15      11     26
9         High      9    13      14     27
10        High     10    12      16     28
11        High     11    16      14     30
12        High     12    14      14     28
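If `ClassData.csv` is not at hand, a minimal sketch like the following rebuilds the same DataFrame from the values printed above, assuming the file has exactly these five columns. It reads from an in-memory string with `io.StringIO` instead of a file, and also checks that `Total` equals `Male + Female` on every row.

```python
import io

import pandas as pd

# Assumption: these rows mirror the printed DataFrame above; the real ClassData.csv may differ
CSV_TEXT = """School,Grade,Male,Female,Total
Elementary,1,16,15,31
Elementary,2,12,15,27
Elementary,3,10,18,28
Elementary,4,17,13,30
Elementary,5,15,15,30
Middle,6,11,12,23
Middle,7,14,12,26
Middle,8,15,11,26
High,9,13,14,27
High,10,12,16,28
High,11,16,14,30
High,12,14,14,28
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))  # read_csv accepts any file-like object, not only a path
df.index += 1                            # Start index labels at 1, as in the tutorial

# Sanity check: every Total should be the sum of Male and Female
consistent = (df["Male"] + df["Female"] == df["Total"]).all()
print(df)
print("Totals consistent:", consistent)
```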


## 2A: Stacked Bar Chart

**Our task is to build a stacked bar chart, comparing male vs. female enrollment for each grade**.

### Step 1: Bokeh Setup

Similar to the last tutorial, we need to tell Python we are using Bokeh with `from` and `import`. We also need to specify where the plots should be displayed (`output_notebook`).

In [2]:
from bokeh.plotting import figure, show    # Tells Python we will use figure and show from Bokeh
from bokeh.io import output_notebook       # Tells Python we will need the output_notebook function
from bokeh.models import ColumnDataSource  # We will need this when preparing our data for a bar/column plot

output_notebook()                          # Tells Python to present Bokeh plots in the notebook

### Step 2: Select Data

We care about three columns of data: Grade (x-axis), Male (y-axis), and Female (y-axis).

In [3]:
grades = df['Grade'].astype(str)    # X-axis is the Grade column; Bokeh's categorical axes need string factors, so convert each grade to a string
males = df['Male']                  # First stack on the y-axis is the Male column
females = df['Female']              # Second stack on the y-axis is the Female column
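One consequence of converting grades to strings: Bokeh draws categorical factors in the order of the list passed as `x_range`, so the original 1-through-12 order is kept. If you ever sort the string grades yourself, though, Python compares them character by character. The plain-Python sketch below, using a hypothetical `grade_labels` list, shows the pitfall and one fix: sort by the integer value while keeping the labels as strings.

```python
# Hypothetical list of grade labels, already converted to strings
grade_labels = [str(g) for g in [1, 2, 10, 11, 12, 3]]

# Lexicographic sort: '10' comes before '2' because strings compare character by character
lexicographic = sorted(grade_labels)

# Numeric sort: compare by integer value, but keep the labels as strings for the categorical axis
numeric = sorted(grade_labels, key=int)

print(lexicographic)  # ['1', '10', '11', '12', '2', '3']
print(numeric)        # ['1', '2', '3', '10', '11', '12']
```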

Next, put all the data together into one object that Bokeh can easily read.

In [4]:
data = dict(grades=grades, males=males, females=females) # Creates a dictionary to store these three sets of data
source = ColumnDataSource(data=data)                     # Similar to last time, we create a ColumnDataSource object for Bokeh to easily read our data
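A `ColumnDataSource` stores each dictionary key as a named column, and glyph methods refer to those columns by name (the plotting call in Step 3 uses `x='grades'`, for example). The minimal sketch below uses a few hypothetical rows instead of the full DataFrame to show the column names Bokeh sees; every column should have the same length.

```python
from bokeh.models import ColumnDataSource

# Hypothetical sample rows; in the tutorial these come from df['Grade'], df['Male'], df['Female']
data = dict(
    grades=['1', '2', '3'],
    males=[16, 12, 10],
    females=[15, 15, 18],
)
source = ColumnDataSource(data=data)

# Glyph methods such as vbar_stack look up these keys by name
print(sorted(source.column_names))   # ['females', 'grades', 'males']
print(list(source.data['males']))

# Every column should have the same number of rows
lengths = {name: len(values) for name, values in source.data.items()}
print(lengths)
```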

The key step is to tell Bokeh which columns to stack as categories and which color to use for each one.

In [5]:
categories = ('males', 'females')    # Specifies that we want two categories stacked: males and females
colors = ["#718dbf", "#e84d60"]      # Specifies the two colors we will use
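`vbar_stack` pairs list or tuple arguments with the stacked categories by position: the first category gets the first color, the second category gets the second color, and so on. The plain-Python sketch below only makes that pairing visible; it does not call Bokeh and reuses the two names defined above.

```python
categories = ('males', 'females')     # Stacked columns, bottom to top
colors = ["#718dbf", "#e84d60"]       # One color per category, in the same order

# Both sequences must have the same length, or the positional pairing breaks
assert len(categories) == len(colors), "need exactly one color per category"

# Position i in categories is drawn with position i in colors
color_for = dict(zip(categories, colors))
for category, color in color_for.items():
    print(f"{category}: {color}")
```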

### Step 3: Plot Data

Similar to last time, we first create an empty `figure` for our data.

In [6]:
# height/width replace the older plot_height/plot_width arguments, which Bokeh 3.0 removed
stacked_visual = figure(title="Total Male vs. Female Enrollment by Grade", x_range=grades, y_range=(0, 40),
                        x_axis_label='Grade', y_axis_label="Enrollment", height=300, width=800)
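The fixed `y_range=(0,40)` works because the tallest stacked bar (Grade 1, 31 students) stays below 40. If the data changes, a small helper can derive the upper bound from the totals instead. This is a plain-Python sketch; the helper `nice_upper_bound` and its 10% headroom and step of 10 are illustrative choices, not Bokeh settings.

```python
import math


def nice_upper_bound(values, headroom=0.1, step=10):
    """Hypothetical helper: smallest multiple of `step` at or above max(values) plus headroom."""
    peak = max(values) * (1 + headroom)
    return math.ceil(peak / step) * step


# Total enrollment per grade, copied from the DataFrame printed earlier
totals = [31, 27, 28, 30, 30, 23, 26, 26, 27, 28, 30, 28]

y_max = nice_upper_bound(totals)
print(y_max)  # 40
```

The result could then be passed as `y_range=(0, y_max)` when creating the figure.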

Next, we plot our data onto the empty figure. Unlike last time, we are not using simple bars but stacked bars, so we use `vbar_stack`.
 - The first argument, `categories`, is the tuple we created earlier. It tells Bokeh to stack the `'males'` and `'females'` columns, in that order from the bottom up
 - `x='grades'` places each stack at its grade, `width=0.7` makes each bar 70% of its category slot, and `source=source` points Bokeh at our `ColumnDataSource`
 - The `color` argument tells Bokeh which color to use for each category
 - The `legend_label` argument determines the labels presented in the legend

In [7]:
stacked_visual.vbar_stack(categories, x='grades', width=0.7, color=colors, source=source, legend_label=categories)

[GlyphRenderer(id='1043', ...), GlyphRenderer(id='1056', ...)]
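The two `GlyphRenderer` objects above are one bar layer per category. `vbar_stack` sets each layer's bottom to the running total of the layers beneath it (Bokeh uses its `stack` transform for this). The plain-Python sketch below reproduces that arithmetic for the first three grades so you can check where each segment starts and ends; it illustrates the idea and is not Bokeh's own code.

```python
# Male and Female counts for grades 1-3, copied from the DataFrame printed earlier
males = [16, 12, 10]
females = [15, 15, 18]
layers = [males, females]        # Stacking order: males at the bottom, females on top

segments = []                    # One list of (bottom, top) pairs per category
running = [0] * len(males)       # Height reached so far in each column
for counts in layers:
    layer = []
    for i, count in enumerate(counts):
        bottom = running[i]
        top = bottom + count
        layer.append((bottom, top))
        running[i] = top
    segments.append(layer)

print(segments[0])  # [(0, 16), (0, 12), (0, 10)]
print(segments[1])  # [(16, 31), (12, 27), (10, 28)]
```

The top of each finished stack equals that grade's `Total` column.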

Finally, we add a few touches to clean up our visual, and call `show` to display our graph (like the previous project).

In [8]:
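stacked_visual.xgrid.grid_line_color = None          # Hide the vertical grid lines
stacked_visual.legend.orientation = "horizontal"     # Lay the legend entries out side by side
stacked_visual.legend.location = "top_left"          # Move the legend to the top-left corner of the plot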

show(stacked_visual)    # Remember to call this, or the graph will not display
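When running as a plain Python script rather than in a notebook, you may prefer to write the chart to a standalone HTML file. One way is `bokeh.io.save`, sketched below with only three grades of data; it assumes a Bokeh version that accepts `width`/`height` in `figure`, and the output file name in a temporary directory is hypothetical.

```python
import os
import tempfile

from bokeh.io import save
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.resources import CDN

# Small sample of the tutorial data (grades 1-3)
source = ColumnDataSource(data=dict(grades=['1', '2', '3'], males=[16, 12, 10], females=[15, 15, 18]))
categories = ['males', 'females']
colors = ["#718dbf", "#e84d60"]

p = figure(title="Male vs. Female Enrollment (sample)", x_range=['1', '2', '3'], y_range=(0, 40),
           height=300, width=400)
p.vbar_stack(categories, x='grades', width=0.7, color=colors, source=source, legend_label=categories)

# Hypothetical output path; save() writes a standalone HTML page that loads BokehJS from the CDN
out_path = os.path.join(tempfile.mkdtemp(), "stacked_enrollment.html")
saved_path = save(p, filename=out_path, resources=CDN, title="Enrollment")
print(saved_path)
```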

## 2B: Grouped Bar Chart

**We will once again compare male and female enrollment, but this time we will group the bars by grade (beside each other) instead of stacking them.**

### Step 1: Bokeh Setup

Because we have already set up Bokeh in this notebook, we can skip most of this step. However, this chart also uses `FactorRange` (for a two-level categorical x-axis) and `factor_cmap` (to color bars by category), so we need to `import` both here.

In [22]:
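from bokeh.models import FactorRange       # Categorical range that supports nested (grade, category) factors
from bokeh.transform import factor_cmap    # Maps categorical factors to colors from a palette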
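With nested factors such as `('1', 'm')`, `factor_cmap(..., start=1, end=2)` colors each bar using only the slice `factor[1:2]`, i.e. the `'m'`/`'f'` part, so every male bar gets the same color regardless of grade. The plain-Python sketch below, with a hypothetical `pick_color` function, mimics that lookup to show which color each bar receives; Bokeh performs the real mapping in the browser.

```python
palette = ["#718dbf", "#e84d60"]
factors = ['m', 'f']            # The sub-factors we want to color by
FALLBACK = "gray"               # Assumption: color for factors not found in `factors`


def pick_color(factor, start=1, end=2):
    """Hypothetical stand-in for the start/end slicing that factor_cmap applies."""
    part = factor[start:end]                       # e.g. ('1', 'm')[1:2] -> ('m',)
    key = part[0] if len(part) == 1 else part      # Treat a one-element slice as a plain factor
    if key in factors:
        return palette[factors.index(key)]
    return FALLBACK


x = [(grade, sex) for grade in ['1', '2'] for sex in factors]
for factor in x:
    print(factor, pick_color(factor))
```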

### Step 2: Select and Plot Data

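We use the same three columns as before: Grade defines the groups along the x-axis, and the Male and Female counts give the bar heights on the y-axis. Within each grade group, one bar is labeled `m` and the other `f`.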
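The grouped chart needs one x factor and one height per bar, in matching order. The plain-Python sketch below builds both for three grades. It uses a nested list comprehension in place of the `sum(zip(...), ())` expression in the cell that follows; both give the same interleaved order, but summing tuples builds a new, longer tuple at every step.

```python
grades = ['1', '2', '3']          # Grade labels as strings (first three grades)
males = [16, 12, 10]
females = [15, 15, 18]
categories = ['m', 'f']           # Sub-labels for the two bars in each grade group

# One (grade, category) factor per bar: ('1', 'm'), ('1', 'f'), ('2', 'm'), ...
x = [(grade, category) for grade in grades for category in categories]

# Heights in the same order: male 1, female 1, male 2, female 2, ...
y = [count for pair in zip(males, females) for count in pair]

# The tutorial's sum(zip(...), ()) gives the same sequence, as a tuple
assert list(sum(zip(males, females), ())) == y

for factor, height in zip(x, y):
    print(factor, height)
```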

In [23]:
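categories = ['m', 'f']    # Sub-labels for the two bars in each grade (replaces the earlier 'males'/'females' tuple)

# Nested x-axis factors: one (grade, sex) pair per bar, e.g. ('1', 'm'), ('1', 'f'), ('2', 'm'), ...
x = [(grade, category) for grade in grades for category in categories]

# Interleave the counts to match the order of x: male 1, female 1, male 2, female 2, ...
y = sum(zip(data['males'], data['females']), ())

source = ColumnDataSource(data=dict(x=x, y=y))

# FactorRange(*x) builds a two-level categorical axis (grade, then m/f)
grouped_visual = figure(title="Total Male vs. Female Enrollment by Grade", x_range=FactorRange(*x), y_range=(0, 40),
                        x_axis_label='Grade', y_axis_label="Enrollment", height=300, width=800)

# start=1, end=2 colors each bar by the second part of its factor ('m' or 'f')
grouped_visual.vbar(x='x', top='y', width=0.7, source=source,
                    fill_color=factor_cmap('x', palette=colors, factors=categories, start=1, end=2))

grouped_visual.xgrid.grid_line_color = None
# No legend label is passed to vbar, so this chart has no legend; these lines stay disabled
# grouped_visual.legend.orientation = "horizontal"
# grouped_visual.legend.location = "top_left"

show(grouped_visual)    # Remember to call this, or the graph will not display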
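The grouped chart above has no legend, which is why its legend lines are commented out. One way to add one, sketched below assuming a Bokeh version that supports `legend_group` (1.4 or later), is to store each bar's sex in its own hypothetical `sex` column, color by that column, and let `legend_group` create one legend entry per distinct value. Only three grades are used to keep it short.

```python
from bokeh.models import ColumnDataSource, FactorRange
from bokeh.plotting import figure
from bokeh.transform import factor_cmap

grades = ['1', '2', '3']
males = [16, 12, 10]
females = [15, 15, 18]
categories = ['m', 'f']
colors = ["#718dbf", "#e84d60"]

x = [(grade, category) for grade in grades for category in categories]
y = [count for pair in zip(males, females) for count in pair]
sex = [category for _ in grades for category in categories]   # Hypothetical extra column: 'm'/'f' per bar

source = ColumnDataSource(data=dict(x=x, y=y, sex=sex))

p = figure(title="Male vs. Female Enrollment by Grade (sample)", x_range=FactorRange(*x), y_range=(0, 40),
           x_axis_label='Grade', y_axis_label="Enrollment", height=300, width=500)

# Color by the 'sex' column directly, so no start/end slicing of the nested factors is needed
p.vbar(x='x', top='y', width=0.7, source=source,
       fill_color=factor_cmap('sex', palette=colors, factors=categories),
       legend_group='sex')   # Groups rows by 'sex' in Python: one legend entry for 'm', one for 'f'

p.xgrid.grid_line_color = None
p.legend.orientation = "horizontal"
p.legend.location = "top_left"
# show(p)  # Uncomment to display the chart
```

Coloring by a dedicated column keeps the color mapping and the legend tied to the same field, which is simpler than slicing the nested `x` factors.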