# Python and Data Visualization 2 - Bar and Column Charts

**Goal:** The goal of this project is to construct more advanced bar/column plots in Python using Bokeh.

**Description:** You are the manager of three schools: Elementary, Middle, and High. Together the schools cover grades 1-12, and each grade has recorded Male, Female, and Total enrollment. You want to build three charts (the third is an optional challenge):
 - Plot a stacked bar chart comparing Male and Female enrollment by Grade
 - Plot a grouped bar chart comparing Male and Female enrollment by Grade
 - (Challenge) Plot a stacked bar chart comparing Male and Female enrollment by School

## Preparation

Similar to the last tutorial, we want to import our CSV data into a Pandas DataFrame.

In [1]:
import pandas as pd                 # Tell Python we will be using the Pandas set of tools, and nickname it to pd so we can type it quicker
df = pd.read_csv("ClassData.csv")   # Create a DataFrame, call it df, and set its value to the content of our CSV data
df.index += 1                       # Tells the DataFrame to start index labels at 1
print(df)                           # Display our dataframe

        School  Grade  Male  Female  Total
1   Elementary      1    16      15     31
2   Elementary      2    12      15     27
3   Elementary      3    10      18     28
4   Elementary      4    17      13     30
5   Elementary      5    15      15     30
6       Middle      6    11      12     23
7       Middle      7    14      12     26
8       Middle      8    15      11     26
9         High      9    13      14     27
10        High     10    12      16     28
11        High     11    16      14     30
12        High     12    14      14     28
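
Before plotting, it can be worth confirming that the `Total` column really is the sum of `Male` and `Female`. The following is a minimal sketch of such a check. It rebuilds a few rows of the table above by hand (an assumption, so that the snippet runs without `ClassData.csv`), and the names `sample` and `totals_match` are illustrative only.

```python
import pandas as pd

# Hand-copied subset of the table printed above (stands in for ClassData.csv)
sample = pd.DataFrame({
    "School": ["Elementary", "Elementary", "Middle", "High"],
    "Grade":  [1, 2, 6, 9],
    "Male":   [16, 12, 11, 13],
    "Female": [15, 15, 12, 14],
    "Total":  [31, 27, 23, 27],
})

# True when every row's Total equals Male + Female
totals_match = bool((sample["Male"] + sample["Female"] == sample["Total"]).all())
print(totals_match)
```

The same comparison should work on the full `df` once the CSV has been loaded.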


## 2A: Stacked Bar Chart

**Our task is to build a stacked bar chart, comparing male vs. female enrollment for each grade**.

### Step 1: Bokeh Setup

Similar to the last tutorial, we need to tell Python we are using Bokeh with `from` and `import`. We also need to specify where the plots should be displayed (`output_notebook`).

In [2]:
from bokeh.plotting import figure, show    # Tells Python we will use figure and show from Bokeh
from bokeh.io import output_notebook       # Tells Python we will need the output_notebook function
from bokeh.models import ColumnDataSource  # We will need this when preparing our data for a bar/column plot

output_notebook()                          # Tells Python to present Bokeh plots in the notebook

### Step 2: Select Data

We care about three columns of data: Grade (x-axis), Male (y-axis), and Female (y-axis). Calling `tolist` isn't necessary, but converts the column into a Python list, making it easier to view in the notebook.

In [3]:
grades = (df['Grade'].apply(str)).tolist() # X-axis is the Grade column; we convert it to a string so that it can be read easily by Bokeh
males = df['Male'].tolist()                # First stack on the y-axis is the Male column
females = df['Female'].tolist()            # Second stack on the y-axis is the Female column
print("Grade Column:")
print(grades)
print("")
print("Male Column:")
print(males)
print("")
print("Female Column:")
print(females)

Grade Column:
['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12']

Male Column:
[16, 12, 10, 17, 15, 11, 14, 15, 13, 12, 16, 14]

Female Column:
[15, 15, 18, 13, 15, 12, 12, 11, 14, 16, 14, 14]


Next, put all the data together into one object that Bokeh can easily read.

In [4]:
data = dict(grades=grades, males=males, females=females) # Creates a dictionary to store these three sets of data
print(data)
source = ColumnDataSource(data=data)                     # Similar to last time, we create a ColumnDataSource object for Bokeh to easily read our data

{'grades': ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12'], 'males': [16, 12, 10, 17, 15, 11, 14, 15, 13, 12, 16, 14], 'females': [15, 15, 18, 13, 15, 12, 12, 11, 14, 16, 14, 14]}
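
Each list in the dictionary becomes one column, and the columns are expected to have the same length so that position `i` in every column describes the same grade. Below is a minimal sketch of a check you could run before creating the `ColumnDataSource`. The data is a three-grade subset of the lists above, and the names `lengths` and `columns_aligned` are hypothetical.

```python
# Three-grade subset of the dictionary built above
data = dict(
    grades=['1', '2', '3'],
    males=[16, 12, 10],
    females=[15, 15, 18],
)

# Record the length of every column
lengths = {name: len(values) for name, values in data.items()}

# One distinct length means all columns line up row by row
columns_aligned = len(set(lengths.values())) == 1

print(lengths)
print(columns_aligned)
```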


The key step is to tell Bokeh which columns of our data to stack and which colors to use for them.

In [5]:
categories = ('males', 'females')    # Specifies that we want two categories stacked: males and females
colors = ["#718dbf", "#e84d60"]      # Specifies the two colors we will use

### Step 3: Plot Data

Similar to last time, we first create an empty `figure` for our data.

In [6]:
stacked_visual = figure(title="Total Male vs. Female Enrollment by Grade", x_range=grades, y_range=(0,40),
                        x_axis_label='Grade', y_axis_label="Enrollment", height=300, width=800)  # Newer Bokeh versions use height/width in place of plot_height/plot_width

Next, we plot our data onto the empty figure. Unlike last time, we are not using simple bars but stacked bars, so we use `vbar_stack`.
 - The first argument, `categories`, refers to the tuple we created earlier. This tells Bokeh to use `'males'` and `'females'` for the stacked data
 - The `color` argument tells Bokeh which two colors to use for the categories
 - The `legend_label` argument determines the labels presented in the legend

In [7]:
stacked_visual.vbar_stack(categories, x='grades', width=0.7, color=colors, source=source, legend_label=categories)

[GlyphRenderer(id='1041', ...), GlyphRenderer(id='1056', ...)]
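
Conceptually, a stacked bar chart draws one set of bars per category, with each category starting where the previous one ended. The sketch below reproduces that arithmetic in plain Python for the first three grades. It illustrates the idea only and is not Bokeh's implementation; `male_bars` and `female_bars` are hypothetical names.

```python
males = [16, 12, 10]
females = [15, 15, 18]

# First stack (males): each bar runs from 0 up to the male count
male_bars = [(0, m) for m in males]

# Second stack (females): each bar starts where the male bar ends
female_bars = [(m, m + f) for m, f in zip(males, females)]

print(male_bars)    # [(0, 16), (0, 12), (0, 10)]
print(female_bars)  # [(16, 31), (12, 27), (10, 28)]
```

The top of each female bar equals that grade's `Total`, which is why a `y_range` of `(0, 40)` comfortably fits the largest grade (31).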

Finally, we add a few touches to clean up our visual, and call `show` to display our graph (like the previous project).

In [8]:
stacked_visual.xgrid.grid_line_color = None
stacked_visual.legend.orientation = "horizontal"
stacked_visual.legend.location = "top_left"

show(stacked_visual)    # Remember to call this, or the graph will not display

**Challenge**: Move the legend to the top right of the graph.

**Challenge**: Figure out how to change the bars so that they are all touching.

## 2B: Grouped Bar Chart

**We will once again compare male and female enrollment, but this time we will group the bars by grade (beside each other) instead of stacking them.**

### Step 1: Bokeh Setup

Because we have already set up Bokeh in this notebook, we can skip most of this step. However, this chart also needs `FactorRange` and `factor_cmap` from Bokeh, so we `import` them here.

In [9]:
from bokeh.models import FactorRange
from bokeh.transform import factor_cmap

### Step 2: Select Data

We care about the same three columns of data: Grade, Male, and Female. We can once again view the data by printing each column.

In [10]:
print("Grade Column:")
print(grades)
print("")
print("Male Column:")
print(males)
print("")
print("Female Column:")
print(females)

Grade Column:
['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12']

Male Column:
[16, 12, 10, 17, 15, 11, 14, 15, 13, 12, 16, 14]

Female Column:
[15, 15, 18, 13, 15, 12, 12, 11, 14, 16, 14, 14]


In the next few lines of code, we prepare our data to be grouped by category.

In [11]:
categories = ['m', 'f']                                                # We rename the categories so the labels display better on our graph
x = [(grade, category) for grade in grades for category in categories] # X-axis is every possible combination of grade and category
y = sum(zip(data['males'], data['females']), ())                       # Y-axis is the total count of males and females by grade and category
print("X Data:")
print(x)
print("")
print("Y Data:")
print(y)

X Data:
[('1', 'm'), ('1', 'f'), ('2', 'm'), ('2', 'f'), ('3', 'm'), ('3', 'f'), ('4', 'm'), ('4', 'f'), ('5', 'm'), ('5', 'f'), ('6', 'm'), ('6', 'f'), ('7', 'm'), ('7', 'f'), ('8', 'm'), ('8', 'f'), ('9', 'm'), ('9', 'f'), ('10', 'm'), ('10', 'f'), ('11', 'm'), ('11', 'f'), ('12', 'm'), ('12', 'f')]

Y Data:
(16, 15, 12, 15, 10, 18, 17, 13, 15, 15, 11, 12, 14, 12, 15, 11, 13, 14, 12, 16, 16, 14, 14, 14)
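
The expression `sum(zip(...), ())` can look cryptic at first. `zip` pairs the male and female counts grade by grade, and adding those pairs onto an empty tuple concatenates them into one flat sequence. The sketch below shows this on the first three grades, next to an equivalent version using `itertools.chain` from the standard library. The names `pairs`, `y_sum`, and `y_chain` are illustrative only.

```python
from itertools import chain

males = [16, 12, 10]
females = [15, 15, 18]

# zip pairs the counts grade by grade: (16, 15), (12, 15), (10, 18)
pairs = list(zip(males, females))

# Summing tuples onto an empty tuple concatenates them (the approach used above)
y_sum = sum(pairs, ())

# chain.from_iterable flattens the same pairs in a single pass
y_chain = tuple(chain.from_iterable(pairs))

print(y_sum)    # (16, 15, 12, 15, 10, 18)
print(y_chain)  # (16, 15, 12, 15, 10, 18)
```

Either way, the order of `y` matches the order of the (grade, category) pairs in `x`: male first, then female, for each grade.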


Note: `x` is a list of (grade, category) pairs, while `y` is a matching tuple containing the enrollment count for each grade and category. Once again, we wrap the data in a type that Bokeh can easily understand: `ColumnDataSource`.

In [12]:
data = dict(x=x, y=y)
print(data)
source = ColumnDataSource(data=data)

{'x': [('1', 'm'), ('1', 'f'), ('2', 'm'), ('2', 'f'), ('3', 'm'), ('3', 'f'), ('4', 'm'), ('4', 'f'), ('5', 'm'), ('5', 'f'), ('6', 'm'), ('6', 'f'), ('7', 'm'), ('7', 'f'), ('8', 'm'), ('8', 'f'), ('9', 'm'), ('9', 'f'), ('10', 'm'), ('10', 'f'), ('11', 'm'), ('11', 'f'), ('12', 'm'), ('12', 'f')], 'y': (16, 15, 12, 15, 10, 18, 17, 13, 15, 15, 11, 12, 14, 12, 15, 11, 13, 14, 12, 16, 16, 14, 14, 14)}


Now, we create the `figure` and plot our data with `vbar` (not `vbar_stack`). We pass `FactorRange(*x)` as the `x_range` so that Bokeh treats the (grade, category) pairs we created earlier as a nested categorical axis. Finally, we use `factor_cmap` to map the two colors to the two categories.

In [13]:
grouped_visual = figure(title="Total Male vs. Female Enrollment by Grade", x_range=FactorRange(*x), y_range=(0,40),
                        x_axis_label='Grade', y_axis_label="Enrollment", height=300, width=800)

grouped_visual.vbar(x='x', top='y', width=0.7, source=source, fill_color=factor_cmap('x', palette=colors, factors=categories, start=1, end=2))

grouped_visual.xgrid.grid_line_color = None
show(grouped_visual)                           # Remember to call this, or the graph will not display
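
The `start=1, end=2` arguments to `factor_cmap` deserve a word. As far as we understand it, they slice each (grade, category) pair so that only the category part is compared against `factors`. The plain-Python sketch below mimics that lookup for two grades. It is an approximation of the idea rather than Bokeh's own code, and `lookup` and `bar_colors` are hypothetical names.

```python
categories = ['m', 'f']
colors = ["#718dbf", "#e84d60"]
x = [('1', 'm'), ('1', 'f'), ('2', 'm'), ('2', 'f')]

start, end = 1, 2  # same slice indices passed to factor_cmap above

# Map each category to its color
lookup = dict(zip(categories, colors))

# Slice each (grade, category) pair down to its category, then look up the color
bar_colors = [lookup[factor[start:end][0]] for factor in x]

print(bar_colors)
```

Every `'m'` bar receives the first color and every `'f'` bar the second, regardless of grade.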

## 2C (Challenge): Stacked Bar Chart - Male and Female Enrollment by School 

**A more challenging task is to plot a stacked chart of Male/Female enrollment by School instead of Grade.**

In reality, this is quite similar to the first chart we made in this notebook. We simply need to change the input data, and we can use the same graphing function. First, we need to aggregate the male and female enrollment by school. We start by splitting our DataFrame into smaller ones, by School.

In [14]:
grouped_df = df.groupby('School')                  # Tells Python to group the data by the School column (a plain column name keeps the group keys as simple strings)
schools = {}                                       # Creates an empty dictionary to store each school's DataFrame
for group in grouped_df.groups:                    # For every group in our grouped_df (for every school), get data only on
    schools[group] = grouped_df.get_group(group)   # that group (school), and put it into our dictionary 
    
print("Elementary")
print(schools['Elementary'])
print("")
print("Middle")
print(schools['Middle'])
print("")
print("High")
print(schools['High'])

Elementary
       School  Grade  Male  Female  Total
1  Elementary      1    16      15     31
2  Elementary      2    12      15     27
3  Elementary      3    10      18     28
4  Elementary      4    17      13     30
5  Elementary      5    15      15     30

Middle
   School  Grade  Male  Female  Total
6  Middle      6    11      12     23
7  Middle      7    14      12     26
8  Middle      8    15      11     26

High
   School  Grade  Male  Female  Total
9    High      9    13      14     27
10   High     10    12      16     28
11   High     11    16      14     30
12   High     12    14      14     28
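
The loop above can also be written as a dictionary comprehension, because iterating over a pandas `GroupBy` object yields (group name, sub-DataFrame) pairs. This is a minimal sketch on a hand-built subset of the data; `sample` and `schools_by_name` are illustrative names, not part of the notebook.

```python
import pandas as pd

# Hand-copied subset of the enrollment table
sample = pd.DataFrame({
    "School": ["Elementary", "Elementary", "Middle", "High"],
    "Grade":  [1, 2, 6, 9],
    "Male":   [16, 12, 11, 13],
    "Female": [15, 15, 12, 14],
})

# Iterating over a GroupBy yields (group name, sub-DataFrame) pairs
schools_by_name = {name: group for name, group in sample.groupby("School")}

print(sorted(schools_by_name))
print(len(schools_by_name["Elementary"]))
```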


We can now calculate totals for Male and Female enrollment for each school.

In [15]:
school_names = ["Elementary", "Middle", "High"]
males = []
females = []
for school in school_names:
    # Calculate the sum of Male enrollment for the given school and add to the males list
    males.append(schools[school]['Male'].sum())
    # Calculate the sum of Female enrollment for the given school and add to females array
    females.append(schools[school]['Female'].sum()) 

print("Schools")
print(school_names)
print("")
print("Males")
print(males)
print("")
print("Females")
print(females)

Schools
['Elementary', 'Middle', 'High']

Males
[70, 40, 55]

Females
[76, 35, 58]
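
As an alternative to splitting the DataFrame and looping, pandas can compute per-school totals in one step with `groupby(...).sum()`. The sketch below assumes a small hand-built subset of the data (so its totals differ from the full results above). Passing `sort=False` keeps the schools in their order of first appearance instead of alphabetical order.

```python
import pandas as pd

# Hand-copied subset of the enrollment table
sample = pd.DataFrame({
    "School": ["Elementary", "Elementary", "Middle", "High", "High"],
    "Male":   [16, 12, 11, 13, 12],
    "Female": [15, 15, 12, 14, 16],
})

# sort=False keeps schools in order of first appearance
totals = sample.groupby("School", sort=False)[["Male", "Female"]].sum()

school_names = totals.index.tolist()
males = totals["Male"].tolist()
females = totals["Female"].tolist()

print(school_names)
print(males)
print(females)
```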


Finally, we can use the exact same steps we followed in 2A to create a stacked chart. First, let's wrap our data in a `ColumnDataSource`.

In [16]:
data = dict(school_names=school_names, males=males, females=females) # Creates a dictionary to store these three sets of data
print(data)
source = ColumnDataSource(data=data) # Similar to last time, we create a ColumnDataSource object for Bokeh to easily read our data

{'school_names': ['Elementary', 'Middle', 'High'], 'males': [70, 40, 55], 'females': [76, 35, 58]}


Then, we create an empty figure before we can plot our data. Note that we change `grades` to `school_names` because that is our new x-axis. We also changed the `y_range` to display the visual more clearly.

In [17]:
categories = ('males', 'females')    # Specifies that we want two categories stacked: males and females
colors = ["#718dbf", "#e84d60"]      # Specifies the two colors we will use

challenge_visual = figure(title="Total Male vs. Female Enrollment by School", x_range=school_names, y_range=(0,200),
                          x_axis_label='School', y_axis_label="Enrollment", height=300, width=800)

Finally, we plot our data and add some visual refinements.

In [18]:
challenge_visual.vbar_stack(categories, x='school_names', width=0.7, color=colors, source=source, legend_label=categories)

challenge_visual.xgrid.grid_line_color = None
challenge_visual.legend.orientation = "horizontal"
challenge_visual.legend.location = "top_left"

show(challenge_visual)    # Remember to call this, or the graph will not display

## Exercise

To test your understanding, try creating a similar graph to the one above, except grouping the categories instead of stacking them.

For more information, check out the documentation at: https://docs.bokeh.org/en/latest/docs/user_guide/categorical.html.