# Case Study: Waterfall Chart

In this case study, we will reverse engineer a [waterfall chart](https://vega.github.io/vega-lite/examples/waterfall_chart.html) using Altair. A waterfall chart is a common chart type that reveals the cumulative effect of a sequence of positive and negative changes. While Altair supports waterfall charts well through its bar mark, the difficult part is getting all the annotations right. It is a good exercise to test your Altair skills. Let's get started!

## Data

In [1]:
import altair as alt
import pandas as pd
import numpy as np

source = pd.DataFrame({"label": ["Begin", "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec", "End"],
                      "amount": [4000, 1707, -1425, -1030, 1812, -1067, -1481, 1228, 1176, 1146, 1205, -1388, 1492, 0]})
source.head()

|   | label | amount |
|---|-------|--------|
| 0 | Begin | 4000   |
| 1 | Jan   | 1707   |
| 2 | Feb   | -1425  |
| 3 | Mar   | -1030  |
| 4 | Apr   | 1812   |


The dataset contains 2 columns: "label" and "amount". Labels are mostly months, plus two special labels, "Begin" and "End", for the first and last rows. The "amount" column stores the change for each month. The amount for "Begin" is the absolute amount at the beginning; "End" is stored with an amount of 0, and its bar will show the absolute amount at the end. What makes this chart hard to generate is that the first and last rows are semantically different from the rows in between.
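To make the distinction between the special rows and the monthly rows concrete, here is a minimal plain-Python sketch (no pandas needed) that separates them and checks how they relate. The variable names are illustrative only. Because the amount stored for "End" in the data above is 0, the sketch treats that row as a placeholder and derives the ending total from the other rows.

```python
# Plain-Python sketch of the dataset's structure (names are illustrative).
labels = ["Begin", "Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec", "End"]
amounts = [4000, 1707, -1425, -1030, 1812, -1067, -1481,
           1228, 1176, 1146, 1205, -1388, 1492, 0]

rows = dict(zip(labels, amounts))

# "Begin" holds an absolute starting amount; every month holds a change.
begin_amount = rows["Begin"]
monthly_changes = {k: v for k, v in rows.items() if k not in ("Begin", "End")}

# The ending total is the starting amount plus all monthly changes.
end_total = begin_amount + sum(monthly_changes.values())
print(end_total)
```

The printed total is the value the "End" bar should reach, which is what the cumulative sum in the next section computes.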

## Visualization

The bar mark in Altair allows us to specify both the bottom and the top location of each bar. Let's first extract this information into its own columns.

In [2]:
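# y2: running total after each row's change is applied
source['y2'] = np.cumsum(source['amount'])
# y1: running total before the change (y2 shifted down by one row)
source['y1'] = np.roll(source['y2'], 1)
# The "Begin" and "End" bars show totals, so they start from 0
source.at[0, 'y1'] = 0
source.at[13, 'y1'] = 0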

source.tail()

|    | label | amount | y2   | y1   |
|----|-------|--------|------|------|
| 9  | Sep   | 1146   | 6066 | 4920 |
| 10 | Oct   | 1205   | 7271 | 6066 |
| 11 | Nov   | -1388  | 5883 | 7271 |
| 12 | Dec   | 1492   | 7375 | 5883 |
| 13 | End   | 0      | 7375 | 0    |


We added two columns, `y1` and `y2`, holding the two ends of each bar: `y1` is the running total before the change and `y2` is the running total after it. Note that `y1` may be larger than `y2` for some observations (the months with a loss), and that is OK. Altair is able to generate the correct bar shape even if the top and bottom locations are flipped.
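The same arithmetic can be sketched without pandas or NumPy, which makes it easy to check by hand. This is only an illustration: `bar_extent` is a hypothetical helper, not an Altair function, and it simply shows that the two ends of a bar can be put in bottom/top order regardless of the bar's direction.

```python
from itertools import accumulate

amounts = [4000, 1707, -1425, -1030, 1812, -1067, -1481,
           1228, 1176, 1146, 1205, -1388, 1492, 0]

# y2: running total after each row's amount is applied (like np.cumsum).
y2 = list(accumulate(amounts))

# y1: running total before the row, i.e. y2 shifted down by one row.
# The first bar ("Begin") therefore starts at 0.
y1 = [0] + y2[:-1]

# The "End" bar shows the whole final total, so it also starts at 0.
y1[-1] = 0


def bar_extent(start, end):
    """Return (bottom, top) regardless of the bar's direction."""
    return (min(start, end), max(start, end))


extents = [bar_extent(a, b) for a, b in zip(y1, y2)]
print(extents[11])  # the "Nov" bar, a loss, so y1 is above y2
```

For a loss month such as "Nov", `y1` is the larger value, yet the extent still comes out in bottom/top order.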

### First try

Next, let's use Altair's `mark_bar` and see the resulting plot.

In [3]:
base = alt.Chart(source)
bar_layer = base.mark_bar().encode(x="label:N", 
                                   y="y1:Q", 
                                   y2="y2:Q")
bar_layer.properties(width=800, height=450)

Hmm, this does not look right... What went wrong is that Altair reordered the bars using __alphabetical__ ordering of the labels! Let's fix that by setting the `sort` property of the `x` encoding to `None`.

In [4]:
base = alt.Chart(source)
bar_layer = base.mark_bar().encode(x=alt.X("label:N", sort=None), 
                                   y="y1:Q", 
                                   y2="y2:Q")
bar_layer.properties(width=800, height=450)
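The scrambled first attempt is easier to understand after sorting the labels by hand. The sketch below uses Python's built-in `sorted` as a stand-in for the default ascending sort; the renderer's exact comparison rules are an assumption here and may differ for other strings. It also shows that remembering each label's row position is enough to recover the original order.

```python
labels = ["Begin", "Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec", "End"]

# Alphabetical order mixes the months and puts "Begin"/"End" in the middle.
alphabetical = sorted(labels)
print(alphabetical)

# Mapping each label to its row position gives an explicit order to sort by.
position = {label: i for i, label in enumerate(labels)}
restored = sorted(alphabetical, key=position.__getitem__)
print(restored == labels)
```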

### Adding Color

Now the bars look correct! Let's add some color to the bars.

In [5]:
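# A bar is a "profit" if it ends higher than it starts, otherwise a "loss"
source['profit'] = np.where(source['y2'] > source['y1'], "profit", "loss")
# "Begin" and "End" are totals, not changes
source.at[0, 'profit'] = "NA"
source.at[13, 'profit'] = "NA"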

base = alt.Chart(source)
bar_layer = base.mark_bar().encode(x=alt.X("label:N", sort=None), 
                                   y="y1:Q", 
                                   y2="y2:Q",
                                   color="profit")
bar_layer.properties(width=800, height=450)

Here we add a new column named "profit" to the data frame. Its possible values are "profit", "loss" and "NA". When we encode "profit" as color, we get a very reasonable looking bar chart. Altair picks its default color palette for nominal data.

For the sake of this exercise, let's try to be exact. For the bars corresponding to "Begin" and "End", we want to use the color `#f7e0b6`. For the bars corresponding to a month, we will use `#93c4aa` if the "amount" is positive and `#f78a64` otherwise.

In [6]:
color_domain = ["NA", "profit", "loss"]
color_range = ["#f7e0b6", "#93c4aa", "#f78a64"]

bar_layer = base.mark_bar().encode(x=alt.X("label:N", sort=None, title="Month", axis=alt.Axis(labelAngle=0)), 
                                   y=alt.Y("y1:Q", title="Amount"), 
                                   y2=alt.Y2("y2:Q"),
                                   color=alt.Color("profit", 
                                                   scale=alt.Scale(domain=color_domain, range=color_range), 
                                                   legend=None))
bar_layer.properties(width=800, height=450)

As shown above, Altair allows us to choose the colors by setting the domain and range properties of the scale for the color encoding: each value in the domain is paired with the color at the same position in the range. This is super handy for categorical data where we want to have full control of the color palette. Also note that we turned off the color legend by setting `legend` to `None`.
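The pairing between domain and range is positional, which a small plain-Python sketch can make explicit. The helpers `classify` and `bar_color` are hypothetical names used only for illustration. One assumption differs from the notebook code: the sketch recognizes the two special bars by their labels, whereas the cell above marks them by row position.

```python
COLOR_DOMAIN = ["NA", "profit", "loss"]
COLOR_RANGE = ["#f7e0b6", "#93c4aa", "#f78a64"]

# Domain and range are matched by position: first with first, and so on.
palette = dict(zip(COLOR_DOMAIN, COLOR_RANGE))


def classify(label, y1, y2):
    """Return the category of a bar from its label and its two ends."""
    if label in ("Begin", "End"):  # assumption: special bars found by label
        return "NA"
    # Same rule as the np.where call: a zero change counts as "loss".
    return "profit" if y2 > y1 else "loss"


def bar_color(label, y1, y2):
    """Look up the hex color that the scale would assign to a bar."""
    return palette[classify(label, y1, y2)]


print(bar_color("Jan", 4000, 5707))
print(bar_color("Feb", 5707, 4282))
```

Reordering `COLOR_DOMAIN` without reordering `COLOR_RANGE` in the same way would silently swap the colors, so it helps to keep the two lists next to each other.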

### Adding Line Annotations

This is as much as we can do with a simple bar chart. Compared to our [target plot](https://vega.github.io/vega-lite/examples/waterfall_chart.html), we are still missing some line segments and some text. Let's start with the lines.

In [7]:
source['label2'] = np.roll(source['label'], -1)
source.at[13, 'label2'] = "End"

rule_layer = alt.Chart(source).mark_rule(xOffset=-25, x2Offset=25)\
    .encode(y=alt.Y("y2"), 
            x=alt.X("label:N", sort=None), 
            x2=alt.X2("label2:N"))
chart = bar_layer + rule_layer
chart.properties(width=800, height=450)

Here, we use `mark_rule` to add the line segments. Because each line segment spans two bars, we need to add a new column named `label2` to our dataset. `label2` is `label` shifted by one row, so each row also knows the label of the next bar. By setting `x` to `label` and `x2` to `label2`, each line segment starts at the middle of the bar specified by the `label` column and ends at the middle of the bar specified by the `label2` column. Lastly, we extend each line a bit to the left and right using the `xOffset` and `x2Offset` properties. Feel free to adjust these settings and see how they change the chart.
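The shifted column can be sketched in plain Python to see which bars each segment connects. This is an illustration of the pairing only, with list slicing standing in for `np.roll`; the name `segments` is made up for the example.

```python
labels = ["Begin", "Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec", "End"]

# Shift the labels one step left: each label is paired with the next one.
# The last row has no successor, so it is paired with itself ("End").
label2 = labels[1:] + [labels[-1]]

# Each tuple is (start bar, end bar) of one horizontal line segment.
segments = list(zip(labels, label2))
print(segments[0])
print(segments[-1])
```

The last pair starts and ends on the same bar, so that segment has zero length before the offsets are applied and is only as wide as the offsets make it.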

### Adding Text Annotations

With the lines in place, we now add the text.

In [8]:
source['mid_y'] = (source['y1'] + source['y2'])/2
text_layer = alt.Chart(source)
profit_text_layer = text_layer.mark_text(align="center", baseline="bottom", dy=-2)\
    .encode(x=alt.X("label:N", sort=None), y="y2", text="y2")\
    .transform_filter(alt.datum.profit != "loss")
loss_text_layer = text_layer.mark_text(align="center", baseline="top", dy=2)\
    .encode(x=alt.X("label:N", sort=None), y="y2", text="y2")\
    .transform_filter(alt.datum.profit == "loss")
amount_text_layer = text_layer.mark_text(color="white")\
    .encode(x=alt.X("label:N", sort=None), y="mid_y", text="amount")\
    .transform_filter(alt.datum.profit != "NA")
end_text_layer = text_layer.mark_text()\
    .encode(x=alt.X("label:N", sort=None), y="mid_y", text="y2")\
    .transform_filter(alt.datum.profit == "NA")
chart = bar_layer + rule_layer + profit_text_layer + loss_text_layer + amount_text_layer + end_text_layer
chart.properties(width=800, height=450)

As shown in the code above, there are actually 4 different groups of text:

1. Text above the bars.
2. Text below the bars.
3. Text inside the bars corresponding to each month.
4. Text inside the bars corresponding to "Begin" and "End".

Groups 1 and 2 both show the cumulative amount at the end of each bar. However, because their locations are different, I find it easier to specify them in separate layers. While groups 3 and 4 are both located in the middle of the bars, they show different values. Group 3 shows the data from the "amount" column to indicate the monthly profit or loss. Group 4 shows the data from the "y2" column to indicate the total amount at the beginning and at the end.

We use 4 text layers to add the text for the 4 groups. In each layer, we use a [filter transform](https://altair-viz.github.io/user_guide/transform/filter.html#user-guide-filter-transform) to select the observations (i.e. bars) for which we want to show the text. For groups 3 and 4, we also added a new column, `mid_y`, to the dataset, holding the middle value of each bar.
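The four filters can be summarized as a small decision rule per bar. The plain-Python sketch below mirrors the filter conditions used in the text layers; `text_annotations` is a hypothetical helper, and the placement names "above", "below" and "inside" are labels chosen for this illustration rather than Altair options.

```python
def text_annotations(row):
    """Return (placement, y, text) tuples for one bar."""
    mid_y = (row["y1"] + row["y2"]) / 2
    notes = []
    # Groups 1 and 2: the running total, drawn at the end of the bar.
    if row["profit"] == "loss":
        notes.append(("below", row["y2"], row["y2"]))
    else:
        notes.append(("above", row["y2"], row["y2"]))
    # Groups 3 and 4: text centered inside the bar.
    if row["profit"] == "NA":
        notes.append(("inside", mid_y, row["y2"]))      # total amount
    else:
        notes.append(("inside", mid_y, row["amount"]))  # monthly change
    return notes


# Rows taken from the table printed earlier in this case study.
nov = {"label": "Nov", "amount": -1388, "y1": 7271, "y2": 5883, "profit": "loss"}
end = {"label": "End", "amount": 0, "y1": 0, "y2": 7375, "profit": "NA"}

print(text_annotations(nov))
print(text_annotations(end))
```

Every bar receives exactly two pieces of text, one from group 1 or 2 and one from group 3 or 4, which is why the four filters come in two complementary pairs.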

## Summary

So there we have it. We have reverse engineered the [target waterfall chart](https://vega.github.io/vega-lite/examples/waterfall_chart.html). Our result is pretty close to the target chart. As an exercise, can you spot the difference between our chart and the [target chart](https://vega.github.io/vega-lite/examples/waterfall_chart.html)? Can you modify our chart to look _exactly_ the same as the target chart?

Here are some of the key points we touched on in this case study:
* A waterfall chart can be built using the "bar" mark in Altair.
* Bars are ordered alphabetically by default, but this can easily be turned off.
* With the domain and range properties of the color scale, we can control the colors of the bars.
* We use the "rule" mark to add horizontal line segments to the plot.
* It is often easier to build charts one layer at a time. For example, we add text annotations in 4 separate layers.