# Case Study: Global Normalcy Index

During the COVID-19 pandemic, the Economist compiled and published the [Global Normalcy Index](https://github.com/TheEconomist/normalcy-index-data). It is designed to measure the extent to which the economies of countries around the world have returned to their pre-pandemic levels. You can read more about this index in [the Economist article](https://www.economist.com/graphic-detail/tracking-the-return-to-normalcy-after-covid-19).

In this case study, we will create some exploratory visualizations of this dataset and try to reverse engineer one of the charts from the Economist.

## Retrieving the data

We will retrieve the data directly from its URL.

In [1]:
import altair as alt
import pandas as pd

url = "https://raw.githubusercontent.com/TheEconomist/normalcy-index-data/main/normalcy-index.csv"
base = alt.Chart(url)
data = pd.read_csv(url)

Note that we have loaded the data separately into Altair and Pandas. We use Pandas to explore the structure of the data frame, while Altair fetches the data directly from the URL so that large amounts of raw data are not embedded in the output Vega-Lite specification. Let's take a look at the data first.

In [2]:
data

| Unnamed: 0 | iso3c | date | cinema | flights | office_occupancy | public_transport | retail_footfall | time_outside | sports_attendance | traffic | overall |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ARE | 2020-02-28 | 48.886518 | 96.943468 | 102.044649 | 98.573704 | 98.714286 | 99.000000 | NaN | 67.757832 | 87.088764 |
| 1 | ARE | 2020-02-29 | 51.289476 | 96.677830 | 102.187506 | 98.529055 | 98.857143 | 99.071429 | NaN | 67.820069 | 87.376587 |
| 2 | ARE | 2020-03-01 | 54.000001 | 96.507063 | 102.399564 | 98.435302 | 99.000000 | 99.071429 | NaN | 68.209439 | 87.729661 |
| 3 | ARE | 2020-03-02 | 57.018093 | 96.279374 | 102.611622 | 98.270120 | 99.071429 | 99.071429 | NaN | 67.627131 | 87.996614 |
| 4 | ARE | 2020-03-03 | 60.343752 | 95.994762 | 102.776803 | 97.962082 | 99.214286 | 99.071429 | NaN | 67.571121 | 88.307557 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 46208 | ZAF | 2022-08-24 | NaN | NaN | 115.117950 | 100.261286 | 116.285714 | 82.428571 | NaN | 99.642392 | 92.417942 |
| 46209 | ZAF | 2022-08-25 | NaN | NaN | 115.189378 | 100.706933 | 117.428571 | 82.285714 | NaN | 100.692592 | 92.814792 |
| 46210 | ZAF | 2022-08-26 | NaN | NaN | 115.263239 | 101.379280 | 118.785714 | 82.214286 | NaN | 101.731911 | 93.288189 |
| 46211 | ZAF | 2022-08-27 | NaN | NaN | 115.446317 | 102.227376 | 120.285714 | 82.285714 | NaN | 102.324604 | 93.792870 |


A few things stand out in this data:
1. The data contains missing values.
2. The data is in "wide" form. The activity columns (`cinema`, `flights`, etc.) can be folded into a key column and a value column.
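
To quantify the first observation, one option is to compute the share of missing values per column with Pandas. The sketch below uses a small made-up frame that mimics the shape of the real table; `toy` and `missing_share` are illustrative names.

```python
import numpy as np
import pandas as pd

# Made-up rows that mimic the shape of the real table (illustration only).
toy = pd.DataFrame({
    "iso3c": ["ARE", "ARE", "ZAF", "ZAF"],
    "cinema": [48.9, 51.3, np.nan, np.nan],
    "sports_attendance": [np.nan, np.nan, np.nan, np.nan],
    "overall": [87.1, 87.4, 92.4, 92.8],
})

# Fraction of missing values in each column, largest first.
missing_share = toy.isna().mean().sort_values(ascending=False)
print(missing_share)
```

On the real data, the same call would be `data.isna().mean()`.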

Because we are not using the Pandas data frame for plotting, there is no need to change it. Instead, we will rely entirely on Altair to tidy the data at plot time.

## Line chart

First, let's visualize the data using a line chart.

In [3]:
base.mark_line().encode(x="date:T", y="overall:Q", color="iso3c:N")

Not surprisingly, we get a line chart that looks like spaghetti due to the sheer amount of data. This is the classic problem of overplotting. One way to avoid overplotting is to use facets!

In [4]:
base.mark_line().encode(
    x="date:T",
    y="overall:Q",
    facet=alt.Facet("iso3c:N", columns=5)
).properties(width=150, height=50)\
.configure_axis(grid=False, domain=False)

With every country plotted in a separate chart, all sharing the same X and Y scales, it is much easier to see the trends and compare countries.

## Heat map

Next, we will visualize the same data using a heat map.

In [5]:
base.mark_rect().encode(
    x="yearmonth(date):T",
    y="iso3c:N",
    color=alt.Color("mean(overall):Q", scale=alt.Scale(zero=False, scheme="viridis")),
).properties(width=300, height=500)

Note that we use `yearmonth(date)` to bin the input data by year and month, and color encodes the average index value within each bin. See [here](https://altair-viz.github.io/user_guide/transform/timeunit.html) for more details about date/time-based aggregation in Altair.
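
For intuition, the aggregation behind each heat map cell resembles a group-by in Pandas: rows are grouped by country and by calendar month, and the index is averaged within each group. The sketch below uses made-up numbers; `toy` and `monthly` are illustrative names, and the Pandas code is an analogy rather than what Altair runs internally.

```python
import pandas as pd

# Made-up daily values for two countries (illustration only).
toy = pd.DataFrame({
    "iso3c": ["USA", "USA", "USA", "ZAF"],
    "date": pd.to_datetime(["2020-03-01", "2020-03-15", "2020-04-01", "2020-03-10"]),
    "overall": [90.0, 70.0, 40.0, 80.0],
})

# Label each row with its year and month, then average within each
# (country, month) group. Each resulting row corresponds to one cell.
monthly = (
    toy.assign(month=toy["date"].dt.strftime("%Y-%m"))
    .groupby(["iso3c", "month"], as_index=False)["overall"]
    .mean()
)
print(monthly)
```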

As seen in the result, heat maps provide a much more concise visualization! This is very useful when the available space for your chart is limited.

## Reverse engineering the Economist's chart

Now, let's try to reverse engineer the [original chart](https://www.economist.com/graphic-detail/tracking-the-return-to-normalcy-after-covid-19) provided by the Economist. Let's ignore the interactive aspect of the chart for now.

In the original Economist visualization, each type of activity is drawn as a line. The X axis encodes time, and the Y axis encodes the index value. All but one of the lines are grayed out. For this exercise, let us focus on the USA.

In [6]:
# Keep only the US rows, then fold the activity columns into key/value pairs.
us_base = base.transform_filter(alt.datum.iso3c == "USA").transform_fold(
    ["cinema", "flights", "office_occupancy",
     "public_transport", "retail_footfall",
     "time_outside", "sports_attendance",
     "traffic", "overall"]
)

# One line per activity; only the overall index is highlighted.
line_chart = us_base.mark_line().encode(
    x="date:T",
    y="value:Q",
    detail="key:N",
    color=alt.condition(
        alt.datum.key == "overall",
        alt.value("orange"),
        alt.value("lightgray"),
    ),
)
line_chart

Here, we used `transform_filter` to focus on data related to the USA. We also used `transform_fold` to convert the data from its wide form to its long form. As a result, two new columns are created: the `key` column contains the activity types, and the `value` column contains the corresponding index values.
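
If the wide-to-long conversion is unfamiliar, the closest Pandas analogue is `DataFrame.melt`. The sketch below applies it to a made-up two-row frame; `wide` and `long` are illustrative names, and only two of the activity columns are included to keep the output short.

```python
import pandas as pd

# Made-up wide-form rows: one column per activity (illustration only).
wide = pd.DataFrame({
    "iso3c": ["USA", "USA"],
    "date": ["2020-03-01", "2020-03-02"],
    "cinema": [50.0, 40.0],
    "overall": [90.0, 85.0],
})

# Fold the activity columns into a `key` column (the activity name)
# and a `value` column (the index value), one row per date and activity.
long = wide.melt(
    id_vars=["iso3c", "date"],
    value_vars=["cinema", "overall"],
    var_name="key",
    value_name="value",
)
print(long)
```

Each input row becomes one output row per folded column, so two rows and two activity columns yield four rows. Note that `melt` drops the original activity columns from the result.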

With the transformed data, we can simply plot our line chart. Note that `detail="key:N"` is important: it tells Altair which data points should be grouped together when drawing lines. To achieve the grayed-out effect, we used Altair's `condition` function to pick the color based on the value of `key`.

Please note that there is more than one way to create this plot. See if you can find other ways!

## Summary

In this exercise, we have demonstrated visualizing the global normalcy data using line charts and heat maps. As expected, the line chart is prone to overplotting, which we solved using facets. The heat map involves some data aggregation and is more space-efficient.

We have also reverse engineered the plot in the original Economist article. To do so, we used Altair's filter and fold transforms to tidy the data and its `condition` function to assign colors.