#### A data exploration in Altair:
## 7. Building the right view. From the bottom up


Contact: jonas.oesch@nzz.ch

Import the necessary libraries. Enabling the `data_server` transformer keeps the data out of the Vega-Lite specifications, which would otherwise embed every row:

In [68]:
import pandas as pd
import altair as alt
from altair import datum

alt.data_transformers.enable('data_server')

DataTransformerRegistry.enable('data_server')

Read the data, convert into correct types and preview:

In [20]:
data = pd.read_excel("Olympics.xlsx")
data.Year = pd.to_datetime(data.Year)
data.head(3)

Unnamed: 0,Year,City,Sport,Discipline,Athlete,Gender,Event,Medal,Country,Code,...,Durability,Endurance,Flexibility,Hand-Eye Coordination,Nerve,Power,Rank,Speed,Strength,Total
0,1896-01-01,Athens,Aquatics,Swimming,"HAJOS, Alfred",Men,100M Freestyle,Gold,Hungary,HUN,...,4.63,9.25,5.5,2.88,2.63,4.63,36,5.5,5.25,46.875
1,1896-01-01,Athens,Aquatics,Swimming,"HAJOS, Alfred",Men,100M Freestyle,Gold,Hungary,HUN,...,3.25,4.13,5.5,2.75,2.5,6.25,45,7.88,5.25,44.125
2,1896-01-01,Athens,Aquatics,Swimming,"HERSCHMANN, Otto",Men,100M Freestyle,Silver,Austria,AUT,...,4.63,9.25,5.5,2.88,2.63,4.63,36,5.5,5.25,46.875
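The preview shows that the conversion gave the expected dates for this file. If a `Year` column holds plain integers instead, `pd.to_datetime` needs some help. The sketch below assumes integer years in a made-up series called `years`, which is not part of the notebook.

```python
import pandas as pd

# Assumption: years stored as plain integers (hypothetical example values).
years = pd.Series([1896, 1900, 2012])

# Without a format, integers are read as nanoseconds since 1970-01-01.
naive = pd.to_datetime(years)

# Converting to strings and naming the format yields 1 January of each year.
parsed = pd.to_datetime(years.astype(str), format="%Y")

print(naive.dt.year.tolist())
print(parsed.dt.year.tolist())
```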


Show me the countries again:

In [27]:
c1 = alt.Chart(data).mark_bar().encode(
    x="count()",
    y=alt.Y("Country", sort="-x")
)
c1

Seeing how Switzerland has done over time …

In [72]:
alt.Chart(data).mark_bar().transform_filter(datum.Country == 'Switzerland').encode(
    x="Year",
    y="count()"
)
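The chart does two things: it keeps only the Swiss rows and then counts rows per year. As a rough cross-check, the same two steps can be written in pandas. The frame `medals` below is a made-up miniature with one row per medal, not the real data.

```python
import pandas as pd

# Hypothetical miniature of the medal table: one row per medal.
medals = pd.DataFrame({
    "Country": ["Switzerland", "Norway", "Switzerland", "Switzerland", "Italy"],
    "Year": pd.to_datetime(["1948", "1948", "1948", "1952", "1952"], format="%Y"),
})

# Same two steps as the chart: filter on Country, then count rows per Year.
swiss_per_year = (
    medals[medals.Country == "Switzerland"]
    .groupby("Year")
    .size()
)
print(swiss_per_year)
```

Each value in `swiss_per_year` corresponds to the height of one bar in the chart.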

And now compared to other countries:

In [70]:
countries = [
    "Switzerland",
    "Italy",
    "Sweden",
    "Norway",
    "Finland",
    "Netherlands"
]

alt.Chart(data[data.Country.isin(countries)]).mark_bar(size=8).encode(
    x="Year",
    y="count()",
    row="Country",
    color="Season"
).properties(height=100, width=600)

In [66]:
d2 = data[data.Country.isin([
    "United States",
    "Australia",
    "United Kingdom",
    "Germany",
    "France",
    "Canada",
    "Italy",
    "Sweden",
    "Netherlands",
    "Hungary",
    "China",
    "Russia",
    "Japan",
    "Korea, South",
    "Norway",
    "Finland",
    "Switzerland"
])]
d2.head(2)

Unnamed: 0,Year,City,Sport,Discipline,Athlete,Gender,Event,Medal,Country,Code,...,Durability,Endurance,Flexibility,Hand-Eye Coordination,Nerve,Power,Rank,Speed,Strength,Total
0,1896-01-01,Athens,Aquatics,Swimming,"HAJOS, Alfred",Men,100M Freestyle,Gold,Hungary,HUN,...,4.63,9.25,5.5,2.88,2.63,4.63,36,5.5,5.25,46.875
1,1896-01-01,Athens,Aquatics,Swimming,"HAJOS, Alfred",Men,100M Freestyle,Gold,Hungary,HUN,...,3.25,4.13,5.5,2.75,2.5,6.25,45,7.88,5.25,44.125


In [73]:
sel = alt.selection_multi(fields=["Country"])

(
    alt.Chart(d2)
        .mark_bar()
        .encode(
            y=alt.Y("Country", sort="-x"),
            x="count()",
            color=alt.condition(sel, 
                                alt.ColorValue("orange"), 
                                alt.ColorValue("blue"))
        )
        .add_selection(sel) 
    & 
    alt.Chart(d2)
        .mark_bar(size=8)
        .transform_filter(sel)
        .encode(
            x="Year",
            y="count()",
            row="Country",
            color="Season"
        )
        .properties(height=100, width=600)
)

And we see that the medal counts are not really all that comparable, looking at Russia, China and Germany ;-)

For another introduction to Altair, you can watch its author spend 2.5 hours explaining what he thinks are the best parts: https://www.youtube.com/watch?v=ms29ZPUKxbU&t=1986s

And follow along with this tutorial: https://altair-viz.github.io/altair-tutorial/README.html

The examples in the documentation are also a good way to see what is possible: https://altair-viz.github.io/gallery/index.html As you can see, you can have a lot of fun with Altair, although some of the examples get quite complex.

But because you can work from the bottom up in Altair, you can always start small and add bells and whistles later.

Others like it too: http://fernandoi.cl/blog/posts/altair/