#### A data exploration in Altair:
## 3. Start exploring


Contact: jonas.oesch@nzz.ch

Import the necessary libraries and tell Altair to serve the data from a local data server instead of embedding it in every Vega-Lite specification:

In [46]:
import pandas as pd
import altair as alt

alt.data_transformers.enable('data_server')

DataTransformerRegistry.enable('data_server')

Read the data, convert into correct types and preview:

In [47]:
data = pd.read_excel("Olympics.xlsx")
data.Year = pd.to_datetime(data.Year)
data.head(3)

Unnamed: 0,Year,City,Sport,Discipline,Athlete,Gender,Event,Medal,Country,Code,...,Durability,Endurance,Flexibility,Hand-Eye Coordination,Nerve,Power,Rank,Speed,Strength,Total
0,1896-01-01,Athens,Aquatics,Swimming,"HAJOS, Alfred",Men,100M Freestyle,Gold,Hungary,HUN,...,4.63,9.25,5.5,2.88,2.63,4.63,36,5.5,5.25,46.875
1,1896-01-01,Athens,Aquatics,Swimming,"HAJOS, Alfred",Men,100M Freestyle,Gold,Hungary,HUN,...,3.25,4.13,5.5,2.75,2.5,6.25,45,7.88,5.25,44.125
2,1896-01-01,Athens,Aquatics,Swimming,"HERSCHMANN, Otto",Men,100M Freestyle,Silver,Austria,AUT,...,4.63,9.25,5.5,2.88,2.63,4.63,36,5.5,5.25,46.875
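
A note on the conversion step: if the `Year` column arrives as plain integers rather than dates (an assumption; the preview does not show how the spreadsheet stores it), `pd.to_datetime` without a format would read the numbers as nanoseconds since 1970. A minimal sketch on a toy series, passing an explicit format instead:

```python
import pandas as pd

# Toy stand-in for the Year column; the real spreadsheet may store dates differently.
years = pd.Series([1896, 1900, 1924])

# An explicit format makes pandas read each value as a four-digit year.
parsed = pd.to_datetime(years.astype(str), format="%Y")

print(parsed.dt.year.tolist())
```

Each value becomes January 1 of the given year, which matches the dates shown in the preview.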


We are free to encode any data in any visual form we want. Let's see when women first participated, using the y-axis for time for a change:

In [48]:
alt.Chart(data).mark_circle().encode(
    x="Gender",
    y="Year"
)

A bit strange. Let's get back to normal. To make it a bit easier on the eyes, let's also separate the genders by color. It becomes clear that women have participated from the second Olympic Games on.

In [49]:
alt.Chart(data).mark_circle().encode(
    y="Gender",
    x="Year",
    color="Gender"
)

How many women have won medals compared to men?

In [55]:
# mark_bar draws each count as a bar starting at zero
alt.Chart(data).mark_bar().encode(
    y="Gender",
    x="count()"
)
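
The `count()` aggregate is computed by Vega-Lite when the chart renders, so the chart shows the totals only as lengths. To read the actual numbers, the same tally can be sketched in pandas. The small frame below is a made-up stand-in for `data`, not the real medal table:

```python
import pandas as pd

# Made-up rows standing in for the medal table; only the Gender column matters here.
toy = pd.DataFrame({"Gender": ["Men", "Men", "Women", "Men", "Women"]})

# Count the rows per gender, which is what count() does inside the chart.
medals_per_gender = toy["Gender"].value_counts()

print(medals_per_gender.to_dict())
```

On the real table, `data["Gender"].value_counts()` would give the numbers behind the chart.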

Now I'm wondering why there is a higher cadence of Games after 1990. I assume that it has something to do with the Winter Olympics …

In [51]:
alt.Chart(data).mark_circle().encode(
    x="Year",
    color="Season"
)

Indeed, Summer and Winter Olympics alternate after 1990. But something is strange before that …

In [56]:
alt.Chart(data).mark_circle().encode(
    x="Year",
    y="Season",
    color="Season"
)

Winter Games started in the 1920s and were held in the same year as the Summer Games. After 1990, they were shifted by two years.
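
The same observation can be checked outside the chart. Here is a minimal sketch in pandas, using a few Games years typed in by hand rather than the full table (the column names `Year` and `Season` follow the charts above):

```python
import pandas as pd

# A handful of real Games years, typed in by hand for illustration
# (not read from Olympics.xlsx).
games = pd.DataFrame({
    "Year": [1924, 1924, 1992, 1992, 1994, 1996],
    "Season": ["Summer", "Winter", "Summer", "Winter", "Winter", "Summer"],
})

# Count the distinct seasons per year; 2 means both Games took place that year.
seasons_per_year = games.groupby("Year")["Season"].nunique()
same_year = seasons_per_year[seasons_per_year == 2].index.tolist()

print(same_year)
```

On the real data, the equivalent would group on `data.Year.dt.year`, assuming `Year` was converted to a datetime as in the loading cell.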

A quick word about data types. Normally, Altair can infer them from the data frame, but not always. It understands that a date is `temporal`. But we might simply want to display the events in order:

In [57]:
alt.Chart(data).mark_circle().encode(
    x="Year:O",
    y="Season",
    color="Season"
)

For these cases, and when Altair can't guess the right type, we can add a short suffix after the column name to indicate what we want. The suffixes are:
* `N`: For "nominal" or categorical data, like city names.
* `O`: For discrete but ordered data: "ordinal".
* `Q`: For "quantitative" data, aka continuous numbers.
* `T`: For "temporal" data like dates.
* `G`: For "geodata".
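
To make the mapping concrete, here is a small, hypothetical helper (`expand_shorthand` is not part of Altair) that splits a shorthand string into a field name and the full type name. Altair does this parsing internally; the sketch only mirrors the list above and ignores aggregates such as `count()`.

```python
# Suffix letters from the list above, mapped to the full Vega-Lite type names.
TYPE_CODES = {
    "N": "nominal",
    "O": "ordinal",
    "Q": "quantitative",
    "T": "temporal",
    "G": "geojson",
}


def expand_shorthand(shorthand):
    """Split 'Year:O' into field name and type name (illustrative only)."""
    field, sep, code = shorthand.rpartition(":")
    if not sep:
        # No suffix given: Altair would infer the type from the data frame.
        return {"field": shorthand}
    return {"field": field, "type": TYPE_CODES[code]}


print(expand_shorthand("Year:O"))
print(expand_shorthand("Season"))
```

In Altair itself, the long form of `"Year:O"` is `alt.X("Year", type="ordinal")`, which is handy when the column name is built programmatically.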