# Altair graphics

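[Altair](https://altair-viz.github.io/) is a declarative Python library built on [Vega-Lite](https://vega.github.io/vega-lite/), which is in turn built on [Vega](https://vega.github.io/vega/). Vega and Altair are both named after stars of the [Summer Triangle](https://en.wikipedia.org/wiki/Summer_Triangle).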

Reference materials:

* https://altair-viz.github.io/gallery/index.html
* https://altair-viz.github.io/user_guide/saving_charts.html

```python
# !pip install altair
```

```python
import pandas as pd
import altair as alt
```

# Scatters/bars (countries)

```python
df = pd.read_csv("countries.csv")
df.head()
```

| | country | continent | gdp_per_capita | life_expectancy | population |
|---|---|---|---|---|---|
| 0 | Afghanistan | Asia | 663 | 54.863 | 22856302 |
| 1 | Albania | Europe | 4195 | 74.2 | 3071856 |
| 2 | Algeria | Africa | 5098 | 68.963 | 30533827 |
| 3 | Angola | Africa | 2446 | 45.234 | 13926373 |
| 4 | Antigua and Barbuda | N. America | 12738 | 73.544 | 77656 |


```python
alt.Chart(df).mark_circle().encode(
    x='gdp_per_capita',
    y='life_expectancy',
    color='continent',
    tooltip=['country', 'gdp_per_capita']
)
```

# Lines/areas (Electricity)

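Altair works best with long-form ("tidy") data: **one measurement per row**. The electricity data is wide, with one column per energy source, so we reshape it with `melt` before charting.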
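To see what that reshape does, here is `melt` applied to just the first two rows of electricity.csv (taken from the `df.head()` output in this section). Each wide row with three source columns turns into three long rows:

```python
import pandas as pd

# Wide form: one row per year, one column per energy source
wide = pd.DataFrame({
    "year": ["2001-01-01", "2002-01-01"],
    "Fossil Fuels": [35361, 35991],
    "Nuclear Energy": [3853, 4574],
    "Renewables": [1437, 1963],
})

# Long form: one (year, source, amount) measurement per row.
# id_vars stays as-is; every other column name goes into "source",
# and the cell values go into "amount".
long = wide.melt(id_vars="year", var_name="source", value_name="amount")
print(long)
```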

```python
df = pd.read_csv("electricity.csv")
df.head()
```

| | year | Fossil Fuels | Nuclear Energy | Renewables |
|---|---|---|---|---|
| 0 | 2001-01-01 | 35361 | 3853 | 1437 |
| 1 | 2002-01-01 | 35991 | 4574 | 1963 |
| 2 | 2003-01-01 | 36234 | 3988 | 1885 |
| 3 | 2004-01-01 | 36205 | 4929 | 2102 |
| 4 | 2005-01-01 | 36883 | 4538 | 2724 |


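```python
# Keep "year" as the identifier; turn the three source columns into
# (source, amount) pairs, one per row
df = df.melt(id_vars='year', var_name='source', value_name='amount')
df.head()
```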

| | year | source | amount |
|---|---|---|---|
| 0 | 2001-01-01 | Fossil Fuels | 35361 |
| 1 | 2002-01-01 | Fossil Fuels | 35991 |
| 2 | 2003-01-01 | Fossil Fuels | 36234 |
| 3 | 2004-01-01 | Fossil Fuels | 36205 |
| 4 | 2005-01-01 | Fossil Fuels | 36883 |


```python
alt.Chart(df).mark_line().encode(
    x="year:T",
    y="amount",
    color="source"
)
```


```python
alt.Chart(df).mark_area().encode(
    x="year:T",
    y="amount",
    color="source"
)
```

```python
alt.Chart(df).mark_area().encode(
    x="year:T",
    y="amount:Q",
    color="source:N"
)
```


## Time transforms (Lumber prices)

See more at https://altair-viz.github.io/user_guide/transform/timeunit.html

```python
df = pd.read_csv("lumber-prices-clean.csv", parse_dates=['market_date'])
df.head()
```

| | price | market_date |
|---|---|---|
| 0 | 407.0 | 1996-12-09 |
| 1 | 426.0 | 1997-01-02 |
| 2 | 408.5 | 1997-02-03 |
| 3 | 386.0 | 1997-03-03 |
| 4 | 378.0 | 1997-04-01 |


```python
alt.Chart(df).mark_line().encode(
    x="market_date:T",
    y="price:Q"
)
```

```python
alt.Chart(df).mark_line().encode(
    # Bucket dates by calendar year, then plot the median price per year
    x="year(market_date):T",
    y="median(price):Q"
)
```
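The time unit and the aggregate are computed by Vega-Lite in the browser, not by pandas. To sanity-check what `year(market_date)` plus `median(price)` means, here is a rough pandas equivalent on the five rows from `df.head()` above (it ignores time zones, which Vega-Lite applies when bucketing dates):

```python
import pandas as pd

# The five lumber-price rows shown in df.head() above
df = pd.DataFrame({
    "price": [407.0, 426.0, 408.5, 386.0, 378.0],
    "market_date": pd.to_datetime(
        ["1996-12-09", "1997-01-02", "1997-02-03", "1997-03-03", "1997-04-01"]
    ),
})

# Bucket every row by calendar year, then take the median price per bucket
by_year = df.groupby(df["market_date"].dt.year)["price"].median()
print(by_year)
```

With these rows, 1996 has a single price, while 1997's median is the mean of its two middle prices (386.0 and 408.5).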

```python
alt.Chart(df).mark_line().encode(
    x="yearmonth(market_date):T",
    y="median(price):Q"
)
```

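```python
alt.Chart(df).mark_bar().encode(
    # month() drops the year, so every January shares one bar, and so on;
    # ":O" (ordinal) gives one evenly spaced bar per month
    x="month(market_date):O",
    y="median(price):Q"
)
```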
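The difference between `yearmonth()` and `month()` is what each bucket keeps: `yearmonth()` keeps the year, while `month()` keeps only the month number, so across the full dataset every December (of every year) falls into the same bar. A rough pandas analogue of the two bucketings, on the five rows from `df.head()` above:

```python
import pandas as pd

df = pd.DataFrame({
    "price": [407.0, 426.0, 408.5, 386.0, 378.0],
    "market_date": pd.to_datetime(
        ["1996-12-09", "1997-01-02", "1997-02-03", "1997-03-03", "1997-04-01"]
    ),
})

# Like yearmonth(): keys such as "1996-12" still carry the year
yearmonth_keys = df["market_date"].dt.to_period("M").astype(str)
# Like month(): keys are just 1..12, so different years would be pooled together
month_keys = df["market_date"].dt.month

by_yearmonth = df.groupby(yearmonth_keys)["price"].median()
by_month = df.groupby(month_keys)["price"].median()
print(by_yearmonth)
print(by_month)
```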

# Aggregates, sorting, stacking and more

```python
df = pd.read_csv("countries.csv")
df.head()
```

| | country | continent | gdp_per_capita | life_expectancy | population |
|---|---|---|---|---|---|
| 0 | Afghanistan | Asia | 663 | 54.863 | 22856302 |
| 1 | Albania | Europe | 4195 | 74.2 | 3071856 |
| 2 | Algeria | Africa | 5098 | 68.963 | 30533827 |
| 3 | Angola | Africa | 2446 | 45.234 | 13926373 |
| 4 | Antigua and Barbuda | N. America | 12738 | 73.544 | 77656 |


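```python
alt.Chart(
    df,
    title='China and India are the most populous nations'
).mark_bar().encode(
    x='population',
    # The y channel takes alt.Y (not alt.X); sort='-x' orders the countries
    # by their x value, largest first
    y=alt.Y('country', sort='-x')
)
```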
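`sort='-x'` sorts the categories on the y axis by the value encoded on x, descending (`'x'` alone would sort ascending). The same ordering, computed in pandas for the five rows shown in `df.head()` above:

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria", "Angola", "Antigua and Barbuda"],
    "population": [22856302, 3071856, 30533827, 13926373, 77656],
})

# Largest population first, matching the top-to-bottom bar order of sort='-x'
order = df.sort_values("population", ascending=False)["country"].tolist()
print(order)
```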

```python
alt.Chart(
    df,
    title='China and India are the most populous nations'
).mark_bar().encode(
    y=alt.Y('continent', sort='-x'),
    x='population',
    # Each country is a stacked segment; the two-color range is reused across
    # all countries, and legend=None hides the per-country legend
    color=alt.Color('country', scale=alt.Scale(range=['beige', 'pink']), legend=None)
)
```

```python
alt.Chart(
    df,
    title='China and India are the most populous nations'
).mark_circle().encode(
    y=alt.Y('continent'),
    x='population',
    color=alt.Color('continent', legend=None)
)
```

```python
df.groupby('continent').population.sum()
```

```
continent
Africa         809892820
Asia          3849172861
Europe         596440013
N. America     481999240
Oceania         30272328
S. America     347265096
Name: population, dtype: int64
```

```python
# https://github.com/d3/d3-format#locale_format
# https://vega.github.io/vega-lite/docs/format.html

alt.Chart(
    df,
    title='Asia is the most populous continent'
).mark_bar().encode(
    y=alt.Y('continent:N', sort='-x'),
    # '~s' is a d3-format specifier: SI prefix (k, M, G, ...) with
    # insignificant trailing zeros trimmed
    x=alt.X('sum(population)', axis=alt.Axis(format='~s'))
)
```
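To get a feel for what `'~s'` does to numbers like the continent totals above, here is a rough Python approximation of d3-format's `~s`: round to six significant digits (d3's default precision), pick the SI prefix for the value's power of 1000, and trim insignificant trailing zeros. `d3_tilde_s` is a hypothetical helper written for illustration, not part of Altair, pandas, or d3, and it assumes magnitudes within the SI prefix range (yocto to yotta) without covering every d3 edge case.

```python
# Hypothetical helper approximating d3-format's "~s" specifier in Python.
SI_PREFIXES = ["y", "z", "a", "f", "p", "n", "µ", "m", "", "k", "M", "G", "T", "P", "E", "Z", "Y"]


def d3_tilde_s(x: float, precision: int = 6) -> str:
    """Approximate d3.format("~s")(x).

    Assumes x == 0 or 1e-24 <= abs(x) < 1e27.
    """
    sign = "-" if x < 0 else ""
    # Scientific notation with `precision` significant digits, e.g. "3.84917e+09"
    mantissa, exp_text = f"{abs(x):.{precision - 1}e}".split("e")
    exponent = int(exp_text)
    digits = mantissa.replace(".", "")
    # Power of 1000 that selects the prefix (k = 1, M = 2, G = 3, m = -1, ...)
    prefix_power = exponent // 3
    int_len = exponent - prefix_power * 3 + 1  # 1 to 3 digits before the point
    digits = digits.ljust(int_len, "0")  # pad when precision < int_len
    text = digits[:int_len] + "." + digits[int_len:]
    text = text.rstrip("0").rstrip(".")  # the "~": drop trailing zeros
    return sign + text + SI_PREFIXES[prefix_power + 8]


for total in [3849172861, 809892820, 30272328, 1500]:
    print(total, "->", d3_tilde_s(total))
```

Note that "G" is giga (10^9), not "B" for billion, and that the axis labels are formatted tick values (0, 500M, 1G, ...) rather than the bar totals themselves.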

# Layered charts

```python
df = pd.read_csv("countries.csv")
df.head()
```

| | country | continent | gdp_per_capita | life_expectancy | population |
|---|---|---|---|---|---|
| 0 | Afghanistan | Asia | 663 | 54.863 | 22856302 |
| 1 | Albania | Europe | 4195 | 74.2 | 3071856 |
| 2 | Algeria | Africa | 5098 | 68.963 | 30533827 |
| 3 | Angola | Africa | 2446 | 45.234 | 13926373 |
| 4 | Antigua and Barbuda | N. America | 12738 | 73.544 | 77656 |


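```python
# Blue tick at each continent's lowest GDP per capita
start = alt.Chart(df).mark_tick(color='blue').encode(
    y='continent',
    x='min(gdp_per_capita)'
)

# Red tick at each continent's highest GDP per capita
end = alt.Chart(df).mark_tick(color='red').encode(
    y='continent',
    x='max(gdp_per_capita)'
)

# Faint bar spanning the range: x is where it starts, x2 where it ends
fill = alt.Chart(df).mark_rect(opacity=0.1, color='red').encode(
    y='continent',
    x='min(gdp_per_capita)',
    x2='max(gdp_per_capita)'
)

# '+' layers the charts, same as alt.layer(start, end, fill)
start + end + fill
```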