# Altair Notebook

# Contents:
1. Introduction & Basics
2. Types of Transformations
3. Types of Marks i.e. Graph Types
4. Types of Charts
5. Customizing Visualizations

## 1. Introduction & Basics

Altair is a declarative statistical visualization library for Python, based on Vega and Vega-Lite, and the source is available on GitHub.

* Altair Documentation Link: https://altair-viz.github.io/index.html  <br>
* GitHub link: http://github.com/altair-viz/altair

### Data Types in Altair

Quantitative - Q <br>
Ordinal - O <br>
Nominal - N <br>
Temporal - T <br>
GeoJSON - G <br>
<br>

Specify a data type for each field when encoding the x-axis, y-axis and other channels. Altair can infer types from a pandas DataFrame, but stating them explicitly makes the intent clear.<br>
 1. Select the dataset and the mark (graph type) - <b>alt.Chart(DataFrame).mark_GraphType()</b>.encode(x="x:O",y="y:O",color="z:O")<br>
 2. Encode, i.e. map columns to the x-axis, y-axis and other channels - alt.Chart(DataFrame).mark_GraphType()<b>.encode(x="x:O",y="y:O",color="z:O")</b>

Example:<br>
alt.Chart(DataFrame).mark_GraphType().encode(x="X-axis-attribute:O", y="Y-axis-attribute:Q", color="Color-attribute:O")

In [1]:
import altair as alt
import numpy as np
import pandas as pd
from vega_datasets import data
alt.data_transformers.disable_max_rows()

x,y = np.meshgrid(range(-5,5),range(-5,5))
z = x**2 + y**2

source = pd.DataFrame(({"x":x.ravel(), "y":y.ravel(), "z":z.ravel()}))
source.head()

   x  y   z
0 -5 -5  50
1 -4 -5  41
2 -3 -5  34
3 -2 -5  29
4 -1 -5  26


In [2]:
# Heatmap: discrete (ordinal) x and y cells, colored by the quantitative z value
alt.Chart(source).mark_rect().encode(x="x:O", y="y:O", color="z:Q")

### Some important functions
1. alt.limit_rows(DataFrame, max_rows=N) - raises MaxRowsError if the number of rows in the dataset exceeds max_rows
2. alt.sample(DataFrame, n=N) - randomly samples N rows from the dataset
3. alt.to_json(DataFrame, prefix="") - saves the DataFrame to a .json file
4. alt.to_csv(DataFrame, prefix="") - saves the DataFrame to a .csv file
5. alt.to_values(DataFrame) - converts the DataFrame to a dict of records of the form {"values": [...]}
6. alt.data_transformers.disable_max_rows() - disables the maximum row limit
7. pipe() - chains a sequence of operations (imported from toolz)

In [3]:
# Raises MaxRowsError if the number of rows in the dataset exceeds max_rows
alt.limit_rows(source, max_rows=10)

MaxRowsError: The number of rows in your dataset is greater than the maximum allowed (10). For information on how to plot larger datasets in Altair, see the documentation

In [4]:
# Sample data from a given dataset. The example below randomly samples 5 rows
print(alt.sample(source, n=5))
print("++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++")
# Saves dataframe to .json file
alt.to_json(source, prefix="sample_json_save")

# Saves dataframe to .csv file
alt.to_csv(source, prefix="sample_csv_save")

# Converts dataframe to JSON format
print(alt.to_values(source))
print("++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++")
# By default, Altair limits a dataset to 5000 rows. To lift this limit, use:
alt.data_transformers.disable_max_rows()

# To chain a sequence of operations, use pipe
from altair import sample, to_values
from toolz.curried import pipe

pipe(source, sample(n=10), to_values)

    x  y   z
34 -1 -2   5
22 -3 -3  18
90 -5  4  41
97  2  4  20
53 -2  0   4
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
{'values': [{'x': -5, 'y': -5, 'z': 50}, {'x': -4, 'y': -5, 'z': 41}, {'x': -3, 'y': -5, 'z': 34}, {'x': -2, 'y': -5, 'z': 29}, {'x': -1, 'y': -5, 'z': 26}, {'x': 0, 'y': -5, 'z': 25}, {'x': 1, 'y': -5, 'z': 26}, {'x': 2, 'y': -5, 'z': 29}, {'x': 3, 'y': -5, 'z': 34}, {'x': 4, 'y': -5, 'z': 41}, {'x': -5, 'y': -4, 'z': 41}, {'x': -4, 'y': -4, 'z': 32}, {'x': -3, 'y': -4, 'z': 25}, {'x': -2, 'y': -4, 'z': 20}, {'x': -1, 'y': -4, 'z': 17}, {'x': 0, 'y': -4, 'z': 16}, {'x': 1, 'y': -4, 'z': 17}, {'x': 2, 'y': -4, 'z': 20}, {'x': 3, 'y': -4, 'z': 25}, {'x': 4, 'y': -4, 'z': 32}, {'x': -5, 'y': -3, 'z': 34}, {'x': -4, 'y': -3, 'z': 25}, {'x': -3, 'y': -3, 'z': 18}, {'x': -2, 'y': -3, 'z': 13}, {'x': -1, 'y': -3, 'z': 10}, {'x': 0, 'y': -3, 'z': 9}, {'x': 1, 'y': -3, 'z': 10}, {'x': 2, 'y': -

{'values': [{'x': -3, 'y': 4, 'z': 25},
  {'x': -4, 'y': 3, 'z': 25},
  {'x': 2, 'y': -2, 'z': 8},
  {'x': 2, 'y': -4, 'z': 20},
  {'x': 1, 'y': 0, 'z': 1},
  {'x': 0, 'y': -2, 'z': 4},
  {'x': 0, 'y': -4, 'z': 16},
  {'x': -1, 'y': 2, 'z': 5},
  {'x': -5, 'y': -4, 'z': 41},
  {'x': 4, 'y': 4, 'z': 32}]}

<hr>

## 2. Types of Transformations

* Aggregate
* Filter
* Sample
* Timeunit

#### Aggregate Transform

In [5]:
df_aggregate = data.cars()
print(df_aggregate.head())

alt.Chart(df_aggregate).mark_rect().encode(
    x="Cylinders:O", y="Miles_per_Gallon:Q"
).transform_aggregate(Miles_per_Gallon="mean(Miles_per_Gallon)", groupby=["Cylinders"]).properties(
    width=200,
    height=200
)

                        Name  Miles_per_Gallon  Cylinders  Displacement  \
0  chevrolet chevelle malibu              18.0          8         307.0   
1          buick skylark 320              15.0          8         350.0   
2         plymouth satellite              18.0          8         318.0   
3              amc rebel sst              16.0          8         304.0   
4                ford torino              17.0          8         302.0   

   Horsepower  Weight_in_lbs  Acceleration       Year Origin  
0       130.0           3504          12.0 1970-01-01    USA  
1       165.0           3693          11.5 1970-01-01    USA  
2       150.0           3436          11.0 1970-01-01    USA  
3       150.0           3433          12.0 1970-01-01    USA  
4       140.0           3449          10.5 1970-01-01    USA  


#### Filter Transform

In [6]:
from altair import datum

df_filter = data.population()
df_filter.head()

alt.Chart(df_filter).mark_bar().encode(x="year:O", y="people:Q", color="sex:N").transform_filter(datum.age < 20).properties(
    width=500,
    height=200
)

#### Sample Transform

In [7]:
df_sample = data.cars()
print(df_sample.head())

alt.Chart(df_sample).mark_point().encode(x="Year:T", y="Horsepower:Q", color="Cylinders:N").transform_sample(100).properties(width=500, height=200)

                        Name  Miles_per_Gallon  Cylinders  Displacement  \
0  chevrolet chevelle malibu              18.0          8         307.0   
1          buick skylark 320              15.0          8         350.0   
2         plymouth satellite              18.0          8         318.0   
3              amc rebel sst              16.0          8         304.0   
4                ford torino              17.0          8         302.0   

   Horsepower  Weight_in_lbs  Acceleration       Year Origin  
0       130.0           3504          12.0 1970-01-01    USA  
1       165.0           3693          11.5 1970-01-01    USA  
2       150.0           3436          11.0 1970-01-01    USA  
3       150.0           3433          12.0 1970-01-01    USA  
4       140.0           3449          10.5 1970-01-01    USA  


#### Timeunit Transform

In [8]:
df_timeunit = data.seattle_temps()
print(df_timeunit.head())

alt.Chart(df_timeunit).mark_circle().encode(x="date:T", y="temp:Q").transform_timeunit(
    date="month(date)"
).properties(
    width=600,
    height=200
)

                 date  temp
0 2010-01-01 00:00:00  39.4
1 2010-01-01 01:00:00  39.2
2 2010-01-01 02:00:00  39.0
3 2010-01-01 03:00:00  38.9
4 2010-01-01 04:00:00  38.8


<hr>

## 3. Types of Marks i.e. Graph Types:
1. mark_area() - area plots
2. mark_bar() - bar plots
3. mark_circle() - scatter plot with circles
4. mark_image() - scatter plot using image markers
5. mark_point() - scatter plot using point shapes
6. mark_square() - scatter plot with filled squares
7. mark_line() - line plot
8. mark_rect() - filled rectangles, used in heatmaps
9. mark_rule() - vertical or horizontal line spanning the axis
10. mark_tick() - vertical or horizontal tick mark

### 3.1. mark_area()

In [9]:
df_area = data.iowa_electricity()
print(df_area.head())
alt.Chart(df_area).mark_area().encode(x="year:T", y="net_generation:Q", color="source:N")

        year        source  net_generation
0 2001-01-01  Fossil Fuels           35361
1 2002-01-01  Fossil Fuels           35991
2 2003-01-01  Fossil Fuels           36234
3 2004-01-01  Fossil Fuels           36205
4 2005-01-01  Fossil Fuels           36883


### 3.2. mark_bar() - bar plots

In [10]:
df_mark = data.iowa_electricity()
print(df_mark.head())
alt.Chart(df_mark).mark_bar().encode(x="year:T", y="net_generation:Q", color="source:N")

        year        source  net_generation
0 2001-01-01  Fossil Fuels           35361
1 2002-01-01  Fossil Fuels           35991
2 2003-01-01  Fossil Fuels           36234
3 2004-01-01  Fossil Fuels           36205
4 2005-01-01  Fossil Fuels           36883


### 3.3. mark_circle() - scatter plot with circles

In [11]:
df_circle = data.cars()
print(df_circle.head())
alt.Chart(df_circle).mark_circle().encode(x="Year:T", y="Displacement:Q", color="Cylinders:N")

                        Name  Miles_per_Gallon  Cylinders  Displacement  \
0  chevrolet chevelle malibu              18.0          8         307.0   
1          buick skylark 320              15.0          8         350.0   
2         plymouth satellite              18.0          8         318.0   
3              amc rebel sst              16.0          8         304.0   
4                ford torino              17.0          8         302.0   

   Horsepower  Weight_in_lbs  Acceleration       Year Origin  
0       130.0           3504          12.0 1970-01-01    USA  
1       165.0           3693          11.5 1970-01-01    USA  
2       150.0           3436          11.0 1970-01-01    USA  
3       150.0           3433          12.0 1970-01-01    USA  
4       140.0           3449          10.5 1970-01-01    USA  


### 3.4. mark_image() - scatter plot using image markers

In [12]:
source = pd.DataFrame.from_records([
      {"x": 0.5, "y": 0.5, "img": "https://vega.github.io/vega-datasets/data/ffox.png"},
      {"x": 1.5, "y": 1.5, "img": "https://vega.github.io/vega-datasets/data/gimp.png"},
      {"x": 2.5, "y": 2.5, "img": "https://vega.github.io/vega-datasets/data/7zip.png"}
])

print(source)

alt.Chart(source).mark_image(width=50,height=50).encode(x='x',y='y',url='img')

     x    y                                                img
0  0.5  0.5  https://vega.github.io/vega-datasets/data/ffox...
1  1.5  1.5  https://vega.github.io/vega-datasets/data/gimp...
2  2.5  2.5  https://vega.github.io/vega-datasets/data/7zip...


### 3.5. mark_point() - scatter plot using point shapes

In [13]:
df_point = data.cars()
print(df_point.head())
alt.Chart(df_point).mark_point().encode(x="Year:T", y="Displacement:Q", color="Cylinders:N")

                        Name  Miles_per_Gallon  Cylinders  Displacement  \
0  chevrolet chevelle malibu              18.0          8         307.0   
1          buick skylark 320              15.0          8         350.0   
2         plymouth satellite              18.0          8         318.0   
3              amc rebel sst              16.0          8         304.0   
4                ford torino              17.0          8         302.0   

   Horsepower  Weight_in_lbs  Acceleration       Year Origin  
0       130.0           3504          12.0 1970-01-01    USA  
1       165.0           3693          11.5 1970-01-01    USA  
2       150.0           3436          11.0 1970-01-01    USA  
3       150.0           3433          12.0 1970-01-01    USA  
4       140.0           3449          10.5 1970-01-01    USA  


### 3.6. mark_square() - scatter plot with filled squares

In [14]:
df_square = data.cars()
print(df_square.head())
alt.Chart(df_square).mark_square().encode(x="Year:T", y="Displacement:Q", color="Cylinders:N")

                        Name  Miles_per_Gallon  Cylinders  Displacement  \
0  chevrolet chevelle malibu              18.0          8         307.0   
1          buick skylark 320              15.0          8         350.0   
2         plymouth satellite              18.0          8         318.0   
3              amc rebel sst              16.0          8         304.0   
4                ford torino              17.0          8         302.0   

   Horsepower  Weight_in_lbs  Acceleration       Year Origin  
0       130.0           3504          12.0 1970-01-01    USA  
1       165.0           3693          11.5 1970-01-01    USA  
2       150.0           3436          11.0 1970-01-01    USA  
3       150.0           3433          12.0 1970-01-01    USA  
4       140.0           3449          10.5 1970-01-01    USA  


### 3.7. mark_line() - line plot

In [15]:
df_line = data.iowa_electricity()
print(df_line.head())
alt.Chart(df_line).mark_line().encode(x="year:T", y="net_generation", color="source:N")

        year        source  net_generation
0 2001-01-01  Fossil Fuels           35361
1 2002-01-01  Fossil Fuels           35991
2 2003-01-01  Fossil Fuels           36234
3 2004-01-01  Fossil Fuels           36205
4 2005-01-01  Fossil Fuels           36883


### 3.8. mark_rect() - filled rectangles, used in heatmaps

In [16]:
x,y = np.meshgrid(range(-5,5),range(-5,5))
z = x**2 + y**2
source = pd.DataFrame(({"x":x.ravel(), "y":y.ravel(), "z":z.ravel()}))
print(source.head())

# Heatmap: discrete (ordinal) x and y cells, colored by the quantitative z value
alt.Chart(source).mark_rect().encode(x="x:O", y="y:O", color="z:Q")

   x  y   z
0 -5 -5  50
1 -4 -5  41
2 -3 -5  34
3 -2 -5  29
4 -1 -5  26


### 3.9. mark_rule() - vertical or horizontal line spanning the axis

In [17]:
df_rule = data.iowa_electricity()
print(df_rule.head())
alt.Chart(df_rule).mark_rule().encode(x="year:T", y="net_generation", color="source:N")

        year        source  net_generation
0 2001-01-01  Fossil Fuels           35361
1 2002-01-01  Fossil Fuels           35991
2 2003-01-01  Fossil Fuels           36234
3 2004-01-01  Fossil Fuels           36205
4 2005-01-01  Fossil Fuels           36883


### 3.10. mark_tick() - vertical or horizontal tick mark

In [18]:
df_tick = data.iowa_electricity()
print(df_tick.head())
alt.Chart(df_tick).mark_tick().encode(x="year:T", y="net_generation", color="source:N")

        year        source  net_generation
0 2001-01-01  Fossil Fuels           35361
1 2002-01-01  Fossil Fuels           35991
2 2003-01-01  Fossil Fuels           36234
3 2004-01-01  Fossil Fuels           36205
4 2005-01-01  Fossil Fuels           36883


<hr>

## 4. Types of Charts
1. Chart.configure_header()
2. Chart.configure_legend()
3. Chart.configure_title()

### 4.1. Chart.configure_header()

In [19]:
df_config_header = data.iowa_electricity()
print(df_config_header.head())
img = (
    alt.Chart(df_config_header)
    .mark_circle()
    .encode(x="year:T", y="net_generation:Q", color="source:N", column="source:N")
    .properties(height=200, width=200)
)
img.configure_header(
    titleColor="red", titleFontSize=14, labelColor="blue", labelFontSize=14
)

        year        source  net_generation
0 2001-01-01  Fossil Fuels           35361
1 2002-01-01  Fossil Fuels           35991
2 2003-01-01  Fossil Fuels           36234
3 2004-01-01  Fossil Fuels           36205
4 2005-01-01  Fossil Fuels           36883


### 4.2. Chart.configure_legend()

In [20]:
df_config_legend = data.iowa_electricity()
print(df_config_legend.head())
img = (
    alt.Chart(df_config_legend)
    .mark_circle()
    .encode(x="year:T", y="net_generation:Q", color="source:N", column="source:N")
    .properties(height=200, width=200)
)
img.configure_legend(
    strokeColor="red", fillColor="gray", padding=2, cornerRadius=10, orient="top"
)

        year        source  net_generation
0 2001-01-01  Fossil Fuels           35361
1 2002-01-01  Fossil Fuels           35991
2 2003-01-01  Fossil Fuels           36234
3 2004-01-01  Fossil Fuels           36205
4 2005-01-01  Fossil Fuels           36883


### 4.3. Chart.configure_title()

In [21]:
df_config_title = data.iowa_electricity()
print(df_config_title.head())
img = (
    alt.Chart(df_config_title)
    .mark_circle()
    .encode(x="year:T", y="net_generation:Q", color="source:N", column="source:N")
    .properties(height=200, width=200, title="IOWA Electricity")
)
img.configure_title(
    fontSize=20, font="Times New Roman", anchor="start", color="blue"
)

        year        source  net_generation
0 2001-01-01  Fossil Fuels           35361
1 2002-01-01  Fossil Fuels           35991
2 2003-01-01  Fossil Fuels           36234
3 2004-01-01  Fossil Fuels           36205
4 2005-01-01  Fossil Fuels           36883


## 5. Customizing Visualizations
Methods:
1. Global - using a top-level ".configure_*()" method such as ".configure_mark()"
2. Local - passing the property directly to the ".mark_GraphType()" function
3. Encode - passing the property to the ".encode()" function. A constant must be wrapped in the value function, i.e. <b>alt.value(parameter)</b>

### 5.1. Global Method

In [22]:
df_area = data.iowa_electricity()
print(df_area.head())
alt.Chart(df_area).mark_area().encode(
    x="year:T", y="net_generation:Q", column="source:N"
).properties(width=200, height=100).configure_mark(color="red")

        year        source  net_generation
0 2001-01-01  Fossil Fuels           35361
1 2002-01-01  Fossil Fuels           35991
2 2003-01-01  Fossil Fuels           36234
3 2004-01-01  Fossil Fuels           36205
4 2005-01-01  Fossil Fuels           36883


### 5.2. Local Method

In [23]:
df_area = data.iowa_electricity()
print(df_area.head())
alt.Chart(df_area).mark_area(color="blue").encode(
    x="year:T", y="net_generation:Q", column="source:N"
).properties(width=200, height=100)

        year        source  net_generation
0 2001-01-01  Fossil Fuels           35361
1 2002-01-01  Fossil Fuels           35991
2 2003-01-01  Fossil Fuels           36234
3 2004-01-01  Fossil Fuels           36205
4 2005-01-01  Fossil Fuels           36883


### 5.3. Encode Method

In [24]:
df_area = data.iowa_electricity()
print(df_area.head())
alt.Chart(df_area).mark_area().encode(
    x="year:T", y="net_generation:Q", column="source:N", color=alt.value("green")
).properties(width=200, height=100)

        year        source  net_generation
0 2001-01-01  Fossil Fuels           35361
1 2002-01-01  Fossil Fuels           35991
2 2003-01-01  Fossil Fuels           36234
3 2004-01-01  Fossil Fuels           36205
4 2005-01-01  Fossil Fuels           36883
