# Let's understand Altair

Altair is a declarative statistical visualization library for Python. It is based on Vega and Vega-Lite, which are both visualization grammars that let you describe the visual appearance and interactive behavior of a visualization in a JSON format.

> Declarative visualization lets you think about the relationships within the data rather than mechanics of the visualization like axis limits, legends, etc.

Altair produces Vega-Lite visualizations, which require a JavaScript frontend to display the charts. Because notebook environments combine a Python backend with a JavaScript frontend, they are a natural fit for working with Altair.

The key idea is that you are declaring links between `data columns` and `visual encoding channels`, such as the x-axis, y-axis, color, etc. The rest of the plot details are handled automatically.
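As a rough illustration of what "declaring links" means, the sketch below writes a tiny Vega-Lite-style specification by hand as a Python dictionary and serializes it to JSON. The three data rows are made up for illustration; the column names `Danceability`, `Energy`, and `Genre` are borrowed from the dataset used later. Altair generates a specification of this general shape for you, so you would not normally write it by hand.

```python
import json

# Hypothetical rows, invented purely for illustration.
rows = [
    {"Danceability": 70, "Energy": 55, "Genre": "pop"},
    {"Danceability": 45, "Energy": 80, "Genre": "rock"},
    {"Danceability": 62, "Energy": 40, "Genre": "pop"},
]

# A hand-written Vega-Lite-style spec: each encoding channel
# (x, y, color) is linked to a data column and a data type.
spec = {
    "data": {"values": rows},
    "mark": "point",
    "encoding": {
        "x": {"field": "Danceability", "type": "quantitative"},
        "y": {"field": "Energy", "type": "quantitative"},
        "color": {"field": "Genre", "type": "nominal"},
    },
}

spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

Note that the specification says nothing about axis limits or legends; those details are left to the renderer.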

In [None]:
# Importing libraries

import pandas as pd
import numpy as np
import altair as alt
#alt.data_transformers.enable('default', max_rows=None) 

## Altair Basic Guide

This section walks through the basic steps for building an interactive chart.

### 1.) Data

The data used internally by Altair is built around the Pandas DataFrame. The data can be specified in one of the following ways:

- as a Pandas DataFrame
- as a `Data` or related object
- as a URL string pointing to a JSON or CSV formatted file
- as an object that supports the `__geo_interface__` (e.g. GeoPandas GeoDataFrame, Shapely geometries, GeoJSON objects)

In [None]:
music = pd.read_csv("../input/data-analytics-to-study-music-streaming-patterns/spotify.csv", index_col=0)
music.head()

Kaggle does not work with Altair's built-in `notebook` renderer, so we need to create a custom renderer for Kaggle.

The well-written [Python script](https://www.kaggle.com/omegaji/altair-render-script/notebook) reproduced below will help you get started with Altair on Kaggle.

In [None]:
# Source: https://www.kaggle.com/omegaji/altair-render-script/notebook

import json
from IPython.display import HTML

KAGGLE_HTML_TEMPLATE = """
<style>
.vega-actions a {{
    margin-right: 12px;
    color: #757575;
    font-weight: normal;
    font-size: 13px;
}}
.error {{
    color: red;
}}
</style>
<div id="{output_div}"></div>
<script>
requirejs.config({{
    "paths": {{
        "vega": "{base_url}/vega@{vega_version}?noext",
        "vega-lib": "{base_url}/vega-lib?noext",
        "vega-lite": "{base_url}/vega-lite@{vegalite_version}?noext",
        "vega-embed": "{base_url}/vega-embed@{vegaembed_version}?noext",
    }}
}});
function showError(el, error){{
    el.innerHTML = ('<div class="error">'
                    + '<p>JavaScript Error: ' + error.message + '</p>'
                    + "<p>This usually means there's a typo in your chart specification. "
                    + "See the javascript console for the full traceback.</p>"
                    + '</div>');
    throw error;
}}
require(["vega-embed"], function(vegaEmbed) {{
    const spec = {spec};
    const embed_opt = {embed_opt};
    const el = document.getElementById('{output_div}');
    vegaEmbed("#{output_div}", spec, embed_opt)
      .catch(error => showError(el, error));
}});
</script>
"""

class KaggleHtml(object):
    def __init__(self, base_url='https://cdn.jsdelivr.net/npm'):
        self.chart_count = 0
        self.base_url = base_url
        
    @property
    def output_div(self):
        return "vega-chart-{}".format(self.chart_count)
        
    def __call__(self, spec, embed_options=None, json_kwds=None):
        # we need to increment the div, because all charts live in the same document
        self.chart_count += 1
        embed_options = embed_options or {}
        json_kwds = json_kwds or {}
        html = KAGGLE_HTML_TEMPLATE.format(
            spec=json.dumps(spec, **json_kwds),
            embed_opt=json.dumps(embed_options),
            output_div=self.output_div,
            base_url=self.base_url,
            vega_version=alt.VEGA_VERSION,
            vegalite_version=alt.VEGALITE_VERSION,
            vegaembed_version=alt.VEGAEMBED_VERSION
        )
        return {"text/html": html}
    
alt.renderers.register('kaggle', KaggleHtml())
print("Define and register the kaggle renderer. Enable with\n\n"
      "    alt.renderers.enable('kaggle')")
alt.renderers.enable('kaggle')  
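
One detail of the template above is easy to miss: it is filled in with `str.format`, so every literal brace in the CSS and JavaScript is doubled (`{{` and `}}`), while single braces such as `{output_div}` mark placeholders. Here is a minimal sketch of that behavior, using a made-up one-line template rather than the real one:

```python
# A made-up miniature template, not the one used by the renderer above.
MINI_TEMPLATE = 'vegaEmbed("#{output_div}", {spec}, {{"actions": false}});'

# Single-brace names are substituted; doubled braces come out as literal braces.
rendered = MINI_TEMPLATE.format(output_div="vega-chart-1", spec='{"mark": "bar"}')
print(rendered)
```

Forgetting to double a literal brace would make `str.format` treat it as a placeholder and raise an error.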



### 2.) Chart

The fundamental object in Altair is the Chart, which takes a dataframe as a single argument:

In [None]:
chart = alt.Chart(music)


So far, we have defined the Chart object, which has no meaning on its own because we have not yet told the chart what to do with the data. That will come next.

### 3.) Marks

Now we would like the data to be visualized. This is done via the `mark` attribute of the chart object, which is most conveniently accessed via the `Chart.mark_*` methods.

After selecting the data, you choose how to draw it: as bars, lines, areas, points, and so on. The mark property specifies exactly how the data should be represented on the plot.

Altair provides a number of `basic marks`, such as point, circle, square, and bar.

In [None]:
alt.Chart(music).mark_bar().encode(
    x='Danceability',
    y='Energy'
)

In addition to basic marks, Altair provides `compound marks` such as `mark_boxplot()`, `mark_errorband()`, and `mark_errorbar()`.

In [None]:
alt.Chart(music).mark_boxplot().encode(
    x='Genre',
    y='Speechiness'
)

Note that the default behavior is to display outliers as points, where an outlier is defined as any point more than 1.5 IQRs from the box. 
We can also adjust the threshold using the `extent` property of the mark.

In [None]:
alt.Chart(music).mark_boxplot(extent=2.0).encode(
    x='Genre',
    y='Speechiness'
)

We can also ignore outliers completely using `extent='min-max'`, which extends the whiskers to the minimum and maximum values.
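
To make the outlier rule concrete, here is a minimal standard-library sketch of how a threshold of `extent` IQRs beyond the box picks out outliers. The sample values are made up, and the quartile method used here (`statistics.quantiles` with `method="inclusive"`) is an assumption; Vega-Lite may compute quartiles slightly differently.

```python
import statistics

def outliers(values, extent=1.5):
    """Return values lying more than `extent` IQRs outside the box (Q1..Q3)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - extent * iqr, q3 + extent * iqr
    return [v for v in values if v < lower or v > upper]

# Made-up sample, not taken from the music dataset.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 15, 30]

default_outliers = outliers(sample)            # extent=1.5
wider_outliers = outliers(sample, extent=2.0)  # longer whiskers, fewer outliers
print(default_outliers, wider_outliers)
```

With this sample, raising `extent` from 1.5 to 2.0 moves the value 15 inside the whiskers, so only 30 is still drawn as an outlier.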

**Mark Properties**

Altair also provides mark properties, such as opacity, color, font, and size, which are passed as arguments to the `mark_*()` methods.

In [None]:
alt.Chart(music).mark_circle(
    color='red',
    opacity=0.3
).encode(
    x='Genre',
    y='Speechiness'
)

You can find the detailed lists in the [official documentation](https://altair-viz.github.io/user_guide/marks.html).

### 4.) Encodings

Once we have the data and the mark that represents it, we need to specify where and how to draw it: position, size, color, and so on. This is where encodings come in.

In Altair, an encoding is a mapping from data to visual properties such as an axis, marker color, or marker shape. The `Chart.encode()` method defines these properties, and it is the most important method for creating a meaningful visualization. The following are the most basic encoding properties, and knowing them should be enough to create basic charts.

#### • Encoding Channels

Some of the most commonly used `encoding channels` are:

- x: the x-axis value
- y: the y-axis value
- row: the row of a faceted plot
- column: the column of a faceted plot

You can access the full list of encoding channels from [here](https://altair-viz.github.io/user_guide/encoding.html).

To visually separate the points, we can map various encoding channels, or channels for short, to columns in the dataset.

In [None]:
alt.Chart(music).mark_point().encode(
    y='Positivity',
    x='Artist', 
    color='Genre'
)

#### • Encoding Data Types

For data specified as a DataFrame, Altair can automatically determine the correct data type for each encoding and create appropriate scales and legends to represent the data. You can also state the type explicitly with a shorthand code:

- quantitative: shorthand code Q, a continuous real-valued quantity
- ordinal: shorthand code O, a discrete ordered quantity
- nominal: shorthand code N, a discrete unordered category
- temporal: shorthand code T, a time or date value

In [None]:
alt.Chart(music).mark_point().encode(
    y='Length(seconds):O',
    x='Artist:N', 
    color='Speechiness:N'
)

In [None]:
alt.Chart(music).mark_point().encode(
    y='Length(seconds):O',
    x='Artist:N', 
    color='Speechiness:Q'
) 

The two plots above show how the color scale depends on the specified data type, which decides whether a discrete or continuous legend is used.

### **Binning and Aggregation**

Altair uses a split-apply-combine strategy for visualizing data: we split the data into groups based on some condition, apply an aggregation within each group, and then combine the results back together.

Let's build a histogram: we take 1D data, split it according to the bin each value falls in, aggregate within each bin by counting the values, and combine the results back together.

In [None]:
alt.Chart(music).mark_bar().encode(
    alt.X('Positivity', bin=True),
    y='count()'
 )   

In the plot above, the `count` aggregation is used, which doesn't require any other field.
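
The split, apply, and combine steps behind a histogram can be sketched in plain Python. This is only an illustration of the idea, not how Altair or Vega-Lite computes bins; the values and the bin width of 20 are made up.

```python
from collections import defaultdict

# Made-up values standing in for a numeric column such as Positivity.
values = [12, 25, 27, 33, 48, 51, 55, 58, 74, 90]
bin_width = 20  # assumed bin width, chosen for illustration

# Split: assign each value to the bin it falls in (keyed by the bin's left edge).
groups = defaultdict(list)
for v in values:
    groups[(v // bin_width) * bin_width].append(v)

# Apply: aggregate within each bin (here, a count).
# Combine: collect the per-bin results into one table.
histogram = {left: len(members) for left, members in sorted(groups.items())}
print(histogram)
```

Each key is the left edge of a bin and each value is the height of the corresponding bar.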

You can also control the binning by passing an `alt.Bin` object as the `bin` parameter instead of a boolean (`True` or `False`). Let's see how it works:

In [None]:
alt.Chart(music).mark_square().encode(
    alt.X('Beats per minute', bin=alt.Bin(maxbins=20)),
    alt.Y('Positivity', bin=True),
    size='count()',
    color='Artist:N'
  )

Here, the size of each square encodes the `count` of records that fall in that bin.

In [None]:
alt.Chart(music).mark_square().encode(
    alt.X('Beats per minute', bin=alt.Bin(maxbins=20)),
    alt.Y('Positivity', bin=True),
    size='count()',
    color='mean(Popularity):Q'
  )

Other than `mean` and `count`, you can use various aggregation functions built into Altair. You can access the full list from [here](https://altair-viz.github.io/user_guide/encoding.html#binning-and-aggregation).

## Interactions

One of the unique features of Altair is that users can interact with charts, including controls such as panning, zooming, and selecting a range of data.

There are three main concepts which make the chart interactive:

- **Selection** - the `selection()` object captures interactions through inputs such as a mouse click, drag, or slider to affect the chart.

- **Condition** - a selection can't do anything alone. To make the chart respond, the `condition()` function takes the selection as input and references it within the chart.

- **Binding** - the `bind` property of the selection establishes a two-way binding between the selection and an input element of your chart.

The simplest way to add interaction is the `interactive()` method, which enables panning and zooming.

In [None]:
alt.Chart(music).mark_point().encode(
    x='Positivity:Q',
    y='Popularity:Q',
    color='Artist:N',
    tooltip='Genre'
).interactive()

When you hover over a point, a tooltip shows its genre.

### Selection types

Altair provides a general `selection` API for creating interactive plots. There are three types of selections available in Altair:

- **Interval Selection** - the `alt.selection_interval()` function lets you select chart elements by clicking and dragging.

- **Single Selection** - the `alt.selection_single()` function lets you select a single chart element at a time using mouse actions. By default, points are selected on click.

- **Multiple Selection** - the `alt.selection_multi()` function lets you select multiple chart objects at once.

Let's create an interval selection using the `selection_interval()` function:

In [None]:
interval = alt.selection_interval()

We can now attach this interval to our chart by passing it to the `add_selection()` method:

In [None]:
alt.Chart(music).mark_point().encode(
    x='Danceability:Q',
    y='Positivity:Q',
    color='Artist:N'
).add_selection(
    interval
)

This selection doesn't do anything yet. Now, let's `condition` the color on this selection which helps to highlight the points in the selection.

In [None]:
alt.Chart(music).mark_point().encode(
    x='Danceability:Q',
    y='Positivity:Q',
    color=alt.condition(
        interval, 'Artist:N', alt.value('lightgray'))
).add_selection(
     interval 
) 
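
The logic of conditioning a color on an interval selection can be sketched in plain Python. This is a conceptual illustration only: the real evaluation happens in the browser, and the points, brushed ranges, and color names below are all made up.

```python
# Made-up points; field names follow the scatter chart above.
points = [
    {"Danceability": 40, "Positivity": 30, "Artist": "A"},
    {"Danceability": 65, "Positivity": 70, "Artist": "B"},
    {"Danceability": 80, "Positivity": 20, "Artist": "C"},
]

# A hypothetical brushed region: one (low, high) range per selected field.
brush = {"Danceability": (50, 90), "Positivity": (50, 100)}

def is_selected(point, brush):
    """True when the point lies inside the brushed range on every field."""
    return all(low <= point[field] <= high for field, (low, high) in brush.items())

# Selected points keep their data-driven color; the rest get a constant value.
palette = {"A": "steelblue", "B": "orange", "C": "green"}  # stand-in color scale
colors = [
    palette[p["Artist"]] if is_selected(p, brush) else "lightgray"
    for p in points
]
print(colors)
```

Only the point inside the brushed region on both axes keeps its artist color; the others fall back to light gray, which mirrors the role of `alt.value('lightgray')`.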

Now let's create a single selection using the `selection_single()` function:

In [None]:
single = alt.selection_single()

In [None]:
alt.Chart(music).mark_point().encode(
    x='Danceability:Q',
    y='Positivity:Q',
    color=alt.condition(
        single, 'Artist:N', alt.value('lightgray'))
).add_selection(
     single 
) 

Let's create a bar chart:

In [None]:
alt.Chart(music).mark_bar().encode(
    y='Genre:N',
    color='Artist:N',
    x='count(Genre):Q'
)

### **Data Transformations**

There are times when we filter or transform data for visualization, and there are two ways we can do this in Altair:

1. Before the chart definition, using standard Pandas data transformations.

2. Within the chart definition, using Vega-Lite’s data transformation tools.

The first approach is straightforward for anyone familiar with data manipulation in Python, and it offers much more flexibility.

Because we are going to build a compound chart, I'll use the second approach, which lets us specify the data transformation within the chart specification itself.

Altair provides `transform_*` methods to accomplish this.

You can access the full list from [here](https://altair-viz.github.io/user_guide/transform/index.html).

I'll use the `transform_filter()` method, which selects a subset of the data based on a condition. We will use it to link a bar chart to the scatter chart.

First, let's create a compound chart that doesn't need any transformation methods.

In [None]:
base = alt.Chart(music).mark_point().encode(
    y='Popularity',
    color=alt.condition(interval, 'Artist', alt.value('lightgray')),
    tooltip='Genre'
).properties(
    selection=interval
)
base.encode(x='Positivity') | base.encode(x='Speechiness')

In [None]:
points = alt.Chart(music).mark_point().encode(
    x='Danceability:Q',
    y='Positivity:Q',
    color=alt.condition(
        interval, 'Artist:N', alt.value('grey'))
).add_selection(
     interval 
) 

bars = alt.Chart(music).mark_bar().encode(
    y='Genre:N',
    color='Artist:N',
    x='count(Genre):Q'
).transform_filter(
    interval
)

points & bars

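To build a compound chart, assign each chart to a variable and combine the variables with the concatenation operators: `&` stacks charts vertically and `|` places them side by side. (The same symbols, together with `~`, act as logical AND, OR, and NOT when applied to selections.) The plot above stacks the scatter chart on top of the bar chart, and the bars update to reflect only the points inside the interval selection.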

To learn more about Altair, check the [official documentation](https://altair-viz.github.io/index.html).