## <font color='darkblue'>Preface</font>
([article source](https://towardsdatascience.com/making-interactive-visualizations-with-python-altair-7880ab5cf894)) <font size='3ptx'>**A comprehensive practical guide**</font>

Data visualization is a fundamental piece of data science. If used in exploratory data analysis, **data visualizations are highly effective at unveiling the underlying structure within a dataset or discovering relationships among variables.**

Another common use case of data visualizations is to deliver results or findings. They carry much more informative power than plain numbers. Thus, we **often use data visualization in storytelling, a critical part of the data science pipeline.**

We can enhance the capabilities of data visualizations by adding interactivity. **The [Altair library](https://altair-viz.github.io/) for Python is highly efficient at creating interactive visualizations.**

**In this article, we will go over the basic components of interactivity in Altair. We will also work through examples that put these components into action.** Let’s start by importing the libraries.

In [13]:
#!pip install altair
#!pip install vega-datasets

In [2]:
import numpy as np
import pandas as pd
import altair as alt

### <font color='darkgreen'>Dataset</font>
We also need a dataset for the examples. We will use a small sample from the [Melbourne housing dataset](https://www.kaggle.com/dansbecker/melbourne-housing-snapshot) available on Kaggle.

In [3]:
df = pd.read_csv(
    "../../datas/kaggle_melbourne_housing_snapshot/melb_data.csv",
    usecols = ['Price','Landsize','Distance','Type', 'Regionname']
)
df.head()

| | Type | Price | Distance | Landsize | Regionname |
|---|---|---|---|---|---|
| 0 | h | 1480000.0 | 2.5 | 202.0 | Northern Metropolitan |
| 1 | h | 1035000.0 | 2.5 | 156.0 | Northern Metropolitan |
| 2 | h | 1465000.0 | 2.5 | 134.0 | Northern Metropolitan |
| 3 | h | 850000.0 | 2.5 | 94.0 | Northern Metropolitan |
| 4 | h | 1600000.0 | 2.5 | 120.0 | Northern Metropolitan |


In [4]:
df = df[(df.Price < 3_000_000) & (df.Landsize < 1200)].sample(n=1000).reset_index(drop=True)
df.head()

| | Type | Price | Distance | Landsize | Regionname |
|---|---|---|---|---|---|
| 0 | h | 1120000.0 | 12.4 | 280.0 | Eastern Metropolitan |
| 1 | h | 1780000.0 | 6.3 | 0.0 | Southern Metropolitan |
| 2 | t | 855000.0 | 2.6 | 92.0 | Northern Metropolitan |
| 3 | h | 1350000.0 | 13.0 | 617.0 | Southern Metropolitan |
| 4 | h | 750000.0 | 15.5 | 636.0 | Western Metropolitan |


I have read only a small part of the original dataset. The <font color='violet'>usecols</font> parameter of the [read_csv](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html) function reads only the given columns of the csv file. I have also filtered out the outliers with regard to `Price` and `Landsize`. Finally, a random sample of 1000 observations (<font color='brown'>i.e. rows</font>) is selected using the [sample](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sample.html) function.
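
The three loading steps described above can be tried without downloading the Kaggle file. The following is a minimal sketch that assumes a tiny inline CSV with made-up rows; the `Suburb` column, the values, and `random_state=42` are illustrative assumptions and not part of the original notebook.

```python
import io

import pandas as pd

# Made-up rows standing in for melb_data.csv (illustrative values only).
raw = io.StringIO(
    "Suburb,Type,Price,Distance,Landsize,Regionname\n"
    "A,h,1480000,2.5,202,Northern Metropolitan\n"
    "B,h,5000000,3.0,300,Southern Metropolitan\n"
    "C,u,600000,8.0,5000,Western Metropolitan\n"
    "D,t,855000,2.6,92,Northern Metropolitan\n"
    "E,h,750000,15.5,636,Western Metropolitan\n"
    "F,u,420000,11.2,0,Eastern Metropolitan\n"
)

# Step 1: read only the columns we need; "Suburb" is skipped.
small = pd.read_csv(
    raw, usecols=["Price", "Landsize", "Distance", "Type", "Regionname"]
)

# Step 2: drop outliers on Price and Landsize (rows B and C are removed).
filtered = small[(small.Price < 3_000_000) & (small.Landsize < 1200)]

# Step 3: draw a random sample; random_state makes the draw repeatable.
sample = filtered.sample(n=3, random_state=42).reset_index(drop=True)

print(list(small.columns))
print(len(filtered), len(sample))
```

Note that `usecols` keeps the column order of the file rather than the order of the list, which is why `Type` appears first in the output shown earlier. Passing `random_state` is optional; the notebook omits it, so its sample differs on every run.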

## <font color='darkblue'>Introduction of Altair package</font>
**[Altair](https://altair-viz.github.io/) is a powerful library in terms of data transformations and creating interactive plots**. There are three components of interactivity.
* **Selection:** Captures interactions from the user. In other words, it selects a part of the visualization.
* **Condition:** Changes or customizes the elements based on the selection. In order to see an action, we need to attach a selection to a condition.
* **Bind:** A property of the selection that creates a two-way binding between the selection and an input element, such as the legend.

These concepts will become clearer as we go through the examples. Let’s first create a static scatter plot and then add interactive features to it.

In [5]:
alt.Chart(df).mark_circle(size=50).encode(
   x='Price',
   y='Distance',
   color='Type'
).properties(
   height=350, width=500
)

### <font color='darkgreen'>Altair Syntax</font>
Before starting on the interactive plots, it is better to **briefly mention the basic structure of Altair syntax**. We start by passing the data to a top-level Chart object. The data can be in the form of a Pandas dataframe or a URL string pointing to a json or csv file.

Then we describe the **type of visualization** (<font color='brown'>e.g. [mark_circle](https://altair-viz.github.io/user_guide/generated/toplevel/altair.Chart.html?highlight=mark_circle#altair.Chart.mark_circle), [mark_line](https://altair-viz.github.io/user_guide/generated/toplevel/altair.Chart.html?highlight=mark_line#altair.Chart.mark_line), and so on</font>). The <font color='blue'>encode</font> function specifies what to plot in the given dataframe. Thus, anything we write in the <font color='blue'>encode</font> function must be linked to the dataframe. Finally, we specify certain properties of the plot using the <font color='blue'>properties</font> function.

### <font color='darkgreen'>Selection & Condition</font>
Some parts of the plot are crowded with overlapping dots. It would be easier to read if we could view only the data points that belong to a specific type. We can achieve this in two steps. The first step is to add a selection on the `Type` column and bind it to the legend:
```python
selection = alt.selection_multi(fields=['Type'], bind='legend')
```

It is not enough just to add a selection. We should somehow update the plot based on the selection. For instance, we can adjust the opacity of the data points according to the selected category by passing `alt.condition` to the <font color='violet'>opacity</font> parameter.

In [6]:
selection = alt.selection_multi(fields=['Type'], bind='legend')

alt.Chart(df).mark_circle(size=50).encode(
   x='Price',
   y='Distance',
   color='Type',
   opacity=alt.condition(selection, alt.value(1), alt.value(0.1))
).properties(
   height=350, width=500
).add_selection(
   selection
)

For the second example, we will create a scatter plot of the `Distance` and `Landsize` columns and a histogram of the `Price` column. **The histogram will be updated based on the selected area on the scatter plot**.

Since we want to select an area on the plot, we need to add a selection interval on the scatter plot:
```python
selection = alt.selection_interval()
```

This selection will be added as a selection property to the scatter plot. For the histogram, we will use the selection as a transform filter.

In [7]:
selection = alt.selection_interval()

chart1 = alt.Chart(df).mark_circle(size=50).encode(
  x='Landsize',
  y='Distance',
  color='Type'
).properties(
  height=350, width=500
).add_selection(
  selection
)
chart2 = alt.Chart(df).mark_bar().encode(
  alt.X('Price:Q', bin=True), alt.Y('count()')
).transform_filter(
  selection
)

The `chart1` and `chart2` variables contain the scatter plot and the histogram, respectively. We can now combine and display them. **Altair is quite flexible in terms of combining multiple plots or subplots. It overloads Python operators for this: `|` places charts side by side, `&` stacks them vertically, and `+` layers them.**

In [8]:
chart1 | chart2

As we can see, the histogram is updated based on the selected data points on the scatter plot. Thus, **we are able to see the `Price` distribution of the selected subset.**

### <font color='darkgreen'>More</font>
**In order to better understand the concepts of selection and condition, let’s switch the roles of the scatter plot and the histogram**. We will add the selection to the histogram and use it as a transform filter on the scatter plot.

In [9]:
selection = alt.selection_interval()

chart1 = alt.Chart(df).mark_circle(size=50).encode(
   x='Landsize',
   y='Distance',
   color='Type'
).properties(
   height=350, width=500
).transform_filter(
   selection
)
chart2 = alt.Chart(df).mark_bar().encode(
   alt.X('Price:Q', bin=True), alt.Y('count()')  # bin=True so the bars form a histogram
).add_selection(
   selection
)
chart1 | chart2

## <font color='darkblue'>Example Gallery</font>


### <font color='darkgreen'>Bar Charts</font>

#### Bar Chart with Negative Values
This example shows a bar chart with both positive and negative values.

In [18]:
import altair as alt
from vega_datasets import data

data.us_employment().head()

Unnamed: 0,month,nonfarm,private,goods_producing,service_providing,private_service_providing,mining_and_logging,construction,manufacturing,durable_goods,...,transportation_and_warehousing,utilities,information,financial_activities,professional_and_business_services,education_and_health_services,leisure_and_hospitality,other_services,government,nonfarm_change
0,2006-01-01,135450,113603,22467,112983,91136,656,7601,14210,8982,...,4420.0,549.8,3052,8307,17299,17946,12945,5425,21847,282
1,2006-02-01,135762,113884,22535,113227,91349,662,7664,14209,8986,...,4429.4,550.1,3052,8332,17365,17998,12980,5426,21878,312
2,2006-03-01,136059,114156,22572,113487,91584,669,7689,14214,9000,...,4429.7,547.5,3055,8348,17438,18045,13034,5425,21903,297
3,2006-04-01,136227,114308,22631,113596,91677,679,7726,14226,9020,...,4445.4,548.9,3046,8369,17462,18070,13074,5426,21919,168
4,2006-05-01,136258,114332,22597,113661,91735,681,7713,14203,9017,...,4459.4,548.3,3039,8376,17512,18100,13052,5433,21926,31


In [20]:
source = data.us_employment()

alt.Chart(source).mark_bar().encode(
    x="month:T",
    y="nonfarm_change:Q",
    color=alt.condition(
        alt.datum.nonfarm_change > 0,
        alt.value("steelblue"),  # The positive color
        alt.value("orange")      # The negative color
    )
).properties(width=600)

#### Trellis Stacked Bar Chart
This is an example of a horizontal stacked bar chart using data which contains crop yields over different regions and different years in the 1930s.

In [21]:
import altair as alt
from vega_datasets import data

source = data.barley()
source.head()

| | yield | variety | year | site |
|---|---|---|---|---|
| 0 | 27.0 | Manchuria | 1931 | University Farm |
| 1 | 48.86667 | Manchuria | 1931 | Waseca |
| 2 | 27.43334 | Manchuria | 1931 | Morris |
| 3 | 39.93333 | Manchuria | 1931 | Crookston |
| 4 | 32.96667 | Manchuria | 1931 | Grand Rapids |


In [22]:
alt.Chart(source).mark_bar().encode(
    column='year',
    x='yield',
    y='variety',
    color='site'
).properties(width=220)

### <font color='darkgreen'>Interactive Charts</font>

#### [Multi-Line Highlight](https://altair-viz.github.io/gallery/multiline_highlight.html)
This multi-line chart uses an invisible Voronoi tessellation to handle mouseover to identify the nearest point and then highlight the line on which the point falls. It is adapted from the Vega-Lite example found at https://bl.ocks.org/amitkaps/fe4238e716db53930b2f1a70d3401701

In [24]:
import altair as alt
from vega_datasets import data

source = data.stocks()
source.head()

| | symbol | date | price |
|---|---|---|---|
| 0 | MSFT | 2000-01-01 | 39.81 |
| 1 | MSFT | 2000-02-01 | 36.35 |
| 2 | MSFT | 2000-03-01 | 43.22 |
| 3 | MSFT | 2000-04-01 | 28.37 |
| 4 | MSFT | 2000-05-01 | 25.45 |


In [34]:
highlight = alt.selection(
    type='single', 
    on='mouseover',
    fields=['symbol'], 
    nearest=True
)

legend_selection = alt.selection_multi(
    fields=['symbol'],
    bind='legend'
)

base = alt.Chart(source).encode(
    x='date:T',
    y='price:Q',
    color='symbol:N'
)

points = base.mark_circle().encode(
    opacity=alt.value(0)
).add_selection(
    highlight
).properties(
    width=600
)

lines = base.mark_line().encode(
    size=alt.condition(~highlight, alt.value(1), alt.value(3)),
    opacity=alt.condition(legend_selection, alt.value(1), alt.value(0.1))
).add_selection(
    legend_selection
)

points + lines

#### [Interactive Chart with Cross-Highlight](https://altair-viz.github.io/gallery/interactive_cross_highlight.html)
This example shows an interactive chart where selections in one portion of the chart affect what is shown in other panels. Click on the bar chart to see a detail of the distribution in the upper panel.

In [35]:
import altair as alt
from vega_datasets import data

source = data.movies.url

In [48]:
pts = alt.selection(type="multi", encodings=['x'], toggle=False)

rect = alt.Chart(source).mark_rect().encode(
    alt.X('IMDB_Rating:Q', bin=True),
    alt.Y('Rotten_Tomatoes_Rating:Q', bin=True),
    alt.Color('count()',
        scale=alt.Scale(scheme='greenblue'),
        legend=alt.Legend(title='Total Records')
    )
)

circ = rect.mark_point().encode(
    alt.ColorValue('grey'),
    alt.Size('count()',
        legend=alt.Legend(title='Records in Selection')
    )
).transform_filter(
    pts
)

bar = alt.Chart(source).mark_bar().encode(
    x='Major_Genre:N',
    y='count()',
    color=alt.condition(pts, alt.ColorValue("steelblue"), alt.ColorValue("grey"))
).properties(
    width=550,
    height=200
).add_selection(pts)

alt.vconcat(
    rect + circ,
    bar
).resolve_legend(
    color="independent",
    size="independent"
)

## <font color='darkblue'>Conclusion</font>
The sky is the limit! We can create lots of different interactive plots. Altair is also quite flexible in terms of the ways to add interactive components to the visualization.

Once you have a comprehensive understanding of the elements of interactivity, you can enrich your visualizations. These elements are selection, condition, and bind. As with any other subject, practice makes perfect. The syntax may look a little bit confusing at first. However, once you understand the logic and the connections between the elements we have mentioned, creating interactive plots will become fairly easy.

## <font color='darkblue'>Supplement</font>
* [Altair - Example Gallery](https://altair-viz.github.io/gallery/index.html)
* [Altair - Bindings, Selections, Conditions: Making Charts Interactive](https://altair-viz.github.io/user_guide/interactions.html)