### Introduction to Data Science – Lecture 12 – Practical Data Visualization
*COMP 5360 / MATH 4100, University of Utah, http://datasciencecourse.net/*

In this lecture, we'll cover some of the nitty-gritty details of how to create visualizations in Python. We will take a step back and talk about visualization design and methods in the next lecture. 

To be frank: the Python data visualization environment is a MESS. It reminds me of this: 


![](standards.png)

### Matplotlib & Extensions

 * [Matplotlib](https://matplotlib.org/) - the elephant in the room
 * [Pandas Visualization](https://pandas.pydata.org/pandas-docs/stable/visualization.html) - based on Matplotlib
 * [Seaborn](https://seaborn.pydata.org/) - based on Matplotlib, higher-level
 * [ggplot](http://ggplot.yhathq.com/) - based on the popular R plotting library, some similarities, uses Matplotlib.
 
These tools generally can be used to create figures independent of Jupyter. 
 
### Web-based Vis tools

 * [Plotly](https://plot.ly/python/)
 * [Altair](https://github.com/altair-viz/altair), based on [Vega](https://vega.github.io/vega/) 
 * [PdVega](https://jakevdp.github.io/pdvega/), based on Vega, integrated with pandas dataframes.
 * [Bokeh](https://bokeh.pydata.org/en/latest/)
 
 
These tools mostly rely on Jupyter running in your browser and use a JavaScript-based library for rendering. 

As of February 2023, it seems like Plotly and Altair are serious contenders for more advanced, interactive visualization.

There are also some domain specific libraries, e.g., for maps and for networks, that we will cover at a later point. 
 
 
There are also [many](https://www.dataquest.io/blog/python-data-visualization-libraries/) [blog](https://codeburst.io/overview-of-python-data-visualization-tools-e32e1f716d10) [posts](https://lisacharlotterost.github.io/2016/05/17/one-chart-code/) [comparing](https://blog.modeanalytics.com/python-data-visualization-libraries/) various data visualization libraries.

Generally speaking, there are 
 * **plotting libraries** that have pre-made charts, and 
 * **drawing libraries** that allow you to freely express anything you can imagine. 
 
We will mainly cover the former, but as visualization researchers we typically rely on tools that enable as much expressivity as possible, such as [D3](https://d3js.org/) or [WebGL](https://developer.mozilla.org/en-US/docs/Web/API/WebGL_API).

We will start off with basic Matplotlib, explore the plotting functions built into pandas, and then look at some more advanced tools.

## Matplotlib

Matplotlib is a project started in 2002 and is inspired by MATLAB plotting. 

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# lines starting with % are IPython-specific instructions ("magic" commands)
# this command tells Jupyter/ipython that we want to create the visualizations
# inline in this notebook instead of as files to save.
%matplotlib inline

# an example data vector
fib_series = [1,1,2,3,5,8,13,21,34]

# here we run a simple plot command to create a line chart
plt.plot(fib_series)

The `.plot` command uses a [`figure`](https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure) to plot in. If no figure has been defined, it will automatically create one. If there is already a figure, it will plot to the latest figure. 
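
The implicit figure handling can be checked directly. The following is a minimal sketch, assuming a fresh session; `plt.close("all")`, `plt.gcf()`, and `plt.get_fignums()` are standard pyplot helpers, while the variable names `auto_fig` and `manual_fig` are only illustrative.

```python
import matplotlib.pyplot as plt

plt.close("all")                 # start from a clean slate for this sketch
print(plt.get_fignums())         # no figures exist yet

plt.plot([1, 1, 2, 3, 5])        # no figure defined, so one is created implicitly
auto_fig = plt.gcf()             # gcf() returns the current (latest) figure

manual_fig = plt.figure()        # a newly created figure becomes the current one
plt.plot([5, 3, 2, 1, 1])        # this line lands in manual_fig, not in auto_fig

print(plt.get_fignums())
print(len(auto_fig.axes), len(manual_fig.axes))
```

Because "the latest figure" is hidden global state, the rest of these notes mostly keep explicit references to figures and subplots.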

Here we create a figure manually: 

In [None]:
# we create a figure with size 10 by 10 inches
fig = plt.figure(figsize=(10, 10))

The figure by itself doesn't plot anything. We have to add a [`subplot`](https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure.add_subplot) to it. 

In [None]:
# figsize defines the size of the plot in inches - 10 wide by 6 high here. 
fig = plt.figure(figsize=(10, 6))
# add a subplot to a grid of 1x1, return the 1st subplot
my_plot = fig.add_subplot(1,1,1)
# plotting a data array
my_plot.plot(fib_series)

Here we add a title and axis labels: 

In [None]:
fig = plt.figure(figsize=(10, 8))
my_plot = fig.add_subplot(1,1,1)
fig.suptitle('My Line Chart', fontsize=12, fontweight='bold')
my_plot.set_xlabel("My sequence")
my_plot.set_ylabel("My values")
my_plot.plot(fib_series)

We can plot multiple line charts by calling plot multiple times on the same axis:

In [None]:
fig = plt.figure(figsize=(10, 8))
my_plot = fig.add_subplot(1,1,1)
fib_rising, = my_plot.plot(fib_series)
fib_falling, = my_plot.plot(fib_series[::-1])
constant, = my_plot.plot([10]*9)

# Here's how we create a legend
my_plot.legend((fib_rising, fib_falling, constant), ("Fibonacci Rising", "Fibonacci Falling", "Constant") )
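
An alternative way to build the same legend is to attach a `label` to each `plot` call and then call `legend()` without arguments. This is a small sketch of that variant, not a replacement for the cell above; it uses `plt.subplots()` to create the figure and the subplot in one step.

```python
import matplotlib.pyplot as plt

fib_series = [1, 1, 2, 3, 5, 8, 13, 21, 34]

fig, ax = plt.subplots(figsize=(10, 8))
# each line carries its own label, so no handles have to be collected
ax.plot(fib_series, label="Fibonacci Rising")
ax.plot(fib_series[::-1], label="Fibonacci Falling")
ax.plot([10] * len(fib_series), label="Constant")

# legend() without arguments picks up all labeled lines in drawing order
legend = ax.legend()
legend_labels = [text.get_text() for text in legend.get_texts()]
print(legend_labels)
```

Keeping the label next to the data makes it harder for legend entries and lines to get out of sync when a series is added or removed.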

Alternatively, we can also call the plot function with multiple data series. 

Note that here we have to specify both x and y values; the x values were implicit earlier.

In [None]:
fig = plt.figure(figsize=(10, 8))
my_plot = fig.add_subplot(1,1,1)
fig.suptitle('My Line Chart', fontsize=12, fontweight='bold')
my_plot.set_xlabel("My sequence")
my_plot.set_ylabel("My values")
fib_rising, fib_falling, constant = my_plot.plot(range(9), fib_series, range(9), fib_series[::-1], range(9), [10]*9)
my_plot.legend((fib_rising, fib_falling, constant), ("Fibonacci Rising", "Fibonacci Falling", "Constant") )
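
When `plot` receives a single sequence, Matplotlib treats it as the y values and fills in the x values as 0, 1, 2, and so on, which is why the multi-series call passes `range(9)` explicitly. A minimal sketch to check this, using `get_xdata()` on the returned line objects (the variable names are illustrative):

```python
import matplotlib.pyplot as plt

fib_series = [1, 1, 2, 3, 5, 8, 13, 21, 34]

fig, ax = plt.subplots()
# only y values given: x defaults to the index positions
implicit_line, = ax.plot(fib_series)
# x and y given explicitly
explicit_line, = ax.plot(range(9), fib_series)

implicit_x = list(implicit_line.get_xdata())
explicit_x = list(explicit_line.get_xdata())
print(implicit_x)
print(explicit_x)
```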

Now let's create a figure with multiple subplots:

In [None]:
fig = plt.figure(figsize=(10, 8))

# create a subplot in a 2 by 2 grid, 
# return the subplot at position specified in third parameter
# these subplots are often called "axes"
sub_fig_1 = fig.add_subplot(2,2,1)
sub_fig_2 = fig.add_subplot(2,2,2)
sub_fig_3 = fig.add_subplot(2,2,3)

# this will plot to the most recently created subplot (sub_fig_3 here)
# you shouldn't do that but rather use explicit subplot references if you have them
# k-- is a style option for a black dashed line
plt.plot([3, 4, 6, 2], "k--")

# here is how we can plot explicitly to subfigures
sub_fig_1.plot(range(0,10))

sub_fig_2.plot(fib_series)

We can use the [`subplots`](https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html?highlight=subplots#matplotlib.figure.Figure.subplots) shorthand to create multiple subplots that we can access from an array. 

In [None]:
# using the variable axs for multiple Axes
fig, axs = plt.subplots(2, 2)

axs[0][0].plot([3, 4, 6, 2], "k--")
axs[0][1].plot(range(0,10))
axs[1][0].plot(fib_series)

Or we can use tuples to unpack the array: 

In [None]:
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2)
ax1.plot([3, 4, 6, 2], "k--")
ax2.plot(range(0,10))
ax3.plot(fib_series)
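
The object returned by `plt.subplots` is a NumPy array of subplots, so it can also be indexed as `axs[row, column]` or iterated over. The sketch below assumes a hypothetical 2 by 3 grid and uses the array's `.flat` iterator to visit the subplots row by row.

```python
import matplotlib.pyplot as plt

fig, axs = plt.subplots(2, 3, figsize=(9, 5))
print(axs.shape)  # rows first, then columns

# .flat walks through the grid row by row
for index, ax in enumerate(axs.flat):
    ax.plot([0, 1], [0, index])
    ax.set_title(f"panel {index}")

# the last panel sits in the second row, third column
print(axs[1, 2].get_title())
```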

Next, we will create a couple of different visualization techniques: 

Visualizations for Correlations
 
 * [Scatterplot](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.scatter.html#matplotlib.pyplot.scatter)  
 
Visualizations for raw data, one dimension 
 * [Vertical Bar Chart](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.bar.html#matplotlib.pyplot.bar)
 * [Horizontal Bar Chart](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.barh.html#matplotlib.pyplot.barh)
 * [Pie Chart](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.pie.html#matplotlib.pyplot.pie)
 
Visualizations for distributions 
 * [Boxplot](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.boxplot.html#matplotlib.pyplot.boxplot)
 * [Histogram](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.hist.html?highlight=hist#matplotlib.pyplot.hist)
 * [Violin Plot](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.violinplot.html)

In [None]:
large_dist = np.random.randn(400)
small_dist = np.random.randn(400)*0.5


# a function because we'll reuse these later
def sample_figures():
    # define a figure with subfigures in 2 rows and 4 columns
    fig, subfigs = plt.subplots(2, 4, figsize=(14, 6))

    # Scatterplot. Pass two arrays for your x and y values.
    subfigs[0,0].scatter(range(0,10),range(10,0,-1))
    
    # Bar Chart. First array is x position, second is value (height) of data
    subfigs[0,1].bar([1, 2, 3], [4, 2, 3])
    
    # Horizontal bar chart. 
    # First array contains y positions (expressed as range), second contains data values (lengths of bars), 
    # tick_label is an array of labels
    subfigs[0,2].barh(range(0,4), [9, 7, 2, 3], tick_label=["a", "b", "c", "d"])
    
    # You can also (but maybe you shouldn't) do pie charts. First array is shares of total. 
    # labels are given in the same order as the data. autopct defines how to format the numerical labels 
    # (here, one digit after the decimal point)
    subfigs[0,3].pie([1, 5, 10], labels=["blackberry", "ios", "android"], autopct='%1.1f%%')
    
    # Box plots visualizing two distributions with 400 items each. 
    subfigs[1,0].boxplot([large_dist, small_dist])
    
    # A histogram visualizes a distribution. It takes one array, we can specify bins as second parameter 'bins'
    subfigs[1,1].hist(large_dist)
    
    # A violin plot also visualizes a distribution, using kernel density estimation.  
    subfigs[1,2].violinplot([large_dist, small_dist])
    
    # a cleaner, horizontal version of a violin plot
    subfigs[1,3].violinplot([large_dist, small_dist], showmeans=True,
        showextrema=False, vert=False)

sample_figures()
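
The histogram comment above mentions the `bins` parameter. As a small sketch of what it controls, the example below draws a histogram with 20 bins (an arbitrary choice for illustration) from seeded random data and inspects the values that `hist` returns: the counts per bin, the bin edges, and the drawn bars.

```python
import numpy as np
import matplotlib.pyplot as plt

# seeded generator so the sketch is reproducible; the data is made up
rng = np.random.default_rng(0)
values = rng.normal(size=400)

fig, ax = plt.subplots()
# hist returns the counts per bin, the bin edges, and the bar patches
counts, edges, patches = ax.hist(values, bins=20)

print(len(counts), len(edges))  # there is always one more edge than bins
print(int(counts.sum()))        # every value falls into exactly one bin
```

Without the `bins` argument, Matplotlib uses 10 bins by default, which can hide structure in larger samples.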

### Heat Maps

Heat maps encode matrix/tabular data using color. There are two ways to implement heatmaps in Matplotlib:

 * [pcolor](https://matplotlib.org/devdocs/api/_as_gen/matplotlib.pyplot.pcolor.html)
 * [imshow](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.imshow.html)

imshow is used to display images (which are just matrices, where the pixels have a color value). In practice, imshow and pcolor differ mainly in their coordinate system: the origin of imshow is at the top left (as is common for images), the origin of pcolor is at the bottom left.

For heatmaps, we need a [color map](https://matplotlib.org/tutorials/colors/colormaps.html). Matplotlib has many color maps baked in, including those from http://colorbrewer.org.

In [None]:
# just a helper function to create some 2D data based on a gaussian kernel.
def gkern(l=5, sig=1.):
    """
    creates gaussian kernel with side length l and a sigma of sig
    """
    ax = np.arange(-l // 2 + 1., l // 2 + 1.)
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx**2 + yy**2) / (2. * sig**2))
    return kernel / np.sum(kernel)

kernel = gkern(20, 5)

In [None]:
plt.style.use('default')
# select a blue color map
heatmap = plt.pcolor(kernel, cmap=plt.cm.Blues)
# plot the legend on the side
plt.colorbar(heatmap)

In [None]:
hm = plt.imshow(kernel, cmap='hot')
plt.colorbar(hm)

In [None]:
# a sequential yellow-green color map from Color Brewer
heatmap = plt.pcolor(kernel, cmap=plt.cm.YlGn)
plt.colorbar(heatmap)
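
The difference in origin between `pcolor` and `imshow` is easy to miss with a symmetric kernel. The sketch below uses a small asymmetric matrix (made up for illustration) and checks the direction of the y-axis with `yaxis_inverted()`; passing `origin="lower"` to `imshow` is one way to make both functions agree.

```python
import numpy as np
import matplotlib.pyplot as plt

# row 0 holds the smallest values, so the orientation is visible
matrix = np.arange(12).reshape(3, 4)

fig, (ax_pcolor, ax_imshow, ax_lower) = plt.subplots(1, 3, figsize=(12, 3))
ax_pcolor.pcolor(matrix, cmap="Blues")                 # row 0 at the bottom
ax_imshow.imshow(matrix, cmap="Blues")                 # row 0 at the top
ax_lower.imshow(matrix, cmap="Blues", origin="lower")  # row 0 at the bottom again

for name, ax in [("pcolor", ax_pcolor), ("imshow", ax_imshow), ("imshow lower", ax_lower)]:
    print(name, "y-axis inverted:", ax.yaxis_inverted())
```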

### Styling

Matplotlib has [different styles](https://matplotlib.org/devdocs/gallery/style_sheets/style_sheets_reference.html) that we can apply globally.

Here are a couple of examples:

In [None]:
# ggplot style based on the popular R plotting library 
plt.style.use('ggplot')
sample_figures()

In [None]:
# style based on the seaborn library
# (named 'seaborn-v0_8' since Matplotlib 3.6; older versions use 'seaborn')
plt.style.use('seaborn-v0_8')
sample_figures()

In [None]:
plt.style.use('grayscale')
sample_figures()
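
Since `plt.style.use` changes the style globally, every later plot in the notebook is affected. If a style is only wanted for one figure, `plt.style.context` can be used as a context manager instead. The following is a minimal sketch; it reads the `axes.grid` setting only to show that the change is temporary.

```python
import matplotlib.pyplot as plt

# list the style names that this Matplotlib installation knows about
print(plt.style.available)

before = plt.rcParams["axes.grid"]
with plt.style.context("ggplot"):
    # inside the block the ggplot settings are active
    inside = plt.rcParams["axes.grid"]
    fig, ax = plt.subplots()
    ax.plot([1, 1, 2, 3, 5])
# after the block the previous settings are restored
after = plt.rcParams["axes.grid"]

print(before, inside, after)
```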

## Plotting with Pandas

Pandas has good [built-in plotting capabilities](http://pandas.pydata.org/pandas-docs/version/0.15.0/visualization.html). We've seen some already in previous lectures and in the homeworks.

We're going to use the movies dataset to demonstrate plots and start off by reproducing some of the work you did for your homework: 

In [None]:
plt.style.use('ggplot')
pd_movies = pd.read_csv('movies.csv')
pd_movies.head()

### Line Chart

In [None]:
# subset to major movies
major_movies = pd_movies[pd_movies['votes'] >= 500]
# show yearly number of movies
yearly_movies = major_movies["year"].value_counts().sort_index()

In [None]:
# calling plot() without a kind also works, since a line chart is the default: 
# yearly_movies.plot()
yearly_movies.plot.line()
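
The `value_counts().sort_index()` step matters for the line chart: `value_counts` orders its result by frequency, while a line over time needs the years in order. A small sketch with a made-up stand-in for the movies table (the titles, years, and vote counts are hypothetical):

```python
import pandas as pd

# hypothetical toy data standing in for movies.csv
toy_movies = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E", "F"],
    "year": [2001, 1999, 2001, 2000, 2001, 1999],
    "votes": [900, 120, 650, 5000, 700, 800],
})

major = toy_movies[toy_movies["votes"] >= 500]

# value_counts sorts by count, most frequent year first
by_frequency = major["year"].value_counts()
# sort_index restores chronological order for plotting
by_year = by_frequency.sort_index()

print(by_frequency)
print(by_year)
```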

### Histogram

This is the right way to do this: 

In [None]:
major_movies["rating"].plot.hist()

But there are some legacy methods: 

In [None]:
major_movies.hist(column="rating")

Let's load a dataset with multiple dimensions on the same scale, and plot it as histograms. 

In [None]:
penguins = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv')

# Clean dataset and put it in more similar size units for the following plots only
penguins = penguins.dropna()
penguins['body_mass_100g'] = penguins['body_mass_g'] * 0.01
if 'body_mass_g' in penguins.columns:
    penguins = penguins.drop(['body_mass_g'], axis=1)
penguins


In [None]:
penguins.plot.hist(alpha=0.7)

We can also plot KDEs: 

In [None]:
major_movies["rating"].plot.kde()

In [None]:
penguins.plot.kde()

### Bar Chart

We'll show a bar chart for the first 10 movies

In [None]:
subset = major_movies.set_index("title")
subset = subset.iloc[0:10]
subset

In [None]:
subset["rating"].plot(kind="bar", title="Select Movies")

We can create grouped bar charts. The values should be on the same scale.

In [None]:
pulse = pd.DataFrame({
        "BP1":[140, 120, 110],
        "BP2":[150, 130, 110]
    })
pulse.index = ["Robin", "Katie", "Chuck"]

pulse.plot(kind="barh")

Equally, we can create stacked bar charts: 

In [None]:
pulse.plot(kind="barh", stacked=True)
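
In a stacked bar chart each segment starts where the previous one ends, so the total bar length is the sum of the values. The sketch below rebuilds the `pulse` dataframe and reads the bar geometry back from the returned Matplotlib axes; relying on `ax.containers` to hold one group of bars per column is an assumption about how pandas draws the chart.

```python
import pandas as pd

pulse = pd.DataFrame(
    {"BP1": [140, 120, 110], "BP2": [150, 130, 110]},
    index=["Robin", "Katie", "Chuck"],
)

# pandas returns the Matplotlib axes it drew on
ax = pulse.plot(kind="barh", stacked=True)

# second group of bars (BP2): each starts at the end of the BP1 bar
bp2_bars = ax.containers[1]
bp2_starts = [bar.get_x() for bar in bp2_bars]
bar_ends = [bar.get_x() + bar.get_width() for bar in bp2_bars]

print(bp2_starts)
print(bar_ends)
```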

### Scatterplot

We can plot a scatterplot, comparing ratings of movies over time:

In [None]:
# plot.scatter creates its own figure; figsize sets its size in inches
major_movies.plot.scatter("year", "rating", figsize=(15, 10))

However, here we might overplot some points in more recent years. We can fix that with an alpha value:

In [None]:
major_movies.plot.scatter("year", "rating", figsize=(15, 10), alpha=0.4)

### Box Plot

Let's plot a box plot of the ratings

In [None]:
major_movies["rating"].plot.box()

We can also create boxplots for the data grouped by another column. Here, we create a rating box plot for each year: 

In [None]:
#this is styling information
flierprops = dict(marker='o', markerfacecolor='steelblue', markersize=2,
                  linestyle='none')
major_movies.boxplot(column=["rating"], by=["year"], rot=90, figsize=(15,10), flierprops=flierprops)

## Scatterplot Matrix 

We frequently will look at high-dimensional datasets. We can do that conveniently with a scatterplot matrix:

In [None]:
# import the scatter_matrix functionality
from pandas.plotting import scatter_matrix

In [None]:
scatter_matrix(major_movies[["year", "length", "rating"]], alpha = 0.2, figsize=(10, 10))
# this suppresses the output of the scatter matrix
print()

Here the cells with the same variables in columns and rows are shown as histograms. We can also use KDEs instead: 

In [None]:
scatter_matrix(major_movies[["year", "length", "rating"]], diagonal="kde", alpha = 0.2, figsize=(10, 10))
print()
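
The text output that the `print()` call hides is the return value of `scatter_matrix`: a NumPy array with one Matplotlib subplot per pair of columns. A small sketch with random stand-in data (the column names mirror the movies example, the values are made up) shows that assigning the result has the same effect and keeps the subplots available for later tweaks.

```python
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# random stand-in data with the same column names as the movies subset
rng = np.random.default_rng(1)
toy = pd.DataFrame(rng.normal(size=(50, 3)), columns=["year", "length", "rating"])

# assigning the result keeps the notebook from echoing the array of subplots
axes = scatter_matrix(toy, alpha=0.2, figsize=(6, 6))

print(type(axes), axes.shape)
```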

We can also use a categorical label to color code a value. To do that, we have to create a series of the length of the dataset that defines the color for each row: 

In [None]:
color_list=penguins["species"].map({"Chinstrap":"#ca0020", "Gentoo":"#0571b0", "Adelie":"#5e3c99"})
color_list.head()

In [None]:
scatter_matrix(penguins, color=color_list, figsize=(10,10))
print()
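
The `map` call used for `color_list` looks up each row's category in the dictionary. One thing to watch for is that categories missing from the dictionary become `NaN`, which Matplotlib cannot interpret as a color. The sketch below uses a tiny made-up series with a hypothetical extra category ("Emperor") and a gray fallback color chosen for illustration.

```python
import pandas as pd

species = pd.Series(["Adelie", "Gentoo", "Chinstrap", "Emperor"])
palette = {"Chinstrap": "#ca0020", "Gentoo": "#0571b0", "Adelie": "#5e3c99"}

# one color per row; unknown categories map to NaN
colors = species.map(palette)
missing = colors.isna()

# replace unmapped entries with a neutral fallback color
colors_filled = colors.fillna("#999999")

print(colors.tolist())
print(colors_filled.tolist())
```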

# Seaborn

We often have a chart in mind, and we don't have to be picky about how to achieve it in Python. [Seaborn](https://seaborn.pydata.org/introduction.html) is a good library that has a lot of very useful advanced plots built in that are tricky to re-create with Matplotlib. 


We won't cover all the plots here, just give you a flavor of what's possible. 

In [None]:
# Import seaborn
import seaborn as sns

# Apply the default theme
sns.set_theme()

# Load an example dataset into a pandas dataframe. 
tips = sns.load_dataset("tips")
tips

We start off with a scatterplot, which Seaborn calls (oddly?) a ["relational plot"](https://seaborn.pydata.org/generated/seaborn.relplot.html#seaborn.relplot).

In [None]:
# Create a visualization
sns.relplot(
    data=tips,
    x="total_bill", y="tip",
    hue="smoker", size="size",
)

We can facet the plots by a categorical column in the dataset:

In [None]:
# Also try faceting by "time"
sns.relplot(
    data=tips,
    x="total_bill", y="tip", col="day",
    style="smoker", hue="smoker")
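
Faceting returns a Seaborn `FacetGrid` that holds one Matplotlib subplot per category. The sketch below uses a small made-up table instead of the `tips` dataset (so it runs without a download) and inspects the grid that `relplot` returns.

```python
import pandas as pd
import seaborn as sns

# hypothetical miniature version of the tips data
toy_tips = pd.DataFrame({
    "total_bill": [10.0, 20.0, 15.0, 30.0, 25.0, 12.0],
    "tip": [1.5, 3.0, 2.0, 5.0, 4.0, 2.0],
    "day": ["Sat", "Sat", "Sat", "Sun", "Sun", "Sun"],
    "smoker": ["Yes", "No", "No", "Yes", "No", "Yes"],
})

# col="day" creates one panel per distinct value in the day column
grid = sns.relplot(data=toy_tips, x="total_bill", y="tip", col="day", hue="smoker")

panel_titles = [ax.get_title() for ax in grid.axes.flat]
print(grid.axes.shape)
print(panel_titles)
```

The individual subplots in `grid.axes` are regular Matplotlib objects, so the styling calls from the Matplotlib section apply to them as well.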

Very useful are [categorical plots](https://seaborn.pydata.org/generated/seaborn.catplot.html#seaborn.catplot). Here's a strip plot:

In [None]:
sns.catplot(data=tips, kind="strip", x="day", y="total_bill", hue="smoker", dodge=True)

And a "Beeswarm" plot, that avoids overlaps:

In [None]:
sns.catplot(data=tips, kind="swarm", x="day", y="total_bill", hue="smoker", dodge=True)

And a violin plot that is split by a second category, showing one half of the violin for each value: 

In [None]:
sns.catplot(data=tips, kind="violin", x="day", y="total_bill", hue="smoker", split=True)

For datasets with categories, we can use "joint plots" to get an overview of the distributions and their relationships: 

In [None]:
penguins = sns.load_dataset("penguins")
sns.jointplot(data=penguins, x="flipper_length_mm", y="bill_length_mm", hue="species")

And we can create [scatterplot matrices (pairplots in Seaborn)](https://seaborn.pydata.org/generated/seaborn.pairplot.html#seaborn.pairplot) to show relationships:

In [None]:
sns.pairplot(data=penguins, hue="species")

# Altair

[Altair](https://altair-viz.github.io/) is a modern plotting library based on Vega. We're not going to go into details, but we've compiled a couple of examples below. 

In contrast to Seaborn, it's not based on Matplotlib but renders directly to the web browser. 

You will likely have to install Altair: 

    $ pip install altair vega_datasets

In [None]:
import altair as alt

In [None]:
movies = pd.read_csv('http://vcg.github.io/upset/data/movies/movies.csv', sep=';')
movies.head()

movies_genre = movies.copy(deep=True)
movies_genre['Genre'] = movies.loc[:,'Action':'Western'].idxmax(1)
movies_genre = movies_genre[['Name', 'Genre', 'ReleaseDate', 'AvgRating', 'Watches']]

movies_genre.head()

## Basic Charts

In [None]:
alt.Chart(movies_genre).mark_point()

The above chart is not really useful, since it just shows all the movies on top of each other. Each row in the dataset is a `mark`.

To make this into a useful chart, we have to encode the columns.

We will encode `AvgRating` along the x-axis.

In [None]:
alt.Chart(movies_genre).mark_point().encode(
  x='AvgRating'
)

We can also use marks other than `point`.

There are a number of available marks that you can use; some of the more common are the following:

* ``mark_point()`` 
* ``mark_circle()``
* ``mark_square()``
* ``mark_line()``
* ``mark_area()``
* ``mark_bar()``
* ``mark_tick()``

You can get a complete list of ``mark_*`` methods using Jupyter's tab-completion feature: in any cell just type:

    alt.Chart.mark_
    
Maybe a tick instead of a point is more appropriate here:

In [None]:
alt.Chart(movies_genre).mark_tick().encode(
  x='AvgRating'
)

### Encodings

The next step is to add *visual encoding channels* (or *encodings* for short) to the chart. An encoding channel specifies how a given data column should be mapped onto the visual properties of the visualization.
Some of the more frequently used visual encodings are listed here:

* ``x``: x-axis value
* ``y``: y-axis value
* ``color``: color of the mark
* ``opacity``: transparency/opacity of the mark
* ``shape``: shape of the mark
* ``size``: size of the mark
* ``row``: row within a grid of facet plots
* ``column``: column within a grid of facet plots

For a complete list of these encodings, see the [Encodings](https://altair-viz.github.io/user_guide/encoding.html) section of the documentation.

Visual encodings can be created with the `encode()` method of the `Chart` object. 

In [None]:
alt.Chart(movies_genre).mark_point().encode(
  y='AvgRating', size="Watches"
)

One of the central ideas of Altair is that the library will **choose good defaults for your data type**.

The basic data types supported by Altair are as follows:

<table>
  <tr>
    <th>Data Type</th>
    <th>Code</th>
    <th>Description</th>
  </tr>
  <tr>
    <td>quantitative</td>
    <td>Q</td>
    <td>Numerical quantity (real-valued)</td>
  </tr>
  <tr>
    <td>nominal</td>
    <td>N</td>
    <td>Name / Unordered categorical</td>
  </tr>
  <tr>
    <td>ordinal</td>
    <td>O</td>
    <td>Ordered categorical</td>
  </tr>
  <tr>
    <td>temporal</td>
    <td>T</td>
    <td>Date/time</td>
  </tr>
</table>

When you specify data as a pandas dataframe, these types are **automatically determined** by Altair.

We can encode another variable along the y-axis to turn the chart into a scatter plot.

Let us plot `ReleaseDate` on the x-axis and `AvgRating` on the y-axis.

In [None]:
alt.Chart(movies_genre).mark_point().encode(
  x = 'ReleaseDate',
  y = 'AvgRating'
)

Having an axis start from 0 does not always make sense, so we can turn off this behaviour.

We will use [`altair.Scale`](https://altair-viz.github.io/user_guide/generated/core/altair.Scale.html), [`altair.X`](https://altair-viz.github.io/user_guide/generated/channels/altair.X.html), and [`altair.Y`](https://altair-viz.github.io/user_guide/generated/channels/altair.Y.html).

In [None]:
alt.Chart(movies_genre).mark_point().encode(
  alt.X('ReleaseDate', scale = alt.Scale(zero = False)),
  alt.Y('AvgRating', scale = alt.Scale(zero = False))
)

We can replace this `point` mark with `circle`

In [None]:
alt.Chart(movies_genre).mark_circle().encode(
  alt.X('ReleaseDate', scale = alt.Scale(zero = False)),
  alt.Y('AvgRating', scale = alt.Scale(zero = False))
)

We can encode another variable as color of the marks. We will use `Children`, `Horror`,  and `Documentary` genres only.

In [None]:
select_genres = movies_genre[movies_genre['Genre'].isin(['Children', 'Horror', 'Documentary'])]

alt.Chart(select_genres).mark_circle().encode(
  alt.X('ReleaseDate', scale = alt.Scale(zero = False)),
  alt.Y('AvgRating', scale = alt.Scale(zero = False)),
  color='Genre'
)

We can also color using a continuous variable; let us try `Watches`.

In [None]:
alt.Chart(select_genres).mark_circle().encode(
  alt.X('ReleaseDate', scale = alt.Scale(zero = False)),
  alt.Y('AvgRating', scale = alt.Scale(zero = False)),
  color='Watches'
)

In both cases, Altair automatically selects a suitable color map: a categorical one for `Genre` and a continuous one for `Watches`.

## Binning and Aggregation

We can bin our data and create histograms.

Unlike Matplotlib, Altair does not have a dedicated histogram function. Instead, we bin the data in the x encoding and count the rows in each bin.

We use `alt.X()` for the x encoding and `count()`  for y.

We will also change the `mark` type to `bar`.

In [None]:
alt.Chart(movies_genre).mark_bar().encode(
  x = alt.X('AvgRating', bin=True),
  y = 'count()'
)

We can control the bins using `altair.Bin`

In [None]:
alt.Chart(movies_genre).mark_bar().encode(
  x = alt.X('AvgRating', bin=alt.Bin(maxbins = 5)),
  y = 'count()',
)

The data will be automatically grouped within each bin, if we apply another encoding e.g. `color`.

In [None]:
alt.Chart(select_genres).mark_bar().encode(
  x = alt.X('AvgRating', bin=alt.Bin(maxbins = 5)),
  y = 'count()',
  color = 'Genre'
)

We can make a separate plot for each category if we use the `column` encoding.

In [None]:
alt.Chart(select_genres).mark_bar().encode(
  x = alt.X('AvgRating', bin=alt.Bin(maxbins = 5)),
  y = 'count()',
  color = 'Genre',
  column = 'Genre'
)

## Line Chart

We will plot the mean of `AvgRating` for each year.



In [None]:
alt.Chart(movies_genre).mark_line().encode(
  x = 'ReleaseDate',
  y = alt.Y('mean(AvgRating)', scale=alt.Scale(zero=False))
)

## Compound Charts

Altair has a special syntax for compound charts, which combine several charts into one figure. First we define two charts, `hist` and `scatter`:

In [None]:
hist = alt.Chart(select_genres).mark_bar().encode(
  x = 'count()',
  y = 'Genre',
  color = 'Genre'
).properties(
  width = 400,
  height = 100
)

scatter = alt.Chart(select_genres).mark_circle().encode(
  alt.X('ReleaseDate', scale = alt.Scale(zero = False)),
  alt.Y('AvgRating', scale = alt.Scale(zero = False)),
  color='Genre'
)

Then we can call the `vconcat` (vertical concatenation) function:

In [None]:
alt.vconcat(hist, scatter)

Alternatively, we can use the `&` operator, which is overloaded to do `vconcat`: 

In [None]:
hist & scatter

If we want horizontal concatenation, we can do this with `hconcat` or the `|` operator.

In [None]:
scatter | hist 

Finally, we can layer two charts on top of each other, using the `layer()` function or the `+` operator. 

In [None]:
year_hist = alt.Chart(select_genres).mark_bar().encode(
  x = 'ReleaseDate',
  y = 'count()'
).properties(
  width = 400,
  height = 300
)

(year_hist + scatter).resolve_scale(
    y = 'independent'
)

## Interactions

We can add simple interactions using the `interactive` function.

This enables simple interactions like zooming and panning.

In [None]:
alt.Chart(movies_genre).mark_circle().encode(
  alt.X('ReleaseDate', scale = alt.Scale(zero = False)),
  alt.Y('AvgRating', scale = alt.Scale(zero = False))
).interactive()

### Selections

To really look at multi-dimensional data, interactive selections based on brushes are useful. This is where Altair really shines. 

We first have to specify how we want to select something: 

In [None]:
interval = alt.selection_interval()

alt.Chart(select_genres).mark_circle().encode(
  alt.X('ReleaseDate', scale = alt.Scale(zero = False)),
  alt.Y('AvgRating', scale = alt.Scale(zero = False)),
  color='Genre'
).add_params(interval)

Currently this selection does nothing.

We can use conditional highlighting when a selection is made. We use `altair.condition` and specify that elements outside the "interval" are shown in light gray.

In [None]:
interval = alt.selection_interval()

alt.Chart(select_genres).mark_circle().encode(
  alt.X('ReleaseDate', scale = alt.Scale(zero = False)),
  alt.Y('AvgRating', scale = alt.Scale(zero = False)),
  color = alt.condition(interval, 'Genre', alt.value('lightgray'))
).add_params(interval)

The selection API automatically applies to all charts in a compound chart, as long as they have the same selection applied.

Let us see an example with two horizontally concatenated charts.

In [None]:
interval = alt.selection_interval()

base = alt.Chart(select_genres).mark_circle().encode(
  alt.X('ReleaseDate', scale = alt.Scale(zero = False)),
  color = alt.condition(interval, 'Genre', alt.value('lightgray'))
).add_params(interval)

avg_rating = base.encode(y = alt.Y('AvgRating', scale=alt.Scale(zero=False))) 
watches =  base.encode(y = alt.Y('Watches', scale=alt.Scale(zero=False)))

avg_rating | watches 

We can combine compound charts and the selection API to create more complex interactions. Let us combine the above chart with a histogram that shows counts for the selected items only, using a filter with the [`transform_filter()`](https://altair-viz.github.io/user_guide/generated/toplevel/altair.Chart.html?highlight=transform_filter#altair.Chart.transform_filter) function.

In [None]:
hist = alt.Chart(select_genres).mark_bar().encode(
  x = 'count()',
  y = 'Genre',
  color = 'Genre'
).properties(
  width = 800,
  height = 100
).transform_filter(
  interval
)

scatter = avg_rating | watches

scatter & hist

Here is a similar example with a filtered scatterplot: 

In [None]:
watches_filtered = alt.Chart(select_genres).mark_circle().encode(
  alt.X('ReleaseDate', scale = alt.Scale(zero = False)),
  alt.Y('Watches', scale = alt.Scale(zero = False)),
  color = alt.condition(interval, 'Genre', alt.value('lightgray'))
).transform_filter(
  interval
)

avg_rating | watches_filtered

## Maps

Altair can also do maps, based on its `mark_geoshape` mark. See the [source for this example](https://altair-viz.github.io/gallery/choropleth.html#gallery-choropleth).

In [None]:
from vega_datasets import data

counties = alt.topo_feature(data.us_10m.url, 'counties')
source = data.unemployment.url

alt.Chart(counties).mark_geoshape().encode(
    color='rate:Q'
).transform_lookup(
    lookup='id',
    from_=alt.LookupData(source, 'id', ['rate'])
).project(
    type='albersUsa'
).properties(
    width=500,
    height=300
)