<style>div.container { width: 100% }</style>
<img style="float:left;  vertical-align:text-bottom;" height="65" width="172" src="https://raw.githubusercontent.com/holoviz/holoviz/master/doc/_static/holoviz-logo-unstacked.svg" />
<div style="float:right; vertical-align:text-bottom;"><h2>Tutorial 2. Plotting</h2></div>

When trying to make sense of data, there are many representations to choose from, including data tables, textual summaries and so on. We'll mostly focus on plotting data to get an intuitive visual representation, using a simple but powerful plotting API.

If you have tried to visualize a `pandas.DataFrame` before, then you have likely encountered the [Pandas .plot() API](https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html). These plotting commands use [Matplotlib](http://matplotlib.org) to render static PNGs or SVGs in a Jupyter notebook using the `inline` backend, or interactive figures via `%matplotlib widget`, with a command that can be as simple as `df.plot()` for a DataFrame with one or two columns. 

The Pandas .plot() API has emerged as a de facto standard for high-level plotting APIs in Python, and is now supported by many different libraries that use various underlying plotting engines to provide additional power and flexibility. Learning this API allows you to access capabilities provided by a wide variety of underlying tools, with relatively little additional effort. The libraries currently supporting this API include:

- [Pandas](https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html) -- Matplotlib-based API included with Pandas. Static or interactive output in Jupyter notebooks.
- [xarray](https://xarray.pydata.org/en/stable/plotting.html) -- Matplotlib-based API included with xarray, based on pandas .plot API. Static or interactive output in Jupyter notebooks.
- [hvPlot](https://hvplot.pyviz.org) -- HoloViews and Bokeh-based interactive plots for Pandas, GeoPandas, xarray, Dask, Intake, and Streamz data.
- [Pandas Bokeh](https://github.com/PatrikHlobil/Pandas-Bokeh) -- Bokeh-based interactive plots, for Pandas, GeoPandas, and PySpark data.
- [Cufflinks](https://github.com/santosjorge/cufflinks) -- Plotly-based interactive plots for Pandas data.
- [Plotly Express](https://plotly.com/python/pandas-backend) -- Plotly-Express-based interactive plots for Pandas data; only partial support for the .plot API keywords.
- [PdVega](https://altair-viz.github.io/pdvega) -- Vega-lite-based, JSON-encoded interactive plots for Pandas data.

In this notebook we'll explore what is possible with the default `.plot` API and demonstrate the additional capabilities provided by `.hvplot`. 

### Import and configure packages

Please note that in **Colab** you will need to run `!pip install panel hvplot`.

In [None]:
if 'google.colab' in str(get_ipython()):
    print('Running on CoLab')
    !pip install panel hvplot

In [None]:
import holoviews as hv
import hvplot.pandas
import panel as pn

You will need to run the following to be able to see plots in **Colab**.

In [None]:
if 'google.colab' in str(get_ipython()):
    def _render(self, **kw):
        hv.extension('bokeh')
        return hv.Store.render(self)
    hv.core.Dimensioned._repr_mimebundle_ = _render

### Read in the data

Here we will focus on Pandas, but a similar approach will work for any supported DataFrame type, including Dask for distributed computing or RAPIDS cuDF for GPU computing. 

In this example, we load a small Auto MPG data set. 

In [None]:
from bokeh.sampledata.autompg import autompg_clean as df
# you may also read in data from a local file, for example using
# df = pd.read_csv('data.csv')

In [None]:
len(df)

In [None]:
df.head()

In [None]:
df.tail()

In [None]:
df.describe()

### Using Pandas `.plot()`

The first thing that we'd like to do with this data is visualize two features of the dataset: mpg and hp. So we would like to make a scatter or points plot where _x_ is mpg and _y_ is hp. 

We can do that using the Pandas `.plot()` API and Matplotlib:

In [None]:
df.plot.scatter(x='mpg', y='hp');

In [None]:
%matplotlib widget
df.plot.scatter(x='mpg', y='hp');

In [None]:
%matplotlib inline
df.plot.scatter(x='mpg', y='hp');

### Using .hvplot
As you can see above, the Pandas API gives you a usable plot very easily. You can make a very similar plot with the same arguments using hvplot, after importing hvplot.pandas to install hvPlot support into Pandas:

In [None]:
import hvplot.pandas 

In [None]:
df.hvplot.scatter(x='mpg', y='hp')

Here, unlike with the Pandas `.plot()` API, there is a default hover action on the data points that shows their values, and you can always pan and zoom to focus on any region of interest in the data.



You can also add hover columns to show additional info when you hover:

In [None]:
# hover_cols must name columns that exist in df ('hp' is already shown as the y value)
df.hvplot.scatter(x='mpg', y='hp', hover_cols=['yr', 'name'])

#### Exercise

Try changing the x axis on the plot above to see the relationship between weight and hp. 

<details><summary><i><u>(Solution)</u></i></summary><br>

<details><summary><i><u>(Are you sure you want to see the solution?)</u></i></summary><br>

```python
df.hvplot.scatter(x='weight', y='hp')
```
</details>
    
</details>

#### Exercise

Try creating a line plot to see the relationship between weight and hp. 
<details><summary><i><u>(Solution)</u></i></summary><br>

<details><summary><i><u>(Notice anything wrong with your plot?)</u></i></summary><br>

```python
df.sort_values(by='weight').hvplot.line(x='weight', y='hp')
```
</details>
    
</details>

### Getting help with hvplot options

You may be wondering how you can learn about all the options that are available with `hvplot`. For this purpose, you can use tab completion in the Jupyter notebook or the `hvplot.help` function, both of which are documented in the [user guide](https://hvplot.holoviz.org/user_guide/Customization.html).

For tab completion, you can press tab after the opening parenthesis in an `obj.hvplot.<kind>(` call. For instance, you can try pressing tab after the partial expression `df.hvplot.scatter(<TAB>`.



Alternatively, you can call `hvplot.help(<kind>)` to see a documentation pane pop up in the notebook. Try executing the following line:

In [None]:
hvplot.help('scatter')

You will see there are a lot of options!  You can control which section of the documentation you view with the `generic`, `docstring` and `style` boolean switches also documented in the  [user guide](https://hvplot.holoviz.org/user_guide/Customization.html). If you run the following cell, you will see that `alpha` is listed in the 'Style options'.

In [None]:
hvplot.help('scatter', style=True, generic=False)

These style options refer to options that are part of the Bokeh API. This means that the `alpha` keyword is passed directly to Bokeh just like all the other style options. As these are Bokeh-level options, you can find out more by using the search functionality in the [Bokeh docs](https://docs.bokeh.org/en/latest/).

#### Exercise

Try changing the color, width, height, and alpha value of the plot. 
<details><summary><i><u>(Solution)</u></i></summary><br>

<details><summary><i><u>(You can do this!)</u></i></summary><br>

```python
df.hvplot.scatter(
    x='mpg', 
    y='hp',
    color='pink',
    width=600,
    height=300,
    alpha=0.5
)
```
</details>
    
</details>

### Other kinds of plots

Use tab completion to explore the available plot types.

```
df.hvplot.<TAB>
```

Plot types available include:

- .area(): Plots an area chart similar to a line chart except for filling the area under the curve and optionally stacking

- .bar(): Plots a bar chart that can be stacked or grouped

- .bivariate(): Plots 2D density of a set of points

- .box(): Plots a box-whisker chart comparing the distribution of one or more variables

- .heatmap(): Plots a heatmap to visualize a variable across two independent dimensions

- .hexbin(): Plots hex bins

- .hist(): Plots the distribution of one or more variables as a set of bins

- .kde(): Plots the kernel density estimate of one or more variables.

- .line(): Plots a line chart (such as for a time series)

- .scatter(): Plots a scatter chart comparing two variables

- .step(): Plots a step chart akin to a line plot

- .table(): Generates a SlickGrid DataTable

- .violin(): Plots a violin plot comparing the distribution of one or more variables using the kernel density estimate



In [None]:
# default is line plot
df.sort_values(by='mpg').hvplot(x='mpg', y='hp')

#### Exercise

Create a hexbin plot (hexbin) of the relationship between mpg and hp:
<details><summary><i><u>(Solution)</u></i></summary><br>

<details><summary><i><u>(Are you sure you want to see the solution?)</u></i></summary><br>

```python
df.hvplot.hexbin(x='mpg', y='hp')
```
</details>
</details>

In [None]:
# histogram
df.hvplot.hist('mpg', bins=10)

#### Exercise

Create a kernel density estimate (kde) plot, a box plot, and a violin plot of mpg for `df`:
<details><summary><i><u>(Hint)</u></i></summary><br>
Use kde, box, and violin. 

<details><summary><i><u>(Solution)</u></i></summary><br>

```python
df.hvplot.kde('mpg')
df.hvplot.box('mpg')
df.hvplot.violin('mpg')
```
    
</details>
</details>

#### Exercise

Create a bar plot showing the counts of origin:
<details><summary><i><u>(Solution)</u></i></summary><br>

<details><summary><i><u>(Give yourself one more minute)</u></i></summary><br>


```python
df.origin.value_counts().hvplot.bar()
```
</details>
</details>

#### Exercise

Create a bar plot showing the mean value of mpg by yr and origin:
<details><summary><i><u>(Hint)</u></i></summary><br>

Use .groupby 

<details><summary><i><u>(Solution)</u></i></summary><br>


```python
# Select 'mpg' before averaging so that text columns such as 'name' are never passed to mean()
df.groupby(['yr', 'origin'])['mpg'].mean().hvplot.bar(stacked=True, height=500, legend='top_left')
```
</details>
</details>

### The kind argument 
Instead of using the hvplot namespace to call different types of plots directly (e.g., `df.hvplot.scatter(x='mpg', y='hp')`), we can use the kind argument to the plot call (e.g., `df.hvplot(x='mpg', y='hp', kind='scatter')`). 

In [None]:
df.hvplot(x='mpg', y='hp', kind='scatter')

## Grouping

We can overlay all our groups on the same plot using the `by` option:

In [None]:
df.hvplot.scatter(x='mpg', y='hp', by='origin')

Define a color palette:

In [None]:
PALETTE = ["#ff6f69", "#ffcc5c", "#88d8b0", ]
pn.Row(
    pn.layout.HSpacer(height=50, background=PALETTE[0]),
    pn.layout.HSpacer(height=50, background=PALETTE[1]),
    pn.layout.HSpacer(height=50, background=PALETTE[2]),
)

#### Exercise

Try changing the color using the above color palette.
<details><summary><i><u>(Solution)</u></i></summary><br>
<details><summary><i><u>(Give yourself another minute!)</u></i></summary><br>
    
```python
df.hvplot.scatter(x='mpg', y='hp', by='origin', color=PALETTE)
```
</details>
</details>

#### Exercise

Add `subplots=True` and `width=300` to see the different classes side-by-side instead of overlaid. The axes will be linked, so try zooming.

<details><summary><i><u>(Solution)</u></i></summary><br>
    
```python
df.hvplot.scatter(
    x='mpg', 
    y='hp', 
    by='origin',
    subplots=True,
    width=300
    )
```

</details>

#### Exercise

Add `shared_axes=False` and try zooming.

<details><summary><i><u>(Solution)</u></i></summary><br>
    
```python
df.hvplot.scatter(
    x='mpg', 
    y='hp', 
    by='origin',
    subplots=True,
    width=300,
    shared_axes=False
    )
```

</details>

#### Exercise

Add `.cols(1)` and see what happens.

<details><summary><i><u>(Solution)</u></i></summary><br>
    
```python
df.hvplot.scatter(
    x='mpg', 
    y='hp', 
    by='origin',
    subplots=True,
    width=300,
    shared_axes=False
    ).cols(1)
```

</details>

What if you want a single plot, but want to see each class separately? You can use the `groupby` option to get a widget for toggling between classes, here in a scatter plot:

In [None]:
df.hvplot.scatter(x='mpg', y='hp', groupby='origin')

#### Exercise

Plot the distribution of mpg by origin.

<details><summary><i><u>(Solution)</u></i></summary><br>
    
```python
df.hvplot.hist(y='mpg', groupby='origin')
```

</details>

## Combine plots

We can use a `+` symbol to compose HoloViews objects side-by-side with axes linked for any shared dimensions. 

In [None]:
df.hvplot.hist('mpg', bins=10, width=300)

In [None]:
df.hvplot.scatter(x='mpg', y='hp', width=300)

In [None]:
df.hvplot.hist('mpg', bins=10, width=300) + df.hvplot.scatter(x='mpg', y='hp', width=300)

Organize plots into one column:

In [None]:
(df.hvplot.hist('mpg', bins=10, height=300) + df.hvplot.scatter(x='mpg', y='hp', height=300)).cols(1)

Try zooming in and out (including on the axes) to explore the linking between the plots above.

## Overlay plots

HoloViews objects can be composed into an overlay using a * symbol, with a legend generated to distinguish them. 
As we have seen previously, we can use `by='origin'` to distinguish points by origin using different colors:

In [None]:
df.hvplot.scatter(x='mpg', y='hp', by='origin')

To show how * works, we can plot each region separately and combine them into an overlay using the * symbol:

In [None]:
( 
    df[df.origin=='Asia'].hvplot.scatter(x='mpg', y='hp', by='origin') * 
    df[df.origin=='Europe'].hvplot.scatter(x='mpg', y='hp', by='origin') * 
    df[df.origin=='North America'].hvplot.scatter(x='mpg', y='hp', by='origin') 
)

## Scatter Matrix

When working with multi-dimensional data, it is often difficult to understand the relationship between all the different variables. A scatter_matrix makes it possible to visualize all of the pairwise relationships in a compact format. 

In [None]:
hvplot.scatter_matrix(df, c="origin")

# Interlinked plots

hvPlot allows you to generate a number of different types of plot quickly from a standard API, returning Bokeh-based [HoloViews](https://holoviews.org) objects as discussed in the previous notebook. Each initial plot will make some aspects of the data clear, and using the automatic interactive Bokeh pan, zoom, and hover tools you can find additional trends and outliers at different spatial locations and spatial scales within each plot.

Beyond what you can discover from each plot individually, how do you understand how the various plots relate to each other? For instance, imagine you have a data frame with columns _u_, _v_, _w_, _z_, and have separate plots of _u_ vs. _v_, _u_ vs. _w_, and _w_ vs. _z_. If you see a few outliers or a clump of unusual datapoints in your  _u_ vs. _v_ plot, how can you find out the properties of those points in the _w_ vs. _z_ or other plots? Are those unusual _u_ vs. _v_ points typically high _w_, uniformly distributed along _w_, or some other pattern? 

To help understand multicolumnar and multidimensional datasets like this, scientists will often build complex multi-pane dashboards with custom functionality. HoloViz (and specifically Panel) tools are great for such dashboards, but here we can actually use the fact that hvPlot returns HoloViews objects to get quite sophisticated interlinking ([linked brushing](http://holoviews.org/user_guide/Linked_Brushing.html)) "for free", without needing to build any dashboard. HoloViews objects store metadata about what dimensions they cover, and we can use this metadata programmatically to let the user see how any data points in any plot relate across different plots.

### Linked brushing across elements

Previously, we saw how plot axes are automatically linked for panning and zooming when using the `+` operator, provided the dimensions match. When dimensions or an underlying index match across multiple plots, we can use a similar principle to achieve linked brushing, where user selections are also linked across plots.

To illustrate, let us generate two histograms from our `df` DataFrame:

In [None]:
mpg_hist = df.hvplot.hist('mpg', width=300, height=150)
hp_hist = df.hvplot.hist('hp', width=300, height=150)

These two histograms are plotting two different dimensions of our dataset (mpg and hp). The samples between these two histograms share an index, and the relationships between these data points can be discovered and exploited programmatically even though they are in different elements. To do this, we can create an object for linking selections across elements:

In [None]:
ls = hv.link_selections.instance()

Given some HoloViews objects (elements, layouts, etc.), we can create versions of them linked to this shared linking object by calling `ls` on them:

In [None]:
ls(mpg_hist + hp_hist)

Try using the first Bokeh tool to select areas of either histogram: you'll then see both the mpg and hp distributions for the bins you have selected, compared to the overall distribution. By default, selections on both histograms are combined so that the selection is the intersection of the two regions selected (data points matching _both_ the constraints on mpg and the constraints on hp that you select). You can use the Bokeh reset tool (double arrow) to clear your selection.
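The intersection rule can be mimicked in plain Pandas. This sketch is only an analogy for what the linked selection computes, not its actual implementation: the table `toy` and the two ranges are made up, standing in for `df` and for the regions you would drag out on each histogram.

```python
import pandas as pd

# Made-up values standing in for the tutorial's `df`
toy = pd.DataFrame({
    "mpg": [15.0, 18.0, 24.0, 31.0, 36.0],
    "hp": [150, 130, 95, 68, 58],
})

# Hypothetical ranges, as if selected on the mpg and hp histograms
mpg_range = (17.0, 32.0)
hp_range = (60, 100)

in_mpg = toy["mpg"].between(*mpg_range)  # rows inside the mpg selection
in_hp = toy["hp"].between(*hp_range)     # rows inside the hp selection

# Intersection: keep only rows that satisfy both constraints
selected = toy[in_mpg & in_hp]
```

Here the row with `mpg` 18.0 falls inside the mpg range but is dropped because its `hp` of 130 lies outside the hp range, which is the behavior you see when brushing both histograms.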

### Linked brushing across element types

The previous example linked across two histograms as a first example, but nothing prevents you from linked brushing across different element types. 

In [None]:
points = df.hvplot.scatter(x='mpg', y='hp', c='origin',  height=150)
mpg_hist = df.hvplot.hist('mpg', height=150)
hp_violin = df.hvplot.violin('hp', height=150)

In [None]:
ls2 = hv.link_selections.instance()

(ls2(points + mpg_hist + hp_violin)).cols(1)

## Accessing the data selection

If you pass your `DataFrame` into the `.filter` method of your linked selection object, you can apply the active filter that you specified interactively:

In [None]:
filtered = ls2.filter(df)
filtered

You can then analyze your filtered data:

In [None]:
filtered.describe()

## Exploring further

As you can see, hvPlot makes it simple to explore your data interactively, with commands based on the widely used Pandas `.plot()` API but now supporting many more features and different types of data. The visualizations above just scratch the surface of what is available from hvPlot, and you can explore the [hvPlot website](https://hvplot.pyviz.org) to see much more, or just explore it yourself using tab completion (`df.hvplot.`_[TAB]_). The following section will focus on how to put these plots together once you have them, linking them to understand and show their structure.

#### Reading time

Read the [hvPlot documentation](https://hvplot.pyviz.org) and let us know if you have any questions.