<style>div.container { width: 100% }</style>
<img style="float:left;  vertical-align:text-bottom;" height="65" width="172" src="https://raw.githubusercontent.com/holoviz/holoviz/master/doc/_static/holoviz-logo-unstacked.svg" />
<div style="float:right; vertical-align:text-bottom;"><h2>Tutorial 2. Plotting</h2></div>

When trying to make sense of data, there are many representations to choose from, including data tables, textual summaries and so on. We'll mostly focus on plotting data to get an intuitive visual representation, using a simple but powerful plotting API.

If you have tried to visualize a `pandas.DataFrame` before, then you have likely encountered the [Pandas .plot() API](https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html). These plotting commands use [Matplotlib](http://matplotlib.org) to render static PNGs or SVGs in a Jupyter notebook using the `inline` backend, or interactive figures via `%matplotlib widget`, with a command that can be as simple as `df.plot()` for a DataFrame with one or two columns. 

The Pandas .plot() API has emerged as a de-facto standard for high-level plotting APIs in Python, and is now supported by many different libraries that use various underlying plotting engines to provide additional power and flexibility. Learning this API allows you to access capabilities provided by a wide variety of underlying tools, with relatively little additional effort. The libraries currently supporting this API include:

- [Pandas](https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html) -- Matplotlib-based API included with Pandas. Static or interactive output in Jupyter notebooks.
- [xarray](https://xarray.pydata.org/en/stable/plotting.html) -- Matplotlib-based API included with xarray, based on pandas .plot API. Static or interactive output in Jupyter notebooks.
- [hvPlot](https://hvplot.pyviz.org) -- HoloViews and Bokeh-based interactive plots for Pandas, GeoPandas, xarray, Dask, Intake, and Streamz data.
- [Pandas Bokeh](https://github.com/PatrikHlobil/Pandas-Bokeh) -- Bokeh-based interactive plots, for Pandas, GeoPandas, and PySpark data.
- [Cufflinks](https://github.com/santosjorge/cufflinks) -- Plotly-based interactive plots for Pandas data.
- [Plotly Express](https://plotly.com/python/pandas-backend) -- Plotly-Express-based interactive plots for Pandas data; only partial support for the .plot API keywords.
- [PdVega](https://altair-viz.github.io/pdvega) -- Vega-lite-based, JSON-encoded interactive plots for Pandas data.

In this notebook we'll explore what is possible with the default `.plot` API and demonstrate the additional capabilities provided by `.hvplot`. 

### Import and configure packages

Please note that in **Colab** you will need to `!pip install panel hvplot`.

In [None]:
# !pip install panel==0.12.6 hvplot==0.7.3

In [None]:
import hvplot.pandas
import panel as pn

### Read in the data

Here we will focus on Pandas, but a similar approach works for any supported DataFrame type, including Dask for distributed computing or RAPIDS cuDF for GPU computing. This dataset (the Auto MPG car data bundled with Bokeh's sample data) is small, only a few hundred rows, so it fits easily into memory and doesn't need out-of-core or distributed approaches like those Dask provides.

In [None]:
from bokeh.sampledata.autompg import autompg_clean as df
# you may also read in data from a local file, for example using
# import pandas as pd
# df = pd.read_csv('data.csv')
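If your data lives in a CSV file, `pd.read_csv` returns a DataFrame that works with every command in this tutorial. The sketch below reads from an in-memory string via `io.StringIO`, so it runs without a file. The three rows are made-up stand-ins, not the Auto MPG data, and the name `df_csv` is chosen so that it does not overwrite `df`.

```python
import io
import pandas as pd

# Stand-in for a local 'data.csv'; these rows are made up for illustration
csv_text = """mpg,hp,origin
18.0,130,North America
26.0,46,Europe
31.0,65,Asia
"""

df_csv = pd.read_csv(io.StringIO(csv_text))  # read_csv also accepts a file path
print(len(df_csv))        # number of rows
print(df_csv.head())      # first rows
print(df_csv.describe())  # summary statistics for the numeric columns only
```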

In [None]:
len(df)

In [None]:
df.head()

In [None]:
df.tail()

In [None]:
df.describe()

### Using Pandas `.plot()`

The first thing we'd like to do with this data is visualize two of its features, mpg (miles per gallon) and hp (horsepower), as a scatter plot with mpg on the _x_ axis and hp on the _y_ axis.

We can do that using the Pandas `.plot()` API and Matplotlib:

In [None]:
df.plot.scatter(x='mpg', y='hp')
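The Pandas API renders through Matplotlib, so the call returns a Matplotlib `Axes` object. You can continue customizing that object or save it as a static PNG. The following is a small sketch on made-up data. The title text and the `mpg_hp.png` filename are illustrative choices.

```python
import pandas as pd

# Small made-up DataFrame for illustration
small = pd.DataFrame({'mpg': [18.0, 15.0, 36.1, 27.0], 'hp': [130, 165, 66, 88]})

ax = small.plot.scatter(x='mpg', y='hp')   # returns a Matplotlib Axes
ax.set_title('Horsepower vs. fuel economy')  # customize it like any Axes
ax.figure.savefig('mpg_hp.png')            # write a static PNG next to the notebook
```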

### Using .hvplot
As you can see above, the Pandas API gives you a usable plot very easily. You can make a very similar plot with the same arguments using hvplot, after importing hvplot.pandas to install hvPlot support into Pandas:

In [None]:
import hvplot.pandas 

In [None]:
df.hvplot.scatter(x='mpg', y='hp')

Unlike the Pandas `.plot()` output, this plot has a hover tool enabled by default that shows the values of each data point. You can also pan and zoom to focus on any region of interest.



#### Exercise

Try changing the x axis on the plot above to see the relationship between weight and hp. 
<details><summary><i><u>(Solution)</u></i></summary><br>
    
```python
df.hvplot.scatter(x='weight', y='hp')
```

</details>

### Getting help with hvplot options

You may be wondering how to learn about all the options available with `hvplot`. You can use tab completion in the Jupyter notebook or the `hvplot.help` function, both of which are documented in the [user guide](https://hvplot.holoviz.org/user_guide/Customization.html).

For tab completion, press tab after the opening parenthesis in an `obj.hvplot.<kind>(` call. For instance, try pressing tab after the partial expression `df.hvplot.scatter(<TAB>`.

Alternatively, you can call `hvplot.help(<kind>)` to display the documentation for that plot kind in the notebook. Try executing the following cell:

In [None]:
hvplot.help('scatter')

You will see there are a lot of options! You can control which sections of the documentation you view with the `generic`, `docstring` and `style` boolean switches, which are also documented in the [user guide](https://hvplot.holoviz.org/user_guide/Customization.html). If you run the following cell, you will see that `alpha` is listed under 'Style options'.

In [None]:
hvplot.help('scatter', style=True, generic=False)

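The next cell defines a small color palette, `PALETTE`, for use in the exercise below, and previews its colors as Panel spacers:

In [None]: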
PALETTE = ["#ff6f69", "#ffcc5c", "#88d8b0", ]
pn.Row(
    pn.layout.HSpacer(height=50, background=PALETTE[0]),
    pn.layout.HSpacer(height=50, background=PALETTE[1]),
    pn.layout.HSpacer(height=50, background=PALETTE[2]),
)

#### Exercise

Try changing the color, width, height, and alpha value of the plot.
<details><summary><i><u>(Solution)</u></i></summary><br>
    
```python
df.hvplot.scatter(
    x='mpg', 
    y='hp',
    color=PALETTE,
    width=600,
    height=300,
    alpha=0.5
)
```

</details>

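Besides scatter plots, hvPlot supports many other plot kinds. For example, a histogram shows the distribution of a single column:

In [None]: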
df.hvplot.hist('mpg', bins=10)
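With `bins=10`, the histogram divides the range of `mpg` into ten equal-width intervals and counts the rows that fall in each interval. The sketch below shows the same idea with NumPy's `np.histogram` on made-up values. It illustrates the binning concept rather than hvPlot's exact internals, so edge handling may differ slightly.

```python
import numpy as np

# Made-up mpg values for illustration
mpg = np.array([9.0, 15.0, 18.0, 18.0, 22.5, 26.0, 31.0, 36.1, 44.0, 46.6])

# Ten equal-width bins spanning the data's minimum to maximum
counts, edges = np.histogram(mpg, bins=10)
print(counts)  # number of values falling in each bin
print(edges)   # 11 edges bounding the 10 bins
```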

#### Exercise

Create a kernel density estimate (kde) plot of mpg for `df`:
<details><summary><i><u>(Solution)</u></i></summary><br>

```python
df.hvplot.kde('mpg')
```

</details>

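When many points overlap, a hexagonal binning (`hexbin`) plot can reveal their density better than a scatter plot:

In [None]: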
df.hvplot.hexbin(x='mpg', y='hp')

### Grouping

We can overlay the groups defined by a column, here `origin`, on the same plot using the `by` option, which gives each group its own color:

In [None]:
df.hvplot.scatter(x='mpg', y='hp', by='origin')

#### Exercise

Add `subplots=True` and `width=300` to see the different groups side by side instead of overlaid. The axes are linked, so try zooming in on one of the plots.

<details><summary><i><u>(Solution)</u></i></summary><br>
    
```python
df.hvplot.scatter(
    x='mpg',
    y='hp',
    by='origin',
    subplots=True,
    width=300,
)
```

</details>

What if you want a single plot but want to see each group separately? You can use the `groupby` option to get a widget for switching between groups:

In [None]:
df.hvplot.scatter(x='mpg', y='hp', groupby='origin')

### Exploring further

As you can see, hvPlot makes it simple to explore your data interactively, with commands based on the widely used Pandas `.plot()` API but supporting many more features and data types. The visualizations above only scratch the surface of what hvPlot offers. You can explore the [hvPlot website](https://hvplot.pyviz.org) to see much more, or explore it yourself using tab completion (`df.hvplot.`_[TAB]_). The following section focuses on how to combine these plots once you have them and link them together to understand and show the structure of your data.