## What is a visualization?

A visualization reveals patterns in data by mapping properties of the data onto visual properties that our brains can interpret, for example:

* A geometry or glyph type (scatter, bar, pie, etc.)
* Scales/axes (x and y)
* Colors
* Shapes
* Size
* Opacity

In this section we will explore how to use pandas `.plot` and `.hvplot` to plot data directly from a DataFrame. The difference between the two is that `.plot` uses Matplotlib and produces a static image, while `.hvplot` uses HoloViews and Bokeh in the background to produce interactive plots.

<img src="https://hvplot.pyviz.org/assets/hvplot-wm.png" width=150px></img>

hvPlot is a library that we developed at Anaconda for quick visual exploration of datasets through an API that users are already familiar with.

We will start by importing pandas and NumPy and creating a simple dataset containing a few time series:

In [None]:
import pandas as pd
import numpy as np

idx = pd.date_range('1/1/2000', periods=1000)
df  = pd.DataFrame(np.random.randn(1000, 4), index=idx, columns=list('ABCD')).cumsum()

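Next we will initialize both the Matplotlib and hvPlot plotting backends. Importing `hvplot.pandas` adds the `.hvplot` accessor to pandas objects, and `%matplotlib inline` renders Matplotlib figures inside the notebook: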

In [None]:
import hvplot.pandas

%matplotlib inline

### Plotting

To plot this DataFrame we can now simply call `.plot`, which will give us a static Matplotlib plot:

In [None]:
df.plot();

Alternatively, we can call `.hvplot`, giving us a fully interactive plot with hover, interactive legends, zooming, panning, etc.:

In [None]:
df.hvplot()

Several plot types are available through the `.hvplot.<type>()` methods. See the [hvPlot reference gallery](https://hvplot.pyviz.org/reference/index.html) for an overview.

### Customizing the plot

hvPlot tries to choose sensible defaults for plots; however, you can adjust the `width` and `height` of a plot by passing them as options. You can also make the plot responsive so that it resizes to fit the available space:

In [None]:
df.hvplot(responsive=True, min_height=400)

In addition to the options shown when you type `df.hvplot.<type>(` and press Shift+Tab, the full list of options is available here: https://hvplot.pyviz.org/user_guide/Customization.html

### Plotting single variables

To plot just a single column we can declare both the x- and the y-column:

In [None]:
df.hvplot.line('index', 'A')

Now plot column 'C' in the same way:

## Plotting histograms

Now let us return to the mpg dataset and explore generating a few different plot types for this data:

In [None]:
mpg = pd.read_csv('auto-mpg.csv')

A histogram is a very useful tool for understanding the distribution of a dataset. By selecting a column, we can generate a histogram for that column:

In [None]:
mpg.mpg.hvplot.hist()

Now plot a histogram of the horsepower (hp) column:

## Plotting scatter plots

Scatter plots allow us to see the relationship between two variables:

In [None]:
mpg.hvplot.scatter('hp', 'mpg')

Now plot the relationship between 'weight' and 'mpg' values:

We can also color by the origin to see how this third variable interacts with the other two:

In [None]:
mpg.hvplot.scatter('hp', 'mpg', color='origin')

## Plotting bars

Bar graphs can be very useful for visualizing statistics for a (relatively small) number of categorical values, e.g. by computing the mean 'mpg' for each 'origin':

In [None]:
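# numeric_only=True skips text columns, which recent pandas versions refuse to average
mpg.groupby('origin').mean(numeric_only=True).hvplot.bar('origin', 'mpg')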
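
The aggregation step can be inspected on its own before plotting. The sketch below uses a tiny made-up table (`toy`, with invented values) in place of the real dataset; `numeric_only=True` is passed so that text columns such as a car name are skipped rather than averaged:

```python
import pandas as pd

# Hypothetical toy data, invented purely for illustration
toy = pd.DataFrame({
    'origin': ['US', 'US', 'Europe', 'Asia', 'Asia'],
    'mpg':    [18.0, 22.0, 30.0, 32.0, 36.0],
    'name':   ['a', 'b', 'c', 'd', 'e'],
})

# One row per origin, holding the mean of every numeric column
mean_by_origin = toy.groupby('origin').mean(numeric_only=True)
print(mean_by_origin)
```

The result has one row per group, with `origin` as its index, which is the table that the bar plot then draws.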

Plot a bar graph of the mean miles per gallon (mpg) values per year:

## Plotting subsets

To explore subsetting we will load a second dataset containing the population by year and by country:

In [None]:
pop = pd.read_csv('world_ind_pop_data.csv')

pop.head()

Next we will select the 5 most populous countries:

In [None]:
most_populous = ['CHN', 'IND', 'USA', 'IDN', 'BRA']
populous_df = pop[pop.CountryCode.isin(most_populous)]
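
The country codes above are written out by hand. As a hedged alternative, such a list could be derived from the data itself. The sketch below uses a small made-up frame (`toy_pop`, with invented codes and figures) that mimics the `CountryCode`, `Year`, and `Total Population` columns, and it assumes that aggregate rows such as regions or income groups have already been removed:

```python
import pandas as pd

# Hypothetical toy data, invented purely for illustration
toy_pop = pd.DataFrame({
    'CountryCode': ['AAA', 'BBB', 'CCC', 'AAA', 'BBB', 'CCC'],
    'Year': [2000, 2000, 2000, 2010, 2010, 2010],
    'Total Population': [50, 120, 80, 60, 150, 70],
})

# Rank countries by population in the most recent year
latest = toy_pop[toy_pop['Year'] == toy_pop['Year'].max()]
top_codes = latest.nlargest(2, 'Total Population')['CountryCode'].tolist()

# Keep every year for the selected countries
subset = toy_pop[toy_pop['CountryCode'].isin(top_codes)]
```

Ranking on a single year keeps the selection stable, while the final `isin` filter retains the full time series for the chosen countries.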

Then we plot the 'Total Population' ``by`` country:

In [None]:
populous_df.hvplot('Year', 'Total Population', by='CountryName')

Here the `by` variable overlays the line plot for each country. In a bar chart, on the other hand, it groups the bars:

In [None]:
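# numeric_only=True skips text columns, which recent pandas versions refuse to average
mpg.groupby(['yr', 'origin']).mean(numeric_only=True).hvplot.bar('yr', 'mpg', by='origin', rot=90)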

If we want to plot multiple `y` variables, we can also generate a separate plot for each of them by leaving ``y`` unspecified and using the ``subplots`` option:

In [None]:
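# numeric_only=True skips text columns, which recent pandas versions refuse to average
mpg.groupby('origin').mean(numeric_only=True).hvplot.bar('origin', subplots=True, shared_axes=False)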

Another option for subsetting data is the ``groupby`` argument. Instead of overlaying or subplotting, it generates widgets for selecting between the different values of the specified column(s):

In [None]:
income_groups = ['Low income', 'Middle income', 'High income']
income_df = pop[pop.CountryName.isin(income_groups)]

income_df.hvplot.bar('CountryName', 'Urban population (% of total)', groupby='Year',
                     ylim=(0, 100)).redim.values(CountryName=income_groups)

Create a bar plot of the 'Urban population (% of total)' by 'Year' with a widget to select between 'CountryName':

## Adding widgets

Another major strength of hvPlot compared with the standard `.plot` API is its integration with Panel widgets. By creating a set of widgets and passing them in place of fixed values, we can quickly create a GUI for exploring a dataset:

In [None]:
import panel as pn

x = pn.widgets.Select(options=list(mpg.columns), value='mpg', name='x')
y = pn.widgets.Select(options=list(mpg.columns), value='hp', name='y')
color = pn.widgets.Select(options=list(mpg.columns), value='origin', name='color')

pn.Row(
    pn.Column(x, y, color),
    mpg.hvplot.scatter(x, y, color=color)
)