<a href='http://www.holoviews.org'><img src="../notebooks/assets/hv+bk.png" alt="HV+BK logos" width="40%;" align="left"/></a>
<div style="float:right;"><h2>Exercise 2: Datasets and Collections of Data</h2></div>

In [None]:
import numpy as np
import pandas as pd
import holoviews as hv

hv.extension('bokeh')

In [None]:
diamonds = pd.read_csv('../data/diamonds.csv').sample(5000)
macro = pd.read_csv('../data/macro.csv')

### Example 1:

Start by inspecting the ``macro`` dataframe using the ``head`` method to discover which columns it declares.

Now plot the ``growth`` by ``year`` using a ``Scatter`` element.

<b><a href="#solution1" data-toggle="collapse">Solution</a></b>

<div id="solution1" class="collapse">
    <br>
    <code>hv.Scatter(macro, 'year', 'growth')</code>
</div>

Now declare the 'country' and 'unem' columns as additional vdims to the ``Scatter`` object and set ``color_index='country'`` and ``size_index='unem'`` as plot options. 

<b><a href="#hint1" data-toggle="collapse">Hint</a></b>

<div id="hint1" class="collapse">
The ``kdims`` and ``vdims`` arguments accept a single dimension or lists of dimensions.
</div>

<b><a href="#solution2" data-toggle="collapse">Solution</a></b>

<div id="solution2" class="collapse">
<br>
<code>%%opts Scatter [color_index='country' size_index='unem']
hv.Scatter(macro, 'year', ['growth', 'country', 'unem'])</code>
</div>

You should now have a fairly complex plot showing growth by year, where each point is colored by country and sized by unemployment. You will immediately notice several issues with this plot. First identify the issues, then try to address them, using tab-completion in the ``%%opts`` magic to discover the available options.

<b><a href="#hint2" data-toggle="collapse">Hint</a></b>

<div id="hint2" class="collapse">
The color mapping can be controlled using the ``cmap`` style option, while the legend position can be changed using the ``legend_position`` plot option.
</div>

<b><a href="#solution3" data-toggle="collapse">Solution</a></b>

<div id="solution3" class="collapse">
<br>
<code>%%opts Scatter [color_index='country' size_index='unem' width=800 height=400 legend_position='left'] (cmap='tab20')
hv.Scatter(macro, 'year', ['growth', 'country', 'unem']).sort(['year', 'country'])</code>
</div>


### Example 2: 

This time we will be working with a dataset about diamonds. As before, start by finding out which columns it has, but instead of looking at the dataframe itself, look at the string representation of the ``diamond_ds`` Dataset:

In [None]:
diamond_ds = hv.Dataset(diamonds, ['cut', 'color', 'clarity'])

Using the ``.to`` method on ``diamond_ds``, plot the 'price' against the 'carat' column using a ``Scatter`` element, and use the ``groupby`` keyword argument to split the dataset by 'clarity'.

<b><a href="#hint3" data-toggle="collapse">Hint</a></b>

<div id="hint3" class="collapse">
The ``.to`` method follows a signature of ``dataset.to(Element, kdims, vdims, groupby)``.
</div>

<b><a href="#solution4" data-toggle="collapse">Solution</a></b>

<div id="solution4" class="collapse">
<br>
<code>diamond_ds.to(hv.Scatter, 'carat', 'price', groupby='clarity')</code>
</div>

The plot you should have gotten lets you view each subset separately. Now use the ``.overlay`` method on the grouped dataset to overlay the individual plots. Then adjust the width and height of the plot, enable a log y-axis and define a custom color cycle using ``Cycle('Category20')`` as a style option.

<b><a href="#hint4" data-toggle="collapse">Hint</a></b>

<div id="hint4" class="collapse">
A ``Cycle`` can be used on any style option; to set a color cycle, just set it on the ``color`` option.
</div>

<b><a href="#solution5" data-toggle="collapse">Solution</a></b>

<div id="solution5" class="collapse">
<br>
<code>%%opts Scatter [width=600 height=400 logy=True] (color=Cycle('Category20'))
diamond_ds.to(hv.Scatter, 'carat', 'price', groupby='clarity').overlay()</code>
</div>

### Example 3:

Now let's look at the same dataset in a different way. Again using the ``.to`` method, plot the 'price' broken down by 'cut' using a ``BoxWhisker`` element and group it by 'clarity'.

<b><a href="#hint5" data-toggle="collapse">Hint</a></b>

<div id="hint5" class="collapse">
Make sure to specify 'price' as the value dimension, i.e. as the ``vdims`` argument that follows the key dimension.
</div>

<b><a href="#solution6" data-toggle="collapse">Solution</a></b>

<div id="solution6" class="collapse">
<br>
<code>diamond_ds.to(hv.BoxWhisker, 'cut', 'price', groupby='clarity')</code>
</div>

This time let's lay out the grouped dimension as a grid. Use the ``grid`` method on the grouped dataset. Then enable the ``shared_xaxis`` and ``shared_yaxis`` plot options on the resulting ``GridSpace``. Finally set a custom ``xrotation`` to rotate the x-axis ticks.

<b><a href="#hint6" data-toggle="collapse">Hint</a></b>

<div id="hint6" class="collapse">
Just like elements, containers like a ``GridSpace`` have plot options, which can be specified using ``%%opts GridSpace [...]``.
</div>

<b><a href="#solution7" data-toggle="collapse">Solution</a></b>

<div id="solution7" class="collapse">
<br>
<code>%%opts GridSpace [shared_xaxis=True shared_yaxis=True] BoxWhisker [xrotation=45]
diamond_ds.to(hv.BoxWhisker, 'cut', 'price', groupby='clarity').grid()</code>
</div>