## 3: Introduction to the `.plot` API and associated libraries.

### Read in the data

In [None]:
import dask
import dask.dataframe as dd

In [None]:
df = dd.read_parquet('../data/earthquakes.parq')
df.head()

### Using default `.plot`

The first thing that we'd like to do with this data is visualize the location of every earthquake. So we would like to make a `scatter` plot where `x='longitude'` and `y='latitude'`. If you are familiar with the `pandas.plot` API, you might expect to execute `df.plot.scatter(x='longitude', y='latitude')`. Feel free to try this out in a new cell. It throws an error: `AttributeError: 'DataFrame' object has no attribute 'plot'`. Since we have a dask dataframe rather than a pandas dataframe, we first need to convert it to pandas. To make the data more manageable, we'll use just a fraction (1%) of it and call that `small_df`.

In [None]:
small_df = df.sample(frac=.01).compute()
small_df.shape

Now we have a smaller dataset with just 21k earthquakes. We can use that to test out our visualizations before ramping back up to the full dataset.
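As a rough illustration of the sampling step, here is a minimal sketch on a plain pandas frame. The table is synthetic (the values and the 10,000-row size are made up), and `random_state` is added only so that the draw is repeatable. On a dask dataframe the sample is drawn partition by partition, so the resulting row count may not be exactly 1%.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the earthquake table (values are made up).
rng = np.random.default_rng(0)
full = pd.DataFrame({
    "longitude": rng.uniform(-180, 180, size=10_000),
    "latitude": rng.uniform(-90, 90, size=10_000),
})

# Draw 1% of the rows; random_state makes the draw repeatable.
small = full.sample(frac=0.01, random_state=42)
print(small.shape)
```

The sampled frame keeps the original index labels, so rows can still be traced back to the full table.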

In [None]:
small_df.plot.scatter(x='longitude', y='latitude')

That is a good place to start: you can already see the structure of the plate edges (which often correspond with the edges of the continents).

In [None]:
import hvplot.pandas
import hvplot.dask

In [None]:
small_df.hvplot.scatter(x='longitude', y='latitude')

In [None]:
small_df.hvplot.scatter(x='longitude', y='latitude', datashade=True)

We can even use the original full dask dataset (remember that this has 100x the points of the small dataset).

In [None]:
df.hvplot.scatter(x='longitude', y='latitude', datashade=True)

When you zoom into the plot, the image re-renders. This can be slow (tens of seconds) when the data is being read from disk. If your computer has sufficient RAM, you can persist the dataset in memory and significantly reduce the time it takes to re-render the plot on zoom events.

In [None]:
persisted = df.persist()

In [None]:
persisted.hvplot.scatter(x='longitude', y='latitude', datashade=True)

### Magnitude of earthquakes over time

In [None]:
small_df.plot(x='time', y='mag')

In [None]:
cleaned_small_df = small_df.copy()
cleaned_small_df['mag'] = small_df.mag.where(small_df.mag > 0)

In [None]:
cleaned_small_df.plot(x='time', y='mag')

In [None]:
cleaned_small_df.hvplot(x='time', y='mag')

In [None]:
cleaned_small_df.hvplot.scatter(x='time', y='mag')

In [None]:
cleaned_small_df.hvplot.scatter(x='time', y='mag', datashade=True)

In [None]:
cleaned_df = df.copy()
cleaned_df['mag'] = df.mag.where(df.mag > 0)

If that seemed really fast, it's because the values haven't been computed yet. All that has happened is that the task graph has been set up.

In [None]:
cleaned_persisted_df = cleaned_df.persist()

In [None]:
cleaned_persisted_df.hvplot.scatter(x='time', y='mag', datashade=True)

### Grouping data

In [None]:
cleaned_small_reindexed_df = cleaned_small_df.set_index(keys='time')
cleaned_small_reindexed_df.head()

In [None]:
cleaned_small_reindexed_df.id.resample('1M').count().hvplot()

In [None]:
cleaned_small_reindexed_df.mag.resample('1M').mean().hvplot()
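
To see what the two resampling calls compute, here is a minimal sketch on five made-up events (the ids, magnitudes, and dates are invented for illustration). It uses the `'MS'` (month start) frequency rather than `'1M'`, as an assumption to sidestep the deprecation of the `'M'` alias in pandas 2.2; the bins are the same calendar months, only labelled by their first day instead of their last.

```python
import pandas as pd

# Five made-up events: three in January 2020, two in February 2020.
events = pd.DataFrame(
    {
        "id": ["a", "b", "c", "d", "e"],
        "mag": [4.0, 5.0, 6.0, 3.0, 5.0],
    },
    index=pd.to_datetime(
        ["2020-01-03", "2020-01-15", "2020-01-28", "2020-02-02", "2020-02-20"]
    ),
)

# Number of events per calendar month.
monthly_count = events.id.resample("MS").count()

# Mean magnitude per calendar month.
monthly_mean = events.mag.resample("MS").mean()

print(monthly_count)
print(monthly_mean)
```

Resampling requires a datetime index, which is why the `time` column is set as the index first.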

In [None]:
cleaned_reindexed_df = cleaned_df.set_index(cleaned_df.time)
persisted_cleaned_reindexed_df = cleaned_reindexed_df.persist()

In [None]:
persisted_cleaned_reindexed_df.id.resample('1M').count().hvplot(title='Monthly count')

In [None]:
persisted_cleaned_reindexed_df.mag.resample('1M').mean().hvplot(title='Monthly mean magnitude')

Libraries using this interface: pandas `.plot`, xarray `.plot`?, quick intro to cufflinks, hvplot.

Quick easy way to generate simple plots.

#### Exercise 3: Add some visualizations via the `.plot` API to the dashboard elements built in exercise 2.

#### 3b: Link to `HoloViews` from `hvplot` output ('shortcuts not dead ends')

Deconstruct hvplot output to show what individual elements are. Show more involved repr with `NdOverlay`. Discuss that these are compositional objects. Show `DynamicMap` output from `hvplot`.