## 3: Introduction to the `.plot` API and associated libraries.

### Read in the data

In [None]:
import dask
import dask.dataframe as dd

In [None]:
df = dd.read_parquet('../data/earthquakes.parq')
df.head()

### Using default `.plot`

The first thing we'd like to do with this data is visualize the location of every earthquake, so we want a `scatter` plot with `x='longitude'` and `y='latitude'`. If you are familiar with the pandas `.plot` API, you might expect to execute `df.plot.scatter(x='longitude', y='latitude')`. Feel free to try this in a new cell. It throws an error: `AttributeError: 'DataFrame' object has no attribute 'plot'`. Because we have a dask dataframe rather than a pandas dataframe, we first need to convert it to pandas. To keep the data manageable, we'll use just a fraction (1%) of it and call that `small_df`.

In [None]:
small_df = df.sample(frac=.01).compute()
small_df.shape

Now we have a smaller dataset with just 21k earthquakes. We can use that to test out our visualizations before ramping back up to the full dataset.
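
As a minimal sketch of what the sampling step does, the cell below draws 1% of the rows from a synthetic pandas table. The column values are made up for illustration and are not the earthquake data; `random_state` is added here only so the draw is repeatable.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the earthquake table (hypothetical values, not the real data).
rng = np.random.default_rng(0)
full = pd.DataFrame({
    "longitude": rng.uniform(-180, 180, size=10_000),
    "latitude": rng.uniform(-90, 90, size=10_000),
})

# Draw 1% of the rows without replacement; random_state makes the draw repeatable.
small = full.sample(frac=0.01, random_state=0)
print(small.shape)
```

Because the rows are drawn at random, the sample should keep roughly the same spatial distribution as the full table, which is what makes it useful for trying out plots quickly.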

In [None]:
small_df.plot.scatter(x='longitude', y='latitude')

That is a good place to start: you can already see the structure of the plate edges (which often correspond with the edges of the continents).

In [None]:
import hvplot.pandas
import hvplot.dask

In [None]:
small_df.hvplot.scatter(x='longitude', y='latitude')

In [None]:
small_df.hvplot.scatter(x='longitude', y='latitude', datashade=True)

We can even use the original full dask dataset (remember that this has 100x the points of the small dataset).

In [None]:
df.hvplot.scatter(x='longitude', y='latitude', datashade=True)

When you zoom into the plot, the image re-renders. This can be slow (tens of seconds) when the data is being read from disk. If your computer has sufficient RAM, you can persist the dataset in memory and significantly reduce the time it takes to re-render the plot on zoom events.

In [None]:
persisted = df.persist()

In [None]:
persisted.hvplot.scatter(x='longitude', y='latitude', datashade=True)

### Magnitude

Next we'll look at the frequency of earthquakes of different magnitudes. As a first pass, we'll plot a histogram, first with `.plot.hist` on the small data, then with `.hvplot.hist` on the full dataset. Before plotting, we can clean the data by setting any magnitudes that are 0 or less to NaN.

In [None]:
cleaned_small_df = small_df.copy()
cleaned_small_df['mag'] = small_df.mag.where(small_df.mag > 0)
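
To make the cleaning step concrete, here is a small sketch of how `Series.where` behaves, using a handful of made-up magnitudes: values for which the condition is true are kept, and everything else becomes NaN.

```python
import pandas as pd

# Made-up magnitudes for illustration only.
mags = pd.Series([-1.0, 0.0, 2.5, 6.1], name="mag")

# Keep values where the condition holds; replace the rest with NaN.
cleaned = mags.where(mags > 0)
print(cleaned)
```

Note that a magnitude of exactly 0 is also replaced, because the condition is a strict `> 0`.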

In [None]:
cleaned_small_df.mag.plot.hist()

In [None]:
cleaned_df = df.copy()
cleaned_df['mag'] = df.mag.where(df.mag > 0)

If that seemed really fast, it's because the values haven't been computed yet. So far, only the task graph has been set up.

In [None]:
cleaned_persisted_df = cleaned_df.persist()

In [None]:
cleaned_persisted_df.hvplot.hist(y='mag', bin_range=(0,10), bins=50)

### Magnitude of earthquakes over time

In [None]:
small_df.plot(x='time', y='mag')

In [None]:
cleaned_small_df.hvplot.scatter(x='time', y='mag')

In [None]:
cleaned_small_df.hvplot.scatter(x='time', y='mag', datashade=True)

In [None]:
cleaned_persisted_df.hvplot.scatter(x='time', y='mag', datashade=True)

### Resampling data

In [None]:
cleaned_reindexed_df = cleaned_df.set_index(cleaned_df.time)
persisted_cleaned_reindexed_df = cleaned_reindexed_df.persist()

In [None]:
monthly_count = persisted_cleaned_reindexed_df.id.resample('1M').count()
monthly_count_plot = monthly_count.hvplot(title='Monthly count')
monthly_count_plot

In [None]:
print(monthly_count_plot)

In [None]:
monthly_mean_magnitude = persisted_cleaned_reindexed_df.mag.resample('1M').mean()
monthly_mean_magnitude_plot = monthly_mean_magnitude.hvplot(title='Monthly mean magnitude')
monthly_mean_magnitude_plot

In [None]:
print(monthly_mean_magnitude_plot)
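
As a rough sketch of what the resampling cells compute, the example below groups a few made-up events by calendar month using plain pandas. It uses the `'MS'` (month start) alias as an assumption so that it runs across pandas versions; the notebook's `'1M'` labels each bin by month end instead, but the grouping is the same.

```python
import pandas as pd

# Four made-up events indexed by time (illustration only).
events = pd.DataFrame(
    {"id": ["a", "b", "c", "d"], "mag": [4.0, 6.0, 5.0, 3.0]},
    index=pd.to_datetime(["2000-01-05", "2000-01-20", "2000-02-11", "2000-03-02"]),
)

# Number of events per calendar month.
monthly_count = events["id"].resample("MS").count()

# Mean magnitude per calendar month.
monthly_mean = events["mag"].resample("MS").mean()

print(monthly_count)
print(monthly_mean)
```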

In [None]:
(monthly_mean_magnitude_plot + monthly_count_plot).cols(1)

In [None]:
#TODO: Need a 3 dimensional thing, maybe color?

## Adding a third dimension

So far we have used location, time, and magnitude. Now let's zoom in on the highest-magnitude earthquakes and see what we can learn about them.

In [None]:
most_severe = persisted[persisted.mag > 7].compute()

In [None]:
most_severe.place

In [None]:
most_severe.plot.scatter(x='longitude', y='latitude', c='mag')

In [None]:
most_severe.hvplot.scatter(x='longitude', y='latitude', c='mag')

Tweaking the options to create a better plot:

In [None]:
most_severe.hvplot.scatter(x='longitude', y='latitude', c='mag', hover_cols=['place', 'time'],
                           cmap='fire_r', title='Earthquakes with magnitude > 7')

Now that would be a lot better with a map underneath it.

In [None]:
import pandas as pd
import datashader.geo

from holoviews.element.tiles import EsriImagery, Wikipedia, OSM

In [None]:
x, y = datashader.geo.lnglat_to_meters(most_severe.longitude, most_severe.latitude)
most_severe_projected = most_severe.join([pd.DataFrame({'x': x}), pd.DataFrame({'y': y})])
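
Map tiles such as `OSM` are drawn in Web Mercator coordinates (meters), which is why longitude and latitude are converted first. The sketch below writes out the standard spherical Web Mercator formula with NumPy. The function name `lnglat_to_web_mercator` is hypothetical, and the sketch is assumed to mirror, not replace, what `lnglat_to_meters` does.

```python
import numpy as np

# Radius of the sphere used by Web Mercator, in meters.
EARTH_RADIUS_M = 6378137.0

def lnglat_to_web_mercator(longitude, latitude):
    """Hypothetical helper: project degrees to Web Mercator meters."""
    lon = np.asarray(longitude, dtype=float)
    lat = np.asarray(latitude, dtype=float)
    origin_shift = np.pi * EARTH_RADIUS_M  # half the width of the map in meters
    easting = lon * origin_shift / 180.0
    northing = np.log(np.tan((90.0 + lat) * np.pi / 360.0)) * origin_shift / np.pi
    return easting, northing

# The origin maps to (approximately) zero; longitude 180 maps to the map's edge.
x, y = lnglat_to_web_mercator([0.0, 180.0, 0.0], [0.0, 0.0, 45.0])
print(x, y)
```

The northing grows without bound toward the poles, so latitudes of exactly ±90 degrees cannot be represented in this projection.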

In [None]:
OSM() * most_severe_projected.hvplot.scatter(x='x', y='y', c='mag', hover_cols=['place', 'time'], 
                                             cmap='fire_r', title='Earthquakes with magnitude > 7')

## Categorical section WIP

| Class | Magnitude |
| --- | --- |
| Great | 8 or more |
| Major | 7 - 7.9 |
| Strong | 6 - 6.9 |
| Moderate | 5 - 5.9 |
| Light | 4 - 4.9 |
| Minor | 3 - 3.9 |
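
One possible way to attach these classes to the data is `pd.cut`. The sketch below uses a few made-up magnitudes and assumes half-open bins (for example, Major covers 7 up to but not including 8), which is an interpretation of the ranges in the table rather than something the table states.

```python
import numpy as np
import pandas as pd

# Bin edges and labels taken from the class table; half-open bins are an assumption.
bins = [3, 4, 5, 6, 7, 8, np.inf]
labels = ["Minor", "Light", "Moderate", "Strong", "Major", "Great"]

# Made-up magnitudes for illustration only.
mags = pd.Series([2.0, 3.5, 5.9, 7.0, 8.2])

# right=False makes each bin include its left edge and exclude its right edge.
classes = pd.cut(mags, bins=bins, labels=labels, right=False)
print(classes)
```

Magnitudes below 3 fall outside every bin and come back as NaN, since the table does not name a class for them.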


| Magnitude | Earthquake Effects | Estimated Number Each Year |
| --- | --- | --- |
| 2.5 or less | Usually not felt, but can be recorded by seismograph. | 900,000 |
| 2.5 to 5.4 | Often felt, but only causes minor damage. | 30,000 |
| 5.5 to 6.0 | Slight damage to buildings and other structures. | 500 |
| 6.1 to 6.9 | May cause a lot of damage in very populated areas. | 100 |
| 7.0 to 7.9 | Major earthquake. Serious damage. | 20 |
| 8.0 or greater | Great earthquake. Can totally destroy communities near the epicenter. | One every 5 to 10 years |

In [None]:
x, y = datashader.geo.lnglat_to_meters(cleaned_df.longitude, cleaned_df.latitude)

In [None]:
# x and y are lazy dask Series here, so attach them as new columns directly
cleaned_projected_df = cleaned_df.assign(x=x, y=y)

In [None]:
# Plot the projected dataframe: the original df has no 'x' or 'y' columns
OSM() * cleaned_projected_df.hvplot.scatter(x='x', y='y', hover_cols=['place', 'time'], groupby='mag')

Libraries using this interface: pandas `.plot`, xarray `.plot`?, quick intro to cufflinks, hvplot.

Quick easy way to generate simple plots.

#### Exercise 3: Add some visualizations via the `.plot` API to the dashboard elements built in exercise 2.

#### 3b: Link to `HoloViews` from `hvplot` output ('shortcuts not dead ends')

Deconstruct hvplot output to show what the individual elements are. Show a more involved repr with `NdOverlay`. Discuss that these are compositional objects. Show `DynamicMap` output from `hvplot`.