# Joy plot (aka ridge line plot)

In this example we'll create a joy plot, which takes its name from a [famous album cover](https://i.ytimg.com/vi/V3Ioohi9aqE/maxresdefault.jpg).

This kind of plot summarizes the distribution of a numeric variable for several groups. Each group is represented as a density chart, and the charts partially overlap one another to use space more efficiently.

# Assignment

Create a ridgeline plot for three time periods: 1800-1850, 1850-1900, 1900-1950. Put the year on the x axis and the number of meteorites for that year on the y axis.

Hints:

* you can follow the example from [python-graph-gallery tutorial](https://www.python-graph-gallery.com/ridgeline-graph-plotly)
* you are expected to use the plotly library; if it's not installed you can always run `!pip install plotly`
* remember that we have a function to load data in the `data_manager.py` file

## Data loading

We'll use the NASA meteorite dataset. To load it, we first need to run `data_manager.py` using the `%run` magic command.

In [None]:
%run data_manager.py
df = load_meteorites()
df.head()

The `year_as_date` column contains the data as datetime entries. However, we'd prefer to use the year as a number.
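As an illustration of the idea (not the expected solution), here is a minimal sketch on a tiny hypothetical frame `df_demo`. It assumes the column has a pandas datetime dtype, so that the `.dt` accessor is available.

```python
import pandas as pd

# a tiny hypothetical frame standing in for the meteorite data
df_demo = pd.DataFrame({
    "year_as_date": pd.to_datetime(["1880-01-01", "1923-01-01", None]),
})

# .dt.year extracts the calendar year; missing dates (NaT) become NaN
df_demo["year_numeric"] = df_demo["year_as_date"].dt.year
print(df_demo)
```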

In [None]:
#standard imports
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd #this is already imported by data_manager.py

#------ your code here ------
#extracting the year as a number
#df['year_numeric'] = ...
#----------------------------

#a description of the year, as a number
print(df['year_numeric'].describe())

#the total number of meteorites
print('Total number of entries: ' + str(df.shape[0]))

It appears that a number of meteorites have no assigned year. We'll take the easiest route and simply remove them.
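One way to do this is `DataFrame.dropna` restricted to the year column. The sketch below uses a hypothetical three-row frame `df_demo`; restricting with `subset` means that missing values in other columns do not cause rows to be dropped.

```python
import numpy as np
import pandas as pd

# hypothetical data: meteorite "B" has no recorded year
df_demo = pd.DataFrame({"name": ["A", "B", "C"], "year_numeric": [1880.0, np.nan, 1923.0]})

# drop only the rows whose year is missing; NaNs in other columns are ignored
df_demo = df_demo.dropna(subset=["year_numeric"])
print("Entries after dropping NAs: " + str(df_demo.shape[0]))
```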

In [None]:
#------ your code here ------

#----------------------------

print('Entries after dropping NAs : ' + str(df.shape[0]))

At this point we can start to take a look at the distribution of the data we are going to plot. A simple histogram is always a good starting point.
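A minimal matplotlib sketch follows. The years are made up and deliberately skewed, and the 50-year bin edges are an arbitrary choice. The `matplotlib.use("Agg")` line only makes the sketch run outside a notebook; inside the notebook you can omit it.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside the notebook
import matplotlib.pyplot as plt
import pandas as pd

# hypothetical years, deliberately skewed towards the late twentieth century
years = pd.Series([1810, 1875, 1930, 1960, 1975, 1979, 1985, 1988, 1990, 1998])

fig, ax = plt.subplots()
# ax.hist returns the bin counts, the bin edges and the drawn patches
n, edges, _ = ax.hist(years, bins=[1800, 1850, 1900, 1950, 2000])
ax.set_xlabel("year")
ax.set_ylabel("number of meteorites")
```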

In [None]:
#------ your code here ------

#----------------------------

You should see that the data are highly skewed. In fact, the vast majority of meteorites have been observed during the second half of the twentieth century. Luckily, the exercise asks us to plot earlier periods, from 1800 to 1950; otherwise the plot would be *VERY* unbalanced.
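Restricting the data to 1800-1950 can be done with `Series.between`, which includes both endpoints by default. The sketch below runs on a hypothetical frame `df_demo`. Whether 1950 itself should be kept is a choice the assignment leaves open.

```python
import pandas as pd

# hypothetical frame with numeric years spanning several centuries
df_demo = pd.DataFrame({"year_numeric": [1790.0, 1800.0, 1849.0, 1950.0, 1980.0]})

# boolean mask: True for years in [1800, 1950], both ends included
in_range = df_demo["year_numeric"].between(1800, 1950)
df_period = df_demo[in_range]
print(df_period)
```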

### Preparing the data

We need to extract a per-year count of the meteorites. Moreover, we'll need to sort the data in chronological order; otherwise the plot would look terrible (can you tell why?).
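A sketch of the two steps on a hypothetical series of years: `value_counts` produces the tally, and `sort_index` restores chronological order.

```python
import pandas as pd

# hypothetical years, one entry per meteorite
years = pd.Series([1880, 1923, 1880, 1812, 1923, 1880])

# tally of each distinct year, ordered by frequency (most common first)
counts = years.value_counts()
# re-order the tally by year, i.e. chronologically
counts = counts.sort_index()
print(counts)
```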

In [None]:
#------ your code here ------
#extracting the tally for all possible years
#counts = ...

#sorting back in chronological order
#counts = ...
#----------------------------

#just checking
print(counts)

## The joyplot

We are following the solution proposed in the [python-graph-gallery tutorial](https://www.python-graph-gallery.com/ridgeline-graph-plotly). The general idea is that for each line we need to add:

* a white trace, which will serve as the baseline for the ridge area
* a trace of scatter points, with the colored area
* optionally, an annotation that tells which period we are plotting

To separate the three lines we need to:

* adjust the x values, so that they are all in the 0-50 range
* add an increasing vertical offset, so that the second line sits a bit above the first one, and so on
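The two adjustments can be sketched with plain pandas, before any plotting. The counts below are made up, `offset_step` is an arbitrary gap, and treating each period as the half-open interval [start, end) (so a boundary year such as 1850 is counted only once) is an assumption.

```python
import pandas as pd

# hypothetical per-year counts, indexed by year (made-up numbers)
counts = pd.Series({1801: 2, 1820: 5, 1855: 3, 1890: 4, 1910: 6, 1949: 1})

periods = [(1800, 1850), (1850, 1900), (1900, 1950)]
offset_step = 10  # arbitrary vertical gap between ridges

ridges = []
for i, (start, end) in enumerate(periods):
    # keep years in [start, end), so boundary years belong to one period only
    sub = counts[(counts.index >= start) & (counts.index < end)]
    x = sub.index - start             # shift years into the 0-50 range
    y = sub.values + i * offset_step  # lift each ridge above the previous one
    ridges.append((list(x), list(y)))

print(ridges)
```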

You are going to use a few plotly functions:

* [fig.add_trace()](https://plotly.com/python/creating-and-updating-figures/#adding-traces), a method invoked on the figure instance obtained via `go.Figure()`. It adds a trace, i.e. a line or an area, to the plot
* [go.Scatter()](https://plotly.com/python/reference/scatter/#scatter) is the base trace type for scatter plots, line plots, and filled areas

Keep in mind that your JupyterLab setup may not be able to render plotly output on the fly. There are two solutions:

1. use plotly to create an HTML file, which you then embed as an iframe (this is the more general approach, adopted in the solution)
2. you may want to install a [jupyter extension](https://stackoverflow.com/questions/52771328/plotly-chart-not-showing-in-jupyter-notebook), so that you can use `fig.show()`. Some tweaking may be required, but the base command is: `jupyter labextension install jupyterlab-plotly`

### Installing plotly

Let's check whether plotly is available, and install it if needed.
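If you prefer to check before calling pip, the standard library's `importlib.util.find_spec` reports whether a top-level package can be imported. The helper name `is_installed` is hypothetical; this is a sketch, not part of the original notebook.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if a top-level package can be found on the import path."""
    return importlib.util.find_spec(package) is not None


print(is_installed("plotly"))
```

In the notebook you could then run `!pip install plotly` only when `is_installed("plotly")` returns `False`.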

In [None]:
#do we need to install plotly?
!pip install plotly
import plotly.graph_objects as go

### Building the plot

In [None]:
# we need to declare a figure instance
fig = go.Figure()


#the core solution is to add two traces for each 50-year
#period. The first one is a simple white line. The second
#one, above it, is a filled colored line.
#refer to: https://www.python-graph-gallery.com/ridgeline-graph-plotly
#Good luck!

#------ your code here ------
#----------------------------

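# we save the output to an html file, creating the results folder if it does not exist
import os
os.makedirs("../results", exist_ok=True)
fig.write_html("../results/ridgeline-graph-plotly.html")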

We are now ready to see the results by embedding the saved HTML file.

In [None]:
%%html
<iframe src="../results/ridgeline-graph-plotly.html" width="800" height="600" title="ridgeline chart with plotly" style="border:none"></iframe>