# Plotting with Hail

The [`hail.ggplot2`](https://hail.is/docs/0.2/ggplot2/index.html) module provides a set of functions for aggregating and plotting your data. Its API mimics that of [R's `ggplot2` library](https://ggplot2.tidyverse.org) as closely as possible, and its plots are rendered with the [Vega-Altair](https://altair-viz.github.io/) library.

On this page, you'll find an explanation of the basics of the module, as well as examples of how to create some commonly used plot types.

To build example plots, we first need example data. The following code uses [`hail.utils.range_table`](https://hail.is/docs/0.2/utils/index.html#hail.utils.range_table) to generate a table with a single field, `idx`, containing the index of each row:

In [None]:
import hail as hl

# Build a 100-row table whose only field, `idx`, runs from 0 to 99.
data = hl.utils.range_table(100)
data.show()

## Plot Objects

Like R's `ggplot2`, `hail.ggplot2` bases its approach to plotting on a [layered grammar of graphics](https://ggplot2-book.org/introduction.html#what-is-the-grammar-of-graphics). This means that plots are built up from the data in composable layers, which allows you to add or remove different plot elements easily.

We can create a basic plot object by calling the [`ggplot`](https://hail.is/docs/0.2/ggplot2/index.html#hail.ggplot2.ggplot) function on our data:

In [None]:
from hail.ggplot2 import ggplot

plot = ggplot(data)
print(plot)
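
To make the layering idea concrete, here is a toy model in plain Python. It is only a sketch of how composition with `+` can work in general: `ToyPlot` and `ToyLayer` are hypothetical names invented for this illustration, and nothing here reflects how `hail.ggplot2` is actually implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyLayer:
    """Hypothetical stand-in for a plot element such as a geom or a scale."""
    name: str


@dataclass(frozen=True)
class ToyPlot:
    """Hypothetical stand-in for a plot object (not the hail.ggplot2 class)."""
    data: tuple
    layers: tuple = ()

    def __add__(self, layer: ToyLayer) -> "ToyPlot":
        # Return a new plot instead of mutating this one, so that every
        # intermediate plot remains usable on its own.
        return ToyPlot(self.data, self.layers + (layer,))


base = ToyPlot(data=tuple(range(100)))
with_points = base + ToyLayer("point")
with_both = with_points + ToyLayer("histogram")

print([layer.name for layer in base.layers])       # []
print([layer.name for layer in with_both.layers])  # ['point', 'histogram']
```

Because each `+` produces a new object in this sketch, `base` and `with_points` are unchanged after later layers are added, which is what makes it cheap to try a different layer on the same starting point.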

## Aesthetic Mappings

To create a plot, we'll need to decide which values will be mapped to its x-axis. In `hail.ggplot2`, values are mapped to an axis with an aesthetic mapping, which can be created using [`aes`](https://hail.is/docs/0.2/ggplot2/index.html#hail.ggplot2.aes). By default, the first argument to `aes` is `x`, and the second argument is `y`. Let's map the `idx` field of the data to our x-axis:

In [None]:
from hail.ggplot2 import aes

plot1 = plot + aes(data.idx)
print(plot1)
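
The positional convention described above (first argument `x`, second argument `y`) can be pictured with a small stand-in function. This is a hypothetical model written for illustration, not the real `aes`: `toy_aes` and the field name `idx_squared` are assumptions, and field references are represented as plain strings.

```python
def toy_aes(x=None, y=None, **other):
    """Hypothetical model of an aesthetic mapping: channel name -> field."""
    mapping = {"x": x, "y": y, **other}
    # Keep only the channels that were actually supplied.
    return {channel: value for channel, value in mapping.items() if value is not None}


positional = toy_aes("idx")                  # first positional argument is x
keyword = toy_aes(x="idx")                   # same mapping, spelled out
both_axes = toy_aes("idx", "idx_squared")    # second positional argument is y

print(positional == keyword)  # True
print(both_axes)              # {'x': 'idx', 'y': 'idx_squared'}
```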

Creating the plot object and adding the aesthetic mapping are often done in a single step, in which case the mapping can be passed as an argument to `ggplot` instead of being added with the `+` operator:

In [None]:
plot2 = ggplot(data, aes(data.idx))
print(plot2)

TODO: color, shape

## Displaying Plots

Unlike R's `ggplot2`, `hail.ggplot2` doesn't render plots unless explicitly told to do so. Once we have added a layer with a visual component to our plot object (in this case, [`geom_histogram`](https://hail.is/docs/0.2/ggplot2/index.html#hail.ggplot2.geom_histogram)), we can display the plot using the [`show`](https://hail.is/docs/0.2/ggplot2/index.html#hail.ggplot2.show) function:

In [None]:
from hail.ggplot2 import geom_histogram, show

plot3 = plot2 + geom_histogram()
show(plot3)

## Titles

TODO

## Axis Labels

TODO

## Geoms

TODO: what are geoms (demo below for adding different ones to the same data, recommend this type of approach where you store the result in a variable every time you add something to it so you can change your mind really easily)

In [None]:
from hail.ggplot2 import geom_point

plot4 = plot2 + geom_point()
show(plot4)

## Stats

TODO: You may have noticed that some geoms, like the histogram, implicitly compute statistics about the data before rendering it. But what if you need to transform your data independently of a geom?

TODO: Stats are cached, so when you recompute them, the plot object will attempt to reuse the cached values of previously applied aggregations.

## Scales

TODO: what are scales

## Facets

TODO

## Examples

### Histogram

TODO: explain a bit about histograms

We've already created a histogram [above](#Displaying-Plots). Let's take another look at it:

In [None]:
show(plot3)

TODO: talk about default settings

We can set the number of bins by passing the `bins` argument to `geom_histogram`:

In [None]:
# Use a new name so the scatter plot stored in `plot4` isn't overwritten.
plot5 = plot2 + geom_histogram(bins=50)
show(plot5)
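
To get a feel for what the bin count does to our data, the sketch below counts the values 0 through 99 in equal-width bins using plain Python. It assumes equal-width bins spanning the minimum to the maximum value, which is a common convention but not necessarily how `geom_histogram` computes its bins; `equal_width_counts` is a hypothetical helper written for this illustration.

```python
def equal_width_counts(values, bins):
    """Count values in `bins` equal-width intervals spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for value in values:
        # Clamp the index so the maximum value lands in the last bin
        # instead of falling just past it.
        index = min(int((value - lo) / width), bins - 1)
        counts[index] += 1
    return counts


values = list(range(100))  # the same values as the `idx` field of our table

counts_50 = equal_width_counts(values, 50)
counts_10 = equal_width_counts(values, 10)

print(counts_50[:5])  # [2, 2, 2, 2, 2]
print(counts_10)      # [10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
```

Under these assumptions, every bar has the same height because the values are evenly spread: more bins means narrower bars with fewer values in each.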

### Cumulative Histogram

TODO

### 2D Histogram

TODO

### Scatter Plot

TODO

### QQ Plot

TODO: To create a quantile-quantile (QQ) plot, ...

### Manhattan Plot

TODO (use actual genetics data)