# Plotnine - Grammar of graphics for Python

Plotnine is an implementation of a grammar of graphics in Python, based on `ggplot2`. The grammar allows users to compose plots by explicitly mapping data to the visual objects that make up the plot.

Plotting with a grammar is powerful: it makes custom (and otherwise complex) plots easy to reason about and build, while keeping simple plots simple. A grammar of graphics is a high-level tool for creating data plots in an efficient and consistent way. It abstracts away most low-level details, letting you focus on building meaningful and attractive visualizations of your data.

Plotnine can be installed with `pip install plotnine`, or with `conda install -c conda-forge plotnine` if you're using Anaconda.

## Quick example

```python
from plotnine import *
from plotnine.data import mtcars  # just a sample dataset

(ggplot(mtcars, aes('wt', 'mpg')) + geom_point())
```

#### Scatter plot colored according to some variable

```python
(ggplot(mtcars, aes('wt', 'mpg', color='factor(gear)'))
 + geom_point())
```

#### Scatter plot colored according to some variable and smoothed with a linear model with confidence intervals

```python
(ggplot(mtcars, aes('wt', 'mpg', color='factor(gear)'))
 + geom_point()
 + stat_smooth(method='lm'))
```

#### Scatter plot colored according to some variable, smoothed with a linear model with confidence intervals and plotted on separate panels

```python
(ggplot(mtcars, aes('wt', 'mpg', color='factor(gear)'))
 + geom_point()
 + stat_smooth(method='lm')
 + facet_wrap('~gear'))
```

#### Change the theme and make it playful

```python
(ggplot(mtcars, aes('wt', 'mpg', color='factor(gear)'))
 + geom_point()
 + stat_smooth(method='lm')
 + facet_wrap('~gear')
 + theme_xkcd())
```

## Plotnine’s grammar of graphics

The three required components for creating a plot are:

1. **Data** is the information used to create the plot.
2. Aesthetics (**aes**) provide a mapping between data variables and the aesthetic, or graphical, variables used by the underlying drawing system. In the quick example, the `wt` and `mpg` data variables were mapped to the x- and y-axis aesthetics.
3. Geometric objects (**geoms**) define the type of geometric object used in the drawing: points, lines, bars, and many others.

There are also optional components that you can use:

1. **Statistical transformations** specify computations and aggregations to apply to the data before plotting it.
2. **Scales** apply a transformation while mapping data to aesthetics. For example, a logarithmic scale can better reflect some aspects of your data.
3. **Facets** divide the data into groups based on some attributes and plot each group in a separate panel of the same graphic.
4. **Coordinate systems** map the position of objects to a 2D graphical location in the plot. For example, you can flip the vertical and horizontal axes if that makes more sense for the visualization you’re building.
5. **Themes** control non-data visual properties such as colors, fonts, and backgrounds.

### Geometric objects (geoms)

Plotnine provides many geoms. Here are some of the most common ones:

#### Point

```python
from plotnine.data import mpg  # just a sample dataset

ggplot(mpg) + aes(x="class", y="trans") + geom_point()
```

#### Bar

```python
ggplot(mpg) + aes(x="class") + geom_bar()
```

#### Boxplot

```python
ggplot(mpg) + aes(x="class", y="cty") + geom_boxplot()
```

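#### Jitter plot

`geom_jitter()` draws points with a small random offset, so that observations sharing the same value don't hide each other:

```python
ggplot(mpg) + aes(x="class", y="cty") + geom_jitter()
```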

#### Boxplot + Jitter plot

```python
ggplot(mpg) + aes(x="class", y="cty") + geom_boxplot() + geom_jitter()
```

#### Histogram

```python
ggplot(mpg) + aes(x="cty") + geom_histogram(binwidth=5)
```
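`binwidth` sets the width of the interval each bar covers. As a rough sketch of the idea, the NumPy snippet below counts some hypothetical city-mileage values in 5-mpg bins. Plotnine chooses its own bin positions (adjustable with its `boundary` and `center` parameters), so its edges may not match these exactly:

```python
import numpy as np

# Hypothetical city-mileage values
cty = np.array([9, 11, 14, 15, 18, 19, 21, 28, 35])

binwidth = 5
# Bin edges every 5 mpg, starting at the smallest value
edges = np.arange(cty.min(), cty.max() + binwidth, binwidth)
counts, edges = np.histogram(cty, bins=edges)

# Each bin is half-open, except NumPy's last bin, which also includes its right edge
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"[{left}, {right}): {n}")
```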

#### Line

```python
from plotnine.data import economics  # just a sample dataset

ggplot(economics) + aes(x="date", y="pop") + geom_line()
```

### Statistical transformations

The quick example above already used one statistical transformation, `stat_smooth`, which fitted a linear model to the data and drew the fitted line, with its confidence interval, on the plot.

Here's another example, using `stat_count`:

```python
# stat_count counts the cars per manufacturer; the wider figure keeps all names legible
ggplot(mpg) + stat_count(mapping=aes(x="manufacturer")) + theme(figure_size=(11, 4.8))
```
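Conceptually, `stat_count` groups the rows by the x variable and counts them, and those counts become the bar heights. A pandas sketch of the same computation on a hypothetical miniature of `mpg`:

```python
import pandas as pd

# Hypothetical miniature of the mpg data: one row per car
cars = pd.DataFrame(
    {"manufacturer": ["audi", "audi", "ford", "toyota", "toyota", "toyota"]}
)

# Count the rows that fall into each x category, as stat_count does
counts = cars.groupby("manufacturer").size().reset_index(name="count")
print(counts)
```

Passing such a pre-counted frame to `geom_col(aes(x="manufacturer", y="count"))` should draw the same bars that `stat_count` (or `geom_bar`) computes for you.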

### Scales

Scales can make a plot more legible. Here's how we could turn the population line plot above into a more meaningful representation:

```python
(
    ggplot(economics)
    + aes(x="date", y="pop")
    # this scale transforms each point's x-value by computing its difference
    # from the oldest date in the dataset
    + scale_x_timedelta(name="Years since the first observation")
    + geom_line()
)
```
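If you'd rather work with a plain numeric axis, one alternative (a sketch, not what `scale_x_timedelta` does internally) is to compute the elapsed time yourself and map it with the default continuous scale. The dates below are hypothetical stand-ins for `economics["date"]`, and dividing days by 365.25 only approximates a year:

```python
import pandas as pd

# Hypothetical stand-ins for economics["date"]
dates = pd.Series(pd.to_datetime(["1967-07-01", "1968-07-01", "1977-07-01"]))

# Elapsed time since the oldest date, converted from days to approximate years
years_since_start = (dates - dates.min()).dt.days / 365.25
print(years_since_start.round(2).tolist())
```

Stored as a new column of `economics`, such a series could be mapped to `x` directly, and the axis would show plain numbers of years.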

### Facets

Facets split the data by a categorical variable and draw a separate panel for each category.

For example, we can take the `geom_jitter()` plot above, move `class` from the x axis into facets, and plot `cty` against an extra metric, `hwy`, in each panel:

```python
(
    ggplot(mpg)
    + facet_wrap("~class", nrow=2)
    + aes(x="cty", y="hwy")
    + geom_point()
)
```
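Faceting splits the data much like a pandas `groupby`: each panel receives only the rows for one category. Plotnine does this internally; the sketch below, on a hypothetical subset of `mpg`, only illustrates the idea:

```python
import pandas as pd

# Hypothetical subset of the mpg columns used above
cars = pd.DataFrame({
    "class": ["compact", "compact", "suv", "suv", "pickup"],
    "cty": [18, 21, 14, 13, 15],
    "hwy": [29, 29, 19, 17, 20],
})

# One entry per distinct class, holding only that class's rows:
# the subset a facet_wrap("~class") panel would plot
panels = {name: group[["cty", "hwy"]] for name, group in cars.groupby("class")}

for name, rows in panels.items():
    print(name, len(rows))
```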

We can also facet on a second variable to refine the data further. Here `facet_grid` adds the `year` variable, with one row of panels per year and one column per class:

```python
(
    ggplot(mpg)
    + facet_grid("year~class")  # rows ~ columns
    + aes(x="cty", y="hwy")
    + geom_point()
)
```
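In `facet_grid("year~class")`, the variable left of `~` defines the rows of the grid and the one on the right defines the columns. A pandas cross-tabulation on hypothetical data shows the resulting layout and how many points each panel would hold:

```python
import pandas as pd

# Hypothetical rows with the two faceting variables
cars = pd.DataFrame({
    "year": [1999, 1999, 2008, 2008, 2008],
    "class": ["compact", "suv", "compact", "suv", "suv"],
})

# Table rows = grid rows (year), table columns = grid columns (class),
# cells = number of points in each panel
panel_sizes = pd.crosstab(cars["year"], cars["class"])
print(panel_sizes)
```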

### Coordinate systems

For example, `coord_flip()` swaps the x and y axes, turning vertical bars into horizontal ones:

```python
(
    ggplot(mpg)
    + aes(x="class")
    + geom_bar()
    + coord_flip()
)
```

### Themes

Plotnine has built-in themes that you can use to style your plots globally.

The quick example above used `theme_xkcd()`, but there are others you can use (see all the themes in the [API reference](https://plotnine.readthedocs.io/en/stable/api.html#themes)):

```python
(
    ggplot(mpg)
    + facet_grid("year~class")
    + aes(x="cty", y="hwy")
    + geom_point()
    + theme_dark()
)
```

```python
(
    ggplot(mpg)
    + facet_grid("year~class")
    + aes(x="cty", y="hwy")
    + geom_point()
    + theme_minimal()
)
```

```python
(
    ggplot(mpg)
    + facet_grid("year~class")
    + aes(x="cty", y="hwy")
    + geom_point()
    + theme_seaborn()
)
```

### Customizations

#### `aes` - categorize by color

The `color` argument in `aes()` lets you add a third dimension to your plot by coloring points according to a category:

```python
(
    ggplot(mpg)
    + aes(x="cyl", y="hwy", color="class")
    + geom_point()
)
```

#### `aes` - categorize by shape

We can also categorize by shape:

```python
(
    ggplot(mpg)
    + aes(x="cyl", y="hwy", shape="class")
    + geom_point()
)
```

#### `aes` - categorize by alpha

Or even by alpha:

```python
(
    ggplot(mpg)
    + aes(x="cyl", y="hwy", alpha="class")
    + geom_point()
)
```

#### `aes` - size by quantity

The `size` argument in `aes()` adds a size dimension to your plot, based on a quantity:

```python
(
    ggplot(mpg)
    + aes(x="cyl", y="hwy", color="class", size="cty")
    + geom_point()
)
```

#### `geom`

Pass arguments to the chosen `geom` to customize it. Beware that fixed values set this way override the matching mappings in `aes`: here, the `color` and `size` arguments of `geom_point` replace the `color` and `size` mappings.

```python
(
    ggplot(mpg)
    # the color and size mappings are overridden by the fixed values in geom_point
    + aes(x="cyl", y="hwy", color="class", size="cty")
    + geom_point(color="cornflowerblue", alpha=0.5, size=0.75)
)
```

#### Labels

Adding `labs()` to a plot lets you edit its title, axis titles, and legend titles:

```python
(
    ggplot(mpg)
    + aes(x="cyl", y="hwy", color="class", size="cty")
    + labs(
        x="Engine Cylinders",
        y="Miles per Gallon",
        color="Vehicle Class",
        size="MPG - City",
        title="Miles per Gallon for Engine Cylinders and Vehicle Classes",
    )
    + geom_point(alpha=0.5)
)
```

#### Adding labels inside the plot

With `geom_text()`, you can add text labels inside the plot. In the example below, only cars with very low (`cty < 10`) or very high (`cty > 26`) city mileage are labeled with their manufacturer:

```python
(
    ggplot(mpg)
    + aes(x="cyl", y="hwy", color="class", size="cty")
    + labs(
        x="Engine Cylinders",
        y="Miles per Gallon - Highway",
        color="Vehicle Class",
        size="MPG - City",
        title="Miles per Gallon for Engine Cylinders and Vehicle Classes",
    )
    + geom_point(alpha=0.5)
    # label only the cars with extreme city mileage, nudged right of their points;
    # the fixed color and size override the inherited class and cty mappings
    + geom_text(
        aes(x="cyl", y="hwy", label="manufacturer"),
        data=mpg[(mpg["cty"] < 10) | (mpg["cty"] > 26)],
        color="grey",
        size=8,
        nudge_x=0.3,
    )
    + theme_classic()
    + theme(
        axis_line=element_line(color="grey"),
        axis_ticks=element_line(color="grey"),
    )
)
```

### Exporting plots

Plots can be exported to image files with the `save()` method; the file format is inferred from the file name's extension:

```python
my_plot = (
    ggplot(mpg)
    + aes(x="cyl", y="hwy", color="class")
    + labs(
        x="Engine Cylinders",
        y="Miles per Gallon",
        color="Vehicle Class",
        title="Miles per Gallon for Engine Cylinders and Vehicle Classes",
    )
    + geom_point()
)
my_plot.save("plotnine.png", dpi=100)
```