# NB: GGPlot in Python with Plotnine

## GGPlot in Python

There are two ports of GGPlot2 to Python: `pygg` and `plotnine`.

The first appears to be no longer actively developed and is much less widely used.

Let's look at Plotnine.

In [None]:
## ! conda install -c conda-forge plotnine -y

In [None]:
import pandas as pd
import numpy as np
from pandas.api.types import CategoricalDtype

In [None]:
from plotnine import *
from plotnine.data import mpg

Our old friend, `mpg` in Python:

In [None]:
mpg

## A Simple Bar Chart

In [None]:
(ggplot(mpg)            # defining what data to use
    + aes(x='class')    # defining what variable to use
    + geom_bar(size=20) # defining the type of plot to use
)

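Notice that `aes()` is added here with `+` as its own component, rather than passed as an argument to `ggplot()` as is common in R.

Also, dots in R argument names become underscores in Plotnine (for example, `legend.position` becomes `legend_position`).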

Note that we don't have to use the syntax above, which groups the functions in a single expression with `(...)`.

We can do this:

In [None]:
ggplot(mpg) + aes(x='class') + geom_bar(size=20)

Or this:

In [None]:
ggplot(mpg) + \
    aes(x='class') + \
    geom_bar(size=20)

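Note that none of these look exactly like R, because the white space rules differ: Python will not continue an expression onto the next line unless it is wrapped in parentheses or the line ends with a backslash.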

## Faceting

In [None]:
ggplot(mpg) + \
    aes(x='drv', y='cty', color='class', size='cyl') + \
    geom_point()

In [None]:
(ggplot(mpg)         
 + aes(x='drv', y='cty', color='class', size='cyl')
 + geom_point()
 + facet_wrap('class')
 + theme(legend_position="none")
)

## The Pandas Way

Note that GGPlot computed the counts for us inside `geom_bar()`. In Pandas, we compute them ourselves before plotting:

In [None]:
mpg['class'].value_counts().sort_index().plot.bar(rot=45)

However, sometimes Pandas *does* do internal calculations, as with `.hist()`:

In [None]:
mpg['cty'].hist()
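To make the hidden step visible, here is a minimal sketch of the binning that a histogram performs before anything is drawn. It assumes a small, made-up series standing in for a numeric column like `mpg['cty']`, and uses `np.histogram` with 10 bins to mirror the default bin count of `.hist()`. The names `cty` and `binned` are chosen just for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a numeric column such as mpg['cty'].
cty = pd.Series([18, 21, 20, 21, 16, 18, 18, 18, 16, 20,
                 19, 15, 17, 17, 15, 15, 17, 16, 14, 11])

# Data operation: split the value range into 10 equal-width bins
# and count how many values fall into each one.
counts, edges = np.histogram(cty, bins=10)

# Collect the result in a table, one row per bin.
binned = pd.DataFrame({
    "left": edges[:-1],    # lower edge of each bin
    "right": edges[1:],    # upper edge of each bin
    "count": counts,       # number of values in the bin
})
print(binned)
```

Once the counts exist as a table, drawing them is a plain bar chart with no further computation.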

For **faceting in Pandas**, see this: https://stackoverflow.com/questions/29786227/how-do-i-plot-facet-plots-in-pandas

Notice that the approach there is essentially a `.groupby()` followed by `.unstack()`.

So, Pandas expects you to do the data transformations upfront.
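As a rough sketch of that upfront transformation, the example below counts rows per pair of categories and pivots one category into columns, so that each column can become its own panel. The tiny `cars` frame is invented for illustration and is not the real `mpg` data; `plot_facets` is a hypothetical helper name.

```python
import pandas as pd

# Hypothetical miniature of the mpg data: only the two columns we need.
cars = pd.DataFrame({
    "class": ["compact", "compact", "suv", "suv", "suv", "pickup"],
    "drv":   ["f", "f", "4", "4", "r", "4"],
})

# Count rows per (class, drv) pair, then pivot drv into columns.
# fill_value=0 covers combinations that never occur in the data.
counts = cars.groupby(["class", "drv"]).size().unstack(fill_value=0)
print(counts)


def plot_facets(table):
    """Draw one bar-chart panel per column of the wide table (needs matplotlib)."""
    return table.plot.bar(subplots=True, sharey=True, legend=False, rot=45)
```

Calling `plot_facets(counts)` in a notebook should then draw one panel per `drv` value, with the reshaping already done before the plotting call.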

GGPlot2 will handle these in the geometries and facets.

GGPlot2 is easier, but Pandas separates concerns.

**As a rule, data operations should never take place in the visualization**.
