# The Python (Data Science) Plotting Ecosystem

If you've heard about any plotting library in Python, it is almost certainly `matplotlib`. matplotlib is not only the library most people see first when learning to plot in Python, but it also actually underlies a number of other popular libraries like `plotnine` and `seaborn`. 

While very flexible, matplotlib is not the most user-friendly library. Most of its focus is on the lower-level aspects of implementing visualizations from a programming perspective, and it lacks easy-to-use tools for quickly making the common types of figures (scatter plots, linear fits, histograms, etc.) that data scientists so often need.

To illustrate, here's the code needed to plot a scatter plot of points along with a linear regression fit overlay in matplotlib:

```python
import matplotlib.pyplot as plt
import numpy as np

# Example data, made up here so the snippet runs; swap in your own x and y
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2 * x + rng.normal(0, 3, size=50)

# Initialize layout
fig, ax = plt.subplots(figsize=(9, 9))

# Add scatterplot
ax.scatter(x, y, s=60, alpha=0.7, edgecolors="k")

# Fit linear regression via least squares with numpy.polyfit
# It returns a slope (b) and an intercept (a)
# deg=1 means linear fit (i.e. polynomial of degree 1)
b, a = np.polyfit(x, y, deg=1)

# Create sequence of 100 numbers from 0 to 10
xseq = np.linspace(0, 10, num=100)

# Plot regression line
ax.plot(xseq, a + b * xseq, color="k", lw=2.5);
```

to generate:

![linear_regression_matplotlib](images/linear_regression_matplotlib.png)



### matplotlib Alternatives

With that in mind, several other packages have been created to make plotting in Python easier for data scientists. Unlike matplotlib, where you have to think in terms of geometric objects and axes, the three alternatives covered here (seaborn, plotnine, and altair) all allow for higher-level, more "declarative" code to make scatter plots, histograms, kernel densities, etc.

#### seaborn

The first of these is [seaborn](https://seaborn.pydata.org/). seaborn is actually built on top of matplotlib, but provides simple declarative functions for generating data science figures, such as `regplot` to plot a linear regression fit, or `histplot` to plot a histogram.

To illustrate, here's a linear regression fit overlaying a scatter plot:

```python
import seaborn as sns
our_data = sns.load_dataset("tips")  # assumed: seaborn's "tips" example dataset
sns.regplot(x="total_bill", y="tip", data=our_data)
```

![linear_regression_seaborn](images/linear_regression_seaborn.png)


#### plotnine (i.e. ggplot2 in Python)

plotnine is a wonderful library that re-implements almost the entire API of the much-loved ggplot2 plotting library from R. Like seaborn, plotnine is built on top of matplotlib, and like seaborn it provides a much more user-friendly experience for data scientists.

Unlike seaborn, where each type of visualization supported gets its own function, plotnine has a composable API that, like the altair library we'll focus on in this course, embodies the logic of how we use visualizations to communicate information about our data. As a result, plotnine syntax seems a little more verbose than that of seaborn, but I would argue that in the long run it's much more powerful. 

To plot our simple linear regression figure in plotnine, we'd run:

```python
import plotnine as p9
from plotnine.data import mtcars  # example dataset bundled with plotnine

(p9.ggplot(mtcars, p9.aes('wt', 'mpg'))
 + p9.geom_point()
 + p9.stat_smooth(method='lm')
)
```

![linear_regression_plotnine](images/linear_regression_plotnine.png)



## Altair

Having provided an overview of some of the other major plotting libraries in Python, we now turn to the library we'll be focused on in this course: altair. 

Like seaborn and plotnine, altair is a library designed to make plotting in Python a more intuitive experience than one gets with matplotlib. And like plotnine, altair's design is meant to embody one model of the logic of how we communicate with visualizations, known as the *grammar of graphics*.

But altair also has some substantive differences from the libraries discussed above. First, it has no connection to matplotlib, unlike seaborn or plotnine. Altair actually generates JavaScript-based visualizations, which means that it is web native and creates interactive visualizations very easily.

Moreover, unlike plotnine -- which I have actually taught in this course in the past -- the documentation and support for altair are very good, and to be honest I think the way it embodies the logic of visualizations is a little more intuitive than the way plotnine/ggplot2 does.

With that said, there are some tradeoffs with altair. It doesn't come with *quite* as much built-in support for easy statistical modelling. And as we'll discuss below, because the figures it generates are inherently implemented in JavaScript, they can be a little more finicky to work with.