# New Advances in Python Data Visualization

This presentation runs six Python data visualization libraries through a series of tests to see which are the most performant, which are the easiest to use, and which produce the most beautiful visuals.

### The Old Standards
* matplotlib
* seaborn

### The New Generation
* Bokeh
* plotnine
* Altair
* Plotly

In [1]:
import matplotlib.pyplot as plt
import seaborn as sns

import bokeh as bk
from bokeh.io import show, output_notebook
from bokeh.plotting import figure

from plotnine import *
import plotnine.options as pno

import altair as alt

import plotly.express as px
import plotly.graph_objects as go

import pandas as pd
import numpy as np

## Dataset

For these tests we'll use the Kaggle Spotify Tracks dataset. It has a mix of numeric and categorical columns and roughly 170,000 rows (169,909, per the `describe()` output below), so it gives a reasonable test of the high data volumes that a robust tool ought to handle well.

In [2]:
dataset = pd.read_csv("data.csv")
dataset.head()

| | acousticness | artists | danceability | duration_ms | energy | explicit | id | instrumentalness | key | liveness | loudness | mode | name | popularity | release_date | speechiness | tempo | valence | year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.995 | ['Carl Woitschach'] | 0.708 | 158648 | 0.195 | 0 | 6KbQ3uYMLKb5jDxLF7wYDD | 0.563 | 10 | 0.151 | -12.428 | 1 | Singende Bataillone 1. Teil | 0 | 1928 | 0.0506 | 118.469 | 0.779 | 1928 |
| 1 | 0.994 | ['Robert Schumann', 'Vladimir Horowitz'] | 0.379 | 282133 | 0.0135 | 0 | 6KuQTIu1KoTTkLXKrwlLPV | 0.901 | 8 | 0.0763 | -28.454 | 1 | Fantasiestücke, Op. 111: Più tosto lento | 0 | 1928 | 0.0462 | 83.972 | 0.0767 | 1928 |
| 2 | 0.604 | ['Seweryn Goszczyński'] | 0.749 | 104300 | 0.22 | 0 | 6L63VW0PibdM1HDSBoqnoM | 0.0 | 5 | 0.119 | -19.924 | 0 | Chapter 1.18 - Zamek kaniowski | 0 | 1928 | 0.929 | 107.177 | 0.88 | 1928 |
| 3 | 0.995 | ['Francisco Canaro'] | 0.781 | 180760 | 0.13 | 0 | 6M94FkXd15sOAOQYRnWPN8 | 0.887 | 1 | 0.111 | -14.734 | 0 | Bebamos Juntos - Instrumental (Remasterizado) | 0 | 1928-09-25 | 0.0926 | 108.003 | 0.72 | 1928 |
| 4 | 0.99 | ['Frédéric Chopin', 'Vladimir Horowitz'] | 0.21 | 687733 | 0.204 | 0 | 6N6tiFZ9vLTSOIxkj8qKrd | 0.908 | 11 | 0.098 | -16.829 | 1 | Polonaise-Fantaisie in A-Flat Major, Op. 61 | 1 | 1928 | 0.0424 | 62.149 | 0.0693 | 1928 |


In [3]:
dataset.describe()


| | acousticness | danceability | duration_ms | energy | explicit | instrumentalness | key | liveness | loudness | mode | popularity | speechiness | tempo | valence | year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 169909.0 | 169909.0 | 169909.0 | 169909.0 | 169909.0 | 169909.0 | 169909.0 | 169909.0 | 169909.0 | 169909.0 | 169909.0 | 169909.0 | 169909.0 | 169909.0 | 169909.0 |
| mean | 0.493214 | 0.53815 | 231406.2 | 0.488593 | 0.084863 | 0.161937 | 5.200519 | 0.20669 | -11.370289 | 0.708556 | 31.55661 | 0.094058 | 116.948017 | 0.532095 | 1977.223231 |
| std | 0.376627 | 0.175346 | 121321.9 | 0.26739 | 0.278679 | 0.309329 | 3.515257 | 0.176796 | 5.666765 | 0.454429 | 21.582614 | 0.149937 | 30.726937 | 0.262408 | 25.593168 |
| min | 0.0 | 0.0 | 5108.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -60.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1921.0 |
| 25% | 0.0945 | 0.417 | 171040.0 | 0.263 | 0.0 | 0.0 | 2.0 | 0.0984 | -14.47 | 0.0 | 12.0 | 0.0349 | 93.516 | 0.322 | 1957.0 |
| 50% | 0.492 | 0.548 | 208600.0 | 0.481 | 0.0 | 0.000204 | 5.0 | 0.135 | -10.474 | 1.0 | 33.0 | 0.045 | 114.778 | 0.544 | 1978.0 |
| 75% | 0.888 | 0.667 | 262960.0 | 0.71 | 0.0 | 0.0868 | 8.0 | 0.263 | -7.118 | 1.0 | 48.0 | 0.0754 | 135.712 | 0.749 | 1999.0 |
| max | 0.996 | 0.988 | 5403500.0 | 1.0 | 1.0 | 1.0 | 11.0 | 1.0 | 3.855 | 1.0 | 100.0 | 0.969 | 244.091 | 1.0 | 2020.0 |
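
Since the tests lean on the mix of numeric and categorical columns, it can help to list which columns fall into each group before plotting. The following is a minimal sketch of one way to do that with pandas; the helper name `split_columns` is hypothetical, and the small frame reuses a few values from the `head()` output instead of loading `data.csv`.

```python
import pandas as pd


def split_columns(df: pd.DataFrame) -> tuple[list[str], list[str]]:
    """Return (numeric, non_numeric) column names, in their original order."""
    numeric = df.select_dtypes(include="number").columns.tolist()
    other = df.select_dtypes(exclude="number").columns.tolist()
    return numeric, other


# Small stand-in frame built from values shown in the head() output.
sample = pd.DataFrame({
    "name": ["Singende Bataillone 1. Teil", "Chapter 1.18 - Zamek kaniowski"],
    "energy": [0.195, 0.22],
    "key": [10, 5],
})

numeric_cols, other_cols = split_columns(sample)
print("rows:", len(sample))
print("numeric:", numeric_cols)
print("non-numeric:", other_cols)
```

On the full dataset the same call would be `split_columns(dataset)`.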


## The Competition

We'll start with simple visuals and gradually move on to harder, more complicated techniques.

We're testing to decide which tool:
* is easiest to write
* produces shorter code
* produces readable code, with predictable grammar
* renders beautiful results naturally
* has robust capabilities
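
The introduction also names performance as a goal. The notebooks' own timing method is not shown here, so the following is only a sketch of how render time might be compared, assuming each library's plot is wrapped in a zero-argument function. The helper `time_render` is a hypothetical name, and the workload below is a stand-in for real plotting code.

```python
import time
from typing import Callable


def time_render(plot_fn: Callable[[], object], repeats: int = 3) -> float:
    """Return the best wall-clock time in seconds over `repeats` calls of plot_fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        plot_fn()
        best = min(best, time.perf_counter() - start)
    return best


# Stand-in workload; in practice this would build and render one chart.
elapsed = time_render(lambda: sum(range(100_000)))
print(f"best of 3 runs: {elapsed:.4f} s")
```

Taking the best of several runs reduces noise from one-off costs such as the first import or a cold cache.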

As a bonus, some of these tools support interactivity, but a lack of it won't count as a deal breaker.


Other factors:
* can it do minor data manipulation to make visualizing easier?
* grammar approach: adding layers to an object and then calling `show`, versus building a single layered object and evaluating its name to display it

## See the Tests

* [Histogram](histogram.ipynb)
* [Scatterplot](scatter.ipynb)
* [Faceted Scatterplot](facets.ipynb)
* [Grouped Bar](groupbar.ipynb)
* [Time Series Line](timeline.ipynb)
* BONUS: [3D Scatterplot](scatter3d.ipynb)