### Introduction to pdvega
Full article at [pbpython.com](http://pbpython.com/pdvega.html)

In [1]:
pip install pandas-flavor

In [None]:
conda install -c conda-forge pdvega

In [2]:
import pandas as pd
import pdvega

In [None]:
%matplotlib inline

Read in the FiveThirtyEight data on candy

In [None]:
df = pd.read_csv("https://github.com/fivethirtyeight/data/blob/master/candy-power-ranking/candy-data.csv?raw=True")

In [None]:
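# Clean up the broken apostrophe: replace every 'Õ' with a straight apostrophe.
# Assign the result back instead of using inplace=True on a column selection,
# which may not modify df in newer pandas versions.
df['competitorname'] = df['competitorname'].str.replace('Õ', "'", regex=False)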
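The stray `Õ` is most likely mojibake rather than a typo. In the Mac OS Roman encoding, byte `0xD5` is the right single quotation mark (`’`); in Latin-1 the same byte is `Õ`. That diagnosis is an assumption about how the CSV was produced, but Python's built-in codecs can confirm the byte mapping. `fix_apostrophe` is a hypothetical helper that applies the same substitution as the cleanup cell above, shown here on an example name.

```python
# Sketch: reproduce and repair the 'Õ' mojibake with the standard library only.
# Assumption: the CSV was saved as Mac OS Roman and later decoded as Latin-1.

CURLY_APOSTROPHE = "\u2019"  # RIGHT SINGLE QUOTATION MARK

# Mac Roman stores the curly apostrophe as byte 0xD5; Latin-1 reads 0xD5 as 'Õ'.
garbled = CURLY_APOSTROPHE.encode("mac_roman").decode("latin-1")
print(garbled == "\u00d5")  # True


def fix_apostrophe(name: str) -> str:
    """Hypothetical helper: swap the mojibake 'Õ' for a straight apostrophe."""
    return name.replace("\u00d5", "'")


print(fix_apostrophe("Reese\u00d5s"))  # Reese's
```

Reversing the round trip with `'Õ'.encode('latin-1').decode('mac_roman')` would restore the curly apostrophe rather than a straight one.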

In [None]:
df.head()

Try a pandas plot first

In [None]:
df["winpercent"].plot.hist()

Try the same thing using pdvega

In [None]:
df["winpercent"].vgplot.hist()
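Both histogram calls first count the values into bins. pandas' `plot.hist` defaults to `bins=10`, using equal-width bins that span the data's minimum to maximum. Vega-Lite, which pdvega targets, rounds its bin boundaries to "nice" values by default, so the two charts' bars may not line up exactly. The sketch below shows plain equal-width binning, following NumPy's convention that the last bin is closed on the right. It illustrates the idea; it is not the code either library runs.

```python
# Sketch of equal-width histogram binning, similar in spirit to numpy.histogram.
from typing import List, Sequence, Tuple


def histogram(values: Sequence[float], bins: int = 10) -> Tuple[List[int], List[float]]:
    """Return (counts, edges) for `bins` equal-width bins from min to max."""
    lo, hi = min(values), max(values)
    if lo == hi:  # degenerate case: widen the range so the values fall in one bin
        lo, hi = lo - 0.5, hi + 0.5
    width = (hi - lo) / bins
    edges = [lo + (hi - lo) * i / bins for i in range(bins + 1)]
    counts = [0] * bins
    for v in values:
        # Bins are half-open [a, b), except the last one, which also includes hi.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts, edges


counts, edges = histogram([0.1, 0.2, 0.2, 0.5, 0.9, 1.0], bins=3)
print(counts)  # [3, 1, 2]
```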

KDE plots work as expected

In [None]:
df["sugarpercent"].vgplot.kde()
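A kernel density estimate places a small Gaussian bump on each observation and sums the bumps into a smooth curve. pandas' `plot.kde` is built on `scipy.stats.gaussian_kde`, whose default bandwidth follows Scott's rule. For a single variable, that rule is the sample standard deviation times n^(-1/5). The sketch below reproduces the idea with the standard library only. The function name `gaussian_kde` mirrors SciPy's, but this is a simplified stand-in, not the code pandas or pdvega calls.

```python
# Stdlib sketch of a 1-D Gaussian kernel density estimate using Scott's rule.
import math
import statistics
from typing import Callable, Sequence


def gaussian_kde(data: Sequence[float]) -> Callable[[float], float]:
    """Return a function that evaluates the density estimate at a point x."""
    n = len(data)
    # Scott's rule in one dimension: sample std (ddof=1) times n ** (-1/5).
    h = statistics.stdev(data) * n ** (-1 / 5)
    norm = 1.0 / (n * h * math.sqrt(2 * math.pi))

    def density(x: float) -> float:
        # Sum of Gaussian kernels centred on each observation.
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

    return density


kde = gaussian_kde([0.2, 0.4, 0.4, 0.5, 0.9])  # toy values
print(round(kde(0.4), 3))
```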

We can look at the sugar and price percentile distributions

In [None]:
df["sugarpercent"].vgplot.hist()

In [None]:
df["pricepercent"].vgplot.hist()

In [None]:
df[["sugarpercent", "pricepercent"]].vgplot.hist()

Compare it to the pure pandas example

In [None]:
df[["sugarpercent", "pricepercent"]].plot.hist(alpha=0.5)
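Overlaid histograms are only comparable bar for bar if both columns use the same bin edges. As far as I can tell from the pandas plotting code, `DataFrame.plot.hist` computes one set of edges from all selected columns combined. Setting `alpha=0.5` then makes the overlapping bars translucent. The sketch below illustrates the shared-edge idea with toy columns `a` and `b`. The function `shared_histograms` is a hypothetical name.

```python
# Sketch: equal-width bins computed over several columns at once, so the
# per-column counts can be overlaid and compared bar for bar.
from typing import Dict, List, Sequence, Tuple


def shared_histograms(
    columns: Dict[str, Sequence[float]], bins: int = 10
) -> Tuple[List[float], Dict[str, List[int]]]:
    """Return shared bin edges plus a list of counts for each named column."""
    everything = [v for values in columns.values() for v in values]
    lo, hi = min(everything), max(everything)
    width = (hi - lo) / bins if hi > lo else 1.0
    edges = [lo + width * i for i in range(bins + 1)]
    counts: Dict[str, List[int]] = {}
    for name, values in columns.items():
        c = [0] * bins
        for v in values:
            c[min(int((v - lo) / width), bins - 1)] += 1  # last bin includes hi
        counts[name] = c
    return edges, counts


edges, counts = shared_histograms({"a": [0.0, 0.35, 1.0], "b": [0.15, 0.6, 0.6]}, bins=4)
print(counts)  # {'a': [1, 1, 0, 1], 'b': [1, 0, 2, 0]}
```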

Let's try some scatter plots

In [None]:
df.vgplot.scatter(x='pricepercent', y='sugarpercent')

In [None]:
df.vgplot.scatter(x='winpercent', y='sugarpercent')

The pandas version does not look as nice

In [None]:
df.plot.scatter(x='winpercent', y='sugarpercent', c='bar')

pdvega supports encoding the point size and color based on values in columns of the dataframe

In [None]:
df.vgplot.scatter(x='winpercent', y='sugarpercent', s='pricepercent', c='bar')
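pdvega renders its charts as Vega-Lite specifications, and that is how a column can be mapped to the x, y, size, and color channels. The dictionary below is a hand-written sketch of such a spec for the call above, built from two toy rows. It follows the public Vega-Lite encoding schema, but the exact spec pdvega emits (mark type, field types, data format) may differ. Treating `bar` as `nominal` is an assumption made here because the column holds 0/1 flags.

```python
# Hand-written Vega-Lite spec (as a Python dict) for a scatter plot that
# encodes two extra columns as point size and color.
import json

rows = [  # toy rows, not taken from the candy dataset
    {"winpercent": 45.0, "sugarpercent": 0.3, "pricepercent": 0.5, "bar": 1},
    {"winpercent": 60.0, "sugarpercent": 0.7, "pricepercent": 0.2, "bar": 0},
]

spec = {
    "data": {"values": rows},
    "mark": "circle",
    "encoding": {
        "x": {"field": "winpercent", "type": "quantitative"},
        "y": {"field": "sugarpercent", "type": "quantitative"},
        "size": {"field": "pricepercent", "type": "quantitative"},
        "color": {"field": "bar", "type": "nominal"},  # 0/1 flag as a category
    },
}

print(json.dumps(spec, indent=2))
```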

The scatter matrix is really helpful for seeing all of the pairwise relationships at once

In [None]:
pdvega.scatter_matrix(df[["sugarpercent", "winpercent", "pricepercent"]], "winpercent")
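A scatter matrix shows each pairwise relationship as a plot. A correlation matrix summarizes the linear part of each pair as a single number; this is what pandas' `df.corr()` returns, using Pearson correlation by default. The stdlib sketch below computes a Pearson matrix for a few named columns of made-up values. `pearson` and `corr_matrix` are hypothetical helper names.

```python
# Stdlib sketch of a Pearson correlation matrix, a numeric companion to a
# scatter matrix (pandas' df.corr() defaults to the Pearson method).
import math
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; undefined (division by zero) for a constant column."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def corr_matrix(columns: Dict[str, Sequence[float]]) -> Dict[str, Dict[str, float]]:
    """Correlation of every column against every other column."""
    return {a: {b: pearson(columns[a], columns[b]) for b in columns} for a in columns}


toy = {"a": [1.0, 2.0, 3.0, 4.0], "b": [2.0, 4.0, 6.0, 8.0], "c": [4.0, 3.0, 2.0, 1.0]}
for name, row in corr_matrix(toy).items():
    print(name, {k: round(v, 2) for k, v in row.items()})
```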

Here's a simple bar chart.
Unfortunately, I could not figure out how to sort the bars by winpercent.

In [None]:
df.sort_values(by=['winpercent'], ascending=False).head(10)
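`sort_values(..., ascending=False).head(n)` picks the n highest-ranked rows. pandas also offers `DataFrame.nlargest(n, 'winpercent')` for the same job. In plain Python, `heapq.nlargest` with a key function makes the equivalent selection, shown here on toy records whose names and scores are made up. Rows with tied scores may come out in a different order than pandas would give.

```python
# Top-n selection in plain Python, equivalent in effect to
# sort_values(by=..., ascending=False).head(n).
import heapq

records = [  # toy data for illustration only
    {"competitorname": "Candy A", "winpercent": 52.3},
    {"competitorname": "Candy B", "winpercent": 81.9},
    {"competitorname": "Candy C", "winpercent": 34.5},
    {"competitorname": "Candy D", "winpercent": 76.7},
]

top = heapq.nlargest(2, records, key=lambda r: r["winpercent"])
print([r["competitorname"] for r in top])  # ['Candy B', 'Candy D']
```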

In [None]:
df.sort_values(by=['winpercent'], ascending=False).head(15).plot.barh(x='competitorname', y='winpercent')

In [None]:
df.sort_values(by=['winpercent'], ascending=False).head(15).vgplot.barh(x='competitorname', y='winpercent')
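One likely reason the pdvega bar chart ignores the row order is that Vega-Lite sorts a nominal axis in ascending (alphabetical) order by default, regardless of the order of the input rows. Vega-Lite lets an encoding be sorted by another field instead. The hand-written spec below asks for the `competitorname` axis to be ordered by `winpercent`, highest first. It is not clear whether pdvega's `barh` exposes a way to pass this option through, so treat this as a sketch of the underlying Vega-Lite setting, using toy rows as data.

```python
# Hand-written Vega-Lite spec for a horizontal bar chart whose category axis
# is sorted by another field instead of alphabetically (the default).
import json

rows = [  # toy rows for illustration only
    {"competitorname": "Candy A", "winpercent": 52.3},
    {"competitorname": "Candy B", "winpercent": 81.9},
    {"competitorname": "Candy C", "winpercent": 34.5},
]

spec = {
    "data": {"values": rows},
    "mark": "bar",
    "encoding": {
        "x": {"field": "winpercent", "type": "quantitative"},
        "y": {
            "field": "competitorname",
            "type": "nominal",
            # Order categories by winpercent, highest first; with one row per
            # candy, the "sum" aggregate is just that row's value.
            "sort": {"field": "winpercent", "op": "sum", "order": "descending"},
        },
    },
}

print(json.dumps(spec, indent=2))
```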