## 3c. More ways to plot

### Table of contents

- Pandas
  - Plotting Shortcuts
  - Table Styling


- Other Packages
  - Venn Diagrams
  - Joyplots
  - Network Graphs
  - More Types

---

In [None]:
import pandas as pd  # we'll use pandas to read and manipulate datasets
import numpy as np

import warnings
warnings.simplefilter('ignore')

import matplotlib.pyplot as plt
# display figures alongside cell output
%matplotlib inline

import matplotlib

**ℹ️ Tip**: the following cell is deliberately separate from the previous one. A small bug prevents it from being executed correctly if the two are run together. This is not limited to Jupyter notebooks.

In [None]:
matplotlib.rcParams['figure.dpi'] = 100  # make figures large
%config InlineBackend.figure_format = 'retina'  # make figures crisp

## Pandas

The average daily temperature from Jan 2018 to Jan 2019, as reported by AccuWeather:

In [None]:
weather = pd.read_csv('example_files/weather.csv')
weather.head()

In [None]:
month_names = 'January February March April May June July August September October November December'.split()
cities = ['New York City', 'Los Angeles']

### Plotting Shortcuts

Pandas DataFrames integrate directly with Matplotlib, providing convenient plotting shortcuts:

In [None]:
weather[weather.month == 1][cities].plot()

# continue customizing the chart
plt.title('January Temperature')
plt.xlabel('Day of Month')
plt.ylabel('Temperature (°F)');

They make labeling, grouping and other tedious tasks easier:

In [None]:
weather.groupby('month')[cities].mean().plot(
    kind='barh',  # horizontal bar chart
    figsize=(4, 7),
    title='Average Temperatures',
)

plt.gca().set_yticklabels(month_names)
plt.gca().legend(bbox_to_anchor=(1.025, 1))
plt.xlabel('Temperature (°F)');
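
To see what the grouped call hands to Matplotlib, here is a minimal sketch on a small made-up table (the `toy` frame and its values are assumptions, not the AccuWeather data). `groupby('month').mean()` yields one row per month and one column per city, and `.plot(kind='barh')` draws one bar per cell:

```python
import pandas as pd

# hypothetical stand-in for the weather table: two months, two days each
toy = pd.DataFrame({
    'month':         [1, 1, 2, 2],
    'New York City': [30.0, 34.0, 36.0, 40.0],
    'Los Angeles':   [58.0, 62.0, 60.0, 64.0],
})

monthly_means = toy.groupby('month')[['New York City', 'Los Angeles']].mean()

# one group of bars per month (index), one bar per city (column)
ax = monthly_means.plot(kind='barh', title='Average Temperatures (toy data)')
ax.set_xlabel('Temperature (°F)');
```

The index of the grouped frame becomes the tick labels, which is why the cell above only has to swap month numbers for month names.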

Most non-specialty chart types are supported:

In [None]:
weather['Los Angeles'].plot(kind='hist')

plt.title('Year-long Temperature Distribution')
plt.xlabel('Temperature (°F)')
plt.ylabel('#Days observed');

**ℹ️ Tip**: [read more](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.html) about supported chart types and options
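
As a hedged illustration of switching chart types, the sketch below draws a box plot from randomly generated temperatures (the `toy` frame is an assumption, not the real dataset). Only the `kind` argument changes:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# hypothetical daily temperatures, generated for illustration only
toy = pd.DataFrame({
    'New York City': rng.normal(55, 15, size=100),
    'Los Angeles':   rng.normal(65, 6, size=100),
})

# kind='box' is equivalent to the accessor form toy.plot.box(...)
ax = toy.plot(kind='box', title='Temperature Spread (toy data)')
ax.set_ylabel('Temperature (°F)');
```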

### Table Styling

Lightweight visualizations can also be incorporated directly inside tables:

In [None]:
df = pd.DataFrame(np.random.randn(7, 3), columns=list('ABC'))
df.iloc[1, 1] = np.nan

df

Set a caption for your table:

In [None]:
df.style.set_caption('Example Data')

Modify the precision:

In [None]:
df.round(3)

Set global options:

In [None]:
pd.set_option('display.precision', 2)  # full option name; the bare 'precision' can be ambiguous in newer pandas versions

**ℹ️ Tip**: [read more](https://pandas.pydata.org/pandas-docs/stable/user_guide/options.html) about Pandas options
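
The two approaches differ in what they change: `round` returns new data, while the display option only affects how values are printed. A small sketch, assuming a made-up one-column frame and using `pd.option_context` to keep the setting temporary:

```python
import pandas as pd

toy = pd.DataFrame({'A': [1.23456, 2.34567]})  # made-up values

rounded = toy.round(3)  # a new DataFrame whose stored values are rounded

# the option only affects how values are printed, and only inside the block
with pd.option_context('display.precision', 2):
    shown = repr(toy)

print(rounded['A'].tolist())  # [1.235, 2.346]
print(toy['A'].tolist())      # [1.23456, 2.34567] (unchanged)
```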

---

Change the style of specific elements:

In [None]:
df.style.highlight_null()

Restrict to only a subset of rows/columns:

In [None]:
df.style.highlight_max(subset=['A', 'B'], axis=0)

Arbitrary functions and function chaining:

In [None]:
def highlight_negatives(val):
    """ Make negative values bold red """
    color = 'red' if val < 0 else 'black'
    weight = 'bold' if val < 0 else 'normal'
    return f'color: {color}; font-weight: {weight}'  # css syntax

In [None]:
# Styler.set_precision was removed in pandas 2.0; format(precision=...) replaces it
df.style\
    .format(precision=3)\
    .applymap(highlight_negatives)
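
`applymap` calls the function once per cell. For rules that depend on a whole column, `Styler.apply` instead takes a function that receives a Series and returns one CSS string per element. The sketch below (the helper `highlight_above_mean` and the `toy` frame are illustrative assumptions) calls such a function directly so the generated CSS is visible:

```python
import pandas as pd

def highlight_above_mean(col):
    """ Return one CSS string per value: bold if above the column mean """
    mean = col.mean()
    return ['font-weight: bold' if v > mean else '' for v in col]

toy = pd.DataFrame({'A': [1.0, 2.0, 6.0], 'B': [5.0, 1.0, 3.0]})  # made-up values

css_a = highlight_above_mean(toy['A'])
css_b = highlight_above_mean(toy['B'])
print(css_a)  # ['', '', 'font-weight: bold']
print(css_b)  # ['font-weight: bold', '', '']
```

Passing the same function as `toy.style.apply(highlight_above_mean, axis=0)` applies these strings column by column.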

---

Even inline charts:

In [None]:
df.style.bar(subset='C')

In [None]:
df.style.background_gradient(cmap='Greens')

**ℹ️ Tip**: read more about [table styling](https://pandas.pydata.org/pandas-docs/stable/user_guide/style.html)

**ℹ️ Tip**: watch a [short animation](http://i.imgur.com/ZY8dKpA.gif) on (slightly overdone) table styling

## Other Packages

While Matplotlib is the most widely used plotting library (followed by Seaborn), there are many others, most with overlapping functionality (line charts, bar charts, etc.). Some, however, offer specialized kinds of visualizations.

### Venn Diagrams

Show logical relations between a finite collection of sets:

In [None]:
from matplotlib_venn import venn2

In [None]:
venn2(subsets=(10, 5, 2), set_labels=('Group A', 'Group B'));
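
The three numbers passed as `subsets` are region sizes, in the order: only in the first set, only in the second set, in both. A small sketch with made-up memberships shows how they could be derived from Python sets:

```python
# hypothetical group memberships
group_a = {'ana', 'bob', 'cat', 'dan'}
group_b = {'cat', 'dan', 'eve'}

subsets = (
    len(group_a - group_b),  # only in A
    len(group_b - group_a),  # only in B
    len(group_a & group_b),  # in both
)
print(subsets)  # (2, 1, 2)
```

`venn2` also accepts the two sets directly, as in `venn2([group_a, group_b])`, and works out the region sizes itself.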

### Joyplots

Joyplots show distributions over an ordinal variable or discretized time:

In [None]:
from joypy import joyplot

**👾 Trivia**: joyplots got their name from Joy Division's [album](https://itunes.apple.com/us/album/unknown-pleasures-remastered/544363171), which used such a plot as its cover. They are also known as ridgeline plots (or ridgeplots), and were more recently popularized by TensorFlow's display of weight distributions over time.

In [None]:
fig, axes = joyplot(
    weather, by='month', column=['New York City', 'Los Angeles'],
    alpha=.75, range_style='own', grid='y', linecolor='white', 
    figsize=(8, 10), title='Monthly Temperature', legend=True,
)

axes[-1].set_xlabel('Temperature (°F)')
for month, ax in zip(month_names, axes):
    ax.set_yticklabels([month])
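
To make the idea behind the chart concrete, here is a rough sketch of a ridgeline plot built with plain Matplotlib. It is a simplification under stated assumptions: the samples are randomly generated, and each ridge is a normalized histogram rather than the smoothed density estimate that `joyplot` draws.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# hypothetical samples: four "months", each warmer than the last
samples = {month: rng.normal(loc=40 + 8 * i, scale=5, size=300)
           for i, month in enumerate(['Jan', 'Feb', 'Mar', 'Apr'])}

bins = np.linspace(20, 90, 36)        # shared bins keep the ridges comparable
centers = (bins[:-1] + bins[1:]) / 2
step = 0.06                           # vertical distance between ridges

fig, ax = plt.subplots(figsize=(5, 4))
for i, (month, values) in enumerate(samples.items()):
    density, _ = np.histogram(values, bins=bins, density=True)
    baseline = -i * step              # later months are drawn lower
    ax.fill_between(centers, baseline, baseline + density, alpha=.75, zorder=i)

ax.set_yticks([-i * step for i in range(len(samples))])
ax.set_yticklabels(list(samples))
ax.set_xlabel('Temperature (°F)');
```

Each group gets its own baseline, so the distributions overlap slightly and the drift across the ordinal variable is easy to follow.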

### Network Graphs

NetworkX is the de facto Python library for storing graphs:

In [None]:
import networkx as nx

It provides simple plotting:

In [None]:
G = nx.gnm_random_graph(7, 15)
nx.draw(G)
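
Both the random graph and the default layout change on every run. If a reproducible figure is needed, one option is to pass a `seed` to each, as in this sketch (the seed values and the name `G_demo` are arbitrary choices):

```python
import networkx as nx

G_demo = nx.gnm_random_graph(7, 15, seed=1)  # 7 nodes, 15 edges, fixed by the seed
pos = nx.spring_layout(G_demo, seed=1)       # {node: array([x, y])}, fixed by the seed

nx.draw(G_demo, pos=pos, with_labels=True)
```

Computing `pos` once also lets several drawing calls share the same node coordinates.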

But also complex customizations:

In [None]:
%%time
import requests
# source: http://evelinag.com/blog/2015/12-15-star-wars-social-network/index.html#.XG0a7KeZPRZ
r = requests.get('https://raw.githubusercontent.com/evelinag/StarWars-social-network/master/networks/starwars-episode-1-interactions-allCharacters.json')
d = r.json()

In [None]:
G = nx.Graph()
G.add_nodes_from((n['value'], n) 
                 for n in d['nodes'])
G.add_weighted_edges_from([(n['source'], n['target'], n['value']) 
                           for n in d['links'] 
                           if n['source'] in G.nodes and n['target'] in G.nodes])

In [None]:
# data from Wookieepedia
G.add_nodes_from([
    (33, {'affiliation': 'Republic', 'species': 'mechanical'}),  # r2-d2
    (6 , {'affiliation': 'Republic', 'species': 'mechanical'}),  # bravo 2
    (4 , {'affiliation': 'Republic', 'species': 'mechanical'}),  # bravo 3
    (31, {'affiliation': 'Republic', 'species': 'human'}),  # padme
    (7 , {'affiliation': 'Republic', 'species': 'non-human'}),  # yoda    
    
    (19, {'affiliation': 'Empire', 'species': 'human'}),  # nute
    (3 , {'affiliation': 'Empire', 'species': 'human'}),  # organa
    (14, {'affiliation': 'Empire', 'species': 'human'}),  # emperor
    (5 , {'affiliation': 'Empire', 'species': 'human'}),  # ceel

    (11, {'affiliation': 'Neutral', 'species': 'human'}),  # shmi
    (12, {'affiliation': 'Neutral', 'species': 'human'}),  # fode
    (8 , {'affiliation': 'Neutral', 'species': 'non-human'}),  # watto
])

In [None]:
G.remove_nodes_from([node for node, val in nx.get_node_attributes(G, 'value').items() if val == 0])
G.remove_nodes_from([node for node, deg in G.degree if deg == 0])

In [None]:
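# nx.info was removed in NetworkX 3.0; these methods work across versions
print(f'{G.number_of_nodes()} nodes, {G.number_of_edges()} edges')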

In [None]:
species2shape = {
    'mechanical': 's',  # square
    'human':      'o',  # circle
    'non-human':  '^',  # triangle
}

In [None]:
affiliation2color = {
    'Republic': 'C0',  # blue
    'Empire':   'C3',  # red
    'Neutral':  'C1',  # orange
}

In [None]:
plt.figure(figsize=(10, 10))
pos = nx.kamada_kawai_layout(G)

# Draw nodes
for species, shape in species2shape.items():
    species_nodes = [node 
                     for node, s in nx.get_node_attributes(G, 'species').items() 
                     if s == species]
    node_sizes    = [c 
                     for n, c in nx.degree_centrality(G).items() 
                     if n in species_nodes]
    node_colors   = [affiliation2color[aff] 
                     for n, aff in nx.get_node_attributes(G, 'affiliation').items() 
                     if n in species_nodes]
    nx.draw_networkx_nodes(
        G, 
        pos=pos,
        nodelist=species_nodes,  # list nodes to plot
        node_size=np.array(node_sizes) * 1500,  # list size of each node, in order
        node_color=node_colors,  # list color of each node, in order
        node_shape=shape,  # a single shape for all nodes (which is why this loop is needed)
    )


# Draw labels
label_pos  = {node: coords + [0, -.075]  # a bit lower than the coordinates of the node
              for node, coords in pos.items()}
label_text = {node: name.title() 
              for node, name in nx.get_node_attributes(G, 'name').items()}
nx.draw_networkx_labels(
    G, 
    pos=label_pos,  # the position of each node, as a dictionary
    labels=label_text,  # the text of each node, as a dictionary
)


# Draw edges
edge_widths = list(nx.get_edge_attributes(G, 'weight').values())
nx.draw_networkx_edges(
    G, 
    pos=pos,
    width=np.array(edge_widths) ** .4 * 3,  # raise to a power below 1 to attenuate the large differences
    edge_color='lightgray',
)


# Figure settings
# expand limits
plt.ylim(plt.ylim()[0] - .1, plt.ylim()[1] + .1)
plt.xlim(plt.xlim()[0] - .1, plt.xlim()[1] + .1)

# hide axes
plt.axis('off');
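
The same pattern on a toy graph may be easier to follow. This is a minimal sketch, assuming a made-up four-node graph `H` with a `species` attribute: nodes are grouped by attribute, sized by degree centrality, and drawn with one call per marker shape.

```python
import networkx as nx

# hypothetical toy graph with the same kind of node attribute as above
H = nx.Graph()
H.add_nodes_from([
    (0, {'species': 'human'}),
    (1, {'species': 'human'}),
    (2, {'species': 'mechanical'}),
    (3, {'species': 'mechanical'}),
])
H.add_edges_from([(0, 1), (0, 2), (0, 3)])

centrality = nx.degree_centrality(H)  # degree / (number of nodes - 1)

# group nodes by attribute, since node_shape takes a single marker per draw call
nodes_by_species = {}
for node, species in nx.get_node_attributes(H, 'species').items():
    nodes_by_species.setdefault(species, []).append(node)

pos = nx.circular_layout(H)
for species, shape in {'human': 'o', 'mechanical': 's'}.items():
    nodelist = nodes_by_species[species]
    nx.draw_networkx_nodes(
        H, pos=pos, nodelist=nodelist, node_shape=shape,
        node_size=[centrality[n] * 1500 for n in nodelist],  # sizes in nodelist order
    )
nx.draw_networkx_edges(H, pos=pos, edge_color='lightgray');
```

Looking sizes up by node keeps them aligned with `nodelist` regardless of iteration order.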

**ℹ️ Tip**: learn more about NetworkX from the [official tutorial](https://networkx.github.io/documentation/stable/tutorial.html)

### More Types

The largest areas we haven't touched upon are interactive charts and map charts. Although this workshop focused on static charts, we can take advantage of the Jupyter environment to plot these as well.

 - more chart types: 
    - [3D scatterplot](https://plot.ly/python/3d-network-graph/) (navigable)
    - [sankey](https://plot.ly/python/parallel-categories-diagram/)
    - [choropleth](https://plot.ly/python/maps/)
    - [chord diagram](https://plot.ly/python/filled-chord-diagram/)
    - [treemap](https://plot.ly/python/treemaps/)
    - [wind rose](https://plot.ly/python/wind-rose-charts/)