Skip to content
master
Go to file
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

README.md

Funnel plot

Simple funnel plots for visualising sub-group variance.

This package provides simple funnel plots in Python, using Matplotlib. This lets you quickly see whether sub-groups of a population are outliers compared to the full population.

Two methods are provided:

  • parametric funnelplot which uses a standard distribution to estimate the intervals of the funnel (usually a normal distribution)
  • bootstrap funnelplot which uses bootstrapped percentiles to estimate the intervals of the funnel

A utility function funnel() to make it easy to plot data by grouping Pandas DataFrames in a Seaborn-like API is provided.

Example

Data of test performance for California schools from pydataset/Caschool.

funnel(df=data("Caschool"), x="testscr", group="county")

Install

pip install funnelplot

Examples

Full caschool example

# load some example data
import pandas as pd
import matplotlib.pyplot as plt
from pydataset import data
from funnelplot.core import funnel

# create a suitable axis
fig,ax = plt.subplots(figsize=(4,6))
ax.set_frame_on(False)

# funnel plot, using 0.5% -> 99.5% interval
funnel(df=data("Caschool"), x="testscr", group="county", percentage=99.5, error_mode="data")
C:\Users\John\Dropbox\devel\funnelplot\funnelplot\core.py:14: RuntimeWarning: invalid value encountered in true_divide
  return band / np.sqrt(group_size)
C:\Users\John\Dropbox\devel\funnelplot\funnelplot\core.py:14: RuntimeWarning: divide by zero encountered in true_divide
  return band / np.sqrt(group_size)

png

# use bootstrap instead of normal fit
fig,ax = plt.subplots(figsize=(5,6))
ax.set_frame_on(False)
funnel(df=data("Caschool"), x='testscr', group="county", bootstrap_mode=True, error_mode="bootstrap")

png

Synthetic data example

## Synthetic data
import numpy as np
import random
random.seed(2020)
np.random.seed(2020)
groups = []
p_mean, p_std = 0, 1
# random groups, with different sizes, means and std. devs.
for i in range(25):
    n_group = np.random.randint(1, 80)
    g_std =  np.random.uniform(0.1, 4.5) 
    g_mean = np.random.uniform(-1.9, 0.5)
    groups.append(np.random.normal(p_mean + g_mean,
                                   p_std + g_std, 
                                   n_group))
ax, fig = plt.subplots(figsize=(9, 4))
funnel_plot(
    groups,
    labels=[random.choice("abcdefg") * 4 for i in range(len(groups))],
    percentage=95,
)

png

ax, fig = plt.subplots(figsize=(9, 4))
# bootstrap version, using medians instead of means
funnel_plot_bootstrap(
    groups,
    labels=[random.choice("abcdefg") * 4 for i in range(len(groups))],
    percentage=95,
    stat=np.median
)

png

API

  • funnel(df, x, group, bootstrap_mode=False) show a DataFrame df as a funnel plot, rendering column x and grouping the data by group.

      Parameters:
          df: DataFrame
              The data to be shown.
          x:  string, column name
              The column of the frame to render as datapoints.
          group: string, column name
              The column to group the frame by
          bootstrap_mode: boolean, optional (default False)
              If True, uses the funnel_plot_bootstrap() function; otherwise
              use the parameteric funnel_plot() function
          **kwargs:
              passed to funnel_plot() / funnel_plot_bootstrap()
    
  • funnel_plot(data_groups, ...) plot a list of arrays as a funnel plot.

      Parameters:
          data_groups: list of 1D arrays
              a list of 1D arrays the individual groups to be analysed.
          ax: axis, optional
              an Matplotlib axis to draw onto
          dist: distribution function, like scipy.stats.norm(0,1)
              function to use to get the ppf and cdf of for plotting
          percentage: float, 0.0 -> 100.0 (default 95)
              percentage  of interval enclosed (e.g. percentage=95 will enclose 2.5% to 97.5%)
          labels: list of strings, optional
              one label string per group, will be shown only for those groups that lie outside the funnel
          left_color: matplotlib color, optional (default C1)
              color to render points to the left of the funnel bounds (negative outliers)
          right_color: matplotlib color, optional (default C2)
              color to render points to the right of the funnel bounds (positive outliers)        
          error_mode: string, optional (default "data")
              For each outlier group, can show:
                  "data": original data values for that group as a dot plot
                  "none": no error bars
                  "bootstrap": 95% bootstrap intervals, as lines
                  "ci": 95% CI intervals, as lines
          show_rug: boolean, optional (default False):
              If True, show a rug plot at the bottom of the graph, for
              the whole group population
          show_contours: boolean optional (default True)
              true if additional contours shown
    
  • funnel_plot_bootstrap(data_groups, ...) plot a list of arrays as a funnel plot, using bootstrapped intervals instead of a parametric distribution.

      Parameters:
          data_groups: list of 1D arrays
              a list of 1D arrays the individual groups to be analysed.
          ax: axis, optional
              an Matplotlib axis to draw onto
          percentage: float, 0.0 -> 100.0 (default 95)
              percentage  of interval enclosed (e.g. percentage=95 will enclose 2.5% to 97.5%)
          labels: list of strings, optional
              one label string per group, will be shown only for those groups that lie outside the funnel
          left_color: matplotlib color, optional (default C1)
              color to render points to the left of the funnel bounds (negative outliers)
          right_color: matplotlib color, optional (default C2)
              color to render points to the right of the funnel bounds (positive outliers)
          bootstrap_n: int, optional (default 1000)
              number of runs in the bootstrap
          error_mode: string, optional (default "data")
              For each outlier group, can show:
                  "data": original data values for that group as a dot plot
                  "none": no error bars
                  "bootstrap": 95% bootstrap intervals, as lines
                  "ci": 95% CI intervals, as lines
          show_rug: boolean, optional (default False):
              If True, show a rug plot at the bottom of the graph, for
              the whole group population            
          show_contours: boolean optional (default True)
              true if additional contours shown
          stat: function like np.mean, optional
              statistic to use when plotting the funnel plot  
    

About

Funnel plot

Resources

License

Releases

No releases published

Packages

No packages published