# Examining the geographical distribution of non-profit wealth in different sectors 

In [1]:
%matplotlib qt
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
pd.options.display.max_columns = None

Read in just the most recent data (from 2015):

In [2]:
df_2015 = pd.read_csv('nccs.core2015pc.csv')

  interactivity=interactivity, compiler=compiler, result=result)


Clean the data a bit by removing non-profits that are foreign, or in non-states:

In [3]:
df_2015 = df_2015[df_2015['STATE'].isin(['AS', 'FM', 'FO', 'GU', 'MH', 'MP', 'PW', 'PR', 'VI']) == False]

Save a reference of state-by-state populations so we can perform per-capita evaluations. Otherwise the maps will likely just look like population density maps:

In [4]:
state_pop = \
{'AL': 4858979,
'AK': 738432,
'AZ': 6828065,
'AR': 2978204,
'CA': 39144818,
'CO': 5456574,
'CT': 3590886,
'DC': 672228,
'DE': 945934,
'FL': 20271272,
'GA': 10214860,
'HI': 1431603,
'ID': 1654930,
'IL': 12859995,
'IN': 6619680,
'IA': 3123899,
'KS': 2911641,
'KY': 4425092,
'LA': 4670724,
'ME': 1329328,
'MD': 6006401,
'MA': 6794422,
'MI': 9922576,
'MN': 5489594,
'MS': 2992333,
'MO': 6083672,
'MT': 1032949,
'NE': 1896190,
'NV': 2890845,
'NH': 1330608,
'NJ': 8958013,
'NM': 2085109,
'NY': 19795791,
'NC': 10042802,
'ND': 756927,
'OH': 11613423,
'OK': 3911338,
'OR': 4028977,
'PA': 12802503,
'RI': 1056298,
'SC': 4896146,
'SD': 858469,
'TN': 6600299,
'TX': 27469114,
'UT': 2995919,
'VT': 626042,
'VA': 8382993,
'WA': 7170351,
'WV': 1844128,
'WI': 5771337,
'WY': 586107}

Group the data by sector (`NTEE1`), and sum the total revenue for all the matching non-profits. We then sort the sectors by their total revenue, and pull out the 12 largest for comparisons, since these are likely the higher importance sectors within the non-profit economy. 

In [5]:
categories = df_2015.groupby('NTEE1').sum()['TOTREV'].sort_values(ascending=False).index.tolist()[:12]
categories

['E', 'B', 'P', 'T', 'A', 'Q', 'F', 'L', 'U', 'G', 'X', 'S']

For each category, group the corresponding non-profits by their resident state, and sum their expenditures to get a picture of the total amount of money spent by that sector in each state. For each state, divide the summed expenses by the state population to compare a *per capita* value.

In [6]:
from tqdm import tqdm
dicts_to_plot = {}
for c in tqdm(categories):
    dicts_to_plot[c] = df_2015[df_2015['NTEE1'] == c].groupby('STATE').sum()['EXPS'] / pd.Series(state_pop)

100%|██████████| 12/12 [00:08<00:00,  1.40it/s]


This dictionary mapping helps when computing the choropleth in the coming cells...

In [7]:
us_state_abbrev = {
    'Alabama': 'AL',
    'Alaska': 'AK',
    'Arizona': 'AZ',
    'Arkansas': 'AR',
    'California': 'CA',
    'Colorado': 'CO',
    'Connecticut': 'CT',
    'Delaware': 'DE',
    'District of Columbia': 'DC',
    'Florida': 'FL',
    'Georgia': 'GA',
    'Hawaii': 'HI',
    'Idaho': 'ID',
    'Illinois': 'IL',
    'Indiana': 'IN',
    'Iowa': 'IA',
    'Kansas': 'KS',
    'Kentucky': 'KY',
    'Louisiana': 'LA',
    'Maine': 'ME',
    'Maryland': 'MD',
    'Massachusetts': 'MA',
    'Michigan': 'MI',
    'Minnesota': 'MN',
    'Mississippi': 'MS',
    'Missouri': 'MO',
    'Montana': 'MT',
    'Nebraska': 'NE',
    'Nevada': 'NV',
    'New Hampshire': 'NH',
    'New Jersey': 'NJ',
    'New Mexico': 'NM',
    'New York': 'NY',
    'North Carolina': 'NC',
    'North Dakota': 'ND',
    'Ohio': 'OH',
    'Oklahoma': 'OK',
    'Oregon': 'OR',
    'Pennsylvania': 'PA',
    'Puerto Rico': 'PR',
    'Rhode Island': 'RI',
    'South Carolina': 'SC',
    'South Dakota': 'SD',
    'Tennessee': 'TN',
    'Texas': 'TX',
    'Utah': 'UT',
    'Vermont': 'VT',
    'Virginia': 'VA',
    'Washington': 'WA',
    'West Virginia': 'WV',
    'Wisconsin': 'WI',
    'Wyoming': 'WY',
}

The following function will plot the choropleth map, using the state-by-state date provided. The code for building these maps was adapted from the example provided in the matplotlib [documentation](https://github.com/matplotlib/basemap/blob/master/examples/fillstates.py).

In [11]:
def plot_choropleth(data, title):
    import numpy as np
    import matplotlib.pyplot as plt
    from mpl_toolkits.basemap import Basemap as Basemap
    from matplotlib.colors import rgb2hex, Normalize
    from matplotlib.patches import Polygon
    from matplotlib.colorbar import ColorbarBase

    fig, ax = plt.subplots(figsize=(10,6))

    # Lambert Conformal map of lower 48 states.
    m = Basemap(llcrnrlon=-119,llcrnrlat=20,urcrnrlon=-64,urcrnrlat=49,
                projection='lcc',lat_1=33,lat_2=45,lon_0=-95)

    # Mercator projection, for Alaska and Hawaii
    m_ = Basemap(llcrnrlon=-190,llcrnrlat=20,urcrnrlon=-143,urcrnrlat=46,
                projection='merc',lat_ts=20)  # do not change these numbers

    shp_info = m.readshapefile('st99_d00','states',drawbounds=True,
                               linewidth=0.45,color='gray')
    shp_info_ = m_.readshapefile('st99_d00','states',drawbounds=False)


    val = data

    #%% -------- choose a color for each state based on population density. -------
    colors={}
    statenames=[]
    cmap = plt.cm.viridis # use 'reversed hot' colormap
    vmin = val.min(); vmax = val.max() # set range.
    norm = Normalize(vmin=vmin, vmax=vmax)
    for shapedict in m.states_info:
        statename = shapedict['NAME']
        if statename not in ['District of Columbia','Puerto Rico']:
            pop = val[us_state_abbrev[statename]]
            colors[statename] = cmap(np.sqrt((pop-vmin)/(vmax-vmin)))[:3]
        statenames.append(statename)

    #%% ---------  cycle through state names, color each one.  --------------------
    for nshape,seg in enumerate(m.states):
        if statenames[nshape] not in ['Puerto Rico', 'District of Columbia']:
            color = rgb2hex(colors[statenames[nshape]])
            poly = Polygon(seg,facecolor=color,edgecolor=color)
            ax.add_patch(poly)

    AREA_1 = 0.005  # exclude small Hawaiian islands that are smaller than AREA_1
    AREA_2 = AREA_1 * 30.0  # exclude Alaskan islands that are smaller than AREA_2
    AK_SCALE = 0.19  # scale down Alaska to show as a map inset
    HI_OFFSET_X = -1900000  # X coordinate offset amount to move Hawaii "beneath" Texas
    HI_OFFSET_Y = 250000    # similar to above: Y offset for Hawaii
    AK_OFFSET_X = -250000   # X offset for Alaska (These four values are obtained
    AK_OFFSET_Y = -750000   # via manual trial and error, thus changing them is not recommended.)

    for nshape, shapedict in enumerate(m_.states_info):  # plot Alaska and Hawaii as map insets
        if shapedict['NAME'] in ['Alaska', 'Hawaii']:
            seg = m_.states[int(shapedict['SHAPENUM'] - 1)]
            if shapedict['NAME'] == 'Hawaii' and float(shapedict['AREA']) > AREA_1:
                seg = [(x + HI_OFFSET_X, y + HI_OFFSET_Y) for x, y in seg]
                color = rgb2hex(colors[statenames[nshape]])
            elif shapedict['NAME'] == 'Alaska' and float(shapedict['AREA']) > AREA_2:
                seg = [(x*AK_SCALE + AK_OFFSET_X, y*AK_SCALE + AK_OFFSET_Y)\
                       for x, y in seg]
                color = rgb2hex(colors[statenames[nshape]])
            poly = Polygon(seg, facecolor=color, edgecolor='gray', linewidth=.45)
            ax.add_patch(poly)

    ax.set_title(title)

    #%% ---------  Plot bounding boxes for Alaska and Hawaii insets  --------------
    light_gray = [0.8]*3  # define light gray color RGB
    x1,y1 = m_([-190,-183,-180,-180,-175,-171,-171],[29,29,26,26,26,22,20])
    x2,y2 = m_([-180,-180,-177],[26,23,20])  # these numbers are fine-tuned manually
    m_.plot(x1,y1,color=light_gray,linewidth=0.8)  # do not change them drastically
    m_.plot(x2,y2,color=light_gray,linewidth=0.8)

    #%% ---------   Show color bar  ---------------------------------------
    ax_c = fig.add_axes([0.85, 0.1, 0.03, 0.8])
    cb = ColorbarBase(ax_c,cmap=cmap,norm=norm,orientation='vertical',
                      label=r'[2015 USD (per capita)]')

    plt.show()

In [15]:
cats = ['E', 'B', 'P', 'T', 'A', 'Q', 'F', 'L', 'U', 'G', 'X', 'S']
titles = [
    'E: Health Care Expenditures (per capita)',
    'B: Educational Expenditures (per capita)',
    'P: Human Services Expenditures (per capita)',
    'T: Philanthropy Expenditures (per capita)',
    'A: Arts, Culture, & Humanities Expenditures (per capita)',
    'Q: International Development and Foreign Expenditures (per capita)',
    'F: Mental Health and Crisis Intervention Expenditures (per capita)',
    'L: Housing & Shelter Expenditures (per capita)',
    'U: Science and Technology Expenditures (per capita)',
    'G: Diseases and Medical Expenditures (per capita)',
    'X: Religious Group Expenditures (per capita)',
    'S: Community Improvement Expenditures (per capita)'
]

for c, t in zip(cats, titles):
    plot_choropleth(dicts_to_plot[c], t)
    plt.savefig('plot_{}.png'.format(c))

** Not all the figures are discussed here, but a few reveal some very interesting results, and raise important questions that are worth further study: **

* Category E: Healthcare expenditures per capita are drastically higher in Oregon than every other state. Upon closer inspection, this is because the Kaiser Permanente family of non-profits are registered with addresses in Portland, OR, and operate entire hospital and insurance systems, explaining the extremely high dollar amounts seen in Oregon
* Category B: Educational expenditures seem to be concentrated to a small degree in the north-east, which makes sense given the high concentration of universities and private secondary schools in this part of the country
* Category P: Human services expenditures are fairly uniform across the country, but are lowest in Mississippi, a state that regularly ranks at the bottom in terms of poverty and health metrics, indicating that perhaps the help is not going to the areas where it is needed the most.
* Category F: Mental health and crisis intervention expenditures are extremely high in Vermont, and are generally a great deal higher in the northern states when compared to the southern ones. The reasoning for this is not immediately apparent, and the anomaly of VT warrants further study.
* Category X: Religious groups (such as churches) have significantly higher expenditures in Arizona and Georgia, as well as more generally throughout the center of the country. This matches expectations to some degree that there is a "bible belt" in the central and southern United States. These results are skewed however, by the inclusion of religious-affiliated health care systems (such as Mercy Maricopa Integrated Care in Phoenix, AZ). These groups are included as religious organizations (X) rather than health care ones (E), which complicates analyses.

![plot_E](plot_E.png)
![plot_B](plot_B.png)
![plot_P](plot_P.png)
![plot_T](plot_T.png)
![plot_A](plot_A.png)
![plot_Q](plot_Q.png)
![plot_F](plot_F.png)
![plot_L](plot_L.png)
![plot_U](plot_U.png)
![plot_G](plot_G.png)
![plot_X](plot_X.png)
![plot_S](plot_S.png)