## Charities in Colorado: Initial EDA and data preprocessing

### Data source: Registration data for charities - CO Secretary of State
https://www.sos.state.co.us/pubs/charities/CCSAreports.html?faces-redirect=true 
Carities (XLS) link under the Data Extacts


In [1]:
# Imports
import os
import pandas as pd
import numpy as np
import datetime
import seaborn as sns
import matplotlib.pyplot as plt

In [2]:
# load data
charities_all = pd.read_excel('CoCodeCo/datafiles/CharitableOrganizationsDataExtract.xlsx', parsedates=True)

In [3]:
#Check the data load
charities_all.head()

Unnamed: 0,Principal Name of Organization,DBAs,Registration Number,EIN,Principal Address 1,Principal Address 2,Principal City,Principal County,Principal State,Principal Zip,...,Fundraising Expenses,Total Expenses,Fundraising Ratio Fundraising Expenses to Contributions,Program Service Ratio Program Services Expenses Total Expenses,Total Asset Amount,Total Liabilities Amount,Net Assets,NTEE Code 1,NTEE Code 2,NTEE Code 3
0,DENVER HARLEQUINS RUGBY FOOTBALL CLUB,,20183010000.0,27-1686134,3843 XAVIER ST.,,DENVER,DENVER,CO,80212,...,0,26817,0.0,0.9836,4269.55,0.0,4269.55,"S-COMMUNITY IMPROVEMENT,CAPACITY BUILDING",B-EDUCATION,"N-RECREATION,SPORTS"
1,#WALKAWAY FOUNDATION,,20193010000.0,83-2820906,441 NORTH LEE STREET,SUITE 100,ALEXANDRIA,ALEXANDRIA CITY,VA,22314,...,0,100000,0.0,0.2,0.0,0.0,0.0,B-EDUCATION,,
2,"1 LIQUID HOUSE, INC.",,20033010000.0,73-1267224,6668 LYNX COVE,,LITTLETON,DOUGLAS,CO,80124,...,0,16647,0.0,1.0,16757.0,3839.0,12918.0,"A-ARTS,CULTURE & HUMANITIES",,
3,1/20/21 ACTION FUND,,20193000000.0,83-2210730,2370 MARKET STREET,# 433,SAN FRANCISCO,SAN FRANCISCO,CA,94114,...,275000,1993210,0.131,0.8319,106790.0,0.0,106790.0,"R-CIVIL RIGHTS,SOCIAL ACTION,& ADVOCACY",,
4,"1% FOR THE PLANET, INC.","1 FOR THE PLANET, 1% FOR THE PLANET, ONE PERCE...",20163030000.0,91-2151932,"47 MAPLE ST., SUITE 103",,BURLINGTON,CHITTENDEN,VT,5401,...,170341,1469936,0.2865,0.7454,1044315.0,238422.0,805893.0,"C-ENVIRONMENT QUALITY,PROTECTION & BEAUTIFICATION",,


In [4]:
#Check the data frame structure
charities_all.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17825 entries, 0 to 17824
Data columns (total 36 columns):
Principal Name of Organization                                    17824 non-null object
DBAs                                                              5839 non-null object
Registration Number                                               17824 non-null float64
EIN                                                               17824 non-null object
Principal Address 1                                               17825 non-null object
Principal Address 2                                               1980 non-null object
Principal City                                                    17824 non-null object
Principal County                                                  17549 non-null object
Principal State                                                   17818 non-null object
Principal Zip                                                     17824 non-null object
Phone       

In [5]:
charities_all.tail()

Unnamed: 0,Principal Name of Organization,DBAs,Registration Number,EIN,Principal Address 1,Principal Address 2,Principal City,Principal County,Principal State,Principal Zip,...,Fundraising Expenses,Total Expenses,Fundraising Ratio Fundraising Expenses to Contributions,Program Service Ratio Program Services Expenses Total Expenses,Total Asset Amount,Total Liabilities Amount,Net Assets,NTEE Code 1,NTEE Code 2,NTEE Code 3
17820,ZOOLOGY FOUNDATION AT CROOKED WILLOW FARMS,,20113040000.0,27-1125802,10554 S. PERRY PARK RD.,,LARKSPUR,DOUGLAS,CO,80118.0,...,0.0,794825.0,0.0,0.0,17747688.0,0.0,17747688.0,D-ANIMALS,B-EDUCATION,O-YOUTH DEVELOPMENT
17821,"ZOOM TRACK CLUB, INC.",,20113020000.0,68-0529237,6548 SERENGETI CIR,,LITTLETON,DOUGLAS,CO,80124.0,...,150.0,70033.9,0.1193,0.832,97083.0,1682.0,95401.0,"N-RECREATION,SPORTS",,
17822,"ZOOMERS, INC. DBA LAKE DILLON PRESCHOOL","LAKE DILLON PRESCHOOL, LAKE DILLON PRESCHOOL A...",20023010000.0,84-1139106,200 VILLAGE PLACE,,DILLON,SUMMIT,CO,80435.0,...,0.0,754358.0,0.0,1.0,74220.0,23462.0,50758.0,B-EDUCATION,O-YOUTH DEVELOPMENT,
17823,ZUMA'S RESCUE RANCH,"ALL SOULS RESCUE, ZUMA'S EXPERIENTIAL LEARNING...",20123030000.0,80-0236203,7745 N. MOORE RD.,,LITTLETON,DOUGLAS,CO,80125.0,...,0.0,260614.0,0.0,0.7492,97010.0,0.0,97010.0,D-ANIMALS,B-EDUCATION,P-HUMAN SERVICES
17824,,,,,"501 (3) (C) PUBLIC TRUST, 501 (C) (3), 501 (C)...",,,,,,...,,,,,,,,,,


In [6]:
# drop the last row with comments and null values
charities_all=charities_all[:-1]


Overall, there  were 17824 charitable organizations that submitted registration paperwork to the Colorado Secretary of State. Let's see how many of them are maintaining registration (are in good standing) by excluding those who withdrew their applications or received any special notices from the State. 


In [7]:
# Select only charitable organizations that are current on their registration
charities_current=charities_all.loc[charities_all['Current Status']=='GOOD']

In [8]:
charities_current.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 12536 entries, 0 to 17823
Data columns (total 36 columns):
Principal Name of Organization                                    12536 non-null object
DBAs                                                              4233 non-null object
Registration Number                                               12536 non-null float64
EIN                                                               12536 non-null object
Principal Address 1                                               12536 non-null object
Principal Address 2                                               1552 non-null object
Principal City                                                    12536 non-null object
Principal County                                                  12478 non-null object
Principal State                                                   12531 non-null object
Principal Zip                                                     12536 non-null object
Phone       


So, there are 12536 charities that can be active in the state. Let's see how many of them are local meaning 'Principal State' == 'CO' or were originally established in Colorado ('State Established' == 'CO')


In [9]:
# Find  local charities 
charities_local= charities_current.loc[(charities_all['Principal State']=='CO') | (charities_all['State Established']=='CO')]
charities_local.reset_index(inplace=True, drop=True) #reset index in the new df

In [10]:
#Check results
charities_local.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7808 entries, 0 to 7807
Data columns (total 36 columns):
Principal Name of Organization                                    7808 non-null object
DBAs                                                              2612 non-null object
Registration Number                                               7808 non-null float64
EIN                                                               7808 non-null object
Principal Address 1                                               7808 non-null object
Principal Address 2                                               800 non-null object
Principal City                                                    7808 non-null object
Principal County                                                  7806 non-null object
Principal State                                                   7806 non-null object
Principal Zip                                                     7808 non-null object
Phone                  

In [11]:
#Find organizations that were originally established in Colorado, but moved out of state
charities_current.loc[(charities_all['Principal State'] !='CO') & (charities_all['State Established'] =='CO')]

Unnamed: 0,Principal Name of Organization,DBAs,Registration Number,EIN,Principal Address 1,Principal Address 2,Principal City,Principal County,Principal State,Principal Zip,...,Fundraising Expenses,Total Expenses,Fundraising Ratio Fundraising Expenses to Contributions,Program Service Ratio Program Services Expenses Total Expenses,Total Asset Amount,Total Liabilities Amount,Net Assets,NTEE Code 1,NTEE Code 2,NTEE Code 3
54,4 D MINISTRIES,4D MINISTRIES,2.012302e+10,45-3177621,1010 NATALIE COURT,,ALLEN,COLLIN,TX,75013,...,21205,529311,0.0404,0.9145,5.067860e+05,0.000000e+00,5.067860e+05,"X-RELIGION,SPIRITUAL DEVELOPMENT",,
195,ACCENTCARE HOSPICE FOUNDATION,"ACCENTCARE HOME HEALTH OF CALIFORNIA, INC. ( D...",2.009301e+10,26-0871391,"17855 NORTH DALLAS PARKWAY, SUITE 200",,DALLAS,COLLIN,TX,75287,...,0,55238,0.0000,0.9917,7.116700e+04,0.000000e+00,7.116700e+04,"A-ARTS,CULTURE & HUMANITIES",,
338,AFP SOUTHERN COLORADO CHAPTER,"AFP SOUTHERN COLORADO, AFPSOCO, ASSOCIATION OF...",2.002300e+10,84-1395173,"4300 WILSON BLVD., SUITE 300",,ARLINGTON,ARLINGTON,VA,22203,...,0,22591.6,,0.7198,2.526148e+04,0.000000e+00,2.526148e+04,"T-PHILANTHROPY,VOLUNTARISM,& GRANTMAKING",,
438,"ALEXA'S HUGS, INC.",,2.015301e+10,46-4417223,2035 WASMER CIRCLE,,BOSQUE FARMS,VALENCIA,NM,87068,...,3265,65332,0.1379,0.8500,8.175000e+03,6.894000e+03,1.281000e+03,E-HEALTH,"M-PUBLIC SAFETY,DISASTER PREPAREDNESS,& RELIEF",Z-UNKNOWN
492,ALLIANCE FOR CONTRACEPTION IN CATS AND DOGS,ACC&D,2.006301e+10,41-2185841,11145 NW OLD CORNELIUS PASS ROAD,,PORTLAND,MULTNOMAH,OR,97231,...,29495,330292,0.0622,0.7493,8.750670e+05,4.741900e+04,8.276480e+05,D-ANIMALS,"W-PUBLIC,SOCIETY BENEFIT",U-SCIENCE & TECHNOLOGY RESEARCH
833,AMERICAN KRATOM ASSOCIATION,,2.015303e+10,47-2208981,5501 MERCHANTS VIEW SQUARE,#202,HAYMARKET,PRINCE WILLIAM,VA,20169,...,35844,1088324,0.0451,0.9183,1.302380e+05,0.000000e+00,1.302380e+05,"R-CIVIL RIGHTS,SOCIAL ACTION,& ADVOCACY",,
903,"AMERICAN RECORDER SOCIETY, INC.",,2.002300e+10,13-2930296,3205 HALCOTT LN,,CHARLOTTE,MECKLENBURG,NC,28269,...,4335,162903,0.0274,0.8560,2.340600e+05,1.878100e+04,2.152790e+05,"A-ARTS,CULTURE & HUMANITIES",B-EDUCATION,O-YOUTH DEVELOPMENT
1009,"AMTGARD INTERNATIONAL, INC.",,2.017301e+10,81-0817813,2012 ROSEBUD DR.,,IRVING,DALLAS,TX,75060,...,0,10185,0.0000,0.0000,1.748805e+04,0.000000e+00,1.748805e+04,B-EDUCATION,"W-PUBLIC,SOCIETY BENEFIT","N-RECREATION,SPORTS"
1229,ARTS COUNCIL OF MONGOLIA- US,,2.007301e+10,56-2373006,2025 23RD AVE. EAST,,SEATTLE,KING,WA,98112,...,0,17515.8,0.0000,0.9871,6.277673e+04,0.000000e+00,6.277673e+04,"A-ARTS,CULTURE & HUMANITIES",B-EDUCATION,"T-PHILANTHROPY,VOLUNTARISM,& GRANTMAKING"
1281,"ASIAN HOPE, INC.",,2.008301e+10,84-1553945,1605 ENTERPRISE DRIVE,,LYNCHBURG,LYNCHBURG CITY,VA,24502,...,0,264665,0.0000,0.9919,1.284330e+06,4.245420e+05,8.597880e+05,B-EDUCATION,"X-RELIGION,SPIRITUAL DEVELOPMENT",



There are 7808 charities that either were established in Colorado or have Colorado listed as the principal state. This number includes 120 organizations that were originally established in Colorado, but have other states listed as their principal state.  For the purpose of our analysis, we will consider both types of organizations "local" to distinguish them from the country-wide and international charities operating in Colorado. 


In [12]:
#charities that have prinicipal location in Colorado by county
#charities_local.loc[charities_local['Principal State' == 'CO']].groupby(charities_local['Principal County'])

charities_local['EIN'].loc[charities_all['Principal State']=='CO'].groupby(charities_local['Principal County']).count()

Principal County
ADAMS                 186
ALAMEDA                 2
ALAMOSA                24
ARAPAHOE              461
ARCHULETA              28
ARLINGTON               1
BACA                    2
BEAUFORT                1
BENT                    2
BOULDER               479
BROOMFIELD             32
CASS                    1
CHAFFEE                23
CITY OF ALEXANDRIA      1
CLARK                   1
CLAY                    1
CLEAR CREEK            11
CO                     31
CO - COLORADO           2
COLORADO               28
COLORADO (CO)           3
COLORADO [CO]           1
CONEJOS                 3
CONNECTICUT             1
CONTRA COSTA            2
COOK                    1
COOK COUNTY             1
COSTILLA                6
CROWLEY                 1
CUSTER                 16
                     ... 
RAMSEY                  1
RIO BLANCO              5
RIO GRANDE             12
ROUTT                  62
SAGUACHE               18
SAN DIEGO               1
SAN JUAN             