### Enriching the 990 dataset with each not-for-profit's mission/purpose

1. The contractor dataset I downloaded from open990 (https://www.open990.org/catalog/) did not include the mission/purpose of each not-for-profit.
1. I downloaded a more comprehensive 990 dataset from https://appliednonprofitresearch.com/documentation/irs-990-spreadsheets/ (a site affiliated with open990), which does include the mission/purpose field.
1. Because the open990 dataset was cleaner and more accessible, I kept using it for the project and joined the mission/purpose field onto it. This let me analyze specific categories of not-for-profits.
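Joining the mission field onto the open990 EINs is a left merge, and two things are worth checking: how many orgs actually found a mission, and whether the merge added rows (later in this notebook the joined frame has 21,330 rows versus 21,310 orgs). A minimal sketch, assuming two small made-up frames that reuse this notebook's column names (`ein`, `ein_org`, `mission_of_org`); `indicator=True` and `validate=` are standard `DataFrame.merge` parameters, and keeping the first mission per EIN is an arbitrary choice for illustration.

```python
import pandas as pd

# toy stand-ins for the two files (hypothetical values, not real filings)
orgs = pd.DataFrame({'ein': [101, 102, 103]})
missions = pd.DataFrame({
    'ein_org': [101, 101, 102],
    'mission_of_org': ['RUN A HOSPITAL', 'RUN A HOSPITAL', 'TEACH STUDENTS'],
})

# keep one mission row per EIN so the left join cannot add rows
missions_dedup = missions.drop_duplicates(subset='ein_org', keep='first')

joined = orgs.merge(
    missions_dedup,
    how='left',
    left_on='ein',
    right_on='ein_org',
    validate='one_to_one',  # raises MergeError if a key repeats on either side
    indicator=True,         # adds '_merge' with 'both' or 'left_only'
)

print(len(joined) == len(orgs))         # True: no row growth
print(joined['_merge'].value_counts())  # how many orgs found a mission
```

Running the same `validate='one_to_one'` merge without the de-duplication step would raise instead of silently growing the frame, which makes duplicate EINs visible early.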

In [264]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_colwidth', 999)

In [17]:
org_mission = pd.read_csv('data/990_Part I_Line_1_mission_significant_activities.csv',
                low_memory=False)

In [18]:
org_mission.shape

(237854, 2)

In [8]:
org_mission.dtypes

ein_org               int64
990_Part I_Line 1    object
dtype: object

In [20]:
# rename column
org_mission.rename(columns={'990_Part I_Line 1': 'mission_of_org'}, inplace=True)

In [33]:
org_mission.head()

|   | ein_org | mission_of_org |
|---|---|---|
| 0 | 461233726 | PROVIDING HELP, ASSISTANCE, & HEALING TO VICTIMS OF HUMAN TRAFFICKING. |
| 1 | 591965600 | SEE SCHEDULE O |
| 2 | 840889330 | BUILD LONG-TERM, LIFE-CHANGING RELATIONSHIPS WITH URBAN YOUTH. |
| 3 | 274862807 | OUR MISSION IS TO PREPARE STUDENTS FOR A COLLEGE PREPARATORY HIGH SCHOOL THAT WILL ENSURE SUCCESS AND GRADUATION AND ACCEPTANCE INTO A FOUR YEAR COLLEGE. |
| 4 | 141666301 | THE VERDOY VOLUNTEER FIRE ASSOCIATION, INC. (THE "ASSOCIATION")IS ORGANIZED EXCLUSIVELY FOR RESCUE, CHARITABLE AND EDUCATIONAL PURPOSES; THE ASSOCIATION PROVIDES THE PERSONNEL NEEDED TO STAFF THE APPARATUS OF THE LOCAL VOLUNTEER FIRE DISTRICT FOR THE PURPOSE OF PREVENTING AND EXTINGUISHING FIRES, PROTECTING LIFE AND PROPERTY FROM THE HAZARDS OF FIRE AND TO SERVE THE PUBLIC IN ANY EMERGENCY OR PERIL, WHENEVER AND WHEREVER LEGAL AND PROPER. THE ASSOCIATION PROMOTES AND MAINTAINS THE INTEREST OF THE VOLUNTEER MEMBERSHIP AND CONDUCTS CHARITABLE AND EDUCATIONAL ACTIVITIES IN THE BETTERMENT OF THE LOCAL COMMUNITY. |


In [22]:
# format mission into upper case to be consistent
# (fillna('') keeps missing missions from becoming the literal string 'NAN')
org_mission['mission_of_org'] = org_mission['mission_of_org'].fillna('').str.upper()

In [23]:
# verify uppercase
org_mission.head()

|   | ein_org | mission_of_org |
|---|---|---|
| 0 | 461233726 | PROVIDING HELP, ASSISTANCE, & HEALING TO VICTI... |
| 1 | 591965600 | SEE SCHEDULE O |
| 2 | 840889330 | BUILD LONG-TERM, LIFE-CHANGING RELATIONSHIPS W... |
| 3 | 274862807 | OUR MISSION IS TO PREPARE STUDENTS FOR A COLLE... |
| 4 | 141666301 | THE VERDOY VOLUNTEER FIRE ASSOCIATION, INC. (T... |


### Join the missions to not-for-profit orgs with at least one contractor paid over $100K

In [24]:
orgs_with_cont = pd.read_csv('data/ein_orgs_with_contractors.csv',
                low_memory=False)

In [25]:
orgs_with_cont.head()

|   | ein |
|---|---|
| 0 | 10130427 |
| 1 | 10177170 |
| 2 | 10179500 |
| 3 | 10196359 |
| 4 | 10198331 |


In [36]:
orgs_with_cont.shape

(21310, 1)

In [37]:
orgs_with_cont_mission = orgs_with_cont.merge(org_mission, how='left', left_on='ein', right_on='ein_org')

In [57]:
orgs_with_cont_mission.head()

|   | ein | ein_org | mission_of_org |
|---|---|---|---|
| 0 | 10130427 | 10130427 | BRIDGTON HOSPITAL STRIVES TO PROVIDE EXCEPTIONAL HEALTHCARE SERVICES AND DEPENDS ON CAREGIVER EXPERTISE AND THE COMMITMENT AND COMPASSION THEY PROVIDE TO FULFILL ITS MISSION. |
| 1 | 10177170 | 10177170 | WALDO COUNTY GENERAL HOSPITAL'S MISSION IS TO BE THE BEST - BETTER, EMPATHY, SERVICE AND TEAMWORK. OUR GOAL IS TO ENSURE QUALITY, ACCESSIBLE AND AFFORDABLE HEALTH CARE SERVICES AND TO IMPROVE THE HEALTH AND WELL-BEING OF OUR COMMUNITY. PLEASE SEE ATTACHED COMMUNITY BENEFITS REPORT. |
| 2 | 10179500 | 10179500 | SOUTHERN MAINE HEALTH CARE EXISTS TO IMPROVE THE HEALTH AND HEALTH CARE OF THE COMMUNITIES WE SERVE. |
| 3 | 10196359 | 10196359 | TO HELP PEOPLE WHO ARE VISUALLY IMPAIRED OR BLIND ATTAIN INDEPENDENCE AND COMMUNITY INTEGRATION. |
| 4 | 10198331 | 10198331 | THE HOSPITAL IS A NOT-FOR-PROFIT ENTITY ESTABLISHED TO PROVIDE HEALTH CARE SERVICES THROUGH ITS ACUTE CARE FACILITY AND PHYSICIAN PRACTICES. |


### Find the most frequent words in the orgs' mission statements

In [50]:
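tokens = []

# split each mission on whitespace and keep words longer than 4 characters,
# a rough way to drop short stop words such as 'THE', 'AND' and 'OF'
# (punctuation stays attached, so 'MISSION.' and 'MISSION' are counted separately)
for row in orgs_with_cont_mission.loc[:,'mission_of_org'].str.split():
    for word in row:
        if len(word) > 4:
            tokens.append(word)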

In [52]:
len(tokens)

324924

In [53]:
# form tokens into dataframe
tokens_df = pd.DataFrame(tokens, columns=['tokens'])

# sort dataframe by frequency of tokens
tokens_count = tokens_df.groupby('tokens')['tokens'] \
    .count() \
    .reset_index(name= 'token_count') \
    .sort_values(by='token_count', ascending=False)

In [326]:
tokens_count.head()

|   | tokens | token_count |
|---|---|---|
| 19888 | PROVIDE | 4989 |
| 12158 | HEALTH | 4942 |
| 22163 | SERVICES | 4611 |
| 24343 | THROUGH | 3413 |
| 6489 | COMMUNITY | 3292 |
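Splitting on whitespace leaves punctuation attached, so `SERVICES.` and `SERVICES,` are tallied separately from `SERVICES`. A minimal sketch of a cleaner count, assuming it is acceptable to keep only runs of letters (this also splits `WELL-BEING` into `WELL` and `BEING` and `HOSPITAL'S` into `HOSPITAL` and `S`), and swapping `collections.Counter` in for the groupby; `mission_tokens` is a hypothetical helper and the sample missions are made up.

```python
import re
from collections import Counter

import pandas as pd

# made-up missions standing in for orgs_with_cont_mission['mission_of_org']
missions = pd.Series([
    'TO PROVIDE HEALTH SERVICES.',
    'PROVIDE HEALTH CARE SERVICES, THROUGH COMMUNITY PROGRAMS',
    None,  # a missing mission contributes no tokens
])


def mission_tokens(text, min_len=5):
    """Return the runs of letters in `text` that are at least `min_len` long."""
    if pd.isna(text):
        return []
    words = re.findall(r'[A-Z]+', text.upper())
    return [w for w in words if len(w) >= min_len]


counts = Counter(tok for text in missions for tok in mission_tokens(text))
print(counts.most_common(3))  # [('PROVIDE', 2), ('HEALTH', 2), ('SERVICES', 2)]
```

`min_len=5` mirrors the notebook's `len(word) > 4` filter; ties in `most_common` are listed in the order the words were first seen.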


### Categorize each not-for-profit by its mission statement
The language of the mission statements is far more ambiguous than I expected, and many not-for-profits overlap in purpose (a Catholic college, for example, is both religious and educational). It is definitely not as cut and dried as I expected.
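One way to put a number on that overlap is to collect every category whose keywords appear in a mission instead of stopping at the first match. A minimal sketch, assuming a made-up three-category keyword map and made-up missions; `keywords` and `all_categories` are illustrative names, and the match is the same plain substring test used for single-label categorization.

```python
import pandas as pd

# hypothetical three-category keyword map and made-up missions
keywords = {
    'religious': ['CATHOLIC', 'CHRISTIAN'],
    'educational': ['COLLEGE', 'SCHOOL', 'STUDENTS'],
    'charitable': ['HOSPITAL', 'HEALTH'],
}
missions = pd.Series([
    'A CATHOLIC COLLEGE FOR UNDERGRADUATE STUDENTS',
    'THE HOSPITAL PROVIDES HEALTH CARE SERVICES',
    'HELPING PEOPLE IN NEED BUILD BETTER LIVES',
])


def all_categories(mission):
    """Return every category with at least one keyword in the mission (substring test)."""
    return [cat for cat, words in keywords.items() if any(w in mission for w in words)]


labels = missions.apply(all_categories)
n_labels = labels.apply(len)

# share of missions matching 0, 1, 2, ... categories; rows with 2+ are the overlap
print(n_labels.value_counts(normalize=True).sort_index())
```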

A 501(c)(3) organization must be organized for one or more of the following eight exempt purposes:

1. Religious
1. Educational
1. Charitable
1. Scientific
1. Literary
1. Testing for Public Safety
1. Fostering National or International Amateur Sports Competition
1. Prevention of Cruelty to Children or Animals
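Matching keywords with a plain substring test (`word in mission`) lets short keywords fire inside longer words: `'SPORT'` is found in `TRANSPORTATION` and `'CARE'` in `CAREER`. A minimal sketch of whole-word matching with Python's `re` module and `\b` word boundaries, assuming a hypothetical subset of keyword lists checked in priority order; `category_keywords` and `categorize_whole_words` are illustrative names, not part of the notebook.

```python
import re

# hypothetical subset of keyword lists, checked in priority order
category_keywords = [
    ('amateur_sports', ['ATHLETIC', 'SPORT', 'SPORTS']),
    ('educational', ['EDUCATION', 'STUDENTS', 'SCHOOL', 'COLLEGE']),
    ('charitable', ['HOSPITAL', 'HEALTH', 'CARE']),
]

# one compiled pattern per category; \b anchors each keyword to word boundaries
category_patterns = [
    (cat, re.compile(r'\b(?:' + '|'.join(map(re.escape, words)) + r')\b'))
    for cat, words in category_keywords
]


def categorize_whole_words(mission):
    """Return the first category with a whole-word keyword match, else None."""
    if not isinstance(mission, str):  # missing missions (NaN) get no category
        return None
    for cat, pattern in category_patterns:
        if pattern.search(mission):
            return cat
    return None


print(categorize_whole_words('PUBLIC TRANSPORTATION FOR CAREER TRAINING'))  # None
print(categorize_whole_words('HOME CARE FOR SENIORS'))                       # charitable
```

The trade-off is that whole-word matching also stops `HEALTH` from matching `HEALTHCARE` and `LITERACY` from matching `ILLITERACY`, so compound or prefixed forms would need to be listed explicitly.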

In [331]:
religious = ['SPIRITUALITY', 'RELIGIOUS', 'CHRISTIAN', 'JEWISH', 'CATHOLIC']
educational = ['EDUCATION', 'STUDENTS', 'STUDY', 'ACADEMIC', 'SCHOOL', 'HISTORIC', 'COLLEGE']
# 'charitable' is represented here mostly by health-care terms
charitable = ['HOSPITAL', 'LONG TERM CARE', 'HEALTH', 'HOSPICE', 'NURSING', 'VISUALLY IMPAIRED', 'CARE']
scientific = ['SCIENCE', 'SCIENTIFIC', 'RESEARCH']
literary = ['LITERATURE', 'READING', 'LITERACY']
public_safety = ['SAFETY', 'SAFE', 'SECURE']
amateur_sports = ['ATHLETIC', 'SPORT', 'SPORTS']
prevention_of_cruelty = ['CRUELTY', 'ANIMALS', 'ANIMAL SHELTER']

# categories are checked in this order and the first match wins, so the broad
# 'educational' and 'charitable' lists are tried last (e.g. a Catholic college
# is labelled 'religious', a college that mentions SCIENCES 'scientific')
category_order = [
    ('religious', religious),
    ('prevention_of_cruelty', prevention_of_cruelty),
    ('scientific', scientific),
    ('literary', literary),
    ('public_safety', public_safety),
    ('amateur_sports', amateur_sports),
    ('educational', educational),
    ('charitable', charitable),
]

def categorizer(mission):
    """Return the first category with a keyword in `mission`, or None.

    `word in mission` is a plain substring test, so 'CARE' also matches
    'CAREER' and 'SPORT' also matches 'TRANSPORTATION'.
    """
    for category, keywords in category_order:
        if any(word in mission for word in keywords):
            return category
    return None

In [332]:
# apply function to df
orgs_with_cont_mission['categories'] = orgs_with_cont_mission.mission_of_org.apply(categorizer)

In [343]:
# examine results
orgs_with_cont_mission.head()

|   | ein | ein_org | mission_of_org | categories |
|---|---|---|---|---|
| 0 | 10130427 | 10130427 | BRIDGTON HOSPITAL STRIVES TO PROVIDE EXCEPTIONAL HEALTHCARE SERVICES AND DEPENDS ON CAREGIVER EXPERTISE AND THE COMMITMENT AND COMPASSION THEY PROVIDE TO FULFILL ITS MISSION. | charitable |
| 1 | 10177170 | 10177170 | WALDO COUNTY GENERAL HOSPITAL'S MISSION IS TO BE THE BEST - BETTER, EMPATHY, SERVICE AND TEAMWORK. OUR GOAL IS TO ENSURE QUALITY, ACCESSIBLE AND AFFORDABLE HEALTH CARE SERVICES AND TO IMPROVE THE HEALTH AND WELL-BEING OF OUR COMMUNITY. PLEASE SEE ATTACHED COMMUNITY BENEFITS REPORT. | charitable |
| 2 | 10179500 | 10179500 | SOUTHERN MAINE HEALTH CARE EXISTS TO IMPROVE THE HEALTH AND HEALTH CARE OF THE COMMUNITIES WE SERVE. | charitable |
| 3 | 10196359 | 10196359 | TO HELP PEOPLE WHO ARE VISUALLY IMPAIRED OR BLIND ATTAIN INDEPENDENCE AND COMMUNITY INTEGRATION. | charitable |
| 4 | 10198331 | 10198331 | THE HOSPITAL IS A NOT-FOR-PROFIT ENTITY ESTABLISHED TO PROVIDE HEALTH CARE SERVICES THROUGH ITS ACUTE CARE FACILITY AND PHYSICIAN PRACTICES. | charitable |


In [334]:
# I am able to categorize about two-thirds of the orgs
orgs_with_cont_mission.categories.value_counts(dropna=False)

NaN                      7270
charitable               5720
educational              4389
scientific               1678
religious                1145
public_safety             646
amateur_sports            307
prevention_of_cruelty     102
literary                   73
Name: categories, dtype: int64

In [342]:
# what fraction of the orgs received a category?
orgs_with_cont_mission.categories.notna().mean()

0.6591654946085326

In [330]:
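# examine NaN results
# note: this cell (In [330]) ran before the final keyword lists (In [331]), so the
# output is partly stale, e.g. row 31 now matches 'COLLEGE' and is 'educational'
orgs_with_cont_mission[orgs_with_cont_mission.categories.isna()].head()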

|   | ein | ein_org | mission_of_org | categories |
|---|---|---|---|---|
| 15 | 10211512 | 10211512 | TO PROVIDE HIGH QUALITY HOUSING AND SERVICES TO THOSE 60 YEARS OF AGE AND OLDER. |  |
| 31 | 10215213 | 10215213 | FOUR-YEAR PRIVATE UNDERGRADUATE LIBERAL ARTS COLLEGE. SEE SCHEDULE O |  |
| 49 | 10265559 | 10265559 | KBH'S MISSION IS TO PROMOTE THE WELL-BEING OF CHILDREN, ADULTS AND FAMILIES WHO EXPERIENCE MENTAL ILLNESS, EMOTIONAL OR DEVELOPMENTAL DIFFICULTIES, OR BEHAVIORAL CHALLENGES. |  |
| 51 | 10272879 | 10272879 | TO PROVIDE RESIDENTIAL, COMMUNITY AND WORK SUPPORTS THAT ENRICH, EMPOWER, EMPLOY, EDUCATE AND EXCEL PEOPLE WITH INTELLECTUAL DISABILITIES TO ACHIEVE THEIR INDIVIDUAL PERSONAL GOALS AND TO BE INCLUDED AND ACCEPTED IN OUR COMMUNITIES. |  |
| 52 | 10274725 | 10274725 | TRANSFORMING OUR COMMUNITY BY HELPING PEOPLE IN NEED BUILD BETTER LIVES. |  |
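The uncategorized rows point at gaps in the keyword lists: housing, mental illness, and `EDUCATE` rather than `EDUCATION`. A minimal sketch for surfacing candidate keywords, assuming a toy frame with the same `mission_of_org` and `categories` columns; it counts how many uncategorized missions contain each word of 5+ letters, and the rows and the `candidates` name are made up for illustration.

```python
import re
from collections import Counter

import pandas as pd

# toy frame shaped like orgs_with_cont_mission (hypothetical rows)
df = pd.DataFrame({
    'mission_of_org': [
        'HOUSING AND SERVICES FOR SENIORS',
        'AFFORDABLE HOUSING FOR FAMILIES',
        'A COMMUNITY HOSPITAL',
    ],
    'categories': [None, None, 'charitable'],
})

uncategorized = df.loc[df['categories'].isna(), 'mission_of_org']

# number of uncategorized missions containing each 5+ letter word;
# set() counts a word at most once per mission
candidates = Counter(
    word
    for text in uncategorized
    for word in set(re.findall(r'[A-Z]{5,}', text))
)
print(candidates.most_common(1))  # [('HOUSING', 2)]
```

Frequent words here are only candidates; each still needs a manual check against the eight purposes before it is added to a list.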


In [328]:
orgs_with_cont_mission[orgs_with_cont_mission.categories =='literary'].head()

|   | ein | ein_org | mission_of_org | categories |
|---|---|---|---|---|
| 1158 | 43846060 | 43846060 | ALBANY COMMUNITY CHARTER SCHOOL PREPARES STUDENTS FOR A LIFETIME OF OPPORTUNITY BY HELPING THEM MASTER PRIMARY RIGOROUS, STANDARDS-BASED CURRICULUM FOCUSED ON LITERACY AND OTHER FOUNDATIONAL KNOWLEDGE. | literary |
| 1436 | 60862072 | 60862072 | TO OPERATE A COMMUNITY CENTER ENCOMPASSING: AFTER-SCHOOL PROGRAMS, YOUTH DEVELOPMENT, ADULT EDUCATION AND ACTIVITIES, PHYSICAL EDUCATION AND REFERRAL SERVICES IN AN EFFORT TO COMBAT ILLITERACY AND UNREALIZED HUMAN POTENTIAL. | literary |
| 1646 | 110339109 | 110339109 | TO PROMOTE ADULT AND FAMILY LITERACY IN THE HOLYOKE AREA AND TO ASSIST IN JOB PLACEMENT FOR UNEMPLOYED COMMUNITY RESIDENTS IN LOCAL BUSINESSES. | literary |
| 2117 | 131679617 | 131679617 | DISSEMINATION OF LITERATURE AND RELATED ITEMS DIRECTED TOWARDS ALCOHOLICS FOLLOWING THE A.A RECOVERY PROGRAM. | literary |
| 2243 | 131969570 | 131969570 | STEPHEN GAYNOR SCHOOL IS AN INDEPENDENT, NONPROFIT PRE-K, LOWER, AND MIDDLE SCHOOL FOR BRIGHT STUDENTS WITH LEARNING DIFFERENCES. AROUND 350 STUDENTS AGES THREE TO 14 ATTEND OUR SCHOOL WITH A RANGE OF LEARNING DIFFERENCES, FROM ATTENTION HYPERACTIVITY DISORDER (ADHD) TO SPEECH, LANGUAGE, AND READING DELAYS. | literary |


In [335]:
orgs_with_cont_mission[orgs_with_cont_mission.categories =='scientific'].head()

|   | ein | ein_org | mission_of_org | categories |
|---|---|---|---|---|
| 5 | 10202467 | 10202467 | TO DEVELOP SOLUTIONS TO COMPLEX HUMAN & ENVIRONMENTAL HEALTH PROBLEMS THROUGH RESEARCH & EDUCATION. | scientific |
| 16 | 10211513 | 10211513 | THE PURPOSES OF THE LABORATORY ARE SCIENTIFIC, MEDICAL, CHARITABLE, AND EDUCATIONAL TO DISCOVER PRECISE GENOMIC SOLUTIONS FOR DISEASE AND EMPOWER THE BIOMEDICAL COMMUNITY. | scientific |
| 21 | 10211781 | 10211781 | BATES COLLEGE IS A PRIVATE, HIGHLY SELECTIVE, RESIDENTIAL COLLEGE DEVOTED TO UNDERGRADUATE STUDY IN THE TRADITIONAL DISCIPLINES OF THE LIBERAL ARTS AND SCIENCES AS WELL AS IN EMERGING INTERDISCIPLINARY PROGRAMS. | scientific |
| 25 | 10211810 | 10211810 | THE UNIVERSITY'S MISSION IS TO PROVIDE STUDENTS WITH A HIGHLY INTEGRATED LEARNING EXPERIENCE THAT PROMOTES EXCELLENCE THROUGH INTERDISCIPLINARY COLLABORATION AND INNOVATION IN EDUCATION, RESEARCH, AND SERVICE. | scientific |
| 44 | 10238552 | 10238552 | THE MAINE MEDICAL CENTER (THE MEDICAL CENTER) IS A VOLUNTARY, NOT-FOR-PROFIT COMMUNITY AND REFERRAL HOSPITAL, DEDICATED TO PROVIDING HIGH QUALITY HEALTH CARE SERVICES TO ALL PERSONS WHO SEEK CARE REGARDLESS OF THEIR SEX, RACE, RELIGION, AGE, COLOR, SEXUAL ORIENTATION, NATIONAL ORIGIN, PHYSICAL OR EMOTIONAL DISABILITY OR SOCIAL OR ECONOMIC STATUS. MAINE MEDICAL CENTER IS ALSO COMMITTED TO EDUCATION AT THE UNDERGRADUATE, GRADUATE, POST-GRADUATE AND CONTINUING EDUCATION LEVELS FOR PHYSICIANS, NURSES AND ALLIED HEALTH PERSONNEL, AND IN-SERVICE TRAINING FOR SUPPORT STAFF ALL OF WHICH ARE ESSENTIAL TO THE DELIVERY OF QUALITY PATIENT CARE. OUTREACH EDUCATION TO OTHER INSTITUTIONS AND AGENCIES IS ALSO VITAL TO THE FULFILLMENT OF THE MAINE MEDICAL CENTER'S MISSION. THE MEDICAL CENTER ALSO SUPPORTS BASIC AND CLINICAL RESEARCH AS ESSENTIAL TO THE ADVANCEMENT OF HEALTH CARE. | scientific |


In [336]:
orgs_with_cont_mission[orgs_with_cont_mission.categories =='religious'].head()

|   | ein | ein_org | mission_of_org | categories |
|---|---|---|---|---|
| 89 | 10461075 | 10461075 | THE PURPOSE OF THE MARIAN MOVEMENT OF PRIESTS IS TO PROVIDE A MEANS FOR PRIESTS AND LAITY OF THE CATHOLIC CHURCH TO ACHIEVE A GENUINE SPIRITUAL RENEWAL | religious |
| 110 | 10548823 | 10548823 | PRINCE AVENUE CHRISTIAN SCHOOL EXISTS TO INFUSE OUR SCHOOL COMMUNITY WITH A BIBLICAL WORLDVIEW BY EFFECTIVELY SHARING THE GOSPEL AND DEVELOPING FULLY DEVOTED FOLLOWERS OF JESUS CHRIST THROUGH SCRIPTURALLY BASED DISCIPLESHIP, ACADEMICS, FINE ARTS, AND ATHLETICS. | religious |
| 133 | 10651843 | 10651843 | TO PROVIDE FINANCIAL SUPPORT TO OTHER CHARITABLE ORGANIZATIONS WHICH PROMOTE SOCIAL, EDUCATIONAL AND OTHER CHARITABLE SERVICES IN THE UNITED STATES AND ISRAEL. IT ALSO PROVIDES SOCIAL SERVICES TO POOR AND DISADVANTAGED INDIVIDUALS IN THE IRANIAN AMERICAN JEWISH COMMUNITY. | religious |
| 195 | 20222163 | 20222163 | GROUNDED IN THE LIFE AND MINISTRY OF JESUS CHRIST, NEW HAMPSHIRE CATHOLIC CHARITIES RESPONDS TO THOSE IN NEED WITH PROGRAMS THAT HEAL, COMFORT AND EMPOWER. | religious |
| 200 | 20222182 | 20222182 | A CATHOLIC, BENEDICTINE COLLEGE PROVIDING ALL ITS STUDENTS A DISTINCTIVE LIBERAL ARTS EDUCATION THAT INCORPORATES OPPORTUNITIES FOR PROFESSIONAL AND CAREER PREPARATION. | religious |


In [344]:
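# 20 more rows than orgs_with_cont (21310, 1): the left merge repeats an org
# whenever its EIN appears more than once in org_mission
orgs_with_cont_mission.shape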

(21330, 4)

In [346]:
orgs_with_cont_mission.head()

|   | ein | ein_org | mission_of_org | categories |
|---|---|---|---|---|
| 0 | 10130427 | 10130427 | BRIDGTON HOSPITAL STRIVES TO PROVIDE EXCEPTIONAL HEALTHCARE SERVICES AND DEPENDS ON CAREGIVER EXPERTISE AND THE COMMITMENT AND COMPASSION THEY PROVIDE TO FULFILL ITS MISSION. | charitable |
| 1 | 10177170 | 10177170 | WALDO COUNTY GENERAL HOSPITAL'S MISSION IS TO BE THE BEST - BETTER, EMPATHY, SERVICE AND TEAMWORK. OUR GOAL IS TO ENSURE QUALITY, ACCESSIBLE AND AFFORDABLE HEALTH CARE SERVICES AND TO IMPROVE THE HEALTH AND WELL-BEING OF OUR COMMUNITY. PLEASE SEE ATTACHED COMMUNITY BENEFITS REPORT. | charitable |
| 2 | 10179500 | 10179500 | SOUTHERN MAINE HEALTH CARE EXISTS TO IMPROVE THE HEALTH AND HEALTH CARE OF THE COMMUNITIES WE SERVE. | charitable |
| 3 | 10196359 | 10196359 | TO HELP PEOPLE WHO ARE VISUALLY IMPAIRED OR BLIND ATTAIN INDEPENDENCE AND COMMUNITY INTEGRATION. | charitable |
| 4 | 10198331 | 10198331 | THE HOSPITAL IS A NOT-FOR-PROFIT ENTITY ESTABLISHED TO PROVIDE HEALTH CARE SERVICES THROUGH ITS ACUTE CARE FACILITY AND PHYSICIAN PRACTICES. | charitable |


In [353]:
orgs_with_cont_mission[['ein', 'categories']].head()

|   | ein | categories |
|---|---|---|
| 0 | 10130427 | charitable |
| 1 | 10177170 | charitable |
| 2 | 10179500 | charitable |
| 3 | 10196359 | charitable |
| 4 | 10198331 | charitable |


#### Write to CSV 

In [354]:
orgs_with_cont_mission[['ein', 'categories']].to_csv('data/501c3_categorized.csv', index=False, header=True)