### Enriching 990 dataset with the mission/purpose of the not-for-profit

1. The contractor dataset downloaded from open990 (https://www.open990.org/catalog/) did not include the mission/purpose of each not-for-profit.
1. A more expansive 990 dataset from https://appliednonprofitresearch.com/documentation/irs-990-spreadsheets/ (a website affiliated with open990) does include the mission/purpose.
1. Because the open990 dataset was in a cleaner, more accessible format, I kept using it for the project and joined the mission/purpose field to it. This allowed me to analyze specific categories of not-for-profits.

In [30]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', 999)

In [17]:
org_mission = pd.read_csv('data/990_Part I_Line_1_mission_significant_activities.csv',
                low_memory=False)

In [18]:
org_mission.shape

(237854, 2)

In [8]:
org_mission.dtypes

ein_org               int64
990_Part I_Line 1    object
dtype: object

In [20]:
# rename column
org_mission.rename(columns={'990_Part I_Line 1': 'mission_of_org'}, inplace=True)

In [21]:
org_mission.head()

Unnamed: 0,ein_org,mission_of_org
0,461233726,"PROVIDING HELP, ASSISTANCE, & HEALING TO VICTI..."
1,591965600,SEE SCHEDULE O
2,840889330,"BUILD LONG-TERM, LIFE-CHANGING RELATIONSHIPS W..."
3,274862807,Our mission is to prepare students for a colle...
4,141666301,"THE VERDOY VOLUNTEER FIRE ASSOCIATION, INC. (T..."


In [22]:
# convert mission text to upper case for consistency
# (.str.upper() leaves missing values as NaN instead of turning them into the string 'NAN')
org_mission['mission_of_org'] = org_mission['mission_of_org'].str.upper()

In [23]:
# verify uppercase
org_mission.head()

Unnamed: 0,ein_org,mission_of_org
0,461233726,"PROVIDING HELP, ASSISTANCE, & HEALING TO VICTI..."
1,591965600,SEE SCHEDULE O
2,840889330,"BUILD LONG-TERM, LIFE-CHANGING RELATIONSHIPS W..."
3,274862807,OUR MISSION IS TO PREPARE STUDENTS FOR A COLLE...
4,141666301,"THE VERDOY VOLUNTEER FIRE ASSOCIATION, INC. (T..."
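Some rows, such as `SEE SCHEDULE O` above, point to another part of the filing and carry no mission text of their own. A minimal sketch of flagging such rows before any keyword analysis follows. The toy data, the `is_placeholder` column name, and the rule "missing or starts with SEE SCHEDULE O" are assumptions for illustration, not part of the original notebook.

```python
import pandas as pd

# Toy stand-in for org_mission (illustrative values only)
missions = pd.DataFrame({'mission_of_org': [
    'SEE SCHEDULE O',
    'EDUCATION',
    None,
    'See Schedule O.',
]})

# Assumption: a mission is a placeholder if it is missing
# or begins with "SEE SCHEDULE O" after upper-casing and trimming
text = missions['mission_of_org'].str.upper().str.strip()
missions['is_placeholder'] = text.isna() | text.str.startswith('SEE SCHEDULE O', na=False)

print(missions)
```

Counting `is_placeholder` on the real data would show how many organizations cannot be categorized from this field alone.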


### Join not-for-profit orgs with at least one contractor paid over $100K

In [24]:
orgs_with_cont = pd.read_csv('data/ein_orgs_with_contractors.csv',
                low_memory=False)

In [25]:
orgs_with_cont.head()

Unnamed: 0,ein
0,10130427
1,10177170
2,10179500
3,10196359
4,10198331


In [26]:
orgs_with_cont.shape

(21310, 1)

In [32]:
orgs_with_cont.merge(org_mission, how='left', left_on='ein', right_on='ein_org').head(20)

Unnamed: 0,ein,ein_org,mission_of_org
0,10130427,10130427,BRIDGTON HOSPITAL STRIVES TO PROVIDE EXCEPTIONAL HEALTHCARE SERVICES AND DEPENDS ON CAREGIVER EXPERTISE AND THE COMMITMENT AND COMPASSION THEY PROVIDE TO FULFILL ITS MISSION.
1,10177170,10177170,"WALDO COUNTY GENERAL HOSPITAL'S MISSION IS TO BE THE BEST - BETTER, EMPATHY, SERVICE AND TEAMWORK. OUR GOAL IS TO ENSURE QUALITY, ACCESSIBLE AND AFFORDABLE HEALTH CARE SERVICES AND TO IMPROVE THE HEALTH AND WELL-BEING OF OUR COMMUNITY. PLEASE SEE ATTACHED COMMUNITY BENEFITS REPORT."
2,10179500,10179500,SOUTHERN MAINE HEALTH CARE EXISTS TO IMPROVE THE HEALTH AND HEALTH CARE OF THE COMMUNITIES WE SERVE.
3,10196359,10196359,TO HELP PEOPLE WHO ARE VISUALLY IMPAIRED OR BLIND ATTAIN INDEPENDENCE AND COMMUNITY INTEGRATION.
4,10198331,10198331,THE HOSPITAL IS A NOT-FOR-PROFIT ENTITY ESTABLISHED TO PROVIDE HEALTH CARE SERVICES THROUGH ITS ACUTE CARE FACILITY AND PHYSICIAN PRACTICES.
5,10202467,10202467,TO DEVELOP SOLUTIONS TO COMPLEX HUMAN & ENVIRONMENTAL HEALTH PROBLEMS THROUGH RESEARCH & EDUCATION.
6,10211483,10211483,"PROVIDING ADULT, CHILDREN, AND FAMILY MENTAL HEALTH AND SOCIAL SERVICES. PROVIDING HOME HEALTH AND HOSPICE SERVICES."
7,10211488,10211488,SUPPORT PRESERVATION OF HISTORIC BUILDINGS
8,10211494,10211494,CMMC STRIVES TO PROVIDE EXCEPTIONAL HEALTHCARE SERVICES. THE HOSPITAL DEPENDS ON THE EXPERTISE OF ITS CAREGIVERS IN ADDITION TO THE COMMITMENT AND COMPASSION THE PROVIDE.
9,10211497,10211497,EDUCATION
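A left join keeps every organization from `orgs_with_cont`, but it can silently add rows if an EIN appears more than once in the mission table, and it leaves `NaN` where no mission is found. The sketch below shows one way to check both cases with the `validate` and `indicator` options of `merge`. It runs on toy data; whether the real `org_mission` table contains duplicate EINs is not shown in this notebook, and keeping the first mission per EIN is an assumption.

```python
import pandas as pd

# Toy stand-ins for orgs_with_cont and org_mission (illustrative values only)
orgs = pd.DataFrame({'ein': [111, 222, 333]})
missions = pd.DataFrame({
    'ein_org': [111, 222, 222],
    'mission_of_org': ['EDUCATION', 'HEALTH CARE', 'HEALTH CARE SERVICES'],
})

# Assumption: keep the first mission per EIN so the join cannot multiply rows
missions_unique = missions.drop_duplicates(subset='ein_org', keep='first')

merged = orgs.merge(
    missions_unique,
    how='left',
    left_on='ein',
    right_on='ein_org',
    validate='one_to_one',  # raises MergeError if either key column has duplicates
    indicator=True,         # adds a _merge column: 'both' or 'left_only'
)

# Organizations with no mission text in the mission table
unmatched = int((merged['_merge'] == 'left_only').sum())
print(len(merged), unmatched)
```

If the row count after the merge equals the row count of the left table and `unmatched` is small, the enriched dataset is safe to use for category-level analysis.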


### Extract key words/phrases from mission statement of each not-for-profit

501(c)(3) organizations must fit into one of the following eight categories:

1. Religious
1. Educational
1. Charitable
1. Scientific
1. Literary
1. Testing for Public Safety
1. Fostering National or International Amateur Sports Competition
1. Prevention of Cruelty to Children or Animals

In [None]:
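# Pseudocode: x, y, z stand for keywords still to be chosen for each category
#
# if mission contains x, y, z:
#     category = Religious
# elif mission contains x, y, z:
#     category = Educational
# elif mission contains x, y, z:
#     category = Charitable
# etc.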
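The pseudocode above could be turned into a small function along the following lines. This is a sketch under stated assumptions: the keyword lists are hypothetical placeholders for x, y, z (not a vetted taxonomy), only three of the eight categories are filled in, and `categorize_mission` and the `Uncategorized` label are names introduced here for illustration.

```python
import pandas as pd

# Hypothetical keyword lists standing in for x, y, z; extend to all eight categories
CATEGORY_KEYWORDS = {
    'Religious': ['CHURCH', 'MINISTRY', 'WORSHIP'],
    'Educational': ['EDUCATION', 'SCHOOL', 'STUDENTS'],
    'Charitable': ['HOSPITAL', 'HEALTH', 'ASSISTANCE'],
}

def categorize_mission(mission):
    """Return the first category whose keywords appear in the mission text."""
    if not isinstance(mission, str):
        return 'Uncategorized'  # missing mission text
    text = mission.upper()
    # Dict order sets priority, mirroring the if/elif chain
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return 'Uncategorized'

# Mission strings taken from the merge output shown earlier
sample = pd.DataFrame({'mission_of_org': [
    'EDUCATION',
    'SUPPORT PRESERVATION OF HISTORIC BUILDINGS',
    'SOUTHERN MAINE HEALTH CARE EXISTS TO IMPROVE THE HEALTH AND HEALTH CARE OF THE COMMUNITIES WE SERVE.',
]})
sample['category'] = sample['mission_of_org'].apply(categorize_mission)
print(sample)
```

Because the match is a plain substring test and the first hit wins, the order of the categories and the choice of keywords both affect the result; a mission that mentions both students and health, for example, would be labeled Educational here.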