# Overview
- Read in `D:/990_and_bmf_april_2025_all_controls_351875_orgs_2598477_filings_no_duplicates_fixed_state.feather`  
- Fix NTEE variable:
  - I follow Jesse's advice to "start by combining NTEE_IRS (the official value in the IRS BMF file) and NTEE_NCCS (the unofficial version that has been improved over time by recoding some of the default IRS values). Use **NTEE_NCCS** unless the value is missing, then use **NTEE_IRS**." I do this and create a new variable, `NTEE`:
    ```python
    %%time
    df['NTEE'] = np.where(df['BMF_NTEE_NCCS'].isnull(), df['BMF_NTEE_IRS'], df['BMF_NTEE_NCCS'])
    ```

- I then create a custom function, `map_ntee`, which I use to create two versions of the 12-category 'industry' variable:
  - `NTEE_MAJ12` and `NTEE_MAJ12_EV`
    - See Jesse's notes and my notes below for insights and background. The short version is that you can use either of these two variables for industry fixed effects. The statistical results will be identical either way. 

- I then merge in NTEE code details from the crosswalk file, perform some final verifications, and then save the output in various formats, including:
  - `D:/990_and_bmf_april_2025_all_controls_351875_orgs_2598477_filings_no_duplicates_fixed_state_ntee.feather`
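
The NCCS-first fallback rule from the NTEE step above can be checked on a toy frame before running it on the full data. This is only a minimal sketch: the column names match the ones used in this notebook, but the four rows are made up for illustration, and `NTEE_alt` and `n_missing` are hypothetical names that do not appear elsewhere.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real data; the values are made up.
toy = pd.DataFrame({
    "BMF_NTEE_NCCS": ["N50", None, "B40", None],
    "BMF_NTEE_IRS": ["N50", "E22", "B99", None],
})

# Same rule as in the notebook: use NTEE_NCCS unless it is missing, then use NTEE_IRS.
toy["NTEE"] = np.where(
    toy["BMF_NTEE_NCCS"].isnull(), toy["BMF_NTEE_IRS"], toy["BMF_NTEE_NCCS"]
)

# An equivalent pandas idiom: fill gaps in NTEE_NCCS from NTEE_IRS.
toy["NTEE_alt"] = toy["BMF_NTEE_NCCS"].fillna(toy["BMF_NTEE_IRS"])

# Rows that are missing in both sources remain missing after the fallback.
n_missing = int(toy["NTEE"].isnull().sum())
print(toy)
print("still missing:", n_missing)
```

Counting the rows that are still missing after the fallback is a quick way to see how many organizations have no NTEE code in either source.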

# Jesse's Notes

#### Mon, Apr 28, 2025 at 3:25 PM Jesse Lecy <jdlecy@gmail.com> 

OK, I found my field dependency diagram. See attached. 

I would NOT trust the LEVEL3 fields because after Tom Pollack left Urban the data processing task was taken over by a talented analyst that did not work regularly with nonprofit data. Some new python scripts were developed, but they failed to account for gaps in the data and variable dependencies. The BMF files also started having issues like missing NTEE codes and inconsistent activity codes. As a result, there was a cascade of field quality degradation. 

With the LEVELS, specifically, the new code that was developed used some default categories. So if an input field was missing or had an unusual value, the new derived fields would be assigned to misc or missing categories and there was not good error handling. It resulted in a growing number of missing values in each field, or a growing number of cases assigned to "uncategorized" or "misc" categories when they should not have been. 

The LEVEL1 - LEVEL3 fields were retained only for legacy purposes, but they should not have higher rates of completion than their input fields (NTEE and ACTIV). 

[Attached images: image.png, image.png]

I would start by combining NTEE_IRS (the official value in the IRS BMF file) and NTEE_NCCS (the unofficial version that has been improved over time by recoding some of the default IRS values). Use **NTEE_NCCS** unless the value is missing, then use **NTEE_IRS**. From there you can apply the NTEEV2 crosswalk or recreate the LEVEL3 categories (old code with NTEE to LEVEL3 rules is in the PPT). 

There are archived raw BMF files from June 2023 to today, just change the dates here: 

https://nccsdata.s3.us-east-1.amazonaws.com/raw/bmf/2025-04-BMF.csv

Hopefully that makes sense. 
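
Jesse's tip about changing the dates in the archive URL can be sketched as a small helper. This assumes the other monthly files follow the same `YYYY-MM-BMF.csv` pattern as the April 2025 link above; `bmf_url` and `BMF_URL_TEMPLATE` are hypothetical names, and I have not checked which months actually exist in the archive.

```python
# Pattern copied from the April 2025 link; assumed to hold for the other months.
BMF_URL_TEMPLATE = (
    "https://nccsdata.s3.us-east-1.amazonaws.com/raw/bmf/{year:04d}-{month:02d}-BMF.csv"
)


def bmf_url(year: int, month: int) -> str:
    """Build the URL of one archived raw BMF file (hypothetical helper)."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return BMF_URL_TEMPLATE.format(year=year, month=month)


print(bmf_url(2025, 4))
```

The resulting string can be passed directly to `pd.read_csv` to pull a given month's raw BMF file.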

### NTEE Code Structure
The NTEEV2 code system is an evolution of the original NTEE code system, a classification system used by the IRS and NCCS for nonprofit organizations. For all of the NTEE codes, refer to this comprehensive overview.

This new NTEE system can be organized into 5 levels.

The NTEEV2 codes are structured in three parts:

- Level 1: Industry Group
- Levels 2-4: Major Group, Division and Subdivision
- Level 5: Organization Type

These parts are separated by a hyphen, and all NTEEV2 codes must contain all three parts (or five levels) in sequence.

https://urbaninstitute.github.io/nccs/stories/nccsdata-ntee/
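
The three-part structure described above can be illustrated by splitting a code on its hyphens. A minimal sketch, using the value `HMS-N50-RG` that appears in the `BMF_NTEEV2` column of this dataset; `NteeV2` and `parse_nteev2` are hypothetical names introduced only for this example.

```python
from typing import NamedTuple


class NteeV2(NamedTuple):
    industry_group: str  # Level 1, e.g. "HMS"
    ntee_code: str       # Levels 2-4 (major group, division, subdivision), e.g. "N50"
    org_type: str        # Level 5, e.g. "RG"


def parse_nteev2(code: str) -> NteeV2:
    """Split an NTEEV2 code into its three hyphen-separated parts."""
    parts = code.strip().split("-")
    if len(parts) != 3:
        raise ValueError(f"expected three hyphen-separated parts, got {code!r}")
    return NteeV2(*parts)


parsed = parse_nteev2("HMS-N50-RG")
print(parsed.industry_group, parsed.ntee_code, parsed.org_type)
```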

#### Level 1: Industry Groups
The Industry Group is represented by three letters. The 12 options are:

| Industry Group | Description                   |
|----------------|-------------------------------|
| ART            | Arts, Culture & Humanities    |
| EDU            | Education                     |
| ENV            | Environment and Animals       |
| HEL            | Health                        |
| HMS            | Human Services                |
| IFA            | International, Foreign Affairs|
| PSB            | Public, Societal Benefit      |
| REL            | Religion Related              |
| MMB            | Mutual/Membership Benefit     |
| UNU            | Unknown, Unclassified         |
| UNI            | University                    |
| HOS            | Hospital                      |


---

Jesse Lecy's GitHub site has the best explanation and allows for the easiest comparison between the old and new NTEE codes: https://github.com/Nonprofit-Open-Data-Collective/mission-taxonomies/tree/main/NTEEV2

Here are the old codes for `NTMAJ12`:  
"Broad Industries - The NTEE classification system aggregates the 26 major groups (letters A - Z) into 12 broad categories or industries as follows:"

| NTEE Codes              | Label | Description                          |
|-------------------------|-------|--------------------------------------|
| A                       | AR    | Arts, culture, and humanities        |
| B4, B5                  | BH    | Higher education                     |
| B (other than B4,B5)    | ED    | Education (other)                    |
| C, D                    | EN    | Environment                          |
| E2                      | EH    | Hospitals                            |
| E (other than E2), F,G,H| HE    | Health                               |
| I, J, K, L, M, N, O, P  | HU    | Human services                       |
| Q                       | IN    | International                        |
| R, S, T, U, V, W        | PU    | Public and societal benefit          |
| X                       | RE    | Religion                             |
| Y                       | MU    | Mutual benefit                       |
| Z                       | UN    | Unknown                              |



Due to the confusing structure of the NTEE codes, a new format has been created to improve interpretation and sampling.

The NTEEV2 code format is as follows:

![NTEE Structure](ntee2_structure.png)


Level 1: Industry Group  
The industry group portion (the first three letters) contains the code for the 12-category 'NTMAJ12' industry clustering:


| Code | Category                        | NTEE Codes                        |
|------|----------------------------------|-----------------------------------|
| ART  | Arts, Culture, and Humanities   | A                                 |
| EDU  | Education (minus universities)  | B (excluding B40–B43, B50)        |
| ENV  | Environment and Animals         | C, D                              |
| HEL  | Health (minus hospitals)        | E, F, G, H (excluding E20–E24)    |
| HMS  | Human Services                  | I, J, K, L, M, N, O, P            |
| IFA  | International, Foreign Affairs  | Q                                 |
| PSB  | Public, Societal Benefit        | R, S, T, U, V, W                  |
| REL  | Religion Related                | X                                 |
| MMB  | Mutual/Membership Benefit       | Y                                 |
| UNU  | Unknown, Unclassified           | Z                                 |
| UNI  | Universities                    | B40, B41, B42, B43, B50           |
| HOS  | Hospitals                       | E20, E21, E22, E24                |
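
The table above can be turned into a lookup from an NTEE code to its three-letter industry group. The sketch below is not the `map_ntee` function used later in this notebook; `industry_group` is a hypothetical helper, and sending missing or unrecognized codes to `UNU` is my own assumption. It follows the explicit UNI and HOS code lists in the table, so a code such as E23, which the HOS row does not list, would land in HEL.

```python
# Special cases carved out of major groups B and E (from the table above).
UNIVERSITY_CODES = {"B40", "B41", "B42", "B43", "B50"}
HOSPITAL_CODES = {"E20", "E21", "E22", "E24"}

# First letter of the NTEE code -> industry group.
MAJOR_GROUP_TO_INDUSTRY = {
    "A": "ART",
    "B": "EDU",
    "C": "ENV", "D": "ENV",
    "E": "HEL", "F": "HEL", "G": "HEL", "H": "HEL",
    **{letter: "HMS" for letter in "IJKLMNOP"},
    "Q": "IFA",
    **{letter: "PSB" for letter in "RSTUVW"},
    "X": "REL",
    "Y": "MMB",
    "Z": "UNU",
}


def industry_group(ntee):
    """Return the industry group for an NTEE code (hypothetical helper)."""
    # Assumption: missing or empty codes are treated as unknown.
    if not isinstance(ntee, str) or not ntee.strip():
        return "UNU"
    code = ntee.strip().upper()
    head = code[:3]
    if head in UNIVERSITY_CODES:
        return "UNI"
    if head in HOSPITAL_CODES:
        return "HOS"
    # Assumption: letters outside A-Z groups fall back to unknown.
    return MAJOR_GROUP_TO_INDUSTRY.get(code[0], "UNU")


for example in ["N50", "B43", "B20", "E22", "E30", None]:
    print(example, "->", industry_group(example))
```

Checking the university and hospital codes before the first-letter lookup is what keeps B4x/B50 and E2x from being absorbed into EDU and HEL.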


---


# New vs. Old NTEE 12 Major Categories

There's a clear **1:1 match** between each old 12-category **Label** (e.g., `AR`, `ED`, `HE`) and the **new Code** (e.g., `ART`, `EDU`, `HEL`), so we can safely align them.

```python
# Mapping from Old Label to New Code
old_to_new = {
    'AR': 'ART',
    'BH': 'UNI',
    'ED': 'EDU',
    'EN': 'ENV',
    'EH': 'HOS',
    'HE': 'HEL',
    'HU': 'HMS',
    'IN': 'IFA',
    'PU': 'PSB',
    'RE': 'REL',
    'MU': 'MMB',
    'UN': 'UNU'
}
```

The table below aligns each new code with its old label and adds a column for the **original NTEE Codes from the old 12-category mapping**:


| New Code | New Category                  | Old Label | Old Label Description               | Old NTEE Codes               | New NTEE Codes                        |
|----------|-------------------------------|-----------|--------------------------------------|------------------------------|----------------------------------------|
| ART      | Arts, Culture, and Humanities | AR        | Arts, culture, and humanities        | A                            | A                                      |
| EDU      | Education (minus universities)| ED        | Education (other)                    | B (other than B4, B5)        | B (excluding B40–B43, B50)             |
| UNI      | Universities                  | BH        | Higher education                     | B4, B5                       | B40, B41, B42, B43, B50                |
| ENV      | Environment and Animals       | EN        | Environment                          | C, D                         | C, D                                   |
| HOS      | Hospitals                     | EH        | Hospitals                            | E2                           | E20, E21, E22, E24                     |
| HEL      | Health (minus hospitals)      | HE        | Health                               | E (other than E2), F, G, H   | E (excluding E20–E24), F, G, H         |
| HMS      | Human Services                | HU        | Human services                       | I, J, K, L, M, N, O, P       | I, J, K, L, M, N, O, P                 |
| IFA      | International, Foreign Affairs| IN        | International                        | Q                            | Q                                      |
| PSB      | Public, Societal Benefit      | PU        | Public and societal benefit          | R, S, T, U, V, W             | R, S, T, U, V, W                       |
| REL      | Religion Related              | RE        | Religion                             | X                            | X                                      |
| MMB      | Mutual/Membership Benefit     | MU        | Mutual benefit                       | Y                            | Y                                      |
| UNU      | Unknown, Unclassified         | UN        | Unknown                              | Z                            | Z                                      |
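
The 1:1 claim can be verified directly, and the same dictionary can recode a column of old labels. A minimal sketch: the dictionary repeats `old_to_new` from above so the block runs on its own, and the example labels are made up (`HU` is the old `BMF_NTMAJ12` value for the sample organization shown later, whose NTEEV2 code starts with `HMS`). `new_to_old`, `old_labels`, and `new_codes` are hypothetical names.

```python
import pandas as pd

old_to_new = {
    'AR': 'ART', 'BH': 'UNI', 'ED': 'EDU', 'EN': 'ENV',
    'EH': 'HOS', 'HE': 'HEL', 'HU': 'HMS', 'IN': 'IFA',
    'PU': 'PSB', 'RE': 'REL', 'MU': 'MMB', 'UN': 'UNU',
}

# A mapping is 1:1 when no two old labels share a new code.
assert len(set(old_to_new.values())) == len(old_to_new) == 12

# Because the mapping is 1:1, it can be inverted without losing information.
new_to_old = {new: old for old, new in old_to_new.items()}

# Recode a toy column of old labels; labels not in the dictionary become NaN.
old_labels = pd.Series(['HU', 'AR', 'BH', 'EH', None], name='BMF_NTMAJ12')
new_codes = old_labels.map(old_to_new)
print(new_codes.tolist())
```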

# Start

In [1]:
import numpy as np
import pandas as pd
from pandas import DataFrame
from pandas import Series

In [2]:
print(pd.__version__)

2.2.2


In [2]:
from platform import python_version
print(python_version())

3.10.11


In [3]:
# http://pandas.pydata.org/pandas-docs/stable/options.html
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', 500)

In [4]:
pd.set_option('display.float_format', lambda x: '%.2f' % x)

In [5]:
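# Note: this overrides the '%.2f' format set in the previous cell;
# floats now display with one decimal place and thousands separators
pd.options.display.float_format = '{:,.1f}'.format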

In [6]:
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
warnings.simplefilter(action='ignore', category=pd.errors.PerformanceWarning)
warnings.simplefilter(action='ignore', category=pd.errors.SettingWithCopyWarning)

In [7]:
import datetime
import gc

#### Set working directory

In [8]:
cd "C:\\Users\\Gregory\\IRS 990 Control Variables\\"

C:\Users\Gregory\IRS 990 Control Variables


# Read in PANDAS DF

In [10]:
#%%time
#import datetime
#print ("Current date and time : ", datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"), '\n')
#df = pd.read_pickle('990 and BMF control variables for all NEW filings February 2024 -- 277,112 501c3 orgs -- duplicated filings dropped (N=655,657).pkl.gz',
#                    compression='gzip')
#print('# of columns:', len(df.columns))
#print('# of observations:', len(df))
#df[:1]

Current date and time :  2024-03-31 19:30:34 

# of columns: 342
# of observations: 655657
CPU times: total: 18.7 s
Wall time: 19.6 s


Unnamed: 0,URL,F9_09_PC_FEES_FOR_SVCE_FR_TOT,F9_00_HD_BUILD_TIME_STAMP,fiscal_year,EIN,BusinessName,BusinessNameControlTxt,PhoneNum,USAddress,InCareOfNm,ForeignAddress,ForeignPhoneNum,F9_00_HD_ADDR_CHANGE,F9_00_HD_AMENDED_RETURN,F9_00_HD_CTRY_OF_DOMICILE,F9_00_HD_EXEMPT_STATUS_4847A1,F9_00_HD_EXEMPT_STATUS_501C,F9_00_HD_EXEMPT_STATUS_501C3,F9_00_HD_FINAL_RETURN,F9_00_HD_GROSS_EXEMPT_NUM,F9_00_HD_GROSS_RCPT,F9_00_HD_GROUP_RETURN,F9_00_HD_INCLUDES_SUBORD_ORGS,F9_00_HD_INITIAL_RETURN,F9_00_HD_PRIN_OFF_NAME,F9_00_HD_SIGNING_OFFICER_SIGNTR,F9_00_HD_SPECIAL_CONDITION_DESC,F9_00_HD_STATE_OF_DOMICILE,F9_00_HD_TAX_PER_BEGIN,F9_00_HD_TAX_PER_END,F9_00_HD_TAX_YEAR,F9_00_HD_TIME_STAMP,F9_00_HD_TYPE_ORG_ASSOCIATION,F9_00_HD_TYPE_ORG_CORP,F9_00_HD_TYPE_ORG_OTHER,F9_00_HD_TYPE_ORG_OTHER_DESC,F9_00_HD_TYPE_ORG_TRUST,F9_00_HD_WEBSITE,F9_00_HD_YEAR_FORMED,F9_01_PC_BEN_PAID_MEMB_PRIOR,F9_01_PC_CONTR_GRANTS_CURR,F9_01_PC_CONTR_GRANTS_PRIOR,F9_01_PC_GRANTS_PRIOR,F9_01_PC_INDEP_VOTING_MEMB,F9_01_PC_INVEST_INCOME_PRIOR,F9_01_PC_NET_ASSETS_BOY,F9_01_PC_OTHER_EXPENSE_PRIOR,F9_01_PC_OTHER_REV_PRIOR,F9_01_PC_PROF_FUNDRISING_EXP_CURR,F9_01_PC_PROF_FUNDRISING_EXP_PRIOR,F9_01_PC_PROG_SERVICE_REV_PRIOR,F9_01_PC_REV_LESS_EXP_CURR,F9_01_PC_REV_LESS_EXP_PRIOR,F9_01_PC_TERMINATION_CONTRACTION,F9_01_PC_TOT_ASSETS_EOY,F9_01_PC_TOT_EXP_PRIOR,F9_01_PC_TOT_FNDR_EXP_CURR,F9_01_PC_TOT_INDIV_EMPLOYED,F9_01_PC_TOT_INDIV_VOLUNTEERS,F9_01_PC_TOT_LIABILITIES_EOY,F9_01_PC_TOT_REVENUE_PRIOR,F9_01_PC_TOT_UBI_GROSS,F9_01_PC_TOT_UBI_NET,F9_01_PC_VOTING_MEMB_GOV_BODY,F9_01_PZ_BEN_PAID_TO_MEMB_CURR,F9_01_PZ_GRANTS_PAID_CURR,F9_01_PZ_INVEST_INCOME_CURR,F9_01_PZ_NAFB_EOY,F9_01_PZ_ORGANIZATIONAL_MISSION,F9_01_PZ_OTHER_EXPENSE_CURR,F9_01_PZ_OTHER_REV_CURR,F9_01_PZ_PROG_SERVICE_REV_CURR,F9_01_PZ_SALARIES_CURR,F9_01_PZ_SALARIES_PRIOR,F9_01_PZ_TOT_ASSETS_BOY,F9_01_PZ_TOT_EXP_CURR,F9_01_PZ_TOT_LIAB_BOY,F9_01_PZ_TOT_REV_CURR,F9_03_PC_PGMSVC_SIGNIF_CHG,F9_03_PC_PGMSVC_SIGNIF_NEW,F9_03_PC_PROG_SVC_ACC_1_CODE,F9_03_PC_PROG_SVC_ACC
_1_DESC,F9_03_PC_PROG_SVC_ACC_1_EXP,F9_03_PC_PROG_SVC_ACC_1_GRNT,F9_03_PC_PROG_SVC_ACC_1_REV,F9_03_PC_PROG_SVC_ACC_2_CODE,F9_03_PC_PROG_SVC_ACC_2_DESC,F9_03_PC_PROG_SVC_ACC_2_EXP,F9_03_PC_PROG_SVC_ACC_2_GRNT,F9_03_PC_PROG_SVC_ACC_2_REV,F9_03_PC_PROG_SVC_ACC_3_CODE,F9_03_PC_PROG_SVC_ACC_3_DESC,F9_03_PC_PROG_SVC_ACC_3_EXP,F9_03_PC_PROG_SVC_ACC_3_GRNT,F9_03_PC_PROG_SVC_ACC_3_REV,F9_03_PC_TOT_OTH_PROG_SVC_EXP,F9_03_PC_TOT_OTH_PROG_SVC_GRNT,F9_03_PC_TOT_OTH_PROG_SVC_REV,F9_03_PC_TOT_PROG_SVC_EXPENSE,F9_03_PZ_MISSION_DESCRIPTION,F9_03_PZ_SCHEDULE_O_PART3,F9_04_PC_ACTVITIES_VIA_PARTNER,F9_04_PC_CONTROLLED_ENTITY,F9_04_PC_DISREGARDED_ENTITY,F9_04_PC_EXCESS_BENEFIT_TRANS,F9_04_PC_FR_EVENT_INC_GT_15K,F9_04_PC_GAMING_INC_GT_15K,F9_04_PC_LOBBYING_ACTIVITIES,F9_04_PC_POLITICAL_ACTIVITIES,F9_04_PC_PRIOR_EXCESS_BEN_TRAN,F9_04_PC_PROF_FR_EXP_GT_15K,F9_04_PC_RELATED_ENTITY,F9_04_PC_TRANS_TO_CNTRLD_ENT,F9_04_PC_TRANS_WITH_CNTRLD_ENT,F9_05_EXP_SCHED_O_X,F9_05_PC_NUMBER_EMPLOYEES_W3,F9_05_PC_NUMBER_FORMS_1096,F9_05_PC_UNRELATED_BUS_INCOME,F9_06_EXP_SCHED_O_X,F9_06_PC_990_PROVIDED_GOV_BODY,F9_06_PC_ANNUAL_DISC_COVRD_PERS,F9_06_PC_CEO_COMPENSTN_PROCESS,F9_06_PC_CHANGES_ORGANIZING_DOCS,F9_06_PC_CONFLICT_OF_INTEREST,F9_06_PC_DECISIONS_SUBJ_APPROVAL,F9_06_PC_DELEGATION_MGT_DUTIES,F9_06_PC_DELEGATION_OF_MGT,F9_06_PC_DOCUMENT_RET_POLICY,F9_06_PC_ELECTION_BOARD_MEMBERS,F9_06_PC_FAMILY_OR_BUSINESS_REL,F9_06_PC_FORM_AVAIL_OWN_WEBSITE,F9_06_PC_FORM_UPON_REQUEST,F9_06_PC_JOINT_VENTURE_INVESTMNT,F9_06_PC_JOINT_VENTURE_POLICY,F9_06_PC_LOCAL_CHAPTERS,F9_06_PC_MATERIAL_DIVERSION,F9_06_PC_MEMBERS_OR_STOCKHOLDERS,F9_06_PC_MINUTES_COMMITTEES,F9_06_PC_MINUTES_GOVERNING_BODY,F9_06_PC_MONITORING_OF_COI_POLICY,F9_06_PC_NUM_IND_VOTING_MEMBERS,F9_06_PC_NUM_VOTING_GOV_MEMBERS,F9_06_PC_OFFICER_MAILING_ADDRESS,F9_06_PC_OTHER_COMPENSTN_PROCESS,F9_06_PC_OTHER_WEBSITE,F9_06_PC_OWN_WEBSITE,F9_06_PC_POLICIES_GOVERN_CHAPTER,F9_06_PC_STATES_WHERE_RET_FILED,F9_06_PC_WHISTLEBLOWER_POLICY,F9_07_EXP_SCHED_O_X,F9_07_PC_COMPE
NSATION_OTHER_SRCE,F9_07_PC_FORMER_OFFICER_LISTED,F9_07_PC_NO_LISTED_PERS_COMPENSD,F9_07_PC_NUM_CONTRCTRS_GRTR_100K,F9_07_PC_NUM_INDS_GREATER_100K,F9_07_PC_TOTAL_COMP_GRTR_150K,F9_07_PC_TOT_OTHER_COMPENSATION,F9_07_PC_TOT_REPRT_COMP_FROM_ORG,F9_07_PC_TOT_REPRT_COMP_RLTD_ORG,F9_08_EXP_SCHED_O_X,F9_08_PC_ALL_OTHER_CONTRIBUTIONS,F9_08_PC_CONTS_REPRTD_FNDRAISNG,F9_08_PC_COST_OF_GOODS_SOLD,F9_08_PC_FEDERATED_CAMPAIGNS,F9_08_PC_FUNDRAISING_DIRECT_EXP,F9_08_PC_FUNDRAISING_EVENTS,F9_08_PC_FUNDRAISING_GROSS_INC,F9_08_PC_GAMING_DIRECT_EXPENSES,F9_08_PC_GAMING_GROSS_INCOME,F9_08_PC_GOVERNMENT_GRANTS,F9_08_PC_GROSS_SALES_INVENTORY,F9_08_PC_MEMBERSHIP_DUES,F9_08_PC_NONCASH_CONTRIBUTIONS,F9_08_PC_PROGRAM_SVCE_REV_TOTAL,F9_08_PC_RELATED_ORGANIZATIONS,F9_08_PC_TOTAL_CONTRIBUTIONS,F9_08_PC_TOTAL_OTHER_REVENUE,F9_08_PC_TOTAL_PROG_SVCE_REVENUE,F9_08_PC_TOTAL_REVENUE,F9_09_EXP_AD_PROMO_TOT,F9_09_EXP_BENF_PAID_MEMB_TOT,F9_09_EXP_CONF_MEETING_TOT,F9_09_EXP_DEPREC_FUNDR,F9_09_EXP_DEPREC_MAG,F9_09_EXP_DEPREC_PROG,F9_09_EXP_DEPREC_TOT,F9_09_EXP_GRANT_FRGN_TOT,F9_09_EXP_GRANT_INDIV_DMSTC_TOT,F9_09_EXP_GRANT_ORG_DMSTC_TOT,F9_09_EXP_INFO_TECH_TOT,F9_09_EXP_INSURANCE_TOT,F9_09_EXP_INTEREST_TOT,F9_09_EXP_JOINT_COSTS_TOT,F9_09_EXP_OCCUPANCY_TOT,F9_09_EXP_OFFICE_TOT,F9_09_EXP_OTH_OTH_TOT,F9_09_EXP_ROY_TOT,F9_09_EXP_SCHED_O_X,F9_09_EXP_TRAVEL_ENTRTNMNT_TOT,F9_09_EXP_TRAVEL_TOT,F9_09_PC_COMP_DISQUAL_FUNDRAISE,F9_09_PC_COMP_DISQUAL_MGMT,F9_09_PC_COMP_DISQUAL_PROG_SVCE,F9_09_PC_COMP_DISQUAL_TOTAL,F9_09_PC_COMP_OFFICERS_FUNDRAISE,F9_09_PC_COMP_OFFICERS_MGMT,F9_09_PC_COMP_OFFICERS_PROG_SVCE,F9_09_PC_COMP_OFFICERS_TOTAL,F9_09_PC_FEES_FOR_SVCE_ACCT_TOT,F9_09_PC_FEES_FOR_SVCE_INVST_TOT,F9_09_PC_FEES_FOR_SVCE_LEGL_TOT,F9_09_PC_FEES_FOR_SVCE_LOBB_TOT,F9_09_PC_FEES_FOR_SVCE_MGMT_TOT,F9_09_PC_FEES_FOR_SVCE_OTH_TOT,F9_09_PC_OTHER_EMP_BEN_FUNDRAISE,F9_09_PC_OTHER_EMP_BEN_MGMT,F9_09_PC_OTHER_EMP_BEN_PROG_SVCE,F9_09_PC_OTHER_EMP_BEN_TOTAL,F9_09_PC_OTHER_SALARY_FUNDRAISE,F9_09_PC_OTHER_SALARY_MGMT,F9_09_PC_OTHER_SA
LARY_PROG_SVCE,F9_09_PC_OTHER_SALARY_TOTAL,F9_09_PC_PAYMENT_TO_AFFILIATES,F9_09_PC_PAYROLL_TAX_FUNDRAISE,F9_09_PC_PAYROLL_TAX_MGMT,F9_09_PC_PAYROLL_TAX_PROG_SVCE,F9_09_PC_PAYROLL_TAX_TOTAL,F9_09_PC_PENSION_CONT_FUNDRAISE,F9_09_PC_PENSION_CONT_MGMT,F9_09_PC_PENSION_CONT_PROG_SVCE,F9_09_PC_PENSION_CONT_TOTAL,F9_09_PC_TOTAL_FUNC_EXPENSES,F9_09_PC_TOTAL_FUNDRAISE_EXPENSE,F9_09_PC_TOTAL_MGMT_EXPENSE,F9_09_PC_TOTAL_PROG_SVCE_EXPENSE,F9_10_ASSETS_ACC_NET_EOY,F9_10_ASSETS_EXP_PREPAID_EOY,F9_10_ASSETS_INTANGIB_EOY,F9_10_ASSETS_INVENT_SALE_EOY,F9_10_ASSETS_LESS_DEPREC_EOY,F9_10_ASSETS_LOANS_DISQUAL_EOY,F9_10_ASSETS_NOTES_LOANS_NET_EOY,F9_10_ASSETS_OTH_EOY,F9_10_ASSETS_PLEDGES_NET_EOY,F9_10_LIAB_ACC_PAYABLE_EOY,F9_10_LIAB_GRANTS_PAYABLE_EOY,F9_10_LIAB_LOANS_OFF_EOY,F9_10_LIAB_REV_DEFERRED_EOY,F9_10_NAFB_RESTRICT_PERM_EOY,F9_10_NAFB_RESTRICT_TEMP_EOY,F9_10_NAFB_UNRESTRICT_EOY,F9_10_PC_BOND_LIABILITY_EOY,F9_10_PC_CASH_NON_INTEREST_BOY,F9_10_PC_CASH_NON_INTEREST_EOY,F9_10_PC_ESCROW_LIABILITY_EOY,F9_10_PC_INVEST_OTHER_SEC_EOY,F9_10_PC_INVEST_PROG_RELTD_EOY,F9_10_PC_INVEST_PUB_TRADED_EOY,F9_10_PC_LAND_BLDG_EQPMT,F9_10_PC_LAND_BLDG_EQPMT_DEPRCTN,F9_10_PC_LOANS_FROM_OFFICERS_EOY,F9_10_PC_ORG_FOLLOWS_SFAS117,F9_10_PC_ORG_NOT_FOLLOW_SFAS117,F9_10_PC_OTHER_LIABILITIES_EOY,F9_10_PC_RET_EARNINGS_ENDWMT_EOY,F9_10_PC_SAVINGS_TEMP_INVEST_BOY,F9_10_PC_SAVINGS_TEMP_INVEST_EOY,F9_10_PC_SECURED_MORTGAGES_EOY,F9_10_PC_SECURE_MORT_NOTES_EOY,F9_10_PC_UNSECURED_LOANS_EOY,F9_10_PC_UNSECURED_NOTES_BOY,F9_10_PC_UNSECURED_NOTES_EOY,F9_10_PZ_TOTAL_ASSETS_EOY,F9_10_SCHED_O_X,F9_11_PC_RECNCLTN_DONATED_SVCES,F9_11_PC_RECNCLTN_INVSTMNT_EXP,F9_11_PC_RECNCLTN_PRIOR_PER_ADJ,F9_11_PC_RECNCLTN_REV_LESS_EXP,F9_11_PC_RECNCLTN_UNRLZD_GAIN,F9_11_SCHED_O_X,F9_12_PC_ACCNT_COMPILE_OR_REVIEW,F9_12_PC_ACCTG_METHOD_ACCRUAL,F9_12_PC_ACCTG_METHOD_CASH,F9_12_PC_ACCTG_METHOD_OTHER,F9_12_PC_AUDIT_COMMITTEE,F9_12_PC_FED_GRNT_AUDIT_PERFORMD,F9_12_PC_FED_GRNT_AUDIT_REQUIRED,F9_12_PC_FINCL_STMTS_AUDITED,F9_12_SCHED_O_X,number_of_ot
her_prog_svces,501c3,F9_00_HD_FILER_ADDR_US_L1,F9_00_HD_FILER_ADDR_US_L2,F9_00_HD_FILER_CITY_US,F9_00_HD_FILER_ZIP_US,F9_00_HD_FILER_COUNTRY_FRGN,F9_00_HD_FILER_STATE_US,F9_00_HD_TIME_STAMP_yr,ein_int,BMF_EIN,BMF_SEC_NAME,BMF_FRCD,BMF_SUBSECCD,BMF_TAXPER,BMF_ASSETS,BMF_INCOME,BMF_NAME,BMF_ADDRESS,BMF_CITY,BMF_STATE,BMF_NTEEFINAL,BMF_NAICS,BMF_ZIP5,BMF_RULEDATE,BMF_FIPS,BMF_FNDNCD,BMF_PMSA,BMF_MSA_NECH,BMF_CASSETS,BMF_CFINSRC,BMF_CTAXPER,BMF_CTOTREV,BMF_ACCPER,BMF_RANDNUM,BMF_NTEECC,BMF_NTEE1,BMF_LEVEL4,BMF_LEVEL1,BMF_NTMAJ10,BMF_MAJGRPB,BMF_LEVEL3,BMF_LEVEL2,BMF_NTMAJ12,BMF_NTMAJ5,BMF_FILER,BMF_ZFILER,BMF_OUTREAS,BMF_OUTNCCS,BMF_EIN9,BMF_NTEECONF,number of org-year duplicates,filing_year_had_duplicate
0,https://s3.amazonaws.com/irs-form-990/202323189349305317_public.xml,0.0,2023-04-26 12:10:37+00:00,2022,10017496,{'BusinessNameLine1Txt': 'AGAMENTICUS YACHT CLUB INC'},AGAM,2073638510,"{'AddressLine1Txt': 'PO BOX 534', 'CityNm': 'YORK HARBOR', 'StateAbbreviationCd': 'ME', 'ZIPCd': '03911'}",,,,0.0,0.0,,0.0,,1.0,0.0,,376800,0,0.0,0.0,DANIEL FORD,2023-11-13,,ME,2022-01-01,2022-12-31,2022,2023-11-14 10:30:26-06:00,0.0,1.0,0.0,,0.0,WWW.AYCSAIL.ORG,1937.0,0.0,279970,0.0,0.0,13,0.0,273331.0,0.0,0.0,0,0.0,0.0,184620,0.0,0.0,413907,0.0,2744,16,20.0,8282,0.0,0,0.0,13,0,0,34628,405625,"THE ORGANIZATION'S PRIMARY EXEMPT PURPOSE IS TO TEACH SAILING TO CHILDREN BY FOCUSING ON SAFETY, ENJOYMENT AND KNOWLEDGE OF SAILING.",132172,3377,54843,56026,0.0,273331.0,188198,0.0,372818,0.0,0.0,,"PROVIDES SAILING INSTRUCTION, SEAMAN-SHIP AND WATER SAFETY SKILLS TO CHILDREN.",167950.0,0.0,54843.0,,,0.0,0.0,0.0,,,0.0,0.0,0.0,0.0,0.0,0.0,167950.0,"THE ORGANIZATION'S PRIMARY EXEMPT PURPOSE IS TO TEACH YOUNGSTERS THE BASICS OF SAILING, SEAMAN-SHIP AND SAFE CONDUCT ON THE WATER. 
IT IS THE ORGANIZATION'S MISSION TO CREATE AND SUSTAIN A COMMUNITY OF FAMILIES WHO ENJOY BEING ON THE WATER.",0.0,0,0,0,0.0,0,0,0.0,0,0.0,0,0,0.0,0.0,0.0,16,2,0,1.0,1,0.0,0.0,0,0,0,0,0,0,0,1,0.0,1.0,0,0.0,0,0,0,1.0,1,0.0,13,13,0,0.0,0.0,0.0,0.0,,0,0.0,0,0,1.0,0.0,0.0,0,0.0,0.0,0.0,0.0,236965.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3377.0,43005.0,0.0,54843.0,0.0,279970.0,0.0,54843.0,372818,0.0,0.0,0.0,0.0,0.0,16519.0,16519.0,0.0,0.0,0.0,84.0,21088.0,0.0,0.0,2776.0,1000.0,28263,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6748.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,52045.0,52045.0,0.0,0.0,0.0,3981.0,3981.0,0.0,0.0,0.0,0.0,188198,2744.0,17504.0,167950.0,0.0,0.0,30000.0,0.0,174902.0,0.0,0.0,0.0,0.0,682.0,0.0,0.0,7600.0,0.0,0.0,0.0,0.0,29306.0,125682.0,0.0,0.0,0.0,83323.0,453817.0,278915.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,413907,0.0,0.0,0.0,0.0,184620,-52326.0,0.0,0,1.0,0.0,,0.0,0.0,0.0,0,0.0,,1,PO BOX 534,0.0,YORK HARBOR,3911,,ME,2023,10017496,10017496.0,,10.0,3.0,202112.0,273331.0,142793.0,AGAMENTICUS YACHT CLUB OF YORK,PO BOX 534,YORK HARBOR,ME,N50,713990.0,3911.0,199303.0,23031.0,15.0,,,223611.0,19eoextractez.xlsx,201812.0,127023.0,12.0,0.32,N50,N,N,PC,HU,N,HS,O,HU,HU,Y,N,,IN,10017496,,,0


In [9]:
%%time
import datetime
print ("Current date and time : ", datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"), '\n')
df = pd.read_feather('D:/990_and_bmf_april_2025_all_controls_351875_orgs_2598477_filings_no_duplicates_fixed_state.feather')
print('# of columns:', len(df.columns))
print('# of observations:', len(df))
df[:1]

Current date and time :  2025-06-17 18:47:08 

# of columns: 358
# of observations: 2598477
CPU times: total: 4min 49s
Wall time: 58.5 s


Unnamed: 0,EIN,F9_00_HD_TAX_YEAR,_id,OrganizationName,URL,DLN,TaxPeriod,F9_09_PC_FEES_FOR_SVCE_FR_TOT,F9_00_HD_BUILD_TIME_STAMP,fiscal_year,Name,NameControl,Phone,USAddress,ForeignAddress,InCareOfName,BusinessName,BusinessNameControlTxt,PhoneNum,InCareOfNm,ForeignPhoneNum,F9_00_HD_ADDR_CHANGE,F9_00_HD_AMENDED_RETURN,F9_00_HD_CTRY_OF_DOMICILE,F9_00_HD_EXEMPT_STATUS_4847A1,F9_00_HD_EXEMPT_STATUS_501C,F9_00_HD_EXEMPT_STATUS_501C3,F9_00_HD_FINAL_RETURN,F9_00_HD_GROSS_EXEMPT_NUM,F9_00_HD_GROSS_RCPT,F9_00_HD_GROUP_RETURN,F9_00_HD_INCLUDES_SUBORD_ORGS,F9_00_HD_INITIAL_RETURN,F9_00_HD_PRIN_OFF_NAME,F9_00_HD_SIGNING_OFFICER_SIGNTR,F9_00_HD_SPECIAL_CONDITION_DESC,F9_00_HD_STATE_OF_DOMICILE,F9_00_HD_TAX_PER_BEGIN,F9_00_HD_TAX_PER_END,F9_00_HD_TIME_STAMP,F9_00_HD_TYPE_ORG_ASSOCIATION,F9_00_HD_TYPE_ORG_CORP,F9_00_HD_TYPE_ORG_OTHER,F9_00_HD_TYPE_ORG_OTHER_DESC,F9_00_HD_TYPE_ORG_TRUST,F9_00_HD_WEBSITE,F9_00_HD_YEAR_FORMED,F9_01_PC_BEN_PAID_MEMB_PRIOR,F9_01_PC_CONTR_GRANTS_CURR,F9_01_PC_CONTR_GRANTS_PRIOR,F9_01_PC_GRANTS_PRIOR,F9_01_PC_INDEP_VOTING_MEMB,F9_01_PC_INVEST_INCOME_PRIOR,F9_01_PC_NET_ASSETS_BOY,F9_01_PC_OTHER_EXPENSE_PRIOR,F9_01_PC_OTHER_REV_PRIOR,F9_01_PC_PROF_FUNDRISING_EXP_CURR,F9_01_PC_PROF_FUNDRISING_EXP_PRIOR,F9_01_PC_PROG_SERVICE_REV_PRIOR,F9_01_PC_REV_LESS_EXP_CURR,F9_01_PC_REV_LESS_EXP_PRIOR,F9_01_PC_TERMINATION_CONTRACTION,F9_01_PC_TOT_ASSETS_EOY,F9_01_PC_TOT_EXP_PRIOR,F9_01_PC_TOT_FNDR_EXP_CURR,F9_01_PC_TOT_INDIV_EMPLOYED,F9_01_PC_TOT_INDIV_VOLUNTEERS,F9_01_PC_TOT_LIABILITIES_EOY,F9_01_PC_TOT_REVENUE_PRIOR,F9_01_PC_TOT_UBI_GROSS,F9_01_PC_TOT_UBI_NET,F9_01_PC_VOTING_MEMB_GOV_BODY,F9_01_PZ_BEN_PAID_TO_MEMB_CURR,F9_01_PZ_GRANTS_PAID_CURR,F9_01_PZ_INVEST_INCOME_CURR,F9_01_PZ_NAFB_EOY,F9_01_PZ_ORGANIZATIONAL_MISSION,F9_01_PZ_OTHER_EXPENSE_CURR,F9_01_PZ_OTHER_REV_CURR,F9_01_PZ_PROG_SERVICE_REV_CURR,F9_01_PZ_SALARIES_CURR,F9_01_PZ_SALARIES_PRIOR,F9_01_PZ_TOT_ASSETS_BOY,F9_01_PZ_TOT_EXP_CURR,F9_01_PZ_TOT_LIAB_BOY,F9_01_PZ_TOT_REV_CURR,F9_03_PC_PGMSVC_SIGNIF_CHG,F9_03_
PC_PGMSVC_SIGNIF_NEW,F9_03_PC_PROG_SVC_ACC_1_CODE,F9_03_PC_PROG_SVC_ACC_1_DESC,F9_03_PC_PROG_SVC_ACC_1_EXP,F9_03_PC_PROG_SVC_ACC_1_GRNT,F9_03_PC_PROG_SVC_ACC_1_REV,F9_03_PC_PROG_SVC_ACC_2_CODE,F9_03_PC_PROG_SVC_ACC_2_DESC,F9_03_PC_PROG_SVC_ACC_2_EXP,F9_03_PC_PROG_SVC_ACC_2_GRNT,F9_03_PC_PROG_SVC_ACC_2_REV,F9_03_PC_PROG_SVC_ACC_3_CODE,F9_03_PC_PROG_SVC_ACC_3_DESC,F9_03_PC_PROG_SVC_ACC_3_EXP,F9_03_PC_PROG_SVC_ACC_3_GRNT,F9_03_PC_PROG_SVC_ACC_3_REV,F9_03_PC_TOT_OTH_PROG_SVC_EXP,F9_03_PC_TOT_OTH_PROG_SVC_GRNT,F9_03_PC_TOT_OTH_PROG_SVC_REV,F9_03_PC_TOT_PROG_SVC_EXPENSE,F9_03_PZ_MISSION_DESCRIPTION,F9_03_PZ_SCHEDULE_O_PART3,F9_04_PC_ACTVITIES_VIA_PARTNER,F9_04_PC_CONTROLLED_ENTITY,F9_04_PC_DISREGARDED_ENTITY,F9_04_PC_EXCESS_BENEFIT_TRANS,F9_04_PC_FR_EVENT_INC_GT_15K,F9_04_PC_GAMING_INC_GT_15K,F9_04_PC_LOBBYING_ACTIVITIES,F9_04_PC_POLITICAL_ACTIVITIES,F9_04_PC_PRIOR_EXCESS_BEN_TRAN,F9_04_PC_PROF_FR_EXP_GT_15K,F9_04_PC_RELATED_ENTITY,F9_04_PC_TRANS_TO_CNTRLD_ENT,F9_04_PC_TRANS_WITH_CNTRLD_ENT,F9_05_EXP_SCHED_O_X,F9_05_PC_NUMBER_EMPLOYEES_W3,F9_05_PC_NUMBER_FORMS_1096,F9_05_PC_UNRELATED_BUS_INCOME,F9_06_EXP_SCHED_O_X,F9_06_PC_990_PROVIDED_GOV_BODY,F9_06_PC_ANNUAL_DISC_COVRD_PERS,F9_06_PC_CEO_COMPENSTN_PROCESS,F9_06_PC_CHANGES_ORGANIZING_DOCS,F9_06_PC_CONFLICT_OF_INTEREST,F9_06_PC_DECISIONS_SUBJ_APPROVAL,F9_06_PC_DELEGATION_MGT_DUTIES,F9_06_PC_DELEGATION_OF_MGT,F9_06_PC_DOCUMENT_RET_POLICY,F9_06_PC_ELECTION_BOARD_MEMBERS,F9_06_PC_FAMILY_OR_BUSINESS_REL,F9_06_PC_FORM_AVAIL_OWN_WEBSITE,F9_06_PC_FORM_UPON_REQUEST,F9_06_PC_JOINT_VENTURE_INVESTMNT,F9_06_PC_JOINT_VENTURE_POLICY,F9_06_PC_LOCAL_CHAPTERS,F9_06_PC_MATERIAL_DIVERSION,F9_06_PC_MEMBERS_OR_STOCKHOLDERS,F9_06_PC_MINUTES_COMMITTEES,F9_06_PC_MINUTES_GOVERNING_BODY,F9_06_PC_MONITORING_OF_COI_POLICY,F9_06_PC_NUM_IND_VOTING_MEMBERS,F9_06_PC_NUM_VOTING_GOV_MEMBERS,F9_06_PC_OFFICER_MAILING_ADDRESS,F9_06_PC_OTHER_COMPENSTN_PROCESS,F9_06_PC_OTHER_WEBSITE,F9_06_PC_OWN_WEBSITE,F9_06_PC_POLICIES_GOVERN_CHAPTER,F9_06_PC_STATES_WHERE_RET
_FILED,F9_06_PC_WHISTLEBLOWER_POLICY,F9_07_EXP_SCHED_O_X,F9_07_PC_COMPENSATION_OTHER_SRCE,F9_07_PC_FORMER_OFFICER_LISTED,F9_07_PC_NO_LISTED_PERS_COMPENSD,F9_07_PC_NUM_CONTRCTRS_GRTR_100K,F9_07_PC_NUM_INDS_GREATER_100K,F9_07_PC_TOTAL_COMP_GRTR_150K,F9_07_PC_TOT_OTHER_COMPENSATION,F9_07_PC_TOT_REPRT_COMP_FROM_ORG,F9_07_PC_TOT_REPRT_COMP_RLTD_ORG,F9_08_EXP_SCHED_O_X,F9_08_PC_ALL_OTHER_CONTRIBUTIONS,F9_08_PC_CONTS_REPRTD_FNDRAISNG,F9_08_PC_COST_OF_GOODS_SOLD,F9_08_PC_FEDERATED_CAMPAIGNS,F9_08_PC_FUNDRAISING_DIRECT_EXP,F9_08_PC_FUNDRAISING_EVENTS,F9_08_PC_FUNDRAISING_GROSS_INC,F9_08_PC_GAMING_DIRECT_EXPENSES,F9_08_PC_GAMING_GROSS_INCOME,F9_08_PC_GOVERNMENT_GRANTS,F9_08_PC_GROSS_SALES_INVENTORY,F9_08_PC_MEMBERSHIP_DUES,F9_08_PC_NONCASH_CONTRIBUTIONS,F9_08_PC_PROGRAM_SVCE_REV_TOTAL,F9_08_PC_RELATED_ORGANIZATIONS,F9_08_PC_TOTAL_CONTRIBUTIONS,F9_08_PC_TOTAL_OTHER_REVENUE,F9_08_PC_TOTAL_PROG_SVCE_REVENUE,F9_08_PC_TOTAL_REVENUE,F9_09_EXP_AD_PROMO_TOT,F9_09_EXP_BENF_PAID_MEMB_TOT,F9_09_EXP_CONF_MEETING_TOT,F9_09_EXP_DEPREC_FUNDR,F9_09_EXP_DEPREC_MAG,F9_09_EXP_DEPREC_PROG,F9_09_EXP_DEPREC_TOT,F9_09_EXP_GRANT_FRGN_TOT,F9_09_EXP_GRANT_INDIV_DMSTC_TOT,F9_09_EXP_GRANT_ORG_DMSTC_TOT,F9_09_EXP_INFO_TECH_TOT,F9_09_EXP_INSURANCE_TOT,F9_09_EXP_INTEREST_TOT,F9_09_EXP_JOINT_COSTS_TOT,F9_09_EXP_OCCUPANCY_TOT,F9_09_EXP_OFFICE_TOT,F9_09_EXP_OTH_OTH_TOT,F9_09_EXP_ROY_TOT,F9_09_EXP_SCHED_O_X,F9_09_EXP_TRAVEL_ENTRTNMNT_TOT,F9_09_EXP_TRAVEL_TOT,F9_09_PC_COMP_DISQUAL_FUNDRAISE,F9_09_PC_COMP_DISQUAL_MGMT,F9_09_PC_COMP_DISQUAL_PROG_SVCE,F9_09_PC_COMP_DISQUAL_TOTAL,F9_09_PC_COMP_OFFICERS_FUNDRAISE,F9_09_PC_COMP_OFFICERS_MGMT,F9_09_PC_COMP_OFFICERS_PROG_SVCE,F9_09_PC_COMP_OFFICERS_TOTAL,F9_09_PC_FEES_FOR_SVCE_ACCT_TOT,F9_09_PC_FEES_FOR_SVCE_INVST_TOT,F9_09_PC_FEES_FOR_SVCE_LEGL_TOT,F9_09_PC_FEES_FOR_SVCE_LOBB_TOT,F9_09_PC_FEES_FOR_SVCE_MGMT_TOT,F9_09_PC_FEES_FOR_SVCE_OTH_TOT,F9_09_PC_OTHER_EMP_BEN_FUNDRAISE,F9_09_PC_OTHER_EMP_BEN_MGMT,F9_09_PC_OTHER_EMP_BEN_PROG_SVCE,F9_09_PC_OTHER_EMP_BEN_TOTAL,F9_09
_PC_OTHER_SALARY_FUNDRAISE,F9_09_PC_OTHER_SALARY_MGMT,F9_09_PC_OTHER_SALARY_PROG_SVCE,F9_09_PC_OTHER_SALARY_TOTAL,F9_09_PC_PAYMENT_TO_AFFILIATES,F9_09_PC_PAYROLL_TAX_FUNDRAISE,F9_09_PC_PAYROLL_TAX_MGMT,F9_09_PC_PAYROLL_TAX_PROG_SVCE,F9_09_PC_PAYROLL_TAX_TOTAL,F9_09_PC_PENSION_CONT_FUNDRAISE,F9_09_PC_PENSION_CONT_MGMT,F9_09_PC_PENSION_CONT_PROG_SVCE,F9_09_PC_PENSION_CONT_TOTAL,F9_09_PC_TOTAL_FUNC_EXPENSES,F9_09_PC_TOTAL_FUNDRAISE_EXPENSE,F9_09_PC_TOTAL_MGMT_EXPENSE,F9_09_PC_TOTAL_PROG_SVCE_EXPENSE,F9_10_ASSETS_ACC_NET_EOY,F9_10_ASSETS_EXP_PREPAID_EOY,F9_10_ASSETS_INTANGIB_EOY,F9_10_ASSETS_INVENT_SALE_EOY,F9_10_ASSETS_LESS_DEPREC_EOY,F9_10_ASSETS_LOANS_DISQUAL_EOY,F9_10_ASSETS_NOTES_LOANS_NET_EOY,F9_10_ASSETS_OTH_EOY,F9_10_ASSETS_PLEDGES_NET_EOY,F9_10_LIAB_ACC_PAYABLE_EOY,F9_10_LIAB_GRANTS_PAYABLE_EOY,F9_10_LIAB_LOANS_OFF_EOY,F9_10_LIAB_REV_DEFERRED_EOY,F9_10_NAFB_RESTRICT_PERM_EOY,F9_10_NAFB_RESTRICT_TEMP_EOY,F9_10_NAFB_UNRESTRICT_EOY,F9_10_PC_BOND_LIABILITY_EOY,F9_10_PC_CASH_NON_INTEREST_BOY,F9_10_PC_CASH_NON_INTEREST_EOY,F9_10_PC_ESCROW_LIABILITY_EOY,F9_10_PC_INVEST_OTHER_SEC_EOY,F9_10_PC_INVEST_PROG_RELTD_EOY,F9_10_PC_INVEST_PUB_TRADED_EOY,F9_10_PC_LAND_BLDG_EQPMT,F9_10_PC_LAND_BLDG_EQPMT_DEPRCTN,F9_10_PC_LOANS_FROM_OFFICERS_EOY,F9_10_PC_ORG_FOLLOWS_SFAS117,F9_10_PC_ORG_NOT_FOLLOW_SFAS117,F9_10_PC_OTHER_LIABILITIES_EOY,F9_10_PC_RET_EARNINGS_ENDWMT_EOY,F9_10_PC_SAVINGS_TEMP_INVEST_BOY,F9_10_PC_SAVINGS_TEMP_INVEST_EOY,F9_10_PC_SECURED_MORTGAGES_EOY,F9_10_PC_SECURE_MORT_NOTES_EOY,F9_10_PC_UNSECURED_LOANS_EOY,F9_10_PC_UNSECURED_NOTES_BOY,F9_10_PC_UNSECURED_NOTES_EOY,F9_10_PZ_TOTAL_ASSETS_EOY,F9_10_SCHED_O_X,F9_11_PC_RECNCLTN_DONATED_SVCES,F9_11_PC_RECNCLTN_INVSTMNT_EXP,F9_11_PC_RECNCLTN_PRIOR_PER_ADJ,F9_11_PC_RECNCLTN_REV_LESS_EXP,F9_11_PC_RECNCLTN_UNRLZD_GAIN,F9_11_SCHED_O_X,F9_12_PC_ACCNT_COMPILE_OR_REVIEW,F9_12_PC_ACCTG_METHOD_ACCRUAL,F9_12_PC_ACCTG_METHOD_CASH,F9_12_PC_ACCTG_METHOD_OTHER,F9_12_PC_AUDIT_COMMITTEE,F9_12_PC_FED_GRNT_AUDIT_PERFORMD,F9_12_PC_FED_GRNT_A
UDIT_REQUIRED,F9_12_PC_FINCL_STMTS_AUDITED,F9_12_SCHED_O_X,number_of_other_prog_svces,501c3,F9_00_HD_FILER_ADDR_US_L1,F9_00_HD_FILER_ADDR_US_L2,F9_00_HD_FILER_CITY_US,F9_00_HD_FILER_ZIP_US,F9_00_HD_FILER_COUNTRY_FRGN,F9_00_HD_FILER_STATE_US,F9_00_HD_TIME_STAMP_yr,ein_int,BMF_EIN2,BMF_EIN,BMF_NTEE_IRS,BMF_NTEE_NCCS,BMF_NTEEV2,BMF_NCCS_LEVEL_1,BMF_NCCS_LEVEL_2,BMF_NCCS_LEVEL_3,BMF_F990_TOTAL_REVENUE_RECENT,BMF_F990_TOTAL_INCOME_RECENT,BMF_F990_TOTAL_ASSETS_RECENT,BMF_F990_ORG_ADDR_CITY,BMF_F990_ORG_ADDR_STATE,BMF_F990_ORG_ADDR_ZIP,BMF_F990_ORG_ADDR_STREET,BMF_CENSUS_CBSA_FIPS,BMF_CENSUS_CBSA_NAME,BMF_CENSUS_BLOCK_FIPS,BMF_CENSUS_URBAN_AREA,BMF_CENSUS_STATE_ABBR,BMF_CENSUS_COUNTY_NAME,BMF_ORG_ADDR_FULL,BMF_ORG_ADDR_MATCH,BMF_LATITUDE,BMF_LONGITUDE,BMF_GEOCODER_SCORE,BMF_GEOCODER_MATCH,BMF_BMF_SUBSECTION_CODE,BMF_BMF_STATUS_CODE,BMF_BMF_PF_FILING_REQ_CODE,BMF_BMF_ORGANIZATION_CODE,BMF_BMF_INCOME_CODE,BMF_BMF_GROUP_EXEMPT_NUM,BMF_BMF_FOUNDATION_CODE,BMF_BMF_FILING_REQ_CODE,BMF_BMF_DEDUCTIBILITY_CODE,BMF_BMF_CLASSIFICATION_CODE,BMF_BMF_ASSET_CODE,BMF_BMF_AFFILIATION_CODE,BMF_ORG_RULING_DATE,BMF_ORG_FISCAL_YEAR,BMF_ORG_RULING_YEAR,BMF_ORG_YEAR_FIRST,BMF_ORG_YEAR_LAST,BMF_ORG_YEAR_COUNT,BMF_ORG_PERS_ICO,BMF_ORG_NAME_SEC,BMF_ORG_NAME_CURRENT,BMF_ORG_FISCAL_PERIOD,filing_year_had_duplicate,COUNTY_CODE
0,10017496,2022,65c1a1d52a9ba8ce45342904,,https://s3.amazonaws.com/irs-form-990/202323189349305317_public.xml,,,0,2023-04-26 12:10:37+00:00,2022,,,,"{'AddressLine1': None, 'AddressLine1Txt': 'PO BOX 534', 'AddressLine2': None, 'AddressLine2Txt': None, 'City': None, 'CityNm': 'YORK HARBOR', 'State': None, 'StateAbbreviationCd': 'ME', 'ZIPCd': '03911', 'ZIPCode': None}",,,{'BusinessNameLine1Txt': 'AGAMENTICUS YACHT CLUB INC'},AGAM,2073638510,,,0,0,,0,,1,0,,376800,0,0,0,DANIEL FORD,2023-11-13,,ME,2022-01-01,2022-12-31,2023-11-14 16:30:26+00:00,0,1,0,,0,WWW.AYCSAIL.ORG,1937,0,279970,0,0,13,0,273331,0,0,0,0,0,184620,0,0,413907,0,2744,16,20,8282,0,0,0,13,0,0,34628,405625,"THE ORGANIZATION'S PRIMARY EXEMPT PURPOSE IS TO TEACH SAILING TO CHILDREN BY FOCUSING ON SAFETY, ENJOYMENT AND KNOWLEDGE OF SAILING.",132172,3377,54843,56026,0,273331,188198,0,372818,0,0,,"PROVIDES SAILING INSTRUCTION, SEAMAN-SHIP AND WATER SAFETY SKILLS TO CHILDREN.",167950,0,54843,,,0,0,0,,,0,0,0,0,0,0,167950,"THE ORGANIZATION'S PRIMARY EXEMPT PURPOSE IS TO TEACH YOUNGSTERS THE BASICS OF SAILING, SEAMAN-SHIP AND SAFE CONDUCT ON THE WATER. 
IT IS THE ORGANIZATION'S MISSION TO CREATE AND SUSTAIN A COMMUNITY OF FAMILIES WHO ENJOY BEING ON THE WATER.",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,16,2,0,1,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,1,1,0,13,13,0,0,0,0,0,,0,0,0,0,1,0,0,0,0,0,0,0,236965,0,0,0,0,0,0,0,0,0,3377,43005,0,54843,0,279970,0,54843,372818,0,0,0,0,0,16519,16519,0,0,0,84,21088,0,0,2776,1000,28263,0,1,0,0,0,0,0,0,0,0,0,0,6748,0,0,0,0,0,0,0,0,0,0,0,52045,52045,0,0,0,3981,3981,0,0,0,0,188198,2744,17504,167950,0,0,30000,0,174902,0,0,0,0,682,0,0,7600,0,0,0,0,29306,125682,0,0,0,83323,453817,278915,0,0,0,0,0,0,0,0,0,0,0,0,413907,0,0,0,0,184620,-52326,0,0,1,0,,0,0,0,0,0,,1,PO BOX 534,,YORK HARBOR,3911,,ME,2023,10017496,EIN-01-0017496,10017496,N50,N50,HMS-N50-RG,501C3 CHARITY,O,HS,372818.0,376800.0,413907.0,YORK HARBOR,ME,03911-0534,PO BOX 534,38860,"Portland-South Portland, ME",230310360032023,U,ME,York County,"PO BOX 534,YORK HARBOR,ME,03911-0534","03911-0534, York Harbor, Maine",43.1,-70.6,98.0,M,3.0,1.0,0.0,1.0,4.0,0.0,15.0,1.0,1.0,2000.0,4.0,3.0,1993-03,2024.0,1993.0,1995.0,2024.0,30.0,,,AGAMENTICUS YACHT CLUB OF YORK,3.0,0,23031


# Fix NTEE codes
Jesse: "I would start by combining NTEE_IRS (the official value in the IRS BMF file) and NTEE_NCCS (the unofficial version that has been improved over time by recoding some of the default IRS values). Use **NTEE_NCCS** unless the value is missing, then use **NTEE_IRS**. From there you can apply the NTEEV2 crosswalk or recreate the LEVEL3 categories (old code with NTEE to LEVEL3 rules is in the PPT)."


In [10]:
ntee_cols = ['BMF_NTEE_NCCS', 'BMF_NTEE_IRS', 'BMF_NTEEV2', 'BMF_NCCS_LEVEL_1',
             'BMF_NCCS_LEVEL_2', 'BMF_NCCS_LEVEL_3']
df[ntee_cols].sample(10)

Unnamed: 0,BMF_NTEE_NCCS,BMF_NTEE_IRS,BMF_NTEEV2,BMF_NCCS_LEVEL_1,BMF_NCCS_LEVEL_2,BMF_NCCS_LEVEL_3
1715278,Q33,Q330,IFA-Q33-RG,501C3 CHARITY,O,IN
1082424,S20,S200,PSB-S20-RG,501C3 CHARITY,O,PB
2177949,A12,A12,ART-A00-MM,501C3 CHARITY,O,AR
414569,L22,L22,HMS-L22-RG,501C3 CHARITY,O,HS
611660,U42,U42,PSB-U42-RG,501C3 CHARITY,O,PB
2320201,M24,M24,HMS-M24-RG,501C3 CHARITY,O,HS
781631,L12,L12,HMS-L00-MM,501C3 CHARITY,O,HS
1762085,S20,L210,PSB-S20-RG,501C3 CHARITY,O,PB
2234569,P20,P20,HMS-P20-RG,501C3 CHARITY,O,HS
173537,L22,L22,HMS-L22-RG,501C3 CHARITY,O,HS


In [11]:
pd.concat([df[ntee_cols].isna().sum(), 
           df[ntee_cols].isna().sum()/len(df)*100], axis=1)

Unnamed: 0,0,1
BMF_NTEE_NCCS,386056,14.9
BMF_NTEE_IRS,10521,0.4
BMF_NTEEV2,480957,18.5
BMF_NCCS_LEVEL_1,4239,0.2
BMF_NCCS_LEVEL_2,4239,0.2
BMF_NCCS_LEVEL_3,4239,0.2
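
The table above has unlabeled columns (`0` is the missing count, `1` the percentage). A minimal sketch of a labeled version of the same summary follows; the helper name `missing_summary` and the toy values are hypothetical and only illustrate the idea.

```python
import numpy as np
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values per column."""
    n_missing = frame.isna().sum()
    pct_missing = (n_missing / len(frame) * 100).round(1)
    return pd.DataFrame({'n_missing': n_missing, 'pct_missing': pct_missing})


# Toy data (hypothetical values, not drawn from the real file)
toy = pd.DataFrame({
    'BMF_NTEE_NCCS': ['A12', np.nan, 'L22', np.nan],
    'BMF_NTEE_IRS': ['A12', 'A69Z', 'L22', np.nan],
}, dtype=object)

summary = missing_summary(toy)
print(summary)
```

On the real data this would be called as `missing_summary(df[ntee_cols])`.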


In [12]:
df[ntee_cols].info(show_counts=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2598477 entries, 0 to 2598476
Data columns (total 6 columns):
 #   Column            Non-Null Count    Dtype 
---  ------            --------------    ----- 
 0   BMF_NTEE_NCCS     2212421 non-null  object
 1   BMF_NTEE_IRS      2587956 non-null  object
 2   BMF_NTEEV2        2117520 non-null  object
 3   BMF_NCCS_LEVEL_1  2594238 non-null  object
 4   BMF_NCCS_LEVEL_2  2594238 non-null  object
 5   BMF_NCCS_LEVEL_3  2594238 non-null  object
dtypes: object(6)
memory usage: 118.9+ MB


#### Rows with `BMF_NTEE_NCCS` but not `BMF_NTEE_IRS`

In [13]:
print(len(df[df['BMF_NTEE_NCCS'].notnull()&df['BMF_NTEE_IRS'].isnull()]))
df[df['BMF_NTEE_NCCS'].notnull()&df['BMF_NTEE_IRS'].isnull()][['BMF_NTEE_NCCS', 
                                                                'BMF_NTEE_IRS']][:5]

0


Unnamed: 0,BMF_NTEE_NCCS,BMF_NTEE_IRS


#### Rows with `BMF_NTEE_IRS` but not `BMF_NTEE_NCCS`

In [14]:
print(len(df[df['BMF_NTEE_IRS'].notnull()&df['BMF_NTEE_NCCS'].isnull()]))
df[df['BMF_NTEE_IRS'].notnull()&df['BMF_NTEE_NCCS'].isnull()][['BMF_NTEE_NCCS', 
                                                                'BMF_NTEE_IRS']][:5]

375535


Unnamed: 0,BMF_NTEE_NCCS,BMF_NTEE_IRS
1,,A69Z
2,,A69Z
3,,A69Z
4,,A69Z
5,,A69Z


In [15]:
pd.crosstab(df['BMF_NTEE_NCCS'].isnull(), df['BMF_NTEE_IRS'].isnull())

BMF_NTEE_IRS,False,True
BMF_NTEE_NCCS,Unnamed: 1_level_1,Unnamed: 2_level_1
False,2212421,0
True,375535,10521


In [16]:
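# Both codes missing
print(len(df[(df['BMF_NTEE_NCCS'].isnull()) & (df['BMF_NTEE_IRS'].isnull())]))
# NCCS missing, IRS present (rows that will be filled from IRS)
print(len(df[(df['BMF_NTEE_NCCS'].isnull()) & (df['BMF_NTEE_IRS'].notnull())]))
# NCCS present, IRS missing
print(len(df[(df['BMF_NTEE_NCCS'].notnull()) & (df['BMF_NTEE_IRS'].isnull())]))
# NCCS present OR IRS missing, i.e. every row except the IRS-only ones
print(len(df[(df['BMF_NTEE_NCCS'].notnull())|(df['BMF_NTEE_IRS'].isnull())]))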

10521
375535
0
2222942


#### Fill in Values

In [17]:
%%time
df['NTEE'] = np.where(df['BMF_NTEE_NCCS'].isnull(), df['BMF_NTEE_IRS'], df['BMF_NTEE_NCCS'])

CPU times: total: 93.8 ms
Wall time: 132 ms
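
As a quick sanity check of the fill rule, the sketch below runs it on a toy frame covering the three cases in the crosstab (both present, IRS only, both missing). It also compares the result with `Series.fillna`, which should express the same rule. The toy values are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy rows: both present, IRS only, both missing (hypothetical values)
toy = pd.DataFrame({
    'BMF_NTEE_NCCS': ['Q33', np.nan, np.nan],
    'BMF_NTEE_IRS': ['Q330', 'A69Z', np.nan],
}, dtype=object)

# Same expression as in the notebook cell
via_where = pd.Series(
    np.where(toy['BMF_NTEE_NCCS'].isnull(), toy['BMF_NTEE_IRS'], toy['BMF_NTEE_NCCS']),
    index=toy.index,
)

# Equivalent formulation: take NCCS, fall back to IRS where NCCS is missing
via_fillna = toy['BMF_NTEE_NCCS'].fillna(toy['BMF_NTEE_IRS'])

# Compare with a placeholder so that missing values compare equal
same = via_where.fillna('missing').tolist() == via_fillna.fillna('missing').tolist()
print(via_fillna.tolist(), same)
```

The first row keeps the NCCS value, the second takes the IRS value, and the third stays missing.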


In [18]:
ntee_cols = ['NTEE'] + ntee_cols
ntee_cols.remove('BMF_NCCS_LEVEL_1')
ntee_cols.remove('BMF_NCCS_LEVEL_2')
ntee_cols

['NTEE', 'BMF_NTEE_NCCS', 'BMF_NTEE_IRS', 'BMF_NTEEV2', 'BMF_NCCS_LEVEL_3']

In [19]:
df[ntee_cols].sample(10)

Unnamed: 0,NTEE,BMF_NTEE_NCCS,BMF_NTEE_IRS,BMF_NTEEV2,BMF_NCCS_LEVEL_3
1035059,P43Z,,P43Z,,UN
90432,E50,E50,E500,HEL-E50-RG,HE
1626735,N40,N40,N40,HMS-N40-RG,HS
1503722,Z99,Z99,Z99,UNU-Z99-RG,ZF
607775,G20,G20,G20,HEL-G20-RG,HE
1524330,S40Z,,S40Z,,UN
549916,A82,A82,A82,ART-A82-RG,AR
1730969,A26,A26,A260,ART-A26-RG,AR
1606385,Y22,Y22,Y22,MMB-Y22-RG,M0
1208173,N63Z,,N63Z,,UN


In [20]:
df[ntee_cols].count()

NTEE                2587956
BMF_NTEE_NCCS       2212421
BMF_NTEE_IRS        2587956
BMF_NTEEV2          2117520
BMF_NCCS_LEVEL_3    2594238
dtype: int64

In [21]:
print(len(df[df['NTEE']==df['BMF_NTEE_NCCS']]))
print(len(df[df['NTEE']==df['BMF_NTEE_IRS']]))

2212421
2145441


In [22]:
print(len(df[df['NTEE']!=df['BMF_NTEE_NCCS']]))
print(len(df[df['NTEE']!=df['BMF_NTEE_IRS']]))

386056
453036
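
Note that `!=` treats a missing value as different from everything, including another missing value. That is why the first count above (386056) equals 375535 + 10521: it includes the 10521 rows where both `NTEE` and `BMF_NTEE_NCCS` are missing. A minimal sketch of a comparison that excludes rows where both sides are missing, on hypothetical toy data:

```python
import numpy as np
import pandas as pd

# Toy rows: values differ, values agree, both missing (hypothetical values)
toy = pd.DataFrame({
    'NTEE': ['Q33', 'A69Z', np.nan],
    'BMF_NTEE_IRS': ['Q330', 'A69Z', np.nan],
}, dtype=object)

a, b = toy['NTEE'], toy['BMF_NTEE_IRS']

both_missing = a.isna() & b.isna()

naive_diff = int((a != b).sum())                    # counts the NaN/NaN row as different
true_diff = int(((a != b) & ~both_missing).sum())   # ignores rows where both are missing

print(naive_diff, true_diff)
```

Applied to the real columns, subtracting the both-missing rows from the second count (453036) would leave the rows where the combined `NTEE` value actually disagrees with `BMF_NTEE_IRS`.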


In [23]:
df[ntee_cols].info(show_counts=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2598477 entries, 0 to 2598476
Data columns (total 5 columns):
 #   Column            Non-Null Count    Dtype 
---  ------            --------------    ----- 
 0   NTEE              2587956 non-null  object
 1   BMF_NTEE_NCCS     2212421 non-null  object
 2   BMF_NTEE_IRS      2587956 non-null  object
 3   BMF_NTEEV2        2117520 non-null  object
 4   BMF_NCCS_LEVEL_3  2594238 non-null  object
dtypes: object(5)
memory usage: 99.1+ MB


In [24]:
df['BMF_NTEEV2'].value_counts()[:10]

BMF_NTEEV2
HMS-P20-RG    52816
HMS-L20-RG    51955
REL-X20-RG    48049
ENV-D20-RG    41040
HMS-L22-RG    37305
EDU-B99-RG    35948
EDU-B00-MS    33856
PSB-S20-RG    32413
EDU-B82-RG    30980
EDU-B20-RG    30537
Name: count, dtype: int64

In [25]:
df['BMF_NCCS_LEVEL_3'].value_counts()[:10]

BMF_NCCS_LEVEL_3
HS    737250
UN    383788
ED    317262
HE    289249
AR    197464
PB    167058
ZF    132585
RE    116340
EN    109463
IN     52145
Name: count, dtype: int64

In [26]:
df['NTEE'].value_counts()[:10]

NTEE
P20    52816
L20    51955
X20    48049
D20    41040
L22    37297
B99    35948
B11    33856
S20    32413
B82    30980
B20    30537
Name: count, dtype: int64

In [27]:
df[ntee_cols].dtypes

NTEE                object
BMF_NTEE_NCCS       object
BMF_NTEE_IRS        object
BMF_NTEEV2          object
BMF_NCCS_LEVEL_3    object
dtype: object

In [28]:
pd.concat([df[ntee_cols].isna().sum(), 
           df[ntee_cols].isna().sum()/len(df)*100], axis=1)

Unnamed: 0,0,1
NTEE,10521,0.4
BMF_NTEE_NCCS,386056,14.9
BMF_NTEE_IRS,10521,0.4
BMF_NTEEV2,480957,18.5
BMF_NCCS_LEVEL_3,4239,0.2


### Read in Crosswalk file
- `UNI` and `HOS` are missing from `level1`

In [29]:
dfc = pd.read_csv('ntee-crosswalk.csv')
print(len(dfc))
dfc[:2]

1086


Unnamed: 0,NTEE,NTEE2,level1,level2,level3,level4,level5,level1.label,level2.label,level3.label,level4.label,level5.label,keywords
0,A01,ART-A00-AA,ART,Axx,A0x,A00,AA,"Arts, Culture & Humanities","Arts, Culture & Humanities","Alliance/Advocacy Organization for a nonprofit in arts, culture & humanities.","Alliance/Advocacy Organization for a nonprofit in arts, culture & humanities.",Alliance/Advocacy Organization,"Arts Alliances, Arts Coalitions, Lobbying, Public Awareness"
1,A0161,ART-A61-AA,ART,Axx,A6x,A61,AA,"Arts, Culture & Humanities","Arts, Culture & Humanities","Alliance/Advocacy Organization for a nonprofit in arts, culture & humanities.",Organizations that operate facilities including theaters for the performing arts.,Alliance/Advocacy Organization,


#### `level1`
- `UNI` and `HOS` are missing from `level1`

In [30]:
dfc['level1'].value_counts().sort_index()

level1
ART     87
EDU     93
ENV     79
HEL    204
HMS    338
IFA     46
MMB     27
PSB    181
REL     30
UNU      1
Name: count, dtype: int64

In [31]:
df['NTEE'].value_counts()[20:25]

NTEE
A65    20767
Q33    20026
E32    17467
A50    16740
A80    16442
Name: count, dtype: int64

In [32]:
dfc[dfc['NTEE']=='A65']    

Unnamed: 0,NTEE,NTEE2,level1,level2,level3,level4,level5,level1.label,level2.label,level3.label,level4.label,level5.label,keywords
74,A65,ART-A65-RG,ART,Axx,A6x,A65,RG,"Arts, Culture & Humanities","Arts, Culture & Humanities",Performing Arts,Organizations whose primary activity is the production of plays. (Organizations that present the productions of others should be classified as presenters. (A61),Regular Nonprofit,"Acting Companies, Amateur Theaters, Broadway Shows, Burlesque, Childrens Theaters, Childrens Performances, Childrens Plays, Childrens Theater, Comedies, Community Theaters, Community Theatrical Groups, Drama, Dramatic Arts, Dramatic Productions, Marionette Shows, Mimes, Musical Plays, Musical Theater, Musicals, Plays, Playwriting, Puppet Shows, Shakespeare Festivals, Shakespearean Festivals, Stage Plays, Storytelling, Summer Stock, Theater Companies, Theater Festivals, Theater Performances, ..."


In [33]:
dfc.dtypes

NTEE            object
NTEE2           object
level1          object
level2          object
level3          object
level4          object
level5          object
level1.label    object
level2.label    object
level3.label    object
level4.label    object
level5.label    object
keywords        object
dtype: object

### Mapping

In [34]:
def map_ntee(ntee_code):
    """Map an NTEE code to its 12-category industry.

    Returns a tuple (2-letter code, 3-letter code), e.g. 'N50' -> ('HU', 'HMS').
    """
    if not isinstance(ntee_code, str) or len(ntee_code) < 1:
        return 'UN', 'UNU'  # fallback for missing or invalid codes

    # Handle Universities (starts with B4 or B5)
    if ntee_code.startswith(('B4', 'B5')):
        return 'BH', 'UNI'

    # Handle Hospitals (starts with E2)
    if ntee_code.startswith('E2'):
        return 'EH', 'HOS'

    prefix = ntee_code[0]

    if prefix == 'A':
        return 'AR', 'ART'
    elif prefix == 'B':
        return 'ED', 'EDU'
    elif prefix in ['C', 'D']:
        return 'EN', 'ENV'
    elif prefix in ['E', 'F', 'G', 'H']:
        return 'HE', 'HEL'
    elif prefix in ['I', 'J', 'K', 'L', 'M', 'N', 'O', 'P']:
        return 'HU', 'HMS'
    elif prefix == 'Q':
        return 'IN', 'IFA'
    elif prefix in ['R', 'S', 'T', 'U', 'V', 'W']:
        return 'PU', 'PSB'
    elif prefix == 'X':
        return 'RE', 'REL'
    elif prefix == 'Y':
        return 'MU', 'MMB'
    else:
        return 'UN', 'UNU'  # 'Z' and any unrecognized prefix

In [35]:
%%time
# Apply the mapping function
df[['NTEE_MAJ12', 'NTEE_MAJ12_EV']] = df['NTEE'].apply(lambda x: pd.Series(map_ntee(x)))

CPU times: total: 5min 20s
Wall time: 5min 30s
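
The row-wise `apply` with `pd.Series` builds a new Series for each of the roughly 2.6 million rows, which likely accounts for most of the five and a half minutes. One possible alternative is to call the mapping function once per distinct code and then look each row up in the resulting dictionary. The sketch below shows the pattern on toy data; `map_ntee_toy` is a simplified, hypothetical stand-in for `map_ntee` that covers only a few prefixes.

```python
import pandas as pd


def map_ntee_toy(code):
    """Simplified stand-in for map_ntee (hypothetical, only a few prefixes)."""
    if not isinstance(code, str) or len(code) < 1:
        return 'UN', 'UNU'
    if code.startswith(('B4', 'B5')):
        return 'BH', 'UNI'
    if code[0] == 'B':
        return 'ED', 'EDU'
    if code[0] == 'A':
        return 'AR', 'ART'
    return 'UN', 'UNU'


# Toy data (hypothetical values); None plays the role of a missing NTEE code
toy = pd.DataFrame({'NTEE': ['A65', 'B42', 'B20', None, 'A65']}, dtype=object)

# Call the function once per distinct non-missing code
lookup = {code: map_ntee_toy(code) for code in toy['NTEE'].dropna().unique()}
fallback = map_ntee_toy(None)  # result used for missing codes

# Look each row up; missing codes are not dictionary keys, so they get the fallback
pairs = [lookup.get(code, fallback) for code in toy['NTEE']]
toy['NTEE_MAJ12'] = [p[0] for p in pairs]
toy['NTEE_MAJ12_EV'] = [p[1] for p in pairs]

print(toy)
```

With the real data, `map_ntee` would replace `map_ntee_toy` and `df` would replace `toy`; the resulting columns should match those produced by the `apply` version.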


In [43]:
ntee_cols = ['NTEE', 'NTEE_MAJ12', 'NTEE_MAJ12_EV', 'BMF_NTEE_NCCS', 'BMF_NTEE_IRS', 'BMF_NTEEV2', 'BMF_NCCS_LEVEL_3']

In [44]:
df[ntee_cols].sample(15)

Unnamed: 0,NTEE,NTEE_MAJ12,NTEE_MAJ12_EV,BMF_NTEE_NCCS,BMF_NTEE_IRS,BMF_NTEEV2,BMF_NCCS_LEVEL_3
505460,X80,RE,REL,X80,X80,REL-X80-RG,RE
2139017,N64,HU,HMS,N64,N64,HMS-N64-RG,HS
2091599,E62,HE,HEL,E62,E62,HEL-E62-RG,HE
1938285,N63,HU,HMS,N63,N63,HMS-N63-RG,HS
2404347,P20,HU,HMS,P20,P20,HMS-P20-RG,HS
638510,S41,PU,PSB,S41,S41,PSB-S41-RG,PB
2378519,R12,PU,PSB,R12,R12,PSB-R00-MM,PB
2387040,X20,RE,REL,X20,X20,REL-X20-RG,RE
181096,N71,HU,HMS,N71,B280,HMS-N71-RG,HS
163278,X20,RE,REL,X20,X20,REL-X20-RG,RE


In [45]:
df['NTEE_MAJ12'].value_counts().sort_index()

NTEE_MAJ12
AR    235466
BH     31952
ED    377863
EH     53664
EN    119145
HE    309364
HU    925266
IN     57104
MU      4605
PU    312376
RE    138372
UN     33300
Name: count, dtype: int64

In [46]:
df['NTEE_MAJ12_EV'].value_counts().sort_index()

NTEE_MAJ12_EV
ART    235466
EDU    377863
ENV    119145
HEL    309364
HMS    925266
HOS     53664
IFA     57104
MMB      4605
PSB    312376
REL    138372
UNI     31952
UNU     33300
Name: count, dtype: int64

### Merge in NTEE crosswalk file to get additional industry details

In [50]:
%%time
# Merge dfc into df based on the common 'NTEE' column
dfm = df.merge(dfc, on='NTEE', how='left', indicator=True)
dfm[:1]

CPU times: total: 28.8 s
Wall time: 29.4 s


Unnamed: 0,EIN,F9_00_HD_TAX_YEAR,_id,OrganizationName,URL,DLN,TaxPeriod,F9_09_PC_FEES_FOR_SVCE_FR_TOT,F9_00_HD_BUILD_TIME_STAMP,fiscal_year,Name,NameControl,Phone,USAddress,ForeignAddress,InCareOfName,BusinessName,BusinessNameControlTxt,PhoneNum,InCareOfNm,ForeignPhoneNum,F9_00_HD_ADDR_CHANGE,F9_00_HD_AMENDED_RETURN,F9_00_HD_CTRY_OF_DOMICILE,F9_00_HD_EXEMPT_STATUS_4847A1,F9_00_HD_EXEMPT_STATUS_501C,F9_00_HD_EXEMPT_STATUS_501C3,F9_00_HD_FINAL_RETURN,F9_00_HD_GROSS_EXEMPT_NUM,F9_00_HD_GROSS_RCPT,F9_00_HD_GROUP_RETURN,F9_00_HD_INCLUDES_SUBORD_ORGS,F9_00_HD_INITIAL_RETURN,F9_00_HD_PRIN_OFF_NAME,F9_00_HD_SIGNING_OFFICER_SIGNTR,F9_00_HD_SPECIAL_CONDITION_DESC,F9_00_HD_STATE_OF_DOMICILE,F9_00_HD_TAX_PER_BEGIN,F9_00_HD_TAX_PER_END,F9_00_HD_TIME_STAMP,F9_00_HD_TYPE_ORG_ASSOCIATION,F9_00_HD_TYPE_ORG_CORP,F9_00_HD_TYPE_ORG_OTHER,F9_00_HD_TYPE_ORG_OTHER_DESC,F9_00_HD_TYPE_ORG_TRUST,F9_00_HD_WEBSITE,F9_00_HD_YEAR_FORMED,F9_01_PC_BEN_PAID_MEMB_PRIOR,F9_01_PC_CONTR_GRANTS_CURR,F9_01_PC_CONTR_GRANTS_PRIOR,F9_01_PC_GRANTS_PRIOR,F9_01_PC_INDEP_VOTING_MEMB,F9_01_PC_INVEST_INCOME_PRIOR,F9_01_PC_NET_ASSETS_BOY,F9_01_PC_OTHER_EXPENSE_PRIOR,F9_01_PC_OTHER_REV_PRIOR,F9_01_PC_PROF_FUNDRISING_EXP_CURR,F9_01_PC_PROF_FUNDRISING_EXP_PRIOR,F9_01_PC_PROG_SERVICE_REV_PRIOR,F9_01_PC_REV_LESS_EXP_CURR,F9_01_PC_REV_LESS_EXP_PRIOR,F9_01_PC_TERMINATION_CONTRACTION,F9_01_PC_TOT_ASSETS_EOY,F9_01_PC_TOT_EXP_PRIOR,F9_01_PC_TOT_FNDR_EXP_CURR,F9_01_PC_TOT_INDIV_EMPLOYED,F9_01_PC_TOT_INDIV_VOLUNTEERS,F9_01_PC_TOT_LIABILITIES_EOY,F9_01_PC_TOT_REVENUE_PRIOR,F9_01_PC_TOT_UBI_GROSS,F9_01_PC_TOT_UBI_NET,F9_01_PC_VOTING_MEMB_GOV_BODY,F9_01_PZ_BEN_PAID_TO_MEMB_CURR,F9_01_PZ_GRANTS_PAID_CURR,F9_01_PZ_INVEST_INCOME_CURR,F9_01_PZ_NAFB_EOY,F9_01_PZ_ORGANIZATIONAL_MISSION,F9_01_PZ_OTHER_EXPENSE_CURR,F9_01_PZ_OTHER_REV_CURR,F9_01_PZ_PROG_SERVICE_REV_CURR,F9_01_PZ_SALARIES_CURR,F9_01_PZ_SALARIES_PRIOR,F9_01_PZ_TOT_ASSETS_BOY,F9_01_PZ_TOT_EXP_CURR,F9_01_PZ_TOT_LIAB_BOY,F9_01_PZ_TOT_REV_CURR,F9_03_PC_PGMSVC_SIGNIF_CHG,F9_03_
PC_PGMSVC_SIGNIF_NEW,F9_03_PC_PROG_SVC_ACC_1_CODE,F9_03_PC_PROG_SVC_ACC_1_DESC,F9_03_PC_PROG_SVC_ACC_1_EXP,F9_03_PC_PROG_SVC_ACC_1_GRNT,F9_03_PC_PROG_SVC_ACC_1_REV,F9_03_PC_PROG_SVC_ACC_2_CODE,F9_03_PC_PROG_SVC_ACC_2_DESC,F9_03_PC_PROG_SVC_ACC_2_EXP,F9_03_PC_PROG_SVC_ACC_2_GRNT,F9_03_PC_PROG_SVC_ACC_2_REV,F9_03_PC_PROG_SVC_ACC_3_CODE,F9_03_PC_PROG_SVC_ACC_3_DESC,F9_03_PC_PROG_SVC_ACC_3_EXP,F9_03_PC_PROG_SVC_ACC_3_GRNT,F9_03_PC_PROG_SVC_ACC_3_REV,F9_03_PC_TOT_OTH_PROG_SVC_EXP,F9_03_PC_TOT_OTH_PROG_SVC_GRNT,F9_03_PC_TOT_OTH_PROG_SVC_REV,F9_03_PC_TOT_PROG_SVC_EXPENSE,F9_03_PZ_MISSION_DESCRIPTION,F9_03_PZ_SCHEDULE_O_PART3,F9_04_PC_ACTVITIES_VIA_PARTNER,F9_04_PC_CONTROLLED_ENTITY,F9_04_PC_DISREGARDED_ENTITY,F9_04_PC_EXCESS_BENEFIT_TRANS,F9_04_PC_FR_EVENT_INC_GT_15K,F9_04_PC_GAMING_INC_GT_15K,F9_04_PC_LOBBYING_ACTIVITIES,F9_04_PC_POLITICAL_ACTIVITIES,F9_04_PC_PRIOR_EXCESS_BEN_TRAN,F9_04_PC_PROF_FR_EXP_GT_15K,F9_04_PC_RELATED_ENTITY,F9_04_PC_TRANS_TO_CNTRLD_ENT,F9_04_PC_TRANS_WITH_CNTRLD_ENT,F9_05_EXP_SCHED_O_X,F9_05_PC_NUMBER_EMPLOYEES_W3,F9_05_PC_NUMBER_FORMS_1096,F9_05_PC_UNRELATED_BUS_INCOME,F9_06_EXP_SCHED_O_X,F9_06_PC_990_PROVIDED_GOV_BODY,F9_06_PC_ANNUAL_DISC_COVRD_PERS,F9_06_PC_CEO_COMPENSTN_PROCESS,F9_06_PC_CHANGES_ORGANIZING_DOCS,F9_06_PC_CONFLICT_OF_INTEREST,F9_06_PC_DECISIONS_SUBJ_APPROVAL,F9_06_PC_DELEGATION_MGT_DUTIES,F9_06_PC_DELEGATION_OF_MGT,F9_06_PC_DOCUMENT_RET_POLICY,F9_06_PC_ELECTION_BOARD_MEMBERS,F9_06_PC_FAMILY_OR_BUSINESS_REL,F9_06_PC_FORM_AVAIL_OWN_WEBSITE,F9_06_PC_FORM_UPON_REQUEST,F9_06_PC_JOINT_VENTURE_INVESTMNT,F9_06_PC_JOINT_VENTURE_POLICY,F9_06_PC_LOCAL_CHAPTERS,F9_06_PC_MATERIAL_DIVERSION,F9_06_PC_MEMBERS_OR_STOCKHOLDERS,F9_06_PC_MINUTES_COMMITTEES,F9_06_PC_MINUTES_GOVERNING_BODY,F9_06_PC_MONITORING_OF_COI_POLICY,F9_06_PC_NUM_IND_VOTING_MEMBERS,F9_06_PC_NUM_VOTING_GOV_MEMBERS,F9_06_PC_OFFICER_MAILING_ADDRESS,F9_06_PC_OTHER_COMPENSTN_PROCESS,F9_06_PC_OTHER_WEBSITE,F9_06_PC_OWN_WEBSITE,F9_06_PC_POLICIES_GOVERN_CHAPTER,F9_06_PC_STATES_WHERE_RET
_FILED,F9_06_PC_WHISTLEBLOWER_POLICY,F9_07_EXP_SCHED_O_X,F9_07_PC_COMPENSATION_OTHER_SRCE,F9_07_PC_FORMER_OFFICER_LISTED,F9_07_PC_NO_LISTED_PERS_COMPENSD,F9_07_PC_NUM_CONTRCTRS_GRTR_100K,F9_07_PC_NUM_INDS_GREATER_100K,F9_07_PC_TOTAL_COMP_GRTR_150K,F9_07_PC_TOT_OTHER_COMPENSATION,F9_07_PC_TOT_REPRT_COMP_FROM_ORG,F9_07_PC_TOT_REPRT_COMP_RLTD_ORG,F9_08_EXP_SCHED_O_X,F9_08_PC_ALL_OTHER_CONTRIBUTIONS,F9_08_PC_CONTS_REPRTD_FNDRAISNG,F9_08_PC_COST_OF_GOODS_SOLD,F9_08_PC_FEDERATED_CAMPAIGNS,F9_08_PC_FUNDRAISING_DIRECT_EXP,F9_08_PC_FUNDRAISING_EVENTS,F9_08_PC_FUNDRAISING_GROSS_INC,F9_08_PC_GAMING_DIRECT_EXPENSES,F9_08_PC_GAMING_GROSS_INCOME,F9_08_PC_GOVERNMENT_GRANTS,F9_08_PC_GROSS_SALES_INVENTORY,F9_08_PC_MEMBERSHIP_DUES,F9_08_PC_NONCASH_CONTRIBUTIONS,F9_08_PC_PROGRAM_SVCE_REV_TOTAL,F9_08_PC_RELATED_ORGANIZATIONS,F9_08_PC_TOTAL_CONTRIBUTIONS,F9_08_PC_TOTAL_OTHER_REVENUE,F9_08_PC_TOTAL_PROG_SVCE_REVENUE,F9_08_PC_TOTAL_REVENUE,F9_09_EXP_AD_PROMO_TOT,F9_09_EXP_BENF_PAID_MEMB_TOT,F9_09_EXP_CONF_MEETING_TOT,F9_09_EXP_DEPREC_FUNDR,F9_09_EXP_DEPREC_MAG,F9_09_EXP_DEPREC_PROG,F9_09_EXP_DEPREC_TOT,F9_09_EXP_GRANT_FRGN_TOT,F9_09_EXP_GRANT_INDIV_DMSTC_TOT,F9_09_EXP_GRANT_ORG_DMSTC_TOT,F9_09_EXP_INFO_TECH_TOT,F9_09_EXP_INSURANCE_TOT,F9_09_EXP_INTEREST_TOT,F9_09_EXP_JOINT_COSTS_TOT,F9_09_EXP_OCCUPANCY_TOT,F9_09_EXP_OFFICE_TOT,F9_09_EXP_OTH_OTH_TOT,F9_09_EXP_ROY_TOT,F9_09_EXP_SCHED_O_X,F9_09_EXP_TRAVEL_ENTRTNMNT_TOT,F9_09_EXP_TRAVEL_TOT,F9_09_PC_COMP_DISQUAL_FUNDRAISE,F9_09_PC_COMP_DISQUAL_MGMT,F9_09_PC_COMP_DISQUAL_PROG_SVCE,F9_09_PC_COMP_DISQUAL_TOTAL,F9_09_PC_COMP_OFFICERS_FUNDRAISE,F9_09_PC_COMP_OFFICERS_MGMT,F9_09_PC_COMP_OFFICERS_PROG_SVCE,F9_09_PC_COMP_OFFICERS_TOTAL,F9_09_PC_FEES_FOR_SVCE_ACCT_TOT,F9_09_PC_FEES_FOR_SVCE_INVST_TOT,F9_09_PC_FEES_FOR_SVCE_LEGL_TOT,F9_09_PC_FEES_FOR_SVCE_LOBB_TOT,F9_09_PC_FEES_FOR_SVCE_MGMT_TOT,F9_09_PC_FEES_FOR_SVCE_OTH_TOT,F9_09_PC_OTHER_EMP_BEN_FUNDRAISE,F9_09_PC_OTHER_EMP_BEN_MGMT,F9_09_PC_OTHER_EMP_BEN_PROG_SVCE,F9_09_PC_OTHER_EMP_BEN_TOTAL,F9_09
_PC_OTHER_SALARY_FUNDRAISE,F9_09_PC_OTHER_SALARY_MGMT,F9_09_PC_OTHER_SALARY_PROG_SVCE,F9_09_PC_OTHER_SALARY_TOTAL,F9_09_PC_PAYMENT_TO_AFFILIATES,F9_09_PC_PAYROLL_TAX_FUNDRAISE,F9_09_PC_PAYROLL_TAX_MGMT,F9_09_PC_PAYROLL_TAX_PROG_SVCE,F9_09_PC_PAYROLL_TAX_TOTAL,F9_09_PC_PENSION_CONT_FUNDRAISE,F9_09_PC_PENSION_CONT_MGMT,F9_09_PC_PENSION_CONT_PROG_SVCE,F9_09_PC_PENSION_CONT_TOTAL,F9_09_PC_TOTAL_FUNC_EXPENSES,F9_09_PC_TOTAL_FUNDRAISE_EXPENSE,F9_09_PC_TOTAL_MGMT_EXPENSE,F9_09_PC_TOTAL_PROG_SVCE_EXPENSE,F9_10_ASSETS_ACC_NET_EOY,F9_10_ASSETS_EXP_PREPAID_EOY,F9_10_ASSETS_INTANGIB_EOY,F9_10_ASSETS_INVENT_SALE_EOY,F9_10_ASSETS_LESS_DEPREC_EOY,F9_10_ASSETS_LOANS_DISQUAL_EOY,F9_10_ASSETS_NOTES_LOANS_NET_EOY,F9_10_ASSETS_OTH_EOY,F9_10_ASSETS_PLEDGES_NET_EOY,F9_10_LIAB_ACC_PAYABLE_EOY,F9_10_LIAB_GRANTS_PAYABLE_EOY,F9_10_LIAB_LOANS_OFF_EOY,F9_10_LIAB_REV_DEFERRED_EOY,F9_10_NAFB_RESTRICT_PERM_EOY,F9_10_NAFB_RESTRICT_TEMP_EOY,F9_10_NAFB_UNRESTRICT_EOY,F9_10_PC_BOND_LIABILITY_EOY,F9_10_PC_CASH_NON_INTEREST_BOY,F9_10_PC_CASH_NON_INTEREST_EOY,F9_10_PC_ESCROW_LIABILITY_EOY,F9_10_PC_INVEST_OTHER_SEC_EOY,F9_10_PC_INVEST_PROG_RELTD_EOY,F9_10_PC_INVEST_PUB_TRADED_EOY,F9_10_PC_LAND_BLDG_EQPMT,F9_10_PC_LAND_BLDG_EQPMT_DEPRCTN,F9_10_PC_LOANS_FROM_OFFICERS_EOY,F9_10_PC_ORG_FOLLOWS_SFAS117,F9_10_PC_ORG_NOT_FOLLOW_SFAS117,F9_10_PC_OTHER_LIABILITIES_EOY,F9_10_PC_RET_EARNINGS_ENDWMT_EOY,F9_10_PC_SAVINGS_TEMP_INVEST_BOY,F9_10_PC_SAVINGS_TEMP_INVEST_EOY,F9_10_PC_SECURED_MORTGAGES_EOY,F9_10_PC_SECURE_MORT_NOTES_EOY,F9_10_PC_UNSECURED_LOANS_EOY,F9_10_PC_UNSECURED_NOTES_BOY,F9_10_PC_UNSECURED_NOTES_EOY,F9_10_PZ_TOTAL_ASSETS_EOY,F9_10_SCHED_O_X,F9_11_PC_RECNCLTN_DONATED_SVCES,F9_11_PC_RECNCLTN_INVSTMNT_EXP,F9_11_PC_RECNCLTN_PRIOR_PER_ADJ,F9_11_PC_RECNCLTN_REV_LESS_EXP,F9_11_PC_RECNCLTN_UNRLZD_GAIN,F9_11_SCHED_O_X,F9_12_PC_ACCNT_COMPILE_OR_REVIEW,F9_12_PC_ACCTG_METHOD_ACCRUAL,F9_12_PC_ACCTG_METHOD_CASH,F9_12_PC_ACCTG_METHOD_OTHER,F9_12_PC_AUDIT_COMMITTEE,F9_12_PC_FED_GRNT_AUDIT_PERFORMD,F9_12_PC_FED_GRNT_A
UDIT_REQUIRED,F9_12_PC_FINCL_STMTS_AUDITED,F9_12_SCHED_O_X,number_of_other_prog_svces,501c3,F9_00_HD_FILER_ADDR_US_L1,F9_00_HD_FILER_ADDR_US_L2,F9_00_HD_FILER_CITY_US,F9_00_HD_FILER_ZIP_US,F9_00_HD_FILER_COUNTRY_FRGN,F9_00_HD_FILER_STATE_US,F9_00_HD_TIME_STAMP_yr,ein_int,BMF_EIN2,BMF_EIN,BMF_NTEE_IRS,BMF_NTEE_NCCS,BMF_NTEEV2,BMF_NCCS_LEVEL_1,BMF_NCCS_LEVEL_2,BMF_NCCS_LEVEL_3,BMF_F990_TOTAL_REVENUE_RECENT,BMF_F990_TOTAL_INCOME_RECENT,BMF_F990_TOTAL_ASSETS_RECENT,BMF_F990_ORG_ADDR_CITY,BMF_F990_ORG_ADDR_STATE,BMF_F990_ORG_ADDR_ZIP,BMF_F990_ORG_ADDR_STREET,BMF_CENSUS_CBSA_FIPS,BMF_CENSUS_CBSA_NAME,BMF_CENSUS_BLOCK_FIPS,BMF_CENSUS_URBAN_AREA,BMF_CENSUS_STATE_ABBR,BMF_CENSUS_COUNTY_NAME,BMF_ORG_ADDR_FULL,BMF_ORG_ADDR_MATCH,BMF_LATITUDE,BMF_LONGITUDE,BMF_GEOCODER_SCORE,BMF_GEOCODER_MATCH,BMF_BMF_SUBSECTION_CODE,BMF_BMF_STATUS_CODE,BMF_BMF_PF_FILING_REQ_CODE,BMF_BMF_ORGANIZATION_CODE,BMF_BMF_INCOME_CODE,BMF_BMF_GROUP_EXEMPT_NUM,BMF_BMF_FOUNDATION_CODE,BMF_BMF_FILING_REQ_CODE,BMF_BMF_DEDUCTIBILITY_CODE,BMF_BMF_CLASSIFICATION_CODE,BMF_BMF_ASSET_CODE,BMF_BMF_AFFILIATION_CODE,BMF_ORG_RULING_DATE,BMF_ORG_FISCAL_YEAR,BMF_ORG_RULING_YEAR,BMF_ORG_YEAR_FIRST,BMF_ORG_YEAR_LAST,BMF_ORG_YEAR_COUNT,BMF_ORG_PERS_ICO,BMF_ORG_NAME_SEC,BMF_ORG_NAME_CURRENT,BMF_ORG_FISCAL_PERIOD,filing_year_had_duplicate,COUNTY_CODE,NTEE,NTEE_MAJ12,NTEE_MAJ12_EV,NTEE2,level1,level2,level3,level4,level5,level1.label,level2.label,level3.label,level4.label,level5.label,keywords,_merge
0,10017496,2022,65c1a1d52a9ba8ce45342904,,https://s3.amazonaws.com/irs-form-990/202323189349305317_public.xml,,,0,2023-04-26 12:10:37+00:00,2022,,,,"{'AddressLine1': None, 'AddressLine1Txt': 'PO BOX 534', 'AddressLine2': None, 'AddressLine2Txt': None, 'City': None, 'CityNm': 'YORK HARBOR', 'State': None, 'StateAbbreviationCd': 'ME', 'ZIPCd': '03911', 'ZIPCode': None}",,,{'BusinessNameLine1Txt': 'AGAMENTICUS YACHT CLUB INC'},AGAM,2073638510,,,0,0,,0,,1,0,,376800,0,0,0,DANIEL FORD,2023-11-13,,ME,2022-01-01,2022-12-31,2023-11-14 16:30:26+00:00,0,1,0,,0,WWW.AYCSAIL.ORG,1937,0,279970,0,0,13,0,273331,0,0,0,0,0,184620,0,0,413907,0,2744,16,20,8282,0,0,0,13,0,0,34628,405625,"THE ORGANIZATION'S PRIMARY EXEMPT PURPOSE IS TO TEACH SAILING TO CHILDREN BY FOCUSING ON SAFETY, ENJOYMENT AND KNOWLEDGE OF SAILING.",132172,3377,54843,56026,0,273331,188198,0,372818,0,0,,"PROVIDES SAILING INSTRUCTION, SEAMAN-SHIP AND WATER SAFETY SKILLS TO CHILDREN.",167950,0,54843,,,0,0,0,,,0,0,0,0,0,0,167950,"THE ORGANIZATION'S PRIMARY EXEMPT PURPOSE IS TO TEACH YOUNGSTERS THE BASICS OF SAILING, SEAMAN-SHIP AND SAFE CONDUCT ON THE WATER. 
IT IS THE ORGANIZATION'S MISSION TO CREATE AND SUSTAIN A COMMUNITY OF FAMILIES WHO ENJOY BEING ON THE WATER.",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,16,2,0,1,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,1,1,0,13,13,0,0,0,0,0,,0,0,0,0,1,0,0,0,0,0,0,0,236965,0,0,0,0,0,0,0,0,0,3377,43005,0,54843,0,279970,0,54843,372818,0,0,0,0,0,16519,16519,0,0,0,84,21088,0,0,2776,1000,28263,0,1,0,0,0,0,0,0,0,0,0,0,6748,0,0,0,0,0,0,0,0,0,0,0,52045,52045,0,0,0,3981,3981,0,0,0,0,188198,2744,17504,167950,0,0,30000,0,174902,0,0,0,0,682,0,0,7600,0,0,0,0,29306,125682,0,0,0,83323,453817,278915,0,0,0,0,0,0,0,0,0,0,0,0,413907,0,0,0,0,184620,-52326,0,0,1,0,,0,0,0,0,0,,1,PO BOX 534,,YORK HARBOR,3911,,ME,2023,10017496,EIN-01-0017496,10017496,N50,N50,HMS-N50-RG,501C3 CHARITY,O,HS,372818.0,376800.0,413907.0,YORK HARBOR,ME,03911-0534,PO BOX 534,38860,"Portland-South Portland, ME",230310360032023,U,ME,York County,"PO BOX 534,YORK HARBOR,ME,03911-0534","03911-0534, York Harbor, Maine",43.1,-70.6,98.0,M,3.0,1.0,0.0,1.0,4.0,0.0,15.0,1.0,1.0,2000.0,4.0,3.0,1993-03,2024.0,1993.0,1995.0,2024.0,30.0,,,AGAMENTICUS YACHT CLUB OF YORK,3.0,0,23031,N50,HU,HMS,HMS-N50-RG,HMS,Nxx,N5x,N50,RG,Human Services,Recreation & Sports,Recreational Clubs,"Organizations that make available to members and their guests and facilities for recreational activities, sports and games. Also included are social clubs that provide opportunities for people to meet and socialize with their peers at dances, parties, picnics, barbecues and other companionable events; and special interest clubs which enable people to share hobbies or other interests with individuals with those same interests. 
\[Many of these are 501(c)(7) organizations.\]",Regular Nonprofit,"Amateur Radio Clubs, Arts and Crafts Clubs, Athletic Clubs, Beach Clubs, Car Clubs, Collecting Clubs, Computer User Groups, Country Clubs, Health Clubs, Hobby Clubs, Hunt Clubs, Investment Clubs, Private Golf Clubs, Private Membership Facilities, Private Swim Clubs, Private Tennis Clubs, Singles Clubs, Social Clubs, Travel Clubs, Yacht Clubs",both


In [51]:
dfm['_merge'].value_counts()

_merge
both          2117486
left_only      480991
right_only          0
Name: count, dtype: int64
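
Only 10521 rows have a missing `NTEE`, so most of the 480991 `left_only` rows must carry a code that has no entry in the crosswalk. The samples earlier suggest four-character codes such as `P43Z` are among them. A minimal sketch of how the unmatched codes could be inspected, using hypothetical toy stand-ins for `df` and `dfc`:

```python
import pandas as pd

# Toy stand-ins (hypothetical rows) for the filings data and the crosswalk
filings = pd.DataFrame({'NTEE': ['A65', 'P43Z', 'N50', None, 'P43Z']}, dtype=object)
crosswalk = pd.DataFrame({'NTEE': ['A65', 'N50'],
                          'level1': ['ART', 'HMS']}, dtype=object)

# validate='m:1' raises a MergeError if the crosswalk has duplicate NTEE keys,
# which would otherwise silently duplicate filing rows
merged = filings.merge(crosswalk, on='NTEE', how='left',
                       indicator=True, validate='m:1')

# Rows that found no crosswalk entry
unmatched = merged.loc[merged['_merge'] == 'left_only', 'NTEE']

n_missing_code = int(unmatched.isna().sum())                  # no NTEE code at all
unmatched_codes = unmatched.dropna().value_counts()           # codes absent from the crosswalk
code_lengths = unmatched.dropna().str.len().value_counts()    # e.g. 3- vs 4-character codes

print(n_missing_code)
print(unmatched_codes)
print(code_lengths)
```

On the real data the same steps would run on `dfm`, and the length breakdown would show how much of the mismatch comes from four-character codes.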

In [52]:
# Drop the two long free-text columns from the crosswalk
dfm = dfm.drop(columns=['level4.label', 'keywords'])
dfm[:1]

Unnamed: 0,EIN,F9_00_HD_TAX_YEAR,_id,OrganizationName,URL,DLN,TaxPeriod,F9_09_PC_FEES_FOR_SVCE_FR_TOT,F9_00_HD_BUILD_TIME_STAMP,fiscal_year,Name,NameControl,Phone,USAddress,ForeignAddress,InCareOfName,BusinessName,BusinessNameControlTxt,PhoneNum,InCareOfNm,ForeignPhoneNum,F9_00_HD_ADDR_CHANGE,F9_00_HD_AMENDED_RETURN,F9_00_HD_CTRY_OF_DOMICILE,F9_00_HD_EXEMPT_STATUS_4847A1,F9_00_HD_EXEMPT_STATUS_501C,F9_00_HD_EXEMPT_STATUS_501C3,F9_00_HD_FINAL_RETURN,F9_00_HD_GROSS_EXEMPT_NUM,F9_00_HD_GROSS_RCPT,F9_00_HD_GROUP_RETURN,F9_00_HD_INCLUDES_SUBORD_ORGS,F9_00_HD_INITIAL_RETURN,F9_00_HD_PRIN_OFF_NAME,F9_00_HD_SIGNING_OFFICER_SIGNTR,F9_00_HD_SPECIAL_CONDITION_DESC,F9_00_HD_STATE_OF_DOMICILE,F9_00_HD_TAX_PER_BEGIN,F9_00_HD_TAX_PER_END,F9_00_HD_TIME_STAMP,F9_00_HD_TYPE_ORG_ASSOCIATION,F9_00_HD_TYPE_ORG_CORP,F9_00_HD_TYPE_ORG_OTHER,F9_00_HD_TYPE_ORG_OTHER_DESC,F9_00_HD_TYPE_ORG_TRUST,F9_00_HD_WEBSITE,F9_00_HD_YEAR_FORMED,F9_01_PC_BEN_PAID_MEMB_PRIOR,F9_01_PC_CONTR_GRANTS_CURR,F9_01_PC_CONTR_GRANTS_PRIOR,F9_01_PC_GRANTS_PRIOR,F9_01_PC_INDEP_VOTING_MEMB,F9_01_PC_INVEST_INCOME_PRIOR,F9_01_PC_NET_ASSETS_BOY,F9_01_PC_OTHER_EXPENSE_PRIOR,F9_01_PC_OTHER_REV_PRIOR,F9_01_PC_PROF_FUNDRISING_EXP_CURR,F9_01_PC_PROF_FUNDRISING_EXP_PRIOR,F9_01_PC_PROG_SERVICE_REV_PRIOR,F9_01_PC_REV_LESS_EXP_CURR,F9_01_PC_REV_LESS_EXP_PRIOR,F9_01_PC_TERMINATION_CONTRACTION,F9_01_PC_TOT_ASSETS_EOY,F9_01_PC_TOT_EXP_PRIOR,F9_01_PC_TOT_FNDR_EXP_CURR,F9_01_PC_TOT_INDIV_EMPLOYED,F9_01_PC_TOT_INDIV_VOLUNTEERS,F9_01_PC_TOT_LIABILITIES_EOY,F9_01_PC_TOT_REVENUE_PRIOR,F9_01_PC_TOT_UBI_GROSS,F9_01_PC_TOT_UBI_NET,F9_01_PC_VOTING_MEMB_GOV_BODY,F9_01_PZ_BEN_PAID_TO_MEMB_CURR,F9_01_PZ_GRANTS_PAID_CURR,F9_01_PZ_INVEST_INCOME_CURR,F9_01_PZ_NAFB_EOY,F9_01_PZ_ORGANIZATIONAL_MISSION,F9_01_PZ_OTHER_EXPENSE_CURR,F9_01_PZ_OTHER_REV_CURR,F9_01_PZ_PROG_SERVICE_REV_CURR,F9_01_PZ_SALARIES_CURR,F9_01_PZ_SALARIES_PRIOR,F9_01_PZ_TOT_ASSETS_BOY,F9_01_PZ_TOT_EXP_CURR,F9_01_PZ_TOT_LIAB_BOY,F9_01_PZ_TOT_REV_CURR,F9_03_PC_PGMSVC_SIGNIF_CHG,F9_03_
PC_PGMSVC_SIGNIF_NEW,F9_03_PC_PROG_SVC_ACC_1_CODE,F9_03_PC_PROG_SVC_ACC_1_DESC,F9_03_PC_PROG_SVC_ACC_1_EXP,F9_03_PC_PROG_SVC_ACC_1_GRNT,F9_03_PC_PROG_SVC_ACC_1_REV,F9_03_PC_PROG_SVC_ACC_2_CODE,F9_03_PC_PROG_SVC_ACC_2_DESC,F9_03_PC_PROG_SVC_ACC_2_EXP,F9_03_PC_PROG_SVC_ACC_2_GRNT,F9_03_PC_PROG_SVC_ACC_2_REV,F9_03_PC_PROG_SVC_ACC_3_CODE,F9_03_PC_PROG_SVC_ACC_3_DESC,F9_03_PC_PROG_SVC_ACC_3_EXP,F9_03_PC_PROG_SVC_ACC_3_GRNT,F9_03_PC_PROG_SVC_ACC_3_REV,F9_03_PC_TOT_OTH_PROG_SVC_EXP,F9_03_PC_TOT_OTH_PROG_SVC_GRNT,F9_03_PC_TOT_OTH_PROG_SVC_REV,F9_03_PC_TOT_PROG_SVC_EXPENSE,F9_03_PZ_MISSION_DESCRIPTION,F9_03_PZ_SCHEDULE_O_PART3,F9_04_PC_ACTVITIES_VIA_PARTNER,F9_04_PC_CONTROLLED_ENTITY,F9_04_PC_DISREGARDED_ENTITY,F9_04_PC_EXCESS_BENEFIT_TRANS,F9_04_PC_FR_EVENT_INC_GT_15K,F9_04_PC_GAMING_INC_GT_15K,F9_04_PC_LOBBYING_ACTIVITIES,F9_04_PC_POLITICAL_ACTIVITIES,F9_04_PC_PRIOR_EXCESS_BEN_TRAN,F9_04_PC_PROF_FR_EXP_GT_15K,F9_04_PC_RELATED_ENTITY,F9_04_PC_TRANS_TO_CNTRLD_ENT,F9_04_PC_TRANS_WITH_CNTRLD_ENT,F9_05_EXP_SCHED_O_X,F9_05_PC_NUMBER_EMPLOYEES_W3,F9_05_PC_NUMBER_FORMS_1096,F9_05_PC_UNRELATED_BUS_INCOME,F9_06_EXP_SCHED_O_X,F9_06_PC_990_PROVIDED_GOV_BODY,F9_06_PC_ANNUAL_DISC_COVRD_PERS,F9_06_PC_CEO_COMPENSTN_PROCESS,F9_06_PC_CHANGES_ORGANIZING_DOCS,F9_06_PC_CONFLICT_OF_INTEREST,F9_06_PC_DECISIONS_SUBJ_APPROVAL,F9_06_PC_DELEGATION_MGT_DUTIES,F9_06_PC_DELEGATION_OF_MGT,F9_06_PC_DOCUMENT_RET_POLICY,F9_06_PC_ELECTION_BOARD_MEMBERS,F9_06_PC_FAMILY_OR_BUSINESS_REL,F9_06_PC_FORM_AVAIL_OWN_WEBSITE,F9_06_PC_FORM_UPON_REQUEST,F9_06_PC_JOINT_VENTURE_INVESTMNT,F9_06_PC_JOINT_VENTURE_POLICY,F9_06_PC_LOCAL_CHAPTERS,F9_06_PC_MATERIAL_DIVERSION,F9_06_PC_MEMBERS_OR_STOCKHOLDERS,F9_06_PC_MINUTES_COMMITTEES,F9_06_PC_MINUTES_GOVERNING_BODY,F9_06_PC_MONITORING_OF_COI_POLICY,F9_06_PC_NUM_IND_VOTING_MEMBERS,F9_06_PC_NUM_VOTING_GOV_MEMBERS,F9_06_PC_OFFICER_MAILING_ADDRESS,F9_06_PC_OTHER_COMPENSTN_PROCESS,F9_06_PC_OTHER_WEBSITE,F9_06_PC_OWN_WEBSITE,F9_06_PC_POLICIES_GOVERN_CHAPTER,F9_06_PC_STATES_WHERE_RET
_FILED,F9_06_PC_WHISTLEBLOWER_POLICY,F9_07_EXP_SCHED_O_X,F9_07_PC_COMPENSATION_OTHER_SRCE,F9_07_PC_FORMER_OFFICER_LISTED,F9_07_PC_NO_LISTED_PERS_COMPENSD,F9_07_PC_NUM_CONTRCTRS_GRTR_100K,F9_07_PC_NUM_INDS_GREATER_100K,F9_07_PC_TOTAL_COMP_GRTR_150K,F9_07_PC_TOT_OTHER_COMPENSATION,F9_07_PC_TOT_REPRT_COMP_FROM_ORG,F9_07_PC_TOT_REPRT_COMP_RLTD_ORG,F9_08_EXP_SCHED_O_X,F9_08_PC_ALL_OTHER_CONTRIBUTIONS,F9_08_PC_CONTS_REPRTD_FNDRAISNG,F9_08_PC_COST_OF_GOODS_SOLD,F9_08_PC_FEDERATED_CAMPAIGNS,F9_08_PC_FUNDRAISING_DIRECT_EXP,F9_08_PC_FUNDRAISING_EVENTS,F9_08_PC_FUNDRAISING_GROSS_INC,F9_08_PC_GAMING_DIRECT_EXPENSES,F9_08_PC_GAMING_GROSS_INCOME,F9_08_PC_GOVERNMENT_GRANTS,F9_08_PC_GROSS_SALES_INVENTORY,F9_08_PC_MEMBERSHIP_DUES,F9_08_PC_NONCASH_CONTRIBUTIONS,F9_08_PC_PROGRAM_SVCE_REV_TOTAL,F9_08_PC_RELATED_ORGANIZATIONS,F9_08_PC_TOTAL_CONTRIBUTIONS,F9_08_PC_TOTAL_OTHER_REVENUE,F9_08_PC_TOTAL_PROG_SVCE_REVENUE,F9_08_PC_TOTAL_REVENUE,F9_09_EXP_AD_PROMO_TOT,F9_09_EXP_BENF_PAID_MEMB_TOT,F9_09_EXP_CONF_MEETING_TOT,F9_09_EXP_DEPREC_FUNDR,F9_09_EXP_DEPREC_MAG,F9_09_EXP_DEPREC_PROG,F9_09_EXP_DEPREC_TOT,F9_09_EXP_GRANT_FRGN_TOT,F9_09_EXP_GRANT_INDIV_DMSTC_TOT,F9_09_EXP_GRANT_ORG_DMSTC_TOT,F9_09_EXP_INFO_TECH_TOT,F9_09_EXP_INSURANCE_TOT,F9_09_EXP_INTEREST_TOT,F9_09_EXP_JOINT_COSTS_TOT,F9_09_EXP_OCCUPANCY_TOT,F9_09_EXP_OFFICE_TOT,F9_09_EXP_OTH_OTH_TOT,F9_09_EXP_ROY_TOT,F9_09_EXP_SCHED_O_X,F9_09_EXP_TRAVEL_ENTRTNMNT_TOT,F9_09_EXP_TRAVEL_TOT,F9_09_PC_COMP_DISQUAL_FUNDRAISE,F9_09_PC_COMP_DISQUAL_MGMT,F9_09_PC_COMP_DISQUAL_PROG_SVCE,F9_09_PC_COMP_DISQUAL_TOTAL,F9_09_PC_COMP_OFFICERS_FUNDRAISE,F9_09_PC_COMP_OFFICERS_MGMT,F9_09_PC_COMP_OFFICERS_PROG_SVCE,F9_09_PC_COMP_OFFICERS_TOTAL,F9_09_PC_FEES_FOR_SVCE_ACCT_TOT,F9_09_PC_FEES_FOR_SVCE_INVST_TOT,F9_09_PC_FEES_FOR_SVCE_LEGL_TOT,F9_09_PC_FEES_FOR_SVCE_LOBB_TOT,F9_09_PC_FEES_FOR_SVCE_MGMT_TOT,F9_09_PC_FEES_FOR_SVCE_OTH_TOT,F9_09_PC_OTHER_EMP_BEN_FUNDRAISE,F9_09_PC_OTHER_EMP_BEN_MGMT,F9_09_PC_OTHER_EMP_BEN_PROG_SVCE,F9_09_PC_OTHER_EMP_BEN_TOTAL,F9_09
_PC_OTHER_SALARY_FUNDRAISE,F9_09_PC_OTHER_SALARY_MGMT,F9_09_PC_OTHER_SALARY_PROG_SVCE,F9_09_PC_OTHER_SALARY_TOTAL,F9_09_PC_PAYMENT_TO_AFFILIATES,F9_09_PC_PAYROLL_TAX_FUNDRAISE,F9_09_PC_PAYROLL_TAX_MGMT,F9_09_PC_PAYROLL_TAX_PROG_SVCE,F9_09_PC_PAYROLL_TAX_TOTAL,F9_09_PC_PENSION_CONT_FUNDRAISE,F9_09_PC_PENSION_CONT_MGMT,F9_09_PC_PENSION_CONT_PROG_SVCE,F9_09_PC_PENSION_CONT_TOTAL,F9_09_PC_TOTAL_FUNC_EXPENSES,F9_09_PC_TOTAL_FUNDRAISE_EXPENSE,F9_09_PC_TOTAL_MGMT_EXPENSE,F9_09_PC_TOTAL_PROG_SVCE_EXPENSE,F9_10_ASSETS_ACC_NET_EOY,F9_10_ASSETS_EXP_PREPAID_EOY,F9_10_ASSETS_INTANGIB_EOY,F9_10_ASSETS_INVENT_SALE_EOY,F9_10_ASSETS_LESS_DEPREC_EOY,F9_10_ASSETS_LOANS_DISQUAL_EOY,F9_10_ASSETS_NOTES_LOANS_NET_EOY,F9_10_ASSETS_OTH_EOY,F9_10_ASSETS_PLEDGES_NET_EOY,F9_10_LIAB_ACC_PAYABLE_EOY,F9_10_LIAB_GRANTS_PAYABLE_EOY,F9_10_LIAB_LOANS_OFF_EOY,F9_10_LIAB_REV_DEFERRED_EOY,F9_10_NAFB_RESTRICT_PERM_EOY,F9_10_NAFB_RESTRICT_TEMP_EOY,F9_10_NAFB_UNRESTRICT_EOY,F9_10_PC_BOND_LIABILITY_EOY,F9_10_PC_CASH_NON_INTEREST_BOY,F9_10_PC_CASH_NON_INTEREST_EOY,F9_10_PC_ESCROW_LIABILITY_EOY,F9_10_PC_INVEST_OTHER_SEC_EOY,F9_10_PC_INVEST_PROG_RELTD_EOY,F9_10_PC_INVEST_PUB_TRADED_EOY,F9_10_PC_LAND_BLDG_EQPMT,F9_10_PC_LAND_BLDG_EQPMT_DEPRCTN,F9_10_PC_LOANS_FROM_OFFICERS_EOY,F9_10_PC_ORG_FOLLOWS_SFAS117,F9_10_PC_ORG_NOT_FOLLOW_SFAS117,F9_10_PC_OTHER_LIABILITIES_EOY,F9_10_PC_RET_EARNINGS_ENDWMT_EOY,F9_10_PC_SAVINGS_TEMP_INVEST_BOY,F9_10_PC_SAVINGS_TEMP_INVEST_EOY,F9_10_PC_SECURED_MORTGAGES_EOY,F9_10_PC_SECURE_MORT_NOTES_EOY,F9_10_PC_UNSECURED_LOANS_EOY,F9_10_PC_UNSECURED_NOTES_BOY,F9_10_PC_UNSECURED_NOTES_EOY,F9_10_PZ_TOTAL_ASSETS_EOY,F9_10_SCHED_O_X,F9_11_PC_RECNCLTN_DONATED_SVCES,F9_11_PC_RECNCLTN_INVSTMNT_EXP,F9_11_PC_RECNCLTN_PRIOR_PER_ADJ,F9_11_PC_RECNCLTN_REV_LESS_EXP,F9_11_PC_RECNCLTN_UNRLZD_GAIN,F9_11_SCHED_O_X,F9_12_PC_ACCNT_COMPILE_OR_REVIEW,F9_12_PC_ACCTG_METHOD_ACCRUAL,F9_12_PC_ACCTG_METHOD_CASH,F9_12_PC_ACCTG_METHOD_OTHER,F9_12_PC_AUDIT_COMMITTEE,F9_12_PC_FED_GRNT_AUDIT_PERFORMD,F9_12_PC_FED_GRNT_A
UDIT_REQUIRED,F9_12_PC_FINCL_STMTS_AUDITED,F9_12_SCHED_O_X,number_of_other_prog_svces,501c3,F9_00_HD_FILER_ADDR_US_L1,F9_00_HD_FILER_ADDR_US_L2,F9_00_HD_FILER_CITY_US,F9_00_HD_FILER_ZIP_US,F9_00_HD_FILER_COUNTRY_FRGN,F9_00_HD_FILER_STATE_US,F9_00_HD_TIME_STAMP_yr,ein_int,BMF_EIN2,BMF_EIN,BMF_NTEE_IRS,BMF_NTEE_NCCS,BMF_NTEEV2,BMF_NCCS_LEVEL_1,BMF_NCCS_LEVEL_2,BMF_NCCS_LEVEL_3,BMF_F990_TOTAL_REVENUE_RECENT,BMF_F990_TOTAL_INCOME_RECENT,BMF_F990_TOTAL_ASSETS_RECENT,BMF_F990_ORG_ADDR_CITY,BMF_F990_ORG_ADDR_STATE,BMF_F990_ORG_ADDR_ZIP,BMF_F990_ORG_ADDR_STREET,BMF_CENSUS_CBSA_FIPS,BMF_CENSUS_CBSA_NAME,BMF_CENSUS_BLOCK_FIPS,BMF_CENSUS_URBAN_AREA,BMF_CENSUS_STATE_ABBR,BMF_CENSUS_COUNTY_NAME,BMF_ORG_ADDR_FULL,BMF_ORG_ADDR_MATCH,BMF_LATITUDE,BMF_LONGITUDE,BMF_GEOCODER_SCORE,BMF_GEOCODER_MATCH,BMF_BMF_SUBSECTION_CODE,BMF_BMF_STATUS_CODE,BMF_BMF_PF_FILING_REQ_CODE,BMF_BMF_ORGANIZATION_CODE,BMF_BMF_INCOME_CODE,BMF_BMF_GROUP_EXEMPT_NUM,BMF_BMF_FOUNDATION_CODE,BMF_BMF_FILING_REQ_CODE,BMF_BMF_DEDUCTIBILITY_CODE,BMF_BMF_CLASSIFICATION_CODE,BMF_BMF_ASSET_CODE,BMF_BMF_AFFILIATION_CODE,BMF_ORG_RULING_DATE,BMF_ORG_FISCAL_YEAR,BMF_ORG_RULING_YEAR,BMF_ORG_YEAR_FIRST,BMF_ORG_YEAR_LAST,BMF_ORG_YEAR_COUNT,BMF_ORG_PERS_ICO,BMF_ORG_NAME_SEC,BMF_ORG_NAME_CURRENT,BMF_ORG_FISCAL_PERIOD,filing_year_had_duplicate,COUNTY_CODE,NTEE,NTEE_MAJ12,NTEE_MAJ12_EV,NTEE2,level1,level2,level3,level4,level5,level1.label,level2.label,level3.label,level5.label,_merge
0,10017496,2022,65c1a1d52a9ba8ce45342904,,https://s3.amazonaws.com/irs-form-990/202323189349305317_public.xml,,,0,2023-04-26 12:10:37+00:00,2022,,,,"{'AddressLine1': None, 'AddressLine1Txt': 'PO BOX 534', 'AddressLine2': None, 'AddressLine2Txt': None, 'City': None, 'CityNm': 'YORK HARBOR', 'State': None, 'StateAbbreviationCd': 'ME', 'ZIPCd': '03911', 'ZIPCode': None}",,,{'BusinessNameLine1Txt': 'AGAMENTICUS YACHT CLUB INC'},AGAM,2073638510,,,0,0,,0,,1,0,,376800,0,0,0,DANIEL FORD,2023-11-13,,ME,2022-01-01,2022-12-31,2023-11-14 16:30:26+00:00,0,1,0,,0,WWW.AYCSAIL.ORG,1937,0,279970,0,0,13,0,273331,0,0,0,0,0,184620,0,0,413907,0,2744,16,20,8282,0,0,0,13,0,0,34628,405625,"THE ORGANIZATION'S PRIMARY EXEMPT PURPOSE IS TO TEACH SAILING TO CHILDREN BY FOCUSING ON SAFETY, ENJOYMENT AND KNOWLEDGE OF SAILING.",132172,3377,54843,56026,0,273331,188198,0,372818,0,0,,"PROVIDES SAILING INSTRUCTION, SEAMAN-SHIP AND WATER SAFETY SKILLS TO CHILDREN.",167950,0,54843,,,0,0,0,,,0,0,0,0,0,0,167950,"THE ORGANIZATION'S PRIMARY EXEMPT PURPOSE IS TO TEACH YOUNGSTERS THE BASICS OF SAILING, SEAMAN-SHIP AND SAFE CONDUCT ON THE WATER. 
IT IS THE ORGANIZATION'S MISSION TO CREATE AND SUSTAIN A COMMUNITY OF FAMILIES WHO ENJOY BEING ON THE WATER.",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,16,2,0,1,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,1,1,0,13,13,0,0,0,0,0,,0,0,0,0,1,0,0,0,0,0,0,0,236965,0,0,0,0,0,0,0,0,0,3377,43005,0,54843,0,279970,0,54843,372818,0,0,0,0,0,16519,16519,0,0,0,84,21088,0,0,2776,1000,28263,0,1,0,0,0,0,0,0,0,0,0,0,6748,0,0,0,0,0,0,0,0,0,0,0,52045,52045,0,0,0,3981,3981,0,0,0,0,188198,2744,17504,167950,0,0,30000,0,174902,0,0,0,0,682,0,0,7600,0,0,0,0,29306,125682,0,0,0,83323,453817,278915,0,0,0,0,0,0,0,0,0,0,0,0,413907,0,0,0,0,184620,-52326,0,0,1,0,,0,0,0,0,0,,1,PO BOX 534,,YORK HARBOR,3911,,ME,2023,10017496,EIN-01-0017496,10017496,N50,N50,HMS-N50-RG,501C3 CHARITY,O,HS,372818.0,376800.0,413907.0,YORK HARBOR,ME,03911-0534,PO BOX 534,38860,"Portland-South Portland, ME",230310360032023,U,ME,York County,"PO BOX 534,YORK HARBOR,ME,03911-0534","03911-0534, York Harbor, Maine",43.1,-70.6,98.0,M,3.0,1.0,0.0,1.0,4.0,0.0,15.0,1.0,1.0,2000.0,4.0,3.0,1993-03,2024.0,1993.0,1995.0,2024.0,30.0,,,AGAMENTICUS YACHT CLUB OF YORK,3.0,0,23031,N50,HU,HMS,HMS-N50-RG,HMS,Nxx,N5x,N50,RG,Human Services,Recreation & Sports,Recreational Clubs,Regular Nonprofit,both


In [53]:
print(dfc.columns.tolist())

['NTEE', 'NTEE2', 'level1', 'level2', 'level3', 'level4', 'level5', 'level1.label', 'level2.label', 'level3.label', 'level4.label', 'level5.label', 'keywords']


In [54]:
print(ntee_cols)

['NTEE', 'NTEE_MAJ12', 'NTEE_MAJ12_EV', 'BMF_NTEE_NCCS', 'BMF_NTEE_IRS', 'BMF_NTEEV2', 'BMF_NCCS_LEVEL_3']


In [55]:
dfm[ntee_cols].sample(5)

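|         | NTEE | NTEE_MAJ12 | NTEE_MAJ12_EV | BMF_NTEE_NCCS | BMF_NTEE_IRS | BMF_NTEEV2 | BMF_NCCS_LEVEL_3 |
|---------|------|------------|---------------|---------------|--------------|------------|------------------|
| 794425  | E21  | EH         | HOS           | E21           | E21          | HOS-E21-RG | HE               |
| 430864  | B28  | ED         | EDU           | B28           | B28          | EDU-B28-RG | ED               |
| 1045357 | S50C | PU         | PSB           |               | S50C         |            | UN               |
| 2419448 | B99  | ED         | EDU           | B99           | B99          | EDU-B99-RG | ZF               |
| 2537636 | J320 | HU         | HMS           |               | J320         |            | UN               |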


In [56]:
%%time
dfm['NTEE_3digit'] = dfm['NTEE'].str[:3]

CPU times: total: 500 ms
Wall time: 599 ms


In [57]:
%%time
dfm['NTEE_2digit'] = dfm['NTEE'].str[:2]

CPU times: total: 438 ms
Wall time: 643 ms
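The two prefix columns above rely on `.str` slicing, which works element-wise, leaves missing values missing, and keeps the original case. A minimal sketch on toy data (the sample codes are illustrative, not drawn from the file):

```python
import pandas as pd

# Toy codes: a 3-character code, a 4-character IRS-style code,
# a lowercase code, and a missing value.
toy = pd.DataFrame({"NTEE": ["E21", "N20Z", "c30", None]})

# Slicing through the .str accessor truncates each code to its prefix;
# the missing value is passed through untouched.
toy["NTEE_3digit"] = toy["NTEE"].str[:3]
toy["NTEE_2digit"] = toy["NTEE"].str[:2]

print(toy)
```

Because case is preserved, a code such as `c30` yields the prefix `c3` rather than `C3`, which matters for the verification step further down.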


In [58]:
ntee_cols = ['NTEE', 'NTEE_2digit', 'NTEE_MAJ12', 'NTEE_MAJ12_EV', 'NTEE2',
             'BMF_NTEE_NCCS', 'BMF_NTEE_IRS', 'BMF_NTEEV2'] # 'BMF_NCCS_LEVEL_3']
dfm[ntee_cols].info(show_counts=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2598477 entries, 0 to 2598476
Data columns (total 8 columns):
 #   Column         Non-Null Count    Dtype 
---  ------         --------------    ----- 
 0   NTEE           2587956 non-null  object
 1   NTEE_2digit    2587956 non-null  object
 2   NTEE_MAJ12     2598477 non-null  object
 3   NTEE_MAJ12_EV  2598477 non-null  object
 4   NTEE2          2117486 non-null  object
 5   BMF_NTEE_NCCS  2212421 non-null  object
 6   BMF_NTEE_IRS   2587956 non-null  object
 7   BMF_NTEEV2     2117520 non-null  object
dtypes: object(8)
memory usage: 158.6+ MB


In [60]:
dfm[ntee_cols].sample(10)

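|         | NTEE | NTEE_2digit | NTEE_MAJ12 | NTEE_MAJ12_EV | NTEE2      | BMF_NTEE_NCCS | BMF_NTEE_IRS | BMF_NTEEV2 |
|---------|------|-------------|------------|---------------|------------|---------------|--------------|------------|
| 48863   | A82  | A8          | AR         | ART           | ART-A82-RG | A82           | A82          | ART-A82-RG |
| 2022350 | P29  | P2          | HU         | HMS           | HMS-P29-RG | P29           | P29          | HMS-P29-RG |
| 639271  | N20Z | N2          | HU         | HMS           |            |               | N20Z         |            |
| 2550203 | J33Z | J3          | HU         | HMS           |            |               | J33Z         |            |
| 1545646 | M24  | M2          | HU         | HMS           | HMS-M24-RG | M24           | M24          | HMS-M24-RG |
| 867753  | E22I | E2          | EH         | HOS           |            |               | E22I         |            |
| 65812   | A70  | A7          | AR         | ART           | ART-A70-RG | A70           | A70          | ART-A70-RG |
| 676397  | A20  | A2          | AR         | ART           | ART-A20-RG | A20           | A20          | ART-A20-RG |
| 127406  | P75  | P7          | HU         | HMS           | HMS-P75-RG | P75           | P75          | HMS-P75-RG |
| 430380  | N20  | N2          | HU         | HMS           | HMS-N20-RG | N20           | N200         | HMS-N20-RG |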


# Verification
Check that each of the 12 major categories in `NTEE_MAJ12` contains only the expected 2-character NTEE prefixes.

The table below lists each new category code next to the old 12-category label and the NTEE codes assigned under each scheme:


| New Code | New Category                  | Old Label | Old Label Description               | Old NTEE Codes               | New NTEE Codes                        |
|----------|-------------------------------|-----------|--------------------------------------|------------------------------|----------------------------------------|
| ART      | Arts, Culture, and Humanities | AR        | Arts, culture, and humanities        | A                            | A                                      |
| EDU      | Education (minus universities)| ED        | Education (other)                    | B (other than B4, B5)        | B (excluding B40–B43, B50)             |
| UNI      | Universities                  | BH        | Higher education                     | B4, B5                       | B40, B41, B42, B43, B50                |
| ENV      | Environment and Animals       | EN        | Environment                          | C, D                         | C, D                                   |
| HOS      | Hospitals                     | EH        | Hospitals                            | E2                           | E20, E21, E22, E24                     |
| HEL      | Health (minus hospitals)      | HE        | Health                               | E (other than E2), F, G, H   | E (excluding E20, E21, E22, E24), F, G, H |
| HMS      | Human Services                | HU        | Human services                       | I, J, K, L, M, N, O, P       | I, J, K, L, M, N, O, P                 |
| IFA      | International, Foreign Affairs| IN        | International                        | Q                            | Q                                      |
| PSB      | Public, Societal Benefit      | PU        | Public and societal benefit          | R, S, T, U, V, W             | R, S, T, U, V, W                       |
| REL      | Religion Related              | RE        | Religion                             | X                            | X                                      |
| MMB      | Mutual/Membership Benefit     | MU        | Mutual benefit                       | Y                            | Y                                      |
| UNU      | Unknown, Unclassified         | UN        | Unknown                              | Z                            | Z                                      |
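
The "Old NTEE Codes" column can be read as a two-step rule: the first letter picks the category, and a few 2-character prefixes (B4, B5, E2) override it. The sketch below encodes that reading. `old_major12` is a hypothetical helper written for illustration; it is not the notebook's `map_ntee`, whose definition may differ.

```python
import pandas as pd

# Letter-level rule taken from the "Old NTEE Codes" column of the table.
LETTER_TO_OLD_LABEL = {
    "A": "AR",
    "B": "ED",
    "C": "EN", "D": "EN",
    "E": "HE", "F": "HE", "G": "HE", "H": "HE",
    **{letter: "HU" for letter in "IJKLMNOP"},
    "Q": "IN",
    **{letter: "PU" for letter in "RSTUVW"},
    "X": "RE",
    "Y": "MU",
    "Z": "UN",
}
# Two-character prefixes that override the letter-level rule.
PREFIX_OVERRIDES = {"B4": "BH", "B5": "BH", "E2": "EH"}


def old_major12(code):
    """Hypothetical helper: old 12-category label; unrecognized input becomes 'UN'."""
    if not isinstance(code, str) or not code:
        return "UN"
    return PREFIX_OVERRIDES.get(code[:2], LETTER_TO_OLD_LABEL.get(code[0], "UN"))


codes = pd.Series(["A82", "B28", "B43", "E21", "E32", "N50", "S50C", None, "c30"])
print(codes.map(old_major12).tolist())
```

In this sketch, missing values and codes that do not start with an uppercase letter fall through to `UN`, which is the same pattern the verification output shows for prefixes such as `00` and `c3`.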



In [61]:
dfm[dfm['NTEE_MAJ12']=='AR']['NTEE_2digit'].value_counts().sort_index()

NTEE_2digit
A0     3431
A1    15279
A2    43170
A3    20489
A4     5340
A5    33206
A6    66238
A7     4686
A8    37902
A9     5725
Name: count, dtype: int64

Good: B4 and B5 are omitted from `ED` in the output below.

In [62]:
dfm[dfm['NTEE_MAJ12']=='ED']['NTEE_2digit'].value_counts().sort_index()

NTEE_2digit
B0     15448
B1     70128
B2    133440
B3      4277
B6     10027
B7     16396
B8     45341
B9     82806
Name: count, dtype: int64

In [66]:
dfm[dfm['NTEE_MAJ12']=='BH']['NTEE_2digit'].value_counts().sort_index()

NTEE_2digit
B4    25275
B5     6677
Name: count, dtype: int64

In [67]:
dfm[dfm['NTEE_MAJ12']=='EN']['NTEE_2digit'].value_counts().sort_index()

NTEE_2digit
C0     7848
C1     3223
C2     3486
C3    26159
C4     4020
C5     7782
C6     4289
C9     2388
D0     2159
D1     4058
D2    41472
D3     6316
D4     2008
D5     1654
D6     1056
D9     1227
Name: count, dtype: int64

In [68]:
dfm[dfm['NTEE_MAJ12']=='EH']['NTEE_2digit'].value_counts().sort_index()

NTEE_2digit
E2    53664
Name: count, dtype: int64

In [69]:
dfm[dfm['NTEE_MAJ12']=='HE']['NTEE_2digit'].value_counts().sort_index()

NTEE_2digit
E0     8847
E1    30443
E3    39507
E4     9012
E5     7406
E6    26424
E7    10474
E8     9789
E9    29296
F0     2970
F1     2480
F2    23900
F3    22694
F4     3664
F5     1298
F6     5125
F7      806
F8     4099
F9     3031
FJ        6
G0     2257
G1     5643
G2     3859
G3     7127
G4     8720
G5     3086
G6      250
G7      296
G8    11292
G9     5382
H0     1694
H1     4038
H2      714
H3     2140
H4     2423
H5      897
H6      102
H7      211
H8     1444
H9     6518
Name: count, dtype: int64

In [70]:
dfm[dfm['NTEE_MAJ12']=='HU']['NTEE_2digit'].value_counts().sort_index()

NTEE_2digit
I0     2413
I1     2284
I2     6156
I3     1353
I4     4491
      ...  
P5    11678
P6    14956
P7    53493
P8    82131
P9    11493
Name: count, Length: 70, dtype: int64
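
The HU output above is truncated by the display (70 rows), so most prefixes are not visible. One way to check the whole category at a glance is to collapse the 2-character prefixes to their first letter. A minimal sketch on toy data, assuming the same column names as `dfm`:

```python
import pandas as pd

# Toy stand-in for dfm with the two columns used in the check.
toy = pd.DataFrame({
    "NTEE_MAJ12": ["HU", "HU", "HU", "HU", "AR"],
    "NTEE_2digit": ["I2", "P7", "P8", "N5", "A2"],
})

hu_prefixes = toy.loc[toy["NTEE_MAJ12"] == "HU", "NTEE_2digit"]

# Count rows per major letter; for HU only I through P should appear.
letters = hu_prefixes.str[0].value_counts().sort_index()
print(letters)

# Any letter outside I-P would indicate a mapping problem.
unexpected = sorted(set(letters.index) - set("IJKLMNOP"))
print("Unexpected letters:", unexpected)
```

Printing the full series with `.to_string()` is another option when every 2-character prefix needs to be inspected.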

In [71]:
dfm[dfm['NTEE_MAJ12']=='IN']['NTEE_2digit'].value_counts().sort_index()

NTEE_2digit
Q0     2884
Q1     5436
Q2     5958
Q3    38148
Q4     1624
Q5      200
Q7     1792
Q9     1062
Name: count, dtype: int64

In [72]:
dfm[dfm['NTEE_MAJ12']=='PU']['NTEE_2digit'].value_counts().sort_index()

NTEE_2digit
R0     1516
R1      644
R2     8906
R3     1310
R4     1059
R6     3912
R9     1557
S0     3144
S1     4983
S2    40983
S3    22535
S4    11197
S5     3449
S8    16827
S9     5085
T0     2843
T1    15310
T2    23496
T3    49149
T4     1419
T5     4705
T6       80
T7    22458
T9    12374
U0     2500
U1     1009
U2     2034
U3     3472
U4     3069
U5     1544
U9     2187
V0      741
V1      132
V2     2934
V3     1441
V4       49
V9      421
W0     3597
W1     2877
W2     3259
W3     5525
W4     1197
W5      399
W6     6456
W7     3430
W8      849
W9     4313
Name: count, dtype: int64

In [73]:
dfm[dfm['NTEE_MAJ12']=='RE']['NTEE_2digit'].value_counts().sort_index()

NTEE_2digit
X0     2465
X1    10727
X2    81687
X3     7196
X4     3292
X5     3550
X6       14
X7     2247
X8     6620
X9    20574
Name: count, dtype: int64

In [74]:
dfm[dfm['NTEE_MAJ12']=='MU']['NTEE_2digit'].value_counts().sort_index()

NTEE_2digit
Y0     244
Y1     191
Y2    1619
Y3     258
Y4    1260
Y5     557
Y9     476
Name: count, dtype: int64

In [75]:
dfm[dfm['NTEE_MAJ12']=='UN']['NTEE_2digit'].value_counts().sort_index()

NTEE_2digit
00       23
05        3
Z2       26
Z3       17
Z5       14
Z8       11
Z9    22647
c0       26
c3       10
w3        2
Name: count, dtype: int64
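
The lowercase prefixes (`c0`, `c3`, `w3`) in the UN output above suggest that the mapping is case-sensitive, so these codes may be landing in UN even though C and W have their own categories. If that is the cause, trimming and upper-casing the codes before mapping would recover them. A sketch on toy data, with a flag for codes that do not start with a letter (such as `00` and `05`):

```python
import pandas as pd

# Toy codes: lowercase, leading whitespace, already clean, numeric-looking, missing.
toy = pd.Series(["c30", " w30", "C30", "00", None], name="NTEE")

# Normalize whitespace and case; missing values stay missing.
clean = toy.str.strip().str.upper()

# Flag non-missing codes whose first character is not a letter A-Z.
bad_format = clean.notna() & ~clean.str.match(r"[A-Z]", na=False)

print(pd.DataFrame({"raw": toy, "clean": clean, "bad_format": bad_format}))
```

Only 38 rows carry lowercase prefixes in the counts above, so the effect on the category totals would be small either way.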

In [76]:
dfm[ntee_cols].isna().sum()

NTEE              10521
NTEE_2digit       10521
NTEE_MAJ12            0
NTEE_MAJ12_EV         0
NTEE2            480991
BMF_NTEE_NCCS    386056
BMF_NTEE_IRS      10521
BMF_NTEEV2       480957
dtype: int64

In [77]:
dfm[dfm['NTEE'].isnull()]['NTEE_MAJ12'].value_counts()

NTEE_MAJ12
UN    10521
Name: count, dtype: int64

In [78]:
dfm['NTEE_MAJ12'].value_counts().sort_index()

NTEE_MAJ12
AR    235466
BH     31952
ED    377863
EH     53664
EN    119145
HE    309364
HU    925266
IN     57104
MU      4605
PU    312376
RE    138372
UN     33300
Name: count, dtype: int64

In [79]:
dfm[dfm['NTEE_MAJ12']=='UN']['NTEE'].value_counts().sum()

22779

In [80]:
22779+10521

33300
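
The check above (22,779 UN rows with a code plus 10,521 rows with a missing code equals the 33,300 UN total) can also be written as assertions, so the notebook stops if the counts ever disagree. A sketch on toy data with the same column names:

```python
import pandas as pd

toy = pd.DataFrame({
    "NTEE": ["Z99", "00", None, None, "A20"],
    "NTEE_MAJ12": ["UN", "UN", "UN", "UN", "AR"],
})

is_un = toy["NTEE_MAJ12"] == "UN"
un_with_code = int((is_un & toy["NTEE"].notna()).sum())
un_missing = int((is_un & toy["NTEE"].isna()).sum())

# The two pieces should add up to the UN total reported by value_counts().
assert un_with_code + un_missing == int(toy["NTEE_MAJ12"].value_counts()["UN"])

# Every row with a missing NTEE code should have been assigned to UN.
assert toy.loc[toy["NTEE"].isna(), "NTEE_MAJ12"].eq("UN").all()

print(un_with_code, un_missing)
```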

In [81]:
dfm[ntee_cols].info(show_counts=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2598477 entries, 0 to 2598476
Data columns (total 8 columns):
 #   Column         Non-Null Count    Dtype 
---  ------         --------------    ----- 
 0   NTEE           2587956 non-null  object
 1   NTEE_2digit    2587956 non-null  object
 2   NTEE_MAJ12     2598477 non-null  object
 3   NTEE_MAJ12_EV  2598477 non-null  object
 4   NTEE2          2117486 non-null  object
 5   BMF_NTEE_NCCS  2212421 non-null  object
 6   BMF_NTEE_IRS   2587956 non-null  object
 7   BMF_NTEEV2     2117520 non-null  object
dtypes: object(8)
memory usage: 158.6+ MB


#### Save DF

In [215]:
%%time
print ("Current date and time : ", datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"), '\n')
df.to_feather('D:/990_and_bmf_april_2025_all_controls_351875_orgs_2598477_filings_no_duplicates_fixed_state_ntee.feather')

Current date and time :  2025-05-24 21:09:32 

CPU times: total: 39.4 s
Wall time: 30.8 s


In [216]:
%%time
print ("Current date and time : ", datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"), '\n')
df.to_parquet("D:/990_and_bmf_april_2025_all_controls_351875_orgs_2598477_filings_no_duplicates_fixed_state_ntee.parquet", engine="pyarrow", compression="snappy", index=False)

Current date and time :  2025-05-24 21:10:02 

CPU times: total: 1min 17s
Wall time: 1min 23s


In [None]:
%%time
print ("Current date and time : ", datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"), '\n')
df.to_pickle('990_and_bmf_april_2025_all_controls_351875_orgs_2598477_filings_no_duplicates_fixed_state_ntee.pkl.gz', compression='gzip')

In [217]:
%%time
print ("Current date and time : ", datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"), '\n')
df.to_csv('990_and_bmf_april_2025_all_controls_351875_orgs_2598477_filings_no_duplicates_fixed_state_ntee.csv')

Current date and time :  2025-05-24 21:11:26 

CPU times: total: 10min 57s
Wall time: 11min 38s
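
Of the formats above, CSV is the only one that does not store column dtypes, and `to_csv` writes the index as an extra column by default. When the CSV is read back, ZIP codes with leading zeros can be parsed as integers unless string dtypes are requested. A minimal sketch using an in-memory buffer (the toy columns are illustrative):

```python
import io

import pandas as pd

toy = pd.DataFrame({"NTEE_2digit": ["05", "A2"], "ZIP": ["03911", "10001"]})

buf = io.StringIO()
toy.to_csv(buf, index=False)  # index=False avoids writing an unnamed index column

# Default parsing infers integers for ZIP and drops the leading zero.
buf.seek(0)
inferred = pd.read_csv(buf)

# Requesting strings keeps the codes exactly as written.
buf.seek(0)
as_text = pd.read_csv(buf, dtype=str)

print(inferred["ZIP"].tolist(), as_text["ZIP"].tolist())
```

For the real file, passing `dtype` only for the identifier-like columns (EIN, ZIP, NTEE codes) would keep numeric columns numeric while protecting the codes.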
