#### Update of this notebook:
- *IRS Form 990 e-File Data (4) -- Combine Columns -- DOWNLOADED XML FILINGS.ipynb*

# Overview

This is the fourth in a series of tutorials that illustrate how to download, parse, and analyze the IRS 990 e-file data available at https://aws.amazon.com/public-data-sets/irs-990/

In the previous notebook we extracted the JSON data from our MongoDB database into a pandas DataFrame. In this notebook I first read in the 'concordance' file and the e-file dataset that was generated in the previous notebook:
- *all filings August 2022 - all control variables.pkl.gz*

Regarding the concordance file, recall that the 990 e-file data contains myriad variables, each of which has to be verified before extracting and analyzing. Among other issues, most variables have more than one name in the XML files. Working with Jesse Lecy at Arizona State and others, a group of us has come up with a "concordance" file containing the *xpath* of all verified variables. Among other things, this concordance file maps the specific lines from the Form 990 to the xpaths in the XML file and contains a standardized, more descriptive variable name. Accordingly, we will read in the concordance file that has **_all_** reconciled and verified variables to date:
- *concordance_VERIFIED.xlsx*

I then use information contained in the concordance file to combine pairs of columns that reflect the same 990 variable, such as *TaxPeriodBeginDt* and *TaxPeriodBeginDate*, and assign the relevant 'standardized' name from the concordance file, such as *F9_00_HD_TAX_PER_BEGIN*. I then 'binarize' relevant columns and delete unneeded columns.
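The combining step can be sketched as follows. This is a minimal illustration, assuming the two source columns never hold conflicting values for the same filing; the toy DataFrame and the `combine_pair` helper are hypothetical and are not the notebook's actual code.

```python
import pandas as pd

# Toy data: each filing populates at most one of the two equivalent columns.
df_demo = pd.DataFrame({
    "TaxPeriodBeginDt": ["2019-01-01", None, None],
    "TaxPeriodBeginDate": [None, "2010-01-01", None],
})

def combine_pair(frame, old_cols, new_name):
    """Coalesce equivalent columns into one standardized column (hypothetical helper)."""
    combined = frame[old_cols[0]]
    for col in old_cols[1:]:
        # combine_first keeps existing values and fills gaps from the next column.
        combined = combined.combine_first(frame[col])
    out = frame.drop(columns=old_cols)
    out[new_name] = combined
    return out

result = combine_pair(
    df_demo,
    ["TaxPeriodBeginDt", "TaxPeriodBeginDate"],
    "F9_00_HD_TAX_PER_BEGIN",
)
print(result)
```

If both columns could be populated for the same row, the first column in `old_cols` would win, so it may be worth checking for such overlaps before combining.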

Saved final file (N=2,104,435): 
- *all filings August 2022 - all control variables (renamed).pkl.gz*


*Notes:*
- This notebook was updated to use information in the 'BINARIZE' column in the concordance file in order to identify relevant variables (those with a mix of 'True', 1, 0, 'False', 'X', etc.). 

- This notebook was also recently updated to reflect a new way of creating the *ReturnHeader* variables. Specifically, the concordance file now includes additional *ReturnHeader* variables, and the MongoDB name is now simply 'ReturnHeader' rather than, for example, 'ReturnHeader.TaxYear'. In the first notebook in this series I flatten the *ReturnHeader* column and only then do the combining and renaming, so these changes need to be carried through to subsequent notebooks in order to parse the following variables:

    - 'ReturnHeader.TaxYear': 1, 'ReturnHeader.TaxYr': 1,
    - 'ReturnHeader.TaxPeriodEndDate': 1, 'ReturnHeader.TaxPeriodEndDt': 1,  
    - 'ReturnHeader.TaxPeriodBeginDate': 1, 'ReturnHeader.TaxPeriodBeginDt': 1,      
    
- Similarly, I needed to make sure that *F9_00_HD_FILER_STATE_US* works, which is based on the 'Filer' dictionary column subsumed under 'ReturnHeader'. [UPDATE: It works]

- In future runs I might consider removing these variables from *binarize_cols*:
    - *F9_12_PC_ACCTG_METHOD_OTHER*
    - *F9_00_HD_EXEMPT_STATUS_501C*
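
The 'binarize' step mentioned in the notes above can be sketched roughly as below. The sets of accepted raw values and the `binarize_value` helper are assumptions for illustration; the actual values present in the filings should be checked before relying on any such mapping.

```python
import pandas as pd

# Assumed mapping; the real set of raw values should be verified against the data.
TRUE_VALUES = {"true", "1", "x"}
FALSE_VALUES = {"false", "0"}

def binarize_value(value):
    """Map one raw checkbox/boolean value to 1, 0, or missing (hypothetical helper)."""
    if pd.isna(value):
        return pd.NA
    text = str(value).strip().lower()
    if text in TRUE_VALUES:
        return 1
    if text in FALSE_VALUES:
        return 0
    return pd.NA  # unrecognized values are left missing rather than guessed

raw = pd.Series(["X", "true", "false", 1, 0, "True", None])
# The nullable Int64 dtype holds 1/0 alongside missing values.
binarized = raw.map(binarize_value).astype("Int64")
print(binarized.tolist())
```

This sketch leaves missing values as missing. Whether an empty checkbox should instead become 0 is a separate decision, which the concordance appears to track in its `fill_null` column.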

# Set up Working Space

In [1]:
import numpy as np
import pandas as pd
from pandas import DataFrame
from pandas import Series

In [2]:
print(pd.__version__)

2.2.2


In [3]:
# http://pandas.pydata.org/pandas-docs/stable/options.html
pd.set_option('display.max_columns', None)    # show every column when displaying a DataFrame
pd.set_option('display.max_colwidth', 250)    # allow up to 250 characters per cell

#### Set working directory

In [4]:
pwd

'C:\\Users\\Gregory\\Jupyter_Notebooks'

In [5]:
cd "C:\\Users\\Gregory\\IRS 990 Control Variables\\"

C:\Users\Gregory\IRS 990 Control Variables


# Read in Concordance File
We are going to read in a codebook called a 'concordance' file. We will use this file to identify variables to grab from the e-file data.

In [6]:
concordance = pd.read_excel('concordance_VERIFIED.xlsx')
print('# of columns:', len(concordance.columns))
print('# of observations:', len(concordance))
concordance[:2]

# of columns: 17
# of observations: 574


Unnamed: 0,xpath,variable_name_new,# of Characters (newly named),variable name notes,PARSING NOTES,OTHER NOTES,description,location_code,part,data_type_xsd,python_data_type,fill_null,BINARIZE,MongoDB_Name,sub_key,sub_sub_key,cardinality
0,/Return/ReturnData/IRS990/SpecialConditionDesc,F9_00_HD_SPECIAL_CONDITION_DESC,,,,,Special condition description,F990-PC-PART-00,PART-00,TextType,string,Do not fill null,,SpecialConditionDesc,,,
1,/Return/ReturnData/IRS990/SpecialConditionDescription,F9_00_HD_SPECIAL_CONDITION_DESC,31.0,,,,Special condition description,F990-PC-PART-00,PART-00,TextType,string,Do not fill null,,SpecialConditionDescription,,,
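
Because several rows of the concordance share one `variable_name_new`, a natural first step is to group the raw column names by their standardized name. The sketch below uses a two-row stand-in (`concordance_demo`, a hypothetical name) copied from the preview above rather than the full 574-row file.

```python
import pandas as pd

# Two rows copied from the concordance preview; the real file has 574 rows.
concordance_demo = pd.DataFrame({
    "variable_name_new": ["F9_00_HD_SPECIAL_CONDITION_DESC"] * 2,
    "MongoDB_Name": ["SpecialConditionDesc", "SpecialConditionDescription"],
})

# Map each standardized name to the list of raw column names it covers.
name_groups = (
    concordance_demo.groupby("variable_name_new")["MongoDB_Name"]
    .apply(list)
    .to_dict()
)
print(name_groups)
```

For the nested *ReturnHeader* variables, `MongoDB_Name` alone is not unique (several rows share 'Filer'), so a grouping like this would presumably also need the `sub_key` and `sub_sub_key` columns.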


In [44]:
concordance[concordance['data_type_xsd']=='CheckboxType']['MongoDB_Name'].tolist()

['AddressChange',
 'AddressChangeInd',
 'AmendedReturn',
 'AmendedReturnInd',
 'FinalReturnInd',
 'TerminatedReturn',
 'InitialReturn',
 'InitialReturnInd',
 'Organization4947a1',
 'Organization4947a1NotPFInd',
 'Organization501c',
 'Organization501cInd',
 'Organization501c3',
 'Organization501c3Ind',
 'TypeOfOrganizationAssociation',
 'TypeOfOrganizationAssocInd',
 'TypeOfOrganizationCorpInd',
 'TypeOfOrganizationCorporation',
 'TypeOfOrganizationOther',
 'TypeOfOrganizationOtherInd',
 'TypeOfOrganizationTrust',
 'TypeOfOrganizationTrustInd',
 'ContractTerminationInd',
 'TerminationOrContraction',
 'InfoInScheduleOPartIII',
 'InfoInScheduleOPartIIIInd',
 'OwnWebsite',
 'OwnWebsiteInd',
 'OwnWebsite',
 'OwnWebsiteInd',
 'OtherWebsite',
 'OtherWebsiteInd',
 'UponRequest',
 'UponRequestInd',
 'NoListedPersonsCompensated',
 'NoListedPersonsCompensatedInd',
 'InfoInScheduleOPartIX',
 'InfoInScheduleOPartIXInd',
 'InfoInScheduleOPartX',
 'InfoInScheduleOPartXInd',
 'FollowSFAS117',
 'Organi

In [42]:
concordance[concordance['data_type_xsd']=='CheckboxType'].sample(5)

Unnamed: 0,xpath,variable_name_new,# of Characters (newly named),variable name notes,PARSING NOTES,OTHER NOTES,description,location_code,part,data_type_xsd,python_data_type,fill_null,BINARIZE,MongoDB_Name,sub_key,sub_sub_key,cardinality
109,/Return/ReturnData/IRS990/TerminationOrContraction,F9_01_PC_TERMINATION_CONTRACTION,32.0,old variable_name_new was F9_01_PC_TERMINATE_CONTRACTION,,,Termination or contraction,F990-PC-PART-01-LINE-2,PART-01,CheckboxType,Int64,,binarize,TerminationOrContraction,,,
254,/Return/ReturnData/IRS990/OwnWebsite,F9_06_PC_OWN_WEBSITE,20.0,variable_name_new value added,,,Form available on own website,F990-PC-PART-06-SECTION-C-LINE-18-1,PART-06,CheckboxType,Int64,,binarize,OwnWebsite,,,
258,/Return/ReturnData/IRS990/OtherWebsite,F9_06_PC_OTHER_WEBSITE,22.0,variable_name_new value added,,,Form available on another's website,F990-PC-PART-06-SECTION-C-LINE-18-2,PART-06,CheckboxType,Int64,,binarize,OtherWebsite,,,
33,/Return/ReturnData/IRS990/TypeOfOrganizationOtherInd,F9_00_HD_TYPE_ORG_OTHER,,,,,Form of organization: Other,F990-PC-PART-00-SECTION-K,PART-00,CheckboxType,Int64,,binarize,TypeOfOrganizationOtherInd,,,
257,/Return/ReturnData/IRS990/OwnWebsiteInd,F9_06_PC_FORM_AVAIL_OWN_WEBSITE,31.0,variable_name_new value added,,,Form available on own website,F990-PC-PART-06-SECTION-C-LINE-18-1,PART-06,CheckboxType,Int64,,binarize,OwnWebsiteInd,,,


In [43]:
concordance['python_data_type'].value_counts()

python_data_type
Int64       524
string       47
DateTime      3
Name: count, dtype: int64

In [7]:
[c for c in concordance['xpath'] if 'USAddress' in c]

['/Return/ReturnHeader/Filer/USAddress/State',
 '/Return/ReturnHeader/Filer/USAddress/StateAbbreviationCd',
 '/Return/ReturnHeader/Filer/USAddress/AddressLine1',
 '/Return/ReturnHeader/Filer/USAddress/AddressLine1Txt',
 '/Return/ReturnHeader/Filer/USAddress/AddressLine2',
 '/Return/ReturnHeader/Filer/USAddress/AddressLine2Txt',
 '/Return/ReturnHeader/Filer/USAddress/City',
 '/Return/ReturnHeader/Filer/USAddress/CityNm',
 '/Return/ReturnHeader/Filer/USAddress/ZIPCd',
 '/Return/ReturnHeader/Filer/USAddress/ZIPCode']

In [8]:
concordance[concordance['xpath'].isin(['/Return/ReturnHeader/Filer/USAddress/State',
 '/Return/ReturnHeader/Filer/USAddress/StateAbbreviationCd',
 '/Return/ReturnHeader/Filer/USAddress/AddressLine1',
 '/Return/ReturnHeader/Filer/USAddress/AddressLine1Txt',
 '/Return/ReturnHeader/Filer/USAddress/AddressLine2',
 '/Return/ReturnHeader/Filer/USAddress/AddressLine2Txt',
 '/Return/ReturnHeader/Filer/USAddress/City',
 '/Return/ReturnHeader/Filer/USAddress/CityNm',
 '/Return/ReturnHeader/Filer/USAddress/ZIPCd',
 '/Return/ReturnHeader/Filer/USAddress/ZIPCode'])]

Unnamed: 0,xpath,variable_name_new,# of Characters (newly named),variable name notes,PARSING NOTES,OTHER NOTES,description,location_code,part,data_type_xsd,python_data_type,fill_null,BINARIZE,MongoDB_Name,sub_key,sub_sub_key,cardinality
549,/Return/ReturnHeader/Filer/USAddress/State,F9_00_HD_FILER_STATE_US,,,Will be nested under ReturnHeader,,Address of Filing Organization (US State),HEADER,PART-00,StateType,string,Do not fill null,,Filer,USAddress,State,
550,/Return/ReturnHeader/Filer/USAddress/StateAbbreviationCd,F9_00_HD_FILER_STATE_US,,,Will be nested under ReturnHeader,,Address of Filing Organization (US State),HEADER,PART-00,StateType,string,Do not fill null,,Filer,USAddress,StateAbbreviationCd,
551,/Return/ReturnHeader/Filer/USAddress/AddressLine1,F9_00_HD_FILER_ADDR_US_L1,,,Will be nested under ReturnHeader,,Address of Filing Organization (US Line 1),HEADER-OR-SIGNATURE-BLOCK,PART-00,StreetAddressType,string,Do not fill null,,Filer,USAddress,AddressLine1,
552,/Return/ReturnHeader/Filer/USAddress/AddressLine1Txt,F9_00_HD_FILER_ADDR_US_L1,,,Will be nested under ReturnHeader,,Address of Filing Organization (US Line 1),HEADER-OR-SIGNATURE-BLOCK,PART-00,StreetAddressType,string,Do not fill null,,Filer,USAddress,AddressLine1Txt,
553,/Return/ReturnHeader/Filer/USAddress/AddressLine2,F9_00_HD_FILER_ADDR_US_L2,,,Will be nested under ReturnHeader,,Address of Filing Organization (US Line 2),HEADER-OR-SIGNATURE-BLOCK,PART-00,StreetAddressType,string,Do not fill null,,Filer,USAddress,AddressLine2,
554,/Return/ReturnHeader/Filer/USAddress/AddressLine2Txt,F9_00_HD_FILER_ADDR_US_L2,,,Will be nested under ReturnHeader,,Address of Filing Organization (US Line 2),HEADER-OR-SIGNATURE-BLOCK,PART-00,StreetAddressType,string,Do not fill null,,Filer,USAddress,AddressLine2Txt,
555,/Return/ReturnHeader/Filer/USAddress/City,F9_00_HD_FILER_CITY_US,,,Will be nested under ReturnHeader,,Address of Filing Organization (US City),HEADER-OR-SIGNATURE-BLOCK,PART-00,CityType,string,Do not fill null,,Filer,USAddress,City,
556,/Return/ReturnHeader/Filer/USAddress/CityNm,F9_00_HD_FILER_CITY_US,,,Will be nested under ReturnHeader,,Address of Filing Organization (US City),HEADER-OR-SIGNATURE-BLOCK,PART-00,CityType,string,Do not fill null,,Filer,USAddress,CityNm,
559,/Return/ReturnHeader/Filer/USAddress/ZIPCd,F9_00_HD_FILER_ZIP_US,,,Will be nested under ReturnHeader,,Address of Filing Organization (US Zip Code),HEADER-OR-SIGNATURE-BLOCK,PART-00,ZIPCodeType,string,Do not fill null,,Filer,USAddress,ZIPCd,
560,/Return/ReturnHeader/Filer/USAddress/ZIPCode,F9_00_HD_FILER_ZIP_US,,,Will be nested under ReturnHeader,,Address of Filing Organization (US Zip Code),HEADER-OR-SIGNATURE-BLOCK,PART-00,ZIPCodeType,string,Do not fill null,,Filer,USAddress,ZIPCode,
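
The `MongoDB_Name`, `sub_key`, and `sub_sub_key` columns above describe a path into a nested dictionary. A minimal sketch of following such a path is shown here; the `get_nested` helper and the toy `filer` record are illustrative assumptions, with address values borrowed from a sample filing.

```python
def get_nested(record, keys):
    """Walk a nested dict along `keys`; return None if any level is missing."""
    current = record
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current

# Toy record shaped like the Filer entry under ReturnHeader.
filer = {"Filer": {"USAddress": {"City": "BETHLEHEM", "State": "PA", "ZIPCode": "18017"}}}

# The concordance lists two leaf names for the same field, so try both in order.
candidate_paths = [
    ["Filer", "USAddress", "State"],
    ["Filer", "USAddress", "StateAbbreviationCd"],
]
state = next(
    (v for v in (get_nested(filer, p) for p in candidate_paths) if v is not None),
    None,
)
print(state)  # PA
```

Returning `None` for a missing level, rather than raising, keeps filers with only a foreign address from breaking the loop.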


In [9]:
concordance[concordance['xpath'].isin(['/Return/ReturnHeader/Filer/USAddress/State',
 '/Return/ReturnHeader/Filer/USAddress/StateAbbreviationCd',
 '/Return/ReturnHeader/Filer/USAddress/AddressLine1',
 '/Return/ReturnHeader/Filer/USAddress/AddressLine1Txt',
 '/Return/ReturnHeader/Filer/USAddress/AddressLine2',
 '/Return/ReturnHeader/Filer/USAddress/AddressLine2Txt',
 '/Return/ReturnHeader/Filer/USAddress/City',
 '/Return/ReturnHeader/Filer/USAddress/CityNm',
 '/Return/ReturnHeader/Filer/USAddress/ZIPCd',
 '/Return/ReturnHeader/Filer/USAddress/ZIPCode'])]['variable_name_new'].to_list()

['F9_00_HD_FILER_STATE_US',
 'F9_00_HD_FILER_STATE_US',
 'F9_00_HD_FILER_ADDR_US_L1',
 'F9_00_HD_FILER_ADDR_US_L1',
 'F9_00_HD_FILER_ADDR_US_L2',
 'F9_00_HD_FILER_ADDR_US_L2',
 'F9_00_HD_FILER_CITY_US',
 'F9_00_HD_FILER_CITY_US',
 'F9_00_HD_FILER_ZIP_US',
 'F9_00_HD_FILER_ZIP_US']

In [10]:
concordance[20:30]

Unnamed: 0,xpath,variable_name_new,# of Characters (newly named),variable name notes,PARSING NOTES,OTHER NOTES,description,location_code,part,data_type_xsd,python_data_type,fill_null,BINARIZE,MongoDB_Name,sub_key,sub_sub_key,cardinality
20,/Return/ReturnData/IRS990/Organization4947a1,F9_00_HD_EXEMPT_STATUS_4847A1,,,,,Indicates a 4947(a)(1) organization,F990-PC-PART-00-SECTION-I,PART-00,CheckboxType,Int64,,binarize,Organization4947a1,,,
21,/Return/ReturnData/IRS990/Organization4947a1NotPFInd,F9_00_HD_EXEMPT_STATUS_4847A1,,,,,Indicates a 4947(a)(1) organization,F990-PC-PART-00-SECTION-I,PART-00,CheckboxType,Int64,,binarize,Organization4947a1NotPFInd,,,
22,/Return/ReturnData/IRS990/Organization501c,F9_00_HD_EXEMPT_STATUS_501C,,,need to parse dictionary for some values. Dealt with in Control Variables (A1),,Indicates a 501(c) organization,F990-PC-PART-00-SECTION-I,PART-00,CheckboxType,Int64,Do not fill null,binarize_with_dict,Organization501c,,,
23,/Return/ReturnData/IRS990/Organization501cInd,F9_00_HD_EXEMPT_STATUS_501C,,,need to parse dictionary for some values. Dealt with in Control Variables (A1),,Indicates a 501(c) organization,F990-PC-PART-00-SECTION-I,PART-00,CheckboxType,Int64,Do not fill null,binarize_with_dict,Organization501cInd,,,
24,/Return/ReturnData/IRS990/Organization501c3,F9_00_HD_EXEMPT_STATUS_501C3,,,need to parse dictionary for some values. Dealt with in Control Variables (A1),,Indicates a 501(c)(3) organization,F990-PC-PART-00-SECTION-I,PART-00,CheckboxType,Int64,,binarize,Organization501c3,,,
25,/Return/ReturnData/IRS990/Organization501c3Ind,F9_00_HD_EXEMPT_STATUS_501C3,,,need to parse dictionary for some values. Dealt with in Control Variables (A1),,Indicates a 501(c)(3) organization,F990-PC-PART-00-SECTION-I,PART-00,CheckboxType,Int64,,binarize,Organization501c3Ind,,,
26,/Return/ReturnData/IRS990/WebSite,F9_00_HD_WEBSITE,,,,,Website,F990-PC-PART-00-SECTION-J,PART-00,LineExplanationType,string,Do not fill null,,WebSite,,,
27,/Return/ReturnData/IRS990/WebsiteAddressTxt,F9_00_HD_WEBSITE,,,,,Website,F990-PC-PART-00-SECTION-J,PART-00,LineExplanationType,string,Do not fill null,,WebsiteAddressTxt,,,
28,/Return/ReturnData/IRS990/TypeOfOrganizationAssociation,F9_00_HD_TYPE_ORG_ASSOCIATION,,,,,Form of organization: Association,F990-PC-PART-00-SECTION-K,PART-00,CheckboxType,Int64,,binarize,TypeOfOrganizationAssociation,,,
29,/Return/ReturnData/IRS990/TypeOfOrganizationAssocInd,F9_00_HD_TYPE_ORG_ASSOCIATION,,,,,Form of organization: Association,F990-PC-PART-00-SECTION-K,PART-00,CheckboxType,Int64,,binarize,TypeOfOrganizationAssocInd,,,
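
Rows flagged `binarize_with_dict` note that some values must be parsed out of a dictionary. One plausible way to unwrap such cells is sketched below, assuming they arrive either as plain strings, as JSON-encoded strings, or as dictionaries carrying a `#text` key (the shape seen in some raw cells of the filings data). The `unwrap_text` helper is hypothetical.

```python
import json

def unwrap_text(cell):
    """Return the '#text' payload of a dict-like cell, or the cell itself (hypothetical helper)."""
    if isinstance(cell, str):
        try:
            cell = json.loads(cell)  # cells may be stored as JSON strings
        except json.JSONDecodeError:
            return cell  # plain strings such as 'X' are returned unchanged
    if isinstance(cell, dict):
        return cell.get("#text")
    return cell

examples = [
    '{"@referenceDocumentId": " IRS990ScheduleR", "#text": "true"}',
    '"false"',
    "X",
]
print([unwrap_text(c) for c in examples])
```

The unwrapped values ('true', 'false', 'X') can then go through the same binarizing logic as any other checkbox or Boolean column.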


<br>Show frequencies of the intended data type for each variable.

In [11]:
concordance['data_type_xsd'].value_counts()

data_type_xsd
USAmountType            294
BooleanType              98
CheckboxType             62
USAmountNNType           40
IntegerNNType            14
CountType                12
ExplanationType           8
DateType                  6
StateType                 6
ShortExplanationType      4
CountryType               4
StreetAddressType         4
YearType                  4
LineExplanationType       4
TimestampType             3
CityType                  2
ZIPCodeType               2
TextType                  2
StringType                2
PersonNameType            2
YearMonthType             1
Name: count, dtype: int64

<br>Show an example of a Boolean variable, followed by an example of a Checkbox variable.

In [12]:
concordance[concordance['data_type_xsd']=='BooleanType'][:1]

Unnamed: 0,xpath,variable_name_new,# of Characters (newly named),variable name notes,PARSING NOTES,OTHER NOTES,description,location_code,part,data_type_xsd,python_data_type,fill_null,BINARIZE,MongoDB_Name,sub_key,sub_sub_key,cardinality
14,/Return/ReturnData/IRS990/GroupReturnForAffiliates,F9_00_HD_GROUP_RETURN,,,,,Indicates this form is a group return for subordinates,F990-PC-PART-00-SECTION-HA,PART-00,BooleanType,Int64,,binarize,GroupReturnForAffiliates,,,


In [13]:
concordance[concordance['data_type_xsd']=='CheckboxType'][:1]

Unnamed: 0,xpath,variable_name_new,# of Characters (newly named),variable name notes,PARSING NOTES,OTHER NOTES,description,location_code,part,data_type_xsd,python_data_type,fill_null,BINARIZE,MongoDB_Name,sub_key,sub_sub_key,cardinality
2,/Return/ReturnData/IRS990/AddressChange,F9_00_HD_ADDR_CHANGE,20.0,,,,Indicates this form has an address change,F990-PC-PART-00-SECTION-B,PART-00,CheckboxType,Int64,,binarize,AddressChange,,,
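
Both the Boolean and the Checkbox example carry a value in the `BINARIZE` column, which is what identifies them for binarizing. A minimal sketch of collecting those variables follows; the four-row `concordance_demo` frame is an abridged, hypothetical stand-in for the real concordance.

```python
import pandas as pd

# Rows abridged from the concordance output shown in this notebook.
concordance_demo = pd.DataFrame({
    "variable_name_new": ["F9_00_HD_GROUP_RETURN", "F9_00_HD_ADDR_CHANGE",
                          "F9_00_HD_EXEMPT_STATUS_501C", "F9_00_HD_WEBSITE"],
    "data_type_xsd": ["BooleanType", "CheckboxType", "CheckboxType", "LineExplanationType"],
    "BINARIZE": ["binarize", "binarize", "binarize_with_dict", None],
})

# Keep every standardized name whose BINARIZE cell is filled in.
flagged = concordance_demo[concordance_demo["BINARIZE"].notna()]
binarize_cols = flagged["variable_name_new"].unique().tolist()
print(binarize_cols)
```

Filtering on `BINARIZE == 'binarize'` instead would exclude the `binarize_with_dict` variables, which may be preferable if those are handled separately.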


# Read 990 DB into PANDAS 
- In a previous round in 2019 there were 1,547,828 observations; in Feb. 2020 there were 1,727,056 observations; in Nov. 2020 there were 1,895,016 observations; in May 2021 there were 2,016,624 observations; and in November 2021 there were 2,192,435 observations.

- Now I am only going to parse the NEW filings.

# Test Using `Feather`

In [77]:
import modin.pandas as mpd  # Modin's drop-in pandas API lives in modin.pandas
import pandas as pd

In [79]:
%%time
import datetime
print("Current date and time : ", datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"), '\n')
df = pd.read_feather("D:/all_filings.feather")
print('# of columns:', len(df.columns))
print('# of observations:', len(df))
df[:2]

Current date and time :  2025-04-16 21:11:14 

# of columns: 496
# of observations: 3469008
CPU times: total: 2min 40s
Wall time: 3min 52s


Unnamed: 0,_id,OrganizationName,URL,DLN,TaxPeriod,AddressChange,NameOfPrincipalOfficerPerson,GrossReceipts,GroupReturnForAffiliates,Organization501c3,WebSite,TypeOfOrganizationCorporation,YearFormation,StateLegalDomicile,ActivityOrMissionDescription,NbrVotingMembersGoverningBody,NbrIndependentVotingMembers,TotalNbrEmployees,TotalNbrVolunteers,TotalGrossUBI,NetUnrelatedBusinessTxblIncome,ContributionsGrantsPriorYear,ContributionsGrantsCurrentYear,ProgramServiceRevenuePriorYear,ProgramServiceRevenueCY,InvestmentIncomePriorYear,InvestmentIncomeCurrentYear,OtherRevenuePriorYear,OtherRevenueCurrentYear,TotalRevenuePriorYear,TotalRevenueCurrentYear,GrantsAndSimilarAmntsPriorYear,GrantsAndSimilarAmntsCY,BenefitsPaidToMembersPriorYear,BenefitsPaidToMembersCY,SalariesEtcPriorYear,SalariesEtcCurrentYear,TotalProfFundrsngExpPriorYear,TotalProfFundrsngExpCY,TotalFundrsngExpCurrentYear,OtherExpensePriorYear,OtherExpensesCurrentYear,TotalExpensesPriorYear,TotalExpensesCurrentYear,RevenuesLessExpensesPriorYear,RevenuesLessExpensesCY,TotalAssetsBOY,TotalAssetsEOY,TotalLiabilitiesBOY,TotalLiabilitiesEOY,NetAssetsOrFundBalancesBOY,NetAssetsOrFundBalancesEOY,InfoInScheduleOPartIII,MissionDescription,SignificantNewProgramServices,SignificantChange,Expense,Grants,Description,TotalProgramServiceExpense,PoliticalActivities,LobbyingActivities,ProfessionalFundraising,FundraisingActivities,Gaming,ExcessBenefitTransaction,PriorExcessBenefitTransaction,DisregardedEntity,RelatedEntity,RelatedOrgControlledEntity,TransactionRelatedEntity,TransfersToExemptNonChrtblOrg,ActivitiesConductedPartnership,NumberFormsTransmittedWith1096,NumberOfEmployees,UnrelatedBusinessIncome,InfoInScheduleOPartVI,NbrVotingGoverningBodyMembers,NumberIndependentVotingMembers,FamilyOrBusinessRelationship,DelegationOfManagementDuties,ChangesToOrganizingDocs,MaterialDiversionOrMisuse,MembersOrStockholders,ElectionOfBoardMembers,DecisionsSubjectToApproval,MinutesOfGoverningBody,MinutesOfCommittees,OfficerMailingAddress,Local
Chapters,Form990ProvidedToGoverningBody,ConflictOfInterestPolicy,AnnualDisclosureCoveredPersons,RegularMonitoringEnforcement,WhistleblowerPolicy,DocumentRetentionPolicy,CompensationProcessCEO,CompensationProcessOther,InvestmentInJointVenture,StatesWhereCopyOfReturnIsFiled,UponRequest,NoListedPersonsCompensated,TotalReportableCompFromOrg,TotalReportableCompFrmRltdOrgs,TotalOtherCompensation,NumberIndividualsGT100K,FormersListed,TotalCompGT150K,CompensationFromOtherSources,NumberOfContractorsGT100K,AllOtherContributions,TotalContributions,TotalOtherRevenue,TotalRevenue,GrantsToDomesticOrgs,GrantsToDomesticIndividuals,FeesForServicesLegal,FeesForServicesAccounting,OfficeExpenses,PaymentsToAffiliates,DepreciationDepletion,OtherExpenses,AllOtherExpenses,TotalFunctionalExpenses,SavingsAndTempCashInvestments,AccountsReceivable,LandBuildingsEquipmentBasis,LandBldgEquipmentAccumDeprec,LandBuildingsEquipmentBasisNet,InvestmentsOtherSecurities,TotalAssets,AccountsPayableAccruedExpenses,GrantsPayable,OtherLiabilities,FollowSFAS117,UnrestrictedNetAssets,InfoInScheduleOPartXI,ReconcilationRevenueExpenses,InfoInScheduleOPartXII,MethodOfAccountingAccrual,AccountantCompileOrReview,FSAudited,AuditCommittee,FederalGrantAuditRequired,AllAffiliatesIncluded,GroupExemptionNumber,Revenue,PoliciesReferenceChapters,WrittenPolicyOrProcedure,TotalProgramServiceRevenue,ForeignGrants,BenefitsToMembers,CompCurrentOfficersDirectors,CompDisqualPersons,OtherSalariesAndWages,PensionPlanContributions,OtherEmployeeBenefits,PayrollTaxes,FeesForServicesManagement,FeesForServicesLobbying,FeesForServicesProfFundraising,FeesForServicesInvstMgmntFees,FeesForServicesOther,Advertising,InformationTechnology,Royalties,Occupancy,Travel,TravelEntrtnmntPublicOfficials,ConferencesMeetings,Interest,Insurance,CashNonInterestBearing,PledgesAndGrantsReceivable,ReceivablesFromDisqualPersons,OtherNotesLoansReceivableNet,InventoriesForSaleOrUse,PrepaidExpensesDeferredCharges,InvestmentsPubTradedSecurities,InvestmentsProgra
mRelated,IntangibleAssets,OtherAssetsTotal,DeferredRevenue,MortNotesPyblSecuredInvestProp,FederalGrantAuditPerformed,LoansFromOfficersDirectors,MethodOfAccountingCash,Activity2,Activity3,InfoInScheduleOPartVII,TaxExemptBondLiabilities,TemporarilyRestrictedNetAssets,OtherWebsite,PermanentlyRestrictedNetAssets,FundraisingEvents,CntrbtnsRprtdFundraisingEvents,RelatedOrganizations,GrossIncomeFundraisingEvents,FundraisingDirectExpenses,FederatedCampaigns,GovernmentGrants,MethodOfAccountingOther,GrossSalesOfInventory,CostOfGoodsSold,DoNotFollowSFAS117,RetainedEarningsEndowmentEtc,InitialReturn,MembershipDues,GrossIncomeGaming,GamingDirectExpenses,NoncashContributions,InfoInScheduleOPartV,OwnWebsite,UnsecuredNotesLoansPayable,ActivityOther,TotalOfOtherProgramServiceExp,TotalOfOtherProgramServiceRev,EscrowAccountLiability,TotalOfOtherProgramServiceGrnt,TypeOfOrganizationOther,Organization501c,TypeOfOrganizationTrust,TypeOfOrganizationAssociation,CountryLegalDomicile,AmendedReturn,TypeOfOrgOtherDescription,TotalJointCosts,TerminatedReturn,TerminationOrContraction,ActivityCode,SpecialConditionDescription,Organization4947a1,InfoInScheduleOPartIX,ReconciliationUnrealizedInvest,ReconcilationPriorAdjustment,ReconcilationDonatedServices,ReconcilationInvestExpenses,InfoInScheduleOPartVIII,InfoInScheduleOPartX,PrincipalOfficerNm,GrossReceiptsAmt,GroupReturnForAffiliatesInd,Organization501c3Ind,TypeOfOrganizationCorpInd,FormationYr,LegalDomicileStateCd,ActivityOrMissionDesc,VotingMembersGoverningBodyCnt,VotingMembersIndependentCnt,TotalEmployeeCnt,TotalGrossUBIAmt,CYContributionsGrantsAmt,CYProgramServiceRevenueAmt,CYInvestmentIncomeAmt,CYOtherRevenueAmt,CYTotalRevenueAmt,CYGrantsAndSimilarPaidAmt,CYBenefitsPaidToMembersAmt,CYSalariesCompEmpBnftPaidAmt,CYTotalProfFndrsngExpnsAmt,CYTotalFundraisingExpenseAmt,CYOtherExpensesAmt,CYTotalExpensesAmt,CYRevenuesLessExpensesAmt,TotalAssetsBOYAmt,TotalAssetsEOYAmt,TotalLiabilitiesEOYAmt,NetAssetsOrFundBalancesBOYAmt,NetAssetsOrFundBalancesEOY
Amt,InfoInScheduleOPartIIIInd,MissionDesc,SignificantNewProgramSrvcInd,SignificantChangeInd,Desc,PoliticalCampaignActyInd,LobbyingActivitiesInd,ProfessionalFundraisingInd,FundraisingActivitiesInd,GamingActivitiesInd,EngagedInExcessBenefitTransInd,PYExcessBenefitTransInd,DisregardedEntityInd,RelatedEntityInd,RelatedOrganizationCtrlEntInd,TransactionWithControlEntInd,TrnsfrExmptNonChrtblRltdOrgInd,ActivitiesConductedPrtshpInd,IRPDocumentCnt,EmployeeCnt,UnrelatedBusIncmOverLimitInd,GoverningBodyVotingMembersCnt,IndependentVotingMemberCnt,FamilyOrBusinessRlnInd,DelegationOfMgmtDutiesInd,ChangeToOrgDocumentsInd,MaterialDiversionOrMisuseInd,MembersOrStockholdersInd,ElectionOfBoardMembersInd,DecisionsSubjectToApprovaInd,MinutesOfGoverningBodyInd,MinutesOfCommitteesInd,OfficerMailingAddressInd,LocalChaptersInd,Form990ProvidedToGvrnBodyInd,ConflictOfInterestPolicyInd,WhistleblowerPolicyInd,DocumentRetentionPolicyInd,CompensationProcessCEOInd,CompensationProcessOtherInd,InvestmentInJointVentureInd,StatesWhereCopyOfReturnIsFldCd,NoListedPersonsCompensatedInd,FormerOfcrEmployeesListedInd,TotalCompGreaterThan150KInd,CompensationFromOtherSrcsInd,MembershipDuesAmt,FundraisingAmt,AllOtherContributionsAmt,TotalContributionsAmt,OtherRevenueTotalAmt,TotalRevenueGrp,FeesForServicesAccountingGrp,OfficeExpensesGrp,InformationTechnologyGrp,ConferencesMeetingsGrp,InsuranceGrp,OtherExpensesGrp,AllOtherExpensesGrp,TotalFunctionalExpensesGrp,CashNonInterestBearingGrp,TotalAssetsGrp,OrgDoesNotFollowSFAS117Ind,RtnEarnEndowmentIncmOthFndsGrp,ReconcilationRevenueExpnssAmt,MethodOfAccountingCashInd,AccountantCompileOrReviewInd,FSAuditedInd,FederalGrantAuditRequiredInd,WebsiteAddressTxt,TotalVolunteersCnt,NetUnrelatedBusTxblIncmAmt,PYContributionsGrantsAmt,PYProgramServiceRevenueAmt,PYInvestmentIncomeAmt,PYOtherRevenueAmt,PYTotalRevenueAmt,PYGrantsAndSimilarPaidAmt,PYBenefitsPaidToMembersAmt,PYSalariesCompEmpBnftPaidAmt,PYTotalProfFndrsngExpnsAmt,PYOtherExpensesAmt,PYTotalExpensesAmt,PYRevenuesLess
ExpensesAmt,TotalLiabilitiesBOYAmt,ExpenseAmt,GrantAmt,RevenueAmt,ProgSrvcAccomActy2Grp,ProgSrvcAccomActy3Grp,ProgSrvcAccomActyOtherGrp,TotalOtherProgSrvcGrantAmt,TotalProgramServiceExpensesAmt,InfoInScheduleOPartVIInd,AnnualDisclosureCoveredPrsnInd,RegularMonitoringEnfrcInd,UponRequestInd,TotalReportableCompFromOrgAmt,TotReportableCompRltdOrgAmt,TotalOtherCompensationAmt,IndivRcvdGreaterThan100KCnt,CntrctRcvdGreaterThan100KCnt,GovernmentGrantsAmt,TotalProgramServiceRevenueAmt,FundraisingGrossIncomeAmt,ContriRptFundraisingEventAmt,FundraisingDirectExpensesAmt,GrossSalesOfInventoryAmt,CostOfGoodsSoldAmt,GrantsToDomesticIndividualsGrp,CompCurrentOfcrDirectorsGrp,OtherSalariesAndWagesGrp,PensionPlanContributionsGrp,OtherEmployeeBenefitsGrp,PayrollTaxesGrp,FeesForServicesOtherGrp,AdvertisingGrp,TravelGrp,InterestGrp,DepreciationDepletionGrp,SavingsAndTempCashInvstGrp,AccountsReceivableGrp,InventoriesForSaleOrUseGrp,PrepaidExpensesDefrdChargesGrp,LandBldgEquipCostOrOtherBssAmt,LandBldgEquipAccumDeprecAmt,LandBldgEquipBasisNetGrp,InvestmentsOtherSecuritiesGrp,IntangibleAssetsGrp,AccountsPayableAccrExpnssGrp,DeferredRevenueGrp,MortgNotesPyblScrdInvstPropGrp,OtherLiabilitiesGrp,OrganizationFollowsSFAS117Ind,UnrestrictedNetAssetsGrp,TemporarilyRstrNetAssetsGrp,InfoInScheduleOPartXIInd,NetUnrlzdGainsLossesInvstAmt,InfoInScheduleOPartXIIInd,AuditCommitteeInd,AllAffiliatesIncludedInd,GrantsToDomesticOrgsGrp,ForeignGrantsGrp,BenefitsToMembersGrp,CompDisqualPersonsGrp,FeesForServicesManagementGrp,FeesForServicesLegalGrp,FeesForServicesLobbyingGrp,FeesForSrvcInvstMgmntFeesGrp,RoyaltiesGrp,OccupancyGrp,PymtTravelEntrtnmntPubOfclGrp,PaymentsToAffiliatesGrp,PledgesAndGrantsReceivableGrp,RcvblFromDisqualifiedPrsnGrp,OthNotesLoansReceivableNetGrp,InvestmentsPubTradedSecGrp,InvestmentsProgramRelatedGrp,OtherAssetsTotalGrp,TotalOtherProgSrvcExpenseAmt,InfoInScheduleOPartVInd,MethodOfAccountingAccrualInd,NoncashContributionsAmt,GrantsPayableGrp,PermanentlyRstrNetAssetsGrp,TaxExemptBondLia
bilitiesGrp,EscrowAccountLiabilityGrp,LoansFromOfficersDirectorsGrp,UnsecuredNotesLoansPayableGrp,PriorPeriodAdjustmentsAmt,FederalGrantAuditPerformedInd,PoliciesReferenceChaptersInd,OtherWebsiteInd,AddressChangeInd,WrittenPolicyOrProcedureInd,RelatedOrganizationsAmt,TotalOtherProgSrvcRevenueAmt,OwnWebsiteInd,TotalJointCostsGrp,DonatedServicesAndUseFcltsAmt,LegalDomicileCountryCd,InfoInScheduleOPartIXInd,TypeOfOrganizationTrustInd,FinalReturnInd,ContractTerminationInd,InfoInScheduleOPartXInd,GroupExemptionNum,InfoInScheduleOPartVIIInd,FederatedCampaignsAmt,TypeOfOrganizationOtherInd,OtherOrganizationDsc,InfoInScheduleOPartVIIIInd,TypeOfOrganizationAssocInd,InitialReturnInd,GamingGrossIncomeAmt,GamingDirectExpensesAmt,MethodOfAccountingOtherInd,InvestmentExpenseAmt,Organization501cInd,Organization4947a1NotPFInd,AmendedReturnInd,SpecialConditionDesc,ActivityCd,Timestamp,TaxPeriodEndDate,TaxPeriodBeginDate,Officer,TaxYear,BuildTS,ReturnTs,TaxPeriodEndDt,TaxPeriodBeginDt,BusinessOfficerGrp,TaxYr,fiscal_year,EIN,Name,NameControl,Phone,USAddress,ForeignAddress,InCareOfName,BusinessName,BusinessNameControlTxt,PhoneNum,InCareOfNm,ForeignPhoneNum
0,5d019e6778ffca27b42818d7,RONALD MCDONALD HOUSE CHARITIES- PHILADELPHIA REGION INC,https://s3.amazonaws.com/irs-form-990/201113139349301301_public.xml,93493313013011,201012,X,MICHAEL ANTON,1473903,0,X,,X,1992,PA,MAKES GRANTS TO NON-PROFITS THAT DIRECTLY IMPROVE THE HEALTH AND WELL-BEING OF CHILDREN.,10,10,0,0.0,0,0.0,1044925.0,1439340,0,0,30447,33563,0.0,1000,1075372,1473903,638637.0,925000,0.0,0,0,0,0.0,0,195892,243131,459751,881768,1384751,193604,89152,1925215,2440859,171810,450430,1753405,1990429,X,"THE CORPORATION IS ORGANIZED AND WILL BE OPERATED EXCLUSIVELY FOR CHARITABLE, EDUCATIONAL AND SCIENTIFIC PURPOSES WITHIN THE MEANING OF SECTION 501(C)(3) OF THE INTERNAL REVENUE CODE. SUCH PURPOSES SHALL BE LIMITED TO PROVIDING SUPPORT AND FUNDIN...",0,0,1043744,925000.0,"RMHC OF THE PHILADELPHIA REGION, INC. GRANTS HUNDREDS OF THOUSANDS OF DOLLARS PER YEAR TO SUPPORT NON-PROFIT PROGRAMS THAT DIRECTLY IMPROVE THE HEALTH AND WELL-BEING OF CHILDREN. LOCALLY, RMHC SUPPORTS THE PHILADELPHIA, SOUTHERN NEW JERSEY AND DE...",1043744,"""0""","""0""","""0""","""0""","""0""","""0""","""0""","""0""","""0""","""0""","""0""","""0""","""0""",0,0,0,X,10,10,0,0,0,0,0,0,0,1,1,0,0,1,1,1,1,0,0,0,0,0,"[""PA"", ""NJ"", ""DE""]",X,X,0.0,0,0,0,0,0,0,0,1439340.0,1439340,1000,"{""TotalRevenueColumn"": ""1473903"", ""RelatedOrExemptFunctionIncome"": ""1000"", ""UnrelatedBusinessRevenue"": ""0"", ""ExclusionAmount"": ""33563""}","{""Total"": ""892000"", ""ProgramServices"": ""892000""}","{""Total"": ""33000"", ""ProgramServices"": ""33000""}","{""Total"": ""215"", ""ManagementAndGeneral"": ""215""}","{""Total"": ""21675"", ""ManagementAndGeneral"": ""21675""}","{""Total"": ""123"", ""ManagementAndGeneral"": ""123""}","{""Total"": ""118744"", ""ProgramServices"": ""118744""}","{""Total"": ""86228"", ""ManagementAndGeneral"": ""86228""}","[{""Description"": ""FUNDRAISING COSTS"", ""Total"": ""108311"", ""Fundraising"": ""108311""}, {""Description"": ""CANISTER COLLECTION FEE"", ""Total"": 
""81925"", ""Fundraising"": ""81925""}, {""Description"": ""PR/ADMINISTRATIVE SERVI"", ""Total"": ""34517"", ""ManagementAndGe...","{""Total"": ""763"", ""ManagementAndGeneral"": ""763""}","{""Total"": ""1384751"", ""ProgramServices"": ""1043744"", ""ManagementAndGeneral"": ""145115"", ""Fundraising"": ""195892""}","{""BOY"": ""332660"", ""EOY"": ""270700""}","{""BOY"": ""103412"", ""EOY"": ""147981""}",256845,86228,"{""BOY"": ""0"", ""EOY"": ""170617""}","{""BOY"": ""1489143"", ""EOY"": ""1851561""}","{""BOY"": ""1925215"", ""EOY"": ""2440859""}","{""BOY"": ""39670"", ""EOY"": ""44353""}","{""BOY"": ""80500"", ""EOY"": ""166000""}","{""BOY"": ""51640"", ""EOY"": ""240077""}",X,"{""BOY"": ""1753405"", ""EOY"": ""1990429""}",X,89152,X,X,0,1,1,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2011-11-09T06:41:09-06:00,2010-12-31,2010-01-01,"{'Name': 'ROBERT TRAA', 'Title': 'TREASURER', 'Phone': '8565826843', 'DateSigned': '2011-11-04', 'AuthorizeThirdParty': '1'}",2010,2016-02-24 21:20:13Z,,,,,,,232705170,"{'BusinessNameLine1': 'RONALD MCDONALD HOUSE CHARITIES-', 'BusinessNameLine2': 'PHILADELPHIA REGION INC'}",RONA,8565826843,"{'AddressLine1': '1525 VALLEY CENTER PARKWAY NO 300', 'City': 'BETHLEHEM', 'State': 'PA', 'ZIPCode': '18017'}",,,,,,,
1,5d019e6778ffca27b42818d8,TORRINGTON VOA ELDERLY HOUSING INC BELL PARK TOWER,https://s3.amazonaws.com/irs-form-990/201113139349301311_public.xml,93493313013111,201106,,,266420,false,X,,X,1993,WY,PROVIDE HOUSING FOR THE ELDERLY AND THE DISABLED UNDER SECTION 202 OF THE NATIONAL HOUSING ACT UNDER AN AGREEMENT WITH THE DEPARTMENT OF HUD.,19,13,0,,0,,,0,222839,265592,1425,828,,0,224264,266420,,0,,0,71405,82955,,0,0,189785,222550,261190,305505,-36926,-39085,1455332,1433342,17482,34577,1437850,1398765,,PROVIDE HOUSING FOR THE ELDERLY AND THE DISABLED UNDER SECTION 202 OF THE NATIONAL HOUSING ACT UNDER AN AGREEMENT WITH THE DEPARTMENT OF HUD.,false,false,276405,,PROVIDE HOUSING FOR THE ELDERLY AND THE DISABLED UNDER SECTION 202 OF THE NATIONAL HOUSING ACT UNDER AN AGREEMENT WITH THE DEPARTMENT OF HUD.,276405,"""false""","""false""","""false""","""false""","""false""","""false""","""false""","{""@referenceDocumentId"": "" IRS990ScheduleR"", ""#text"": ""true""}","{""@referenceDocumentId"": "" IRS990ScheduleR"", ""#text"": ""true""}","""false""","{""@referenceDocumentId"": "" IRS990ScheduleR"", ""#text"": ""false""}","{""@referenceDocumentId"": "" IRS990ScheduleR"", ""#text"": ""false""}","{""@referenceDocumentId"": "" IRS990ScheduleR"", ""#text"": ""false""}",0,0,false,X,19,13,true,true,false,false,false,true,true,true,true,false,false,true,true,true,true,false,false,true,true,false,,X,,,1180355,411648,0,true,true,false,0,,0,0,"{""TotalRevenueColumn"": ""266420"", ""RelatedOrExemptFunctionIncome"": ""266420""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""7500"", ""ManagementAndGeneral"": ""7500""}","{""Total"": ""14222"", ""ProgramServices"": ""14222""}","{""Total"": ""0""}","{""Total"": ""66166"", ""ProgramServices"": ""66166""}","[{""Description"": ""OPER. 
& MAINT."", ""Total"": ""46164"", ""ProgramServices"": ""46164""}, {""Description"": ""MISC TAXES"", ""Total"": ""298"", ""ProgramServices"": ""298""}, {""Description"": ""ADMINISTRATIVE"", ""Total"": ""12176"", ""ProgramServices"": ""12176""}]","{""Total"": ""0""}","{""Total"": ""305505"", ""ProgramServices"": ""276405"", ""ManagementAndGeneral"": ""29100"", ""Fundraising"": ""0""}","{""EOY"": ""0""}","{""BOY"": ""231"", ""EOY"": ""474""}",2187206,904332,"{""BOY"": ""1306860"", ""EOY"": ""1282874""}","{""BOY"": ""125980"", ""EOY"": ""102794""}","{""BOY"": ""1455332"", ""EOY"": ""1433342""}","{""BOY"": ""2040"", ""EOY"": ""16145""}",,"{""BOY"": ""9203"", ""EOY"": ""11349""}",X,"{""BOY"": ""1437850"", ""EOY"": ""1398765""}",,-39085,,X,false,true,true,true,"""false""",1736.0,266420.0,False,False,265592.0,"{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""59440"", ""ProgramServices"": ""59440""}","{""Total"": ""0""}","{""Total"": ""17714"", ""ProgramServices"": ""17714""}","{""Total"": ""5801"", ""ProgramServices"": ""5801""}","{""Total"": ""21600"", ""ManagementAndGeneral"": ""21600""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""8433"", ""ProgramServices"": ""8433""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""44077"", ""ProgramServices"": ""44077""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""806"", ""ProgramServices"": ""806""}","{""Total"": ""0""}","{""Total"": ""1108"", ""ProgramServices"": ""1108""}","{""BOY"": ""250"", ""EOY"": ""22261""}","{""EOY"": ""0""}","{""EOY"": ""0""}","{""EOY"": ""0""}","{""EOY"": ""0""}","{""BOY"": ""7628"", ""EOY"": ""7554""}","{""EOY"": ""0""}","{""EOY"": ""0""}","{""EOY"": ""0""}","{""BOY"": ""14383"", ""EOY"": ""17385""}","{""BOY"": ""20"", ""EOY"": ""48""}","{""BOY"": ""6219"", ""EOY"": 
""7035""}",True,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2011-11-09T07:32:06-08:00,2011-06-30,2010-07-01,"{'Name': 'THOMAS D TURNBULL', 'Title': 'ASST. SEC/TREAS', 'DateSigned': '2011-11-09'}",2010,2016-02-24 21:20:13Z,,,,,,,581805618,"{'BusinessNameLine1': 'TORRINGTON VOA ELDERLY HOUSING INC', 'BusinessNameLine2': 'BELL PARK TOWER'}",TORR,7033415000,"{'AddressLine1': '1660 DUKE STREET', 'City': 'ALEXANDRIA', 'State': 'VA', 'ZIPCode': '22314'}",,,,,,,


In [80]:
df.dtypes[:25]

_id                                       object
OrganizationName                          object
URL                                       object
DLN                                       object
TaxPeriod                                 object
AddressChange                     string[python]
NameOfPrincipalOfficerPerson              object
GrossReceipts                             object
GroupReturnForAffiliates                  object
Organization501c3                 string[python]
WebSite                                   object
TypeOfOrganizationCorporation     string[python]
YearFormation                             object
StateLegalDomicile                        object
ActivityOrMissionDescription              object
NbrVotingMembersGoverningBody             object
NbrIndependentVotingMembers               object
TotalNbrEmployees                         object
TotalNbrVolunteers                        object
TotalGrossUBI                             object
NetUnrelatedBusiness

In [63]:
#%%time
#df = df.convert_dtypes()

CPU times: total: 2min 29s
Wall time: 2min 42s


In [64]:
#df.dtypes[:25]

_id                               string[python]
OrganizationName                  string[python]
URL                               string[python]
DLN                               string[python]
TaxPeriod                         string[python]
AddressChange                     string[python]
NameOfPrincipalOfficerPerson      string[python]
GrossReceipts                     string[python]
GroupReturnForAffiliates          string[python]
Organization501c3                 string[python]
WebSite                           string[python]
TypeOfOrganizationCorporation     string[python]
YearFormation                     string[python]
StateLegalDomicile                string[python]
ActivityOrMissionDescription      string[python]
NbrVotingMembersGoverningBody     string[python]
NbrIndependentVotingMembers       string[python]
TotalNbrEmployees                 string[python]
TotalNbrVolunteers                string[python]
TotalGrossUBI                     string[python]
NetUnrelatedBusiness

### Sidebar - Identify columns with 'X'

In [69]:
# Simple approach to find columns containing 'X'
import pandas as pd
import modin.pandas as mpd

# Assuming your DataFrame is already loaded as 'df'

def find_columns_with_x(df, sample_size=10000):
    """
    Find columns that contain 'X' by taking a sample of rows.
    
    Parameters:
    -----------
    df : DataFrame
        The DataFrame to check
    sample_size : int
        Number of rows to sample (default 10000)
    
    Returns:
    --------
    list
        Columns that contain 'X' values
    """
    # Take a sample to reduce memory usage
    sample_df = df.sample(min(sample_size, len(df)))
    
    # Collect the names of columns where 'X' shows up
    columns_with_x = []
    
    for col in sample_df.columns:
        # Convert the column to strings once and reuse it for both checks
        values = sample_df[col].astype(str)
        if (values == 'X').any():
            # At least one cell is exactly 'X' (checkbox-style column)
            print(f"Column '{col}' contains 'X' values")
            columns_with_x.append(col)
        elif values.str.contains('X', regex=False).any():
            # 'X' only appears inside longer strings (free-text column)
            print(f"Column '{col}' contains strings with 'X' in them")
            columns_with_x.append(col)
    
    print(f"\nFound {len(columns_with_x)} columns with 'X' values")
    return columns_with_x

# Find columns with 'X'
x_columns = find_columns_with_x(df)
print("\nList of columns with 'X':")
for col in x_columns:
    print(f"- {col}")

Column 'OrganizationName' contains strings with 'X' in them
Column 'AddressChange' contains 'X' values
Column 'NameOfPrincipalOfficerPerson' contains strings with 'X' in them
Column 'Organization501c3' contains 'X' values
Column 'WebSite' contains strings with 'X' in them
Column 'TypeOfOrganizationCorporation' contains 'X' values
Column 'StateLegalDomicile' contains strings with 'X' in them
Column 'ActivityOrMissionDescription' contains strings with 'X' in them
Column 'InfoInScheduleOPartIII' contains 'X' values
Column 'MissionDescription' contains strings with 'X' in them
Column 'Description' contains strings with 'X' in them
Column 'InfoInScheduleOPartVI' contains 'X' values
Column 'StatesWhereCopyOfReturnIsFiled' contains strings with 'X' in them
Column 'UponRequest' contains 'X' values
Column 'NoListedPersonsCompensated' contains 'X' values
Column 'OtherExpenses' contains strings with 'X' in them
Column 'FollowSFAS117' contains 'X' values
Column 'InfoInScheduleOPartXI' contains 'X'

In [70]:
x_columns

['OrganizationName',
 'AddressChange',
 'NameOfPrincipalOfficerPerson',
 'Organization501c3',
 'WebSite',
 'TypeOfOrganizationCorporation',
 'StateLegalDomicile',
 'ActivityOrMissionDescription',
 'InfoInScheduleOPartIII',
 'MissionDescription',
 'Description',
 'InfoInScheduleOPartVI',
 'StatesWhereCopyOfReturnIsFiled',
 'UponRequest',
 'NoListedPersonsCompensated',
 'OtherExpenses',
 'FollowSFAS117',
 'InfoInScheduleOPartXI',
 'InfoInScheduleOPartXII',
 'MethodOfAccountingAccrual',
 'MethodOfAccountingCash',
 'Activity2',
 'Activity3',
 'InfoInScheduleOPartVII',
 'OtherWebsite',
 'MethodOfAccountingOther',
 'DoNotFollowSFAS117',
 'InitialReturn',
 'InfoInScheduleOPartV',
 'OwnWebsite',
 'ActivityOther',
 'TypeOfOrganizationOther',
 'Organization501c',
 'TypeOfOrganizationTrust',
 'TypeOfOrganizationAssociation',
 'AmendedReturn',
 'TerminatedReturn',
 'TerminationOrContraction',
 'Organization4947a1',
 'InfoInScheduleOPartIX',
 'InfoInScheduleOPartVIII',
 'InfoInScheduleOPartX',
 'Pr
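The list above mixes two kinds of columns: checkbox-style columns where a cell is exactly 'X' (for example *AddressChange*) and free-text columns where 'X' merely appears inside a longer string (for example *OrganizationName*). A minimal sketch of how the two groups could be separated follows. The helper name `split_x_columns` and the toy DataFrame are hypothetical and not part of the original notebook.

```python
import pandas as pd

def split_x_columns(df, columns):
    """Split columns into exact-'X' flag columns and all other columns.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    flag_cols, text_cols = [], []
    for col in columns:
        # Drop missing values first so they cannot interfere with the comparison
        values = df[col].dropna().astype(str)
        if (values == "X").any():
            flag_cols.append(col)   # at least one cell is exactly 'X'
        else:
            text_cols.append(col)   # 'X' only occurs inside longer strings, if at all
    return flag_cols, text_cols

# Toy data standing in for the real e-file DataFrame (assumed values)
toy = pd.DataFrame({
    "AddressChange": ["X", None, "X"],
    "OrganizationName": ["XAVIER FUND", "ACME", "BOX CO"],
})

flag_cols, text_cols = split_x_columns(toy, ["AddressChange", "OrganizationName"])
print("Flag columns:", flag_cols)
print("Text columns:", text_cols)
```

On the real data this could be called as `split_x_columns(df, x_columns)`, keeping in mind that `x_columns` was built from a sample, so columns with rare 'X' values may be missing from it.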

In [71]:
# To ensure these columns are saved as strings, simply convert them:
def fix_and_save_parquet(df, string_columns, output_path='fixed_dataframe.parquet'):
    """
    Convert specified columns to string type and save to parquet
    """
    # Make a copy of the dataframe
    df_fixed = df.copy()
    
    # Convert columns with 'X' to string type
    for col in string_columns:
        df_fixed[col] = df_fixed[col].astype(str)
        print(f"Converted '{col}' to string type")
    
    # If using modin, convert to pandas first
    if 'modin.pandas' in str(type(df_fixed)):
        df_fixed = df_fixed._to_pandas()
    
    # Save with the pyarrow engine. The columns are stored as strings because
    # of the astype(str) conversion above, not because of any writer option.
    df_fixed.to_parquet(
        output_path,
        engine='pyarrow',
        # use_dictionary=False only turns off dictionary encoding in the file
        use_dictionary=False
    )
    print(f"Saved to {output_path}")
    
    return df_fixed
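One thing to watch when casting columns to strings before saving is what happens to missing values, since many of the 'X' columns are mostly empty. As a hedged alternative sketch, the pandas nullable `"string"` dtype keeps missing cells as missing after the cast. The helper name `to_nullable_string` and the toy data are hypothetical, and the example assumes the affected columns hold only strings and missing values.

```python
import pandas as pd

def to_nullable_string(df, columns):
    """Cast the given columns to the pandas nullable string dtype.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    out = df.copy()
    for col in columns:
        # The "string" dtype stores missing cells as missing rather than as text
        out[col] = out[col].astype("string")
    return out

# Toy data standing in for the real e-file DataFrame (assumed values)
toy = pd.DataFrame({
    "AddressChange": ["X", None, "X"],
    "GrossReceipts": ["1473903", "266420", "217525"],
})

fixed = to_nullable_string(toy, ["AddressChange"])
print(fixed["AddressChange"].isna().tolist())
print(fixed.dtypes)
```

Whether this is preferable depends on how the later binarizing step treats empty cells, so it is offered as an option rather than a replacement for the function above.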

In [72]:
%%time
# Example usage:
fix_and_save_parquet(df, x_columns)

Converted 'OrganizationName' to string type
Converted 'AddressChange' to string type
Converted 'NameOfPrincipalOfficerPerson' to string type
Converted 'Organization501c3' to string type
Converted 'WebSite' to string type
Converted 'TypeOfOrganizationCorporation' to string type
Converted 'StateLegalDomicile' to string type
Converted 'ActivityOrMissionDescription' to string type
Converted 'InfoInScheduleOPartIII' to string type
Converted 'MissionDescription' to string type
Converted 'Description' to string type
Converted 'InfoInScheduleOPartVI' to string type
Converted 'StatesWhereCopyOfReturnIsFiled' to string type
Converted 'UponRequest' to string type
Converted 'NoListedPersonsCompensated' to string type
Converted 'OtherExpenses' to string type
Converted 'FollowSFAS117' to string type
Converted 'InfoInScheduleOPartXI' to string type
Converted 'InfoInScheduleOPartXII' to string type
Converted 'MethodOfAccountingAccrual' to string type
Converted 'MethodOfAccountingCash' to string type
C

Unnamed: 0,_id,OrganizationName,URL,DLN,TaxPeriod,AddressChange,NameOfPrincipalOfficerPerson,GrossReceipts,GroupReturnForAffiliates,Organization501c3,WebSite,TypeOfOrganizationCorporation,YearFormation,StateLegalDomicile,ActivityOrMissionDescription,NbrVotingMembersGoverningBody,NbrIndependentVotingMembers,TotalNbrEmployees,TotalNbrVolunteers,TotalGrossUBI,NetUnrelatedBusinessTxblIncome,ContributionsGrantsPriorYear,ContributionsGrantsCurrentYear,ProgramServiceRevenuePriorYear,ProgramServiceRevenueCY,InvestmentIncomePriorYear,InvestmentIncomeCurrentYear,OtherRevenuePriorYear,OtherRevenueCurrentYear,TotalRevenuePriorYear,TotalRevenueCurrentYear,GrantsAndSimilarAmntsPriorYear,GrantsAndSimilarAmntsCY,BenefitsPaidToMembersPriorYear,BenefitsPaidToMembersCY,SalariesEtcPriorYear,SalariesEtcCurrentYear,TotalProfFundrsngExpPriorYear,TotalProfFundrsngExpCY,TotalFundrsngExpCurrentYear,OtherExpensePriorYear,OtherExpensesCurrentYear,TotalExpensesPriorYear,TotalExpensesCurrentYear,RevenuesLessExpensesPriorYear,RevenuesLessExpensesCY,TotalAssetsBOY,TotalAssetsEOY,TotalLiabilitiesBOY,TotalLiabilitiesEOY,NetAssetsOrFundBalancesBOY,NetAssetsOrFundBalancesEOY,InfoInScheduleOPartIII,MissionDescription,SignificantNewProgramServices,SignificantChange,Expense,Grants,Description,TotalProgramServiceExpense,PoliticalActivities,LobbyingActivities,ProfessionalFundraising,FundraisingActivities,Gaming,ExcessBenefitTransaction,PriorExcessBenefitTransaction,DisregardedEntity,RelatedEntity,RelatedOrgControlledEntity,TransactionRelatedEntity,TransfersToExemptNonChrtblOrg,ActivitiesConductedPartnership,NumberFormsTransmittedWith1096,NumberOfEmployees,UnrelatedBusinessIncome,InfoInScheduleOPartVI,NbrVotingGoverningBodyMembers,NumberIndependentVotingMembers,FamilyOrBusinessRelationship,DelegationOfManagementDuties,ChangesToOrganizingDocs,MaterialDiversionOrMisuse,MembersOrStockholders,ElectionOfBoardMembers,DecisionsSubjectToApproval,MinutesOfGoverningBody,MinutesOfCommittees,OfficerMailingAddress,Local
Chapters,Form990ProvidedToGoverningBody,ConflictOfInterestPolicy,AnnualDisclosureCoveredPersons,RegularMonitoringEnforcement,WhistleblowerPolicy,DocumentRetentionPolicy,CompensationProcessCEO,CompensationProcessOther,InvestmentInJointVenture,StatesWhereCopyOfReturnIsFiled,UponRequest,NoListedPersonsCompensated,TotalReportableCompFromOrg,TotalReportableCompFrmRltdOrgs,TotalOtherCompensation,NumberIndividualsGT100K,FormersListed,TotalCompGT150K,CompensationFromOtherSources,NumberOfContractorsGT100K,AllOtherContributions,TotalContributions,TotalOtherRevenue,TotalRevenue,GrantsToDomesticOrgs,GrantsToDomesticIndividuals,FeesForServicesLegal,FeesForServicesAccounting,OfficeExpenses,PaymentsToAffiliates,DepreciationDepletion,OtherExpenses,AllOtherExpenses,TotalFunctionalExpenses,SavingsAndTempCashInvestments,AccountsReceivable,LandBuildingsEquipmentBasis,LandBldgEquipmentAccumDeprec,LandBuildingsEquipmentBasisNet,InvestmentsOtherSecurities,TotalAssets,AccountsPayableAccruedExpenses,GrantsPayable,OtherLiabilities,FollowSFAS117,UnrestrictedNetAssets,InfoInScheduleOPartXI,ReconcilationRevenueExpenses,InfoInScheduleOPartXII,MethodOfAccountingAccrual,AccountantCompileOrReview,FSAudited,AuditCommittee,FederalGrantAuditRequired,AllAffiliatesIncluded,GroupExemptionNumber,Revenue,PoliciesReferenceChapters,WrittenPolicyOrProcedure,TotalProgramServiceRevenue,ForeignGrants,BenefitsToMembers,CompCurrentOfficersDirectors,CompDisqualPersons,OtherSalariesAndWages,PensionPlanContributions,OtherEmployeeBenefits,PayrollTaxes,FeesForServicesManagement,FeesForServicesLobbying,FeesForServicesProfFundraising,FeesForServicesInvstMgmntFees,FeesForServicesOther,Advertising,InformationTechnology,Royalties,Occupancy,Travel,TravelEntrtnmntPublicOfficials,ConferencesMeetings,Interest,Insurance,CashNonInterestBearing,PledgesAndGrantsReceivable,ReceivablesFromDisqualPersons,OtherNotesLoansReceivableNet,InventoriesForSaleOrUse,PrepaidExpensesDeferredCharges,InvestmentsPubTradedSecurities,InvestmentsProgra
mRelated,IntangibleAssets,OtherAssetsTotal,DeferredRevenue,MortNotesPyblSecuredInvestProp,FederalGrantAuditPerformed,LoansFromOfficersDirectors,MethodOfAccountingCash,Activity2,Activity3,InfoInScheduleOPartVII,TaxExemptBondLiabilities,TemporarilyRestrictedNetAssets,OtherWebsite,PermanentlyRestrictedNetAssets,FundraisingEvents,CntrbtnsRprtdFundraisingEvents,RelatedOrganizations,GrossIncomeFundraisingEvents,FundraisingDirectExpenses,FederatedCampaigns,GovernmentGrants,MethodOfAccountingOther,GrossSalesOfInventory,CostOfGoodsSold,DoNotFollowSFAS117,RetainedEarningsEndowmentEtc,InitialReturn,MembershipDues,GrossIncomeGaming,GamingDirectExpenses,NoncashContributions,InfoInScheduleOPartV,OwnWebsite,UnsecuredNotesLoansPayable,ActivityOther,TotalOfOtherProgramServiceExp,TotalOfOtherProgramServiceRev,EscrowAccountLiability,TotalOfOtherProgramServiceGrnt,TypeOfOrganizationOther,Organization501c,TypeOfOrganizationTrust,TypeOfOrganizationAssociation,CountryLegalDomicile,AmendedReturn,TypeOfOrgOtherDescription,TotalJointCosts,TerminatedReturn,TerminationOrContraction,ActivityCode,SpecialConditionDescription,Organization4947a1,InfoInScheduleOPartIX,ReconciliationUnrealizedInvest,ReconcilationPriorAdjustment,ReconcilationDonatedServices,ReconcilationInvestExpenses,InfoInScheduleOPartVIII,InfoInScheduleOPartX,PrincipalOfficerNm,GrossReceiptsAmt,GroupReturnForAffiliatesInd,Organization501c3Ind,TypeOfOrganizationCorpInd,FormationYr,LegalDomicileStateCd,ActivityOrMissionDesc,VotingMembersGoverningBodyCnt,VotingMembersIndependentCnt,TotalEmployeeCnt,TotalGrossUBIAmt,CYContributionsGrantsAmt,CYProgramServiceRevenueAmt,CYInvestmentIncomeAmt,CYOtherRevenueAmt,CYTotalRevenueAmt,CYGrantsAndSimilarPaidAmt,CYBenefitsPaidToMembersAmt,CYSalariesCompEmpBnftPaidAmt,CYTotalProfFndrsngExpnsAmt,CYTotalFundraisingExpenseAmt,CYOtherExpensesAmt,CYTotalExpensesAmt,CYRevenuesLessExpensesAmt,TotalAssetsBOYAmt,TotalAssetsEOYAmt,TotalLiabilitiesEOYAmt,NetAssetsOrFundBalancesBOYAmt,NetAssetsOrFundBalancesEOY
Amt,InfoInScheduleOPartIIIInd,MissionDesc,SignificantNewProgramSrvcInd,SignificantChangeInd,Desc,PoliticalCampaignActyInd,LobbyingActivitiesInd,ProfessionalFundraisingInd,FundraisingActivitiesInd,GamingActivitiesInd,EngagedInExcessBenefitTransInd,PYExcessBenefitTransInd,DisregardedEntityInd,RelatedEntityInd,RelatedOrganizationCtrlEntInd,TransactionWithControlEntInd,TrnsfrExmptNonChrtblRltdOrgInd,ActivitiesConductedPrtshpInd,IRPDocumentCnt,EmployeeCnt,UnrelatedBusIncmOverLimitInd,GoverningBodyVotingMembersCnt,IndependentVotingMemberCnt,FamilyOrBusinessRlnInd,DelegationOfMgmtDutiesInd,ChangeToOrgDocumentsInd,MaterialDiversionOrMisuseInd,MembersOrStockholdersInd,ElectionOfBoardMembersInd,DecisionsSubjectToApprovaInd,MinutesOfGoverningBodyInd,MinutesOfCommitteesInd,OfficerMailingAddressInd,LocalChaptersInd,Form990ProvidedToGvrnBodyInd,ConflictOfInterestPolicyInd,WhistleblowerPolicyInd,DocumentRetentionPolicyInd,CompensationProcessCEOInd,CompensationProcessOtherInd,InvestmentInJointVentureInd,StatesWhereCopyOfReturnIsFldCd,NoListedPersonsCompensatedInd,FormerOfcrEmployeesListedInd,TotalCompGreaterThan150KInd,CompensationFromOtherSrcsInd,MembershipDuesAmt,FundraisingAmt,AllOtherContributionsAmt,TotalContributionsAmt,OtherRevenueTotalAmt,TotalRevenueGrp,FeesForServicesAccountingGrp,OfficeExpensesGrp,InformationTechnologyGrp,ConferencesMeetingsGrp,InsuranceGrp,OtherExpensesGrp,AllOtherExpensesGrp,TotalFunctionalExpensesGrp,CashNonInterestBearingGrp,TotalAssetsGrp,OrgDoesNotFollowSFAS117Ind,RtnEarnEndowmentIncmOthFndsGrp,ReconcilationRevenueExpnssAmt,MethodOfAccountingCashInd,AccountantCompileOrReviewInd,FSAuditedInd,FederalGrantAuditRequiredInd,WebsiteAddressTxt,TotalVolunteersCnt,NetUnrelatedBusTxblIncmAmt,PYContributionsGrantsAmt,PYProgramServiceRevenueAmt,PYInvestmentIncomeAmt,PYOtherRevenueAmt,PYTotalRevenueAmt,PYGrantsAndSimilarPaidAmt,PYBenefitsPaidToMembersAmt,PYSalariesCompEmpBnftPaidAmt,PYTotalProfFndrsngExpnsAmt,PYOtherExpensesAmt,PYTotalExpensesAmt,PYRevenuesLess
ExpensesAmt,TotalLiabilitiesBOYAmt,ExpenseAmt,GrantAmt,RevenueAmt,ProgSrvcAccomActy2Grp,ProgSrvcAccomActy3Grp,ProgSrvcAccomActyOtherGrp,TotalOtherProgSrvcGrantAmt,TotalProgramServiceExpensesAmt,InfoInScheduleOPartVIInd,AnnualDisclosureCoveredPrsnInd,RegularMonitoringEnfrcInd,UponRequestInd,TotalReportableCompFromOrgAmt,TotReportableCompRltdOrgAmt,TotalOtherCompensationAmt,IndivRcvdGreaterThan100KCnt,CntrctRcvdGreaterThan100KCnt,GovernmentGrantsAmt,TotalProgramServiceRevenueAmt,FundraisingGrossIncomeAmt,ContriRptFundraisingEventAmt,FundraisingDirectExpensesAmt,GrossSalesOfInventoryAmt,CostOfGoodsSoldAmt,GrantsToDomesticIndividualsGrp,CompCurrentOfcrDirectorsGrp,OtherSalariesAndWagesGrp,PensionPlanContributionsGrp,OtherEmployeeBenefitsGrp,PayrollTaxesGrp,FeesForServicesOtherGrp,AdvertisingGrp,TravelGrp,InterestGrp,DepreciationDepletionGrp,SavingsAndTempCashInvstGrp,AccountsReceivableGrp,InventoriesForSaleOrUseGrp,PrepaidExpensesDefrdChargesGrp,LandBldgEquipCostOrOtherBssAmt,LandBldgEquipAccumDeprecAmt,LandBldgEquipBasisNetGrp,InvestmentsOtherSecuritiesGrp,IntangibleAssetsGrp,AccountsPayableAccrExpnssGrp,DeferredRevenueGrp,MortgNotesPyblScrdInvstPropGrp,OtherLiabilitiesGrp,OrganizationFollowsSFAS117Ind,UnrestrictedNetAssetsGrp,TemporarilyRstrNetAssetsGrp,InfoInScheduleOPartXIInd,NetUnrlzdGainsLossesInvstAmt,InfoInScheduleOPartXIIInd,AuditCommitteeInd,AllAffiliatesIncludedInd,GrantsToDomesticOrgsGrp,ForeignGrantsGrp,BenefitsToMembersGrp,CompDisqualPersonsGrp,FeesForServicesManagementGrp,FeesForServicesLegalGrp,FeesForServicesLobbyingGrp,FeesForSrvcInvstMgmntFeesGrp,RoyaltiesGrp,OccupancyGrp,PymtTravelEntrtnmntPubOfclGrp,PaymentsToAffiliatesGrp,PledgesAndGrantsReceivableGrp,RcvblFromDisqualifiedPrsnGrp,OthNotesLoansReceivableNetGrp,InvestmentsPubTradedSecGrp,InvestmentsProgramRelatedGrp,OtherAssetsTotalGrp,TotalOtherProgSrvcExpenseAmt,InfoInScheduleOPartVInd,MethodOfAccountingAccrualInd,NoncashContributionsAmt,GrantsPayableGrp,PermanentlyRstrNetAssetsGrp,TaxExemptBondLia
bilitiesGrp,EscrowAccountLiabilityGrp,LoansFromOfficersDirectorsGrp,UnsecuredNotesLoansPayableGrp,PriorPeriodAdjustmentsAmt,FederalGrantAuditPerformedInd,PoliciesReferenceChaptersInd,OtherWebsiteInd,AddressChangeInd,WrittenPolicyOrProcedureInd,RelatedOrganizationsAmt,TotalOtherProgSrvcRevenueAmt,OwnWebsiteInd,TotalJointCostsGrp,DonatedServicesAndUseFcltsAmt,LegalDomicileCountryCd,InfoInScheduleOPartIXInd,TypeOfOrganizationTrustInd,FinalReturnInd,ContractTerminationInd,InfoInScheduleOPartXInd,GroupExemptionNum,InfoInScheduleOPartVIIInd,FederatedCampaignsAmt,TypeOfOrganizationOtherInd,OtherOrganizationDsc,InfoInScheduleOPartVIIIInd,TypeOfOrganizationAssocInd,InitialReturnInd,GamingGrossIncomeAmt,GamingDirectExpensesAmt,MethodOfAccountingOtherInd,InvestmentExpenseAmt,Organization501cInd,Organization4947a1NotPFInd,AmendedReturnInd,SpecialConditionDesc,ActivityCd,Timestamp,TaxPeriodEndDate,TaxPeriodBeginDate,Officer,TaxYear,BuildTS,ReturnTs,TaxPeriodEndDt,TaxPeriodBeginDt,BusinessOfficerGrp,TaxYr,fiscal_year,EIN,Name,NameControl,Phone,USAddress,ForeignAddress,InCareOfName,BusinessName,BusinessNameControlTxt,PhoneNum,InCareOfNm,ForeignPhoneNum
0,5d019e6778ffca27b42818d7,RONALD MCDONALD HOUSE CHARITIES- PHILADELPHIA REGION INC,https://s3.amazonaws.com/irs-form-990/201113139349301301_public.xml,93493313013011,201012,X,MICHAEL ANTON,1473903,0,X,,X,1992,PA,MAKES GRANTS TO NON-PROFITS THAT DIRECTLY IMPROVE THE HEALTH AND WELL-BEING OF CHILDREN.,10,10,0,0,0,0,1044925,1439340,0,0,30447,33563,0,1000,1075372,1473903,638637,925000,0,0,0,0,0,0,195892,243131,459751,881768,1384751,193604,89152,1925215,2440859,171810,450430,1753405,1990429,X,"THE CORPORATION IS ORGANIZED AND WILL BE OPERATED EXCLUSIVELY FOR CHARITABLE, EDUCATIONAL AND SCIENTIFIC PURPOSES WITHIN THE MEANING OF SECTION 501(C)(3) OF THE INTERNAL REVENUE CODE. SUCH PURPOSES SHALL BE LIMITED TO PROVIDING SUPPORT AND FUNDIN...",0,0,1043744,925000,"RMHC OF THE PHILADELPHIA REGION, INC. GRANTS HUNDREDS OF THOUSANDS OF DOLLARS PER YEAR TO SUPPORT NON-PROFIT PROGRAMS THAT DIRECTLY IMPROVE THE HEALTH AND WELL-BEING OF CHILDREN. LOCALLY, RMHC SUPPORTS THE PHILADELPHIA, SOUTHERN NEW JERSEY AND DE...",1043744,"""0""","""0""","""0""","""0""","""0""","""0""","""0""","""0""","""0""","""0""","""0""","""0""","""0""",0,0,0,X,10,10,0,0,0,0,0,0,0,1,1,0,0,1,1,1,1,0,0,0,0,0,"[""PA"", ""NJ"", ""DE""]",X,X,0,0,0,0,0,0,0,0,1439340,1439340,1000,"{""TotalRevenueColumn"": ""1473903"", ""RelatedOrExemptFunctionIncome"": ""1000"", ""UnrelatedBusinessRevenue"": ""0"", ""ExclusionAmount"": ""33563""}","{""Total"": ""892000"", ""ProgramServices"": ""892000""}","{""Total"": ""33000"", ""ProgramServices"": ""33000""}","{""Total"": ""215"", ""ManagementAndGeneral"": ""215""}","{""Total"": ""21675"", ""ManagementAndGeneral"": ""21675""}","{""Total"": ""123"", ""ManagementAndGeneral"": ""123""}","{""Total"": ""118744"", ""ProgramServices"": ""118744""}","{""Total"": ""86228"", ""ManagementAndGeneral"": ""86228""}","[{""Description"": ""FUNDRAISING COSTS"", ""Total"": ""108311"", ""Fundraising"": ""108311""}, {""Description"": ""CANISTER COLLECTION FEE"", ""Total"": ""81925"", 
""Fundraising"": ""81925""}, {""Description"": ""PR/ADMINISTRATIVE SERVI"", ""Total"": ""34517"", ""ManagementAndGe...","{""Total"": ""763"", ""ManagementAndGeneral"": ""763""}","{""Total"": ""1384751"", ""ProgramServices"": ""1043744"", ""ManagementAndGeneral"": ""145115"", ""Fundraising"": ""195892""}","{""BOY"": ""332660"", ""EOY"": ""270700""}","{""BOY"": ""103412"", ""EOY"": ""147981""}",256845,86228,"{""BOY"": ""0"", ""EOY"": ""170617""}","{""BOY"": ""1489143"", ""EOY"": ""1851561""}","{""BOY"": ""1925215"", ""EOY"": ""2440859""}","{""BOY"": ""39670"", ""EOY"": ""44353""}","{""BOY"": ""80500"", ""EOY"": ""166000""}","{""BOY"": ""51640"", ""EOY"": ""240077""}",X,"{""BOY"": ""1753405"", ""EOY"": ""1990429""}",X,89152,X,X,0,1,1,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2011-11-09T06:41:09-06:00,2010-12-31,2010-01-01,"{'Name': 'ROBERT TRAA', 'Title': 'TREASURER', 'Phone': '8565826843', 'DateSigned': '2011-11-04', 'AuthorizeThirdParty': '1'}",2010,2016-02-24 21:20:13Z,,,,,,,232705170,"{'BusinessNameLine1': 'RONALD MCDONALD HOUSE CHARITIES-', 'BusinessNameLine2': 'PHILADELPHIA REGION INC'}",RONA,8565826843,"{'AddressLine1': '1525 VALLEY CENTER PARKWAY NO 300', 'City': 'BETHLEHEM', 'State': 'PA', 'ZIPCode': '18017'}",,,,,,,
1,5d019e6778ffca27b42818d8,TORRINGTON VOA ELDERLY HOUSING INC BELL PARK TOWER,https://s3.amazonaws.com/irs-form-990/201113139349301311_public.xml,93493313013111,201106,,,266420,false,X,,X,1993,WY,PROVIDE HOUSING FOR THE ELDERLY AND THE DISABLED UNDER SECTION 202 OF THE NATIONAL HOUSING ACT UNDER AN AGREEMENT WITH THE DEPARTMENT OF HUD.,19,13,0,,0,,,0,222839,265592,1425,828,,0,224264,266420,,0,,0,71405,82955,,0,0,189785,222550,261190,305505,-36926,-39085,1455332,1433342,17482,34577,1437850,1398765,,PROVIDE HOUSING FOR THE ELDERLY AND THE DISABLED UNDER SECTION 202 OF THE NATIONAL HOUSING ACT UNDER AN AGREEMENT WITH THE DEPARTMENT OF HUD.,false,false,276405,,PROVIDE HOUSING FOR THE ELDERLY AND THE DISABLED UNDER SECTION 202 OF THE NATIONAL HOUSING ACT UNDER AN AGREEMENT WITH THE DEPARTMENT OF HUD.,276405,"""false""","""false""","""false""","""false""","""false""","""false""","""false""","{""@referenceDocumentId"": "" IRS990ScheduleR"", ""#text"": ""true""}","{""@referenceDocumentId"": "" IRS990ScheduleR"", ""#text"": ""true""}","""false""","{""@referenceDocumentId"": "" IRS990ScheduleR"", ""#text"": ""false""}","{""@referenceDocumentId"": "" IRS990ScheduleR"", ""#text"": ""false""}","{""@referenceDocumentId"": "" IRS990ScheduleR"", ""#text"": ""false""}",0,0,false,X,19,13,true,true,false,false,false,true,true,true,true,false,false,true,true,true,true,false,false,true,true,false,,X,,,1180355,411648,0,true,true,false,0,,0,0,"{""TotalRevenueColumn"": ""266420"", ""RelatedOrExemptFunctionIncome"": ""266420""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""7500"", ""ManagementAndGeneral"": ""7500""}","{""Total"": ""14222"", ""ProgramServices"": ""14222""}","{""Total"": ""0""}","{""Total"": ""66166"", ""ProgramServices"": ""66166""}","[{""Description"": ""OPER. 
& MAINT."", ""Total"": ""46164"", ""ProgramServices"": ""46164""}, {""Description"": ""MISC TAXES"", ""Total"": ""298"", ""ProgramServices"": ""298""}, {""Description"": ""ADMINISTRATIVE"", ""Total"": ""12176"", ""ProgramServices"": ""12176""}]","{""Total"": ""0""}","{""Total"": ""305505"", ""ProgramServices"": ""276405"", ""ManagementAndGeneral"": ""29100"", ""Fundraising"": ""0""}","{""EOY"": ""0""}","{""BOY"": ""231"", ""EOY"": ""474""}",2187206,904332,"{""BOY"": ""1306860"", ""EOY"": ""1282874""}","{""BOY"": ""125980"", ""EOY"": ""102794""}","{""BOY"": ""1455332"", ""EOY"": ""1433342""}","{""BOY"": ""2040"", ""EOY"": ""16145""}",,"{""BOY"": ""9203"", ""EOY"": ""11349""}",X,"{""BOY"": ""1437850"", ""EOY"": ""1398765""}",,-39085,,X,false,true,true,true,"""false""",1736,266420,false,false,265592,"{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""59440"", ""ProgramServices"": ""59440""}","{""Total"": ""0""}","{""Total"": ""17714"", ""ProgramServices"": ""17714""}","{""Total"": ""5801"", ""ProgramServices"": ""5801""}","{""Total"": ""21600"", ""ManagementAndGeneral"": ""21600""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""8433"", ""ProgramServices"": ""8433""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""44077"", ""ProgramServices"": ""44077""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""806"", ""ProgramServices"": ""806""}","{""Total"": ""0""}","{""Total"": ""1108"", ""ProgramServices"": ""1108""}","{""BOY"": ""250"", ""EOY"": ""22261""}","{""EOY"": ""0""}","{""EOY"": ""0""}","{""EOY"": ""0""}","{""EOY"": ""0""}","{""BOY"": ""7628"", ""EOY"": ""7554""}","{""EOY"": ""0""}","{""EOY"": ""0""}","{""EOY"": ""0""}","{""BOY"": ""14383"", ""EOY"": ""17385""}","{""BOY"": ""20"", ""EOY"": ""48""}","{""BOY"": ""6219"", ""EOY"": 
""7035""}",true,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2011-11-09T07:32:06-08:00,2011-06-30,2010-07-01,"{'Name': 'THOMAS D TURNBULL', 'Title': 'ASST. SEC/TREAS', 'DateSigned': '2011-11-09'}",2010,2016-02-24 21:20:13Z,,,,,,,581805618,"{'BusinessNameLine1': 'TORRINGTON VOA ELDERLY HOUSING INC', 'BusinessNameLine2': 'BELL PARK TOWER'}",TORR,7033415000,"{'AddressLine1': '1660 DUKE STREET', 'City': 'ALEXANDRIA', 'State': 'VA', 'ZIPCode': '22314'}",,,,,,,
2,5d019e6878ffca27b42818d9,HOUSTON VOA INDEPENDENT HOUSING INC HEIGHTS MANOR,https://s3.amazonaws.com/irs-form-990/201113139349301316_public.xml,93493313013161,201106,,,217525,false,X,,X,1990,TX,PROVIDES HOUSING TO DISABLED PERSONS UNDER SECTION 811 OF THE NATIONAL HOUSING ACT UNDER AGREEMENT WITH THE DEPARTMENT OF HUD.,19,13,0,,0,,,0,191659,217457,75,68,,0,191734,217525,,0,,0,68020,82340,,0,0,152057,167046,220077,249386,-28343,-31861,978564,950110,21245,24652,957319,925458,,PROVIDES HOUSING TO DISABLED PERSONS UNDER SECTION 811 OF THE NATIONAL HOUSING ACT UNDER AGREEMENT WITH THE DEPARTMENT OF HUD.,false,false,231299,,PROVIDES HOUSING TO DISABLED PERSONS UNDER SECTION 811 OF THE NATIONAL HOUSING ACT UNDER AGREEMENT WITH THE DEPARTMENT OF HUD.,231299,"""false""","""false""","""false""","""false""","""false""","""false""","""false""","{""@referenceDocumentId"": "" IRS990ScheduleR"", ""#text"": ""true""}","{""@referenceDocumentId"": "" IRS990ScheduleR"", ""#text"": ""true""}","""false""","{""@referenceDocumentId"": "" IRS990ScheduleR"", ""#text"": ""false""}","{""@referenceDocumentId"": "" IRS990ScheduleR"", ""#text"": ""false""}","{""@referenceDocumentId"": "" IRS990ScheduleR"", ""#text"": ""false""}",0,0,false,X,19,13,true,true,false,false,false,true,true,true,true,false,false,true,true,true,true,false,false,true,true,false,,X,,,1180355,411648,0,true,true,false,1,,0,0,"{""TotalRevenueColumn"": ""217525"", ""RelatedOrExemptFunctionIncome"": ""217525""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""5880"", ""ManagementAndGeneral"": ""5880""}","{""Total"": ""16411"", ""ProgramServices"": ""16411""}","{""Total"": ""0""}","{""Total"": ""43417"", ""ProgramServices"": ""43417""}","[{""Description"": ""SERV COORD EXP"", ""Total"": ""8602"", ""ProgramServices"": ""8602""}, {""Description"": ""OPER & MAINT"", ""Total"": ""36436"", ""ProgramServices"": ""36436""}, {""Description"": ""MISC. 
TAXES"", ""Total"": ""438"", ""ProgramServices"": ""438""}, {""Description...","{""Total"": ""0""}","{""Total"": ""249386"", ""ProgramServices"": ""231299"", ""ManagementAndGeneral"": ""18087"", ""Fundraising"": ""0""}","{""EOY"": ""0""}","{""BOY"": ""297"", ""EOY"": ""1542""}",1390535,537845,"{""BOY"": ""885000"", ""EOY"": ""852690""}","{""EOY"": ""0""}","{""BOY"": ""978564"", ""EOY"": ""950110""}","{""BOY"": ""14812"", ""EOY"": ""17896""}",,"{""BOY"": ""6233"", ""EOY"": ""6434""}",X,"{""BOY"": ""957319"", ""EOY"": ""925458""}",,-31861,,X,false,true,true,true,"""false""",1736,217525,false,false,217457,"{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""76219"", ""ProgramServices"": ""76219""}","{""Total"": ""0""}","{""Total"": ""283"", ""ProgramServices"": ""283""}","{""Total"": ""5838"", ""ProgramServices"": ""5838""}","{""Total"": ""12207"", ""ManagementAndGeneral"": ""12207""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""2271"", ""ProgramServices"": ""2271""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""33025"", ""ProgramServices"": ""33025""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""3299"", ""ProgramServices"": ""3299""}","{""Total"": ""0""}","{""Total"": ""3254"", ""ProgramServices"": ""3254""}","{""BOY"": ""12969"", ""EOY"": ""7210""}","{""EOY"": ""0""}","{""EOY"": ""0""}","{""EOY"": ""0""}","{""EOY"": ""0""}","{""EOY"": ""0""}","{""EOY"": ""0""}","{""EOY"": ""0""}","{""EOY"": ""0""}","{""BOY"": ""80298"", ""EOY"": ""88668""}","{""BOY"": ""200"", ""EOY"": ""322""}",,true,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2011-11-09T07:33:03-08:00,2011-06-30,2010-07-01,"{'Name': 'THOMAS D 
TURNBULL', 'Title': 'ASST. SEC/TREAS', 'DateSigned': '2011-11-09'}",2010,2016-02-24 21:20:13Z,,,,,,,581876019,"{'BusinessNameLine1': 'HOUSTON VOA INDEPENDENT HOUSING INC', 'BusinessNameLine2': 'HEIGHTS MANOR'}",HOUS,7033415000,"{'AddressLine1': '1660 DUKE STREET', 'City': 'ALEXANDRIA', 'State': 'VA', 'ZIPCode': '22314'}",,,,,,,
3,5d019e6878ffca27b42818da,FAITH-HAVEN CORPORATION,https://s3.amazonaws.com/irs-form-990/201113139349301321_public.xml,93493313013211,201106,,John P Spiller,45615,false,X,,X,1965,WI,Spiritual retreat,4,4,1,,-27746,,1397666,6582,,0,11306,22783,-10099,-27746,1398873,1619,,0,,0,12918,12918,,0,0,53071,42902,65989,55820,1332884,-54201,1845077,1791712,20898,21734,1824179,1769978,,Spiritual retreat,false,false,53484,,"Chapels provided to the public for use as a place for worship, weddings, meditation or ecumenical retreats.",53484,"""false""","""false""","""false""","""false""","""false""","{""@referenceDocumentId"": "" IRS990ScheduleL"", ""#text"": ""false""}","{""@referenceDocumentId"": "" IRS990ScheduleL"", ""#text"": ""false""}","""false""","""false""","""false""","""false""","""false""","""false""",1,1,false,X,4,4,false,false,false,false,false,false,false,true,true,false,false,true,true,true,true,true,true,false,false,false,,X,,12000,,,0,false,false,false,0,6582,6582,0,"{""TotalRevenueColumn"": ""1619"", ""RelatedOrExemptFunctionIncome"": ""11759"", ""UnrelatedBusinessRevenue"": ""-27746"", ""ExclusionAmount"": ""11024""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""1480"", ""ManagementAndGeneral"": ""1480""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""15503"", ""ProgramServices"": ""15503""}","[{""Description"": ""Waste removal"", ""Total"": ""585"", ""ProgramServices"": ""585""}, {""Description"": ""Utilities"", ""Total"": ""9320"", ""ProgramServices"": ""9320""}, {""Description"": ""Telephone"", ""Total"": ""506"", ""ManagementAndGeneral"": ""506""}, {""Description"": ""I...","{""Total"": ""161"", ""ProgramServices"": ""161""}","{""Total"": ""55820"", ""ProgramServices"": ""53484"", ""ManagementAndGeneral"": ""2336"", ""Fundraising"": ""0""}","{""BOY"": ""212122"", ""EOY"": ""155369""}","{""EOY"": ""0""}",1768482,280088,"{""BOY"": ""1493086"", ""EOY"": ""1488394""}","{""EOY"": ""0""}","{""BOY"": ""1845077"", ""EOY"": 
""1791712""}",,,"{""BOY"": ""830"", ""EOY"": ""1666""}",X,"{""BOY"": ""1824179"", ""EOY"": ""1769978""}",,-54201,,,true,false,true,false,"""false""",,,false,false,0,"{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""12000"", ""ProgramServices"": ""12000""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""918"", ""ProgramServices"": ""918""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""350"", ""ManagementAndGeneral"": ""350""}","{""Total"": ""0""}","{""BOY"": ""6654"", ""EOY"": ""22708""}","{""EOY"": ""0""}","{""EOY"": ""0""}","{""BOY"": ""133215"", ""EOY"": ""125241""}","{""EOY"": ""0""}","{""EOY"": ""0""}","{""EOY"": ""0""}","{""EOY"": ""0""}","{""EOY"": ""0""}","{""EOY"": ""0""}",,,false,"{""BOY"": ""20068"", ""EOY"": ""20068""}",X,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2011-11-09T07:54:44-08:00,2011-06-30,2010-07-01,"{'Name': 'John P Spiller', 'Title': 'Pres./Caretaker', 'DateSigned': '2011-11-09'}",2010,2016-02-24 21:20:13Z,,,,,,,391083432,{'BusinessNameLine1': 'Faith-Haven Corporation'},FAIT,2626299700,"{'AddressLine1': '5611 Hwy D', 'City': 'West Bend', 'State': 'WI', 'ZIPCode': '53090'}",,,,,,,
4,5d019e6878ffca27b42818db,OCHSNER COMMUNITY HOSPITALS,https://s3.amazonaws.com/irs-form-990/201113139349301326_public.xml,93493313013261,201012,,Patrick J Quinlan,666103016,0,X,www.ochsner.org,X,2006,LA,"Provides Patient Care via ownership and operation of 2 hospitals in the New Orleans, LA area.",15,8,1716,148,0,0,0,0,629297438,662987788,58188,4682,1638243,1974404,630993869,664966874,0,0,0,0,62401266,68935384,0,0,178129,577137415,604774393,639538681,673709777,-8544812,-8742903,139385579,147143219,229622579,245608219,-90237000,-98465000,X,"We Serve, Heal, Lead, Educate and Innovate. Ochsner will be a global medical and academic leader who will save and change lives. We will shape the future of healthcare through our integrated health system, fueled by the passion and strength of ou...",0,0,646184916,,"Patient Care: Ochsner Community Hospitals operates two hospitals that served 17,002 inpatients for a total of 70,743 patient days. Outpatient visits totaled 128,154. There were a total of 1,238 births and 50,710 Emergency Room visits to Ochsner C...",646613548,"{""@referenceDocumentId"": ""RetDoc1039600001"", ""#text"": ""0""}","{""@referenceDocumentId"": ""RetDoc1039600001"", ""#text"": ""1""}","""0""","""0""","""0""","{""@referenceDocumentId"": ""RetDoc1042800001"", ""#text"": ""0""}","{""@referenceDocumentId"": ""RetDoc1042800001"", ""#text"": ""0""}","{""@referenceDocumentId"": ""RetDoc1043400001"", ""#text"": ""1""}","{""@referenceDocumentId"": ""RetDoc1043400001"", ""#text"": ""1""}","""0""","{""@referenceDocumentId"": ""RetDoc1043400001"", ""#text"": ""0""}","{""@referenceDocumentId"": ""RetDoc1043400001"", ""#text"": ""0""}","{""@referenceDocumentId"": ""RetDoc1043400001"", ""#text"": ""0""}",115,1716,0,X,15,8,0,0,0,0,1,1,1,1,1,0,1,1,1,1,1,1,0,1,1,1,,X,,1622325,7784049,684924,27,1,1,0,43,,,,"{""TotalRevenueColumn"": ""664966874"", ""RelatedOrExemptFunctionIncome"": ""658216410"", ""UnrelatedBusinessRevenue"": ""0"", ""ExclusionAmount"": 
""6750464""}",,,"{""Total"": ""20"", ""ManagementAndGeneral"": ""20""}",,"{""Total"": ""41059955"", ""ProgramServices"": ""40203875"", ""ManagementAndGeneral"": ""852101"", ""Fundraising"": ""3979""}",,"{""Total"": ""8008389"", ""ProgramServices"": ""6321551"", ""ManagementAndGeneral"": ""1682985"", ""Fundraising"": ""3853""}","[{""Description"": ""Discounts & Allowances"", ""Total"": ""490392718"", ""ProgramServices"": ""490392718""}, {""Description"": ""Bad Debt Expense"", ""Total"": ""17101376"", ""ProgramServices"": ""17101376""}, {""Description"": ""Buldg & Equip Rep & Mnt"", ""Total"": ""523282...","{""Total"": ""316659"", ""ProgramServices"": ""240784"", ""ManagementAndGeneral"": ""75875""}","{""Total"": ""673709777"", ""ProgramServices"": ""646613548"", ""ManagementAndGeneral"": ""26918100"", ""Fundraising"": ""178129""}","{""BOY"": ""2888412"", ""EOY"": ""7405344""}","{""BOY"": ""27969595"", ""EOY"": ""28748143""}",118560986,25636630,"{""BOY"": ""93480309"", ""EOY"": ""92924356""}",,"{""BOY"": ""139385579"", ""EOY"": ""147143219""}","{""BOY"": ""66282764"", ""EOY"": ""99095676""}",,"{""BOY"": ""174754"", ""EOY"": ""259997""}",X,"{""BOY"": ""-91325000"", ""EOY"": ""-99046000""}",X,-8742903,X,X,0,1,1,0,,,658047066,1,0,662987788,,,"{""Total"": ""1082701"", ""ManagementAndGeneral"": ""1082701""}",,"{""Total"": ""58817770"", ""ProgramServices"": ""48262842"", ""ManagementAndGeneral"": ""10464751"", ""Fundraising"": ""90177""}","{""Total"": ""1262538"", ""ProgramServices"": ""1232595"", ""ManagementAndGeneral"": ""29611"", ""Fundraising"": ""332""}","{""Total"": ""4000787"", ""ProgramServices"": ""2493598"", ""ManagementAndGeneral"": ""1488373"", ""Fundraising"": ""18816""}","{""Total"": ""3771588"", ""ProgramServices"": ""3398220"", ""ManagementAndGeneral"": ""373368""}","{""Total"": ""5372039"", ""ProgramServices"": ""580877"", ""ManagementAndGeneral"": ""4748282"", ""Fundraising"": ""42880""}","{""Total"": ""47401"", ""ManagementAndGeneral"": 
""47401""}",,,"{""Total"": ""18813451"", ""ProgramServices"": ""15913032"", ""ManagementAndGeneral"": ""2900419""}","{""Total"": ""23143"", ""ProgramServices"": ""17449"", ""ManagementAndGeneral"": ""5694""}","{""Total"": ""328430"", ""ProgramServices"": ""238633"", ""ManagementAndGeneral"": ""89797""}",,"{""Total"": ""7696987"", ""ProgramServices"": ""5439609"", ""ManagementAndGeneral"": ""2239286"", ""Fundraising"": ""18092""}","{""Total"": ""14860"", ""ProgramServices"": ""2329"", ""ManagementAndGeneral"": ""12531""}",,"{""Total"": ""87446"", ""ProgramServices"": ""34203"", ""ManagementAndGeneral"": ""53243""}","{""Total"": ""5185861"", ""ProgramServices"": ""5185861""}","{""Total"": ""1255353"", ""ProgramServices"": ""1253346"", ""ManagementAndGeneral"": ""2007""}","{""BOY"": ""99158"", ""EOY"": ""162635""}","{""BOY"": ""58974"", ""EOY"": ""122703""}",,,"{""BOY"": ""4163667"", ""EOY"": ""3603780""}","{""BOY"": ""2196459"", ""EOY"": ""2971754""}","{""BOY"": ""6269858"", ""EOY"": ""7728979""}",,,"{""BOY"": ""2259147"", ""EOY"": ""3475525""}","{""BOY"": ""204160"", ""EOY"": ""188605""}","{""BOY"": ""84192179"", ""EOY"": ""70777113""}",,,,"{""Expense"": ""428632"", ""Revenue"": ""4468276"", ""Description"": ""Rental from Physical Plant: Ochsner Community Hospitals rents its physical plant to Ochsner Clinic Foundation and Ochsner Health System, related 501(c)(3) organizations.""}","{""Revenue"": ""472446"", ""Description"": ""Equity income from Joint Ventures: Ochsner Community Hospitals owns a 25% share of Louisiana Extended Care Hospital of Kenner, a company that provides long term acute care services and records its share of th...",X,"{""BOY"": ""78768722"", ""EOY"": ""75286828""}","{""BOY"": ""1088000"", ""EOY"": 
""581000""}",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2011-11-09T10:05:52-06:00,2010-12-31,2010-01-01,"{'Name': 'Bobby C Brannon', 'Title': 'VP, Secretary & Treas', 'Phone': '5048423400', 'DateSigned': '2011-11-09', 'AuthorizeThirdParty': '1'}",2010,2016-02-24 21:20:13Z,,,,,,,205297040,{'BusinessNameLine1': 'Ochsner Community Hospitals'},OCHS,5048423400,"{'AddressLine1': '1514 Jefferson Highway', 'City': 'New Orleans', 'State': 'LA', 'ZIPCode': '70121'}",,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3469003,67f0a6febea7582f201aed1e,,https://s3.amazonaws.com/irs-form-990/202441449349301604_public.xml,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"{""TotalAmt"": ""0""}",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,433519,false,X,X,1981,CA,Religious workshops and retreats.,13,13,5,0,155093,267498,7232,3696,433519,0,0,116636,0,0,254672,371308,62211,1184606,1259302,39604,1138140,1219698,,Religious workshops and retreats.,false,false,"Quaker Center is a retreat and conference center under the care of the ReligiousSociety of Friends (Quakers). In 2023, the Center was again able to hold a full calendar of in-person and online programs as well as camps for children, young people ...","""false""","""false""","""false""","""false""","""false""","""false""","""false""","""false""","""false""",false,,"""false""","""false""",5,5,false,13,13,false,false,false,false,false,false,false,true,true,false,false,true,false,true,false,false,false,false,"""CA""",X,false,false,false,,,155093,155093,3696,"{""TotalRevenueColumnAmt"": ""433519"", ""RelatedOrExemptFuncIncomeAmt"": ""271194"", ""ExclusionAmt"": ""7232""}","{""TotalAmt"": ""10224"", ""ManagementAndGeneralAmt"": ""10224""}","{""TotalAmt"": ""16826"", ""ProgramServicesAmt"": ""12315"", ""ManagementAndGeneralAmt"": ""4511""}","{""TotalAmt"": ""4562"", ""ProgramServicesAmt"": ""4562""}","{""TotalAmt"": ""0""}","{""TotalAmt"": ""46577"", ""ProgramServicesAmt"": ""18467"", ""ManagementAndGeneralAmt"": ""28110""}","[{""Desc"": ""MAINTENANCE"", ""TotalAmt"": ""25579"", ""ProgramServicesAmt"": ""25579""}, {""Desc"": ""COOKS"", ""TotalAmt"": ""23325"", ""ProgramServicesAmt"": ""23325""}, {""Desc"": ""HONORARIA"", ""TotalAmt"": ""11612"", ""ProgramServicesAmt"": ""11612""}, {""Desc"": ""PROGRAM EXPE...","{""TotalAmt"": ""16620"", ""ProgramServicesAmt"": ""16475"", ""ManagementAndGeneralAmt"": 
""145""}","{""TotalAmt"": ""371308"", ""ProgramServicesAmt"": ""278521"", ""ManagementAndGeneralAmt"": ""92787"", ""FundraisingAmt"": ""0""}","{""BOYAmt"": ""78916"", ""EOYAmt"": ""159550""}","{""BOYAmt"": ""1184606"", ""EOYAmt"": ""1259302""}",,,62211,,false,false,false,WWW.QUAKERCENTER.ORG,2,,157414,220559,7643,-41526,344090,,,145007,,259047,404054,-59964,46466,278521,,,,,,,278521,X,,,X,,,,0,0,,267498,,,,,,"{""TotalAmt"": ""0""}","{""TotalAmt"": ""0""}","{""TotalAmt"": ""105671"", ""ProgramServicesAmt"": ""69050"", ""ManagementAndGeneralAmt"": ""36621""}","{""TotalAmt"": ""1205"", ""ManagementAndGeneralAmt"": ""1205""}","{""TotalAmt"": ""1246"", ""ManagementAndGeneralAmt"": ""1246""}","{""TotalAmt"": ""8514"", ""ProgramServicesAmt"": ""5563"", ""ManagementAndGeneralAmt"": ""2951""}","{""TotalAmt"": ""12570"", ""ProgramServicesAmt"": ""4796"", ""ManagementAndGeneralAmt"": ""7774""}","{""TotalAmt"": ""403"", ""ProgramServicesAmt"": ""403""}","{""TotalAmt"": ""7491"", ""ProgramServicesAmt"": ""7491""}","{""TotalAmt"": ""0""}","{""TotalAmt"": ""46900"", ""ProgramServicesAmt"": ""46900""}","{""BOYAmt"": ""161335"", ""EOYAmt"": ""177591""}","{""EOYAmt"": ""0""}","{""EOYAmt"": ""0""}","{""BOYAmt"": ""14157"", ""EOYAmt"": ""14361""}",1994217,1131760,"{""BOYAmt"": ""890133"", ""EOYAmt"": ""862457""}","{""EOYAmt"": ""0""}","{""EOYAmt"": ""0""}","{""BOYAmt"": ""5023"", ""EOYAmt"": ""4867""}","{""BOYAmt"": ""30533"", ""EOYAmt"": ""28449""}",,"{""BOYAmt"": ""10910"", ""EOYAmt"": ""6288""}",,,,X,19178,,,,"{""TotalAmt"": ""0""}","{""TotalAmt"": ""0""}","{""TotalAmt"": ""0""}","{""TotalAmt"": ""0""}","{""TotalAmt"": ""0""}","{""TotalAmt"": ""0""}","{""TotalAmt"": ""0""}","{""TotalAmt"": ""0""}","{""TotalAmt"": ""0""}","{""TotalAmt"": ""24769"", ""ProgramServicesAmt"": ""24769""}","{""TotalAmt"": ""0""}","{""TotalAmt"": ""0""}","{""EOYAmt"": ""0""}","{""EOYAmt"": ""0""}","{""EOYAmt"": ""0""}","{""EOYAmt"": ""0""}","{""EOYAmt"": ""0""}","{""BOYAmt"": ""40065"", ""EOYAmt"": 
""45343""}",,,X,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2023-04-26 12:10:37Z,2024-05-23T14:48:24-07:00,2023-12-31,2023-01-01,"{'PersonNm': 'NICO WRIGHT', 'PersonTitleTxt': 'Director', 'SignatureDt': '2024-05-23', 'DiscussWithPaidPreparerInd': 'true'}",2023,2023,942831137,,,,"{'AddressLine1Txt': 'PO BOX 686', 'CityNm': 'BEN LOMOND', 'StateAbbreviationCd': 'CA', 'ZIPCd': '95005'}",,,{'BusinessNameLine1Txt': 'BEN LOMOND QUAKER CENTER ASSOCIATION'},BENL,8313368333,,
3469004,67f0a6febea7582f201aed1f,,https://s3.amazonaws.com/irs-form-990/202441449349301609_public.xml,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,MARK HANNAN,1705903,false,,X,1967,TX,RURAL WATER UTILITY SALES,7,7,12,0,0,1595931,21172,88800,1705903,0,0,474990,0,0,753989,1228979,476924,6765044,7147292,8509,6657209,7138783,,RURAL WATER UTILITY SALES,false,false,RURAL WATER UTILITY SALES,"""false""",,"""false""","""false""","""false""",,,"""false""","""false""",false,,,"""false""",13,12,false,7,7,false,false,false,false,true,true,true,true,true,false,false,true,true,true,true,false,false,false,,X,false,false,false,,,,,88800,"{""TotalRevenueColumnAmt"": ""1705903"", ""RelatedOrExemptFuncIncomeAmt"": ""1705903""}","{""TotalAmt"": ""2700"", ""ProgramServicesAmt"": ""2700""}","{""TotalAmt"": ""36779"", ""ProgramServicesAmt"": ""36779""}",,"{""TotalAmt"": ""1082"", ""ProgramServicesAmt"": ""1082""}","{""TotalAmt"": ""16711"", ""ProgramServicesAmt"": ""16711""}","[{""Desc"": ""REPAIRS"", ""TotalAmt"": ""124501"", ""ProgramServicesAmt"": ""124501""}, {""Desc"": ""UTILITIES"", ""TotalAmt"": ""100995"", ""ProgramServicesAmt"": ""100995""}, {""Desc"": ""CHEMICALS"", ""TotalAmt"": ""44522"", ""ProgramServicesAmt"": ""44522""}, {""Desc"": ""TRUCK EX...","{""TotalAmt"": ""82276"", ""ProgramServicesAmt"": ""82276""}","{""TotalAmt"": ""1228979"", ""ProgramServicesAmt"": ""1228979"", ""ManagementAndGeneralAmt"": ""0"", ""FundraisingAmt"": ""0""}","{""BOYAmt"": ""250"", ""EOYAmt"": ""250""}","{""BOYAmt"": ""6765044"", ""EOYAmt"": ""7147292""}",,"{""BOYAmt"": ""6448309"", ""EOYAmt"": ""6925233""}",476924,,true,false,false,WWW.BETHELASHWATER.COM,,,,1390094,13087,101660,1504841,,,457172,,726954,1184126,320715,107835,1228979,,1595931,,,,,1228979,X,true,true,X,,,,,,,1595931,,,,,,,,"{""TotalAmt"": 
""391042"", ""ProgramServicesAmt"": ""391042""}","{""TotalAmt"": ""5160"", ""ProgramServicesAmt"": ""5160""}","{""TotalAmt"": ""46199"", ""ProgramServicesAmt"": ""46199""}","{""TotalAmt"": ""32589"", ""ProgramServicesAmt"": ""32589""}","{""TotalAmt"": ""2486"", ""ProgramServicesAmt"": ""2486""}","{""TotalAmt"": ""216"", ""ProgramServicesAmt"": ""216""}","{""TotalAmt"": ""4646"", ""ProgramServicesAmt"": ""4646""}",,"{""TotalAmt"": ""294773"", ""ProgramServicesAmt"": ""294773""}","{""BOYAmt"": ""1518007"", ""EOYAmt"": ""2040834""}","{""BOYAmt"": ""114586"", ""EOYAmt"": ""150098""}","{""BOYAmt"": ""22747"", ""EOYAmt"": ""20764""}",,10148216,5234370,"{""BOYAmt"": ""5109454"", ""EOYAmt"": ""4913846""}",,,"{""BOYAmt"": ""7835"", ""EOYAmt"": ""8509""}",,,"{""BOYAmt"": ""100000""}",,,,X,,,true,,,,,,,,,,,,,,,,,,,"{""EOYAmt"": ""21500""}",,,X,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"{""@organization501cTypeTxt"": ""12"", ""#text"": ""X""}",,,,,,,,,,2023-04-26 12:10:37Z,2024-05-23T17:57:52-04:00,2023-12-31,2023-01-01,"{'PersonNm': 'MARK HANNAN', 'PersonTitleTxt': 'PRESIDENT', 'PhoneNum': '9036758466', 'SignatureDt': '2024-05-10', 'DiscussWithPaidPreparerInd': 'true'}",2023,2023,237162843,,,,"{'AddressLine1Txt': '6435 STATE HWY 19 N', 'CityNm': 'ATHENS', 'StateAbbreviationCd': 'TX', 'ZIPCd': '75752'}",,,{'BusinessNameLine1Txt': 'BETHEL ASH WATER SUPPLY CORPORATION'},BETH,9036758466,,
3469005,67f0a6febea7582f201aed20,,https://s3.amazonaws.com/irs-form-990/202441449349301614_public.xml,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,219723,false,X,X,2005,MD,"To operate exclusively for charitable, religious, scientific, literary or educational purposes within the meaning of section 501(c)(3) of the Internal Revenue code of 1986 (the ""Code""), including but not limited to providing assistance to partici...",6,6,0,0,168515,0,11091,128,179734,1618,0,0,0,0,183171,184789,-5055,319270,342112,0,319270,342112,,"To operate exclusively for charitable, religious, scientific, literary or educational purposes within the meaning of section 501(c)(3) of the Internal Revenue code of 1986 (the ""Code""), including but not limited to providing assistance to partici...",false,false,"Provided financial assistance to drug addicted offenders enrolled in the Adult & Mental Drug Courts, including rent (sober housing), medical and dental assistance, transportation, clothing, and other daily living expenses.","""false""","""false""","""false""","""false""","""false""","""false""","""false""","""false""","""false""",false,,"""false""","""false""",0,0,false,6,6,false,false,false,false,false,false,false,true,true,false,false,true,false,false,false,false,false,false,"""MD""",X,false,false,false,,,168515,168515,128,"{""TotalRevenueColumnAmt"": ""179734"", ""RelatedOrExemptFuncIncomeAmt"": ""11219"", ""UnrelatedBusinessRevenueAmt"": ""0"", ""ExclusionAmt"": ""0""}","{""TotalAmt"": ""717"", ""ProgramServicesAmt"": ""717"", ""ManagementAndGeneralAmt"": ""0"", ""FundraisingAmt"": ""0""}","{""TotalAmt"": ""323"", ""ProgramServicesAmt"": ""323"", ""ManagementAndGeneralAmt"": ""0"", ""FundraisingAmt"": ""0""}",,,"{""TotalAmt"": ""2006"", ""ProgramServicesAmt"": ""2006"", 
""ManagementAndGeneralAmt"": ""0"", ""FundraisingAmt"": ""0""}","[{""Desc"": ""Drug Offender Program - Financial Assistance"", ""TotalAmt"": ""165162"", ""ProgramServicesAmt"": ""165162"", ""ManagementAndGeneralAmt"": ""0"", ""FundraisingAmt"": ""0""}, {""Desc"": ""Gifts & Donations & Awards"", ""TotalAmt"": ""2134"", ""ProgramServicesAmt...","{""TotalAmt"": ""221"", ""ProgramServicesAmt"": ""221"", ""ManagementAndGeneralAmt"": ""0"", ""FundraisingAmt"": ""0""}","{""TotalAmt"": ""184789"", ""ProgramServicesAmt"": ""184789"", ""ManagementAndGeneralAmt"": ""0"", ""FundraisingAmt"": ""0""}",,"{""BOYAmt"": ""319270"", ""EOYAmt"": ""342112""}",,"{""BOYAmt"": ""319270"", ""EOYAmt"": ""342112""}",-5055,X,false,false,false,,0,0,114055,,12691,1951,128697,,,,,145196,145196,-16499,,165162,0,168515,"{""ExpenseAmt"": ""11913"", ""GrantAmt"": ""0"", ""RevenueAmt"": ""0"", ""Desc"": ""Provided the court with funds for trophies/awards and other support as incentives for individuals enrolled in Drug court as well as other miscellaneous financial support.""}","{""ExpenseAmt"": ""7714"", ""GrantAmt"": ""0"", ""RevenueAmt"": ""11219"", ""Desc"": ""Administrative expenses.""}",,,184789,X,,,X,0,0,0,0,0,,,,,,,,"{""TotalAmt"": ""1618"", ""ProgramServicesAmt"": ""1618""}",,,,,,,,"{""TotalAmt"": ""279"", ""ProgramServicesAmt"": ""279"", ""ManagementAndGeneralAmt"": ""0"", ""FundraisingAmt"": ""0""}",,"{""TotalAmt"": ""0"", ""ProgramServicesAmt"": ""0"", ""ManagementAndGeneralAmt"": ""0"", ""FundraisingAmt"": ""0""}","{""BOYAmt"": ""17374"", ""EOYAmt"": ""15131""}",,,,1239,1239,"{""BOYAmt"": ""0"", ""EOYAmt"": ""0""}",,,,,,,,,,X,27897,,,,,,,,,,,"{""TotalAmt"": ""2350"", ""ProgramServicesAmt"": ""2350"", ""ManagementAndGeneralAmt"": ""0"", ""FundraisingAmt"": ""0""}",,,,,,,,"{""BOYAmt"": ""301896"", ""EOYAmt"": ""326981""}",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2023-04-26 12:10:37Z,2024-05-23T17:20:23-07:00,2023-12-31,2023-01-01,"{'PersonNm': 'Pamela Q Harris', 'PersonTitleTxt': 
'President', 'PhoneNum': '2407779110', 'SignatureDt': '2024-02-17', 'DiscussWithPaidPreparerInd': 'true'}",2023,2023,651261203,,,,"{'AddressLine1Txt': '50 Maryland Avenue', 'CityNm': 'Rockville', 'StateAbbreviationCd': 'MD', 'ZIPCd': '208502303'}",,,"{'BusinessNameLine1Txt': ""Montgomery's Miracles Inc""}",MONT,2407779110,,
3469006,67f0a6febea7582f201aed21,,https://s3.amazonaws.com/irs-form-990/202441449349301654_public.xml,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,JANETH TRUJILLO,112938,false,X,X,,,TO EDUCATE CHILDREN AT RISK AND TO EMPOWER THE COMMUNITY AND FAMILY RELATIONS.,3,3,0,0,0,112938,0,0,112938,0,0,0,0,0,103200,103200,9738,,9738,0,,9738,X,TO EDUCATE CHILDREN AT RISK AND TO EMPOWER THE COMMUNITY AND FAMILY RELATIONS.,false,false,WE WERE ABLE TO OPEN THE SCHOOL AND HELP THE STUDENTS IN THE AREA.,"""false""","""false""","""false""","""false""","""false""","""false""","""false""","""false""","""false""",false,,"""false""","""false""",0,0,false,3,3,false,false,false,false,false,false,false,true,true,false,false,false,false,false,false,false,false,false,,X,false,false,false,,,,,,"{""TotalRevenueColumnAmt"": ""112938"", ""RelatedOrExemptFuncIncomeAmt"": ""112938""}",,,,,,,,"{""TotalAmt"": ""103200"", ""ProgramServicesAmt"": ""103200"", ""ManagementAndGeneralAmt"": ""0"", ""FundraisingAmt"": ""0""}","{""EOYAmt"": ""9738""}","{""BOYAmt"": ""0"", ""EOYAmt"": ""9738""}",,,9738,,false,false,,,,,,,,,,,,,,,,,,,,,,,"{""ExpenseAmt"": ""103200"", ""Desc"": ""TO EDUCATE CHILDREN AT RISK AND TO EMPOWER THE COMMUNITY AND FAMILY RELATIONS.""}",,103200,,,,,,,,,,,112938,,,,,,,,,,,,"{""TotalAmt"": ""71200"", ""ProgramServicesAmt"": ""71200""}",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"{""TotalAmt"": ""32000"", ""ProgramServicesAmt"": ""32000""}",,,,,,,,,103200,,X,,,,,,,,,,,,,,,,,,,,X,,,,,,,,,,,,,,,,,,,,,,,,,,,2023-04-26 12:10:37Z,2024-05-23T21:26:20-04:00,2023-12-31,2023-01-01,"{'PersonNm': 'JANETH TRUJILLO', 'PersonTitleTxt': 'PRESIDENT', 'PhoneNum': '9542401249', 'SignatureDt': '2024-05-09'}",2023,2023,922617060,,,,"{'AddressLine1Txt': '14900 SW 51ST ST', 'CityNm': 'DAVIE', 'StateAbbreviationCd': 'FL', 'ZIPCd': 
'33331'}",,,"{'BusinessNameLine1Txt': 'FUTURE LEARNING CHRISTIAN SCHOOL', 'BusinessNameLine2Txt': 'INC'}",FUTU,9542401249,,


In [75]:
df[x_columns].dtypes

OrganizationName                string[python]
AddressChange                   string[python]
NameOfPrincipalOfficerPerson    string[python]
Organization501c3               string[python]
WebSite                         string[python]
                                     ...      
ForeignAddress                  string[python]
InCareOfName                    string[python]
BusinessName                    string[python]
BusinessNameControlTxt          string[python]
InCareOfNm                      string[python]
Length: 99, dtype: object
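
The listing above shows the 99 columns in `x_columns` stored with pandas' nullable string type. This part of the notebook does not show how they were cast, so the following is only a minimal sketch of one way to do it, run on toy data. The frame `toy`, the list `toy_columns`, and the helper `to_nullable_string` are hypothetical names, not taken from the notebook. The motivation is that checkbox-style columns can mix values such as 'X', `True`, and `0`, and writing such mixed object columns to Parquet with pyarrow can fail.

```python
import pandas as pd

# Toy frame with the kind of mixed checkbox values seen in the filings
# (hypothetical data; in the notebook the real column list is x_columns).
toy = pd.DataFrame({
    "AddressChange": ["X", True, None, 0],
    "Organization501c3": ["X", None, "X", None],
})
toy_columns = ["AddressChange", "Organization501c3"]


def to_nullable_string(series: pd.Series) -> pd.Series:
    """Stringify every non-missing value and keep missing values as <NA>."""
    cleaned = series.map(lambda v: pd.NA if pd.isna(v) else str(v))
    return cleaned.astype("string")


for col in toy_columns:
    toy[col] = to_nullable_string(toy[col])

print(toy.dtypes)
```

Stringifying before the cast keeps the original markers intact (for example `True` becomes the text 'True'), so a later binarizing step can still tell the variants apart.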

#### Now re-save the file

In [73]:
%%time
import datetime

# Print wall-clock timestamps around the save so a long write can be tracked
# in addition to the %%time summary.
print("🕓 Save started:", datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))

# Step 3: Save the full DataFrame as snappy-compressed Parquet, without the index
#df_clean.to_parquet("D:/filings_full.parquet", engine="pyarrow", compression="snappy", index=False)
df.to_parquet("D:/all_filings_april_2025_all_controls_v2.parquet", engine="pyarrow", compression="snappy", index=False)

print("✅ Save completed:", datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))

🕓 Save started: 2025-04-16 16:27:10
✅ Save completed: 2025-04-16 16:34:21
CPU times: total: 4min 32s
Wall time: 7min 11s


# Traditional Import

In [81]:
%%time
import datetime
print("Current date and time : ", datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"), '\n')
# The pickle load is left commented out, so the counts below describe the DataFrame already in memory.
#df = pd.read_pickle('all NEW filings February 2024 - all control variables.pkl.gz', compression='gzip')
print('# of columns:', len(df.columns))
print('# of observations:', len(df))
df[:2]

Current date and time :  2025-04-16 21:16:01 

# of columns: 496
# of observations: 3469008
CPU times: total: 0 ns
Wall time: 1 ms


Unnamed: 0,_id,OrganizationName,URL,DLN,TaxPeriod,AddressChange,NameOfPrincipalOfficerPerson,GrossReceipts,GroupReturnForAffiliates,Organization501c3,WebSite,TypeOfOrganizationCorporation,YearFormation,StateLegalDomicile,ActivityOrMissionDescription,NbrVotingMembersGoverningBody,NbrIndependentVotingMembers,TotalNbrEmployees,TotalNbrVolunteers,TotalGrossUBI,NetUnrelatedBusinessTxblIncome,ContributionsGrantsPriorYear,ContributionsGrantsCurrentYear,ProgramServiceRevenuePriorYear,ProgramServiceRevenueCY,InvestmentIncomePriorYear,InvestmentIncomeCurrentYear,OtherRevenuePriorYear,OtherRevenueCurrentYear,TotalRevenuePriorYear,TotalRevenueCurrentYear,GrantsAndSimilarAmntsPriorYear,GrantsAndSimilarAmntsCY,BenefitsPaidToMembersPriorYear,BenefitsPaidToMembersCY,SalariesEtcPriorYear,SalariesEtcCurrentYear,TotalProfFundrsngExpPriorYear,TotalProfFundrsngExpCY,TotalFundrsngExpCurrentYear,OtherExpensePriorYear,OtherExpensesCurrentYear,TotalExpensesPriorYear,TotalExpensesCurrentYear,RevenuesLessExpensesPriorYear,RevenuesLessExpensesCY,TotalAssetsBOY,TotalAssetsEOY,TotalLiabilitiesBOY,TotalLiabilitiesEOY,NetAssetsOrFundBalancesBOY,NetAssetsOrFundBalancesEOY,InfoInScheduleOPartIII,MissionDescription,SignificantNewProgramServices,SignificantChange,Expense,Grants,Description,TotalProgramServiceExpense,PoliticalActivities,LobbyingActivities,ProfessionalFundraising,FundraisingActivities,Gaming,ExcessBenefitTransaction,PriorExcessBenefitTransaction,DisregardedEntity,RelatedEntity,RelatedOrgControlledEntity,TransactionRelatedEntity,TransfersToExemptNonChrtblOrg,ActivitiesConductedPartnership,NumberFormsTransmittedWith1096,NumberOfEmployees,UnrelatedBusinessIncome,InfoInScheduleOPartVI,NbrVotingGoverningBodyMembers,NumberIndependentVotingMembers,FamilyOrBusinessRelationship,DelegationOfManagementDuties,ChangesToOrganizingDocs,MaterialDiversionOrMisuse,MembersOrStockholders,ElectionOfBoardMembers,DecisionsSubjectToApproval,MinutesOfGoverningBody,MinutesOfCommittees,OfficerMailingAddress,Local
Chapters,Form990ProvidedToGoverningBody,ConflictOfInterestPolicy,AnnualDisclosureCoveredPersons,RegularMonitoringEnforcement,WhistleblowerPolicy,DocumentRetentionPolicy,CompensationProcessCEO,CompensationProcessOther,InvestmentInJointVenture,StatesWhereCopyOfReturnIsFiled,UponRequest,NoListedPersonsCompensated,TotalReportableCompFromOrg,TotalReportableCompFrmRltdOrgs,TotalOtherCompensation,NumberIndividualsGT100K,FormersListed,TotalCompGT150K,CompensationFromOtherSources,NumberOfContractorsGT100K,AllOtherContributions,TotalContributions,TotalOtherRevenue,TotalRevenue,GrantsToDomesticOrgs,GrantsToDomesticIndividuals,FeesForServicesLegal,FeesForServicesAccounting,OfficeExpenses,PaymentsToAffiliates,DepreciationDepletion,OtherExpenses,AllOtherExpenses,TotalFunctionalExpenses,SavingsAndTempCashInvestments,AccountsReceivable,LandBuildingsEquipmentBasis,LandBldgEquipmentAccumDeprec,LandBuildingsEquipmentBasisNet,InvestmentsOtherSecurities,TotalAssets,AccountsPayableAccruedExpenses,GrantsPayable,OtherLiabilities,FollowSFAS117,UnrestrictedNetAssets,InfoInScheduleOPartXI,ReconcilationRevenueExpenses,InfoInScheduleOPartXII,MethodOfAccountingAccrual,AccountantCompileOrReview,FSAudited,AuditCommittee,FederalGrantAuditRequired,AllAffiliatesIncluded,GroupExemptionNumber,Revenue,PoliciesReferenceChapters,WrittenPolicyOrProcedure,TotalProgramServiceRevenue,ForeignGrants,BenefitsToMembers,CompCurrentOfficersDirectors,CompDisqualPersons,OtherSalariesAndWages,PensionPlanContributions,OtherEmployeeBenefits,PayrollTaxes,FeesForServicesManagement,FeesForServicesLobbying,FeesForServicesProfFundraising,FeesForServicesInvstMgmntFees,FeesForServicesOther,Advertising,InformationTechnology,Royalties,Occupancy,Travel,TravelEntrtnmntPublicOfficials,ConferencesMeetings,Interest,Insurance,CashNonInterestBearing,PledgesAndGrantsReceivable,ReceivablesFromDisqualPersons,OtherNotesLoansReceivableNet,InventoriesForSaleOrUse,PrepaidExpensesDeferredCharges,InvestmentsPubTradedSecurities,InvestmentsProgra
mRelated,IntangibleAssets,OtherAssetsTotal,DeferredRevenue,MortNotesPyblSecuredInvestProp,FederalGrantAuditPerformed,LoansFromOfficersDirectors,MethodOfAccountingCash,Activity2,Activity3,InfoInScheduleOPartVII,TaxExemptBondLiabilities,TemporarilyRestrictedNetAssets,OtherWebsite,PermanentlyRestrictedNetAssets,FundraisingEvents,CntrbtnsRprtdFundraisingEvents,RelatedOrganizations,GrossIncomeFundraisingEvents,FundraisingDirectExpenses,FederatedCampaigns,GovernmentGrants,MethodOfAccountingOther,GrossSalesOfInventory,CostOfGoodsSold,DoNotFollowSFAS117,RetainedEarningsEndowmentEtc,InitialReturn,MembershipDues,GrossIncomeGaming,GamingDirectExpenses,NoncashContributions,InfoInScheduleOPartV,OwnWebsite,UnsecuredNotesLoansPayable,ActivityOther,TotalOfOtherProgramServiceExp,TotalOfOtherProgramServiceRev,EscrowAccountLiability,TotalOfOtherProgramServiceGrnt,TypeOfOrganizationOther,Organization501c,TypeOfOrganizationTrust,TypeOfOrganizationAssociation,CountryLegalDomicile,AmendedReturn,TypeOfOrgOtherDescription,TotalJointCosts,TerminatedReturn,TerminationOrContraction,ActivityCode,SpecialConditionDescription,Organization4947a1,InfoInScheduleOPartIX,ReconciliationUnrealizedInvest,ReconcilationPriorAdjustment,ReconcilationDonatedServices,ReconcilationInvestExpenses,InfoInScheduleOPartVIII,InfoInScheduleOPartX,PrincipalOfficerNm,GrossReceiptsAmt,GroupReturnForAffiliatesInd,Organization501c3Ind,TypeOfOrganizationCorpInd,FormationYr,LegalDomicileStateCd,ActivityOrMissionDesc,VotingMembersGoverningBodyCnt,VotingMembersIndependentCnt,TotalEmployeeCnt,TotalGrossUBIAmt,CYContributionsGrantsAmt,CYProgramServiceRevenueAmt,CYInvestmentIncomeAmt,CYOtherRevenueAmt,CYTotalRevenueAmt,CYGrantsAndSimilarPaidAmt,CYBenefitsPaidToMembersAmt,CYSalariesCompEmpBnftPaidAmt,CYTotalProfFndrsngExpnsAmt,CYTotalFundraisingExpenseAmt,CYOtherExpensesAmt,CYTotalExpensesAmt,CYRevenuesLessExpensesAmt,TotalAssetsBOYAmt,TotalAssetsEOYAmt,TotalLiabilitiesEOYAmt,NetAssetsOrFundBalancesBOYAmt,NetAssetsOrFundBalancesEOY
Amt,InfoInScheduleOPartIIIInd,MissionDesc,SignificantNewProgramSrvcInd,SignificantChangeInd,Desc,PoliticalCampaignActyInd,LobbyingActivitiesInd,ProfessionalFundraisingInd,FundraisingActivitiesInd,GamingActivitiesInd,EngagedInExcessBenefitTransInd,PYExcessBenefitTransInd,DisregardedEntityInd,RelatedEntityInd,RelatedOrganizationCtrlEntInd,TransactionWithControlEntInd,TrnsfrExmptNonChrtblRltdOrgInd,ActivitiesConductedPrtshpInd,IRPDocumentCnt,EmployeeCnt,UnrelatedBusIncmOverLimitInd,GoverningBodyVotingMembersCnt,IndependentVotingMemberCnt,FamilyOrBusinessRlnInd,DelegationOfMgmtDutiesInd,ChangeToOrgDocumentsInd,MaterialDiversionOrMisuseInd,MembersOrStockholdersInd,ElectionOfBoardMembersInd,DecisionsSubjectToApprovaInd,MinutesOfGoverningBodyInd,MinutesOfCommitteesInd,OfficerMailingAddressInd,LocalChaptersInd,Form990ProvidedToGvrnBodyInd,ConflictOfInterestPolicyInd,WhistleblowerPolicyInd,DocumentRetentionPolicyInd,CompensationProcessCEOInd,CompensationProcessOtherInd,InvestmentInJointVentureInd,StatesWhereCopyOfReturnIsFldCd,NoListedPersonsCompensatedInd,FormerOfcrEmployeesListedInd,TotalCompGreaterThan150KInd,CompensationFromOtherSrcsInd,MembershipDuesAmt,FundraisingAmt,AllOtherContributionsAmt,TotalContributionsAmt,OtherRevenueTotalAmt,TotalRevenueGrp,FeesForServicesAccountingGrp,OfficeExpensesGrp,InformationTechnologyGrp,ConferencesMeetingsGrp,InsuranceGrp,OtherExpensesGrp,AllOtherExpensesGrp,TotalFunctionalExpensesGrp,CashNonInterestBearingGrp,TotalAssetsGrp,OrgDoesNotFollowSFAS117Ind,RtnEarnEndowmentIncmOthFndsGrp,ReconcilationRevenueExpnssAmt,MethodOfAccountingCashInd,AccountantCompileOrReviewInd,FSAuditedInd,FederalGrantAuditRequiredInd,WebsiteAddressTxt,TotalVolunteersCnt,NetUnrelatedBusTxblIncmAmt,PYContributionsGrantsAmt,PYProgramServiceRevenueAmt,PYInvestmentIncomeAmt,PYOtherRevenueAmt,PYTotalRevenueAmt,PYGrantsAndSimilarPaidAmt,PYBenefitsPaidToMembersAmt,PYSalariesCompEmpBnftPaidAmt,PYTotalProfFndrsngExpnsAmt,PYOtherExpensesAmt,PYTotalExpensesAmt,PYRevenuesLess
ExpensesAmt,TotalLiabilitiesBOYAmt,ExpenseAmt,GrantAmt,RevenueAmt,ProgSrvcAccomActy2Grp,ProgSrvcAccomActy3Grp,ProgSrvcAccomActyOtherGrp,TotalOtherProgSrvcGrantAmt,TotalProgramServiceExpensesAmt,InfoInScheduleOPartVIInd,AnnualDisclosureCoveredPrsnInd,RegularMonitoringEnfrcInd,UponRequestInd,TotalReportableCompFromOrgAmt,TotReportableCompRltdOrgAmt,TotalOtherCompensationAmt,IndivRcvdGreaterThan100KCnt,CntrctRcvdGreaterThan100KCnt,GovernmentGrantsAmt,TotalProgramServiceRevenueAmt,FundraisingGrossIncomeAmt,ContriRptFundraisingEventAmt,FundraisingDirectExpensesAmt,GrossSalesOfInventoryAmt,CostOfGoodsSoldAmt,GrantsToDomesticIndividualsGrp,CompCurrentOfcrDirectorsGrp,OtherSalariesAndWagesGrp,PensionPlanContributionsGrp,OtherEmployeeBenefitsGrp,PayrollTaxesGrp,FeesForServicesOtherGrp,AdvertisingGrp,TravelGrp,InterestGrp,DepreciationDepletionGrp,SavingsAndTempCashInvstGrp,AccountsReceivableGrp,InventoriesForSaleOrUseGrp,PrepaidExpensesDefrdChargesGrp,LandBldgEquipCostOrOtherBssAmt,LandBldgEquipAccumDeprecAmt,LandBldgEquipBasisNetGrp,InvestmentsOtherSecuritiesGrp,IntangibleAssetsGrp,AccountsPayableAccrExpnssGrp,DeferredRevenueGrp,MortgNotesPyblScrdInvstPropGrp,OtherLiabilitiesGrp,OrganizationFollowsSFAS117Ind,UnrestrictedNetAssetsGrp,TemporarilyRstrNetAssetsGrp,InfoInScheduleOPartXIInd,NetUnrlzdGainsLossesInvstAmt,InfoInScheduleOPartXIIInd,AuditCommitteeInd,AllAffiliatesIncludedInd,GrantsToDomesticOrgsGrp,ForeignGrantsGrp,BenefitsToMembersGrp,CompDisqualPersonsGrp,FeesForServicesManagementGrp,FeesForServicesLegalGrp,FeesForServicesLobbyingGrp,FeesForSrvcInvstMgmntFeesGrp,RoyaltiesGrp,OccupancyGrp,PymtTravelEntrtnmntPubOfclGrp,PaymentsToAffiliatesGrp,PledgesAndGrantsReceivableGrp,RcvblFromDisqualifiedPrsnGrp,OthNotesLoansReceivableNetGrp,InvestmentsPubTradedSecGrp,InvestmentsProgramRelatedGrp,OtherAssetsTotalGrp,TotalOtherProgSrvcExpenseAmt,InfoInScheduleOPartVInd,MethodOfAccountingAccrualInd,NoncashContributionsAmt,GrantsPayableGrp,PermanentlyRstrNetAssetsGrp,TaxExemptBondLia
bilitiesGrp,EscrowAccountLiabilityGrp,LoansFromOfficersDirectorsGrp,UnsecuredNotesLoansPayableGrp,PriorPeriodAdjustmentsAmt,FederalGrantAuditPerformedInd,PoliciesReferenceChaptersInd,OtherWebsiteInd,AddressChangeInd,WrittenPolicyOrProcedureInd,RelatedOrganizationsAmt,TotalOtherProgSrvcRevenueAmt,OwnWebsiteInd,TotalJointCostsGrp,DonatedServicesAndUseFcltsAmt,LegalDomicileCountryCd,InfoInScheduleOPartIXInd,TypeOfOrganizationTrustInd,FinalReturnInd,ContractTerminationInd,InfoInScheduleOPartXInd,GroupExemptionNum,InfoInScheduleOPartVIIInd,FederatedCampaignsAmt,TypeOfOrganizationOtherInd,OtherOrganizationDsc,InfoInScheduleOPartVIIIInd,TypeOfOrganizationAssocInd,InitialReturnInd,GamingGrossIncomeAmt,GamingDirectExpensesAmt,MethodOfAccountingOtherInd,InvestmentExpenseAmt,Organization501cInd,Organization4947a1NotPFInd,AmendedReturnInd,SpecialConditionDesc,ActivityCd,Timestamp,TaxPeriodEndDate,TaxPeriodBeginDate,Officer,TaxYear,BuildTS,ReturnTs,TaxPeriodEndDt,TaxPeriodBeginDt,BusinessOfficerGrp,TaxYr,fiscal_year,EIN,Name,NameControl,Phone,USAddress,ForeignAddress,InCareOfName,BusinessName,BusinessNameControlTxt,PhoneNum,InCareOfNm,ForeignPhoneNum
0,5d019e6778ffca27b42818d7,RONALD MCDONALD HOUSE CHARITIES- PHILADELPHIA REGION INC,https://s3.amazonaws.com/irs-form-990/201113139349301301_public.xml,93493313013011,201012,X,MICHAEL ANTON,1473903,0,X,,X,1992,PA,MAKES GRANTS TO NON-PROFITS THAT DIRECTLY IMPROVE THE HEALTH AND WELL-BEING OF CHILDREN.,10,10,0,0.0,0,0.0,1044925.0,1439340,0,0,30447,33563,0.0,1000,1075372,1473903,638637.0,925000,0.0,0,0,0,0.0,0,195892,243131,459751,881768,1384751,193604,89152,1925215,2440859,171810,450430,1753405,1990429,X,"THE CORPORATION IS ORGANIZED AND WILL BE OPERATED EXCLUSIVELY FOR CHARITABLE, EDUCATIONAL AND SCIENTIFIC PURPOSES WITHIN THE MEANING OF SECTION 501(C)(3) OF THE INTERNAL REVENUE CODE. SUCH PURPOSES SHALL BE LIMITED TO PROVIDING SUPPORT AND FUNDIN...",0,0,1043744,925000.0,"RMHC OF THE PHILADELPHIA REGION, INC. GRANTS HUNDREDS OF THOUSANDS OF DOLLARS PER YEAR TO SUPPORT NON-PROFIT PROGRAMS THAT DIRECTLY IMPROVE THE HEALTH AND WELL-BEING OF CHILDREN. LOCALLY, RMHC SUPPORTS THE PHILADELPHIA, SOUTHERN NEW JERSEY AND DE...",1043744,"""0""","""0""","""0""","""0""","""0""","""0""","""0""","""0""","""0""","""0""","""0""","""0""","""0""",0,0,0,X,10,10,0,0,0,0,0,0,0,1,1,0,0,1,1,1,1,0,0,0,0,0,"[""PA"", ""NJ"", ""DE""]",X,X,0.0,0,0,0,0,0,0,0,1439340.0,1439340,1000,"{""TotalRevenueColumn"": ""1473903"", ""RelatedOrExemptFunctionIncome"": ""1000"", ""UnrelatedBusinessRevenue"": ""0"", ""ExclusionAmount"": ""33563""}","{""Total"": ""892000"", ""ProgramServices"": ""892000""}","{""Total"": ""33000"", ""ProgramServices"": ""33000""}","{""Total"": ""215"", ""ManagementAndGeneral"": ""215""}","{""Total"": ""21675"", ""ManagementAndGeneral"": ""21675""}","{""Total"": ""123"", ""ManagementAndGeneral"": ""123""}","{""Total"": ""118744"", ""ProgramServices"": ""118744""}","{""Total"": ""86228"", ""ManagementAndGeneral"": ""86228""}","[{""Description"": ""FUNDRAISING COSTS"", ""Total"": ""108311"", ""Fundraising"": ""108311""}, {""Description"": ""CANISTER COLLECTION FEE"", ""Total"": 
""81925"", ""Fundraising"": ""81925""}, {""Description"": ""PR/ADMINISTRATIVE SERVI"", ""Total"": ""34517"", ""ManagementAndGe...","{""Total"": ""763"", ""ManagementAndGeneral"": ""763""}","{""Total"": ""1384751"", ""ProgramServices"": ""1043744"", ""ManagementAndGeneral"": ""145115"", ""Fundraising"": ""195892""}","{""BOY"": ""332660"", ""EOY"": ""270700""}","{""BOY"": ""103412"", ""EOY"": ""147981""}",256845,86228,"{""BOY"": ""0"", ""EOY"": ""170617""}","{""BOY"": ""1489143"", ""EOY"": ""1851561""}","{""BOY"": ""1925215"", ""EOY"": ""2440859""}","{""BOY"": ""39670"", ""EOY"": ""44353""}","{""BOY"": ""80500"", ""EOY"": ""166000""}","{""BOY"": ""51640"", ""EOY"": ""240077""}",X,"{""BOY"": ""1753405"", ""EOY"": ""1990429""}",X,89152,X,X,0,1,1,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2011-11-09T06:41:09-06:00,2010-12-31,2010-01-01,"{'Name': 'ROBERT TRAA', 'Title': 'TREASURER', 'Phone': '8565826843', 'DateSigned': '2011-11-04', 'AuthorizeThirdParty': '1'}",2010,2016-02-24 21:20:13Z,,,,,,,232705170,"{'BusinessNameLine1': 'RONALD MCDONALD HOUSE CHARITIES-', 'BusinessNameLine2': 'PHILADELPHIA REGION INC'}",RONA,8565826843,"{'AddressLine1': '1525 VALLEY CENTER PARKWAY NO 300', 'City': 'BETHLEHEM', 'State': 'PA', 'ZIPCode': '18017'}",,,,,,,
1,5d019e6778ffca27b42818d8,TORRINGTON VOA ELDERLY HOUSING INC BELL PARK TOWER,https://s3.amazonaws.com/irs-form-990/201113139349301311_public.xml,93493313013111,201106,,,266420,false,X,,X,1993,WY,PROVIDE HOUSING FOR THE ELDERLY AND THE DISABLED UNDER SECTION 202 OF THE NATIONAL HOUSING ACT UNDER AN AGREEMENT WITH THE DEPARTMENT OF HUD.,19,13,0,,0,,,0,222839,265592,1425,828,,0,224264,266420,,0,,0,71405,82955,,0,0,189785,222550,261190,305505,-36926,-39085,1455332,1433342,17482,34577,1437850,1398765,,PROVIDE HOUSING FOR THE ELDERLY AND THE DISABLED UNDER SECTION 202 OF THE NATIONAL HOUSING ACT UNDER AN AGREEMENT WITH THE DEPARTMENT OF HUD.,false,false,276405,,PROVIDE HOUSING FOR THE ELDERLY AND THE DISABLED UNDER SECTION 202 OF THE NATIONAL HOUSING ACT UNDER AN AGREEMENT WITH THE DEPARTMENT OF HUD.,276405,"""false""","""false""","""false""","""false""","""false""","""false""","""false""","{""@referenceDocumentId"": "" IRS990ScheduleR"", ""#text"": ""true""}","{""@referenceDocumentId"": "" IRS990ScheduleR"", ""#text"": ""true""}","""false""","{""@referenceDocumentId"": "" IRS990ScheduleR"", ""#text"": ""false""}","{""@referenceDocumentId"": "" IRS990ScheduleR"", ""#text"": ""false""}","{""@referenceDocumentId"": "" IRS990ScheduleR"", ""#text"": ""false""}",0,0,false,X,19,13,true,true,false,false,false,true,true,true,true,false,false,true,true,true,true,false,false,true,true,false,,X,,,1180355,411648,0,true,true,false,0,,0,0,"{""TotalRevenueColumn"": ""266420"", ""RelatedOrExemptFunctionIncome"": ""266420""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""7500"", ""ManagementAndGeneral"": ""7500""}","{""Total"": ""14222"", ""ProgramServices"": ""14222""}","{""Total"": ""0""}","{""Total"": ""66166"", ""ProgramServices"": ""66166""}","[{""Description"": ""OPER. 
& MAINT."", ""Total"": ""46164"", ""ProgramServices"": ""46164""}, {""Description"": ""MISC TAXES"", ""Total"": ""298"", ""ProgramServices"": ""298""}, {""Description"": ""ADMINISTRATIVE"", ""Total"": ""12176"", ""ProgramServices"": ""12176""}]","{""Total"": ""0""}","{""Total"": ""305505"", ""ProgramServices"": ""276405"", ""ManagementAndGeneral"": ""29100"", ""Fundraising"": ""0""}","{""EOY"": ""0""}","{""BOY"": ""231"", ""EOY"": ""474""}",2187206,904332,"{""BOY"": ""1306860"", ""EOY"": ""1282874""}","{""BOY"": ""125980"", ""EOY"": ""102794""}","{""BOY"": ""1455332"", ""EOY"": ""1433342""}","{""BOY"": ""2040"", ""EOY"": ""16145""}",,"{""BOY"": ""9203"", ""EOY"": ""11349""}",X,"{""BOY"": ""1437850"", ""EOY"": ""1398765""}",,-39085,,X,false,true,true,true,"""false""",1736.0,266420.0,False,False,265592.0,"{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""59440"", ""ProgramServices"": ""59440""}","{""Total"": ""0""}","{""Total"": ""17714"", ""ProgramServices"": ""17714""}","{""Total"": ""5801"", ""ProgramServices"": ""5801""}","{""Total"": ""21600"", ""ManagementAndGeneral"": ""21600""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""8433"", ""ProgramServices"": ""8433""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""44077"", ""ProgramServices"": ""44077""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""806"", ""ProgramServices"": ""806""}","{""Total"": ""0""}","{""Total"": ""1108"", ""ProgramServices"": ""1108""}","{""BOY"": ""250"", ""EOY"": ""22261""}","{""EOY"": ""0""}","{""EOY"": ""0""}","{""EOY"": ""0""}","{""EOY"": ""0""}","{""BOY"": ""7628"", ""EOY"": ""7554""}","{""EOY"": ""0""}","{""EOY"": ""0""}","{""EOY"": ""0""}","{""BOY"": ""14383"", ""EOY"": ""17385""}","{""BOY"": ""20"", ""EOY"": ""48""}","{""BOY"": ""6219"", ""EOY"": 
""7035""}",True,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2011-11-09T07:32:06-08:00,2011-06-30,2010-07-01,"{'Name': 'THOMAS D TURNBULL', 'Title': 'ASST. SEC/TREAS', 'DateSigned': '2011-11-09'}",2010,2016-02-24 21:20:13Z,,,,,,,581805618,"{'BusinessNameLine1': 'TORRINGTON VOA ELDERLY HOUSING INC', 'BusinessNameLine2': 'BELL PARK TOWER'}",TORR,7033415000,"{'AddressLine1': '1660 DUKE STREET', 'City': 'ALEXANDRIA', 'State': 'VA', 'ZIPCode': '22314'}",,,,,,,


<br>Look at the last observation in the dataset.

In [82]:
df[-1:]

Unnamed: 0,_id,OrganizationName,URL,DLN,TaxPeriod,AddressChange,NameOfPrincipalOfficerPerson,GrossReceipts,GroupReturnForAffiliates,Organization501c3,WebSite,TypeOfOrganizationCorporation,YearFormation,StateLegalDomicile,ActivityOrMissionDescription,NbrVotingMembersGoverningBody,NbrIndependentVotingMembers,TotalNbrEmployees,TotalNbrVolunteers,TotalGrossUBI,NetUnrelatedBusinessTxblIncome,ContributionsGrantsPriorYear,ContributionsGrantsCurrentYear,ProgramServiceRevenuePriorYear,ProgramServiceRevenueCY,InvestmentIncomePriorYear,InvestmentIncomeCurrentYear,OtherRevenuePriorYear,OtherRevenueCurrentYear,TotalRevenuePriorYear,TotalRevenueCurrentYear,GrantsAndSimilarAmntsPriorYear,GrantsAndSimilarAmntsCY,BenefitsPaidToMembersPriorYear,BenefitsPaidToMembersCY,SalariesEtcPriorYear,SalariesEtcCurrentYear,TotalProfFundrsngExpPriorYear,TotalProfFundrsngExpCY,TotalFundrsngExpCurrentYear,OtherExpensePriorYear,OtherExpensesCurrentYear,TotalExpensesPriorYear,TotalExpensesCurrentYear,RevenuesLessExpensesPriorYear,RevenuesLessExpensesCY,TotalAssetsBOY,TotalAssetsEOY,TotalLiabilitiesBOY,TotalLiabilitiesEOY,NetAssetsOrFundBalancesBOY,NetAssetsOrFundBalancesEOY,InfoInScheduleOPartIII,MissionDescription,SignificantNewProgramServices,SignificantChange,Expense,Grants,Description,TotalProgramServiceExpense,PoliticalActivities,LobbyingActivities,ProfessionalFundraising,FundraisingActivities,Gaming,ExcessBenefitTransaction,PriorExcessBenefitTransaction,DisregardedEntity,RelatedEntity,RelatedOrgControlledEntity,TransactionRelatedEntity,TransfersToExemptNonChrtblOrg,ActivitiesConductedPartnership,NumberFormsTransmittedWith1096,NumberOfEmployees,UnrelatedBusinessIncome,InfoInScheduleOPartVI,NbrVotingGoverningBodyMembers,NumberIndependentVotingMembers,FamilyOrBusinessRelationship,DelegationOfManagementDuties,ChangesToOrganizingDocs,MaterialDiversionOrMisuse,MembersOrStockholders,ElectionOfBoardMembers,DecisionsSubjectToApproval,MinutesOfGoverningBody,MinutesOfCommittees,OfficerMailingAddress,Local
Chapters,Form990ProvidedToGoverningBody,ConflictOfInterestPolicy,AnnualDisclosureCoveredPersons,RegularMonitoringEnforcement,WhistleblowerPolicy,DocumentRetentionPolicy,CompensationProcessCEO,CompensationProcessOther,InvestmentInJointVenture,StatesWhereCopyOfReturnIsFiled,UponRequest,NoListedPersonsCompensated,TotalReportableCompFromOrg,TotalReportableCompFrmRltdOrgs,TotalOtherCompensation,NumberIndividualsGT100K,FormersListed,TotalCompGT150K,CompensationFromOtherSources,NumberOfContractorsGT100K,AllOtherContributions,TotalContributions,TotalOtherRevenue,TotalRevenue,GrantsToDomesticOrgs,GrantsToDomesticIndividuals,FeesForServicesLegal,FeesForServicesAccounting,OfficeExpenses,PaymentsToAffiliates,DepreciationDepletion,OtherExpenses,AllOtherExpenses,TotalFunctionalExpenses,SavingsAndTempCashInvestments,AccountsReceivable,LandBuildingsEquipmentBasis,LandBldgEquipmentAccumDeprec,LandBuildingsEquipmentBasisNet,InvestmentsOtherSecurities,TotalAssets,AccountsPayableAccruedExpenses,GrantsPayable,OtherLiabilities,FollowSFAS117,UnrestrictedNetAssets,InfoInScheduleOPartXI,ReconcilationRevenueExpenses,InfoInScheduleOPartXII,MethodOfAccountingAccrual,AccountantCompileOrReview,FSAudited,AuditCommittee,FederalGrantAuditRequired,AllAffiliatesIncluded,GroupExemptionNumber,Revenue,PoliciesReferenceChapters,WrittenPolicyOrProcedure,TotalProgramServiceRevenue,ForeignGrants,BenefitsToMembers,CompCurrentOfficersDirectors,CompDisqualPersons,OtherSalariesAndWages,PensionPlanContributions,OtherEmployeeBenefits,PayrollTaxes,FeesForServicesManagement,FeesForServicesLobbying,FeesForServicesProfFundraising,FeesForServicesInvstMgmntFees,FeesForServicesOther,Advertising,InformationTechnology,Royalties,Occupancy,Travel,TravelEntrtnmntPublicOfficials,ConferencesMeetings,Interest,Insurance,CashNonInterestBearing,PledgesAndGrantsReceivable,ReceivablesFromDisqualPersons,OtherNotesLoansReceivableNet,InventoriesForSaleOrUse,PrepaidExpensesDeferredCharges,InvestmentsPubTradedSecurities,InvestmentsProgra
mRelated,IntangibleAssets,OtherAssetsTotal,DeferredRevenue,MortNotesPyblSecuredInvestProp,FederalGrantAuditPerformed,LoansFromOfficersDirectors,MethodOfAccountingCash,Activity2,Activity3,InfoInScheduleOPartVII,TaxExemptBondLiabilities,TemporarilyRestrictedNetAssets,OtherWebsite,PermanentlyRestrictedNetAssets,FundraisingEvents,CntrbtnsRprtdFundraisingEvents,RelatedOrganizations,GrossIncomeFundraisingEvents,FundraisingDirectExpenses,FederatedCampaigns,GovernmentGrants,MethodOfAccountingOther,GrossSalesOfInventory,CostOfGoodsSold,DoNotFollowSFAS117,RetainedEarningsEndowmentEtc,InitialReturn,MembershipDues,GrossIncomeGaming,GamingDirectExpenses,NoncashContributions,InfoInScheduleOPartV,OwnWebsite,UnsecuredNotesLoansPayable,ActivityOther,TotalOfOtherProgramServiceExp,TotalOfOtherProgramServiceRev,EscrowAccountLiability,TotalOfOtherProgramServiceGrnt,TypeOfOrganizationOther,Organization501c,TypeOfOrganizationTrust,TypeOfOrganizationAssociation,CountryLegalDomicile,AmendedReturn,TypeOfOrgOtherDescription,TotalJointCosts,TerminatedReturn,TerminationOrContraction,ActivityCode,SpecialConditionDescription,Organization4947a1,InfoInScheduleOPartIX,ReconciliationUnrealizedInvest,ReconcilationPriorAdjustment,ReconcilationDonatedServices,ReconcilationInvestExpenses,InfoInScheduleOPartVIII,InfoInScheduleOPartX,PrincipalOfficerNm,GrossReceiptsAmt,GroupReturnForAffiliatesInd,Organization501c3Ind,TypeOfOrganizationCorpInd,FormationYr,LegalDomicileStateCd,ActivityOrMissionDesc,VotingMembersGoverningBodyCnt,VotingMembersIndependentCnt,TotalEmployeeCnt,TotalGrossUBIAmt,CYContributionsGrantsAmt,CYProgramServiceRevenueAmt,CYInvestmentIncomeAmt,CYOtherRevenueAmt,CYTotalRevenueAmt,CYGrantsAndSimilarPaidAmt,CYBenefitsPaidToMembersAmt,CYSalariesCompEmpBnftPaidAmt,CYTotalProfFndrsngExpnsAmt,CYTotalFundraisingExpenseAmt,CYOtherExpensesAmt,CYTotalExpensesAmt,CYRevenuesLessExpensesAmt,TotalAssetsBOYAmt,TotalAssetsEOYAmt,TotalLiabilitiesEOYAmt,NetAssetsOrFundBalancesBOYAmt,NetAssetsOrFundBalancesEOY
Amt,InfoInScheduleOPartIIIInd,MissionDesc,SignificantNewProgramSrvcInd,SignificantChangeInd,Desc,PoliticalCampaignActyInd,LobbyingActivitiesInd,ProfessionalFundraisingInd,FundraisingActivitiesInd,GamingActivitiesInd,EngagedInExcessBenefitTransInd,PYExcessBenefitTransInd,DisregardedEntityInd,RelatedEntityInd,RelatedOrganizationCtrlEntInd,TransactionWithControlEntInd,TrnsfrExmptNonChrtblRltdOrgInd,ActivitiesConductedPrtshpInd,IRPDocumentCnt,EmployeeCnt,UnrelatedBusIncmOverLimitInd,GoverningBodyVotingMembersCnt,IndependentVotingMemberCnt,FamilyOrBusinessRlnInd,DelegationOfMgmtDutiesInd,ChangeToOrgDocumentsInd,MaterialDiversionOrMisuseInd,MembersOrStockholdersInd,ElectionOfBoardMembersInd,DecisionsSubjectToApprovaInd,MinutesOfGoverningBodyInd,MinutesOfCommitteesInd,OfficerMailingAddressInd,LocalChaptersInd,Form990ProvidedToGvrnBodyInd,ConflictOfInterestPolicyInd,WhistleblowerPolicyInd,DocumentRetentionPolicyInd,CompensationProcessCEOInd,CompensationProcessOtherInd,InvestmentInJointVentureInd,StatesWhereCopyOfReturnIsFldCd,NoListedPersonsCompensatedInd,FormerOfcrEmployeesListedInd,TotalCompGreaterThan150KInd,CompensationFromOtherSrcsInd,MembershipDuesAmt,FundraisingAmt,AllOtherContributionsAmt,TotalContributionsAmt,OtherRevenueTotalAmt,TotalRevenueGrp,FeesForServicesAccountingGrp,OfficeExpensesGrp,InformationTechnologyGrp,ConferencesMeetingsGrp,InsuranceGrp,OtherExpensesGrp,AllOtherExpensesGrp,TotalFunctionalExpensesGrp,CashNonInterestBearingGrp,TotalAssetsGrp,OrgDoesNotFollowSFAS117Ind,RtnEarnEndowmentIncmOthFndsGrp,ReconcilationRevenueExpnssAmt,MethodOfAccountingCashInd,AccountantCompileOrReviewInd,FSAuditedInd,FederalGrantAuditRequiredInd,WebsiteAddressTxt,TotalVolunteersCnt,NetUnrelatedBusTxblIncmAmt,PYContributionsGrantsAmt,PYProgramServiceRevenueAmt,PYInvestmentIncomeAmt,PYOtherRevenueAmt,PYTotalRevenueAmt,PYGrantsAndSimilarPaidAmt,PYBenefitsPaidToMembersAmt,PYSalariesCompEmpBnftPaidAmt,PYTotalProfFndrsngExpnsAmt,PYOtherExpensesAmt,PYTotalExpensesAmt,PYRevenuesLess
ExpensesAmt,TotalLiabilitiesBOYAmt,ExpenseAmt,GrantAmt,RevenueAmt,ProgSrvcAccomActy2Grp,ProgSrvcAccomActy3Grp,ProgSrvcAccomActyOtherGrp,TotalOtherProgSrvcGrantAmt,TotalProgramServiceExpensesAmt,InfoInScheduleOPartVIInd,AnnualDisclosureCoveredPrsnInd,RegularMonitoringEnfrcInd,UponRequestInd,TotalReportableCompFromOrgAmt,TotReportableCompRltdOrgAmt,TotalOtherCompensationAmt,IndivRcvdGreaterThan100KCnt,CntrctRcvdGreaterThan100KCnt,GovernmentGrantsAmt,TotalProgramServiceRevenueAmt,FundraisingGrossIncomeAmt,ContriRptFundraisingEventAmt,FundraisingDirectExpensesAmt,GrossSalesOfInventoryAmt,CostOfGoodsSoldAmt,GrantsToDomesticIndividualsGrp,CompCurrentOfcrDirectorsGrp,OtherSalariesAndWagesGrp,PensionPlanContributionsGrp,OtherEmployeeBenefitsGrp,PayrollTaxesGrp,FeesForServicesOtherGrp,AdvertisingGrp,TravelGrp,InterestGrp,DepreciationDepletionGrp,SavingsAndTempCashInvstGrp,AccountsReceivableGrp,InventoriesForSaleOrUseGrp,PrepaidExpensesDefrdChargesGrp,LandBldgEquipCostOrOtherBssAmt,LandBldgEquipAccumDeprecAmt,LandBldgEquipBasisNetGrp,InvestmentsOtherSecuritiesGrp,IntangibleAssetsGrp,AccountsPayableAccrExpnssGrp,DeferredRevenueGrp,MortgNotesPyblScrdInvstPropGrp,OtherLiabilitiesGrp,OrganizationFollowsSFAS117Ind,UnrestrictedNetAssetsGrp,TemporarilyRstrNetAssetsGrp,InfoInScheduleOPartXIInd,NetUnrlzdGainsLossesInvstAmt,InfoInScheduleOPartXIIInd,AuditCommitteeInd,AllAffiliatesIncludedInd,GrantsToDomesticOrgsGrp,ForeignGrantsGrp,BenefitsToMembersGrp,CompDisqualPersonsGrp,FeesForServicesManagementGrp,FeesForServicesLegalGrp,FeesForServicesLobbyingGrp,FeesForSrvcInvstMgmntFeesGrp,RoyaltiesGrp,OccupancyGrp,PymtTravelEntrtnmntPubOfclGrp,PaymentsToAffiliatesGrp,PledgesAndGrantsReceivableGrp,RcvblFromDisqualifiedPrsnGrp,OthNotesLoansReceivableNetGrp,InvestmentsPubTradedSecGrp,InvestmentsProgramRelatedGrp,OtherAssetsTotalGrp,TotalOtherProgSrvcExpenseAmt,InfoInScheduleOPartVInd,MethodOfAccountingAccrualInd,NoncashContributionsAmt,GrantsPayableGrp,PermanentlyRstrNetAssetsGrp,TaxExemptBondLia
bilitiesGrp,EscrowAccountLiabilityGrp,LoansFromOfficersDirectorsGrp,UnsecuredNotesLoansPayableGrp,PriorPeriodAdjustmentsAmt,FederalGrantAuditPerformedInd,PoliciesReferenceChaptersInd,OtherWebsiteInd,AddressChangeInd,WrittenPolicyOrProcedureInd,RelatedOrganizationsAmt,TotalOtherProgSrvcRevenueAmt,OwnWebsiteInd,TotalJointCostsGrp,DonatedServicesAndUseFcltsAmt,LegalDomicileCountryCd,InfoInScheduleOPartIXInd,TypeOfOrganizationTrustInd,FinalReturnInd,ContractTerminationInd,InfoInScheduleOPartXInd,GroupExemptionNum,InfoInScheduleOPartVIIInd,FederatedCampaignsAmt,TypeOfOrganizationOtherInd,OtherOrganizationDsc,InfoInScheduleOPartVIIIInd,TypeOfOrganizationAssocInd,InitialReturnInd,GamingGrossIncomeAmt,GamingDirectExpensesAmt,MethodOfAccountingOtherInd,InvestmentExpenseAmt,Organization501cInd,Organization4947a1NotPFInd,AmendedReturnInd,SpecialConditionDesc,ActivityCd,Timestamp,TaxPeriodEndDate,TaxPeriodBeginDate,Officer,TaxYear,BuildTS,ReturnTs,TaxPeriodEndDt,TaxPeriodBeginDt,BusinessOfficerGrp,TaxYr,fiscal_year,EIN,Name,NameControl,Phone,USAddress,ForeignAddress,InCareOfName,BusinessName,BusinessNameControlTxt,PhoneNum,InCareOfNm,ForeignPhoneNum
3469007,67f0a6febea7582f201aed22,,https://s3.amazonaws.com/irs-form-990/202441449349301704_public.xml,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"{""TotalAmt"": ""0""}",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,ALEXIS RATHBORNE,4180132,False,X,X,2015,GA,TO USE INNOVATION TO CREATE SYSTEM CHANGE BY CATALYZING ECONOMIC GROWTH FOR BLACK AND LATINX COMMUNITIES THROUGH WOMEN ENTREPRENEURS.,7,7,19,0,4178351,1662,-43597,0,4136416,0,0,942906,0,375588,2301471,3244377,892039,1752817,2792833,189399,1711395,2603434,X,TO USE INNOVATION TO CREATE SYSTEM CHANGE BY CATALYZING ECONOMIC GROWTH FOR BLACK AND LATINX COMMUNITIES THROUGH WOMEN ENTREPRENEURS.,False,False,"BIG is a program designed for Black women and Latina founders who have already begun to build their startups and are still pre-revenue. Our BIG Pre-Accelerator includes access to resources to learn and apply the Lean Startup methods, discover how...","""false""","""false""","""false""","""false""","""false""","{""@referenceDocumentId"": ""IRS990ScheduleL"", ""#text"": ""false""}","{""@referenceDocumentId"": ""IRS990ScheduleL"", ""#text"": ""false""}","""false""","""false""",False,,"""false""","""false""",84,19,False,7,7,True,True,False,False,False,False,False,True,True,False,False,True,True,True,True,True,True,False,"[""GA"", ""NY""]",,False,False,False,,,4178351,4178351,0,"{""TotalRevenueColumnAmt"": ""4136416"", ""RelatedOrExemptFuncIncomeAmt"": ""1662"", ""ExclusionAmt"": ""-43597""}","{""TotalAmt"": ""130887"", ""ManagementAndGeneralAmt"": ""130887""}","{""TotalAmt"": ""39036"", ""ProgramServicesAmt"": ""14961"", ""ManagementAndGeneralAmt"": ""19573"", ""FundraisingAmt"": ""4502""}","{""TotalAmt"": ""223547"", ""ProgramServicesAmt"": ""177948"", ""ManagementAndGeneralAmt"": ""44230"", ""FundraisingAmt"": ""1369""}","{""TotalAmt"": ""73777"", ""ProgramServicesAmt"": 
""56284"", ""ManagementAndGeneralAmt"": ""524"", ""FundraisingAmt"": ""16969""}","{""TotalAmt"": ""12204"", ""ManagementAndGeneralAmt"": ""12204""}","[{""Desc"": ""SPECIAL PROJECTS"", ""TotalAmt"": ""275000"", ""ProgramServicesAmt"": ""275000""}, {""Desc"": ""DUES & SUBSCRIPTIONS"", ""TotalAmt"": ""16687"", ""ProgramServicesAmt"": ""2594"", ""ManagementAndGeneralAmt"": ""11033"", ""FundraisingAmt"": ""3060""}, {""Desc"": ""BAD ...","{""TotalAmt"": ""4165"", ""ManagementAndGeneralAmt"": ""4165""}","{""TotalAmt"": ""3244377"", ""ProgramServicesAmt"": ""2476408"", ""ManagementAndGeneralAmt"": ""392381"", ""FundraisingAmt"": ""375588""}","{""BOYAmt"": ""206982"", ""EOYAmt"": ""783983""}","{""BOYAmt"": ""1752817"", ""EOYAmt"": ""2792833""}",,,892039,,False,True,False,,7,,2055847,21768,62,0,2077677,0,0,591675,0,1390480,1982155,95522,41422,336105,,1662,"{""ExpenseAmt"": ""472863"", ""Desc"": ""COMMUNITY (BREAKTHROUGH): This growth-stage, five-session intensive program in partnership with JPMorgan Chase's Advancing Black Pathways, helps entrepreneurs scale their business past the five-figure mark and cr...","{""ExpenseAmt"": ""326157"", ""Desc"": ""RESEARCH: The organization produces original qualitative and quantitative research on the state of Latina and Black women founders in the U.S. and other territories. 
This is the first organization to identify dis...","[{""ExpenseAmt"": ""254385"", ""Desc"": ""DO YOU FELLOWSHIP""}, {""ExpenseAmt"": ""209971"", ""Desc"": ""START""}, {""ExpenseAmt"": ""64448"", ""Desc"": ""THOUGHT LEADERSHIP""}, {""ExpenseAmt"": ""812479"", ""Desc"": ""OTHER PROGRAMS (THE NEW C-SUITE, GENERAL PROGRAMS)""}]",,2476408,X,True,True,,121703,0,9077,1,4,,1662,0,,0,0,0,"{""TotalAmt"": ""0""}","{""TotalAmt"": ""0""}","{""TotalAmt"": ""815286"", ""ProgramServicesAmt"": ""636740"", ""ManagementAndGeneralAmt"": ""27216"", ""FundraisingAmt"": ""151330""}","{""TotalAmt"": ""0""}","{""TotalAmt"": ""57429"", ""ProgramServicesAmt"": ""43668"", ""ManagementAndGeneralAmt"": ""3383"", ""FundraisingAmt"": ""10378""}","{""TotalAmt"": ""70191"", ""ProgramServicesAmt"": ""52910"", ""ManagementAndGeneralAmt"": ""4931"", ""FundraisingAmt"": ""12350""}","{""TotalAmt"": ""945157"", ""ProgramServicesAmt"": ""728903"", ""ManagementAndGeneralAmt"": ""77699"", ""FundraisingAmt"": ""138555""}","{""TotalAmt"": ""141993"", ""ProgramServicesAmt"": ""140415"", ""ManagementAndGeneralAmt"": ""1578""}","{""TotalAmt"": ""227640"", ""ProgramServicesAmt"": ""193360"", ""FundraisingAmt"": ""34280""}","{""TotalAmt"": ""0""}","{""TotalAmt"": ""15755"", ""ManagementAndGeneralAmt"": ""15755""}","{""BOYAmt"": ""28402"", ""EOYAmt"": ""1050221""}","{""BOYAmt"": ""0"", ""EOYAmt"": ""0""}","{""BOYAmt"": ""0"", ""EOYAmt"": ""0""}","{""BOYAmt"": ""2937"", ""EOYAmt"": ""2937""}",0,0,"{""BOYAmt"": ""58325"", ""EOYAmt"": ""0""}","{""BOYAmt"": ""0"", ""EOYAmt"": ""0""}","{""BOYAmt"": ""7500"", ""EOYAmt"": ""7500""}","{""BOYAmt"": ""32814"", ""EOYAmt"": ""126944""}","{""BOYAmt"": ""0"", ""EOYAmt"": ""0""}","{""BOYAmt"": ""0"", ""EOYAmt"": ""0""}","{""BOYAmt"": ""8608"", ""EOYAmt"": ""62455""}",,,,,,,True,,"{""TotalAmt"": ""0""}","{""TotalAmt"": ""0""}","{""TotalAmt"": ""0""}","{""TotalAmt"": ""0""}","{""TotalAmt"": ""0""}","{""TotalAmt"": ""80589"", ""ProgramServicesAmt"": ""61550"", 
""ManagementAndGeneralAmt"": ""16244"", ""FundraisingAmt"": ""2795""}","{""TotalAmt"": ""0""}","{""TotalAmt"": ""0""}","{""TotalAmt"": ""0""}","{""TotalAmt"": ""98473"", ""ProgramServicesAmt"": ""92075"", ""ManagementAndGeneralAmt"": ""6398""}","{""TotalAmt"": ""0""}","{""TotalAmt"": ""0""}","{""BOYAmt"": ""1438671"", ""EOYAmt"": ""882483""}","{""BOYAmt"": ""0"", ""EOYAmt"": ""0""}","{""BOYAmt"": ""0"", ""EOYAmt"": ""0""}","{""BOYAmt"": ""0"", ""EOYAmt"": ""0""}","{""BOYAmt"": ""0"", ""EOYAmt"": ""0""}","{""BOYAmt"": ""10000"", ""EOYAmt"": ""65709""}",1341283,,X,,"{""BOYAmt"": ""0"", ""EOYAmt"": ""0""}",,"{""BOYAmt"": ""0"", ""EOYAmt"": ""0""}","{""BOYAmt"": ""0"", ""EOYAmt"": ""0""}","{""BOYAmt"": ""0"", ""EOYAmt"": ""0""}","{""BOYAmt"": ""0"", ""EOYAmt"": ""0""}",,,,X,X,,,,,,,,X,,,,X,,X,,,,,,,0,0,,,,,,,,,,,,,2023-04-26 12:10:37Z,2024-05-23T21:05:34-05:00,2022-12-31,2022-01-01,"{'PersonNm': 'ALEXIS RATHBORNE', 'PersonTitleTxt': 'DIRECTOR', 'PhoneNum': '2012924224', 'SignatureDt': '2024-06-15', 'DiscussWithPaidPreparerInd': 'true'}",2022,2022,472669712,,,,"{'AddressLine1Txt': '261 MADISON AVENUE FL 9 Ste 1040', 'CityNm': 'NEW YORK', 'StateAbbreviationCd': 'NY', 'ZIPCd': '10016'}",,,{'BusinessNameLine1Txt': 'DIDTECHNOLOGY INC'},DIDT,2012924224,% LEAH WILLIAMS,


### Create and save list of EINs for BMF File
- We will use this list in two later notebooks: *IRS Form 990 e-File Data (7a) -- Create combined BMF dataset (NTEE, MSA, etc) for ALL EINs.ipynb*  and  *IRS Form 990 e-File Data (7b) -- Merge BMF Data into 990 Data and Limit to 501(c)(3) orgs.ipynb*

<br>In the following code block, the first line creates a Python list containing all *EIN* values in the dataframe. The second line prints the number of items in the list, which equals the number of observations in *df*. The third line uses the ``set`` function to print the number of *unique* items in the list. The fourth line applies ``set`` to our list and converts the result back to a list, leaving only the 456,945 unique EINs in our dataframe. The last line confirms that the updated list contains the correct number of elements.

In [83]:
ein_list = df['EIN'].tolist()
print(len(ein_list))
print(len(set(ein_list)))
ein_list = list(set(ein_list))
print(len(ein_list))

3469008
456945
456945
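
As a side note, the same counts can be obtained directly from pandas. The block below is a minimal sketch on a made-up toy dataframe (the EIN values and the names `toy`, `n_rows`, `n_unique`, and `unique_eins` are illustrative assumptions, not part of this notebook's data). One difference from the ``set`` approach is that `drop_duplicates` keeps the EINs in first-seen order, whereas the order of a ``set`` is arbitrary.

```python
import pandas as pd

# Toy stand-in for the real dataframe; the EIN values are made up.
toy = pd.DataFrame({'EIN': ['111', '222', '111', '333', '222']})

n_rows = len(toy)                                    # number of filings
n_unique = toy['EIN'].nunique()                      # number of distinct EINs
unique_eins = toy['EIN'].drop_duplicates().tolist()  # keeps first-seen order

print(n_rows, n_unique, unique_eins)
```

Either approach should give the same set of EINs; the choice only matters if the order of the saved list is important later on.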


<br>Save list to a JSON file called `ein_list_2025.json`

In [84]:
import json
with open('ein_list_2025.json', 'w') as fp:
    json.dump(ein_list, fp)
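
A quick way to confirm the file was written correctly is to read it back in. The following is a small round-trip sketch that uses a temporary file and made-up values (`ein_list_demo` and the file name are assumptions for illustration, so the real `ein_list_2025.json` is not touched).

```python
import json
import os
import tempfile

# Made-up values standing in for the real list of EINs.
ein_list_demo = ['111', '222', '333']

# Write to a throwaway location rather than the real output file.
path = os.path.join(tempfile.mkdtemp(), 'ein_list_demo.json')
with open(path, 'w') as fp:
    json.dump(ein_list_demo, fp)

# Read the file back and compare with what we wrote.
with open(path) as fp:
    reloaded = json.load(fp)

round_trip_ok = sorted(reloaded) == sorted(ein_list_demo)
print(round_trip_ok)
```

Note that `json.dump` needs native Python types; building the list with `.tolist()`, as above, takes care of that.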

### Collapse *concordance* file
A strange artifact of the e-file data is that there are generally two different names for each 990 variable -- with the first few years of filings having different names than later years. What we are doing in the following code block is collapsing the *concordance* file by the 'variable_name_new' column so that there is one row per variable; ``variable_name_new`` is the standardized, more descriptive variable name that several 990 researchers (including me, Jesse Lecy, Nathan Grasse, Dan Neely, etc.) have agreed on. 

In [85]:
def agg_funcs(x):
    names = {
        #'name': x['variable_name_new'].head(1).values[0],
        'original_names':  list(set(x['MongoDB_Name'].tolist())),
        'data_type_xsd': x['data_type_xsd'].head(1).values[0],
        'binarize': x['BINARIZE'].head(1).values[0]
        }
    #THE FOLLOWING SHORTCUT WORKS BUT CHANGES THE ORDER OF THE COLUMNS
    #return pd.Series(names, index = list(names.keys()))
    return pd.Series(names, index=['original_names', 'data_type_xsd', 'binarize'])
new_variables_df = concordance[:].groupby(['variable_name_new']).apply(agg_funcs)
new_variables_df = new_variables_df.reset_index()
print('# of variables:', len(new_variables_df))
new_variables_df[:5]



# of variables: 288


Unnamed: 0,variable_name_new,original_names,data_type_xsd,binarize
0,F9_00_HD_ADDR_CHANGE,"[AddressChange, AddressChangeInd]",CheckboxType,binarize
1,F9_00_HD_AMENDED_RETURN,"[AmendedReturnInd, AmendedReturn]",CheckboxType,binarize
2,F9_00_HD_BUILD_TIME_STAMP,[BuildTS],TimestampType,
3,F9_00_HD_CTRY_OF_DOMICILE,"[CountryLegalDomicile, LegalDomicileCountryCd]",CountryType,
4,F9_00_HD_EXEMPT_STATUS_4847A1,"[Organization4947a1NotPFInd, Organization4947a1]",CheckboxType,binarize


#### New Way

In [91]:
%%time
new_variables_df = concordance.groupby('variable_name_new').agg(
    original_names=('MongoDB_Name', lambda x: list(set(x.tolist()))),
    data_type_xsd=('data_type_xsd', 'first'),
    binarize=('BINARIZE', 'first')
    ).reset_index()
print('# of variables:', len(new_variables_df))
new_variables_df[:5]

# of variables: 288
CPU times: total: 0 ns
Wall time: 17.9 ms


Unnamed: 0,variable_name_new,original_names,data_type_xsd,binarize
0,F9_00_HD_ADDR_CHANGE,"[AddressChange, AddressChangeInd]",CheckboxType,binarize
1,F9_00_HD_AMENDED_RETURN,"[AmendedReturnInd, AmendedReturn]",CheckboxType,binarize
2,F9_00_HD_BUILD_TIME_STAMP,[BuildTS],TimestampType,
3,F9_00_HD_CTRY_OF_DOMICILE,"[CountryLegalDomicile, LegalDomicileCountryCd]",CountryType,
4,F9_00_HD_EXEMPT_STATUS_4847A1,"[Organization4947a1NotPFInd, Organization4947a1]",CheckboxType,binarize


<br>This next block creates a variable that counts the number of 'locations' (xpaths in the original XML filings) for each variable. The ``value_counts( )`` output then tells us that there are 9 variables with only one location -- we will treat those separately. The other 279 variables have two locations (two different xpaths).

In [92]:
new_variables_df['len'] = new_variables_df['original_names'].apply(len)
print(new_variables_df['len'].value_counts(), '\n')
new_variables_df[:4]

len
2    279
1      9
Name: count, dtype: int64 



Unnamed: 0,variable_name_new,original_names,data_type_xsd,binarize,len
0,F9_00_HD_ADDR_CHANGE,"[AddressChange, AddressChangeInd]",CheckboxType,binarize,2
1,F9_00_HD_AMENDED_RETURN,"[AmendedReturnInd, AmendedReturn]",CheckboxType,binarize,2
2,F9_00_HD_BUILD_TIME_STAMP,[BuildTS],TimestampType,,1
3,F9_00_HD_CTRY_OF_DOMICILE,"[CountryLegalDomicile, LegalDomicileCountryCd]",CountryType,,2
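
To make the collapse and the location count concrete, here is a minimal sketch on a three-row toy concordance. The rows mimic the first entries shown above, but `toy_concordance` and `collapsed` are illustrative names, and I use `sorted(set(...))` and `.str.len()` here only to make the toy output deterministic and compact.

```python
import pandas as pd

# Three made-up concordance rows: one variable with two xpaths, one with a single xpath.
toy_concordance = pd.DataFrame({
    'variable_name_new': ['F9_00_HD_ADDR_CHANGE', 'F9_00_HD_ADDR_CHANGE',
                          'F9_00_HD_BUILD_TIME_STAMP'],
    'MongoDB_Name': ['AddressChange', 'AddressChangeInd', 'BuildTS'],
    'data_type_xsd': ['CheckboxType', 'CheckboxType', 'TimestampType'],
    'BINARIZE': ['binarize', 'binarize', None],
})

# One row per standardized variable name, with all original names gathered in a list.
collapsed = toy_concordance.groupby('variable_name_new').agg(
    original_names=('MongoDB_Name', lambda x: sorted(set(x))),
    data_type_xsd=('data_type_xsd', 'first'),
    binarize=('BINARIZE', 'first'),
).reset_index()

# Number of original names ('locations') per variable.
collapsed['len'] = collapsed['original_names'].str.len()
print(collapsed)
```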


### Handle variables with only 1 original name
NOTE:
- Per *IRS 990 e-File Data -- Control Variables (4) -- Fees-for-Services Variables  - Extract from MongoDB and Process -- Part I (Python 3.6).ipynb*, there is only one path for *FeesForServicesProfFundraising* (there is no *FeesForServicesProfFundraisingGrp*)
    - Instead, as seen in the concordance file, *FeesForServicesProfFundraising* has both a 'Total' and a 'TotalAmt' key, which suggests this is the only key that did not change names over time.
- We won't be renaming *TaxPeriod* and we will parse *F9_00_HD_FILER_STATE_US* in a subsequent notebook

In [93]:
new_variables_df[new_variables_df['len']!=2]

Unnamed: 0,variable_name_new,original_names,data_type_xsd,binarize,len
2,F9_00_HD_BUILD_TIME_STAMP,[BuildTS],TimestampType,,1
7,F9_00_HD_FILER_ADDR_US_L1,[Filer],StreetAddressType,,1
8,F9_00_HD_FILER_ADDR_US_L2,[Filer],StreetAddressType,,1
9,F9_00_HD_FILER_CITY_US,[Filer],CityType,,1
10,F9_00_HD_FILER_COUNTRY_FRGN,[Filer],CountryType,,1
11,F9_00_HD_FILER_STATE_US,[Filer],StateType,,1
12,F9_00_HD_FILER_ZIP_US,[Filer],ZIPCodeType,,1
205,F9_09_PC_FEES_FOR_SVCE_FR_TOT,[FeesForServicesProfFundraising],USAmountType,,1
286,TaxPeriod,[TaxPeriod],YearMonthType,,1


#### Rename *FeesForServicesProfFundraising*
Note that *describe* and *value_counts* won't work yet because some values are dictionaries

In [94]:
%%time
df.rename(columns = {'FeesForServicesProfFundraising':'F9_09_PC_FEES_FOR_SVCE_FR_TOT'}, inplace = True)
#df['F9_09_PC_FEES_FOR_SVCE_FR_TOT'].describe()
#df['F9_09_PC_FEES_FOR_SVCE_FR_TOT'].value_counts()[:5]

CPU times: total: 0 ns
Wall time: 14.5 ms
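
To illustrate why *value_counts* does not work yet, here is a small sketch on a made-up column that mixes dictionaries, strings, and missing values (`toy_col` and `type_counts` are hypothetical names). Dictionaries are unhashable, so counting the values directly typically raises a `TypeError`; counting the *types* of the values is one workaround for getting a quick overview.

```python
import pandas as pd

# Made-up column mixing dict values, a plain string, and a missing value.
toy_col = pd.Series([{'TotalAmt': '0'}, None, {'TotalAmt': '815286'}, '12204'])

try:
    toy_col.value_counts()
    hashing_failed = False
except TypeError:
    hashing_failed = True  # dict values cannot be hashed, so they cannot be counted directly

# Workaround: count the Python type of each value instead of the value itself.
type_counts = toy_col.map(lambda v: type(v).__name__).value_counts()
print(hashing_failed)
print(type_counts)
```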


#### Rename *BuildTS*

In [95]:
%%time
df.rename(columns = {'BuildTS':'F9_00_HD_BUILD_TIME_STAMP'}, inplace = True)

CPU times: total: 0 ns
Wall time: 996 µs


<br>Show above two variables (plus *Filer* and *TaxPeriod*) for sample of 5 rows

##### Update for XML
Replace *TaxPeriod* with *TaxPeriodEndDt* and replace *Filer* with the parsed *Filer* columns

In [96]:
%%time
#df[['F9_09_PC_FEES_FOR_SVCE_FR_TOT', 'F9_00_HD_BUILD_TIME_STAMP', 'Filer', 'TaxPeriod']].sample(5)
df[['F9_09_PC_FEES_FOR_SVCE_FR_TOT', 'F9_00_HD_BUILD_TIME_STAMP', 'TaxPeriodEndDt',
       'EIN', 'BusinessName', 'BusinessNameControlTxt', 'USAddress', 'PhoneNum', 'InCareOfNm',
       'ForeignAddress', 'ForeignPhoneNum']].sample(5)

CPU times: total: 766 ms
Wall time: 1.09 s


Unnamed: 0,F9_09_PC_FEES_FOR_SVCE_FR_TOT,F9_00_HD_BUILD_TIME_STAMP,TaxPeriodEndDt,EIN,BusinessName,BusinessNameControlTxt,USAddress,PhoneNum,InCareOfNm,ForeignAddress,ForeignPhoneNum
1624115,,2019-02-21 02:37:17Z,2018-06-30,113335036,{'BusinessNameLine1Txt': 'THE ENRICHMENT CENTER INC'},ENRI,"{'AddressLine1Txt': '750 HICKSVILLE ROAD', 'CityNm': 'SEAFORD', 'StateAbbreviationCd': 'NY', 'ZIPCd': '11783'}",5165206000.0,,,
1455427,,2018-06-14 16:35:46Z,2017-09-30,237417523,{'BusinessNameLine1Txt': 'AMERICAN SOCIETY OF INTERIOR DESIGN INC S CNTRL'},AMER,"{'AddressLine1Txt': '108 FELICIE DRIVE', 'CityNm': 'Lafayette', 'StateAbbreviationCd': 'LA', 'ZIPCd': '70506'}",,,,
2947254,,2023-04-26 12:10:37Z,2022-12-31,843705917,{'BusinessNameLine1Txt': 'MONTANA YOUTH DIABETES ALLIANCE INC'},MONT,"{'AddressLine1Txt': 'PO BOX 104', 'CityNm': 'COLUMBIA FALLS', 'StateAbbreviationCd': 'MT', 'ZIPCd': '59912'}",4064612185.0,,,
3035912,,2023-04-26 12:10:37Z,2022-12-31,251912781,{'BusinessNameLine1Txt': 'MOUNT CARMEL HEALTH INSURANCE COMPANY'},MOUN,"{'AddressLine1Txt': '3100 EASTON SQUARE PL 300', 'CityNm': 'COLUMBUS', 'StateAbbreviationCd': 'OH', 'ZIPCd': '432196290'}",6145464000.0,,,
728734,,2016-02-25 16:41:14Z,2014-06-30,205299027,{'BusinessNameLine1': 'MARYLAND CRIME VICTIMS RESOURCE FDN'},MARY,"{'AddressLine1': '1001 Prince Georges Blvd', 'City': 'Upper Marlboro', 'State': 'MD', 'ZIPCode': '20774'}",,,,


<br>Sidebar: generate separate dataset called *years* to get a rough sense of the spread of filings across years.

In [97]:
years = pd.DataFrame(df['fiscal_year'].value_counts())
years.index.name = 'year'
years = years.reset_index()
years = years.sort_values('year')
years

Unnamed: 0,year,count
15,2000,1
13,2001,2
14,2012,1
10,2013,103432
9,2014,210538
8,2015,228000
7,2016,240304
6,2017,251414
5,2018,261873
4,2019,276308
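
For reference, the same kind of table can be built in a single chained expression. This is only a sketch on a made-up `fiscal_year` column (`toy` and `years_demo` are illustrative names); `sort_index` sorts by year because `value_counts` puts the years in the index.

```python
import pandas as pd

# Made-up fiscal years standing in for df['fiscal_year'].
toy = pd.DataFrame({'fiscal_year': [2014, 2013, 2014, 2015, 2013, 2014]})

years_demo = (
    toy['fiscal_year']
    .value_counts()             # filings per year, with the year as the index
    .sort_index()               # order by year rather than by frequency
    .rename_axis('year')
    .reset_index(name='count')  # back to a two-column dataframe
)
print(years_demo)
```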


# Combine all columns where *len*==2

### Define Function to combine columns
In Python we can create a series of functions that can be used as shortcuts. First we'll create a function called ``combine`` that will combine two columns that are both related to the same 990 variable. It takes as *inputs* four things: our dataset/dataframe (*df*), the name we'd like for our new variable (*newvar*), the name of the first variable to combine (*var1*), and the name of the second variable to combine (*var2*).

In [28]:
#def combine(df, newvar, var1, var2):
#    df[newvar] = np.where(df[var1].notnull(), df[var1], df[var2])
#    #print(df[newvar].value_counts().head(), '\n')
#    #print('# of missing observations:', len(df[df[newvar].isnull()]))
#    #print('# of valid observations:', len(df[df[newvar].notnull()]), '\n')  
#    #return df.sample(5)[[newvar, var1, var2, 'DLN']] 
#    #print(df[[newvar, var1, var2, 'ObjectId']][:5], '\n\n\n')

##### New version of function for new XML filings

In [98]:
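def combine(df, newvar, var1, var2):
    if var1 in df.columns and var2 in df.columns:
        # Both names exist: take var1 where it is non-null, otherwise fall back to var2
        df[newvar] = np.where(df[var1].notnull(), df[var1], df[var2])
    elif var1 in df.columns:
        # Only var1 exists in this dataframe (np.nan replaces np.NaN, which was removed in NumPy 2.0)
        #print(df[var1].value_counts().head())
        df[newvar] = np.where(df[var1].notnull(), df[var1], np.nan)
    elif var2 in df.columns:
        # Only var2 exists in this dataframe
        #print(df[var2].value_counts().head())
        df[newvar] = np.where(df[var2].notnull(), df[var2], np.nan)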
    #print(df[newvar].value_counts().head(), '\n')
    #print('# of missing observations:', len(df[df[newvar].isnull()]))
    #print('# of valid observations:', len(df[df[newvar].notnull()]), '\n')  
    #return df.sample(5)[[newvar, var1, var2, 'DLN']] 
    #print(df[[newvar, var1, var2, 'ObjectId']][:5], '\n\n\n')
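
To see what the combining step does, here is a minimal sketch on a made-up three-row dataframe. The column names are borrowed from the *FSAudited* / *FSAuditedInd* pair discussed in this notebook, but the values are invented. It also shows that pandas' `combine_first` gives the same result as the `np.where` approach for this case; that is offered only as an alternative, not as the method used in this notebook.

```python
import numpy as np
import pandas as pd

# Made-up data: each row has a value in at most one of the two columns.
toy = pd.DataFrame({
    'FSAudited': ['true', None, None],
    'FSAuditedInd': [None, 'X', None],
})

# Same logic as combine(): first column where non-null, otherwise the second column.
via_where = pd.Series(
    np.where(toy['FSAudited'].notnull(), toy['FSAudited'], toy['FSAuditedInd']),
    index=toy.index,
)

# Alternative: combine_first fills the gaps in the first column from the second.
via_combine_first = toy['FSAudited'].combine_first(toy['FSAuditedInd'])

print(via_where.tolist())
print(via_combine_first.tolist())
```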

#### Do initial check to ensure that no row has values in both columns
In the following code block we set up a ``loop``, a highly useful coding tactic. Note that the last line of the loop prints the number of observations in the dataset that are non-empty for both xpaths of a given variable, such as *FSAudited* and *FSAuditedInd* for our variable *F9_12_PC_FINCL_STMTS_AUDITED*. While we are fairly certain that *FSAudited* and *FSAuditedInd* are mutually exclusive (because the former is used in earlier years and the latter in later years), we need to run these kinds of verifications in order to have faith in our data processing.

##### The original code block below (now commented out) fails when only one of the two variable names exists in the dataframe -- an updated version appears further below (but is not really needed)

In [42]:
#for index, row in new_variables_df[new_variables_df['len']==2][:].iterrows():
#    #print(row['variable_name_new'])
#    print(row['variable_name_new'], row['original_names'][0], row['original_names'][1])
#    print('\t\t', len(df[df[row['original_names'][0]].notnull()]))
#    print('\t\t', len(df[df[row['original_names'][1]].notnull()]))
#    #print(len(df[(df[row['original_names'][0]].isnull()) & (df[row['original_names'][1]].isnull())]), '\n\n')      
#    print('OK IF ZERO:', len(df[(df[row['original_names'][0]].notnull()) & (df[row['original_names'][1]].notnull())]), '\n\n')

In [99]:
new_variables_df[:10]

Unnamed: 0,variable_name_new,original_names,data_type_xsd,binarize,len
0,F9_00_HD_ADDR_CHANGE,"[AddressChange, AddressChangeInd]",CheckboxType,binarize,2
1,F9_00_HD_AMENDED_RETURN,"[AmendedReturnInd, AmendedReturn]",CheckboxType,binarize,2
2,F9_00_HD_BUILD_TIME_STAMP,[BuildTS],TimestampType,,1
3,F9_00_HD_CTRY_OF_DOMICILE,"[CountryLegalDomicile, LegalDomicileCountryCd]",CountryType,,2
4,F9_00_HD_EXEMPT_STATUS_4847A1,"[Organization4947a1NotPFInd, Organization4947a1]",CheckboxType,binarize,2
5,F9_00_HD_EXEMPT_STATUS_501C,"[Organization501c, Organization501cInd]",CheckboxType,binarize_with_dict,2
6,F9_00_HD_EXEMPT_STATUS_501C3,"[Organization501c3, Organization501c3Ind]",CheckboxType,binarize,2
7,F9_00_HD_FILER_ADDR_US_L1,[Filer],StreetAddressType,,1
8,F9_00_HD_FILER_ADDR_US_L2,[Filer],StreetAddressType,,1
9,F9_00_HD_FILER_CITY_US,[Filer],CityType,,1


# Better way
This approach:

- First collects all the column pairs that need to be checked
- Then processes each pair with vectorized pandas operations on whole columns
- Summarizes the results in a single DataFrame

The key performance improvement comes from using vectorized methods like `.notnull().sum()` instead of filtering with `len(df[df[...]])` inside a loop, which forces pandas to create a new temporary DataFrame for each check.

### Replace 'NaN' (string) with `np.nan`

In [108]:
%%time
# Before replacement, check how many string 'NaN' values you have
string_nan_count = (df == 'NaN').sum().sum()
print(f"Total string 'NaN' values in dataset: {string_nan_count}")

Total string 'NaN' values in dataset: 172804630


In [109]:
%%time
# Replace string 'NaN' with np.nan throughout the entire dataframe
df = df.replace('NaN', np.nan)

CPU times: total: 1min 43s
Wall time: 1min 54s


In [110]:
%%time
# Verify the replacement
post_replacement_count = (df == 'NaN').sum().sum()
print(f"Remaining string 'NaN' values: {post_replacement_count}")

Remaining string 'NaN' values: 0
CPU times: total: 1min
Wall time: 1min 5s


In [None]:
%%time
# Also check if there are other string variants of NaN
#variants = ['nan', 'Nan', 'NAN', '<NA>']
#for variant in variants:
#    count = (df == variant).sum().sum()
#    if count > 0:
#        print(f"Found {count} instances of '{variant}'")
#        df = df.replace(variant, np.nan)

One set of columns has 'nan': 
```python
print(len(df[df['Officer'].notnull()&df['BusinessOfficerGrp'].notnull()]))
```

In [116]:
%%time
df = df.replace('nan', np.nan)

CPU times: total: 1min 38s
Wall time: 1min 49s
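
Since each full-dataframe `replace` takes a couple of minutes here, the string variants could in principle be handled in a single pass by giving `replace` a list of values. The following is only a sketch on a made-up dataframe (`toy`, `placeholders`, and `cleaned` are hypothetical names, and I have not timed this on the full dataset).

```python
import numpy as np
import pandas as pd

# Made-up data containing both string variants of a missing value.
toy = pd.DataFrame({
    'Officer': ['nan', 'JANE DOE', 'NaN'],
    'GrossReceiptsAmt': ['NaN', '892039', '0'],
})

placeholders = ['NaN', 'nan']

# Count the placeholder strings, replace them all in one call, then count again.
before = int(toy.isin(placeholders).sum().sum())
cleaned = toy.replace(placeholders, np.nan)
after = int(cleaned.isin(placeholders).sum().sum())

print(before, after)
```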


#### Updated code

In [117]:
%%time
# Create a list of all column pairs to check
column_pairs = []
for index, row in new_variables_df[new_variables_df['len']==2].iterrows():
    col1, col2 = row['original_names'][0], row['original_names'][1]
    var_name = row['variable_name_new']
    
    # Check if both columns exist in the DataFrame
    if col1 in df.columns and col2 in df.columns:
        column_pairs.append((var_name, col1, col2))

# Now process all pairs at once using vectorized operations
results = []
for var_name, col1, col2 in column_pairs:
    # Count non-null values in each column
    count1 = df[col1].notnull().sum()
    count2 = df[col2].notnull().sum()
    
    # Count rows where BOTH columns have values (should be zero)
    overlap_count = (df[col1].notnull() & df[col2].notnull()).sum()
    
    results.append({
        'variable_name_new': var_name,
        'column1': col1,
        'column2': col2,
        'count1': count1,
        'count2': count2,
        'overlap': overlap_count
    })

# Convert to DataFrame for easy viewing
results_df = pd.DataFrame(results)
results_df[:5]

CPU times: total: 1min 39s
Wall time: 1min 46s


Unnamed: 0,variable_name_new,column1,column2,count1,count2,overlap
0,F9_00_HD_ADDR_CHANGE,AddressChange,AddressChangeInd,19701,118486,0
1,F9_00_HD_AMENDED_RETURN,AmendedReturnInd,AmendedReturn,36676,4675,0
2,F9_00_HD_CTRY_OF_DOMICILE,CountryLegalDomicile,LegalDomicileCountryCd,272,2207,0
3,F9_00_HD_EXEMPT_STATUS_4847A1,Organization4947a1NotPFInd,Organization4947a1,1724,737,0
4,F9_00_HD_EXEMPT_STATUS_501C,Organization501c,Organization501cInd,145796,711309,0


In [118]:
results_df[results_df['overlap']>0]

Unnamed: 0,variable_name_new,column1,column2,count1,count2,overlap
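
To show what a non-zero overlap would look like, and that summing a boolean mask gives the same number as filtering and taking `len`, here is a toy sketch with one deliberately overlapping row (the values are made up; only the column names come from the concordance).

```python
import pandas as pd

# Made-up data; the last row has values in BOTH columns on purpose.
toy = pd.DataFrame({
    'AddressChange': ['X', None, None, 'X'],
    'AddressChangeInd': [None, 'true', None, 'true'],
})

# Boolean mask that is True where both columns are non-null.
both = toy['AddressChange'].notnull() & toy['AddressChangeInd'].notnull()

overlap_vectorized = int(both.sum())  # sum of a boolean mask counts the True values
overlap_filtered = len(toy[both])     # same count, but builds a temporary dataframe first

print(overlap_vectorized, overlap_filtered)
```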


In [120]:
#print(len(df[df['Officer'].notnull()&df['BusinessOfficerGrp'].notnull()]))

In [119]:
df[df['Officer'].notnull()&df['BusinessOfficerGrp'].notnull()][:5][['Officer', 'BusinessOfficerGrp']]

Unnamed: 0,Officer,BusinessOfficerGrp


### Old Way

In [26]:
%%time
for index, row in new_variables_df[new_variables_df['len']==2][:].iterrows():
    print(row['variable_name_new'], row['original_names'][0], row['original_names'][1])
    if (row['original_names'][0] in df.columns and row['original_names'][1] in df.columns):
        print('yes')
        print('\t\t', len(df[df[row['original_names'][0]].notnull()]))
        print('\t\t', len(df[df[row['original_names'][1]].notnull()]))
        print('OK IF ZERO:', len(df[(df[row['original_names'][0]].notnull()) & (df[row['original_names'][1]].notnull())]), '\n\n')
    elif row['original_names'][0] in df.columns:
        print('only', row['original_names'][0] )
        print('\t\t', len(df[df[row['original_names'][0]].notnull()]))
    elif row['original_names'][1] in df.columns:
        print('only', row['original_names'][1] )   
        print('\t\t', len(df[df[row['original_names'][1]].notnull()]))

F9_00_HD_ADDR_CHANGE AddressChange AddressChangeInd
only AddressChangeInd
		 36528
F9_00_HD_AMENDED_RETURN AmendedReturn AmendedReturnInd
only AmendedReturnInd
		 14148
F9_00_HD_CTRY_OF_DOMICILE LegalDomicileCountryCd CountryLegalDomicile
only LegalDomicileCountryCd
		 878
F9_00_HD_EXEMPT_STATUS_4847A1 Organization4947a1 Organization4947a1NotPFInd
only Organization4947a1NotPFInd
		 573
F9_00_HD_EXEMPT_STATUS_501C Organization501cInd Organization501c
only Organization501cInd
		 226380
F9_00_HD_EXEMPT_STATUS_501C3 Organization501c3 Organization501c3Ind
only Organization501c3Ind
		 665027
F9_00_HD_FINAL_RETURN FinalReturnInd TerminatedReturn
only FinalReturnInd
		 5153
F9_00_HD_GROSS_EXEMPT_NUM GroupExemptionNumber GroupExemptionNum
only GroupExemptionNum
		 29605
F9_00_HD_GROSS_RCPT GrossReceipts GrossReceiptsAmt
only GrossReceiptsAmt
		 891980
F9_00_HD_GROUP_RETURN GroupReturnForAffiliatesInd GroupReturnForAffiliates
only GroupReturnForAffiliatesInd
		 891980
F9_00_HD_INCLUDES_SUBORD_OR

		 187714
F9_03_PC_PROG_SVC_ACC_3_DESC Activity3 ProgSrvcAccomActy3Grp
only ProgSrvcAccomActy3Grp
		 187714
F9_03_PC_PROG_SVC_ACC_3_EXP Activity3 ProgSrvcAccomActy3Grp
only ProgSrvcAccomActy3Grp
		 187714
F9_03_PC_PROG_SVC_ACC_3_GRNT Activity3 ProgSrvcAccomActy3Grp
only ProgSrvcAccomActy3Grp
		 187714
F9_03_PC_PROG_SVC_ACC_3_REV Activity3 ProgSrvcAccomActy3Grp
only ProgSrvcAccomActy3Grp
		 187714
F9_03_PC_TOT_OTH_PROG_SVC_EXP TotalOfOtherProgramServiceExp TotalOtherProgSrvcExpenseAmt
only TotalOtherProgSrvcExpenseAmt
		 137384
F9_03_PC_TOT_OTH_PROG_SVC_GRNT TotalOfOtherProgramServiceGrnt TotalOtherProgSrvcGrantAmt
only TotalOtherProgSrvcGrantAmt
		 72474
F9_03_PC_TOT_OTH_PROG_SVC_REV TotalOtherProgSrvcRevenueAmt TotalOfOtherProgramServiceRev
only TotalOtherProgSrvcRevenueAmt
		 84315
F9_03_PC_TOT_PROG_SVC_EXPENSE TotalProgramServiceExpensesAmt TotalProgramServiceExpense
only TotalProgramServiceExpensesAmt
		 759368
F9_03_PZ_MISSION_DESCRIPTION MissionDescription MissionDesc
only Missio

		 611416
F9_08_PC_CONTS_REPRTD_FNDRAISNG CntrbtnsRprtdFundraisingEvents ContriRptFundraisingEventAmt
only ContriRptFundraisingEventAmt
		 150470
F9_08_PC_COST_OF_GOODS_SOLD CostOfGoodsSold CostOfGoodsSoldAmt
only CostOfGoodsSoldAmt
		 135694
F9_08_PC_FEDERATED_CAMPAIGNS FederatedCampaigns FederatedCampaignsAmt
only FederatedCampaignsAmt
		 77700
F9_08_PC_FUNDRAISING_DIRECT_EXP FundraisingDirectExpensesAmt FundraisingDirectExpenses
only FundraisingDirectExpensesAmt
		 213137
F9_08_PC_FUNDRAISING_EVENTS FundraisingAmt FundraisingEvents
only FundraisingAmt
		 155291
F9_08_PC_FUNDRAISING_GROSS_INC FundraisingGrossIncomeAmt GrossIncomeFundraisingEvents
only FundraisingGrossIncomeAmt
		 221484
F9_08_PC_GAMING_DIRECT_EXPENSES GamingDirectExpenses GamingDirectExpensesAmt
only GamingDirectExpensesAmt
		 75294
F9_08_PC_GAMING_GROSS_INCOME GrossIncomeGaming GamingGrossIncomeAmt
only GamingGrossIncomeAmt
		 77463
F9_08_PC_GOVERNMENT_GRANTS GovernmentGrantsAmt GovernmentGrants
only GovernmentGrant

		 891980
F9_10_ASSETS_ACC_NET_EOY AccountsReceivable AccountsReceivableGrp
only AccountsReceivableGrp
		 477612
F9_10_ASSETS_EXP_PREPAID_EOY PrepaidExpensesDeferredCharges PrepaidExpensesDefrdChargesGrp
only PrepaidExpensesDefrdChargesGrp
		 465123
F9_10_ASSETS_INTANGIB_EOY IntangibleAssets IntangibleAssetsGrp
only IntangibleAssetsGrp
		 201674
F9_10_ASSETS_INVENT_SALE_EOY InventoriesForSaleOrUse InventoriesForSaleOrUseGrp
only InventoriesForSaleOrUseGrp
		 278199
F9_10_ASSETS_LESS_DEPREC_EOY LandBldgEquipBasisNetGrp LandBuildingsEquipmentBasisNet
only LandBldgEquipBasisNetGrp
		 646771
F9_10_ASSETS_LOANS_DISQUAL_EOY ReceivablesFromDisqualPersons RcvblFromDisqualifiedPrsnGrp
only RcvblFromDisqualifiedPrsnGrp
		 177139
F9_10_ASSETS_NOTES_LOANS_NET_EOY OtherNotesLoansReceivableNet OthNotesLoansReceivableNetGrp
only OthNotesLoansReceivableNetGrp
		 220347
F9_10_ASSETS_OTH_EOY OtherAssetsTotal OtherAssetsTotalGrp
only OtherAssetsTotalGrp
		 430652
F9_10_ASSETS_PLEDGES_NET_EOY PledgesAndGr

### Combine
Now we can loop over every row in ``new_variables_df`` and combine the relevant variables. We are getting some 'performance warnings' here but we can ignore those. We do need to check the *combo_fails* output at the very end, however. We are hoping for an *empty list* - that is, ``[]``. If a variable is listed there it means the combining failed.
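
If the performance warnings are pandas' "DataFrame is highly fragmented" warning (an assumption on my part, since the full warning text is not reproduced here), one way to avoid them is to build all the new columns first and attach them with a single `pd.concat`. The sketch below uses a made-up two-column dataframe and a hypothetical `pairs` list; it also records variables for which neither original column exists, a case the `combine` function skips silently.

```python
import pandas as pd

# Made-up data; only the newer '...Ind' names are present, as in the newer XML filings.
toy = pd.DataFrame({
    'AddressChangeInd': ['X', None],
    'AmendedReturnInd': [None, 'X'],
})

# (new name, var1, var2) triples, in the spirit of new_variables_df.
pairs = [
    ('F9_00_HD_ADDR_CHANGE', 'AddressChange', 'AddressChangeInd'),
    ('F9_00_HD_AMENDED_RETURN', 'AmendedReturn', 'AmendedReturnInd'),
]

new_cols = {}
missing = []  # variables for which neither original column exists
for newvar, var1, var2 in pairs:
    present = [c for c in (var1, var2) if c in toy.columns]
    if not present:
        missing.append(newvar)
        continue
    combined = toy[present[0]]
    for c in present[1:]:
        combined = combined.combine_first(toy[c])
    new_cols[newvar] = combined

# Attach all new columns at once instead of inserting them one by one.
toy = pd.concat([toy, pd.DataFrame(new_cols, index=toy.index)], axis=1)
print(toy.columns.tolist(), missing)
```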

In [27]:
%%time
combo_fails = []
for index, row in new_variables_df[new_variables_df['len']==2][:].iterrows():
    print(row['variable_name_new'], row['original_names'][0], row['original_names'][1])
    try:
        combine(df, row['variable_name_new'], row['original_names'][0], row['original_names'][1])
    except Exception as e:  # a bare except would also swallow KeyboardInterrupt
        print('\n\n\n\n\n***********issue with variable: ', row['variable_name_new'], '-', e)
        combo_fails.append(row['variable_name_new'])

print(combo_fails)

F9_00_HD_ADDR_CHANGE AddressChange AddressChangeInd
F9_00_HD_AMENDED_RETURN AmendedReturn AmendedReturnInd
F9_00_HD_CTRY_OF_DOMICILE LegalDomicileCountryCd CountryLegalDomicile
F9_00_HD_EXEMPT_STATUS_4847A1 Organization4947a1 Organization4947a1NotPFInd
F9_00_HD_EXEMPT_STATUS_501C Organization501cInd Organization501c
F9_00_HD_EXEMPT_STATUS_501C3 Organization501c3 Organization501c3Ind
F9_00_HD_FINAL_RETURN FinalReturnInd TerminatedReturn
F9_00_HD_GROSS_EXEMPT_NUM GroupExemptionNumber GroupExemptionNum
F9_00_HD_GROSS_RCPT GrossReceipts GrossReceiptsAmt
F9_00_HD_GROUP_RETURN GroupReturnForAffiliatesInd GroupReturnForAffiliates
F9_00_HD_INCLUDES_SUBORD_ORGS AllAffiliatesIncludedInd AllAffiliatesIncluded
F9_00_HD_INITIAL_RETURN InitialReturn InitialReturnInd
F9_00_HD_PRIN_OFF_NAME NameOfPrincipalOfficerPerson PrincipalOfficerNm
F9_00_HD_SIGNING_OFFICER_SIGNTR BusinessOfficerGrp Officer
F9_00_HD_SPECIAL_CONDITION_DESC SpecialConditionDescription SpecialConditionDesc
F9_00_HD_STATE_OF_DOMICILE

  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)


F9_04_PC_TRANS_TO_CNTRLD_ENT TrnsfrExmptNonChrtblRltdOrgInd TransfersToExemptNonChrtblOrg
F9_04_PC_TRANS_WITH_CNTRLD_ENT TransactionWithControlEntInd TransactionRelatedEntity
F9_05_EXP_SCHED_O_X InfoInScheduleOPartVInd InfoInScheduleOPartV
F9_05_PC_NUMBER_EMPLOYEES_W3 EmployeeCnt NumberOfEmployees


  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)


F9_05_PC_NUMBER_FORMS_1096 NumberFormsTransmittedWith1096 IRPDocumentCnt
F9_05_PC_UNRELATED_BUS_INCOME UnrelatedBusIncmOverLimitInd UnrelatedBusinessIncome
F9_06_EXP_SCHED_O_X InfoInScheduleOPartVI InfoInScheduleOPartVIInd
F9_06_PC_990_PROVIDED_GOV_BODY Form990ProvidedToGoverningBody Form990ProvidedToGvrnBodyInd
F9_06_PC_ANNUAL_DISC_COVRD_PERS AnnualDisclosureCoveredPersons AnnualDisclosureCoveredPrsnInd


  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)


F9_06_PC_CEO_COMPENSTN_PROCESS CompensationProcessCEOInd CompensationProcessCEO
F9_06_PC_CHANGES_ORGANIZING_DOCS ChangeToOrgDocumentsInd ChangesToOrganizingDocs
F9_06_PC_CONFLICT_OF_INTEREST ConflictOfInterestPolicyInd ConflictOfInterestPolicy
F9_06_PC_DECISIONS_SUBJ_APPROVAL DecisionsSubjectToApproval DecisionsSubjectToApprovaInd


  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)


F9_06_PC_DELEGATION_MGT_DUTIES DelegationOfMgmtDutiesInd DelegationOfManagementDuties
F9_06_PC_DELEGATION_OF_MGT DelegationOfMgmtDutiesInd DelegationOfManagementDuties
F9_06_PC_DOCUMENT_RET_POLICY DocumentRetentionPolicyInd DocumentRetentionPolicy
F9_06_PC_ELECTION_BOARD_MEMBERS ElectionOfBoardMembersInd ElectionOfBoardMembers
F9_06_PC_FAMILY_OR_BUSINESS_REL FamilyOrBusinessRelationship FamilyOrBusinessRlnInd


  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)


F9_06_PC_FORM_AVAIL_OWN_WEBSITE OwnWebsiteInd OwnWebsite
F9_06_PC_FORM_UPON_REQUEST UponRequest UponRequestInd
F9_06_PC_JOINT_VENTURE_INVESTMNT InvestmentInJointVenture InvestmentInJointVentureInd
F9_06_PC_JOINT_VENTURE_POLICY WrittenPolicyOrProcedure WrittenPolicyOrProcedureInd
F9_06_PC_LOCAL_CHAPTERS LocalChapters LocalChaptersInd


  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)


F9_06_PC_MATERIAL_DIVERSION MaterialDiversionOrMisuseInd MaterialDiversionOrMisuse
F9_06_PC_MEMBERS_OR_STOCKHOLDERS MembersOrStockholders MembersOrStockholdersInd
F9_06_PC_MINUTES_COMMITTEES MinutesOfCommitteesInd MinutesOfCommittees
F9_06_PC_MINUTES_GOVERNING_BODY MinutesOfGoverningBody MinutesOfGoverningBodyInd
F9_06_PC_MONITORING_OF_COI_POLICY RegularMonitoringEnfrcInd RegularMonitoringEnforcement


  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)


F9_06_PC_NUM_IND_VOTING_MEMBERS NumberIndependentVotingMembers IndependentVotingMemberCnt
F9_06_PC_NUM_VOTING_GOV_MEMBERS GoverningBodyVotingMembersCnt NbrVotingGoverningBodyMembers
F9_06_PC_OFFICER_MAILING_ADDRESS OfficerMailingAddressInd OfficerMailingAddress
F9_06_PC_OTHER_COMPENSTN_PROCESS CompensationProcessOtherInd CompensationProcessOther
F9_06_PC_OTHER_WEBSITE OtherWebsite OtherWebsiteInd


  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)


F9_06_PC_OWN_WEBSITE OwnWebsiteInd OwnWebsite
F9_06_PC_POLICIES_GOVERN_CHAPTER PoliciesReferenceChapters PoliciesReferenceChaptersInd
F9_06_PC_STATES_WHERE_RET_FILED StatesWhereCopyOfReturnIsFiled StatesWhereCopyOfReturnIsFldCd
F9_06_PC_WHISTLEBLOWER_POLICY WhistleblowerPolicyInd WhistleblowerPolicy
F9_07_EXP_SCHED_O_X InfoInScheduleOPartVII InfoInScheduleOPartVIIInd


  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)


F9_07_PC_COMPENSATION_OTHER_SRCE CompensationFromOtherSrcsInd CompensationFromOtherSources
F9_07_PC_FORMER_OFFICER_LISTED FormersListed FormerOfcrEmployeesListedInd
F9_07_PC_NO_LISTED_PERS_COMPENSD NoListedPersonsCompensatedInd NoListedPersonsCompensated
F9_07_PC_NUM_CONTRCTRS_GRTR_100K NumberOfContractorsGT100K CntrctRcvdGreaterThan100KCnt
F9_07_PC_NUM_INDS_GREATER_100K IndivRcvdGreaterThan100KCnt NumberIndividualsGT100K


  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)


F9_07_PC_TOTAL_COMP_GRTR_150K TotalCompGT150K TotalCompGreaterThan150KInd
F9_07_PC_TOT_OTHER_COMPENSATION TotalOtherCompensationAmt TotalOtherCompensation
F9_07_PC_TOT_REPRT_COMP_FROM_ORG TotalReportableCompFromOrg TotalReportableCompFromOrgAmt
F9_07_PC_TOT_REPRT_COMP_RLTD_ORG TotReportableCompRltdOrgAmt TotalReportableCompFrmRltdOrgs
F9_08_EXP_SCHED_O_X InfoInScheduleOPartVIIIInd InfoInScheduleOPartVIII


  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)


F9_08_PC_ALL_OTHER_CONTRIBUTIONS AllOtherContributionsAmt AllOtherContributions
F9_08_PC_CONTS_REPRTD_FNDRAISNG CntrbtnsRprtdFundraisingEvents ContriRptFundraisingEventAmt
F9_08_PC_COST_OF_GOODS_SOLD CostOfGoodsSold CostOfGoodsSoldAmt
F9_08_PC_FEDERATED_CAMPAIGNS FederatedCampaigns FederatedCampaignsAmt
F9_08_PC_FUNDRAISING_DIRECT_EXP FundraisingDirectExpensesAmt FundraisingDirectExpenses


  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)


F9_08_PC_FUNDRAISING_EVENTS FundraisingAmt FundraisingEvents
F9_08_PC_FUNDRAISING_GROSS_INC FundraisingGrossIncomeAmt GrossIncomeFundraisingEvents
F9_08_PC_GAMING_DIRECT_EXPENSES GamingDirectExpenses GamingDirectExpensesAmt
F9_08_PC_GAMING_GROSS_INCOME GrossIncomeGaming GamingGrossIncomeAmt
F9_08_PC_GOVERNMENT_GRANTS GovernmentGrantsAmt GovernmentGrants


  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)


F9_08_PC_GROSS_SALES_INVENTORY GrossSalesOfInventory GrossSalesOfInventoryAmt
F9_08_PC_MEMBERSHIP_DUES MembershipDues MembershipDuesAmt
F9_08_PC_NONCASH_CONTRIBUTIONS NoncashContributions NoncashContributionsAmt
F9_08_PC_PROGRAM_SVCE_REV_TOTAL TotalProgramServiceRevenueAmt TotalProgramServiceRevenue
F9_08_PC_RELATED_ORGANIZATIONS RelatedOrganizations RelatedOrganizationsAmt


  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)


F9_08_PC_TOTAL_CONTRIBUTIONS TotalContributions TotalContributionsAmt
F9_08_PC_TOTAL_OTHER_REVENUE OtherRevenueTotalAmt TotalOtherRevenue
F9_08_PC_TOTAL_PROG_SVCE_REVENUE TotalProgramServiceRevenueAmt TotalProgramServiceRevenue
F9_08_PC_TOTAL_REVENUE TotalRevenueGrp TotalRevenue


  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)


F9_09_EXP_AD_PROMO_TOT AdvertisingGrp Advertising
F9_09_EXP_BENF_PAID_MEMB_TOT BenefitsToMembersGrp BenefitsToMembers
F9_09_EXP_CONF_MEETING_TOT ConferencesMeetings ConferencesMeetingsGrp


  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)


F9_09_EXP_DEPREC_FUNDR DepreciationDepletionGrp DepreciationDepletion
F9_09_EXP_DEPREC_MAG DepreciationDepletionGrp DepreciationDepletion
F9_09_EXP_DEPREC_PROG DepreciationDepletionGrp DepreciationDepletion


  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)


F9_09_EXP_DEPREC_TOT DepreciationDepletionGrp DepreciationDepletion
F9_09_EXP_GRANT_FRGN_TOT ForeignGrants ForeignGrantsGrp
F9_09_EXP_GRANT_INDIV_DMSTC_TOT GrantsToDomesticIndividuals GrantsToDomesticIndividualsGrp
F9_09_EXP_GRANT_ORG_DMSTC_TOT GrantsToDomesticOrgs GrantsToDomesticOrgsGrp


  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)


F9_09_EXP_INFO_TECH_TOT InformationTechnology InformationTechnologyGrp
F9_09_EXP_INSURANCE_TOT InsuranceGrp Insurance
F9_09_EXP_INTEREST_TOT InterestGrp Interest


  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)


F9_09_EXP_JOINT_COSTS_TOT TotalJointCostsGrp TotalJointCosts
F9_09_EXP_OCCUPANCY_TOT OccupancyGrp Occupancy
F9_09_EXP_OFFICE_TOT OfficeExpenses OfficeExpensesGrp


  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)


F9_09_EXP_OTH_OTH_TOT AllOtherExpenses AllOtherExpensesGrp
F9_09_EXP_OTH_TOT OtherExpenses OtherExpensesGrp


  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)


F9_09_EXP_ROY_TOT Royalties RoyaltiesGrp
F9_09_EXP_SCHED_O_X InfoInScheduleOPartIXInd InfoInScheduleOPartIX
F9_09_EXP_TRAVEL_ENTRTNMNT_TOT TravelEntrtnmntPublicOfficials PymtTravelEntrtnmntPubOfclGrp
F9_09_EXP_TRAVEL_TOT TravelGrp Travel


  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)


F9_09_PC_COMP_DISQUAL_FUNDRAISE CompDisqualPersonsGrp CompDisqualPersons
F9_09_PC_COMP_DISQUAL_MGMT CompDisqualPersonsGrp CompDisqualPersons
F9_09_PC_COMP_DISQUAL_PROG_SVCE CompDisqualPersonsGrp CompDisqualPersons
F9_09_PC_COMP_DISQUAL_TOTAL CompDisqualPersonsGrp CompDisqualPersons


  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)


F9_09_PC_COMP_OFFICERS_FUNDRAISE CompCurrentOfcrDirectorsGrp CompCurrentOfficersDirectors
F9_09_PC_COMP_OFFICERS_MGMT CompCurrentOfcrDirectorsGrp CompCurrentOfficersDirectors
F9_09_PC_COMP_OFFICERS_PROG_SVCE CompCurrentOfcrDirectorsGrp CompCurrentOfficersDirectors


  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)


F9_09_PC_COMP_OFFICERS_TOTAL CompCurrentOfcrDirectorsGrp CompCurrentOfficersDirectors
F9_09_PC_FEES_FOR_SVCE_ACCT_TOT FeesForServicesAccountingGrp FeesForServicesAccounting
F9_09_PC_FEES_FOR_SVCE_INVST_TOT FeesForSrvcInvstMgmntFeesGrp FeesForServicesInvstMgmntFees


  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)


F9_09_PC_FEES_FOR_SVCE_LEGL_TOT FeesForServicesLegal FeesForServicesLegalGrp
F9_09_PC_FEES_FOR_SVCE_LOBB_TOT FeesForServicesLobbyingGrp FeesForServicesLobbying
F9_09_PC_FEES_FOR_SVCE_MGMT_TOT FeesForServicesManagement FeesForServicesManagementGrp
F9_09_PC_FEES_FOR_SVCE_OTH_TOT FeesForServicesOtherGrp FeesForServicesOther


  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)


F9_09_PC_OTHER_EMP_BEN_FUNDRAISE OtherEmployeeBenefits OtherEmployeeBenefitsGrp
F9_09_PC_OTHER_EMP_BEN_MGMT OtherEmployeeBenefits OtherEmployeeBenefitsGrp
F9_09_PC_OTHER_EMP_BEN_PROG_SVCE OtherEmployeeBenefits OtherEmployeeBenefitsGrp


  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)


F9_09_PC_OTHER_EMP_BEN_TOTAL OtherEmployeeBenefits OtherEmployeeBenefitsGrp
F9_09_PC_OTHER_SALARY_FUNDRAISE OtherSalariesAndWagesGrp OtherSalariesAndWages
F9_09_PC_OTHER_SALARY_MGMT OtherSalariesAndWagesGrp OtherSalariesAndWages


  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)


F9_09_PC_OTHER_SALARY_PROG_SVCE OtherSalariesAndWagesGrp OtherSalariesAndWages
F9_09_PC_OTHER_SALARY_TOTAL OtherSalariesAndWagesGrp OtherSalariesAndWages
F9_09_PC_PAYMENT_TO_AFFILIATES PaymentsToAffiliatesGrp PaymentsToAffiliates


  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)


F9_09_PC_PAYROLL_TAX_FUNDRAISE PayrollTaxesGrp PayrollTaxes
F9_09_PC_PAYROLL_TAX_MGMT PayrollTaxesGrp PayrollTaxes
F9_09_PC_PAYROLL_TAX_PROG_SVCE PayrollTaxesGrp PayrollTaxes


  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)


F9_09_PC_PAYROLL_TAX_TOTAL PayrollTaxesGrp PayrollTaxes
F9_09_PC_PENSION_CONT_FUNDRAISE PensionPlanContributions PensionPlanContributionsGrp
F9_09_PC_PENSION_CONT_MGMT PensionPlanContributions PensionPlanContributionsGrp
F9_09_PC_PENSION_CONT_PROG_SVCE PensionPlanContributions PensionPlanContributionsGrp


  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)


F9_09_PC_PENSION_CONT_TOTAL PensionPlanContributions PensionPlanContributionsGrp
F9_09_PC_TOTAL_FUNC_EXPENSES TotalFunctionalExpenses TotalFunctionalExpensesGrp
F9_09_PC_TOTAL_FUNDRAISE_EXPENSE TotalFunctionalExpenses TotalFunctionalExpensesGrp


  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)


F9_09_PC_TOTAL_MGMT_EXPENSE TotalFunctionalExpenses TotalFunctionalExpensesGrp
F9_09_PC_TOTAL_PROG_SVCE_EXPENSE TotalFunctionalExpenses TotalFunctionalExpensesGrp
F9_10_ASSETS_ACC_NET_EOY AccountsReceivable AccountsReceivableGrp


  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)


F9_10_ASSETS_EXP_PREPAID_EOY PrepaidExpensesDeferredCharges PrepaidExpensesDefrdChargesGrp
F9_10_ASSETS_INTANGIB_EOY IntangibleAssets IntangibleAssetsGrp
F9_10_ASSETS_INVENT_SALE_EOY InventoriesForSaleOrUse InventoriesForSaleOrUseGrp
F9_10_ASSETS_LESS_DEPREC_EOY LandBldgEquipBasisNetGrp LandBuildingsEquipmentBasisNet


  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)


F9_10_ASSETS_LOANS_DISQUAL_EOY ReceivablesFromDisqualPersons RcvblFromDisqualifiedPrsnGrp
F9_10_ASSETS_NOTES_LOANS_NET_EOY OtherNotesLoansReceivableNet OthNotesLoansReceivableNetGrp
F9_10_ASSETS_OTH_EOY OtherAssetsTotal OtherAssetsTotalGrp
F9_10_ASSETS_PLEDGES_NET_EOY PledgesAndGrantsReceivableGrp PledgesAndGrantsReceivable


  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)


F9_10_LIAB_ACC_PAYABLE_EOY AccountsPayableAccrExpnssGrp AccountsPayableAccruedExpenses
F9_10_LIAB_GRANTS_PAYABLE_EOY GrantsPayableGrp GrantsPayable
F9_10_LIAB_LOANS_OFF_EOY LoansFromOfficersDirectorsGrp LoansFromOfficersDirectors
F9_10_LIAB_REV_DEFERRED_EOY DeferredRevenueGrp DeferredRevenue


  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)


F9_10_NAFB_RESTRICT_PERM_EOY PermanentlyRestrictedNetAssets PermanentlyRstrNetAssetsGrp
F9_10_NAFB_RESTRICT_TEMP_EOY TemporarilyRestrictedNetAssets TemporarilyRstrNetAssetsGrp
F9_10_NAFB_UNRESTRICT_EOY UnrestrictedNetAssets UnrestrictedNetAssetsGrp
F9_10_PC_BOND_LIABILITY_EOY TaxExemptBondLiabilitiesGrp TaxExemptBondLiabilities
F9_10_PC_CASH_NON_INTEREST_BOY CashNonInterestBearingGrp CashNonInterestBearing


  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)


F9_10_PC_CASH_NON_INTEREST_EOY CashNonInterestBearingGrp CashNonInterestBearing
F9_10_PC_ESCROW_LIABILITY_EOY EscrowAccountLiabilityGrp EscrowAccountLiability
F9_10_PC_INVEST_OTHER_SEC_EOY InvestmentsOtherSecuritiesGrp InvestmentsOtherSecurities


  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)


F9_10_PC_INVEST_PROG_RELTD_EOY InvestmentsProgramRelated InvestmentsProgramRelatedGrp
F9_10_PC_INVEST_PUB_TRADED_EOY InvestmentsPubTradedSecGrp InvestmentsPubTradedSecurities
F9_10_PC_LAND_BLDG_EQPMT LandBldgEquipCostOrOtherBssAmt LandBuildingsEquipmentBasis


  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)


F9_10_PC_LAND_BLDG_EQPMT_DEPRCTN LandBldgEquipAccumDeprecAmt LandBldgEquipmentAccumDeprec
F9_10_PC_LOANS_FROM_OFFICERS_EOY LoansFromOfficersDirectorsGrp LoansFromOfficersDirectors
F9_10_PC_ORG_FOLLOWS_SFAS117 OrganizationFollowsSFAS117Ind FollowSFAS117
F9_10_PC_ORG_NOT_FOLLOW_SFAS117 DoNotFollowSFAS117 OrgDoesNotFollowSFAS117Ind
F9_10_PC_OTHER_LIABILITIES_EOY OtherLiabilities OtherLiabilitiesGrp


  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)


F9_10_PC_RET_EARNINGS_ENDWMT_EOY RtnEarnEndowmentIncmOthFndsGrp RetainedEarningsEndowmentEtc
F9_10_PC_SAVINGS_TEMP_INVEST_BOY SavingsAndTempCashInvestments SavingsAndTempCashInvstGrp
F9_10_PC_SAVINGS_TEMP_INVEST_EOY SavingsAndTempCashInvestments SavingsAndTempCashInvstGrp


  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)


F9_10_PC_SECURED_MORTGAGES_EOY MortgNotesPyblScrdInvstPropGrp MortNotesPyblSecuredInvestProp
F9_10_PC_SECURE_MORT_NOTES_EOY MortgNotesPyblScrdInvstPropGrp MortNotesPyblSecuredInvestProp
F9_10_PC_UNSECURED_LOANS_EOY UnsecuredNotesLoansPayable UnsecuredNotesLoansPayableGrp
F9_10_PC_UNSECURED_NOTES_BOY UnsecuredNotesLoansPayable UnsecuredNotesLoansPayableGrp


  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)


F9_10_PC_UNSECURED_NOTES_EOY UnsecuredNotesLoansPayable UnsecuredNotesLoansPayableGrp
F9_10_PZ_TOTAL_ASSETS_EOY TotalAssets TotalAssetsGrp
F9_10_SCHED_O_X InfoInScheduleOPartXInd InfoInScheduleOPartX


  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)


F9_11_PC_RECNCLTN_DONATED_SVCES ReconcilationDonatedServices DonatedServicesAndUseFcltsAmt
F9_11_PC_RECNCLTN_INVSTMNT_EXP ReconcilationInvestExpenses InvestmentExpenseAmt
F9_11_PC_RECNCLTN_PRIOR_PER_ADJ ReconcilationPriorAdjustment PriorPeriodAdjustmentsAmt
F9_11_PC_RECNCLTN_REV_LESS_EXP ReconcilationRevenueExpnssAmt ReconcilationRevenueExpenses
F9_11_PC_RECNCLTN_UNRLZD_GAIN NetUnrlzdGainsLossesInvstAmt ReconciliationUnrealizedInvest


  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)


F9_11_SCHED_O_X InfoInScheduleOPartXIInd InfoInScheduleOPartXI
F9_12_PC_ACCNT_COMPILE_OR_REVIEW AccountantCompileOrReview AccountantCompileOrReviewInd
F9_12_PC_ACCTG_METHOD_ACCRUAL MethodOfAccountingAccrualInd MethodOfAccountingAccrual
F9_12_PC_ACCTG_METHOD_CASH MethodOfAccountingCash MethodOfAccountingCashInd
F9_12_PC_ACCTG_METHOD_OTHER MethodOfAccountingOther MethodOfAccountingOtherInd


  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)
  df[newvar] = np.where(df[var2].notnull(), df[var2], np.NaN)
  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)


F9_12_PC_AUDIT_COMMITTEE AuditCommitteeInd AuditCommittee
F9_12_PC_FED_GRNT_AUDIT_PERFORMD FederalGrantAuditPerformed FederalGrantAuditPerformedInd
F9_12_PC_FED_GRNT_AUDIT_REQUIRED FederalGrantAuditRequiredInd FederalGrantAuditRequired
F9_12_PC_FINCL_STMTS_AUDITED FSAudited FSAuditedInd
F9_12_SCHED_O_X InfoInScheduleOPartXIIInd InfoInScheduleOPartXII
number_of_other_prog_svces ProgSrvcAccomActyOtherGrp ActivityOther
[]
CPU times: total: 18.1 s
Wall time: 21.3 s


  df[newvar] = np.where(df[var1].notnull(), df[var1], np.NaN)


# New Way to Combine Columns
The improvements include:

- Using `combine_first()`, which is pandas' built-in method for this exact use case
- Handling cases where one or both columns might not exist in the dataframe
- More informative error reporting that captures both the variable name and the specific error
- Returning both the updated dataframe and a list of failures
- Adding better progress reporting

This approach should be faster than the original version since it avoids a separate function call for each pair of columns. It also handles edge cases more gracefully and provides better diagnostics if something goes wrong.
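As a minimal sketch of what `combine_first()` does for a single pair of columns, consider the toy example below. The frame `toy` and its values are made up for illustration; only the column names *TaxPeriodBeginDt* and *TaxPeriodBeginDate* and the standardized name *F9_00_HD_TAX_PER_BEGIN* come from the concordance discussion earlier in this notebook.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values): each filing populates at most one of the two names
toy = pd.DataFrame({
    "TaxPeriodBeginDt":   ["2019-01-01", np.nan, np.nan],
    "TaxPeriodBeginDate": [np.nan, "2012-07-01", np.nan],
})

# combine_first keeps the caller's value where it is non-null
# and falls back to the argument's value otherwise
toy["F9_00_HD_TAX_PER_BEGIN"] = toy["TaxPeriodBeginDt"].combine_first(
    toy["TaxPeriodBeginDate"]
)

print(toy["F9_00_HD_TAX_PER_BEGIN"].tolist())
```

Rows where both source columns are missing stay missing in the combined column, which is the behavior we want before binarizing.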

In [123]:
import gc  # needed for gc.collect() below

def prepare_for_save(df):
    """Safely prepare a large DataFrame for disk write without restarting the kernel."""
    print("🧼 Copying DataFrame to clear cached views...")
    df_clean = df.copy()

    print("🧹 Running garbage collection...")
    gc.collect()

    print("✅ Frame is cleaned and ready to save.")
    return df_clean

In [211]:
import gc
gc.collect()

1273

In [127]:
%%time
# Step 2: Clean any lazy eval or .head() artifacts
#df_clean = prepare_for_save(df)

🧼 Copying DataFrame to clear cached views...
🧹 Running garbage collection...
✅ Frame is cleaned and ready to save.
CPU times: total: 18.2 s
Wall time: 21.2 s


In [132]:
#del df_clean

In [133]:
# Combines every pair of columns listed in the mapping in a single pass
def combine_all_variables(df, mapping_df):
    """
    Combine all two-name variable pairs in one pass over the mapping rows
    
    Parameters:
    -----------
    df: pandas DataFrame
        The main dataframe with data
    mapping_df: pandas DataFrame 
        DataFrame with variable mappings (new_variables_df)
        
    Returns:
    --------
    tuple: (updated_df, failed_variables)
    """
    # Work on a copy to avoid unexpected side effects
    result_df = df.copy()
    failed_vars = []
    
    # Filter for len=2 rows
    to_combine = mapping_df[mapping_df['len']==2]
    
    for _, row in to_combine.iterrows():
        new_var = row['variable_name_new']
        var1, var2 = row['original_names'][0], row['original_names'][1]
        
        try:
            # Check if both columns exist in the dataframe
            if var1 in df.columns and var2 in df.columns:
                # Combine the columns
                result_df[new_var] = df[var1].combine_first(df[var2])
                
                # Alternative using numpy for potential performance improvement
                # result_df[new_var] = np.where(df[var1].notnull(), df[var1], df[var2])
            
            # If only the first column exists
            elif var1 in df.columns:
                result_df[new_var] = df[var1]
                
            # If only the second column exists
            elif var2 in df.columns:
                result_df[new_var] = df[var2]
                
            else:
                # Neither column exists
                failed_vars.append((new_var, "Columns not found"))
                continue
                
            # Add debugging stats if needed
            # print(f"Created {new_var}: {result_df[new_var].notna().sum()} non-null values")
            
        except Exception as e:
            print(f"\n***********Issue with variable: {new_var}, Error: {str(e)}")
            failed_vars.append((new_var, str(e)))
    
    return result_df, failed_vars

#### Try this version next time instead to avert `PerformanceWarning`

In [212]:
def combine_all_variables(df, mapping_df):
    """
    Optimized version that avoids DataFrame fragmentation by creating 
    all columns at once
    """
    # Work on a copy to avoid unexpected side effects
    result_df = df.copy()
    
    # Filter for len=2 rows
    to_combine = mapping_df[mapping_df['len']==2]
    
    # Create a dictionary to store all new columns
    new_columns = {}
    failed_vars = []
    
    for _, row in to_combine.iterrows():
        new_var = row['variable_name_new']
        var1, var2 = row['original_names'][0], row['original_names'][1]
        
        try:
            # Check if both columns exist in the dataframe
            if var1 in df.columns and var2 in df.columns:
                # Store the combined column in the dictionary
                new_columns[new_var] = df[var1].combine_first(df[var2])
            
            # If only the first column exists
            elif var1 in df.columns:
                new_columns[new_var] = df[var1]
                
            # If only the second column exists
            elif var2 in df.columns:
                new_columns[new_var] = df[var2]
                
            else:
                # Neither column exists
                failed_vars.append((new_var, "Columns not found"))
                
        except Exception as e:
            print(f"\n***********Issue with variable: {new_var}, Error: {str(e)}")
            failed_vars.append((new_var, str(e)))
    
    # Create a DataFrame with all the new columns
    new_columns_df = pd.DataFrame(new_columns, index=df.index)
    
    # Combine with the original DataFrame
    result_df = pd.concat([result_df, new_columns_df], axis=1)
    
    return result_df, failed_vars
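The idea behind this second version can be sketched on a small scale. pandas can emit a `PerformanceWarning` about a fragmented DataFrame when many columns are inserted one at a time, so the sketch below collects the combined Series in a dictionary and attaches them with a single `pd.concat`. The frame `base`, the list `pairs`, and the values are hypothetical stand-ins; the column names are taken from the printed output above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the e-file frame
base = pd.DataFrame({
    "MembershipDues": [100.0, np.nan],
    "MembershipDuesAmt": [np.nan, 250.0],
    "CostOfGoodsSold": [np.nan, 40.0],
    "CostOfGoodsSoldAmt": [10.0, np.nan],
})

# (new name, first source, second source) triples, mirroring the concordance rows
pairs = [
    ("F9_08_PC_MEMBERSHIP_DUES", "MembershipDues", "MembershipDuesAmt"),
    ("F9_08_PC_COST_OF_GOODS_SOLD", "CostOfGoodsSold", "CostOfGoodsSoldAmt"),
]

# Collect every combined Series first ...
new_cols = {new: base[a].combine_first(base[b]) for new, a, b in pairs}

# ... then attach them all in one concat instead of one insert per column
combined = pd.concat([base, pd.DataFrame(new_cols, index=base.index)], axis=1)

print(combined[["F9_08_PC_MEMBERSHIP_DUES", "F9_08_PC_COST_OF_GOODS_SOLD"]])
```

With only two new columns the difference is negligible; the pattern matters when several hundred columns are added, as in this notebook.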

In [213]:
%%time
# Use the improved function
updated_df, failed_variables = combine_all_variables(df, new_variables_df)

# Report on results
print(f"Successfully combined {len(new_variables_df[new_variables_df['len']==2]) - len(failed_variables)} variables")
if failed_variables:
    print("\nFailed variables:")
    for var, reason in failed_variables:
        print(f"- {var}: {reason}")

Successfully combined 279 variables
CPU times: total: 3min 52s
Wall time: 4min 1s


In [214]:
failed_variables

[]
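An empty failure list only tells us that no exception was raised. One possible extra check, sketched here under the assumption that a combined column should be non-null exactly where at least one of its two sources is non-null, is shown below. The helper `check_combined` and the toy values are hypothetical and are not part of the original notebook; in practice the helper would be called on `updated_df`.

```python
import numpy as np
import pandas as pd

def check_combined(frame, new_var, var1, var2):
    """Return True if new_var is non-null exactly where var1 or var2 is non-null."""
    expected = frame[var1].notna() | frame[var2].notna()
    return bool((frame[new_var].notna() == expected).all())

# Hypothetical toy data; in the notebook this would be updated_df
toy = pd.DataFrame({
    "GovernmentGrantsAmt": [5000.0, np.nan, np.nan],
    "GovernmentGrants": [np.nan, 1200.0, np.nan],
})
toy["F9_08_PC_GOVERNMENT_GRANTS"] = toy["GovernmentGrantsAmt"].combine_first(
    toy["GovernmentGrants"]
)

print(check_combined(toy, "F9_08_PC_GOVERNMENT_GRANTS",
                     "GovernmentGrantsAmt", "GovernmentGrants"))
```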

In [215]:
updated_df[:1]

Unnamed: 0,_id,OrganizationName,URL,DLN,TaxPeriod,AddressChange,NameOfPrincipalOfficerPerson,GrossReceipts,GroupReturnForAffiliates,Organization501c3,WebSite,TypeOfOrganizationCorporation,YearFormation,StateLegalDomicile,ActivityOrMissionDescription,NbrVotingMembersGoverningBody,NbrIndependentVotingMembers,TotalNbrEmployees,TotalNbrVolunteers,TotalGrossUBI,NetUnrelatedBusinessTxblIncome,ContributionsGrantsPriorYear,ContributionsGrantsCurrentYear,ProgramServiceRevenuePriorYear,ProgramServiceRevenueCY,InvestmentIncomePriorYear,InvestmentIncomeCurrentYear,OtherRevenuePriorYear,OtherRevenueCurrentYear,TotalRevenuePriorYear,TotalRevenueCurrentYear,GrantsAndSimilarAmntsPriorYear,GrantsAndSimilarAmntsCY,BenefitsPaidToMembersPriorYear,BenefitsPaidToMembersCY,SalariesEtcPriorYear,SalariesEtcCurrentYear,TotalProfFundrsngExpPriorYear,TotalProfFundrsngExpCY,TotalFundrsngExpCurrentYear,OtherExpensePriorYear,OtherExpensesCurrentYear,TotalExpensesPriorYear,TotalExpensesCurrentYear,RevenuesLessExpensesPriorYear,RevenuesLessExpensesCY,TotalAssetsBOY,TotalAssetsEOY,TotalLiabilitiesBOY,TotalLiabilitiesEOY,NetAssetsOrFundBalancesBOY,NetAssetsOrFundBalancesEOY,InfoInScheduleOPartIII,MissionDescription,SignificantNewProgramServices,SignificantChange,Expense,Grants,Description,TotalProgramServiceExpense,PoliticalActivities,LobbyingActivities,ProfessionalFundraising,FundraisingActivities,Gaming,ExcessBenefitTransaction,PriorExcessBenefitTransaction,DisregardedEntity,RelatedEntity,RelatedOrgControlledEntity,TransactionRelatedEntity,TransfersToExemptNonChrtblOrg,ActivitiesConductedPartnership,NumberFormsTransmittedWith1096,NumberOfEmployees,UnrelatedBusinessIncome,InfoInScheduleOPartVI,NbrVotingGoverningBodyMembers,NumberIndependentVotingMembers,FamilyOrBusinessRelationship,DelegationOfManagementDuties,ChangesToOrganizingDocs,MaterialDiversionOrMisuse,MembersOrStockholders,ElectionOfBoardMembers,DecisionsSubjectToApproval,MinutesOfGoverningBody,MinutesOfCommittees,OfficerMailingAddress,Local
Chapters,Form990ProvidedToGoverningBody,ConflictOfInterestPolicy,AnnualDisclosureCoveredPersons,RegularMonitoringEnforcement,WhistleblowerPolicy,DocumentRetentionPolicy,CompensationProcessCEO,CompensationProcessOther,InvestmentInJointVenture,StatesWhereCopyOfReturnIsFiled,UponRequest,NoListedPersonsCompensated,TotalReportableCompFromOrg,TotalReportableCompFrmRltdOrgs,TotalOtherCompensation,NumberIndividualsGT100K,FormersListed,TotalCompGT150K,CompensationFromOtherSources,NumberOfContractorsGT100K,AllOtherContributions,TotalContributions,TotalOtherRevenue,TotalRevenue,GrantsToDomesticOrgs,GrantsToDomesticIndividuals,FeesForServicesLegal,FeesForServicesAccounting,OfficeExpenses,PaymentsToAffiliates,DepreciationDepletion,OtherExpenses,AllOtherExpenses,TotalFunctionalExpenses,SavingsAndTempCashInvestments,AccountsReceivable,LandBuildingsEquipmentBasis,LandBldgEquipmentAccumDeprec,LandBuildingsEquipmentBasisNet,InvestmentsOtherSecurities,TotalAssets,AccountsPayableAccruedExpenses,GrantsPayable,OtherLiabilities,FollowSFAS117,UnrestrictedNetAssets,InfoInScheduleOPartXI,ReconcilationRevenueExpenses,InfoInScheduleOPartXII,MethodOfAccountingAccrual,AccountantCompileOrReview,FSAudited,AuditCommittee,FederalGrantAuditRequired,AllAffiliatesIncluded,GroupExemptionNumber,Revenue,PoliciesReferenceChapters,WrittenPolicyOrProcedure,TotalProgramServiceRevenue,ForeignGrants,BenefitsToMembers,CompCurrentOfficersDirectors,CompDisqualPersons,OtherSalariesAndWages,PensionPlanContributions,OtherEmployeeBenefits,PayrollTaxes,FeesForServicesManagement,FeesForServicesLobbying,F9_09_PC_FEES_FOR_SVCE_FR_TOT,FeesForServicesInvstMgmntFees,FeesForServicesOther,Advertising,InformationTechnology,Royalties,Occupancy,Travel,TravelEntrtnmntPublicOfficials,ConferencesMeetings,Interest,Insurance,CashNonInterestBearing,PledgesAndGrantsReceivable,ReceivablesFromDisqualPersons,OtherNotesLoansReceivableNet,InventoriesForSaleOrUse,PrepaidExpensesDeferredCharges,InvestmentsPubTradedSecurities,InvestmentsProgram
Related,IntangibleAssets,OtherAssetsTotal,DeferredRevenue,MortNotesPyblSecuredInvestProp,FederalGrantAuditPerformed,LoansFromOfficersDirectors,MethodOfAccountingCash,Activity2,Activity3,InfoInScheduleOPartVII,TaxExemptBondLiabilities,TemporarilyRestrictedNetAssets,OtherWebsite,PermanentlyRestrictedNetAssets,FundraisingEvents,CntrbtnsRprtdFundraisingEvents,RelatedOrganizations,GrossIncomeFundraisingEvents,FundraisingDirectExpenses,FederatedCampaigns,GovernmentGrants,MethodOfAccountingOther,GrossSalesOfInventory,CostOfGoodsSold,DoNotFollowSFAS117,RetainedEarningsEndowmentEtc,InitialReturn,MembershipDues,GrossIncomeGaming,GamingDirectExpenses,NoncashContributions,InfoInScheduleOPartV,OwnWebsite,UnsecuredNotesLoansPayable,ActivityOther,TotalOfOtherProgramServiceExp,TotalOfOtherProgramServiceRev,EscrowAccountLiability,TotalOfOtherProgramServiceGrnt,TypeOfOrganizationOther,Organization501c,TypeOfOrganizationTrust,TypeOfOrganizationAssociation,CountryLegalDomicile,AmendedReturn,TypeOfOrgOtherDescription,TotalJointCosts,TerminatedReturn,TerminationOrContraction,ActivityCode,SpecialConditionDescription,Organization4947a1,InfoInScheduleOPartIX,ReconciliationUnrealizedInvest,ReconcilationPriorAdjustment,ReconcilationDonatedServices,ReconcilationInvestExpenses,InfoInScheduleOPartVIII,InfoInScheduleOPartX,PrincipalOfficerNm,GrossReceiptsAmt,GroupReturnForAffiliatesInd,Organization501c3Ind,TypeOfOrganizationCorpInd,FormationYr,LegalDomicileStateCd,ActivityOrMissionDesc,VotingMembersGoverningBodyCnt,VotingMembersIndependentCnt,TotalEmployeeCnt,TotalGrossUBIAmt,CYContributionsGrantsAmt,CYProgramServiceRevenueAmt,CYInvestmentIncomeAmt,CYOtherRevenueAmt,CYTotalRevenueAmt,CYGrantsAndSimilarPaidAmt,CYBenefitsPaidToMembersAmt,CYSalariesCompEmpBnftPaidAmt,CYTotalProfFndrsngExpnsAmt,CYTotalFundraisingExpenseAmt,CYOtherExpensesAmt,CYTotalExpensesAmt,CYRevenuesLessExpensesAmt,TotalAssetsBOYAmt,TotalAssetsEOYAmt,TotalLiabilitiesEOYAmt,NetAssetsOrFundBalancesBOYAmt,NetAssetsOrFundBalancesEOYA
mt,InfoInScheduleOPartIIIInd,MissionDesc,SignificantNewProgramSrvcInd,SignificantChangeInd,Desc,PoliticalCampaignActyInd,LobbyingActivitiesInd,ProfessionalFundraisingInd,FundraisingActivitiesInd,GamingActivitiesInd,EngagedInExcessBenefitTransInd,PYExcessBenefitTransInd,DisregardedEntityInd,RelatedEntityInd,RelatedOrganizationCtrlEntInd,TransactionWithControlEntInd,TrnsfrExmptNonChrtblRltdOrgInd,ActivitiesConductedPrtshpInd,IRPDocumentCnt,EmployeeCnt,UnrelatedBusIncmOverLimitInd,GoverningBodyVotingMembersCnt,IndependentVotingMemberCnt,FamilyOrBusinessRlnInd,DelegationOfMgmtDutiesInd,ChangeToOrgDocumentsInd,MaterialDiversionOrMisuseInd,MembersOrStockholdersInd,ElectionOfBoardMembersInd,DecisionsSubjectToApprovaInd,MinutesOfGoverningBodyInd,MinutesOfCommitteesInd,OfficerMailingAddressInd,LocalChaptersInd,Form990ProvidedToGvrnBodyInd,ConflictOfInterestPolicyInd,WhistleblowerPolicyInd,DocumentRetentionPolicyInd,CompensationProcessCEOInd,CompensationProcessOtherInd,InvestmentInJointVentureInd,StatesWhereCopyOfReturnIsFldCd,NoListedPersonsCompensatedInd,FormerOfcrEmployeesListedInd,TotalCompGreaterThan150KInd,CompensationFromOtherSrcsInd,MembershipDuesAmt,FundraisingAmt,AllOtherContributionsAmt,TotalContributionsAmt,OtherRevenueTotalAmt,TotalRevenueGrp,FeesForServicesAccountingGrp,OfficeExpensesGrp,InformationTechnologyGrp,ConferencesMeetingsGrp,InsuranceGrp,OtherExpensesGrp,AllOtherExpensesGrp,TotalFunctionalExpensesGrp,CashNonInterestBearingGrp,TotalAssetsGrp,OrgDoesNotFollowSFAS117Ind,RtnEarnEndowmentIncmOthFndsGrp,ReconcilationRevenueExpnssAmt,MethodOfAccountingCashInd,AccountantCompileOrReviewInd,FSAuditedInd,FederalGrantAuditRequiredInd,WebsiteAddressTxt,TotalVolunteersCnt,NetUnrelatedBusTxblIncmAmt,PYContributionsGrantsAmt,PYProgramServiceRevenueAmt,PYInvestmentIncomeAmt,PYOtherRevenueAmt,PYTotalRevenueAmt,PYGrantsAndSimilarPaidAmt,PYBenefitsPaidToMembersAmt,PYSalariesCompEmpBnftPaidAmt,PYTotalProfFndrsngExpnsAmt,PYOtherExpensesAmt,PYTotalExpensesAmt,PYRevenuesLessE
xpensesAmt,TotalLiabilitiesBOYAmt,ExpenseAmt,GrantAmt,RevenueAmt,ProgSrvcAccomActy2Grp,ProgSrvcAccomActy3Grp,ProgSrvcAccomActyOtherGrp,TotalOtherProgSrvcGrantAmt,TotalProgramServiceExpensesAmt,InfoInScheduleOPartVIInd,AnnualDisclosureCoveredPrsnInd,RegularMonitoringEnfrcInd,UponRequestInd,TotalReportableCompFromOrgAmt,TotReportableCompRltdOrgAmt,TotalOtherCompensationAmt,IndivRcvdGreaterThan100KCnt,CntrctRcvdGreaterThan100KCnt,GovernmentGrantsAmt,TotalProgramServiceRevenueAmt,FundraisingGrossIncomeAmt,ContriRptFundraisingEventAmt,FundraisingDirectExpensesAmt,GrossSalesOfInventoryAmt,CostOfGoodsSoldAmt,GrantsToDomesticIndividualsGrp,CompCurrentOfcrDirectorsGrp,OtherSalariesAndWagesGrp,PensionPlanContributionsGrp,OtherEmployeeBenefitsGrp,PayrollTaxesGrp,FeesForServicesOtherGrp,AdvertisingGrp,TravelGrp,InterestGrp,DepreciationDepletionGrp,SavingsAndTempCashInvstGrp,AccountsReceivableGrp,InventoriesForSaleOrUseGrp,PrepaidExpensesDefrdChargesGrp,LandBldgEquipCostOrOtherBssAmt,LandBldgEquipAccumDeprecAmt,LandBldgEquipBasisNetGrp,InvestmentsOtherSecuritiesGrp,IntangibleAssetsGrp,AccountsPayableAccrExpnssGrp,DeferredRevenueGrp,MortgNotesPyblScrdInvstPropGrp,OtherLiabilitiesGrp,OrganizationFollowsSFAS117Ind,UnrestrictedNetAssetsGrp,TemporarilyRstrNetAssetsGrp,InfoInScheduleOPartXIInd,NetUnrlzdGainsLossesInvstAmt,InfoInScheduleOPartXIIInd,AuditCommitteeInd,AllAffiliatesIncludedInd,GrantsToDomesticOrgsGrp,ForeignGrantsGrp,BenefitsToMembersGrp,CompDisqualPersonsGrp,FeesForServicesManagementGrp,FeesForServicesLegalGrp,FeesForServicesLobbyingGrp,FeesForSrvcInvstMgmntFeesGrp,RoyaltiesGrp,OccupancyGrp,PymtTravelEntrtnmntPubOfclGrp,PaymentsToAffiliatesGrp,PledgesAndGrantsReceivableGrp,RcvblFromDisqualifiedPrsnGrp,OthNotesLoansReceivableNetGrp,InvestmentsPubTradedSecGrp,InvestmentsProgramRelatedGrp,OtherAssetsTotalGrp,TotalOtherProgSrvcExpenseAmt,InfoInScheduleOPartVInd,MethodOfAccountingAccrualInd,NoncashContributionsAmt,GrantsPayableGrp,PermanentlyRstrNetAssetsGrp,TaxExemptBondLiab
ilitiesGrp,EscrowAccountLiabilityGrp,LoansFromOfficersDirectorsGrp,UnsecuredNotesLoansPayableGrp,PriorPeriodAdjustmentsAmt,FederalGrantAuditPerformedInd,PoliciesReferenceChaptersInd,OtherWebsiteInd,AddressChangeInd,WrittenPolicyOrProcedureInd,RelatedOrganizationsAmt,TotalOtherProgSrvcRevenueAmt,OwnWebsiteInd,TotalJointCostsGrp,DonatedServicesAndUseFcltsAmt,LegalDomicileCountryCd,InfoInScheduleOPartIXInd,TypeOfOrganizationTrustInd,FinalReturnInd,ContractTerminationInd,InfoInScheduleOPartXInd,GroupExemptionNum,InfoInScheduleOPartVIIInd,FederatedCampaignsAmt,TypeOfOrganizationOtherInd,OtherOrganizationDsc,InfoInScheduleOPartVIIIInd,TypeOfOrganizationAssocInd,InitialReturnInd,GamingGrossIncomeAmt,GamingDirectExpensesAmt,MethodOfAccountingOtherInd,InvestmentExpenseAmt,Organization501cInd,Organization4947a1NotPFInd,AmendedReturnInd,SpecialConditionDesc,ActivityCd,Timestamp,TaxPeriodEndDate,TaxPeriodBeginDate,Officer,TaxYear,F9_00_HD_BUILD_TIME_STAMP,ReturnTs,TaxPeriodEndDt,TaxPeriodBeginDt,BusinessOfficerGrp,TaxYr,fiscal_year,EIN,Name,NameControl,Phone,USAddress,ForeignAddress,InCareOfName,BusinessName,BusinessNameControlTxt,PhoneNum,InCareOfNm,ForeignPhoneNum,F9_00_HD_ADDR_CHANGE,F9_00_HD_AMENDED_RETURN,F9_00_HD_CTRY_OF_DOMICILE,F9_00_HD_EXEMPT_STATUS_4847A1,F9_00_HD_EXEMPT_STATUS_501C,F9_00_HD_EXEMPT_STATUS_501C3,F9_00_HD_FINAL_RETURN,F9_00_HD_GROSS_EXEMPT_NUM,F9_00_HD_GROSS_RCPT,F9_00_HD_GROUP_RETURN,F9_00_HD_INCLUDES_SUBORD_ORGS,F9_00_HD_INITIAL_RETURN,F9_00_HD_PRIN_OFF_NAME,F9_00_HD_SIGNING_OFFICER_SIGNTR,F9_00_HD_SPECIAL_CONDITION_DESC,F9_00_HD_STATE_OF_DOMICILE,F9_00_HD_TAX_PER_BEGIN,F9_00_HD_TAX_PER_END,F9_00_HD_TAX_YEAR,F9_00_HD_TIME_STAMP,F9_00_HD_TYPE_ORG_ASSOCIATION,F9_00_HD_TYPE_ORG_CORP,F9_00_HD_TYPE_ORG_OTHER,F9_00_HD_TYPE_ORG_OTHER_DESC,F9_00_HD_TYPE_ORG_TRUST,F9_00_HD_WEBSITE,F9_00_HD_YEAR_FORMED,F9_01_PC_BEN_PAID_MEMB_PRIOR,F9_01_PC_CONTR_GRANTS_CURR,F9_01_PC_CONTR_GRANTS_PRIOR,F9_01_PC_GRANTS_PRIOR,F9_01_PC_INDEP_VOTING_MEMB,F9_01_PC_INVEST_INCOME_PRIOR
,F9_01_PC_NET_ASSETS_BOY,F9_01_PC_OTHER_EXPENSE_PRIOR,F9_01_PC_OTHER_REV_PRIOR,F9_01_PC_PROF_FUNDRISING_EXP_CURR,F9_01_PC_PROF_FUNDRISING_EXP_PRIOR,F9_01_PC_PROG_SERVICE_REV_PRIOR,F9_01_PC_REV_LESS_EXP_CURR,F9_01_PC_REV_LESS_EXP_PRIOR,F9_01_PC_TERMINATION_CONTRACTION,F9_01_PC_TOT_ASSETS_EOY,F9_01_PC_TOT_EXP_PRIOR,F9_01_PC_TOT_FNDR_EXP_CURR,F9_01_PC_TOT_INDIV_EMPLOYED,F9_01_PC_TOT_INDIV_VOLUNTEERS,F9_01_PC_TOT_LIABILITIES_EOY,F9_01_PC_TOT_REVENUE_PRIOR,F9_01_PC_TOT_UBI_GROSS,F9_01_PC_TOT_UBI_NET,F9_01_PC_VOTING_MEMB_GOV_BODY,F9_01_PZ_BEN_PAID_TO_MEMB_CURR,F9_01_PZ_GRANTS_PAID_CURR,F9_01_PZ_INVEST_INCOME_CURR,F9_01_PZ_NAFB_EOY,F9_01_PZ_ORGANIZATIONAL_MISSION,F9_01_PZ_OTHER_EXPENSE_CURR,F9_01_PZ_OTHER_REV_CURR,F9_01_PZ_PROG_SERVICE_REV_CURR,F9_01_PZ_SALARIES_CURR,F9_01_PZ_SALARIES_PRIOR,F9_01_PZ_TOT_ASSETS_BOY,F9_01_PZ_TOT_EXP_CURR,F9_01_PZ_TOT_LIAB_BOY,F9_01_PZ_TOT_REV_CURR,F9_03_PC_PGMSVC_SIGNIF_CHG,F9_03_PC_PGMSVC_SIGNIF_NEW,F9_03_PC_PROG_SVC_ACC_1_CODE,F9_03_PC_PROG_SVC_ACC_1_DESC,F9_03_PC_PROG_SVC_ACC_1_EXP,F9_03_PC_PROG_SVC_ACC_1_GRNT,F9_03_PC_PROG_SVC_ACC_1_REV,F9_03_PC_PROG_SVC_ACC_2_CODE,F9_03_PC_PROG_SVC_ACC_2_DESC,F9_03_PC_PROG_SVC_ACC_2_EXP,F9_03_PC_PROG_SVC_ACC_2_GRNT,F9_03_PC_PROG_SVC_ACC_2_REV,F9_03_PC_PROG_SVC_ACC_3_CODE,F9_03_PC_PROG_SVC_ACC_3_DESC,F9_03_PC_PROG_SVC_ACC_3_EXP,F9_03_PC_PROG_SVC_ACC_3_GRNT,F9_03_PC_PROG_SVC_ACC_3_REV,F9_03_PC_TOT_OTH_PROG_SVC_EXP,F9_03_PC_TOT_OTH_PROG_SVC_GRNT,F9_03_PC_TOT_OTH_PROG_SVC_REV,F9_03_PC_TOT_PROG_SVC_EXPENSE,F9_03_PZ_MISSION_DESCRIPTION,F9_03_PZ_SCHEDULE_O_PART3,F9_04_PC_ACTVITIES_VIA_PARTNER,F9_04_PC_CONTROLLED_ENTITY,F9_04_PC_DISREGARDED_ENTITY,F9_04_PC_EXCESS_BENEFIT_TRANS,F9_04_PC_FR_EVENT_INC_GT_15K,F9_04_PC_GAMING_INC_GT_15K,F9_04_PC_LOBBYING_ACTIVITIES,F9_04_PC_POLITICAL_ACTIVITIES,F9_04_PC_PRIOR_EXCESS_BEN_TRAN,F9_04_PC_PROF_FR_EXP_GT_15K,F9_04_PC_RELATED_ENTITY,F9_04_PC_TRANS_TO_CNTRLD_ENT,F9_04_PC_TRANS_WITH_CNTRLD_ENT,F9_05_EXP_SCHED_O_X,F9_05_PC_NUMBER_EMPLOYEES_W3,F9_05_PC_NUMBER_FORMS_1096,F9_05_
PC_UNRELATED_BUS_INCOME,F9_06_EXP_SCHED_O_X,F9_06_PC_990_PROVIDED_GOV_BODY,F9_06_PC_ANNUAL_DISC_COVRD_PERS,F9_06_PC_CEO_COMPENSTN_PROCESS,F9_06_PC_CHANGES_ORGANIZING_DOCS,F9_06_PC_CONFLICT_OF_INTEREST,F9_06_PC_DECISIONS_SUBJ_APPROVAL,F9_06_PC_DELEGATION_MGT_DUTIES,F9_06_PC_DELEGATION_OF_MGT,F9_06_PC_DOCUMENT_RET_POLICY,F9_06_PC_ELECTION_BOARD_MEMBERS,F9_06_PC_FAMILY_OR_BUSINESS_REL,F9_06_PC_FORM_AVAIL_OWN_WEBSITE,F9_06_PC_FORM_UPON_REQUEST,F9_06_PC_JOINT_VENTURE_INVESTMNT,F9_06_PC_JOINT_VENTURE_POLICY,F9_06_PC_LOCAL_CHAPTERS,F9_06_PC_MATERIAL_DIVERSION,F9_06_PC_MEMBERS_OR_STOCKHOLDERS,F9_06_PC_MINUTES_COMMITTEES,F9_06_PC_MINUTES_GOVERNING_BODY,F9_06_PC_MONITORING_OF_COI_POLICY,F9_06_PC_NUM_IND_VOTING_MEMBERS,F9_06_PC_NUM_VOTING_GOV_MEMBERS,F9_06_PC_OFFICER_MAILING_ADDRESS,F9_06_PC_OTHER_COMPENSTN_PROCESS,F9_06_PC_OTHER_WEBSITE,F9_06_PC_OWN_WEBSITE,F9_06_PC_POLICIES_GOVERN_CHAPTER,F9_06_PC_STATES_WHERE_RET_FILED,F9_06_PC_WHISTLEBLOWER_POLICY,F9_07_EXP_SCHED_O_X,F9_07_PC_COMPENSATION_OTHER_SRCE,F9_07_PC_FORMER_OFFICER_LISTED,F9_07_PC_NO_LISTED_PERS_COMPENSD,F9_07_PC_NUM_CONTRCTRS_GRTR_100K,F9_07_PC_NUM_INDS_GREATER_100K,F9_07_PC_TOTAL_COMP_GRTR_150K,F9_07_PC_TOT_OTHER_COMPENSATION,F9_07_PC_TOT_REPRT_COMP_FROM_ORG,F9_07_PC_TOT_REPRT_COMP_RLTD_ORG,F9_08_EXP_SCHED_O_X,F9_08_PC_ALL_OTHER_CONTRIBUTIONS,F9_08_PC_CONTS_REPRTD_FNDRAISNG,F9_08_PC_COST_OF_GOODS_SOLD,F9_08_PC_FEDERATED_CAMPAIGNS,F9_08_PC_FUNDRAISING_DIRECT_EXP,F9_08_PC_FUNDRAISING_EVENTS,F9_08_PC_FUNDRAISING_GROSS_INC,F9_08_PC_GAMING_DIRECT_EXPENSES,F9_08_PC_GAMING_GROSS_INCOME,F9_08_PC_GOVERNMENT_GRANTS,F9_08_PC_GROSS_SALES_INVENTORY,F9_08_PC_MEMBERSHIP_DUES,F9_08_PC_NONCASH_CONTRIBUTIONS,F9_08_PC_PROGRAM_SVCE_REV_TOTAL,F9_08_PC_RELATED_ORGANIZATIONS,F9_08_PC_TOTAL_CONTRIBUTIONS,F9_08_PC_TOTAL_OTHER_REVENUE,F9_08_PC_TOTAL_PROG_SVCE_REVENUE,F9_08_PC_TOTAL_REVENUE,F9_09_EXP_AD_PROMO_TOT,F9_09_EXP_BENF_PAID_MEMB_TOT,F9_09_EXP_CONF_MEETING_TOT,F9_09_EXP_DEPREC_FUNDR,F9_09_EXP_DEPREC_MAG,F9_09_EXP_DEPREC_PROG,F9_09_
EXP_DEPREC_TOT,F9_09_EXP_GRANT_FRGN_TOT,F9_09_EXP_GRANT_INDIV_DMSTC_TOT,F9_09_EXP_GRANT_ORG_DMSTC_TOT,F9_09_EXP_INFO_TECH_TOT,F9_09_EXP_INSURANCE_TOT,F9_09_EXP_INTEREST_TOT,F9_09_EXP_JOINT_COSTS_TOT,F9_09_EXP_OCCUPANCY_TOT,F9_09_EXP_OFFICE_TOT,F9_09_EXP_OTH_OTH_TOT,F9_09_EXP_OTH_TOT,F9_09_EXP_ROY_TOT,F9_09_EXP_SCHED_O_X,F9_09_EXP_TRAVEL_ENTRTNMNT_TOT,F9_09_EXP_TRAVEL_TOT,F9_09_PC_COMP_DISQUAL_FUNDRAISE,F9_09_PC_COMP_DISQUAL_MGMT,F9_09_PC_COMP_DISQUAL_PROG_SVCE,F9_09_PC_COMP_DISQUAL_TOTAL,F9_09_PC_COMP_OFFICERS_FUNDRAISE,F9_09_PC_COMP_OFFICERS_MGMT,F9_09_PC_COMP_OFFICERS_PROG_SVCE,F9_09_PC_COMP_OFFICERS_TOTAL,F9_09_PC_FEES_FOR_SVCE_ACCT_TOT,F9_09_PC_FEES_FOR_SVCE_INVST_TOT,F9_09_PC_FEES_FOR_SVCE_LEGL_TOT,F9_09_PC_FEES_FOR_SVCE_LOBB_TOT,F9_09_PC_FEES_FOR_SVCE_MGMT_TOT,F9_09_PC_FEES_FOR_SVCE_OTH_TOT,F9_09_PC_OTHER_EMP_BEN_FUNDRAISE,F9_09_PC_OTHER_EMP_BEN_MGMT,F9_09_PC_OTHER_EMP_BEN_PROG_SVCE,F9_09_PC_OTHER_EMP_BEN_TOTAL,F9_09_PC_OTHER_SALARY_FUNDRAISE,F9_09_PC_OTHER_SALARY_MGMT,F9_09_PC_OTHER_SALARY_PROG_SVCE,F9_09_PC_OTHER_SALARY_TOTAL,F9_09_PC_PAYMENT_TO_AFFILIATES,F9_09_PC_PAYROLL_TAX_FUNDRAISE,F9_09_PC_PAYROLL_TAX_MGMT,F9_09_PC_PAYROLL_TAX_PROG_SVCE,F9_09_PC_PAYROLL_TAX_TOTAL,F9_09_PC_PENSION_CONT_FUNDRAISE,F9_09_PC_PENSION_CONT_MGMT,F9_09_PC_PENSION_CONT_PROG_SVCE,F9_09_PC_PENSION_CONT_TOTAL,F9_09_PC_TOTAL_FUNC_EXPENSES,F9_09_PC_TOTAL_FUNDRAISE_EXPENSE,F9_09_PC_TOTAL_MGMT_EXPENSE,F9_09_PC_TOTAL_PROG_SVCE_EXPENSE,F9_10_ASSETS_ACC_NET_EOY,F9_10_ASSETS_EXP_PREPAID_EOY,F9_10_ASSETS_INTANGIB_EOY,F9_10_ASSETS_INVENT_SALE_EOY,F9_10_ASSETS_LESS_DEPREC_EOY,F9_10_ASSETS_LOANS_DISQUAL_EOY,F9_10_ASSETS_NOTES_LOANS_NET_EOY,F9_10_ASSETS_OTH_EOY,F9_10_ASSETS_PLEDGES_NET_EOY,F9_10_LIAB_ACC_PAYABLE_EOY,F9_10_LIAB_GRANTS_PAYABLE_EOY,F9_10_LIAB_LOANS_OFF_EOY,F9_10_LIAB_REV_DEFERRED_EOY,F9_10_NAFB_RESTRICT_PERM_EOY,F9_10_NAFB_RESTRICT_TEMP_EOY,F9_10_NAFB_UNRESTRICT_EOY,F9_10_PC_BOND_LIABILITY_EOY,F9_10_PC_CASH_NON_INTEREST_BOY,F9_10_PC_CASH_NON_INTEREST_EOY,F9_10_PC_ESCROW_LIABILITY_
EOY,F9_10_PC_INVEST_OTHER_SEC_EOY,F9_10_PC_INVEST_PROG_RELTD_EOY,F9_10_PC_INVEST_PUB_TRADED_EOY,F9_10_PC_LAND_BLDG_EQPMT,F9_10_PC_LAND_BLDG_EQPMT_DEPRCTN,F9_10_PC_LOANS_FROM_OFFICERS_EOY,F9_10_PC_ORG_FOLLOWS_SFAS117,F9_10_PC_ORG_NOT_FOLLOW_SFAS117,F9_10_PC_OTHER_LIABILITIES_EOY,F9_10_PC_RET_EARNINGS_ENDWMT_EOY,F9_10_PC_SAVINGS_TEMP_INVEST_BOY,F9_10_PC_SAVINGS_TEMP_INVEST_EOY,F9_10_PC_SECURED_MORTGAGES_EOY,F9_10_PC_SECURE_MORT_NOTES_EOY,F9_10_PC_UNSECURED_LOANS_EOY,F9_10_PC_UNSECURED_NOTES_BOY,F9_10_PC_UNSECURED_NOTES_EOY,F9_10_PZ_TOTAL_ASSETS_EOY,F9_10_SCHED_O_X,F9_11_PC_RECNCLTN_DONATED_SVCES,F9_11_PC_RECNCLTN_INVSTMNT_EXP,F9_11_PC_RECNCLTN_PRIOR_PER_ADJ,F9_11_PC_RECNCLTN_REV_LESS_EXP,F9_11_PC_RECNCLTN_UNRLZD_GAIN,F9_11_SCHED_O_X,F9_12_PC_ACCNT_COMPILE_OR_REVIEW,F9_12_PC_ACCTG_METHOD_ACCRUAL,F9_12_PC_ACCTG_METHOD_CASH,F9_12_PC_ACCTG_METHOD_OTHER,F9_12_PC_AUDIT_COMMITTEE,F9_12_PC_FED_GRNT_AUDIT_PERFORMD,F9_12_PC_FED_GRNT_AUDIT_REQUIRED,F9_12_PC_FINCL_STMTS_AUDITED,F9_12_SCHED_O_X,number_of_other_prog_svces
0,5d019e6778ffca27b42818d7,RONALD MCDONALD HOUSE CHARITIES- PHILADELPHIA REGION INC,https://s3.amazonaws.com/irs-form-990/201113139349301301_public.xml,93493313013011,201012,X,MICHAEL ANTON,1473903,0,X,,X,1992,PA,MAKES GRANTS TO NON-PROFITS THAT DIRECTLY IMPROVE THE HEALTH AND WELL-BEING OF CHILDREN.,10,10,0,0,0,0,1044925,1439340,0,0,30447,33563,0,1000,1075372,1473903,638637,925000,0,0,0,0,0,0,195892,243131,459751,881768,1384751,193604,89152,1925215,2440859,171810,450430,1753405,1990429,X,"THE CORPORATION IS ORGANIZED AND WILL BE OPERATED EXCLUSIVELY FOR CHARITABLE, EDUCATIONAL AND SCIENTIFIC PURPOSES WITHIN THE MEANING OF SECTION 501(C)(3) OF THE INTERNAL REVENUE CODE. SUCH PURPOSES SHALL BE LIMITED TO PROVIDING SUPPORT AND FUNDIN...",0,0,1043744,925000,"RMHC OF THE PHILADELPHIA REGION, INC. GRANTS HUNDREDS OF THOUSANDS OF DOLLARS PER YEAR TO SUPPORT NON-PROFIT PROGRAMS THAT DIRECTLY IMPROVE THE HEALTH AND WELL-BEING OF CHILDREN. LOCALLY, RMHC SUPPORTS THE PHILADELPHIA, SOUTHERN NEW JERSEY AND DE...",1043744,"""0""","""0""","""0""","""0""","""0""","""0""","""0""","""0""","""0""","""0""","""0""","""0""","""0""",0,0,0,X,10,10,0,0,0,0,0,0,0,1,1,0,0,1,1,1,1,0,0,0,0,0,"[""PA"", ""NJ"", ""DE""]",X,X,0,0,0,0,0,0,0,0,1439340,1439340,1000,"{""TotalRevenueColumn"": ""1473903"", ""RelatedOrExemptFunctionIncome"": ""1000"", ""UnrelatedBusinessRevenue"": ""0"", ""ExclusionAmount"": ""33563""}","{""Total"": ""892000"", ""ProgramServices"": ""892000""}","{""Total"": ""33000"", ""ProgramServices"": ""33000""}","{""Total"": ""215"", ""ManagementAndGeneral"": ""215""}","{""Total"": ""21675"", ""ManagementAndGeneral"": ""21675""}","{""Total"": ""123"", ""ManagementAndGeneral"": ""123""}","{""Total"": ""118744"", ""ProgramServices"": ""118744""}","{""Total"": ""86228"", ""ManagementAndGeneral"": ""86228""}","[{""Description"": ""FUNDRAISING COSTS"", ""Total"": ""108311"", ""Fundraising"": ""108311""}, {""Description"": ""CANISTER COLLECTION FEE"", ""Total"": ""81925"", 
""Fundraising"": ""81925""}, {""Description"": ""PR/ADMINISTRATIVE SERVI"", ""Total"": ""34517"", ""ManagementAndGe...","{""Total"": ""763"", ""ManagementAndGeneral"": ""763""}","{""Total"": ""1384751"", ""ProgramServices"": ""1043744"", ""ManagementAndGeneral"": ""145115"", ""Fundraising"": ""195892""}","{""BOY"": ""332660"", ""EOY"": ""270700""}","{""BOY"": ""103412"", ""EOY"": ""147981""}",256845,86228,"{""BOY"": ""0"", ""EOY"": ""170617""}","{""BOY"": ""1489143"", ""EOY"": ""1851561""}","{""BOY"": ""1925215"", ""EOY"": ""2440859""}","{""BOY"": ""39670"", ""EOY"": ""44353""}","{""BOY"": ""80500"", ""EOY"": ""166000""}","{""BOY"": ""51640"", ""EOY"": ""240077""}",X,"{""BOY"": ""1753405"", ""EOY"": ""1990429""}",X,89152,X,X,0,1,1,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2011-11-09T06:41:09-06:00,2010-12-31,2010-01-01,"{'Name': 'ROBERT TRAA', 'Title': 'TREASURER', 'Phone': '8565826843', 'DateSigned': '2011-11-04', 'AuthorizeThirdParty': '1'}",2010,2016-02-24 21:20:13Z,,,,,,,232705170,"{'BusinessNameLine1': 'RONALD MCDONALD HOUSE CHARITIES-', 'BusinessNameLine2': 'PHILADELPHIA REGION INC'}",RONA,8565826843,"{'AddressLine1': '1525 VALLEY CENTER PARKWAY NO 300', 'City': 'BETHLEHEM', 'State': 'PA', 'ZIPCode': '18017'}",,,,,,,,X,,,,,X,,,1473903,0,,,MICHAEL ANTON,"{'Name': 'ROBERT TRAA', 'Title': 'TREASURER', 'Phone': '8565826843', 'DateSigned': '2011-11-04', 'AuthorizeThirdParty': '1'}",,PA,2010-01-01,2010-12-31,2010,2011-11-09T06:41:09-06:00,,X,,,,,1992,0,1439340,1044925,638637,10,30447,1753405,243131,0,0,0,0,89152,193604,,2440859,881768,195892,0,0,450430,1075372,0,0,10,0,925000,33563,1990429,MAKES GRANTS TO NON-PROFITS THAT DIRECTLY IMPROVE THE HEALTH AND WELL-BEING OF 
CHILDREN.,459751,1000,0,0,0,1925215,1384751,171810,1473903,0,0,,"RMHC OF THE PHILADELPHIA REGION, INC. GRANTS HUNDREDS OF THOUSANDS OF DOLLARS PER YEAR TO SUPPORT NON-PROFIT PROGRAMS THAT DIRECTLY IMPROVE THE HEALTH AND WELL-BEING OF CHILDREN. LOCALLY, RMHC SUPPORTS THE PHILADELPHIA, SOUTHERN NEW JERSEY AND DE...",1043744,925000,,,,,,,,,,,,,,,1043744,"THE CORPORATION IS ORGANIZED AND WILL BE OPERATED EXCLUSIVELY FOR CHARITABLE, EDUCATIONAL AND SCIENTIFIC PURPOSES WITHIN THE MEANING OF SECTION 501(C)(3) OF THE INTERNAL REVENUE CODE. SUCH PURPOSES SHALL BE LIMITED TO PROVIDING SUPPORT AND FUNDIN...",X,"""0""","""0""","""0""","""0""","""0""","""0""","""0""","""0""","""0""","""0""","""0""","""0""","""0""",,0,0,0,X,1,1,0,0,1,0,0,0,0,0,0,,X,0,,0,0,0,1,1,1,10,10,0,0,,,,"[""PA"", ""NJ"", ""DE""]",0,,0,0,X,0,0,0,0,0,0,,1439340,,,,,,,,,,,,,,,1439340,1000,,"{""TotalRevenueColumn"": ""1473903"", ""RelatedOrExemptFunctionIncome"": ""1000"", ""UnrelatedBusinessRevenue"": ""0"", ""ExclusionAmount"": ""33563""}",,,,"{""Total"": ""86228"", ""ManagementAndGeneral"": ""86228""}","{""Total"": ""86228"", ""ManagementAndGeneral"": ""86228""}","{""Total"": ""86228"", ""ManagementAndGeneral"": ""86228""}","{""Total"": ""86228"", ""ManagementAndGeneral"": ""86228""}",,"{""Total"": ""33000"", ""ProgramServices"": ""33000""}","{""Total"": ""892000"", ""ProgramServices"": ""892000""}",,,,,,"{""Total"": ""123"", ""ManagementAndGeneral"": ""123""}","{""Total"": ""763"", ""ManagementAndGeneral"": ""763""}","[{""Description"": ""FUNDRAISING COSTS"", ""Total"": ""108311"", ""Fundraising"": ""108311""}, {""Description"": ""CANISTER COLLECTION FEE"", ""Total"": ""81925"", ""Fundraising"": ""81925""}, {""Description"": ""PR/ADMINISTRATIVE SERVI"", ""Total"": ""34517"", ""ManagementAndGe...",,,,,,,,,,,,,"{""Total"": ""21675"", ""ManagementAndGeneral"": ""21675""}",,"{""Total"": ""215"", ""ManagementAndGeneral"": ""215""}",,,,,,,,,,,,"{""Total"": ""118744"", ""ProgramServices"": 
""118744""}",,,,,,,,,"{""Total"": ""1384751"", ""ProgramServices"": ""1043744"", ""ManagementAndGeneral"": ""145115"", ""Fundraising"": ""195892""}","{""Total"": ""1384751"", ""ProgramServices"": ""1043744"", ""ManagementAndGeneral"": ""145115"", ""Fundraising"": ""195892""}","{""Total"": ""1384751"", ""ProgramServices"": ""1043744"", ""ManagementAndGeneral"": ""145115"", ""Fundraising"": ""195892""}","{""Total"": ""1384751"", ""ProgramServices"": ""1043744"", ""ManagementAndGeneral"": ""145115"", ""Fundraising"": ""195892""}","{""BOY"": ""103412"", ""EOY"": ""147981""}",,,,"{""BOY"": ""0"", ""EOY"": ""170617""}",,,,,"{""BOY"": ""39670"", ""EOY"": ""44353""}","{""BOY"": ""80500"", ""EOY"": ""166000""}",,,,,"{""BOY"": ""1753405"", ""EOY"": ""1990429""}",,,,,"{""BOY"": ""1489143"", ""EOY"": ""1851561""}",,,256845,86228,,X,,"{""BOY"": ""51640"", ""EOY"": ""240077""}",,"{""BOY"": ""332660"", ""EOY"": ""270700""}","{""BOY"": ""332660"", ""EOY"": ""270700""}",,,,,,"{""BOY"": ""1925215"", ""EOY"": ""2440859""}",,,,,89152,,X,0,X,,,1,,0,1,X,


In [216]:
gc.collect()

0

In [217]:
df = updated_df.copy(deep=True)

In [218]:
gc.collect()

0

In [219]:
del updated_df

In [220]:
gc.collect()

0

#### Save DF
Whenever I'm dealing with time-intensive computations, I like to save the dataset several times throughout the notebook. Here I'm using a slightly different file name (adding 'renamed') to differentiate it from our original file.

In [221]:
%%time
import datetime
print("Current date and time : ", datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"), '\n')
df.to_feather('D:/all_filings_april_2025_all_controls.feather')

Current date and time :  2025-04-16 23:18:10 

CPU times: total: 5min 30s
Wall time: 4min 36s
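
One way to keep intermediate saves from overwriting one another is to build the file name from a tag and a timestamp. The snippet below is only a minimal sketch of that idea: the helper `checkpoint_path`, the `checkpoints` directory, and the `all_filings` stem are hypothetical names chosen for illustration and are not part of this notebook.

```python
from datetime import datetime
from pathlib import Path


def checkpoint_path(directory, stem, tag, when=None, suffix=".feather"):
    """Return a path like <directory>/<stem>_<tag>_<YYYY-MM-DD_HHMMSS><suffix>.

    Hypothetical helper: the tag (e.g. 'renamed') marks the processing stage,
    and the timestamp keeps repeated saves from overwriting each other.
    """
    when = when or datetime.now()
    stamp = when.strftime("%Y-%m-%d_%H%M%S")
    return Path(directory) / f"{stem}_{tag}_{stamp}{suffix}"


# Placeholder directory and stem; a fixed timestamp makes the result reproducible.
path = checkpoint_path("checkpoints", "all_filings", "renamed",
                       when=datetime(2025, 4, 16, 23, 18, 10))
print(path.name)  # all_filings_renamed_2025-04-16_231810.feather

# The resulting path could then be passed to the save call, for example:
# df.to_feather(path)
```

Leaving out the `when` argument uses the current time, so each run of the save cell would produce a new file rather than replacing the previous one.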


In [208]:
%%time
import datetime
print("Current date and time : ", datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"), '\n')
df = pd.read_feather("D:/all_filings_april_2025_all_controls.feather")
print('# of columns:', len(df.columns))
print('# of observations:', len(df))
df[:2]

Current date and time :  2025-04-16 23:02:39 

# of columns: 496
# of observations: 3469008
CPU times: total: 5min 23s
Wall time: 4min 48s


Unnamed: 0,_id,OrganizationName,URL,DLN,TaxPeriod,AddressChange,NameOfPrincipalOfficerPerson,GrossReceipts,GroupReturnForAffiliates,Organization501c3,WebSite,TypeOfOrganizationCorporation,YearFormation,StateLegalDomicile,ActivityOrMissionDescription,NbrVotingMembersGoverningBody,NbrIndependentVotingMembers,TotalNbrEmployees,TotalNbrVolunteers,TotalGrossUBI,NetUnrelatedBusinessTxblIncome,ContributionsGrantsPriorYear,ContributionsGrantsCurrentYear,ProgramServiceRevenuePriorYear,ProgramServiceRevenueCY,InvestmentIncomePriorYear,InvestmentIncomeCurrentYear,OtherRevenuePriorYear,OtherRevenueCurrentYear,TotalRevenuePriorYear,TotalRevenueCurrentYear,GrantsAndSimilarAmntsPriorYear,GrantsAndSimilarAmntsCY,BenefitsPaidToMembersPriorYear,BenefitsPaidToMembersCY,SalariesEtcPriorYear,SalariesEtcCurrentYear,TotalProfFundrsngExpPriorYear,TotalProfFundrsngExpCY,TotalFundrsngExpCurrentYear,OtherExpensePriorYear,OtherExpensesCurrentYear,TotalExpensesPriorYear,TotalExpensesCurrentYear,RevenuesLessExpensesPriorYear,RevenuesLessExpensesCY,TotalAssetsBOY,TotalAssetsEOY,TotalLiabilitiesBOY,TotalLiabilitiesEOY,NetAssetsOrFundBalancesBOY,NetAssetsOrFundBalancesEOY,InfoInScheduleOPartIII,MissionDescription,SignificantNewProgramServices,SignificantChange,Expense,Grants,Description,TotalProgramServiceExpense,PoliticalActivities,LobbyingActivities,ProfessionalFundraising,FundraisingActivities,Gaming,ExcessBenefitTransaction,PriorExcessBenefitTransaction,DisregardedEntity,RelatedEntity,RelatedOrgControlledEntity,TransactionRelatedEntity,TransfersToExemptNonChrtblOrg,ActivitiesConductedPartnership,NumberFormsTransmittedWith1096,NumberOfEmployees,UnrelatedBusinessIncome,InfoInScheduleOPartVI,NbrVotingGoverningBodyMembers,NumberIndependentVotingMembers,FamilyOrBusinessRelationship,DelegationOfManagementDuties,ChangesToOrganizingDocs,MaterialDiversionOrMisuse,MembersOrStockholders,ElectionOfBoardMembers,DecisionsSubjectToApproval,MinutesOfGoverningBody,MinutesOfCommittees,OfficerMailingAddress,Local
Chapters,Form990ProvidedToGoverningBody,ConflictOfInterestPolicy,AnnualDisclosureCoveredPersons,RegularMonitoringEnforcement,WhistleblowerPolicy,DocumentRetentionPolicy,CompensationProcessCEO,CompensationProcessOther,InvestmentInJointVenture,StatesWhereCopyOfReturnIsFiled,UponRequest,NoListedPersonsCompensated,TotalReportableCompFromOrg,TotalReportableCompFrmRltdOrgs,TotalOtherCompensation,NumberIndividualsGT100K,FormersListed,TotalCompGT150K,CompensationFromOtherSources,NumberOfContractorsGT100K,AllOtherContributions,TotalContributions,TotalOtherRevenue,TotalRevenue,GrantsToDomesticOrgs,GrantsToDomesticIndividuals,FeesForServicesLegal,FeesForServicesAccounting,OfficeExpenses,PaymentsToAffiliates,DepreciationDepletion,OtherExpenses,AllOtherExpenses,TotalFunctionalExpenses,SavingsAndTempCashInvestments,AccountsReceivable,LandBuildingsEquipmentBasis,LandBldgEquipmentAccumDeprec,LandBuildingsEquipmentBasisNet,InvestmentsOtherSecurities,TotalAssets,AccountsPayableAccruedExpenses,GrantsPayable,OtherLiabilities,FollowSFAS117,UnrestrictedNetAssets,InfoInScheduleOPartXI,ReconcilationRevenueExpenses,InfoInScheduleOPartXII,MethodOfAccountingAccrual,AccountantCompileOrReview,FSAudited,AuditCommittee,FederalGrantAuditRequired,AllAffiliatesIncluded,GroupExemptionNumber,Revenue,PoliciesReferenceChapters,WrittenPolicyOrProcedure,TotalProgramServiceRevenue,ForeignGrants,BenefitsToMembers,CompCurrentOfficersDirectors,CompDisqualPersons,OtherSalariesAndWages,PensionPlanContributions,OtherEmployeeBenefits,PayrollTaxes,FeesForServicesManagement,FeesForServicesLobbying,F9_09_PC_FEES_FOR_SVCE_FR_TOT,FeesForServicesInvstMgmntFees,FeesForServicesOther,Advertising,InformationTechnology,Royalties,Occupancy,Travel,TravelEntrtnmntPublicOfficials,ConferencesMeetings,Interest,Insurance,CashNonInterestBearing,PledgesAndGrantsReceivable,ReceivablesFromDisqualPersons,OtherNotesLoansReceivableNet,InventoriesForSaleOrUse,PrepaidExpensesDeferredCharges,InvestmentsPubTradedSecurities,InvestmentsProgram
Related,IntangibleAssets,OtherAssetsTotal,DeferredRevenue,MortNotesPyblSecuredInvestProp,FederalGrantAuditPerformed,LoansFromOfficersDirectors,MethodOfAccountingCash,Activity2,Activity3,InfoInScheduleOPartVII,TaxExemptBondLiabilities,TemporarilyRestrictedNetAssets,OtherWebsite,PermanentlyRestrictedNetAssets,FundraisingEvents,CntrbtnsRprtdFundraisingEvents,RelatedOrganizations,GrossIncomeFundraisingEvents,FundraisingDirectExpenses,FederatedCampaigns,GovernmentGrants,MethodOfAccountingOther,GrossSalesOfInventory,CostOfGoodsSold,DoNotFollowSFAS117,RetainedEarningsEndowmentEtc,InitialReturn,MembershipDues,GrossIncomeGaming,GamingDirectExpenses,NoncashContributions,InfoInScheduleOPartV,OwnWebsite,UnsecuredNotesLoansPayable,ActivityOther,TotalOfOtherProgramServiceExp,TotalOfOtherProgramServiceRev,EscrowAccountLiability,TotalOfOtherProgramServiceGrnt,TypeOfOrganizationOther,Organization501c,TypeOfOrganizationTrust,TypeOfOrganizationAssociation,CountryLegalDomicile,AmendedReturn,TypeOfOrgOtherDescription,TotalJointCosts,TerminatedReturn,TerminationOrContraction,ActivityCode,SpecialConditionDescription,Organization4947a1,InfoInScheduleOPartIX,ReconciliationUnrealizedInvest,ReconcilationPriorAdjustment,ReconcilationDonatedServices,ReconcilationInvestExpenses,InfoInScheduleOPartVIII,InfoInScheduleOPartX,PrincipalOfficerNm,GrossReceiptsAmt,GroupReturnForAffiliatesInd,Organization501c3Ind,TypeOfOrganizationCorpInd,FormationYr,LegalDomicileStateCd,ActivityOrMissionDesc,VotingMembersGoverningBodyCnt,VotingMembersIndependentCnt,TotalEmployeeCnt,TotalGrossUBIAmt,CYContributionsGrantsAmt,CYProgramServiceRevenueAmt,CYInvestmentIncomeAmt,CYOtherRevenueAmt,CYTotalRevenueAmt,CYGrantsAndSimilarPaidAmt,CYBenefitsPaidToMembersAmt,CYSalariesCompEmpBnftPaidAmt,CYTotalProfFndrsngExpnsAmt,CYTotalFundraisingExpenseAmt,CYOtherExpensesAmt,CYTotalExpensesAmt,CYRevenuesLessExpensesAmt,TotalAssetsBOYAmt,TotalAssetsEOYAmt,TotalLiabilitiesEOYAmt,NetAssetsOrFundBalancesBOYAmt,NetAssetsOrFundBalancesEOYA
mt,InfoInScheduleOPartIIIInd,MissionDesc,SignificantNewProgramSrvcInd,SignificantChangeInd,Desc,PoliticalCampaignActyInd,LobbyingActivitiesInd,ProfessionalFundraisingInd,FundraisingActivitiesInd,GamingActivitiesInd,EngagedInExcessBenefitTransInd,PYExcessBenefitTransInd,DisregardedEntityInd,RelatedEntityInd,RelatedOrganizationCtrlEntInd,TransactionWithControlEntInd,TrnsfrExmptNonChrtblRltdOrgInd,ActivitiesConductedPrtshpInd,IRPDocumentCnt,EmployeeCnt,UnrelatedBusIncmOverLimitInd,GoverningBodyVotingMembersCnt,IndependentVotingMemberCnt,FamilyOrBusinessRlnInd,DelegationOfMgmtDutiesInd,ChangeToOrgDocumentsInd,MaterialDiversionOrMisuseInd,MembersOrStockholdersInd,ElectionOfBoardMembersInd,DecisionsSubjectToApprovaInd,MinutesOfGoverningBodyInd,MinutesOfCommitteesInd,OfficerMailingAddressInd,LocalChaptersInd,Form990ProvidedToGvrnBodyInd,ConflictOfInterestPolicyInd,WhistleblowerPolicyInd,DocumentRetentionPolicyInd,CompensationProcessCEOInd,CompensationProcessOtherInd,InvestmentInJointVentureInd,StatesWhereCopyOfReturnIsFldCd,NoListedPersonsCompensatedInd,FormerOfcrEmployeesListedInd,TotalCompGreaterThan150KInd,CompensationFromOtherSrcsInd,MembershipDuesAmt,FundraisingAmt,AllOtherContributionsAmt,TotalContributionsAmt,OtherRevenueTotalAmt,TotalRevenueGrp,FeesForServicesAccountingGrp,OfficeExpensesGrp,InformationTechnologyGrp,ConferencesMeetingsGrp,InsuranceGrp,OtherExpensesGrp,AllOtherExpensesGrp,TotalFunctionalExpensesGrp,CashNonInterestBearingGrp,TotalAssetsGrp,OrgDoesNotFollowSFAS117Ind,RtnEarnEndowmentIncmOthFndsGrp,ReconcilationRevenueExpnssAmt,MethodOfAccountingCashInd,AccountantCompileOrReviewInd,FSAuditedInd,FederalGrantAuditRequiredInd,WebsiteAddressTxt,TotalVolunteersCnt,NetUnrelatedBusTxblIncmAmt,PYContributionsGrantsAmt,PYProgramServiceRevenueAmt,PYInvestmentIncomeAmt,PYOtherRevenueAmt,PYTotalRevenueAmt,PYGrantsAndSimilarPaidAmt,PYBenefitsPaidToMembersAmt,PYSalariesCompEmpBnftPaidAmt,PYTotalProfFndrsngExpnsAmt,PYOtherExpensesAmt,PYTotalExpensesAmt,PYRevenuesLessE
xpensesAmt,TotalLiabilitiesBOYAmt,ExpenseAmt,GrantAmt,RevenueAmt,ProgSrvcAccomActy2Grp,ProgSrvcAccomActy3Grp,ProgSrvcAccomActyOtherGrp,TotalOtherProgSrvcGrantAmt,TotalProgramServiceExpensesAmt,InfoInScheduleOPartVIInd,AnnualDisclosureCoveredPrsnInd,RegularMonitoringEnfrcInd,UponRequestInd,TotalReportableCompFromOrgAmt,TotReportableCompRltdOrgAmt,TotalOtherCompensationAmt,IndivRcvdGreaterThan100KCnt,CntrctRcvdGreaterThan100KCnt,GovernmentGrantsAmt,TotalProgramServiceRevenueAmt,FundraisingGrossIncomeAmt,ContriRptFundraisingEventAmt,FundraisingDirectExpensesAmt,GrossSalesOfInventoryAmt,CostOfGoodsSoldAmt,GrantsToDomesticIndividualsGrp,CompCurrentOfcrDirectorsGrp,OtherSalariesAndWagesGrp,PensionPlanContributionsGrp,OtherEmployeeBenefitsGrp,PayrollTaxesGrp,FeesForServicesOtherGrp,AdvertisingGrp,TravelGrp,InterestGrp,DepreciationDepletionGrp,SavingsAndTempCashInvstGrp,AccountsReceivableGrp,InventoriesForSaleOrUseGrp,PrepaidExpensesDefrdChargesGrp,LandBldgEquipCostOrOtherBssAmt,LandBldgEquipAccumDeprecAmt,LandBldgEquipBasisNetGrp,InvestmentsOtherSecuritiesGrp,IntangibleAssetsGrp,AccountsPayableAccrExpnssGrp,DeferredRevenueGrp,MortgNotesPyblScrdInvstPropGrp,OtherLiabilitiesGrp,OrganizationFollowsSFAS117Ind,UnrestrictedNetAssetsGrp,TemporarilyRstrNetAssetsGrp,InfoInScheduleOPartXIInd,NetUnrlzdGainsLossesInvstAmt,InfoInScheduleOPartXIIInd,AuditCommitteeInd,AllAffiliatesIncludedInd,GrantsToDomesticOrgsGrp,ForeignGrantsGrp,BenefitsToMembersGrp,CompDisqualPersonsGrp,FeesForServicesManagementGrp,FeesForServicesLegalGrp,FeesForServicesLobbyingGrp,FeesForSrvcInvstMgmntFeesGrp,RoyaltiesGrp,OccupancyGrp,PymtTravelEntrtnmntPubOfclGrp,PaymentsToAffiliatesGrp,PledgesAndGrantsReceivableGrp,RcvblFromDisqualifiedPrsnGrp,OthNotesLoansReceivableNetGrp,InvestmentsPubTradedSecGrp,InvestmentsProgramRelatedGrp,OtherAssetsTotalGrp,TotalOtherProgSrvcExpenseAmt,InfoInScheduleOPartVInd,MethodOfAccountingAccrualInd,NoncashContributionsAmt,GrantsPayableGrp,PermanentlyRstrNetAssetsGrp,TaxExemptBondLiab
ilitiesGrp,EscrowAccountLiabilityGrp,LoansFromOfficersDirectorsGrp,UnsecuredNotesLoansPayableGrp,PriorPeriodAdjustmentsAmt,FederalGrantAuditPerformedInd,PoliciesReferenceChaptersInd,OtherWebsiteInd,AddressChangeInd,WrittenPolicyOrProcedureInd,RelatedOrganizationsAmt,TotalOtherProgSrvcRevenueAmt,OwnWebsiteInd,TotalJointCostsGrp,DonatedServicesAndUseFcltsAmt,LegalDomicileCountryCd,InfoInScheduleOPartIXInd,TypeOfOrganizationTrustInd,FinalReturnInd,ContractTerminationInd,InfoInScheduleOPartXInd,GroupExemptionNum,InfoInScheduleOPartVIIInd,FederatedCampaignsAmt,TypeOfOrganizationOtherInd,OtherOrganizationDsc,InfoInScheduleOPartVIIIInd,TypeOfOrganizationAssocInd,InitialReturnInd,GamingGrossIncomeAmt,GamingDirectExpensesAmt,MethodOfAccountingOtherInd,InvestmentExpenseAmt,Organization501cInd,Organization4947a1NotPFInd,AmendedReturnInd,SpecialConditionDesc,ActivityCd,Timestamp,TaxPeriodEndDate,TaxPeriodBeginDate,Officer,TaxYear,F9_00_HD_BUILD_TIME_STAMP,ReturnTs,TaxPeriodEndDt,TaxPeriodBeginDt,BusinessOfficerGrp,TaxYr,fiscal_year,EIN,Name,NameControl,Phone,USAddress,ForeignAddress,InCareOfName,BusinessName,BusinessNameControlTxt,PhoneNum,InCareOfNm,ForeignPhoneNum
0,5d019e6778ffca27b42818d7,RONALD MCDONALD HOUSE CHARITIES- PHILADELPHIA REGION INC,https://s3.amazonaws.com/irs-form-990/201113139349301301_public.xml,93493313013011,201012,X,MICHAEL ANTON,1473903,0,X,,X,1992,PA,MAKES GRANTS TO NON-PROFITS THAT DIRECTLY IMPROVE THE HEALTH AND WELL-BEING OF CHILDREN.,10,10,0,0.0,0,0.0,1044925.0,1439340,0,0,30447,33563,0.0,1000,1075372,1473903,638637.0,925000,0.0,0,0,0,0.0,0,195892,243131,459751,881768,1384751,193604,89152,1925215,2440859,171810,450430,1753405,1990429,X,"THE CORPORATION IS ORGANIZED AND WILL BE OPERATED EXCLUSIVELY FOR CHARITABLE, EDUCATIONAL AND SCIENTIFIC PURPOSES WITHIN THE MEANING OF SECTION 501(C)(3) OF THE INTERNAL REVENUE CODE. SUCH PURPOSES SHALL BE LIMITED TO PROVIDING SUPPORT AND FUNDIN...",0,0,1043744,925000.0,"RMHC OF THE PHILADELPHIA REGION, INC. GRANTS HUNDREDS OF THOUSANDS OF DOLLARS PER YEAR TO SUPPORT NON-PROFIT PROGRAMS THAT DIRECTLY IMPROVE THE HEALTH AND WELL-BEING OF CHILDREN. LOCALLY, RMHC SUPPORTS THE PHILADELPHIA, SOUTHERN NEW JERSEY AND DE...",1043744,"""0""","""0""","""0""","""0""","""0""","""0""","""0""","""0""","""0""","""0""","""0""","""0""","""0""",0,0,0,X,10,10,0,0,0,0,0,0,0,1,1,0,0,1,1,1,1,0,0,0,0,0,"[""PA"", ""NJ"", ""DE""]",X,X,0.0,0,0,0,0,0,0,0,1439340.0,1439340,1000,"{""TotalRevenueColumn"": ""1473903"", ""RelatedOrExemptFunctionIncome"": ""1000"", ""UnrelatedBusinessRevenue"": ""0"", ""ExclusionAmount"": ""33563""}","{""Total"": ""892000"", ""ProgramServices"": ""892000""}","{""Total"": ""33000"", ""ProgramServices"": ""33000""}","{""Total"": ""215"", ""ManagementAndGeneral"": ""215""}","{""Total"": ""21675"", ""ManagementAndGeneral"": ""21675""}","{""Total"": ""123"", ""ManagementAndGeneral"": ""123""}","{""Total"": ""118744"", ""ProgramServices"": ""118744""}","{""Total"": ""86228"", ""ManagementAndGeneral"": ""86228""}","[{""Description"": ""FUNDRAISING COSTS"", ""Total"": ""108311"", ""Fundraising"": ""108311""}, {""Description"": ""CANISTER COLLECTION FEE"", ""Total"": 
""81925"", ""Fundraising"": ""81925""}, {""Description"": ""PR/ADMINISTRATIVE SERVI"", ""Total"": ""34517"", ""ManagementAndGe...","{""Total"": ""763"", ""ManagementAndGeneral"": ""763""}","{""Total"": ""1384751"", ""ProgramServices"": ""1043744"", ""ManagementAndGeneral"": ""145115"", ""Fundraising"": ""195892""}","{""BOY"": ""332660"", ""EOY"": ""270700""}","{""BOY"": ""103412"", ""EOY"": ""147981""}",256845,86228,"{""BOY"": ""0"", ""EOY"": ""170617""}","{""BOY"": ""1489143"", ""EOY"": ""1851561""}","{""BOY"": ""1925215"", ""EOY"": ""2440859""}","{""BOY"": ""39670"", ""EOY"": ""44353""}","{""BOY"": ""80500"", ""EOY"": ""166000""}","{""BOY"": ""51640"", ""EOY"": ""240077""}",X,"{""BOY"": ""1753405"", ""EOY"": ""1990429""}",X,89152,X,X,0,1,1,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2011-11-09T06:41:09-06:00,2010-12-31,2010-01-01,"{'Name': 'ROBERT TRAA', 'Title': 'TREASURER', 'Phone': '8565826843', 'DateSigned': '2011-11-04', 'AuthorizeThirdParty': '1'}",2010,2016-02-24 21:20:13Z,,,,,,,232705170,"{'BusinessNameLine1': 'RONALD MCDONALD HOUSE CHARITIES-', 'BusinessNameLine2': 'PHILADELPHIA REGION INC'}",RONA,8565826843,"{'AddressLine1': '1525 VALLEY CENTER PARKWAY NO 300', 'City': 'BETHLEHEM', 'State': 'PA', 'ZIPCode': '18017'}",,,,,,,
1,5d019e6778ffca27b42818d8,TORRINGTON VOA ELDERLY HOUSING INC BELL PARK TOWER,https://s3.amazonaws.com/irs-form-990/201113139349301311_public.xml,93493313013111,201106,,,266420,false,X,,X,1993,WY,PROVIDE HOUSING FOR THE ELDERLY AND THE DISABLED UNDER SECTION 202 OF THE NATIONAL HOUSING ACT UNDER AN AGREEMENT WITH THE DEPARTMENT OF HUD.,19,13,0,,0,,,0,222839,265592,1425,828,,0,224264,266420,,0,,0,71405,82955,,0,0,189785,222550,261190,305505,-36926,-39085,1455332,1433342,17482,34577,1437850,1398765,,PROVIDE HOUSING FOR THE ELDERLY AND THE DISABLED UNDER SECTION 202 OF THE NATIONAL HOUSING ACT UNDER AN AGREEMENT WITH THE DEPARTMENT OF HUD.,false,false,276405,,PROVIDE HOUSING FOR THE ELDERLY AND THE DISABLED UNDER SECTION 202 OF THE NATIONAL HOUSING ACT UNDER AN AGREEMENT WITH THE DEPARTMENT OF HUD.,276405,"""false""","""false""","""false""","""false""","""false""","""false""","""false""","{""@referenceDocumentId"": "" IRS990ScheduleR"", ""#text"": ""true""}","{""@referenceDocumentId"": "" IRS990ScheduleR"", ""#text"": ""true""}","""false""","{""@referenceDocumentId"": "" IRS990ScheduleR"", ""#text"": ""false""}","{""@referenceDocumentId"": "" IRS990ScheduleR"", ""#text"": ""false""}","{""@referenceDocumentId"": "" IRS990ScheduleR"", ""#text"": ""false""}",0,0,false,X,19,13,true,true,false,false,false,true,true,true,true,false,false,true,true,true,true,false,false,true,true,false,,X,,,1180355,411648,0,true,true,false,0,,0,0,"{""TotalRevenueColumn"": ""266420"", ""RelatedOrExemptFunctionIncome"": ""266420""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""7500"", ""ManagementAndGeneral"": ""7500""}","{""Total"": ""14222"", ""ProgramServices"": ""14222""}","{""Total"": ""0""}","{""Total"": ""66166"", ""ProgramServices"": ""66166""}","[{""Description"": ""OPER. 
& MAINT."", ""Total"": ""46164"", ""ProgramServices"": ""46164""}, {""Description"": ""MISC TAXES"", ""Total"": ""298"", ""ProgramServices"": ""298""}, {""Description"": ""ADMINISTRATIVE"", ""Total"": ""12176"", ""ProgramServices"": ""12176""}]","{""Total"": ""0""}","{""Total"": ""305505"", ""ProgramServices"": ""276405"", ""ManagementAndGeneral"": ""29100"", ""Fundraising"": ""0""}","{""EOY"": ""0""}","{""BOY"": ""231"", ""EOY"": ""474""}",2187206,904332,"{""BOY"": ""1306860"", ""EOY"": ""1282874""}","{""BOY"": ""125980"", ""EOY"": ""102794""}","{""BOY"": ""1455332"", ""EOY"": ""1433342""}","{""BOY"": ""2040"", ""EOY"": ""16145""}",,"{""BOY"": ""9203"", ""EOY"": ""11349""}",X,"{""BOY"": ""1437850"", ""EOY"": ""1398765""}",,-39085,,X,false,true,true,true,"""false""",1736.0,266420.0,False,False,265592.0,"{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""59440"", ""ProgramServices"": ""59440""}","{""Total"": ""0""}","{""Total"": ""17714"", ""ProgramServices"": ""17714""}","{""Total"": ""5801"", ""ProgramServices"": ""5801""}","{""Total"": ""21600"", ""ManagementAndGeneral"": ""21600""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""8433"", ""ProgramServices"": ""8433""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""44077"", ""ProgramServices"": ""44077""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""806"", ""ProgramServices"": ""806""}","{""Total"": ""0""}","{""Total"": ""1108"", ""ProgramServices"": ""1108""}","{""BOY"": ""250"", ""EOY"": ""22261""}","{""EOY"": ""0""}","{""EOY"": ""0""}","{""EOY"": ""0""}","{""EOY"": ""0""}","{""BOY"": ""7628"", ""EOY"": ""7554""}","{""EOY"": ""0""}","{""EOY"": ""0""}","{""EOY"": ""0""}","{""BOY"": ""14383"", ""EOY"": ""17385""}","{""BOY"": ""20"", ""EOY"": ""48""}","{""BOY"": ""6219"", ""EOY"": 
""7035""}",True,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2011-11-09T07:32:06-08:00,2011-06-30,2010-07-01,"{'Name': 'THOMAS D TURNBULL', 'Title': 'ASST. SEC/TREAS', 'DateSigned': '2011-11-09'}",2010,2016-02-24 21:20:13Z,,,,,,,581805618,"{'BusinessNameLine1': 'TORRINGTON VOA ELDERLY HOUSING INC', 'BusinessNameLine2': 'BELL PARK TOWER'}",TORR,7033415000,"{'AddressLine1': '1660 DUKE STREET', 'City': 'ALEXANDRIA', 'State': 'VA', 'ZIPCode': '22314'}",,,,,,,


#### Traditional method

In [28]:
#%%time
#import datetime
#print ("Current date and time : ", datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"), '\n')
#df.to_pickle('all NEW filings February 2024 - all control variables (renamed).pkl.gz', compression='gzip')

Current date and time :  2024-03-29 11:54:17 

CPU times: total: 25min 15s
Wall time: 27min 6s


#### Re-read in DF
In case you took a break or the notebook failed somewhere below, you could come back here and re-read the updated file and start from here.

In [29]:
#%%time
#import datetime
#print ("Current date and time : ", datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"), '\n')
#df = pd.read_pickle('all NEW filings February 2024 - all control variables (renamed).pkl.gz', compression='gzip')
#print('# of columns:', len(df.columns))
#print('# of observations:', len(df))
#df[:2]

# Binarize
We will now 'binarize' certain variables. Variables that should be made binary are indicated by the *binarize* column in our concordance file -- those columns were inspected in previous iterations of this notebook and then added to the concordance file. 

This is necessary because some years of e-file data use 'True', 'X', or '1' to indicate 'yes' and 'False', '0', or blank to indicate 'no'. We will change all the former to '1' and all the latter to '0'.

Note that in the concordance file I use two different values - 'binarize' indicates columns that simply have one or more of the above values, while 'binarize_with_dict' indicates variables where the above values are (for some unknown reason) 'nested' under a JSON dictionary. This will become clearer below once we look at these variables. The frequencies below show that there are 64 variables with the label 'binarize' and 16 with the label 'binarize_with_dict'.

In [223]:
new_variables_df['binarize'].value_counts()

binarize
binarize              64
binarize_with_dict    16
Name: count, dtype: int64

<br>We'll create a list of all the updated (standardized) variable names we want to process that have a value of 'binarize' in our concordance file. 

In [224]:
binarize_cols = [c for c in new_variables_df[new_variables_df['binarize']=='binarize']['variable_name_new'].tolist()] 
print(len(binarize_cols))
print(binarize_cols)

64
['F9_00_HD_ADDR_CHANGE', 'F9_00_HD_AMENDED_RETURN', 'F9_00_HD_EXEMPT_STATUS_4847A1', 'F9_00_HD_EXEMPT_STATUS_501C3', 'F9_00_HD_FINAL_RETURN', 'F9_00_HD_GROUP_RETURN', 'F9_00_HD_INITIAL_RETURN', 'F9_00_HD_TYPE_ORG_ASSOCIATION', 'F9_00_HD_TYPE_ORG_CORP', 'F9_00_HD_TYPE_ORG_OTHER', 'F9_00_HD_TYPE_ORG_TRUST', 'F9_01_PC_TERMINATION_CONTRACTION', 'F9_03_PC_PGMSVC_SIGNIF_CHG', 'F9_03_PC_PGMSVC_SIGNIF_NEW', 'F9_03_PZ_SCHEDULE_O_PART3', 'F9_05_EXP_SCHED_O_X', 'F9_05_PC_UNRELATED_BUS_INCOME', 'F9_06_EXP_SCHED_O_X', 'F9_06_PC_990_PROVIDED_GOV_BODY', 'F9_06_PC_ANNUAL_DISC_COVRD_PERS', 'F9_06_PC_CEO_COMPENSTN_PROCESS', 'F9_06_PC_CHANGES_ORGANIZING_DOCS', 'F9_06_PC_CONFLICT_OF_INTEREST', 'F9_06_PC_DECISIONS_SUBJ_APPROVAL', 'F9_06_PC_DELEGATION_MGT_DUTIES', 'F9_06_PC_DELEGATION_OF_MGT', 'F9_06_PC_DOCUMENT_RET_POLICY', 'F9_06_PC_ELECTION_BOARD_MEMBERS', 'F9_06_PC_FAMILY_OR_BUSINESS_REL', 'F9_06_PC_FORM_AVAIL_OWN_WEBSITE', 'F9_06_PC_FORM_UPON_REQUEST', 'F9_06_PC_JOINT_VENTURE_INVESTMNT', 'F9_06_PC_J

<br>Then we'll do the same for the variables that have a value of 'binarize_with_dict'. 

In [225]:
binarize_with_dict_cols = [c for c in new_variables_df[new_variables_df['binarize']=='binarize_with_dict']['variable_name_new'].tolist()] 
print(len(binarize_with_dict_cols))
binarize_with_dict_cols

16


['F9_00_HD_EXEMPT_STATUS_501C',
 'F9_00_HD_INCLUDES_SUBORD_ORGS',
 'F9_04_PC_ACTVITIES_VIA_PARTNER',
 'F9_04_PC_CONTROLLED_ENTITY',
 'F9_04_PC_DISREGARDED_ENTITY',
 'F9_04_PC_EXCESS_BENEFIT_TRANS',
 'F9_04_PC_FR_EVENT_INC_GT_15K',
 'F9_04_PC_GAMING_INC_GT_15K',
 'F9_04_PC_LOBBYING_ACTIVITIES',
 'F9_04_PC_POLITICAL_ACTIVITIES',
 'F9_04_PC_PRIOR_EXCESS_BEN_TRAN',
 'F9_04_PC_PROF_FR_EXP_GT_15K',
 'F9_04_PC_RELATED_ENTITY',
 'F9_04_PC_TRANS_TO_CNTRLD_ENT',
 'F9_04_PC_TRANS_WITH_CNTRLD_ENT',
 'F9_12_PC_ACCTG_METHOD_OTHER']

<br>Loop over all 64 variables with a 'binarize' value. What we're looking for are any variables that do not fit the above pattern of values (that is, any variables that are not checkbox or True/False variables). I have previously verified all 64 of these variables, but it is always sound practice to double-check, since values in more recent filings could always change.
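One way to make this double-check less dependent on eyeballing the frequencies is to list only the values that fall outside the expected set. The sketch below is illustrative: the `EXPECTED` set and the helper name `unexpected_values` are assumptions, not part of the original notebook, and the toy Series stands in for a real column.

```python
import pandas as pd

# Values we expect in a checkbox or True/False column (an assumed list).
EXPECTED = {'true', 'false', '1', '0', 'x'}

def unexpected_values(series):
    """Return counts of non-missing values that fall outside EXPECTED."""
    normalized = series.dropna().astype(str).str.strip().str.lower()
    return normalized[~normalized.isin(EXPECTED)].value_counts()

# Toy data: 'Yes' is the only value that does not fit the pattern.
toy = pd.Series(['X', 'true', None, '0', 'Yes'])
print(unexpected_values(toy))
```

An empty result would suggest the column still fits the checkbox pattern; anything else is worth a closer look before binarizing.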

In [None]:
#df = updated_df.copy(deep=True)

In [155]:
#del updated_df

In [226]:
gc.collect()

0

In [227]:
%%time
for c in binarize_cols[:2]:
    print(df[df[c].notnull()][c].value_counts().head(), '\n')

F9_00_HD_ADDR_CHANGE
X    138187
Name: count, dtype: Int64 

F9_00_HD_AMENDED_RETURN
X    41351
Name: count, dtype: Int64 

CPU times: total: 9.91 s
Wall time: 10.4 s


<br>Now look at frequencies for the 16 variables with 'binarize_with_dict' values in the *binarize* column. I identified all 16 of these in earlier data verifications. These 16 variables all have values that are 'dictionaries' (as indicated by the curly braces) and are more difficult to parse. Therefore, we will process them separately.

In [34]:
%%time 
for c in binarize_with_dict_cols[:2]:
    print(df[df[c].notnull()][c].value_counts()[:10], '\n')

{'@organization501cTypeTxt': '6', '#text': 'X'}     60746
{'@organization501cTypeTxt': '4', '#text': 'X'}     36718
{'@organization501cTypeTxt': '5', '#text': 'X'}     29271
{'@organization501cTypeTxt': '7', '#text': 'X'}     28402
{'@organization501cTypeTxt': '9', '#text': 'X'}     13876
{'@organization501cTypeTxt': '8', '#text': 'X'}     12859
{'@organization501cTypeTxt': '19', '#text': 'X'}    12295
{'@organization501cTypeTxt': '12', '#text': 'X'}     8731
{'@organization501cTypeTxt': '13', '#text': 'X'}     6193
{'@organization501cTypeTxt': '2', '#text': 'X'}      6058
Name: F9_00_HD_EXEMPT_STATUS_501C, dtype: int64 

1                                                                                    2301
false                                                                                1155
true                                                                                  788
0                                                                                     356
{'@referen

false                                                                                                            495838
0                                                                                                                302067
{'@referenceDocumentId': 'RetDoc1039700001', '#text': '0'}                                                        31554
{'@referenceDocumentName': 'IRS990ScheduleC', '#text': 'false'}                                                   12125
{'@referenceDocumentId': 'IRS990ScheduleC', '#text': 'false'}                                                     10643
{'@referenceDocumentId': 'RetDoc1039600001', '#text': '0'}                                                         8557
{'@referenceDocumentId': 'RetDoc3', '#text': 'false'}                                                              5827
{'@referenceDocumentId': 'IRS990ScheduleC-01', '@referenceDocumentName': 'IRS990ScheduleC', '#text': 'false'}      4952
{'@referenceDocumentId': 'RetDoc2', '#te

##### Check *F9_12_PC_ACCTG_METHOD_OTHER*, *F9_00_HD_EXEMPT_STATUS_501C*, and *F9_00_HD_INCLUDES_SUBORD_ORGS*
Based on the following frequencies, for *F9_12_PC_ACCTG_METHOD_OTHER* do an *np.where* and make it 'other'. Leave *F9_00_HD_EXEMPT_STATUS_501C* and *F9_00_HD_INCLUDES_SUBORD_ORGS* alone.

In [228]:
%%time
print(df[df['F9_00_HD_EXEMPT_STATUS_501C'].notnull()]['F9_00_HD_EXEMPT_STATUS_501C'].value_counts().head())

F9_00_HD_EXEMPT_STATUS_501C
{"@organization501cTypeTxt": "6", "#text": "X"}    198899
{"@organization501cTypeTxt": "4", "#text": "X"}    110643
{"@organization501cTypeTxt": "5", "#text": "X"}     91522
{"@organization501cTypeTxt": "7", "#text": "X"}     87154
{"@organization501cTypeTxt": "9", "#text": "X"}     44800
Name: count, dtype: Int64
CPU times: total: 15.6 s
Wall time: 16 s


#### Fix *F9_00_HD_EXEMPT_STATUS_501C*
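We'll write a custom function for processing this variable. It takes two keys because the 501(c) type is stored under different attribute names in different filings.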

In [229]:
def func(x, key1, key2):
    if pd.isnull(x):
        return np.nan
    #else: 
    #    mydict = ast.literal_eval(x)
    elif key1 in x.keys():
        return x[key1]
    elif key2 in x.keys():
        return x[key2]
    else:
        return np.nan
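To see the two-key lookup in isolation, here is a minimal sketch on two made-up records. The helper name `first_present` and the sample records are hypothetical; only the two attribute names are taken from the frequencies shown in this notebook.

```python
def first_present(record, *keys):
    """Return the value of the first key found in record, else None."""
    if not isinstance(record, dict):
        return None
    for key in keys:
        if key in record:
            return record[key]
    return None

# The same information appears under two different attribute names.
record_a = {'@typeOf501cOrganization': '7', '#text': 'X'}
record_b = {'@organization501cTypeTxt': '6', '#text': 'X'}
keys = ('@organization501cTypeTxt', '@typeOf501cOrganization')

print(first_present(record_a, *keys), first_present(record_b, *keys))
```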

<br>Apply the function and then show updated frequencies. You can see that the 'dictionaries' have all disappeared and what remains is the 501(c) value.

In [230]:
df['F9_00_HD_EXEMPT_STATUS_501C'].dtype

string[python]

In [231]:
import ast
import json

def convert_to_dict(x):
    """Parse a string into a Python object (usually a dict); pass dicts through."""
    if pd.isnull(x):
        return np.nan
    
    if isinstance(x, dict):
        return x
        
    try:
        # Try json.loads first (handles double-quoted JSON strings)
        return json.loads(x)
    except (ValueError, TypeError):
        try:
            # Fall back to ast.literal_eval (handles single-quoted Python literals)
            return ast.literal_eval(x)
        except (ValueError, SyntaxError, TypeError):
            return np.nan
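The reason for trying two parsers can be seen on a pair of sample strings. This is a small standalone sketch using only the standard library; the strings mimic values seen in the frequencies above but are typed in by hand.

```python
import ast
import json

double_quoted = '{"@organization501cTypeTxt": "6", "#text": "X"}'
single_quoted = "{'@typeOf501cOrganization': '7', '#text': 'X'}"

# json.loads handles the double-quoted (JSON) form...
print(json.loads(double_quoted))

# ...but rejects single quotes, which is where ast.literal_eval steps in.
try:
    json.loads(single_quoted)
except json.JSONDecodeError:
    print(ast.literal_eval(single_quoted))

# Side effect worth knowing: bare JSON scalars are converted too,
# so the string 'false' becomes the Python boolean False.
print(json.loads('false'), json.loads('1'))
```

That last behavior is one plausible reason why both `False` and 'false' can show up in the same column after conversion, which is why the binarize step later has to handle booleans as well as strings.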

In [232]:
%%time
# Apply the conversion to each column directly
for col in binarize_with_dict_cols:
    if col in df.columns:
        print(f"Converting {col}...")
        # Apply the exact same approach that worked for the individual column
        df[col] = df[col].apply(convert_to_dict)
        
        # Show a quick verification
        #sample = df[df[col].notnull()][col].head(1).values
        #print(f"Sample after conversion: {sample}")
        #print(f"Type after conversion: {type(sample[0]) if len(sample) > 0 else 'N/A'}")

Converting F9_00_HD_EXEMPT_STATUS_501C...
Converting F9_00_HD_INCLUDES_SUBORD_ORGS...
Converting F9_04_PC_ACTVITIES_VIA_PARTNER...
Converting F9_04_PC_CONTROLLED_ENTITY...
Converting F9_04_PC_DISREGARDED_ENTITY...
Converting F9_04_PC_EXCESS_BENEFIT_TRANS...
Converting F9_04_PC_FR_EVENT_INC_GT_15K...
Converting F9_04_PC_GAMING_INC_GT_15K...
Converting F9_04_PC_LOBBYING_ACTIVITIES...
Converting F9_04_PC_POLITICAL_ACTIVITIES...
Converting F9_04_PC_PRIOR_EXCESS_BEN_TRAN...
Converting F9_04_PC_PROF_FR_EXP_GT_15K...
Converting F9_04_PC_RELATED_ENTITY...
Converting F9_04_PC_TRANS_TO_CNTRLD_ENT...
Converting F9_04_PC_TRANS_WITH_CNTRLD_ENT...
Converting F9_12_PC_ACCTG_METHOD_OTHER...


In [167]:
#%%time
## First convert strings to dictionaries
#df['F9_00_HD_EXEMPT_STATUS_501C'] = df['F9_00_HD_EXEMPT_STATUS_501C'].apply(convert_to_dict)

CPU times: total: 5.5 s
Wall time: 5.69 s


In [233]:
df['F9_00_HD_EXEMPT_STATUS_501C'].dtype

dtype('O')

In [235]:
sample_col = 'F9_00_HD_EXEMPT_STATUS_501C'  # column to inspect
if sample_col in df.columns:
    # Print a few non-null values to see what we're working with
    sample_values = df[df[sample_col].notnull()][sample_col].head(3).tolist()
    print(f"Sample values from {sample_col}:")
    for val in sample_values:
        print(f"Value: {val}")
        print(f"Type: {type(val)}")

    ## Try the conversion on just these values to see what happens
    #print("\nTrying conversion:")
    #for val in sample_values:
    #    converted = convert_to_dict(val)
    #    print(f"Original: {val}")
    #    print(f"Converted: {converted}")
    #    print(f"Converted type: {type(converted)}")
    #    print("---")        

Sample values from F9_00_HD_EXEMPT_STATUS_501C:
Value: {'@typeOf501cOrganization': '7', '#text': 'X'}
Type: <class 'dict'>
Value: {'@typeOf501cOrganization': '4', '#text': 'X'}
Type: <class 'dict'>
Value: {'@typeOf501cOrganization': '6', '#text': 'X'}
Type: <class 'dict'>


In [171]:
#df = df.drop('F9_00_HD_EXEMPT_STATUS_501C_dict', axis=1)

In [236]:
%%time
df['F9_00_HD_EXEMPT_STATUS_501C'] = df['F9_00_HD_EXEMPT_STATUS_501C'][:].apply(func, 
                            key1='@organization501cTypeTxt', key2 ='@typeOf501cOrganization')

CPU times: total: 6.73 s
Wall time: 7.06 s


In [237]:
df['F9_00_HD_EXEMPT_STATUS_501C'].value_counts()[:10]

F9_00_HD_EXEMPT_STATUS_501C
6     231858
4     127070
5     106056
7     100604
9      54868
8      42206
19     40678
12     33678
3      28339
14     28043
Name: count, dtype: int64

#### Fix other 'dictionary' variables

In [174]:
binarize_with_dict_cols

['F9_00_HD_EXEMPT_STATUS_501C',
 'F9_00_HD_INCLUDES_SUBORD_ORGS',
 'F9_04_PC_ACTVITIES_VIA_PARTNER',
 'F9_04_PC_CONTROLLED_ENTITY',
 'F9_04_PC_DISREGARDED_ENTITY',
 'F9_04_PC_EXCESS_BENEFIT_TRANS',
 'F9_04_PC_FR_EVENT_INC_GT_15K',
 'F9_04_PC_GAMING_INC_GT_15K',
 'F9_04_PC_LOBBYING_ACTIVITIES',
 'F9_04_PC_POLITICAL_ACTIVITIES',
 'F9_04_PC_PRIOR_EXCESS_BEN_TRAN',
 'F9_04_PC_PROF_FR_EXP_GT_15K',
 'F9_04_PC_RELATED_ENTITY',
 'F9_04_PC_TRANS_TO_CNTRLD_ENT',
 'F9_04_PC_TRANS_WITH_CNTRLD_ENT',
 'F9_12_PC_ACCTG_METHOD_OTHER']

<br>The remaining *binarize_with_dict* variables have the core values nested in the dictionary's '#text' key. Let's write a function to allow us to grab the value for the '#text' key in each dictionary.

In [238]:
def func_text(x, key1):
    if pd.isnull(x):
        return np.nan
    elif isinstance(x, dict):
        # Return the nested value; dictionaries without the key become missing
        return x.get(key1, np.nan)
    else:
        # Plain values (e.g., 'true', '0') are passed through unchanged
        return x
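As a quick illustration of what this kind of extraction does, the sketch below runs the same idea on a handful of made-up values. `extract_text` is a hypothetical, pandas-free stand-in for the function above, and the sample dictionaries only imitate the shapes seen in the frequencies.

```python
def extract_text(value, key='#text'):
    """Return value[key] for dictionaries; pass other values through."""
    if isinstance(value, dict):
        return value.get(key)  # None when the key is absent
    return value

toy_values = [
    {'@referenceDocumentId': 'IRS990ScheduleC', '#text': 'false'},
    {'@referenceDocumentId': 'RetDoc3', '#text': '1'},
    'true',   # already a plain value
    None,     # missing stays missing
]
print([extract_text(v) for v in toy_values])
# → ['false', '1', 'true', None]
```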

<br>Apply the function to the first variable.

In [240]:
df['F9_00_HD_INCLUDES_SUBORD_ORGS'] = df['F9_00_HD_INCLUDES_SUBORD_ORGS'][:].apply(func_text, 
                            key1='#text')

<br>Now when we run the frequencies for this variable we see only plain values (False/True, 'false'/'true', and 0/1) rather than dictionaries -- just like the 'binarize' variables. 

In [281]:
df['F9_00_HD_INCLUDES_SUBORD_ORGS'].value_counts()

F9_00_HD_INCLUDES_SUBORD_ORGS
False    292935
True      17788
false      1157
true        888
0           647
1           114
Name: count, dtype: int64

In [282]:
df[binarize_with_dict_cols].dtypes

F9_00_HD_INCLUDES_SUBORD_ORGS     object
F9_04_PC_ACTVITIES_VIA_PARTNER    object
F9_04_PC_CONTROLLED_ENTITY        object
F9_04_PC_DISREGARDED_ENTITY       object
F9_04_PC_EXCESS_BENEFIT_TRANS     object
F9_04_PC_FR_EVENT_INC_GT_15K      object
F9_04_PC_GAMING_INC_GT_15K        object
F9_04_PC_LOBBYING_ACTIVITIES      object
F9_04_PC_POLITICAL_ACTIVITIES     object
F9_04_PC_PRIOR_EXCESS_BEN_TRAN    object
F9_04_PC_PROF_FR_EXP_GT_15K       object
F9_04_PC_RELATED_ENTITY           object
F9_04_PC_TRANS_TO_CNTRLD_ENT      object
F9_04_PC_TRANS_WITH_CNTRLD_ENT    object
dtype: object

<br>Now let's apply the function and show frequencies for each of the remaining *binarize_with_dict* variables in turn.

In [243]:
df['F9_04_PC_EXCESS_BENEFIT_TRANS'] = df['F9_04_PC_EXCESS_BENEFIT_TRANS'][:].apply(func_text, 
                            key1='#text')

In [244]:
df['F9_04_PC_EXCESS_BENEFIT_TRANS'].value_counts()

F9_04_PC_EXCESS_BENEFIT_TRANS
false    1720098
0        1128172
true         418
1            340
False          4
Name: count, dtype: int64

In [245]:
%%time
df['F9_04_PC_FR_EVENT_INC_GT_15K'] = df['F9_04_PC_FR_EVENT_INC_GT_15K'][:].apply(func_text, 
                            key1='#text')

CPU times: total: 6.41 s
Wall time: 6.6 s


In [246]:
df['F9_04_PC_FR_EVENT_INC_GT_15K'].value_counts()

F9_04_PC_FR_EVENT_INC_GT_15K
false    1643695
0        1079137
true      428605
1         317528
False         40
1              3
Name: count, dtype: int64

In [247]:
%%time
df['F9_04_PC_GAMING_INC_GT_15K'] = df['F9_04_PC_GAMING_INC_GT_15K'][:].apply(func_text, 
                            key1='#text')

CPU times: total: 6.03 s
Wall time: 6.23 s


In [248]:
df['F9_04_PC_GAMING_INC_GT_15K'].value_counts()

F9_04_PC_GAMING_INC_GT_15K
false    1993496
0        1369654
true       78804
1          27011
False         43
Name: count, dtype: int64

In [249]:
%%time
df['F9_04_PC_PRIOR_EXCESS_BEN_TRAN'] = df['F9_04_PC_PRIOR_EXCESS_BEN_TRAN'][:].apply(func_text, 
                            key1='#text')

CPU times: total: 4.98 s
Wall time: 5.14 s


In [250]:
df['F9_04_PC_PRIOR_EXCESS_BEN_TRAN'].value_counts()

F9_04_PC_PRIOR_EXCESS_BEN_TRAN
false    1721545
0        1128272
true         393
1            209
False          4
Name: count, dtype: int64

In [251]:
%%time
df['F9_04_PC_PROF_FR_EXP_GT_15K'] = df['F9_04_PC_PROF_FR_EXP_GT_15K'][:].apply(func_text, 
                            key1='#text')

CPU times: total: 6.03 s
Wall time: 6.2 s


In [252]:
df['F9_04_PC_PROF_FR_EXP_GT_15K'].value_counts()

In [253]:
%%time
df['F9_04_PC_ACTVITIES_VIA_PARTNER'] = df['F9_04_PC_ACTVITIES_VIA_PARTNER'][:].apply(func_text, 
                            key1='#text')

CPU times: total: 6.5 s
Wall time: 6.64 s


In [254]:
df['F9_04_PC_ACTVITIES_VIA_PARTNER'].value_counts()

F9_04_PC_ACTVITIES_VIA_PARTNER
false    2071549
0        1395447
1           1245
true         767
Name: count, dtype: int64

In [255]:
%%time
df['F9_04_PC_CONTROLLED_ENTITY'] = df['F9_04_PC_CONTROLLED_ENTITY'][:].apply(func_text, 
                            key1='#text')

CPU times: total: 3.97 s
Wall time: 4.18 s


In [261]:
df['F9_04_PC_CONTROLLED_ENTITY'].value_counts()

F9_04_PC_CONTROLLED_ENTITY
False    3061951
1         177057
false     119283
0          94343
1           9963
true        6411
Name: count, dtype: int64

In [257]:
%%time
df['F9_04_PC_DISREGARDED_ENTITY'] = df['F9_04_PC_DISREGARDED_ENTITY'][:].apply(func_text, 
                            key1='#text')

CPU times: total: 6.8 s
Wall time: 7.09 s


In [262]:
df['F9_04_PC_DISREGARDED_ENTITY'].value_counts()

F9_04_PC_DISREGARDED_ENTITY
false    2035297
0        1339639
1          57053
true       37019
Name: count, dtype: int64

In [264]:
%%time
df['F9_04_PC_LOBBYING_ACTIVITIES'] = df['F9_04_PC_LOBBYING_ACTIVITIES'][:].apply(func_text, 
                            key1='#text')

CPU times: total: 4.44 s
Wall time: 4.65 s


In [265]:
df['F9_04_PC_LOBBYING_ACTIVITIES'].value_counts()

F9_04_PC_LOBBYING_ACTIVITIES
false    1589524
0         999903
1          86110
true       56722
0             22
Name: count, dtype: int64

In [266]:
%%time
df['F9_04_PC_POLITICAL_ACTIVITIES'] = df['F9_04_PC_POLITICAL_ACTIVITIES'][:].apply(func_text, 
                            key1='#text')

CPU times: total: 4.84 s
Wall time: 5.17 s


In [267]:
df['F9_04_PC_POLITICAL_ACTIVITIES'].value_counts()

F9_04_PC_POLITICAL_ACTIVITIES
false    2054355
0        1380007
true       17956
1          16682
False          8
Name: count, dtype: int64

In [268]:
%%time
df['F9_04_PC_RELATED_ENTITY'] = df['F9_04_PC_RELATED_ENTITY'][:].apply(func_text, 
                            key1='#text')

CPU times: total: 6.64 s
Wall time: 6.88 s


In [269]:
df['F9_04_PC_RELATED_ENTITY'].value_counts()

F9_04_PC_RELATED_ENTITY
false    1745005
0         879575
1         517117
true      327311
Name: count, dtype: int64

In [270]:
%%time
df['F9_04_PC_TRANS_TO_CNTRLD_ENT'] = df['F9_04_PC_TRANS_TO_CNTRLD_ENT'][:].apply(func_text, 
                            key1='#text')

CPU times: total: 6.12 s
Wall time: 6.34 s


In [271]:
df['F9_04_PC_TRANS_TO_CNTRLD_ENT'].value_counts()

F9_04_PC_TRANS_TO_CNTRLD_ENT
false    1504504
0        1067944
1          18148
true       13565
0              1
Name: count, dtype: int64

In [272]:
%%time
df['F9_04_PC_TRANS_WITH_CNTRLD_ENT'] = df['F9_04_PC_TRANS_WITH_CNTRLD_ENT'][:].apply(func_text, 
                            key1='#text')

CPU times: total: 4.67 s
Wall time: 4.91 s


In [273]:
df['F9_04_PC_TRANS_WITH_CNTRLD_ENT'].value_counts()

F9_04_PC_TRANS_WITH_CNTRLD_ENT
false    624181
0        177129
1         59556
true      51225
Name: count, dtype: int64

#### Fix *F9_12_PC_ACCTG_METHOD_OTHER*
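Write a function to process this variable. Here the value of interest is stored under either the '@note' or the '@methodOfAccountingOtherDesc' key.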

In [274]:
def func_text2(x, key1, key2):
    if pd.isnull(x):
        return np.nan
    elif isinstance(x, dict):
        if key1 in x:
            return x[key1]
        elif key2 in x:
            return x[key2]
        # Neither key present: treat as missing
        return np.nan
    else:
        return x

In [275]:
%%time
df['F9_12_PC_ACCTG_METHOD_OTHER'] = df['F9_12_PC_ACCTG_METHOD_OTHER'][:].apply(func_text2, 
                            key1='@note', key2='@methodOfAccountingOtherDesc')

CPU times: total: 4.33 s
Wall time: 4.45 s


<br>We'll now look at the frequencies for all the variables in our *binarize_with_dict* list. From this we'll see which ones now need to be binarized.

In [277]:
for c in binarize_with_dict_cols[:1]:
    print(df[df[c].notnull()][c].value_counts()[:10], '\n')

F9_00_HD_EXEMPT_STATUS_501C
6     231858
4     127070
5     106056
7     100604
9      54868
8      42206
19     40678
12     33678
3      28339
14     28043
Name: count, dtype: int64 



##### Remove two variables from *binarize_with_dict_cols*
These have already been processed above (and are not True/False variables). The remaining variables will be processed below with the other 'binarize' variables.

In [278]:
gc.collect()

1495

In [279]:
print(len(binarize_with_dict_cols))
binarize_with_dict_cols.remove('F9_12_PC_ACCTG_METHOD_OTHER') 
binarize_with_dict_cols.remove('F9_00_HD_EXEMPT_STATUS_501C')
print(len(binarize_with_dict_cols))

16
14


#### Save DF

In [280]:
%%time
import datetime
print ("Current date and time : ", datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"), '\n')
df.to_feather('D:/all_filings_april_2025_all_controls.feather')

Current date and time :  2025-04-16 23:47:12 



ArrowInvalid: ("Could not convert 'false' with type str: tried to convert to boolean", 'Conversion failed for column F9_00_HD_INCLUDES_SUBORD_ORGS with type object')
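The `ArrowInvalid` message above points at a column whose values mix strings such as 'false' with other Python types, which Arrow cannot store in a single column. A rough way to find such columns before saving is sketched below; `mixed_type_columns` is a hypothetical helper and the toy DataFrame is made up for illustration.

```python
import pandas as pd

def mixed_type_columns(frame):
    """List object columns whose non-missing values have more than one Python type."""
    mixed = []
    for col in frame.select_dtypes(include='object').columns:
        types = frame[col].dropna().map(type).unique()
        if len(types) > 1:
            mixed.append(col)
    return mixed

# 'flag' mixes a boolean, a string, and an integer; 'name' holds only strings.
toy = pd.DataFrame({'flag': [False, 'false', 1, None],
                    'name': ['a', 'b', 'c', None]})
print(mixed_type_columns(toy))
```

Columns flagged this way would need to be binarized (or cast to a single type) before a feather or parquet write is likely to succeed.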

In [None]:
#Use HDF5 format (handles mixed types better than feather):
#df.to_hdf('D:/all_filings_april_2025_all_controls.h5', key='df')

In [None]:
#Use parquet with string type enforcement:
#df.to_parquet('D:/all_filings_april_2025_all_controls.parquet', 
#             engine='pyarrow', 
#             allow_truncated_timestamps=True)

#### Create *501c3* variable
There are two variables to look at here: ``F9_00_HD_EXEMPT_STATUS_501C`` and ``F9_00_HD_EXEMPT_STATUS_501C3``

You can see that there are 28,339 observations with a value of '3' on the first of these variables.

In [283]:
df['F9_00_HD_EXEMPT_STATUS_501C'].value_counts()[:15]

F9_00_HD_EXEMPT_STATUS_501C
6     231858
4     127070
5     106056
7     100604
9      54868
8      42206
19     40678
12     33678
3      28339
14     28043
2      22782
13     21118
10     12610
25      4660
15      1382
Name: count, dtype: int64

<br>And there are 2,609,442 501c3 observations on the second variable.

In [284]:
df['F9_00_HD_EXEMPT_STATUS_501C3'].value_counts()

F9_00_HD_EXEMPT_STATUS_501C3
X    2609442
Name: count, dtype: Int64

In [290]:
df['F9_00_HD_EXEMPT_STATUS_501C3'] = df['F9_00_HD_EXEMPT_STATUS_501C3'].astype('string')

In [291]:
df['F9_00_HD_EXEMPT_STATUS_501C3'].value_counts()

F9_00_HD_EXEMPT_STATUS_501C3
X    2609442
Name: count, dtype: Int64

<br>Run a cross-tab of the two variables. There's no overlap, so we should expect 28,339 + 2,609,442 = 2,637,781 filings with a value of '1' on the ``501c3`` variable we will create.

In [285]:
pd.crosstab(df['F9_00_HD_EXEMPT_STATUS_501C3'], df['F9_00_HD_EXEMPT_STATUS_501C'])

F9_00_HD_EXEMPT_STATUS_501C
F9_00_HD_EXEMPT_STATUS_501C3


In [292]:
# Fill missing values with an empty string before comparing, so np.where sees only True/False
df['501c3'] = np.where(df['F9_00_HD_EXEMPT_STATUS_501C3'].fillna('').eq('X'), 1, 0)
print(df['501c3'].value_counts(),'\n')

501c3
1    2609442
0     859566
Name: count, dtype: int64 



In [293]:
df['501c3'] = np.where(df['F9_00_HD_EXEMPT_STATUS_501C'].fillna('').eq('3'), 1, df['501c3'])
print(df['501c3'].value_counts())

501c3
1    2637781
0     831227
Name: count, dtype: int64


<br>We see above that the frequencies for our new variable ``501c3`` are as expected.
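The two-step logic used above (flag the checkbox first, then add filings whose 501(c) type is '3') can be checked on a toy example. The column names `checkbox_501c3` and `type_501c` below are made-up stand-ins for the two real variables, and the four rows are invented.

```python
import pandas as pd

toy = pd.DataFrame({
    'checkbox_501c3': pd.array(['X', None, None, None], dtype='string'),
    'type_501c': [None, '3', '6', None],
})

# A filing counts as 501(c)(3) if either source says so.
# fillna('') keeps missing values out of the comparison.
is_c3 = (toy['checkbox_501c3'].fillna('').eq('X')
         | toy['type_501c'].fillna('').eq('3'))
toy['501c3'] = is_c3.astype(int)

print(toy['501c3'].tolist())
# → [1, 1, 0, 0]
```

Combining the two conditions with `|` gives the same result as the two successive `np.where` calls, since the second call only ever switches a 0 to a 1.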

#### Save DF

In [79]:
#%%time
#import datetime
#print ("Current date and time : ", datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"), '\n')
#df.to_pickle('all NEW filings February 2024 - all control variables (renamed).pkl.gz', compression='gzip')

Current date and time :  2024-03-29 14:31:43 

CPU times: total: 25min 39s
Wall time: 26min 15s


# Binarize Columns

Loop over the 'binarize' variables. In this loop we're just inspecting the frequencies; only the first two of the 64 variables are shown here.

In [294]:
%%time
for col in binarize_cols[:2]:
    print(df[col].value_counts(), '\n\n')

F9_00_HD_ADDR_CHANGE
X    138187
Name: count, dtype: Int64 


F9_00_HD_AMENDED_RETURN
X    41351
Name: count, dtype: Int64 


CPU times: total: 46.9 ms
Wall time: 67.4 ms


<br>Write a function to make the values binary (0,1)

In [325]:
def binarize(df, variable):
    print(f"Before – {variable}:\n", df[variable].value_counts(dropna=False).head(), '\n')
    
    # Handle actual booleans
    df[variable] = np.where(df[variable] == True, 1, df[variable])
    df[variable] = np.where(df[variable] == False, 0, df[variable])

    # Handle strings
    df[variable] = np.where(df[variable] == 'true', 1, df[variable])
    df[variable] = np.where(df[variable] == 'false', 0, df[variable])
    df[variable] = np.where(df[variable] == '1', 1, df[variable])
    df[variable] = np.where(df[variable] == '0', 0, df[variable])
    df[variable] = np.where(df[variable] == 'X', 1, df[variable])    

    # Convert to consistent binary int type
    df[variable] = df[variable].astype('Int64')  # Nullable integer type

    print(f"After – {variable}:\n", df[variable].value_counts(dropna=False).head(), '\n')
    return df.sample(10)[['EIN', variable]]

<br>Run the above function for the 14 remaining variables in the *binarize_with_dict_cols* list. The output below will show the frequencies for each variable before and after processing. All are successfully converted into 0/1 variables.

In [326]:
%%time 
for col in binarize_with_dict_cols:
    binarize(df, col)

Before – F9_00_HD_INCLUDES_SUBORD_ORGS:
 F9_00_HD_INCLUDES_SUBORD_ORGS
NaN    3155479
0       294739
1        18790
Name: count, dtype: int64 

After – F9_00_HD_INCLUDES_SUBORD_ORGS:
 F9_00_HD_INCLUDES_SUBORD_ORGS
<NA>    3155479
0        294739
1         18790
Name: count, dtype: Int64 

Before – F9_04_PC_ACTVITIES_VIA_PARTNER:
 F9_04_PC_ACTVITIES_VIA_PARTNER
0    3466996
1       2012
Name: count, dtype: int64 

After – F9_04_PC_ACTVITIES_VIA_PARTNER:
 F9_04_PC_ACTVITIES_VIA_PARTNER
0    3466996
1       2012
Name: count, dtype: Int64 

Before – F9_04_PC_CONTROLLED_ENTITY:
 F9_04_PC_CONTROLLED_ENTITY
0    3275577
1     193431
Name: count, dtype: int64 

After – F9_04_PC_CONTROLLED_ENTITY:
 F9_04_PC_CONTROLLED_ENTITY
0    3275577
1     193431
Name: count, dtype: Int64 

Before – F9_04_PC_DISREGARDED_ENTITY:
 F9_04_PC_DISREGARDED_ENTITY
0    3374936
1      94072
Name: count, dtype: int64 

After – F9_04_PC_DISREGARDED_ENTITY:
 F9_04_PC_DISREGARDED_ENTITY
0    3374936
1      94072
Name: c

In [300]:
#binarize(df, 'F9_00_HD_INCLUDES_SUBORD_ORGS')

F9_00_HD_INCLUDES_SUBORD_ORGS
False    294739
True      18790
Name: count, dtype: int64 

F9_00_HD_INCLUDES_SUBORD_ORGS
0    294739
1     18790
Name: count, dtype: int64 




Unnamed: 0,EIN,F9_00_HD_INCLUDES_SUBORD_ORGS
1454999,550735567,
2446405,330578620,
2257556,821515568,
3141806,882818905,
2969228,943297380,
450620,453621984,
3433614,232603853,
241872,133560387,
2955765,841893677,
3325722,272718480,


In [311]:
binarize_with_dict_cols

['F9_00_HD_INCLUDES_SUBORD_ORGS',
 'F9_04_PC_ACTVITIES_VIA_PARTNER',
 'F9_04_PC_CONTROLLED_ENTITY',
 'F9_04_PC_DISREGARDED_ENTITY',
 'F9_04_PC_EXCESS_BENEFIT_TRANS',
 'F9_04_PC_FR_EVENT_INC_GT_15K',
 'F9_04_PC_GAMING_INC_GT_15K',
 'F9_04_PC_LOBBYING_ACTIVITIES',
 'F9_04_PC_POLITICAL_ACTIVITIES',
 'F9_04_PC_PRIOR_EXCESS_BEN_TRAN',
 'F9_04_PC_PROF_FR_EXP_GT_15K',
 'F9_04_PC_RELATED_ENTITY',
 'F9_04_PC_TRANS_TO_CNTRLD_ENT',
 'F9_04_PC_TRANS_WITH_CNTRLD_ENT']

In [327]:
df['F9_04_PC_CONTROLLED_ENTITY'].value_counts()

F9_04_PC_CONTROLLED_ENTITY
0    3275577
1     193431
Name: count, dtype: Int64

In [None]:
#binarize(df, 'F9_04_PC_CONTROLLED_ENTITY')

<br>Now run it for the 64 variables in the *binarize_cols* list

In [301]:
len(binarize_cols)

64

# 4/16/2025 - new version of binarize 
- To avert `TypeError: boolean value of NA is ambiguous` 

The first version of `binarize` can fail on some columns with:

`TypeError: boolean value of NA is ambiguous`

This happens because many of these columns use a nullable pandas dtype, so comparing them with a value such as 'true' produces `pd.NA` for the missing entries. Unlike `np.nan`, `pd.NA` cannot be evaluated as True or False -- it raises an error instead of guessing.

The fix is to coerce everything to string at the beginning of the function, before doing any comparisons, and then map the known values to 0/1.

Why this works:
- `astype(str)` converts all values to strings, so missing values become ordinary strings such as '<NA>', 'nan', or 'None'
- `.map({...})` applies a clean mapping
- Unrecognized values (including the stringified missing values) become `NaN`
- Casting to 'Int64' ensures valid binary or null values
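The behavior described above can be reproduced on a three-element Series. This is a minimal sketch with invented values; it assumes a column stored with the nullable 'string' dtype, as several of the columns in this notebook are.

```python
import pandas as pd

flags = pd.Series(['X', None, 'false'], dtype='string')

# Comparing a nullable column keeps the missing entry as <NA>...
comparison = flags == 'X'
print(comparison.tolist())

# ...and <NA> refuses to act as a plain True/False.
try:
    bool(pd.NA)
except TypeError as err:
    print('TypeError:', err)

# Mapping string values sidesteps the comparison entirely:
# anything not in the dictionary (including missing values) ends up missing.
mapped = (flags.astype(str).str.strip().str.lower()
          .map({'true': 1, '1': 1, 'x': 1, 'false': 0, '0': 0})
          .astype('Int64'))
print(mapped.tolist())
```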

In [306]:
def binarize(df, variable):
    print("Before:\n", df[variable].value_counts(dropna=False).head(), '\n')

    # Coerce all values to string and normalize
    df[variable] = df[variable].astype(str).str.strip().str.lower()

    # Map known values to binary
    df[variable] = df[variable].map({
        'true': 1,
        '1': 1,
        'x': 1,        # 'X' is already lowercased by the step above
        'false': 0,
        '0': 0
    })

    print("After:\n", df[variable].value_counts(dropna=False).head(), '\n')
    
    # Cast to nullable Int64 type (preserves NaNs)
    df[variable] = df[variable].astype('Int64')
    
    return df.sample(5)[['EIN', variable]]

In [307]:
%%time
for col in binarize_cols:
    binarize(df, col)

Before:
 F9_00_HD_ADDR_CHANGE
<NA>    3330821
X        138187
Name: count, dtype: Int64 

After:
 F9_00_HD_ADDR_CHANGE
NaN    3330821
1.0     138187
Name: count, dtype: int64 

Before:
 F9_00_HD_AMENDED_RETURN
<NA>    3427657
X         41351
Name: count, dtype: Int64 

After:
 F9_00_HD_AMENDED_RETURN
NaN    3427657
1.0      41351
Name: count, dtype: int64 

Before:
 F9_00_HD_EXEMPT_STATUS_4847A1
<NA>    3466547
X          2461
Name: count, dtype: Int64 

After:
 F9_00_HD_EXEMPT_STATUS_4847A1
NaN    3466547
1.0       2461
Name: count, dtype: int64 

Before:
 F9_00_HD_EXEMPT_STATUS_501C3
X       2609442
<NA>     859566
Name: count, dtype: Int64 

After:
 F9_00_HD_EXEMPT_STATUS_501C3
1.0    2609442
NaN     859566
Name: count, dtype: int64 

Before:
 F9_00_HD_FINAL_RETURN
<NA>    3449629
X         19379
Name: count, dtype: Int64 

After:
 F9_00_HD_FINAL_RETURN
NaN    3449629
1.0      19379
Name: count, dtype: int64 

Before:
 F9_00_HD_GROUP_RETURN
false    2068591
0        1392941
1       

After:
 F9_06_PC_MONITORING_OF_COI_POLICY
1.0    1848873
NaN    1156585
0.0     463550
Name: count, dtype: int64 

Before:
 F9_06_PC_OFFICER_MAILING_ADDRESS
false    2033316
0        1382310
true       39000
1          14382
Name: count, dtype: int64 

After:
 F9_06_PC_OFFICER_MAILING_ADDRESS
0    3415626
1      53382
Name: count, dtype: int64 

Before:
 F9_06_PC_OTHER_COMPENSTN_PROCESS
false    1524387
0         899664
true      530210
1         496512
None       18235
Name: count, dtype: int64 

After:
 F9_06_PC_OTHER_COMPENSTN_PROCESS
0.0    2424051
1.0    1026722
NaN      18235
Name: count, dtype: int64 

Before:
 F9_06_PC_OTHER_WEBSITE
<NA>    3033593
X        435415
Name: count, dtype: Int64 

After:
 F9_06_PC_OTHER_WEBSITE
NaN    3033593
1.0     435415
Name: count, dtype: int64 

Before:
 F9_06_PC_OWN_WEBSITE
<NA>    3244763
X        224245
Name: count, dtype: Int64 

After:
 F9_06_PC_OWN_WEBSITE
NaN    3244763
1.0     224245
Name: count, dtype: int64 

Before:
 F9_06_PC_POLICIE

<br>Show the dtypes, summary statistics, and a sample of 5 observations for all variables in the *binarize_with_dict_cols* list, then the *binarize_cols* list.

In [323]:
df[binarize_with_dict_cols].dtypes

F9_00_HD_INCLUDES_SUBORD_ORGS     object
F9_04_PC_ACTVITIES_VIA_PARTNER    object
F9_04_PC_CONTROLLED_ENTITY        object
F9_04_PC_DISREGARDED_ENTITY       object
F9_04_PC_EXCESS_BENEFIT_TRANS     object
F9_04_PC_FR_EVENT_INC_GT_15K      object
F9_04_PC_GAMING_INC_GT_15K        object
F9_04_PC_LOBBYING_ACTIVITIES      object
F9_04_PC_POLITICAL_ACTIVITIES     object
F9_04_PC_PRIOR_EXCESS_BEN_TRAN    object
F9_04_PC_PROF_FR_EXP_GT_15K       object
F9_04_PC_RELATED_ENTITY           object
F9_04_PC_TRANS_TO_CNTRLD_ENT      object
F9_04_PC_TRANS_WITH_CNTRLD_ENT    object
dtype: object

In [332]:
df[binarize_with_dict_cols].describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
F9_00_HD_INCLUDES_SUBORD_ORGS,313529.0,0.059931,0.237359,0.0,0.0,0.0,0.0,1.0
F9_04_PC_ACTVITIES_VIA_PARTNER,3469008.0,0.00058,0.024076,0.0,0.0,0.0,0.0,1.0
F9_04_PC_CONTROLLED_ENTITY,3469008.0,0.05576,0.229457,0.0,0.0,0.0,0.0,1.0
F9_04_PC_DISREGARDED_ENTITY,3469008.0,0.027118,0.162427,0.0,0.0,0.0,0.0,1.0
F9_04_PC_EXCESS_BENEFIT_TRANS,2849032.0,0.000266,0.016309,0.0,0.0,0.0,0.0,1.0
F9_04_PC_FR_EVENT_INC_GT_15K,3469008.0,0.215086,0.410882,0.0,0.0,0.0,0.0,1.0
F9_04_PC_GAMING_INC_GT_15K,3469008.0,0.030503,0.171967,0.0,0.0,0.0,0.0,1.0
F9_04_PC_LOBBYING_ACTIVITIES,2732281.0,0.052276,0.222583,0.0,0.0,0.0,0.0,1.0
F9_04_PC_POLITICAL_ACTIVITIES,3469008.0,0.009985,0.099425,0.0,0.0,0.0,0.0,1.0
F9_04_PC_PRIOR_EXCESS_BEN_TRAN,2850423.0,0.000211,0.014531,0.0,0.0,0.0,0.0,1.0


In [328]:
df[binarize_with_dict_cols].sample(5)

Unnamed: 0,F9_00_HD_INCLUDES_SUBORD_ORGS,F9_04_PC_ACTVITIES_VIA_PARTNER,F9_04_PC_CONTROLLED_ENTITY,F9_04_PC_DISREGARDED_ENTITY,F9_04_PC_EXCESS_BENEFIT_TRANS,F9_04_PC_FR_EVENT_INC_GT_15K,F9_04_PC_GAMING_INC_GT_15K,F9_04_PC_LOBBYING_ACTIVITIES,F9_04_PC_POLITICAL_ACTIVITIES,F9_04_PC_PRIOR_EXCESS_BEN_TRAN,F9_04_PC_PROF_FR_EXP_GT_15K,F9_04_PC_RELATED_ENTITY,F9_04_PC_TRANS_TO_CNTRLD_ENT,F9_04_PC_TRANS_WITH_CNTRLD_ENT
2531478,,0,0,0,0.0,0,0,0,0,0.0,0,0,0.0,
3388090,,0,0,0,0.0,0,0,0,0,0.0,0,1,0.0,
2985105,,0,0,0,0.0,0,0,0,0,0.0,0,0,0.0,
2387705,,0,0,1,0.0,1,0,0,0,0.0,0,0,0.0,
3128191,,0,0,0,,0,0,0,0,,0,0,,


In [331]:
df[binarize_cols].describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
F9_00_HD_ADDR_CHANGE,138187.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0
F9_00_HD_AMENDED_RETURN,41351.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0
F9_00_HD_EXEMPT_STATUS_4847A1,2461.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0
F9_00_HD_EXEMPT_STATUS_501C3,2609442.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0
F9_00_HD_FINAL_RETURN,19379.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0
...,...,...,...,...,...,...,...,...
F9_12_PC_AUDIT_COMMITTEE,1873352.0,0.810797,0.39167,0.0,1.0,1.0,1.0,1.0
F9_12_PC_FED_GRNT_AUDIT_PERFORMD,362372.0,0.822332,0.382234,0.0,1.0,1.0,1.0,1.0
F9_12_PC_FED_GRNT_AUDIT_REQUIRED,3118240.0,0.096413,0.295157,0.0,0.0,0.0,0.0,1.0
F9_12_PC_FINCL_STMTS_AUDITED,3469008.0,0.434902,0.495744,0.0,0.0,0.0,1.0,1.0


In [330]:
df[binarize_cols].sample(5)

Unnamed: 0,F9_00_HD_ADDR_CHANGE,F9_00_HD_AMENDED_RETURN,F9_00_HD_EXEMPT_STATUS_4847A1,F9_00_HD_EXEMPT_STATUS_501C3,F9_00_HD_FINAL_RETURN,F9_00_HD_GROUP_RETURN,F9_00_HD_INITIAL_RETURN,F9_00_HD_TYPE_ORG_ASSOCIATION,F9_00_HD_TYPE_ORG_CORP,F9_00_HD_TYPE_ORG_OTHER,F9_00_HD_TYPE_ORG_TRUST,F9_01_PC_TERMINATION_CONTRACTION,F9_03_PC_PGMSVC_SIGNIF_CHG,F9_03_PC_PGMSVC_SIGNIF_NEW,F9_03_PZ_SCHEDULE_O_PART3,F9_05_EXP_SCHED_O_X,F9_05_PC_UNRELATED_BUS_INCOME,F9_06_EXP_SCHED_O_X,F9_06_PC_990_PROVIDED_GOV_BODY,F9_06_PC_ANNUAL_DISC_COVRD_PERS,F9_06_PC_CEO_COMPENSTN_PROCESS,F9_06_PC_CHANGES_ORGANIZING_DOCS,F9_06_PC_CONFLICT_OF_INTEREST,F9_06_PC_DECISIONS_SUBJ_APPROVAL,F9_06_PC_DELEGATION_MGT_DUTIES,F9_06_PC_DELEGATION_OF_MGT,F9_06_PC_DOCUMENT_RET_POLICY,F9_06_PC_ELECTION_BOARD_MEMBERS,F9_06_PC_FAMILY_OR_BUSINESS_REL,F9_06_PC_FORM_AVAIL_OWN_WEBSITE,F9_06_PC_FORM_UPON_REQUEST,F9_06_PC_JOINT_VENTURE_INVESTMNT,F9_06_PC_JOINT_VENTURE_POLICY,F9_06_PC_LOCAL_CHAPTERS,F9_06_PC_MATERIAL_DIVERSION,F9_06_PC_MEMBERS_OR_STOCKHOLDERS,F9_06_PC_MINUTES_COMMITTEES,F9_06_PC_MINUTES_GOVERNING_BODY,F9_06_PC_MONITORING_OF_COI_POLICY,F9_06_PC_OFFICER_MAILING_ADDRESS,F9_06_PC_OTHER_COMPENSTN_PROCESS,F9_06_PC_OTHER_WEBSITE,F9_06_PC_OWN_WEBSITE,F9_06_PC_POLICIES_GOVERN_CHAPTER,F9_06_PC_WHISTLEBLOWER_POLICY,F9_07_EXP_SCHED_O_X,F9_07_PC_COMPENSATION_OTHER_SRCE,F9_07_PC_FORMER_OFFICER_LISTED,F9_07_PC_NO_LISTED_PERS_COMPENSD,F9_07_PC_TOTAL_COMP_GRTR_150K,F9_08_EXP_SCHED_O_X,F9_09_EXP_SCHED_O_X,F9_10_PC_ORG_FOLLOWS_SFAS117,F9_10_PC_ORG_NOT_FOLLOW_SFAS117,F9_10_SCHED_O_X,F9_11_SCHED_O_X,F9_12_PC_ACCNT_COMPILE_OR_REVIEW,F9_12_PC_ACCTG_METHOD_ACCRUAL,F9_12_PC_ACCTG_METHOD_CASH,F9_12_PC_AUDIT_COMMITTEE,F9_12_PC_FED_GRNT_AUDIT_PERFORMD,F9_12_PC_FED_GRNT_AUDIT_REQUIRED,F9_12_PC_FINCL_STMTS_AUDITED,F9_12_SCHED_O_X
2331038,,,,1.0,,0,,,1,,,,0,0,,,0,1,0,1.0,0,0,1,0,0,0,1,0,0,,1.0,0,,0,0,0,1,1,1.0,0,0,,,,1,,0,0,,0,,,,,,1.0,0,1.0,,1.0,,0.0,1,
1706860,,,,,,0,,,1,,,,0,0,,,0,1,0,,0,0,0,0,0,0,0,0,0,,1.0,0,,1,0,0,1,1,,0,0,,,1.0,0,,0,0,,0,,1.0,,1.0,,,0,,1.0,,,0.0,0,
3127596,,,,1.0,,0,,,1,,,,0,0,,,0,1,0,,0,0,0,0,0,0,0,0,0,,1.0,0,,0,0,0,1,1,,0,0,,,,0,,0,0,,0,,,,,,1.0,0,1.0,,,,,0,
1570356,,,,1.0,,0,,,1,,,,0,0,,,0,1,0,1.0,0,0,1,0,0,0,1,0,0,,1.0,0,,0,0,0,1,1,1.0,0,0,,,,1,,0,0,,0,,,1.0,,,,0,1.0,,1.0,,0.0,1,
206036,,,,1.0,,0,,,1,,,,0,0,1.0,,0,1,0,,0,0,0,0,0,0,0,0,0,,,0,,0,0,0,1,1,,0,0,,,,0,,0,0,1.0,0,,,1.0,,,,0,,1.0,,,,0,


### Check that the total number of values in each new variable equals the sum of the prior 2 variables
Now we run some additional verifications to ensure that we have the expected number of observations for our 'combined' variables. For each variable we print two numbers: the summed non-null counts of the two original columns and the non-null count of the combined column. They should match. 
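As a small illustration of this check, the sketch below builds a toy frame with an "early" and a "late" column for the same 990 field and compares the counts. The toy values are made up, and `combine_first` is used here only as one plausible way to merge the pair; it is an assumption, not necessarily how the combined columns were built earlier in this series.

```python
import pandas as pd

# Toy frame: two original column names for the same field (illustrative values).
toy = pd.DataFrame({
    "AddressChange": ["X", None, None, None],
    "AddressChangeInd": [None, "X", None, "X"],
})

# Assumed merge step: take the first column, fall back to the second.
toy["F9_00_HD_ADDR_CHANGE"] = toy["AddressChange"].combine_first(toy["AddressChangeInd"])

# Sum of non-null counts in the originals vs. non-null count in the combined column.
n_originals = int(toy["AddressChange"].notna().sum() + toy["AddressChangeInd"].notna().sum())
n_combined = int(toy["F9_00_HD_ADDR_CHANGE"].notna().sum())

# Rows where both originals are filled; the two counts can only match if this is zero.
n_overlap = int((toy["AddressChange"].notna() & toy["AddressChangeInd"].notna()).sum())

print(n_originals, n_combined, n_overlap)
```

If any filing had data in both original columns, `n_originals` would exceed `n_combined` by the overlap, so matching counts also imply that the pair is mutually exclusive.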

##### Skip for the new XML filings

In [71]:
"""
for index, row in new_variables_df[new_variables_df['len']==2][:].iterrows():
    #print(row['variable_name_new'])
    print(row['variable_name_new'], row['original_names'][0], row['original_names'][1])
    print(len(df[df[row['original_names'][0]].notnull()]) + len(df[df[row['original_names'][1]].notnull()]))    
    print(len(df[df[row['variable_name_new']].notnull()]), '\n')
    #print(len(df[(df[row['original_names'][0]].notnull()) & (df[row['original_names'][1]].notnull())]), '\n')     
"""

F9_00_HD_ADDR_CHANGE AddressChange AddressChangeInd
82976
82976 

F9_00_HD_AMENDED_RETURN AmendedReturnInd AmendedReturn
19445
19445 

F9_00_HD_CTRY_OF_DOMICILE LegalDomicileCountryCd CountryLegalDomicile
1155
1155 

F9_00_HD_EXEMPT_STATUS_4847A1 Organization4947a1 Organization4947a1NotPFInd
1420
1420 

F9_00_HD_EXEMPT_STATUS_501C Organization501cInd Organization501c
497380
497380 

F9_00_HD_EXEMPT_STATUS_501C3 Organization501c3Ind Organization501c3
1605635
1605635 

F9_00_HD_FINAL_RETURN FinalReturnInd TerminatedReturn
11573
11573 

F9_00_HD_GROSS_EXEMPT_NUM GroupExemptionNum GroupExemptionNumber
70836
70836 

F9_00_HD_GROSS_RCPT GrossReceiptsAmt GrossReceipts
2104435
2104435 

F9_00_HD_GROUP_RETURN GroupReturnForAffiliatesInd GroupReturnForAffiliates
2104435
2104435 

F9_00_HD_INCLUDES_SUBORD_ORGS AllAffiliatesIncluded AllAffiliatesIncludedInd
292934
292934 

F9_00_HD_INITIAL_RETURN InitialReturn InitialReturnInd
20875
20875 

F9_00_HD_PRIN_OFF_NAME PrincipalOfficerNm NameOfPrincipal

1692396
1692396 

F9_04_PC_POLITICAL_ACTIVITIES PoliticalCampaignActyInd PoliticalActivities
2104435
2104435 

F9_04_PC_PRIOR_EXCESS_BEN_TRAN PYExcessBenefitTransInd PriorExcessBenefitTransaction
1756505
1756505 

F9_04_PC_PROF_FR_EXP_GT_15K ProfessionalFundraising ProfessionalFundraisingInd
2104435
2104435 

F9_04_PC_RELATED_ENTITY RelatedEntityInd RelatedEntity
2104435
2104435 

F9_04_PC_TRANS_TO_CNTRLD_ENT TrnsfrExmptNonChrtblRltdOrgInd TransfersToExemptNonChrtblOrg
1572460
1572460 

F9_04_PC_TRANS_WITH_CNTRLD_ENT TransactionRelatedEntity TransactionWithControlEntInd
691900
691900 

F9_05_EXP_SCHED_O_X InfoInScheduleOPartVInd InfoInScheduleOPartV
39329
39329 

F9_05_PC_NUMBER_EMPLOYEES_W3 EmployeeCnt NumberOfEmployees
2104435
2104435 

F9_05_PC_NUMBER_FORMS_1096 IRPDocumentCnt NumberFormsTransmittedWith1096
2104435
2104435 

F9_05_PC_UNRELATED_BUS_INCOME UnrelatedBusIncmOverLimitInd UnrelatedBusinessIncome
2104435
2104435 

F9_06_EXP_SCHED_O_X InfoInScheduleOPartVIInd InfoInSchedule

513944
513944 

F9_09_EXP_SCHED_O_X InfoInScheduleOPartIXInd InfoInScheduleOPartIX
328361
328361 

F9_09_EXP_TRAVEL_ENTRTNMNT_TOT PymtTravelEntrtnmntPubOfclGrp TravelEntrtnmntPublicOfficials
506307
506307 

F9_09_EXP_TRAVEL_TOT TravelGrp Travel
1328365
1328365 

F9_09_PC_COMP_DISQUAL_FUNDRAISE CompDisqualPersonsGrp CompDisqualPersons
542588
542588 

F9_09_PC_COMP_DISQUAL_MGMT CompDisqualPersonsGrp CompDisqualPersons
542588
542588 

F9_09_PC_COMP_DISQUAL_PROG_SVCE CompDisqualPersonsGrp CompDisqualPersons
542588
542588 

F9_09_PC_COMP_DISQUAL_TOTAL CompDisqualPersonsGrp CompDisqualPersons
542588
542588 

F9_09_PC_COMP_OFFICERS_FUNDRAISE CompCurrentOfcrDirectorsGrp CompCurrentOfficersDirectors
1232247
1232247 

F9_09_PC_COMP_OFFICERS_MGMT CompCurrentOfcrDirectorsGrp CompCurrentOfficersDirectors
1232247
1232247 

F9_09_PC_COMP_OFFICERS_PROG_SVCE CompCurrentOfcrDirectorsGrp CompCurrentOfficersDirectors
1232247
1232247 

F9_09_PC_COMP_OFFICERS_TOTAL CompCurrentOfcrDirectorsGrp CompCurrentOff

668184
668184 

F9_12_PC_ACCTG_METHOD_OTHER MethodOfAccountingOther MethodOfAccountingOtherInd
43810
43810 

F9_12_PC_AUDIT_COMMITTEE AuditCommittee AuditCommitteeInd
1216828
1216828 

F9_12_PC_FED_GRNT_AUDIT_PERFORMD FederalGrantAuditPerformedInd FederalGrantAuditPerformed
237299
237299 

F9_12_PC_FED_GRNT_AUDIT_REQUIRED FederalGrantAuditRequiredInd FederalGrantAuditRequired
1904509
1904509 

F9_12_PC_FINCL_STMTS_AUDITED FSAudited FSAuditedInd
2104435
2104435 

F9_12_SCHED_O_X InfoInScheduleOPartXIIInd InfoInScheduleOPartXII
379406
379406 

number_of_other_prog_svces ActivityOther ProgSrvcAccomActyOtherGrp
293660
293660 



In [333]:
print(len(concordance))
print(len(new_variables_df))
print(len(set(new_variables_df['variable_name_new'].tolist())))
new_variables_df[:2]

574
288
288


Unnamed: 0,variable_name_new,original_names,data_type_xsd,binarize,len
0,F9_00_HD_ADDR_CHANGE,"[AddressChange, AddressChangeInd]",CheckboxType,binarize,2
1,F9_00_HD_AMENDED_RETURN,"[AmendedReturnInd, AmendedReturn]",CheckboxType,binarize,2


<br><br>
Based on the above, we can safely delete the ~574 original variables that map onto the 288 standardized variables in *variable_name_new* (numbers from an earlier version of the notebook).

### Inspect the Combined and Original Variables
Here I'm showing one variable: the new 'combined' column with the standardized name ``F9_12_PC_FED_GRNT_AUDIT_REQUIRED``, alongside the two columns from which this variable was combined. Recall that the reason for this is that in early years the IRS used one name (*FederalGrantAuditRequired*) for storing this data in the XML files, while for later years the IRS used a different name (*FederalGrantAuditRequiredInd*). Our checks above verified that no filing contains data on both variables; it's one or the other, so we are safe combining. 

In [106]:
#df[df['F9_12_PC_FED_GRNT_AUDIT_REQUIRED'].notnull()].sample(5)[['F9_12_PC_FED_GRNT_AUDIT_REQUIRED', 'FederalGrantAuditRequiredInd', 'FederalGrantAuditRequired']]

In [334]:
df[df['F9_12_PC_FED_GRNT_AUDIT_REQUIRED'].notnull()].sample(5)[['F9_12_PC_FED_GRNT_AUDIT_REQUIRED', 'FederalGrantAuditRequiredInd']]

Unnamed: 0,F9_12_PC_FED_GRNT_AUDIT_REQUIRED,FederalGrantAuditRequiredInd
2553147,0,false
385507,0,
139093,0,
2196320,0,false
2023345,1,1


# Fill Null Values for binary variables

In [335]:
print(binarize_with_dict_cols)

['F9_00_HD_INCLUDES_SUBORD_ORGS', 'F9_04_PC_ACTVITIES_VIA_PARTNER', 'F9_04_PC_CONTROLLED_ENTITY', 'F9_04_PC_DISREGARDED_ENTITY', 'F9_04_PC_EXCESS_BENEFIT_TRANS', 'F9_04_PC_FR_EVENT_INC_GT_15K', 'F9_04_PC_GAMING_INC_GT_15K', 'F9_04_PC_LOBBYING_ACTIVITIES', 'F9_04_PC_POLITICAL_ACTIVITIES', 'F9_04_PC_PRIOR_EXCESS_BEN_TRAN', 'F9_04_PC_PROF_FR_EXP_GT_15K', 'F9_04_PC_RELATED_ENTITY', 'F9_04_PC_TRANS_TO_CNTRLD_ENT', 'F9_04_PC_TRANS_WITH_CNTRLD_ENT']


In [336]:
df[binarize_with_dict_cols] = df[binarize_with_dict_cols].fillna(0).astype('int64')

In [337]:
print(binarize_cols)

['F9_00_HD_ADDR_CHANGE', 'F9_00_HD_AMENDED_RETURN', 'F9_00_HD_EXEMPT_STATUS_4847A1', 'F9_00_HD_EXEMPT_STATUS_501C3', 'F9_00_HD_FINAL_RETURN', 'F9_00_HD_GROUP_RETURN', 'F9_00_HD_INITIAL_RETURN', 'F9_00_HD_TYPE_ORG_ASSOCIATION', 'F9_00_HD_TYPE_ORG_CORP', 'F9_00_HD_TYPE_ORG_OTHER', 'F9_00_HD_TYPE_ORG_TRUST', 'F9_01_PC_TERMINATION_CONTRACTION', 'F9_03_PC_PGMSVC_SIGNIF_CHG', 'F9_03_PC_PGMSVC_SIGNIF_NEW', 'F9_03_PZ_SCHEDULE_O_PART3', 'F9_05_EXP_SCHED_O_X', 'F9_05_PC_UNRELATED_BUS_INCOME', 'F9_06_EXP_SCHED_O_X', 'F9_06_PC_990_PROVIDED_GOV_BODY', 'F9_06_PC_ANNUAL_DISC_COVRD_PERS', 'F9_06_PC_CEO_COMPENSTN_PROCESS', 'F9_06_PC_CHANGES_ORGANIZING_DOCS', 'F9_06_PC_CONFLICT_OF_INTEREST', 'F9_06_PC_DECISIONS_SUBJ_APPROVAL', 'F9_06_PC_DELEGATION_MGT_DUTIES', 'F9_06_PC_DELEGATION_OF_MGT', 'F9_06_PC_DOCUMENT_RET_POLICY', 'F9_06_PC_ELECTION_BOARD_MEMBERS', 'F9_06_PC_FAMILY_OR_BUSINESS_REL', 'F9_06_PC_FORM_AVAIL_OWN_WEBSITE', 'F9_06_PC_FORM_UPON_REQUEST', 'F9_06_PC_JOINT_VENTURE_INVESTMNT', 'F9_06_PC_JOIN

In [340]:
df[binarize_cols] = df[binarize_cols].fillna(0).astype('int64')
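Filling nulls with 0 treats a missing value as "not checked" (or "no"), and that assumption changes the summary statistics: before the fill, missing rows are skipped when computing a mean; afterwards they count as zeros. A minimal sketch on a made-up column:

```python
import pandas as pd

# Toy nullable flag with two missing values (illustrative data only).
toy = pd.DataFrame({"flag": pd.array([1, pd.NA, 0, pd.NA], dtype="Int64")})

# Missing rows are skipped here: (1 + 0) / 2
mean_before = float(toy["flag"].mean())

# Same fill-and-cast pattern as in the cells above.
toy["flag"] = toy["flag"].fillna(0).astype("int64")

# Missing rows now count as 0: 1 / 4
mean_after = float(toy["flag"].mean())

print(mean_before, mean_after, toy["flag"].dtype)
```

The same effect is visible when comparing the `describe()` output before and after the fill: the `count` rises to the full number of rows and the `mean` falls for any variable that had nulls.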

In [344]:
df[binarize_with_dict_cols].dtypes

F9_00_HD_INCLUDES_SUBORD_ORGS     int64
F9_04_PC_ACTVITIES_VIA_PARTNER    int64
F9_04_PC_CONTROLLED_ENTITY        int64
F9_04_PC_DISREGARDED_ENTITY       int64
F9_04_PC_EXCESS_BENEFIT_TRANS     int64
F9_04_PC_FR_EVENT_INC_GT_15K      int64
F9_04_PC_GAMING_INC_GT_15K        int64
F9_04_PC_LOBBYING_ACTIVITIES      int64
F9_04_PC_POLITICAL_ACTIVITIES     int64
F9_04_PC_PRIOR_EXCESS_BEN_TRAN    int64
F9_04_PC_PROF_FR_EXP_GT_15K       int64
F9_04_PC_RELATED_ENTITY           int64
F9_04_PC_TRANS_TO_CNTRLD_ENT      int64
F9_04_PC_TRANS_WITH_CNTRLD_ENT    int64
dtype: object

In [345]:
df[binarize_with_dict_cols].isna().sum()

F9_00_HD_INCLUDES_SUBORD_ORGS     0
F9_04_PC_ACTVITIES_VIA_PARTNER    0
F9_04_PC_CONTROLLED_ENTITY        0
F9_04_PC_DISREGARDED_ENTITY       0
F9_04_PC_EXCESS_BENEFIT_TRANS     0
F9_04_PC_FR_EVENT_INC_GT_15K      0
F9_04_PC_GAMING_INC_GT_15K        0
F9_04_PC_LOBBYING_ACTIVITIES      0
F9_04_PC_POLITICAL_ACTIVITIES     0
F9_04_PC_PRIOR_EXCESS_BEN_TRAN    0
F9_04_PC_PROF_FR_EXP_GT_15K       0
F9_04_PC_RELATED_ENTITY           0
F9_04_PC_TRANS_TO_CNTRLD_ENT      0
F9_04_PC_TRANS_WITH_CNTRLD_ENT    0
dtype: int64

In [346]:
df[binarize_with_dict_cols].apply(pd.Series.value_counts, dropna=False)

Unnamed: 0,F9_00_HD_INCLUDES_SUBORD_ORGS,F9_04_PC_ACTVITIES_VIA_PARTNER,F9_04_PC_CONTROLLED_ENTITY,F9_04_PC_DISREGARDED_ENTITY,F9_04_PC_EXCESS_BENEFIT_TRANS,F9_04_PC_FR_EVENT_INC_GT_15K,F9_04_PC_GAMING_INC_GT_15K,F9_04_PC_LOBBYING_ACTIVITIES,F9_04_PC_POLITICAL_ACTIVITIES,F9_04_PC_PRIOR_EXCESS_BEN_TRAN,F9_04_PC_PROF_FR_EXP_GT_15K,F9_04_PC_RELATED_ENTITY,F9_04_PC_TRANS_TO_CNTRLD_ENT,F9_04_PC_TRANS_WITH_CNTRLD_ENT
0,3450218,3466996,3275577,3374936,3468250,2722872,3363193,3326176,3434370,3468406,3375717,2624580,3437295,3358227
1,18790,2012,193431,94072,758,746136,105815,142832,34638,602,93291,844428,31713,110781


In [342]:
df[binarize_with_dict_cols].describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
F9_00_HD_INCLUDES_SUBORD_ORGS,3469008.0,0.005417,0.073398,0.0,0.0,0.0,0.0,1.0
F9_04_PC_ACTVITIES_VIA_PARTNER,3469008.0,0.00058,0.024076,0.0,0.0,0.0,0.0,1.0
F9_04_PC_CONTROLLED_ENTITY,3469008.0,0.05576,0.229457,0.0,0.0,0.0,0.0,1.0
F9_04_PC_DISREGARDED_ENTITY,3469008.0,0.027118,0.162427,0.0,0.0,0.0,0.0,1.0
F9_04_PC_EXCESS_BENEFIT_TRANS,3469008.0,0.000219,0.01478,0.0,0.0,0.0,0.0,1.0
F9_04_PC_FR_EVENT_INC_GT_15K,3469008.0,0.215086,0.410882,0.0,0.0,0.0,0.0,1.0
F9_04_PC_GAMING_INC_GT_15K,3469008.0,0.030503,0.171967,0.0,0.0,0.0,0.0,1.0
F9_04_PC_LOBBYING_ACTIVITIES,3469008.0,0.041174,0.198692,0.0,0.0,0.0,0.0,1.0
F9_04_PC_POLITICAL_ACTIVITIES,3469008.0,0.009985,0.099425,0.0,0.0,0.0,0.0,1.0
F9_04_PC_PRIOR_EXCESS_BEN_TRAN,3469008.0,0.000174,0.013172,0.0,0.0,0.0,0.0,1.0


In [341]:
df[binarize_cols].describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
F9_00_HD_ADDR_CHANGE,3469008.0,0.039835,0.195571,0.0,0.0,0.0,0.0,1.0
F9_00_HD_AMENDED_RETURN,3469008.0,0.011920,0.108527,0.0,0.0,0.0,0.0,1.0
F9_00_HD_EXEMPT_STATUS_4847A1,3469008.0,0.000709,0.026626,0.0,0.0,0.0,0.0,1.0
F9_00_HD_EXEMPT_STATUS_501C3,3469008.0,0.752216,0.431726,0.0,1.0,1.0,1.0,1.0
F9_00_HD_FINAL_RETURN,3469008.0,0.005586,0.074533,0.0,0.0,0.0,0.0,1.0
...,...,...,...,...,...,...,...,...
F9_12_PC_AUDIT_COMMITTEE,3469008.0,0.437851,0.496123,0.0,0.0,0.0,1.0,1.0
F9_12_PC_FED_GRNT_AUDIT_PERFORMD,3469008.0,0.085901,0.280217,0.0,0.0,0.0,0.0,1.0
F9_12_PC_FED_GRNT_AUDIT_REQUIRED,3469008.0,0.086664,0.281342,0.0,0.0,0.0,0.0,1.0
F9_12_PC_FINCL_STMTS_AUDITED,3469008.0,0.434902,0.495744,0.0,0.0,0.0,1.0,1.0


In [348]:
df[binarize_cols].dtypes

F9_00_HD_ADDR_CHANGE                int64
F9_00_HD_AMENDED_RETURN             int64
F9_00_HD_EXEMPT_STATUS_4847A1       int64
F9_00_HD_EXEMPT_STATUS_501C3        int64
F9_00_HD_FINAL_RETURN               int64
                                    ...  
F9_12_PC_AUDIT_COMMITTEE            int64
F9_12_PC_FED_GRNT_AUDIT_PERFORMD    int64
F9_12_PC_FED_GRNT_AUDIT_REQUIRED    int64
F9_12_PC_FINCL_STMTS_AUDITED        int64
F9_12_SCHED_O_X                     int64
Length: 64, dtype: object

In [347]:
df[binarize_cols].isna().sum()

F9_00_HD_ADDR_CHANGE                0
F9_00_HD_AMENDED_RETURN             0
F9_00_HD_EXEMPT_STATUS_4847A1       0
F9_00_HD_EXEMPT_STATUS_501C3        0
F9_00_HD_FINAL_RETURN               0
                                   ..
F9_12_PC_AUDIT_COMMITTEE            0
F9_12_PC_FED_GRNT_AUDIT_PERFORMD    0
F9_12_PC_FED_GRNT_AUDIT_REQUIRED    0
F9_12_PC_FINCL_STMTS_AUDITED        0
F9_12_SCHED_O_X                     0
Length: 64, dtype: int64

### Drop variables
Above we have combined columns that relate to the same 990 variable. Generally, there are two different names -- and thus two different columns -- for each 990 variable. Given that above we have combined these two columns into a new variable -- with a new name -- we can delete all the old columns. 
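As a rough sketch of the idea on a toy frame (the column values and the name `NotInFrame` are hypothetical), `DataFrame.drop` with `errors="ignore"` removes the old columns and silently skips any listed name that is not present:

```python
import pandas as pd

# Toy frame: an identifier, two original columns, and the combined column.
toy = pd.DataFrame({
    "EIN": ["000000001", "000000002"],
    "AddressChange": ["X", None],
    "AddressChangeInd": [None, "X"],
    "F9_00_HD_ADDR_CHANGE": [1, 1],
})

# Old names to remove; "NotInFrame" is a made-up name that is absent from the frame.
old_names = ["AddressChange", "AddressChangeInd", "NotInFrame"]

# errors="ignore" avoids a KeyError for names that do not exist as columns.
slim = toy.drop(columns=old_names, errors="ignore")

print(slim.columns.tolist())
```

This is an alternative to filtering with a list comprehension over `df.columns`; both approaches keep only the columns not named in the list.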

In [349]:
new_variables_df['original_names'].tolist()

[['AddressChange', 'AddressChangeInd'],
 ['AmendedReturnInd', 'AmendedReturn'],
 ['BuildTS'],
 ['CountryLegalDomicile', 'LegalDomicileCountryCd'],
 ['Organization4947a1NotPFInd', 'Organization4947a1'],
 ['Organization501c', 'Organization501cInd'],
 ['Organization501c3', 'Organization501c3Ind'],
 ['Filer'],
 ['Filer'],
 ['Filer'],
 ['Filer'],
 ['Filer'],
 ['Filer'],
 ['FinalReturnInd', 'TerminatedReturn'],
 ['GroupExemptionNumber', 'GroupExemptionNum'],
 ['GrossReceipts', 'GrossReceiptsAmt'],
 ['GroupReturnForAffiliates', 'GroupReturnForAffiliatesInd'],
 ['AllAffiliatesIncluded', 'AllAffiliatesIncludedInd'],
 ['InitialReturnInd', 'InitialReturn'],
 ['NameOfPrincipalOfficerPerson', 'PrincipalOfficerNm'],
 ['Officer', 'BusinessOfficerGrp'],
 ['SpecialConditionDesc', 'SpecialConditionDescription'],
 ['StateLegalDomicile', 'LegalDomicileStateCd'],
 ['TaxPeriodBeginDate', 'TaxPeriodBeginDt'],
 ['TaxPeriodEndDt', 'TaxPeriodEndDate'],
 ['TaxYr', 'TaxYear'],
 ['ReturnTs', 'Timestamp'],
 ['TypeO

In [350]:
new_variables_df[new_variables_df['len']!=2]['original_names'].tolist()

[['BuildTS'],
 ['Filer'],
 ['Filer'],
 ['Filer'],
 ['Filer'],
 ['Filer'],
 ['Filer'],
 ['FeesForServicesProfFundraising'],
 ['TaxPeriod']]

In [351]:
new_variables_df[new_variables_df['len']==2]['original_names'].tolist()

[['AddressChange', 'AddressChangeInd'],
 ['AmendedReturnInd', 'AmendedReturn'],
 ['CountryLegalDomicile', 'LegalDomicileCountryCd'],
 ['Organization4947a1NotPFInd', 'Organization4947a1'],
 ['Organization501c', 'Organization501cInd'],
 ['Organization501c3', 'Organization501c3Ind'],
 ['FinalReturnInd', 'TerminatedReturn'],
 ['GroupExemptionNumber', 'GroupExemptionNum'],
 ['GrossReceipts', 'GrossReceiptsAmt'],
 ['GroupReturnForAffiliates', 'GroupReturnForAffiliatesInd'],
 ['AllAffiliatesIncluded', 'AllAffiliatesIncludedInd'],
 ['InitialReturnInd', 'InitialReturn'],
 ['NameOfPrincipalOfficerPerson', 'PrincipalOfficerNm'],
 ['Officer', 'BusinessOfficerGrp'],
 ['SpecialConditionDesc', 'SpecialConditionDescription'],
 ['StateLegalDomicile', 'LegalDomicileStateCd'],
 ['TaxPeriodBeginDate', 'TaxPeriodBeginDt'],
 ['TaxPeriodEndDt', 'TaxPeriodEndDate'],
 ['TaxYr', 'TaxYear'],
 ['ReturnTs', 'Timestamp'],
 ['TypeOfOrganizationAssocInd', 'TypeOfOrganizationAssociation'],
 ['TypeOfOrganizationCorpInd

In [352]:
flat_list = [item for sublist in new_variables_df[new_variables_df['len']==2]['original_names'].tolist() for item in sublist]
print(len(flat_list))
print(flat_list[:])

558
['AddressChange', 'AddressChangeInd', 'AmendedReturnInd', 'AmendedReturn', 'CountryLegalDomicile', 'LegalDomicileCountryCd', 'Organization4947a1NotPFInd', 'Organization4947a1', 'Organization501c', 'Organization501cInd', 'Organization501c3', 'Organization501c3Ind', 'FinalReturnInd', 'TerminatedReturn', 'GroupExemptionNumber', 'GroupExemptionNum', 'GrossReceipts', 'GrossReceiptsAmt', 'GroupReturnForAffiliates', 'GroupReturnForAffiliatesInd', 'AllAffiliatesIncluded', 'AllAffiliatesIncludedInd', 'InitialReturnInd', 'InitialReturn', 'NameOfPrincipalOfficerPerson', 'PrincipalOfficerNm', 'Officer', 'BusinessOfficerGrp', 'SpecialConditionDesc', 'SpecialConditionDescription', 'StateLegalDomicile', 'LegalDomicileStateCd', 'TaxPeriodBeginDate', 'TaxPeriodBeginDt', 'TaxPeriodEndDt', 'TaxPeriodEndDate', 'TaxYr', 'TaxYear', 'ReturnTs', 'Timestamp', 'TypeOfOrganizationAssocInd', 'TypeOfOrganizationAssociation', 'TypeOfOrganizationCorpInd', 'TypeOfOrganizationCorporation', 'TypeOfOrganizationOther

<br> Flatten a list of lists: https://stackoverflow.com/questions/952914/how-to-make-a-flat-list-out-of-list-of-lists
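For reference, an equivalent way to flatten uses `itertools.chain.from_iterable` from the standard library. The sketch below uses two of the name pairs shown above as sample input:

```python
from itertools import chain

# Two [old_name, old_name] pairs, as they appear in the 'original_names' column.
pairs = [
    ["AddressChange", "AddressChangeInd"],
    ["AmendedReturnInd", "AmendedReturn"],
]

# chain.from_iterable walks each inner list in order and yields its items.
flat = list(chain.from_iterable(pairs))

print(len(flat))
print(flat)
```

The nested list comprehension used in the notebook produces the same result; which form to use is a matter of taste.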

In [353]:
print(len([c for c in df.columns.tolist() if c not in flat_list]))
print([c for c in df.columns.tolist() if c not in flat_list])

300
['_id', 'OrganizationName', 'URL', 'DLN', 'TaxPeriod', 'F9_09_PC_FEES_FOR_SVCE_FR_TOT', 'F9_00_HD_BUILD_TIME_STAMP', 'fiscal_year', 'EIN', 'Name', 'NameControl', 'Phone', 'USAddress', 'ForeignAddress', 'InCareOfName', 'BusinessName', 'BusinessNameControlTxt', 'PhoneNum', 'InCareOfNm', 'ForeignPhoneNum', 'F9_00_HD_ADDR_CHANGE', 'F9_00_HD_AMENDED_RETURN', 'F9_00_HD_CTRY_OF_DOMICILE', 'F9_00_HD_EXEMPT_STATUS_4847A1', 'F9_00_HD_EXEMPT_STATUS_501C', 'F9_00_HD_EXEMPT_STATUS_501C3', 'F9_00_HD_FINAL_RETURN', 'F9_00_HD_GROSS_EXEMPT_NUM', 'F9_00_HD_GROSS_RCPT', 'F9_00_HD_GROUP_RETURN', 'F9_00_HD_INCLUDES_SUBORD_ORGS', 'F9_00_HD_INITIAL_RETURN', 'F9_00_HD_PRIN_OFF_NAME', 'F9_00_HD_SIGNING_OFFICER_SIGNTR', 'F9_00_HD_SPECIAL_CONDITION_DESC', 'F9_00_HD_STATE_OF_DOMICILE', 'F9_00_HD_TAX_PER_BEGIN', 'F9_00_HD_TAX_PER_END', 'F9_00_HD_TAX_YEAR', 'F9_00_HD_TIME_STAMP', 'F9_00_HD_TYPE_ORG_ASSOCIATION', 'F9_00_HD_TYPE_ORG_CORP', 'F9_00_HD_TYPE_ORG_OTHER', 'F9_00_HD_TYPE_ORG_OTHER_DESC', 'F9_00_HD_TYPE_

In [354]:
print(len(new_variables_df['variable_name_new'].tolist()))

288


In [355]:
set([c for c in df.columns.tolist() if c not in flat_list]) - set(new_variables_df['variable_name_new'].tolist())

{'501c3',
 'BusinessName',
 'BusinessNameControlTxt',
 'DLN',
 'EIN',
 'ForeignAddress',
 'ForeignPhoneNum',
 'InCareOfName',
 'InCareOfNm',
 'Name',
 'NameControl',
 'OrganizationName',
 'Phone',
 'PhoneNum',
 'URL',
 'USAddress',
 '_id',
 'fiscal_year'}

<br>The following block drops a lot of columns. We are then left with 300 columns: the combined, renamed, and (as appropriate) binarized variables, plus identifier columns such as *EIN* and *URL*.

In [356]:
gc.collect()

18965

In [357]:
%%time
print(len(df.columns))
df = df[[c for c in df.columns.tolist() if c not in flat_list]]
print(len(df.columns))
df[:2]

776
300
CPU times: total: 12.4 s
Wall time: 13 s


Unnamed: 0,_id,OrganizationName,URL,DLN,TaxPeriod,F9_09_PC_FEES_FOR_SVCE_FR_TOT,F9_00_HD_BUILD_TIME_STAMP,fiscal_year,EIN,Name,NameControl,Phone,USAddress,ForeignAddress,InCareOfName,BusinessName,BusinessNameControlTxt,PhoneNum,InCareOfNm,ForeignPhoneNum,F9_00_HD_ADDR_CHANGE,F9_00_HD_AMENDED_RETURN,F9_00_HD_CTRY_OF_DOMICILE,F9_00_HD_EXEMPT_STATUS_4847A1,F9_00_HD_EXEMPT_STATUS_501C,F9_00_HD_EXEMPT_STATUS_501C3,F9_00_HD_FINAL_RETURN,F9_00_HD_GROSS_EXEMPT_NUM,F9_00_HD_GROSS_RCPT,F9_00_HD_GROUP_RETURN,F9_00_HD_INCLUDES_SUBORD_ORGS,F9_00_HD_INITIAL_RETURN,F9_00_HD_PRIN_OFF_NAME,F9_00_HD_SIGNING_OFFICER_SIGNTR,F9_00_HD_SPECIAL_CONDITION_DESC,F9_00_HD_STATE_OF_DOMICILE,F9_00_HD_TAX_PER_BEGIN,F9_00_HD_TAX_PER_END,F9_00_HD_TAX_YEAR,F9_00_HD_TIME_STAMP,F9_00_HD_TYPE_ORG_ASSOCIATION,F9_00_HD_TYPE_ORG_CORP,F9_00_HD_TYPE_ORG_OTHER,F9_00_HD_TYPE_ORG_OTHER_DESC,F9_00_HD_TYPE_ORG_TRUST,F9_00_HD_WEBSITE,F9_00_HD_YEAR_FORMED,F9_01_PC_BEN_PAID_MEMB_PRIOR,F9_01_PC_CONTR_GRANTS_CURR,F9_01_PC_CONTR_GRANTS_PRIOR,F9_01_PC_GRANTS_PRIOR,F9_01_PC_INDEP_VOTING_MEMB,F9_01_PC_INVEST_INCOME_PRIOR,F9_01_PC_NET_ASSETS_BOY,F9_01_PC_OTHER_EXPENSE_PRIOR,F9_01_PC_OTHER_REV_PRIOR,F9_01_PC_PROF_FUNDRISING_EXP_CURR,F9_01_PC_PROF_FUNDRISING_EXP_PRIOR,F9_01_PC_PROG_SERVICE_REV_PRIOR,F9_01_PC_REV_LESS_EXP_CURR,F9_01_PC_REV_LESS_EXP_PRIOR,F9_01_PC_TERMINATION_CONTRACTION,F9_01_PC_TOT_ASSETS_EOY,F9_01_PC_TOT_EXP_PRIOR,F9_01_PC_TOT_FNDR_EXP_CURR,F9_01_PC_TOT_INDIV_EMPLOYED,F9_01_PC_TOT_INDIV_VOLUNTEERS,F9_01_PC_TOT_LIABILITIES_EOY,F9_01_PC_TOT_REVENUE_PRIOR,F9_01_PC_TOT_UBI_GROSS,F9_01_PC_TOT_UBI_NET,F9_01_PC_VOTING_MEMB_GOV_BODY,F9_01_PZ_BEN_PAID_TO_MEMB_CURR,F9_01_PZ_GRANTS_PAID_CURR,F9_01_PZ_INVEST_INCOME_CURR,F9_01_PZ_NAFB_EOY,F9_01_PZ_ORGANIZATIONAL_MISSION,F9_01_PZ_OTHER_EXPENSE_CURR,F9_01_PZ_OTHER_REV_CURR,F9_01_PZ_PROG_SERVICE_REV_CURR,F9_01_PZ_SALARIES_CURR,F9_01_PZ_SALARIES_PRIOR,F9_01_PZ_TOT_ASSETS_BOY,F9_01_PZ_TOT_EXP_CURR,F9_01_PZ_TOT_LIAB_BOY,F9_01_PZ_TOT_REV_CURR,F9_03_PC_PGMSVC_SIGNIF_CHG,F9_03_
PC_PGMSVC_SIGNIF_NEW,F9_03_PC_PROG_SVC_ACC_1_CODE,F9_03_PC_PROG_SVC_ACC_1_DESC,F9_03_PC_PROG_SVC_ACC_1_EXP,F9_03_PC_PROG_SVC_ACC_1_GRNT,F9_03_PC_PROG_SVC_ACC_1_REV,F9_03_PC_PROG_SVC_ACC_2_CODE,F9_03_PC_PROG_SVC_ACC_2_DESC,F9_03_PC_PROG_SVC_ACC_2_EXP,F9_03_PC_PROG_SVC_ACC_2_GRNT,F9_03_PC_PROG_SVC_ACC_2_REV,F9_03_PC_PROG_SVC_ACC_3_CODE,F9_03_PC_PROG_SVC_ACC_3_DESC,F9_03_PC_PROG_SVC_ACC_3_EXP,F9_03_PC_PROG_SVC_ACC_3_GRNT,F9_03_PC_PROG_SVC_ACC_3_REV,F9_03_PC_TOT_OTH_PROG_SVC_EXP,F9_03_PC_TOT_OTH_PROG_SVC_GRNT,F9_03_PC_TOT_OTH_PROG_SVC_REV,F9_03_PC_TOT_PROG_SVC_EXPENSE,F9_03_PZ_MISSION_DESCRIPTION,F9_03_PZ_SCHEDULE_O_PART3,F9_04_PC_ACTVITIES_VIA_PARTNER,F9_04_PC_CONTROLLED_ENTITY,F9_04_PC_DISREGARDED_ENTITY,F9_04_PC_EXCESS_BENEFIT_TRANS,F9_04_PC_FR_EVENT_INC_GT_15K,F9_04_PC_GAMING_INC_GT_15K,F9_04_PC_LOBBYING_ACTIVITIES,F9_04_PC_POLITICAL_ACTIVITIES,F9_04_PC_PRIOR_EXCESS_BEN_TRAN,F9_04_PC_PROF_FR_EXP_GT_15K,F9_04_PC_RELATED_ENTITY,F9_04_PC_TRANS_TO_CNTRLD_ENT,F9_04_PC_TRANS_WITH_CNTRLD_ENT,F9_05_EXP_SCHED_O_X,F9_05_PC_NUMBER_EMPLOYEES_W3,F9_05_PC_NUMBER_FORMS_1096,F9_05_PC_UNRELATED_BUS_INCOME,F9_06_EXP_SCHED_O_X,F9_06_PC_990_PROVIDED_GOV_BODY,F9_06_PC_ANNUAL_DISC_COVRD_PERS,F9_06_PC_CEO_COMPENSTN_PROCESS,F9_06_PC_CHANGES_ORGANIZING_DOCS,F9_06_PC_CONFLICT_OF_INTEREST,F9_06_PC_DECISIONS_SUBJ_APPROVAL,F9_06_PC_DELEGATION_MGT_DUTIES,F9_06_PC_DELEGATION_OF_MGT,F9_06_PC_DOCUMENT_RET_POLICY,F9_06_PC_ELECTION_BOARD_MEMBERS,F9_06_PC_FAMILY_OR_BUSINESS_REL,F9_06_PC_FORM_AVAIL_OWN_WEBSITE,F9_06_PC_FORM_UPON_REQUEST,F9_06_PC_JOINT_VENTURE_INVESTMNT,F9_06_PC_JOINT_VENTURE_POLICY,F9_06_PC_LOCAL_CHAPTERS,F9_06_PC_MATERIAL_DIVERSION,F9_06_PC_MEMBERS_OR_STOCKHOLDERS,F9_06_PC_MINUTES_COMMITTEES,F9_06_PC_MINUTES_GOVERNING_BODY,F9_06_PC_MONITORING_OF_COI_POLICY,F9_06_PC_NUM_IND_VOTING_MEMBERS,F9_06_PC_NUM_VOTING_GOV_MEMBERS,F9_06_PC_OFFICER_MAILING_ADDRESS,F9_06_PC_OTHER_COMPENSTN_PROCESS,F9_06_PC_OTHER_WEBSITE,F9_06_PC_OWN_WEBSITE,F9_06_PC_POLICIES_GOVERN_CHAPTER,F9_06_PC_STATES_WHERE_RET
_FILED,F9_06_PC_WHISTLEBLOWER_POLICY,F9_07_EXP_SCHED_O_X,F9_07_PC_COMPENSATION_OTHER_SRCE,F9_07_PC_FORMER_OFFICER_LISTED,F9_07_PC_NO_LISTED_PERS_COMPENSD,F9_07_PC_NUM_CONTRCTRS_GRTR_100K,F9_07_PC_NUM_INDS_GREATER_100K,F9_07_PC_TOTAL_COMP_GRTR_150K,F9_07_PC_TOT_OTHER_COMPENSATION,F9_07_PC_TOT_REPRT_COMP_FROM_ORG,F9_07_PC_TOT_REPRT_COMP_RLTD_ORG,F9_08_EXP_SCHED_O_X,F9_08_PC_ALL_OTHER_CONTRIBUTIONS,F9_08_PC_CONTS_REPRTD_FNDRAISNG,F9_08_PC_COST_OF_GOODS_SOLD,F9_08_PC_FEDERATED_CAMPAIGNS,F9_08_PC_FUNDRAISING_DIRECT_EXP,F9_08_PC_FUNDRAISING_EVENTS,F9_08_PC_FUNDRAISING_GROSS_INC,F9_08_PC_GAMING_DIRECT_EXPENSES,F9_08_PC_GAMING_GROSS_INCOME,F9_08_PC_GOVERNMENT_GRANTS,F9_08_PC_GROSS_SALES_INVENTORY,F9_08_PC_MEMBERSHIP_DUES,F9_08_PC_NONCASH_CONTRIBUTIONS,F9_08_PC_PROGRAM_SVCE_REV_TOTAL,F9_08_PC_RELATED_ORGANIZATIONS,F9_08_PC_TOTAL_CONTRIBUTIONS,F9_08_PC_TOTAL_OTHER_REVENUE,F9_08_PC_TOTAL_PROG_SVCE_REVENUE,F9_08_PC_TOTAL_REVENUE,F9_09_EXP_AD_PROMO_TOT,F9_09_EXP_BENF_PAID_MEMB_TOT,F9_09_EXP_CONF_MEETING_TOT,F9_09_EXP_DEPREC_FUNDR,F9_09_EXP_DEPREC_MAG,F9_09_EXP_DEPREC_PROG,F9_09_EXP_DEPREC_TOT,F9_09_EXP_GRANT_FRGN_TOT,F9_09_EXP_GRANT_INDIV_DMSTC_TOT,F9_09_EXP_GRANT_ORG_DMSTC_TOT,F9_09_EXP_INFO_TECH_TOT,F9_09_EXP_INSURANCE_TOT,F9_09_EXP_INTEREST_TOT,F9_09_EXP_JOINT_COSTS_TOT,F9_09_EXP_OCCUPANCY_TOT,F9_09_EXP_OFFICE_TOT,F9_09_EXP_OTH_OTH_TOT,F9_09_EXP_OTH_TOT,F9_09_EXP_ROY_TOT,F9_09_EXP_SCHED_O_X,F9_09_EXP_TRAVEL_ENTRTNMNT_TOT,F9_09_EXP_TRAVEL_TOT,F9_09_PC_COMP_DISQUAL_FUNDRAISE,F9_09_PC_COMP_DISQUAL_MGMT,F9_09_PC_COMP_DISQUAL_PROG_SVCE,F9_09_PC_COMP_DISQUAL_TOTAL,F9_09_PC_COMP_OFFICERS_FUNDRAISE,F9_09_PC_COMP_OFFICERS_MGMT,F9_09_PC_COMP_OFFICERS_PROG_SVCE,F9_09_PC_COMP_OFFICERS_TOTAL,F9_09_PC_FEES_FOR_SVCE_ACCT_TOT,F9_09_PC_FEES_FOR_SVCE_INVST_TOT,F9_09_PC_FEES_FOR_SVCE_LEGL_TOT,F9_09_PC_FEES_FOR_SVCE_LOBB_TOT,F9_09_PC_FEES_FOR_SVCE_MGMT_TOT,F9_09_PC_FEES_FOR_SVCE_OTH_TOT,F9_09_PC_OTHER_EMP_BEN_FUNDRAISE,F9_09_PC_OTHER_EMP_BEN_MGMT,F9_09_PC_OTHER_EMP_BEN_PROG_SVCE,F9_09_PC_OTHER_E
MP_BEN_TOTAL,F9_09_PC_OTHER_SALARY_FUNDRAISE,F9_09_PC_OTHER_SALARY_MGMT,F9_09_PC_OTHER_SALARY_PROG_SVCE,F9_09_PC_OTHER_SALARY_TOTAL,F9_09_PC_PAYMENT_TO_AFFILIATES,F9_09_PC_PAYROLL_TAX_FUNDRAISE,F9_09_PC_PAYROLL_TAX_MGMT,F9_09_PC_PAYROLL_TAX_PROG_SVCE,F9_09_PC_PAYROLL_TAX_TOTAL,F9_09_PC_PENSION_CONT_FUNDRAISE,F9_09_PC_PENSION_CONT_MGMT,F9_09_PC_PENSION_CONT_PROG_SVCE,F9_09_PC_PENSION_CONT_TOTAL,F9_09_PC_TOTAL_FUNC_EXPENSES,F9_09_PC_TOTAL_FUNDRAISE_EXPENSE,F9_09_PC_TOTAL_MGMT_EXPENSE,F9_09_PC_TOTAL_PROG_SVCE_EXPENSE,F9_10_ASSETS_ACC_NET_EOY,F9_10_ASSETS_EXP_PREPAID_EOY,F9_10_ASSETS_INTANGIB_EOY,F9_10_ASSETS_INVENT_SALE_EOY,F9_10_ASSETS_LESS_DEPREC_EOY,F9_10_ASSETS_LOANS_DISQUAL_EOY,F9_10_ASSETS_NOTES_LOANS_NET_EOY,F9_10_ASSETS_OTH_EOY,F9_10_ASSETS_PLEDGES_NET_EOY,F9_10_LIAB_ACC_PAYABLE_EOY,F9_10_LIAB_GRANTS_PAYABLE_EOY,F9_10_LIAB_LOANS_OFF_EOY,F9_10_LIAB_REV_DEFERRED_EOY,F9_10_NAFB_RESTRICT_PERM_EOY,F9_10_NAFB_RESTRICT_TEMP_EOY,F9_10_NAFB_UNRESTRICT_EOY,F9_10_PC_BOND_LIABILITY_EOY,F9_10_PC_CASH_NON_INTEREST_BOY,F9_10_PC_CASH_NON_INTEREST_EOY,F9_10_PC_ESCROW_LIABILITY_EOY,F9_10_PC_INVEST_OTHER_SEC_EOY,F9_10_PC_INVEST_PROG_RELTD_EOY,F9_10_PC_INVEST_PUB_TRADED_EOY,F9_10_PC_LAND_BLDG_EQPMT,F9_10_PC_LAND_BLDG_EQPMT_DEPRCTN,F9_10_PC_LOANS_FROM_OFFICERS_EOY,F9_10_PC_ORG_FOLLOWS_SFAS117,F9_10_PC_ORG_NOT_FOLLOW_SFAS117,F9_10_PC_OTHER_LIABILITIES_EOY,F9_10_PC_RET_EARNINGS_ENDWMT_EOY,F9_10_PC_SAVINGS_TEMP_INVEST_BOY,F9_10_PC_SAVINGS_TEMP_INVEST_EOY,F9_10_PC_SECURED_MORTGAGES_EOY,F9_10_PC_SECURE_MORT_NOTES_EOY,F9_10_PC_UNSECURED_LOANS_EOY,F9_10_PC_UNSECURED_NOTES_BOY,F9_10_PC_UNSECURED_NOTES_EOY,F9_10_PZ_TOTAL_ASSETS_EOY,F9_10_SCHED_O_X,F9_11_PC_RECNCLTN_DONATED_SVCES,F9_11_PC_RECNCLTN_INVSTMNT_EXP,F9_11_PC_RECNCLTN_PRIOR_PER_ADJ,F9_11_PC_RECNCLTN_REV_LESS_EXP,F9_11_PC_RECNCLTN_UNRLZD_GAIN,F9_11_SCHED_O_X,F9_12_PC_ACCNT_COMPILE_OR_REVIEW,F9_12_PC_ACCTG_METHOD_ACCRUAL,F9_12_PC_ACCTG_METHOD_CASH,F9_12_PC_ACCTG_METHOD_OTHER,F9_12_PC_AUDIT_COMMITTEE,F9_12_PC_FED_GRNT_AUDIT_PERFORMD,F
9_12_PC_FED_GRNT_AUDIT_REQUIRED,F9_12_PC_FINCL_STMTS_AUDITED,F9_12_SCHED_O_X,number_of_other_prog_svces,501c3
0,5d019e6778ffca27b42818d7,RONALD MCDONALD HOUSE CHARITIES- PHILADELPHIA REGION INC,https://s3.amazonaws.com/irs-form-990/201113139349301301_public.xml,93493313013011,201012,,2016-02-24 21:20:13Z,,232705170,"{'BusinessNameLine1': 'RONALD MCDONALD HOUSE CHARITIES-', 'BusinessNameLine2': 'PHILADELPHIA REGION INC'}",RONA,8565826843,"{'AddressLine1': '1525 VALLEY CENTER PARKWAY NO 300', 'City': 'BETHLEHEM', 'State': 'PA', 'ZIPCode': '18017'}",,,,,,,,1,0,,0,,1,0,,1473903,0,0,0,MICHAEL ANTON,"{'Name': 'ROBERT TRAA', 'Title': 'TREASURER', 'Phone': '8565826843', 'DateSigned': '2011-11-04', 'AuthorizeThirdParty': '1'}",,PA,2010-01-01,2010-12-31,2010,2011-11-09T06:41:09-06:00,0,1,0,,0,,1992,0.0,1439340,1044925.0,638637.0,10,30447,1753405,243131,0.0,0,0.0,0,89152,193604,0,2440859,881768,195892,0,0.0,450430,1075372,0,0.0,10,0,925000,33563,1990429,MAKES GRANTS TO NON-PROFITS THAT DIRECTLY IMPROVE THE HEALTH AND WELL-BEING OF CHILDREN.,459751,1000,0,0,0,1925215,1384751,171810,1473903,0,0,,"RMHC OF THE PHILADELPHIA REGION, INC. GRANTS HUNDREDS OF THOUSANDS OF DOLLARS PER YEAR TO SUPPORT NON-PROFIT PROGRAMS THAT DIRECTLY IMPROVE THE HEALTH AND WELL-BEING OF CHILDREN. LOCALLY, RMHC SUPPORTS THE PHILADELPHIA, SOUTHERN NEW JERSEY AND DE...",1043744,925000.0,,,,,,,,,,,,,,,1043744,"THE CORPORATION IS ORGANIZED AND WILL BE OPERATED EXCLUSIVELY FOR CHARITABLE, EDUCATIONAL AND SCIENTIFIC PURPOSES WITHIN THE MEANING OF SECTION 501(C)(3) OF THE INTERNAL REVENUE CODE. 
SUCH PURPOSES SHALL BE LIMITED TO PROVIDING SUPPORT AND FUNDIN...",1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,1,1,1,10,10,0,0,0,0,0,"[""PA"", ""NJ"", ""DE""]",0,0,0,0,1,0,0,0,0,0.0,0,0,1439340.0,,,,,,,,,,,,,,,1439340,1000,,"{""TotalRevenueColumn"": ""1473903"", ""RelatedOrExemptFunctionIncome"": ""1000"", ""UnrelatedBusinessRevenue"": ""0"", ""ExclusionAmount"": ""33563""}",,,,"{""Total"": ""86228"", ""ManagementAndGeneral"": ""86228""}","{""Total"": ""86228"", ""ManagementAndGeneral"": ""86228""}","{""Total"": ""86228"", ""ManagementAndGeneral"": ""86228""}","{""Total"": ""86228"", ""ManagementAndGeneral"": ""86228""}",,"{""Total"": ""33000"", ""ProgramServices"": ""33000""}","{""Total"": ""892000"", ""ProgramServices"": ""892000""}",,,,,,"{""Total"": ""123"", ""ManagementAndGeneral"": ""123""}","{""Total"": ""763"", ""ManagementAndGeneral"": ""763""}","[{""Description"": ""FUNDRAISING COSTS"", ""Total"": ""108311"", ""Fundraising"": ""108311""}, {""Description"": ""CANISTER COLLECTION FEE"", ""Total"": ""81925"", ""Fundraising"": ""81925""}, {""Description"": ""PR/ADMINISTRATIVE SERVI"", ""Total"": ""34517"", ""ManagementAndGe...",,0,,,,,,,,,,,"{""Total"": ""21675"", ""ManagementAndGeneral"": ""21675""}",,"{""Total"": ""215"", ""ManagementAndGeneral"": ""215""}",,,,,,,,,,,,"{""Total"": ""118744"", ""ProgramServices"": ""118744""}",,,,,,,,,"{""Total"": ""1384751"", ""ProgramServices"": ""1043744"", ""ManagementAndGeneral"": ""145115"", ""Fundraising"": ""195892""}","{""Total"": ""1384751"", ""ProgramServices"": ""1043744"", ""ManagementAndGeneral"": ""145115"", ""Fundraising"": ""195892""}","{""Total"": ""1384751"", ""ProgramServices"": ""1043744"", ""ManagementAndGeneral"": ""145115"", ""Fundraising"": ""195892""}","{""Total"": ""1384751"", ""ProgramServices"": ""1043744"", ""ManagementAndGeneral"": ""145115"", ""Fundraising"": ""195892""}","{""BOY"": ""103412"", ""EOY"": ""147981""}",,,,"{""BOY"": ""0"", ""EOY"": 
""170617""}",,,,,"{""BOY"": ""39670"", ""EOY"": ""44353""}","{""BOY"": ""80500"", ""EOY"": ""166000""}",,,,,"{""BOY"": ""1753405"", ""EOY"": ""1990429""}",,,,,"{""BOY"": ""1489143"", ""EOY"": ""1851561""}",,,256845,86228,,1,0,"{""BOY"": ""51640"", ""EOY"": ""240077""}",,"{""BOY"": ""332660"", ""EOY"": ""270700""}","{""BOY"": ""332660"", ""EOY"": ""270700""}",,,,,,"{""BOY"": ""1925215"", ""EOY"": ""2440859""}",0,,,,89152,,1,0,1,0,,1,0,0,1,1,,1
1,5d019e6778ffca27b42818d8,TORRINGTON VOA ELDERLY HOUSING INC BELL PARK TOWER,https://s3.amazonaws.com/irs-form-990/201113139349301311_public.xml,93493313013111,201106,"{""Total"": ""0""}",2016-02-24 21:20:13Z,,581805618,"{'BusinessNameLine1': 'TORRINGTON VOA ELDERLY HOUSING INC', 'BusinessNameLine2': 'BELL PARK TOWER'}",TORR,7033415000,"{'AddressLine1': '1660 DUKE STREET', 'City': 'ALEXANDRIA', 'State': 'VA', 'ZIPCode': '22314'}",,,,,,,,0,0,,0,,1,0,1736.0,266420,0,0,0,,"{'Name': 'THOMAS D TURNBULL', 'Title': 'ASST. SEC/TREAS', 'DateSigned': '2011-11-09'}",,WY,2010-07-01,2011-06-30,2010,2011-11-09T07:32:06-08:00,0,1,0,,0,,1993,,0,,,13,1425,1437850,189785,,0,,222839,-39085,-36926,0,1433342,261190,0,0,,34577,224264,0,,19,0,0,828,1398765,PROVIDE HOUSING FOR THE ELDERLY AND THE DISABLED UNDER SECTION 202 OF THE NATIONAL HOUSING ACT UNDER AN AGREEMENT WITH THE DEPARTMENT OF HUD.,222550,0,265592,82955,71405,1455332,305505,17482,266420,0,0,,PROVIDE HOUSING FOR THE ELDERLY AND THE DISABLED UNDER SECTION 202 OF THE NATIONAL HOUSING ACT UNDER AN AGREEMENT WITH THE DEPARTMENT OF HUD.,276405,,266420.0,,,,,,,,,,,,,,276405,PROVIDE HOUSING FOR THE ELDERLY AND THE DISABLED UNDER SECTION 202 OF THE NATIONAL HOUSING ACT UNDER AN AGREEMENT WITH THE DEPARTMENT OF HUD.,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,1,1,1,0,1,1,1,1,0,1,1,0,1,0,0,0,0,0,1,1,1,13,19,0,1,0,0,0,,0,0,0,1,0,0,0,1,411648,,1180355,0,,,,,,,,,,,,,,265592.0,,0,0,265592.0,"{""TotalRevenueColumn"": ""266420"", ""RelatedOrExemptFunctionIncome"": ""266420""}","{""Total"": ""8433"", ""ProgramServices"": ""8433""}","{""Total"": ""0""}","{""Total"": ""806"", ""ProgramServices"": ""806""}","{""Total"": ""66166"", ""ProgramServices"": ""66166""}","{""Total"": ""66166"", ""ProgramServices"": ""66166""}","{""Total"": ""66166"", ""ProgramServices"": ""66166""}","{""Total"": ""66166"", ""ProgramServices"": ""66166""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""1108"", 
""ProgramServices"": ""1108""}","{""Total"": ""0""}",,"{""Total"": ""44077"", ""ProgramServices"": ""44077""}","{""Total"": ""14222"", ""ProgramServices"": ""14222""}","{""Total"": ""0""}","[{""Description"": ""OPER. & MAINT."", ""Total"": ""46164"", ""ProgramServices"": ""46164""}, {""Description"": ""MISC TAXES"", ""Total"": ""298"", ""ProgramServices"": ""298""}, {""Description"": ""ADMINISTRATIVE"", ""Total"": ""12176"", ""ProgramServices"": ""12176""}]","{""Total"": ""0""}",0,"{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""7500"", ""ManagementAndGeneral"": ""7500""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""21600"", ""ManagementAndGeneral"": ""21600""}","{""Total"": ""0""}","{""Total"": ""17714"", ""ProgramServices"": ""17714""}","{""Total"": ""17714"", ""ProgramServices"": ""17714""}","{""Total"": ""17714"", ""ProgramServices"": ""17714""}","{""Total"": ""17714"", ""ProgramServices"": ""17714""}","{""Total"": ""59440"", ""ProgramServices"": ""59440""}","{""Total"": ""59440"", ""ProgramServices"": ""59440""}","{""Total"": ""59440"", ""ProgramServices"": ""59440""}","{""Total"": ""59440"", ""ProgramServices"": ""59440""}","{""Total"": ""0""}","{""Total"": ""5801"", ""ProgramServices"": ""5801""}","{""Total"": ""5801"", ""ProgramServices"": ""5801""}","{""Total"": ""5801"", ""ProgramServices"": ""5801""}","{""Total"": ""5801"", ""ProgramServices"": ""5801""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""0""}","{""Total"": ""305505"", ""ProgramServices"": ""276405"", ""ManagementAndGeneral"": ""29100"", ""Fundraising"": ""0""}","{""Total"": ""305505"", ""ProgramServices"": ""276405"", ""ManagementAndGeneral"": ""29100"", ""Fundraising"": ""0""}","{""Total"": ""305505"", ""ProgramServices"": ""276405"", 
""ManagementAndGeneral"": ""29100"", ""Fundraising"": ""0""}","{""Total"": ""305505"", ""ProgramServices"": ""276405"", ""ManagementAndGeneral"": ""29100"", ""Fundraising"": ""0""}","{""BOY"": ""231"", ""EOY"": ""474""}","{""BOY"": ""7628"", ""EOY"": ""7554""}","{""EOY"": ""0""}","{""EOY"": ""0""}","{""BOY"": ""1306860"", ""EOY"": ""1282874""}","{""EOY"": ""0""}","{""EOY"": ""0""}","{""BOY"": ""14383"", ""EOY"": ""17385""}","{""EOY"": ""0""}","{""BOY"": ""2040"", ""EOY"": ""16145""}",,,"{""BOY"": ""20"", ""EOY"": ""48""}",,,"{""BOY"": ""1437850"", ""EOY"": ""1398765""}",,"{""BOY"": ""250"", ""EOY"": ""22261""}","{""BOY"": ""250"", ""EOY"": ""22261""}",,"{""BOY"": ""125980"", ""EOY"": ""102794""}","{""EOY"": ""0""}","{""EOY"": ""0""}",2187206,904332,,1,0,"{""BOY"": ""9203"", ""EOY"": ""11349""}",,"{""EOY"": ""0""}","{""EOY"": ""0""}","{""BOY"": ""6219"", ""EOY"": ""7035""}","{""BOY"": ""6219"", ""EOY"": ""7035""}",,,,"{""BOY"": ""1455332"", ""EOY"": ""1433342""}",0,,,,-39085,,0,0,1,0,,1,1,1,1,0,,1


##### Verify

In [358]:
print(len(df.columns.tolist()))

300


<br>Below we check whether there are any columns in *df* that are not in *new_variables_df*. All 18 columns returned are the expected 'identifier' columns, so there are no issues here. 

As an aside, the *df.columns.tolist()* command produces a list of all the column names. The *set* command does two things: 1) it builds an unordered collection of the *unique* values in that list and 2) it allows 'set' operations, such as *union*, *intersection*, and (as below) *difference*.
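To make the set operations concrete, here is a minimal sketch on toy data. The two lists are made-up stand-ins that borrow a few column names from this notebook; they are not the real contents of *df* or *new_variables_df*.

```python
# Toy column lists (illustrative stand-ins, not the real DataFrame contents)
df_columns = ["EIN", "URL", "F9_08_PC_TOTAL_REVENUE", "EIN"]
concordance_names = ["F9_08_PC_TOTAL_REVENUE", "F9_00_HD_FILER_CITY_US"]

df_set = set(df_columns)            # duplicates collapse: 'EIN' is kept once
conc_set = set(concordance_names)

only_in_df = df_set - conc_set            # difference: in df, not in concordance
only_in_concordance = conc_set - df_set   # difference in the other direction
in_both = df_set & conc_set               # intersection
in_either = df_set | conc_set             # union

# Sets are unordered, so sort before printing for a stable display
print(sorted(only_in_df))
print(sorted(only_in_concordance))
print(sorted(in_both))
print(len(in_either))
```

Note that the difference is not symmetric, which is why the notebook runs the comparison once in each direction.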

In [359]:
set(df.columns.tolist()) - set(new_variables_df['variable_name_new'].tolist())

{'501c3',
 'BusinessName',
 'BusinessNameControlTxt',
 'DLN',
 'EIN',
 'ForeignAddress',
 'ForeignPhoneNum',
 'InCareOfName',
 'InCareOfNm',
 'Name',
 'NameControl',
 'OrganizationName',
 'Phone',
 'PhoneNum',
 'URL',
 'USAddress',
 '_id',
 'fiscal_year'}

<br>Now we will check whether there are any columns in *new_variables_df* that are not in *df*. The only ones are six address-related columns, which we will deal with in a later notebook.

In [360]:
set(new_variables_df['variable_name_new'].tolist()) - set(df.columns.tolist())

{'F9_00_HD_FILER_ADDR_US_L1',
 'F9_00_HD_FILER_ADDR_US_L2',
 'F9_00_HD_FILER_CITY_US',
 'F9_00_HD_FILER_COUNTRY_FRGN',
 'F9_00_HD_FILER_STATE_US',
 'F9_00_HD_FILER_ZIP_US'}

##### Save DF

In [363]:
import psutil

def check_memory_status(threshold=85):
    mem = psutil.virtual_memory()
    print(f"🧠 RAM Usage: {mem.percent}% ({mem.used / 1e9:.2f} GB / {mem.total / 1e9:.2f} GB)")
    if mem.percent >= threshold:
        print("⚠️  Warning: RAM usage is high. Consider restarting the kernel before saving.")
    else:
        print("✅ Good to go!")

In [364]:
# Step 1: Check RAM before saving
check_memory_status()

🧠 RAM Usage: 48.8% (99.84 GB / 204.69 GB)
✅ Good to go!


In [365]:
def prepare_for_save(df):
    import gc

    # Drop any cached views
    df = df.copy()  # Break reference to any partial evaluation from .head(), etc.

    # Optionally sort or reset if needed
    # df = df.sort_values("some_column")  # Only if relevant
    # df = df.reset_index(drop=True)

    # Trigger garbage collection
    gc.collect()

    print("🧼 DataFrame copied + garbage collected. Ready to save.")
    return df

In [None]:
# Step 2: Clean up df (especially if you’ve been doing .head(), .sort_values(), etc.)
df = prepare_for_save(df)

In [361]:
%%time
import datetime
print("Current date and time : ", datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"), '\n')
df.to_feather('D:/all_filings_april_2025_all_controls_combined.feather')

Current date and time :  2025-04-17 00:39:12 

CPU times: total: 2min 43s
Wall time: 2min


In [366]:
%%time
df.to_parquet("D:/all_filings_april_2025_all_controls_combined.parquet", engine="pyarrow", compression="snappy", index=False)

CPU times: total: 3min 6s
Wall time: 3min 13s


In [368]:
# Alternative: HDF5 format (handles mixed types better than feather).
# Left commented out because it raised:
#   ImportError: Missing optional dependency 'pytables'.  Use pip or conda to install pytables.
#df.to_hdf('D:/all_filings_april_2025_all_controls.h5', key='df')

In [370]:
%%time
import datetime
print("Current date and time : ", datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"), '\n')
df.to_pickle('D:/all_filings_april_2025_all_controls_combined.pkl.gz', compression='gzip')

Current date and time :  2025-04-17 00:47:45 

CPU times: total: 42min 18s
Wall time: 42min 40s
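
Since the gzip-compressed pickle takes a long time to write, it can be worth confirming that a saved file reads back intact. The following is a minimal sketch of such a round-trip check on a toy DataFrame; the file name `toy.pkl.gz` and the temporary directory are assumptions for illustration, and the same pattern would apply to the full file.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy stand-in for the real DataFrame (illustrative values only)
toy = pd.DataFrame({"EIN": ["232705170", "581805618"],
                    "fiscal_year": [2010, 2010]})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "toy.pkl.gz"   # hypothetical file name
    toy.to_pickle(path, compression="gzip")
    restored = pd.read_pickle(path, compression="gzip")

# Compare dimensions first (cheap), then the full contents
same_shape = restored.shape == toy.shape
same_values = restored.equals(toy)
print(same_shape, same_values)
```

On the full dataset the read itself will also take a while, so comparing only the shape and column list may be a reasonable first check.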


<br>Now we're done with this step. In the next notebook we'll parse variables that have complex 'nested' structures.