**Instructions**

1. Defining business objective
2. Extracting the data
3. Data cleaning, wrangling & EDA
    - On the **categorical** columns in the dataset:
        - Checking for null values in all the columns.
        - Excluding the following variables based on their definitions: `OSOURCE`, `ZIP`. Create a new empty list called `drop_list`; we will append column names to this list and drop all the columns in it later.
        - Identifying columns that have over 85% missing values and removing them.
        - Reducing the number of categories in the column `GENDER`. The column should only have either "M" for males, "F" for females, and "other" for all the rest. 
    - On the **numerical** columns:
        - Checking for null values in the numerical columns.
        - Cleaning the columns `GEOCODE2`, `WEALTH1`, `ADI`, `DMA`, and `MSA`.
4. Checking accuracy
5. Building a model
6. Improving the model
7. Results
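
The categorical cleaning steps listed above can be sketched on a small toy frame. This is only an illustration under stated assumptions: the values are made up, and treating a missing `GENDER` as "other" is an assumption rather than something the instructions specify.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the categorical columns (values are made up).
toy = pd.DataFrame({
    "OSOURCE": ["GRI", "BOA", "AMH", "BRY"],
    "ZIP": ["61081", "91326", "27017", "95953"],
    "GENDER": ["F", "M", "U", np.nan],
    "RECP3": [np.nan, np.nan, np.nan, np.nan],  # entirely missing
})

# Columns excluded by definition are collected first and dropped in one go.
drop_list = ["OSOURCE", "ZIP"]

# Add every column whose share of missing values exceeds 85%.
missing_share = toy.isna().mean()
drop_list += missing_share[missing_share > 0.85].index.tolist()

cleaned = toy.drop(columns=drop_list)

# Collapse GENDER to "M", "F" and "other".
# Assumption: missing values are also mapped to "other".
cleaned["GENDER"] = cleaned["GENDER"].where(cleaned["GENDER"].isin(["M", "F"]), "other")

print(drop_list)
print(cleaned["GENDER"].tolist())
```

Collecting names in `drop_list` first keeps every removal decision in one place, so the columns can be dropped in a single call.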

**Additional Information on Dataset**

- Large number of features: The dataset has over 450 features. Selecting the right features for the model is therefore critical, yet difficult, because traditional ways of removing features are not effective at this scale. Feature extraction (creating your own features from the existing ones) is also not easy in this case.

- Sparsity: There are a lot of features with a large number of null values.

- Data imbalance: For a classification model, the training dataset is heavily imbalanced, with only about 5,000 instances in one class compared to over 90,000 in the other.

In [1]:
import pandas as pd
import numpy as np
pd.set_option('display.max_columns', None)
import warnings
warnings.filterwarnings('ignore')
import matplotlib.pyplot as plt
import seaborn as sns 

## 1. Defining business objective

To make their future direct marketing efforts more cost-effective, the organization, "Healthcare for All," wants to create a model that will help them maximize the overall revenue from future mailings. They want to develop a model that predicts the amount of money each donor is likely to give, rather than just classifying whether they will respond or not.
The organization is interested in a specific group called "lapsed" donors. These are people who made a donation between 13 and 24 months ago but haven't donated since then.
The organization observed an inverse relationship between the likelihood of a donor responding and the amount of money they give. This means that a simple model that only predicts who will respond would mostly select donors who give very small amounts.

The mailing was sent out in June 2018. All information included in the file (excluding the giving history date fields) is reflective of behavior before June 1997.
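
One way to read the objective above is to rank donors by expected donation, i.e. the probability of a response multiplied by the expected gift size given a response. The sketch below uses made-up scores, and `MAILING_COST` is a hypothetical value that does not come from the source.

```python
import pandas as pd

# Made-up scores for three donors; in practice these would come from
# a response model and an amount model, neither of which is shown here.
scores = pd.DataFrame({
    "p_respond": [0.10, 0.04, 0.02],      # estimated probability of a gift
    "amount_if_gift": [5.0, 20.0, 50.0],  # estimated gift size given a response
})

MAILING_COST = 0.75  # hypothetical cost per letter, not taken from the source

# Expected donation = P(respond) * E[amount | respond].
scores["expected_donation"] = scores["p_respond"] * scores["amount_if_gift"]

# Mail only when the expected donation exceeds the cost of the letter.
scores["mail"] = scores["expected_donation"] > MAILING_COST

print(scores)
```

In this toy example the donor most likely to respond is the one who would not be mailed, which mirrors the inverse relationship described above.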

## 2. Extracting the data

In [2]:
df = pd.read_csv('learningSet.csv')

In [3]:
df.shape

(95412, 481)

This is a large dataset: 95,412 rows and 481 columns.

In [4]:
df.head()

Unnamed: 0,ODATEDW,OSOURCE,TCODE,STATE,ZIP,MAILCODE,PVASTATE,DOB,NOEXCH,RECINHSE,RECP3,RECPGVG,RECSWEEP,MDMAUD,DOMAIN,CLUSTER,AGE,AGEFLAG,HOMEOWNR,CHILD03,CHILD07,CHILD12,CHILD18,NUMCHLD,INCOME,GENDER,WEALTH1,HIT,MBCRAFT,MBGARDEN,MBBOOKS,MBCOLECT,MAGFAML,MAGFEM,MAGMALE,PUBGARDN,PUBCULIN,PUBHLTH,PUBDOITY,PUBNEWFN,PUBPHOTO,PUBOPP,DATASRCE,MALEMILI,MALEVET,VIETVETS,WWIIVETS,LOCALGOV,STATEGOV,FEDGOV,SOLP3,SOLIH,MAJOR,WEALTH2,GEOCODE,COLLECT1,VETERANS,BIBLE,CATLG,HOMEE,PETS,CDPLAY,STEREO,PCOWNERS,PHOTO,CRAFTS,FISHER,GARDENIN,BOATS,WALKER,KIDSTUFF,CARDS,PLATES,LIFESRC,PEPSTRFL,POP901,POP902,POP903,POP90C1,POP90C2,POP90C3,POP90C4,POP90C5,ETH1,ETH2,ETH3,ETH4,ETH5,ETH6,ETH7,ETH8,ETH9,ETH10,ETH11,ETH12,ETH13,ETH14,ETH15,ETH16,AGE901,AGE902,AGE903,AGE904,AGE905,AGE906,AGE907,CHIL1,CHIL2,CHIL3,AGEC1,AGEC2,AGEC3,AGEC4,AGEC5,AGEC6,AGEC7,CHILC1,CHILC2,CHILC3,CHILC4,CHILC5,HHAGE1,HHAGE2,HHAGE3,HHN1,HHN2,HHN3,HHN4,HHN5,HHN6,MARR1,MARR2,MARR3,MARR4,HHP1,HHP2,DW1,DW2,DW3,DW4,DW5,DW6,DW7,DW8,DW9,HV1,HV2,HV3,HV4,HU1,HU2,HU3,HU4,HU5,HHD1,HHD2,HHD3,HHD4,HHD5,HHD6,HHD7,HHD8,HHD9,HHD10,HHD11,HHD12,ETHC1,ETHC2,ETHC3,ETHC4,ETHC5,ETHC6,HVP1,HVP2,HVP3,HVP4,HVP5,HVP6,HUR1,HUR2,RHP1,RHP2,RHP3,RHP4,HUPA1,HUPA2,HUPA3,HUPA4,HUPA5,HUPA6,HUPA7,RP1,RP2,RP3,RP4,MSA,ADI,DMA,IC1,IC2,IC3,IC4,IC5,IC6,IC7,IC8,IC9,IC10,IC11,IC12,IC13,IC14,IC15,IC16,IC17,IC18,IC19,IC20,IC21,IC22,IC23,HHAS1,HHAS2,HHAS3,HHAS4,MC1,MC2,MC3,TPE1,TPE2,TPE3,TPE4,TPE5,TPE6,TPE7,TPE8,TPE9,PEC1,PEC2,TPE10,TPE11,TPE12,TPE13,LFC1,LFC2,LFC3,LFC4,LFC5,LFC6,LFC7,LFC8,LFC9,LFC10,OCC1,OCC2,OCC3,OCC4,OCC5,OCC6,OCC7,OCC8,OCC9,OCC10,OCC11,OCC12,OCC13,EIC1,EIC2,EIC3,EIC4,EIC5,EIC6,EIC7,EIC8,EIC9,EIC10,EIC11,EIC12,EIC13,EIC14,EIC15,EIC16,OEDC1,OEDC2,OEDC3,OEDC4,OEDC5,OEDC6,OEDC7,EC1,EC2,EC3,EC4,EC5,EC6,EC7,EC8,SEC1,SEC2,SEC3,SEC4,SEC5,AFC1,AFC2,AFC3,AFC4,AFC5,AFC6,VC1,VC2,VC3,VC4,ANC1,ANC2,ANC3,ANC4,ANC5,ANC6,ANC7,ANC8,ANC9,ANC10,ANC11,ANC12,ANC13,ANC14,ANC15,POBC1,POBC2,LSC1,LSC2,LSC3,LSC4,VOC1,VOC2,VOC3,HC1,HC2,HC3,HC4,HC5,HC6,HC7,HC8,HC9,HC10,HC11,HC12,HC13,HC14,HC15,HC16,HC17,HC18,HC19,HC20,HC21,MHUC1,MHUC2,AC1,AC2,ADATE_2,ADATE_3,ADATE_4,ADATE_5,ADATE_6,ADATE_7,ADATE_8,ADATE_9,ADATE_10,ADATE_11,ADATE_12,ADATE_13,ADATE_14,ADATE_15,ADATE_16,ADATE_17,ADATE_18,ADATE_19,ADATE_20,ADATE_21,ADATE_22,ADATE_23,ADATE_24,RFA_2,RFA_3,RFA_4,RFA_5,RFA_6,RFA_7,RFA_8,RFA_9,RFA_10,RFA_11,RFA_12,RFA_13,RFA_14,RFA_15,RFA_16,RFA_17,RFA_18,RFA_19,RFA_20,RFA_21,RFA_22,RFA_23,RFA_24,CARDPROM,MAXADATE,NUMPROM,CARDPM12,NUMPRM12,RDATE_3,RDATE_4,RDATE_5,RDATE_6,RDATE_7,RDATE_8,RDATE_9,RDATE_10,RDATE_11,RDATE_12,RDATE_13,RDATE_14,RDATE_15,RDATE_16,RDATE_17,RDATE_18,RDATE_19,RDATE_20,RDATE_21,RDATE_22,RDATE_23,RDATE_24,RAMNT_3,RAMNT_4,RAMNT_5,RAMNT_6,RAMNT_7,RAMNT_8,RAMNT_9,RAMNT_10,RAMNT_11,RAMNT_12,RAMNT_13,RAMNT_14,RAMNT_15,RAMNT_16,RAMNT_17,RAMNT_18,RAMNT_19,RAMNT_20,RAMNT_21,RAMNT_22,RAMNT_23,RAMNT_24,RAMNTALL,NGIFTALL,CARDGIFT,MINRAMNT,MINRDATE,MAXRAMNT,MAXRDATE,LASTGIFT,LASTDATE,FISTDATE,NEXTDATE,TIMELAG,AVGGIFT,CONTROLN,TARGET_B,TARGET_D,HPHONE_D,RFA_2R,RFA_2F,RFA_2A,MDMAUD_R,MDMAUD_F,MDMAUD_A,CLUSTER2,GEOCODE2
0,8901,GRI,0,IL,61081,,,3712,0,,,,,XXXX,T2,36,60.0,,,,,,,,,F,,0,,,,,,,,,,,,,,,,0,39,34,18,10,2,1,,,,5.0,,,,,,,,,,,,,,,,,,,,,X,992,264,332,0,35,65,47,53,92,1,0,0,11,0,0,0,0,0,0,0,11,0,0,0,39,48,51,40,50,54,25,31,42,27,11,14,18,17,13,11,15,12,11,34,25,18,26,10,23,18,33,49,28,12,4,61,7,12,19,198,276,97,95,2,2,0,0,7,7,0,479,635,3,2,86,14,96,4,7,38,80,70,32,84,16,6,2,5,9,15,3,17,50,25,0,0,0,2,7,13,27,47,0,1,61,58,61,15,4,2,0,0,14,1,0,0,2,5,17,73,0.0,177.0,682.0,307,318,349,378,12883,13,23,23,23,15,1,0,0,1,4,25,24,26,17,2,0,0,2,28,4,51,1,46,54,3,88,8,0,0,0,0,0,0,4,1,13,14,16,2,45,56,64,50,64,44,62,53,99,0,0,9,3,8,13,9,0,3,9,3,15,19,5,4,3,0,3,41,1,0,7,13,6,5,0,4,9,4,1,3,10,2,1,7,78,2,0,120,16,10,39,21,8,4,3,5,20,3,19,4,0,0,0,18,39,0,34,23,18,16,1,4,0,23,0,0,5,1,0,0,0,0,0,2,0,3,74,88,8,0,4,96,77,19,13,31,5,14,14,31,54,46,0,0,90,0,10,0,0,0,33,65,40,99,99,6,2,10,7,9706,9606.0,9604.0,9604.0,9603.0,9602.0,9601.0,9511.0,9510.0,9510.0,9508.0,9507.0,9506.0,9504.0,9503.0,9502.0,9501.0,9411.0,9411.0,9410.0,9409.0,9407.0,9406.0,L4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,27,9702,74,6,14,,,,,,,,9512.0,,,,9507.0,9505.0,9505.0,9503.0,,,,,,9408.0,9406.0,,,,,,,,10.0,,,,10.0,11.0,11.0,11.0,,,,,,11.0,9.0,240.0,31,14,5.0,9208,12.0,9402,10.0,9512,8911,9003.0,4.0,7.741935,95515,0,0.0,0,L,4,E,X,X,X,39.0,C
1,9401,BOA,1,CA,91326,,,5202,0,,,,,XXXX,S1,14,46.0,E,H,,,,M,1.0,6.0,M,9.0,16,0.0,0.0,3.0,1.0,1.0,1.0,0.0,0.0,0.0,2.0,0.0,3.0,0.0,0.0,3.0,0,15,55,11,6,2,1,,,,9.0,2.0,,,,,,,,,,,,,,,,,,,,,3611,940,998,99,0,0,50,50,67,0,0,31,6,4,2,6,4,14,0,0,2,0,1,4,34,41,43,32,42,45,32,33,46,21,13,14,33,23,10,4,2,11,16,36,22,15,12,1,5,4,21,75,55,23,9,69,4,3,24,317,360,99,99,0,0,0,0,0,0,0,5468,5218,12,10,96,4,97,3,9,59,94,88,55,95,5,4,1,3,5,4,2,18,44,5,0,0,0,97,98,98,98,99,94,0,83,76,73,21,5,0,0,0,4,0,0,0,91,91,91,94,4480.0,13.0,803.0,1088,1096,1026,1037,36175,2,6,2,5,15,14,13,10,33,2,5,2,5,15,14,14,10,32,6,2,66,3,56,44,9,80,14,0,0,0,0,0,0,6,0,2,24,32,12,71,70,83,58,81,57,64,57,99,99,0,22,24,4,21,13,2,1,6,0,4,1,0,3,1,0,6,13,1,2,8,18,11,4,3,4,10,7,11,1,6,2,1,16,69,5,2,160,5,5,12,21,7,30,20,14,24,4,24,10,0,0,0,8,15,0,55,10,11,0,0,2,0,3,1,1,2,3,1,1,0,3,0,0,0,42,39,50,7,27,16,99,92,53,5,10,2,26,56,97,99,0,0,0,96,0,4,0,0,0,99,0,99,99,99,20,4,6,5,9706,9606.0,9604.0,9604.0,9603.0,9602.0,9601.0,9511.0,9510.0,9510.0,9509.0,,,,9503.0,,,9411.0,9411.0,9410.0,9409.0,,9406.0,L2G,A2G,A2G,A2G,A2G,A1E,A1E,A1E,A1E,A1E,A1E,,,,L1E,,,N1E,N1E,N1E,N1E,,F1E,12,9702,32,6,13,,,,,,,9512.0,,,,,,,9504.0,,,,,,,,,,,,,,,25.0,,,,,,,12.0,,,,,,,,,47.0,3,1,10.0,9310,25.0,9512,25.0,9512,9310,9504.0,18.0,15.666667,148535,0,0.0,0,L,2,G,X,X,X,1.0,A
2,9001,AMH,1,NC,27017,,,0,0,,,,,XXXX,R2,43,,,U,,,,,,3.0,M,1.0,2,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,3.0,0,20,29,33,6,8,1,,,,1.0,,,,,,,,,,,,,,,,,,,,,X,7001,2040,2669,0,2,98,49,51,96,2,0,0,2,0,0,0,0,0,0,0,2,0,0,0,35,43,46,37,45,49,23,35,40,25,13,20,19,16,13,10,8,15,14,30,22,19,25,10,23,21,35,44,22,6,2,63,9,9,19,183,254,69,69,1,6,5,3,3,3,0,497,546,2,1,78,22,93,7,18,36,76,65,30,86,14,7,2,5,11,17,3,17,60,18,0,1,0,0,1,6,18,50,0,4,36,49,51,14,5,4,2,24,11,2,3,6,0,2,9,44,0.0,281.0,518.0,251,292,292,340,11576,32,18,20,15,12,2,0,0,1,20,19,24,18,16,2,0,0,1,28,8,31,11,38,62,8,74,22,0,0,0,0,0,2,2,1,21,19,24,6,61,65,73,59,70,56,78,62,82,99,4,10,5,2,6,12,0,1,9,5,18,20,5,7,6,0,11,33,4,3,2,12,3,3,2,0,7,8,3,3,6,7,1,8,74,3,1,120,22,20,28,16,6,5,3,1,23,1,16,6,0,0,0,10,21,0,28,23,32,8,1,14,1,5,0,0,7,0,0,0,0,0,1,0,0,2,84,96,3,0,0,92,65,29,9,22,3,12,23,50,69,31,0,0,0,6,35,44,0,15,22,77,17,97,92,9,2,6,5,9706,9606.0,9604.0,9604.0,9603.0,9602.0,9601.0,9511.0,,9510.0,9508.0,9507.0,9506.0,9504.0,9503.0,,9501.0,9411.0,,,9409.0,9407.0,9406.0,L4E,S4E,S4E,S4E,S4E,S4F,S4F,S4F,,S4F,S4F,S4F,S4F,S4F,S4F,,S4D,S4D,,,S4D,S4D,S3D,26,9702,63,6,14,,,,,,,,,,9509.0,,9506.0,,9504.0,,9501.0,,,,9409.0,9407.0,9406.0,,,,,,,,,,11.0,,9.0,,9.0,,8.0,,,,8.0,7.0,6.0,202.0,27,14,2.0,9111,16.0,9207,5.0,9512,9001,9101.0,12.0,7.481481,15078,0,0.0,1,L,4,E,X,X,X,60.0,C
3,8701,BRY,0,CA,95953,,,2801,0,,,,,XXXX,R2,44,70.0,E,U,,,,,,1.0,F,4.0,2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,3.0,0,23,14,31,3,0,3,,,,0.0,,,,,,,,,,,,,,,,,,,,,X,640,160,219,0,8,92,54,46,61,0,0,11,32,6,2,0,0,0,0,0,31,0,0,1,32,40,44,34,43,47,25,45,35,20,15,25,17,17,12,7,7,20,17,30,14,19,25,11,23,23,27,50,30,15,8,63,9,6,23,199,283,85,83,3,4,1,0,2,0,2,1000,1263,2,1,48,52,93,7,6,36,73,61,30,84,16,6,3,3,21,12,4,13,36,13,0,0,0,10,25,50,69,92,10,15,42,55,50,15,5,4,0,9,42,4,0,5,1,8,17,34,9340.0,67.0,862.0,386,388,396,423,15130,27,12,4,26,22,5,0,0,4,35,5,6,12,30,6,0,0,5,22,14,26,20,46,54,3,58,36,0,0,0,0,0,6,0,0,17,13,15,0,43,69,81,53,68,45,33,31,0,99,23,17,3,0,6,6,0,0,13,42,12,0,0,0,42,0,6,3,0,0,0,23,3,3,6,0,3,3,3,3,3,0,3,6,87,0,0,120,28,12,14,27,10,3,5,0,19,1,17,0,0,0,0,13,23,0,14,40,31,16,0,1,0,13,0,0,4,0,0,0,3,0,0,0,0,29,67,56,41,3,0,94,43,27,4,38,0,10,19,39,45,55,0,0,45,22,17,0,0,16,23,77,22,93,89,16,2,6,6,9706,9606.0,9604.0,9604.0,9603.0,9602.0,9601.0,9511.0,,9510.0,9508.0,9507.0,9506.0,9504.0,9503.0,9502.0,9501.0,9411.0,9411.0,9410.0,9409.0,,,L4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,,S4E,S4E,S4E,S4E,S4E,S4E,S2D,S2D,A1D,A1D,A1D,A1D,,,27,9702,66,6,14,,,,,,,,,9512.0,9509.0,,9508.0,,9505.0,9503.0,,,9411.0,9411.0,,,,,,,,,,,,10.0,10.0,,10.0,,7.0,11.0,,,6.0,11.0,,,,109.0,16,7,2.0,8711,11.0,9411,10.0,9512,8702,8711.0,9.0,6.8125,172556,0,0.0,1,L,4,E,X,X,X,41.0,C
4,8601,,0,FL,33176,,,2001,0,X,X,,,XXXX,S2,16,78.0,E,H,,,,,1.0,3.0,F,2.0,60,1.0,0.0,9.0,0.0,4.0,1.0,0.0,0.0,0.0,4.0,0.0,1.0,0.0,1.0,3.0,1,28,9,53,26,3,2,,12.0,,,,,,Y,Y,,,,Y,,,Y,,Y,,Y,,Y,,3.0,,2520,627,761,99,0,0,46,54,2,98,0,0,1,0,0,0,0,0,0,0,0,0,0,0,33,45,50,36,46,50,27,34,43,23,14,21,13,15,20,12,5,13,15,34,19,19,31,7,27,16,26,57,36,24,14,42,17,9,33,235,323,99,98,0,0,0,0,0,0,0,576,594,4,3,90,10,97,3,0,42,82,49,22,92,8,20,3,17,9,23,1,1,1,0,21,58,19,0,1,2,16,67,0,2,45,52,53,16,6,0,0,0,9,0,0,0,25,58,74,83,5000.0,127.0,528.0,240,250,293,321,9836,24,29,23,13,4,4,0,0,2,21,30,22,16,4,5,0,0,3,35,8,11,14,20,80,4,73,22,1,1,0,0,0,3,1,2,1,24,27,3,76,61,73,51,65,49,80,31,81,99,10,17,8,2,6,15,3,7,22,2,9,0,7,2,2,0,6,1,5,2,2,12,2,7,6,4,15,29,4,3,26,3,2,7,49,12,1,120,16,20,30,13,3,12,5,2,26,1,20,7,1,1,1,15,28,4,9,16,53,20,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,65,99,0,0,0,90,45,18,25,34,0,1,3,6,33,67,0,0,9,14,72,3,0,0,99,1,21,99,96,6,2,7,11,9706,9606.0,9604.0,9604.0,9603.0,9512.0,9601.0,9511.0,9510.0,9509.0,9508.0,9502.0,9506.0,,9503.0,9502.0,9412.0,9411.0,9411.0,9410.0,9506.0,9407.0,9406.0,L2F,A2F,A2F,A2F,A1D,I2D,A1E,A1E,L1D,A1E,A1E,L1D,L3D,,L3D,A2D,A2D,A3D,A3D,A3D,I4E,A3D,A3D,43,9702,113,10,25,,,,,,9601.0,,,,,,9506.0,,,,,,,,,,,,,,,,15.0,,,,,,10.0,,,,,,,,,,,254.0,37,8,3.0,9310,15.0,9601,15.0,9601,7903,8005.0,14.0,6.864865,7112,0,0.0,1,L,2,F,X,X,X,26.0,A


`TARGET_B` and `TARGET_D` could both serve as our target.
- `TARGET_B`: whether the donor responded to the mailing (1) or not (0)
- `TARGET_D`: amount donated in response to the mailing (0 for donors who did not respond)

In [5]:
df['TARGET_B'].value_counts()

0    90569
1     4843
Name: TARGET_B, dtype: int64

The classes are heavily imbalanced: only 4,843 of the 95,412 addressees (about 5%) responded to the mailing.
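
To put a number on the imbalance, the short sketch below recomputes the response rate from the class counts shown in the `value_counts()` output above, along with the accuracy a trivial "nobody responds" model would reach. It is meant as an illustration only.

```python
import pandas as pd

# Class counts copied from the value_counts() output above.
counts = pd.Series({0: 90569, 1: 4843}, name="TARGET_B")

# Share of addressees who responded to the mailing.
response_rate = counts[1] / counts.sum()
print(f"Response rate: {response_rate:.2%}")

# Accuracy of a model that predicts "no response" for every donor.
baseline_accuracy = counts[0] / counts.sum()
print(f"Majority-class baseline accuracy: {baseline_accuracy:.2%}")
```

Because the majority-class baseline is already very accurate, plain accuracy is a weak yardstick for any classifier trained on `TARGET_B`.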

In [6]:
df['TARGET_D'].describe()

count    95412.000000
mean         0.793073
std          4.429725
min          0.000000
25%          0.000000
50%          0.000000
75%          0.000000
max        200.000000
Name: TARGET_D, dtype: float64

In [7]:
df['TARGET_D'].value_counts()

0.00     90569
10.00      941
15.00      591
20.00      577
5.00       503
         ...  
18.25        1
10.70        1
2.50         1
16.87        1
44.21        1
Name: TARGET_D, Length: 71, dtype: int64

The donated amount is zero for the vast majority of addressees, so the distribution is heavily skewed. Among those who did give, the most common amounts are 10, 15, 20 and 5.
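
Since the zeros dominate any summary of `TARGET_D`, it can help to look at the donors separately. The sketch below does this on a small made-up series; the values are not taken from the dataset.

```python
import pandas as pd

# Made-up donation amounts; most addressees gave nothing.
target_d = pd.Series(
    [0.0, 0.0, 0.0, 0.0, 10.0, 0.0, 15.0, 0.0, 20.0, 0.0],
    name="TARGET_D",
)

# Keep only the rows with an actual gift.
donors_only = target_d[target_d > 0]

print(target_d.describe())     # dominated by the zeros
print(donors_only.describe())  # summary of actual gifts
```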

In [8]:
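# numeric_only=True restricts the correlation to numeric columns (needed in pandas >= 2.0 when text columns are present)
df.corr(numeric_only=True)['TARGET_D'].sort_values(ascending=False)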

TARGET_D    1.000000
TARGET_B    0.774232
RAMNT_4     0.268811
RAMNT_21    0.099339
RAMNT_9     0.090168
              ...   
RDATE_3    -0.125194
RDATE_5    -0.220455
RAMNT_5    -0.272147
ADATE_5          NaN
ADATE_15         NaN
Name: TARGET_D, Length: 407, dtype: float64

In [9]:
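# numeric_only=True restricts the correlation to numeric columns (needed in pandas >= 2.0 when text columns are present)
df.corr(numeric_only=True)['TARGET_B'].sort_values(ascending=False)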

TARGET_B    1.000000
TARGET_D    0.774232
RFA_2F      0.072311
CARDGIFT    0.054027
NGIFTALL    0.050896
              ...   
RAMNT_3    -0.095351
RDATE_3    -0.126060
RAMNT_5    -0.380296
ADATE_5          NaN
ADATE_15         NaN
Name: TARGET_B, Length: 407, dtype: float64

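Apart from the two targets themselves, no feature is strongly correlated with either target; the largest absolute coefficient is about 0.38 (`RAMNT_5` with `TARGET_B`). This will make it difficult to predict `TARGET_D` with a linear regression. `TARGET_B`, on the other hand, would be predicted with a classification model.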
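
When only the correlations with one target are needed, `corrwith` avoids building the full column-by-column matrix. The sketch below shows the idea on a toy frame with made-up values; note that these are Pearson coefficients, which only capture linear relationships.

```python
import pandas as pd

# Toy frame with one text column; values are made up.
toy = pd.DataFrame({
    "LASTGIFT": [5.0, 10.0, 15.0, 20.0, 25.0],
    "NGIFTALL": [9, 7, 5, 3, 1],
    "STATE": ["IL", "CA", "NC", "CA", "FL"],
    "TARGET_D": [0.0, 5.0, 10.0, 15.0, 20.0],
})

# Correlate every numeric column with the target without building
# the full column-by-column matrix.
numeric = toy.select_dtypes("number")
corr_with_target = (
    numeric.drop(columns="TARGET_D")
    .corrwith(numeric["TARGET_D"])
    .sort_values(ascending=False)
)
print(corr_with_target)
```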

## 3. Data cleaning, wrangling & EDA

In [10]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 95412 entries, 0 to 95411
Columns: 481 entries, ODATEDW to GEOCODE2
dtypes: float64(97), int64(310), object(74)
memory usage: 350.1+ MB


Checking for null values in all the columns.

In [11]:
df.isna().sum()

ODATEDW       0
OSOURCE       0
TCODE         0
STATE         0
ZIP           0
           ... 
MDMAUD_R      0
MDMAUD_F      0
MDMAUD_A      0
CLUSTER2    132
GEOCODE2    132
Length: 481, dtype: int64

In [12]:
empty_columns = df.columns[df.eq(' ').any()]
empty_columns

Index(['OSOURCE', 'MAILCODE', 'PVASTATE', 'NOEXCH', 'RECINHSE', 'RECP3',
       'RECPGVG', 'RECSWEEP', 'DOMAIN', 'CLUSTER', 'AGEFLAG', 'HOMEOWNR',
       'CHILD03', 'CHILD07', 'CHILD12', 'CHILD18', 'GENDER', 'DATASRCE',
       'SOLP3', 'SOLIH', 'MAJOR', 'GEOCODE', 'COLLECT1', 'VETERANS', 'BIBLE',
       'CATLG', 'HOMEE', 'PETS', 'CDPLAY', 'STEREO', 'PCOWNERS', 'PHOTO',
       'CRAFTS', 'FISHER', 'GARDENIN', 'BOATS', 'WALKER', 'KIDSTUFF', 'CARDS',
       'PLATES', 'LIFESRC', 'PEPSTRFL', 'RFA_3', 'RFA_4', 'RFA_5', 'RFA_6',
       'RFA_7', 'RFA_8', 'RFA_9', 'RFA_10', 'RFA_11', 'RFA_12', 'RFA_13',
       'RFA_14', 'RFA_15', 'RFA_16', 'RFA_17', 'RFA_18', 'RFA_19', 'RFA_20',
       'RFA_21', 'RFA_22', 'RFA_23', 'RFA_24', 'GEOCODE2'],
      dtype='object')

`isna()` reports few nulls, but many columns contain a single space (' ') instead of a proper missing value. Let's replace ' ' with NaN.
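
As a side note, the replacement can also be restricted to selected columns in place, which keeps the original column order. The sketch below is an alternative under stated assumptions: the frame is a made-up toy, and `keep_blank` is a hypothetical helper list naming the columns where a blank is a valid value.

```python
import numpy as np
import pandas as pd

# Toy frame; a single space marks a blank field, as in the dataset.
toy = pd.DataFrame({
    "MAILCODE": [" ", "B", " ", " "],  # here a blank is a valid value
    "GENDER": ["F", " ", "M", "F"],
    "AGE": [60.0, 46.0, np.nan, 70.0],
})

# Replace blanks everywhere except in the columns where they carry meaning.
keep_blank = ["MAILCODE"]  # hypothetical helper list
cols = toy.columns.difference(keep_blank, sort=False)
toy[cols] = toy[cols].replace(" ", np.nan)

print(toy.isna().sum())
```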

In [13]:
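# Set 'MAILCODE' aside before replacing, because in that column ' ' is a valid value meaning 'address ok'.
mailcode_column = df['MAILCODE']
df.drop('MAILCODE', axis=1, inplace=True)
# Treat a single space as a missing value in all remaining columns.
df = df.replace(' ', np.nan)
# Re-attach 'MAILCODE' unchanged; it becomes the last column.
df = pd.concat([df, mailcode_column], axis=1)
df.head()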

Unnamed: 0,ODATEDW,OSOURCE,TCODE,STATE,ZIP,PVASTATE,DOB,NOEXCH,RECINHSE,RECP3,RECPGVG,RECSWEEP,MDMAUD,DOMAIN,CLUSTER,AGE,AGEFLAG,HOMEOWNR,CHILD03,CHILD07,CHILD12,CHILD18,NUMCHLD,INCOME,GENDER,WEALTH1,HIT,MBCRAFT,MBGARDEN,MBBOOKS,MBCOLECT,MAGFAML,MAGFEM,MAGMALE,PUBGARDN,PUBCULIN,PUBHLTH,PUBDOITY,PUBNEWFN,PUBPHOTO,PUBOPP,DATASRCE,MALEMILI,MALEVET,VIETVETS,WWIIVETS,LOCALGOV,STATEGOV,FEDGOV,SOLP3,SOLIH,MAJOR,WEALTH2,GEOCODE,COLLECT1,VETERANS,BIBLE,CATLG,HOMEE,PETS,CDPLAY,STEREO,PCOWNERS,PHOTO,CRAFTS,FISHER,GARDENIN,BOATS,WALKER,KIDSTUFF,CARDS,PLATES,LIFESRC,PEPSTRFL,POP901,POP902,POP903,POP90C1,POP90C2,POP90C3,POP90C4,POP90C5,ETH1,ETH2,ETH3,ETH4,ETH5,ETH6,ETH7,ETH8,ETH9,ETH10,ETH11,ETH12,ETH13,ETH14,ETH15,ETH16,AGE901,AGE902,AGE903,AGE904,AGE905,AGE906,AGE907,CHIL1,CHIL2,CHIL3,AGEC1,AGEC2,AGEC3,AGEC4,AGEC5,AGEC6,AGEC7,CHILC1,CHILC2,CHILC3,CHILC4,CHILC5,HHAGE1,HHAGE2,HHAGE3,HHN1,HHN2,HHN3,HHN4,HHN5,HHN6,MARR1,MARR2,MARR3,MARR4,HHP1,HHP2,DW1,DW2,DW3,DW4,DW5,DW6,DW7,DW8,DW9,HV1,HV2,HV3,HV4,HU1,HU2,HU3,HU4,HU5,HHD1,HHD2,HHD3,HHD4,HHD5,HHD6,HHD7,HHD8,HHD9,HHD10,HHD11,HHD12,ETHC1,ETHC2,ETHC3,ETHC4,ETHC5,ETHC6,HVP1,HVP2,HVP3,HVP4,HVP5,HVP6,HUR1,HUR2,RHP1,RHP2,RHP3,RHP4,HUPA1,HUPA2,HUPA3,HUPA4,HUPA5,HUPA6,HUPA7,RP1,RP2,RP3,RP4,MSA,ADI,DMA,IC1,IC2,IC3,IC4,IC5,IC6,IC7,IC8,IC9,IC10,IC11,IC12,IC13,IC14,IC15,IC16,IC17,IC18,IC19,IC20,IC21,IC22,IC23,HHAS1,HHAS2,HHAS3,HHAS4,MC1,MC2,MC3,TPE1,TPE2,TPE3,TPE4,TPE5,TPE6,TPE7,TPE8,TPE9,PEC1,PEC2,TPE10,TPE11,TPE12,TPE13,LFC1,LFC2,LFC3,LFC4,LFC5,LFC6,LFC7,LFC8,LFC9,LFC10,OCC1,OCC2,OCC3,OCC4,OCC5,OCC6,OCC7,OCC8,OCC9,OCC10,OCC11,OCC12,OCC13,EIC1,EIC2,EIC3,EIC4,EIC5,EIC6,EIC7,EIC8,EIC9,EIC10,EIC11,EIC12,EIC13,EIC14,EIC15,EIC16,OEDC1,OEDC2,OEDC3,OEDC4,OEDC5,OEDC6,OEDC7,EC1,EC2,EC3,EC4,EC5,EC6,EC7,EC8,SEC1,SEC2,SEC3,SEC4,SEC5,AFC1,AFC2,AFC3,AFC4,AFC5,AFC6,VC1,VC2,VC3,VC4,ANC1,ANC2,ANC3,ANC4,ANC5,ANC6,ANC7,ANC8,ANC9,ANC10,ANC11,ANC12,ANC13,ANC14,ANC15,POBC1,POBC2,LSC1,LSC2,LSC3,LSC4,VOC1,VOC2,VOC3,HC1,HC2,HC3,HC4,HC5,HC6,HC7,HC8,HC9,HC10,HC11,HC12,HC13,HC14,HC15,HC16,HC17,HC18,HC19,HC20,HC21,MHUC1,MHUC2,AC1,AC2,ADATE_2,ADATE_3,ADATE_4,ADATE_5,ADATE_6,ADATE_7,ADATE_8,ADATE_9,ADATE_10,ADATE_11,ADATE_12,ADATE_13,ADATE_14,ADATE_15,ADATE_16,ADATE_17,ADATE_18,ADATE_19,ADATE_20,ADATE_21,ADATE_22,ADATE_23,ADATE_24,RFA_2,RFA_3,RFA_4,RFA_5,RFA_6,RFA_7,RFA_8,RFA_9,RFA_10,RFA_11,RFA_12,RFA_13,RFA_14,RFA_15,RFA_16,RFA_17,RFA_18,RFA_19,RFA_20,RFA_21,RFA_22,RFA_23,RFA_24,CARDPROM,MAXADATE,NUMPROM,CARDPM12,NUMPRM12,RDATE_3,RDATE_4,RDATE_5,RDATE_6,RDATE_7,RDATE_8,RDATE_9,RDATE_10,RDATE_11,RDATE_12,RDATE_13,RDATE_14,RDATE_15,RDATE_16,RDATE_17,RDATE_18,RDATE_19,RDATE_20,RDATE_21,RDATE_22,RDATE_23,RDATE_24,RAMNT_3,RAMNT_4,RAMNT_5,RAMNT_6,RAMNT_7,RAMNT_8,RAMNT_9,RAMNT_10,RAMNT_11,RAMNT_12,RAMNT_13,RAMNT_14,RAMNT_15,RAMNT_16,RAMNT_17,RAMNT_18,RAMNT_19,RAMNT_20,RAMNT_21,RAMNT_22,RAMNT_23,RAMNT_24,RAMNTALL,NGIFTALL,CARDGIFT,MINRAMNT,MINRDATE,MAXRAMNT,MAXRDATE,LASTGIFT,LASTDATE,FISTDATE,NEXTDATE,TIMELAG,AVGGIFT,CONTROLN,TARGET_B,TARGET_D,HPHONE_D,RFA_2R,RFA_2F,RFA_2A,MDMAUD_R,MDMAUD_F,MDMAUD_A,CLUSTER2,GEOCODE2,MAILCODE
0,8901,GRI,0,IL,61081,,3712,0,,,,,XXXX,T2,36,60.0,,,,,,,,,F,,0,,,,,,,,,,,,,,,,0,39,34,18,10,2,1,,,,5.0,,,,,,,,,,,,,,,,,,,,,X,992,264,332,0,35,65,47,53,92,1,0,0,11,0,0,0,0,0,0,0,11,0,0,0,39,48,51,40,50,54,25,31,42,27,11,14,18,17,13,11,15,12,11,34,25,18,26,10,23,18,33,49,28,12,4,61,7,12,19,198,276,97,95,2,2,0,0,7,7,0,479,635,3,2,86,14,96,4,7,38,80,70,32,84,16,6,2,5,9,15,3,17,50,25,0,0,0,2,7,13,27,47,0,1,61,58,61,15,4,2,0,0,14,1,0,0,2,5,17,73,0.0,177.0,682.0,307,318,349,378,12883,13,23,23,23,15,1,0,0,1,4,25,24,26,17,2,0,0,2,28,4,51,1,46,54,3,88,8,0,0,0,0,0,0,4,1,13,14,16,2,45,56,64,50,64,44,62,53,99,0,0,9,3,8,13,9,0,3,9,3,15,19,5,4,3,0,3,41,1,0,7,13,6,5,0,4,9,4,1,3,10,2,1,7,78,2,0,120,16,10,39,21,8,4,3,5,20,3,19,4,0,0,0,18,39,0,34,23,18,16,1,4,0,23,0,0,5,1,0,0,0,0,0,2,0,3,74,88,8,0,4,96,77,19,13,31,5,14,14,31,54,46,0,0,90,0,10,0,0,0,33,65,40,99,99,6,2,10,7,9706,9606.0,9604.0,9604.0,9603.0,9602.0,9601.0,9511.0,9510.0,9510.0,9508.0,9507.0,9506.0,9504.0,9503.0,9502.0,9501.0,9411.0,9411.0,9410.0,9409.0,9407.0,9406.0,L4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,27,9702,74,6,14,,,,,,,,9512.0,,,,9507.0,9505.0,9505.0,9503.0,,,,,,9408.0,9406.0,,,,,,,,10.0,,,,10.0,11.0,11.0,11.0,,,,,,11.0,9.0,240.0,31,14,5.0,9208,12.0,9402,10.0,9512,8911,9003.0,4.0,7.741935,95515,0,0.0,0,L,4,E,X,X,X,39.0,C,
1,9401,BOA,1,CA,91326,,5202,0,,,,,XXXX,S1,14,46.0,E,H,,,,M,1.0,6.0,M,9.0,16,0.0,0.0,3.0,1.0,1.0,1.0,0.0,0.0,0.0,2.0,0.0,3.0,0.0,0.0,3.0,0,15,55,11,6,2,1,,,,9.0,2.0,,,,,,,,,,,,,,,,,,,,,3611,940,998,99,0,0,50,50,67,0,0,31,6,4,2,6,4,14,0,0,2,0,1,4,34,41,43,32,42,45,32,33,46,21,13,14,33,23,10,4,2,11,16,36,22,15,12,1,5,4,21,75,55,23,9,69,4,3,24,317,360,99,99,0,0,0,0,0,0,0,5468,5218,12,10,96,4,97,3,9,59,94,88,55,95,5,4,1,3,5,4,2,18,44,5,0,0,0,97,98,98,98,99,94,0,83,76,73,21,5,0,0,0,4,0,0,0,91,91,91,94,4480.0,13.0,803.0,1088,1096,1026,1037,36175,2,6,2,5,15,14,13,10,33,2,5,2,5,15,14,14,10,32,6,2,66,3,56,44,9,80,14,0,0,0,0,0,0,6,0,2,24,32,12,71,70,83,58,81,57,64,57,99,99,0,22,24,4,21,13,2,1,6,0,4,1,0,3,1,0,6,13,1,2,8,18,11,4,3,4,10,7,11,1,6,2,1,16,69,5,2,160,5,5,12,21,7,30,20,14,24,4,24,10,0,0,0,8,15,0,55,10,11,0,0,2,0,3,1,1,2,3,1,1,0,3,0,0,0,42,39,50,7,27,16,99,92,53,5,10,2,26,56,97,99,0,0,0,96,0,4,0,0,0,99,0,99,99,99,20,4,6,5,9706,9606.0,9604.0,9604.0,9603.0,9602.0,9601.0,9511.0,9510.0,9510.0,9509.0,,,,9503.0,,,9411.0,9411.0,9410.0,9409.0,,9406.0,L2G,A2G,A2G,A2G,A2G,A1E,A1E,A1E,A1E,A1E,A1E,,,,L1E,,,N1E,N1E,N1E,N1E,,F1E,12,9702,32,6,13,,,,,,,9512.0,,,,,,,9504.0,,,,,,,,,,,,,,,25.0,,,,,,,12.0,,,,,,,,,47.0,3,1,10.0,9310,25.0,9512,25.0,9512,9310,9504.0,18.0,15.666667,148535,0,0.0,0,L,2,G,X,X,X,1.0,A,
2,9001,AMH,1,NC,27017,,0,0,,,,,XXXX,R2,43,,,U,,,,,,3.0,M,1.0,2,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,3.0,0,20,29,33,6,8,1,,,,1.0,,,,,,,,,,,,,,,,,,,,,X,7001,2040,2669,0,2,98,49,51,96,2,0,0,2,0,0,0,0,0,0,0,2,0,0,0,35,43,46,37,45,49,23,35,40,25,13,20,19,16,13,10,8,15,14,30,22,19,25,10,23,21,35,44,22,6,2,63,9,9,19,183,254,69,69,1,6,5,3,3,3,0,497,546,2,1,78,22,93,7,18,36,76,65,30,86,14,7,2,5,11,17,3,17,60,18,0,1,0,0,1,6,18,50,0,4,36,49,51,14,5,4,2,24,11,2,3,6,0,2,9,44,0.0,281.0,518.0,251,292,292,340,11576,32,18,20,15,12,2,0,0,1,20,19,24,18,16,2,0,0,1,28,8,31,11,38,62,8,74,22,0,0,0,0,0,2,2,1,21,19,24,6,61,65,73,59,70,56,78,62,82,99,4,10,5,2,6,12,0,1,9,5,18,20,5,7,6,0,11,33,4,3,2,12,3,3,2,0,7,8,3,3,6,7,1,8,74,3,1,120,22,20,28,16,6,5,3,1,23,1,16,6,0,0,0,10,21,0,28,23,32,8,1,14,1,5,0,0,7,0,0,0,0,0,1,0,0,2,84,96,3,0,0,92,65,29,9,22,3,12,23,50,69,31,0,0,0,6,35,44,0,15,22,77,17,97,92,9,2,6,5,9706,9606.0,9604.0,9604.0,9603.0,9602.0,9601.0,9511.0,,9510.0,9508.0,9507.0,9506.0,9504.0,9503.0,,9501.0,9411.0,,,9409.0,9407.0,9406.0,L4E,S4E,S4E,S4E,S4E,S4F,S4F,S4F,,S4F,S4F,S4F,S4F,S4F,S4F,,S4D,S4D,,,S4D,S4D,S3D,26,9702,63,6,14,,,,,,,,,,9509.0,,9506.0,,9504.0,,9501.0,,,,9409.0,9407.0,9406.0,,,,,,,,,,11.0,,9.0,,9.0,,8.0,,,,8.0,7.0,6.0,202.0,27,14,2.0,9111,16.0,9207,5.0,9512,9001,9101.0,12.0,7.481481,15078,0,0.0,1,L,4,E,X,X,X,60.0,C,
3,8701,BRY,0,CA,95953,,2801,0,,,,,XXXX,R2,44,70.0,E,U,,,,,,1.0,F,4.0,2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,3.0,0,23,14,31,3,0,3,,,,0.0,,,,,,,,,,,,,,,,,,,,,X,640,160,219,0,8,92,54,46,61,0,0,11,32,6,2,0,0,0,0,0,31,0,0,1,32,40,44,34,43,47,25,45,35,20,15,25,17,17,12,7,7,20,17,30,14,19,25,11,23,23,27,50,30,15,8,63,9,6,23,199,283,85,83,3,4,1,0,2,0,2,1000,1263,2,1,48,52,93,7,6,36,73,61,30,84,16,6,3,3,21,12,4,13,36,13,0,0,0,10,25,50,69,92,10,15,42,55,50,15,5,4,0,9,42,4,0,5,1,8,17,34,9340.0,67.0,862.0,386,388,396,423,15130,27,12,4,26,22,5,0,0,4,35,5,6,12,30,6,0,0,5,22,14,26,20,46,54,3,58,36,0,0,0,0,0,6,0,0,17,13,15,0,43,69,81,53,68,45,33,31,0,99,23,17,3,0,6,6,0,0,13,42,12,0,0,0,42,0,6,3,0,0,0,23,3,3,6,0,3,3,3,3,3,0,3,6,87,0,0,120,28,12,14,27,10,3,5,0,19,1,17,0,0,0,0,13,23,0,14,40,31,16,0,1,0,13,0,0,4,0,0,0,3,0,0,0,0,29,67,56,41,3,0,94,43,27,4,38,0,10,19,39,45,55,0,0,45,22,17,0,0,16,23,77,22,93,89,16,2,6,6,9706,9606.0,9604.0,9604.0,9603.0,9602.0,9601.0,9511.0,,9510.0,9508.0,9507.0,9506.0,9504.0,9503.0,9502.0,9501.0,9411.0,9411.0,9410.0,9409.0,,,L4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,,S4E,S4E,S4E,S4E,S4E,S4E,S2D,S2D,A1D,A1D,A1D,A1D,,,27,9702,66,6,14,,,,,,,,,9512.0,9509.0,,9508.0,,9505.0,9503.0,,,9411.0,9411.0,,,,,,,,,,,,10.0,10.0,,10.0,,7.0,11.0,,,6.0,11.0,,,,109.0,16,7,2.0,8711,11.0,9411,10.0,9512,8702,8711.0,9.0,6.8125,172556,0,0.0,1,L,4,E,X,X,X,41.0,C,
4,8601,,0,FL,33176,,2001,0,X,X,,,XXXX,S2,16,78.0,E,H,,,,,1.0,3.0,F,2.0,60,1.0,0.0,9.0,0.0,4.0,1.0,0.0,0.0,0.0,4.0,0.0,1.0,0.0,1.0,3.0,1,28,9,53,26,3,2,,12.0,,,,,,Y,Y,,,,Y,,,Y,,Y,,Y,,Y,,3.0,,2520,627,761,99,0,0,46,54,2,98,0,0,1,0,0,0,0,0,0,0,0,0,0,0,33,45,50,36,46,50,27,34,43,23,14,21,13,15,20,12,5,13,15,34,19,19,31,7,27,16,26,57,36,24,14,42,17,9,33,235,323,99,98,0,0,0,0,0,0,0,576,594,4,3,90,10,97,3,0,42,82,49,22,92,8,20,3,17,9,23,1,1,1,0,21,58,19,0,1,2,16,67,0,2,45,52,53,16,6,0,0,0,9,0,0,0,25,58,74,83,5000.0,127.0,528.0,240,250,293,321,9836,24,29,23,13,4,4,0,0,2,21,30,22,16,4,5,0,0,3,35,8,11,14,20,80,4,73,22,1,1,0,0,0,3,1,2,1,24,27,3,76,61,73,51,65,49,80,31,81,99,10,17,8,2,6,15,3,7,22,2,9,0,7,2,2,0,6,1,5,2,2,12,2,7,6,4,15,29,4,3,26,3,2,7,49,12,1,120,16,20,30,13,3,12,5,2,26,1,20,7,1,1,1,15,28,4,9,16,53,20,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,65,99,0,0,0,90,45,18,25,34,0,1,3,6,33,67,0,0,9,14,72,3,0,0,99,1,21,99,96,6,2,7,11,9706,9606.0,9604.0,9604.0,9603.0,9512.0,9601.0,9511.0,9510.0,9509.0,9508.0,9502.0,9506.0,,9503.0,9502.0,9412.0,9411.0,9411.0,9410.0,9506.0,9407.0,9406.0,L2F,A2F,A2F,A2F,A1D,I2D,A1E,A1E,L1D,A1E,A1E,L1D,L3D,,L3D,A2D,A2D,A3D,A3D,A3D,I4E,A3D,A3D,43,9702,113,10,25,,,,,,9601.0,,,,,,9506.0,,,,,,,,,,,,,,,,15.0,,,,,,10.0,,,,,,,,,,,254.0,37,8,3.0,9310,15.0,9601,15.0,9601,7903,8005.0,14.0,6.864865,7112,0,0.0,1,L,2,F,X,X,X,26.0,A,


Before cleaning, let's split the data into categorical features, numerical features, and `Y` for the targets.

In [14]:
Y = df[['TARGET_B', 'TARGET_D']]
Y.sample(5)

Unnamed: 0,TARGET_B,TARGET_D
21951,0,0.0
49065,0,0.0
65045,0,0.0
48806,0,0.0
46337,0,0.0


In [15]:
numerical = df.select_dtypes(np.number)
numerical = numerical.drop(columns = ['TARGET_B', 'TARGET_D'])
numerical.head()

Unnamed: 0,ODATEDW,TCODE,DOB,AGE,NUMCHLD,INCOME,WEALTH1,HIT,MBCRAFT,MBGARDEN,MBBOOKS,MBCOLECT,MAGFAML,MAGFEM,MAGMALE,PUBGARDN,PUBCULIN,PUBHLTH,PUBDOITY,PUBNEWFN,PUBPHOTO,PUBOPP,MALEMILI,MALEVET,VIETVETS,WWIIVETS,LOCALGOV,STATEGOV,FEDGOV,WEALTH2,POP901,POP902,POP903,POP90C1,POP90C2,POP90C3,POP90C4,POP90C5,ETH1,ETH2,ETH3,ETH4,ETH5,ETH6,ETH7,ETH8,ETH9,ETH10,ETH11,ETH12,ETH13,ETH14,ETH15,ETH16,AGE901,AGE902,AGE903,AGE904,AGE905,AGE906,AGE907,CHIL1,CHIL2,CHIL3,AGEC1,AGEC2,AGEC3,AGEC4,AGEC5,AGEC6,AGEC7,CHILC1,CHILC2,CHILC3,CHILC4,CHILC5,HHAGE1,HHAGE2,HHAGE3,HHN1,HHN2,HHN3,HHN4,HHN5,HHN6,MARR1,MARR2,MARR3,MARR4,HHP1,HHP2,DW1,DW2,DW3,DW4,DW5,DW6,DW7,DW8,DW9,HV1,HV2,HV3,HV4,HU1,HU2,HU3,HU4,HU5,HHD1,HHD2,HHD3,HHD4,HHD5,HHD6,HHD7,HHD8,HHD9,HHD10,HHD11,HHD12,ETHC1,ETHC2,ETHC3,ETHC4,ETHC5,ETHC6,HVP1,HVP2,HVP3,HVP4,HVP5,HVP6,HUR1,HUR2,RHP1,RHP2,RHP3,RHP4,HUPA1,HUPA2,HUPA3,HUPA4,HUPA5,HUPA6,HUPA7,RP1,RP2,RP3,RP4,MSA,ADI,DMA,IC1,IC2,IC3,IC4,IC5,IC6,IC7,IC8,IC9,IC10,IC11,IC12,IC13,IC14,IC15,IC16,IC17,IC18,IC19,IC20,IC21,IC22,IC23,HHAS1,HHAS2,HHAS3,HHAS4,MC1,MC2,MC3,TPE1,TPE2,TPE3,TPE4,TPE5,TPE6,TPE7,TPE8,TPE9,PEC1,PEC2,TPE10,TPE11,TPE12,TPE13,LFC1,LFC2,LFC3,LFC4,LFC5,LFC6,LFC7,LFC8,LFC9,LFC10,OCC1,OCC2,OCC3,OCC4,OCC5,OCC6,OCC7,OCC8,OCC9,OCC10,OCC11,OCC12,OCC13,EIC1,EIC2,EIC3,EIC4,EIC5,EIC6,EIC7,EIC8,EIC9,EIC10,EIC11,EIC12,EIC13,EIC14,EIC15,EIC16,OEDC1,OEDC2,OEDC3,OEDC4,OEDC5,OEDC6,OEDC7,EC1,EC2,EC3,EC4,EC5,EC6,EC7,EC8,SEC1,SEC2,SEC3,SEC4,SEC5,AFC1,AFC2,AFC3,AFC4,AFC5,AFC6,VC1,VC2,VC3,VC4,ANC1,ANC2,ANC3,ANC4,ANC5,ANC6,ANC7,ANC8,ANC9,ANC10,ANC11,ANC12,ANC13,ANC14,ANC15,POBC1,POBC2,LSC1,LSC2,LSC3,LSC4,VOC1,VOC2,VOC3,HC1,HC2,HC3,HC4,HC5,HC6,HC7,HC8,HC9,HC10,HC11,HC12,HC13,HC14,HC15,HC16,HC17,HC18,HC19,HC20,HC21,MHUC1,MHUC2,AC1,AC2,ADATE_2,ADATE_3,ADATE_4,ADATE_5,ADATE_6,ADATE_7,ADATE_8,ADATE_9,ADATE_10,ADATE_11,ADATE_12,ADATE_13,ADATE_14,ADATE_15,ADATE_16,ADATE_17,ADATE_18,ADATE_19,ADATE_20,ADATE_21,ADATE_22,ADATE_23,ADATE_24,CARDPROM,MAXADATE,NUMPROM,CARDPM12,NUMPRM12,RDATE_3,RDATE_4,RDATE_5,RDATE_6,RDATE_7,RDATE_8,RDATE_9,RDATE_10,RDATE_11,RDATE_12,RDATE_13,RDATE_14,RDATE_15,RDATE_16,RDATE_17,RDATE_18,RDATE_19,RDATE_20,RDATE_21,RDATE_22,RDATE_23,RDATE_24,RAMNT_3,RAMNT_4,RAMNT_5,RAMNT_6,RAMNT_7,RAMNT_8,RAMNT_9,RAMNT_10,RAMNT_11,RAMNT_12,RAMNT_13,RAMNT_14,RAMNT_15,RAMNT_16,RAMNT_17,RAMNT_18,RAMNT_19,RAMNT_20,RAMNT_21,RAMNT_22,RAMNT_23,RAMNT_24,RAMNTALL,NGIFTALL,CARDGIFT,MINRAMNT,MINRDATE,MAXRAMNT,MAXRDATE,LASTGIFT,LASTDATE,FISTDATE,NEXTDATE,TIMELAG,AVGGIFT,CONTROLN,HPHONE_D,RFA_2F,CLUSTER2
0,8901,0,3712,60.0,,,,0,,,,,,,,,,,,,,,0,39,34,18,10,2,1,5.0,992,264,332,0,35,65,47,53,92,1,0,0,11,0,0,0,0,0,0,0,11,0,0,0,39,48,51,40,50,54,25,31,42,27,11,14,18,17,13,11,15,12,11,34,25,18,26,10,23,18,33,49,28,12,4,61,7,12,19,198,276,97,95,2,2,0,0,7,7,0,479,635,3,2,86,14,96,4,7,38,80,70,32,84,16,6,2,5,9,15,3,17,50,25,0,0,0,2,7,13,27,47,0,1,61,58,61,15,4,2,0,0,14,1,0,0,2,5,17,73,0.0,177.0,682.0,307,318,349,378,12883,13,23,23,23,15,1,0,0,1,4,25,24,26,17,2,0,0,2,28,4,51,1,46,54,3,88,8,0,0,0,0,0,0,4,1,13,14,16,2,45,56,64,50,64,44,62,53,99,0,0,9,3,8,13,9,0,3,9,3,15,19,5,4,3,0,3,41,1,0,7,13,6,5,0,4,9,4,1,3,10,2,1,7,78,2,0,120,16,10,39,21,8,4,3,5,20,3,19,4,0,0,0,18,39,0,34,23,18,16,1,4,0,23,0,0,5,1,0,0,0,0,0,2,0,3,74,88,8,0,4,96,77,19,13,31,5,14,14,31,54,46,0,0,90,0,10,0,0,0,33,65,40,99,99,6,2,10,7,9706,9606.0,9604.0,9604.0,9603.0,9602.0,9601.0,9511.0,9510.0,9510.0,9508.0,9507.0,9506.0,9504.0,9503.0,9502.0,9501.0,9411.0,9411.0,9410.0,9409.0,9407.0,9406.0,27,9702,74,6,14,,,,,,,,9512.0,,,,9507.0,9505.0,9505.0,9503.0,,,,,,9408.0,9406.0,,,,,,,,10.0,,,,10.0,11.0,11.0,11.0,,,,,,11.0,9.0,240.0,31,14,5.0,9208,12.0,9402,10.0,9512,8911,9003.0,4.0,7.741935,95515,0,4,39.0
1,9401,1,5202,46.0,1.0,6.0,9.0,16,0.0,0.0,3.0,1.0,1.0,1.0,0.0,0.0,0.0,2.0,0.0,3.0,0.0,0.0,0,15,55,11,6,2,1,9.0,3611,940,998,99,0,0,50,50,67,0,0,31,6,4,2,6,4,14,0,0,2,0,1,4,34,41,43,32,42,45,32,33,46,21,13,14,33,23,10,4,2,11,16,36,22,15,12,1,5,4,21,75,55,23,9,69,4,3,24,317,360,99,99,0,0,0,0,0,0,0,5468,5218,12,10,96,4,97,3,9,59,94,88,55,95,5,4,1,3,5,4,2,18,44,5,0,0,0,97,98,98,98,99,94,0,83,76,73,21,5,0,0,0,4,0,0,0,91,91,91,94,4480.0,13.0,803.0,1088,1096,1026,1037,36175,2,6,2,5,15,14,13,10,33,2,5,2,5,15,14,14,10,32,6,2,66,3,56,44,9,80,14,0,0,0,0,0,0,6,0,2,24,32,12,71,70,83,58,81,57,64,57,99,99,0,22,24,4,21,13,2,1,6,0,4,1,0,3,1,0,6,13,1,2,8,18,11,4,3,4,10,7,11,1,6,2,1,16,69,5,2,160,5,5,12,21,7,30,20,14,24,4,24,10,0,0,0,8,15,0,55,10,11,0,0,2,0,3,1,1,2,3,1,1,0,3,0,0,0,42,39,50,7,27,16,99,92,53,5,10,2,26,56,97,99,0,0,0,96,0,4,0,0,0,99,0,99,99,99,20,4,6,5,9706,9606.0,9604.0,9604.0,9603.0,9602.0,9601.0,9511.0,9510.0,9510.0,9509.0,,,,9503.0,,,9411.0,9411.0,9410.0,9409.0,,9406.0,12,9702,32,6,13,,,,,,,9512.0,,,,,,,9504.0,,,,,,,,,,,,,,,25.0,,,,,,,12.0,,,,,,,,,47.0,3,1,10.0,9310,25.0,9512,25.0,9512,9310,9504.0,18.0,15.666667,148535,0,2,1.0
2,9001,1,0,,,3.0,1.0,2,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0,20,29,33,6,8,1,1.0,7001,2040,2669,0,2,98,49,51,96,2,0,0,2,0,0,0,0,0,0,0,2,0,0,0,35,43,46,37,45,49,23,35,40,25,13,20,19,16,13,10,8,15,14,30,22,19,25,10,23,21,35,44,22,6,2,63,9,9,19,183,254,69,69,1,6,5,3,3,3,0,497,546,2,1,78,22,93,7,18,36,76,65,30,86,14,7,2,5,11,17,3,17,60,18,0,1,0,0,1,6,18,50,0,4,36,49,51,14,5,4,2,24,11,2,3,6,0,2,9,44,0.0,281.0,518.0,251,292,292,340,11576,32,18,20,15,12,2,0,0,1,20,19,24,18,16,2,0,0,1,28,8,31,11,38,62,8,74,22,0,0,0,0,0,2,2,1,21,19,24,6,61,65,73,59,70,56,78,62,82,99,4,10,5,2,6,12,0,1,9,5,18,20,5,7,6,0,11,33,4,3,2,12,3,3,2,0,7,8,3,3,6,7,1,8,74,3,1,120,22,20,28,16,6,5,3,1,23,1,16,6,0,0,0,10,21,0,28,23,32,8,1,14,1,5,0,0,7,0,0,0,0,0,1,0,0,2,84,96,3,0,0,92,65,29,9,22,3,12,23,50,69,31,0,0,0,6,35,44,0,15,22,77,17,97,92,9,2,6,5,9706,9606.0,9604.0,9604.0,9603.0,9602.0,9601.0,9511.0,,9510.0,9508.0,9507.0,9506.0,9504.0,9503.0,,9501.0,9411.0,,,9409.0,9407.0,9406.0,26,9702,63,6,14,,,,,,,,,,9509.0,,9506.0,,9504.0,,9501.0,,,,9409.0,9407.0,9406.0,,,,,,,,,,11.0,,9.0,,9.0,,8.0,,,,8.0,7.0,6.0,202.0,27,14,2.0,9111,16.0,9207,5.0,9512,9001,9101.0,12.0,7.481481,15078,1,4,60.0
3,8701,0,2801,70.0,,1.0,4.0,2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0,23,14,31,3,0,3,0.0,640,160,219,0,8,92,54,46,61,0,0,11,32,6,2,0,0,0,0,0,31,0,0,1,32,40,44,34,43,47,25,45,35,20,15,25,17,17,12,7,7,20,17,30,14,19,25,11,23,23,27,50,30,15,8,63,9,6,23,199,283,85,83,3,4,1,0,2,0,2,1000,1263,2,1,48,52,93,7,6,36,73,61,30,84,16,6,3,3,21,12,4,13,36,13,0,0,0,10,25,50,69,92,10,15,42,55,50,15,5,4,0,9,42,4,0,5,1,8,17,34,9340.0,67.0,862.0,386,388,396,423,15130,27,12,4,26,22,5,0,0,4,35,5,6,12,30,6,0,0,5,22,14,26,20,46,54,3,58,36,0,0,0,0,0,6,0,0,17,13,15,0,43,69,81,53,68,45,33,31,0,99,23,17,3,0,6,6,0,0,13,42,12,0,0,0,42,0,6,3,0,0,0,23,3,3,6,0,3,3,3,3,3,0,3,6,87,0,0,120,28,12,14,27,10,3,5,0,19,1,17,0,0,0,0,13,23,0,14,40,31,16,0,1,0,13,0,0,4,0,0,0,3,0,0,0,0,29,67,56,41,3,0,94,43,27,4,38,0,10,19,39,45,55,0,0,45,22,17,0,0,16,23,77,22,93,89,16,2,6,6,9706,9606.0,9604.0,9604.0,9603.0,9602.0,9601.0,9511.0,,9510.0,9508.0,9507.0,9506.0,9504.0,9503.0,9502.0,9501.0,9411.0,9411.0,9410.0,9409.0,,,27,9702,66,6,14,,,,,,,,,9512.0,9509.0,,9508.0,,9505.0,9503.0,,,9411.0,9411.0,,,,,,,,,,,,10.0,10.0,,10.0,,7.0,11.0,,,6.0,11.0,,,,109.0,16,7,2.0,8711,11.0,9411,10.0,9512,8702,8711.0,9.0,6.8125,172556,1,4,41.0
4,8601,0,2001,78.0,1.0,3.0,2.0,60,1.0,0.0,9.0,0.0,4.0,1.0,0.0,0.0,0.0,4.0,0.0,1.0,0.0,1.0,1,28,9,53,26,3,2,,2520,627,761,99,0,0,46,54,2,98,0,0,1,0,0,0,0,0,0,0,0,0,0,0,33,45,50,36,46,50,27,34,43,23,14,21,13,15,20,12,5,13,15,34,19,19,31,7,27,16,26,57,36,24,14,42,17,9,33,235,323,99,98,0,0,0,0,0,0,0,576,594,4,3,90,10,97,3,0,42,82,49,22,92,8,20,3,17,9,23,1,1,1,0,21,58,19,0,1,2,16,67,0,2,45,52,53,16,6,0,0,0,9,0,0,0,25,58,74,83,5000.0,127.0,528.0,240,250,293,321,9836,24,29,23,13,4,4,0,0,2,21,30,22,16,4,5,0,0,3,35,8,11,14,20,80,4,73,22,1,1,0,0,0,3,1,2,1,24,27,3,76,61,73,51,65,49,80,31,81,99,10,17,8,2,6,15,3,7,22,2,9,0,7,2,2,0,6,1,5,2,2,12,2,7,6,4,15,29,4,3,26,3,2,7,49,12,1,120,16,20,30,13,3,12,5,2,26,1,20,7,1,1,1,15,28,4,9,16,53,20,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,65,99,0,0,0,90,45,18,25,34,0,1,3,6,33,67,0,0,9,14,72,3,0,0,99,1,21,99,96,6,2,7,11,9706,9606.0,9604.0,9604.0,9603.0,9512.0,9601.0,9511.0,9510.0,9509.0,9508.0,9502.0,9506.0,,9503.0,9502.0,9412.0,9411.0,9411.0,9410.0,9506.0,9407.0,9406.0,43,9702,113,10,25,,,,,,9601.0,,,,,,9506.0,,,,,,,,,,,,,,,,15.0,,,,,,10.0,,,,,,,,,,,254.0,37,8,3.0,9310,15.0,9601,15.0,9601,7903,8005.0,14.0,6.864865,7112,1,2,26.0


The numerical features cover some personal data, characteristics of the donor's neighbourhood, the economic and employment situation, and the donation history.

In [16]:
categorical = df.select_dtypes('object')
categorical.columns

Index(['OSOURCE', 'STATE', 'ZIP', 'PVASTATE', 'NOEXCH', 'RECINHSE', 'RECP3',
       'RECPGVG', 'RECSWEEP', 'MDMAUD', 'DOMAIN', 'CLUSTER', 'AGEFLAG',
       'HOMEOWNR', 'CHILD03', 'CHILD07', 'CHILD12', 'CHILD18', 'GENDER',
       'DATASRCE', 'SOLP3', 'SOLIH', 'MAJOR', 'GEOCODE', 'COLLECT1',
       'VETERANS', 'BIBLE', 'CATLG', 'HOMEE', 'PETS', 'CDPLAY', 'STEREO',
       'PCOWNERS', 'PHOTO', 'CRAFTS', 'FISHER', 'GARDENIN', 'BOATS', 'WALKER',
       'KIDSTUFF', 'CARDS', 'PLATES', 'LIFESRC', 'PEPSTRFL', 'RFA_2', 'RFA_3',
       'RFA_4', 'RFA_5', 'RFA_6', 'RFA_7', 'RFA_8', 'RFA_9', 'RFA_10',
       'RFA_11', 'RFA_12', 'RFA_13', 'RFA_14', 'RFA_15', 'RFA_16', 'RFA_17',
       'RFA_18', 'RFA_19', 'RFA_20', 'RFA_21', 'RFA_22', 'RFA_23', 'RFA_24',
       'RFA_2R', 'RFA_2A', 'MDMAUD_R', 'MDMAUD_F', 'MDMAUD_A', 'GEOCODE2',
       'MAILCODE'],
      dtype='object')

The categorical features provide information on type & origin of record, the donor's living location, family status, other personal data and interests as well as on frequency and amount given (RFA) and major donors (MDMAUD). 

### Lab Machine Learning Revisited

#### On the **categorical** columns in the dataset:

**Checking for null values in all the columns.**

In [17]:
categorical.isna().sum()

OSOURCE       928
STATE           0
ZIP             0
PVASTATE    93954
NOEXCH          7
            ...  
MDMAUD_R        0
MDMAUD_F        0
MDMAUD_A        0
GEOCODE2      319
MAILCODE        0
Length: 74, dtype: int64

#### Excluding variables `OSOURCE` and `ZIP`.

In [18]:
drop_list = ['OSOURCE', 'ZIP']
drop_list

['OSOURCE', 'ZIP']

#### Identifying columns that have over 85% missing values and removing them.

In [19]:
def filter_nulls(data, perc=0.85):
    """Return the names of the columns whose share of missing values exceeds perc."""
    # isna().mean() gives the fraction of missing values per column.
    nulls_percentage = data.isna().mean()
    return list(nulls_percentage[nulls_percentage > perc].index)

drop_columns = filter_nulls(categorical, perc=0.85)
drop_columns

['PVASTATE',
 'RECINHSE',
 'RECP3',
 'RECPGVG',
 'RECSWEEP',
 'CHILD03',
 'CHILD07',
 'CHILD12',
 'CHILD18',
 'SOLP3',
 'SOLIH',
 'MAJOR',
 'COLLECT1',
 'VETERANS',
 'BIBLE',
 'CATLG',
 'HOMEE',
 'CDPLAY',
 'STEREO',
 'PCOWNERS',
 'PHOTO',
 'CRAFTS',
 'FISHER',
 'GARDENIN',
 'BOATS',
 'WALKER',
 'KIDSTUFF',
 'CARDS',
 'PLATES']

In [20]:
# Reassign instead of dropping in place: categorical is a selection taken from df.
categorical = categorical.drop(columns=drop_columns)
categorical.head()

Unnamed: 0,OSOURCE,STATE,ZIP,NOEXCH,MDMAUD,DOMAIN,CLUSTER,AGEFLAG,HOMEOWNR,GENDER,DATASRCE,GEOCODE,PETS,LIFESRC,PEPSTRFL,RFA_2,RFA_3,RFA_4,RFA_5,RFA_6,RFA_7,RFA_8,RFA_9,RFA_10,RFA_11,RFA_12,RFA_13,RFA_14,RFA_15,RFA_16,RFA_17,RFA_18,RFA_19,RFA_20,RFA_21,RFA_22,RFA_23,RFA_24,RFA_2R,RFA_2A,MDMAUD_R,MDMAUD_F,MDMAUD_A,GEOCODE2,MAILCODE
0,GRI,IL,61081,0,XXXX,T2,36,,,F,,,,,X,L4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,L,E,X,X,X,C,
1,BOA,CA,91326,0,XXXX,S1,14,E,H,M,3.0,2.0,,,,L2G,A2G,A2G,A2G,A2G,A1E,A1E,A1E,A1E,A1E,A1E,,,,L1E,,,N1E,N1E,N1E,N1E,,F1E,L,G,X,X,X,A,
2,AMH,NC,27017,0,XXXX,R2,43,,U,M,3.0,,,,X,L4E,S4E,S4E,S4E,S4E,S4F,S4F,S4F,,S4F,S4F,S4F,S4F,S4F,S4F,,S4D,S4D,,,S4D,S4D,S3D,L,E,X,X,X,C,
3,BRY,CA,95953,0,XXXX,R2,44,E,U,F,3.0,,,,X,L4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,,S4E,S4E,S4E,S4E,S4E,S4E,S2D,S2D,A1D,A1D,A1D,A1D,,,L,E,X,X,X,C,
4,,FL,33176,0,XXXX,S2,16,E,H,F,3.0,,,3.0,,L2F,A2F,A2F,A2F,A1D,I2D,A1E,A1E,L1D,A1E,A1E,L1D,L3D,,L3D,A2D,A2D,A3D,A3D,A3D,I4E,A3D,A3D,L,F,X,X,X,A,


#### Reducing the number of categories in the column `GENDER`.

In [21]:
categorical['GENDER'].value_counts()

F    51277
M    39094
U     1715
J      365
C        2
A        2
Name: GENDER, dtype: int64

In [22]:
replace_dict = {'U': 'Other', 'J': 'Other', 'C': 'Other', 'A': 'Other'}
categorical['GENDER'] = categorical['GENDER'].replace(replace_dict)
categorical['GENDER'].value_counts()

F        51277
M        39094
Other     2084
Name: GENDER, dtype: int64
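
An explicit mapping only covers the labels seen so far. As a minimal sketch on made-up toy values (not the real column), an allow-list variant could send every label other than `M` and `F` to `Other` while leaving missing values untouched:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the GENDER column (values are made up for illustration).
gender = pd.Series(['F', 'M', 'U', 'J', np.nan, 'C', 'A', 'F'])

# Keep 'M' and 'F', keep NaN as NaN, and map everything else to 'Other'.
keep = gender.isin(['M', 'F']) | gender.isna()
gender_reduced = gender.where(keep, 'Other')

print(gender_reduced.value_counts(dropna=False))
```

The advantage of this form is that a label that was not anticipated in the dictionary still ends up in `Other`.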

#### On the **numerical** columns:

In [23]:
df = pd.concat([categorical, numerical, Y], axis=1)
df.shape

(95412, 452)

### Activities

**Use the method value_counts on the columns `MAILCODE`, `NOEXCH`, and `MDMAUD` and check the proportion of category representation in those columns.** Since there is a huge imbalance in the representation of categories, we will add those columns to the drop_list.

In [24]:
df['MAILCODE'].value_counts()

     94013
B     1399
Name: MAILCODE, dtype: int64

In [25]:
df['NOEXCH'].value_counts()

0    61203
0    33882
1      195
1       90
X       35
Name: NOEXCH, dtype: int64

In [26]:
df['NOEXCH'].unique()

array(['0', '1', 'X', 0, 1, nan], dtype=object)

Assuming '0' and 0 as well as '1' and 1 are the same category, there is a high imbalance towards 0.
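
If the column were kept, the mixed string and integer labels would have to be merged first. A minimal sketch on made-up toy values, assuming that `'0'`/`0` and `'1'`/`1` really denote the same category:

```python
import numpy as np
import pandas as pd

# Toy stand-in for NOEXCH with mixed string and integer labels (made-up values).
noexch = pd.Series(['0', 0, '1', 1, 'X', np.nan, 0, '0'], dtype=object)

# Cast every non-missing label to str so that '0' and 0 collapse into one category.
noexch_clean = noexch.map(lambda v: v if pd.isna(v) else str(v))

# Share of each category among the non-missing values.
shares = noexch_clean.value_counts(normalize=True)
print(shares)
```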

In [27]:
df['MDMAUD'].value_counts()

XXXX    95118
C1CM       65
L1CM       44
I1CM       37
D2CM       28
C2CM       24
D1CM       20
L2CM       15
L1LM        8
C1LM        8
I2CM        7
D5CM        5
D5MM        5
D2MM        4
C5CM        3
C2MM        3
C2LM        3
D5TM        3
I1LM        3
I5CM        1
C1MM        1
I5MM        1
C5MM        1
I2MM        1
L1MM        1
L2LM        1
C5TM        1
L2TM        1
Name: MDMAUD, dtype: int64

In [28]:
drop_list = ['OSOURCE', 'ZIP']
new_items = ['MAILCODE', 'NOEXCH', 'MDMAUD']
drop_list.extend(new_items)
drop_list

['OSOURCE', 'ZIP', 'MAILCODE', 'NOEXCH', 'MDMAUD']

**Replace null values in the columns `DATASRCE` and `GEOCODE2`.**

In [29]:
df['DATASRCE'].unique()

array([nan, '3', '1', '2'], dtype=object)

`DATASRCE`: Source of Overlay Data: the third-party data source the donor matched against. 1=MetroMail, 2=Polk, 3=Both

To avoid introducing bias, my first idea was to check the correlation of this feature with the targets and to drop the feature if the correlation is low. However, I saw earlier that none of the independent features shows a high correlation with either target, so a low correlation would not single out this feature. I therefore assign a new category 'Unknown' to the null values.
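
As a rough illustration of this choice, the sketch below uses a made-up toy frame (the values are not taken from the dataset) to treat missingness as its own category and then compare the response rate per category. Large differences between the groups would hint that the feature, or the fact that it is missing, carries signal.

```python
import numpy as np
import pandas as pd

# Toy data: a categorical feature with missing values and a binary target (made-up).
toy = pd.DataFrame({
    'DATASRCE': ['3', '2', np.nan, '1', '3', np.nan, '2', '3'],
    'TARGET_B': [0, 1, 0, 0, 1, 0, 0, 0],
})

# Treat missingness as its own category instead of dropping rows or guessing a value.
toy['DATASRCE'] = toy['DATASRCE'].fillna('Unknown')

# Mean of a 0/1 target per group is the response rate of that group.
rate_by_source = toy.groupby('DATASRCE')['TARGET_B'].mean()
print(rate_by_source)
```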

In [30]:
df['DATASRCE'] = df['DATASRCE'].replace(np.nan, 'Unknown')
df['DATASRCE'].value_counts()

3          43549
2          23455
Unknown    21280
1           7128
Name: DATASRCE, dtype: int64

**Remove the columns starting with `ADATE_`.** We are assuming that the date of a previous mailing is not significant for the respondent's decision to donate. Respondents may not even remember when they received the mail in previous years.

In [31]:
df.drop(columns=df.filter(regex=r'^ADATE_', axis=1).columns, inplace=True)

In [32]:
df.shape

(95412, 429)

### Lab Feature Engineering

#### Checking for null values in the numerical columns.

In [33]:
numerical = df.select_dtypes(np.number)
numerical = numerical.drop(columns = ['TARGET_B', 'TARGET_D'])
numerical.head()

Unnamed: 0,ODATEDW,TCODE,DOB,AGE,NUMCHLD,INCOME,WEALTH1,HIT,MBCRAFT,MBGARDEN,MBBOOKS,MBCOLECT,MAGFAML,MAGFEM,MAGMALE,PUBGARDN,PUBCULIN,PUBHLTH,PUBDOITY,PUBNEWFN,PUBPHOTO,PUBOPP,MALEMILI,MALEVET,VIETVETS,WWIIVETS,LOCALGOV,STATEGOV,FEDGOV,WEALTH2,POP901,POP902,POP903,POP90C1,POP90C2,POP90C3,POP90C4,POP90C5,ETH1,ETH2,ETH3,ETH4,ETH5,ETH6,ETH7,ETH8,ETH9,ETH10,ETH11,ETH12,ETH13,ETH14,ETH15,ETH16,AGE901,AGE902,AGE903,AGE904,AGE905,AGE906,AGE907,CHIL1,CHIL2,CHIL3,AGEC1,AGEC2,AGEC3,AGEC4,AGEC5,AGEC6,AGEC7,CHILC1,CHILC2,CHILC3,CHILC4,CHILC5,HHAGE1,HHAGE2,HHAGE3,HHN1,HHN2,HHN3,HHN4,HHN5,HHN6,MARR1,MARR2,MARR3,MARR4,HHP1,HHP2,DW1,DW2,DW3,DW4,DW5,DW6,DW7,DW8,DW9,HV1,HV2,HV3,HV4,HU1,HU2,HU3,HU4,HU5,HHD1,HHD2,HHD3,HHD4,HHD5,HHD6,HHD7,HHD8,HHD9,HHD10,HHD11,HHD12,ETHC1,ETHC2,ETHC3,ETHC4,ETHC5,ETHC6,HVP1,HVP2,HVP3,HVP4,HVP5,HVP6,HUR1,HUR2,RHP1,RHP2,RHP3,RHP4,HUPA1,HUPA2,HUPA3,HUPA4,HUPA5,HUPA6,HUPA7,RP1,RP2,RP3,RP4,MSA,ADI,DMA,IC1,IC2,IC3,IC4,IC5,IC6,IC7,IC8,IC9,IC10,IC11,IC12,IC13,IC14,IC15,IC16,IC17,IC18,IC19,IC20,IC21,IC22,IC23,HHAS1,HHAS2,HHAS3,HHAS4,MC1,MC2,MC3,TPE1,TPE2,TPE3,TPE4,TPE5,TPE6,TPE7,TPE8,TPE9,PEC1,PEC2,TPE10,TPE11,TPE12,TPE13,LFC1,LFC2,LFC3,LFC4,LFC5,LFC6,LFC7,LFC8,LFC9,LFC10,OCC1,OCC2,OCC3,OCC4,OCC5,OCC6,OCC7,OCC8,OCC9,OCC10,OCC11,OCC12,OCC13,EIC1,EIC2,EIC3,EIC4,EIC5,EIC6,EIC7,EIC8,EIC9,EIC10,EIC11,EIC12,EIC13,EIC14,EIC15,EIC16,OEDC1,OEDC2,OEDC3,OEDC4,OEDC5,OEDC6,OEDC7,EC1,EC2,EC3,EC4,EC5,EC6,EC7,EC8,SEC1,SEC2,SEC3,SEC4,SEC5,AFC1,AFC2,AFC3,AFC4,AFC5,AFC6,VC1,VC2,VC3,VC4,ANC1,ANC2,ANC3,ANC4,ANC5,ANC6,ANC7,ANC8,ANC9,ANC10,ANC11,ANC12,ANC13,ANC14,ANC15,POBC1,POBC2,LSC1,LSC2,LSC3,LSC4,VOC1,VOC2,VOC3,HC1,HC2,HC3,HC4,HC5,HC6,HC7,HC8,HC9,HC10,HC11,HC12,HC13,HC14,HC15,HC16,HC17,HC18,HC19,HC20,HC21,MHUC1,MHUC2,AC1,AC2,CARDPROM,MAXADATE,NUMPROM,CARDPM12,NUMPRM12,RDATE_3,RDATE_4,RDATE_5,RDATE_6,RDATE_7,RDATE_8,RDATE_9,RDATE_10,RDATE_11,RDATE_12,RDATE_13,RDATE_14,RDATE_15,RDATE_16,RDATE_17,RDATE_18,RDATE_19,RDATE_20,RDATE_21,RDATE_22,RDATE_23,RDATE_24,RAMNT_3,RAMNT_4,RAMNT_5,RAMNT_6,RAMNT_7,RAMNT_8,RAMNT_9,RAMNT_10,RAMNT_11,RAMNT_12,RAMNT_13,RAMNT_14,RAMNT_15,RAMNT_16,RAMNT_17,RAMNT_18,RAMNT_19,RAMNT_20,RAMNT_21,RAMNT_22,RAMNT_23,RAMNT_24,RAMNTALL,NGIFTALL,CARDGIFT,MINRAMNT,MINRDATE,MAXRAMNT,MAXRDATE,LASTGIFT,LASTDATE,FISTDATE,NEXTDATE,TIMELAG,AVGGIFT,CONTROLN,HPHONE_D,RFA_2F,CLUSTER2
0,8901,0,3712,60.0,,,,0,,,,,,,,,,,,,,,0,39,34,18,10,2,1,5.0,992,264,332,0,35,65,47,53,92,1,0,0,11,0,0,0,0,0,0,0,11,0,0,0,39,48,51,40,50,54,25,31,42,27,11,14,18,17,13,11,15,12,11,34,25,18,26,10,23,18,33,49,28,12,4,61,7,12,19,198,276,97,95,2,2,0,0,7,7,0,479,635,3,2,86,14,96,4,7,38,80,70,32,84,16,6,2,5,9,15,3,17,50,25,0,0,0,2,7,13,27,47,0,1,61,58,61,15,4,2,0,0,14,1,0,0,2,5,17,73,0.0,177.0,682.0,307,318,349,378,12883,13,23,23,23,15,1,0,0,1,4,25,24,26,17,2,0,0,2,28,4,51,1,46,54,3,88,8,0,0,0,0,0,0,4,1,13,14,16,2,45,56,64,50,64,44,62,53,99,0,0,9,3,8,13,9,0,3,9,3,15,19,5,4,3,0,3,41,1,0,7,13,6,5,0,4,9,4,1,3,10,2,1,7,78,2,0,120,16,10,39,21,8,4,3,5,20,3,19,4,0,0,0,18,39,0,34,23,18,16,1,4,0,23,0,0,5,1,0,0,0,0,0,2,0,3,74,88,8,0,4,96,77,19,13,31,5,14,14,31,54,46,0,0,90,0,10,0,0,0,33,65,40,99,99,6,2,10,7,27,9702,74,6,14,,,,,,,,9512.0,,,,9507.0,9505.0,9505.0,9503.0,,,,,,9408.0,9406.0,,,,,,,,10.0,,,,10.0,11.0,11.0,11.0,,,,,,11.0,9.0,240.0,31,14,5.0,9208,12.0,9402,10.0,9512,8911,9003.0,4.0,7.741935,95515,0,4,39.0
1,9401,1,5202,46.0,1.0,6.0,9.0,16,0.0,0.0,3.0,1.0,1.0,1.0,0.0,0.0,0.0,2.0,0.0,3.0,0.0,0.0,0,15,55,11,6,2,1,9.0,3611,940,998,99,0,0,50,50,67,0,0,31,6,4,2,6,4,14,0,0,2,0,1,4,34,41,43,32,42,45,32,33,46,21,13,14,33,23,10,4,2,11,16,36,22,15,12,1,5,4,21,75,55,23,9,69,4,3,24,317,360,99,99,0,0,0,0,0,0,0,5468,5218,12,10,96,4,97,3,9,59,94,88,55,95,5,4,1,3,5,4,2,18,44,5,0,0,0,97,98,98,98,99,94,0,83,76,73,21,5,0,0,0,4,0,0,0,91,91,91,94,4480.0,13.0,803.0,1088,1096,1026,1037,36175,2,6,2,5,15,14,13,10,33,2,5,2,5,15,14,14,10,32,6,2,66,3,56,44,9,80,14,0,0,0,0,0,0,6,0,2,24,32,12,71,70,83,58,81,57,64,57,99,99,0,22,24,4,21,13,2,1,6,0,4,1,0,3,1,0,6,13,1,2,8,18,11,4,3,4,10,7,11,1,6,2,1,16,69,5,2,160,5,5,12,21,7,30,20,14,24,4,24,10,0,0,0,8,15,0,55,10,11,0,0,2,0,3,1,1,2,3,1,1,0,3,0,0,0,42,39,50,7,27,16,99,92,53,5,10,2,26,56,97,99,0,0,0,96,0,4,0,0,0,99,0,99,99,99,20,4,6,5,12,9702,32,6,13,,,,,,,9512.0,,,,,,,9504.0,,,,,,,,,,,,,,,25.0,,,,,,,12.0,,,,,,,,,47.0,3,1,10.0,9310,25.0,9512,25.0,9512,9310,9504.0,18.0,15.666667,148535,0,2,1.0
2,9001,1,0,,,3.0,1.0,2,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0,20,29,33,6,8,1,1.0,7001,2040,2669,0,2,98,49,51,96,2,0,0,2,0,0,0,0,0,0,0,2,0,0,0,35,43,46,37,45,49,23,35,40,25,13,20,19,16,13,10,8,15,14,30,22,19,25,10,23,21,35,44,22,6,2,63,9,9,19,183,254,69,69,1,6,5,3,3,3,0,497,546,2,1,78,22,93,7,18,36,76,65,30,86,14,7,2,5,11,17,3,17,60,18,0,1,0,0,1,6,18,50,0,4,36,49,51,14,5,4,2,24,11,2,3,6,0,2,9,44,0.0,281.0,518.0,251,292,292,340,11576,32,18,20,15,12,2,0,0,1,20,19,24,18,16,2,0,0,1,28,8,31,11,38,62,8,74,22,0,0,0,0,0,2,2,1,21,19,24,6,61,65,73,59,70,56,78,62,82,99,4,10,5,2,6,12,0,1,9,5,18,20,5,7,6,0,11,33,4,3,2,12,3,3,2,0,7,8,3,3,6,7,1,8,74,3,1,120,22,20,28,16,6,5,3,1,23,1,16,6,0,0,0,10,21,0,28,23,32,8,1,14,1,5,0,0,7,0,0,0,0,0,1,0,0,2,84,96,3,0,0,92,65,29,9,22,3,12,23,50,69,31,0,0,0,6,35,44,0,15,22,77,17,97,92,9,2,6,5,26,9702,63,6,14,,,,,,,,,,9509.0,,9506.0,,9504.0,,9501.0,,,,9409.0,9407.0,9406.0,,,,,,,,,,11.0,,9.0,,9.0,,8.0,,,,8.0,7.0,6.0,202.0,27,14,2.0,9111,16.0,9207,5.0,9512,9001,9101.0,12.0,7.481481,15078,1,4,60.0
3,8701,0,2801,70.0,,1.0,4.0,2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0,23,14,31,3,0,3,0.0,640,160,219,0,8,92,54,46,61,0,0,11,32,6,2,0,0,0,0,0,31,0,0,1,32,40,44,34,43,47,25,45,35,20,15,25,17,17,12,7,7,20,17,30,14,19,25,11,23,23,27,50,30,15,8,63,9,6,23,199,283,85,83,3,4,1,0,2,0,2,1000,1263,2,1,48,52,93,7,6,36,73,61,30,84,16,6,3,3,21,12,4,13,36,13,0,0,0,10,25,50,69,92,10,15,42,55,50,15,5,4,0,9,42,4,0,5,1,8,17,34,9340.0,67.0,862.0,386,388,396,423,15130,27,12,4,26,22,5,0,0,4,35,5,6,12,30,6,0,0,5,22,14,26,20,46,54,3,58,36,0,0,0,0,0,6,0,0,17,13,15,0,43,69,81,53,68,45,33,31,0,99,23,17,3,0,6,6,0,0,13,42,12,0,0,0,42,0,6,3,0,0,0,23,3,3,6,0,3,3,3,3,3,0,3,6,87,0,0,120,28,12,14,27,10,3,5,0,19,1,17,0,0,0,0,13,23,0,14,40,31,16,0,1,0,13,0,0,4,0,0,0,3,0,0,0,0,29,67,56,41,3,0,94,43,27,4,38,0,10,19,39,45,55,0,0,45,22,17,0,0,16,23,77,22,93,89,16,2,6,6,27,9702,66,6,14,,,,,,,,,9512.0,9509.0,,9508.0,,9505.0,9503.0,,,9411.0,9411.0,,,,,,,,,,,,10.0,10.0,,10.0,,7.0,11.0,,,6.0,11.0,,,,109.0,16,7,2.0,8711,11.0,9411,10.0,9512,8702,8711.0,9.0,6.8125,172556,1,4,41.0
4,8601,0,2001,78.0,1.0,3.0,2.0,60,1.0,0.0,9.0,0.0,4.0,1.0,0.0,0.0,0.0,4.0,0.0,1.0,0.0,1.0,1,28,9,53,26,3,2,,2520,627,761,99,0,0,46,54,2,98,0,0,1,0,0,0,0,0,0,0,0,0,0,0,33,45,50,36,46,50,27,34,43,23,14,21,13,15,20,12,5,13,15,34,19,19,31,7,27,16,26,57,36,24,14,42,17,9,33,235,323,99,98,0,0,0,0,0,0,0,576,594,4,3,90,10,97,3,0,42,82,49,22,92,8,20,3,17,9,23,1,1,1,0,21,58,19,0,1,2,16,67,0,2,45,52,53,16,6,0,0,0,9,0,0,0,25,58,74,83,5000.0,127.0,528.0,240,250,293,321,9836,24,29,23,13,4,4,0,0,2,21,30,22,16,4,5,0,0,3,35,8,11,14,20,80,4,73,22,1,1,0,0,0,3,1,2,1,24,27,3,76,61,73,51,65,49,80,31,81,99,10,17,8,2,6,15,3,7,22,2,9,0,7,2,2,0,6,1,5,2,2,12,2,7,6,4,15,29,4,3,26,3,2,7,49,12,1,120,16,20,30,13,3,12,5,2,26,1,20,7,1,1,1,15,28,4,9,16,53,20,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,65,99,0,0,0,90,45,18,25,34,0,1,3,6,33,67,0,0,9,14,72,3,0,0,99,1,21,99,96,6,2,7,11,43,9702,113,10,25,,,,,,9601.0,,,,,,9506.0,,,,,,,,,,,,,,,,15.0,,,,,,10.0,,,,,,,,,,,254.0,37,8,3.0,9310,15.0,9601,15.0,9601,7903,8005.0,14.0,6.864865,7112,1,2,26.0


In [34]:
from IPython.display import display

nan_counts = numerical.isna().sum()
nan_counts = nan_counts[nan_counts > 0]
nan_df = pd.DataFrame({'Column': nan_counts.index, 'NaN Count': nan_counts.values})
with pd.option_context('display.max_rows', None):
    display(nan_df)

Unnamed: 0,Column,NaN Count
0,AGE,23665
1,NUMCHLD,83026
2,INCOME,21286
3,WEALTH1,44732
4,MBCRAFT,52854
5,MBGARDEN,52854
6,MBBOOKS,52854
7,MBCOLECT,52914
8,MAGFAML,52854
9,MAGFEM,52854


There are still many numerical columns with a lot of NaNs; removing the sparse categorical columns did not have a big effect here. I will run the filter_nulls function again on the numerical data, this time with a lower threshold (25% instead of 85%), to remove more columns than before.
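
To make the effect of the cut-off concrete, here is a minimal sketch on a made-up toy frame. The helper `columns_above` is a hypothetical stand-in for illustration, not the notebook's own function; it flags columns whose missing share is strictly above the threshold, so lowering the threshold flags more columns.

```python
import numpy as np
import pandas as pd

# Toy frame with different shares of missing values per column (made-up).
toy = pd.DataFrame({
    'a': [1, 2, 3, 4],            # 0% missing
    'b': [1, np.nan, 3, 4],       # 25% missing
    'c': [np.nan, np.nan, 3, 4],  # 50% missing
    'd': [np.nan] * 4,            # 100% missing
})

# isna().mean() gives the fraction of missing values per column.
null_share = toy.isna().mean()

def columns_above(share, threshold):
    """Return the columns whose missing share is strictly above the threshold."""
    return share[share > threshold].index.tolist()

strict = columns_above(null_share, 0.85)  # high cut-off: only nearly empty columns
loose = columns_above(null_share, 0.25)   # low cut-off: flags more columns
print(strict, loose)
```

Note that a column with exactly 25% missing values is not flagged at `0.25`, because the comparison is strict.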

In [35]:
len(df[df['TARGET_D'] >= 50])

114

In [36]:
drop_columns = filter_nulls(numerical, perc=0.25)
drop_columns

['NUMCHLD',
 'WEALTH1',
 'MBCRAFT',
 'MBGARDEN',
 'MBBOOKS',
 'MBCOLECT',
 'MAGFAML',
 'MAGFEM',
 'MAGMALE',
 'PUBGARDN',
 'PUBCULIN',
 'PUBHLTH',
 'PUBDOITY',
 'PUBNEWFN',
 'PUBPHOTO',
 'PUBOPP',
 'WEALTH2',
 'RDATE_3',
 'RDATE_4',
 'RDATE_5',
 'RDATE_6',
 'RDATE_7',
 'RDATE_8',
 'RDATE_9',
 'RDATE_10',
 'RDATE_11',
 'RDATE_12',
 'RDATE_13',
 'RDATE_14',
 'RDATE_15',
 'RDATE_16',
 'RDATE_17',
 'RDATE_18',
 'RDATE_19',
 'RDATE_20',
 'RDATE_21',
 'RDATE_22',
 'RDATE_23',
 'RDATE_24',
 'RAMNT_3',
 'RAMNT_4',
 'RAMNT_5',
 'RAMNT_6',
 'RAMNT_7',
 'RAMNT_8',
 'RAMNT_9',
 'RAMNT_10',
 'RAMNT_11',
 'RAMNT_12',
 'RAMNT_13',
 'RAMNT_14',
 'RAMNT_15',
 'RAMNT_16',
 'RAMNT_17',
 'RAMNT_18',
 'RAMNT_19',
 'RAMNT_20',
 'RAMNT_21',
 'RAMNT_22',
 'RAMNT_23',
 'RAMNT_24']

Wealth might play a role when donating, so I'll keep the wealth-related features:
- `WEALTH1`: wealth rating of the donor, usually including real estate, financial, and other wealth indicators
- `WEALTH2`: relative wealth within each state

In [37]:
cols = ['WEALTH1', 'WEALTH2']
for item in cols:
    drop_columns.remove(item) 

In [38]:
df_less_nulls = df.drop(columns=drop_columns)

In [39]:
# Dropping columns should not remove any rows; as a sanity check, I compare the number of high-amount donors with the count before dropping.
len(df_less_nulls[df_less_nulls['TARGET_D'] >= 50])

114

In [40]:
df = df_less_nulls
df.shape

(95412, 370)

**Cleaning the columns `GEOCODE2`, `WEALTH1`, `ADI`, `DMA`, and `MSA`.**

I go back to working on the entire dataset with both numerical and categorical data and targets.

In [41]:
df['GEOCODE2'].unique()

array(['C', 'A', 'D', 'B', nan], dtype=object)

In [42]:
# Note: dropna() returns a new Series without the missing entries; df itself is not modified here.
df['GEOCODE2'].dropna()

0        C
1        A
2        C
3        C
4        A
        ..
95407    C
95408    A
95409    B
95410    A
95411    C
Name: GEOCODE2, Length: 95093, dtype: object

In [43]:
df['WEALTH1'].isna().sum()

44732

In [44]:
df['WEALTH1'].value_counts()

9.0    7585
8.0    6793
7.0    6198
6.0    5825
5.0    5280
4.0    4810
3.0    4237
2.0    4085
1.0    3454
0.0    2413
Name: WEALTH1, dtype: int64

We are looking at a wealth rating across (a sample of) society, so I assume that the missing values follow the same distribution as the observed categories. I assign the null values to the existing categories in proportion to their observed shares.

In [45]:
# Calculate the proportion of each existing category.
proportions = df['WEALTH1'].value_counts(normalize=True)
proportions

9.0    0.149665
8.0    0.134037
7.0    0.122297
6.0    0.114937
5.0    0.104183
4.0    0.094909
3.0    0.083603
2.0    0.080604
1.0    0.068153
0.0    0.047612
Name: WEALTH1, dtype: float64

In [46]:
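counts = df['WEALTH1'].value_counts(normalize=True)
# Draw one weighted random category per row, indexed like df; fillna only uses the draws at missing positions.
random_fill = pd.Series(np.random.choice(counts.index, size=len(df), p=counts.values), index=df.index)
df['WEALTH1'] = df['WEALTH1'].fillna(random_fill)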
df['WEALTH1'].value_counts()

9.0    14277
8.0    12796
7.0    11710
6.0    10957
5.0     9864
4.0     9130
3.0     8027
2.0     7671
1.0     6478
0.0     4502
Name: WEALTH1, dtype: int64
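
The proportional fill can be checked on a small example. The following is a minimal sketch on made-up values, assuming a fixed seed for reproducibility (the notebook cell itself is unseeded). It draws only as many values as there are gaps and indexes the draws by the missing rows, so the fill stays aligned even when the index is not a plain 0..n-1 range.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the fill is reproducible

# Toy stand-in for WEALTH1 with a non-default index (made-up values).
wealth = pd.Series([9.0, np.nan, 9.0, 1.0, np.nan, 9.0, np.nan, 1.0],
                   index=[10, 11, 12, 13, 14, 15, 16, 17])

# Observed category shares, ignoring missing values.
shares = wealth.value_counts(normalize=True)

# One draw per missing entry, labelled with the index of the missing rows.
missing = wealth.isna()
draws = pd.Series(
    rng.choice(shares.index.to_numpy(), size=int(missing.sum()), p=shares.to_numpy()),
    index=wealth.index[missing],
)
wealth_filled = wealth.fillna(draws)

print(wealth_filled.value_counts())
```

Observed values are never overwritten, and with enough missing rows the filled column keeps roughly the same category shares as before.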

`ADI`, `DMA`, and `MSA` are codes for geographic identification in the US:
- ADI code: in a direct-marketing dataset, listed next to DMA, this most likely stands for Area of Dominant Influence, Arbitron's designation for a television market area (comparable to Nielsen's DMA), rather than the Area Deprivation Index, which is a neighbourhood-level measure of socioeconomic deprivation.
- DMA code: Designated Market Area code is a geographic area defined by Nielsen Media Research to identify television viewing markets. DMAs are used by advertisers and broadcasters to determine the reach of television advertising.
- MSA code: Metropolitan Statistical Area code is a designation to refer to a delineation consisting of a city and its suburbs. MSAs are used to group counties and cities into specific geographic areas for population censuses and compilations of related statistical data.

In [47]:
df['ADI'].isna().sum()

132

In [48]:
df['DMA'].isna().sum()

132

In [49]:
nan_rows = df[df['MSA'].isna()]
nan_rows

Unnamed: 0,OSOURCE,STATE,ZIP,NOEXCH,MDMAUD,DOMAIN,CLUSTER,AGEFLAG,HOMEOWNR,GENDER,DATASRCE,GEOCODE,PETS,LIFESRC,PEPSTRFL,RFA_2,RFA_3,RFA_4,RFA_5,RFA_6,RFA_7,RFA_8,RFA_9,RFA_10,RFA_11,RFA_12,RFA_13,RFA_14,RFA_15,RFA_16,RFA_17,RFA_18,RFA_19,RFA_20,RFA_21,RFA_22,RFA_23,RFA_24,RFA_2R,RFA_2A,MDMAUD_R,MDMAUD_F,MDMAUD_A,GEOCODE2,MAILCODE,ODATEDW,TCODE,DOB,AGE,INCOME,WEALTH1,HIT,MALEMILI,MALEVET,VIETVETS,WWIIVETS,LOCALGOV,STATEGOV,FEDGOV,WEALTH2,POP901,POP902,POP903,POP90C1,POP90C2,POP90C3,POP90C4,POP90C5,ETH1,ETH2,ETH3,ETH4,ETH5,ETH6,ETH7,ETH8,ETH9,ETH10,ETH11,ETH12,ETH13,ETH14,ETH15,ETH16,AGE901,AGE902,AGE903,AGE904,AGE905,AGE906,AGE907,CHIL1,CHIL2,CHIL3,AGEC1,AGEC2,AGEC3,AGEC4,AGEC5,AGEC6,AGEC7,CHILC1,CHILC2,CHILC3,CHILC4,CHILC5,HHAGE1,HHAGE2,HHAGE3,HHN1,HHN2,HHN3,HHN4,HHN5,HHN6,MARR1,MARR2,MARR3,MARR4,HHP1,HHP2,DW1,DW2,DW3,DW4,DW5,DW6,DW7,DW8,DW9,HV1,HV2,HV3,HV4,HU1,HU2,HU3,HU4,HU5,HHD1,HHD2,HHD3,HHD4,HHD5,HHD6,HHD7,HHD8,HHD9,HHD10,HHD11,HHD12,ETHC1,ETHC2,ETHC3,ETHC4,ETHC5,ETHC6,HVP1,HVP2,HVP3,HVP4,HVP5,HVP6,HUR1,HUR2,RHP1,RHP2,RHP3,RHP4,HUPA1,HUPA2,HUPA3,HUPA4,HUPA5,HUPA6,HUPA7,RP1,RP2,RP3,RP4,MSA,ADI,DMA,IC1,IC2,IC3,IC4,IC5,IC6,IC7,IC8,IC9,IC10,IC11,IC12,IC13,IC14,IC15,IC16,IC17,IC18,IC19,IC20,IC21,IC22,IC23,HHAS1,HHAS2,HHAS3,HHAS4,MC1,MC2,MC3,TPE1,TPE2,TPE3,TPE4,TPE5,TPE6,TPE7,TPE8,TPE9,PEC1,PEC2,TPE10,TPE11,TPE12,TPE13,LFC1,LFC2,LFC3,LFC4,LFC5,LFC6,LFC7,LFC8,LFC9,LFC10,OCC1,OCC2,OCC3,OCC4,OCC5,OCC6,OCC7,OCC8,OCC9,OCC10,OCC11,OCC12,OCC13,EIC1,EIC2,EIC3,EIC4,EIC5,EIC6,EIC7,EIC8,EIC9,EIC10,EIC11,EIC12,EIC13,EIC14,EIC15,EIC16,OEDC1,OEDC2,OEDC3,OEDC4,OEDC5,OEDC6,OEDC7,EC1,EC2,EC3,EC4,EC5,EC6,EC7,EC8,SEC1,SEC2,SEC3,SEC4,SEC5,AFC1,AFC2,AFC3,AFC4,AFC5,AFC6,VC1,VC2,VC3,VC4,ANC1,ANC2,ANC3,ANC4,ANC5,ANC6,ANC7,ANC8,ANC9,ANC10,ANC11,ANC12,ANC13,ANC14,ANC15,POBC1,POBC2,LSC1,LSC2,LSC3,LSC4,VOC1,VOC2,VOC3,HC1,HC2,HC3,HC4,HC5,HC6,HC7,HC8,HC9,HC10,HC11,HC12,HC13,HC14,HC15,HC16,HC17,HC18,HC19,HC20,HC21,MHUC1,MHUC2,AC1,AC2,CARDPROM,MAXADATE,NUMPROM,CARDPM12,NUMPRM12,RAMNTALL,NGIFTALL,CARDGIFT,MINRAMNT,MINRDATE,MAXRAMNT,MAXRDATE,LASTGIFT,LASTDATE,FISTDATE,NEXTDATE,TIMELAG,AVGGIFT,CONTROLN,HPHONE_D,RFA_2F,CLUSTER2,TARGET_B,TARGET_D
577,BHG,FL,33756,0,XXXX,U3,9,E,H,F,2,14,,3,X,L1E,A1E,A1E,,S2E,S2E,S2E,S3E,S3E,S3E,S3E,S4E,S4D,S4D,S3D,S4D,S4D,A4D,,,A4D,A3D,A3D,L,E,X,X,X,,,8601,2,708,90.0,2.0,1.0,0,3,36,21,34,9,3,0,7.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,,,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,30,9702,70,5,11,108.0,16,12,3.0,8904,10.0,9505,10.0,9505,8703,8802.0,11.0,6.750000,45057,1,1,,0,0.0
1119,SPN,FL,34642,0,XXXX,C2,29,,,F,Unknown,14,,,X,L1F,A1F,A1F,,A1F,L1E,L1E,L2E,,L2E,L2E,,L2E,,L3E,A1E,A1E,S2E,S2E,S2E,S2E,S2E,S2E,L,F,X,X,X,,,8601,0,0,,,9.0,0,0,42,8,75,11,0,0,6.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,,,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,31,9702,71,5,12,119.0,14,11,5.0,9003,15.0,9105,15.0,9510,8611,8703.0,4.0,8.500000,45323,0,1,,0,0.0
2250,DNA,GA,31535,0,XXXX,T2,36,E,U,M,3,,Y,3,,L1G,A1G,A1G,A1G,,,,,,I1G,I1G,,,,I2G,L1G,L1G,L1G,,L1G,L1G,,L2G,L,G,X,X,X,,,8801,1,6201,36.0,2.0,3.0,4,0,27,28,31,8,8,1,2.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,,,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,25,9702,56,6,12,155.0,6,3,20.0,8906,30.0,9207,30.0,9512,8901,8906.0,5.0,25.833333,30917,1,1,,0,0.0
3326,SYN,GA,31217,0,XXXX,C3,32,,,M,Unknown,,,1,X,L3G,N2G,N2G,N2G,N2G,N2G,N2G,F1F,,F1F,F1F,,P1F,,,,,,,,,,,L,G,X,X,X,,,9501,0,0,,,8.0,0,0,24,33,27,7,5,6,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,,,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,9,9702,22,6,13,75.0,3,2,20.0,9506,30.0,9509,25.0,9603,9506,9509.0,3.0,25.000000,30322,0,3,,0,0.0
5558,SSS,NC,28625,0,XXXX,T2,36,E,H,M,3,,,3,X,L4E,S4E,S4E,S4E,S4E,A4D,A4D,A4D,A4D,A4D,A4D,A4D,A4D,A4D,A3D,,A1D,A1D,A1D,A1D,A1D,,A1D,L,E,X,X,X,,,9001,1,2101,77.0,2.0,8.0,42,0,29,30,30,5,4,1,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,,,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,22,9702,52,6,14,68.0,13,7,1.0,9104,10.0,9512,5.0,9601,9011,9104.0,5.0,5.230769,21587,1,4,,1,6.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
90627,L01,WA,98375,0,XXXX,C2,27,E,H,M,2,,,,,L1F,A1F,A1F,A1F,A1F,A1F,A1F,A2F,A2F,A2F,A2F,A1F,A1F,,N2F,N2F,N2F,N1E,,,F1E,,F1E,L,F,X,X,X,,,9401,1,4801,50.0,3.0,4.0,0,9,37,63,11,5,5,7,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,,,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,14,9702,32,6,11,50.0,3,3,10.0,9401,20.0,9409,20.0,9506,9401,9409.0,8.0,16.666667,181677,0,1,,0,0.0
90993,LIF,SC,59887-,0,XXXX,C2,26,E,U,F,3,,,,,L1F,A1F,A1F,,N2F,N1E,N1E,N1E,N1E,N1E,N1E,F1E,F1E,,,P1E,P1E,,,,,,,L,F,X,X,X,,,9501,28,2501,73.0,4.0,3.0,0,0,35,23,44,8,5,0,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,,,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,9,9702,23,4,10,30.0,2,2,10.0,9501,20.0,9511,20.0,9511,9501,9511.0,10.0,15.000000,24966,0,1,,0,0.0
92870,BHG,FL,34624,0,XXXX,U3,8,E,H,M,3,,,2,X,L4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4D,S4D,S4D,S4D,S3D,S3D,S3C,,,S2C,,A2C,L,E,X,X,X,,,8601,0,606,92.0,2.0,1.0,1,0,31,16,47,4,0,0,1.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,,,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,30,9702,67,6,13,85.0,24,13,2.0,9411,10.0,9506,5.0,9512,8608,8806.0,22.0,3.541667,45166,1,4,,0,0.0
93624,HAN,NC,28370,0,XXXX,C1,23,E,U,M,2,14,,,,L1F,A1F,A1F,,A1F,A1F,A1F,A1F,A1F,A1F,A1F,A1F,A1F,A2F,A2F,A1E,A1E,A1E,A1E,A1E,A1E,A1E,,L,F,X,X,X,,,9101,2,1101,87.0,3.0,2.0,0,0,58,13,69,4,5,2,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,,,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,20,9702,53,5,12,72.0,7,2,5.0,9206,17.0,9510,17.0,9510,9109,9202.0,5.0,10.285714,20332,0,1,,0,0.0


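All three features have NaNs in exactly the same rows, and there are only 132 such rows in total, so I will drop them. Because the missing rows coincide, dropping on `ADI` alone also removes the NaNs in the other two features.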

In [50]:
df.dropna(subset=['ADI'], inplace=True)

In [51]:
# To confirm.
nan_rows = df[df['DMA'].isna()]
nan_rows

Unnamed: 0,OSOURCE,STATE,ZIP,NOEXCH,MDMAUD,DOMAIN,CLUSTER,AGEFLAG,HOMEOWNR,GENDER,DATASRCE,GEOCODE,PETS,LIFESRC,PEPSTRFL,RFA_2,RFA_3,RFA_4,RFA_5,RFA_6,RFA_7,RFA_8,RFA_9,RFA_10,RFA_11,RFA_12,RFA_13,RFA_14,RFA_15,RFA_16,RFA_17,RFA_18,RFA_19,RFA_20,RFA_21,RFA_22,RFA_23,RFA_24,RFA_2R,RFA_2A,MDMAUD_R,MDMAUD_F,MDMAUD_A,GEOCODE2,MAILCODE,ODATEDW,TCODE,DOB,AGE,INCOME,WEALTH1,HIT,MALEMILI,MALEVET,VIETVETS,WWIIVETS,LOCALGOV,STATEGOV,FEDGOV,WEALTH2,POP901,POP902,POP903,POP90C1,POP90C2,POP90C3,POP90C4,POP90C5,ETH1,ETH2,ETH3,ETH4,ETH5,ETH6,ETH7,ETH8,ETH9,ETH10,ETH11,ETH12,ETH13,ETH14,ETH15,ETH16,AGE901,AGE902,AGE903,AGE904,AGE905,AGE906,AGE907,CHIL1,CHIL2,CHIL3,AGEC1,AGEC2,AGEC3,AGEC4,AGEC5,AGEC6,AGEC7,CHILC1,CHILC2,CHILC3,CHILC4,CHILC5,HHAGE1,HHAGE2,HHAGE3,HHN1,HHN2,HHN3,HHN4,HHN5,HHN6,MARR1,MARR2,MARR3,MARR4,HHP1,HHP2,DW1,DW2,DW3,DW4,DW5,DW6,DW7,DW8,DW9,HV1,HV2,HV3,HV4,HU1,HU2,HU3,HU4,HU5,HHD1,HHD2,HHD3,HHD4,HHD5,HHD6,HHD7,HHD8,HHD9,HHD10,HHD11,HHD12,ETHC1,ETHC2,ETHC3,ETHC4,ETHC5,ETHC6,HVP1,HVP2,HVP3,HVP4,HVP5,HVP6,HUR1,HUR2,RHP1,RHP2,RHP3,RHP4,HUPA1,HUPA2,HUPA3,HUPA4,HUPA5,HUPA6,HUPA7,RP1,RP2,RP3,RP4,MSA,ADI,DMA,IC1,IC2,IC3,IC4,IC5,IC6,IC7,IC8,IC9,IC10,IC11,IC12,IC13,IC14,IC15,IC16,IC17,IC18,IC19,IC20,IC21,IC22,IC23,HHAS1,HHAS2,HHAS3,HHAS4,MC1,MC2,MC3,TPE1,TPE2,TPE3,TPE4,TPE5,TPE6,TPE7,TPE8,TPE9,PEC1,PEC2,TPE10,TPE11,TPE12,TPE13,LFC1,LFC2,LFC3,LFC4,LFC5,LFC6,LFC7,LFC8,LFC9,LFC10,OCC1,OCC2,OCC3,OCC4,OCC5,OCC6,OCC7,OCC8,OCC9,OCC10,OCC11,OCC12,OCC13,EIC1,EIC2,EIC3,EIC4,EIC5,EIC6,EIC7,EIC8,EIC9,EIC10,EIC11,EIC12,EIC13,EIC14,EIC15,EIC16,OEDC1,OEDC2,OEDC3,OEDC4,OEDC5,OEDC6,OEDC7,EC1,EC2,EC3,EC4,EC5,EC6,EC7,EC8,SEC1,SEC2,SEC3,SEC4,SEC5,AFC1,AFC2,AFC3,AFC4,AFC5,AFC6,VC1,VC2,VC3,VC4,ANC1,ANC2,ANC3,ANC4,ANC5,ANC6,ANC7,ANC8,ANC9,ANC10,ANC11,ANC12,ANC13,ANC14,ANC15,POBC1,POBC2,LSC1,LSC2,LSC3,LSC4,VOC1,VOC2,VOC3,HC1,HC2,HC3,HC4,HC5,HC6,HC7,HC8,HC9,HC10,HC11,HC12,HC13,HC14,HC15,HC16,HC17,HC18,HC19,HC20,HC21,MHUC1,MHUC2,AC1,AC2,CARDPROM,MAXADATE,NUMPROM,CARDPM12,NUMPRM12,RAMNTALL,NGIFTALL,C
ARDGIFT,MINRAMNT,MINRDATE,MAXRAMNT,MAXRDATE,LASTGIFT,LASTDATE,FISTDATE,NEXTDATE,TIMELAG,AVGGIFT,CONTROLN,HPHONE_D,RFA_2F,CLUSTER2,TARGET_B,TARGET_D
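
The claim that the missing values line up in the same rows can be checked directly rather than by eye. The sketch below is a minimal illustration on toy data; the column trio `MSA`, `ADI`, `DMA` and all values are assumptions made for the example, not taken from the notebook.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real data (column trio and values are assumptions).
toy = pd.DataFrame({
    "MSA": [1.0, np.nan, 3.0, np.nan],
    "ADI": [5.0, np.nan, 7.0, np.nan],
    "DMA": [9.0, np.nan, 2.0, np.nan],
})
cols = ["MSA", "ADI", "DMA"]

nan_mask = toy[cols].isna()
# A row is consistent when it is missing in all three columns or in none.
same_rows = bool((nan_mask.all(axis=1) == nan_mask.any(axis=1)).all())
# Number of rows affected by missing values.
n_nan_rows = int(nan_mask.any(axis=1).sum())

# Dropping on one column is then enough to clean all three.
cleaned = toy.dropna(subset=["ADI"])
remaining_nans = int(cleaned[cols].isna().sum().sum())
```

If `same_rows` came out `False` on the real data, dropping on a single column would leave NaNs behind, and `subset=cols` would be the safer choice.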


Before moving on to EDA, I drop the columns collected in `drop_list` earlier.

In [52]:
drop_list

['OSOURCE', 'ZIP', 'MAILCODE', 'NOEXCH', 'MDMAUD']

In [53]:
df.drop(columns=drop_list, inplace=True)

In [54]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 95280 entries, 0 to 95411
Columns: 365 entries, STATE to TARGET_D
dtypes: float64(16), int64(309), object(40)
memory usage: 266.1+ MB
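
As a rough illustration of the collect-then-drop pattern, the sketch below builds a `drop_list` from named columns plus columns with more than 85% missing values, then drops everything in one call. The toy frame and the column `MOSTLY_MISSING` are hypothetical; `errors="ignore"` is an optional safeguard that is not used in the notebook.

```python
import numpy as np
import pandas as pd

# Toy frame; MOSTLY_MISSING is a hypothetical column for illustration.
toy = pd.DataFrame({
    "OSOURCE": ["a", "b", "c", "d"],
    "MOSTLY_MISSING": [np.nan, np.nan, np.nan, np.nan],
    "GENDER": ["M", "F", "F", "M"],
})

# Columns excluded by definition (ZIP is absent from the toy frame on purpose).
drop_list = ["OSOURCE", "ZIP"]

# Share of missing values per column; append those above the 85% threshold.
missing_share = toy.isna().mean()
drop_list += [c for c in missing_share[missing_share > 0.85].index
              if c not in drop_list]

# errors="ignore" skips names that are not present instead of raising KeyError.
reduced = toy.drop(columns=drop_list, errors="ignore")
```

Returning a new frame (`reduced`) instead of using `inplace=True` keeps the original available for comparison, at the cost of extra memory on a dataset of this size.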


In [55]:
# Preview only: set_index returns a new DataFrame, so df itself is unchanged here.
df.set_index('CONTROLN')

Unnamed: 0_level_0,STATE,DOMAIN,CLUSTER,AGEFLAG,HOMEOWNR,GENDER,DATASRCE,GEOCODE,PETS,LIFESRC,PEPSTRFL,RFA_2,RFA_3,RFA_4,RFA_5,RFA_6,RFA_7,RFA_8,RFA_9,RFA_10,RFA_11,RFA_12,RFA_13,RFA_14,RFA_15,RFA_16,RFA_17,RFA_18,RFA_19,RFA_20,RFA_21,RFA_22,RFA_23,RFA_24,RFA_2R,RFA_2A,MDMAUD_R,MDMAUD_F,MDMAUD_A,GEOCODE2,ODATEDW,TCODE,DOB,AGE,INCOME,WEALTH1,HIT,MALEMILI,MALEVET,VIETVETS,WWIIVETS,LOCALGOV,STATEGOV,FEDGOV,WEALTH2,POP901,POP902,POP903,POP90C1,POP90C2,POP90C3,POP90C4,POP90C5,ETH1,ETH2,ETH3,ETH4,ETH5,ETH6,ETH7,ETH8,ETH9,ETH10,ETH11,ETH12,ETH13,ETH14,ETH15,ETH16,AGE901,AGE902,AGE903,AGE904,AGE905,AGE906,AGE907,CHIL1,CHIL2,CHIL3,AGEC1,AGEC2,AGEC3,AGEC4,AGEC5,AGEC6,AGEC7,CHILC1,CHILC2,CHILC3,CHILC4,CHILC5,HHAGE1,HHAGE2,HHAGE3,HHN1,HHN2,HHN3,HHN4,HHN5,HHN6,MARR1,MARR2,MARR3,MARR4,HHP1,HHP2,DW1,DW2,DW3,DW4,DW5,DW6,DW7,DW8,DW9,HV1,HV2,HV3,HV4,HU1,HU2,HU3,HU4,HU5,HHD1,HHD2,HHD3,HHD4,HHD5,HHD6,HHD7,HHD8,HHD9,HHD10,HHD11,HHD12,ETHC1,ETHC2,ETHC3,ETHC4,ETHC5,ETHC6,HVP1,HVP2,HVP3,HVP4,HVP5,HVP6,HUR1,HUR2,RHP1,RHP2,RHP3,RHP4,HUPA1,HUPA2,HUPA3,HUPA4,HUPA5,HUPA6,HUPA7,RP1,RP2,RP3,RP4,MSA,ADI,DMA,IC1,IC2,IC3,IC4,IC5,IC6,IC7,IC8,IC9,IC10,IC11,IC12,IC13,IC14,IC15,IC16,IC17,IC18,IC19,IC20,IC21,IC22,IC23,HHAS1,HHAS2,HHAS3,HHAS4,MC1,MC2,MC3,TPE1,TPE2,TPE3,TPE4,TPE5,TPE6,TPE7,TPE8,TPE9,PEC1,PEC2,TPE10,TPE11,TPE12,TPE13,LFC1,LFC2,LFC3,LFC4,LFC5,LFC6,LFC7,LFC8,LFC9,LFC10,OCC1,OCC2,OCC3,OCC4,OCC5,OCC6,OCC7,OCC8,OCC9,OCC10,OCC11,OCC12,OCC13,EIC1,EIC2,EIC3,EIC4,EIC5,EIC6,EIC7,EIC8,EIC9,EIC10,EIC11,EIC12,EIC13,EIC14,EIC15,EIC16,OEDC1,OEDC2,OEDC3,OEDC4,OEDC5,OEDC6,OEDC7,EC1,EC2,EC3,EC4,EC5,EC6,EC7,EC8,SEC1,SEC2,SEC3,SEC4,SEC5,AFC1,AFC2,AFC3,AFC4,AFC5,AFC6,VC1,VC2,VC3,VC4,ANC1,ANC2,ANC3,ANC4,ANC5,ANC6,ANC7,ANC8,ANC9,ANC10,ANC11,ANC12,ANC13,ANC14,ANC15,POBC1,POBC2,LSC1,LSC2,LSC3,LSC4,VOC1,VOC2,VOC3,HC1,HC2,HC3,HC4,HC5,HC6,HC7,HC8,HC9,HC10,HC11,HC12,HC13,HC14,HC15,HC16,HC17,HC18,HC19,HC20,HC21,MHUC1,MHUC2,AC1,AC2,CARDPROM,MAXADATE,NUMPROM,CARDPM12,NUMPRM12,RAMNTALL,NGIFTALL,CARDGIFT,MINRAMNT,MINRDATE,M
AXRAMNT,MAXRDATE,LASTGIFT,LASTDATE,FISTDATE,NEXTDATE,TIMELAG,AVGGIFT,HPHONE_D,RFA_2F,CLUSTER2,TARGET_B,TARGET_D
95515,IL,T2,36,,,F,Unknown,,,,X,L4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,L,E,X,X,X,C,8901,0,3712,60.0,,1.0,0,0,39,34,18,10,2,1,5.0,992,264,332,0,35,65,47,53,92,1,0,0,11,0,0,0,0,0,0,0,11,0,0,0,39,48,51,40,50,54,25,31,42,27,11,14,18,17,13,11,15,12,11,34,25,18,26,10,23,18,33,49,28,12,4,61,7,12,19,198,276,97,95,2,2,0,0,7,7,0,479,635,3,2,86,14,96,4,7,38,80,70,32,84,16,6,2,5,9,15,3,17,50,25,0,0,0,2,7,13,27,47,0,1,61,58,61,15,4,2,0,0,14,1,0,0,2,5,17,73,0.0,177.0,682.0,307,318,349,378,12883,13,23,23,23,15,1,0,0,1,4,25,24,26,17,2,0,0,2,28,4,51,1,46,54,3,88,8,0,0,0,0,0,0,4,1,13,14,16,2,45,56,64,50,64,44,62,53,99,0,0,9,3,8,13,9,0,3,9,3,15,19,5,4,3,0,3,41,1,0,7,13,6,5,0,4,9,4,1,3,10,2,1,7,78,2,0,120,16,10,39,21,8,4,3,5,20,3,19,4,0,0,0,18,39,0,34,23,18,16,1,4,0,23,0,0,5,1,0,0,0,0,0,2,0,3,74,88,8,0,4,96,77,19,13,31,5,14,14,31,54,46,0,0,90,0,10,0,0,0,33,65,40,99,99,6,2,10,7,27,9702,74,6,14,240.0,31,14,5.0,9208,12.0,9402,10.0,9512,8911,9003.0,4.0,7.741935,0,4,39.0,0,0.0
148535,CA,S1,14,E,H,M,3,2,,,,L2G,A2G,A2G,A2G,A2G,A1E,A1E,A1E,A1E,A1E,A1E,,,,L1E,,,N1E,N1E,N1E,N1E,,F1E,L,G,X,X,X,A,9401,1,5202,46.0,6.0,9.0,16,0,15,55,11,6,2,1,9.0,3611,940,998,99,0,0,50,50,67,0,0,31,6,4,2,6,4,14,0,0,2,0,1,4,34,41,43,32,42,45,32,33,46,21,13,14,33,23,10,4,2,11,16,36,22,15,12,1,5,4,21,75,55,23,9,69,4,3,24,317,360,99,99,0,0,0,0,0,0,0,5468,5218,12,10,96,4,97,3,9,59,94,88,55,95,5,4,1,3,5,4,2,18,44,5,0,0,0,97,98,98,98,99,94,0,83,76,73,21,5,0,0,0,4,0,0,0,91,91,91,94,4480.0,13.0,803.0,1088,1096,1026,1037,36175,2,6,2,5,15,14,13,10,33,2,5,2,5,15,14,14,10,32,6,2,66,3,56,44,9,80,14,0,0,0,0,0,0,6,0,2,24,32,12,71,70,83,58,81,57,64,57,99,99,0,22,24,4,21,13,2,1,6,0,4,1,0,3,1,0,6,13,1,2,8,18,11,4,3,4,10,7,11,1,6,2,1,16,69,5,2,160,5,5,12,21,7,30,20,14,24,4,24,10,0,0,0,8,15,0,55,10,11,0,0,2,0,3,1,1,2,3,1,1,0,3,0,0,0,42,39,50,7,27,16,99,92,53,5,10,2,26,56,97,99,0,0,0,96,0,4,0,0,0,99,0,99,99,99,20,4,6,5,12,9702,32,6,13,47.0,3,1,10.0,9310,25.0,9512,25.0,9512,9310,9504.0,18.0,15.666667,0,2,1.0,0,0.0
15078,NC,R2,43,,U,M,3,,,,X,L4E,S4E,S4E,S4E,S4E,S4F,S4F,S4F,,S4F,S4F,S4F,S4F,S4F,S4F,,S4D,S4D,,,S4D,S4D,S3D,L,E,X,X,X,C,9001,1,0,,3.0,1.0,2,0,20,29,33,6,8,1,1.0,7001,2040,2669,0,2,98,49,51,96,2,0,0,2,0,0,0,0,0,0,0,2,0,0,0,35,43,46,37,45,49,23,35,40,25,13,20,19,16,13,10,8,15,14,30,22,19,25,10,23,21,35,44,22,6,2,63,9,9,19,183,254,69,69,1,6,5,3,3,3,0,497,546,2,1,78,22,93,7,18,36,76,65,30,86,14,7,2,5,11,17,3,17,60,18,0,1,0,0,1,6,18,50,0,4,36,49,51,14,5,4,2,24,11,2,3,6,0,2,9,44,0.0,281.0,518.0,251,292,292,340,11576,32,18,20,15,12,2,0,0,1,20,19,24,18,16,2,0,0,1,28,8,31,11,38,62,8,74,22,0,0,0,0,0,2,2,1,21,19,24,6,61,65,73,59,70,56,78,62,82,99,4,10,5,2,6,12,0,1,9,5,18,20,5,7,6,0,11,33,4,3,2,12,3,3,2,0,7,8,3,3,6,7,1,8,74,3,1,120,22,20,28,16,6,5,3,1,23,1,16,6,0,0,0,10,21,0,28,23,32,8,1,14,1,5,0,0,7,0,0,0,0,0,1,0,0,2,84,96,3,0,0,92,65,29,9,22,3,12,23,50,69,31,0,0,0,6,35,44,0,15,22,77,17,97,92,9,2,6,5,26,9702,63,6,14,202.0,27,14,2.0,9111,16.0,9207,5.0,9512,9001,9101.0,12.0,7.481481,1,4,60.0,0,0.0
172556,CA,R2,44,E,U,F,3,,,,X,L4E,S4E,S4E,S4E,S4E,S4E,S4E,S4E,,S4E,S4E,S4E,S4E,S4E,S4E,S2D,S2D,A1D,A1D,A1D,A1D,,,L,E,X,X,X,C,8701,0,2801,70.0,1.0,4.0,2,0,23,14,31,3,0,3,0.0,640,160,219,0,8,92,54,46,61,0,0,11,32,6,2,0,0,0,0,0,31,0,0,1,32,40,44,34,43,47,25,45,35,20,15,25,17,17,12,7,7,20,17,30,14,19,25,11,23,23,27,50,30,15,8,63,9,6,23,199,283,85,83,3,4,1,0,2,0,2,1000,1263,2,1,48,52,93,7,6,36,73,61,30,84,16,6,3,3,21,12,4,13,36,13,0,0,0,10,25,50,69,92,10,15,42,55,50,15,5,4,0,9,42,4,0,5,1,8,17,34,9340.0,67.0,862.0,386,388,396,423,15130,27,12,4,26,22,5,0,0,4,35,5,6,12,30,6,0,0,5,22,14,26,20,46,54,3,58,36,0,0,0,0,0,6,0,0,17,13,15,0,43,69,81,53,68,45,33,31,0,99,23,17,3,0,6,6,0,0,13,42,12,0,0,0,42,0,6,3,0,0,0,23,3,3,6,0,3,3,3,3,3,0,3,6,87,0,0,120,28,12,14,27,10,3,5,0,19,1,17,0,0,0,0,13,23,0,14,40,31,16,0,1,0,13,0,0,4,0,0,0,3,0,0,0,0,29,67,56,41,3,0,94,43,27,4,38,0,10,19,39,45,55,0,0,45,22,17,0,0,16,23,77,22,93,89,16,2,6,6,27,9702,66,6,14,109.0,16,7,2.0,8711,11.0,9411,10.0,9512,8702,8711.0,9.0,6.812500,1,4,41.0,0,0.0
7112,FL,S2,16,E,H,F,3,,,3,,L2F,A2F,A2F,A2F,A1D,I2D,A1E,A1E,L1D,A1E,A1E,L1D,L3D,,L3D,A2D,A2D,A3D,A3D,A3D,I4E,A3D,A3D,L,F,X,X,X,A,8601,0,2001,78.0,3.0,2.0,60,1,28,9,53,26,3,2,,2520,627,761,99,0,0,46,54,2,98,0,0,1,0,0,0,0,0,0,0,0,0,0,0,33,45,50,36,46,50,27,34,43,23,14,21,13,15,20,12,5,13,15,34,19,19,31,7,27,16,26,57,36,24,14,42,17,9,33,235,323,99,98,0,0,0,0,0,0,0,576,594,4,3,90,10,97,3,0,42,82,49,22,92,8,20,3,17,9,23,1,1,1,0,21,58,19,0,1,2,16,67,0,2,45,52,53,16,6,0,0,0,9,0,0,0,25,58,74,83,5000.0,127.0,528.0,240,250,293,321,9836,24,29,23,13,4,4,0,0,2,21,30,22,16,4,5,0,0,3,35,8,11,14,20,80,4,73,22,1,1,0,0,0,3,1,2,1,24,27,3,76,61,73,51,65,49,80,31,81,99,10,17,8,2,6,15,3,7,22,2,9,0,7,2,2,0,6,1,5,2,2,12,2,7,6,4,15,29,4,3,26,3,2,7,49,12,1,120,16,20,30,13,3,12,5,2,26,1,20,7,1,1,1,15,28,4,9,16,53,20,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,65,99,0,0,0,90,45,18,25,34,0,1,3,6,33,67,0,0,9,14,72,3,0,0,99,1,21,99,96,6,2,7,11,43,9702,113,10,25,254.0,37,8,3.0,9310,15.0,9601,15.0,9601,7903,8005.0,14.0,6.864865,1,2,26.0,0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
184568,AK,C2,27,,,M,Unknown,,,,,L1G,F1G,F1G,,F1G,P1G,P1G,,,,,,,,,,,,,,,,,L,G,X,X,X,C,9601,1,0,,,1.0,0,14,36,47,11,7,8,13,,27380,7252,10037,99,0,0,50,50,78,10,6,4,5,0,0,0,1,1,0,0,3,1,0,2,28,35,38,29,38,41,30,45,37,18,16,31,25,15,8,3,1,20,18,31,18,13,7,3,5,20,32,48,28,10,4,58,15,3,24,195,271,54,38,8,32,24,14,0,0,0,988,1025,6,6,56,44,89,11,3,44,72,56,32,83,17,12,3,10,16,15,8,19,55,5,3,6,0,2,10,49,73,92,0,4,40,52,53,15,4,24,8,13,14,15,12,3,69,84,92,97,380.0,0.0,743.0,433,481,499,535,18807,11,13,13,21,22,13,4,2,2,9,11,11,21,24,16,4,2,2,9,6,70,6,63,37,27,76,15,2,2,0,0,0,5,2,1,2,18,20,2,69,81,89,73,83,69,69,57,61,94,7,15,16,5,10,21,0,3,11,1,11,2,3,3,1,4,6,4,7,3,3,17,7,5,3,1,9,8,7,14,7,8,13,6,59,7,0,136,2,7,28,33,8,15,8,3,26,2,19,8,8,15,2,20,35,5,48,15,11,25,1,5,1,9,0,0,4,1,1,1,0,0,1,1,0,4,26,92,3,2,4,95,60,19,3,14,0,7,32,78,91,9,6,5,86,1,12,0,0,1,93,7,98,99,98,16,4,4,3,6,9702,14,5,12,25.0,1,0,25.0,9602,25.0,9602,25.0,9602,9602,,,25.000000,0,1,12.0,0,0.0
122706,TX,C1,24,E,H,M,3,,,,,L1F,,,,,P1F,P1F,,,,,,,,,,,,,,,,,L,F,X,X,X,A,9601,1,5001,48.0,7.0,9.0,1,0,31,43,19,4,1,0,,1254,322,361,96,0,4,51,49,91,3,0,2,6,1,0,1,0,0,0,0,5,0,0,1,30,40,40,28,41,43,39,33,42,25,9,19,43,17,7,4,2,10,16,35,23,16,9,2,7,10,20,70,52,25,6,73,4,2,20,307,346,89,88,1,1,0,0,0,0,0,1679,1723,3,3,88,12,97,3,0,63,89,85,60,96,4,2,1,1,7,5,1,28,58,5,2,2,0,18,71,88,91,97,5,1,77,82,75,20,4,1,0,10,7,1,0,5,16,26,44,79,3360.0,201.0,618.0,806,836,802,849,26538,8,9,7,6,11,29,13,2,15,10,0,8,2,13,35,16,3,13,8,5,61,7,83,17,36,80,4,4,4,0,0,0,6,5,3,3,25,32,10,61,73,88,56,87,52,48,43,99,0,0,18,31,0,13,17,0,1,2,4,6,0,3,5,1,8,8,9,3,7,9,13,9,6,0,0,4,7,13,3,4,1,0,4,78,12,0,160,1,6,12,24,7,36,14,9,35,5,32,7,0,0,0,21,31,8,43,5,19,15,1,12,1,14,0,0,4,0,0,1,0,0,0,1,0,2,51,94,3,0,2,99,84,29,4,7,2,55,90,94,94,6,0,0,82,2,16,0,0,0,69,31,67,99,97,18,5,3,2,4,9702,10,3,8,20.0,1,0,20.0,9603,20.0,9603,20.0,9603,9603,,,20.000000,1,1,2.0,0,0.0
189641,MI,C3,30,,,M,Unknown,,,,X,L3E,S4E,S4E,S3E,S3E,,A2E,N3E,N3E,N3E,N3E,N3E,N3E,N2E,F1D,,F1D,,,P1D,P1D,,,L,E,X,X,X,B,9501,1,3801,60.0,,7.0,0,0,18,46,20,7,23,0,,552,131,205,99,0,0,53,47,82,14,0,1,9,0,0,0,0,0,0,0,9,0,0,0,28,35,37,30,41,44,32,46,38,17,13,34,21,9,9,9,4,21,17,32,20,10,18,7,17,27,29,44,31,14,5,45,19,5,31,179,268,96,95,1,2,1,0,0,0,0,376,377,4,3,66,34,95,5,10,37,64,43,21,80,20,16,2,14,21,20,9,20,49,12,7,7,1,0,0,0,1,9,0,2,45,51,54,14,5,2,0,0,31,2,0,0,3,34,78,91,4040.0,61.0,551.0,263,264,319,345,12178,21,26,20,18,12,0,3,0,0,26,18,17,11,21,0,6,0,0,10,13,26,26,43,57,3,83,17,0,0,0,0,0,0,0,0,25,17,17,0,69,69,70,69,70,69,77,24,62,0,25,5,13,9,5,22,0,2,14,0,13,9,5,2,0,0,4,14,3,11,0,10,5,2,0,5,6,19,3,19,7,23,0,0,52,18,0,120,5,3,51,23,7,11,0,6,32,4,27,7,0,0,0,9,18,0,46,0,20,20,2,8,0,14,0,0,0,1,0,0,0,0,1,0,0,6,82,92,5,3,0,93,42,12,6,51,0,0,0,0,0,99,0,0,97,0,0,0,0,4,99,0,99,99,99,5,2,3,11,14,9702,33,7,17,58.0,7,4,3.0,9603,10.0,9501,10.0,9610,9410,9501.0,3.0,8.285714,1,3,34.0,0,0.0
4693,CA,C1,24,E,H,F,2,4,,1,X,L4F,S4F,A3F,S4F,S4F,S4F,S4F,S4F,S4F,S4F,S4F,S4F,S4F,S4F,S3F,S2F,S2F,A1F,A1F,A1F,A1F,S2F,S3F,L,F,X,X,X,A,8601,0,4005,58.0,7.0,2.0,0,0,28,35,20,9,1,1,7.0,1746,432,508,99,0,0,47,53,92,1,1,5,8,0,1,2,0,1,0,0,5,0,0,3,34,42,45,36,45,49,25,38,40,22,12,21,21,18,12,7,9,13,16,34,20,17,20,4,16,9,26,65,41,17,6,56,9,8,27,262,324,99,99,0,0,0,0,5,4,1,2421,2459,11,10,88,12,99,1,0,44,85,71,36,84,16,8,2,6,9,12,6,19,56,16,0,0,0,89,96,99,99,99,9,0,90,65,68,18,5,0,0,0,12,0,0,0,88,88,90,91,8735.0,13.0,803.0,552,544,568,556,15948,7,4,11,18,38,15,5,3,0,4,6,15,19,38,13,4,3,0,25,2,46,3,43,57,9,80,11,0,0,0,0,1,2,6,0,24,18,28,11,52,73,88,60,85,57,70,54,99,99,0,14,16,6,16,17,0,2,12,1,11,2,0,2,1,0,2,22,4,6,4,19,4,7,2,4,6,7,9,4,9,1,1,7,72,8,2,140,7,6,20,35,12,15,5,6,29,4,21,10,0,0,0,13,28,1,35,18,20,8,0,3,1,9,0,0,2,6,1,2,0,0,0,0,0,14,50,83,8,4,5,99,85,43,9,25,0,0,6,17,99,1,0,0,99,0,1,0,0,0,99,0,99,99,99,12,3,6,3,36,9702,127,9,31,498.0,41,18,5.0,9011,21.0,9608,18.0,9701,8612,8704.0,4.0,12.146341,1,4,11.0,1,18.0
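
Note that `set_index` returns a new DataFrame unless its result is assigned. If the intent is to keep `CONTROLN` as the index for later steps (an assumption, since the notebook may only be previewing), the difference can be sketched on a toy frame as follows.

```python
import pandas as pd

# Toy frame with a few illustrative rows.
toy = pd.DataFrame({
    "CONTROLN": [95515, 148535, 15078],
    "STATE": ["IL", "CA", "NC"],
})

# Without assignment, only the returned copy carries the new index.
preview = toy.set_index("CONTROLN")
still_a_column = "CONTROLN" in toy.columns

# Assigning the result makes the change persistent.
toy = toy.set_index("CONTROLN")
index_name = toy.index.name
```

Once `CONTROLN` is the index, it no longer takes part in column-wise loops such as a unique-value count over all columns.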


The dataset still has too many independent features and too many null values. To reduce the number of features quickly, I will flag for removal those columns with a very high number of distinct values (more than 50). In a categorical column, neither zero variance nor extremely high cardinality is helpful.
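
One possible variant of this idea is sketched here on toy data: it restricts the cardinality check to non-numeric columns and also lists columns with a single value. The helper names `high_cardinality_columns` and `constant_columns`, the column `RFA_X`, and the toy threshold of 3 are all hypothetical and chosen only for the example.

```python
import pandas as pd

def high_cardinality_columns(frame, threshold=50, only_categorical=True):
    """Return names of columns with more than `threshold` distinct values."""
    # exclude="number" keeps string-like columns and drops numeric ones.
    candidates = frame.select_dtypes(exclude="number") if only_categorical else frame
    counts = candidates.nunique(dropna=False)  # NaN counts as its own value
    return counts[counts > threshold].index.tolist()

def constant_columns(frame):
    """Return names of columns that hold a single distinct value."""
    counts = frame.nunique(dropna=False)
    return counts[counts <= 1].index.tolist()

# Toy frame; a threshold of 3 stands in for the 50 used on the real data.
toy = pd.DataFrame({
    "STATE": ["CA", "FL", "TX", "CA"],
    "RFA_X": ["A1F", "A1G", "A2F", "F1F"],  # hypothetical column name
    "FLAG": ["X", "X", "X", "X"],
    "POP": [10, 20, 30, 40],
})
high = high_cardinality_columns(toy, threshold=3)
high_all = high_cardinality_columns(toy, threshold=3, only_categorical=False)
const = constant_columns(toy)
```

Restricting the check to non-numeric columns matters because a continuous numeric feature will almost always exceed any fixed limit on distinct values, which says little about whether it is useful.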

In [56]:
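remove_cols = []

# Flag every column with more than 50 distinct values (NaN counts as a value).
for col in df.columns:
    if len(df[col].unique()) > 50:
        display(df[col].value_counts())  # inspect the distribution before removal
        remove_cols.append(col)

len(remove_cols)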

CA    17336
FL     8360
TX     7532
IL     6417
MI     5651
NC     4155
WA     3574
GA     3362
IN     2979
WI     2794
MO     2712
TN     2484
AZ     2404
OR     2180
MN     2175
CO     2030
SC     1758
AL     1700
KY     1620
OK     1617
LA     1592
KS     1293
IA     1271
AR     1016
MS      985
NV      978
NM      873
NE      754
UT      569
ID      533
MT      527
HI      449
SD      301
AK      282
WY      280
ND      260
AP       81
NY       71
VA       55
OH       52
PA       38
MD       34
NJ       26
MA       25
CT       23
AA       18
AE       14
ME       11
NH        8
VT        7
RI        6
WV        4
DE        3
DC        1
Name: STATE, dtype: int64

40    3977
35    3617
36    3604
27    3548
24    3535
49    3311
12    2998
18    2896
13    2705
30    2603
39    2598
45    2511
43    2384
11    2373
51    2318
14    2244
41    2172
44    1950
16    1943
2     1942
21    1891
8     1826
10    1804
46    1797
28    1629
17    1621
20    1567
53    1441
3     1428
42    1357
34    1356
23    1319
31    1282
22    1282
25    1272
38    1200
15    1171
1     1140
7     1074
5     1013
37     969
26     945
47     868
29     857
32     789
48     787
50     776
9      768
6      603
33     593
4      558
19     497
52     270
Name: CLUSTER, dtype: int64

A1F    21918
A1G     9168
A2F     6275
F1F     5966
A1E     5113
       ...  
S2B        2
S3C        2
A2C        1
S3B        1
N1C        1
Name: RFA_3, Length: 70, dtype: int64

A1F    21787
A1G     9082
A2F     6257
F1F     5991
A1E     5055
       ...  
S2C        4
S3C        3
S4B        2
A3C        2
S3B        1
Name: RFA_4, Length: 63, dtype: int64

A1F    15672
F1F     6696
A1G     6623
A1E     5423
A2F     5349
       ...  
A3B        1
U1C        1
P1A        1
A2B        1
I1D        1
Name: RFA_6, Length: 108, dtype: int64

A1F    10939
A1E     6589
A1G     4918
A2F     4825
F1F     3922
       ...  
N4C        1
I4D        1
A2B        1
L4C        1
L3C        1
Name: RFA_7, Length: 105, dtype: int64

A1F    11296
A1E     6891
A1G     5054
A2F     4954
F1F     3979
       ...  
I4E        2
U1D        1
L4C        1
I3E        1
L3C        1
Name: RFA_8, Length: 108, dtype: int64

A1F    9617
A1E    7025
A1G    4254
A2F    4147
S2E    2503
       ... 
A2B       2
A3B       2
N2A       1
U1D       1
I1E       1
Name: RFA_9, Length: 106, dtype: int64

A1F    9194
A1E    6145
A1G    3936
A2F    3510
A2E    2198
       ... 
L4F       1
A2B       1
L3D       1
I4C       1
A4B       1
Name: RFA_10, Length: 93, dtype: int64

A1F    9732
A1E    7018
A1G    4137
A2F    3729
S2E    2606
       ... 
S3B       8
A4B       4
S2B       4
A3B       3
A2B       2
Name: RFA_11, Length: 100, dtype: int64

A1F    9844
A1E    7109
A1G    4192
A2F    3777
A1D    2619
       ... 
S2B       4
A3B       3
A2B       2
U1C       1
F1B       1
Name: RFA_12, Length: 106, dtype: int64

A1F    9053
A1G    3945
A2F    3277
A2E    2690
S2E    2672
       ... 
N2C       1
U1G       1
L3G       1
L4E       1
U1F       1
Name: RFA_13, Length: 86, dtype: int64

A1F    8039
A1E    7759
A1G    3736
A1D    3573
A2F    2996
       ... 
I3G       4
I3F       3
L2D       3
N2B       1
U1D       1
Name: RFA_14, Length: 94, dtype: int64

A1E    7071
A1F    6435
A1D    3317
A1G    3049
A2F    2665
       ... 
P1E       2
P1C       1
I2C       1
S2A       1
I3C       1
Name: RFA_16, Length: 122, dtype: int64

A1E    6763
A1F    5316
A1D    3639
A1G    2300
A2E    2213
       ... 
A4B       2
A3B       1
S2A       1
A4A       1
S4A       1
Name: RFA_17, Length: 117, dtype: int64

A1E    7176
A1F    5498
A1D    3933
A1G    2403
A2E    2381
       ... 
P1B       1
S2A       1
S3A       1
A4A       1
N3B       1
Name: RFA_18, Length: 121, dtype: int64

A1E    7241
A1F    5336
A1D    4145
A1G    2325
S2E    2324
       ... 
A4B       3
P1B       2
S4A       2
L4C       1
A4A       1
Name: RFA_19, Length: 107, dtype: int64

A1E    6403
A1F    4935
A1D    3334
A1G    2188
A2E    1810
       ... 
U1E       2
I1G       2
L4E       1
L3F       1
U1D       1
Name: RFA_20, Length: 79, dtype: int64

A1E    6724
A1F    5114
A1D    3601
A1G    2240
A2E    1922
       ... 
A4B       2
S3A       2
P1B       2
S4A       2
A4A       1
Name: RFA_21, Length: 101, dtype: int64

A1E    7227
A1F    5311
A1D    4103
A1G    2313
S2E    2197
       ... 
S4A       2
A1A       1
F1B       1
L4C       1
A4A       1
Name: RFA_22, Length: 116, dtype: int64

A1F    4596
A1E    4344
A1G    2396
S2E    2239
S4D    2145
       ... 
4E        1
3F        1
U1C       1
U1F       1
L1D       1
Name: RFA_23, Length: 86, dtype: int64

A1E    7220
A1F    5022
A1D    4553
F1D    3311
A1G    2182
       ... 
U1C       2
P1B       2
A1C       2
U1G       1
L4C       1
Name: RFA_24, Length: 96, dtype: int64

9501    15341
8601    14574
9401    12047
9601    10108
9101     8539
9001     7707
9201     7528
8801     6660
8901     5336
9301     3915
8701     3446
9701       15
9509        4
9209        4
9212        3
9410        3
9510        3
8912        2
9109        2
9310        2
8501        2
9506        2
9309        2
8910        2
9009        2
9202        2
9302        2
9003        1
9205        1
8909        1
9402        1
9011        1
8707        1
9012        1
8612        1
8604        1
9312        1
9303        1
8401        1
9103        1
8609        1
8702        1
9512        1
8704        1
9010        1
8611        1
8711        1
9102        1
8608        1
9111        1
9511        1
8810        1
8804        1
8306        1
Name: ODATEDW, dtype: int64

0        40863
1        25653
2        16980
28        8250
1002      1860
3          834
4          365
28028       71
72          63
980         53
13          45
4002        44
14          22
116         17
45          17
18          16
42          15
22          13
24          11
39002       11
13002        7
30           7
228          5
6            5
23           5
136          5
202          4
21           3
94           2
134          2
17           2
6400         2
9            2
14002        2
4004         2
100          2
18002        2
7            1
44           1
24002        1
93           1
27           1
50           1
76           1
96           1
38           1
58002        1
12           1
72002        1
25           1
40           1
36           1
47           1
61           1
22002        1
Name: TCODE, dtype: int64

0       23601
4801     1476
5001     1325
3001     1288
2801     1223
        ...  
7304        1
9704        1
4           1
7504        1
8011        1
Name: DOB, Length: 947, dtype: int64

50.0    1927
76.0    1883
72.0    1811
68.0    1809
74.0    1798
        ... 
8.0        1
9.0        1
10.0       1
6.0        1
15.0       1
Name: AGE, Length: 96, dtype: int64

0     55565
1      8193
2      5617
3      3377
4      2818
      ...  
84        1
67        1
79        1
73        1
69        1
Name: HIT, Length: 75, dtype: int64

0     73886
1      9280
2      4321
3      2060
4      1185
      ...  
74        1
73        1
58        1
57        1
98        1
Name: MALEMILI, Length: 95, dtype: int64

31    4303
30    4286
32    4130
29    4112
33    4072
      ... 
81       3
83       2
90       1
98       1
80       1
Name: MALEVET, Length: 89, dtype: int64

0     5485
27    3070
28    3039
31    2949
29    2903
      ... 
90       5
91       4
88       3
93       2
92       2
Name: VIETVETS, Length: 95, dtype: int64

0     5247
32    2545
33    2473
28    2397
35    2332
      ... 
96      25
92      13
95       7
97       7
98       6
Name: WWIIVETS, Length: 100, dtype: int64

6     11005
5     11005
7      9996
4      9130
8      8465
9      6636
3      6615
0      6122
10     5056
2      4251
11     3874
12     2868
13     2189
1      1705
14     1578
15     1123
16      856
17      659
18      493
19      327
20      292
21      230
22      164
23      142
24      102
25       93
26       64
27       49
28       27
29       25
32       19
31       17
30       17
36       13
34       12
33       10
59        9
35        7
43        5
38        5
39        4
37        3
41        2
44        2
50        2
99        2
40        2
71        1
57        1
55        1
45        1
64        1
49        1
48        1
53        1
Name: LOCALGOV, dtype: int64

0     14291
2     14056
3     12664
1     10301
4      9921
      ...  
53        1
55        1
73        1
59        1
57        1
Name: STATEGOV, Length: 65, dtype: int64

0     20585
2     18334
1     16906
3     12075
4      7902
5      5126
6      3481
7      2440
8      1790
9      1276
10     1006
11      725
12      578
13      442
14      365
15      334
16      232
17      210
19      176
18      160
20      129
22      122
21      106
23       98
24       77
26       55
25       55
27       53
28       45
34       38
29       36
31       35
30       32
33       32
39       30
32       29
35       27
38       25
40       18
36       17
41       15
37       13
43       10
44        9
51        6
49        6
46        4
42        4
45        4
60        2
59        1
50        1
47        1
52        1
87        1
Name: FEDGOV, dtype: int64

0        667
1086      78
923       77
1094      75
834       73
        ... 
24707      1
25808      1
17922      1
12130      1
27380      1
Name: POP901, Length: 9906, dtype: int64

0        710
296      225
265      217
261      217
281      215
        ... 
11868      1
13376      1
8759       1
5450       1
7252       1
Name: POP902, Length: 4786, dtype: int64

0        685
340      174
485      164
380      163
381      163
        ... 
17127      1
21608      1
3102       1
8473       1
2037       1
Name: POP903, Length: 5698, dtype: int64

99    50548
0     35283
98      510
97      453
95      331
      ...  
45       33
42       32
37       32
25       32
39       30
Name: POP90C1, Length: 100, dtype: int64

0     76250
99     6029
1       637
98      404
96      342
      ...  
18       65
29       63
27       62
24       56
20       53
Name: POP90C2, Length: 100, dtype: int64

0     56329
99    17284
1      1073
2       918
3       792
      ...  
68      100
75       99
82       97
73       94
71       88
Name: POP90C3, Length: 100, dtype: int64

49    18086
50    17284
48    14263
51    10067
47     9820
      ...  
24        1
96        1
17        1
23        1
16        1
Name: POP90C4, Length: 81, dtype: int64

51    18089
50    17284
52    14238
49    10062
53     9842
      ...  
4         1
12        1
83        1
77        1
84        1
Name: POP90C5, Length: 81, dtype: int64

99    15689
98     8662
97     7133
96     6092
95     4961
      ...  
14       71
20       70
19       69
17       65
18       62
Name: ETH1, Length: 100, dtype: int64

0     31108
1     18193
2      9336
3      5785
4      4068
      ...  
76       44
83       42
77       38
85       37
88       37
Name: ETH2, Length: 100, dtype: int64

0     62143
1     24419
2      4306
3      1351
4       707
      ...  
77        1
54        1
80        1
68        1
63        1
Name: ETH3, Length: 85, dtype: int64

0     37045
1     23722
2     10555
3      5691
4      3725
      ...  
85        3
82        3
99        2
95        1
93        1
Name: ETH4, Length: 96, dtype: int64

1     22602
0     15671
2     11884
3      6999
4      4690
      ...  
98       34
82       32
83       23
97       22
99       13
Name: ETH5, Length: 100, dtype: int64

0     78717
1     11518
2      2735
3       826
4       412
      ...  
49        2
21        2
67        1
56        1
72        1
Name: ETH7, Length: 61, dtype: int64

0     74044
1     13114
2      3178
3      1468
4       769
5       560
6       398
7       255
8       174
9       151
10      150
12      133
11      113
13       98
16       58
15       58
14       54
17       49
19       49
20       36
22       35
18       35
23       33
21       21
24       20
26       20
33       18
25       18
28       16
38       15
29       13
34       13
36       13
32       12
31       10
39        9
27        9
35        9
40        8
30        7
41        5
72        4
50        3
52        3
46        2
70        2
45        2
37        2
42        2
61        2
98        1
44        1
47        1
59        1
64        1
62        1
55        1
75        1
99        1
58        1
Name: ETH8, dtype: int64

0     75799
1     11223
2      3361
3      1589
4       802
      ...  
28        1
55        1
46        1
50        1
38        1
Name: ETH9, Length: 61, dtype: int64

0     33900
1     20591
2      8336
3      5230
4      3765
      ...  
94       13
93       12
95        8
97        1
96        1
Name: ETH13, Length: 98, dtype: int64

0     87649
1      5360
2       851
3       257
4       218
      ...  
73        1
44        1
81        1
48        1
51        1
Name: ETH15, Length: 78, dtype: int64

0     42738
1     26130
2     11005
3      5803
4      3147
      ...  
62        1
68        1
73        1
51        1
63        1
Name: ETH16, Length: 66, dtype: int64

33    8125
32    7979
31    7682
34    7275
30    6713
      ... 
13       2
81       1
83       1
84       1
12       1
Name: AGE901, Length: 74, dtype: int64

41    7710
42    7599
40    7354
43    6739
39    6642
      ... 
80       8
81       4
82       2
83       1
84       1
Name: AGE902, Length: 67, dtype: int64

43    7870
44    7389
42    6987
45    6516
46    5922
41    5696
47    5551
40    4853
48    4730
39    4068
49    3739
38    3625
50    3148
37    2756
51    2563
52    2082
36    1704
53    1667
54    1427
55    1243
35    1199
56     920
34     873
57     862
58     696
0      667
33     638
59     616
60     557
61     479
65     467
62     467
63     417
64     383
32     315
66     305
67     234
69     218
70     213
68     182
31     157
71     123
74     119
72     111
30      92
73      88
76      79
75      66
29      66
77      50
28      22
79      20
78      15
80      12
27       6
82       4
81       4
83       1
84       1
Name: AGE903, dtype: int64

35    8006
34    7975
36    7655
33    7312
37    6762
      ... 
17       2
78       1
12       1
81       1
84       1
Name: AGE904, Length: 66, dtype: int64

44    8616
45    8449
43    8009
46    7861
42    7069
      ... 
20       6
78       1
19       1
81       1
84       1
Name: AGE905, Length: 62, dtype: int64

48    8694
47    8613
49    8263
46    7847
50    7536
45    6535
51    6330
44    5279
52    5188
43    4319
53    3781
42    3237
54    2769
41    2321
55    2053
40    1802
56    1396
39    1168
57    1046
38     805
58     798
0      667
59     649
60     505
37     491
61     395
62     379
36     297
63     292
64     241
65     209
66     172
35     165
68     116
67     107
69     105
34      86
74      80
70      78
71      77
32      75
73      66
75      58
33      57
72      56
76      41
31      17
30       8
28       3
29       2
27       2
78       1
81       1
84       1
77       1
Name: AGE906, dtype: int64

27    6763
26    6758
28    6278
25    6135
29    5841
      ... 
57       1
75       1
67       1
56       1
62       1
Name: AGE907, Length: 64, dtype: int64

38    6899
39    6466
37    6387
36    6083
40    5822
      ... 
12       1
89       1
90       1
91       1
87       1
Name: CHIL1, Length: 85, dtype: int64

40    10181
39     9871
41     9744
38     8837
42     7654
      ...  
66        1
5         1
65        1
71        1
69        1
Name: CHIL2, Length: 70, dtype: int64

21    9581
22    9515
20    8887
23    8398
19    7789
      ... 
86       1
72       1
94       1
79       1
89       1
Name: CHIL3, Length: 82, dtype: int64

11    10457
12    10058
10     9991
13     9025
9      8247
      ...  
94        2
87        2
68        1
99        1
96        1
Name: AGEC1, Length: 98, dtype: int64

20    6730
22    6390
21    6388
19    5926
23    5526
      ... 
64       2
66       1
75       1
88       1
69       1
Name: AGEC2, Length: 74, dtype: int64

19    8756
20    8649
21    7887
18    7634
22    6623
17    6240
23    5748
16    4902
24    4755
25    3800
15    3456
26    3083
27    2568
14    2503
28    2126
29    1754
13    1684
30    1481
12    1296
31    1142
32     966
11     860
33     802
0      779
34     683
10     528
35     498
9      488
36     408
8      385
7      344
37     337
1      314
6      242
38     227
5      196
39     196
40     163
2      158
4      140
3      120
41      69
42      65
44      55
43      47
45      30
46      29
47      21
48      20
49       6
50       5
63       3
99       3
58       2
51       2
53       1
60       1
Name: AGEC3, dtype: int64

10    8337
9     8175
11    7862
8     7681
7     7099
      ... 
60       3
99       2
61       2
56       1
58       1
Name: AGEC6, Length: 63, dtype: int64

3     9300
4     9250
5     8742
2     8334
6     8124
      ... 
69       2
77       1
78       1
90       1
99       1
Name: AGEC7, Length: 77, dtype: int64

15    11255
14    10762
16     9505
13     8740
17     7826
      ...  
75        1
47        1
72        1
66        1
49        1
Name: CHILC1, Length: 62, dtype: int64

33    12455
32    12272
34    10709
31    10339
35     8373
      ...  
60        3
54        2
52        2
63        2
80        1
Name: CHILC3, Length: 62, dtype: int64

21    11518
20    11228
22    10077
19    10002
18     8015
23     7762
17     6587
24     5261
16     4914
25     3495
15     3329
14     2470
26     2019
13     1575
27     1294
0      1185
12      912
28      728
11      555
29      424
10      363
9       263
30      256
8       170
31      162
32      109
7       103
33       79
6        68
34       51
5        47
4        44
50       43
2        21
35       21
3        18
36       17
99       14
40       12
1        11
37       10
38        9
42        8
39        6
43        6
41        6
49        2
60        2
67        2
57        1
46        1
75        1
61        1
44        1
52        1
55        1
Name: CHILC4, dtype: int64

15    12207
14    11813
16    11264
13     9905
17     8668
      ...  
98        1
65        1
92        1
95        1
69        1
Name: CHILC5, Length: 97, dtype: int64

21    3390
23    3300
24    3298
22    3287
26    3262
      ... 
89      18
78      18
99      12
96       5
98       2
Name: HHAGE1, Length: 99, dtype: int64

5     6799
6     6681
4     6603
7     6499
8     6218
      ... 
75       1
85       1
89       1
86       1
78       1
Name: HHAGE2, Length: 85, dtype: int64

19    3349
21    3305
20    3264
18    3256
22    3251
      ... 
87      14
99      11
95       7
96       2
97       2
Name: HHAGE3, Length: 99, dtype: int64

21    3997
17    3976
19    3950
18    3947
22    3918
      ... 
91       4
89       2
87       2
92       2
96       1
Name: HHN1, Length: 98, dtype: int64

34    6877
33    6829
32    6469
35    6327
31    5882
      ... 
81       4
87       3
83       2
77       1
3        1
Name: HHN2, Length: 85, dtype: int64

45    3061
46    3049
47    2983
43    2976
44    2964
      ... 
92       3
90       3
91       2
93       2
94       1
Name: HHN3, Length: 96, dtype: int64

25    4130
26    3994
27    3980
23    3964
28    3885
      ... 
86       2
79       1
93       1
81       1
82       1
Name: HHN4, Length: 86, dtype: int64

9     8221
10    8029
8     7855
11    7475
7     6837
      ... 
62       2
64       2
65       1
69       1
61       1
Name: HHN5, Length: 69, dtype: int64

2     19536
3     19311
4     13636
1     12538
5      8191
0      4963
6      4958
7      3131
8      1963
9      1414
10      984
11      819
12      588
13      523
14      455
15      309
16      285
17      243
18      201
19      163
20      159
21      123
22      110
23      101
24       91
25       85
26       73
27       44
28       43
29       41
30       40
31       32
32       21
33       15
35       14
34       13
38       10
44        7
36        7
40        6
43        6
37        5
99        4
41        4
50        4
67        2
39        2
46        2
53        1
51        1
54        1
45        1
42        1
Name: HHN6, dtype: int64

66    4254
65    4208
67    4041
64    3984
63    3954
      ... 
1        5
4        5
91       3
93       2
95       2
Name: MARR1, Length: 95, dtype: int64

10    9866
9     9693
8     9216
11    8762
7     7530
12    7258
13    6194
6     5848
14    4888
5     4138
15    4051
16    3171
17    2582
4     2467
18    2032
19    1550
3     1159
20    1078
21     813
0      714
22     577
23     404
2      358
24     307
25     166
26     103
27      74
1       66
28      53
29      24
33      21
31      20
30      19
32      17
38      11
36      10
34       8
50       6
99       5
35       5
37       4
43       2
40       2
57       1
52       1
44       1
47       1
56       1
51       1
67       1
53       1
Name: MARR2, dtype: int64

5     10748
6     10350
4     10315
7      9564
3      8595
8      8033
9      6709
10     5330
2      5255
11     4303
12     3089
13     2275
1      1614
14     1577
15     1250
0       939
16      929
17      762
18      611
19      511
20      393
21      344
23      242
22      225
24      209
25      172
27      118
26      113
28      102
29       92
30       78
36       47
33       47
32       45
34       41
31       41
35       34
43       31
40       21
39       21
42       19
37       14
45        9
38        9
44        8
41        7
46        6
49        6
47        5
48        4
62        3
54        3
73        2
56        2
53        2
50        2
67        1
52        1
51        1
59        1
Name: MARR3, dtype: int64

19    7373
18    7348
20    7253
21    6694
17    5901
      ... 
93       3
96       3
90       2
85       2
86       1
Name: MARR4, Length: 100, dtype: int64

179    1181
177    1162
180    1149
174    1140
178    1122
       ... 
406       1
438       1
391       1
456       1
403       1
Name: HHP1, Length: 393, dtype: int64

267    1079
260    1057
259    1056
264    1045
263    1040
       ... 
479       1
494       1
464       1
469       1
454       1
Name: HHP2, Length: 377, dtype: int64

99    9878
98    2604
77    1886
97    1883
75    1826
      ... 
5      207
9      196
7      196
10     188
6      180
Name: DW1, Length: 100, dtype: int64

99    6578
98    2531
71    1830
72    1772
74    1765
      ... 
7      273
5      270
10     267
6      261
13     253
Name: DW2, Length: 100, dtype: int64

0     35817
1     19369
2     10354
3      7018
4      4683
      ...  
68        1
67        1
63        1
99        1
62        1
Name: DW3, Length: 70, dtype: int64

0     16194
1      8331
2      4814
3      3788
4      3223
      ...  
89      160
91      155
83      152
90      136
93      132
Name: DW4, Length: 100, dtype: int64

0     24728
1      7401
2      4555
3      3608
4      3255
      ...  
94      124
79      122
92      119
90      115
89      104
Name: DW5, Length: 100, dtype: int64

0     34002
1      6766
2      4731
3      3722
4      3022
      ...  
90       90
91       86
88       80
99       72
92       67
Name: DW6, Length: 100, dtype: int64

0     66757
1      8319
2      5281
3      3030
4      2055
      ...  
74        1
96        1
91        1
94        1
89        1
Name: DW7, Length: 97, dtype: int64

0     74714
1      5827
2      4084
3      2289
4      1430
      ...  
51        1
56        1
94        1
67        1
86        1
Name: DW8, Length: 86, dtype: int64

0     83403
1      5681
2      1949
3       992
4       563
      ...  
70        1
75        1
94        1
72        1
80        1
Name: DW9, Length: 91, dtype: int64

0       985
675     262
550     211
875     187
425     180
       ... 
5729      1
5319      1
5472      1
3915      1
3347      1
Name: HV1, Length: 4434, dtype: int64

0       985
625     146
547     138
642     136
571     135
       ... 
177       1
4187      1
4580      1
5605      1
5680      1
Name: HV2, Length: 4623, dtype: int64

84    2670
82    2638
85    2544
81    2530
83    2491
      ... 
3      124
4      110
5      110
6       94
2       94
Name: HU1, Length: 100, dtype: int64

16    2669
18    2652
15    2542
19    2531
14    2484
      ... 
91     124
95     110
96     109
94      94
98      94
Name: HU2, Length: 100, dtype: int64

97    10020
96     9729
95     8668
98     8416
94     7464
      ...  
16        2
10        2
12        1
13        1
5         1
Name: HU3, Length: 93, dtype: int64

3     10064
4      9718
5      8675
2      8375
6      7461
      ...  
84        2
90        2
88        1
87        1
95        1
Name: HU4, Length: 94, dtype: int64

0     29521
3      4975
2      4574
4      4439
5      3935
      ...  
99       71
96       53
95       50
97       33
98       28
Name: HU5, Length: 100, dtype: int64

38    3619
36    3424
35    3419
39    3403
37    3381
      ... 
89       2
92       2
88       2
98       1
90       1
Name: HHD1, Length: 95, dtype: int64

77    3814
78    3610
76    3605
79    3598
75    3533
      ... 
6        6
2        3
5        3
1        1
3        1
Name: HHD2, Length: 100, dtype: int64

66    2868
67    2832
65    2824
64    2820
63    2779
      ... 
96       6
3        5
2        5
97       3
1        3
Name: HHD3, Length: 99, dtype: int64

28    3653
27    3609
29    3478
25    3424
26    3413
      ... 
83       2
85       2
93       1
87       1
86       1
Name: HHD4, Length: 90, dtype: int64

89    5809
90    5732
88    5627
91    5535
87    5250
      ... 
5        7
13       7
2        6
7        6
1        5
Name: HHD5, Length: 100, dtype: int64

11    5809
10    5732
12    5627
9     5535
13    5250
      ... 
88       8
95       7
87       7
98       6
93       6
Name: HHD6, Length: 100, dtype: int64

5     10981
6     10957
7     10401
4      9395
8      8651
      ...  
58        1
60        1
63        1
67        1
84        1
Name: HHD7, Length: 66, dtype: int64

4     13918
5     12940
3     11694
6     10812
7      8335
      ...  
58        1
60        1
52        1
64        1
83        1
Name: HHD9, Length: 64, dtype: int64

11    9155
12    8465
10    8457
13    7524
9     7020
      ... 
88       1
65       1
70       1
77       1
68       1
Name: HHD10, Length: 83, dtype: int64

15    4693
14    4684
16    4599
13    4566
17    4490
      ... 
73       2
85       2
76       1
80       1
84       1
Name: HHD11, Length: 82, dtype: int64

2     17311
3     16693
4     13541
5      9835
1      8250
      ...  
61        2
97        1
59        1
58        1
74        1
Name: HHD12, Length: 66, dtype: int64

19    6254
20    6104
21    6080
18    5870
22    5783
17    5539
16    5226
23    4884
15    4884
14    4326
24    4043
13    3759
12    3290
25    3059
11    2835
10    2469
0     2412
26    2160
9     2092
8     1782
27    1563
7     1530
6     1308
5     1211
4     1051
28    1034
1      978
3      945
2      861
29     694
30     446
31     280
32     168
33     118
34      51
35      44
36      30
37      29
38      23
39      21
41      16
43       4
46       4
40       4
71       3
45       3
52       2
44       2
42       2
48       2
47       1
75       1
Name: ETHC1, dtype: int64

59    4448
60    4387
58    4343
56    4172
55    4165
      ... 
87       7
89       6
91       3
93       2
92       1
Name: ETHC2, Length: 95, dtype: int64

12    4383
13    4336
15    4308
14    4225
16    4145
      ... 
80      13
85      12
97       5
99       4
98       1
Name: ETHC3, Length: 100, dtype: int64

0     38095
1     19138
2      8730
3      5187
4      3517
      ...  
75        2
99        2
96        1
86        1
81        1
Name: ETHC5, Length: 78, dtype: int64

0     74953
1      9089
2      3126
3      1838
4      1249
5       895
6       740
7       563
8       417
9       365
10      284
11      215
12      188
13      182
14      146
15      124
17       99
16       95
19       86
18       72
21       60
20       59
23       55
24       53
22       50
25       46
27       43
28       29
29       24
26       23
30       16
31       15
36       12
34       12
32       11
33        9
40        6
35        6
37        4
38        4
53        3
41        2
39        2
99        1
44        1
51        1
45        1
54        1
74        1
43        1
81        1
42        1
57        1
Name: ETHC6, dtype: int64

0     39919
1     13769
2      5680
3      3391
4      2371
      ...  
58      140
55      134
62      123
64      120
59      115
Name: HVP1, Length: 100, dtype: int64

0     23965
1     13011
2      6773
3      4216
4      3075
      ...  
61      187
72      187
58      171
73      170
65      168
Name: HVP2, Length: 100, dtype: int64

0     8560
1     6520
99    6164
2     4937
3     3976
      ... 
77     303
62     302
70     297
72     297
66     282
Name: HVP3, Length: 100, dtype: int64

99    12148
98     3792
0      2823
97     2326
1      1811
      ...  
63      440
57      439
54      424
69      413
65      412
Name: HVP4, Length: 100, dtype: int64

99    26659
98     5650
97     3321
96     2679
95     2106
      ...  
6       212
5       208
3       191
2       167
1       106
Name: HVP5, Length: 100, dtype: int64

0     60986
1     10168
2      3653
3      2230
4      1565
      ...  
62       70
46       70
86       67
68       65
63       49
Name: HVP6, Length: 100, dtype: int64

1     18487
0     16963
2     13924
3      9691
4      6732
      ...  
92        1
80        1
91        1
96        1
93        1
Name: HUR1, Length: 96, dtype: int64

41    2114
40    2091
39    2068
42    2065
36    2060
      ... 
97     286
1      278
96     276
98     220
99     163
Name: HUR2, Length: 100, dtype: int64

49    5175
52    5124
51    4980
50    4961
53    4687
      ... 
10       2
14       2
85       2
9        1
6        1
Name: RHP1, Length: 82, dtype: int64

53    5681
52    5573
54    5509
55    4918
51    4858
      ... 
88       3
12       3
18       2
89       1
90       1
Name: RHP2, Length: 81, dtype: int64

0     17767
1      9897
2      6320
3      5162
4      4634
      ...  
99        8
92        7
90        4
93        3
95        2
Name: HUPA1, Length: 96, dtype: int64

0     44422
1      6831
2      4362
3      3324
4      2617
      ...  
98       46
90       39
89       38
88       36
92       22
Name: HUPA2, Length: 100, dtype: int64

0     48227
1      5028
2      2402
3      1818
4      1703
      ...  
96       16
94       16
88       15
92       15
95       12
Name: HUPA3, Length: 100, dtype: int64

8     6111
9     5902
7     5902
10    5827
6     5641
      ... 
96       1
70       1
78       1
91       1
65       1
Name: HUPA4, Length: 97, dtype: int64

0     25381
1     13592
2      9143
3      7160
4      5772
      ...  
80        1
67        1
69        1
86        1
84        1
Name: HUPA5, Length: 77, dtype: int64

0     36406
1      6863
2      4742
3      3918
4      3005
      ...  
93       39
90       37
99       36
96       32
98       28
Name: HUPA6, Length: 100, dtype: int64

0     15910
1      7380
2      4655
3      3296
4      2615
      ...  
66      394
51      392
49      382
68      379
62      368
Name: RP1, Length: 100, dtype: int64

0     6351
1     4087
2     3250
3     2659
4     2238
      ... 
48     483
51     467
49     461
52     450
66     442
Name: RP2, Length: 100, dtype: int64

99    3826
97    3287
98    3088
96    2974
95    2647
      ... 
37     527
41     526
34     497
49     490
46     469
Name: RP3, Length: 100, dtype: int64

99    7684
98    5918
97    5305
96    4593
95    3977
      ... 
6       82
4       46
3       37
2       17
1        5
Name: RP4, Length: 100, dtype: int64

0.0       21333
4480.0     4606
1600.0     4059
2160.0     2586
520.0      1685
          ...  
9140.0        1
3200.0        1
9280.0        1
743.0         1
8480.0        1
Name: MSA, Length: 298, dtype: int64

13.0     7296
51.0     4622
65.0     3765
57.0     2836
105.0    2617
         ... 
651.0       1
103.0       1
601.0       1
161.0       1
147.0       1
Name: ADI, Length: 204, dtype: int64

803.0    7296
602.0    4632
807.0    3765
505.0    2839
819.0    2588
         ... 
569.0       1
554.0       1
584.0       1
552.0       1
516.0       1
Name: DMA, Length: 206, dtype: int64

0       746
263     426
313     411
213     409
258     402
       ... 
1339      1
1198      1
1115      1
1130      1
1224      1
Name: IC1, Length: 1134, dtype: int64

0       796
288     467
313     390
263     376
315     376
       ... 
1356      1
1265      1
1126      1
1428      1
1345      1
Name: IC2, Length: 1213, dtype: int64

0       746
271     393
278     378
280     377
279     373
       ... 
910       1
87        1
1119      1
1117      1
1096      1
Name: IC3, Length: 1091, dtype: int64

0       796
344     370
320     350
325     347
346     346
       ... 
1243      1
1316      1
1282      1
58        1
1029      1
Name: IC4, Length: 1156, dtype: int64

0        728
22875     41
13103     27
12577     26
10931     26
        ... 
24046      1
21517      1
25704      1
31007      1
19081      1
Name: IC5, Length: 21514, dtype: int64

11    2887
10    2848
13    2789
6     2739
8     2730
      ... 
93       3
89       3
92       2
95       2
96       1
Name: IC6, Length: 99, dtype: int64

19    5125
20    5110
21    4978
18    4816
22    4468
      ... 
69       1
54       1
82       1
57       1
64       1
Name: IC7, Length: 65, dtype: int64

16    7288
17    6992
18    6864
15    6831
14    6000
19    5953
13    5278
20    4988
12    4531
21    4091
11    3860
10    3329
22    3207
9     2857
23    2722
8     2595
7     2120
24    1971
25    1703
6     1664
0     1450
26    1306
5     1246
4     1113
27     975
3      743
28     711
29     541
2      491
30     423
31     325
32     225
1      203
33     182
34     118
35      89
36      80
37      47
38      34
39      24
41      19
40      15
42      15
43       9
44       9
46       8
45       8
49       5
99       5
47       4
64       3
53       2
67       1
56       1
50       1
63       1
60       1
82       1
59       1
48       1
Name: IC8, dtype: int64

18    5431
19    5366
17    5361
16    5235
15    5082
      ... 
72       1
59       1
69       1
52       1
67       1
Name: IC9, Length: 66, dtype: int64

8     4235
9     4218
10    4217
7     4161
11    3914
      ... 
63       2
64       2
59       2
68       1
65       1
Name: IC10, Length: 67, dtype: int64

0     52038
1     16353
2      8886
3      4713
4      2986
      ...  
59        1
67        1
82        1
70        1
63        1
Name: IC14, Length: 71, dtype: int64

0     5447
4     4471
3     4345
5     4324
6     4195
      ... 
91       1
95       1
96       1
86       1
89       1
Name: IC15, Length: 96, dtype: int64

20    3840
19    3839
21    3838
16    3790
18    3761
      ... 
75       1
71       1
72       1
77       1
81       1
Name: IC16, Length: 78, dtype: int64

18    5731
17    5599
19    5510
16    5236
20    5157
      ... 
75       1
68       1
69       1
63       1
82       1
Name: IC17, Length: 72, dtype: int64

20    4906
21    4894
22    4862
19    4755
23    4686
      ... 
62       2
82       2
68       1
69       1
64       1
Name: IC18, Length: 71, dtype: int64

12    3567
13    3519
11    3493
10    3407
9     3344
      ... 
84       1
68       1
67       1
81       1
66       1
Name: IC19, Length: 73, dtype: int64

0     15992
2      9835
3      8843
4      7261
5      6047
1      5989
6      5150
7      4283
8      3609
9      3409
10     3169
11     2692
12     2382
13     2236
14     1842
15     1786
16     1561
17     1454
18     1204
19     1007
20      977
21      753
22      676
23      631
24      515
25      415
26      317
27      239
28      206
29      156
30      118
31      107
32      105
33       73
36       52
35       48
34       35
37       25
38       20
39       12
40        9
42        9
55        5
41        5
46        4
43        4
45        3
99        2
54        2
48        2
44        1
52        1
50        1
62        1
Name: IC20, dtype: int64

0     52574
1     13034
2      8393
3      5078
4      3423
      ...  
69        2
53        2
73        1
87        1
72        1
Name: IC23, Length: 75, dtype: int64

29    2979
24    2959
26    2940
28    2913
23    2861
      ... 
96       9
95       5
94       4
97       3
98       2
Name: HHAS1, Length: 100, dtype: int64

2     11011
3     10468
0     10022
4      9339
5      7963
      ...  
62        1
56        1
77        1
86        1
68        1
Name: HHAS2, Length: 72, dtype: int64

44    2243
43    2195
42    2189
49    2128
46    2127
      ... 
91      17
93      16
95       9
97       6
96       1
Name: HHAS3, Length: 99, dtype: int64

3     6631
4     6537
5     6276
2     6222
6     5281
      ... 
89       1
88       1
81       1
83       1
80       1
Name: HHAS4, Length: 93, dtype: int64

41    2654
44    2637
43    2633
46    2628
40    2578
      ... 
7        7
5        4
3        2
1        1
4        1
Name: MC1, Length: 99, dtype: int64

59    2662
56    2637
57    2623
54    2619
52    2577
      ... 
94      10
93       7
95       4
97       2
96       1
Name: MC2, Length: 99, dtype: int64

5     6293
6     5913
4     5708
7     5661
8     5428
      ... 
94       3
91       2
80       1
92       1
93       1
Name: MC3, Length: 98, dtype: int64

81    4580
82    4418
80    4392
83    4280
79    4248
      ... 
6        4
3        3
13       2
4        2
2        2
Name: TPE1, Length: 99, dtype: int64

11    6784
12    6555
13    6420
10    6389
9     5814
      ... 
58       1
84       1
66       1
55       1
81       1
Name: TPE2, Length: 70, dtype: int64

0     50392
1     14924
2      8125
3      4950
4      3237
      ...  
75        1
82        1
71        1
73        1
63        1
Name: TPE3, Length: 72, dtype: int64

0     56120
1     14269
2      7424
3      4253
4      2813
      ...  
57        1
75        1
82        1
53        1
63        1
Name: TPE4, Length: 67, dtype: int64

2     15779
0     15579
1     14068
3     12590
4      9551
      ...  
69        1
65        1
96        1
76        1
85        1
Name: TPE8, Length: 83, dtype: int64

2     17849
0     16780
3     14520
1     14101
4      9957
5      6580
6      4430
7      2952
8      2090
9      1381
10      992
11      755
12      585
13      384
14      364
15      283
16      202
17      164
18      156
19      121
20      113
21       68
24       58
22       55
26       44
25       41
23       39
27       35
28       25
29       17
31       15
32       12
39       12
34       11
35       10
33       10
99        9
30        9
36        9
40        7
43        6
45        5
58        3
49        3
46        2
41        2
55        2
37        2
54        2
38        2
44        1
53        1
52        1
59        1
42        1
56        1
Name: TPE9, dtype: int64

0     44701
1     22045
2     11287
3      5340
4      2803
      ...  
79        2
99        1
83        1
73        1
77        1
Name: PEC1, Length: 82, dtype: int64

0     7118
3     5090
2     5073
4     4487
5     4274
      ... 
94       9
96       7
95       7
91       6
97       2
Name: PEC2, Length: 99, dtype: int64

18    7101
19    6979
21    6611
20    6571
17    6481
      ... 
61       2
58       1
73       1
75       1
64       1
Name: TPE10, Length: 67, dtype: int64

23    6366
22    6339
21    6140
24    6130
20    5790
      ... 
61       1
75       1
64       1
71       1
58       1
Name: TPE11, Length: 62, dtype: int64

2     12334
3     12239
0     11418
4     10440
5      8104
1      7597
6      6466
7      4992
8      3929
9      3035
10     2606
11     2094
12     1654
13     1349
14     1187
15     1024
16      813
17      614
18      526
19      493
20      361
21      298
22      213
23      210
24      197
27      134
25      132
26      117
28       99
29       85
32       66
33       63
30       55
31       53
34       43
37       34
35       26
38       23
36       22
44       22
39       19
40       17
50       15
41       11
43       11
48        8
47        7
42        6
46        4
52        3
57        3
54        2
85        1
58        1
99        1
51        1
45        1
67        1
49        1
Name: TPE12, dtype: int64

72    2695
71    2668
70    2665
74    2640
69    2629
      ... 
5        7
3        7
98       6
2        5
1        1
Name: TPE13, Length: 100, dtype: int64

66    3615
68    3607
67    3601
65    3569
70    3448
      ... 
2        8
97       7
98       7
3        5
1        1
Name: LFC1, Length: 100, dtype: int64

78    3724
76    3687
80    3670
79    3655
77    3621
      ... 
5       10
2        5
4        5
1        2
3        2
Name: LFC2, Length: 100, dtype: int64

58    3387
57    3304
59    3302
56    3187
55    3147
      ... 
95       8
94       6
96       5
97       5
98       1
Name: LFC3, Length: 99, dtype: int64

75    3464
76    3436
74    3313
77    3309
73    3292
      ... 
5       20
4        5
2        2
1        2
3        2
Name: LFC4, Length: 100, dtype: int64

55    3190
53    3171
54    3145
56    3099
57    3057
      ... 
93      11
96       5
94       4
95       3
97       1
Name: LFC5, Length: 98, dtype: int64

68    3152
69    3094
65    3076
67    2972
70    2955
      ... 
7        5
6        4
5        2
4        1
3        1
Name: LFC6, Length: 98, dtype: int64

55    2704
52    2659
50    2638
53    2600
54    2535
      ... 
2       20
96      17
1        8
97       8
98       1
Name: LFC7, Length: 100, dtype: int64

99    28256
0      9181
81     1355
76     1332
88     1312
      ...  
7        21
5         8
6         7
4         4
3         2
Name: LFC8, Length: 98, dtype: int64

99    46942
0     30368
89      597
92      551
93      548
      ...  
9         5
5         4
10        3
12        1
4         1
Name: LFC9, Length: 96, dtype: int64

0     27990
2      6627
3      6510
4      6019
5      5195
      ...  
89        1
91        1
83        1
72        1
86        1
Name: LFC10, Length: 91, dtype: int64

11    5982
10    5977
9     5819
8     5482
12    5397
      ... 
61       2
78       1
68       1
67       1
77       1
Name: OCC1, Length: 70, dtype: int64

8     6629
9     6550
10    6108
7     5995
11    5389
      ... 
53       2
67       1
79       1
55       1
59       1
Name: OCC2, Length: 61, dtype: int64

11    8027
12    7909
10    7858
13    7176
9     6968
14    6544
8     6041
15    5746
7     4771
16    4700
17    3969
6     3444
18    3296
19    2490
5     2437
20    2040
0     1657
4     1615
21    1578
22    1344
23     992
3      909
24     780
25     537
2      494
26     406
27     315
28     243
29     220
31     133
1      121
30     117
33      72
32      67
36      39
34      35
35      35
38      25
37      21
40      21
43      13
42      10
39      10
44       8
45       6
41       5
99       5
54       4
47       4
61       4
49       3
48       3
50       3
65       3
51       2
57       1
55       1
67       1
56       1
66       1
Name: OCC4, dtype: int64

14    7953
15    7428
13    7302
16    6884
17    6805
      ... 
55       1
49       1
61       1
60       1
57       1
Name: OCC5, Length: 63, dtype: int64

9     7666
10    7647
8     7431
7     7270
11    6877
      ... 
64       1
69       1
68       1
59       1
56       1
Name: OCC8, Length: 63, dtype: int64

0     34147
1     19128
2     12104
3      7650
4      4952
      ...  
62        2
58        1
48        1
73        1
70        1
Name: OCC9, Length: 69, dtype: int64

12    6764
11    6437
10    6390
13    6373
9     6016
      ... 
47       1
65       1
67       1
43       1
64       1
Name: OCC10, Length: 61, dtype: int64

2     10131
3      9859
0      9756
4      8300
1      7719
      ...  
57        1
65        1
54        1
52        1
70        1
Name: OCC11, Length: 61, dtype: int64

0     30444
1     19526
2     13107
3      8096
4      5305
      ...  
66        2
75        1
83        1
73        1
70        1
Name: EIC1, Length: 75, dtype: int64

0     73934
1      9837
2      4193
3      2110
4      1373
5       816
6       594
7       466
8       345
9       209
10      184
11      173
12      162
13      138
14      105
15       87
16       78
17       65
18       50
19       45
20       36
22       32
23       31
21       29
24       27
26       23
25       16
27       16
28       15
34       14
30       10
29        8
33        7
36        6
31        5
37        5
32        4
43        3
38        3
39        3
41        3
61        2
55        2
48        2
40        2
35        2
56        2
51        2
47        1
65        1
52        1
53        1
45        1
58        1
Name: EIC2, dtype: int64

5     11183
6     10871
4     10298
7      9517
3      8278
8      8064
9      6209
2      5836
0      5022
10     4539
11     3493
1      2604
12     2568
13     1796
14     1274
15      992
16      709
17      568
18      378
19      279
20      200
21      148
22      103
23       70
24       55
25       43
27       33
26       29
29       28
31       15
28       14
32        7
30        7
34        6
33        6
99        6
38        5
36        5
39        4
35        3
45        2
42        2
50        2
60        2
46        1
56        1
77        1
43        1
48        1
37        1
40        1
Name: EIC3, dtype: int64

11    4154
12    4089
10    4085
13    4071
9     4041
      ... 
69       4
72       3
70       2
83       1
79       1
Name: EIC4, Length: 73, dtype: int64

16    7622
15    7360
17    7248
14    6951
18    6907
      ... 
54       1
59       1
63       1
85       1
66       1
Name: EIC8, Length: 71, dtype: int64

7     10881
8     10471
6     10400
5      8745
9      8708
10     6903
4      6601
11     5440
3      4429
12     4056
13     3172
0      2811
2      2623
14     2240
15     1646
16     1287
17      942
1       927
18      697
19      502
20      428
21      277
22      227
23      190
24      120
25       88
26       76
27       68
28       64
29       41
30       29
32       22
31       22
33       20
34       17
35       13
42       12
40       11
36        8
39        8
38        7
44        7
41        6
46        6
37        6
48        5
99        4
57        4
51        2
50        2
55        2
49        2
60        2
64        1
68        1
75        1
Name: EIC13, dtype: int64

7     10588
6     10390
8      9180
5      9133
9      7493
      ...  
99        2
63        1
69        1
70        1
77        1
Name: EIC14, Length: 69, dtype: int64

5     12174
4     11613
6     10923
7      9254
3      9063
8      7258
2      5633
9      5629
0      4279
10     4075
11     3142
12     2356
1      2229
13     1780
14     1335
15      963
16      801
17      595
18      459
19      394
20      297
21      225
22      164
23      144
24       99
25       78
27       57
26       47
30       44
29       30
28       27
31       15
32       11
33       11
34       10
38        9
37        7
45        6
44        6
52        4
99        4
50        4
59        3
54        3
35        3
46        2
48        2
76        2
68        2
51        1
39        1
82        1
65        1
42        1
55        1
61        1
36        1
60        1
Name: EIC15, dtype: int64

3     14777
2     13679
4     12747
5      9536
0      8876
1      7946
6      7255
7      5293
8      3597
9      2570
10     1952
11     1408
12     1121
13      974
14      668
15      562
16      427
17      332
18      262
19      221
20      203
21      153
22      121
23      108
24      101
25       84
26       68
27       40
29       39
28       30
30       24
33       19
31       16
34       11
35       10
36       10
37        8
32        7
43        4
44        4
39        4
55        2
42        2
38        2
54        1
99        1
57        1
59        1
47        1
52        1
53        1
Name: EIC16, dtype: int64

5     11345
6     11295
7     10259
4      9372
8      8764
9      6942
3      6790
10     5219
2      4278
11     4015
0      3728
12     2956
13     2193
1      1727
14     1620
15     1111
16      829
17      688
18      497
19      328
20      290
21      237
22      155
23      137
24      107
25       91
26       76
27       53
28       28
29       25
32       16
31       16
36       13
34       13
30       12
35        9
33        8
59        8
39        4
43        4
37        4
40        2
50        2
41        2
44        2
99        2
49        1
42        1
57        1
64        1
45        1
55        1
38        1
53        1
Name: OEDC1, dtype: int64

2     14611
3     13161
0     11851
1     10506
4     10125
      ...  
55        2
61        1
99        1
54        1
53        1
Name: OEDC2, Length: 63, dtype: int64

2     18877
0     18495
1     17332
3     12642
4      8007
5      5204
6      3591
7      2463
8      1891
9      1287
10     1010
11      746
12      561
13      487
14      390
15      334
16      233
17      220
19      194
18      167
20      130
22      122
21      119
23       88
24       71
26       62
25       59
27       58
29       48
30       44
33       38
28       37
31       29
35       29
34       28
39       27
32       24
38       21
37       21
36       20
40       17
41       11
43        9
44        8
51        6
46        5
60        4
45        3
49        3
42        3
56        2
50        1
99        1
64        1
Name: OEDC3, dtype: int64

6     9508
5     9241
7     8887
4     8012
8     7804
      ... 
57       1
55       1
78       1
73       1
71       1
Name: OEDC4, Length: 66, dtype: int64

75    4134
74    4024
73    4022
72    3979
76    3951
      ... 
11       1
16       1
17       1
9        1
13       1
Name: OEDC5, Length: 91, dtype: int64

4     12487
5     12252
6     10831
3     10426
7      8627
      ...  
81        1
51        1
50        1
74        1
82        1
Name: OEDC6, Length: 63, dtype: int64

120    47262
140    10326
160     6467
130     1099
126     1037
       ...  
91        18
92        12
163       10
164        3
166        2
Name: EC1, Length: 79, dtype: int64

2     8083
3     7770
1     7317
4     6953
5     6185
      ... 
69       2
68       1
76       1
74       1
70       1
Name: EC2, Length: 76, dtype: int64

10    4935
9     4891
8     4698
11    4682
12    4599
      ... 
82       1
56       1
57       1
61       1
54       1
Name: EC3, Length: 63, dtype: int64

31    3897
32    3840
30    3839
29    3785
34    3738
      ... 
90       1
88       1
78       1
74       1
68       1
Name: EC4, Length: 80, dtype: int64

23    5519
21    5498
22    5365
20    5350
19    5101
      ... 
83       1
73       1
66       1
55       1
58       1
Name: EC5, Length: 69, dtype: int64

7     5515
6     5361
8     5210
5     5125
9     4641
      ... 
58       4
67       4
99       4
56       1
61       1
Name: EC7, Length: 61, dtype: int64

3     10822
2     10012
4      9762
5      7797
1      6296
      ...  
55        1
70        1
71        1
64        1
68        1
Name: EC8, Length: 71, dtype: int64

2     16777
1     15710
3     14333
4     11418
0      8810
      ...  
67        1
56        1
88        1
69        1
82        1
Name: SEC1, Length: 83, dtype: int64

23    6389
22    6349
24    6289
25    6046
21    5815
      ... 
78       2
76       2
98       1
92       1
82       1
Name: SEC2, Length: 97, dtype: int64

19    7323
18    7161
20    7083
17    6326
21    6181
16    5782
22    5396
15    5070
14    4333
23    4147
13    4116
12    3364
24    3280
11    2823
25    2549
10    2425
9     2183
26    1880
8     1664
0     1491
7     1322
27    1293
6     1181
28     956
5      895
4      780
29     728
3      728
2      546
30     500
1      432
31     390
32     289
33     173
34     131
35     110
37      63
36      47
38      35
39      28
40      22
41      16
44       8
42       7
51       4
48       4
43       3
47       3
72       3
45       2
62       1
49       1
53       1
60       1
Name: SEC4, dtype: int64

5     12026
4     11946
6     10967
3      9960
7      9816
      ...  
95        1
91        1
87        1
96        1
85        1
Name: SEC5, Length: 97, dtype: int64

0     77974
1     10104
2      2635
3      1199
4       702
      ...  
69        1
54        1
75        1
51        1
86        1
Name: AFC1, Length: 81, dtype: int64

0     73236
1      9538
2      4457
3      2140
4      1169
      ...  
68        1
72        1
63        1
99        1
98        1
Name: AFC2, Length: 96, dtype: int64

15    8841
14    8390
16    8378
17    7455
13    7360
18    6424
12    6215
19    5364
11    4867
20    4489
21    3652
10    3552
22    2737
9     2682
23    2023
8     1905
24    1560
7     1415
25    1162
6      983
26     906
0      830
27     665
5      630
28     451
4      449
29     298
3      295
30     267
2      192
31     192
32     157
33     113
1       78
34      68
35      40
38      36
36      35
37      33
40      23
39      22
47       7
42       6
50       6
46       5
41       5
53       3
44       3
45       2
43       2
64       1
49       1
55       1
52       1
63       1
71       1
99       1
Name: AFC4, dtype: int64

30    4513
31    4463
32    4305
33    4227
29    4176
      ... 
81       5
84       4
83       3
90       1
80       1
Name: AFC5, Length: 88, dtype: int64

27    3103
28    3065
0     3018
32    2994
29    2978
      ... 
87       5
89       4
93       2
91       1
92       1
Name: VC1, Length: 95, dtype: int64

0     5898
17    5101
18    5011
16    4995
19    4855
      ... 
79       1
84       1
82       1
85       1
86       1
Name: VC2, Length: 89, dtype: int64

0     2780
32    2572
33    2467
28    2452
36    2442
      ... 
96      27
92       8
95       7
98       6
97       5
Name: VC3, Length: 100, dtype: int64

0     16075
8      5094
7      4786
9      4663
6      4663
      ...  
89        2
94        2
93        2
96        1
91        1
Name: VC4, Length: 97, dtype: int64

0     55895
1     29511
2      5921
3      1747
4       717
      ...  
52        1
40        1
63        1
59        1
83        1
Name: ANC1, Length: 68, dtype: int64

4     10518
5     10300
3      9981
2      9071
6      9003
1      7888
7      7488
0      7024
8      6109
9      4646
10     3488
11     2535
12     1704
13     1361
14      977
15      705
16      523
17      398
18      333
19      241
20      180
21      157
22      129
23       96
25       67
24       59
26       54
27       47
28       31
29       28
31       20
30       18
33       16
32       14
35       11
34       10
38        7
39        6
37        5
41        5
47        4
40        3
49        3
61        2
42        2
48        2
99        2
50        2
43        2
51        1
46        1
44        1
45        1
57        1
Name: ANC2, dtype: int64

7     7390
8     7267
6     7010
5     6455
9     6373
      ... 
75       2
83       1
92       1
76       1
77       1
Name: ANC4, Length: 82, dtype: int64

0     62950
1     19244
2      5660
3      2630
4      1510
5       959
6       590
7       403
8       320
9       199
10      131
11      112
14       62
13       60
12       59
15       57
16       40
18       36
20       27
22       26
23       21
24       20
17       20
19       14
26       14
28       13
21       10
31        9
30        8
25        8
32        8
35        7
27        6
29        6
38        6
44        5
47        4
37        4
42        3
39        3
41        3
54        2
55        2
36        1
48        1
49        1
57        1
52        1
43        1
51        1
68        1
50        1
Name: ANC9, dtype: int64

0     46848
1     25546
2      9615
3      4356
4      2581
5      1579
6      1117
7       741
8       556
9       462
10      327
11      264
12      198
13      164
14      129
15      103
16       86
17       68
18       67
19       56
20       50
21       44
23       37
24       37
22       36
25       23
26       20
31       16
27       16
32       16
30       15
28       11
40        9
37        8
35        8
33        8
38        8
36        6
29        6
44        5
39        5
47        4
42        4
65        3
41        3
57        3
71        2
50        2
52        1
34        1
74        1
99        1
55        1
45        1
53        1
46        1
58        1
63        1
48        1
43        1
Name: ANC10, dtype: int64

0     15983
1     15847
2     10241
3      8171
4      6320
      ...  
83        3
90        2
85        2
81        2
99        1
Name: POBC1, Length: 88, dtype: int64

80    1769
79    1728
81    1712
82    1652
76    1629
      ... 
4      112
96     105
97      51
99      31
98      23
Name: POBC2, Length: 100, dtype: int64

98    9921
97    9830
99    8794
96    7976
95    6776
      ... 
6       27
4       24
3       19
2       13
1        5
Name: LSC1, Length: 100, dtype: int64

1     23054
0     18868
2     13228
3      7668
4      5246
      ...  
96       18
98       14
95       13
97       11
99        9
Name: LSC2, Length: 100, dtype: int64

0     56940
1     13995
2      6994
3      4233
4      2780
      ...  
54        2
80        1
68        1
65        1
99        1
Name: LSC3, Length: 71, dtype: int64

1     21123
2     17304
0     13338
3     12156
4      8786
      ...  
76        1
66        1
58        1
77        1
63        1
Name: LSC4, Length: 79, dtype: int64

99    21096
98     9549
97     8461
96     7763
95     6476
      ...  
18        3
11        2
21        2
9         1
12        1
Name: VOC1, Length: 93, dtype: int64

62    2192
64    2178
61    2164
67    2159
65    2147
      ... 
3       59
98      46
4       44
2       30
1       30
Name: VOC2, Length: 100, dtype: int64

19    3688
18    3609
16    3581
17    3515
20    3496
      ... 
75       2
87       1
69       1
70       1
79       1
Name: VOC3, Length: 78, dtype: int64

51    4893
16    4273
17    4037
18    3911
19    3744
20    3533
15    3453
14    2971
21    2840
25    2743
26    2649
23    2502
22    2451
24    2436
13    2342
10    2246
27    2171
9     2089
11    2059
35    2016
34    1951
28    1923
36    1918
12    1867
33    1777
32    1743
29    1717
8     1682
5     1639
31    1627
6     1586
30    1552
37    1504
7     1244
38    1242
4     1130
39    1066
40     850
41     826
44     814
43     812
45     802
42     781
0      745
46     602
47     591
48     476
50     411
49     398
3      271
2      159
52     127
1       88
Name: HC2, dtype: int64

0     49787
2      9301
1      8993
3      6631
4      4752
      ...  
74        1
84        1
57        1
94        1
96        1
Name: HC3, Length: 83, dtype: int64

0     26623
2      4281
3      4058
4      3634
5      3350
      ...  
81       36
83       30
90       29
94       27
96       25
Name: HC4, Length: 100, dtype: int64

0     17985
3      2556
2      2527
4      2397
5      2181
      ...  
84      149
93      149
87      146
91      139
97      121
Name: HC5, Length: 100, dtype: int64

0     8635
99    4613
3     1389
5     1284
4     1273
      ... 
17     671
90     655
85     636
78     634
1      392
Name: HC6, Length: 100, dtype: int64

99    11290
0      3957
98     2415
97     2067
96     1875
      ...  
24      472
25      467
21      464
2       453
1        62
Name: HC7, Length: 100, dtype: int64

0     10066
99     2880
2      2410
1      2359
3      2073
      ...  
76      471
75      468
66      468
79      465
98      457
Name: HC8, Length: 100, dtype: int64

0     69702
1      3926
2      2868
3      2300
4      1828
      ...  
77        2
84        2
78        2
80        1
90        1
Name: HC9, Length: 87, dtype: int64

0     69240
1      6257
2      4637
3      3325
4      2574
5      1866
6      1400
7      1016
8       901
9       743
10      546
11      449
12      382
13      327
14      242
15      224
16      194
18      129
17      126
19      120
22       84
20       77
21       69
23       52
25       44
26       35
24       31
27       25
34       24
30       23
29       18
33       13
28       12
35       12
32       11
36        9
31        7
38        7
42        5
48        4
46        2
60        2
52        2
50        2
43        2
54        2
39        2
56        1
45        1
41        1
62        1
47        1
51        1
Name: HC10, dtype: int64

0     9456
1     3464
99    2653
2     2224
94    1872
      ... 
24     482
21     480
35     467
18     443
17     416
Name: HC11, Length: 100, dtype: int64

0     35743
1     12192
2      9021
3      5248
4      3635
      ...  
81        2
89        1
87        1
88        1
91        1
Name: HC12, Length: 92, dtype: int64

0     5607
3     3077
2     3024
5     2898
4     2863
      ... 
81     311
86     308
76     299
78     289
83     285
Name: HC13, Length: 100, dtype: int64

0     52160
1      8143
2      5678
3      3861
4      2973
      ...  
77        1
85        1
83        1
82        1
99        1
Name: HC14, Length: 98, dtype: int64

0     42265
1     10201
2      7852
3      5273
4      3602
      ...  
99        1
87        1
91        1
94        1
84        1
Name: HC16, Length: 92, dtype: int64

99    49918
98     3988
97     2545
0      2467
96     1837
      ...  
16      172
21      171
11      164
14      160
20      158
Name: HC17, Length: 100, dtype: int64

0     46478
1      5254
2      3778
3      2496
4      1856
      ...  
87      162
72      160
63      157
75      157
83      131
Name: HC18, Length: 100, dtype: int64

99    35462
98     5007
97     3445
0      3408
96     2357
      ...  
20      296
44      296
35      288
49      284
39      279
Name: HC19, Length: 100, dtype: int64

99    77878
98     7032
97     3734
96     1870
95     1159
      ...  
11        1
43        1
27        1
18        1
8         1
Name: HC20, Length: 61, dtype: int64

99    39864
98     9435
97     7319
96     5702
95     4972
      ...  
30        1
32        1
33        1
4         1
21        1
Name: HC21, Length: 77, dtype: int64

6     4954
5     4080
15    4078
11    3830
9     3741
10    3558
23    3476
22    3427
24    3420
20    3393
17    3376
16    3334
21    3280
14    3278
13    3228
19    3206
12    3050
18    3005
25    2980
27    2910
26    2891
29    2872
28    2866
30    2811
31    2640
32    2404
8     2305
7     2059
33    1884
34    1011
4      572
35     406
3      229
36     204
37     105
38      86
39      61
40      49
41      34
45      30
42      26
43      25
46      21
44      21
2       17
47      10
49       7
48       5
50       5
51       4
55       3
53       3
56       2
1        2
54       2
52       1
58       1
57       1
61       1
Name: CARDPROM, dtype: int64

13     2716
14     2511
15     1915
24     1824
25     1752
       ... 
167       1
157       1
163       1
170       1
158       1
Name: NUMPROM, Length: 165, dtype: int64

13    24581
12    18418
11    10886
10    10001
14     9462
      ...  
67        1
55        1
78        1
56        1
57        1
Name: NUMPRM12, Length: 62, dtype: int64

20.00      4201
25.00      3694
15.00      3391
30.00      2569
40.00      1595
           ... 
26.45         1
305.07        1
126.95        1
148.03        1
3775.00       1
Name: RAMNTALL, Length: 2094, dtype: int64

1      9996
2      7716
3      7178
4      7010
5      6282
       ... 
116       1
74        1
85        1
86        1
82        1
Name: NGIFTALL, Length: 89, dtype: int64

5.00     34516
3.00     17731
10.00    14660
15.00     5730
20.00     5529
         ...  
19.58        1
38.65        1
0.32         1
6.25         1
15.75        1
Name: MINRAMNT, Length: 191, dtype: int64

9602    3038
9601    3015
9512    2358
9509    2331
9510    2322
        ... 
8505       1
8512       1
8305       1
8407       1
8306       1
Name: MINRDATE, Length: 146, dtype: int64

15.00     18392
20.00     18279
25.00     12254
10.00      9384
16.00      3802
          ...  
184.00        1
375.00        1
71.93         1
74.00         1
18.14         1
Name: MAXRAMNT, Length: 275, dtype: int64

9512    10551
9601     6804
9509     6103
9602     5871
9504     5215
        ...  
8601        1
8410        1
8403        1
8504        1
8311        1
Name: MAXRDATE, Length: 150, dtype: int64

15.00     17752
20.00     15819
10.00     14237
25.00     10135
5.00       4937
          ...  
34.95         1
14.70         1
162.00        1
92.00         1
104.00        1
Name: LASTGIFT, Length: 231, dtype: int64

9501    2952
9401    2765
9310    2640
9410    2449
9601    2402
        ... 
7403       1
8303       1
7908       1
7408       1
8602       1
Name: FISTDATE, Length: 177, dtype: int64

9504.0    2251
9412.0    1968
8703.0    1956
9512.0    1866
8612.0    1687
          ... 
7711.0       1
8407.0       1
7211.0       1
7810.0       1
8412.0       1
Name: NEXTDATE, Length: 188, dtype: int64

5.0       8570
4.0       8477
3.0       8474
6.0       6642
2.0       6264
          ... 
89.0         1
77.0         1
61.0         1
58.0         1
1088.0       1
Name: TIMELAG, Length: 68, dtype: int64

15.000000    6011
20.000000    4562
25.000000    2657
10.000000    2226
12.500000    1799
             ... 
12.548387       1
3.146341        1
14.402222       1
2.409091        1
96.794872       1
Name: AVGGIFT, Length: 7707, dtype: int64

95515     1
97939     1
1290      1
119028    1
187465    1
         ..
17943     1
141702    1
147631    1
58636     1
185114    1
Name: CONTROLN, Length: 95280, dtype: int64

13.0    3466
5.0     3154
57.0    2666
59.0    2653
15.0    2559
        ... 
30.0     647
46.0     644
29.0     569
40.0     369
6.0      211
Name: CLUSTER2, Length: 62, dtype: int64

0.00     90443
10.00      939
15.00      590
20.00      577
5.00       503
         ...  
18.25        1
10.70        1
2.50         1
16.87        1
44.21        1
Name: TARGET_D, Length: 71, dtype: int64

281

In [57]:
# Drop the columns collected in remove_cols from the working DataFrame
df = df.drop(columns=remove_cols)

Many null values still remain, so specific features need to be explored further and handled individually before the EDA can be concluded.
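One way to see which features still need attention is to tabulate the remaining nulls per column. The following is a minimal sketch, not part of the original notebook: the helper name `null_summary` and the toy frame `toy` are hypothetical, and the toy values are made up for illustration rather than taken from the dataset.

```python
import numpy as np
import pandas as pd


def null_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the null count and null percentage for each column that still has nulls."""
    counts = frame.isna().sum()
    summary = pd.DataFrame({
        "n_null": counts,
        "pct_null": (counts / len(frame) * 100).round(2),
    })
    # Keep only columns with at least one null, worst offenders first
    return summary[summary["n_null"] > 0].sort_values("n_null", ascending=False)


# Toy stand-in for df (hypothetical values, for illustration only)
toy = pd.DataFrame({
    "TIMELAG": [5.0, np.nan, 3.0, np.nan],
    "NEXTDATE": [9504.0, 9412.0, np.nan, 8703.0],
    "CONTROLN": [95515, 97939, 1290, 119028],
})

remaining = null_summary(toy)
print(remaining)
```

In the notebook, calling `null_summary(df)` after the drop would list the columns that still contain nulls, which can then be examined one by one.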