## Data Discovery Analysis ##
### Individual Federal Campaign Contributors and the Contributions They Made, 2016 ###

Data Source: U.S. Federal Election Commission


##### Dataset Overview:
Data preparation begins with the collection of Federal Election Commission (FEC) data on a segment of contributions made to the two federal campaign committees associated with two federal candidates: Hillary Clinton and Donald Trump.

The timeframe for the final set of contributions starts at the beginning of the general election, i.e. when each candidate became the official nominee of their party, and runs through 12/31/2016:

- Hillary Clinton: nomination/start date 7/26/2016 (159 days to end of calendar year)
- Donald Trump: nomination/start date 7/19/2016 (166 days to end of calendar year)
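
The day counts above include both the nomination date and 12/31/2016. A minimal check using only the standard library (the variable and function names are illustrative, not from the original notebook):

```python
from datetime import date

END_OF_YEAR = date(2016, 12, 31)

# Nomination dates as stated in the list above.
nomination_dates = {
    "HRC": date(2016, 7, 26),
    "DJT": date(2016, 7, 19),
}

def inclusive_days(start, end):
    """Number of calendar days from start to end, counting both endpoints."""
    return (end - start).days + 1

window_days = {key: inclusive_days(d, END_OF_YEAR) for key, d in nomination_dates.items()}
print(window_days)  # {'HRC': 159, 'DJT': 166}
```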

##### Dataset Qualifications:
1) Initial acquisition of FEC data included the full inventory of itemized contributions to all federal candidates from 1/1/2015 to 12/31/2016 (designated the 2016 Election Cycle). The file, totaling 20.3 million contributions, represents over $1.45 billion in gross revenue to federal campaigns for President, House and Senate.

A MySQL database was created to handle the initial manipulation of the data.

2) Data selects were completed on the initial set to isolate itemized contributions made to the principal campaign committees of Hillary Clinton (Hillary for America, C00575795) and Donald Trump (Donald J. Trump for President, Inc., C00580100). These contributions covered the full two-year election cycle (2015 and 2016). The campaign committees are represented throughout this project by the designations HRC for Hillary for America and DJT for Donald J. Trump for President, Inc.

- HRC, 1/1/2015 - 12/31/2016: 2,516,367 itemized contributions totaling $293.6M in campaign revenue.
- DJT, 4/2/2015 - 12/31/2016: 139,838 itemized contributions totaling $65.3M in campaign revenue.

3) Additional data selects and aggregations were completed to prepare the final sets for analysis:

- Transaction date screen to set the analysis timeline from each candidate's nomination date to the end of the year.
- Account type screen to ensure contributions were made only to the candidates' GENERAL ELECTION accounts.
- Transaction type screen to ensure only qualified, INDIVIDUAL contributions were included (i.e. no committee transfers, PAC contributions, etc.).
- The creation of a unique NAMEKEY for donors and subsequent grouping/aggregation of distinct namekeys to create a DONOR level table (PEOPLE).
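
The screens and the donor-level aggregation listed above were run in MySQL, and the exact queries are not shown here. As a rough pandas sketch of the same idea, under several assumptions: the rows below are hypothetical (not real FEC records), transaction type code `15` is taken to mark a qualified individual contribution because that is the code in the sample rows, and NAMEKEY is taken to be the donor name followed by the five-digit ZIP, as the sample keys suggest. The account type screen is omitted because no account type column appears in the final table.

```python
import pandas as pd

# Hypothetical rows standing in for itemized contributions (not real FEC records).
raw = pd.DataFrame({
    "candidate": ["djt", "djt", "djt", "hrc", "hrc"],
    "NAME": ["DOE, JANE", "DOE, JANE", "POE, ALEX", "ROE, RICHARD", "ROE, RICHARD"],
    "ZIP_CODE": ["06877", "06877", "75201", "10021", "10021"],
    "TRANSACTION_TP": ["15", "15", "15", "15", "OTHER"],  # "OTHER" is a placeholder code
    "TRANSACTION_DT": ["07192016", "08012016", "07012016", "07262016", "09012016"],
    "TRANSACTION_AMT": [89, 100, 50, 25, 500],
})

NOMINATION = {"hrc": pd.Timestamp("2016-07-26"), "djt": pd.Timestamp("2016-07-19")}
END_OF_YEAR = pd.Timestamp("2016-12-31")

# Transaction date screen: keep rows from each candidate's nomination date through 12/31/2016.
dt = pd.to_datetime(raw["TRANSACTION_DT"], format="%m%d%Y")
in_window = (dt >= raw["candidate"].map(NOMINATION)) & (dt <= END_OF_YEAR)

# Transaction type screen: ASSUMES code "15" marks a qualified individual contribution.
is_individual = raw["TRANSACTION_TP"] == "15"

contributions = raw[in_window & is_individual].copy()

# NAMEKEY: ASSUMED to be NAME followed by the five-digit ZIP.
contributions["namekey"] = contributions["NAME"] + contributions["ZIP_CODE"].str[:5]

# Donor-level table: one row per distinct namekey and candidate.
people = (
    contributions.groupby(["namekey", "candidate"], as_index=False)
    .agg(contributions=("TRANSACTION_AMT", "count"), total=("TRANSACTION_AMT", "sum"))
)
print(people)
```

A name-plus-ZIP key is a simple way to approximate a donor identity, but it merges different people who share a name and ZIP and splits one person who moved or whose name was entered inconsistently.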

The final datasets for analysis consist of:

- CONTRIBUTIONS TABLE = 1,800,321 itemized contributions totaling $153.3M in campaign revenue.
    -  HRC = 1,718,315 itemized contributions totaling $108M in campaign revenue.
    -  DJT = 82,006 itemized contributions totaling $45.3M in campaign revenue.


- PEOPLE TABLE = 383,272 donors making 1,800,321 itemized contributions totaling $153.3M in campaign revenue.
    -  HRC = 325,640 donors making 1,718,315 contributions totaling $108M in campaign revenue.
    -  DJT = 57,632 donors making 82,006 contributions totaling $45.3M in campaign revenue.
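
As a quick arithmetic check, the per-candidate figures above should add up to the table totals. The sketch below copies the figures from the summary and derives two simple averages; the dictionary layout and field names are illustrative only.

```python
# Figures copied from the summary above; revenue is in millions of dollars.
final = {
    "HRC": {"donors": 325_640, "contributions": 1_718_315, "revenue_m": 108.0},
    "DJT": {"donors": 57_632, "contributions": 82_006, "revenue_m": 45.3},
}

# Combined totals across both committees.
totals = {
    field: sum(row[field] for row in final.values())
    for field in ("donors", "contributions", "revenue_m")
}

# The candidate rows should add up to the reported table totals.
assert totals["donors"] == 383_272
assert totals["contributions"] == 1_800_321
assert round(totals["revenue_m"], 1) == 153.3

# Derived averages per candidate.
for name, row in final.items():
    per_donor = row["contributions"] / row["donors"]
    avg_gift = row["revenue_m"] * 1_000_000 / row["contributions"]
    print(f"{name}: {per_donor:.2f} contributions per donor, ${avg_gift:,.2f} average contribution")
```

Across both committees the ratio of contributions to donors is about 4.70, which agrees with the mean of the `contributions` column in the `peopleDF.describe()` output.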
  

In [1]:
%matplotlib inline
# Dependencies and Setup, short list
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np


In [2]:
# Load the contributions table from a local, semicolon-delimited CSV file,
# then take basic measures of the resulting DataFrame.
contributionsDF = pd.read_csv('contributions_12292018.csv', sep=";", low_memory=False)

In [3]:
contributionsDF.shape

(1800321, 16)

In [12]:
contributionsDF.head(50)

Unnamed: 0,CMTE_ID,candidate,RPT_TP,TRANSACTION_TP,NAME,CITY,STATE,ZIP_CODE,EMPLOYER,OCCUPATION,TRANSACTION_DT,TRANSACTION_AMT,TRAN_ID,FILE_NUM,SUB_ID,namekey
0,C00580100,djt,M8,15,"WEST, BRETT",RIDGEFIELD,CT,6877.0,RETIRED,RETIRED,7192016,89,SA17A.1828787,1291697,2147483647,"WEST, BRETT06877"
1,C00580100,djt,M8,15,"BROWNING, MELISSA",WESTLAKE,TX,76262.0,SELF-EMPLOYED,RANCHER,7192016,2700,SA17A.1828524,1291697,2147483647,"BROWNING, MELISSA76262"
2,C00580100,djt,M8,15,"HICKS JR, THOMAS",DALLAS,TX,75201.0,HICKS HOLDINGS LLC,INVESTOR,7192016,2700,SA17A.1828566,1291697,2147483647,"HICKS JR, THOMAS75201"
3,C00580100,djt,M8,15,"HARRISON, BRIAN",PROVO,UT,84604.0,SELF-EMPLOYED,LAWYER,7192016,80,SA17A.1828719,1291697,2147483647,"HARRISON, BRIAN84604"
4,C00580100,djt,M8,15,"HARLING, MICHAEL",DALLAS,TX,75220.0,MUNICIPAL CAPITAL MARKETS,INVESTMENT BANKER,7192016,1000,SA17A.1828715,1291697,2147483647,"HARLING, MICHAEL75220"
5,C00580100,djt,M8,15,"MCCORD, NANCY",CLEVELAND,TN,37311.0,HOMEMAKER,HOMEMAKER,7192016,1000,SA17A.1632837.2,1291697,2147483647,"MCCORD, NANCY37311"
6,C00580100,djt,M8,15,"KRUSELL, WILBUR",INCLINE VILLAGE,NV,89451.0,RETIRED,RETIRED,7192016,300,SA17A.1828588,1291697,2147483647,"KRUSELL, WILBUR89451"
7,C00580100,djt,M8,15,"RUESTERHOLZ, SCOTT",NEW YORK,NY,10021.0,BNY MELLON,FINANCIAL ANALYST,7202016,110,SA17A.1828620,1291697,2147483647,"RUESTERHOLZ, SCOTT10021"
8,C00580100,djt,M8,15,"BRILES, KEANNA",LONG BEACH,CA,90803.0,RETIRED,RETIRED,7202016,2700,SA17A.1828522,1291697,2147483647,"BRILES, KEANNA90803"
9,C00580100,djt,M8,15,"SPICZAK, ANNIE",PEORIA,AZ,85383.0,RETIRED,RETIRED,7202016,250,SA17A.1828630,1291697,2147483647,"SPICZAK, ANNIE85383"


In [5]:
contributionsDF.describe()

Unnamed: 0,ZIP_CODE,TRANSACTION_DT,TRANSACTION_AMT,FILE_NUM,SUB_ID
count,1795923.0,1800321.0,1800321.0,1800321.0,1800321.0
mean,477219500.0,9581063.0,85.14209,1140434.0,2147484000.0
std,356462100.0,1082821.0,7890.34,17194.98,0.0
min,0.0,7192016.0,1.0,1109498.0,2147484000.0
25%,117104300.0,9072016.0,15.0,1137625.0,2147484000.0
50%,410111400.0,10042020.0,25.0,1137788.0,2147484000.0
75%,890816700.0,10282020.0,75.0,1148953.0,2147484000.0
max,999281400.0,12312020.0,10000000.0,1291697.0,2147484000.0
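
In the summary above, `TRANSACTION_DT` and `ZIP_CODE` are treated as plain numbers, so their mean and quartiles are not meaningful as dates or ZIP codes, and leading zeros are lost (for example `6877.0` for ZIP 06877). A minimal sketch of one way to restore them, using a small stand-in frame with illustrative values and hypothetical new column names (`transaction_date`, `zip_str`); it assumes five-digit ZIPs, and ZIP+4 values would need padding to nine digits instead.

```python
import pandas as pd

# Small stand-in frame with the same dtypes the CSV load produces (values are illustrative).
sample = pd.DataFrame({
    "TRANSACTION_DT": [7192016, 10042016, 12312016],
    "ZIP_CODE": [6877.0, 76262.0, None],
})

# MMDDYYYY stored as an integer loses its leading zero for months 1-9; pad back to 8 digits.
sample["transaction_date"] = pd.to_datetime(
    sample["TRANSACTION_DT"].astype(str).str.zfill(8), format="%m%d%Y"
)

# ZIP codes were read as floats; rebuild them as zero-padded strings, keeping missing values missing.
# ASSUMPTION: five-digit ZIPs only.
sample["zip_str"] = sample["ZIP_CODE"].astype("Int64").astype("string").str.zfill(5)

print(sample[["transaction_date", "zip_str"]])
```

Reading the columns as strings in the first place (for example with the `dtype` argument of `pd.read_csv`) would avoid the loss of leading zeros altogether.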


In [6]:
# Load the people (donor-level) table from a local, semicolon-delimited CSV file.
peopleDF = pd.read_csv('people_12292018.csv', sep=";")

In [7]:
peopleDF.shape

(383272, 10)

In [11]:
peopleDF.head(50)

Unnamed: 0,namekey,contributions,total,candidate,name,city,state,zip_code,employer,occupation
0,"WEST, BRETT06877",1,89,djt,"WEST, BRETT",RIDGEFIELD,CT,6877.0,RETIRED,RETIRED
1,"HARRISON, BRIAN84604",1,80,djt,"HARRISON, BRIAN",PROVO,UT,84604.0,SELF-EMPLOYED,LAWYER
2,"BROWNING, MELISSA76262",1,2700,djt,"BROWNING, MELISSA",WESTLAKE,TX,76262.0,SELF-EMPLOYED,RANCHER
3,"HICKS JR, THOMAS75201",1,2700,djt,"HICKS JR, THOMAS",DALLAS,TX,75201.0,HICKS HOLDINGS LLC,INVESTOR
4,"MCCORD, NANCY37311",1,1000,djt,"MCCORD, NANCY",CLEVELAND,TN,37311.0,HOMEMAKER,HOMEMAKER
5,"MOODY, DAN JR77098",1,1500,djt,"MOODY, DAN JR",HOUSTON,TX,77098.0,,
6,"KOLLAR, J B MR.29072",1,300,djt,"KOLLAR, J B MR.",LEXINGTON,SC,29072.0,,
7,"RYAN, SCOTT52314",1,1095,djt,"RYAN, SCOTT",MT. VERNON,IA,52314.0,RYAN MOTORS,BUSINESS OWNER
8,"RUESTERHOLZ, SCOTT10021",2,999,djt,"RUESTERHOLZ, SCOTT",NEW YORK,NY,10021.0,BNY MELLON,FINANCIAL ANALYST
9,"ZIROLI, CLEM89135",1,32,djt,"ZIROLI, CLEM",LAS VEGAS,NV,89135.0,DIAMOND CREEK,REAL ESTATE


In [9]:
peopleDF.describe()

Unnamed: 0,contributions,total,zip_code
count,383272.0,383272.0,382104.0
mean,4.697241,399.9329,417966400.0
std,7.79124,26083.3,367303000.0
min,1.0,1.0,0.0
25%,1.0,100.0,76071520.0
50%,2.0,200.0,303271700.0
75%,5.0,388.0,805015700.0
max,1520.0,16140810.0,999281400.0
