# Matching Form 460 Filings to their Filers

Can we figure out which committee filed each of the Form 460 filings we've extracted from the raw CAL-ACCESS database? Let's find out.

To do that, we need to join each derived `calaccess_processed_form460filing` record to a filer record, most likely a candidate committee or a ballot proposition committee.

## Set up

In [1]:
%load_ext sql

In [2]:
from django.conf import settings
connection_string = 'postgresql+psycopg2://{USER}:{PASSWORD}@{HOST}:{PORT}/{NAME}'.format(
    **settings.DATABASES['default']
)
%sql $connection_string

u'Connected: postgres@calaccess_processed'

## Among our derived Form 460 filings, how many filer_ids are there?

Almost 7,000 (6,956, to be exact).

In [29]:
%%sql
select count(distinct filer_id)
from calaccess_processed_form460filing;

1 rows affected.


count
6956


## Can we match all of these filer_ids to a `FILERNAME_CD` record in the raw CAL-ACCESS data?

No. Sixty Form 460 filer_ids do not match any `FILERNAME_CD` record.

In [36]:
%%sql
select a.*
from (
    select filer_id, count(*) as count_460s
    from calaccess_processed_form460filing
    group by 1
) as a
left join "FILERNAME_CD" b
on a.filer_id = b."FILER_ID"
where b."FILER_ID" is null
order by a.filer_id desc;

60 rows affected.


filer_id,count_460s
1302361,3
1301493,4
1077732,2
1077654,6
1077638,24
1077619,5
1077545,4
1077505,4
1077482,5
1077058,3
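
The left-join-then-filter-on-null pattern in the query above is an anti-join. As a minimal sketch of how it behaves, the following uses Python's built-in `sqlite3` module with two toy tables. The table names `form460` and `filername` and every id in them are made up for illustration; they are not CAL-ACCESS data, and SQLite stands in for PostgreSQL only to keep the example self-contained.

```python
import sqlite3

# Toy stand-ins for calaccess_processed_form460filing and "FILERNAME_CD".
# All ids and names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table form460 (filing_id integer, filer_id integer);
    create table filername (filer_id integer, name text);
    insert into form460 values (1, 100), (2, 100), (3, 200), (4, 300);
    insert into filername values (100, 'COMMITTEE A'), (100, 'COMMITTEE A (OLD NAME)');
""")

# Keep only the filers whose left join found no partner row.
unmatched = conn.execute("""
    select a.filer_id, a.count_460s
    from (
        select filer_id, count(*) as count_460s
        from form460
        group by filer_id
    ) as a
    left join filername b
    on a.filer_id = b.filer_id
    where b.filer_id is null
    order by a.filer_id desc
""").fetchall()

print(unmatched)
```

In this toy data, filer 100 has two name records, but because matched rows are filtered out, duplicate names on the right-hand side cannot inflate the list of unmatched filers.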


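## Are there any scraped candidate ids missing from the candidate ids derived from the Form 501 filings?

Yes: 108 scraped candidate records have no match, and some of them are the same candidate listed in more than one election.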

In [37]:
%%sql
select a.*
from calaccess_processed_scrapedcandidate a
left join calaccess_processed_candidate b
-- nullif() turns blank scraped ids into NULL so the integer cast cannot fail
on nullif(a.scraped_id, '')::int = b.filer_id
where b.filer_id is null
and a.scraped_id <> '';

108 rows affected.


id,created,last_modified,name,scraped_id,office_name,election_id
518,2016-12-05 10:24:35.506749-06:00,2016-12-05 10:24:35.506806-06:00,"MARSHALL, G. RICK",1367063,MEMBER BOARD OF EQUALIZATION 03,9
717,2016-12-05 10:24:35.955258-06:00,2016-12-05 10:24:35.955279-06:00,"WALKER, SHERRY",1365739,ASSEMBLY 69,9
741,2016-12-05 10:24:36.016899-06:00,2016-12-05 10:24:36.016922-06:00,"MARSHALL, G. RICK",1367063,MEMBER BOARD OF EQUALIZATION 03,10
1081,2016-12-05 10:24:36.867733-06:00,2016-12-05 10:24:36.867752-06:00,"WALKER, SHERRY",1365739,ASSEMBLY 69,10
3123,2016-12-05 10:24:41.394841-06:00,2016-12-05 10:24:41.394860-06:00,"BERRYHILL, TOM",1254687,ASSEMBLY 25,53
3460,2016-12-05 10:24:42.097033-06:00,2016-12-05 10:24:42.097053-06:00,"BERRYHILL, TOM",1254687,ASSEMBLY 25,54
4134,2016-12-05 10:24:43.604343-06:00,2016-12-05 10:24:43.604362-06:00,"BRIGGS, MIKE",1004800,ASSEMBLY 29,59
4999,2016-12-05 10:24:45.453960-06:00,2016-12-05 10:24:45.453983-06:00,"UMBERG, THOMAS J.",1003579,INSURANCE COMMISSIONER,62
5166,2016-12-05 10:24:45.819571-06:00,2016-12-05 10:24:45.819657-06:00,"GALLEGOS, MARTIN",1004371,STATE SENATE 24,65
5178,2016-12-05 10:24:45.854175-06:00,2016-12-05 10:24:45.854207-06:00,"STEEL, JOHN",1004945,ASSEMBLY 78,67


## How many of the distinct Form 460 filer_ids match to the derived candidate committees?

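About half of these join up nicely with the `committee_filer_id` field on our derived `calaccess_processed_candidatecommittee` table. Note that the 3,488 figure counts joined rows: that table appears to hold one row per candidate-to-committee link, so a committee linked to more than one candidate would be counted more than once.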

In [10]:
%%sql
select *
from (
    select filer_id, count(*) as count_460s
    from calaccess_processed_form460filing
    group by 1
) as a
join calaccess_processed_candidatecommittee b
on a.filer_id = b.committee_filer_id;

3488 rows affected.


filer_id,count_460s,id,candidate_filer_id,committee_filer_id,link_type_id,link_type_description,first_session,last_session,first_effective_date,last_effective_date,first_termination_date,last_termination_date
1255059,5,24823,1256382,1255059,12013,OPPOSE,2003.0,2003.0,2003-05-28,2003-05-28,,
1377830,7,22331,1378066,1377830,12011,CANDIDATE CONTROLS THIS COMMITTEE,2015.0,2015.0,2015-06-08,2015-06-08,,
1250967,19,33055,1005703,1250967,12011,CANDIDATE CONTROLS THIS COMMITTEE,2001.0,2001.0,2002-12-02,2002-12-02,,
1282115,7,30061,1234187,1282115,12011,CANDIDATE CONTROLS THIS COMMITTEE,2005.0,2005.0,2005-12-23,2005-12-23,,
1312207,11,32576,1004819,1312207,12011,CANDIDATE CONTROLS THIS COMMITTEE,2007.0,2007.0,2008-10-02,2008-10-02,2011-11-30,2011-11-30
1069614,9,31409,1005525,1069614,12011,CANDIDATE CONTROLS THIS COMMITTEE,1997.0,1997.0,1998-04-07,1998-04-07,2001-08-02,2001-08-02
1316229,7,26731,1253966,1316229,12011,CANDIDATE CONTROLS THIS COMMITTEE,2009.0,2009.0,2009-03-03,2009-03-03,2011-06-30,2011-06-30
1309968,5,26247,1363828,1309968,12011,CANDIDATE CONTROLS THIS COMMITTEE,2013.0,2013.0,2014-02-13,2014-02-13,,
1313572,15,29347,1257933,1313572,12011,CANDIDATE CONTROLS THIS COMMITTEE,2007.0,2007.0,2008-11-19,2008-11-19,,
1382103,6,24328,1382270,1382103,12011,CANDIDATE CONTROLS THIS COMMITTEE,2015.0,2015.0,2016-01-20,2016-01-20,2016-10-27,2016-10-27
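
Because the join above can return more than one row per filer, the row count and the number of distinct matched filer_ids are not necessarily the same. The following is a minimal sketch of that difference, again using Python's built-in `sqlite3` module with hypothetical tables and ids (`form460`, `candidatecommittee`, and all values are made up for illustration).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table form460 (filing_id integer, filer_id integer);
    create table candidatecommittee (
        candidate_filer_id integer,
        committee_filer_id integer
    );
    insert into form460 values (1, 100), (2, 100), (3, 200), (4, 300);
    -- committee 100 is linked to two candidates, committee 200 to one
    insert into candidatecommittee values (900, 100), (901, 100), (902, 200);
""")

# Compare the number of joined rows with the number of distinct filers matched.
joined_rows, distinct_filers = conn.execute("""
    select count(*), count(distinct a.filer_id)
    from (
        select filer_id, count(*) as count_460s
        from form460
        group by filer_id
    ) as a
    join candidatecommittee b
    on a.filer_id = b.committee_filer_id
""").fetchone()

print(joined_rows, distinct_filers)
```

Selecting `count(distinct a.filer_id)` in the real query would answer the heading's question directly, whatever the number of link rows per committee turns out to be.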


## How many of the distinct Form 460 filer_ids match to the scraped proposition committees?

Far fewer: 449, or about 6 percent of the distinct filer_ids.

In [21]:
%%sql
select *
from (
    select filer_id, count(*) as count_460s
    from calaccess_processed_form460filing
    group by 1
) as a
join calaccess_processed_propositioncommittee b
on a.filer_id = b.id;

449 rows affected.


filer_id,count_460s,id,name
1255059,5,1255059,"TAXPAYERS AGAINST THE GOVERNOR'S RECALL, ENVIRONMENTAL, LABOR AND RELIGIOUS ORGANIZATIONS AND OTHERS WHO OPPOSE THE WASTE OF TAXPAYER DOLLARS"
1256259,37,1256259,CALIFORNIA MOTOR CAR DEALERS ASSOCIATION FUND TO STOP SHAKEDOWN LAWSUITS - YES ON 64
1380590,5,1380590,"YES ON 62, NO ON 66. REPLACE THE COSTLY, FAILED DEATH PENALTY SYSTEM. SPONSORED BY TAXPAYERS FOR SENTENCING REFORM"
1301053,8,1301053,"YES ON CHILDREN'S HOSPITALS, YES ON PROP 3, SPONSORED BY CALIFORNIA CHILDREN'S HOSPITAL ASSOCIATION"
1381808,4,1381808,"YES ON 64, CALIFORNIANS TO CONTROL, REGULATE AND TAX ADULT USE OF MARIJUANA WHILE PROTECTING CHILDREN, SPONSORED BY BUSINESS, PHYSICIANS, ENVIRONMENTAL AND SOCIAL-JUSTICE ADVOCATE ORGANIZATIONS"
1331971,4,1331971,"NO ON 27 - KEEP VOTERS FIRST, A COALITION OF GOOD GOVERNMENT GROUPS"
1388518,2,1388518,YES ON 56 STOP CANCER - PLANNED PARENTHOOD ADVOCATES MAR MONTE (NON PROFIT 501 (C)(4))
1271520,2,1271520,FOCUS ON THE FAMILY CALIFORNIA COMMITTEE AGAINST PROPOSITION 71
1287066,12,1287066,"CALIFORNIANS FOR NEIGHBORHOOD PROTECTION: YES ON PROP 99, NO ON PROP 98, A COALITION OF CONSERVATIONISTS, LABOR AND BUSINESS. A SPONSORED COMMITTEE OF THE CA LEAGUE OF CONSERVATION VOTERS"
1358725,11,1358725,LABORERS PACIFIC SOUTHWEST REGIONAL ORGANIZING COALITION ISSUES PAC - YES ON PROPS 1 AND 2


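## How many of the distinct Form 460 filer_ids match neither a candidate committee nor a proposition committee?

3,095, a little under half.

In [24]: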
%%sql
select filer_id, count_460s
from (
    select filer_id, count(*) as count_460s
    from calaccess_processed_form460filing
    group by 1
) as a
left join calaccess_processed_candidatecommittee cc
on a.filer_id = cc.committee_filer_id
left join calaccess_processed_propositioncommittee pc
on a.filer_id = pc.id
where cc.committee_filer_id is null
and pc.id is null;

3095 rows affected.


filer_id,count_460s
1049369,73
1060559,10
1018884,5
1332317,5
1294236,4
1339150,16
1291884,26
1251090,3
1077668,15
1044850,7
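
The two sets of matches can overlap: filer 1255059, for example, appears in both the candidate committee and the proposition committee results. A minimal sketch of sorting filer_ids into mutually exclusive buckets follows. The function name `classify_filers` and the ids passed to it are hypothetical, chosen only to illustrate the set logic.

```python
def classify_filers(form460_ids, candidate_committee_ids, proposition_committee_ids):
    """Bucket each Form 460 filer_id by which committee tables it matches."""
    form460_ids = set(form460_ids)
    in_cc = form460_ids & set(candidate_committee_ids)
    in_pc = form460_ids & set(proposition_committee_ids)
    return {
        "candidate_only": in_cc - in_pc,
        "proposition_only": in_pc - in_cc,
        "both": in_cc & in_pc,
        "neither": form460_ids - in_cc - in_pc,
    }


# Hypothetical ids, not taken from CAL-ACCESS.
buckets = classify_filers(
    form460_ids=[100, 200, 300, 400],
    candidate_committee_ids=[100, 200, 999],
    proposition_committee_ids=[200, 300],
)

for label, ids in buckets.items():
    print(label, sorted(ids))
```

The four buckets always add up to the number of distinct Form 460 filer_ids, which makes them a handy cross-check against the counts returned by the individual queries.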
