# Campaign Finance Disclosure Filings

## Setup

In [1]:
%load_ext sql

In [2]:
from django.conf import settings
connection_string = 'postgresql+psycopg2://{USER}:{PASSWORD}@{HOST}:{PORT}/{NAME}'.format(
    **settings.DATABASES['default']
)
%sql $connection_string

ImproperlyConfigured: Requested setting DATABASES, but settings are not configured. You must either define the environment variable DJANGO_SETTINGS_MODULE or call settings.configure() before accessing settings.
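
The traceback above means the Django settings were never loaded in this kernel, so `settings.DATABASES` could not be read. However the settings end up being configured, here is a minimal sketch of building the same connection string from a plain dict. The `db` values and the `build_connection_string` helper are hypothetical stand-ins, not the project's real configuration. The one deliberate change is URL-quoting the user and password so that characters like `@` or `/` don't break the URL.

```python
from urllib.parse import quote_plus

# Hypothetical stand-in for settings.DATABASES['default']; every value is made up.
db = {
    "USER": "calaccess",
    "PASSWORD": "p@ss/word",
    "HOST": "localhost",
    "PORT": "5432",
    "NAME": "calaccess_raw",
}


def build_connection_string(db):
    """Return a SQLAlchemy-style PostgreSQL URL from a Django-style database dict."""
    return "postgresql+psycopg2://{user}:{password}@{host}:{port}/{name}".format(
        user=quote_plus(db["USER"]),          # escape reserved URL characters
        password=quote_plus(db["PASSWORD"]),  # e.g. '@' becomes '%40'
        host=db["HOST"],
        port=db["PORT"],
        name=db["NAME"],
    )


connection_string = build_connection_string(db)
print(connection_string)
```

Once the settings are configured, the same helper could presumably be called as `build_connection_string(settings.DATABASES['default'])` before handing the result to `%sql`.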

## Cover Sheets

Every campaign finance disclosure filing has a cover sheet, and the top-level information from these cover sheets ends up in the `CVR_CAMPAIGN_DISCLOSURE_CD` table. This is an important table because it links the dollar amount totals from the `SMRY_CD` table to the name fields and a unique identifier of the filer.

The [forms](http://calaccess.californiacivicdata.org/documentation/calaccess-files/cvr-campaign-disclosure-cd/#forms) of the filings that end up in `CVR_CAMPAIGN_DISCLOSURE_CD` are all the ones that include financial disclosures of campaigns as opposed to statements of intention and organization.

### Do the `FORM_TYPE` values vary between amendments to the same campaign filing?

No, which is good because we can easily sort records linked to the `FILING_ID` and `AMEND_ID` combinations.

In [3]:
%%sql
SELECT cvr."FILING_ID", COUNT(DISTINCT cvr."FORM_TYPE")
FROM "CVR_CAMPAIGN_DISCLOSURE_CD" cvr
JOIN "FILER_FILINGS_CD" ff
ON cvr."FILING_ID" = ff."FILING_ID"
AND cvr."AMEND_ID" = ff."FILING_SEQUENCE"
GROUP BY 1
HAVING COUNT(DISTINCT cvr."FORM_TYPE") > 1;

0 rows affected.


FILING_ID,count


### Do the `FILER_ID` values vary between amendments to the same campaign filing?

No, which is good because we should then be able to sort out who filed which filings.

In [4]:
%%sql
SELECT "FILING_ID", COUNT(DISTINCT "FILER_ID")
FROM "CVR_CAMPAIGN_DISCLOSURE_CD"
GROUP BY 1
HAVING COUNT(DISTINCT "FILER_ID") > 1
ORDER BY COUNT(DISTINCT "FILER_ID") DESC;

0 rows affected.


FILING_ID,count
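
Since both checks above come back empty, it may help to see what the `GROUP BY ... HAVING COUNT(DISTINCT ...) > 1` pattern returns when amendments do disagree. This is a toy sketch using an in-memory SQLite table. The table name `cvr` and all of its rows are made up for illustration and have nothing to do with the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE cvr ("FILING_ID" INTEGER, "AMEND_ID" INTEGER, "FILER_ID" TEXT)'
)
# Made-up rows: filing 1 is consistent, filing 2 changes filer between amendments.
conn.executemany(
    "INSERT INTO cvr VALUES (?, ?, ?)",
    [
        (1, 0, "A"),
        (1, 1, "A"),
        (2, 0, "B"),
        (2, 1, "C"),
    ],
)

# Same shape as the notebook query: one row per filing with more than one filer.
rows = conn.execute(
    """
    SELECT "FILING_ID", COUNT(DISTINCT "FILER_ID")
    FROM cvr
    GROUP BY "FILING_ID"
    HAVING COUNT(DISTINCT "FILER_ID") > 1
    """
).fetchall()
print(rows)
```

Only the inconsistent filing survives the `HAVING` filter, so an empty result on the real table means no filing has amendments that disagree on the checked column.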


## Joining to `FILER_FILINGS_CD`

The `FILER_FILINGS_CD` table has additional information about each filing, including the filing period in which the filing falls.

There are also a lot of fields that seem to be redundant with fields on `CVR_CAMPAIGN_DISCLOSURE_CD`, specifically:
* `CVR_CAMPAIGN_DISCLOSURE_CD.FORM_TYPE` and `FILER_FILINGS_CD.FORM_ID`
* `CVR_CAMPAIGN_DISCLOSURE_CD.FILER_ID` and `FILER_FILINGS_CD.FILER_ID`
* `CVR_CAMPAIGN_DISCLOSURE_CD.STMT_TYPE` and `FILER_FILINGS_CD.STMNT_TYPE`
* `CVR_CAMPAIGN_DISCLOSURE_CD.RPT_DATE` and `FILER_FILINGS_CD.FILING_DATE`
* `CVR_CAMPAIGN_DISCLOSURE_CD.FROM_DATE` and `FILER_FILINGS_CD.RPT_START`
* `CVR_CAMPAIGN_DISCLOSURE_CD.THRU_DATE` and `FILER_FILINGS_CD.RPT_END`
* `CVR_CAMPAIGN_DISCLOSURE_CD.RPT_DATE` and `FILER_FILINGS_CD.RPT_DATE`

Might be worth checking if values in each pair of fields ever conflict.
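
One way to run that check for every pair at once is to generate one conflict-count query per pair. The sketch below only builds the SQL strings. The helper name `conflict_count_query` is hypothetical, and casting both sides to `text` is an assumption made to sidestep type mismatches, so dates or coded values may still need more careful handling. Judging from the sample rows elsewhere in this notebook, `STMT_TYPE` holds letter codes such as `PE` while `STMNT_TYPE` holds numeric codes such as `10001`, so that pair will probably need a lookup rather than a direct comparison.

```python
# Field pairs from the list above:
# (CVR_CAMPAIGN_DISCLOSURE_CD column, FILER_FILINGS_CD column).
# FROM_DATE is the column name that appears in the query output headers.
FIELD_PAIRS = [
    ("FORM_TYPE", "FORM_ID"),
    ("FILER_ID", "FILER_ID"),
    ("STMT_TYPE", "STMNT_TYPE"),
    ("RPT_DATE", "FILING_DATE"),
    ("FROM_DATE", "RPT_START"),
    ("THRU_DATE", "RPT_END"),
    ("RPT_DATE", "RPT_DATE"),
]


def conflict_count_query(cvr_col, ff_col):
    """Build a query counting joined rows where the two columns disagree."""
    return (
        "SELECT COUNT(*)\n"
        'FROM "CVR_CAMPAIGN_DISCLOSURE_CD" cvr\n'
        'JOIN "FILER_FILINGS_CD" ff\n'
        'ON cvr."FILING_ID" = ff."FILING_ID"\n'
        'AND cvr."AMEND_ID" = ff."FILING_SEQUENCE"\n'
        # Assumption: compare as text so char and int columns can be compared.
        f'WHERE cvr."{cvr_col}"::text <> ff."{ff_col}"::text;'
    )


# One query string per field pair, keyed by the pair itself.
queries = {pair: conflict_count_query(*pair) for pair in FIELD_PAIRS}
print(queries[("FORM_TYPE", "FORM_ID")])
```

Each generated string can be pasted into a `%%sql` cell. Note that `<>` skips rows where either side is NULL, so these counts only cover records where both fields are populated.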

### Does every `CVR_CAMPAIGN_DISCLOSURE_CD` record have a `FILER_FILINGS_CD` record?

Almost: 37 records have no match. And among those records left off the `FILER_FILINGS_CD` table, many of the fields are blank or have values that suggest they are only for testing purposes.

In [5]:
%%sql
SELECT cvr."FORM_TYPE", cvr."FILING_ID", cvr."AMEND_ID", cvr."FILER_ID", cvr."FILER_NAML", cvr."RPT_DATE", *
FROM "CVR_CAMPAIGN_DISCLOSURE_CD" cvr
LEFT JOIN "FILER_FILINGS_CD" ff
ON cvr."FILING_ID" = ff."FILING_ID"
AND cvr."AMEND_ID" = ff."FILING_SEQUENCE"
WHERE ff."FILING_ID" IS NULL OR ff."FILING_SEQUENCE" IS NULL;

37 rows affected.


FORM_TYPE,FILING_ID,AMEND_ID,FILER_ID,FILER_NAML,RPT_DATE,id,AMEND_ID_1,AMENDEXP_1,AMENDEXP_2,AMENDEXP_3,ASSOC_CB,ASSOC_INT,BAL_ID,BAL_JURIS,BAL_NAME,BAL_NUM,BRDBASE_YN,BUS_CITY,BUS_INTER,BUS_NAME,BUS_ST,BUS_ZIP4,BUSACT_CB,BUSACTVITY,CAND_CITY,CAND_EMAIL,CAND_FAX,CAND_ID,CAND_NAMF,CAND_NAML,CAND_NAMS,CAND_NAMT,CAND_PHON,CAND_ST,CAND_ZIP4,CMTTE_ID,CMTTE_TYPE,CONTROL_YN,DIST_NO,ELECT_DATE,EMPLBUS_CB,EMPLOYER,ENTITY_CD,FILE_EMAIL,FILER_CITY,FILER_FAX,FILER_ID_1,FILER_NAMF,FILER_NAML_1,FILER_NAMS,FILER_NAMT,FILER_PHON,FILER_ST,FILER_ZIP4,FILING_ID_1,FORM_TYPE_1,FROM_DATE,JURIS_CD,JURIS_DSCR,LATE_RPTNO,MAIL_CITY,MAIL_ST,MAIL_ZIP4,OCCUPATION,OFF_S_H_CD,OFFIC_DSCR,OFFICE_CD,OTHER_CB,OTHER_INT,PRIMFRM_YN,REC_TYPE,REPORT_NUM,REPORTNAME,RPT_ATT_CB,RPT_DATE_1,RPTFROMDT,RPTTHRUDT,SELFEMP_CB,SPONSOR_YN,STMT_TYPE,SUP_OPP_CD,THRU_DATE,TRES_CITY,TRES_EMAIL,TRES_FAX,TRES_NAMF,TRES_NAML,TRES_NAMS,TRES_NAMT,TRES_PHON,TRES_ST,TRES_ZIP4,id_1,FILER_ID_2,FILING_ID_2,PERIOD_ID,FORM_ID,FILING_SEQUENCE,FILING_DATE,STMNT_TYPE,STMNT_STATUS,SESSION_ID,USER_ID,SPECIAL_AUDIT,FINE_AUDIT,RPT_START,RPT_END,RPT_DATE_2,FILING_TYPE
F460,591533,1,992218,Damian Jones for Assembly,2000-07-30,36565446,1,Additional information received after filing.,,,,,,,,,N,,,,,,,,Pasadena,,,,Damian,Jones,,,626/398-4993,CA,91107.0,,C,0.0,44.0,2000-03-07,,,CTL,,Pasadena,,992218,,Damian Jones for Assembly,,,626/449-3346,CA,91107.0,591533,F460,2000-01-23,ASM,,,Pasadena,CA,911020445,,,,ASM,,,N,CVR,1,,,2000-07-30,,,,0.0,PE,,2000-02-19,Elk Grove,,,Vona,Copp,,,916/686-1815,CA,95624.0,,,,,,,,,,,,,,,,,
F497,595429,0,990168,X,2000-02-24,36558679,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,990168,,X,,,,,,595429,F497,,,,000551,,,,,,,,,,,CVR,0,,,2000-02-24,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
F497,598450,0,990168,x,2000-02-28,36560440,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,990168,,x,,,,,,598450,F497,,,,002251,,,,,,,,,,,CVR,0,,,2000-02-28,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
F497,601406,0,990168,X,2000-02-29,36563181,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,990168,,X,,,,,,601406,F497,,,,002371,,,,,,,,,,,CVR,0,,,2000-02-29,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
F497,602607,0,990168,X,2000-03-01,36559676,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,990168,,X,,,,,,602607,F497,,,,003120,,,,,,,,,,,CVR,0,,,2000-03-01,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
F460,602619,1,931704,United Teachers Los Angeles-Political Action Council of Educators(PACE)Issues,2000-02-24,36561125,1,The amount of $1071.95 and $1063.67 were included in Schedule E but not included in Schedule D,,,,,,,,,Y,,,,,,,,,,,,,,,,,,,,G,0.0,,2000-03-07,,,RCP,,Los Angeles,,931704,,United Teachers Los Angeles-Political Action Council of Educators(PACE)Issues,,,213-487-5560,CA,90010.0,602619,F460,2000-01-01,,,,,,,,,,,,,N,CVR,1,,,2000-02-24,,,,1.0,PE,,2000-02-19,Los Angeles,,,Patricia,Stanyo,,,213-487-5560,CA,90010.0,,,,,,,,,,,,,,,,,
F460,670063,1,950027,Carl Washington For CA St Assembly 52nd District,2000-10-26,36567613,1,Schedule F See Attached,,,,,,,,,N,,,,,,,,Paramount,,,,Carl,Washington,,,,CA,90723.0,,C,0.0,52.0,2000-11-07,,,CAO,,Paramount,,950027,-,Carl Washington For CA St Assembly 52nd District,,,,CA,90723.0,670063,F460,2000-07-01,ASM,State of California,,Paramount,CA,90723-0000,,H,Assemblymember,ASM,,,N,CVR,1,,,2000-10-26,,,,0.0,PE,S,2000-09-30,Los Angeles,,,Pam,Goodwin,,,(310) 223-0759,CA,90037.0,,,,,,,,,,,,,,,,,
F497,682833,0,972148,SHELLEY FOR ASSEMBLY,2000-10-28,36569265,0,,,,,,,,,,,,,,,,,,SAN FRANCISCO,,,95173.0,KEVIN,SHELLEY,,,4155572312,CA,941310000.0,,,1.0,12.0,,,,CTL,,SACRAMENTO,,972148,,SHELLEY FOR ASSEMBLY,,,4156939300,CA,958140000.0,682833,F497,,,,,,,,,,,ASM,,,,CVR,0,,,2000-10-28,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
F461,763288,1,479259,SHEA HOMES AND AFFILIATED ENTITIES,2001-07-03,36574597,1,,,,,,,,,,N,,,,,,X,HOME BUILDER,,,,,,,,,,,,,,0.0,,,,,MDI,,WALNUT,,479259,,SHEA HOMES AND AFFILIATED ENTITIES,,,(415) 389-6800,CA,91788.0,763288,F461,2001-01-01,,,,WALNUT,CA,91789,,,,,,,N,CVR,1,,,2001-07-03,,,,0.0,,,2001-03-31,,,,,JASON D. KAUNE ATTY./AGENT FOR FILER,,,(415) 389-6800,,,,,,,,,,,,,,,,,,,
F460,786716,1,782408,Ernst & Young LLP - Los Angeles Political Action Committee,2001-10-10,36577279,1,Correction to Treasurer's Name,,,,,,,,,Y,,,,,,,,,,,,,,,,,,,,G,0.0,,,,,RCP,,Los Angeles,,782408,,Ernst & Young LLP - Los Angeles Political Action Committee,,,213-977-3393,CA,90017.0,786716,F460,2001-07-01,,,,,,,,,,,,,N,CVR,1,,,2001-10-10,,,,0.0,QT,,2001-09-30,Los Angeles,,,Andrew D.,Ross,,Mr.,213-977-3026,CA,90017.0,,,,,,,,,,,,,,,,,
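
The query above is an anti-join: a `LEFT JOIN` keeps every cover sheet record, and the `IS NULL` filter keeps only the ones that found no partner. Here is a toy sketch of the same pattern on in-memory SQLite tables, where `cvr`, `ff` and all rows are made up. Because every right-side column is NULL on an unmatched row, testing a single join key is enough, assuming that key is never NULL in the right-hand table itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE cvr ("FILING_ID" INTEGER, "AMEND_ID" INTEGER);
    CREATE TABLE ff ("FILING_ID" INTEGER, "FILING_SEQUENCE" INTEGER);
    -- Made-up rows: amendment 1 of filing 1 has no ff counterpart.
    INSERT INTO cvr VALUES (1, 0), (1, 1), (2, 0);
    INSERT INTO ff VALUES (1, 0), (2, 0);
    """
)

# Keep only cvr rows whose (FILING_ID, AMEND_ID) pair is missing from ff.
unmatched = conn.execute(
    """
    SELECT cvr."FILING_ID", cvr."AMEND_ID"
    FROM cvr
    LEFT JOIN ff
    ON cvr."FILING_ID" = ff."FILING_ID"
    AND cvr."AMEND_ID" = ff."FILING_SEQUENCE"
    WHERE ff."FILING_ID" IS NULL
    """
).fetchall()
print(unmatched)
```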


### Do any records have conflicting `CVR_CAMPAIGN_DISCLOSURE_CD.FORM_TYPE` and `FILER_FILINGS_CD.FORM_ID` values?

They do, but only in 113 records. And there is a pretty clear pattern, with `CVR_CAMPAIGN_DISCLOSURE_CD.FORM_TYPE` saying `F497` and `FILER_FILINGS_CD.FORM_ID` saying `F496`.

In [6]:
%%sql
SELECT cvr."FORM_TYPE", ff."FORM_ID", cvr."FILING_ID", cvr."AMEND_ID", cvr."RPT_DATE", *
FROM "CVR_CAMPAIGN_DISCLOSURE_CD" cvr
JOIN "FILER_FILINGS_CD" ff
ON cvr."FILING_ID" = ff."FILING_ID"
AND cvr."AMEND_ID" = ff."FILING_SEQUENCE"
WHERE UPPER(cvr."FORM_TYPE") <> UPPER(ff."FORM_ID")
ORDER BY cvr."RPT_DATE" DESC, cvr."FILING_ID" DESC, cvr."AMEND_ID" DESC;

113 rows affected.


FORM_TYPE,FORM_ID,FILING_ID,AMEND_ID,RPT_DATE,id,AMEND_ID_1,AMENDEXP_1,AMENDEXP_2,AMENDEXP_3,ASSOC_CB,ASSOC_INT,BAL_ID,BAL_JURIS,BAL_NAME,BAL_NUM,BRDBASE_YN,BUS_CITY,BUS_INTER,BUS_NAME,BUS_ST,BUS_ZIP4,BUSACT_CB,BUSACTVITY,CAND_CITY,CAND_EMAIL,CAND_FAX,CAND_ID,CAND_NAMF,CAND_NAML,CAND_NAMS,CAND_NAMT,CAND_PHON,CAND_ST,CAND_ZIP4,CMTTE_ID,CMTTE_TYPE,CONTROL_YN,DIST_NO,ELECT_DATE,EMPLBUS_CB,EMPLOYER,ENTITY_CD,FILE_EMAIL,FILER_CITY,FILER_FAX,FILER_ID,FILER_NAMF,FILER_NAML,FILER_NAMS,FILER_NAMT,FILER_PHON,FILER_ST,FILER_ZIP4,FILING_ID_1,FORM_TYPE_1,FROM_DATE,JURIS_CD,JURIS_DSCR,LATE_RPTNO,MAIL_CITY,MAIL_ST,MAIL_ZIP4,OCCUPATION,OFF_S_H_CD,OFFIC_DSCR,OFFICE_CD,OTHER_CB,OTHER_INT,PRIMFRM_YN,REC_TYPE,REPORT_NUM,REPORTNAME,RPT_ATT_CB,RPT_DATE_1,RPTFROMDT,RPTTHRUDT,SELFEMP_CB,SPONSOR_YN,STMT_TYPE,SUP_OPP_CD,THRU_DATE,TRES_CITY,TRES_EMAIL,TRES_FAX,TRES_NAMF,TRES_NAML,TRES_NAMS,TRES_NAMT,TRES_PHON,TRES_ST,TRES_ZIP4,id_1,FILER_ID_1,FILING_ID_2,PERIOD_ID,FORM_ID_1,FILING_SEQUENCE,FILING_DATE,STMNT_TYPE,STMNT_STATUS,SESSION_ID,USER_ID,SPECIAL_AUDIT,FINE_AUDIT,RPT_START,RPT_END,RPT_DATE_2,FILING_TYPE
F497,F496,687938,0,2000-11-07,36572102,0,,,,,,,,,,,,,,,,,,ANTIOCH,,,95057.0,TOM,TORLAKSON,,,5103727990.0,CA,945090000.0,,,,7.0,,,,RCP,,SAN FRANCISCO,,741504.0,,"STANDING COMMITTEE ON POLITICAL EDUCATION OF THE CALIFORNIA LABOR FEDERATION, AFL-CIO",,,4159863585,CA,941040000,687938,F497,,,,,,,,,S,,SEN,,,,CVR,0,,,2000-11-07,,,,,,S,,,,,,,,,,,,190863376,1018355,687938,,F496,0,2000-11-07,10001,11001,1999,GPEREZ,11003,11003,,,2000-11-07,
F497,F496,687359,0,2000-11-07,36571720,0,,,,,,,,,,,,,,,,,,ETIWANDA,,,89034.0,JAMES L.,BRULTE,,,9094669096.0,CA,917390000.0,,,,31.0,,,,RCP,,LOS ANGELES,,741666.0,,DEMOCRATIC STATE CENTRAL COMMITTEE OF CALIFORNIA,,,2132561968,CA,900410000,687359,F497,,,,,,,,,S,,SEN,,,,CVR,0,,,2000-11-07,,,,,,O,,,,,,,,,,,,190352214,1018392,687359,,F496,0,2000-11-07,10001,11001,1999,GPEREZ,11003,11003,,,2000-11-07,
F497,F496,687358,0,2000-11-07,36571430,0,,,,,,,,,,,,,,,,,,ANTIOCH,,,95057.0,TOM,TORLAKSON,,,5103727990.0,CA,945090000.0,,,,7.0,,,,RCP,,SAN FRANCISCO,,741504.0,,"STANDING COMMITTEE ON POLITICAL EDUCATION OF THE CALIFORNIA LABOR FEDERATION, AFL-CIO",,,4159863585,CA,941040000,687358,F497,,,,,,,,,S,,SEN,,,,CVR,0,,,2000-11-07,,,,,,S,,,,,,,,,,,,190352213,1018355,687358,,F496,0,2000-11-07,10001,11001,1999,GPEREZ,11003,11003,,,2000-11-07,
F497,F496,687347,0,2000-11-07,36571429,0,,,,,,,,,,,,,,,,,,SARATOGA,,,99065.0,REBECCA,COHN,,,4088680249.0,CA,950700000.0,,,,24.0,,,,RCP,,OAKLAND,,747285.0,,"HEALTH CARE WORKERS UNION, SEIU LOCAL 250 POLITICAL ACTION COMMITTEE",,,4085577613,CA,946120000,687347,F497,,ASM,,,,,,,S,,ASM,,,,CVR,0,,,2000-11-07,,,,,,S,,,,,,,,,,,,190352205,1020788,687347,,F496,0,2000-11-07,10001,11001,1999,GGILMORE,11003,11001,,,2000-11-07,
F497,F496,687346,0,2000-11-07,36571428,0,,,,,,,,,,,,,,,,,,SALINAS,,,99033.0,SIMON,SALINAS,,,8317586276.0,CA,939050000.0,,,,28.0,,,,RCP,,OAKLAND,,747285.0,,"HEALTH CARE WORKERS UNION, SEIU LOCAL 250 POLITICAL ACTION COMMITTEE",,,4085577613,CA,946120000,687346,F497,,,,,,,,,S,,ASM,,,,CVR,0,,,2000-11-07,,,,,,S,,,,,,,,,,,,190352204,1020788,687346,,F496,0,2000-11-07,10001,11001,1999,GGILMORE,11003,11001,,,2000-11-07,
F497,F496,687345,0,2000-11-07,36571427,0,,,,,,,,,,,,,,,,,,ALAMEDA,,,99069.0,WILMA,CHAN,,,5102726693.0,CA,945010000.0,,,,16.0,,,,RCP,,OAKLAND,,747285.0,,"HEALTH CARE WORKERS UNION, SEIU LOCAL 250 POLITICAL ACTION COMMITTEE",,,4085577613,CA,946120000,687345,F497,,ASM,,,,,,,S,,ASM,,,,CVR,0,,,2000-11-07,,,,,,S,,,,,,,,,,,,190447505,1020788,687345,,F496,0,2000-11-07,10001,11001,1999,GGILMORE,11003,11001,,,2000-11-07,
F497,F496,687216,0,2000-11-07,36572062,0,,,,,,,,SCHOOL VOUCHERS. STATE-FUNDED PRIVATE AND RELIGIOUS EDUCATION. PUBLIC SCHOOL FUNDING. INITIATIVE CONSTITUTIONAL AMENDMENT.,38.0,,,,,,,,,,,,,,,,,,,,,,,,,,,BMC,,LOS ANGELES,,931704.0,,UNITED TEACHERS LOS ANGELES-POLITICAL ACTION COUNCIL OF EDUCATORS (PACE) ISSUES,,,2134875560,CA,900100000,687216,F497,,,,,,,,,S,,,,,,CVR,0,,,2000-11-07,,,,,,O,,,,,,,,,,,,190349423,1060144,687216,,F496,0,2000-11-07,10001,11001,1999,GGILMORE,11003,11001,,,2000-11-07,
F497,F496,687109,0,2000-11-07,36571683,0,,,,,,,,,,,,,,,,,,ALAMEDA,,,99069.0,WILMA,CHAN,,,5102726693.0,CA,945010000.0,,,,16.0,,,,RCP,,LOS ANGELES,,983392.0,,BLACK LEADERSHIP POLITICAL ACTION COMMITTEE,,,2134894792,CA,900712300,687109,F497,,,,,,,,,S,,ASM,,,,CVR,0,,,2000-11-07,,,,,,S,,,,,,,,,,,,190347861,1075441,687109,,F496,0,2000-11-07,10001,11001,1999,GPEREZ,11003,11003,,,2000-11-07,
F497,F496,687281,0,2000-11-06,36571003,0,,,,,,,,VETERANS' BOND ACT OF 2000,32.0,,,,,,,,,,,,,,,,,,,,,,,,,,,RCP,,LOS ANGELES,,741666.0,,DEMOCRATIC STATE CENTRAL COMMITTEE OF CALIFORNIA,,,2132561968,CA,900410000,687281,F497,,,,,,,,,S,,,,,,CVR,0,,,2000-11-06,,,,,,S,,,,,,,,,,,,190350999,1018392,687281,,F496,0,2000-11-06,10001,11001,1999,GPEREZ,11003,11003,,,2000-11-06,
F497,F496,686558,0,2000-11-06,36571218,0,,,,,,,,,,,,,,,,,,LONG BEACH,,,98167.0,ALAN,LOWENTHAL,,,5624954766.0,CA,908030000.0,,,,54.0,,,,RCP,,SAN FRANCISCO,,982314.0,,"LABOR 2000 - CALIFORNIA LABOR FEDERATION, AFL-CIO",,,4159863585,CA,941040000,686558,F497,,ASM,,,,,,,S,,ASM,,,,CVR,0,,,2000-11-06,,,,,,S,,,,,,,,,,,,190867529,1074372,686558,,F496,0,2000-11-06,10001,11001,1999,CFLAGG,11003,11003,,,2000-11-06,


### Do any records have conflicting `CVR_CAMPAIGN_DISCLOSURE_CD.FILER_ID` and `FILER_FILINGS_CD.FILER_ID` values?

Yes, this happens in about 36 percent of the joined records.

In [7]:
%%sql
SELECT COUNT(*)::float / (
        SELECT COUNT(*)
        FROM "CVR_CAMPAIGN_DISCLOSURE_CD" CVR
        JOIN "FILER_FILINGS_CD" FF
        ON CVR."FILING_ID" = FF."FILING_ID"
        AND CVR."AMEND_ID" = FF."FILING_SEQUENCE"
) as pct_conflict
FROM "CVR_CAMPAIGN_DISCLOSURE_CD" CVR
JOIN "FILER_FILINGS_CD" FF
ON CVR."FILING_ID" = FF."FILING_ID"
AND CVR."AMEND_ID" = FF."FILING_SEQUENCE"
WHERE CVR."FILER_ID" <> FF."FILER_ID"::VARCHAR;

1 rows affected.


pct_conflict
0.357836331069


But one thing to note is that these `FILER_ID` fields are two different data types: char on `CVR_CAMPAIGN_DISCLOSURE_CD` and int on `FILER_FILINGS_CD`. We previously discovered that the `FILER_XREF_CD` table is a translator from seemingly old string filer_ids to numeric filer_ids.
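
To make the difference concrete, here is a toy sketch comparing the two approaches: casting the int to text and comparing directly, versus translating the string id through an xref table first. It uses in-memory SQLite tables named `cvr`, `ff` and `xref`. All ids are made up, and the assumption that old-style ids look like `C100` is purely illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE cvr ("FILING_ID" INTEGER, "AMEND_ID" INTEGER, "FILER_ID" TEXT);
    CREATE TABLE ff ("FILING_ID" INTEGER, "FILING_SEQUENCE" INTEGER, "FILER_ID" INTEGER);
    CREATE TABLE xref ("XREF_ID" TEXT, "FILER_ID" INTEGER);
    -- Made-up rows: filing 10 uses an old-style string id, filing 11 a numeric one.
    INSERT INTO cvr VALUES (10, 0, 'C100'), (11, 0, '2002');
    INSERT INTO ff VALUES (10, 0, 1001), (11, 0, 2002);
    INSERT INTO xref VALUES ('C100', 1001), ('2002', 2002);
    """
)

JOIN_SQL = """
    FROM cvr
    JOIN ff
    ON cvr."FILING_ID" = ff."FILING_ID"
    AND cvr."AMEND_ID" = ff."FILING_SEQUENCE"
"""

# Direct comparison: cast the int to text and compare the raw values.
direct = conn.execute(
    'SELECT cvr."FILING_ID"' + JOIN_SQL
    + 'WHERE cvr."FILER_ID" <> CAST(ff."FILER_ID" AS TEXT)'
).fetchall()

# Translated comparison: map the string id to a numeric id via xref first.
translated = conn.execute(
    'SELECT cvr."FILING_ID"' + JOIN_SQL
    + 'JOIN xref x ON cvr."FILER_ID" = x."XREF_ID" '
    + 'WHERE x."FILER_ID" <> ff."FILER_ID"'
).fetchall()

print(direct, translated)
```

In this toy data the direct comparison flags filing 10 as a conflict, while the translated comparison flags nothing. Note that the inner join to `xref` silently drops any record whose id has no translation, so the two counts are not measured over exactly the same set of records.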

### Does every `CVR_CAMPAIGN_DISCLOSURE_CD` record have a `FILER_XREF_CD` record?

No. Here are the missing filer_ids, which are also not found in either the `FILERNAME_CD` or `FILERS_CD` tables.

In [8]:
%%sql
SELECT 
    cvr."FILER_ID" as cvr_filer_id, 
    fn."FILER_ID" as filername_filer_id, 
    f."FILER_ID" as filer_filer_id
FROM (
    SELECT DISTINCT cvr."FILER_ID"
    FROM "CVR_CAMPAIGN_DISCLOSURE_CD" cvr
    LEFT JOIN "FILER_XREF_CD" x
    ON cvr."FILER_ID" = x."XREF_ID"
    WHERE x."XREF_ID" IS NULL
) cvr
LEFT JOIN "FILERNAME_CD" fn
ON cvr."FILER_ID" = fn."FILER_ID"::varchar
LEFT JOIN "FILERS_CD" f
ON cvr."FILER_ID" = f."FILER_ID"::varchar
ORDER BY cvr."FILER_ID"::VARCHAR DESC;

45 rows affected.


cvr_filer_id,filername_filer_id,filer_filer_id
990168.0,,
600230.0,,
499258.0,,
496280.0,,
489049.0,,
486187.0,,
482185.0,,
478533.0,,
1373290.0,,
1372926.0,,


Should probably look more into these later, but this might have something to do with conflicting filer_ids on `CVR_CAMPAIGN_DISCLOSURE_CD` and `FILER_FILINGS_CD`.

### Does the filer_id from `FILER_XREF_CD` and `FILER_FILINGS_CD` ever conflict for the same filing?

Yes, but less than 1 percent of the time.

In [9]:
%%sql
SELECT COUNT(*)::float / (
        SELECT COUNT(*)
        FROM "CVR_CAMPAIGN_DISCLOSURE_CD" CVR
        JOIN "FILER_FILINGS_CD" FF
        ON CVR."FILING_ID" = FF."FILING_ID"
        AND CVR."AMEND_ID" = FF."FILING_SEQUENCE"
) as pct_conflict
FROM "CVR_CAMPAIGN_DISCLOSURE_CD" CVR
JOIN "FILER_FILINGS_CD" FF
ON CVR."FILING_ID" = FF."FILING_ID"
AND CVR."AMEND_ID" = FF."FILING_SEQUENCE"
JOIN "FILER_XREF_CD" X
ON CVR."FILER_ID" = X."XREF_ID"
WHERE X."FILER_ID" <> FF."FILER_ID";

1 rows affected.


pct_conflict
0.0034393581278


Mostly this seems to be a problem for Form 497 filings.

In [10]:
%%sql
SELECT cvr."FORM_TYPE", COUNT(*)
FROM "CVR_CAMPAIGN_DISCLOSURE_CD" CVR
JOIN "FILER_FILINGS_CD" FF
ON CVR."FILING_ID" = FF."FILING_ID"
AND CVR."AMEND_ID" = FF."FILING_SEQUENCE"
JOIN "FILER_XREF_CD" X
ON CVR."FILER_ID" = X."XREF_ID"
WHERE X."FILER_ID" <> FF."FILER_ID"
GROUP BY 1;

2 rows affected.


FORM_TYPE,count
F461,7
F497,1291


### Does the `FILER_FILINGS_CD.FORM_ID` value ever vary between amendments to the same filing?

Yes, but only for a single filing. I guess there always has to be at least one.

In [11]:
%%sql
SELECT "FILING_ID", COUNT(DISTINCT "FORM_ID")
FROM "FILER_FILINGS_CD"
WHERE "FORM_ID" IN (
    SELECT DISTINCT "FORM_TYPE"
    FROM "CVR_CAMPAIGN_DISCLOSURE_CD"
)
GROUP BY 1
HAVING COUNT(DISTINCT "FORM_ID") > 1
ORDER BY 1 DESC;

1 rows affected.


FILING_ID,count
826532,2


In [12]:
%%sql
SELECT *
FROM "FILER_FILINGS_CD"
WHERE "FILING_ID" = 826532;

2 rows affected.


id,FILER_ID,FILING_ID,PERIOD_ID,FORM_ID,FILING_SEQUENCE,FILING_DATE,STMNT_TYPE,STMNT_STATUS,SESSION_ID,USER_ID,SPECIAL_AUDIT,FINE_AUDIT,RPT_START,RPT_END,RPT_DATE,FILING_TYPE
190960875,1076525,826532,,F461,1,2000-02-29,10006,11003,2001,DBARRICK,11003,11003,2000-01-23,2000-02-19,,
190960874,1076525,826532,,F460,0,2001-02-23,10005,11003,2001,DBARRICK,11003,11003,2000-01-23,2000-02-19,,


But there aren't any `CVR_CAMPAIGN_DISCLOSURE_CD` or `SMRY_CD` records for this filing_id, so maybe it isn't real.

In [13]:
%%sql
SELECT *
FROM "CVR_CAMPAIGN_DISCLOSURE_CD"
WHERE "FILING_ID" = 826532;
SELECT *
FROM "SMRY_CD"
WHERE "FILING_ID" = 826532;

0 rows affected.
0 rows affected.


id,FILING_ID,AMEND_ID,LINE_ITEM,REC_TYPE,FORM_TYPE,AMOUNT_A,AMOUNT_B,AMOUNT_C,ELEC_DT


### Does the `FILER_FILINGS_CD.FILER_ID` value ever vary between amendments to the same filing?

Even when we narrow to only the campaign finance-related forms, the answer is "yes".

In [18]:
%%sql
SELECT "FILING_ID", COUNT(DISTINCT "FILING_SEQUENCE"), COUNT(DISTINCT "FILER_ID")
FROM "FILER_FILINGS_CD"
WHERE "FORM_ID" IN (
    SELECT DISTINCT "FORM_TYPE"
    FROM "CVR_CAMPAIGN_DISCLOSURE_CD"
)
GROUP BY 1
HAVING COUNT(DISTINCT "FILER_ID") > 1
ORDER BY 1 DESC;

1274 rows affected.


FILING_ID,count,count_1
688996,1,2
688995,1,2
688400,1,2
688397,1,2
688395,1,2
688393,1,2
688375,1,2
688374,1,2
688373,1,2
688371,1,2
