Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the lower case data in disclosure.fec_fitem_sched_x #4137

Closed
2 tasks done
fec-jli opened this issue Jan 2, 2020 · 4 comments
Closed
2 tasks done

Fix the lower case data in disclosure.fec_fitem_sched_x #4137

fec-jli opened this issue Jan 2, 2020 · 4 comments

Comments

@fec-jli
Copy link
Contributor

fec-jli commented Jan 2, 2020

Ref: Ann Arbor missing from individual contributions search
#4111

The data in disclosure.fec_fitem_sched_x should be all upper case. but we found there are small portion of data is lower/upper mix case. We need convert them to upper case.
example query:
select * FROM disclosure.fec_fitem_sched_a
LEFT OUTER JOIN ofec_committee_history_mv AS ofec_committee_history_mv_1 ON disclosure.fec_fitem_sched_a.cmte_id = ofec_committee_history_mv_1.committee_id AND disclosure.fec_fitem_sched_a.two_year_transaction_period = ofec_committee_history_mv_1.cycle
LEFT OUTER JOIN ofec_committee_history_mv AS ofec_committee_history_mv_2 ON disclosure.fec_fitem_sched_a.clean_contbr_id = ofec_committee_history_mv_2.committee_id AND disclosure.fec_fitem_sched_a.two_year_transaction_period = ofec_committee_history_mv_2.cycle
WHERE disclosure.fec_fitem_sched_a.is_individual = true
AND disclosure.fec_fitem_sched_a.contbr_city IN ('Ann Arbor')
ORDER BY disclosure.fec_fitem_sched_a.contb_receipt_dt DESC

for now, we found lower case data for these 4 sub_id, we may have more.
sub_id=3061920110009408407
4070620041039961342
4070620041039961337
4070720041039976479

  • found the reason why case conversion for those data failed
  • update lower case data to upper case for all disclosure.fec_fitem_sched_x tables
@fec-jli
Copy link
Contributor Author

fec-jli commented Jan 6, 2020

Jerry reply:
After reviewing these data in schedule A, I find that these data are converted from old 1032 system. These data were loaded into the Oracle database with a special loading program by Salient which does not using the upper function to make all city names upper case. We will run the SQL scripts to correct these data in FECP database which will then update the data in our AWS database.
For all Schedule A , B data loaded with Informatica nightly processes, all the character fields are converted from mixed case to upper case using an Upper function.

@fec-jli
Copy link
Contributor Author

fec-jli commented Jan 7, 2020

Jerry reply (01/07/2020):
We are still analyzing the upper case problem with all 1032 converted data in fecp. Our current estimation is that about 1.5 millions of transactions in F_Item table need to update. For each transaction, we need to update multiple columns such as name columns, address columns, etc. Most schedules have this upper case problem with the 1032 converted data. We also need to update these data in staging tables and normalized tables as well as F_item table. We plan to finish updating all these data by the end of this week.

@fecjjeng
Copy link
Contributor

fecjjeng commented Jan 13, 2020

Jerry and Joseph had searched f_itemtable in Oracle with the text fields with lower case to get the sub_id.
The records include SA, SB, SE, and SF. Total of approximately 1.6 millions data in f_item involved. Majority of them are SA and SB, peaked on 2002 and 2000.

Both f_item table and the corresponding nml_sched_x tables had been updated in Oracle to convert the lower/mixed case to upper case.

The changes had been propagated to ALL postgresql databases through regular nightly GoldenGate and Java program process.

f_item_sched_a_2001_2002: 844501
f_item_sched_a_1999_2000: 257346
f_item_sched_a_2005_2006: 74440

f_item_sched_b_2001_2002: 201035
f_item_sched_b_1999_2000: 32452

f_item_sched_e: 100386
f_item_sched_f: 77691

It had been verified that autovacuum had been invoked by database engine after the updates finished. This indicated that the statistics of the table had been updated to reflect the updated data.

@fecjjeng
Copy link
Contributor

The original API call described in issue #4111
https://www.fec.gov/data/receipts/individual-contributions/?committee_id=C00342204&contributor_city=Ann+Arbor
returns 29 rows now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants