Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix TSVECTOR trigger definitions for search #3798

Closed
jborichevskiy opened this issue May 30, 2019 · 15 comments · Fixed by #3811
Closed

Fix TSVECTOR trigger definitions for search #3798

jborichevskiy opened this issue May 30, 2019 · 15 comments · Fixed by #3811
Assignees
Milestone

Comments

@jborichevskiy
Copy link

Searching for common names with apostrophes such as "o'hare" returns results for what seems to be all contributions containing "o" instead of the expected "o'hare".

image

Is there a correct way of escaping passed quotes when using the API (where the same issues seems to apply)?

@jason-upchurch
Copy link
Contributor

@jborichevskiy thank you for bringing this to our attention. We will look into this issue.

@jason-upchurch
Copy link
Contributor

endpoint is /schedules/schedule_b/

@jason-upchurch
Copy link
Contributor

resultant query:

SELECT * WHERE disclosure.fec_fitem_sched_b.recipient_name_text @@ to_tsquery('o:*');

@jason-upchurch
Copy link
Contributor

jason-upchurch commented Jun 6, 2019

example data:

recipient_nm recipient_name_text count(recipient_nm) old filter count new filter count
REPUBLICAN NATIONAL COMMITTEE - EXPENDITURES (AKA REPUBLICAN NATIONAL COMMITTEE) 'aka':5 'c00003418':9 'committee':3,8 'expenditures':4 'national':2,7 'republican':1,6 1141 1141 6425
LINSLEY, JAMES A. MRS. 'a':3 'c00087205':5 'james':2 'linsley':1 'mrs':4 1 2 2
BELTRAMI COUNTY I-R , 'beltrami':1 'county':2 'i':4 'i-r':3 'r':5 3 5 12
WISEMAN, EDWIN S. D.V.M. 'd.v.m':4 'edwin':2 's':3 'wiseman':1 1 0 1
O'HARE, GRACE 'grace':3 'hare':2 'o':1 9 9 181
AMAZON.COM 'amazon.com':1' 21,215 356 45,936

@jason-upchurch
Copy link
Contributor

example of 12 from BELTRAMI COUNTY I-R from above:

recipient_nm recipient_name_text
BELTRAMI COUNTY I-R , 'beltrami':1 'county':2 'i':4 'i-r':3 'r':5
BELTRAMI COUNTY IND-, 'beltrami':1 'county':2 'ind':3
BELTRAMI COUNTY I-R , 'beltrami':1 'county':2 'i':4 'i-r':3 'r':5
BELTRAMI COUNTY I-R , 'beltrami':1 'county':2 'i':4 'i-r':3 'r':5
BELTRAMI COUNTY I-R , -COMMITTEE 'beltrami':1 'committee':6 'county':2 'i':4 'i-r':3 'r':5
BELTRAMI COUNTY I-R , -COMMITTEE 'beltrami':1 'committee':6 'county':2 'i':4 'i-r':3 'r':5
BELTRAMI COUNTY IR C, -OMM 'beltrami':1 'c':4 'county':2 'ir':3 'omm':5
BELTRAMI COUNTY IR C, -OMM 'beltrami':1 'c':4 'county':2 'ir':3 'omm':5
BELTRAMI COUNTY IR C, -OMM 'beltrami':1 'c':4 'county':2 'ir':3 'omm':5
BELTRAMI COUNTY IR C, -OMM 'beltrami':1 'c':4 'county':2 'ir':3 'omm':5
BELTRAMI COUNTY IR C, -OMM 'beltrami':1 'c':4 'county':2 'ir':3 'omm':5
BELTRAMI COUNTY IR C, -OMMITTEE 'beltrami':1 'c':4 'county':2 'ir':3 'ommittee':5

@jason-upchurch
Copy link
Contributor

jason-upchurch commented Jun 7, 2019

To test--consider generating TSVECTOR column recipient_name_text to be generated from recipient_nm as follows:

new.recipient_name_text := to_tsvector(
    concat(regexp_replace(new.recipient_nm, '[^a-zA-Z0-9]', ' ', 'g')), ' ', 
        new.clean_recipient_cmte_id);

[edit: add 'g' for greedy]

@jason-upchurch
Copy link
Contributor

jason-upchurch commented Jun 11, 2019

Occurrences of filter_fulltext_fields to audit and determine need to update.

line number TSVECTOR? model
./webservices/resources/sched_f.py:38: filter_fulltext_fields = [ T ScheduleF
./webservices/resources/sched_b.py:48: filter_fulltext_fields = [ T ScheduleB
./webservices/resources/sched_b.py:106: filter_fulltext_fields = [ F ScheduleBEfile
./webservices/resources/rad_analyst.py:21: filter_fulltext_fields = [ T RadAnalyst
./webservices/resources/audit.py:29: filter_fulltext_fields = [ F AuditPrimaryCategory
./webservices/resources/audit.py:62: filter_fulltext_fields = [ F AuditCategory
./webservices/resources/audit.py:118: filter_fulltext_fields = [ T, T AuditCommitteSearch, AuditCandidateSearch
./webservices/resources/audit.py:161: filter_fulltext_fields = [ T AuditCandidateSearch
./webservices/resources/audit.py:168: query = filters.filter_fulltext(models.AuditCandidateSearch.query, kwargs, self.filter_fulltext_fields) F N/A
./webservices/resources/audit.py:181: filter_fulltext_fields = [ T AuditCommitteeSearch
./webservices/resources/audit.py:188: query = filters.filter_fulltext(models.AuditCommitteeSearch.query, kwargs, self.filter_fulltext_fields) F N/A
./webservices/resources/sched_c.py:30: filter_fulltext_fields = [ T ScheduleC
./webservices/resources/committees.py:54: filter_fulltext_fields = [ T, F CommitteeSearch, Committee
./webservices/resources/candidate_aggregates.py:162: filter_fulltext_fields = [('q', models.CandidateSearch.fulltxt)] T CandidateSearch
./webservices/resources/candidate_aggregates.py:204: query = filters.filter_fulltext(query, kwargs, self.filter_fulltext_fields) F N/A
./webservices/resources/aggregates.py:128: filter_fulltext_fields = [ F ScheduleAByEmployer
./webservices/resources/aggregates.py:154: filter_fulltext_fields = [ F ScheduleAByOccupation
./webservices/resources/aggregates.py:177: filter_fulltext_fields = [ F ScheduleBByRecipient
./webservices/resources/aggregates.py:217: filter_fulltext_fields = [ F ScheduleBByPurpose
./webservices/resources/costs.py:60: filter_fulltext_fields = [ T Electioneering
./webservices/resources/search.py:20: filter_fulltext_fields = [ T CandidateSearch
./webservices/resources/search.py:27: query = filters.filter_fulltext(models.CandidateSearch.query, kwargs, self.filter_fulltext_fields) F N/A
./webservices/resources/search.py:40: filter_fulltext_fields = [ T CommitteeSearch
./webservices/resources/search.py:47: query = filters.filter_fulltext(models.CommitteeSearch.query, kwargs, self.filter_fulltext_fields) F N/A
./webservices/resources/dates.py:36: filter_fulltext_fields = [ T CalendarDate
./webservices/resources/candidates.py:42: filter_fulltext_fields = [('q', models.CandidateSearch.fulltxt)] T CandidateSearch
./webservices/resources/sched_d.py:42: filter_fulltext_fields = [ T ScheduleD
./webservices/resources/sched_e.py:54: filter_fulltext_fields = [ T ScheduleE
./webservices/resources/sched_e.py:124: filter_fulltext_fields = [ F ScheduleEEfile
./webservices/resources/sched_a.py:56: filter_fulltext_fields = [ T ScheduleA
./webservices/resources/sched_a.py:155: filter_fulltext_fields = [ T ScheduleAEfile
./webservices/common/views.py:23: filter_fulltext_fields = [] F N/A
./webservices/common/views.py:49: query = filters.filter_fulltext(query, kwargs, self.filter_fulltext_fields) F N/A
./tests/test_filters.py:211: CalendarDatesView.filter_fulltext_fields F N/A
./tests/test_sched_d.py:36: def test_filter_fulltext_field(self): F N/A

@jason-upchurch
Copy link
Contributor

updated disbursement_description_text trigger definition from ./webservices/resources/sched_b.py

@jason-upchurch
Copy link
Contributor

jason-upchurch commented Jun 12, 2019

affected table column is_fixed commit
fec_fitem_sched_b recipient_name_text T b5e14b5
fec_fitem_sched_b disbursement_description_text T 7761a2c
fec_fitem_sched_f payee_name_text T 7506c4e
ofec_rad_analyst_vw name_txt T ced93e6
ofec_committee_fulltext_audit_mv fulltxt T 005246d
ofec_candidate_fulltext_audit_mv fulltxt T 18a5d46
fec_fitem_sched_a contributor_name_text T c679150
fec_fitem_sched_a contributor_employer_text T c679150
fec_fitem_sched_a contributor_occupation_text T c679150
fec_fitem_sched_c loan_source_name_text T a277d8d
fec_fitem_sched_c candidate_name_text T a277d8d
ofec_committee_fulltext_mv fulltxt T 863d7a0, e85f335
ofec_candidate_fulltext_mv fulltxt T f028374
ofec_electioneering_mv purpose_description_text T 63329f0
ofec_dates_vw summary_text T 2d3ea71
ofec_dates_vw description_text T 2d3ea71
fec_fitem_sched_d creditor_debtor_name_text T 0363e40
ofec_sched_e_mv payee_name_text T 0b25d0b
sa7 contributor_name_text F to be moved to separate efile issue
sa7 contributor_employer_text F to be moved to separate efile issue
sa7 contributor_occupation_text F to be moved to separate efile issue

@jason-upchurch jason-upchurch changed the title API search for names with apostrophes searches by first chunk Fix TSVECTOR trigger definitions for search Jun 12, 2019
@jason-upchurch
Copy link
Contributor

@jborichevskiy I updated the issue title based on the applicability of this issue to other searchable text. We expect as part of the overall resolution, the reported issue for "o'hare" will also be resolved.

@lbeaufort
Copy link
Member

lbeaufort commented Jun 14, 2019

Occurrences of filter_fulltext_fields to audit and determine need to update.

Update from @jason-upchurch : all entries formerly in this section are now addressed in #3798 (comment)

@lbeaufort lbeaufort modified the milestones: Sprint 9.2, Sprint 9.3 Jun 17, 2019
@jason-upchurch
Copy link
Contributor

Adding test coverage for each table in #3798 (comment)

@leobuskin
Copy link

@jason-upchurch we can see the fix was merged but seems like it wasn't deployed to production yet, right? Because the problem still exists in production.

@PaulClark2
Copy link
Contributor

PaulClark2 commented Jul 3, 2019

@leobuskin thanks for checking in!

@jason Upchurch’s work is scheduled to be merged to production on 7/9. We need to re-populate all the data tables to use the new logic. You can the follow that work here, #3836. #3836 is scheduled to be merged to production on 8/6.

@jason-upchurch
Copy link
Contributor

@leobuskin @jborichevskiy cc @PaulClark2 @lbeaufort :
Update on this issue: because the search term parser was restored to its original definition that deals with chunks separated by special characters, the issue reported in this ticket should be resolved. As of Friday, performing the same query originally reported returned 96 results.

There remain some side effects associated with restoring this function which will be resolved under #3836. Thank you again for reporting this issue, and please let us know if you have any problems with the above query.
--Jason

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants