Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Counts for all endpoints are incorrect when 'sort_hide_null=True' #4626

Closed
3 tasks
lbeaufort opened this issue Sep 21, 2020 · 2 comments · Fixed by #4683
Closed
3 tasks

Counts for all endpoints are incorrect when 'sort_hide_null=True' #4626

lbeaufort opened this issue Sep 21, 2020 · 2 comments · Fixed by #4683
Assignees
Milestone

Comments

@lbeaufort
Copy link
Member

lbeaufort commented Sep 21, 2020

User story

As a user, I want the endpoints to return correct counts when I use sort_hide_null=True because I want accurate results.

Expected behavior

When we use sort_hide_null=True, it's supposed to hide rows that have null values for whatever we are sorting on (e.g., first_file_date
(We want count to reflect the number of rows that are shown)

Actual behavior

  • correctly hides the row
  • count reflects rows that hidden as well

Examples of incorrect behavior:

https://api.open.fec.gov/v1/candidates/?api_key=DEMO_KEY&sort_hide_null=true&sort_nulls_last=true&is_active_candidate=true&sort=-first_file_date
is the same count as
https://api.open.fec.gov/v1/candidates/?api_key=DEMO_KEY&sort_hide_null=false&sort_nulls_last=true&is_active_candidate=true&sort=-first_file_date&per_page=100 ➡️ when I confirmed there are records with null first_file_date values.

https://api.open.fec.gov/v1/schedules/schedule_a/?api_key=DEMO_KEY&sort_hide_null=true&sort_nulls_last=false&data_type=processed&committee_id=C00671339&line_number=F3-11AI&sort=-contribution_receipt_date&per_page=100&two_year_transaction_period=2020
➡️ shows count of 3 and no results

This applies to ItemizedResource and ApiResource endpoints. This means the counts are wrong, and makes pagination really tough because it looks like there are more records than there are. I'm not sure this impacts front-end users.
I put some tests in the 4626-tests branch: ff8ab0a

Completion criteria

  • Counts update for sort_hide_null=True
  • Double-check counts update for sort_nulls_only=True
  • Need to add test cases to check count behavior

Helpful additions:

This is an example of an API resource being ok (updating counts correctly):
https://api.open.fec.gov/v1/candidates/?api_key=DEMO_KEY&sort_nulls_only=true&sort_nulls_last=true&is_active_candidate=true&sort=-first_file_date&per_page=100

@lbeaufort lbeaufort added this to the Sprint 13.6 milestone Sep 21, 2020
@lbeaufort lbeaufort changed the title Counts not updating for 'sort_hide_null=True' Counts not updating for 'sort_hide_null=True' for all endpoints Sep 21, 2020
@lbeaufort lbeaufort changed the title Counts not updating for 'sort_hide_null=True' for all endpoints Counts for all endpoints are incorrect when 'sort_hide_null=True' Sep 21, 2020
@JonellaCulmer JonellaCulmer reopened this Sep 24, 2020
@jason-upchurch jason-upchurch self-assigned this Nov 4, 2020
@jason-upchurch
Copy link
Contributor

jason-upchurch commented Nov 4, 2020

sort_hide_null = True:

SELECT count(*) FROM ofec_candidate_detail_mv 
LEFT OUTER JOIN ofec_candidate_flag_mv 
AS ofec_candidate_flag_mv_1 
ON ofec_candidate_flag_mv_1.candidate_id = ofec_candidate_detail_mv.candidate_id 
WHERE ofec_candidate_detail_mv.candidate_inactive = false 
AND ofec_candidate_detail_mv.first_file_date is not NULL;
=> 33048

sort_hide_null = False:

SELECT count(*) FROM ofec_candidate_detail_mv 
LEFT OUTER JOIN ofec_candidate_flag_mv 
AS ofec_candidate_flag_mv_1 
ON ofec_candidate_flag_mv_1.candidate_id = ofec_candidate_detail_mv.candidate_id 
WHERE ofec_candidate_detail_mv.candidate_inactive = false ;
=> 42857

@jason-upchurch
Copy link
Contributor

jason-upchurch commented Nov 4, 2020

A tracing of the logic for https://api.open.fec.gov/v1/candidates/?api_key=DEMO_KEY&sort_hide_null=true&sort_nulls_last=true&is_active_candidate=true&sort=-first_file_date:

  1. resource build_query in candidates.py (this gets the query object, which includes NULL values)
  2. common get_count in counts.py (which contains the NULL values)
  3. webservices fetch_page in utils.py which passes count=count from 2. Thus, the inflated count is included.

Passing count=None in step 3 forces the paginator to compute the correct count. Holding a team discussion later to get some additional opinions on this as we may be able to rely on the paginator count (but either way we should not call count twice if this fix is accepted).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants