Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add validation for pagination #4343

Closed
4 of 5 tasks
lbeaufort opened this issue May 11, 2020 · 4 comments · Fixed by #4344
Closed
4 of 5 tasks

Add validation for pagination #4343

lbeaufort opened this issue May 11, 2020 · 4 comments · Fixed by #4344
Assignees
Milestone

Comments

@lbeaufort
Copy link
Member

lbeaufort commented May 11, 2020

User story
As an API user, I want to be able to paginate through results. Paginating through Schedule A, B, and E (others? ItemizedResource) is confusing. When I mistakenly only pass last_index I don't get the results I'm looking for, but I can't tell why. In addition, these results are extremely slow to return.

For endpoints that use last_index pagination, users need to provide both last_index and last_contribution_receipt date. A quick look through the API umbrella indicates that lots of users aren't paginating correctly, and those queries are quite expensive. To be more user friendly and prevent very expensive queries, we should add API validation to ensure both these fields are provided.

See https://api.open.fec.gov/developers/#/receipts/get_schedules_schedule_a_
Example:

curl -I "https://api.open.fec.gov/v1/schedules/schedule_a/?sort_hide_null=false&sort_nulls_last=false&data_type=processed&min_date=05%2F06%2F2019&max_date=05%2F12%2F2020&sort=-contribution_receipt_date&per_page=100&contributor_name=Jeffrey+Allen&contributor_state=CA&api_key=DEMO_KEY" -w %{time_connect}:%{time_starttransfer}:%{time_total}

== 1.9 seconds
vs.

curl -I "https://api.open.fec.gov/v1/schedules/schedule_a/?sort_hide_null=false&sort_nulls_last=false&data_type=processed&min_date=05%2F06%2F2019&max_date=05%2F12%2F2020&sort=-contribution_receipt_date&per_page=100&contributor_name=Jeffrey+Allen&contributor_state=CA&last_index=4030220201231984113&api_key=DEMO_KEY" -w %{time_connect}:%{time_starttransfer}:%{time_total}

== 157.5 seconds and not the intended results

Completion criteria

  • If a user passes just last_index or just last_contribution_receipt_date, throw a 422 and tell them they need to pass both to paginate. For now just implementing this for last_index as there have been almost 500K of these requests in the last few months vs. a few hundred requests missing just last_index.
  • Make sure Swagger docs still work as expected
  • Make sure front end pagination works as expected (backwards, forwards, ascending, descending, Schedule A, B, E)

Technical steps

  • Determine why these queries are so expensive?
  • Add validation: To paginate through null values, users need last_index and sort_hide_nulls=True. To paginate through populated values, users need last_index and last_<sort_option. Rather than defaulting the last_ values to None, we should add validation/handling for this null pagination.
@lbeaufort lbeaufort self-assigned this May 11, 2020
@lbeaufort lbeaufort added this to the Sprint 12.3 milestone May 11, 2020
@lbeaufort
Copy link
Member Author

This is more complicated than I expected, but it might be possible. Here's an example of paginating through nulls into values with dates:

https://api.open.fec.gov/v1/schedules/schedule_a/?api_key=DEMO_KEY&sort_hide_null=false&sort_nulls_last=false&data_type=processed&two_year_transaction_period=2020&sort=-contribution_receipt_date&per_page=30&last_index=2022820191644684281&sort_null_only=true

https://api.open.fec.gov/v1/schedules/schedule_a/?api_key=DEMO_KEY&sort_hide_null=false&sort_nulls_last=false&data_type=processed&two_year_transaction_period=2020&sort=-contribution_receipt_date&per_page=30&last_index=2022820191644684281&sort_null_only=true

To paginate through null values, users need last_index and sort_hide_nulls=True. To paginate through populated values, users need last_index and last_<sort_option.

Rather than defaulting the last_ values to None, we should add validation/handling for this null pagination. A quick look through the API umbrella indicates that lots of users aren't paginating correctly, and those queries are quite expensive.

@JonellaCulmer
Copy link
Contributor

JonellaCulmer commented May 12, 2020

@lbeaufort Is there a way to inform users before proceeding with the query that they need to pass both? It's on us to tell them as soon as possible instead of telling them they're wrong for something they weren't aware of.

That's if a 422 is an error message.

@lbeaufort
Copy link
Member Author

@JonellaCulmer if I'm understanding you, yes - the plan is to send a 422, which is an error message, before running the query. https://httpstatuses.com/422

@lbeaufort
Copy link
Member Author

@JonellaCulmer after understanding your question better, I put in this follow-up issue for friendlier client-side validation in Swagger documentation: #4348

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants