
Feature request: Remove or increase download cap, restrict pagination on large datasets #5884

lbeaufort opened this issue Jun 26, 2024 · 2 comments

lbeaufort commented Jun 26, 2024

Issue

When paginating through millions of records, it can take several minutes to retrieve just 100 records at a time. This inefficiency prevents users from accessing the data they need promptly and results in expensive queries being run repeatedly.

Proposed solution

To improve this process, we propose either removing or increasing the download cap and restricting pagination for datasets larger than 500k or 1 million records. This change would allow users to queue up a download for large datasets, eliminating the need to paginate through all the data.
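
For illustration only, here is a minimal sketch of how pagination past a threshold could be rejected in a Flask view, pointing the caller at a queued bulk download instead. The route, the 500k threshold, and the fetch_page() helper are hypothetical placeholders, not the existing openFEC implementation.

```python
from flask import Flask, abort, jsonify, request

app = Flask(__name__)

MAX_PAGINATION_RECORDS = 500_000  # threshold under discussion (500k or 1M)
PER_PAGE = 100


def fetch_page(page):
    # Stand-in for the real database query.
    return {"page": page, "results": []}


@app.route("/v1/schedules/schedule_a/")
def schedule_a():
    page = request.args.get("page", default=1, type=int)
    # Reject requests that would paginate past the cap and point the caller
    # at a bulk-download workflow instead of running the expensive query.
    if page * PER_PAGE > MAX_PAGINATION_RECORDS:
        abort(
            422,
            description=(
                f"Pagination is limited to the first {MAX_PAGINATION_RECORDS:,} "
                "records; queue a bulk CSV download for larger result sets."
            ),
        )
    return jsonify(fetch_page(page))
```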

Action item(s)

  • Load test with custom tests
  • Compare performance
  • Consider protections

Completion criteria

(What does the end state look like? As long as these tasks are done, this work is complete.)

  • [ ]

References/resources/technical considerations

(Is there sample code or a screenshot you can include to highlight a particular issue? Here is where you reinforce why this work is important)

cnlucas commented Jul 23, 2024

Background: #1378 (the download cap was originally set to 100k and was raised in #2584)

patphongs commented Jul 23, 2024

> increasing the download cap and restricting pagination for datasets larger than 500k or 1 million records

Notes from 7/23/2024 discussion

  • The Excel spreadsheet limit is 1,048,576 rows
  • Use a specific key for the downloads endpoint
  • Direct users to generate a bulk CSV download of those records
  • Downloads generally run 3k-6k per day, sometimes up to 50k
  • Locust load testing is an option (a rough sketch follows this list)
  • Two calls are made: one for the count and one for the data
  • A long-running query isn't necessarily a complex query
  • Start with API Umbrella and the calls that take over 5 minutes
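
A rough Locust sketch for the load-testing idea above; the endpoint path, per_page/page values, and the DEMO_KEY api_key are placeholders, not production settings.

```python
from locust import HttpUser, between, task


class DeepPaginationUser(HttpUser):
    wait_time = between(1, 5)

    @task
    def deep_page(self):
        # Simulate a client walking deep into a large result set.
        self.client.get(
            "/v1/schedules/schedule_a/",
            params={"api_key": "DEMO_KEY", "per_page": 100, "page": 5000},
            name="schedule_a deep page",
        )
```

This could be run against a staging host with, for example, `locust -f locustfile.py --host https://api-stage.example.gov` (host shown is a placeholder).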

What are meaningful indicators of expensive queries?

  • Should we be looking at response time, or at a 500k+ record count?
  • How do we measure query complexity? (The EXPLAIN plan may have a cost score? This is run to get the count; see the sketch below.)
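
One possible complexity signal is the planner cost reported by EXPLAIN (FORMAT JSON). A rough psycopg2 sketch, with a placeholder connection string, table name, and filter:

```python
import psycopg2

conn = psycopg2.connect("dbname=fec")  # placeholder connection string
with conn.cursor() as cur:
    # EXPLAIN (FORMAT JSON) returns the plan as JSON, including the planner's
    # estimated "Total Cost", without executing the query.
    cur.execute(
        "EXPLAIN (FORMAT JSON) "
        "SELECT COUNT(*) FROM ofec_sched_a WHERE two_year_transaction_period = %s",
        (2024,),
    )
    plan = cur.fetchone()[0][0]["Plan"]
    print("Estimated total cost:", plan["Total Cost"])
```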

Questions?

  • Would this be faster?
    • How long does it take to generate a CSV of 500k+ records? (See the timing sketch after these questions.)
  • Should we do this only for API users at first, and not for the public website for now?
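
To get a first answer on CSV generation time, a quick measurement sketch using PostgreSQL's COPY through psycopg2; the connection string, table name, and row limit are placeholders:

```python
import time

import psycopg2

conn = psycopg2.connect("dbname=fec")  # placeholder connection string
start = time.monotonic()
with conn.cursor() as cur, open("schedule_a_export.csv", "w") as outfile:
    # Stream 500k rows straight to a CSV file via COPY, bypassing pagination.
    cur.copy_expert(
        "COPY (SELECT * FROM ofec_sched_a LIMIT 500000) "
        "TO STDOUT WITH (FORMAT CSV, HEADER)",
        outfile,
    )
print(f"Exported 500k rows in {time.monotonic() - start:.1f}s")
```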
