Updating limit of execution logs on the endpoint #7412
Conversation
Greptile Summary
This PR increases EMBEDDED_EXECUTION_LOG_LIMIT from 50 to 1000 so that the latest execution log status is reflected for long-running request tasks such as polling requests.
Confidence Score: 4/5
Important Files Changed
Last reviewed commit: 156da40
router = APIRouter(tags=["Privacy Requests"], prefix=V1_URL_PREFIX)

EMBEDDED_EXECUTION_LOG_LIMIT = 50
# Heavely increasing the limit so the proper status of the request task is reflected
Typo in comment
"Heavely" should be "Heavily".
- # Heavely increasing the limit so the proper status of the request task is reflected
+ # Heavily increasing the limit so the proper status of the request task is reflected
- EMBEDDED_EXECUTION_LOG_LIMIT = 50
+ # Heavely increasing the limit so the proper status of the request task is reflected
+ # for long running requests like polling requests.
+ EMBEDDED_EXECUTION_LOG_LIMIT = 1000
Test performance impact from 20x limit increase
This limit is used in tests at test_verbose_privacy_request_embed_limit (lines ~2176 and ~3342 of the test file), which create EMBEDDED_EXECUTION_LOG_LIMIT + 10 records. With this change, those tests will now create 1,010 DB records each (up from 60), which could noticeably slow down the test suite.
Additionally, the truncation happens in Python (line 348) after fetching all logs from the DB — the query has no SQL LIMIT. This means for long-running polling requests, the endpoint could return up to 1,000 logs per dataset in a single API response. If a request touches multiple datasets, the response payload could grow substantially.
Consider whether a more targeted solution (e.g., fetching only the most recent N logs per dataset via a SQL window function, or implementing proper server-side pagination for execution logs) might address the root cause without the performance trade-off.
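As a rough sketch of the window-function idea (illustrative only — the model and column names below are assumptions for demonstration, not Fides' actual schema):

```python
# Illustrative sketch: fetch only the most recent N execution logs per dataset
# with a SQL window function, so the limit is applied in the database rather
# than in Python. The ExecutionLog model here is a stand-in, not Fides' model.
from sqlalchemy import Column, DateTime, Integer, String, func, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class ExecutionLog(Base):
    __tablename__ = "executionlog"
    id = Column(Integer, primary_key=True)
    dataset_name = Column(String, nullable=False)
    status = Column(String, nullable=False)
    created_at = Column(DateTime, nullable=False)


def latest_logs_per_dataset(session: Session, limit: int = 50):
    """Return up to `limit` newest logs for each dataset, truncated in SQL."""
    # Number each dataset's logs from newest (1) to oldest.
    row_num = (
        func.row_number()
        .over(
            partition_by=ExecutionLog.dataset_name,
            order_by=ExecutionLog.created_at.desc(),
        )
        .label("rn")
    )
    ranked = select(ExecutionLog.id.label("log_id"), row_num).subquery()
    stmt = (
        select(ExecutionLog)
        .join(ranked, ExecutionLog.id == ranked.c.log_id)
        .where(ranked.c.rn <= limit)
    )
    return session.execute(stmt).scalars().all()
```

Because the newest rows rank first, the terminal status log always survives the cutoff, and the response payload stays bounded at `limit` rows per dataset regardless of how long the task ran.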
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
galvana left a comment
I think this is ok for now, but we should probably implement some incremental loading or only show the last 50 logs per dataset.
Ticket ENG-2566
Description Of Changes
Our UI depends on the latest ExecutionLog status of each task to determine the task's status. For long-lived tasks like polling, this was causing the UI to show an incomplete status, which was confusing for end users.
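The failure mode can be shown with a toy sketch (not Fides code — it simply assumes the endpoint truncates a task's logs to the first `LIMIT` entries after fetching them):

```python
# Toy illustration (not Fides code): if a task's execution logs are truncated
# to the first LIMIT entries, the terminal status of a long-running task can
# fall past the cutoff, so the UI never sees it.
def embedded_logs(logs, limit):
    # Stand-in for the endpoint's Python-side truncation.
    return logs[:limit]

# A polling task that emitted 60 progress logs before completing.
logs = ["in_processing"] * 60 + ["complete"]

visible = embedded_logs(logs, limit=50)
print(visible[-1])   # in_processing -- the final "complete" log was cut off

visible = embedded_logs(logs, limit=1000)
print(visible[-1])   # complete -- the higher limit keeps the terminal status
```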
Code Changes
Steps to Confirm
Since this bug naturally occurs with a long-lived task such as appsflyer, we have to modify the base code a bit to reproduce it:
In src/fides/api/service/privacy_request/request_service.py, change the initiate_polling_task_requeue interval to 60 seconds.

Pre-Merge Checklist
- CHANGELOG.md updated
- Rebased against main
- downgrade() migration is correct and works