New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not export individual client audit attachments in .csv.zip #947
Conversation
@@ -204,6 +204,7 @@ where submission_defs.current=true | |||
and "deletedAt" is null | |||
and ${odataFilter(options.filter, odataToColumnMap)} | |||
and submission_attachments.name is distinct from submission_defs."encDataAttachmentName" | |||
and submission_attachments."isClientAudit" is not true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isClientAudit
can be TRUE
, FALSE
, or NULL
. After a quick look, the only place where I see it being set to NULL
is for the encrypted data attachment:
const _makeAttachment = (ensure, submissionDefId, name, file = null, deprecated = null, index = null, isClientAudit = null) => { |
central-backend/lib/model/query/submission-attachments.js
Lines 67 to 68 in 2d8d20d
const encName = submission.def.encDataAttachmentName.trim(); | |
results.push(_makeAttachment(ensure, submission.def.id, encName, fileLookup[encName], deprecatedLookup[encName], i)); |
That attachment should be excluded by the line above this one. However, I see other submission attachments on the QA server where isClientAudit
is NULL
, not FALSE
. Maybe that's just junk data, but given that the column is nullable, I think the safest thing is to include submission attachments for which isClientAudit
is either FALSE
or NULL
.
That's what IS NOT TRUE
does: https://www.postgresql.org/docs/14/functions-comparison.html. = FALSE
or <> TRUE
would exclude submission attachments for which isClientAudit
is NULL
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good
@getodk/testers, this is ready to be verified on staging:
|
Tested with Success! |
The submissions.csv.zip exports client audit logs in two ways:
SubmissionAttachments.streamForExport()
).ClientAudits.streamForExport()
).If we're doing (2), then I think we don't really need to do (1). Exporting both (1) and (2) effectively exports each client audit log twice. For some forms, there can be more client audit log data than actual submission data, in which case exporting both ends up adding quite a bit of redundant data.
Note that the individual client audit logs aren't easily accessible from the .zip. Typically, each client audit log has the same name: audit.csv. That means that each log is written to the same path in the .zip: media/audit.csv. Apparently the .zip spec allows multiple files to be written to the same path, but the end result is that most users won't know how to access these files (or even know that there are multiple files to access). And again, there shouldn't be the need to access the individual files, because they're already combined elsewhere.
This PR excludes individual client audit log attachments from the submissions.csv.zip.
What has been done to verify that this works as intended?
I had to change tests in ways that were expected. After making those changes, tests continue to pass.
Why is this the best possible solution? Were any other approaches considered?
Just one line changed in lib/, but I'll leave a comment about that line.
How does this change affect users? Describe intentional changes to behavior and behavior that could have accidentally been affected by code changes. In other words, what are the regression risks?
Hopefully this PR just reduces the size of the submissions.csv.zip.
SubmissionAttachments.streamForExport()
looks to be used in only one place, the submissions.csv.zip endpoint.I sort of doubt that anyone is accessing the individual client audit logs in the submissions.csv.zip, in large part because they're not easy to access. If someone really needs to access an individual client audit log, they can still do so using the individual submission attachment endpoint (at least for an unencrypted form).
If this does end up being an issue for someone, we could come back and add a flag to control whether to include individual client audit log attachments. But I think we should start with the simplest approach, which I'm hopeful will be enough.
Does this change require updates to the API documentation? If so, please update docs/api.md as part of this PR.
I don't see client audit logs mentioned in the API docs.
Before submitting this PR, please make sure you have:
make test-full
and confirmed all checks still pass OR confirm CircleCI build passes