Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not export individual client audit attachments in .csv.zip #947

Merged
merged 1 commit into from Sep 3, 2023

Conversation

matthew-white
Copy link
Member

The submissions.csv.zip exports client audit logs in two ways:

  1. It exports each submission's individual client audit log along with other submission attachments (via SubmissionAttachments.streamForExport()).
  2. It combines all client audit logs into a single CSV file (via ClientAudits.streamForExport()).

If we're doing (2), then I think we don't really need to do (1). Exporting both (1) and (2) effectively exports each client audit log twice. For some forms, there can be more client audit log data than actual submission data, in which case exporting both ends up adding quite a bit of redundant data.

Note that the individual client audit logs aren't easily accessible from the .zip. Typically, each client audit log has the same name: audit.csv. That means that each log is written to the same path in the .zip: media/audit.csv. Apparently the .zip spec allows multiple files to be written to the same path, but the end result is that most users won't know how to access these files (or even know that there are multiple files to access). And again, there shouldn't be the need to access the individual files, because they're already combined elsewhere.

This PR excludes individual client audit log attachments from the submissions.csv.zip.

What has been done to verify that this works as intended?

I had to change tests in ways that were expected. After making those changes, tests continue to pass.

Why is this the best possible solution? Were any other approaches considered?

Just one line changed in lib/, but I'll leave a comment about that line.

How does this change affect users? Describe intentional changes to behavior and behavior that could have accidentally been affected by code changes. In other words, what are the regression risks?

Hopefully this PR just reduces the size of the submissions.csv.zip. SubmissionAttachments.streamForExport() looks to be used in only one place, the submissions.csv.zip endpoint.

I sort of doubt that anyone is accessing the individual client audit logs in the submissions.csv.zip, in large part because they're not easy to access. If someone really needs to access an individual client audit log, they can still do so using the individual submission attachment endpoint (at least for an unencrypted form).

If this does end up being an issue for someone, we could come back and add a flag to control whether to include individual client audit log attachments. But I think we should start with the simplest approach, which I'm hopeful will be enough.

Does this change require updates to the API documentation? If so, please update docs/api.md as part of this PR.

I don't see client audit logs mentioned in the API docs.

Before submitting this PR, please make sure you have:

  • run make test-full and confirmed all checks still pass OR confirm CircleCI build passes
  • verified that any code from external sources are properly credited in comments or that everything is internally sourced

@matthew-white matthew-white added the needs testing Needs manual testing label Aug 4, 2023
@matthew-white matthew-white self-assigned this Aug 4, 2023
@@ -204,6 +204,7 @@ where submission_defs.current=true
and "deletedAt" is null
and ${odataFilter(options.filter, odataToColumnMap)}
and submission_attachments.name is distinct from submission_defs."encDataAttachmentName"
and submission_attachments."isClientAudit" is not true
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isClientAudit can be TRUE, FALSE, or NULL. After a quick look, the only place where I see it being set to NULL is for the encrypted data attachment:

const _makeAttachment = (ensure, submissionDefId, name, file = null, deprecated = null, index = null, isClientAudit = null) => {

const encName = submission.def.encDataAttachmentName.trim();
results.push(_makeAttachment(ensure, submission.def.id, encName, fileLookup[encName], deprecatedLookup[encName], i));

That attachment should be excluded by the line above this one. However, I see other submission attachments on the QA server where isClientAudit is NULL, not FALSE. Maybe that's just junk data, but given that the column is nullable, I think the safest thing is to include submission attachments for which isClientAudit is either FALSE or NULL.

That's what IS NOT TRUE does: https://www.postgresql.org/docs/14/functions-comparison.html. = FALSE or <> TRUE would exclude submission attachments for which isClientAudit is NULL.

Copy link
Contributor

@sadiqkhoja sadiqkhoja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good

@matthew-white matthew-white merged commit 46862db into master Sep 3, 2023
2 checks passed
@matthew-white matthew-white deleted the skip-client-audit-attachments branch September 3, 2023 01:39
@matthew-white
Copy link
Member Author

@getodk/testers, this is ready to be verified on staging:

  • Create a form that uses client audit logs.
  • Create submissions to the form.
  • Download and unzip the submissions .csv.zip.
  • You should continue to see all the client audit logs consolidated in a single CSV file whose filename ends with "- audit.csv". However, in the media folder, you should not see any individual client audit logs. You should continue to see other files in the media folder.

@srujner
Copy link

srujner commented Sep 5, 2023

Tested with Success!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
behavior verified Behavior has been manually verified
Projects
Status: ✅ done
Development

Successfully merging this pull request may close these issues.

None yet

3 participants