Skip to content
This repository has been archived by the owner on Nov 30, 2022. It is now read-only.

Combine Execution and Audit Logs in Request Status Endpoint [#1024] #1068

Merged
merged 4 commits into from
Aug 15, 2022

Conversation

pattisdr
Copy link
Contributor

@pattisdr pattisdr commented Aug 12, 2022

Purpose

We have AuditLogs that store user actions at the PrivacyRequest level, and ExecutionLogs that describe system actions at the individual Collection level. There is information from both of these tables that is useful in establishing the timeline of the request.

Unify/combine this information into a single endpoint.

Changes

  • Update existing request status endpoint, so when the verbose query param is used, e.g. http://0.0.0.0:8080/api/v1/privacy-request?include_identities=true&request_id=<privacy_request_id>&verbose=true, both execution logs and audit logs are included in the response
    • Behind the scenes, this is a union all query. Both tables share some columns and have unique columns of their own, so if a column is missing from one table, we return null.
    • Current API returns collection logs grouped by dataset name, so we just give the audit log a grouping related to its request status, and then it has a single log associated, the audit log.
    • Existing pagination is used, in that we limit embedded logs per dataset to 50.

Screen Shot 2022-08-12 at 9 19 13 AM

Checklist

  • Update CHANGELOG.md file
    • Merge in main so the most recent CHANGELOG.md file is being appended to
    • Add description within the Unreleased section in an appropriate category. Add a new category from the list at the top of the file if the needed one isn't already there.
    • Add a link to this PR at the end of the description with the PR number as the text. example: #1
  • Applicable documentation updated (guides, quickstart, postman collections, tutorial, fidesdemo, database diagram.
  • If docs updated (select one):
    • documentation complete, or draft/outline provided (tag docs-team to complete/review on this branch)
    • documentation issue created (tag docs-team to complete issue separately)
  • Good unit test/integration test coverage
  • This PR contains a DB migration. If checked, the reviewer should confirm with the author that the down_revision correctly references the previous migration before merging
  • The Run Unsafe PR Checks label has been applied, and checks have passed, if this PR touches any external services

Ticket

Fixes #1024

…s used and execution logs are embedded, also return audit logs.

Execution Logs are created at the collection level while audit logs are for the overall privacy request level, so most fields returned for audit logs are None.
Logs are also grouped at the dataset level here, so give the audit logs a fake dataset name for display purposes, for example, "Request approved".
@pattisdr pattisdr marked this pull request as ready for review August 12, 2022 14:58
AuditLog.message,
cast(AuditLog.action.label("status"), sqlalchemy.String).label("status"),
AuditLog.privacy_request_id,
null().label("dataset_name"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really cool, I didn't know you could do this!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Me either. Very cool. It's great that they handle this scenario so nicely

AuditLog.user_id,
).filter(AuditLog.privacy_request_id == self.id)

combined: Query = execution_log_query.union_all(audit_log_query)
Copy link
Contributor Author

@pattisdr pattisdr Aug 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like union all keeps all of the records from both original tables, but union removes duplicates. union sorts and eliminates duplicates first. In early testing, union was returning too many records, but in hindsight this may have had nothing to do with the union and may have been related to the fact that I didn't have other parts of my query correct yet.

Regardless, I think union all makes sense here - we don't need to do the extra step of sorting and dupe-removal.

Comment on lines +111 to +128
class ExecutionAndAuditLogResponse(BaseSchema):
"""Schema for the combined ExecutionLogs and Audit Logs
associated with a PrivacyRequest"""

collection_name: Optional[str]
fields_affected: Optional[List[FieldsAffectedResponse]]
message: Optional[str]
action_type: Optional[ActionType]
status: Optional[Union[ExecutionLogStatus, AuditLogAction]]
updated_at: Optional[datetime]
user_id: Optional[str]

class Config:
"""Set orm_mode and allow population by field name"""

use_enum_values = True
allow_population_by_field_name = True

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally extended the ExecutionLogResponse schema above and just overrode fields to make them optional or change their type but mypy didn't like this, so I made a new one. I also allow_population_by_field_name here because it is no longer ExecutionLogs that are getting serialized with this, it's a union query of audit logs and execution logs.

combined: Query = execution_log_query.union_all(audit_log_query)

for log in combined.order_by(ExecutionLog.updated_at.asc()):
dataset_name: str = log.dataset_name or f"Request {log.status}"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This endpoint groups execution logs by dataset name, but audit logs don't have a dataset. I just create a name for the audit log, and then the single audit log will be associated with that name.

Copy link
Contributor

@TheAndrewJackson TheAndrewJackson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! 🚀 union_all is interesting.

AuditLog.message,
cast(AuditLog.action.label("status"), sqlalchemy.String).label("status"),
AuditLog.privacy_request_id,
null().label("dataset_name"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Me either. Very cool. It's great that they handle this scenario so nicely

@TheAndrewJackson
Copy link
Contributor

It's ready to merge once the changelog conflicts are fixed.

@TheAndrewJackson TheAndrewJackson merged commit c71e865 into main Aug 15, 2022
@TheAndrewJackson TheAndrewJackson deleted the fidesops_1024_combined_logs branch August 15, 2022 13:56
@pattisdr
Copy link
Contributor Author

thanks for the review @TheAndrewJackson !

sanders41 pushed a commit that referenced this pull request Sep 22, 2022
…1068)

* Update the request status endpoint, so when the verbose query param is used and execution logs are embedded, also return audit logs.

Execution Logs are created at the collection level while audit logs are for the overall privacy request level, so most fields returned for audit logs are None.
Logs are also grouped at the dataset level here, so give the audit logs a fake dataset name for display purposes, for example, "Request approved".

* Update CHANGELOG and update docs to reflect that audit logs are included in a verbose request status response.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unify ExecutionLog and AuditLog into a single endpoint with grouping at request level
3 participants