Skip to content
This repository has been archived by the owner on Nov 30, 2022. It is now read-only.

Email Connector: Send Email with Erasure Instructions [#1158] #1246

Merged
merged 14 commits into from
Sep 7, 2022

Conversation

pattisdr
Copy link
Contributor

@pattisdr pattisdr commented Sep 1, 2022

❗ Contains migration
Don't merge until #1240 is resolved

Purpose

Adds the final PR for the EmailConnector. An EmailConnector sends an email at the end of privacy request execution to third parties with instructions on how to mask data that we can't access automatically. Also introduces the concept of "checkpoints" for privacy request execution to give us more locations from which to resume a paused/failed privacy request.
This allows us to retry just the email send and everything else downstream if the email send fails, and makes it easier to add other checkpoints in the future.

Changes

  • Run a new method: email_connector_erasure_send after the access and erasure steps of privacy request execution. It collects data that was cached by the traversal for any collections on email-related datasets, and fires off a single email for that dataset.
  • Adds a new AuditLog type: "email_sent", and creates an AuditLog provided the email send succeeds.
  • Have the EmailConnector.mask_data raise a specific Exception just to override the ExecutionLog message for the collection, to indicate that the email was prepared. The email itself is not sent until later. ExecutionLog creation is the responsibility of the GraphTask, not an individual email connector. We use similar exceptions to "pause" a privacy request.
  • Add the concept of "checkpoints" in PrivacyRequest execution to formalize restarting from certain points in the execution. We already could resume a privacy request from a Pre-Execution Webhook or from a particular collection in the access or erasure graph, but we'd like to be able to retry just the email send portion of this. This makes the checkpoint logic more consistent.
  • Allows configuring a test_email on the EmailConnector secrets, so the fidesops admin can send a test email to themselves with test data in it, to confirm their email config is setup properly.
  • Rename some of the methods and classes related to serializing different "checkpoints". Previously, you could generally just resume request execution from a specific collection in the traversal, but we want to make a collection not necessary, if the resume point is outside of the traversal.

Checklist

  • Update CHANGELOG.md file
    • Merge in main so the most recent CHANGELOG.md file is being appended to
    • Add description within the Unreleased section in an appropriate category. Add a new category from the list at the top of the file if the needed one isn't already there.
    • Add a link to this PR at the end of the description with the PR number as the text. example: #1
  • Applicable documentation updated (guides, quickstart, postman collections, tutorial, fidesdemo, database diagram.
  • If docs updated (select one):
    • documentation complete, or draft/outline provided (tag docs-team to complete/review on this branch)
    • documentation issue created (tag docs-team to complete issue separately)
  • Good unit test/integration test coverage
  • This PR contains a DB migration. If checked, the reviewer should confirm with the author that the down_revision correctly references the previous migration before merging
  • The Run Unsafe PR Checks label has been applied, and checks have passed, if this PR touches any external services

Ticket

Fixes #1158

…est execution.

- Add a migration to create a new audit log type.  Create an audit log for the email send.
-  Throw an exception for email-based connectors and catch to override the default execution log.
- Add a draft of an email template
- Connect sending a "test email" with dummy data.  A fidesops admin could configure to check their email config was working.
…ions from which we can resume privacy request execution without having to run from the beginning.

- Add more options to CurrentStep Enum
- Cache the checkpoint if an email send fails, so we can retry from the same step.
@pattisdr pattisdr changed the title [DRAFT] Email Connector: Send Email [#1158] [DRAFT] Email Connector: Send Email with Erasure Instructions [#1158] Sep 1, 2022
…e no updates to be applied to any of the collections on the dataset.
…ailed privacy request can be resumed outside of the traversal.
@pattisdr
Copy link
Contributor Author

pattisdr commented Sep 2, 2022

Hi @ethyca/docs-authors, can you lend your expertise in helping me with the email copy here?

In short, this is an Email Connector that sends an email to a third party telling them how to erase data that we can't erase automatically.

There may be multiple collections that they should delete data from, and multiple fields/values they should query on their collection to find the records that should be masked, with multiple fields that need to be deleted. It is possible that they need to look up data on one of their collections to find data in another one of their collections. It is possible that some of the collections have no data to be erased but they are included because they need to locate a record on that collection in order to find data on another collection.

However, the scenario where there are multiple collections is probably less likely. We imagine the fidesops user won't have good knowledge of the third party's data structure, and will create one collection with the fields they should mask.


In my email example below, the customer has two collections, children and daycare_customer. There are several fields they need to erase on the children collection, but they need to first look up the daycare_customer with an id of 1, and then use that to query matching parent_ids on children. There are no fields to erase on the daycare_customer table, but it's included as a locator for children.

Screen Shot 2022-09-02 at 10 46 42 AM

@pattisdr
Copy link
Contributor Author

pattisdr commented Sep 2, 2022

@ethyca/docs-authors I've also added a first draft of docs for the email connector here -

@pattisdr pattisdr changed the title [DRAFT] Email Connector: Send Email with Erasure Instructions [#1158] Email Connector: Send Email with Erasure Instructions [#1158] Sep 2, 2022
Comment on lines 93 to +103

class CollectionActionRequired(BaseSchema):
"""Describes actions needed on a given collection.
class CheckpointActionRequired(BaseSchema):
"""Describes actions needed on a particular checkpoint.

Examples are a paused collection that needs manual input, a failed collection that
needs to be restarted, or a collection where instructions need to be emailed to a third
party to complete the request.
"""

step: CurrentStep
collection: CollectionAddress
collection: Optional[CollectionAddress]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just using this existing structure that we already have to cache details about a request that paused or failed in the traversal to extend it to describe when a request fails outside of the traversal. I'm removing the mandate that a collection exists, because this is only relevant inside the traversal.

I've renamed several related pieces.

Comment on lines +824 to +829
if not paused_collection:
raise HTTPException(
status_code=HTTP_422_UNPROCESSABLE_ENTITY,
detail="Cannot save manual data on paused collection. No paused collection saved'.",
)

Copy link
Contributor Author

@pattisdr pattisdr Sep 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just adding a new check here now that I've generally removed the requirement that a collection is saved on a CheckpointActionRequired but this is one place where it's needed.

(To be clear, this shouldn't ever be hit though)

Comment on lines +794 to +805
def can_run_checkpoint(
request_checkpoint: CurrentStep, from_checkpoint: Optional[CurrentStep] = None
) -> bool:
"""Determine whether we should run a specific checkpoint in privacy request execution

If there's no from_checkpoint specified we should always run the current checkpoint.
"""
if not from_checkpoint:
return True
return EXECUTION_CHECKPOINTS.index(
request_checkpoint
) >= EXECUTION_CHECKPOINTS.index(from_checkpoint)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already had this ability in privacy request execution to resume from a specific point in the cause of a pause or failure, but this formalizes it a bit, and makes it easier to extend now that we are starting to add multiple locations from which a privacy request can be resumed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this abstraction. Makes it easier to understand as we scale the number of execution checkpoints 💯

Comment on lines 53 to 82
config = EmailSchema(**self.configuration.secrets or {})
logger.info("Starting test connection to %s", self.configuration.key)

db = Session.object_session(self.configuration)

try:
dispatch_email(
db=db,
action_type=EmailActionType.EMAIL_ERASURE_REQUEST_FULFILLMENT,
to_email=config.test_email,
email_body_params={
"test_collection": CheckpointActionRequired(
step=CurrentStep.erasure,
collection=CollectionAddress("test_dataset", "test_collection"),
action_needed=[
ManualAction(
locators={"id": ["example_id"]},
get=None,
update={
"test_field": "null_rewrite",
},
)
],
)
},
)
except EmailDispatchException as exc:
logger.info("Email connector test failed with exception %s", Pii(exc))
return ConnectionTestStatus.failed
return ConnectionTestStatus.succeeded
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a test_email is configured (you'd set up a test_email that you had access to), you can use this to send a test email to yourself. The data inside is just dummy data. It really just confirms your email config is working.

Comment on lines +317 to +337
# Send erasure requests via email to third parties where applicable
if can_run_checkpoint(
request_checkpoint=CurrentStep.erasure_email_post_send,
from_checkpoint=resume_step,
):
try:
email_connector_erasure_send(
db=session, privacy_request=privacy_request
)
except EmailDispatchException as exc:
privacy_request.cache_failed_checkpoint_details(
step=CurrentStep.erasure_email_post_send, collection=None
)
privacy_request.error_processing(db=session)
await fideslog_graph_failure(
failed_graph_analytics_event(privacy_request, exc)
)
# If dev mode, log traceback
_log_exception(exc, config.dev_mode)
return

Copy link
Contributor Author

@pattisdr pattisdr Sep 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the primary change of this PR: if applicable, we combine cached data from all the collections on an email-based dataset and send a single email for each dataset. There's no batching currently. If there is no data to mask / the connection was read-only, etc. no email is sent.

Comment on lines +95 to +100
except PrivacyRequestErasureEmailSendRequired as exc:
self.log_end(action_type, ex=None, success_override_msg=exc)
self.resources.cache_erasure(
f"{self.traversal_node.address.value}", 0
) # Cache that the erasure was performed in case we need to restart
return 0
Copy link
Contributor Author

@pattisdr pattisdr Sep 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following a pattern we use elsewhere - I raise an Exception in the EmailConnector.mask_data to update the execution log as it is a concern of the GraphTask and not the individual EmailConnectors. In this case, I want to add a more specific message to the erasure ExecutionLog as the action is not technically completed until we send the email at the end.

@@ -102,6 +111,7 @@ def run_webhooks_and_report_status(
webhook.key,
)
privacy_request.error_processing(db)
privacy_request.cache_failed_checkpoint_details(current_step)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using our new request execution checkpoints this lets us resume from the pre or post execution webhooks step later if they fail. We used to be able to resume from pause but had no handling to resume from failure.

@pattisdr pattisdr marked this pull request as ready for review September 2, 2022 22:01
@conceptualshark
Copy link
Contributor

@pattisdr I ran through and combined the two email guides and updated them to match the formatting - let me know if this works for you?

- Subject Identity Verification - for more information on identity verification in subject requests, see the [Privacy Requests](privacy_requests.md#subject-identity-verification) guide.
- Erasure Request Email Fulfillment - sends an email to configured third parties to process erasures for a given data subject. See [Email Connectors](email_connectors.md) for more information.
- Subject Identity Verification - sends a verification code to the user's email address prior to for more information on identity verification in subject requests, see the [Privacy Requests](privacy_requests.md#subject-identity-verification) guide.
- Erasure Request Email Fulfillment - sends an email to configured third parties to process erasures for a given data subject. See [creating Email Connectors](#create-an-email-connector) for more information.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@conceptualshark I don't think this goes anywhere


Supported modes of use:

- Subject Identity Verification - for more information on identity verification in subject requests, see the [Privacy Requests](privacy_requests.md#subject-identity-verification) guide.
- Erasure Request Email Fulfillment - sends an email to configured third parties to process erasures for a given data subject. See [Email Connectors](email_connectors.md) for more information.
- Subject Identity Verification - sends a verification code to the user's email address prior to for more information on identity verification in subject requests, see the [Privacy Requests](privacy_requests.md#subject-identity-verification) guide.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@conceptualshark Sentence doesn't seem complete... "prior to executing a privacy request" or similar?

Comment on lines +84 to +90
| Field | Description |
|----|----|
| `key` | A unique key used to manage your email connector. This is auto-generated from `name` if left blank. Accepted values are alphanumeric, `_`, and `.`. |
| `name` | A unique user-friendly name for your email connector. |
| `connection_type` | Must be `email` to create a new email connector. |
| `access` | Email connectors must be given `write` access in order to send an email. |

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice formatting here will try to remember this for future docs drafts @conceptualshark

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching these - ran through a bit fast 😨

Copy link
Contributor

@eastandwestwind eastandwestwind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @pattisdr ! I've manually tested that:

  1. The test email was successfully sent upon adding secrets to the email connector:

Screen Shot 2022-09-06 at 2 26 59 PM

2. The actual erasure email was successfully sent upon executing an erasure request:

Screen Shot 2022-09-06 at 2 27 04 PM

Nothing blocking, just minor syntax issue in docs.

Note for future - this is another feature that relies on an email config being set up, and it's currently possible for email config not found exceptions to happen at run time during a privacy request. To prevent this, we can have Admin UI check for a valid email config at any point in the setup flow

db = Session.object_session(self.configuration)

try:
dispatch_email(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just keeping an eye on #1173 for possible conflicts depending on which is merged first

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 thanks for the callout

@@ -380,14 +389,14 @@ def cache_email_connector_template_contents(

def get_email_connector_template_contents_by_dataset(
self, step: CurrentStep, dataset: str
) -> Dict[str, Optional[CollectionActionRequired]]:
) -> Dict[str, Optional[CheckpointActionRequired]]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we create a new class for Dict[str, Optional[CheckpointActionRequired]]? I'd prefer to be explicit on what the str represents here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure thing, added a new type EmailRequestFulfillmentBodyParams that is created from template data we pull out of the cache by dataset

Comment on lines +794 to +805
def can_run_checkpoint(
request_checkpoint: CurrentStep, from_checkpoint: Optional[CurrentStep] = None
) -> bool:
"""Determine whether we should run a specific checkpoint in privacy request execution

If there's no from_checkpoint specified we should always run the current checkpoint.
"""
if not from_checkpoint:
return True
return EXECUTION_CHECKPOINTS.index(
request_checkpoint
) >= EXECUTION_CHECKPOINTS.index(from_checkpoint)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this abstraction. Makes it easier to understand as we scale the number of execution checkpoints 💯

```json title="<code>PUT api/v1/connection/{email_connection_config_key}/secret</code>"
{
"test_email": "my_email@example.com",
"to_email": "third_party@example.com
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing quote "

…he cached email details are extracted by dataset.
…onnector_email_send

# Conflicts:
#	tests/ops/integration_tests/test_integration_email.py
@pattisdr
Copy link
Contributor Author

pattisdr commented Sep 6, 2022

Comments addressed; back to you @eastandwestwind!

@eastandwestwind
Copy link
Contributor

thanks for addressing the comments, looks great @pattisdr !

@eastandwestwind eastandwestwind merged commit c49b426 into main Sep 7, 2022
@eastandwestwind eastandwestwind deleted the fidesops_1158_email_connector_email_send branch September 7, 2022 14:30
@conceptualshark
Copy link
Contributor

@pattisdr I think the basic collection/field sections are probably fine as-is - could we add something more comprehensive to explain its purpose, and could we add in the org name to the template?

This is an automated email sent by {ORGANIZATION NAME} using Fides. Customers of {ORGANIZATION NAME} would like to exercise their data privacy right to be deleted.

Please locate and erase personally identifiable information for all data subjects and records listed below:

And then the list you already have generating, I think.

@pattisdr
Copy link
Contributor Author

pattisdr commented Sep 7, 2022

great improvements, thank you @conceptualshark, I'll add this in a follow-up ticket

sanders41 pushed a commit that referenced this pull request Sep 22, 2022
* Send an email for each email-based dataset at the end of privacy request execution.

- Add a migration to create a new audit log type.  Create an audit log for the email send.
-  Throw an exception for email-based connectors and catch to override the default execution log.
- Add a draft of an email template
- Connect sending a "test email" with dummy data.  A fidesops admin could configure to check their email config was working.

* Add more "checkpoints" to privacy request execution - these are locations from which we can resume privacy request execution without having to run from the beginning.

- Add more options to CurrentStep Enum
- Cache the checkpoint if an email send fails, so we can retry from the same step.

* Don't send an email if the connection config is read only or there are no updates to be applied to any of the collections on the dataset.

* Don't assume there's a collection when building "resume" details. A failed privacy request can be resumed outside of the traversal.

* Add a first draft of docs for setting up an email connector.

* Moves the email connector send method to the email connector file.

* Update mock location.

* Bump downrev.

* update email connector guides

* correct link, broken sentence

* Create a new EmailRequestFulfillmentBodyParams type to be used once the cached email details are extracted by dataset.

* Fix missed test.

Co-authored-by: Cole <cole@ethyca.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

EmailConnector: Send Erasure Email /Test Email
3 participants