Add management command for sync suppressed emails #21784

Merged 1 commit on Feb 1, 2024
10 changes: 10 additions & 0 deletions src/olympia/users/management/commands/sync_suppressed_emails.py
Member:

The issue might explain further, but if this is something that is expected to be run as a cron job, with no args or options, it would probably be better implemented as a simple function in cron.py instead?

Contributor Author:

This PR is a simple convenience for testing, and for manually resyncing the suppression list if we ever need to. We should run this task on a periodic schedule. Unfortunately, the issue will likely not explain much.

Member:

You can do that with a simple function in cron.py too 🙂

Contributor Author:

Can I trigger a function in cron.py both from a management command and periodically? Just curious. Also, would there be retry logic if it were implemented that way?

Member:

In crontab.tpl we execute management commands directly, and simple functions via the cron management command.

There is no retry logic in our cron setup (other than that the job is executed again the next time it is scheduled). The cron.py function vs. a separate management command .py file is just about the initial execution; in both cases you'd have to either write some custom retry logic, or wrap the work in a task and set a retry on the task.
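Custom retry logic, if written by hand rather than via a task's retry settings, might look like this minimal sketch. This is a generic helper for illustration only, not part of the project's code; the name and parameters are hypothetical.

```python
import time


def call_with_retries(func, *, attempts=5, backoff=1.0, exceptions=(Exception,)):
    """Call func(), retrying on the given exceptions with exponential backoff."""
    delay = backoff
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(delay)
            delay *= 2
```

In practice, wrapping the work in a task and relying on its built-in retry settings (as the existing tasks in this file do with `autoretry_for` and `retry_backoff`) is usually simpler than hand-rolling this.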

@@ -0,0 +1,10 @@
from django.core.management.base import BaseCommand

from olympia.users.tasks import sync_blocked_emails


class Command(BaseCommand):
    """Sync the SocketLabs suppression list to the database."""

    def handle(self, *args, **options):
        sync_blocked_emails.apply()
Contributor Author:

I think using apply is the right thing here: the task executes immediately, and its logs are printed to the console where the management command is run.

Happy to hear feedback on that, but it felt right when executing locally.
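For context on that choice: in Celery, apply() runs the task body synchronously in the calling process, while delay()/apply_async() enqueues it for a worker. A rough stand-in for the distinction, using a hypothetical stub class rather than Celery itself:

```python
class StubTask:
    """Toy stand-in for a Celery task, illustrating apply() vs. delay()."""

    def __init__(self, func):
        self.func = func
        self.queue = []  # pretend message broker

    def apply(self, args=(), kwargs=None):
        # Synchronous: runs right now, in the calling process, so any
        # logging the task does shows up on the caller's console.
        return self.func(*args, **(kwargs or {}))

    def delay(self, *args, **kwargs):
        # Asynchronous: only enqueues; a worker process would run it later.
        self.queue.append((args, kwargs))
```

That is why apply() suits a management command run by hand: the operator sees the task's log output immediately.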

10 changes: 10 additions & 0 deletions src/olympia/users/tasks.py
@@ -105,6 +105,8 @@ def assert_socket_labs_settings_defined():
def sync_blocked_emails(batch_size=BATCH_SIZE, **kw):
    assert_socket_labs_settings_defined()

    task_log.info('fetching suppression list from socket labs...')

    path = f'servers/{settings.SOCKET_LABS_SERVER_ID}/suppressions/download'
    params = {'sortField': 'suppressionLastUpdate', 'sortDirection': 'dsc'}
    url = (
@@ -120,6 +122,9 @@ def sync_blocked_emails(batch_size=BATCH_SIZE, **kw):
    # Raise an exception on non-2xx responses
    response.raise_for_status()

    size_in_mb = len(response.content) / (1024 * 1024)
    task_log.info(f'Downloaded suppression list of {size_in_mb:.2f} MB.')

    with tempfile.NamedTemporaryFile(
        dir=settings.TMP_PATH, delete=not settings.DEBUG, mode='w+b'
    ) as csv_file:
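The tempfile pattern above (keep the downloaded file around when DEBUG is on, clean it up otherwise) can be sketched in isolation like this, with a bytes payload standing in for response.content; the helper name is hypothetical.

```python
import csv
import tempfile


def spool_csv(content: bytes, keep_file: bool = False):
    """Write raw CSV bytes to a temp file, then stream the rows back out."""
    with tempfile.NamedTemporaryFile(delete=not keep_file, mode='w+b') as csv_file:
        csv_file.write(content)
        csv_file.seek(0)
        # Iterating a binary file yields bytes lines; decode them for csv.reader.
        return list(csv.reader(line.decode() for line in csv_file))
```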
@@ -131,6 +136,8 @@ def sync_blocked_emails(batch_size=BATCH_SIZE, **kw):

        next(csv_suppression_list)

        count = 0

        while True:
            batch = list(itertools.islice(csv_suppression_list, batch_size))
@@ -139,6 +146,9 @@ def sync_blocked_emails(batch_size=BATCH_SIZE, **kw):

            email_blocks = [SuppressedEmail(email=record[3]) for record in batch]
            SuppressedEmail.objects.bulk_create(email_blocks, ignore_conflicts=True)
            count += len(batch)
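The batching loop above — repeatedly slicing a lazy CSV iterator so only batch_size rows are held in memory at once — generalizes to any iterable. A self-contained version of the same pattern:

```python
import itertools


def batched(iterable, batch_size):
    """Yield successive lists of up to batch_size items from iterable."""
    iterator = iter(iterable)
    while True:
        batch = list(itertools.islice(iterator, batch_size))
        if not batch:
            break
        yield batch
```

(Python 3.12 ships itertools.batched, which does the same thing but yields tuples.)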
Member:

Maybe we want some logging every batch to show progress?

Contributor Author:

I don't think it's necessary, at least not yet. I only added the logging here because the download can take some time to run, and without any logs it was confusing enough to make me wonder whether something was wrong with the code. I added the count at the end just to verify that we synced something. The syncing itself is really fast thanks to the batching, so per-batch logging wouldn't add much value IMO.


    task_log.info(f'synced {count:,} suppressed emails.')


@task(autoretry_for=(HTTPError, Timeout), max_retries=5, retry_backoff=True)