Skip to content
This repository has been archived by the owner on Nov 30, 2022. It is now read-only.

Make worker node optional #770

Merged
merged 8 commits into from
Jul 6, 2022
Merged

Make worker node optional #770

merged 8 commits into from
Jul 6, 2022

Conversation

seanpreston
Copy link
Contributor

@seanpreston seanpreston commented Jun 30, 2022

Purpose

We've recently added Celery as a project dependency, and it's currently compulsory to run a worker in order to execute privacy requests. This PR makes that optional, by providing an option to run the worker on the same node as the webserver as a subprocess.

Changes

  • Moves worker definition into docker-compose.worker.yml
  • Adds USE_DEDICATED_WORKER config var
  • If USE_DEDICATED_WORKER is False -> run fidesops worker as a subprocess on webserver startup

Effect

  • Running docker compose up will bring up the webserver, and run fidesops worker as a subprocess on that container
  • Running docker compose -f docker-compose.yml -f docker-compose.worker.yml will force USE_DEDICATED_WORKER to True and bring up the worker container separately

Checklist

  • Update CHANGELOG.md file
    • Merge in main so the most recent CHANGELOG.md file is being appended to
    • Add description within the Unreleased section in an appropriate category. Add a new category from the list at the top of the file if the needed one isn't already there.
    • Add a link to this PR at the end of the description with the PR number as the text. example: #1
  • Applicable documentation updated (guides, quickstart, postman collections, tutorial, fidesdemo, database diagram.
  • If docs updated (select one):
    • documentation complete, or draft/outline provided (tag docs-team to complete/review on this branch)
    • documentation issue created (tag docs-team to complete issue separately)
  • Good unit test/integration test coverage
  • This PR contains a DB migration. If checked, the reviewer should confirm with the author that the down_revision correctly references the previous migration before merging
  • The Run Unsafe PR Checks label has been applied, and checks have passed, if this PR touches any external services

Ticket

Fixes #769

@@ -27,6 +27,8 @@ The types of changes are:
* Parallelize CI safe checks to reduce run time [#717](https://github.com/ethyca/fidesops/pull/717)
* Add dependabot to keep dependencies up to date [#718](https://github.com/ethyca/fidesops/pull/718)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have left a blank space here in anticipation of a conflict with this PR

@@ -172,6 +174,10 @@ def start_webserver() -> None:
)
)

if not config.execution.USE_DEDICATED_WORKER:
logger.info("Starting worker...")
subprocess.Popen(["fidesops", "worker"])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There may well be a smarter way to run this on the same container. For instance this wouldn't be any more performant than Fidesops prior to Celery if this was run outside of a high availability environment, as the same server is doing the work albeit in a slightly different structure. The main benefits of this are:

  • giving users the option to run everything with as small container footprint as possible
  • saving developers time and memory building + operating the worker container to test the privacy request flow locally

I'd be interested to test whether running the privacy request blocks the uvicorn server receiving requests in a deployed setting.

@@ -98,6 +101,10 @@ TASK_RETRY_DELAY=20
TASK_RETRY_BACKOFF=2
REQUIRE_MANUAL_REQUEST_APPROVAL=true
MASKING_STRICT=true
CELERY_BROKER_URL="redis://:testpassword@redis:6379/1"
CELERY_RESULT_BACKEND="redis://:testpassword@redis:6379/1"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've a separate task out to make these defaults more sane

@@ -58,6 +58,9 @@ The `fidesops.toml` file should specify the following variables:
|`TASK_RETRY_BACKOFF` | `FIDESOPS__EXECUTION__TASK_RETRY_BACKOFF` | int | 2 | 1 | The backoff factor for retries, to space out repeated retries.
|`REQUIRE_MANUAL_REQUEST_APPROVAL` | `FIDESOPS__EXECUTION__REQUIRE_MANUAL_REQUEST_APPROVAL` | bool | False | False | Whether privacy requests require explicit approval to execute
|`MASKING_STRICT` | `FIDESOPS__EXECUTION__MASKING_STRICT` | bool | True | True | If MASKING_STRICT is True, we only use "update" requests to mask data. (For third-party integrations, you should define an `update` endpoint to use.) If MASKING_STRICT is False, you are allowing fidesops to use any defined DELETE or GDPR DELETE endpoints to remove PII. In this case, you should define `delete` or `data_protection_request` endpoints for your third-party integrations. Note that setting MASKING_STRICT to False means that data may be deleted beyond the specific data categories that you've configured in your Policy.
|`CELERY_BROKER_URL` | `FIDESOPS__EXECUTION__CELERY_BROKER_URL` | str | redis://:testpassword@redis:6379/1 | N/A | The datastore to maintain ordered queues of tasks.
|`CELERY_RESULT_BACKEND` | `FIDESOPS__EXECUTION__RESULT_BACKEND` | str | redis://:testpassword@redis:6379/1 | N/A | The datastore to put results from asynchronously processed tasks.
|`WORKER_ENABLED` | `FIDESOPS__EXECUTION__WORKER_ENABLED` | bool | True | True | Whether Fidesops is running with a dedicated worker to process privacy requests asynchronously.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These descriptions will undoubtedly need work

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be fine for right now - they can be expanded on with #755

@@ -41,6 +41,7 @@ class ExecutionSettings(FidesSettings):
MASKING_STRICT: bool = True
CELERY_BROKER_URL: str = "redis://:testpassword@redis:6379/1"
CELERY_RESULT_BACKEND: str = "redis://:testpassword@redis:6379/1"
WORKER_ENABLED: bool = True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a particular reason why we default our users to True but use false for ourselves? - https://github.com/ethyca/fidesops/pull/770/files#diff-b9a882b5033bf73236083b237701d90db30d7405950cbba9cfee37df20bdb191R37

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're generally deploying Fidesops in production, so should be encouraged to use a robust setup. We're using Fidesops in a development context and might not need to focus on the execution of privacy requests always, and it's mainly important when execution is happening.

@eastandwestwind
Copy link
Contributor

eastandwestwind commented Jul 6, 2022

Doing some thinking about workflow here, a user would need to both set FIDESOPS__EXECUTION__WORKER_ENABLED=True and use the make server-and-worker command, if they wished to run Celery on a separate worker.

This is fine for now, but I'd like to be able to detect the FIDESOPS__EXECUTION__WORKER_ENABLED from make server, so that the user experience is simpler.

We talked briefly about passing the env var around in Makefile, but I'm unsure if we specifically addressed the following: We could do something similar to how we call run_infrastructure.py for other Make commands, and pass in our WORKER_ENABLED env variable. If enabled, we can then bring up docker-compose and docker-compose.worker. What do you think? Should we create a ticket for this?

@seanpreston
Copy link
Contributor Author

We talked briefly about passing the env var around in Makefile, but I'm unsure if we specifically addressed the following: We could do something similar to how we call run_infrastructure.py for other Make commands, and pass in our WORKER_ENABLED env variable. If enabled, we can then bring up docker-compose and docker-compose.worker. What do you think? Should we create a ticket for this?

I think they're two separate concerns. It's good for us to be able to spin up workers as and when we need to work with them locally, but it doesn't need to be part of the default setup as long as we can guarantee the same code is being run and it's only the paradigm that changes. Ideally I'd like to see if this does get in our way first and then address via the current change management avenues (retros etc).

@eastandwestwind
Copy link
Contributor

@seanpreston sounds good to me, let's make sure we at least explain steps needed in the docs issue- #755

@seanpreston seanpreston merged commit f493e55 into main Jul 6, 2022
@seanpreston seanpreston deleted the optional-worker-config branch July 6, 2022 17:44
sanders41 pushed a commit that referenced this pull request Sep 22, 2022
* default fidesops to running the worker and webserver on same container

* default to using a worker, add docker config for worker

* update changelog

* USE_DEDICATED_WORKER -> WORKER_ENABLED

* add basic descriptions for celery vars to docs

* remove unused import

* add Make command for a server + worker
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make worker node optional
5 participants