Example temporal worker application deployed to Cloud Run

rross/temporal-cloud-run

Temporal on Cloud Run

This repository demonstrates how to run Temporal workers on GCP's Cloud Run. Cloud Run is a serverless offering that makes it easier to deploy and run containerized applications. Initially, Cloud Run was only for web-based applications that respond to requests. Over time, Cloud Run has added the ability to run jobs as well.

As awesome as Cloud Run is, there are some challenges that need to be addressed for applications that act as Temporal workers. One challenge is that Cloud Run is designed to run when requests are made: by default, CPU is only allocated while a request is being processed, which provides a nice pay-per-use model. Unfortunately, this doesn't play well with workers, which long poll Temporal while waiting for work to process.

The workaround for this is to disable CPU throttling so that CPU stays allocated between requests. In YAML, it looks like this:

spec:
  template:
    metadata:
      annotations:
        run.googleapis.com/cpu-throttling: 'false'  # we need to keep the CPU running

Another challenge is that, by default, an application running on Cloud Run is scaled down to zero instances if there are no inbound web requests. This, too, doesn't play well with Temporal workers.

The workaround for this is to set the minimum number of instances:

spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: '1' # keep one instance available
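For context, the two annotations above could sit together in one service definition. The following is a minimal sketch, not the manifest used by this repository: the service name is taken from the commands later in this README, and the container image path is a hypothetical placeholder.

```yaml
# Minimal sketch of a Cloud Run service definition combining both workarounds.
# The image path is a placeholder (assumption); substitute your own.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: temporal-metrics-sample
spec:
  template:
    metadata:
      annotations:
        run.googleapis.com/cpu-throttling: 'false'  # keep CPU allocated between requests
        autoscaling.knative.dev/minScale: '1'       # never scale below one instance
    spec:
      containers:
        - image: <REGION>-docker.pkg.dev/<PROJECT_ID>/<REPOSITORY>/app:latest
```

A file like this can be applied with `gcloud run services replace`, passing the file name and `--region <REGION>`.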

Cloud Run only allows one port to be publicly exposed, and exposing a Temporal SDK metrics endpoint externally is not a good idea. Instead, deploy the OpenTelemetry Collector in a sidecar container. The collector scrapes the Temporal SDK's Prometheus metrics endpoint and can push the metrics (and optionally traces and logs) to a destination of your choice.
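As a rough illustration of the sidecar layout described above, the service template can list two containers. This is a sketch under assumptions: the container names, image paths, and port 8080 are placeholders and not taken from this repository's configuration. Only the application container declares a port, so only it receives external traffic; the collector reaches the application's metrics endpoint over localhost because containers in one instance share a network namespace.

```yaml
# Sketch of a two-container Cloud Run service (names, images, and port are assumptions).
spec:
  template:
    spec:
      containers:
        - name: app                      # ingress container: the only one that declares a port
          image: <REGION>-docker.pkg.dev/<PROJECT_ID>/<REPOSITORY>/app:latest
          ports:
            - containerPort: 8080
        - name: collector                # sidecar: no ports, so it is not reachable externally
          image: <REGION>-docker.pkg.dev/<PROJECT_ID>/<REPOSITORY>/collector:latest
```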

In this example, I chose to send the metrics to Google Cloud Managed Prometheus. The collector supports a wide range of other exporters, including Prometheus Remote Write Collector, Datadog, Splunk, Google Cloud Pubsub, and Google Cloud Operations Collector. The full list of exporters is available here.
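A collector configuration for this setup might look roughly like the sketch below. It assumes the contrib distribution of the collector (which includes the `googlemanagedprometheus` exporter) and assumes the Temporal SDK exposes its Prometheus endpoint on `localhost:9464`; that port and the job name are hypothetical, so check the configuration in the collector directory for the real values.

```yaml
# Sketch of an OpenTelemetry Collector config: scrape the worker, export to
# Google Cloud Managed Prometheus. Port and job name are assumptions.
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: temporal-sdk
          scrape_interval: 30s
          static_configs:
            - targets: ['localhost:9464']  # SDK metrics endpoint inside the instance

processors:
  batch: {}                                # batch metrics before exporting

exporters:
  googlemanagedprometheus: {}              # uses the service's default credentials

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [batch]
      exporters: [googlemanagedprometheus]
```

Switching destinations would mean swapping the entry under `exporters` and in the pipeline, while the receiver stays the same.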

Project Organization

This repository is organized as follows:

  • app - Sample Java application to deploy to Cloud Run
  • collector - Contains details for running the OpenTelemetry Collector
  • gcp-infra - Pulumi project to create a new GCP project

How to Use This Example

  • Follow the steps listed in the gcp-infra/readme.
  • Once the application has been deployed, access it using one of the methods described below.

Access via Private Proxy

By default, the application is not publicly visible. To access it, run the following command, which proxies the service to your local machine:

gcloud beta run services proxy temporal-metrics-sample --project <PROJECT_ID> --region <REGION>

Once the proxy is running, visit the application by navigating to http://localhost:8080.

Make the Application Public

To access the application via its public URL, uncomment the last step in cloudbuild.yaml. Alternatively, you can run the following command:

gcloud run services set-iam-policy temporal-metrics-sample policy.yaml --region <REGION>
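For reference, a policy file that makes a Cloud Run service public typically grants the invoker role to all users. The sketch below shows that common form; it is an assumption about what policy.yaml contains, so compare it with the file in this repository before applying it.

```yaml
# Sketch of an IAM policy allowing unauthenticated access (assumed contents of policy.yaml).
bindings:
  - members:
      - allUsers              # anyone on the internet
    role: roles/run.invoker   # permission to invoke the service
```

Note that `set-iam-policy` replaces the service's existing policy with the contents of the file.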

To retrieve the public URL, run the following command:

gcloud run services describe temporal-metrics-sample --region <REGION> --format='value(status.url)'

Start the Workflow

Using the appropriate URL (see the sections above), navigate to the application. You should see the following page:

[Screenshot: Temporal Metrics Sample Application]

Click on Start the Metrics Workflow.

[Screenshot: Start the Workflow]

Enter some text for the input and click the Run Workflow button. This will start the workflow. After about 30 seconds, the workflow will complete:

[Screenshot: Completed Workflow]

The application purposefully fails the activity a few times before completing so that there are interesting metrics.

Start the workflow a couple more times to get a few more executions.

To view the progress of the workflows, open the Temporal Cloud console by navigating to https://cloud.temporal.io.

View Metrics in Google Cloud Monitoring

Open the Metrics Explorer in the Google Cloud Console. In the Metric drop-down, scroll down to Prometheus Target, then Temporal, select Prometheus/temporal_long_request_total/counter, and click the Apply button.

[Screenshot: Prometheus/temporal_long_request_total/counter]

Now, in the time box near the upper right of the screen, click the down arrow and select Last 30 Minutes. You should see a graph similar to this one:

[Screenshot: Request Total Counter Graph]

Feel free to experiment with adding additional metrics.
