Gcloud

Contains drivers that interact with gcloud assets

Installation

Include the following section in your init.yaml under the repos section

- repo: https://github.com/honeydipper/honeydipper-config-essentials
  branch: main
  path: /gcloud
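
For example, a minimal init.yaml might look like the following sketch (other repos and fields omitted):

---
repos:
  - repo: https://github.com/honeydipper/honeydipper-config-essentials
    branch: main
    path: /gcloud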

Drivers

This repo provides the following drivers

gcloud-dataflow

This driver enables Honeydipper to run dataflow jobs

Action: createJob

creating a dataflow job using a template

Parameters

service_account

A gcloud service account key (json) stored as a byte array

project

The name of the project where the dataflow job is to be created

location

The region where the dataflow job is to be created

job

The specification of the job; see the gcloud dataflow API reference CreateJobFromTemplateRequest for details

Returns

job

The job object; see the gcloud dataflow API reference Job for details

See below for a simple example

---
workflows:
  start_dataflow_job:
    call_driver: gcloud-dataflow.createJob
    with:
      service_account: ...masked...
      project: foo
      location: us-west1
      job:
        gcsPath: ...
        ...

Action: updateJob

updating a job, including draining or cancelling it

Parameters

service_account

A gcloud service account key (json) stored as a byte array

project

The name of the project where the dataflow job is to be created

location

The region where the dataflow job is to be created

jobSpec

The updated specification of the job; see the gcloud dataflow API reference Job for details

jobID

The ID of the dataflow job

Returns

job

The job object; see the gcloud dataflow API reference Job for details

See below for a simple example of draining a job

---
workflows:
  find_and_drain_dataflow_job:
    with:
      service_account: ...masked...
      project: foo
      location: us-west1
    steps:
      - call_driver: gcloud-dataflow.findJobByName
        with:
          name: bar
      - call_driver: gcloud-dataflow.updateJob
        with:
          jobID: $data.job.Id
          jobSpec:
            currentState: JOB_STATE_DRAINING
      - call_driver: gcloud-dataflow.waitForJob
        with:
          jobID: $data.job.Id

Action: waitForJob

This action will block until the dataflow job is in a terminal state.

Parameters

service_account

A gcloud service account key (json) stored as a byte array

project

The name of the project where the dataflow job is to be created

location

The region where the dataflow job is to be created

jobID

The ID of the dataflow job

interval

The interval between polling calls to the gcloud API, 15 seconds by default

timeout

The total time to wait until the job is in a terminal state, 1800 seconds by default

Returns

job

The job object; see the gcloud dataflow API reference Job for details

See below for a simple example

---
workflows:
  run_dataflow_job:
    with:
      service_account: ...masked...
      project: foo
      location: us-west1
    steps:
      - call_driver: gcloud-dataflow.createJob
        with:
          job:
            gcsPath: ...
            ...
      - call_driver: gcloud-dataflow.waitForJob
        with:
          interval: 60
          timeout: 600
          jobID: $data.job.Id

Action: findJobByName

This action will find an active job by its name

Parameters

service_account

A gcloud service account key (json) stored as a byte array

project

The name of the project where the dataflow job is to be created

location

The region where the dataflow job is to be created

name

The name of the job to look for

Returns

job

A partial job object; see the gcloud dataflow API reference Job for details. Only the Id, Name and CurrentState fields are populated

See below for a simple example

---
workflows:
  find_and_wait_dataflow_job:
    with:
      service_account: ...masked...
      project: foo
      location: us-west1
    steps:
      - call_driver: gcloud-dataflow.findJobByName
        with:
          name: bar
      - call_driver: gcloud-dataflow.waitForJob
        with:
          jobID: $data.job.Id

Action: getJob

This action will get the current status of the dataflow job

Parameters

service_account

A gcloud service account key (json) stored as a byte array

project

The name of the project where the dataflow job is to be created

location

The region where the dataflow job is to be created

jobID

The ID of the dataflow job

Returns

job

The job object; see the gcloud dataflow API reference Job for details

See below for a simple example

---
workflows:
  query_dataflow_job:
    with:
      service_account: ...masked...
      project: foo
      location: us-west1
    steps:
      - call_driver: gcloud-dataflow.createJob
        with:
          job:
            gcsPath: ...
            ...
      - call_driver: gcloud-dataflow.getJob
        with:
          jobID: $data.job.Id

gcloud-gke

This driver enables Honeydipper to interact with GKE clusters.

Honeydipper interacts with k8s clusters through the kubernetes driver. However, the kubernetes driver needs to obtain kubeconfig information such as credentials, certs, API endpoints etc. This is achieved by making an RPC call to a k8s-type driver. This driver is one of the k8s-type drivers.

RPC: getKubeCfg

Fetch kubeconfig information using vendor-specific credentials

Parameters

service_account

Service account key stored as bytes

project

The name of the project the cluster belongs to

location

The location of the cluster

regional

Boolean, true for a regional cluster, false for a zonal cluster

cluster

The name of the cluster

Returns

Host

The endpoint API host

Token

The access token used for k8s authentication

CACert

The CA cert used for k8s authentication

See below for an example of invoking the RPC from the k8s driver

func getGKEConfig(cfg map[string]interface{}) *rest.Config {
  // ask the gcloud-gke driver for the kubeconfig information over RPC
  retbytes, err := driver.RPCCall("driver:gcloud-gke", "getKubeCfg", cfg)
  if err != nil {
    log.Panicf("[%s] failed call gcloud to get kubeconfig %+v", driver.Service, err)
  }

  // the returned payload is a serialized map containing Host, Token and CACert
  ret := dipper.DeserializeContent(retbytes)

  host, _ := dipper.GetMapDataStr(ret, "Host")
  token, _ := dipper.GetMapDataStr(ret, "Token")
  cacert, _ := dipper.GetMapDataStr(ret, "CACert")

  // the CA cert is returned base64 encoded
  cadata, _ := base64.StdEncoding.DecodeString(cacert)

  // build a client-go rest.Config for talking to the cluster
  k8cfg := &rest.Config{
    Host:        host,
    BearerToken: token,
  }
  k8cfg.CAData = cadata

  return k8cfg
}

To configure a kubernetes cluster in the Honeydipper configuration yaml (DipperCL):

---
systems:
  my-gke-cluster:
    extends:
      - kubernetes
    data:
      source:  # all parameters to the RPC here
        type: gcloud-gke
        service_account: ...masked...
        project: foo
        location: us-central1-a
        cluster: my-gke-cluster

Or, you can share some of the fields by abstracting them into a common system

---
systems:
  my-gke:
    data:
      source:
        type: gcloud-gke
        service_account: ...masked...
        project: foo

  my-cluster:
    extends:
      - kubernetes
      - my-gke
    data:
      source:  # parameters to the RPC here
        location: us-central1-a
        cluster: my-gke-cluster

gcloud-kms

This driver enables Honeydipper to interact with gcloud KMS to decrypt configurations.

In order to store sensitive configurations encrypted at rest, Honeydipper needs to be able to decrypt the content. DipperCL uses an eyaml-style notation to store the encrypted content; the type of the encryption and the payload/parameter are enclosed in square brackets []. For example:

mydata: ENC[gcloud-kms,...base64 encoded ciphertext...]

Configurations

keyname

The key in KMS key ring used for decryption. e.g. projects/myproject/locations/us-central1/keyRings/myring/cryptoKeys/mykey
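
For example, a minimal driver configuration sketch, following the same drivers section layout used by the other drivers in this repo:

---
drivers:
  gcloud-kms:
    keyname: projects/myproject/locations/us-central1/keyRings/myring/cryptoKeys/mykey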

RPC: decrypt

Decrypt the given payload

Parameters

*

The whole payload is used as a byte array of ciphertext

Returns

*

The whole payload is a byte array of plaintext

See below for an example of invoking the RPC from another driver

retbytes, err := driver.RPCCallRaw("driver:gcloud-kms", "decrypt", cipherbytes)

gcloud-logging

This driver enables Honeydipper to natively send logs to GCP

Configurations

loggers

A mapping from loggers to their configurations, if needed; only one field, service_account, is supported in the configuration as of now. Loggers without configurations will attempt to use the default account.

For example

---
drivers:
  gcloud-logging:
    loggers:
      my_log:
        service_account: LOOKUP[gcloud-secret,...secret...]
      "my_project|my_log":
        service_account: LOOKUP[gcloud-secret,...secret...]

Action: log

send logs to GCP

Parameters

severity

The severity of the log entry, defaults to info

logger

The logger path, in the form of project|logger_name, or just logger_name if using the GCE metadata for the project ID.

payload

The payload of the log entry; a string, a struct or a map

See below for a simple example

...
  call_driver: gcloud-logging.log
  with:
    severity: info
    logger: my_log
    payload:
      country: US
      state: CA
      city: Los Angeles
      sales: 12000

gcloud-pubsub

This driver enables Honeydipper to receive and consume gcloud pubsub events

Configurations

service_account

The gcloud service account key (json) in bytes. This service account needs to have proper permissions to subscribe to the topics.

For example

---
drivers:
  gcloud-pubsub:
    service_account: ENC[gcloud-kms,...masked...]

Event: <default>

A pub/sub message is received

Returns

project

The gcloud project to which the pub/sub topic belongs

subscriptionName

The name of the subscription

text

The payload of the message, if not json

json

The payload parsed as a json object, if the payload is valid json

See below for an example usage

---
rules:
  - when:
      driver: gcloud-pubsub
      if_match:
        project: foo
        subscriptionName: mysub
        json:
          datakey: hello
    do:
      call_workflow: something

gcloud-secret

This driver enables Honeydipper to fetch items stored in Google Secret Manager.

With access to Google Secret Manager, Honeydipper doesn't have to rely on cipher texts stored directly in the configurations in the repo. Instead, it can query Google Secret Manager and get access to the secrets based on the permissions granted to the identity it uses. DipperCL uses keyword interpolation to detect the items that need to be looked up, using LOOKUP[<driver>,<key>]. See below for an example.

mydata: LOOKUP[gcloud-secret,projects/foo/secrets/bar/versions/latest]

As of now, the driver doesn't take any configuration other than the generic api_timeout. It uses the default service account as its identity.
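
A minimal driver configuration sketch, assuming api_timeout is given in seconds:

---
drivers:
  gcloud-secret:
    api_timeout: 10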

RPC: lookup

Lookup a secret in Google Secret Manager

Parameters

*

The whole payload is used as a byte array containing the key string

Returns

*

The whole payload is a byte array of plaintext

See below for an example of invoking the RPC from another driver

retbytes, err := driver.RPCCallRaw("driver:gcloud-secret", "lookup", []byte("projects/foo/secrets/bar/versions/latest"))

gcloud-spanner

This driver enables Honeydipper to perform administrative tasks on spanner databases

You can create systems to ease the use of this driver.

For example:

---
systems:
  my_spanner_db:
    data:
      service_account: ENC[...]
      project: foo
      instance: dbinstance
      db: bar
    functions:
      start_backup:
        driver: gcloud-spanner
        rawAction: backup
        parameters:
          service_account: $sysData.service_account
          project: $sysData.project
          instance: $sysData.instance
          db: $sysData.db
          expires: $?ctx.expires
        export_on_success:
          backupOpID: $data.backupOpID
      wait_for_backup:
        driver: gcloud-spanner
        rawAction: waitForBackup
        parameters:
          backupOpID: $ctx.backupOpID
        export_on_success:
          backup: $data.backup

Now we can easily call the system functions like below

---
workflows:
  create_spanner_backup:
    steps:
      - call_function: my_spanner_db.start_backup
      - call_function: my_spanner_db.wait_for_backup

Action: backup

creating a native backup of the specified database

Parameters

service_account

A gcloud service account key (json) stored as a byte array

project

The name of the project the spanner database belongs to

instance

The spanner instance of the database

db

The name of the database

expires

Optional, defaults to 180 days; the duration after which the backup will expire and be removed. It should be in a format supported by time.ParseDuration. See the documentation for details.

Returns

backupOpID

A Honeydipper-generated identifier for the backup operation, used for getting the operation status

See below for a simple example

---
workflows:
  start_spanner_native_backup:
    call_driver: gcloud-spanner.backup
    with:
      service_account: ...masked...
      project: foo
      instance: dbinstance
      db: bar
      expires: 2160h
      # 24h * 90 = 2160h
    export_on_success:
      backupOpID: $data.backupOpID

Action: waitForBackup

wait for the backup to complete and return the backup status

Parameters

backupOpID

The Honeydipper-generated identifier returned by the backup function call

Returns

backup

The backup object returned by the API. See databasepb.Backup for details

See below for a simple example

---
workflows:
  wait_spanner_native_backup:
    call_driver: gcloud-spanner.waitForBackup
    with:
      backupOpID: $ctx.backupOpID

Systems

dataflow

This system provides a few functions to interact with Google dataflow jobs.

Configurations

service_accounts.dataflow

The service account json key used to access the dataflow API, optional

locations.dataflow

The default location to be used for new dataflow jobs; if missing, .sysData.locations.default is used. It can be overridden using .ctx.location

subnetworks.dataflow

The default subnetwork to be used for new dataflow jobs; if missing, .sysData.subnetworks.default is used. It can be overridden using .ctx.subnetwork

project

The default project used to access the dataflow API if .ctx.project is not provided, optional

The system can share data with a common Google Cloud configuration system that contains these settings.

For example

---
systems:
  dataflow:
    extends:
      - gcloud-config
  gcloud-config:
    data:
      project: my-gcp-project
      locations:
        default: us-central1
      subnetworks:
        default: default
      service_accounts:
        dataflow: ENC[gcloud-kms,xxxxxxx]

Function: createJob

Creates a dataflow job using a template.

Input Contexts

project

Optional, the project in which the job is created; defaults to the project defined with the system

location

Optional, the location for the job, defaults to the system configuration

subnetwork

Optional, the subnetwork for the job, defaults to the system configuration

job

Required, the data structure describing the CreateJobFromTemplateRequest; see the API document for details.

Export Contexts

job

The job object, details here

For example

call_function: dataflow.createJob
with:
  job:
    gcsPath: gs://dataflow-templates/Cloud_Spanner_to_GCS_Avro
    jobName: export-a-spanner-DB-to-gcs
    parameters:
      instanceId: my-spanner-instance
      databaseId: my-spanner-db
      outputDir: gs://my_spanner_export_bucket

Function: findJob

Find an active job with the given name pattern

Input Contexts

project

Optional, the project in which the job is created; defaults to the project defined with the system

location

Optional, the location for the job, defaults to the system configuration

jobNamePattern

Required, a regex pattern used to match the job name

Export Contexts

job

The first active matching job object, details here

For example

steps:
  - call_function: dataflow.findJob
    with:
      jobNamePattern: ^export-a-spanner-DB-to-gcs$
  - call_function: dataflow.getStatus

Function: getStatus

Wait for the dataflow job to complete and return the status of the job.

Input Contexts

project

Optional, the project in which the job is created; defaults to the project defined with the system

location

Optional, the location for the job, defaults to the system configuration

job

Optional, the data structure describing the Job; see the API document for details. If not specified, the dataflow job information from the previous createJob call is used.

timeout

Optional, if the job doesn't complete within the timeout, an error is reported; defaults to 1800 seconds

interval

Optional, polling interval, defaults to 15 seconds

Export Contexts

job

The job object, details here

For example

steps:
  - call_function: dataflow.createJob
    with:
      job:
        gcsPath: gs://dataflow-templates/Cloud_Spanner_to_GCS_Avro
        jobName: export-a-spanner-DB-to-gcs
        parameters:
          instanceId: my-spanner-instance
          databaseId: my-spanner-db
          outputDir: gs://my_spanner_export_bucket
  - call_function: dataflow.getStatus

Function: updateJob

Update a running dataflow job

Input Contexts

project

Optional, the project in which the job is created; defaults to the project defined with the system

location

Optional, the location for the job, defaults to the system configuration

jobSpec

Required, a job object with an id and the fields to be updated.

For example

steps:
  - call_function: dataflow.findJob
    with:
      jobNamePattern: ^export-a-spanner-DB-to-gcs$
  - call_function: dataflow.updateJob
    with:
      jobSpec:
        requestedState: JOB_STATE_DRAINING
  - call_function: dataflow.getStatus

kubernetes

No description is available for this entry!

Workflows

cancelDataflowJob

Cancel an active dataflow job, and wait for the job to quit.

Input Contexts

system

The dataflow system used for cancelling the job

job

Required, a job object returned from previous findJob or getStatus functions, details here

cancelling_timeout

Optional, time in seconds to wait for the job to quit, defaults to 1800

Export Contexts

job

The updated job object, details here

reason

If the job fails, the reason for the failure as reported by the API.

For example

---
rules:
  - when:
      source:
        system: webhook
        trigger: request
    do:
      steps:
        - call_function: dataflow-sandbox.findJob
          with:
            jobNamePattern: ^my-job-[0-9-]*$
        - call_workflow: cancelDataflowJob
          with:
            system: dataflow-sandbox
            # job object is automatically exported from previous step

drainDataflowJob

Drain an active dataflow job: find the job with a regex name pattern, request draining, and wait for the job to complete.

Input Contexts

system

The dataflow system used for draining the job

jobNamePattern

Required, a regex pattern used to match the job name

draining_timeout

Optional, draining timeout in seconds, default 1800

no_cancelling

Optional; unless specified, the job will be cancelled after the draining timeout

cancelling_timeout

Optional, time in seconds to wait for the job to quit, defaults to 1800

Export Contexts

job

The job object, details here

reason

If the job fails, the reason for the failure as reported by the API.

For example

---
rules:
  - when:
      source:
        system: webhook
        trigger: request
    do:
      call_workflow: drainDataflowJob
      with:
        system: dataflow-sandbox
        jobNamePattern: ^my-job-[0-9-]*$
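
A drain-only variant, using the documented no_cancelling and draining_timeout contexts, is sketched below; the workflow and system names are illustrative.

---
workflows:
  drain_my_job:
    call_workflow: drainDataflowJob
    with:
      system: dataflow-sandbox
      jobNamePattern: ^my-job-[0-9-]*$
      no_cancelling: "true"
      draining_timeout: 3600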

use_gcloud_kubeconfig

This workflow will add a step to the steps context variable so the following run_kubernetes workflow can use kubectl with a gcloud service account credential

Input Contexts

cluster

An object with a cluster field and, optionally, project, zone, and region fields

The workflow will add a step that runs gcloud container clusters get-credentials to populate the kubeconfig file. For example:

---
workflows:
  run_gke_job:
    steps:
      - call_workflow: use_google_credentials
      - call_workflow: use_gcloud_kubeconfig
        with:
          cluster:
            cluster: my-cluster
      - call_workflow: run_kubernetes
        with:
          steps+:
            - type: gcloud
              shell: kubectl get deployments
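
The cluster object can also carry project, zone or region fields when the cluster is not in the default project or location. A sketch for a regional cluster (all names are illustrative):

---
workflows:
  run_regional_gke_job:
    steps:
      - call_workflow: use_google_credentials
      - call_workflow: use_gcloud_kubeconfig
        with:
          cluster:
            project: foo
            region: us-central1
            cluster: my-cluster
      - call_workflow: run_kubernetes
        with:
          steps+:
            - type: gcloud
              shell: kubectl get pods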

use_google_credentials

This workflow will add a step to the steps context variable so the following run_kubernetes workflow can use the default google credentials, a credential specified through a k8s secret, or berglas to fetch the credentials at runtime (recommended).

Important

It is recommended to always use this with the run_kubernetes workflow if gcloud steps are used.

Input Contexts

google_credentials_secret

The name of the k8s secret storing the service account key

google_credentials_berglas_secret

The name of the secret to be fetched using berglas. If no secret is specified, the default service account is used.

For example

---
workflows:
  run_gke_job:
    steps:
      - call_workflow: use_google_credentials
        with:
          google_credentials_secret: my_k8s_secret
      - call_workflow: run_kubernetes
        with:
          # notice we use the append modifier here ("+") so
          # steps pre-configured through use_google_credentials
          # won't be overwritten.
          steps+:
            - type: gcloud
              shell: gcloud compute disks list

An example with a berglas secret

---
workflows:
  run_my_job:
    with:
      google_credentials_berglas_secret: sm://my-gcp-project/my-service-account-secret
    steps:
      - call_workflow: use_google_credentials
      - call_workflow: run_kubernetes
        with:
          steps+:
            - type: gcloud
              shell: gcloud compute disks list
      - call_workflow: use_google_credentials
      - call_workflow: run_kubernetes
        with:
          steps+:
            - type: gcloud
              shell: gsutil ls gs://mybucket