Guide for analysts

Rostyslav Zatserkovnyi edited this page Nov 9, 2018 · 16 revisions

User Quickstart

Assuming you are running on a 64-bit system:

# Linux
$ wget http://dl.bintray.com/snowplow/snowplow-generic/sql_runner_0.5.2_linux_amd64.zip
$ unzip sql_runner_0.5.2_linux_amd64.zip

# Windows
C:\> Invoke-WebRequest -OutFile sql_runner_0.5.2_windows_amd64.zip http://dl.bintray.com/snowplow/snowplow-generic/sql_runner_0.5.2_windows_amd64.zip

# macOS
$ wget http://dl.bintray.com/snowplow/snowplow-generic/sql_runner_0.5.2_darwin_amd64.zip
$ unzip sql_runner_0.5.2_darwin_amd64.zip

See the User Guide section below for more.

User guide

CLI Arguments

There are several command line arguments that can be used:

  • -playbook : This is a required argument and should point to the playbook you wish to run.
  • -fromStep : Optional argument which allows you to start SQL Runner from any step in your playbook.
  • -sqlroot : Optional argument to change where we look for the SQL statements to run; defaults to the directory of your playbook.
  • -var : Optional argument which allows you to pass a dictionary of key-value pairs which will be used to flesh out your templates.
  • -consul : Optional argument which allows you to fetch playbooks and SQL files from a Consul server.
  • -dryRun : Optional argument which allows you to run through your playbook without executing any SQL against your target(s).
  • -runQuery : Optional argument which allows you to run an individual query in the playbook and nothing else.
  • -showQueryOutput : Optional argument which displays all query output in the console.
  • -fillTemplates : Optional argument that will show all SQL templates with variables filled in.
  • -lock : Optional argument to create a lock, locally or in Consul, to prevent concurrent runs; the lock persists on failure.
  • -softLock : Same as the above, but the lock will not persist on failure.
  • -checkLock : Allows you to check whether a lock is present.
  • -deleteLock : Allows you to delete a lock if present.
  • -consulOnlyForLock : Will use Consul only for locking, and read playbooks locally.
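
Putting several of these together, a typical invocation might look like the following. The playbook path, step name, and variable values are illustrative only:

```shell
# Dry-run a playbook from a given step, filling template variables,
# without executing any SQL against the targets:
./sql-runner \
  -playbook ./playbooks/web-model.yml \
  -fromStep "Create schema and table" \
  -var host=localhost,username=admin,password=secret \
  -dryRun
```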

More on Consul

Using the -consul argument results in the following changes:

  • The -playbook argument becomes the key that is used to look for the playbook in Consul.
  • The -sqlroot argument also becomes a key argument for Consul.
  • The -lock argument creates a lock as a Consul key-value pair.
  • The -softLock argument creates a lock as a Consul key-value pair.
  • The -checkLock argument searches in Consul for a lock.
  • The -deleteLock argument searches in Consul for a lock.

If you use the default -sqlroot and pass in:

  • ./sql-runner -consul "localhost:8500" -playbook "sql-runner/playbook/1"

This results in:

  • Looking for your playbook file at this key sql-runner/playbook/1
  • Expecting all your SQL file keys to begin with sql-runner/playbook/<SQL path from playbook>

However, as a Consul key can serve as both a data node and a folder node, we have added a new -sqlroot option:

  • ./sql-runner -consul "localhost:8500" -playbook "sql-runner/playbook/1" -sqlroot PLAYBOOK_CHILD

This results in:

  • Looking for your playbook file at this key sql-runner/playbook/1
  • Expecting all your SQL file keys to begin with sql-runner/playbook/1/<SQL path from playbook>
    • The data node is used as a folder node as well.
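
The key resolution described above can be sketched as a small shell function. This helper is purely illustrative and not part of sql-runner itself; it only shows how the SQL key prefix is derived from the playbook key in each mode:

```shell
# Hypothetical helper mirroring the behaviour described above.
# mode "PLAYBOOK_CHILD": SQL keys live under the playbook key itself;
# otherwise (the default): SQL keys live alongside the playbook key.
sql_key_prefix() {
  playbook_key="$1"   # e.g. sql-runner/playbook/1
  mode="$2"           # e.g. PLAYBOOK_CHILD
  if [ "$mode" = "PLAYBOOK_CHILD" ]; then
    echo "$playbook_key"
  else
    # strip the last path segment to get the parent "folder"
    echo "${playbook_key%/*}"
  fi
}

sql_key_prefix "sql-runner/playbook/1" DEFAULT         # → sql-runner/playbook
sql_key_prefix "sql-runner/playbook/1" PLAYBOOK_CHILD  # → sql-runner/playbook/1
```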

Playbooks

A playbook consists of one or more steps, each of which consists of one or more queries. Steps are run in series; within a step, queries are run in parallel.

Each query contains the path to a query file. See Query files for details.

All steps are applied against all targets. All targets are processed in parallel.

In the following example, a.sql, b.sql and c.sql are run in parallel.

:steps:
  - :name: "Run a,b and c in parallel"
    :queries:
      - :name: a
        :file: a.sql
      - :name: b
        :file: b.sql
      - :name: c
        :file: c.sql

By contrast, in the example below, the three SQL files are executed in sequence.

:steps:
  - :name: "Run a..."
    :queries:
      - :name: a
        :file: a.sql
  - :name: "...then run b..."
    :queries:
      - :name: b
        :file: b.sql
  - :name: "...then run c..."
    :queries:
      - :name: c
        :file: c.sql

Playbooks can be templated, and corresponding variables can be passed in with the -var flag like this: sql-runner -var host=value,username=value2,password=value3.

Here is the corresponding playbook template:

:targets:
  - :name: "My Postgres database 1"
    :type: postgres
    :host: {{.host}}
    :database: sql_runner_tests_1
    :port: 5432
    :username: {{.username}}
    :password: {{.password}}
    :ssl: false # SSL disabled by default
:variables:
  :test_schema: sql_runner_tests
  :timeFormat: "2006_01_02"
:steps:
  - :name: Create schema and table
    :queries:
      - :name: Create schema and table
        :file: postgres-sql/good/1.sql
        :template: true

Query files

A query file contains one or more SQL statements. These are executed "raw" (i.e. not in a transaction) in series by SQL Runner.
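
For example, a query file with two statements runs them one after the other, each outside a transaction (the table name is illustrative):

```sql
DROP TABLE IF EXISTS demo_events;
CREATE TABLE demo_events (event_id INT, collector_tstamp TIMESTAMP);
```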

If the query file is flagged as a template in the playbook, then the file is pre-processed as a template before being executed. See Templates for details.

Note: If your query is a template that requires pre-processing, you must add template: true to the query definition in the playbook YAML file, as in the following example:

:steps:
  - :name: "Run a..."
    :queries:
      - :name: a
        :file: a.sql
        :template: true

Templates

Templates are run through Golang's text template processor. The template processor can access all variables defined in the playbook.

The following custom functions are also supported:

  • nowWithFormat [timeFormat]: where timeFormat is a valid Golang time format
  • systemEnv "ENV_VAR": where ENV_VAR is a key for a valid environment variable
  • awsEnvCredentials: supports passing credentials through environment variables, such as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
  • awsProfileCredentials: supports getting credentials from a credentials file, also used by boto/awscli
  • awsEC2RoleCredentials: supports getting role-based credentials, i.e. getting the automatically generated credentials in EC2 instances
  • awsChainCredentials: tries to get credentials from each of the three methods above in order, using the first one returned
  • randomInt: will return a random integer

Note: All AWS functions output strings in the Redshift credentials format (CREDENTIALS 'aws_access_key_id=%s;aws_secret_access_key=%s').
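
As an illustration, a templated query file might combine a playbook variable with a custom function. The schema and time-format values below assume the playbook example shown earlier; this snippet is not the file referenced next:

```sql
-- {{.test_schema}} comes from the playbook's :variables: block;
-- nowWithFormat renders the current time using a Go time format.
CREATE SCHEMA IF NOT EXISTS {{.test_schema}};
CREATE TABLE {{.test_schema}}.events_{{nowWithFormat .timeFormat}} (id INT);
```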

For an example query file using templating see: integration/resources/postgres-sql/good/3.sql

Failure modes

If a statement fails in a query file, the query will terminate and report failure.

If a query fails, its sibling queries will continue running, but no further steps will run.

Failures in one target do not affect other targets in any way.

Return codes

- 0 for no errors
- 5 for target initialization errors
- 6 for query errors
- 7 for both types of error
- 8 for no queries run
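
These codes can be acted on in a wrapper script. A minimal sketch, assuming sql-runner and a playbook in the current directory:

```shell
./sql-runner -playbook ./playbook.yml
case $? in
  0) echo "all queries succeeded" ;;
  5) echo "target initialization error" ;;
  6) echo "query error" ;;
  7) echo "both target and query errors" ;;
  8) echo "no queries were run" ;;
  *) echo "unexpected exit code" ;;
esac
```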

Target configuration

BigQuery

To access a BigQuery project, sql-runner needs Google credentials. These can be set up by creating a new service account in the GCP console, then providing its private key to the application via the GOOGLE_APPLICATION_CREDENTIALS environment variable. A detailed walkthrough of this process is available on the GCP documentation website.

After the credentials are set up, simply create a playbook with the following BigQuery-specific target configuration:

targets:
  - name: "My BigQuery database"
    type: bigquery
    project: ADD HERE # Project ID as shown in the GCP console's front page
steps:
    ...

That's it - you're now ready to start running SQL against BigQuery!

You can’t perform that action at this time.