# Dataform CLI on Cloud Run

Use IAM to grant the Storage Object Admin role (`roles/storage.objectAdmin`) to the default Cloud Build service account.

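As a minimal sketch of that IAM step, the helper below assembles a `gcloud projects add-iam-policy-binding` command. It assumes the legacy default Cloud Build service account naming (`PROJECT_NUMBER@cloudbuild.gserviceaccount.com`); depending on project settings the default principal may differ, so check the Cloud Build settings first. The function name, project ID, and project number are hypothetical.

```python
import shlex


def build_iam_binding_cmd(project_id, project_number):
    """Return a gcloud command that grants Storage Object Admin to Cloud Build."""
    # Assumption: legacy default Cloud Build service account naming scheme.
    member = f"serviceAccount:{project_number}@cloudbuild.gserviceaccount.com"
    return [
        "gcloud", "projects", "add-iam-policy-binding", project_id,
        f"--member={member}",
        "--role=roles/storage.objectAdmin",
    ]


# Hypothetical project ID and number, for illustration only.
cmd = build_iam_binding_cmd("my-project", "123456789012")
# Print the command for review before running it in a shell.
print(shlex.join(cmd))
```
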
If you don't have npm installed, install it:

In [None]:
!sudo apt install -y nodejs npm

Install the Dataform CLI:

In [None]:
!npm i -g @dataform/cli

Enable APIs that we will need:

In [None]:
!gcloud services enable artifactregistry.googleapis.com
!gcloud services enable cloudbuild.googleapis.com
!gcloud services enable datacatalog.googleapis.com
!gcloud services enable datalineage.googleapis.com
!gcloud services enable run.googleapis.com

Create a Dataform settings file so that the following `dataform init` call does not prompt for command-line input:

In [None]:
!mkdir -p /home/jupyter/.dataform/

In [None]:
%%writefile /home/jupyter/.dataform/settings.json
{
    "allowAnonymousAnalytics": false
}


Initialize a Dataform project:

In [None]:
DATAFORM_DIR = 'dataform_proj_dir'

In [None]:
%%bash
export REGION=europe-west4
export PROJECT_ID=$(gcloud config get project)
export DATAFORM_DIR=dataform_proj_dir

dataform init bigquery $DATAFORM_DIR --default-database $PROJECT_ID --default-location $REGION

cat << EOF > $DATAFORM_DIR/.df-credentials.json
{
    "projectId": "${PROJECT_ID}",
    "location": "${REGION}"
}
EOF

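The credentials file written by the heredoc contains just two keys. A hedged Python equivalent, which avoids shell quoting, could look like the sketch below. The helper name and the project ID are hypothetical, and the example writes to a temporary directory rather than the real project directory.

```python
import json
import tempfile
from pathlib import Path


def write_df_credentials(dataform_dir, project_id, location):
    """Write .df-credentials.json with the projectId and location keys."""
    path = Path(dataform_dir) / ".df-credentials.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = {"projectId": project_id, "location": location}
    path.write_text(json.dumps(payload, indent=4) + "\n")
    return path


# Demonstrate in a throwaway directory with a hypothetical project ID.
with tempfile.TemporaryDirectory() as tmp:
    path = write_df_credentials(tmp, "my-project", "europe-west4")
    # Read the file back to confirm it is valid JSON.
    creds = json.loads(path.read_text())
    print(creds)
```
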
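Create the BigQuery datasets for the raw data and for Dataform's output. Each `%%bash` cell runs in its own shell, so the variables exported earlier are set again here:

In [None]:
%%bash
REGION=europe-west4; PROJECT_ID=$(gcloud config get project)
bq --location=$REGION mk --dataset ${PROJECT_ID}.prod_raw
bq --location=$REGION mk --dataset ${PROJECT_ID}.dataform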

In [None]:
!bq load --source_format=PARQUET prod_raw.sales_data data/sales.parquet

In [None]:
!mkdir -p $DATAFORM_DIR/definitions/sources

In [None]:
%%writefile $DATAFORM_DIR/definitions/sources/sales.sqlx

config {
    type: "declaration",
    schema: "prod_raw",
    name: "sales_data",
    description: "Ingested sales data"
}

In [None]:
!mkdir -p $DATAFORM_DIR/definitions/sales_data_aggregated

In [None]:
%%writefile $DATAFORM_DIR/definitions/sales_data_aggregated/sales_data_agg.sqlx
config {
    type: "table"
}

WITH daily_orders AS (
  SELECT
    DATE(orderdate) AS order_date,
    PRODUCTLINE AS product_line,
    ROUND(SUM(SALES), 1) AS sales_value
  FROM
    ${ref("sales_data")}
  WHERE
    STATUS = "Shipped"
  GROUP BY
    1,
    2
)
SELECT
  order_date,
  product_line,
  sales_value,
  -- Average over the current row and the 7 rows before it
  ROUND(AVG(sales_value) OVER (ORDER BY order_date ROWS BETWEEN 7 PRECEDING AND CURRENT ROW), 1) AS rolling_average
FROM daily_orders
ORDER BY 1 DESC

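The window frame `ROWS BETWEEN 7 PRECEDING AND CURRENT ROW` in the query above covers up to eight rows: the current row and the seven before it. It counts rows rather than days, and `daily_orders` holds one row per date and product line. The sketch below reproduces the frame in plain Python on made-up numbers, returning both the sum and the mean of each frame; the function name and sample values are illustrative only.

```python
def rolling_window(values, preceding=7):
    """Return (sum, mean) over each row plus up to `preceding` rows before it."""
    results = []
    for i in range(len(values)):
        # Frame starts at most `preceding` rows back and ends at the current row.
        frame = values[max(0, i - preceding): i + 1]
        total = sum(frame)
        results.append((round(total, 1), round(total / len(frame), 1)))
    return results


# Made-up sales values, already sorted by order date ascending.
daily_sales = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0]
windows = rolling_window(daily_sales)
for value, (frame_sum, frame_avg) in zip(daily_sales, windows):
    print(value, frame_sum, frame_avg)
```
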
In [None]:
!dataform run $DATAFORM_DIR

In [None]:
!gcloud config set artifacts/location europe-west4

In [None]:
!gcloud artifacts repositories create dataform --repository-format=docker

In [None]:
%%writefile $DATAFORM_DIR/Dockerfile
FROM dataformco/dataform

# Set working directory
ENV DATAFORM_DIR=/dataform/
WORKDIR $DATAFORM_DIR

# Copy files to the image
COPY . $DATAFORM_DIR

# Install the latest npm dependencies
RUN npm install

# Run the application
ENTRYPOINT ["dataform", "run"]

In [None]:
%%writefile $DATAFORM_DIR/cloudbuild.yaml
steps:
- name: gcr.io/cloud-builders/docker
  id: Build Dataform image
  env: 
    - 'DOCKER_BUILDKIT=1'
  args: [
      'build',
      '-t', 'europe-west4-docker.pkg.dev/${PROJECT_ID}/dataform/dataform-demo:latest',
      '--cache-from', 'europe-west4-docker.pkg.dev/${PROJECT_ID}/dataform/dataform-demo:latest',
      '.'
    ]

- name: gcr.io/cloud-builders/docker
  id: Push Dataform image to Artifact Registry
  args: [
      'push',
      'europe-west4-docker.pkg.dev/${PROJECT_ID}/dataform/dataform-demo:latest'
    ]

options:
  logging: CLOUD_LOGGING_ONLY

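The image names in the build config follow the Artifact Registry pattern `REGION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE:TAG`. A small sketch that composes such a name, assuming a hypothetical helper and project ID:

```python
def artifact_image_uri(region, project_id, repository, image, tag="latest"):
    """Compose an Artifact Registry Docker image URI."""
    return f"{region}-docker.pkg.dev/{project_id}/{repository}/{image}:{tag}"


# Hypothetical project ID; the other values match this notebook.
uri = artifact_image_uri("europe-west4", "my-project", "dataform", "dataform-demo")
print(uri)
```
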
In [None]:
!gcloud builds submit dataform_proj_dir --config=dataform_proj_dir/cloudbuild.yaml --region=europe-west4

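Now you can test the image locally, in a shell where `PROJECT_ID` is set and Docker is authenticated against Artifact Registry: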
    
```
docker pull europe-west4-docker.pkg.dev/${PROJECT_ID}/dataform/dataform-demo:latest
docker run europe-west4-docker.pkg.dev/${PROJECT_ID}/dataform/dataform-demo:latest
```

The output should be:

```
Compiling...

Compiled successfully.

Running...

Dataset created:  dataform.sales_data_agg [table]
```

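If the container run is scripted, the expected output above can be checked programmatically. The sketch below looks for two marker strings taken from that listing; the function name is hypothetical, and the markers are an assumption that may not hold for other Dataform CLI versions.

```python
def run_succeeded(output):
    """Return True if the captured output shows a compile and a created dataset."""
    # Marker strings copied from the expected output listing in this notebook.
    return "Compiled successfully." in output and "Dataset created:" in output


sample_output = """Compiling...

Compiled successfully.

Running...

Dataset created:  dataform.sales_data_agg [table]
"""
print(run_succeeded(sample_output))
```
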

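Finally, deploy the image as a Cloud Run job. First read the project ID into a Python variable so that the `!` commands below can expand `$PROJECT_ID`:

In [None]:
import os
# strip() removes the trailing newline from the gcloud output
PROJECT_ID = os.popen('gcloud config get project').read().strip()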

In [None]:
!gcloud run jobs create dataform-demo --image europe-west4-docker.pkg.dev/$PROJECT_ID/dataform/dataform-demo:latest --region europe-west4

In [None]:
!gcloud run jobs execute dataform-demo --region europe-west4