## Manage Dataproc Workflows using gcloud Commands
Let us see how we can manage Dataproc Workflow Templates using `gcloud` commands. The workflow is built in four steps:
* Step 1: Create the Dataproc Workflow Template
* Step 2: Attach an active Dataproc cluster to the template (a new managed cluster can be configured instead)
* Step 3: Add Spark SQL or PySpark jobs to the template, with dependencies
* Step 4: Run and validate the Dataproc Workflow Template

Every one of these steps can be done with `gcloud dataproc workflow-templates` subcommands.
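
As a rough orientation, the four steps can be sketched as one script. This is only a sketch under a few assumptions: the `run` helper and the `DRY_RUN` variable are hypothetical names introduced here (they are not part of `gcloud`), the template, cluster, and bucket names are the ones used in this notebook, and the Dataproc region is assumed to be configured already. With `DRY_RUN` left at its default, the script only prints the commands, so it is safe to try without touching any resources.

```shell
#!/bin/sh
# Sketch of the four steps. DRY_RUN=1 (default) prints commands; DRY_RUN=0 runs them.
set -eu

TEMPLATE="wf-daily-product-revenue"
CLUSTER="aidataprocdev"
SCRIPTS="gs://airetail/scripts/daily_product_revenue"

# Hypothetical helper: echo the command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" != "1" ]; then
        "$@"
    fi
}

# Step 1: create the (empty) workflow template
run gcloud dataproc workflow-templates create "$TEMPLATE"

# Step 2: select the running cluster by its label
run gcloud dataproc workflow-templates set-cluster-selector "$TEMPLATE" \
    --cluster-labels "goog-dataproc-cluster-name=$CLUSTER"

# Step 3: add the first job (the other jobs follow the same pattern)
run gcloud dataproc workflow-templates add-job spark-sql \
    --step-id=job-cleanup \
    --file="$SCRIPTS/cleanup.sql" \
    --workflow-template="$TEMPLATE"

# Step 4: run the workflow
run gcloud dataproc workflow-templates instantiate "$TEMPLATE"
```

Running it with `DRY_RUN=0` would execute the same commands for real. The rest of this notebook goes through each command individually.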

In [None]:
!gcloud config set dataproc/region us-central1

In [None]:
!gcloud dataproc workflow-templates

In [None]:
!gcloud dataproc workflow-templates list

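Here is the command to delete a Dataproc Workflow Template. The backslash line continuation works only in bash-like shells; on Windows, use `^` in Command Prompt or a backtick in PowerShell, or keep the command on a single line. The notebook cell adds `--quiet` to skip the confirmation prompt.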

```shell
gcloud dataproc workflow-templates \
    delete wf-daily-product-revenue
```

In [None]:
!gcloud dataproc workflow-templates delete wf-daily-product-revenue --quiet

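Here is the command to create a Dataproc Workflow Template. The template ID (`wf-daily-product-revenue`) is a required argument; running `create` without it only prints a usage error.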

```shell
gcloud dataproc workflow-templates \
    create wf-daily-product-revenue
```

In [None]:
!gcloud dataproc workflow-templates create

In [None]:
!gcloud dataproc workflow-templates create wf-daily-product-revenue

In [None]:
!gcloud dataproc workflow-templates list

In [None]:
!gcloud dataproc workflow-templates 

In [None]:
!gcloud dataproc workflow-templates set-cluster-selector

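Here is the command to attach a running Dataproc cluster to the workflow template. The template selects the cluster by label rather than by name. Dataproc automatically labels each cluster with `goog-dataproc-cluster-name=<cluster name>`, so we use that label to select `aidataprocdev`.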

```shell
gcloud dataproc workflow-templates \
    set-cluster-selector \
    wf-daily-product-revenue \
    --cluster-labels goog-dataproc-cluster-name=aidataprocdev
```

In [None]:
!gcloud dataproc workflow-templates set-cluster-selector wf-daily-product-revenue --cluster-labels goog-dataproc-cluster-name=aidataprocdev

In [None]:
!gcloud dataproc workflow-templates add-job

In [None]:
!gcloud dataproc workflow-templates add-job spark-sql

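* The command `gcloud dataproc workflow-templates add-job` is similar to `gcloud dataproc jobs submit`: the job type, script file, and parameters are passed the same way. The difference is that `add-job` does not take `--cluster` (the template's cluster selector decides where the job runs) and instead needs `--workflow-template` and `--step-id`, with `--start-after` to declare dependencies. For comparison, here are examples of submitting the same jobs directly with `gcloud dataproc jobs submit`.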

```shell
# Without parameters
gcloud dataproc jobs submit \
    spark-sql --cluster=aidataprocdev \
    -f gs://airetail/scripts/daily_product_revenue/cleanup.sql

# With parameters
gcloud dataproc jobs submit \
    spark-sql --cluster=aidataprocdev \
    -f gs://airetail/scripts/daily_product_revenue/file_format_converter.sql \
    --params=bucket_name=gs://airetail,table_name=orders
```


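Here are the commands to add Spark SQL jobs to the Dataproc Workflow Template. Each job gets a unique `--step-id`, and `--start-after` lists the step IDs that must finish before the job starts.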

```shell
# Cleanup job: no dependencies, so it runs first
gcloud dataproc workflow-templates add-job spark-sql \
    --step-id=job-cleanup \
    --file=gs://airetail/scripts/daily_product_revenue/cleanup.sql \
    --workflow-template=wf-daily-product-revenue

# File format converter jobs: both depend on the cleanup job
gcloud dataproc workflow-templates add-job spark-sql \
    --step-id=job-convert-orders \
    --file=gs://airetail/scripts/daily_product_revenue/file_format_converter.sql \
    --params=bucket_name=gs://airetail,table_name=orders \
    --workflow-template=wf-daily-product-revenue \
    --start-after=job-cleanup

gcloud dataproc workflow-templates add-job spark-sql \
    --step-id=job-convert-order-items \
    --file=gs://airetail/scripts/daily_product_revenue/file_format_converter.sql \
    --params=bucket_name=gs://airetail,table_name=order_items \
    --workflow-template=wf-daily-product-revenue \
    --start-after=job-cleanup

# Final job: starts only after both converter jobs (orders and order_items) finish
gcloud dataproc workflow-templates add-job spark-sql \
    --step-id=job-daily-product-revenue \
    --file=gs://airetail/scripts/daily_product_revenue/compute_daily_product_revenue.sql \
    --params=bucket_name=gs://airetail \
    --workflow-template=wf-daily-product-revenue \
    --start-after=job-convert-orders,job-convert-order-items
```
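
To double check the dependencies declared with `--start-after`, it can help to write them down as "prerequisite step" pairs and sort them topologically. The following is a small local sketch using the standard `tsort` utility; it does not call `gcloud` at all, and the variable names (`deps`, `order`, `first_step`, `last_step`) are made up for this illustration.

```shell
# Each line is "prerequisite step": the first step must finish before the second starts.
deps='job-cleanup job-convert-orders
job-cleanup job-convert-order-items
job-convert-orders job-daily-product-revenue
job-convert-order-items job-daily-product-revenue'

# tsort prints one valid linear order of the steps (and reports a loop if there is a cycle).
order="$(printf '%s\n' "$deps" | tsort)"
echo "$order"

# The cleanup job has no prerequisites and the revenue job has no dependents,
# so they must be first and last in any valid order.
first_step="$(printf '%s\n' "$order" | head -n 1)"
last_step="$(printf '%s\n' "$order" | tail -n 1)"
echo "first: $first_step, last: $last_step"
```

The two converter jobs may appear in either order in the output, because neither depends on the other. In the actual workflow, steps whose prerequisites are satisfied can run concurrently, so `tsort` shows only one possible sequence, not the real schedule.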

In [None]:
!gcloud dataproc workflow-templates add-job spark-sql --step-id=job-cleanup --file=gs://airetail/scripts/daily_product_revenue/cleanup.sql --workflow-template=wf-daily-product-revenue

In [None]:
!gcloud dataproc workflow-templates add-job spark-sql --step-id=job-convert-orders --file=gs://airetail/scripts/daily_product_revenue/file_format_converter.sql --params=bucket_name=gs://airetail,table_name=orders --workflow-template=wf-daily-product-revenue --start-after=job-cleanup

In [None]:
!gcloud dataproc workflow-templates add-job spark-sql --step-id=job-convert-order-items --file=gs://airetail/scripts/daily_product_revenue/file_format_converter.sql --params=bucket_name=gs://airetail,table_name=order_items --workflow-template=wf-daily-product-revenue --start-after=job-cleanup

In [None]:
!gcloud dataproc workflow-templates add-job spark-sql --step-id=job-daily-product-revenue --file=gs://airetail/scripts/daily_product_revenue/compute_daily_product_revenue.sql --params=bucket_name=gs://airetail --workflow-template=wf-daily-product-revenue --start-after=job-convert-orders,job-convert-order-items

In [None]:
!gcloud dataproc workflow-templates list

In [None]:
!gcloud dataproc workflow-templates describe wf-daily-product-revenue

Here is the command to instantiate (run) the Dataproc Workflow Template. By default, the command waits until all the jobs in the workflow finish.

```shell
gcloud dataproc workflow-templates \
    instantiate wf-daily-product-revenue
```

In [None]:
!gcloud dataproc workflow-templates

In [None]:
!gcloud dataproc workflow-templates instantiate

In [None]:
!gcloud dataproc workflow-templates instantiate-from-file

In [None]:
!gcloud dataproc workflow-templates instantiate-from-file --help

In [None]:
!gcloud dataproc workflow-templates export

In [None]:
!gcloud dataproc workflow-templates export wf-daily-product-revenue
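
The `export` and `instantiate-from-file` commands fit together: the template can be exported to a YAML file, and the workflow can then be run directly from that file. Below is a minimal sketch of that round trip. The file name, the `run` helper, and the `DRY_RUN` variable are assumptions made for this example; by default the commands are only printed, not executed.

```shell
TEMPLATE="wf-daily-product-revenue"
# Hypothetical local file name for the exported template
YAML_FILE="${TEMPLATE}.yaml"

# Hypothetical helper: echo the command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" != "1" ]; then
        "$@"
    fi
}

# Write the template definition to a YAML file instead of printing it
run gcloud dataproc workflow-templates export "$TEMPLATE" \
    --destination="$YAML_FILE"

# Run the workflow directly from the YAML file
run gcloud dataproc workflow-templates instantiate-from-file \
    --file="$YAML_FILE"
```

Keeping the YAML file under version control is one way to review changes to the workflow definition over time.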

In [None]:
# This will take some time to run

!gcloud dataproc workflow-templates instantiate wf-daily-product-revenue