This repo contains hands-on labs covering serverless Spark on GCP, powered by Cloud Dataproc, as part of the Serverless Spark Workshop.
The intended audience is Google Customer Engineers, but anyone with access to GCP can try the lab modules.
Before starting the labs, run the environment setup in Argolis by following the instructions in [go/scw-tf].
By the end of the workshop, you should have:
(a) just enough knowledge of serverless Spark on GCP, powered by Cloud Dataproc, to field customer conversations and questions,
(b) a completed serverless Spark setup in Argolis,
(c) demos and the knowledge of how to run them, and
(d) awareness of the resources available for serverless Spark on GCP.
| # | Modules | Focus | Feature |
|---|---|---|---|
| 1 | Environment provisioning (go/scw-tf) | Environment Automation With Terraform | N/A |
| 2 | Lab 1 - Cell Tower Anomaly Detection | Data Engineering | Serverless Spark Batch from CLI & with Cloud Composer orchestration |
| 3 | Lab 2 - Wikipedia Page View Analysis | Data Analysis | Serverless Spark Batch from BigQuery UI |
| 4 | Lab 3 - Chicago Crimes Analysis | Data Analysis | Serverless Spark Interactive from Vertex AI managed notebook |
| N | Resources for Serverless Spark | | |
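The "Serverless Spark Batch from CLI" path in Lab 1 typically comes down to submitting a PySpark script as a Dataproc Serverless batch with `gcloud dataproc batches submit pyspark`. The following is a minimal sketch, not taken from the lab code, that assembles such a command in Python. The helper name `build_batch_submit_cmd`, the bucket, script path, batch ID and job arguments are all hypothetical placeholders. Actually executing the command assumes the Google Cloud CLI is installed and authenticated against your project.

```python
import shlex
from typing import List, Optional, Sequence


def build_batch_submit_cmd(
    main_py_uri: str,
    region: str,
    batch_id: Optional[str] = None,
    subnet: Optional[str] = None,
    job_args: Sequence[str] = (),
) -> List[str]:
    """Build the argv for `gcloud dataproc batches submit pyspark` (hypothetical helper)."""
    cmd = [
        "gcloud", "dataproc", "batches", "submit", "pyspark",
        main_py_uri,             # gs:// URI of the PySpark entry-point script
        f"--region={region}",    # region where the serverless batch runs
    ]
    if batch_id:
        cmd.append(f"--batch={batch_id}")   # optional explicit batch ID
    if subnet:
        cmd.append(f"--subnet={subnet}")    # optional VPC subnetwork for the batch
    if job_args:
        # Everything after "--" is passed through to the PySpark script itself.
        cmd.append("--")
        cmd.extend(job_args)
    return cmd


if __name__ == "__main__":
    cmd = build_batch_submit_cmd(
        "gs://my-bucket/scripts/anomaly_detection.py",  # hypothetical script path
        region="us-central1",
        batch_id="cell-tower-demo",                     # hypothetical batch ID
        job_args=["--input", "gs://my-bucket/data"],    # hypothetical job arguments
    )
    # Print a copy-pasteable shell command.
    print(shlex.join(cmd))
    # To run it for real (requires gcloud and an authenticated project):
    #   import subprocess; subprocess.run(cmd, check=True)
```

Building the command as an argument list rather than a single string avoids shell-quoting issues if you later hand it to `subprocess.run`.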
Shut down or delete all resources when you are done.
Some of the labs were developed by Tek Systems for Google; others are contributions from Googlers.
Community contributions that improve the existing labs, or that add new ones, are very much appreciated.