Connecting to Cloud SQL from Dataflow in Python
To make Cloud SQL available to the Dataflow pipeline, we use a custom container image that runs the Cloud SQL Proxy and connects over a Unix socket. This lets the pipeline reach a production database instance without exposing an insecure public IP or dealing with SSL certificates.
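For illustration, here is a minimal sketch of a `DoFn` that talks to Postgres through the proxy's Unix socket. The instance name, table, and environment variables are assumptions, not this repository's actual code:

```python
# Sketch only: connection details below are hypothetical, not this repo's code.
import os

import apache_beam as beam
import psycopg2


class WriteToCloudSQL(beam.DoFn):
    def setup(self):
        # The Cloud SQL Proxy in the custom container exposes the instance as a
        # Unix socket under /cloudsql/<PROJECT>:<REGION>:<INSTANCE>, so no
        # public IP or SSL certificates are needed.
        self.conn = psycopg2.connect(
            host="/cloudsql/my-project:us-central1:my-instance",  # hypothetical
            dbname=os.environ["DB_NAME"],
            user=os.environ["DB_USER"],
            password=os.environ["DB_PASS"],
        )

    def process(self, row):
        with self.conn.cursor() as cur:
            cur.execute("INSERT INTO events (payload) VALUES (%s)", (row,))
        self.conn.commit()
        yield row

    def teardown(self):
        self.conn.close()
```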
The only configuration needed is to create the `.env` file. A ready-to-use `.env-example` is provided; you can simply rename it.
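The exact variables are defined by the Makefile and the entrypoint script; as a rough illustration only (every name and value here is hypothetical), a `.env` might look like:

```
PROJECT_ID=my-gcp-project
REGION=us-central1
INSTANCE_CONNECTION_NAME=my-gcp-project:us-central1:my-instance
DB_NAME=postgres
DB_USER=postgres
DB_PASS=change-me
```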
First, run `make docker-push` to build the custom container image and push it to GCP. Then run `make deploy` to stage the pipeline as a Dataflow template in Cloud Storage:
> make docker-push
> make deploy
To see all available commands:
> make help
Project structure:

├── devops // Custom container and entrypoint script
├── sql // Postgres SQL static files
├── transformations // Apache Beam PTransform and DoFn
├── main.py // The pipeline itself
├── Makefile // Commands to deploy the pipeline to GCP
└── setup.py // Python package for Dataflow remote workers (--setup_file option)
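For reference, a minimal `setup.py` for this purpose might look like the sketch below (the package name and dependency list are assumptions). Passing it with `--setup_file` makes Dataflow build and install the local package on each remote worker:

```python
# Minimal sketch of a setup.py for Dataflow workers; the package name and
# dependencies are assumptions, not this repository's actual contents.
import setuptools

setuptools.setup(
    name="dataflow-cloudsql-pipeline",  # hypothetical name
    version="0.1.0",
    packages=setuptools.find_packages(),  # picks up transformations/, etc.
    install_requires=[
        "psycopg2-binary",  # Postgres driver available on the workers
    ],
)
```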
The `entrypoint.sh` script was built following the instructions in "Run multiple services in a container", using "Modifying the container entrypoint" as boilerplate.
This pipeline in Dataflow: