jccatrinck/dataflow-cloud-sql-python
dataflow-cloud-sql-python

Connecting to Cloud SQL from Dataflow in Python


Explanation

To make Cloud SQL available to the Dataflow pipeline, we use a custom container image and run the Cloud SQL Proxy, connecting to a production database instance over a Unix socket. This avoids exposing an insecure public IP and dealing with SSL certificates.
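As a sketch of what the worker code can then do (all names here are illustrative, not taken from this repo), the connection string simply points at the proxy's Unix socket under /cloudsql instead of a TCP host:

```python
import os


def cloud_sql_dsn(dbname, user, password, instance_connection_name,
                  socket_dir="/cloudsql"):
    """Build a libpq-style DSN that reaches Postgres through the
    Cloud SQL Proxy Unix socket (no public IP, no SSL certificates).

    `instance_connection_name` is the usual `project:region:instance`
    identifier. All values are illustrative assumptions.
    """
    host = os.path.join(socket_dir, instance_connection_name)
    return f"host={host} dbname={dbname} user={user} password={password}"


# With psycopg2 this would be used as:
#   conn = psycopg2.connect(cloud_sql_dsn(...))
print(cloud_sql_dsn("appdb", "app", "secret", "my-project:us-central1:prod"))
```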


Quick-start

The only configuration needed is the .env file.

A ready-to-use .env-example is provided; just rename it to .env.
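The exact variables live in .env-example; a hypothetical .env along these lines (variable names assumed, not copied from the repo) gives the idea:

```
PROJECT_ID=my-project
REGION=us-central1
INSTANCE_CONNECTION_NAME=my-project:us-central1:prod
DB_NAME=appdb
DB_USER=app
DB_PASS=secret
```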

First, run make docker-push to build the custom container image and push it to GCP.

Then run make deploy to deploy the pipeline as a Dataflow template in Cloud Storage:

> make docker-push
> make deploy

To see all commands available:

> make help
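Under the hood, these targets typically boil down to a container build plus a template staging step. A minimal Makefile sketch (target bodies are assumptions, not the repo's actual Makefile) might look like:

```make
IMAGE := gcr.io/$(PROJECT_ID)/dataflow-cloud-sql-python:latest

docker-push: ## Build and push the custom container image to GCP
	gcloud builds submit --tag $(IMAGE) devops/

deploy: ## Stage the pipeline as a Dataflow template in Cloud Storage
	python main.py \
	  --runner DataflowRunner \
	  --project $(PROJECT_ID) \
	  --region $(REGION) \
	  --template_location gs://$(BUCKET)/templates/dataflow-cloud-sql-python \
	  --setup_file ./setup.py
```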

Structure

├── devops          // Custom container and entrypoint script
├── sql             // Postgres SQL static files
├── transformations // Apache Beam PTransform and DoFn
├── main.py         // The pipeline itself
├── Makefile        // Commands to deploy the pipeline to GCP
└── setup.py        // Python package for Dataflow remote workers (--setup_file option)

The entrypoint.sh script was built following the "Run multiple services in a container" instructions, using the "Modifying the container entrypoint" section as boilerplate.
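Following that pattern, an entrypoint that starts the proxy in the background before handing off to the Beam worker could look like this (paths, flags, and the environment variable name are assumptions, not the repo's actual script):

```shell
#!/bin/bash
set -e

# Start the Cloud SQL Proxy in the background, exposing the instance
# as a Unix socket under /cloudsql (instance name comes from the env).
/usr/local/bin/cloud_sql_proxy \
  -dir=/cloudsql \
  -instances="$INSTANCE_CONNECTION_NAME" &

# Hand control to the standard Beam SDK harness entrypoint so the
# container still behaves as a normal Dataflow worker.
exec /opt/apache/beam/boot "$@"
```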


Example

This pipeline in Dataflow:

[screenshot: the pipeline graph in the Dataflow console]

