Skip to content

Prefect Flow to move Posgres tables to s3 in parquet format

Notifications You must be signed in to change notification settings

szelenka/postgres-to-s3

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

postgres-to-s3

This is a Prefect Flow designed to iteratively pull data from a Postgres server running on AWS RDS service, and push the data to S3 as partitioned parquet files.

DAG

DAG

Connection details

The connection details for both Postgres and s3 are stored in a Prefect Secret in the following format:

{
    "database_user": "...",
    "database_pass": "...",
    "database_host": "...rds.amazonaws.com",
    "database_port": "...",
    "database_name": "...",
    "database_schema": "...",
    "s3_access_key": "...",
    "s3_secret_key": "...",
    "s3_server": "...",
    "s3_bucket": "..."
}

Flow Parameters

You can adjust the following Prefect Parameters:

  • tables = List of strings containing table names you want to move
  • indexed_field = String indicating which field is indexed in all of the tables to be moved
  • starting_index = Integer indicating which index you want to start moving at
  • total_records_to_move = Integer indicating how many rows you want to include in this Flow Run
  • number_of_records_in_batch = Integer indicating when distributing the workload across multiple workers, how large should each batch be?

About

Prefect Flow to move Posgres tables to s3 in parquet format

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages