beam-postgres

Light IO transforms for Postgres read/write in Apache Beam pipelines.

Goal

The project aims to provide highly performant and customizable transforms and is not intended to support many different SQL database engines.

Features

ReadAllFromPostgres, ReadFromPostgres`` and WriteToPostgres` transforms
Records can be mapped to tuples, dictionaries or dataclasses
Reads and writes are in configurable batches

Usage

Printing data from the database table:

import apache_beam as beam
from psycopg.rows import dict_row

from beam_postgres.io import ReadAllFromPostgres

with beam.Pipeline() as p:
    data = p | "Reading example records from database" >> ReadAllFromPostgres(
        "host=localhost dbname=examples user=postgres password=postgres",
        "select id, data from source",
        dict_row,
    )
    data | "Writing to stdout" >> beam.Map(print)

Writing data to the database table:

from dataclasses import dataclass

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

from beam_postgres.io import WriteToPostgres


@dataclass
class Example:
    data: str


with beam.Pipeline(options=PipelineOptions()) as p:
    data = p | "Reading example records" >> beam.Create(
        [
            Example("example1"),
            Example("example2"),
        ]
    )
    data | "Writing example records to database" >> WriteToPostgres(
        "host=localhost dbname=examples user=postgres password=postgres",
        "insert into sink (data) values (%(data)s)",
    )

See here for more examples.

Reading in batches

There may be situations when you have so much data that it will not fit into the memory - then you want to read your table data in batches. You can see an example code here (the code reads records in a batches of 1).

Name		Name	Last commit message	Last commit date
Latest commit History 40 Commits
.github/workflows		.github/workflows
beam_postgres		beam_postgres
examples		examples
scripts		scripts
tests		tests
.editorconfig		.editorconfig
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
README.md		README.md
docker-compose.yaml		docker-compose.yaml
pyproject.toml		pyproject.toml
setup.cfg		setup.cfg
setup.py		setup.py
tox.ini		tox.ini

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

beam-postgres

Goal

Features

Usage

Reading in batches

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 2

Uh oh!

Languages

License

medzin/beam-postgres

Folders and files

Latest commit

History

Repository files navigation

beam-postgres

Goal

Features

Usage

Reading in batches

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 2

Uh oh!

Languages

Packages