
An operator for performing the Extract and Load bit of the ELT data pipeline in an Apache Airflow workflow.


osule/lard


LARD

Dictionary definition

insert strips of fat or bacon in (meat) before cooking.

Lard is an Apache Airflow operator that loads data into a target table in an idempotent manner.

How it works

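LARD watermarks the loaded data and performs each load in the following sequence:

  • Deletes any records in the target table that already carry the given watermark.
  • Loads the data into a temporary staging table.
  • Copies the data from the temporary staging table to the target table.

Because records carrying the watermark are deleted before the new copy arrives, rerunning a load replaces that batch instead of duplicating it.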
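The sequence above can be sketched outside Airflow. The following is a minimal illustration, not LARD's actual implementation: it uses an in-memory SQLite database in place of Redshift, and the function name `idempotent_load`, the `payload` column, and the sample watermark values are assumptions made for the example.

```python
import sqlite3


def idempotent_load(conn, rows, watermark):
    """Load `rows` into `events`, stamped with `watermark`, without duplicating on rerun."""
    cur = conn.cursor()
    # 1. Delete records in the target table that already carry this watermark.
    cur.execute("DELETE FROM events WHERE scheduled_at = ?", (watermark,))
    # 2. Load the incoming data into a temporary staging table.
    cur.execute("CREATE TEMP TABLE events_staging (payload TEXT)")
    cur.executemany(
        "INSERT INTO events_staging (payload) VALUES (?)", [(r,) for r in rows]
    )
    # 3. Copy from staging to the target, stamping each row with the watermark.
    cur.execute(
        "INSERT INTO events (payload, scheduled_at) "
        "SELECT payload, ? FROM events_staging",
        (watermark,),
    )
    cur.execute("DROP TABLE events_staging")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (payload TEXT, scheduled_at TEXT)")

idempotent_load(conn, ["a", "b"], "2024-01-01T01:00:00+00:00")
# Rerun of the same interval: the earlier batch is replaced, not duplicated.
idempotent_load(conn, ["a", "b"], "2024-01-01T01:00:00+00:00")
idempotent_load(conn, ["c"], "2024-01-01T02:00:00+00:00")

row_count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(row_count)
```

Running the same load twice leaves one copy of that batch in the target table, which is the property the watermark is meant to provide.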

Installation

# Upgrade pip if necessary
python -m pip install --upgrade pip

# Install package
pip install lard

Usage

The following example loads data from an S3 bucket location into the events table of the sample schema in a Redshift database.

import lard
from airflow import DAG

dag = DAG('sample_dag')
lard_task = lard.LoadOperator(
    'lard_events',
    dag=dag,
    # Airflow connection for the Redshift database
    conn_id='redshift',
    source=dict(
        # Templated by Airflow: one S3 prefix per hour of the data interval
        location="s3://event-logs/{{ data_interval_start.strftime('%Y/%m/%d/%H') }}/",
        conn_id='s3_default'
    ),
    target_table=dict(
        name='sample.events'
    ),
    staging_table=dict(
        name='events_staging',
    ),
    watermark=dict(
        target_name='scheduled_at',
        # Templated by Airflow: end of the data interval as a timestamp
        source_value='{{ data_interval_end | ts }}'
    )
)
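To see what the two templated fields in the example evaluate to, the sketch below reproduces them with the standard library only. The interval values are hypothetical, and it assumes Airflow's `ts` filter renders a datetime in ISO 8601 form, as `datetime.isoformat()` does.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical hourly data interval, chosen only for illustration.
data_interval_start = datetime(2024, 1, 2, 3, tzinfo=timezone.utc)
data_interval_end = data_interval_start + timedelta(hours=1)

# Equivalent of the templated source location.
location = f"s3://event-logs/{data_interval_start.strftime('%Y/%m/%d/%H')}/"
# Equivalent of '{{ data_interval_end | ts }}', assuming ISO 8601 output.
watermark_value = data_interval_end.isoformat()

print(location)         # s3://event-logs/2024/01/02/03/
print(watermark_value)  # 2024-01-02T04:00:00+00:00
```

Each run therefore reads from its own hourly prefix and stamps its rows with a value that is unique to that interval.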
