Redshift Loader

A Python package for loading data into Redshift.

Installation

poetry add git+https://github.com/user/redshift-loader.git

or

pip install git+https://github.com/user/redshift-loader.git

Usage

Functions

`dataframe_to_redshift`

This function uploads a Pandas DataFrame to a Redshift table using the COPY command. The data is first uploaded temporarily to an S3 bucket and then copied into Redshift. It is important to ensure that your AWS setup has the necessary permissions:

The function needs permission to write data to the specified S3 bucket.
Redshift must be able to access the S3 bucket to perform the COPY operation.

After the data is copied to Redshift, the temporary file in S3 is deleted.

from redshift_loader.loader import dataframe_to_redshift

dataframe_to_redshift(
    df,
    s3_client,
    's3_copy_bucket',
    'redshift_host',
    'redshift_database',
    'redshift_username',
    'redshift_password',
    'redshift_schema',
    'redshift_table',
    'optional_iam_role_arn_for_copy',
    optional_columns_mapping_dict,
    optional_deduplication_columns_list,
    'optional_on_duplicate_action'
)

`s3_objects_to_redshift`

This function processes multiple objects (CSV or Excel files) stored in an S3 bucket and loads them into Redshift. The files are combined into a single DataFrame, which is then uploaded temporarily to an S3 bucket and copied into Redshift. The same permissions are required as mentioned above, and the temporary file in S3 is deleted after the COPY operation.

from redshift_loader.loader import s3_objects_to_redshift

s3_objects_to_redshift(
    s3_client,
    's3_bucket',
    's3_prefix',
    's3_object_type', # 'csv' or 'excel'
    'redshift_host',
    'redshift_database',
    'redshift_username',
    'redshift_password',
    'redshift_schema',
    'redshift_table',
    'optional_iam_role_arn_for_copy',
    optional_new_columns_dict,
    optional_columns_mapping_dict,        
    optional_deduplication_columns_list,
    'optional_on_duplicate_action',
    'optional_s3_copy_bucket'
)

CLI

Also a CLI is available to load data into Redshift, although currently it only supports the functionality given by s3_objects_to_redshift. An example of usage is available in the presets directory.

JSON Columns Mapping

You can define a custom column mapping either as a dictionary passed directly to the functions or in a JSON file located in the columns_mappings directory for use with the CLI. An example is provided in the columns_mappings directory.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
columns_mappings		columns_mappings
presets		presets
redshift_loader		redshift_loader
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Redshift Loader

Installation

Usage

Functions

`dataframe_to_redshift`

`s3_objects_to_redshift`

CLI

JSON Columns Mapping

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Redshift Loader

Installation

Usage

Functions

dataframe_to_redshift

s3_objects_to_redshift

CLI

JSON Columns Mapping

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

`dataframe_to_redshift`

`s3_objects_to_redshift`

Packages