Add unload_table macro #7
hey @abelsonlive - This is a great macro and a great PR :)
Or alternatively, use the concatenation operator to coerce it to a string:
I think that you'll have a problem getting this to work right though. When the hook is parsed, it doesn't have access to the local scope of the model, so
All that is to say: I think your final implementation is a good one! I plan on adding docs to this repo in a similar fashion to dbt-utils. Once that's ready, I think we'll be good to write some docs for this macro, then get it merged in!
@abelsonlive I just updated the docs here -- can you add an entry for the
Alright! I'm going to open another PR for the docs.
@abelsonlive you can just make that commit in this PR if it's easier for you!
Okay! I added docs and a
whups accidentally pressed
one more tiny comment in the docs (typo), but then this looks good to me! Thanks so much for submitting @abelsonlive
README.md
s3_path='s3://bucket/folder',
aws_key='abcdef',
aws_secret='ghijklm',
delimter='|') }}"
i think this should be delimiter
:)
WHAT
This PR adds an `unload_table` macro for dumping a Redshift table to S3.

WHY
Many data science applications built on top of Redshift require passing the outputs of a query along to another application. Using `unload_table` in a `post-hook` is a nice way of making the results of a `dbt` run accessible to other applications which can access S3.

HOW
`unload_table` closely follows the implementation of `UNLOAD` in Redshift. `unload_table` includes sane defaults such as:
- The table (`table`) and the location to unload it to (`s3_path`) are always set.
- `IAM_ROLE` or `ACCESS_KEY_ID` and `SECRET_ACCESS_KEY` are always set.
- ... `DELIMITER` (http://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html#unload-usage-notes)
- `NULL AS` set to `''`.
- `DELIMITER` set to `,`.
- `MANIFEST` turned off.
- `MAX_FILE_SIZE` set to `6 GB`.
- `ADDQUOTES` turned off.
- `COMPRESSION` turned off.
- `ENCRYPTED` turned off.
- `ALLOWOVERWRITE` turned off.
- `PARALLEL` turned off.

ISSUES
Ideally, `unload_table` would be handled by a more generalized macro, `unload`, which would accept a query string rather than a table name. However, my attempts to implement this inside a model were unsuccessful. My idea was that I could do something like this:

But, presumably because `this` is a special variable, it raised the following error:

A second, cruder approach attempted the following:

But this returned the following error:

I assume there are a few different approaches to this macro that might address this issue, but for now I think this is extremely useful as-is!
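
For reference, here is a rough sketch of the kind of Redshift statement the defaults listed under HOW would correspond to. It is only an illustration, not the macro's actual output: the schema, table, bucket, and role ARN are placeholders, and it assumes the `IAM_ROLE` form of authorization. Options described as "turned off" (`MANIFEST`, `ADDQUOTES`, compression, `ENCRYPTED`, `ALLOWOVERWRITE`) are simply left out of the statement. Note that Redshift itself spells the file-size option `MAXFILESIZE`.

```sql
-- Sketch only: my_schema.my_table, the bucket path, and the role ARN
-- are hypothetical placeholders, not values taken from this PR.
UNLOAD ('SELECT * FROM my_schema.my_table')
TO 's3://bucket/folder'
IAM_ROLE 'arn:aws:iam::<account-id>:role/<role-name>'
DELIMITER AS ','        -- default delimiter
NULL AS ''              -- NULLs written as empty strings
MAXFILESIZE AS 6 GB     -- file size cap
PARALLEL OFF;           -- write serially instead of one file per slice
```

With `ALLOWOVERWRITE` left out, Redshift refuses to unload if files already exist at the target path, so overwriting a previous export would need that option enabled.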