-
Notifications
You must be signed in to change notification settings - Fork 151
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add format and compression to COPY command #21
Add format and compression to COPY command #21
Conversation
@graingert when you get a chance can you close #1? thanks! what's needed to merge this PR?
anything else? |
@@ -360,12 +363,13 @@ def visit_copy_command(element, compiler, **kw): | |||
return """ | |||
COPY %(schema_name)s.%(table_name)s FROM '%(data_location)s' | |||
CREDENTIALS 'aws_access_key_id=%(access_key)s;aws_secret_access_key=%(secret_key)s%(session_token)s' | |||
CSV | |||
%(format)s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we do this without string formatting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The quoted values should definitely be implemented as SQL parameters
needs a test, and needs some review for SQL injection possibilities |
I know it's not your code, but I think this PR is a good excuse to clean this function up a bit. |
@graingert how do you feel about adding import enum
class Format(enum.Enum):
CSV = 0
JSON = 1
class Compression(enum.Enum):
GZIP = 0
LZOP = 1
# more as needed |
Could do something like |
It doesn't appear that there's a way to use bindparams with the
|
The format for credentials is:
|
I think the credentials string will need to be formatted in python and bound as one single SQL parameter |
fine by me |
looks nice, could do with some tests though. Don't worry about uploading files to S3 to the real redshift instance for now. Just some simple compiles with verification of the text produced: You can use http://docs.sqlalchemy.org/en/latest/faq/sqlexpressions.html#how-do-i-render-sql-expressions-as-strings-possibly-with-bound-parameters-inlined to do the compile including bound params. |
i added |
Or rather, what else do you want me to test? |
Ah sorry I didn't spot the tests. |
class CopyCommand(Executable, ClauseElement): | ||
''' Prepares a RedShift COPY statement | ||
''' Prepares a Redshift COPY statement. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer pep257 dosctrings:
"""
Prepares a Redshift COPY statement.
...
"""
This changes the interfaces (to something I'd prefer), so we'll have to get some consensus with other users of this code. Another possibility is to rename this implementation as |
self.credentials = credentials | ||
self.delimiter = delimiter | ||
self.ignore_header = ignore_header | ||
self.null = null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I do a ctrl+F "self.null" on this page I don't find anything
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's because it's used in the visit_copy_command
as element.null
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😳 ah of course.
I had no idea so many people were developing this library! Awesome. |
'aws_access_key_id=<access-key-id>;' | ||
'aws_secret_access_key=<secret-access-key>' | ||
'[;token=<temporary-session-token>]\ngot %r' % | ||
credentials) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did you change the way the credentials are passed in? This seems way more brittle
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this isn't brittle, this simply fails fast with invalid data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd probably prefer taking the params as arguments, doing the format then doing the regex validation.
Eg: (Don't quote me on this)
def __init__(self, ..., access_key_id, secret_access_key, token=None):
credentials = 'aws_access_key_id={access_key};aws_secret_access_key={secret_access_key}'.format(
access_key=access_key_id,
secret_access_key=secret_access_key,
)
if token is not None:
credentials += ';token=' + token
if not creds_rx.match(credentials):
...fail...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But why not just have them as seperate arguments like before? That's definitely more robust instead of this regex stuff
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fine by me to have this as separate arguments
no. you can't pass kwargs like that to |
it doesn't work for the same reasons as stated above about escaping |
It shouldn't be escaped at all. It should come through as a literal null byte |
Is there a place that we can chat? I feel like this is moving slower than it could. |
sqla irc? |
Yup my contact details are here https://graingert.co.uk/foaf.rdf
|
i'm in the sqla irc |
I still don't see why setting Can we please get this merged today. This is dragging on far too long. |
I will not accept anything that breaks the thread-safety of the query compiler. Let's just use |
Also, setting private variables is a no-no |
For everything, or just for
I'm aware of that.
Can you give an example of how this breaks thread safety? Is there no public state on the compiler that can be set by a user during compilation? |
Just for null, call the parameter "dangerous_null_delimiter=None" and only add the "NULL AS " if it's not None.
|
Nice example.
is that a joke? |
nope |
Thanks, but that's unnecessary. |
Cool looks like this is ready to go, can you clean up your commits (squash and rebase as appropriate)? |
Sure |
@graingert Would you like me to squash further? |
probably best for one commit here I think? |
done |
Add format and compression to COPY command
@graingert thanks for your patience. |
@graingert no, thank you for your patience, feature and improvements to the code! |
…sion Add format and compression to COPY command
replaces #1