Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add format and compression to COPY command #21

Merged
merged 1 commit into from
Aug 27, 2015
Merged

Add format and compression to COPY command #21

merged 1 commit into from
Aug 27, 2015

Conversation

cpcloud
Copy link
Contributor

@cpcloud cpcloud commented Aug 24, 2015

replaces #1

@cpcloud
Copy link
Contributor Author

cpcloud commented Aug 24, 2015

@graingert when you get a chance can you close #1? thanks!

what's needed to merge this PR?

  1. use sa.text.
  2. add a test

anything else?

@@ -360,12 +363,13 @@ def visit_copy_command(element, compiler, **kw):
return """
COPY %(schema_name)s.%(table_name)s FROM '%(data_location)s'
CREDENTIALS 'aws_access_key_id=%(access_key)s;aws_secret_access_key=%(secret_key)s%(session_token)s'
CSV
%(format)s
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we do this without string formatting?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The quoted values should definitely be implemented as SQL parameters

@graingert
Copy link
Member

needs a test, and needs some review for SQL injection possibilities

@graingert
Copy link
Member

I know it's not your code, but I think this PR is a good excuse to clean this function up a bit.

@cpcloud
Copy link
Contributor Author

cpcloud commented Aug 24, 2015

@graingert how do you feel about adding enum34 as a dependency so that we can have something like this:

import enum


class Format(enum.Enum):
    CSV = 0
    JSON = 1


class Compression(enum.Enum):
    GZIP = 0
    LZOP = 1
    # more as needed

@graingert
Copy link
Member

Could do something like assert format in ['JSON', 'CSV'] rather than an enum

@cpcloud
Copy link
Contributor Author

cpcloud commented Aug 24, 2015

It doesn't appear that there's a way to use bindparams with the session_token parameter:

In [1]: sa.text(':secret_key:session_token').bindparams(secret_key='a', session_token='b')
ArgumentError: This text() construct doesn't define a bound parameter named 'secret_key'

@cpcloud
Copy link
Contributor Author

cpcloud commented Aug 24, 2015

The format for credentials is:

aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>[;token=<temporary-session-token>]

@graingert
Copy link
Member

I think the credentials string will need to be formatted in python and bound as one single SQL parameter

@cpcloud
Copy link
Contributor Author

cpcloud commented Aug 24, 2015

fine by me

@graingert
Copy link
Member

looks nice, could do with some tests though. Don't worry about uploading files to S3 to the real redshift instance for now. Just some simple compiles with verification of the text produced:

You can use http://docs.sqlalchemy.org/en/latest/faq/sqlexpressions.html#how-do-i-render-sql-expressions-as-strings-possibly-with-bound-parameters-inlined to do the compile including bound params.

@cpcloud
Copy link
Contributor Author

cpcloud commented Aug 24, 2015

i added test_format, test_compression, test_invalid_format and test_invalid_compression. What else do you want me to add?

@cpcloud
Copy link
Contributor Author

cpcloud commented Aug 24, 2015

Or rather, what else do you want me to test?

@graingert
Copy link
Member

Ah sorry I didn't spot the tests.

class CopyCommand(Executable, ClauseElement):
''' Prepares a RedShift COPY statement
''' Prepares a Redshift COPY statement.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer pep257 dosctrings:

"""
Prepares a Redshift COPY statement.

...
"""

@graingert
Copy link
Member

This changes the interfaces (to something I'd prefer), so we'll have to get some consensus with other users of this code.

Another possibility is to rename this implementation as NewCopyCommand (name TBD) and adapt the old CopyCommand interface to call this NewCopyCommand

self.credentials = credentials
self.delimiter = delimiter
self.ignore_header = ignore_header
self.null = null
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I do a ctrl+F "self.null" on this page I don't find anything

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's because it's used in the visit_copy_command as element.null

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😳 ah of course.

@graingert
Copy link
Member

@jklukas @thisfred @bouk: Thoughts?

@cpcloud
Copy link
Contributor Author

cpcloud commented Aug 24, 2015

I had no idea so many people were developing this library! Awesome.

'aws_access_key_id=<access-key-id>;'
'aws_secret_access_key=<secret-access-key>'
'[;token=<temporary-session-token>]\ngot %r' %
credentials)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did you change the way the credentials are passed in? This seems way more brittle

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this isn't brittle, this simply fails fast with invalid data.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably prefer taking the params as arguments, doing the format then doing the regex validation.

Eg: (Don't quote me on this)

def __init__(self, ..., access_key_id, secret_access_key, token=None):
    credentials = 'aws_access_key_id={access_key};aws_secret_access_key={secret_access_key}'.format(
        access_key=access_key_id,
        secret_access_key=secret_access_key,
    )
    if token is not None:
        credentials += ';token=' + token
    if not creds_rx.match(credentials):
        ...fail...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But why not just have them as seperate arguments like before? That's definitely more robust instead of this regex stuff

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fine by me to have this as separate arguments

@cpcloud
Copy link
Contributor Author

cpcloud commented Aug 26, 2015

no.

you can't pass kwargs like that to sa.text

@cpcloud
Copy link
Contributor Author

cpcloud commented Aug 26, 2015

it doesn't work for the same reasons as stated above about escaping

@graingert
Copy link
Member

It shouldn't be escaped at all. It should come through as a literal null byte

@cpcloud
Copy link
Contributor Author

cpcloud commented Aug 26, 2015

Is there a place that we can chat? I feel like this is moving slower than it could.

@cpcloud
Copy link
Contributor Author

cpcloud commented Aug 26, 2015

sqla irc?

@graingert
Copy link
Member

Yup my contact details are here https://graingert.co.uk/foaf.rdf
On 26 Aug 2015 21:19, "Phillip Cloud" notifications@github.com wrote:

sqla irc?


Reply to this email directly or view it on GitHub
#21 (comment)
.

@cpcloud
Copy link
Contributor Author

cpcloud commented Aug 26, 2015

i'm in the sqla irc

@cpcloud
Copy link
Contributor Author

cpcloud commented Aug 27, 2015

I still don't see why setting compiler.dialect._backslash_escapes to False and then resetting it back to the default for the dialect isn't the right solution here. No matter what the approach is (overriding compiler.render_literal_value, or a custom type), it will involve turning off _backslash_escapes and then turning it back to the default.

Can we please get this merged today. This is dragging on far too long.

@graingert
Copy link
Member

it will involve turning off _backslash_escapes and then turning it back to the default.

I will not accept anything that breaks the thread-safety of the query compiler. Let's just use .format() here

@bouk
Copy link
Contributor

bouk commented Aug 27, 2015

Also, setting private variables is a no-no

@cpcloud
Copy link
Contributor Author

cpcloud commented Aug 27, 2015

Let's just use .format() here.

For everything, or just for null?

Also, setting private variables is a no-no

I'm aware of that.

I will not accept anything that breaks the thread-safety of the compiler.

Can you give an example of how this breaks thread safety? Is there no public state on the compiler that can be set by a user during compilation?

@graingert
Copy link
Member

Just for null, call the parameter "dangerous_null_delimiter=None" and only add the "NULL AS " if it's not None.

Thread 1 Thread 2
disable slashes
text_query = compile(obj)
text_query = compile(obj)
enable slashes
text_query is not escaped!

@cpcloud
Copy link
Contributor Author

cpcloud commented Aug 27, 2015

Nice example.

call the parameter "dangerous_null_delimiter=None"

is that a joke?

@graingert
Copy link
Member

nope

@graingert graingert added this to the 1.0.0 milestone Aug 27, 2015
@cpcloud
Copy link
Contributor Author

cpcloud commented Aug 27, 2015

Thanks, but that's unnecessary.

@graingert
Copy link
Member

Cool looks like this is ready to go, can you clean up your commits (squash and rebase as appropriate)?

@cpcloud
Copy link
Contributor Author

cpcloud commented Aug 27, 2015

Sure

@cpcloud
Copy link
Contributor Author

cpcloud commented Aug 27, 2015

@graingert Would you like me to squash further?

@graingert
Copy link
Member

probably best for one commit here I think?

@cpcloud
Copy link
Contributor Author

cpcloud commented Aug 27, 2015

done

@graingert
Copy link
Member

@jklukas @thisfred @bouk: I'm going to merge this. If you've got any objections we can change them in another PR.

graingert added a commit that referenced this pull request Aug 27, 2015
Add format and compression to COPY command
@graingert graingert merged commit c302895 into sqlalchemy-redshift:master Aug 27, 2015
@cpcloud cpcloud deleted the format-compression branch August 27, 2015 15:24
@cpcloud
Copy link
Contributor Author

cpcloud commented Aug 27, 2015

@graingert thanks for your patience.

@graingert
Copy link
Member

@graingert no, thank you for your patience, feature and improvements to the code!

haleemur pushed a commit to haleemur/redshift_sqlalchemy that referenced this pull request Sep 2, 2015
…sion

Add format and compression to COPY command
@jklukas jklukas mentioned this pull request Mar 18, 2019
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants