Add format and compression to COPY command #21

Merged
graingert merged 1 commit into sqlalchemy-redshift:master from cpcloud:format-compression Aug 27, 2015

Contributor

cpcloud commented Aug 24, 2015

replaces #1

Contributor

cpcloud commented Aug 24, 2015

@graingert when you get a chance can you close #1? thanks!

what's needed to merge this PR?

  1. use sa.text.
  2. add a test

anything else?

Collaborator

graingert commented Aug 24, 2015

needs a test, and needs some review for SQL injection possibilities

Collaborator

graingert commented Aug 24, 2015

I know it's not your code, but I think this PR is a good excuse to clean this function up a bit.

Contributor

cpcloud commented Aug 24, 2015

@graingert how do you feel about adding enum34 as a dependency so that we can have something like this:

import enum


class Format(enum.Enum):
    CSV = 0
    JSON = 1


class Compression(enum.Enum):
    GZIP = 0
    LZOP = 1
    # more as needed

Collaborator

graingert commented Aug 24, 2015

Could do something like assert format in ['JSON', 'CSV'] rather than an enum
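
A minimal sketch of the membership check suggested above, assuming hypothetical names (`VALID_FORMATS`, `VALID_COMPRESSIONS`, `check_option`) that are not taken from the PR. It raises `ValueError` instead of using `assert`, since assertions are skipped when Python runs with `-O`:

```python
# Hypothetical whitelists; extend as the COPY command gains options.
VALID_FORMATS = ("CSV", "JSON")
VALID_COMPRESSIONS = ("GZIP", "LZOP")


def check_option(name, value, allowed):
    """Return value unchanged if it is None or whitelisted, else raise."""
    if value is not None and value not in allowed:
        raise ValueError(
            "{0} must be one of {1}, got {2!r}".format(name, list(allowed), value)
        )
    return value


# Example: a valid format and an omitted compression both pass through.
format_ = check_option("format", "CSV", VALID_FORMATS)
compression = check_option("compression", None, VALID_COMPRESSIONS)
```

Because only whitelisted keywords can reach the statement text, interpolating them does not open an injection path.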

Contributor

cpcloud commented Aug 24, 2015

It doesn't appear that there's a way to use bindparams with the session_token parameter:

In [1]: sa.text(':secret_key:session_token').bindparams(secret_key='a', session_token='b')
ArgumentError: This text() construct doesn't define a bound parameter named 'secret_key'

Contributor

cpcloud commented Aug 24, 2015

The format for credentials is:

aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>[;token=<temporary-session-token>]

Collaborator

graingert commented Aug 24, 2015

I think the credentials string will need to be formatted in python and bound as one single SQL parameter

Contributor

cpcloud commented Aug 24, 2015

fine by me

Collaborator

graingert commented Aug 24, 2015

looks nice, could do with some tests though. Don't worry about uploading files to S3 to the real redshift instance for now. Just some simple compiles with verification of the text produced:

You can use http://docs.sqlalchemy.org/en/latest/faq/sqlexpressions.html#how-do-i-render-sql-expressions-as-strings-possibly-with-bound-parameters-inlined to do the compile including bound params.

Contributor

cpcloud commented Aug 24, 2015

i added test_format, test_compression, test_invalid_format and test_invalid_compression. What else do you want me to add?

Contributor

cpcloud commented Aug 24, 2015

Or rather, what else do you want me to test?

Collaborator

graingert commented Aug 24, 2015

Ah sorry I didn't spot the tests.

Collaborator

graingert commented Aug 24, 2015

This changes the interfaces (to something I'd prefer), so we'll have to get some consensus with other users of this code.

Another possibility is to rename this implementation as NewCopyCommand (name TBD) and adapt the old CopyCommand interface to call this NewCopyCommand

Collaborator

graingert commented Aug 24, 2015

@jklukas @thisfred @bouk: Thoughts?

Contributor

cpcloud commented Aug 24, 2015

I had no idea so many people were developing this library! Awesome.

'aws_access_key_id=<access-key-id>;'
'aws_secret_access_key=<secret-access-key>'
'[;token=<temporary-session-token>]\ngot %r' %
credentials)

bouk Aug 24, 2015

Contributor

why did you change the way the credentials are passed in? This seems way more brittle

graingert Aug 24, 2015

Collaborator

this isn't brittle, this simply fails fast with invalid data.

graingert Aug 24, 2015

Collaborator

I'd probably prefer taking the params as arguments, doing the format then doing the regex validation.

Eg: (Don't quote me on this)

def __init__(self, ..., access_key_id, secret_access_key, token=None):
    credentials = 'aws_access_key_id={access_key};aws_secret_access_key={secret_access_key}'.format(
        access_key=access_key_id,
        secret_access_key=secret_access_key,
    )
    if token is not None:
        credentials += ';token=' + token
    if not creds_rx.match(credentials):
        ...fail...
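
A runnable version of the sketch above, offered as an illustration only. The helper name `format_credentials` and the pattern `CREDS_RX` are assumptions; the actual `creds_rx` from the PR is not shown in this thread, so the pattern here is a simplified stand-in:

```python
import re

# Simplified, hypothetical pattern: each value must be non-empty and must
# not contain ';' or a single quote, and the token part is optional.
CREDS_RX = re.compile(
    r"aws_access_key_id=[^;']+"
    r";aws_secret_access_key=[^;']+"
    r"(;token=[^;']+)?\Z"
)


def format_credentials(access_key_id, secret_access_key, token=None):
    """Build the COPY credentials string from separate arguments."""
    credentials = "aws_access_key_id={0};aws_secret_access_key={1}".format(
        access_key_id, secret_access_key
    )
    if token is not None:
        credentials += ";token=" + token
    if not CREDS_RX.match(credentials):
        # Fail fast, but keep the secret out of the error message.
        raise ValueError("credentials contain empty or disallowed values")
    return credentials
```

The returned string can then be passed as one bound parameter, for example with `sa.text("... CREDENTIALS :credentials").bindparams(credentials=...)`.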

bouk Aug 24, 2015

Contributor

But why not just have them as separate arguments like before? That's definitely more robust than this regex stuff

cpcloud Aug 24, 2015

Contributor

fine by me to have this as separate arguments

Collaborator

graingert commented Aug 26, 2015

Does

engine.execute(sa.text("copy nulltable from 's3://bucket/file-with-null-values.txt' credentials 'aws_access_key_id=<...>;aws_secret_access_key=<...>' compupdate off csv ignoreheader 1 delimiter ',' null as :nullas"), nullas=u'\0')

work for you?

Contributor

cpcloud commented Aug 26, 2015

no.

you can't pass kwargs like that to sa.text

Contributor

cpcloud commented Aug 26, 2015

it doesn't work for the same reasons as stated above about escaping

Collaborator

graingert commented Aug 26, 2015

It shouldn't be escaped at all. It should come through as a literal null byte

Contributor

cpcloud commented Aug 26, 2015

Is there a place that we can chat? I feel like this is moving slower than it could.

Contributor

cpcloud commented Aug 26, 2015

sqla irc?

Collaborator

graingert commented Aug 26, 2015

Yup my contact details are here https://graingert.co.uk/foaf.rdf

Contributor

cpcloud commented Aug 26, 2015

i'm in the sqla irc

Contributor

cpcloud commented Aug 27, 2015

I still don't see why setting compiler.dialect._backslash_escapes to False and then resetting it back to the default for the dialect isn't the right solution here. No matter what the approach is (overriding compiler.render_literal_value, or a custom type), it will involve turning off _backslash_escapes and then turning it back to the default.

Can we please get this merged today. This is dragging on far too long.

Collaborator

graingert commented Aug 27, 2015

> it will involve turning off _backslash_escapes and then turning it back to the default.

I will not accept anything that breaks the thread-safety of the query compiler. Let's just use .format() here
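
One reading of the `.format()` suggestion, as a sketch only: build the optional NULL AS fragment with plain string formatting, so no shared compiler state is touched. The helper name `null_as_clause` and its `null_delimiter` argument are hypothetical, as is the statement text:

```python
def null_as_clause(null_delimiter=None):
    """Return the optional NULL AS fragment for a COPY statement.

    The value is interpolated verbatim (no escaping, no bound parameter),
    so it must come from trusted code, never from user input.
    """
    if null_delimiter is None:
        return ""
    return " NULL AS '{0}'".format(null_delimiter)


# Illustrative statement; every other value stays a bound parameter.
base = "COPY nulltable FROM :location CREDENTIALS :credentials CSV"
# "\\0" is a backslash followed by a zero, which reaches the SQL unchanged.
statement = base + null_as_clause("\\0")
```

Only this one fragment bypasses parameter binding, which keeps the unescaped surface as small as possible.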

Contributor

bouk commented Aug 27, 2015

Also, setting private variables is a no-no

Contributor

cpcloud commented Aug 27, 2015

> Let's just use .format() here.

For everything, or just for null?

> Also, setting private variables is a no-no

I'm aware of that.

> I will not accept anything that breaks the thread-safety of the compiler.

Can you give an example of how this breaks thread safety? Is there no public state on the compiler that can be set by a user during compilation?

Collaborator

graingert commented Aug 27, 2015

Just for null, call the parameter "dangerous_null_delimiter=None" and only add the "NULL AS " if it's not None.

| Thread 1 | Thread 2 |
| --- | --- |
| disable slashes | |
| text_query = compile(obj) | |
| | text_query = compile(obj) |
| enable slashes | |
| | text_query is not escaped! |
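
The interleaving above can be reproduced deterministically. This sketch uses a hypothetical `FakeDialect` with a public `backslash_escapes` flag as a stand-in for the shared dialect (it is not SQLAlchemy's API), and `threading.Event` objects to force the ordering shown:

```python
import threading


class FakeDialect:
    """Hypothetical stand-in for a dialect object shared across threads."""

    def __init__(self):
        self.backslash_escapes = True


def render_literal(dialect, value):
    """Quote a string, doubling backslashes only while the flag is on."""
    if dialect.backslash_escapes:
        value = value.replace("\\", "\\\\")
    return "'" + value.replace("'", "''") + "'"


def run_interleaving():
    dialect = FakeDialect()        # one instance, seen by both threads
    results = {}
    flag_off = threading.Event()   # set once thread 1 has disabled escaping
    t2_done = threading.Event()    # set once thread 2 has compiled

    def thread_1():
        dialect.backslash_escapes = False   # disable slashes
        flag_off.set()
        results["t1"] = render_literal(dialect, "\\0")
        t2_done.wait()                      # thread 2 compiles in the gap
        dialect.backslash_escapes = True    # enable slashes

    def thread_2():
        flag_off.wait()
        results["t2"] = render_literal(dialect, "a\\b")
        t2_done.set()

    threads = [
        threading.Thread(target=thread_1),
        threading.Thread(target=thread_2),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, dialect.backslash_escapes


results, flag_after = run_interleaving()
```

Thread 2 never asked for escaping to be turned off, yet its backslash comes out undoubled because it read the flag while thread 1 had it disabled.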

Contributor

cpcloud commented Aug 27, 2015

Nice example.

> call the parameter "dangerous_null_delimiter=None"

is that a joke?

Collaborator

graingert commented Aug 27, 2015

nope

graingert added this to the 1.0.0 milestone Aug 27, 2015

Contributor

cpcloud commented Aug 27, 2015

Thanks, but that's unnecessary.

Collaborator

graingert commented Aug 27, 2015

Cool looks like this is ready to go, can you clean up your commits (squash and rebase as appropriate)?

Contributor

cpcloud commented Aug 27, 2015

Sure

Contributor

cpcloud commented Aug 27, 2015

@graingert Would you like me to squash further?

Collaborator

graingert commented Aug 27, 2015

probably best for one commit here I think?

Contributor

cpcloud commented Aug 27, 2015

done

Collaborator

graingert commented Aug 27, 2015

@jklukas @thisfred @bouk: I'm going to merge this. If you've got any objections we can change them in another PR.

graingert added a commit that referenced this pull request Aug 27, 2015

Merge pull request #21 from cpcloud/format-compression
Add format and compression to COPY command

graingert merged commit c302895 into sqlalchemy-redshift:master Aug 27, 2015

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed

cpcloud deleted the cpcloud:format-compression branch Aug 27, 2015

Contributor

cpcloud commented Aug 27, 2015

@graingert thanks for your patience.

Collaborator

graingert commented Aug 27, 2015

@cpcloud no, thank you for your patience, feature and improvements to the code!

haleemur pushed a commit to haleemur/redshift_sqlalchemy that referenced this pull request Sep 2, 2015

Merge pull request sqlalchemy-redshift#21 from cpcloud/format-compression

Add format and compression to COPY command