Feature/download directly to s3 #2584
This avoids using the local filesystem, which is good because we have occasionally run into "No space left on device" errors. We also avoid creating a zip file and a manifest and instead upload a CSV file directly. This has 2 consequences:

- Most users would prefer this because it avoids the step of having to unzip the downloaded zip file.
- As there is no manifest, there is no way to show the query criteria that were used to create the download.

This uses the package `smart_open`, which unfortunately uses `boto` and not `boto3`. That is not too bad, as `boto` and `boto3` can co-exist.
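To illustrate the idea of writing CSV straight into a destination stream instead of a temporary file, here is a minimal standard-library sketch. It is not the code in this PR: `stream_rows_as_csv` and `fake_query` are hypothetical names, and an in-memory `io.StringIO` stands in for the writable S3 handle that `smart_open` would provide.

```python
import csv
import io


def stream_rows_as_csv(rows, header, dest):
    """Write a header and then each row to `dest` as CSV, one row at a time.

    `dest` is any writable text file-like object. Rows are never collected
    into a list, so memory use does not grow with the size of the result.
    Returns the number of data rows written.
    """
    writer = csv.writer(dest)
    writer.writerow(header)
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


def fake_query():
    """Hypothetical stand-in for a database cursor that yields rows lazily."""
    yield (1, "alpha")
    yield (2, "beta")


# io.StringIO plays the role of the remote, writable S3 object here.
buffer = io.StringIO()
written = stream_rows_as_csv(fake_query(), ["id", "name"], buffer)
```

Because the function only depends on `dest.write`, swapping the in-memory buffer for a remote stream should not require touching the local disk at all.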
Using the partial function `use_kwargs` was preventing the parsing of the POSTed JSON. We can restore the parsing by using the original `use_kwargs_original`.
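The sketch below shows how a `functools.partial` wrapper can hide a POSTed JSON body. It is a toy model, not the real decorator: the `locations` argument, the dict-shaped `request`, and the `filename` field are all assumptions made for illustration, and this stand-in `use_kwargs_original` only mimics the general shape of an argument-parsing decorator.

```python
from functools import partial


def use_kwargs_original(argmap, locations=("query", "json")):
    """Toy stand-in for an argument-parsing decorator (hypothetical).

    It copies each name in `argmap` from the listed request locations
    into the keyword arguments of the wrapped view.
    """
    def decorator(func):
        def wrapper(request):
            kwargs = {}
            for location in locations:
                source = request.get(location, {})
                for name in argmap:
                    if name in source:
                        kwargs.setdefault(name, source[name])
            return func(**kwargs)
        return wrapper
    return decorator


# Assumed shape of the project-wide shortcut: it pins parsing to the
# query string, so a JSON body is silently ignored.
use_kwargs = partial(use_kwargs_original, locations=("query",))


@use_kwargs({"filename": str})
def view_with_partial(**kwargs):
    return kwargs


@use_kwargs_original({"filename": str})
def view_with_original(**kwargs):
    return kwargs


# A POST whose arguments arrive only in the JSON body.
post_request = {"query": {}, "json": {"filename": "export.csv"}}
partial_result = view_with_partial(post_request)
original_result = view_with_original(post_request)
```

Under these assumptions the view decorated with the partial receives no arguments, while the one decorated with the original sees the JSON field.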
Codecov Report
Coverage Diff

| | develop | #2584 | +/- |
|---|---|---|---|
| Coverage | 90.09% | 89.66% | -0.44% |
| Files | 73 | 73 | |
| Lines | 5645 | 5612 | -33 |
| Hits | 5086 | 5032 | -54 |
| Misses | 559 | 580 | +21 |
This is great, @vrajmohan! I just had a couple of questions, but I think this is good to go. :-)
client = boto3.client('s3')

-MAX_RECORDS = 100000
+MAX_RECORDS = 500000
Oh wow we're able to bump the max records with this too?
I should have called this out. My plan was to bump this up and test in dev first.
Let's give it a shot! :-)
    db.session.connection().engine,
    format='csv',
    header=True
)
Nice, this is a lot simpler in addition to streaming directly! :-)
@@ -19,3 +22,13 @@ def get_bucket():

def get_object(key):
    return get_bucket().Object(key=key)


def get_s3_key(name):
    connection = boto.s3.connect_to_region(
I know you had mentioned this before somewhere but I can't remember: did boto3 not provide what was needed? I know both can co-exist, I just wanted to make sure I had a clear understanding!
Yes, this is summarized in the commit message for 18F@028361c
D'oh, shame on me for not reading more closely, thanks!