Feature/download directly to s3 #2584

vrajmohan · 2017-08-04T00:01:48Z

No description provided.

This avoids using the local filesystem, which is good because we have occasionally run into "No space left on device" errors. We also avoid creating a zip file and a manifest and instead upload a CSV file directly. This has 2 consequences:

- Most users would prefer this because it avoids the step of having to unzip the downloaded zip file.
- As there is no manifest, there is no way to show the query criteria that were used to create the download.

This uses the package `smart_open`, which unfortunately uses `boto` and not `boto3`. That is not too bad, as `boto` and `boto3` can co-exist.
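As a rough illustration of the streaming idea in this commit message, the sketch below writes CSV rows one at a time to any file-like sink instead of building a file on local disk first. It is not the code from this PR: `stream_rows_as_csv`, the column names, and the sample rows are hypothetical, and an in-memory `io.StringIO` stands in for the S3 key that `smart_open` would provide in the real task.

```python
import csv
import io


def stream_rows_as_csv(rows, header, sink):
    """Write a header and then each row to a text file-like sink.

    Rows are consumed lazily, so the full result set is never held
    in memory or written to the local filesystem. Returns the number
    of data rows written.
    """
    writer = csv.writer(sink)
    writer.writerow(header)
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


# Stand-in sink (assumption): in the real task this would be a writable
# stream backed by an S3 key rather than an in-memory buffer.
buffer = io.StringIO()
written = stream_rows_as_csv(
    iter([("C001", 250), ("C002", 75)]),  # hypothetical query results
    ["committee_id", "amount"],           # hypothetical column names
    buffer,
)
```

Because the function only depends on the sink having a `write` method, the same code path can be exercised in tests with a buffer and in production with a remote stream.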

Using the partial function `use_kwargs` was preventing the parsing of the POSTed JSON. We can restore the parsing by using the original `use_kwargs_original`.
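The commit message does not say why the partial blocked JSON parsing. One plausible mechanism, assumed here purely for illustration, is that the partial pins the parsing locations to the query string, so a JSON body is never inspected. The sketch below uses a hypothetical stand-in decorator factory (it is not the real `use_kwargs_original` from the project or its underlying library) and plain dictionaries in place of HTTP requests.

```python
import functools


def use_kwargs_original(fields, locations=("query", "json")):
    """Hypothetical stand-in for an argument-parsing decorator factory."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(request):
            kwargs = {}
            # Look up each field in the allowed request locations only.
            for location in locations:
                source = request.get(location, {})
                for name in fields:
                    if name in source:
                        kwargs.setdefault(name, source[name])
            return func(**kwargs)
        return wrapper
    return decorator


# Assumed project-wide shortcut: parsing is pinned to the query string.
use_kwargs = functools.partial(use_kwargs_original, locations=("query",))


@use_kwargs(("filename",))
def post_with_partial(filename=None):
    return filename


@use_kwargs_original(("filename",))
def post_with_original(filename=None):
    return filename


# A POST whose filename arrives in the JSON body, not the query string.
request = {"query": {}, "json": {"filename": "schedule_a.csv"}}
partial_result = post_with_partial(request)
original_result = post_with_original(request)
```

Under these assumptions the handler decorated with the partial never sees the POSTed `filename`, while the one decorated with the original factory does.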

codecov-io · 2017-08-04T00:08:45Z

Codecov Report

Merging #2584 into develop will decrease coverage by 0.43%.
The diff coverage is 52.94%.

@@             Coverage Diff             @@
##           develop    #2584      +/-   ##
===========================================
- Coverage    90.09%   89.66%   -0.44%     
===========================================
  Files           73       73              
  Lines         5645     5612      -33     
===========================================
- Hits          5086     5032      -54     
- Misses         559      580      +21

| Impacted Files | Coverage Δ | |
|---|---|---|
| webservices/resources/download.py | `100% <100%> (ø)` | ⬆️ |
| webservices/tasks/download.py | `66.66% <33.33%> (-25.78%)` | ⬇️ |
| webservices/tasks/utils.py | `80.95% <50%> (-19.05%)` | ⬇️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0d582b0...bba27f4.

ccostino

This is great, @vrajmohan! I just had a couple of questions, but I think this is good to go. :-)

ccostino · 2017-08-04T20:35:29Z

webservices/resources/download.py


 client = boto3.client('s3')

-MAX_RECORDS = 100000
+MAX_RECORDS = 500000


Oh wow we're able to bump the max records with this too?

I should have called this out. My plan was to bump this up and test in dev first.

Let's give it a shot! :-)

ccostino · 2017-08-04T20:36:30Z

webservices/tasks/download.py

+            db.session.connection().engine,
+            format='csv',
+            header=True
+        )


Nice, this is a lot simpler in addition to streaming directly! :-)

ccostino · 2017-08-04T21:20:08Z

webservices/tasks/utils.py

@@ -19,3 +22,13 @@ def get_bucket():

 def get_object(key):
    return get_bucket().Object(key=key)
+
+def get_s3_key(name):
+    connection = boto.s3.connect_to_region(


I know you had mentioned this before somewhere but can't remember - boto3 didn't provide what was needed? I know both can co-exist, just wanted to make sure I had a clear understanding!

Yes, this is summarized in the commit message for 18F@028361c

D'oh, shame on me for not reading more closely, thanks!

Vraj Mohan added 3 commits August 3, 2017 14:22

Unblock the parsing of the POSTed filename

46bd30a

Using the partial function `use_kwargs` was preventing the parsing of the POSTed JSON. We can restore the parsing by using the original `use_kwargs_original`.

Increase allowed download size to 500,000 rows

bba27f4

vrajmohan requested review from ccostino and LindsayYoung August 4, 2017 14:55

noahmanger mentioned this pull request Aug 4, 2017

Use .csv extension for download filename fecgov/openFEC-web-app#2242

Merged

ccostino reviewed Aug 4, 2017

ccostino merged commit 17d9b73 into develop Aug 4, 2017

ccostino deleted the feature/download-directly-to-S3 branch August 4, 2017 21:32

cnlucas mentioned this pull request Jul 23, 2024

Feature request: Remove or increase download cap, restrict pagination on large datasets #5884

Open

3 tasks
