
Add functionality to push data products to cloud storage #121

Open

lewismc opened this issue Jun 27, 2018 · 4 comments

@lewismc
Member

lewismc commented Jun 27, 2018

Some functions for the associated services have a path='' parameter, meaning that the user can download the data to wherever they want on the local machine.

This issue proposes allowing s3 paths so that the data can be pushed to s3 for analysis.

@swatisingh45

Hey! Can I take up this issue?

@lewismc
Member Author

lewismc commented Aug 24, 2018

Hi @swatisingh45, yes please. The idea would be to add a new parameter to both def granule_subset(self, input_file_path, path='') and def extract_l4_granule(self, dataset_id='', path='') which indicates whether the data should be persisted locally or in s3.
The new function signatures would then look something like

extract_l4_granule(self, dataset_id='', store='local', path='')
...
granule_subset(self, input_file_path, store='local', path='')

By default the storage target would be 'local' disk; the possible options would be 'local' and 's3'.
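A rough sketch of how that dispatch might look is below; it assumes the granule has already been downloaded to local_file, and store_granule and its arguments are hypothetical names for illustration, not the final API.

import os

import boto


def store_granule(local_file, store='local', bucket_name=''):
    # Hypothetical helper sketching the proposed dispatch: either keep the
    # downloaded granule on local disk (current behaviour) or push it to s3.
    if store == 's3':
        conn = boto.connect_s3()  # credentials picked up from the environment
        bucket = conn.get_bucket(bucket_name)
        key = bucket.new_key(os.path.basename(local_file))
        key.set_contents_from_filename(local_file)
        return 's3://%s/%s' % (bucket_name, key.key)
    # store == 'local': nothing extra to do, the file already sits at path.
    return local_file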

When using s3 we should introduce a config.properties file which essentially contains key-value pairs representing the AWS configuration, e.g. the access key and secret key. This file could be read when the user creates an instance of Podaac().
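As a rough illustration, reading that configuration at instantiation time could look like the following; the INI-style [aws] section and key names are assumptions, not a settled format.

import configparser


def load_aws_config(path='config.properties'):
    # Assumed layout of config.properties:
    #   [aws]
    #   aws_access_key_id = YOUR_ACCESS_KEY
    #   aws_secret_access_key = YOUR_SECRET_KEY
    config = configparser.ConfigParser()
    config.read(path)
    return (config.get('aws', 'aws_access_key_id'),
            config.get('aws', 'aws_secret_access_key'))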

Regarding the code for uploading files to s3, you can base it on the following example:

import sys

import boto
from boto.s3.connection import Location
from boto.s3.key import Key

AWS_ACCESS_KEY_ID = ''
AWS_SECRET_ACCESS_KEY = ''

# Derive a bucket name from the access key so it is globally unique.
bucket_name = AWS_ACCESS_KEY_ID.lower() + '-dump'
conn = boto.connect_s3(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)

# Create the bucket in the default (US Standard) region.
bucket = conn.create_bucket(bucket_name, location=Location.DEFAULT)

testfile = "replace this with an actual filename"
print('Uploading %s to Amazon S3 bucket %s' % (testfile, bucket_name))


def percent_cb(complete, total):
    # Progress callback: print a dot for each chunk uploaded.
    sys.stdout.write('.')
    sys.stdout.flush()


# Upload the file, reporting progress roughly 10 times during the transfer.
k = Key(bucket)
k.key = 'my test file'
k.set_contents_from_filename(testfile, cb=percent_cb, num_cb=10)

Thank you for taking this issue on; if you have any issues then please let me know.

@lewismc
Member Author

lewismc commented Sep 25, 2018

@swatisingh45 are you working on this? If not then I will do it, thank you.

@lewismc lewismc changed the title Add functionality to download data to s3 Add functionality to push data products to s3 Mar 9, 2019
@lewismc lewismc modified the milestones: 2.2.0, 2.4.0 Aug 7, 2019
@lewismc lewismc modified the milestones: 2.4.0, 2.5.0 Aug 23, 2019
@lewismc lewismc changed the title Add functionality to push data products to s3 Add functionality to push data products to cloud storage Oct 13, 2019
@lewismc
Member Author

lewismc commented Oct 13, 2019

Using Apache Libcloud's Python Object Storage API might be a good idea here.
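For reference, a minimal sketch using Libcloud's storage driver; the bucket name, file path and credentials are placeholders, and other providers would only require swapping the Provider constant.

from libcloud.storage.types import Provider
from libcloud.storage.providers import get_driver

# Obtain an S3 storage driver; other backends (e.g. Google Cloud Storage,
# Azure Blobs) use the same interface with a different Provider constant.
cls = get_driver(Provider.S3)
driver = cls('AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY')

# Upload a local data product into an existing container (bucket).
container = driver.get_container(container_name='my-podaac-bucket')
driver.upload_object(file_path='/tmp/granule.nc',
                     container=container,
                     object_name='granule.nc')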
