
Add functionality to push data products to cloud storage #121

Open

lewismc opened this issue Jun 27, 2018 · 4 comments

@lewismc
Member

lewismc commented Jun 27, 2018

Some functions for the associated services have a path='' parameter, meaning that the user can download the data to wherever they want on the local machine.

This issue proposes allowing s3 paths so that the data can be pushed to s3 for analysis.

@swatisingh45

Hey! Can I take up this issue?

@lewismc
Member Author

lewismc commented Aug 24, 2018

Hi @swatisingh45, yes please. The idea would be to add a new parameter to both def granule_subset(self, input_file_path, path='') and def extract_l4_granule(self, dataset_id='', path='') which indicates whether the data should be persisted locally or in s3.
The new function signatures would then look something like

extract_l4_granule(self, dataset_id='', store='local', path='')
...
granule_subset(self, input_file_path, store='local', path='')

By default the storage target would be 'local' disk; the possible options would be 'local' and 's3'.
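A rough sketch of how that dispatch might look is below; it assumes the granule has already been downloaded to local_file, and store_granule and its arguments are hypothetical names for illustration, not the final API.

import os

import boto


def store_granule(local_file, store='local', bucket_name=''):
    # Hypothetical helper sketching the proposed dispatch: either keep the
    # downloaded granule on local disk (current behaviour) or push it to s3.
    if store == 's3':
        conn = boto.connect_s3()  # credentials picked up from the environment
        bucket = conn.get_bucket(bucket_name)
        key = bucket.new_key(os.path.basename(local_file))
        key.set_contents_from_filename(local_file)
        return 's3://%s/%s' % (bucket_name, key.key)
    # store == 'local': nothing extra to do, the file already sits at path.
    return local_file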

When using s3 we should introduce a config.properties file which essentially contains key-value pairs representing the AWS configuration, e.g. the access key and secret key. This file could be read when the user creates an instance of Podaac().
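As a rough illustration, reading that configuration at instantiation time could look like the following; the INI-style [aws] section and key names are assumptions, not a settled format.

import configparser


def load_aws_config(path='config.properties'):
    # Assumed layout of config.properties:
    #   [aws]
    #   aws_access_key_id = YOUR_ACCESS_KEY
    #   aws_secret_access_key = YOUR_SECRET_KEY
    config = configparser.ConfigParser()
    config.read(path)
    return (config.get('aws', 'aws_access_key_id'),
            config.get('aws', 'aws_secret_access_key'))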

Regarding the code for uploading files to s3, you can base it on the following example:

import sys

import boto
from boto.s3.connection import Location
from boto.s3.key import Key

AWS_ACCESS_KEY_ID = ''
AWS_SECRET_ACCESS_KEY = ''

# Derive a bucket name from the access key so it is globally unique.
bucket_name = AWS_ACCESS_KEY_ID.lower() + '-dump'
conn = boto.connect_s3(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)

# Create the bucket in the default (US Standard) region.
bucket = conn.create_bucket(bucket_name, location=Location.DEFAULT)

testfile = "replace this with an actual filename"
print('Uploading %s to Amazon S3 bucket %s' % (testfile, bucket_name))


def percent_cb(complete, total):
    # Progress callback: print a dot for each chunk uploaded.
    sys.stdout.write('.')
    sys.stdout.flush()


# Upload the file, reporting progress roughly 10 times during the transfer.
k = Key(bucket)
k.key = 'my test file'
k.set_contents_from_filename(testfile, cb=percent_cb, num_cb=10)

Thank you for taking this issue on; if you have any issues then please let me know.

@lewismc
Member Author

lewismc commented Sep 25, 2018

@swatisingh45 are you working on this? If not then I will do it, thank you.

@lewismc lewismc changed the title Add functionality to download data to s3 Add functionality to push data products to s3 Mar 9, 2019
@lewismc lewismc modified the milestones: 2.2.0, 2.4.0 Aug 7, 2019
@lewismc lewismc modified the milestones: 2.4.0, 2.5.0 Aug 23, 2019
@lewismc lewismc changed the title Add functionality to push data products to s3 Add functionality to push data products to cloud storage Oct 13, 2019
@lewismc
Member Author

lewismc commented Oct 13, 2019

Using Apache Libcloud's Python Object Storage API might be a good idea here.
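For reference, a minimal sketch using Libcloud's storage driver; the bucket name, file path and credentials are placeholders, and other providers would only require swapping the Provider constant.

from libcloud.storage.types import Provider
from libcloud.storage.providers import get_driver

# Obtain an S3 storage driver; other backends (e.g. Google Cloud Storage,
# Azure Blobs) use the same interface with a different Provider constant.
cls = get_driver(Provider.S3)
driver = cls('AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY')

# Upload a local data product into an existing container (bucket).
container = driver.get_container(container_name='my-podaac-bucket')
driver.upload_object(file_path='/tmp/granule.nc',
                     container=container,
                     object_name='granule.nc')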
