# boto copying from the AWS cloud

### copying GLODAP data files from Amazon public cloud object storage to the local workspace

This notebook copies files from AWS S3 buckets to `/home/jovyan/data`, i.e. the local Jupyter workspace.
This *should not* be done when direct access to S3 is possible, but we set that aesthetic concern aside here.
Copying from AWS S3 buckets *should* in fact be done using **boto3**, which is distinct from
its predecessor **boto**. However, I don't have time to make that change at the moment, so we proceed
with plain **boto** and accept the loss of two style points.


By the way, Stack Overflow says:

> The boto package is the hand-coded Python library that has been around 
since 2006. It is very popular and is fully supported by AWS but because 
it is hand-coded and there are so many services available (with more 
appearing all the time) it is difficult to maintain.
> 
> So, boto3 is a new version of the boto library based on botocore. All 
of the low-level interfaces to AWS are driven from JSON service descriptions 
that are generated automatically from the canonical descriptions of the services. 
So, the interfaces are always correct and always up to date. There is a 
resource layer on top of the client-layer that provides a nicer, more Pythonic interface.
>
>The boto3 library is being actively developed by AWS and \[is therefore recommended\].
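For comparison, here is a minimal sketch of what a boto3 version might look like. It has not been run against these buckets. It assumes boto3 and botocore are installed, that the bucket allows anonymous reads (as `anon=True` in the cells below implies), and that the wanted files share a key prefix such as `glodap/`. `make_anonymous_client` and `download_prefix` are hypothetical helper names, not part of boto3. Unsigned requests via `botocore.UNSIGNED` play the role of boto's `anon=True`, and the `list_objects_v2` paginator handles buckets holding more than 1000 keys.

```python
import os


def make_anonymous_client():
    """Create a boto3 S3 client that sends unsigned (anonymous) requests,
    roughly the boto3 counterpart of boto.connect_s3(anon=True)."""
    # Imported lazily so download_prefix can be exercised with a stand-in
    # client even where boto3 is not installed.
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config
    return boto3.client('s3', config=Config(signature_version=UNSIGNED))


def download_prefix(client, bucket, prefix, local_root):
    """Copy every object under `prefix` in `bucket` to local_root/<key>.

    Returns the list of local paths written.
    """
    written = []
    paginator = client.get_paginator('list_objects_v2')   # pages through >1000 keys
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get('Contents', []):
            key = obj['Key']
            if key.endswith('/'):
                continue                                    # skip "folder" placeholder keys
            local_path = os.path.join(local_root, key)
            os.makedirs(os.path.dirname(local_path), exist_ok=True)  # create subdirectories
            client.download_file(bucket, key, local_path)
            written.append(local_path)
    return written
```

For example, `download_prefix(make_anonymous_client(), 'himatdata', 'glodap/', '/home/jovyan/data/')` would write each object to its key path under `/home/jovyan/data/`, rather than to the renamed `glodap_*.nc` files used in the boto cell below.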

```python
# Don't run unless you want to grab the GLODAP files from S3.
# This may take a couple of minutes to run.
# The other key point here is that we should really be using boto3.

import os
import boto

data_dir = '/home/jovyan/data/glodap/'
os.makedirs(data_dir, exist_ok=True)   # create ~/data/glodap if it does not already exist

local_salinity_filename = os.path.join(data_dir, 'glodap_salinity.nc')
local_temperature_filename = os.path.join(data_dir, 'glodap_temperature.nc')
local_oxygen_filename = os.path.join(data_dir, 'glodap_oxygen.nc')

# Anonymous (unsigned) requests: no AWS credentials are needed to read this bucket
connection = boto.connect_s3(anon=True)
bucket = connection.get_bucket('himatdata')
for key in bucket.list():
    keyname = key.name            # already a string; no need to encode and re-cast it
    if 'glodap/' not in keyname:
        continue
    if 'salinity' in keyname:
        key.get_contents_to_filename(local_salinity_filename)
    elif 'temperature' in keyname:
        key.get_contents_to_filename(local_temperature_filename)
    elif 'oxygen' in keyname:
        key.get_contents_to_filename(local_oxygen_filename)
```
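The `if`/`elif` chain in the GLODAP cell routes each key to a local file by the variable name it contains. A table-driven sketch of the same selection logic follows; `GLODAP_FILES` and `glodap_local_name` are hypothetical names, and like the original it assumes each wanted key contains `glodap/` plus one of the three variable keywords. It touches no network, so the routing can be checked before anything is downloaded.

```python
import os

# Hypothetical lookup table: variable keyword in the S3 key -> local file name,
# mirroring the three if/elif branches of the download cell.
GLODAP_FILES = {
    'salinity': 'glodap_salinity.nc',
    'temperature': 'glodap_temperature.nc',
    'oxygen': 'glodap_oxygen.nc',
}


def glodap_local_name(keyname, data_dir='/home/jovyan/data/glodap/'):
    """Return the local path for a GLODAP key, or None if the key should be skipped."""
    if 'glodap/' not in keyname:
        return None
    for keyword, filename in GLODAP_FILES.items():   # dicts keep insertion order (3.7+)
        if keyword in keyname:
            return os.path.join(data_dir, filename)   # first match wins, like if/elif
    return None
```

Inside the download loop this would become `target = glodap_local_name(key.name)` followed by `key.get_contents_to_filename(target)` whenever `target` is not `None`; adding a fourth variable then means adding one dictionary entry.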

```python
# Don't run unless you want to grab a set of nine Argo profile NetCDF files
# (from nine Argo platforms, i.e. drifting profiling floats).
# This may take a couple of minutes to run.

import os
import boto

local_root = '/home/jovyan/data/'
connection = boto.connect_s3(anon=True)
bucket = connection.get_bucket('himatdata')
for key in bucket.list():
    keyname = key.name            # use the key name directly; no str(bytes) cast to strip
    if 'argo-profiles' not in keyname or keyname.endswith('/'):
        continue                  # skip other keys and "folder" placeholder keys
    local_filename = os.path.join(local_root, keyname)
    os.makedirs(os.path.dirname(local_filename), exist_ok=True)   # e.g. ~/data/argo-profiles/
    key.get_contents_to_filename(local_filename)
```
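The cast-and-strip idiom `str(key.name.encode('utf-8')).strip("b'").strip('b"').strip('"').strip("'")` is fragile under Python 3: `str()` of a bytes object yields text that merely *looks like* a bytes literal, and `str.strip` removes any run of the listed characters rather than a fixed prefix. The sketch below reproduces the idiom on a few hypothetical key names to show where it goes wrong, which is why using `key.name` directly is simpler.

```python
def old_style_name(name):
    """Reproduce the cast-and-strip of a key name."""
    keyname = str(name.encode('utf-8'))     # e.g. "b'argo-profiles/float_01.nc'"
    return keyname.strip("b'").strip('b"').strip('"').strip("'")


# Hypothetical key names, chosen to show where the cast-and-strip goes wrong:
examples = [
    'argo-profiles/float_01.nc',        # plain ASCII: survives the round trip
    'bio-argo-profiles/float_02.nc',    # leading 'b' is eaten: strip() removes a *set* of characters
    'argo-profiles/profil_é.nc',        # non-ASCII turns into literal \xc3\xa9 escape text
]
for name in examples:
    print(repr(name), '->', repr(old_style_name(name)))
```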


```python
# Don't run unless you want to grab a large (800 MB) tar file from the S3 bucket 'oceanhackweek'
# into the home directory. Once un-tarred, it unpacks into a 'data' directory containing
# several subdirectories and data files.
# This takes less than a minute to run.

import boto

local_tar_filename = '/home/jovyan/data.tar'
connection = boto.connect_s3(anon=True)
bucket = connection.get_bucket('oceanhackweek')
for key in bucket.list():
    keyname = key.name            # already a string; no encode/str round trip needed
    if 'data.tar' in keyname:
        key.get_contents_to_filename(local_tar_filename)
```
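The `oceanhackweek` cell only downloads `data.tar`; it does not unpack it. Below is a minimal sketch using the standard-library `tarfile` module. It assumes, per the comment in that cell, that the archive's top-level directory is `data/`, so extracting into `/home/jovyan/` produces `/home/jovyan/data/`. `untar` is a hypothetical helper name. Where the running Python provides tar extraction filters, the sketch passes `filter='data'`, which refuses members that would land outside the destination directory; on older versions it falls back to a plain `extractall`.

```python
import tarfile


def untar(tar_path, dest_dir):
    """Extract tar_path into dest_dir and return the archive's member names."""
    with tarfile.open(tar_path) as tar:            # default mode 'r' auto-detects compression
        names = tar.getnames()
        if hasattr(tarfile, 'data_filter'):        # Python versions with extraction filters
            tar.extractall(dest_dir, filter='data')  # refuse paths escaping dest_dir
        else:
            tar.extractall(dest_dir)
    return names
```

For example, `untar('/home/jovyan/data.tar', '/home/jovyan/')`; the 800 MB tar file can be deleted afterwards if disk space is tight.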