## Cloud Object Storage Utilities
This notebook walks through several examples of using the Python package (ibm_boto3) to manage Cloud Object Storage (COS).

This notebook is based on the code and utilities from this GitHub repository: https://github.com/biosopher/unofficial-watson-studio-python-utils

In [None]:
# Import required packages
import ibm_boto3
from ibm_botocore.client import Config

### COS Credentials
In the next cell, you need to specify the cloud object storage instance credentials.
For details on how to create the credentials for your Cloud Object Storage instance, check the following link:

https://github.com/biosopher/unofficial-watson-studio-python-utils/wiki/Save-COS-Credentials-to-cos_credentials.json

**Note:** Make sure you use the **{"HMAC":true}** parameter when creating the credentials, so that they include the `cos_hmac_keys` section shown below.

The COS credentials should look as follows:

```
{
  "apikey": "********************",
  "cos_hmac_keys": {
    "access_key_id": "*************************",
    "secret_access_key": "*************************"
  },
  "endpoints": "https://cos-service.bluemix.net/endpoints",
  "iam_apikey_description": "***********************",
  "iam_apikey_name": "*****************************",
  "iam_role_crn": "***************************",
  "iam_serviceid_crn": "****************************",
  "resource_instance_id": "********************************"
}
```

Additionally, you need to specify the service endpoint for your COS instance. To get that endpoint:
- Navigate to your COS instance
- Click on the Endpoint link in the left navigation column
- Copy the public endpoint corresponding to your COS location. For example, if your location is us-geo, select the public endpoint for us-geo.

The service endpoint would look as follows:

**service_endpoint = 'https://s3-api.us-geo.objectstorage.softlayer.net'**
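
Instead of pasting the credentials into a cell, they could also be read from a local file. The following is a minimal sketch, assuming the credentials were saved as `cos_credentials.json` (the file name used in the wiki page linked above). The helper name `load_cos_credentials` is hypothetical and not part of any library; it only checks that the HMAC keys are present.

```python
import json


def load_cos_credentials(path="cos_credentials.json"):
    """Load COS credentials from a JSON file and check that HMAC keys are present.

    Hypothetical helper for illustration; the default file name is an assumption.
    """
    with open(path) as f:
        credentials = json.load(f)

    # The HMAC keys only exist if the credentials were created with {"HMAC":true}
    hmac_keys = credentials.get("cos_hmac_keys", {})
    missing = [k for k in ("access_key_id", "secret_access_key") if not hmac_keys.get(k)]
    if missing:
        raise ValueError(
            "Missing HMAC keys: %s. Recreate the credentials with {\"HMAC\":true}."
            % ", ".join(missing)
        )
    return credentials
```

Failing early here gives a clearer message than a later authentication error from the client.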


In [None]:
cos_credentials = {
  "apikey": "**************",
  "cos_hmac_keys": {
    "access_key_id": "**************",
    "secret_access_key": "**************"
  },
  "endpoints": "https://cos-service.bluemix.net/endpoints",
  "iam_apikey_description": "**************",
  "iam_apikey_name": "**************",
  "iam_role_crn": "**************",
  "iam_serviceid_crn": "**************",
  "resource_instance_id": "**************"
}
service_endpoint = 'https://s3-api.us-geo.objectstorage.softlayer.net'

In [None]:
# The code was removed by DSX for sharing.

In [None]:
import boto3

In [None]:
cos_client = boto3.client('s3',
                          endpoint_url=service_endpoint,
                          aws_access_key_id=cos_credentials["cos_hmac_keys"]["access_key_id"],
                          aws_secret_access_key=cos_credentials["cos_hmac_keys"]["secret_access_key"])



### COS Utilities
In the next cell, we define multiple utilities that are useful when working with Cloud Object Storage.

- **get_all_buckets** returns all the buckets created in your COS instance.
- **get_objects_in_bucket** returns the objects in a specific bucket in your COS instance (a single `list_objects` call returns at most 1,000 objects).
- **create_unique_bucket** creates a new bucket in your COS instance, appending a random suffix to the given prefix.
- **upload_file_to_bucket** uploads a file from the local notebook environment to a bucket in your COS instance.
- **download_file_from_bucket** downloads a file from a bucket in your COS instance to the local notebook environment.
- **download_file_from_url** downloads a file from a given URL to the local notebook environment.
- **remove_files_from_dir** removes files from a local directory; mainly used to clean up files when no longer needed.
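
For buckets with many objects, one listing call may not return everything. The following is a minimal sketch of collecting all keys by following continuation tokens. It assumes the endpoint supports the S3 `list_objects_v2` call; the helper name `list_all_object_keys` is hypothetical and is not one of the utilities listed above.

```python
def list_all_object_keys(cos_client, bucket_name):
    """Collect every object key in a bucket, following continuation tokens.

    Hypothetical helper; assumes the client exposes list_objects_v2.
    """
    keys = []
    kwargs = {"Bucket": bucket_name}
    while True:
        response = cos_client.list_objects_v2(**kwargs)
        # 'Contents' is absent from the response when there are no objects
        for obj in response.get("Contents", []):
            keys.append(obj["Key"])
        # Stop once the service reports that no further pages remain
        if not response.get("IsTruncated"):
            break
        kwargs["ContinuationToken"] = response["NextContinuationToken"]
    return keys
```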

If the training data is provided via a URL, you can use download_file_from_url followed by upload_file_to_bucket to copy the data into your COS bucket.

If the training data is provided in another COS bucket, you can use download_file_from_bucket followed by upload_file_to_bucket to copy it into your own COS bucket. It may be better to use the data directly from the provided bucket rather than copying it to your own.
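
The bucket-to-bucket copy described above could be sketched as follows. This is an illustration under assumptions, not the repository's method: the helper name `copy_object_via_local_dir` is hypothetical, it takes two clients because the source bucket may need different credentials, and it calls the clients' `download_file` and `upload_file` methods directly.

```python
import os


def copy_object_via_local_dir(source_client, source_bucket,
                              target_client, target_bucket,
                              key, working_dir="temp_cos_files"):
    """Copy one object between buckets through a local temporary file.

    Hypothetical helper for illustration only.
    """
    os.makedirs(working_dir, exist_ok=True)
    file_name = os.path.basename(key)
    local_path = os.path.join(working_dir, file_name)

    # Download the object from the source bucket to the local file system
    source_client.download_file(source_bucket, key, local_path)
    try:
        # Upload the local copy under its base name, as upload_file_to_bucket does
        target_client.upload_file(local_path, target_bucket, file_name)
    finally:
        # Remove the temporary file whether or not the upload succeeded
        os.remove(local_path)
    return file_name
```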


In [None]:
# Load the required Python packages
import os
import random
import string
import urllib.request  # importing only 'urllib' does not guarantee that urllib.request is available

# Return all buckets in your COS instance
def get_all_buckets(cos_client):
    response = cos_client.list_buckets()
    allbuckets = []
    for bucket in response['Buckets']:
        allbuckets.append(bucket['Name'])
    return allbuckets

# Return all the objects in a COS bucket
def get_objects_in_bucket(cos_client,bucket_name):
    return cos_client.list_objects(Bucket=bucket_name)

# Create a unique COS bucket
def create_unique_bucket(cos_client, bucket_prefix):
    # Create a random 10-character string;
    # the random suffix makes it more likely that the bucket name is unique
    lst = [random.choice(string.ascii_letters + string.digits) for n in range(10)]
    random_string = "".join(lst).lower()
    bucket = "%s-%s" % (bucket_prefix, random_string)

    cos_client.create_bucket(Bucket=bucket)
    print("Bucket %s created" % bucket)
    # Return the generated name so callers can reuse it
    return bucket

# Upload objects to COS bucket
def upload_file_to_bucket(cos_client,file,bucket):
    file_name = os.path.basename(file)
    print("Uploading %s to bucket: %s" % (file_name,bucket))
    cos_client.upload_file(file, bucket, file_name)

# Download objects from COS bucket
def download_file_from_bucket(cos_client, bucket, file_to_download, save_path, is_redownload=False):
    # Skip the download if the file already exists, unless a re-download is requested
    if not os.path.exists(save_path) or is_redownload:
        print("Downloading %s" % file_to_download)
        try:
            # The 'with' block closes the file, so no explicit close() is needed
            with open(save_path, 'wb') as file:
                cos_client.download_fileobj(bucket, file_to_download, file)
        except Exception as e:
            # Client errors raised by the SDK carry a 'response' attribute with details
            response = getattr(e, 'response', None)
            if response is not None:
                print("Detailed error: ", response)
            print('An error occurred downloading %s from %s' % (file_to_download, bucket))
            # Remove the partially written file so a later call retries the download
            if os.path.exists(save_path):
                os.remove(save_path)

# Download objects from a URL 
def download_file_from_url(file_url,save_directory=None):
    # Use the provided save directory, or fall back to a default working directory
    working_directory = "temp_cos_files"
    if save_directory is not None:
        working_directory = save_directory
    os.makedirs(working_directory, exist_ok=True)

    file_name = os.path.basename(file_url)
    # Delete file if present as perhaps download failed and file corrupted
    file_path = os.path.join(working_directory, file_name)
    if os.path.exists(file_path):
        os.remove(file_path)

    file_path, _ = urllib.request.urlretrieve(file_url, file_path)
    stat_info = os.stat(file_path)
    print('Downloaded', file_path, stat_info.st_size, 'bytes.')
    
    
# Remove files from the specified directory in the local environment
def remove_files_from_dir(directory):
    for f in os.listdir(directory):
        file_path = os.path.join(directory, f)
        # Only remove regular files; os.remove fails on subdirectories
        if os.path.isfile(file_path):
            os.remove(file_path)


### COS Tests
In the following cells, we run some tests to make sure we can download and upload files to COS buckets as well as list contents of a bucket and create new buckets.
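
One further check that could be useful is an upload and download round trip that compares the file contents. The sketch below is an illustration only: the helper name `check_round_trip` is hypothetical, and it calls the client's `upload_file` and `download_file` methods directly rather than the utilities defined above.

```python
import filecmp
import os


def check_round_trip(cos_client, bucket, local_path, download_path):
    """Upload a local file, download it again, and compare the two copies.

    Hypothetical helper; returns True when the downloaded bytes match the original.
    """
    key = os.path.basename(local_path)
    cos_client.upload_file(local_path, bucket, key)
    cos_client.download_file(bucket, key, download_path)
    # shallow=False forces a byte-by-byte comparison instead of comparing os.stat data
    return filecmp.cmp(local_path, download_path, shallow=False)
```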

In [None]:
# List all buckets in your COS instance
buckets = get_all_buckets(cos_client)
print(buckets)

In [None]:
# Create a new bucket in your COS instance
bucketName = 'mnist-training-results'
create_unique_bucket(cos_client,bucketName)

In [None]:
# List all the objects in the specific bucket
bucketName = 'mnist-training-data-hxjh7iohms'
objects = get_objects_in_bucket(cos_client,bucketName)
# 'Contents' is missing from the response when the bucket is empty
contents = objects.get('Contents', [])
for c in contents:
    print('file: %s ' % c['Key'])
#print(objects)

In [None]:
# Download training data from the following URLs and upload to COS bucket
data_links = ['http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz',
              'http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz',
              'http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz',
              'http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz']

bucketName = 'mnist-training-data-hxjh7iohms'
working_dir = "mnist_files"
for file_url in data_links:
    file_name = os.path.basename(file_url)
    print("file url: %s " % file_url)
    print("file name: %s " % file_name)
    download_file_from_url(file_url,working_dir)
    file_path = os.path.join(working_dir, file_name)
    upload_file_to_bucket(cos_client,file_path,bucketName)
remove_files_from_dir(working_dir)    

In [None]:
# Print the files left in the local working directory (empty after the cleanup above)
files = os.listdir(working_dir)
print(files)