# Bucket creation and notification configuration

In [11]:
import os

# Since every user gets their own namespace, we'll be using the same base name
# for all the buckets. If you're using shared infrastructure, pick a unique
# value for this.
bucket_base_name = 'images'

# Our access key ID and secret access key were set as environment variables
# earlier in the lab.
aws_access_key_id = os.getenv('AWS_ACCESS_KEY_ID')
aws_secret_access_key = os.getenv('AWS_SECRET_ACCESS_KEY')

print("Access Key Id: %s"% (aws_access_key_id,))
print("Access Key: %s"% (aws_secret_access_key,))

# This value was defined when we created our Data Hub instance. Environment
# variables can be predefined as a part of the KFDef.
endpoint_url = os.getenv('S3_ENDPOINT_URL')

print("S3 Endpoint: %s"% (endpoint_url,))

Access Key Id: W92D9P7YA2ZGDW41M0D5
Access Key: 1P8fhzRq6WKkiPnC3hbPEv0lsPNICJTIsWv1VDjX
S3 Endpoint: http://ceph-nano-0
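
The cell above prints the secret access key in full, and that output is saved with the notebook. If you share the notebook, you may prefer to show only the last few characters. Here is a minimal sketch of that idea; `mask_secret` is a hypothetical helper name, not part of the lab.

```python
import os

def mask_secret(value, visible=4):
    """Return value with all but the last `visible` characters replaced by '*'."""
    if not value:
        # The environment variable was not set (or is empty).
        return "<not set>"
    if len(value) <= visible:
        # Too short to reveal anything safely: mask every character.
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]

# Same print as in the lab, but with the secret masked.
print("Access Key: %s" % (mask_secret(os.getenv('AWS_SECRET_ACCESS_KEY')),))
```

The last characters are usually enough to confirm that the expected key was picked up.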


## Imports
Of course we'll need some libraries, so import them by running the cell.

In [12]:
import boto3
import json
import botocore
import argparse

## S3 connections
boto3 is the AWS SDK for Python, the standard library for interacting with AWS services. Because Ceph exposes an S3-compatible API, we can use boto3 directly to talk to our storage. So first, let's create the client (you can see we are using some of the parameters we defined earlier).

In [13]:
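# The second positional argument is the region name, left empty here.
# signature_version='s3' selects the legacy S3 signing scheme (signature v2).
s3 = boto3.client('s3', '',
                endpoint_url = endpoint_url,
                aws_access_key_id = aws_access_key_id,
                aws_secret_access_key = aws_secret_access_key,
                config=botocore.client.Config(signature_version = 's3'))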


## Create buckets
Now that we can connect to the storage, we can create our buckets. Run the first cell, which defines a "creation function" (an S3 API call using the client we created). Then run the second cell, which creates the 3 buckets we will need.

In [14]:
def create_bucket(bucket_name):
    result = s3.create_bucket(Bucket=bucket_name)
    return result

In [15]:
create_bucket(bucket_base_name)
create_bucket(bucket_base_name+'-processed')
create_bucket(bucket_base_name+'-anonymized')

{'ResponseMetadata': {'RequestId': 'tx00000000000000000000b-006090222a-1010-default',
  'HostId': '',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amz-request-id': 'tx00000000000000000000b-006090222a-1010-default',
   'content-length': '0',
   'date': 'Mon, 03 May 2021 16:17:46 GMT',
   'connection': 'Keep-Alive'},
  'RetryAttempts': 0}}
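
If you re-run the notebook, you may want the creation step to skip buckets that already exist. Here is a minimal sketch of that idea; `ensure_bucket` is a hypothetical helper name, and it only relies on the `list_buckets` and `create_bucket` calls already used in this notebook.

```python
def ensure_bucket(client, bucket_name):
    """Create bucket_name unless the client already lists it.

    Returns True if the bucket was created, False if it was already there.
    """
    existing = {bucket['Name'] for bucket in client.list_buckets()['Buckets']}
    if bucket_name in existing:
        return False
    client.create_bucket(Bucket=bucket_name)
    return True
```

With the client from above, the call would look like `ensure_bucket(s3, bucket_base_name + '-processed')`. Note that `list_buckets` only returns buckets owned by the current user, so on shared infrastructure a name taken by someone else would still make `create_bucket` fail.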

### Verification
The previous output may look cryptic (and it's always good to check anyway), so let's list all the buckets and verify that they have indeed been created.

In [16]:
for bucket in s3.list_buckets()['Buckets']:
    print(bucket['Name'])

images
images-anonymized
images-processed
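
Instead of checking the list by eye, the same verification can be done in code. Here is a minimal sketch, assuming the three bucket names used in this lab; `missing_buckets` is a hypothetical helper name.

```python
def missing_buckets(client, expected_names):
    """Return the expected bucket names that the client does not list, sorted."""
    present = {bucket['Name'] for bucket in client.list_buckets()['Buckets']}
    return sorted(set(expected_names) - present)
```

With the client from above, `missing_buckets(s3, [bucket_base_name, bucket_base_name + '-processed', bucket_base_name + '-anonymized'])` should return an empty list once all three buckets exist.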


## Make buckets public read
Our Grafana dashboard will display the last image from each bucket. Instead of setting up a dedicated web server, we can query our object store directly to retrieve the images. For this to work, we have to make our buckets "public-readable". This is done by applying the following bucket policy to each of them.

In [17]:
for bucket in s3.list_buckets()['Buckets']:
    bucket_policy = {
                      "Version":"2012-10-17",
                      "Statement":[
                        {
                          "Sid":"AddPerm",
                          "Effect":"Allow",
                          "Principal": "*",
                          "Action":["s3:GetObject"],
                          "Resource":["arn:aws:s3:::{0}/*".format(bucket['Name'])]
                        }
                      ]
                    }
    bucket_policy = json.dumps(bucket_policy)
    s3.put_bucket_policy(Bucket=bucket['Name'], Policy=bucket_policy)
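
To confirm that a policy grants what we expect, we can parse a policy document and look for the public-read statement. Here is a minimal sketch of such a check; `allows_public_read` is a hypothetical helper name. It only matches the exact action and resource strings used above, so wildcard actions such as `s3:*` are not recognized.

```python
import json

def _as_list(value):
    """Policy fields may hold a single value or a list; always return a list."""
    return value if isinstance(value, list) else [value]

def allows_public_read(policy_json, bucket_name):
    """Return True if the policy has an Allow statement granting s3:GetObject
    on all objects of bucket_name to any principal ("*")."""
    resource = "arn:aws:s3:::{0}/*".format(bucket_name)
    policy = json.loads(policy_json)
    for statement in _as_list(policy.get("Statement", [])):
        principal = statement.get("Principal")
        # The principal can be written as "*" or as {"AWS": "*"} / {"AWS": ["*"]}.
        if isinstance(principal, dict):
            principals = _as_list(principal.get("AWS", []))
        else:
            principals = [principal]
        if (statement.get("Effect") == "Allow"
                and "*" in principals
                and "s3:GetObject" in _as_list(statement.get("Action", []))
                and resource in _as_list(statement.get("Resource", []))):
            return True
    return False
```

Assuming the storage backend supports reading policies back, the document can be fetched with boto3's `get_bucket_policy`, for example `allows_public_read(s3.get_bucket_policy(Bucket='images')['Policy'], 'images')`.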