### Overview

This demo walks through connecting to Ceph from an EPIC cluster using Python (not Spark).

### Setup

- On your client machine, run the script `./scripts/end_user_scripts/ceph/1_demo_server_setup.sh` to set up a Ceph Nano server on the RDP Server host
- Add the EPIC Spark 2.4 image
- Configure EPIC with Active Directory [see README](https://github.com/bluedata-community/bluedata-demo-env-aws-terraform/blob/master/docs/README-AD.md)
- Set up the Demo Tenant with Active Directory [see README](https://github.com/bluedata-community/bluedata-demo-env-aws-terraform/blob/master/docs/README-AD.md)
- Provision a Spark cluster in the Demo Tenant with:
  - 1 x Spark Controller (small)
  - 1 x Jupyter Hub (small)

### Connect

- Verify that the Ceph instance responds. The `curl` call in the cell below should return something like:

```
<?xml version="1.0" encoding="UTF-8"?><ListAllMyBucketsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>anonymous</ID><DisplayName></DisplayName></Owner><Buckets></Buckets></ListAllMyBucketsResult>
```

In [1]:
! curl 10.1.0.216:8080 # Change to the private IP of RDP server
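
The same check can be made from Python by parsing the XML that the gateway returns. The following is a minimal sketch using only the standard library; the helper name `bucket_names` is hypothetical, and the sample string is the unauthenticated response shown above, which lists no buckets.

```python
import xml.etree.ElementTree as ET

# Namespace used by S3-style ListAllMyBucketsResult documents
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def bucket_names(xml_text):
    """Return the bucket names found in a ListAllMyBucketsResult document."""
    if isinstance(xml_text, str):
        xml_text = xml_text.encode("utf-8")  # parse as bytes so the XML declaration is honoured
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall("s3:Buckets/s3:Bucket/s3:Name", S3_NS)]


# Sample response copied from the curl output above
sample = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<ListAllMyBucketsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">'
    '<Owner><ID>anonymous</ID><DisplayName></DisplayName></Owner>'
    '<Buckets></Buckets></ListAllMyBucketsResult>'
)
print(bucket_names(sample))  # prints []
```

An empty list here only confirms that the gateway answered; the anonymous request is not expected to show the sandbox bucket.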

- Install the boto library

In [2]:
! pip install --user boto # restart the kernel afterwards so it picks up the boto library



- Set up the connection

In [3]:
import boto
import boto.s3.connection

access_key = 'sandboxAccessKey'
secret_key = 'sandboxSecretKey'
host = '10.1.0.216'  # Change to the private IP of the RDP server

# Connect over plain HTTP on port 8080. OrdinaryCallingFormat uses
# path-style URLs, so no per-bucket DNS names are required.
conn = boto.connect_s3(
    aws_access_key_id=access_key,
    aws_secret_access_key=secret_key,
    host=host,
    port=8080,
    is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)

- List the current buckets

In [4]:
for bucket in conn.get_all_buckets():
    print("{name}\t{created}".format(
        name=bucket.name,
        created=bucket.creation_date,
    ))

sandboxbucket	2020-04-23T21:38:01.127Z


- Create a new bucket

In [5]:
bucket = conn.create_bucket('my-new-bucket')
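
If this cell is re-run, the bucket may already exist. A minimal sketch of a helper that creates the bucket only when it is missing is shown below; `ensure_bucket` is a hypothetical name, and it relies on boto's `lookup` call, which returns `None` when the bucket cannot be found.

```python
def ensure_bucket(conn, name):
    """Return the bucket called `name`, creating it only if it is missing."""
    bucket = conn.lookup(name)  # None when the bucket is not found
    if bucket is None:
        bucket = conn.create_bucket(name)
    return bucket


# Example use in the notebook, with `conn` from the connection cell:
#   bucket = ensure_bucket(conn, 'my-new-bucket')
```

How the gateway reacts to creating an existing bucket was not checked here, so the lookup is simply a way to avoid depending on that behaviour.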

- List the buckets again to confirm that the new bucket was created

In [6]:
for bucket in conn.get_all_buckets():
    print("{name}\t{created}".format(
        name=bucket.name,
        created=bucket.creation_date,
    ))

my-new-bucket	2020-04-24T00:38:31.464Z
sandboxbucket	2020-04-23T21:38:01.127Z


- Delete the new bucket

In [7]:
conn.delete_bucket('my-new-bucket')
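
The delete call above works because the bucket is empty; S3-compatible stores refuse to delete a bucket that still holds objects. As a rough sketch, assuming an unversioned bucket, a hypothetical helper can remove the objects first and then the bucket:

```python
def delete_bucket_and_contents(conn, name):
    """Delete every object in the bucket, then the bucket itself."""
    bucket = conn.get_bucket(name)
    for key in bucket.list():  # iterates over all keys in the bucket
        key.delete()
    conn.delete_bucket(name)


# Example use in the notebook, with `conn` from the connection cell:
#   delete_bucket_and_contents(conn, 'my-new-bucket')
```

This is destructive, so it is best kept to throwaway demo buckets.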

### Lab Exercise

- Upload a dataset to Ceph using the `s3cmd` client installed on the RDP Server
- Retrieve the dataset with boto
- Read the dataset into Pandas
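
As a starting point for the last two steps, here is a minimal sketch rather than a full solution. It assumes the dataset is a CSV file; the helper names and the object name `dataset.csv` are hypothetical, and pandas must be installed in the notebook environment for the second helper to work.

```python
import io


def fetch_object_bytes(bucket, key_name):
    """Download one object from a boto bucket and return its raw bytes."""
    key = bucket.get_key(key_name)  # None if the object does not exist
    if key is None:
        raise KeyError("no object named %r in the bucket" % key_name)
    return key.get_contents_as_string()


def bytes_to_dataframe(data):
    """Parse CSV bytes into a pandas DataFrame (assumes CSV input)."""
    import pandas as pd  # imported lazily so the download helper works without pandas
    return pd.read_csv(io.BytesIO(data))


# Example use in the notebook, with `conn` from the connection cell and a
# hypothetical object name:
#   bucket = conn.get_bucket('sandboxbucket')
#   df = bytes_to_dataframe(fetch_object_bytes(bucket, 'dataset.csv'))
#   df.head()
```

Reading the object fully into memory keeps the sketch short; a large dataset would probably be better downloaded to a file first.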