# Utilization of Object Storage in SoftLayer

**Kenneth Chen**  

Using the object storage API in Jupyter Notebook (Python 2.7.5)  
https://github.com/softlayer/softlayer-object-storage-python

In [1]:
import object_storage
import os
import time

In [2]:
# Connect to storage account
sl_storage = object_storage.get_client('SLOS1729689-2:USER_ID', 
                                       'API_KEY', 
                                       datacenter='dal05')

In [5]:
# create a container 'yelp'
sl_storage['yelp'].create()

Container(yelp)

In [3]:
# Listing all the containers
sl_storage.containers()

[Container(mybucket), Container(mybucket2), Container(week7)]

In [4]:
# Checking the object storage properties
sl_storage.properties

{'container_count': 3,
 'meta': {'cdn-id': '99655', 'nas-id': '53336359'},
 'object_count': 4,
 'path': '',
 'size': 5228838023,
 'url': u'https://dal05.objectstorage.softlayer.net/v1/AUTH_b7619532-8c35-4938-bf47-773773206815'}

In [13]:
# Uploading the yelp data (>3GB) as object 'yelp' in the 'week7' container

yelp_data = os.path.expanduser("/root/yelp_dataset.tar")
yelp_data_size = 3130         # data size in 3130 MB

upload_start = time.time()
fileUpload = sl_storage['week7']['yelp'].load_from_filename(yelp_data)

upload_end = time.time()

uploadtime = upload_end - upload_start
print("Total upload time is {:.4f} seconds".format(uploadtime))
print("Upload speed is {:.4f} MB/s".format(yelp_data_size/uploadtime))

Total upload time is 266.2504 seconds
Upload speed is 11.7559 MB/s


In [13]:
# Uploading bigrams data (>1.6GB) 

bigram_data = os.path.expanduser("/root/googlebooks-eng-all-2gram-20090715-0.csv")
bigram_data_size = 1675         # data size in 1675 MB

upload_start = time.time()
fileUpload = sl_storage['week7']['bigrams'].load_from_filename(bigram_data)

upload_end = time.time()

uploadtime = upload_end - upload_start
print("Total upload time is {:.4f} seconds".format(uploadtime))
print("Upload speed is {:.4f} MB/s".format(bigram_data_size/uploadtime))

Total upload time is 143.9054 seconds
Upload speed is 11.6396 MB/s


In [16]:
crime_data = os.path.expanduser("/root/crimes2010.csv")
crime_data_size = 426         # data size in 426 MB

upload_start = time.time()
fileUpload = sl_storage['week7']['crimes'].load_from_filename(crime_data)

upload_end = time.time()

uploadtime = upload_end - upload_start
print("Total upload time is {:.4f} seconds".format(uploadtime))
print("Upload speed is {:.4f} MB/s".format(crime_data_size/uploadtime))

Total upload time is 40.4759 seconds
Upload speed is 10.5248 MB/s
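
The three upload cells repeat the same timing logic with a hard-coded size. A minimal sketch of a reusable helper follows; `timed_transfer` is a hypothetical name, not part of the `object_storage` library, and it reads the size from disk rather than from a constant.

```python
import os
import time


def timed_transfer(transfer, path):
    """Run transfer(path); return (elapsed seconds, speed in MB/s).

    The size is read from disk in decimal megabytes (bytes / 1e6),
    the same unit the cells above use.
    """
    size_mb = os.path.getsize(path) / 1e6
    start = time.time()
    transfer(path)
    # Guard against a zero reading on very fast transfers.
    elapsed = max(time.time() - start, 1e-9)
    return elapsed, size_mb / elapsed


# Usage in the notebook (not executed here, needs a live account):
# elapsed, speed = timed_transfer(
#     sl_storage['week7']['crimes'].load_from_filename,
#     "/root/crimes2010.csv")
# print("Upload speed is {:.4f} MB/s".format(speed))
```

Passing the bound method `load_from_filename` keeps the helper independent of which container or object is being written.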


In [1]:
# Downloading the yelp data 

"""Since my yelp data was larger than 3GB, the MemoryError pops up. """ 

# yelp_data_size = 3130         # data size in 3130 MB
# download_start = time.time()

# sl_storage['week7']['yelp'].read()

# download_end = time.time()

# downloadtime = download_end - download_start
# print("Total download time is {:.4f} seconds".format(downloadtime))
# print("Download speed is {:.4f} MB/s".format(yelp_data_size/downloadtime))

'Since my yelp data was larger than 3GB, the MemoryError pops up. '
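
The `read()` call returns the whole object in one piece, so a 3 GB object has to fit in memory, which is consistent with the MemoryError. A common workaround is to copy the data to disk in fixed-size chunks. The sketch below assumes the download is available as a file-like object with a `read(size)` method, for example a streamed HTTP response. Whether the `object_storage` client exposes such a stream is not shown in this notebook, so treat `stream_to_file` as a hypothetical helper.

```python
import io

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary choice for this sketch


def stream_to_file(source, dest_path, chunk_size=CHUNK_SIZE):
    """Copy a file-like `source` to dest_path chunk by chunk.

    Only one chunk is held in memory at a time. Returns the number
    of bytes written.
    """
    total = 0
    with open(dest_path, "wb") as out:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:          # empty result means end of stream
                break
            out.write(chunk)
            total += len(chunk)
    return total
```

With this pattern, memory use is bounded by `chunk_size` rather than by the object size.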

In [14]:
# Downloading the bigrams data 

bigram_data_size = 1675       # data size in 1675 MB
download_start = time.time()

sl_storage['week7']['bigrams'].read()

download_end = time.time()

downloadtime = download_end - download_start
print("Total download time is {:.4f} seconds".format(downloadtime))
print("Download speed is {:.4f} MB/s".format(bigram_data_size/downloadtime))

Total download time is 61.6082 seconds
Download speed is 27.1879 MB/s


In [17]:
# Downloading the crime data

crime_data_size = 426       # data size in 426 MB
download_start = time.time()

sl_storage['week7']['crimes'].read()

download_end = time.time()

downloadtime = download_end - download_start
print("Total download time is {:.4f} seconds".format(downloadtime))
print("Download speed is {:.4f} MB/s".format(crime_data_size/downloadtime))

Total download time is 19.5927 seconds
Download speed is 21.7428 MB/s


In [5]:
# list the objects in the 'week7' container
sl_storage['week7'].objects()

[StorageObject(week7, bigrams, 1675035751B),
 StorageObject(week7, crimes, 426352513B),
 StorageObject(week7, yelp, 3127449759B)]

In [6]:
# delete the 'yelp' container
sl_storage['yelp'].delete()

True

## Analysis of upload and download speed using Python  

### Upload result
```
| Dataset           | Total Upload Time | Upload Speed |  
|-------------------|-------------------|--------------|  
|yelp (3.13GB)      | 266.2504s         | 11.7559 MB/s |  
|bigrams (1.6 GB)   | 143.9054s         | 11.6396 MB/s |
|crimes2010 (0.5 GB)| 40.4759s          | 10.5248 MB/s |
|-------------------|-------------------|--------------|
|Average            | NA                | 11.3068 MB/s |
```

### Download result
```
| Dataset           | Total Download Time | Download Speed |  
|-------------------|---------------------|----------------|  
|yelp (3.13GB)      | MemoryError         | MemoryError    |  
|bigrams (1.6 GB)   | 61.6082s            | 27.1879 MB/s   |
|crimes2010 (0.5GB) | 19.5927s            | 21.7428 MB/s   |
|-------------------|---------------------|----------------|
|Average            | NA                  | 24.4654 MB/s   |
```

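Overall, upload and download speeds average `11.3068 MB/s` and `24.4654 MB/s` respectively. On average, download is about `2.2` times as fast as upload. The download average covers only the bigrams and crimes2010 files, because the yelp download failed with a MemoryError. 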
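
The averages and the download-to-upload ratio can be recomputed from the per-file speeds in the tables above. This is a minimal sketch of that arithmetic; the dictionary names are chosen here for illustration.

```python
# Per-file speeds in MB/s, copied from the tables above.
upload_mb_s = {"yelp": 11.7559, "bigrams": 11.6396, "crimes2010": 10.5248}
# The yelp download failed with a MemoryError, so it has no entry.
download_mb_s = {"bigrams": 27.1879, "crimes2010": 21.7428}


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / float(len(values))


avg_up = mean(upload_mb_s.values())
avg_down = mean(download_mb_s.values())
ratio = avg_down / avg_up

print("Average upload:   {:.4f} MB/s".format(avg_up))
print("Average download: {:.4f} MB/s".format(avg_down))
print("Download/upload ratio: {:.2f}".format(ratio))
```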

# Managing Object Storage by REST API

In [16]:
# To get the X-Auth-Token and X-Storage-Url

! curl -i -H "X-Auth-User: SLOS1729689-2:SL1729689" \
-H "X-Auth-Key: f81bdfabd3456bb71d87433cf36bf6826b238cd8c3213049cc5d90689bfde55d" \
https://dal05.objectstorage.softlayer.net/auth/v1.0

HTTP/1.1 200 OK
Content-Length: 1536
X-Auth-Token-Expires: 46895
X-Auth-Token: AUTH_tk5d970f9d098a4df6aeafab2fffdc8cd7
X-Storage-Token: AUTH_tk5d970f9d098a4df6aeafab2fffdc8cd7
X-Storage-Url: https://dal05.objectstorage.softlayer.net/v1/AUTH_b7619532-8c35-4938-bf47-773773206815
Content-Type: application/json
X-Trans-Id: txb38019d5e2b149bca31e6-005bc73950
Date: Wed, 17 Oct 2018 13:29:52 GMT

{"clusters": {"lon02": "https://lon02.objectstorage.softlayer.net/auth/v1.0", "syd01": "https://syd01.objectstorage.softlayer.net/auth/v1.0", "mon01": "https://mon01.objectstorage.softlayer.net/auth/v1.0", "dal05": "https://dal05.objectstorage.softlayer.net/auth/v1.0", "ams01": "https://ams01.objectstorage.softlayer.net/auth/v1.0", "osl01": "https://osl01.objectstorage.softlayer.net/auth/v1.0", "tor01": "https://tor01.objectstorage.softlayer.net/auth/v1.0", "hkg02": "https://hkg02.objectstorage.softlayer.net/auth/v1.0", "mex01": "https://mex01.objectstorage.softlayer.net/auth/v1.0

In [21]:
# To get the list of containers

! curl -i -H "X-Auth-Token: AUTH_tk5d970f9d098a4df6aeafab2fffdc8cd7" \
https://dal05.objectstorage.softlayer.net/v1/AUTH_b7619532-8c35-4938-bf47-773773206815

HTTP/1.1 200 OK
Content-Length: 30
X-Account-Meta-Nas-Id: 53336359
X-Account-Object-Count: 4
X-Account-Storage-Policy-Standard-Container-Count: 4
X-Timestamp: 1539653747.99715
X-Account-Meta-Cdn-Id: 99655
X-Account-Storage-Policy-Standard-Object-Count: 4
X-Account-Bytes-Used: 5228838023
X-Account-Container-Count: 4
Content-Type: text/plain; charset=utf-8
Accept-Ranges: bytes
X-Account-Storage-Policy-Standard-Bytes-Used: 5228838023
X-Trans-Id: tx54cb7e68f12f47fdbb691-005bc71f81
Date: Wed, 17 Oct 2018 11:39:45 GMT

mybucket
mybucket2
week7
yelp


In [12]:
# To check the content of the container `week7` by appending the container name at the end of the X-Storage-Url path

! curl -i -H "X-Auth-Token: AUTH_tk5d970f9d098a4df6aeafab2fffdc8cd7" \
https://dal05.objectstorage.softlayer.net/v1/AUTH_b7619532-8c35-4938-bf47-773773206815/week7

HTTP/1.1 200 OK
Content-Length: 20
X-Container-Object-Count: 3
Accept-Ranges: bytes
X-Storage-Policy: standard
X-Container-Bytes-Used: 5228838023
X-Timestamp: 1539691271.46881
Content-Type: text/plain; charset=utf-8
X-Trans-Id: tx83f1b73b808a490593740-005bc736e9
Date: Wed, 17 Oct 2018 13:19:37 GMT

bigrams
crimes
yelp
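
Every request after the auth call reuses two values from its response: `X-Auth-Token` goes into a request header, and `X-Storage-Url` is the base to which container and object names are appended. The sketch below parses those headers from a raw response block and builds the URLs. `parse_headers` and `container_url` are hypothetical helpers written for this illustration, and the sample text is a shortened copy of the auth response shown earlier.

```python
RAW_AUTH_HEADERS = """HTTP/1.1 200 OK
X-Auth-Token: AUTH_tk5d970f9d098a4df6aeafab2fffdc8cd7
X-Storage-Url: https://dal05.objectstorage.softlayer.net/v1/AUTH_b7619532-8c35-4938-bf47-773773206815
"""


def parse_headers(raw):
    """Parse 'Name: value' lines into a dict keyed by lower-cased name."""
    headers = {}
    for line in raw.splitlines():
        # Split on the first colon only, so URLs in the value stay intact.
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


def container_url(storage_url, container, obj=None):
    """Append a container name, and optionally an object name, to the storage URL."""
    parts = [storage_url.rstrip("/"), container]
    if obj is not None:
        parts.append(obj)
    return "/".join(parts)


headers = parse_headers(RAW_AUTH_HEADERS)
token = headers["x-auth-token"]
week7_url = container_url(headers["x-storage-url"], "week7")
yelp_url = container_url(headers["x-storage-url"], "week7", "yelp")
```

The status line has no `Name: value` shape and is skipped by the `if sep` check.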


In [7]:
# upload the yelp data by REST API

upload_start = time.time()
yelp_data_size = 3130
    
! curl -i -XPUT -H "X-Auth-Token: AUTH_tk5d970f9d098a4df6aeafab2fffdc8cd7" \
-T yelp_dataset.tar https://dal05.objectstorage.softlayer.net/v1/AUTH_b7619532-8c35-4938-bf47-773773206815/week7/yelp

upload_end = time.time()

uploadtime = upload_end - upload_start
print("Total upload time by REST API for the yelp data is {:.4f} seconds".format(uploadtime))
print("Upload speed by REST API for the yelp data is {:.4f} MB/s".format(yelp_data_size/uploadtime))

HTTP/1.1 100 Continue

HTTP/1.1 201 Created
Last-Modified: Wed, 17 Oct 2018 12:23:31 GMT
Content-Length: 0
Etag: 47dc2b68fca1fe30360e512bdaed2a1d
Content-Type: text/html; charset=UTF-8
X-Trans-Id: tx655f52cc8c804b1eb5b40-005bc729c2
Date: Wed, 17 Oct 2018 12:27:44 GMT

Total upload time by REST API for the yelp data is 254.6764 seconds
Upload speed by REST API for the yelp data is 12.2901 MB/s


In [19]:
# Checking the content of the container 

! curl -i -H "X-Auth-Token: AUTH_tk5d970f9d098a4df6aeafab2fffdc8cd7" \
https://dal05.objectstorage.softlayer.net/v1/AUTH_b7619532-8c35-4938-bf47-773773206815/week7/

HTTP/1.1 200 OK
Content-Length: 20
X-Container-Object-Count: 3
Accept-Ranges: bytes
X-Storage-Policy: standard
X-Container-Bytes-Used: 5228838023
X-Timestamp: 1539691271.46881
Content-Type: text/plain; charset=utf-8
X-Trans-Id: tx05f938f2e8ef484e8ab00-005bc739bb
Date: Wed, 17 Oct 2018 13:31:39 GMT

bigrams
crimes
yelp


In [7]:
# upload the bigram data by REST API

upload_start = time.time()
bigram_data_size = 1675
    
! curl -i -XPUT -H "X-Auth-Token: AUTH_tk5d970f9d098a4df6aeafab2fffdc8cd7" \
-T googlebooks-eng-all-2gram-20090715-0.csv https://dal05.objectstorage.softlayer.net/v1/AUTH_b7619532-8c35-4938-bf47-773773206815/week7/bigrams

upload_end = time.time()

uploadtime = upload_end - upload_start
print("Total upload time by REST API for the bigram data is {:.4f} seconds".format(uploadtime))
print("Upload speed by REST API for the bigram data is {:.4f} MB/s".format(bigram_data_size/uploadtime))

HTTP/1.1 100 Continue

HTTP/1.1 201 Created
Last-Modified: Wed, 17 Oct 2018 12:46:42 GMT
Content-Length: 0
Etag: a3666a7a31f518347d3ada250536b8c6
Content-Type: text/html; charset=UTF-8
X-Trans-Id: tx6d893706918748e6b927c-005bc72f31
Date: Wed, 17 Oct 2018 12:49:02 GMT

Total upload time by REST API for the bigram data is 141.3334 seconds
Upload speed by REST API for the bigram data is 11.8514 MB/s


In [8]:
# upload the crime data by REST API

upload_start = time.time()
crime_data_size = 426
    
! curl -i -XPUT -H "X-Auth-Token: AUTH_tk5d970f9d098a4df6aeafab2fffdc8cd7" \
-T crimes2010.csv https://dal05.objectstorage.softlayer.net/v1/AUTH_b7619532-8c35-4938-bf47-773773206815/week7/crimes

upload_end = time.time()

uploadtime = upload_end - upload_start
print("Total upload time by REST API for the crime data is {:.4f} seconds".format(uploadtime))
print("Upload speed by REST API for the crime data is {:.4f} MB/s".format(crime_data_size/uploadtime))

HTTP/1.1 100 Continue

HTTP/1.1 201 Created
Last-Modified: Wed, 17 Oct 2018 13:02:35 GMT
Content-Length: 0
Etag: efe79d991d0921afec5cc52a5a840f47
Content-Type: text/html; charset=UTF-8
X-Trans-Id: txb9786de2d7e34b87aec6e-005bc732ea
Date: Wed, 17 Oct 2018 13:03:09 GMT

Total upload time by REST API for the crime data is 41.2955 seconds
Upload speed by REST API for the crime data is 10.3159 MB/s


In [5]:
# Download the bigrams data by REST API

"""I've commented out the script because it was loading forever to show the content of the file,
which is impossible to show here"""

# bigram_data_size = 1675
# download_start = time.time()

# ! curl -i -H "X-Auth-Token: AUTH_tk5d970f9d098a4df6aeafab2fffdc8cd7" \
# https://dal05.objectstorage.softlayer.net/v1/AUTH_b7619532-8c35-4938-bf47-773773206815/week7/bigrams
    
# download_end = time.time()
# downloadtime = download_end - download_start 
# print("Total download time is {:.4f} seconds".format(downloadtime))
# print("Download speed is {:.4f} MB/s".format(bigram_data_size/downloadtime))

"I've commented out the script because it was loading forever to show the content of the file,\nwhich is impossible to show here"

In [7]:
# Download the crime data by REST API

"""I've commented out the script because it was loading forever to show the content of the file,
which is impossible to show here"""

# crime_data_size = 426
# download_start = time.time()

# ! curl -i -H "X-Auth-Token: AUTH_tk5d970f9d098a4df6aeafab2fffdc8cd7" \
# https://dal05.objectstorage.softlayer.net/v1/AUTH_b7619532-8c35-4938-bf47-773773206815/week7/crimes
    
# download_end = time.time()
# downloadtime = download_end - download_start 
# print("Total download time is {:.4f} seconds".format(downloadtime))
# print("Download speed is {:.4f} MB/s".format(crime_data_size/downloadtime))

"I've commented out the script because it was loading forever to show the content of the file,\nwhich is impossible to show here"

In [None]:
# delete a container (the URL below targets the 'week7' container; append an object name to delete a single object)

! curl -X DELETE -H "X-Auth-Token: AUTH_tk5d970f9d098a4df6aeafab2fffdc8cd7" \
https://dal05.objectstorage.softlayer.net/v1/AUTH_b7619532-8c35-4938-bf47-773773206815/week7/

## Analysis of upload and download speed using REST API

### Upload result
```
| Dataset           | Total Upload Time | Upload Speed |  
|-------------------|-------------------|--------------|  
|yelp (3.13GB)      | 254.6764s         | 12.2901 MB/s |  
|bigrams (1.6 GB)   | 141.3334s         | 11.8514 MB/s |
|crimes2010 (0.5 GB)| 41.2955s          | 10.3159 MB/s |
|-------------------|-------------------|--------------|
|Average            | NA                | 11.4858 MB/s |
```

### Download result
```
| Dataset           | Total Download Time | Download Speed |  
|-------------------|---------------------|----------------|  
|yelp (3.13GB)      | NA                  | NA             |  
|bigrams (1.6 GB)   | NA                  | NA             |
|crimes2010 (0.5GB) | NA                  | NA             |
|-------------------|---------------------|----------------|
|Average            | NA                  | NA             |
```
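
The REST download cells were commented out because `curl` printed the file body into the cell output. One way to measure the download anyway is to have `curl` write the body to a file with `-o`. The sketch below only builds the argument list; `build_download_command` and `timed_download` are hypothetical helpers, and the `-i` flag is dropped so that response headers do not end up in the saved file.

```python
import subprocess
import time


def build_download_command(token, object_url, dest_path):
    """Return a curl argument list that saves the object body to dest_path."""
    return [
        "curl", "-s",                    # -s: silent, no progress meter
        "-H", "X-Auth-Token: " + token,
        "-o", dest_path,                 # -o: write the body to a file
        object_url,
    ]


def timed_download(command, size_mb):
    """Run the command and return (elapsed seconds, speed in MB/s)."""
    start = time.time()
    subprocess.call(command)
    elapsed = max(time.time() - start, 1e-9)
    return elapsed, size_mb / elapsed


# Usage (not executed here, needs a valid token and network access):
# cmd = build_download_command(token, storage_url + "/week7/crimes", "crimes.csv")
# elapsed, speed = timed_download(cmd, 426)
```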

Overall upload speed by REST API was `11.4858 MB/s` on average, which is comparable to the `11.3068 MB/s` measured with the Python API. 
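
To put a number on "comparable", the per-file difference between the two methods can be expressed as a percentage. This is a small sketch using the upload speeds from the two upload tables; `percent_diff` is a name chosen here.

```python
# Upload speeds in MB/s from the Python API and REST API tables.
python_api = {"yelp": 11.7559, "bigrams": 11.6396, "crimes2010": 10.5248}
rest_api = {"yelp": 12.2901, "bigrams": 11.8514, "crimes2010": 10.3159}


def percent_diff(baseline, other):
    """Signed difference of `other` relative to `baseline`, in percent."""
    return (other - baseline) / baseline * 100.0


diffs = {name: percent_diff(python_api[name], rest_api[name])
         for name in python_api}

for name in sorted(diffs):
    print("{:<12s} {:+.2f}%".format(name, diffs[name]))
```

A positive value means the REST upload was faster for that file.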

## Questions  

### 1. What is the average READ speed in Mb/sec?  

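The average read speed was 24.5 MB/s (megabytes per second), measured with the Python API. I tested three file sizes (3.13 GB, 1.6 GB, and 0.5 GB), but the 3.13 GB download failed with a MemoryError, so the average covers the two smaller files (27.2 MB/s and 21.7 MB/s). The speed was broadly similar for both sizes. 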

### 2. What is the average WRITE speed in Mb/sec?

The average write speed was 11.4 MB/s (megabytes per second), regardless of the file size or the method (Python API or REST API). 
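
The questions ask for Mb/sec (megabits), while the measurements are in MB/s (megabytes). Assuming "Mb" does mean megabits, the conversion is a factor of 8, as in this short sketch.

```python
BITS_PER_BYTE = 8


def mbytes_to_mbits(mb_per_s):
    """Convert a rate in megabytes per second to megabits per second."""
    return mb_per_s * BITS_PER_BYTE


read_mbit = mbytes_to_mbits(24.4654)    # average download speed from the Python API table
write_mbit = mbytes_to_mbits(11.4858)   # average upload speed from the REST API table

print("Read:  {:.1f} Mb/s".format(read_mbit))
print("Write: {:.1f} Mb/s".format(write_mbit))
```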

### 3. Can you account for the discrepancies? Consider all of the possible reasons and explain.

The discrepancy between read and write speed comes from how the data are accessed. During a read, data that is already stored is simply retrieved from the disk, which makes it much faster than a write. During a write, the data has to be checked against the block size of the disk as it is being written. 

### 4. What happens to these speeds if you run two threads in parallel?

If we run these transfers in two parallel threads, I expect the speed of each one to drop to about half, because the two transfers share the same bandwidth. I don't know the details, but I suspect the service uses some algorithm to serve parallel transfers fairly, similar to max-min fairness in TCP congestion control (TCP Reno, TCP Cubic). 
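
The halving hypothesis can be tested by timing two transfers that run at the same time. The sketch below is a measurement harness only: `run_in_parallel` is a hypothetical helper, and `fake_transfer` is a stand-in that sleeps instead of moving data, so it does not compete for bandwidth and will not show the halving by itself. Replace it with a real upload or download call to get meaningful numbers.

```python
import threading
import time


def run_in_parallel(transfer, jobs):
    """Run transfer(name) for each (name, size_mb) job in its own thread.

    Returns ({name: per-thread MB/s}, aggregate MB/s over wall-clock time).
    """
    speeds = {}
    lock = threading.Lock()

    def worker(name, size_mb):
        start = time.time()
        transfer(name)
        elapsed = max(time.time() - start, 1e-9)
        with lock:                       # protect the shared dict
            speeds[name] = size_mb / elapsed

    threads = [threading.Thread(target=worker, args=job) for job in jobs]
    wall_start = time.time()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    wall = max(time.time() - wall_start, 1e-9)
    total_mb = sum(size_mb for _, size_mb in jobs)
    return speeds, total_mb / wall


def fake_transfer(name):
    """Hypothetical stand-in for an upload or download; only sleeps."""
    time.sleep(0.05)


speeds, aggregate = run_in_parallel(
    fake_transfer, [("bigrams", 1675.0), ("crimes", 426.0)])
```

If the link is the bottleneck, the per-thread speeds should fall while the aggregate stays close to the single-transfer speed.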