
The Huawei Universal Distributed Storage (UDS) is a clone of Amazon S3 (Simple Storage Service)[1]. Amazon S3 is a web service that enables you to store data in the cloud. S3 manages data with an object storage architecture: it stores arbitrary objects (computer files) up to 5 TB in size, each accompanied by up to 2 KB of metadata. Objects are organized into buckets (each owned by an account on Huawei UDS) and identified within each bucket by a unique, user-assigned key.

Buckets and objects can be created, listed, and retrieved using a REST-style HTTP interface. Additionally, objects can be downloaded using the HTTP GET interface. Requests are authorized using an access control list associated with each bucket and object.

Bucket names and keys are chosen so that objects are addressable using either of the following HTTP URLs (note that our Huawei cloud storage's domain name is yun.ucsc.edu, which is different from Amazon S3's s3.amazonaws.com):

http://yun.ucsc.edu/bucket/key

http://bucket.yun.ucsc.edu/key

Each account on Huawei UDS is assigned a pair of access keys: an access key ID (serving as the user ID) and a secret access key (serving as the password). If you have an account on Huawei UDS (to request one, please contact Shawfeng Dong), there is a file named UDS.txt in your home directory on Hyades, containing your access keys.


S3 Basics

Amazon S3 is a simple key-value store designed to store as many objects as you want. Files uploaded to S3 are called objects, and objects are stored in buckets.

Buckets

Every file you upload to S3 is stored in a container called a bucket, which is somewhat like a directory or folder in a conventional file system. There are some restrictions on buckets[2][3]:

  • Bucket names are globally unique among all users of S3;
  • Each user can create up to 100 buckets;
  • Buckets cannot be nested into a deeper hierarchy;
  • The name of a bucket can only consist of basic alphanumeric characters, plus dot (.) and dash (-).

Objects

Files stored in S3 are called objects, and the names assigned to objects are called keys[4]. You use the object key to retrieve the object. There are far fewer restrictions on object key names: a key can be any UTF-8 string up to 1024 bytes long. Interestingly enough, an object key can contain the forward slash character (/), so my/funny/picture.jpg is a valid object key. Note, however, that the Amazon S3 data model is a flat structure: there is no hierarchy of sub-buckets or sub-folders. So my and funny are neither directories nor buckets; there is really a single object whose key is my/funny/picture.jpg. Nonetheless, this is a convenient way to mimic the directory hierarchy of a conventional file system.

The full URL of such an image could be, for example: http://yun.ucsc.edu/my-bucket/my/funny/picture.jpg

or equivalently: http://my-bucket.yun.ucsc.edu/my/funny/picture.jpg

Access Control List

The S3 Access Control Lists (ACLs) enable you to manage access to buckets and objects. Each bucket and object has an ACL attached to it as a subresource. When a request is received against a resource, Amazon S3 checks the corresponding ACL to verify that the requester has the necessary access permissions[5].

The default ACL for buckets and objects is private, which grants the owner full control over the resource but grants no access rights to anyone else. The S3 API allows you to set the ACL when you create a bucket or an object, or to modify the ACL of an existing bucket or object, so that others may access your resources. One useful case is to set the ACL of a resource to public-read, which permits unsigned requests to the resource. Anyone can use a generic web browser to access a resource with the public-read ACL, no bespoke S3 client required! This is a good way to share your data with the public.
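For instance, suppose the object my/funny/picture.jpg from the previous section has been given the public-read ACL; then a plain, unsigned HTTP GET, with curl or any browser, should retrieve it (a sketch reusing the example bucket and key from above):

$ curl -O http://yun.ucsc.edu/my-bucket/my/funny/picture.jpg

The -O option saves the response under the object's remote file name (here, picture.jpg).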

S3 REST API

There are two Application Programming Interfaces (APIs) for Amazon S3: the REST API and the SOAP API[6]. REST is the recommended API for Amazon S3 operations; SOAP support over HTTP is deprecated[7].

The Huawei UDS supports most of the Amazon S3 REST API, but doesn't appear to support the SOAP API.

There are many tools for uploading, retrieving, and managing data in Amazon S3 and other cloud storage services that use the S3 protocol, including Huawei UDS. We'll introduce a few S3 clients in the sections below. Before we delve into those tools, it's perhaps instructive to show how to upload a file to an existing bucket in Huawei UDS using only a few lines of bash code[8][9][10], which can:

  1. help you understand how the S3 REST API works;
  2. be handy when you don't have access to other more fully fledged tools.

Here is the BASH script (putS3.sh):
#!/bin/bash

S3KEY="your access key ID"
S3SECRET="your secret access key"

# Usage: putS3 <file> <bucket> [content-type]
function putS3
{
  file=$1
  bucket=$2
  # The Date header is required when signing the request
  date=$(date +"%a, %d %b %Y %T %Z")
  # Make the uploaded object publicly readable (the default ACL is private)
  acl="x-amz-acl:public-read"
  # Default to application/octet-stream if no Content-Type is given
  content_type=${3-"application/octet-stream"}
  # The canonical string to sign: HTTP verb, Content-MD5 (empty here),
  # Content-Type, Date, the x-amz-acl header, and the resource path
  string="PUT\n\n$content_type\n$date\n$acl\n/$bucket/$file"
  # Sign the string with HMAC-SHA1, keyed with the secret access key
  signature=$(echo -en "${string}" | openssl sha1 -hmac "${S3SECRET}" -binary | base64)
  curl -X PUT -T "$file" \
    -H "Host: $bucket.yun.ucsc.edu" \
    -H "Date: $date" \
    -H "Content-Type: $content_type" \
    -H "$acl" \
    -H "Authorization: AWS ${S3KEY}:$signature" \
    "http://$bucket.yun.ucsc.edu/$file"
}

Annotations of putS3.sh:

  • To add an object to a bucket, we use the PUT Object operation of the REST API;
  • We use the -X PUT option to curl to specify PUT as the request method to use when communicating with the HTTP server (the default is GET);
  • We make the uploaded object readable to the public, using the Access Control List (ACL) Specific Request Header x-amz-acl:public-read (the default is private)[11];
  • We can optionally specify a Content-Type for the uploaded file (content_type=${3-"application/octet-stream"})[12][13];
  • The Date header is required when specifying the Authorization header[14];
  • Here we use the virtual hosted-style request to Huawei UDS ($bucket.yun.ucsc.edu)[15];
  • The Amazon S3 REST API uses a custom HTTP scheme based on a keyed-HMAC (Hash Message Authentication Code) for authentication[16].
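To make the signing step concrete, you can compute a signature by hand. Here is a minimal sketch, assuming a hypothetical Date value (in a real request, the date in the string to sign must match the Date header exactly) and the network/CENIC-HPR-Topology.png example used below:

$ S3SECRET="your secret access key"
$ string="PUT\n\nimage/png\nSun, 06 Nov 2016 12:00:00 PST\nx-amz-acl:public-read\n/network/CENIC-HPR-Topology.png"
$ echo -en "$string" | openssl sha1 -hmac "$S3SECRET" -binary | base64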
As an example, we'll show you how to use the above BASH script to upload a PNG file, CENIC-HPR-Topology.png, to the bucket network on Huawei UDS. Here we assume the bucket network already exists on UDS. If it doesn't, it is straightforward to use the PUT Bucket operation of the REST API to create a new bucket (or use one of the tools listed below).

First source the BASH script:

$ . putS3.sh

Then call the function putS3 to upload the file CENIC-HPR-Topology.png to the bucket network:

$ putS3 CENIC-HPR-Topology.png network "image/png"

Note that since the file is a PNG image, we specify the Content-Type to be image/png (rather than the default application/octet-stream). Because we've set the ACL of the object to public-read, anyone can access the file via the HTTP GET interface without authentication (e.g., using a web browser), using either of the following URLs:

http://yun.ucsc.edu/network/CENIC-HPR-Topology.png

http://network.yun.ucsc.edu/CENIC-HPR-Topology.png

S3 Clients

s3cmd

On Hyades, the recommended S3 client for accessing Huawei UDS is s3cmd. The author of s3cmd maintains excellent How-To guides, which you should read first. Here are some tailored examples for Hyades.

NOTE: on Hyades, if you have an account on Huawei UDS, s3cmd is already properly configured for you (see .s3cfg in your home directory). You DON'T need to run:

$ s3cmd --configure

s3cmd is written in Python. To use it on Hyades, you must first load the python module:

$ module load python

To learn the usage of s3cmd:

$ s3cmd -h

To create a bucket named cconroy:

$ s3cmd mb s3://cconroy

The ACL of the bucket is private by default. To change it to public-read (so that anyone on the internet can access it, without authentication), run:

$ s3cmd setacl --acl-public s3://cconroy

Alternatively, you can create a bucket with the public-read ACL in one step:

$ s3cmd mb --acl-public s3://cconroy

To upload a local file some.tar on Hyades to the bucket:

$ s3cmd put some.tar s3://cconroy/some.tar

The ACL of the object is private by default. To make the object publicly available:

$ s3cmd setacl --acl-public s3://cconroy/some.tar

Alternatively, you can upload a file and make it publicly available in one step (-P is equivalent to --acl-public):

$ s3cmd put -P some.tar s3://cconroy/some.tar

Now anyone can use a web browser to download the file, using either of the following HTTP URLs:

http://yun.ucsc.edu/cconroy/some.tar

http://cconroy.yun.ucsc.edu/some.tar

You can upload multiple files with a single s3cmd command. For example, to upload all the Fortran source files in your current directory to the bucket cconroy:

$ s3cmd put *.f90 s3://cconroy/

One nifty feature of s3cmd is that it can do recursive uploads, downloads, and removals with the --recursive (-r) option. For example, to upload a two-directory tree (mesa and FLASH) into a virtual directory codes in the bucket cconroy:

$ s3cmd put --recursive mesa FLASH s3://cconroy/codes/
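The --recursive option works for downloads as well. A minimal sketch of pulling the codes tree just uploaded back into the current directory (check s3cmd -h for the options supported by your version):

$ s3cmd get --recursive s3://cconroy/codes/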

List all your buckets:

$ s3cmd ls

List the contents of a bucket:

$ s3cmd ls s3://cconroy

Note on running s3cmd on your own computer:

The Huawei UDS is a cloud storage service: you can access it from any computer, as long as you know your access keys.

We find, however, that the latest version (1.5.2) of s3cmd does not work with Huawei UDS, but version 1.0.1 does. You can install s3cmd from source, or using pip:

$ sudo pip install s3cmd==1.0.1

You need a configuration file (.s3cfg in your home folder) for s3cmd to work. You can run s3cmd --configure to generate .s3cfg, but you'll have to manually edit the file to make it work with Huawei UDS. Or you can take the easier route and simply copy the working configuration from Hyades to your own computer:

$ scp YourUsername@hyades.ucsc.edu:.s3cfg ~
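If you'd rather edit .s3cfg by hand, the settings that need changing for Huawei UDS are most likely along these lines (a sketch based on standard s3cmd options; the access keys are, of course, your own):

access_key = your access key ID
secret_key = your secret access key
host_base = yun.ucsc.edu
host_bucket = %(bucket)s.yun.ucsc.edu
use_https = False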

Now you can use s3cmd to manage data in Huawei UDS, from your own computer!

Cyberduck

On your own computer, you might prefer to use a GUI tool to access the Huawei UDS. One good option is Cyberduck, an open source client for FTP, SFTP, WebDAV, OpenStack Swift, and Amazon S3, available for Mac OS X and Windows.

To configure Cyberduck for Huawei UDS, open the Open Connection dialog box by either:

  • Clicking the Open Connection icon on the toolbar; or
  • Selecting File > Open Connection from the menu bar.

From the dropdown box, select S3 (Amazon Simple Storage Service). For the Server: field, type yun.ucsc.edu; for Username:, type your access key ID; for Password:, type your secret access key.

You might have noticed that Cyberduck uses HTTPS (port 443) by default to access S3 services. At the moment, the Huawei UDS uses a self-signed certificate. You can either import that certificate into your computer's trust store, or use HTTP (port 80) to access Huawei UDS. I recommend the latter. To set it up, you have to manually edit the bookmark file:

  1. Quit Cyberduck.
  2. With your favorite text editor, change the following line in the bookmark file (a .duck file in the folder $HOME/Library/Application Support/Cyberduck/Bookmarks on Mac OS X):

        <string>443</string>

to

        <string>80</string>

To learn more about how to use Cyberduck, please consult Cyberduck Help.

Other S3 Clients

s3curl
s3curl is written in Perl. It is a wrapper around curl that calculates the authentication parameters for S3 requests.
AWS Command Line Interface
The AWS Command Line Interface (CLI) is a unified tool (written in Python) to manage all your AWS services, including Amazon S3. It can be adapted to work with Huawei UDS as well (see the sketch after this list).
S3 Browser
S3 Browser is a freeware Windows client for Amazon S3 and Amazon CloudFront.
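As an illustration of adapting the AWS CLI mentioned above, non-Amazon S3 endpoints are typically reached with its --endpoint-url option. A minimal sketch, assuming you have entered your UDS access keys via aws configure (we haven't verified every s3 subcommand against Huawei UDS):

$ aws s3 ls --endpoint-url http://yun.ucsc.edu
$ aws s3 cp some.tar s3://cconroy/some.tar --endpoint-url http://yun.ucsc.edu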

Performance Tips

The Huawei UDS is connected to the 10GE Dell 8132F switch, which is in turn connected to UCSC's SciDMZ router. However, we find that the data transfer speed is well below the theoretical bandwidth of 10 Gbps. Here are a couple of performance tips:

  1. It is much faster to transfer big files than small ones. So rather than uploading a lot of small files in a directory, it is advantageous to tar the directory and then upload the tar file;
  2. A single stream is still pretty slow. To approach the theoretical bandwidth, one can create a few buckets and transfer to them simultaneously in multiple streams, as sketched below.
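A minimal sketch of tip 2, assuming the buckets bucket1 through bucket4 already exist and the data has been split into part1.tar through part4.tar (both sets of names are hypothetical):

$ for i in 1 2 3 4; do s3cmd put part$i.tar s3://bucket$i/ & done; wait

Each s3cmd runs as a background job, giving four concurrent streams; wait blocks until all the uploads finish.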

References

  1. ^ Amazon S3
  2. ^ s3cmd
  3. ^ Working with Amazon S3 Buckets
  4. ^ Working with Amazon S3 Objects
  5. ^ S3 Access Control List (ACL) Overview
  6. ^ Amazon Simple Storage Service API Reference
  7. ^ Amazon S3 SOAP API
  8. ^ s3-bash
  9. ^ Uploading to S3 in Bash
  10. ^ Uploading to S3 in 18 lines of Shell
  11. ^ S3 PUT Object
  12. ^ The Content-Type Header Field
  13. ^ Bash Parameter Substitution
  14. ^ S3 Common Request Headers
  15. ^ Virtual Hosting of Buckets
  16. ^ S3 - Signing and Authenticating REST Requests