Scalable Walrus Tech Preview in Eucalyptus 3.4

Neil Soman edited this page Apr 17, 2014 · 33 revisions

Eucalyptus 3.4.0 Tech Preview includes an object storage system designed to scale out horizontally, with multiple active front-end nodes that implement the S3 API.

In 3.4.0 Tech Preview mode, Eucalyptus uses RiakCS as the distributed object storage backend for Walrus.

The legacy, single-node Walrus implementation is superseded by multiple active Object Storage Gateways (OSGs), which act as pass-through proxies for user requests. The cloud administrator can register multiple active OSGs, which are designed to act as redundant active/active nodes. The client may direct requests to any active OSG. OSGs handle authentication and enforce IAM (Identity and Access Management) settings.

On the backend, each OSG connects to a distributed object store (currently RiakCS). A typical RiakCS installation consists of a number of storage nodes (at least 5 for production use) that run Riak/RiakCS components. Riak/RiakCS must be installed and functioning before you configure the OSGs.

We do NOT recommend accessing the RiakCS installation directly. Doing so might leave it in an inconsistent state, and you may then be unable to access your data through Eucalyptus.

As this is a tech preview, there are known issues. For information on them, see the Scalable Walrus Tech-Preview Known Issues page (wiki page: RiakCS Tech Preview Known Issues).

Installing Object Storage Gateways (OSGs)

First, install the Eucalyptus 3.4 nightly release package for the tech preview:

yum install

Next, install the "eucalyptus-osg" package on each host designated as an OSG:

yum install eucalyptus-osg

In this setup, you only need to install the "eucalyptus-cloud" package on the CLC and the "eucalyptus-osg" package on each OSG host.
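If you have several OSG hosts, the package installation can be scripted. The following is a minimal sketch, assuming root SSH access to each host and that the nightly repository is already configured on it; the function name install_osg_packages, the DRY_RUN switch, and the host names are hypothetical and not part of Eucalyptus.

```shell
#!/usr/bin/env bash
# Sketch: install eucalyptus-osg on several OSG hosts over SSH.
# Assumes root SSH access and an already configured nightly repository.
# install_osg_packages and DRY_RUN are illustrative names only.
set -euo pipefail

# Print the command in dry-run mode (DRY_RUN=1), otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

install_osg_packages() {
  local host
  for host in "$@"; do
    run ssh "root@$host" yum install -y eucalyptus-osg
  done
}

# Example (hypothetical host names):
#   install_osg_packages osg1.example.com osg2.example.com
```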

Registering Object Storage Gateways (OSGs)

The cloud administrator can register multiple Object Storage Gateway (OSG) components with the Eucalyptus Cloud Controller (CLC).

To do so, use either the euca_conf utility or the euca-register-object-storage-gateway command. For example:

euca_conf --register-osg <OSG IP>

Repeat this for every OSG you wish to register, substituting each OSG's IP address.
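Registering several OSGs can be done in a loop on the CLC. This is a sketch only: register_osgs, the run helper, and DRY_RUN are illustrative names, and the IP addresses are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: register several OSGs with the CLC in one pass (run on the CLC).
# register_osgs, run and DRY_RUN are illustrative names only.
set -euo pipefail

# Print the command in dry-run mode (DRY_RUN=1), otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Register each OSG whose IP address is given as an argument.
register_osgs() {
  local ip
  for ip in "$@"; do
    run euca_conf --register-osg "$ip"
  done
}

# Example (placeholder addresses):
#   register_osgs 192.168.10.21 192.168.10.22
```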

OSGs are deregistered similarly, but you must use the OSG's component name, NOT its IP address:

euca_conf --deregister-osg <OSG component name>

To find an OSG's component name, run euca-describe-services. For example:

SERVICE	objectstorage  	objectstorage  	osg-	NOTREADY  	23	arn:euca:bootstrap:objectstorage:objectstorage:osg-

To deregister this OSG, you would run the following command:

euca_conf --deregister-osg osg-
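Because deregistration needs the component name rather than the IP address, it can help to extract the names from euca-describe-services. The sketch below assumes the column layout of the sample output above (component name in the fourth column); osg_names, deregister_osgs, and DRY_RUN are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Sketch: list OSG component names and deregister chosen OSGs by name.
# Assumes the euca-describe-services layout shown above:
#   SERVICE, type, partition, component name, state, ...
# osg_names, deregister_osgs, run and DRY_RUN are illustrative names only.
set -euo pipefail

# Print the command in dry-run mode (DRY_RUN=1), otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Read euca-describe-services output on stdin and print column 4
# (the component name) of every objectstorage service line.
osg_names() {
  awk '$1 == "SERVICE" && $2 == "objectstorage" { print $4 }'
}

# Deregister each OSG whose component name is given as an argument.
deregister_osgs() {
  local name
  for name in "$@"; do
    run euca_conf --deregister-osg "$name"
  done
}

# Example:
#   euca-describe-services | osg_names
#   deregister_osgs <OSG component name>
```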

Configuring the Storage Provider

After you register an OSG, you cannot use it until it is correctly configured. Immediately after registration, the OSG state is listed as BROKEN. For example:

SERVICE	objectstorage  	objectstorage  	osg-	BROKEN    	23	arn:euca:bootstrap:objectstorage:objectstorage:osg-

To configure the OSG, specify a storage provider using the euca-modify-property command. To use RiakCS as the backend, set the storage provider to "s3":

euca-modify-property -p objectstorage.providerclient=s3

Next, specify the RiakCS/S3 endpoint that Eucalyptus should use. Instead of an individual node IP, you may configure round-robin DNS and specify a host name. For example:

euca-modify-property -p

In addition, you must provide Eucalyptus with credentials to access your RiakCS installation:

euca-modify-property -p objectstorage.s3provider.s3accesskey=<access key>

euca-modify-property -p objectstorage.s3provider.s3secretkey=<secret key>

Note that these are the access and secret keys for the RiakCS cluster, NOT for Eucalyptus. You may use the RiakCS front-end web interface to create users.

Make sure that the user with these credentials has administrative access to RiakCS.
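The provider settings above can be applied together. This sketch sets only the three properties whose names appear above; the RiakCS endpoint property must still be set separately. RIAKCS_ACCESS_KEY and RIAKCS_SECRET_KEY are assumed environment variable names (reading the keys from the environment keeps them out of the script), and configure_riakcs_provider and DRY_RUN are hypothetical. Note that dry-run mode prints the secret key.

```shell
#!/usr/bin/env bash
# Sketch: apply the RiakCS storage-provider settings on the CLC.
# RIAKCS_ACCESS_KEY / RIAKCS_SECRET_KEY are assumed variable names holding
# the RiakCS admin user's keys (not Eucalyptus keys).
# The endpoint property is NOT set here; set it separately.
set -euo pipefail

# Print the command in dry-run mode (DRY_RUN=1), otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

configure_riakcs_provider() {
  # Abort with a clear message if either key is missing or empty.
  : "${RIAKCS_ACCESS_KEY:?set RIAKCS_ACCESS_KEY to the RiakCS admin access key}"
  : "${RIAKCS_SECRET_KEY:?set RIAKCS_SECRET_KEY to the RiakCS admin secret key}"
  run euca-modify-property -p objectstorage.providerclient=s3
  run euca-modify-property -p "objectstorage.s3provider.s3accesskey=$RIAKCS_ACCESS_KEY"
  run euca-modify-property -p "objectstorage.s3provider.s3secretkey=$RIAKCS_SECRET_KEY"
}

# Example:
#   RIAKCS_ACCESS_KEY=<access key> RIAKCS_SECRET_KEY=<secret key> configure_riakcs_provider
```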

Additional Configuration (optional)

The following properties tune the behavior of the Object Storage service and the OSGs. The defaults are reasonable and usually need no changes, but the properties are available for unexpected situations.

  • objectstorage.cleanuptaskintervalseconds : The interval, in seconds, at which background cleanup tasks are run. Default is 60 seconds. The background cleanup tasks purge the backend of overwritten objects and clean up object history.

  • objectstorage.failedputtimeouthours : The time, in hours, after which an uncommitted object upload is considered failed. The default is 24 hours. This allows cleanup of metadata for objects that were pending upload when an OSG failed or was stopped in the middle of a user operation. Keep this value at least as long as the longest reasonable time to upload a single large object (the S3 maximum single upload size is 5GB), so that uploads still in progress are not cleaned up unintentionally.

  • objectstorage.queue_size : The size, in chunks, of the internal per-request buffers that queue data for transfer to the backend. A larger value allows more buffering in the OSG when the client uploads faster than the backend can consume the data. Too large a value may cause out-of-memory errors if the JVM heap cannot hold the buffers for all concurrent requests (roughly the number of concurrent requests multiplied by queue_size chunks). The default is 100.

  • objectstorage.s3provider.s3usehttps : Whether to use HTTPS for connections to the backend provider. If you enable this, make sure the backend is set up correctly for HTTPS (certificates, etc.), or the OSG will fail to connect. RiakCS does not enable HTTPS in a default installation; you must configure certificates and identities to support it. The default value is false.
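As a rough illustration of the queue_size trade-off, worst-case buffer memory grows with the number of concurrent requests multiplied by queue_size chunks. The chunk size is not stated on this page, so the sketch below takes it as an input; the 64 KiB value in the example is an assumption, not a documented Eucalyptus figure, and osg_buffer_mib is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: rough worst-case OSG heap used by upload buffers,
#   concurrent requests x queue_size chunks x chunk size.
# The chunk size is an ASSUMPTION supplied by the caller (in KiB).
set -euo pipefail

# Usage: osg_buffer_mib <concurrent_requests> <queue_size> <chunk_kib>
# Prints the estimate in MiB (integer, rounded down).
osg_buffer_mib() {
  local concurrent=$1 queue_size=$2 chunk_kib=$3
  echo $(( concurrent * queue_size * chunk_kib / 1024 ))
}

# Hypothetical numbers: 200 concurrent uploads, the default queue_size
# of 100, and an assumed 64 KiB chunk size.
osg_buffer_mib 200 100 64   # prints 1250
```

If the estimate approaches the JVM heap configured for the OSG, lowering queue_size is the safer choice.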

Checking Service State

You may use euca-describe-services to check service status. After successful configuration, the state of the OSG will be reported as ENABLED.

SERVICE	objectstorage  	objectstorage  	osg-	ENABLED    	23	arn:euca:bootstrap:objectstorage:objectstorage:osg-

If the state appears as DISABLED or BROKEN, check the cloud-*.log files in /var/log/eucalyptus. DISABLED generally indicates a problem with your network or credentials.
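To check several OSGs at once, you can filter the euca-describe-services output. This sketch assumes the column layout shown above (component name in column 4, state in column 5); check_osgs is a hypothetical helper that prints each OSG that is not ENABLED and exits non-zero if any such OSG is found or if no OSG is listed at all.

```shell
#!/usr/bin/env bash
# Sketch: report OSGs that are not ENABLED, reading euca-describe-services
# output on stdin. Assumes component name in column 4, state in column 5.
# Exit status 0 only if at least one OSG is listed and all are ENABLED.
set -euo pipefail

check_osgs() {
  awk '
    $1 == "SERVICE" && $2 == "objectstorage" {
      seen++
      if ($5 != "ENABLED") { print $4 " is " $5; bad++ }
    }
    END { exit ((seen == 0 || bad > 0) ? 1 : 0) }
  '
}

# Example:
#   euca-describe-services | check_osgs || echo "see /var/log/eucalyptus/cloud-*.log"
```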

Accessing Object Storage

You can now use your favorite S3 client (e.g., s3curl) to interact with Eucalyptus. Point S3_URL at the OSG you wish to use, with the service path "/services/objectstorage" instead of "/services/Walrus". For example:

export S3_URL=http://<OSG IP>:8773/services/objectstorage

Alternatively, you may set the S3 endpoint manually in your client's configuration.

If you have DNS enabled, you may use the "objectstorage" prefix to access object storage. Eucalyptus will resolve it to a list of IPs corresponding to the ENABLED OSGs.

NOTE: A current known issue is that the objectstorage URL is not included in the eucarc downloaded with euca_conf --get-credentials. Simply construct it as above and you may place it in the eucarc if you wish. This will be resolved in the official release.
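Until the downloaded eucarc includes the URL, you can add it yourself. The sketch below assumes your eucarc sets variables with lines of the form export S3_URL=..., and it replaces any existing S3_URL line (for instance one pointing at /services/Walrus); osg_url and set_s3_url_in_eucarc are hypothetical helpers. Back up your eucarc before trying it.

```shell
#!/usr/bin/env bash
# Sketch: write the objectstorage URL for one OSG into an eucarc file.
# Assumes eucarc lines of the form "export S3_URL=..."; any existing
# S3_URL line is replaced. Helper names are illustrative only.
set -euo pipefail

# Build the object storage URL (port 8773, /services/objectstorage path).
osg_url() {
  echo "http://$1:8773/services/objectstorage"
}

# Usage: set_s3_url_in_eucarc <eucarc path> <OSG IP>
set_s3_url_in_eucarc() {
  local eucarc=$1 osg_ip=$2 tmp
  tmp=$(mktemp)
  # Copy every line except an existing S3_URL export (grep -v exits 1
  # when nothing remains, which is not an error here).
  grep -v '^export S3_URL=' "$eucarc" > "$tmp" || true
  echo "export S3_URL=$(osg_url "$osg_ip")" >> "$tmp"
  # Rewrite in place so the original file permissions are kept.
  cat "$tmp" > "$eucarc"
  rm -f "$tmp"
}

# Example:
#   set_s3_url_in_eucarc ~/creds/eucarc <OSG IP>
```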

Configuring Load Balancers

We recommend using a load balancer to distribute traffic across all RiakCS nodes. The example below uses Nginx to get you started; you may use HAProxy instead if you wish. If you use Nginx, install the latest version (1.4.6 or later). Some older versions, such as the one included in CentOS 6.x, have bugs in POST request handling and cannot proxy to the backend over HTTP/1.1, which RiakCS requires.

Install Nginx on one of your servers and configure it to direct HTTP traffic to your RiakCS nodes. By default, RiakCS listens for web traffic on port 8080. In this example, the upstream block lists three RiakCS nodes that you have previously configured.

On many Linux installations, Nginx reads server configuration from /etc/nginx/conf.d. You can either edit the default configuration or create a new config file there. Here is a sample configuration:

upstream riak_cs_host {
  server <RiakCS node 1 IP>:8080;
  server <RiakCS node 2 IP>:8080;
  server <RiakCS node 3 IP>:8080;
}

server {
  listen   80;
  server_name  _;
  access_log  /var/log/nginx/riak_cs.access.log;
  client_max_body_size 5G; # 5GB is the max S3 single upload size, so use that value, or 0 to disable checks.

  location / {
    proxy_set_header Host $http_host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_redirect off;
    proxy_http_version 1.1;
    proxy_connect_timeout      90;
    proxy_send_timeout         90;
    proxy_read_timeout         90;
    proxy_buffer_size    128k;
    proxy_buffers     4 256k;
    proxy_busy_buffers_size 256k;
    proxy_temp_file_write_size 256k;

    proxy_pass http://riak_cs_host;
  }
}
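With more RiakCS nodes, the upstream block can be generated rather than typed by hand. This is a sketch: riakcs_upstream is a hypothetical helper, it assumes every node listens on the default RiakCS port 8080, and its output is meant to replace the upstream block of the sample configuration, not to be added alongside it.

```shell
#!/usr/bin/env bash
# Sketch: print an nginx "upstream riak_cs_host" block for the given
# RiakCS node addresses, assuming each listens on the default port 8080.
set -euo pipefail

riakcs_upstream() {
  local node
  echo "upstream riak_cs_host {"
  for node in "$@"; do
    echo "  server $node:8080;"
  done
  echo "}"
}

# Example (hypothetical node addresses):
#   riakcs_upstream 10.0.1.11 10.0.1.12 10.0.1.13
```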

Then restart Nginx, or simply reload its configuration (/etc/init.d/nginx reload). Requests to port 80 on your Nginx host will then be forwarded to your RiakCS cluster.
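Before reloading, it is worth checking the configuration syntax with nginx -t, which exits non-zero if the configuration is invalid. A minimal sketch that reuses the init script path from the text; reload_nginx_safely and DRY_RUN are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: reload Nginx only if its configuration passes "nginx -t".
# reload_nginx_safely, run and DRY_RUN are illustrative names only.
set -euo pipefail

# Print the command in dry-run mode (DRY_RUN=1), otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

reload_nginx_safely() {
  # The reload runs only if the syntax test succeeds.
  run nginx -t && run /etc/init.d/nginx reload
}
```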

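Configuring the OSG to Use the Walrus Backend

Instead of RiakCS, the OSGs can use the legacy Walrus as their storage backend. To do so, register a Walrus host and set the storage provider to "walrus":

euca_conf --register-walrus <Walrus host or IP>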

euca-modify-property -p objectstorage.providerclient=walrus
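The two steps above can be wrapped in a small script. A minimal sketch, run on the CLC, assuming the Walrus host or IP is passed as the first argument; use_walrus_backend and DRY_RUN are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: register a Walrus host and make it the OSG storage provider.
# use_walrus_backend, run and DRY_RUN are illustrative names only.
set -euo pipefail

# Print the command in dry-run mode (DRY_RUN=1), otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: use_walrus_backend <Walrus host or IP>
use_walrus_backend() {
  local walrus_host=$1
  run euca_conf --register-walrus "$walrus_host"
  run euca-modify-property -p objectstorage.providerclient=walrus
}
```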