wucluster – stateful control of cloud clusters

Wucluster lets you spin up a cloud cluster of nodes and volumes, do your thing, and then put it away safely — unmount and detach volumes, generate snapshots, and finally delete nodes and volumes, with interlocks to ensure your data persists.

The distinguishing feature of wucluster is that you don’t specify a process (“detach volumes”), you demand a final state (“become separated”). This makes it robust against operations failing or against different parts of the cluster being in different states, and it allows you to change the cluster’s structure while it is live.

Cluster definition and commands

A cluster definition exists independently of any part of it running or not, so we refer to nodes (processor instances if running, AMIs if not) and mounts (EBS volumes if running, snapshots if not). Each mount belongs to a specific node and understands its mount point.

In practice, you’ll usually just ask your cluster to do two things: become launched (Cluster#launch!) and become terminated (Cluster#terminate!).
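
A minimal session, as a sketch (the :spiders name comes from the sample configuration below; the settings-loading call and the Cluster.new lookup-by-name constructor are assumptions, not the documented API):

    require 'wucluster'
    # load the YAML settings shown under "Cluster Definition" below
    Wucluster.load_settings '~/.wucluster.yaml'  # hypothetical loader
    cluster = Wucluster::Cluster.new(:spiders)   # assumed lookup-by-name constructor
    cluster.launch!     # idempotent: drives every node and mount toward 'launched'
    # ... do your thing ...
    cluster.terminate!  # snapshots, detaches and deletes, honoring the interlocks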

Cluster States

Behind the scenes, wucluster will walk the nodes and mounts through the correct progression of states. The defined states of a cluster are, from birth to death:

  • instantiated: all nodes are running and all mounts exist.
  • attached: instantiated, and all mounts are attached to their nodes.
  • mounted: attached, and all mounts are mounted within their node.
  • launched: cluster is a fully armed and operational battlestation. Right now, same as mounted — but it’s possible other assertions could be added.
  • unmounted: no mount is mounted. Note that this doesn’t mean anything is running or exists or whatever; just that if a mount is attached, that mount is not mounted.
  • separated: unmounted, and no mount is attached.
  • recently snapshotted: separated; and every mount that exists has a snapshot that is “recent” (less than a certain amount old). We’ll probably use local information to make this condition stronger in a later version.
  • haltable: separated; and all nodes can be shut down. Right now, this is a no-op, but you can hook in here.
  • terminated: no nodes are running and no mounts exist.

A cluster doesn’t have to be in one of those states: an operation might have failed, leaving some nodes attached and others detached. Or you could add three more nodes and mounts to a running cluster, in which case most of the nodes+mounts are launched while three are still terminated.
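
Because state is tracked per node and per mount, a goal-state request is simply re-run until every component reports the target. A sketch of the driving loop (components, reached?, and nudge_toward are illustrative names, not the actual internals):

    STATE_ORDER = [:instantiated, :attached, :mounted, :launched,
                   :unmounted, :separated, :recently_snapshotted, :haltable, :terminated]

    # Re-runnable driver: safe to restart after a partial failure, and components
    # added mid-flight simply join the laggards on the next pass.
    def become!(cluster, goal)
      until cluster.components.all? { |c| c.reached?(goal) }
        cluster.components.reject { |c| c.reached?(goal) }.each { |c| c.nudge_toward(goal) }
        sleep 1.0   # cf. the :sleep_time setting
      end
    end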

Cluster Definition

Clusters are described in a YAML settings file:

    # Amazon AWS credentials
    :aws_access_key_id:     20CHARACTERALPHANUMS
    :aws_secret_access_key: strigof/gob+bledygookWithUPPERlowerand12
    :aws_account_id:        123456789012
    :aws_availability_zone: us-east-1d
    # will look in private_key_dir/cluster_name.pem for ssh key
    :private_key_dir:       %(home)s/.hadoop-ec2
    :ssh_options:           -i %(private_key)s -o StrictHostKeyChecking=no

    # parameters for remote connection attempts
    :max_tries:             15
    :sleep_time:            1.0

    # define the actual clusters
    :clusters:
      :spiders:
        :availability_zone: us-east-1d
        :image_id:          ami-0b02e162
        :instance_type:     m1.small
        # runs the queues, etc., so it's a medium instance
        :main:
          :image_id:        ami-0b02e162
          :instance_type:   m1.medium
          :mounts:
            - { :device: "/dev/sdd", :mount_point: '/data',   :size: 200,  :volume_id: "vol-c8675309" }
            - { :device: "/dev/sdf", :mount_point: '/data2',  :size: 200,  :volume_id: "vol-aaa24601" }
            - { :device: "/dev/sdg", :mount_point: '/data3',  :size: 1000, :volume_id: "vol-deadbeef" }
        # remaining scraper nodes are all default, have no mounts
        :s1:
        :s2:
        :s3:

      :bonobo:
        :availability_zone: us-east-1d
        :image_id:          ami-0b02e162
        :instance_type:     m1.small
        :nodes:
          :master:
            - :mounts:
                - { :device: "/dev/sdh", :mount_point: "/mnt/home", :size: 10, :volume_id: "vol-d5d826bc" }
                - { :device: "/dev/sdj", :mount_point: "/ebs1",     :size: 1 }
                - { :device: "/dev/sdk", :mount_point: "/ebs2",     :size: 1 }
          :slave:
            - :image_id:                  ami-0b02e162
              :instance_type:             m1.small
              :mounts:
                - { :device: "/dev/sdj", :mount_point: "/ebs1",     :size: 1 }
                - { :device: "/dev/sdk", :mount_point: "/ebs2",     :size: 1 }

Design

Abstract state machine

The distributed nature of cloud control operations demands a slightly different treatment of the state machine design pattern. At the lowest level we can’t actually issue an imperative command (“instantiate this AMI, dammit”). We can only give the API a shove in the right direction (“try to run this AMI please”) and then query whether the operation has completed (“is this instance running yet?”). In state machine terms, we can only issue events and only read states. (The existing ruby state machine libraries typically respond to events and treat states as a matter of internal state.)

  • event: a request we issue (“try to run this AMI please”); it may or may not take effect
  • precondition: the state the world must already be in before an event is worth issuing
  • state: read-only; learned by querying the API, never by trusting internal bookkeeping (see the sketch below)
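
Every transition therefore has the same shape: read the state, check the precondition, fire the event, and let the next poll decide. A sketch, using attach as the example (the method names are illustrative):

    # We can only fire events and read states, so each step re-reads the world
    # instead of trusting internal bookkeeping.
    def attach!(mount)
      return if mount.attached?                              # read state: already at goal
      return instantiate!(mount) unless mount.instantiated?  # precondition: recurse toward it
      mount.issue_attach_request                             # fire event: a shove, not a guarantee
    end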

AWS Facade

Wucluster tries to be efficient in the general case with its API requests. It caches results where appropriate, and in general uses single requests returning info on all volumes/instances/snapshots rather than individual requests per volume/instance/snapshot.
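
A sketch of the cached bulk describe (the api client and its describe_volumes call stand in for whatever AWS library is in use):

    class AwsFacade
      def initialize(api)
        @api = api
      end

      # One bulk request describes every volume; per-volume lookups hit the cache.
      def volumes
        @volumes ||= @api.describe_volumes.each_with_object({}) { |vol, hsh| hsh[vol[:volume_id]] = vol }
      end

      def volume(volume_id)
        volumes[volume_id]
      end

      # Invalidate after issuing any event, so the next read re-polls AWS.
      def clear_cache!
        @volumes = nil
      end
    end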

A volume’s low-level state:

  • existence: (absent), creating, available, in-use, deleting, deleted, error (via AWS api)
  • attachment: attaching, attached, detaching, detached, error (via AWS api)
  • filesystem: mounted, unmounted, error (via SSH’ing to node)

An instance can be:

  • pending, running, shutting-down, terminated, stopping, stopped (via AWS api)

A snapshot can be:

  • pending, completed, error (via AWS api)
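
A mount’s effective state is the conjunction of those three readings; a sketch (the predicate names are illustrative):

    # Three independent sources of truth for one mount.
    def mount_state(mount)
      { :existence  => mount.volume_status,       # creating / available / in-use / ... (AWS api)
        :attachment => mount.attachment_status,   # attaching / attached / ...          (AWS api)
        :filesystem => mount.mounted_on_node? }   # true/false, checked over SSH
    end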

Workflow

instantiate:

  • instantiate mount:
    • return if volume exists and is instantiated or is creating.
    • if volume has error, raise.
    • if volume is deleting or deleted, decouple from it (no volume). Then,
    • if mount has no volume, instantiate from snapshot (mount’s if any, otherwise the cluster’s generic snapshot)
  • instantiate node:
    • return if instance exists and is instantiated
    • if instance has error, raise
    • if instance is shutting-down, terminated, stopping, or stopped, decouple from it (no instance). Then,
    • if no node, instantiate from its image (node’s if any, otherwise the cluster’s generic image)
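
As code, the mount half of that recipe might look like the following sketch (decouple!, create_volume_from, and friends are illustrative names):

    def instantiate_mount!(mount)
      vol = mount.volume
      return if vol && (vol.instantiated? || vol.creating?)     # done, or on its way
      raise "volume #{vol.id} is in an error state" if vol && vol.error?
      mount.decouple! if vol && (vol.deleting? || vol.deleted?) # forget the dying volume
      return unless mount.volume.nil?
      # restore from the mount's own snapshot if any, else the cluster's generic one
      mount.create_volume_from(mount.snapshot_id || mount.cluster.generic_snapshot_id)
    end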

attach:

  • attach mount:
    — preconditions: node is instantiated and mount is instantiated
    • if not instantiated, call instantiate! and return
    • return if attaching or attached correctly
    • raise if not correctly attached
    • if the volume is available, issue the attach command

mount:

  • attach! and return if not attached
  • raise if mounted wrong
  • return if mounted
  • issue the mount command
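
The mount step shows the recursive goal-seeking shape most clearly; a sketch (the shell command is an assumption about the node’s OS):

    def mount!(mount)
      return attach!(mount) unless mount.attached?   # precondition: recurse toward 'attached'
      raise "#{mount.device} is mounted at the wrong mount point" if mount.mounted_wrong?
      return if mount.mounted?                       # already at goal
      # fire the event; the next SSH poll of the filesystem decides whether it worked
      mount.node.ssh "sudo mount #{mount.device} #{mount.mount_point}"
    end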

unmount:

  • return if not attached
  • issue unmount command

separate:

  • return unless instance is running and volume is in-use
  • unmount! and return unless unmounted
  • issue the detach command

snapshot:

  • return if every mount that exists already has a recent snapshot
  • issue the create-snapshot command for each mount that lacks one

terminate:

  • return if no nodes are running and no mounts exist
  • separate! and snapshot!, and return unless the cluster is separated and recently snapshotted (the data-persistence interlock)
  • issue the node-terminate and volume-delete commands

detach: subsumed by separate, above (a mount is detached as part of becoming separated)
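
Chained together, terminate! is the same goal-seeking recursion run down the back half of the state ladder; a sketch:

    def terminate!(cluster)
      return if cluster.terminated?                                   # already at goal
      return separate!(cluster) unless cluster.separated?             # unmount + detach first
      return snapshot!(cluster) unless cluster.recently_snapshotted?  # persistence interlock
      cluster.nodes.each  { |node|  node.issue_terminate_request }
      cluster.mounts.each { |mount| mount.issue_delete_volume_request }
    end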

Principles

Wucluster operations are:

  • idempotent: doing something, then doing it again, has no additional effect. This is what makes them safe in the face of uncertain latency.
  • restartable: they can start from any state and proceed in parallel.
