Skip to content
This repository

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP
tree: 878f5293e0
Fetching contributors…

Octocat-spinner-32-eaf2f5

Cannot retrieve contributors at this time

file 342 lines (235 sloc) 16.157 kb

Version 3 alpha now available!

See the version_3 branch for a major upgrade.

If you are a new or prospective adopter of cluster_chef we strongly recommend you begin with the version_3 branch. If you are an existing user, you should hold back until the official release. Our firm target release date is Dec 14.

Version 3 Improvements:

knife cluster toolset

  • from @howech, cluster files directly manipulate their cluster/facet role.
  • cluster files are now dramatically cleaner and yet more powerful.
  • cluster_chef is now a gem (thanks @arp!) — no more clunky symlinking to get started
  • node discovery and synchronization of cloud, cluster and server are both rock-solid
  • ec2 cloud now supports EBS volume creation, attachment and tagging; elastic IP association; keypair creation

New Superpowers

  • Rundeck integration — temujin9 can lift small cities with his mind now. Ridiculously awesome.
  • (initial) support for Vagrant/Virtualbox — test all your clusters on your laptop, deploy them to the cloud.

Cookbook cleanup

  • all cookbooks have uniform attribute names, directory structure, and shared conventions — log_dir is always spelled log_dir, is always referred to via node attribute (and never sometimes-hardcoded-sometimes-configurable), and always handled uniformly.
  • cookbooks are assertive and (adaptably) opinionated: a conf_dir is by default be in /etc/{service_name}, with permissions root:root 755, and all config files live there.
  • cookbooks are entirely free of references
  • all cookbooks bear complete and accurate metadata.rb and README.md files

Aspects / Discovery and Integration Cookbooks

The most important changes are the development of aspects and integration cookbooks. Cookbooks repeatably express these and other aspects:

  • “I launch these daemons: …”
  • “I haz a bukkit, itz naem ‘/var/log/lol’”
  • “I have a dashboard at ‘http://….:…’”
  • … and much more.

Wouldn’t it be nice if announcing a log directory caused…
– my log rotation system to start rotating my logs?
– a ‘disk free space’ gauge to be added to the monitoring dashboard for that service?
– flume (or whatever) began picking up my logs and archiving them to a predictable location?
– in the case of standard apache logs, a listener to start counting the rate of requests, 200s, 404s and so forth?
Similarly, announcing ports should mean
– the firewall and security groups configure themselves correspondingly
– the monitor system starts regularly pinging the port for uptime and latency
– and pings the interfaces that it should not appear on to ensure the firewall is in place?

The version_3 cookbooks make those aspects standardized and predictable, and provides integration and discovery hooks. The key is to make integration inevitable: No more forgetting to rotate or monitor a service, or having a config change over here screw up a dependent system over there.

Joe Bob says check it out.


cluster_chef

Chef is a powerful tool for maintaining and describing the software and configurations that let a machine provide its services.

cluster_chef is

  • a clean, expressive way to describe how machines and roles are assembled into a working cluster.
  • Our collection of Industrial-strength, cloud-ready recipes for Hadoop, Cassandra, HBase, Elasticsearch and more.
  • a set of conventions and helpers that make provisioning cloud machines easier.

Walkthrough

Here’s a basic 3-node hadoop cluster:

    ClusterChef.cluster 'demohadoop' do
      merge!('defaults')
      
      facet 'master' do
        instances           1
        role                "nfs_server"
        role                "hadoop_master"
        role                "hadoop_worker"
        role                "hadoop_initial_bootstrap"
      end

      facet 'worker' do
        instances           2
        role                "nfs_client"
        role                "hadoop_worker"
        server 0 do
	  chef_node_name 'demohadoop_worker_zero'
        end 
      end

      cloud :ec2 do
        image_name          "lucid"
        flavor              "c1.medium"
        availability_zones  ['us-east-1d']
        security_group :logmuncher do
          authorize_group "webnode"
        end
      end
      
    end

This defines a cluster (group of machines that serve some common purpose) with two facets, or unique configurations of machines within the cluster. (For another example, a webserver farm might have a loadbalancer facet, a database facet, and a webnode facet).

In the example above, the master serves out a home directory over NFS, and runs the processes that distribute jobs to hadoop workers. In this small cluster, the master also has workers itself, and a utility role that helps initialize it out of the box.

There are 2 workers; they use the home directory served out by the master, and run the hadoop worker processes.

Lastly, we define what machines to use for this cluster. Instead of having to look up and type in an image ID, we just say we want the Ubuntu ‘Lucid’ distribution on a c1.medium machine. Cluster_chef understands that this means we need the 32-bit image in the us-east-1 region, and makes the cloud instance request accordingly. It also creates a ‘logmunchers’ security group, opening it so all the ‘webnode’ machines can push their server logs onto the HDFS.

The following commands launch each machine, and once ready, ssh in to install chef and converge all its software.

knife cluster launch demohadoop master --bootstrap
knife cluster launch demohadoop worker --bootstrap

You can also now launch the entire cluster at once with the following

knife cluster launch demohadoop --bootstrap

The cluster launch operation is idempotent: nodes that are running won’t be started!

Philosophy

Some general principles of how we use chef.

  • Chef server is never the repository of truth — it only mirrors the truth.
    – a file is tangible and immediate to access
  • Specifically, we want truth to live in the git repo, and be enforced by the chef server. There is no truth but git, and chef is its messenger.
    – this means that everything is versioned, documented and exchangeable.
  • Systems, services and significant modifications cluster should be obvious from the `clusters` file. I don’t want to have to bounce around nine different files to find out which thing installed a redis:server.
    – basically, the existence of anything that opens a port should be obvious when I look at the cluster file.
  • Roles define systems, clusters assemble systems into a machine.
    – For example, a resque worker queue has a redis, a webserver and some config files — your cluster should invoke a whatever_queue role, and the whatever_queue role should include recipes for the component services.
    – the existence of anything that opens a port or runs as a service should be obvious when I look at the roles file.
  • include_recipe considered harmful Do NOT use include_recipe for anything that a) provides a service, b) launches a daemon or c) is interesting in any way. (so: include_recipe java yes; include_recipe iptables no.) You should note the dependency in the metadata.rb. This seems weird, but the breaking behavior is purposeful: it makes you explicitly state all dependencies.
  • It’s nice when machines are in full control of their destiny.
    – initial setup (elastic IP, attaching a drive) is often best enforced externally
    – but machines should be ablt independently assert things like load balancer registration that that might change at any point in the lifetime.
  • It’s even nicer, though, to have full idempotency from the command line: I can at any time push truth from the git repo to the chef server and know that it will take hold.

Getting Started

This assumes you have installed chef, have a working chef server, and have an AWS account. If you can run knife and use the web browser to see your EC2 console, you can start here. If not, see the instructions below.

Setup

Install these gems (you may have several installed already):

sudo gem install chef fog highline configliere net-ssh-multi formatador terminal-table gorillib extlib

(and maybe right_aws and broham?)

Add cluster_chef to your cookbooks repo

This assumes the same setup as described in the knife quickstart walkthrough, but to keep things convenient we’ll use these shortcuts to refer

CHEF_REPO_DIR=$HOME/PATH/TO/chef-repo      # has cookbooks/ and site-cookbooks/
DOT_CHEF_DIR=$HOME/PATH/TO/chef-repo/.chef # has your knife.rb, USERNAME.pem etc

I recommend adding cluster chef as a git submodule; you can instead clone it elsewhere.

cd $CHEF_REPO_DIR
git submodule add git@github.com:infochimps/cluster_chef.git cluster_chef

Knife setup

In your $DOT_CHEF_DIR/knife.rb, modify the cookbook path (to include cluster_chef/cookbooks and cluster_chef/site-cookbooks) and to add settings for cluster_chef_path, cluster_path and keypair_path. Here’s mine:

current_dir = File.dirname(__FILE__)
organization  = 'CHEF_ORGANIZATION'
username      = 'CHEF_USERNAME'

# The full path to your cluster_chef installation
cluster_chef_path File.expand_path("#{current_dir}/../cluster_chef")
# The list of paths holding clusters
cluster_path      [ File.expand_path("#{current_dir}/../clusters") ]
# The directory holding your cloud keypairs
keypair_path      File.expand_path(current_dir)

log_level                :info
log_location             STDOUT
node_name                username
client_key               "#{keypair_path}/#{username}.pem"
validation_client_name   "#{organization}-validator"
validation_key           "#{keypair_path}/#{organization}-validator.pem"
chef_server_url          "https://api.opscode.com/organizations/#{organization}"
cache_type               'BasicFile'
cache_options( :path => "#{ENV['HOME']}/.chef/checksums" )

# The first things have lowest priority (so, site-cookbooks gets to win)
cookbook_path            [
  "#{cluster_chef_path}/cookbooks",
  "#{current_dir}/../cookbooks",
  "#{current_dir}/../site-cookbooks",
]

# If you primarily use AWS cloud services:
knife[:ssh_address_attribute] = 'cloud.public_hostname'
knife[:ssh_user] = 'ubuntu'

# Configure bootstrapping
knife[:bootstrap_runs_chef_client] = true
bootstrap_chef_version   "~> 0.10.0"

# AWS access credentials
knife[:aws_access_key_id]      = "XXXXXXXXXXX"
knife[:aws_secret_access_key]  = "XXXXXXXXXXXXX"

Push to chef server

To send all the cookbooks and role to the chef server, visit your cluster_chef directory and run:

cd $CHEF_REPO_DIR
mkdir -p $CHEF_REPO_DIR/site-cookbooks
knife cookbook upload --all
for foo in roles/*.rb ; do knife role from file $foo & sleep 1 ; done

You should see all the cookbooks defined in cluster_chef/cookbooks (ant, apt, …) listed among those it uploads.

Cluster chef’s Knife plugins

mkdir -p $DOT_CHEF_DIR/plugins
ln -s $CHEF_REPO_DIR/cluster_chef/lib/cluster_chef/knife           $DOT_CHEF_DIR/plugins/
ln -s $CHEF_REPO_DIR/cluster_chef/lib/cluster_chef/knife/bootstrap $DOT_CHEF_DIR/

If you type knife cluster launch you should see it found the new scripts:

    ** CLUSTER COMMANDS **
    knife cluster launch CLUSTER_NAME [FACET_NAME] (options)
    knife cluster show CLUSTER_NAME [FACET_NAME] (options)

Your first cluster

Let’s create a cluster called ‘demosimple’. It’s, well, a simple demo cluster.

Create a simple demo cluster

Create a directory for your clusters; copy the demosimple cluster and its associated roles from cluster_chef:

mkdir -p $CHEF_REPO_DIR/clusters
cp cluster_chef/clusters/{defaults,demosimple}.rb ./clusters/
cp cluster_chef/roles/{big_package,nfs_*,ssh,base_role,chef_client}.rb  ./roles/
for foo in roles/*.rb ; do knife role from file $foo ; done

Symlink in the cookbooks you’ll need, and upload them to your chef server:

cd $CHEF_REPO_DIR/cookbooks
ln -s ../cluster_chef/site-cookbooks/{nfs,big_package,cluster_chef,cluster_service_discovery,firewall,motd} .
knife cookbook upload nfs big_package cluster_chef cluster_service_discovery firewall motd

AWS credentials

Make a cloud keypair, a secure key for communication with Amazon AWS cloud. cluster_chef expects a keypair named after its cluster — this is a best practice that helps keep your environments distinct.

  1. Log in to the AWS console and create a new keypair named demosimple. Your browser will download the private key file.
  2. Move the private key file you just downloaded to your .chef dir, and make it private key unsnoopable, or ssh will complain:
mv ~/downloads/demosimple.pem $DOT_CHEF_DIR/demosimple.pem
chmod 600 $DOT_CHEF_DIR/*.pem

Cluster chef knife commands

knife cluster launch

Hooray! You’re ready to launch a cluster:

    knife cluster launch demosimple homebase --bootstrap

It will kick off a node and then bootstrap it. You’ll see it install a whole bunch of things. Yay.

Extended Installation Notes

Set up Knife on your local machine, and a Chef Server in the cloud

If you already have a working chef installation you can skip this section.

To get started with knife and chef, follow the Chef Quickstart, We use the hosted chef service and are very happy, but there are instructions on the wiki to set up a chef server too. Stop when you get to “Bootstrap the Ubuntu system” — cluster chef is going to make that much easier.

Cloud setup

If you can use the normal knife bootstrap commands to launch a machine, you can skip this step.

Steps:

Right now cluster chef works well with AWS. If you’re interested in modifying it to work with other cloud providers, see here or get in touch.

Tips and Troubleshooting

Help a git deploy recipe has gone limp

Suppose you are using the git resource to deploy a recipe (george for sake of example). If /var/chef/cache/revision_deploys/var/www/george exists then nothing will get deployed, even if /var/www/george/{release_sha} is empty or screwy. If git deploy is acting up in any way, nuke that cache from orbit — it’s the only way to be sure.

$ sudo rm -rf /var/www/george/{release_sha} /var/chef/cache/revision_deploys/var/www/george
Hadoop namenode bootstrap

Once the master runs to completion with all daemons started, remove the hadoop_initial_bootstrap recipe from its run_list. (Note that you may have to edit the runlist on the machine itself depending on how you bootstrapped the node).

Problems starting NFS server on ubuntu maverick

For problems starting NFS server on ubuntu maverick systems, read, understand and then run /tmp/fix_nfs_on_maverick_amis.sh — See this thread for more

Setup for Knife versions before 0.10

On older versions of chef that don’t have a plugin mechanism for new commands, we have to do surgery on the knife itself… we’ll just symlink the new commands into chef’s lib/chef/knife directory, and symlink the bootstrap templates into chef’s lib/chef/knife/bootstrap directory. Set the path to your cluster_chef directory and run the following:

    sudo ln -s $CLUSTER_CHEF_DIR/lib/cluster_chef/knife/*.rb            $(dirname `gem which chef`)/chef/knife/
    sudo ln -s $CLUSTER_CHEF_DIR/lib/cluster_chef/knife/bootstrap/*     $(dirname `gem which chef`)/chef/knife/bootstrap/
Something went wrong with that request. Please try again.