Big Data Compute Cluster using Chef, Hadoop and Cassandra in the Amazon AWS/EC2 cloud


ClusterChef will help you create a scalable, efficient compute cluster in the cloud. It has recipes for Hadoop, Cassandra, NFS and more; use as many or as few as you like. For example, you can create:

  • A small 1-5 node cluster for development or just to play around with Hadoop or Cassandra
  • A spot-priced, EBS-backed cluster for unattended computing at rock-bottom prices
  • A large 30+ machine cluster with multiple EBS volumes per node running Hadoop and Cassandra, with optional NFS for


  • With Chef, you declare a final state for each node, not a procedure to follow. Administration is more efficient, robust and maintainable.
  • You get a nice central dashboard to manage clients
  • You can easily roll out configuration changes across all your machines
  • Chef is actively developed and has well-written recipes for webservers, databases, development tools, and a ton of different software packages.
  • Poolparty makes creating Amazon cloud machines concise and easy: you can specify spot instances, EBS-backed volumes, disable-api-termination, and more.
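
To make the "declare a final state, not a procedure" point concrete, here is a minimal sketch in plain Ruby. It is a toy model and not Chef's API: the names DESIRED and converge, and the package list, are hypothetical and chosen only for illustration. The idea it shows is that a converge step compares the current state with the desired state and acts only on the differences, so running it a second time changes nothing.

```ruby
# Toy model of declarative configuration. This is NOT Chef's API;
# DESIRED and converge are hypothetical names used for illustration.

# Desired final state for a node: package name => wanted status.
DESIRED = {
  "hadoop"    => :installed,
  "cassandra" => :installed,
  "telnetd"   => :removed
}.freeze

# Compare the node's current state with the desired state and change
# only what differs. Returns the list of actions that were taken.
def converge(current, desired)
  actions = []
  desired.each do |pkg, want|
    have = current.fetch(pkg, :removed)   # unknown packages count as removed
    next if have == want                  # already correct: do nothing
    current[pkg] = want                   # stand-in for a real install/remove
    verb = want == :installed ? "install" : "remove"
    actions << "#{verb} #{pkg}"
  end
  actions
end

node = { "hadoop" => :installed, "telnetd" => :installed }
first_run  = converge(node, DESIRED)   # acts on the two differences
second_run = converge(node, DESIRED)   # nothing left to do

puts first_run.inspect
puts second_run.inspect
```

The second run returns an empty list because the node already matches the declared state, which is the property that makes rolling the same configuration out repeatedly across many machines safe.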



Chef Concepts

In Chef:

  • A Recipe gives concrete steps that make a node achieve its desired final configuration. For example, the hadoop_cluster cookbook has a recipe to install the hadoop packages, and another to configure and run the namenode. If the cookbook isn’t installed,
  • A Cookbook holds a collection of related recipes and attributes, and the templates, libraries &c that support them.
  • A Role is a collection of related recipes and default attributes that work together. For example, there is a ‘hadoop_worker’
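
The relationship between the three concepts above can be sketched in plain Ruby. This is a toy model, not Chef's implementation: the contents of the roles and the recipe names (hadoop_cluster::datanode, nfs::client, the "base" role) are hypothetical, and only the role[...] / recipe[...] run-list notation and the cookbook::recipe naming are taken from Chef itself. The sketch expands a run list, following nested roles, into the ordered list of recipes a node would run and the cookbooks those recipes come from.

```ruby
# Toy model of roles, cookbooks and recipes. NOT Chef's implementation;
# the role contents and recipe names below are hypothetical.

ROLES = {
  "hadoop_worker" => ["role[base]", "recipe[hadoop_cluster::datanode]"],
  "base"          => ["recipe[nfs::client]"]
}.freeze

# Expand a run list into a flat, ordered, de-duplicated list of
# "cookbook::recipe" names, following nested roles.
def expand(run_list, roles, seen = [])
  run_list.each do |item|
    match = item.match(/\A(role|recipe)\[(.+)\]\z/)
    raise ArgumentError, "bad run list item: #{item}" unless match
    kind, name = match.captures
    if kind == "role"
      expand(roles.fetch(name), roles, seen)   # a role is itself a run list
    elsif !seen.include?(name)
      seen << name                             # keep first occurrence only
    end
  end
  seen
end

run_list  = ["role[hadoop_worker]", "recipe[hadoop_cluster::datanode]"]
recipes   = expand(run_list, ROLES)
cookbooks = recipes.map { |r| r.split("::").first }.uniq

puts recipes.inspect
puts cookbooks.inspect
```

The recipe listed twice is run only once, and the cookbook list is derived from the recipe names, which reflects the split described above: roles group recipes, and cookbooks are where those recipes live.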

