cluster_chef/TODO.textile at master · gwho/cluster_chef · GitHub

Top Priorities

Role discovery sucks right now — broham = weak, too much setup, ick
- Make broham a webservice in front of pluggable backend?
- Get people’s blessing that cluster_service_discovery is a good impl.
Documentation: moar plz.
Videocast: a visual walkthrough would help adoption
Bootstrap: bless a minimal bootstrap method
- Should we use packages or scripts?
ChefHadoopNfsMaster: smoother instantiation of a standalone (starter) cluster.
- chef server walkthrough could use work

Cloud / Local

EBS volume creation: create, attach, format, populate databag
- Make sure
Local cluster: separate out cloud code to work with local code
Rackspace & other clouds: Abstract out EC2-specific codes
JClouds
Poolparty
- The scripts are insufficently DRY — bring some of this into poolparty trunk?
- Complain if the validation.pem (etc) is missing
bugs:
- this thing with the [defunct] processes

Hadoop

Tuning for generic node — is EC2-specific tuning necessary?
Admin tasks
- Make namenode formatting safe
- recipe to do namenode metadata backup?
EBS attachment = still shaky
Secondary Namenode on non-master: Right now it’s really hard to do something like ‘make one slave also be the 2NN’. This should be easy
Patched version: this is still hard
If I set a hostname (‘gibbon.infochimps.com’) on a

Azkaban

runit script flapping

Zookeeper

Need scripts
- cluster discovery integration?

Cassandra

runit script flapping
Newer point version
cassandra shouldn’t start before the /etc/cassandra is present
Tuning for cluster size

HBase

Need scripts
- cluster discovery

Ponies

Monitoring, etc:
- Scribe
- Ganglia
Elephant Friends
- Hive
- Mahout
Dynamic DNS

Plays well with others

Whirr: integrate &or play nice with Whirr
Opscode cookbooks acceptance: Clean up and document, metadata.rb, etc so that we can put the cookbooks in for chef adoption.
Performance qualification: run some chimpmark tasks across clusters and report on completion times