Burndown list for cluster_chef 3 release #102

mrflip opened this Issue Feb 8, 2012 · 2 comments



mrflip commented Feb 8, 2012

Hooray! There is a burndown list.
Yikes! It's long. Luckily we are badasses. Comments and assistance welcome.

Must Do

  • sync infochimps-labs/opscode-cookbooks with opscode/cookbooks (Flip)
  • merge volumes into silverware. merge ebs_volumes into ec2 cookbook (Flip)
  • Basic CI testing of cookbooks (Flip)
  • RSpecs for silverback (lib and knife tools) (Flip)
  • RSpecs for silverware are mostly in place -- ensure they are. (Flip)
  • RAID is having problems (race conditions in converge/configure with mounting) (Flip)
  • version bump all cookbooks (Nathan)
  • push cookbooks to community.opscode.com (Flip)
  • refactor homebase setup into easier-to-maintain structure (Nathan) -- core structure is complete, but still need to do some work to refine and explain the updated git workflow
  • triage remaining "Piddly Shit" into easy/important quadrants, handle in order of importance & easiness (Nathan)



Use the opscode EC2 fast start as a guide -- our getting started should start at the same place, and cover the same detail as the EC2 bootstrap guide.

  • Clean up README file in homebase, silverware and cluster_chef
  • Clear description of metadiscovery
  • organize homebase/notes (and decide where it goes). There’s some cruft in there that should leave
  • make sure README files in cookbooks aren’t wildly inaccurate
  • Carry out setup directions, ensure they work:
    • cluster_chef if you’re using our homebase
    • cluster_chef if you’re using opscode’s homebase
  • local vagrant environment
  • hadoop cluster bootstrapping

Piddly Shit

  • standardize the zabbix cookbook (no more /opt, etc -- more in the TODO)
  • volumes don't deep merge -- eg you have to mount_ephemerals in the facet if you modify them
  • kill_old_service should disable services (may be leaving /etc/rc.d cruft).
  • kill_old_service doesn't take effect the first time. Why?
  • in something somewhere: “WARN: Missing gem ‘right_aws’, ‘fog’, ‘rvm’” -- couldn't reproduce
  • chef client/server cookbook: set chef user UID / GID; client can set log directory
  • apt has a dashboard at http://{hostname}:3142/report
  • ironfan#98 flume master should not announce as zookeeper
  • ironfan#93 not quite so opinionated about keypairs
  • ironfan#91 probably fixed -- regions
  • ironfan#86 patch available -- security groups with mixed case
  • ironfan#81 probably fixed -- cassandra cookbook
  • ironfan#76 knife cluster kick should work even if service not running
  • ironfan#67 probably fixed
  • ironfan#10 cookbook checklist
  • can use knife ssh as me@ or as ubuntu@
  • knife command to set/remove permanent on a node + disableApiTermination on box:
    • knife cluster kill refuses to delete nodes with permanent set.
    • knife cluster sync sets permanent on if permanent(true), removes it if permanent(false), and ignores it if permanent is nil or unset.
  • rip out cookbook_munger -- no instances found live, several README.md files contain references
  • rip out repo_man -- no instances found in live code
  • check pull requests and execute if valid
  • style-guide alignment (prefix_root becomes prefix)
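
The permanent behavior described in the list above can be sketched as a small tri-state rule. This is only an illustration: `sync_permanent` and `killable?` are hypothetical names, not cluster_chef or knife APIs, and the EC2 disableApiTermination side is not modeled.

```ruby
# Hypothetical sketch; none of these names come from cluster_chef itself.
# `desired` is the value given to permanent(...) in the cluster definition:
# true, false, or nil when it was never set.
def sync_permanent(node_permanent, desired)
  return node_permanent if desired.nil? # nil or unset: leave the node as it is
  desired ? true : false                # true sets the flag, false removes it
end

# A kill may delete the node only when the permanent flag is not set.
def killable?(node_permanent)
  !node_permanent
end
```

The point of keeping nil distinct from false is that a cluster definition which never mentions permanent does not silently strip protection from a node that was marked by hand.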


  • ironfan#101 fog can fail to create tags -- fixed in fog trunk, not cut into gem yet.
  • ironfan#87 probably fixed -- can create ebs volumes at launch with no snapshot -- not sure if fixed, but can be moved to Pony if not
  • ironfan#48 bootstrap with elastic_ip - probably a pony
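
For the "volumes don't deep merge" item, here is one plausible reading of the problem, sketched with plain hashes. The volume keys and the `deep_merge` helper are assumptions for illustration, not the actual cluster_chef data structures.

```ruby
# Recursively merge two hashes: where both sides hold a hash for the same
# key, merge them; otherwise the value from `override` wins.
def deep_merge(base, override)
  base.merge(override) do |_key, old, new|
    old.is_a?(Hash) && new.is_a?(Hash) ? deep_merge(old, new) : new
  end
end

# Hypothetical volume settings; the keys are illustrative only.
cluster_volumes = { ephemeral0: { mount_point: '/mnt', mountable: true } }
facet_volumes   = { ephemeral0: { mount_point: '/data' } }

# Shallow merge: the facet's hash replaces the cluster's hash wholesale,
# so the cluster-level `mountable` setting is lost.
shallow = cluster_volumes.merge(facet_volumes)

# Deep merge: only the overridden key changes; `mountable` survives.
deep = deep_merge(cluster_volumes, facet_volumes)
```

With a shallow merge, a facet that touches one volume setting has to restate everything else about that volume, which matches the symptom of having to repeat mount_ephemerals in the facet.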

Really Want

  • unify the hashlike underpinning to be the same across silverware & cluster_chef. Make sure we love (or accept) all the differences between it and Gorillib’s, and between it and Chef’s.
  • Route 53
  • Keys are transmitted in databags, using a helper, and not in node attributes
  • easy to create a dummy node (load balancer, external resource, etc)
  • components can have arbitrary attributes (kinda. they take an :info param, behavior which may change later)
  • All cookbooks have nice detailed announcements
  • full roll out of log_integration, monitoring
  • Rakefile becomes skinnier
  • Git deploy abstraction similar to install_from (Flip)
  • knife cluster [cmd] --cloud=vagrant (it's a spike, but it's working)

Cookbook checklist:


  • Validate all the cookbooks against checklist -- see notes/README-checklist.md

                      | flip fixed | temujin9 checked |
    cassandra         |            |                  |
    ec2               |            |                  |
    elasticsearch     |            |                  |
    firewall          |            |                  |
    flume             |            |                  |
    ganglia           |            |                  |
    graphite          |            |                  |
    hadoop_cluster    |            |                  |
    hbase             |            |                  |
    hive              |            |                  |
    jenkins           |            |                  |
    jruby             |            |                  |
    nfs               |            |                  |
    nodejs            |            |                  |
    papertrail        |            |                  |
    pig               |            |                  |
    redis             |            |                  |
    resque            |            |                  |
    Rstats            |            |                  |
    statsd            |            |                  |
    zookeeper         |            |                  |
    # meta:
    install_from      |            |                  |
    motd              |            |                  |
    mountable_volumes |            |                  |
    provides_service  |            |                  |
    # Need thinkin':
    big_package       |            |                  |
    cluster_chef      |            |                  |

Things that are probably straightforward to fix as soon as we know how

  • announcements should probably be published very early, but they need to know lots about the machine (YUK)
  • split between clusters / roles / integration cookbooks
  • inheritance of clusters

Things We Hate But Might Have to Continue Hating

  • Cluster refactor -- clusters / stacks / components, not clusters / roles / cookbooks
  • move cluster discovery to cloud class.
  • Server#normalize! doesn’t imprint object (ie. server attributes poke through to the facet & cluster, rather than being set on the object)
  • The fact you can only see one cluster at a time is stupid.
  • security group pairing is sucky.
  • ubuntu home drive bullshit
  • Finer-grained security group control (eg nfs server only opens a couple ports, not all)
  • nfs recipe uses discovery right (thus allowing more than one NFS share to exist in the universe)


  • sync cookbooks up/down to infochimps-cookbooks/ 
      - note: infochimps-cookbooks the org will be dereferenced in favor of ironfan-lib the single repo; it's unclear which pull requesters will prefer. We will do at least one push so that names and URLs are current, and we're not removing anything, but infochimps-cookbooks has an unclear future.
  • foodcritic compatibility
  • build out cookbook munger, make it less spike-y
  • spot pricing
  • rackspace compatibility
  • cookbook munger reads comments in attributes file to populate metadata.rb
  • gem install ironfan; ironfan install checks everything out

aseever was assigned Feb 8, 2012

Tweaked a couple of things, marked an item completed, and added tags for likely breakdown of tasks.

@mrflip: what do you mean by "push cookbooks to community server"? (I meant 'community.opscode.com' -- flip)


Version 3 has launched: closing this issue. Please raise any remaining sub-tasks as separate issues (not omnibus tracking issues), so they can be addressed more easily.

temujin9 closed this Apr 12, 2012