pulled in checklist and TODO from version_3

commit 2242fb11ad4aaf2ec7b0e0a0a47ed0bce148d57d 1 parent c727edf
Philip (flip) Kromer authored
Showing with 158 additions and 52 deletions.
  1. +36 −0 README-checklist.md
  2. +122 −52 TODO.md
36 README-checklist.md
@@ -0,0 +1,36 @@
+# Checklist for cookbooks, clusters and roles
+
+## Cookbooks
+
+* Dependencies are in metadata.rb, and include_recipe in the `default` recipe
+ - especially: `runit`, `java`, `cluster_service_discovery`, `thrift`, `apt`
+ - **include_recipe** is only used if putting it in the role would be utterly uninteresting. You *want* the run to break unless it's explicitly included in the role.
+ - *yes*: `java`, `ruby`, `cluster_service_discovery`, etc.
+ - *no*: `zookeeper:client`, `nfs:server`, or anything that will start a daemon
+
+* (*see TODO*) Does `cluster_service_discovery` uniformly handle referring to a foreign cluster for the service?
+
+#### Recipes
+
+* Naming:
+ - foo/default -- information shared by anyone using foo, including support packages, directories
+ - foo/client -- configure me as a foo client
+ - foo/server -- configure me as a foo server
+ - foo/aws_config -- cloud-specific settings
+
+* Recipes shouldn't repeat their service name: `hbase:master` and not `hbase:hbase_master`; `zookeeper:server` not `zookeeper:zookeeper_server`.
+
+#### Attributes
+
+* Attribute file named ???? (which is the preferred name?)
+
+
+## Cluster
+
+* roles and recipes
+ - remove `cluster_role` and `facet_role` if empty
+ - are not in `run_list`, but populated by the `role` and `recipe` directives
+*
+
+
+
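
The recipe-naming rule in the checklist above (`hbase:master`, not `hbase:hbase_master`) is mechanical enough to check automatically. The following is a minimal sketch of such a check, not something this commit contains: `redundant_recipe_name?` and `suggested_recipe_name` are hypothetical helper names, and the check assumes the redundancy always takes the form of a `cookbook_` prefix on the recipe name.

```ruby
# Hypothetical lint helpers; neither is part of cluster_chef.

# True when a recipe name needlessly repeats its cookbook name as a
# prefix, e.g. cookbook "hbase" with recipe "hbase_master".
def redundant_recipe_name?(cookbook, recipe)
  recipe.start_with?("#{cookbook}_")
end

# Returns the shorter name the checklist asks for; names that are
# already clean come back unchanged.
def suggested_recipe_name(cookbook, recipe)
  return recipe unless redundant_recipe_name?(cookbook, recipe)
  recipe.sub(/\A#{Regexp.escape(cookbook)}_/, '')
end

# Example: report offenders among a few cookbook/recipe pairs.
[%w[hbase hbase_master], %w[zookeeper server]].each do |cookbook, recipe|
  next unless redundant_recipe_name?(cookbook, recipe)
  puts "#{cookbook}:#{recipe} should be #{cookbook}:#{suggested_recipe_name(cookbook, recipe)}"
end
```

A check like this could run over the `recipes/` directory of each cookbook, using the directory name as the cookbook and each file's basename as the recipe.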
174 TODO.md
@@ -1,52 +1,122 @@
-First, about the stale recipe: the initial minimal DNA for a node is *not* in the machine's user-data, but in `/etc/chef/client-config.json` file
-
-On startup:
-
-* knife reads the cluster definition
-* it puts a minimal amount of DNA into the node's user-data hash, passed along via the EC2 command.
-* when the node starts up, it does the following:
- - makes an empty config hash.
- - If the node's `user-data` is valid JSON, merge it into the config hash.
- - if there is a file called chef-config.json, merge its contents into the config hash
- - if there is a file called client-config.json, merge it into the `:attributes ` sub-hash of the config.
-* The outside part of the config (not the attributes field) has the info required for the node to discover its purpose and connect to the chef server (server url, validation_key, etc)
-* If the `client-config.json` file is missing, it is created using the attributes subhash. Basically, this means that you can use that file to override anything set in the (immutable) user-data. Not the best thing, but it does work.
-* The elements in the attributes field are passed to `json_attribs` to become chef node attributes, and **win out over anything in the git repo**. This is bad.
-
-## Proposal
-
-Using a hadoop cluster called 'gibbon' (with namenode, jobtracker and workers) as an example:
-
-### in `clusters/gibbon.rb`:
-
-* physical configuration:
- - machine size, number of instances per facet, etc
- - external assets (elastic IP, ebs volumes)
-* high-level assembly of roles and systems:
- - roles hadoop_namenode, nfs_client, flume_client, etc.
-* important modifications:
- - cluster_chef::system_internals, mounts ebs volumes, etc
-* override attributes:
- - heap size, rvm versions, etc.
-
-### implement a `knife cluster sync` command
-
- - pushes override attributes to the chef server
- - pushes the runlist to the chef server
-* knife cluster bootstrap and knife cluster launch both invoke knife cluster sync.
-
-### in `roles/`
-
-* High-level roles that assemble recipes.
-* Cluster and facet roles (`roles/gibbon_cluster.rb`, `roles/gibbon_namenode.rb`, etc) go away; override attributes go into the cluster.
- - currently, those files are typically empty and are badly cluttering the roles/ directory.
- - the cluster and facet override attributes should be together, not scattered in different files.
-* roles shouldn't assemble systems. The contents of the `infochimps_chef/roles/plato_truth.rb` file belong in a *facet*.
-
-### in the machine's user-data:
-
-* the user-data should only have the minimal information required to join the chef server.
- - remove the run_list and aws fields
- - keep the cluster/facet/index identification
- - unfortunately, also have to keep the validation info
-* Passing information in through the user-data is essential to be able to launch 30+ node clusters reliably. External surgery is problematic.
+### seamless stop-start support
+
+* chef-client on bootup
+ - when you stop/start machines their IP address changes, so must reconverge
+
+* create chef node for me
+
+* chef needs to converge twice on hadoop master
+
+* dirs are fucked up under natty because paths are /dev/xvdi not /dev/sdi
+
+### cluster_service_discovery
+
+* should let me concisely refer to another cluster for a service (or use the current server)
+* Wait for service to announce
+
+### cluster_chef DSL
+
+* `role` and `recipe`
+ - inject into the run_list directly
+ - `cluster_role_implication`
+ - clean up `first_boot.json`
+
+
+### Minor quibbles
+
+* NFS server bootstrapping
+ - need to upgrade kernel, restart
+
+* A 'safety catch' -- see https://github.com/infochimps/cluster_chef/issues/18#issuecomment-1194916
+
+* `use defaults`
+* `ephemeral drives` cleanup
+
+* Fog routines should use the cluster's region always -- https://github.com/infochimps/cluster_chef/issues/54
+
+* ebs volumes shouldn't complain if data_bag missing
+
+### Concern Separation
+
+Cluster chef currently consists of the following separate(able) concerns:
+
+* **cluster_chef tools**
+ - the DSL that lets you define clusters
+ - the knife commands which use that DSL
+ - optional bootstrap scripts for a machine image that can then launch bootstrap-less
+* **cluster-oriented cookbooks**
+ - `cluster_service_discovery` (recipes to let clusters discover services using chef-server's search)
+ - ?others?
+* **cloud utility cookbooks**
+ - motd, system_internals (swappiness, ulimit, etc)
+* **big data cookbooks** (hadoop, cassandra, redis, etc):
+ - cookbooks
+ - roles
+ - clusters
+
+I think it's time to separate those into at least two repos.
+
+REQUEST FOR COMMENTS:
+
+#### Division of concerns
+
+It's clear that the cluster_chef tools and the big data cookbooks should be divorced.
+
+Proposed:
+
+* `cluster_chef` holds only the DSL, knife commands, and bootstrap scripts -- basically the stuff in `lib/`, along with the gemspec etc.
+* `cluster_chef-systems` -- holds cookbooks, roles and example clusters that use them.
+ - Utility cookbooks (`cluster_service_discovery`, motd, etc) and system cookbooks (hadoop, cassandra, etc) are housed in two separate folders.
+ - The standard layout would just include the cookbooks, but a cluster-oriented approach demands that the roles travel along too
+* (possibly) `cluster_chef-chef-repo` (??better name, anyone??) -- a fork of https://github.com/opscode/chef-repo that integrates the above
+
+#### Handling of cookbooks that originate from opscode-cookbooks
+
+Right now we *copy* standard cookbooks from opscode's repo into the `cookbooks` directory. This lets us version them separately, but means we have to track them, and could cause conflicts with the majority of people who will be pulling from opscode-cookbooks already.
+
+1. omit entirely, but list as dependencies (my vote)
+2. `git subtree` pull them into the cluster_chef-cookbooks repo
+3. copy them in as we've been doing
+4. `git submodule` opscode-cookbooks and symlink
+
+#### Organization of cookbooks repo
+
+Opscode recommends a [standard layout for your chef repo](https://github.com/opscode/chef-repo). We should make the new arrangement work seamlessly within that structure.
+
+The new layout should be:
+* easy to integrate if you have your own existing chef-repo
+* straightforward for a new chef user to adopt
+* either a mirror of, or exactly, what we actually use
+
+Revised proposal:
+
+```
+ clusters/
+ { actual clusters }
+ roles/
+ { roles }
+ { symlinks to things in vendor/cluster_chef/roles }
+
+ site-cookbooks/
+ { directories holding internal cookbooks }
+
+ cookbooks/
+ { symlinks into vendor/opscode-cookbooks }
+ { symlinks into vendor/cluster_chef-systems/site-cookbooks }
+
+ vendor/
+ opscode/cookbooks/ # git submodule of https://github.com/opscode/cookbooks
+ cluster_chef-systems/ # git submodule of https://github.com/infochimps/cluster_chef-systems
+ site-cookbooks/ # hadoop, cassandra, cluster_service_discovery, etc.
+ roles/
+ examples/
+ clusters/ # example clusters
+ roles/ # roles (if any) needed for just the example clusters
+
+ .chef/ # knife config, keypairs, etc
+ certificates/
+ config/
+ data_bags/
+ environments/
+```
+
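
The revised layout above fills `cookbooks/` with symlinks into `vendor/`. One way that directory might be populated is sketched below. This is an illustration only: `link_vendored_cookbooks` is a hypothetical helper, not an existing cluster_chef or knife command, and it writes absolute symlinks for simplicity, whereas a repo meant to be cloned elsewhere would likely want relative ones.

```ruby
require 'fileutils'

# Hypothetical helper: link every cookbook directory found directly
# under `vendor_dir` into `cookbooks_dir`. Plain files are ignored and
# names that already exist in `cookbooks_dir` are left alone, so local
# cookbooks are never overwritten. Returns the newly linked names.
def link_vendored_cookbooks(vendor_dir, cookbooks_dir)
  FileUtils.mkdir_p(cookbooks_dir)
  linked = []
  Dir.children(vendor_dir).sort.each do |name|
    source = File.expand_path(File.join(vendor_dir, name))
    target = File.join(cookbooks_dir, name)
    next unless File.directory?(source)
    next if File.exist?(target) || File.symlink?(target)
    FileUtils.ln_s(source, target) # absolute link; an assumption of this sketch
    linked << name
  end
  linked
end
```

Under the proposed layout it would be called once per vendored source, for example `link_vendored_cookbooks('vendor/cluster_chef-systems/site-cookbooks', 'cookbooks')`. Because existing names are skipped, running it again is harmless.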