compute system provisioning automation with CoreOS and Docker
cloudconfig-writer @ feda1b0

this is the btrfs branch with the following change/feature:

  • the registry just can't use the winnfs share (a Windows problem?), so now it doesn't write to the local files. As a workaround, the registry uses a mounted filesystem created on your shared file folder. The downside is that you have to pick a size by setting VARLIBDOCKER_GB (whole GB, as an integer) in config/project.env. It turns out that any program that needs advanced access to the filesystem cannot use the NFS share, so I'm thinking of making this change part of the master branch.
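
As a rough illustration of the VARLIBDOCKER_GB setting, here is a minimal Python sketch that reads the value from env-file text and checks that it is a whole number of GB. It assumes config/project.env holds plain KEY=VALUE lines; the helper names `read_env` and `varlibdocker_gb` are hypothetical and not part of this repository.

```python
def read_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments.

    Assumption: the .env file uses plain KEY=VALUE lines.
    """
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            values[key.strip()] = value.strip().strip("\"'")
    return values


def varlibdocker_gb(env):
    """Return VARLIBDOCKER_GB as a positive integer, or raise ValueError."""
    raw = env.get("VARLIBDOCKER_GB", "")
    if not raw.isdecimal() or int(raw) == 0:
        raise ValueError("VARLIBDOCKER_GB must be a positive integer (GB)")
    return int(raw)


if __name__ == "__main__":
    sample = "# hypothetical contents of config/project.env\nVARLIBDOCKER_GB=20\n"
    print(varlibdocker_gb(read_env(sample)))
```

A fractional value such as 2.5 is rejected here, matching the "GB in integers" requirement.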

CoreOS-based personal compute cloud

formal writeup in appendix A of thesis

personal compute cloud using Ansible, CoreOS, Docker, Vagrant, Virtualbox, and weave.

git clone --recursive


because scientific computing (some explanation). Briefly, the goal is to cater to a workflow that starts with local development and seamlessly brings in more compute power on demand.

What it Does

Two types of machines are started to support the scientific computing workflow (using Docker). There is a local virtualized controller machine (called init) providing coordination and services, and there are compute machines that are more ephemeral. A local compute machine is brought up for 'development'. When a remote compute machine is acquired, it uses the same (Ansible) setup script, so the local compute machine is really a stand-in for a remote machine.


The controller and compute machines together provide:


  • global network addressing of docker containers across clouds (thanks to weave)
  • private docker registry accessible on all compute hosts (started on boot). The images in the registry persist across instantiations of the machines because they are stored on the local file system.
  • automatic building of Dockerfiles and pushing them to the registry (on boot)
  • global NFS fileshare .. no messing with sending and receiving files (it functions, though not perfectly; it seems fine for working with code)
  • automatic configuration of ssh access
  • CUDA installation (if machine has NVIDIA gpu)
  • Saving of compute machine state in EC2 or Vagrant for quick resumption of work.


  • Linux: duh. Windows users can use (plain) cygwin, but I prefer babun.
  • Ansible: tested with 1.9. works on windows with cygwin with setup/cygwin/ But as of 8/'15, you'll have to get my version of Ansible even on Linux until this gets figured out.
  • python-vagrant
  • Vagrant: Windows users should install vagrant-winnfsd (see setup/install-vagrant.bat). Kill the winnfsd.exe process if you have NFS mounting issues.



Project-level variables are located in .env files in the config/ folder. CoreOS-specific variables are in config/coreos. Ansible-specific variables are in their appropriate Ansible best practice location in ansible/. There is no immediate need for changing these variables as I tried to make everything as automatic and reasonable as possible.

Exceptions: You may want to remove the line control_path = /tmp in ansible/ansible.cfg, as it is a cygwin hack. Also, if you are having trouble with NFS mounting, NFS mount options can be overridden by specifying NFS_OPTS in config/coreos/global.env (by default an attempt is made to set them automatically). On a related note, NFS_SERVER in config/coreos/init.env is hard-coded to correspond with VAGRANT_INT_IP in ansible/library/vagrant. Change as needed.


So all you have to do is add your Dockerfiles in the docker/ folder, like docker/999-mybusybox. The build script only builds folders whose names start with a number followed by a hyphen, and it builds them in order. Make use of this behavior to satisfy Docker image dependencies.
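
The folder-naming rule can be sketched in Python as follows. This is not the repository's build script: the function name `ordered_build_dirs` is hypothetical, and the sketch assumes "in order" means ascending numeric order of the prefix (the real script may sort differently, for example lexically).

```python
import re

# A folder qualifies if its name starts with digits followed by a hyphen.
_BUILD_DIR = re.compile(r"^(\d+)-")


def ordered_build_dirs(names):
    """Return the qualifying folder names sorted by numeric prefix.

    Assumption: ordering is numeric, so 2-tools builds before 010-base.
    """
    numbered = []
    for name in names:
        match = _BUILD_DIR.match(name)
        if match:
            numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]


if __name__ == "__main__":
    # Hypothetical contents of the docker/ folder.
    folders = ["999-mybusybox", "010-base", "notes", "2-tools"]
    for folder in ordered_build_dirs(folders):
        print(folder)
```

Under this assumption, an image that others depend on gets a lower number than the images built from it.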


Run setup/ from within its directory.

Provider Inventory

Also, in ansible/inventory/ansible remove the Ansible dynamic inventory scripts for unused providers. But don't remove`.


cd ansible. Start the init machine: ansible-playbook init.yml. Now you can ssh init.

Compute Provisioning

Then acquire the machines with the provided Ansible playbooks, using any of the following providers.


Start machine: ansible-playbook vagrant.yml.

EC2 (suggested method)

Set up your EC2 account. Add the following, substituting your credentials, to config/.private


Start machine: ansible-playbook ec2.yml. To get a GPU machine: ansible-playbook ec2.yml -e type=gpu.

Compute Machine Setup

After getting the machines, set them up: ansible-playbook setup.yml -e hosts=ansiblepattern. ansiblepattern is usually going to be the provider name. You can also use any of the groups defined in ansible/inventory/ansible/hosts.
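
The provisioning and setup commands above can be assembled programmatically. The Python sketch below only builds argument lists from the commands shown in this README; the function names are hypothetical, and restricting `type=gpu` to EC2 is an assumption based on it being shown only for ec2.yml.

```python
PROVIDERS = ("vagrant", "ec2")  # providers named in this README


def provision_cmd(provider, gpu=False):
    """Build the ansible-playbook command that acquires a machine."""
    if provider not in PROVIDERS:
        raise ValueError("unknown provider: %s" % provider)
    if gpu and provider != "ec2":
        # Assumption: the README only shows type=gpu with ec2.yml.
        raise ValueError("type=gpu is only shown for the ec2 provider")
    cmd = ["ansible-playbook", "%s.yml" % provider]
    if gpu:
        cmd += ["-e", "type=gpu"]
    return cmd


def setup_cmd(pattern):
    """Build the setup command for hosts matching an Ansible pattern."""
    return ["ansible-playbook", "setup.yml", "-e", "hosts=%s" % pattern]


if __name__ == "__main__":
    # Print the commands; they would be run from within the ansible/ folder.
    for cmd in (provision_cmd("ec2", gpu=True), setup_cmd("ec2")):
        print(" ".join(cmd))
```

Each list could be passed to `subprocess.run(cmd, cwd="ansible", check=True)` to execute it from the ansible/ directory.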

After setup you can ssh ec2hostname or ssh vagrant because hosts are automatically added to ~/.ssh/config. Furthermore, hosts are aliased with a prefix made of a group name followed by a hyphen. So, ssh cpu-vagrant or ssh ec2-someec2hostt will work since there are groups for providers (e.g. vagrant or EC2) and compute type (cpu or gpu). EC2 machines have more groups than the ones defined in the hosts file, such as instance type and instance id. (Depending on your shell, you might be able to just hit tab after partially typing the ssh command to complete it.)
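
To make the alias scheme concrete, here is a small Python sketch that derives the aliases for a host and renders an ssh config entry. It illustrates the naming rule only and is not the code this project uses; the function names, the example address, and the `core` user are assumptions.

```python
def ssh_aliases(hostname, groups):
    """Return the plain hostname plus one group-hostname alias per group."""
    return [hostname] + ["%s-%s" % (group, hostname) for group in groups]


def ssh_config_entry(hostname, groups, address, user="core"):
    """Render an ssh config block that answers to every alias.

    Assumption: user defaults to "core"; the project may configure
    a different user or additional options.
    """
    lines = [
        "Host " + " ".join(ssh_aliases(hostname, groups)),
        "    HostName " + address,
        "    User " + user,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical host, groups, and address (documentation IP range).
    print(ssh_config_entry("someec2host", ["ec2", "gpu"], "203.0.113.10"))
```

With an entry like this, `ssh someec2host`, `ssh ec2-someec2host`, and `ssh gpu-someec2host` would all reach the same machine.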


  • Shortcut local machine setup: ansible/ Sets up init machine and a (local) vagrant compute machine.
  • ansible/ to decommission its hosts.
  • $REGISTRY_HOST is a variable on all machines to access the private docker registry like docker pull $REGISTRY_HOST/mybusybox. See note about setting up your dockerfiles in the Setup section.
  • Use the build script docker/ to iterate on your dockerfiles.
  • Make use of weave commands.
  • Make use of the file share on /project.
  • cd into ansible/.vagrant to issue vagrant commands on the local machine.
  • Use cuda docker image to build your CUDA application.
  • Clean out your old hosts by removing entries in the directory ~/.ssh/config.d/ and the ~/.ssh/config file (just delete them if you're feeling brave. todo: automate this)
  • Use ansible/ to save the state of its machines. Resume by running the corresponding provisioning and setup programs.
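
For the $REGISTRY_HOST tip above, a minimal Python sketch of building the image reference and the matching pull command. The function names are hypothetical, and the registry address used in the example is made up; on a real machine the value comes from the environment.

```python
import os


def registry_image(name, env=None):
    """Prefix an image name with the private registry host."""
    env = os.environ if env is None else env
    host = env.get("REGISTRY_HOST")
    if not host:
        raise KeyError("REGISTRY_HOST is not set")
    return "%s/%s" % (host, name)


def pull_cmd(name, env=None):
    """Build the docker pull command for an image in the private registry."""
    return ["docker", "pull", registry_image(name, env)]


if __name__ == "__main__":
    # Hypothetical registry address, used only for illustration.
    fake_env = {"REGISTRY_HOST": "registry.example:5000"}
    print(" ".join(pull_cmd("mybusybox", fake_env)))
```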


  • No claims are made as to the security (or lack thereof) of this setup. Convenience (in the form of simplicity and automation) takes priority over security measures.
  • fleet and etcd, part of CoreOS, have been disabled. I don't see a use for them for the intended workflow.
  • Given hardware-assisted virtualization (enabled in VirtualBox), performance should be close to bare-metal performance. Unfortunately, GPU passthrough (for the local compute machine) is not a simple matter (help!).