Broken zookeeper build on AWS multinode #24
Comments
Firstly, thank you for your recent contributions!
It looks like something went wrong. The AMIs I published yesterday should definitely have the zookeeper service and the 'zookeeper' user. Did you use
I was using AMI ami-1a389d72 without any changes (from the http://cloud-images.ubuntu.com/locator/ec2/ site referenced in the old aws_region_ami.yaml), since I was using t2.medium instances and needed HVM support in us-east-1 / VPC. I was manually changing aws_region_ami.yaml, which led me to the PR. My setup still works if I stay on my commits or earlier; if I reset to the current master HEAD, I get the above error.

So now I have reset my master to the current HEAD and modified my cluster config to NOT use the custom AMI and to use the default instance types:

But after multiple attempts it just hangs continuously:

The instance is Running in the AWS console, and the status checks claim all is well. Vagrant is hanging and I also can't manually ssh in. Everything else in my cluster config stayed the same (security groups etc.), except for changing the instance type to m1.small and not specifying a custom AMI.
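For anyone hitting the same hang, one quick way to tell a network-level block (security group, VPN, missing public IP) apart from an sshd or key problem is to test whether TCP port 22 is reachable at all. The following is only a minimal sketch: the function name `tcp_port_open` and the default timeout are illustrative assumptions and not part of vagrant-mesos or vagrant-aws.

```python
import socket


def tcp_port_open(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration only. A False result suggests the
    port is filtered, refused, or the host is unreachable; True means
    something is listening, so the problem is more likely keys or sshd.
    """
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and name resolution errors.
        return False


if __name__ == "__main__":
    # Replace with the instance address you are trying to reach.
    print(tcp_port_open("127.0.0.1", 22, timeout=2.0))
```

If this returns False from the machine running Vagrant while the AWS console shows the instance as healthy, the security group rules and the route to the instance are the first things to look at.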
Just as a sanity check, below is a diff between the cluster.yml file I was using before, which worked fine, and the modifications I made to stop using the custom AMI:
After re-reading your last comment and then looking at one of the commits about "pre-baked images": is it safe to say that previously things like zookeeper were built when deploying instances, and now with the "pre-baked" images they come already installed? So basically, if I want to use a non-"pre-baked" AMI, I'd have to build an AMI with services such as zookeeper already installed? I need to dig through the code a bit more but wanted to confirm my suspicion before getting too far in.
@chrisjs sorry for the inconvenience.
Yes, that's right. Please refer to
Basically, yes. The "pre-baked" images ship with the 4 projects below:
If you don't have such images, you can add the "mesos" recipe and its configuration JSON like:
If you need to install marathon and chronos, you can add installation scripts like this. I will add some explanation of the assumptions made about the box used in
OK, thanks. I'm running into a couple of other issues as well. For example, when rebooting the marathon/chronos instance, neither of those services starts again at boot, yet an unexpected zookeeper instance does (one that wasn't running when the instance was first created). This is using the codebase prior to the pre-baked images. It seems like either I'm using this the wrong way or there are other issues to work out. I'm going to close this issue for now as I'm not using the new pre-baked AMI. Thanks!
Apologies everpeace, I'm trying to modify your configuration to run on DigitalOcean, and frankly I am a newbie. I don't have an image with Zookeeper etc., so I'm running into a similar challenge. This might be a horribly stupid question, but I'd like to install Zookeeper via Salt/Chef as part of the config. Where would one add the above mesos JSON snippet in order to add that to the image/recipe?
It seems the commit made in master / 19c7c6d breaks the zookeeper build in AWS/multinode. This happens with and without PR 23 applied (it's just a simplification PR, but the error output is the same). I backtracked through the recent commits and this seems to be the culprit. Failure log:
Bringing machine 'zk1' up with 'aws' provider...
Bringing machine 'zk2' up with 'aws' provider...
Bringing machine 'zk3' up with 'aws' provider...
Bringing machine 'master1' up with 'aws' provider...
Bringing machine 'master2' up with 'aws' provider...
Bringing machine 'slave1' up with 'aws' provider...
Bringing machine 'slave2' up with 'aws' provider...
Bringing machine 'slave3' up with 'aws' provider...
Bringing machine 'slave4' up with 'aws' provider...
Bringing machine 'slave5' up with 'aws' provider...
Bringing machine 'marathon' up with 'aws' provider...
==> zk1: HandleBoxUrl middleware is deprecated. Use HandleBox instead.
==> zk1: This is a bug with the provider. Please contact the creator
==> zk1: of the provider you use to fix this.
Updating Vagrant's berkshelf: '/home/ec2-user/.berkshelf/zk1/vagrant/berkshelf-20140916-1909-v25yvz-zk1'
DEPRECATED: Your Berksfile contains a site location pointing to the Opscode Community Site (site :opscode). Site locations have been replaced by the source location. Change this to: 'source "https://supermarket.getchef.com"' to remove this warning. For more information visit https://github.com/berkshelf/berkshelf/wiki/deprecated-locations
Resolving cookbook dependencies...
Using 7-zip (1.0.2)
Using apt (2.6.0)
Using ark (0.9.0)
Using aufs (0.1.1)
Using build-essential (2.0.6)
Using chef_handler (1.1.6)
Using device-mapper (0.1.0)
Using dmg (2.2.0)
Using docker (0.35.2) from git://github.com/bflad/chef-docker.git (at master)
Using dpkg_autostart (0.1.10)
Using git (4.0.2)
Using golang (1.4.0)
Using homebrew (1.9.0)
Using iptables (0.14.0)
Using java (1.28.0)
Using lxc (1.1.8)
Using maven (1.1.0)
Using mesos (0.2.0) from git://github.com/everpeace/cookbook-mesos.git (at master)
Using modules (0.2.0)
Using ohai (2.0.1)
Using python (1.4.6)
Using runit (1.5.10)
Using sysctl (0.6.0)
Using ulimit (0.3.2)
Using windows (1.34.2)
Using yum (3.3.2)
Using yum-epel (0.5.1)
Vendoring 7-zip (1.0.2) to /home/ec2-user/.berkshelf/zk1/vagrant/berkshelf-20140916-1909-v25yvz-zk1/7-zip
Vendoring apt (2.6.0) to /home/ec2-user/.berkshelf/zk1/vagrant/berkshelf-20140916-1909-v25yvz-zk1/apt
Vendoring ark (0.9.0) to /home/ec2-user/.berkshelf/zk1/vagrant/berkshelf-20140916-1909-v25yvz-zk1/ark
Vendoring aufs (0.1.1) to /home/ec2-user/.berkshelf/zk1/vagrant/berkshelf-20140916-1909-v25yvz-zk1/aufs
Vendoring build-essential (2.0.6) to /home/ec2-user/.berkshelf/zk1/vagrant/berkshelf-20140916-1909-v25yvz-zk1/build-essential
Vendoring chef_handler (1.1.6) to /home/ec2-user/.berkshelf/zk1/vagrant/berkshelf-20140916-1909-v25yvz-zk1/chef_handler
Vendoring device-mapper (0.1.0) to /home/ec2-user/.berkshelf/zk1/vagrant/berkshelf-20140916-1909-v25yvz-zk1/device-mapper
Vendoring dmg (2.2.0) to /home/ec2-user/.berkshelf/zk1/vagrant/berkshelf-20140916-1909-v25yvz-zk1/dmg
Vendoring docker (0.35.2) to /home/ec2-user/.berkshelf/zk1/vagrant/berkshelf-20140916-1909-v25yvz-zk1/docker
Vendoring dpkg_autostart (0.1.10) to /home/ec2-user/.berkshelf/zk1/vagrant/berkshelf-20140916-1909-v25yvz-zk1/dpkg_autostart
Vendoring git (4.0.2) to /home/ec2-user/.berkshelf/zk1/vagrant/berkshelf-20140916-1909-v25yvz-zk1/git
Vendoring golang (1.4.0) to /home/ec2-user/.berkshelf/zk1/vagrant/berkshelf-20140916-1909-v25yvz-zk1/golang
Vendoring homebrew (1.9.0) to /home/ec2-user/.berkshelf/zk1/vagrant/berkshelf-20140916-1909-v25yvz-zk1/homebrew
Vendoring iptables (0.14.0) to /home/ec2-user/.berkshelf/zk1/vagrant/berkshelf-20140916-1909-v25yvz-zk1/iptables
Vendoring java (1.28.0) to /home/ec2-user/.berkshelf/zk1/vagrant/berkshelf-20140916-1909-v25yvz-zk1/java
Vendoring lxc (1.1.8) to /home/ec2-user/.berkshelf/zk1/vagrant/berkshelf-20140916-1909-v25yvz-zk1/lxc
Vendoring maven (1.1.0) to /home/ec2-user/.berkshelf/zk1/vagrant/berkshelf-20140916-1909-v25yvz-zk1/maven
Vendoring mesos (0.2.0) to /home/ec2-user/.berkshelf/zk1/vagrant/berkshelf-20140916-1909-v25yvz-zk1/mesos
Vendoring modules (0.2.0) to /home/ec2-user/.berkshelf/zk1/vagrant/berkshelf-20140916-1909-v25yvz-zk1/modules
Vendoring ohai (2.0.1) to /home/ec2-user/.berkshelf/zk1/vagrant/berkshelf-20140916-1909-v25yvz-zk1/ohai
Vendoring python (1.4.6) to /home/ec2-user/.berkshelf/zk1/vagrant/berkshelf-20140916-1909-v25yvz-zk1/python
Vendoring runit (1.5.10) to /home/ec2-user/.berkshelf/zk1/vagrant/berkshelf-20140916-1909-v25yvz-zk1/runit
Vendoring sysctl (0.6.0) to /home/ec2-user/.berkshelf/zk1/vagrant/berkshelf-20140916-1909-v25yvz-zk1/sysctl
Vendoring ulimit (0.3.2) to /home/ec2-user/.berkshelf/zk1/vagrant/berkshelf-20140916-1909-v25yvz-zk1/ulimit
Vendoring windows (1.34.2) to /home/ec2-user/.berkshelf/zk1/vagrant/berkshelf-20140916-1909-v25yvz-zk1/windows
Vendoring yum (3.3.2) to /home/ec2-user/.berkshelf/zk1/vagrant/berkshelf-20140916-1909-v25yvz-zk1/yum
Vendoring yum-epel (0.5.1) to /home/ec2-user/.berkshelf/zk1/vagrant/berkshelf-20140916-1909-v25yvz-zk1/yum-epel
==> zk1: Warning! The AWS provider doesn't support any of the Vagrant
==> zk1: high-level network configurations (`config.vm.network`). They
==> zk1: will be silently ignored.
==> zk1: Warning! You're launching this instance into a VPC without an
==> zk1: elastic IP. Please verify you're properly connected to a VPN so
==> zk1: you can access this machine, otherwise Vagrant will not be able
==> zk1: to SSH into it.
==> zk1: Launching an instance with the following settings...
==> zk1: -- Type: t2.medium
==> zk1: -- AMI: ami-1a389d72
==> zk1: -- Region: us-east-1
==> zk1: -- Keypair: XXX
==> zk1: -- Subnet ID: XXX
==> zk1: -- Private IP: 10.0.16.11
==> zk1: -- Security Groups: ["XXX"]
==> zk1: -- Block Device Mapping: []
==> zk1: -- Terminate On Shutdown: false
==> zk1: -- Monitoring: false
==> zk1: -- EBS optimized: false
==> zk1: -- Assigning a public IP address in a VPC: false
==> zk1: Warning! Vagrant might not be able to SSH into the instance.
==> zk1: Please check your security groups settings.
==> zk1: Waiting for instance to become "ready"...
==> zk1: Waiting for SSH to become available...
==> zk1: Machine is booted and ready for use!
==> zk1: Rsyncing folder: /home/ec2-user/vagrant-mesos/multinodes/ => /vagrant
==> zk1: Rsyncing folder: /home/ec2-user/.berkshelf/zk1/vagrant/berkshelf-20140916-1909-v25yvz-zk1/ => /tmp/vagrant-chef-3/chef-solo-1/cookbooks
==> zk1: Installing Chef 11.16.0 Omnibus package...
==> zk1: Downloading Chef 11.16.0 for ubuntu...
==> zk1: downloading https://www.getchef.com/chef/metadata?v=11.16.0&prerelease=false&nightlies=false&p=ubuntu&pv=14.04&m=x86_64
==> zk1: to file /tmp/install.sh.1493/metadata.txt
==> zk1: trying wget...
==> zk1: url https://opscode-omnibus-packages.s3.amazonaws.com/ubuntu/13.04/x86_64/chef_11.16.0-1_amd64.deb
==> zk1: md5 dd1a8f6e92bf4434a78c2abff5b8934b
==> zk1: sha256 06d22b6c35082377a3745999eb32589fdf547ae49e353a3d2280188b82a64e0a
==> zk1: downloaded metadata file looks valid...
==> zk1: downloading https://opscode-omnibus-packages.s3.amazonaws.com/ubuntu/13.04/x86_64/chef_11.16.0-1_amd64.deb
==> zk1: to file /tmp/vagrant-cache/vagrant_omnibus/chef_11.16.0-1_amd64.deb
==> zk1: trying wget...
==> zk1: Comparing checksum with sha256sum...
==> zk1: Installing Chef 11.16.0
==> zk1: installing with dpkg...
==> zk1: Selecting previously unselected package chef.
==> zk1: (Reading database ... 51097 files and directories currently installed.)
==> zk1: Preparing to unpack .../chef_11.16.0-1_amd64.deb ...
==> zk1: Unpacking chef (11.16.0-1) ...
==> zk1: Setting up chef (11.16.0-1) ...
==> zk1: Thank you for installing Chef!
==> zk1: Running provisioner: chef_solo...
Generating chef JSON and uploading...
==> zk1: Running chef-solo...
==> zk1: stdin: is not a tty
==> zk1: [2014-09-16T20:21:28+00:00] INFO: Forking chef instance to converge...
==> zk1: [2014-09-16T20:21:28+00:00] WARN:
==> zk1: * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
==> zk1: SSL validation of HTTPS requests is disabled. HTTPS connections are still
==> zk1: encrypted, but chef is not able to detect forged replies or man in the middle
==> zk1: attacks.
==> zk1:
==> zk1: To fix this issue add an entry like this to your configuration file:
==> zk1:
==> zk1:
==> zk1:   # Verify all HTTPS connections (recommended)
==> zk1:   ssl_verify_mode :verify_peer
==> zk1:
==> zk1:   # OR, Verify only connections to chef-server
==> zk1:   verify_api_cert true
==> zk1:
==> zk1:
==> zk1: To check your SSL configuration, or troubleshoot errors, you can use the
==> zk1: `knife ssl check` command like so:
==> zk1:
==> zk1:   knife ssl check -c /tmp/vagrant-chef-3/solo.rb
==> zk1:
==> zk1:
==> zk1: * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
==> zk1: [2014-09-16T20:21:28+00:00] INFO: *** Chef 11.16.0 ***
==> zk1: [2014-09-16T20:21:28+00:00] INFO: Chef-client pid: 1676
==> zk1: [2014-09-16T20:21:30+00:00] INFO: Setting the run_list to ["recipe[apt]"] from CLI options
==> zk1: [2014-09-16T20:21:30+00:00] INFO: Run List is [recipe[apt]]
==> zk1: [2014-09-16T20:21:30+00:00] INFO: Run List expands to [apt]
==> zk1: [2014-09-16T20:21:30+00:00] INFO: Starting Chef Run for ip-10-0-16-11.ec2.internal
==> zk1: [2014-09-16T20:21:30+00:00] INFO: Running start handlers
==> zk1: [2014-09-16T20:21:30+00:00] INFO: Start handlers complete.
==> zk1: [2014-09-16T20:21:44+00:00] INFO: execute[apt-get-update-periodic] ran successfully
==> zk1: [2014-09-16T20:21:44+00:00] INFO: directory[/var/cache/local] created directory /var/cache/local
==> zk1: [2014-09-16T20:21:44+00:00] INFO: directory[/var/cache/local] owner changed to 0
==> zk1: [2014-09-16T20:21:44+00:00] INFO: directory[/var/cache/local] group changed to 0
==> zk1: [2014-09-16T20:21:44+00:00] INFO: directory[/var/cache/local] mode changed to 755
==> zk1: [2014-09-16T20:21:44+00:00] INFO: directory[/var/cache/local/preseeding] created directory /var/cache/local/preseeding
==> zk1: [2014-09-16T20:21:44+00:00] INFO: directory[/var/cache/local/preseeding] owner changed to 0
==> zk1: [2014-09-16T20:21:44+00:00] INFO: directory[/var/cache/local/preseeding] group changed to 0
==> zk1: [2014-09-16T20:21:44+00:00] INFO: directory[/var/cache/local/preseeding] mode changed to 755
==> zk1: [2014-09-16T20:21:44+00:00] INFO: Chef Run complete in 14.170175328 seconds
==> zk1: [2014-09-16T20:21:44+00:00] INFO: Running report handlers
==> zk1: [2014-09-16T20:21:44+00:00] INFO: Report handlers complete
==> zk1: Running provisioner: shell...
zk1: Running: inline script
==> zk1: stdin: is not a tty
==> zk1: chown: invalid user: 'zookeeper'
==> zk1: sudo: unknown user: zookeeper
==> zk1: sudo: unable to initialize policy plugin
==> zk1: /tmp/vagrant-shell: line 5: /etc/zookeeper/conf/zoo.cfg: No such file or directory
==> zk1: restart: Unknown job: zookeeper
The following SSH command responded with a non-zero exit status.
Vagrant assumes that this means the command failed!
chmod +x /tmp/vagrant-shell && /tmp/vagrant-shell
Stdout from the command:
Stderr from the command:
stdin: is not a tty
chown: invalid user: 'zookeeper'
sudo: unknown user: zookeeper
sudo: unable to initialize policy plugin
/tmp/vagrant-shell: line 5: /etc/zookeeper/conf/zoo.cfg: No such file or directory
restart: Unknown job: zookeeper
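The stderr lines above all point at the same cause: the inline shell provisioner expects a zookeeper user, config file, and service job that the stock Ubuntu AMI does not have. As a rough sketch of how that can be read off the output, the snippet below scans provisioner stderr for those error shapes. The patterns are taken from the lines in this log; the function name `missing_components` and the kind labels are illustrative assumptions, not anything defined by Vagrant or vagrant-mesos.

```python
import re
from typing import List, Tuple

# Error shapes copied from the stderr in this issue. The labels on the right
# are hypothetical names chosen for this example.
PATTERNS = [
    (re.compile(r"(?:invalid|unknown) user: '?(\w+)'?"), "missing user"),
    (re.compile(r"(\S+): No such file or directory"), "missing file"),
    (re.compile(r"Unknown job: (\w+)"), "missing service job"),
]


def missing_components(stderr: str) -> List[Tuple[str, str]]:
    """Return unique (kind, name) pairs in the order they first appear."""
    found: List[Tuple[str, str]] = []
    for line in stderr.splitlines():
        for pattern, kind in PATTERNS:
            match = pattern.search(line)
            if match:
                item = (kind, match.group(1))
                if item not in found:
                    found.append(item)
    return found


if __name__ == "__main__":
    sample = "\n".join([
        "stdin: is not a tty",
        "chown: invalid user: 'zookeeper'",
        "sudo: unknown user: zookeeper",
        "/tmp/vagrant-shell: line 5: /etc/zookeeper/conf/zoo.cfg: No such file or directory",
        "restart: Unknown job: zookeeper",
    ])
    for kind, name in missing_components(sample):
        print(kind, name)
```

Seeing a missing user, a missing config file, and a missing job for the same service together is a strong hint that the package was never installed on the image, rather than that the provisioning script itself is broken.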