nomad upgrade from 0.5.6 to 0.6.3 cause nomad job runs to fail #3236

Closed
lgfausak opened this issue Sep 17, 2017 · 3 comments

Comments

@lgfausak


Nomad version

Nomad v0.6.3

Operating system and Environment details

redhat Linux hypervisor-03 3.10.0-514.6.1.el7.x86_64 #1 SMP Sat Dec 10 11:15:38 EST 2016 x86_64 x86_64 x86_64 GNU/Linux

Issue

After upgrading from 0.5.6 to 0.6.3, I can no longer run any job.

Reproduction steps

I removed all jobs while still on 0.5.6, upgraded to 0.6.3, then tried to start a job. Every job fails with this error on the client:

Sep 12 07:36:12 hypervisor-03 nomad: 2017/09/12 07:36:12.850775 [WARN] client: failed to start task "redis" for alloc "2ed6a584-e6ff-216a-d9f7-604f62fa02d3": Failed to start container 5887933013d804693dc2fefa78d3d22a64caccf64c88e8428434cb7f7b116d5e: API error (500): {"message":"driver failed programming external connectivity on endpoint redis-2ed6a584-e6ff-216a-d9f7-604f62fa02d3 (f80425e6a15a5c531c10751c211b97b67097ad07ee2ab4b3139c235159626cf2): Error starting userland proxy: listen tcp [fe80::1618:77ff:fe30:5e56]:27338: bind: invalid argument"}

Workaround

Following a suggestion from the Nomad Google group, I added the network_interface definition to the client section of the Nomad configuration file:

    {
      "datacenter": "dc",
      "region": "us",
      "name": "hypervisor-03",
      "data_dir": "/var/nomad",
      "bind_addr": "0.0.0.0",
      "consul": {"address": "1.2.3.4:8500"},
      "client": {
        "enabled": true,
        "network_interface": "br-bond0-34",
        "options": {"driver.docker.enable": "1"}
      }
    }

This particular client has 50 interfaces. The interface I selected is an IPv4 interface. The error message suggests that an IPv6 interface was being selected automatically.

This workaround worked for me.
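
The guess about IPv6 can be checked against the log line itself. The sketch below is only an illustration, not part of Nomad: it parses the listen address from the error and reports whether it is an IPv6 link-local address. The helper name isLinkLocalIPv6 is hypothetical. On Linux, binding to a link-local IPv6 address without a zone (for example %eth0) typically fails with "invalid argument", which would be consistent with the error above.

```go
package main

import (
	"fmt"
	"net"
)

// isLinkLocalIPv6 reports whether hostport (for example "[fe80::1]:80")
// names an IPv6 link-local unicast address. Hypothetical helper, not a
// Nomad or Docker API.
func isLinkLocalIPv6(hostport string) (bool, error) {
	host, _, err := net.SplitHostPort(hostport)
	if err != nil {
		return false, err
	}
	ip := net.ParseIP(host)
	if ip == nil {
		return false, fmt.Errorf("not an IP address: %q", host)
	}
	// To4 returns nil for addresses that are not IPv4.
	return ip.To4() == nil && ip.IsLinkLocalUnicast(), nil
}

func main() {
	// Listen address copied from the client log in this issue.
	ok, err := isLinkLocalIPv6("[fe80::1618:77ff:fe30:5e56]:27338")
	if err != nil {
		panic(err)
	}
	fmt.Println("link-local IPv6:", ok)
}
```

For the address in the log this prints "link-local IPv6: true".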

dadgar added a commit that referenced this issue Sep 25, 2017
This PR enhances the upgrade documentation from 0.5.x to 0.6.x

Fixes #3236
@dadgar
Contributor

dadgar commented Sep 25, 2017

Hey @lgfausak, thanks for filing this. I enhanced the upgrade guide to better explain the need to set network_interface on the clients when upgrading from 0.5.x to 0.6.x! 👍

Thanks and sorry you hit this hiccup!

@schmichael
Member

I'm afraid this was caused by a change in our network fingerprinting in 0.6: #2536

Could you share the output of ip addr on that node? It may not be possible for Nomad to pick the right interface automatically, but it shouldn't choose a link-local IPv6 address when there are non-link-local addresses available.

Explicitly selecting an interface is probably a good idea on a node with 50 interfaces as our network autodetection code is not guaranteed to produce the same results between releases (although our goal is to always have it pick the most-likely-to-succeed interface!).
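
Since pinning the interface is the recommended approach, a small pre-upgrade check can confirm that a client's JSON configuration actually sets it. The sketch below is a minimal example under stated assumptions: it handles only JSON configuration in the shape posted in this issue, agentConfig and pinnedInterface are hypothetical names, and it does not use any Nomad package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// agentConfig mirrors only the fields of the posted JSON configuration that
// this check needs. Hypothetical type, not Nomad's own config struct.
type agentConfig struct {
	Client struct {
		Enabled          bool   `json:"enabled"`
		NetworkInterface string `json:"network_interface"`
	} `json:"client"`
}

// pinnedInterface returns the explicitly configured client interface, or an
// empty string when the configuration leaves the choice to autodetection.
func pinnedInterface(raw []byte) (string, error) {
	var cfg agentConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return "", err
	}
	return cfg.Client.NetworkInterface, nil
}

func main() {
	// Trimmed version of the configuration posted in this issue.
	raw := []byte(`{"client": {"enabled": true, "network_interface": "br-bond0-34"}}`)
	name, err := pinnedInterface(raw)
	if err != nil {
		panic(err)
	}
	if name == "" {
		fmt.Println("network_interface is not set; autodetection will be used")
		return
	}
	fmt.Println("pinned interface:", name)
}
```

For the trimmed configuration above this prints "pinned interface: br-bond0-34".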

@github-actions

github-actions bot commented Dec 7, 2022

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 7, 2022