Changes to support manual interfaces and other issues #31

Open
wants to merge 11 commits into
base: master

busterswt

This PR adds support for 'manual' interfaces (those without IP addresses). It also adds a 'fix' for bonds on Debian/Ubuntu that brings up the member interfaces before trying to bring up the bond. Lastly, it introduces a new variable that pauses the playbook after interfaces are bounced but before facts are gathered, because some server interfaces are not yet UP/active when fact gathering starts.

Collaborator

markgoddard left a comment

Thanks @busterswt.

@@ -17,3 +17,4 @@ interfaces_pkg_state: installed
interfaces_ether_interfaces: []
interfaces_bridge_interfaces: []
interfaces_bond_interfaces: []
playbook_pause_time: 3

Collaborator

Could you rename this with a prefix of interfaces_?

Author

Sure can. Please see handler comment.
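
For illustration, the requested rename might look roughly like the sketch below. The name `interfaces_pause_time` is an assumption made up for this example; the thread only asks for an `interfaces_` prefix and does not settle on a final name.

```yaml
---
# defaults/main.yml (sketch): pause variable carrying the role prefix.
# `interfaces_pause_time` is a hypothetical name, not confirmed by the PR.
interfaces_pause_time: 3
---
# Task (sketch): the existing pause task, reading the renamed variable.
- name: Pause playbook to give interfaces time to recover
  pause:
    seconds: "{{ interfaces_pause_time }}"
```

Prefixing role variables with the role name keeps them from colliding with variables defined by other roles or by the calling playbook.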

- Gather facts
- Verify network interfaces exist
- Verify network interfaces are active
- Verify network interface IP configuration
- Verify bond interface slaves
- Verify bridge interface ports

- name: Pause playbook to give interfaces time to recover
  pause:
    seconds: "{{ playbook_pause_time }}"

Collaborator

If there's a race going on here, is this always going to work?

Author

Which race are you referring to? The idea behind this was simply to avoid hitting the gather_facts tasks before the interfaces were up on the server. If there's a better way of doing this, I'm totally open to it. What I was seeing was that the interfaces weren't all up by the time facts were gathered, resulting in failures when checking for active interfaces.

Collaborator

I guess my point was, if we need to sleep here, that suggests to me there is a race condition between the network being 'up' and us gathering facts. Adding a sleep isn't necessarily going to guarantee that we gather facts after the interfaces are up - how do we know how long to sleep for? Would that value work everywhere?

Perhaps the default of 3 seconds is sufficient to work in most environments. It's not something I've needed to do, but then I primarily use CentOS.
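
One way to avoid guessing a sleep duration is to poll until the interfaces actually report as up, with an upper bound on the wait. The sketch below is not part of this PR. It assumes each interface entry has a `device` key holding the interface name, that the kernel exposes `/sys/class/net/<name>/operstate`, and that the Ansible version in use supports `loop` (2.5 or later; `with_items` is the older spelling). The registered variable name and the retry counts are illustrative.

```yaml
# Sketch only: wait until every configured interface reports "up"
# rather than sleeping for a fixed time before facts are gathered.
# Assumes each interface entry carries a `device` key (an assumption here).
- name: Wait for interfaces to become active
  command: cat /sys/class/net/{{ item.device }}/operstate
  register: interfaces_operstate  # hypothetical variable name
  until: interfaces_operstate.stdout == "up"
  retries: 10  # illustrative upper bound: about 10 seconds per interface
  delay: 1
  changed_when: false
  loop: "{{ interfaces_ether_interfaces + interfaces_bridge_interfaces + interfaces_bond_interfaces }}"
```

This still has a timeout, so it can fail on a slow link, but it returns as soon as the interfaces are ready and fails explicitly instead of silently gathering stale facts. Some virtual devices report `unknown` rather than `up`, so the condition may need loosening in those environments.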

{% if item.bond_slaves is defined and item.bond_mode == 'active-backup' %}
bond-slaves none
{% for slave in item.bond_slaves %}
pre-up (sleep 2 && ifup {{ slave }}) &

Collaborator

This seems quite brittle. It's run as a background task, so will likely start after the master has been brought up. It all seems a bit racy to me. Shouldn't the OS networking scripts handle these dependencies automatically? It seems to work on Ubuntu 16.04.

Author

I'm happy to remove this. I've seen issues where the bond master would not come up unless the slaves were up, but bringing up slaves first brings up the bond master. This may not be an issue with your configuration, so I'll take it out.

{% if item.bond_mode is defined %}
bond-mode {{ item.bond_mode }}
{% endif %}
bond-miimon {{ item.bond_miimon|default(100) }}

Collaborator

We're missing these options for manual: downdelay, updelay, xmit_hash_policy.

Author

Right on. I'll take a look and fix that, too.

Collaborator

Thanks

markgoddard added a commit to stackhpc/kayobe-original that referenced this pull request Feb 4, 2021
* Use source images
* Need to specify bash for &> syntax

Issues worked around:

* Manually configuring bridge via ip commands makes ifup fail to bring
  up the link. Adds a kayobe-network-bootstrap Zuul CI role that adds
  persistent configuration for the all-in-one network.

* bridge not active after interfaces role bounce. Added a pause, similar
  to michaelrigart/ansible-role-interfaces#31

* fails installing docker python module for kolla user. WARNING: The
  repository located at mirror-int.ord.rax.opendev.org is not a trusted
  or secure host and is being ignored ERROR: No matching distribution
  found for docker===4.4.0 Adding trusted host for PyPI mirror.

* Tenks fails to create block devices - missing qemu-img (in qemu-utils)

* Tenks qemu emulator is different on Ubuntu

Remaining issues:

* Bare metal testing is unreliable on Ubuntu - some jobs see IPMI
  failures such as the following:

    ipmitool chassis bootdev pxe

    Error setting Chassis Boot Parameter 5\nError setting Chassis Boot
    Parameter 0\n

  Bare metal testing is disabled on Ubuntu for now.

Depends-On: https://review.opendev.org/766984
Depends-On: https://review.opendev.org/766958

Story: 2004960
Task: 29393

Change-Id: I1985efae7c18f55c3ff7c27c17d6242523904f3e
markgoddard added a commit to stackhpc/kayobe-original that referenced this pull request Feb 12, 2021
markgoddard added a commit to stackhpc/kayobe-original that referenced this pull request Feb 23, 2021
openstack-mirroring pushed a commit to openstack/openstack that referenced this pull request Mar 4, 2021
* Update kayobe from branch 'master'
  to 0e3ec62471b98d69cb5f46b8a9e1ebbefe9c4a13
  - Merge "CI: add Ubuntu overcloud deploy job"
  - CI: add Ubuntu overcloud deploy job
    
openstack-mirroring pushed a commit to openstack/kayobe that referenced this pull request Mar 4, 2021