openstack refarch deploy-dns fails missing ServerGroupAntiAffinityFilter #699

Closed
dlbewley opened this Issue Aug 27, 2017 · 8 comments

dlbewley (Contributor) commented Aug 27, 2017

BUG REQUEST INFO:

Environment:

  • Cloud provider: my own hardware

  • OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):

Linux 3.10.0-514.26.2.el7.x86_64 x86_64
NAME="Red Hat Enterprise Linux Server"
VERSION="7.3 (Maipo)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="7.3"
PRETTY_NAME="Red Hat Enterprise Linux"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:7.3:GA:server"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 7"
REDHAT_BUGZILLA_PRODUCT_VERSION=7.3
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="7.3"
  • Version of Ansible: (ansible --version):
ansible 2.2.3.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = Default w/o overrides
  • Version of Jinja: (pip freeze | grep -i jinja)
-bash: pip: command not found
  • Version of Shade: (pip freeze | grep -i shade)
-bash: pip: command not found

Openstack-ansible-contrib version (commit) (git rev-parse --short HEAD):

138e625

Copy of used inventory files and custom variables (please omit your secrets!):

# dns-vars.yaml
---
domain_name: ocp3.example.com
contact: openshift-admins@example.com
# real DNS servers from environment
dns_forwarders: [x.x.x.x, y.y.y.y]

update_key: "pxxxxxxxxxxxxxxxxxxxxx=="
slave_count: 2

stack_name: dns-service
external_network: external-179

image: rhel7
flavor: tiny-1x1
ssh_user: cloud-user
ssh_key_name: ocp3

# NOTE: For Red Hat Enterprise Linux:
#rhn_username: "rhnusername"
#rhn_password: "NOT A REAL PASSWORD"
#rhn_pool: "pool id string"
# Either RHN or Sat6
sat6_hostname: "satellite.company.com"
sat6_organization: "Company"
sat6_activationkey: "system-import"

Command used to invoke ansible:

#!/bin/bash
export ANSIBLE_HOST_KEY_CHECKING=False
ansible-playbook --private-key ocp3.key -e @dns-vars.yaml -vv\
        openshift-ansible-contrib/reference-architecture/osp-dns/deploy-dns.yaml | tee dns-deploy.log

Output of ansible run:

Using /etc/ansible/ansible.cfg as config file

PLAYBOOK: deploy-dns.yaml ******************************************************
5 plays in openshift-ansible-contrib/reference-architecture/osp-dns/deploy-dns.yaml

PLAY [Deploy the DNS servers] **************************************************

TASK [Check whether the stack exists already] **********************************
task path: /home/stack/openshift/openshift-ansible-contrib/reference-architecture/osp-dns/deploy-dns.yaml:16
fatal: [localhost]: FAILED! => {"changed": true, "cmd": ["openstack", "stack", "show", "dns-service"], "delta": "0:00:01.417990", "end": "2017-08-26 19:18:42.678917", "failed": true, "rc": 1, "start": "2017-08-26 19:18:41.260927", "stderr": "Stack not found: dns-service", "stdout": "", "stdout_lines": [], "warnings": []}
...ignoring

TASK [Create the Heat Stack] ***************************************************
task path: /home/stack/openshift/openshift-ansible-contrib/reference-architecture/osp-dns/deploy-dns.yaml:21
fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "msg": "Stack 'dns-service' failed.", "rc": 1, "stderr": "\n Stack dns-service CREATE_FAILED \n\n", "stdout": "2017-08-27 02:18:49Z [dns-service]: CREATE_IN_PROGRESS  Stack CREATE started\n2017-08-27 02:18:49Z [dns-service.network]: CREATE_IN_PROGRESS  state changed\n2017-08-27 02:19:00Z [dns-service.network]: CREATE_COMPLETE  state changed\n2017-08-27 02:19:01Z [dns-service.hosts]: CREATE_IN_PROGRESS  state changed\n2017-08-27 02:21:14Z [dns-service.hosts]: CREATE_FAILED  ResourceInError: resources.hosts.resources.slaves.resources.slaves.resources[1].resources.host: Went to status ERROR due to \"Message: ServerGroup policy is not supported: ServerGroupAntiAffinityFilter not configured, Code: 400\"\n2017-08-27 02:21:14Z [dns-service]: CREATE_FAILED  Resource CREATE failed: ResourceInError: resources.hosts.resources.slaves.resources.slaves.resources[1].resources.host: Went to status ERROR due to \"Message: ServerGroup policy is not supported: ServerGroupAntiAffinityFilter not configured, Code: 400\"\n", "stdout_lines": ["2017-08-27 02:18:49Z [dns-service]: CREATE_IN_PROGRESS  Stack CREATE started", "2017-08-27 02:18:49Z [dns-service.network]: CREATE_IN_PROGRESS  state changed", "2017-08-27 02:19:00Z [dns-service.network]: CREATE_COMPLETE  state changed", "2017-08-27 02:19:01Z [dns-service.hosts]: CREATE_IN_PROGRESS  state changed", "2017-08-27 02:21:14Z [dns-service.hosts]: CREATE_FAILED  ResourceInError: resources.hosts.resources.slaves.resources.slaves.resources[1].resources.host: Went to status ERROR due to \"Message: ServerGroup policy is not supported: ServerGroupAntiAffinityFilter not configured, Code: 400\"", "2017-08-27 02:21:14Z [dns-service]: CREATE_FAILED  Resource CREATE failed: ResourceInError: resources.hosts.resources.slaves.resources.slaves.resources[1].resources.host: Went to status ERROR due to \"Message: ServerGroup policy is not supported: ServerGroupAntiAffinityFilter not configured, Code: 400\""]}
        to retry, use: --limit @/home/stack/openshift/openshift-ansible-contrib/reference-architecture/osp-dns/deploy-dns.retry

PLAY RECAP *********************************************************************
localhost                  : ok=1    changed=1    unreachable=0    failed=1
[stack@director openshift]$ openstack stack event list dns-service
2017-08-27 02:18:49Z [dns-service]: CREATE_IN_PROGRESS  Stack CREATE started
2017-08-27 02:18:49Z [dns-service.network]: CREATE_IN_PROGRESS  state changed
2017-08-27 02:19:00Z [dns-service.network]: CREATE_COMPLETE  state changed
2017-08-27 02:19:01Z [dns-service.hosts]: CREATE_IN_PROGRESS  state changed
2017-08-27 02:21:14Z [dns-service.hosts]: CREATE_FAILED  ResourceInError: resources.hosts.resources.slaves.resources.slaves.resources[1].resources.host: Went to status ERROR due to "Message: ServerGroup policy is not supported: ServerGroupAntiAffinityFilter not configured, Code: 400"
2017-08-27 02:21:14Z [dns-service]: CREATE_FAILED  Resource CREATE failed: ResourceInError: resources.hosts.resources.slaves.resources.slaves.resources[1].resources.host: Went to status ERROR due to "Message: ServerGroup policy is not supported: ServerGroupAntiAffinityFilter not configured, Code: 400"

Anything else do we need to know:

This is a fresh OSP cluster installed with the help of the Tiger team, and I am an OpenStack newbie.

I am running everything from the Director node as the ocp3 user in the ocp3 project.

I'm working through the ref arch to deploy DNS.

Apparently I do not have an anti-affinity filter configured in Nova, and I am not sure how best to rectify that.

This does not seem to be listed as a prereq in the ref arch. I know it is written for OSP 10 and I am on OSP 11.
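
The closest I can get to checking this myself is the sketch below (the controller hostnames and the heat-admin SSH user are assumptions about a typical director-deployed overcloud), grepping the scheduler filter settings where nova-scheduler runs:

# Sketch: confirm whether ServerGroupAntiAffinityFilter appears in the
# scheduler filter options on the nodes running nova-scheduler.
# Hostnames and SSH user are illustrative, not taken from my environment.
for host in overcloud-controller-0 overcloud-controller-1 overcloud-controller-2; do
    echo "== $host =="
    ssh heat-admin@"$host" \
        "sudo grep -E '(enabled_filters|scheduler_default_filters)' /etc/nova/nova.conf"
done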

cooktheryan (Collaborator) commented Aug 27, 2017

Give this a look; it may be relevant:

redhat-openstack/openshift-on-openstack#249

dlbewley (Contributor) commented Aug 27, 2017

I'm trying to understand how to modify the director's heat templates to enable this and do an openstack overcloud deploy to fix it.

Would the manual fix be to uncomment the first two lines below and restart openstack-nova?

[root@osp-comp-01 nova]# grep -E 'enabled_filters=|available_filters=' /etc/nova/nova.conf
#available_filters=nova.scheduler.filters.all_filters
#enabled_filters=RetryFilter,AvailabilityZoneFilter,RamFilter,DiskFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter
#baremetal_enabled_filters=RetryFilter,AvailabilityZoneFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ExactRamFilter,ExactDiskFilter,ExactCoreFilter
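
If uncommenting those is the right manual fix, I'm guessing the change would look roughly like the sketch below (assumptions: nova-scheduler runs on the controller nodes and the service is named openstack-nova-scheduler; a direct edit like this would not survive a later overcloud update):

# Sketch only: uncomment the two filter options (the commented defaults
# already include ServerGroupAntiAffinityFilter) and restart the scheduler.
sudo sed -i \
    -e 's|^#available_filters=|available_filters=|' \
    -e 's|^#enabled_filters=|enabled_filters=|' \
    /etc/nova/nova.conf
sudo systemctl restart openstack-nova-scheduler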

I sent a note to refarch-feedback suggesting there is a missing pre-req.

cooktheryan (Collaborator) commented Aug 27, 2017

I pinged my engineers after you sent the email, and I will also ping a couple of OSP experts here to get some more traction.

@tomassedovic @bogdando

dlbewley (Contributor) commented Aug 27, 2017

I think I'm getting closer with the advanced_overcloud_customization doc:

[stack@director manifests]$ grep _filters /etc/puppet/modules/nova/manifests/scheduler/*
# [*scheduler_available_filters*]
#   Defaults to ['nova.scheduler.filters.all_filters']
# [*scheduler_default_filters*]
# [*baremetal_scheduler_default_filters*]
# [*scheduler_use_baremetal_filters*]
#   (optional) Use baremetal_scheduler_default_filters or not.
  $scheduler_available_filters                    = ['nova.scheduler.filters.all_filters'],
  $scheduler_default_filters                      = $::os_service_default,
  $baremetal_scheduler_default_filters            = $::os_service_default,
  $scheduler_use_baremetal_filters                = false,
...
[stack@director scheduler]$ grep "_filters" /usr/share/openstack-tripleo-heat-templates/puppet/services/nova-*
/usr/share/openstack-tripleo-heat-templates/puppet/services/nova-scheduler.yaml:            nova::scheduler::filter::scheduler_available_filters: {get_param: NovaSchedulerAvailableFilters}
/usr/share/openstack-tripleo-heat-templates/puppet/services/nova-scheduler.yaml:            nova::scheduler::filter::scheduler_default_filters: {get_param: NovaSchedulerDefaultFilters}

I think I need to add the following to my custom_hiera.yaml

NovaSchedulerDefaultFilters:
 [RetryFilter,AvailabilityZoneFilter,RamFilter,DiskFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter]
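
Or, as a sketch of the same idea (assuming the parameter can instead be supplied as an extra Heat environment file; the file name and the bare deploy command below are placeholders, not taken from the ref arch):

# Sketch only: put the filters in a parameter_defaults environment file and
# include it in the overcloud deploy, alongside the environment files used
# for the original deployment.
cat > ~/nova-scheduler-filters.yaml <<'EOF'
parameter_defaults:
  NovaSchedulerDefaultFilters:
    - RetryFilter
    - AvailabilityZoneFilter
    - RamFilter
    - DiskFilter
    - ComputeFilter
    - ComputeCapabilitiesFilter
    - ImagePropertiesFilter
    - ServerGroupAntiAffinityFilter
    - ServerGroupAffinityFilter
EOF
openstack overcloud deploy --templates -e ~/nova-scheduler-filters.yaml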

For the keystone token expiration (which is called out as a pre-req), I resorted to Ansible to configure the controllers "by hand".

tomassedovic (Collaborator) commented Aug 28, 2017

@dlbewley unfortunately I'm not sure how to set that up via tripleo/director, but I want to check one thing: does your cluster have more than one compute node? Anti-affinity won't ever work otherwise.

If you only have one compute node, set this in your vars file:

slave_server_group_policies: ['affinity']

That will still require having the filter, though.

I think leaving the list empty should work everywhere, but I don't have a setup to test this on right now, so you could try that and see if it unblocks you in the meantime.
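
Concretely, the single-compute-node workaround would be something like this (a sketch reusing the vars file and playbook invocation from the issue description; only the appended variable is new):

# Sketch: add the affinity override to the existing vars file and re-run.
echo "slave_server_group_policies: ['affinity']" >> dns-vars.yaml
ansible-playbook --private-key ocp3.key -e @dns-vars.yaml \
    openshift-ansible-contrib/reference-architecture/osp-dns/deploy-dns.yaml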

pschiffe added the osp label Aug 28, 2017

dlbewley (Contributor) commented Aug 28, 2017

@tomassedovic I have multiple compute nodes.

I ran into an issue with the deployment and my cloud is down. I have opened a case with support, and I'll try changing the server group policy to 'affinity' and resume troubleshooting this once we have the deploy running cleanly again.

ioggstream (Contributor) commented Aug 31, 2017

@tomassedovic on OSP 10 there's a case for backporting soft-anti-affinity to Heat (it's already supported in Nova). See:

dlbewley (Contributor) commented Sep 14, 2017

After my cloud was fixed up, the scheduler blocker was cleared. I do think the ref arch would be more comprehensive if it included the required Nova scheduler filters. @markllama

I'm now finding that bind-server.yml:9 is perturbed by the lack of firewalld in my RHEL 7 image, but that's another issue.

dlbewley closed this Sep 14, 2017
