
Openstack heat #2

Merged: 26 commits merged into redhat-cop:master from etsauer:openstack-heat on Dec 21, 2016

Conversation

@etsauer (Contributor) commented Oct 17, 2016

@oybed @sabre1041 I wanted to start getting your eyes on this. It's not completely working yet, but it's very close. The changes so far include the following:

  • A new set of playbooks at ./playbooks/openshift/end-to-end.yaml, plus sub-playbooks provision.yaml, pre-install.yaml, install.yaml, post-install.yaml.
  • A new provisioning role at roles/openstack-stack, which uses heat to do the provisioning. I'm not sold on heat provisioning yet, as it doesn't appear to be as idempotent as I was hoping.
  • A new hybrid inventory that combines an openstack dynamic inventory script with a static inventory file that looks much more like a traditional openshift inventory.

The big advantage of this new approach is that the hybrid inventory lets us apply the exact same inventory to all phases of the provision/install without needing to do a bunch of fact hacking. It also means we don't need to manage the openshift-install role anymore, and there's no more need to update an inventory template to inherit new functionality from openshift-ansible.
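
For illustration, a minimal sketch of what such a hybrid inventory directory might contain. The layout and group names here are hypothetical, not taken from this PR; the meta-* groups follow the naming convention the openstack.py dynamic inventory derives from instance metadata:

# inventory/provision-openstack-sample/
#   openstack.py   <- dynamic inventory script, queries the OSP tenant
#   hosts          <- static file mapping dynamic groups into OpenShift groups

[OSEv3:children]
masters
nodes

[masters:children]
meta-host-type_master

[nodes:children]
meta-host-type_node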

This isn't a polished product yet; lots of things are still hardcoded. However, it's a good sample of what the next phase could look like. Take a look.

This would be run with something like:

ansible-playbook -i ./casl-ansible/inventory/provision-openstack-sample/ ./casl-ansible/playbooks/openshift/{playbookname}.yaml

connection: local
gather_facts: False
tasks:
- command: inventory/openstack.py --refresh --list

Contributor:

This is not needed; meta: refresh_inventory will invoke the dynamic inventory script.
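
For reference, a minimal sketch of the suggested replacement, assuming the refresh happens in a localhost play right after provisioning:

- hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    # re-runs the dynamic inventory so newly provisioned hosts show up in groups
    - meta: refresh_inventory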

- hosts: OSEv3:dns
tasks:
- name: waiting for server to come back
local_action: wait_for host={{ inventory_hostname }} state=started delay=30 timeout=300

Contributor:

This play should target all hosts; I received intermittent failures since it only ran against the dns group.

Also, to confirm full availability, a port check should be made:
local_action: wait_for host={{ hostvars[inventory_hostname]['ansible_ssh_host'] }} port=22 delay=30 timeout=300

Contributor (Author):

@sabre1041 we can't just target all hosts here, because we can't assume we're running in a clean OSP tenant.

@sabre1041 (Contributor) commented:

@etsauer reviewed the PR and added some comments inline. Also, due to the upgrade to Ansible 2.2.0.0, bare variables no longer function completely. This is affecting the dns and dns-server roles.

There also appears to be an ansible bug with relative paths and meta: refresh_inventory. When using relative paths, it is unable to locate the inventory file and bails out completely, resulting in no hosts/groups/variables. Providing a full path to the folder containing the static and dynamic inventory files results in a successful resolution and execution.

The following is a patch of the changes I made (minus minor inventory file changes):

ablock-changes.txt

@etsauer (Contributor, Author) commented Nov 15, 2016

@sabre1041 @oybed Getting back to this. My tests are mostly working using the end-to-end playbook, but bombing out during the secure-registry piece in post-install.yaml.

Instructions to run:

First, make sure you have the heat_stack_owner role assigned to you in your OSP project. Then run the following:

cp ./casl-ansible/inventory/provisioning-openstack-sample/clouds.yaml ~/.config/openstack/
ansible-playbook -i /root/repository/casl-ansible/inventory/provision-openstack-sample/ /root/repository/casl-ansible/playbooks/openshift/end-to-end.yml -e 'openshift_ansible_path=/root/repository/openshift-ansible'

NOTE: as @sabre1041 points out above, there is a small bug in the meta: refresh_inventory module that requires the path to our inventory to be an absolute path.

@etsauer (Contributor, Author) commented Nov 15, 2016

Currently, the install fails at the post-install phase with the following error:

PLAY [masters:nodes] ***********************************************************

TASK [setup] *******************************************************************
ok: [master-0.test-stack.casl.rht-labs.com]
ok: [infranode-0.test-stack.casl.rht-labs.com]
ok: [app-node-0.test-stack.casl.rht-labs.com]

TASK [openshift-common : Setting OpenShift Common Facts] ***********************
ok: [infranode-0.test-stack.casl.rht-labs.com]
ok: [master-0.test-stack.casl.rht-labs.com]
ok: [app-node-0.test-stack.casl.rht-labs.com]

TASK [secure-registry : Check that Openshift Docker Registry exists] ***********
fatal: [master-0.test-stack.casl.rht-labs.com -> None]: FAILED! => {"changed": true, "cmd": ["oc", "get", "deploymentConfig", "docker-registry", "-n", "default"], "delta": "0:00:00.052364", "end": "2016-11-15 15:25:06.003363", "failed": true, "rc": 1, "start": "2016-11-15 15:25:05.950999", "stderr": "error: Missing or incomplete configuration info.  Please login or point to an existing, complete config file:\n\n  1. Via the command-line flag --config\n  2. Via the KUBECONFIG environment variable\n  3. In your home directory as ~/.kube/config\n\nTo view or setup config directly use the 'config' command.", "stdout": "", "stdout_lines": [], "warnings": []}

NO MORE HOSTS LEFT *************************************************************
        to retry, use: --limit @/root/repository/casl-ansible/playbooks/openshift/end-to-end.retry

PLAY RECAP *********************************************************************
app-node-0.test-stack.casl.rht-labs.com : ok=191  changed=67   unreachable=0    failed=0
dns-0.test-stack.casl.rht-labs.com : ok=49   changed=31   unreachable=0    failed=0
infranode-0.test-stack.casl.rht-labs.com : ok=191  changed=67   unreachable=0    failed=0
localhost                  : ok=26   changed=12   unreachable=0    failed=0
master-0.test-stack.casl.rht-labs.com : ok=530  changed=156  unreachable=0    failed=1   
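
The stderr indicates oc is running without any kubeconfig. One plausible adjustment (purely an assumption, not necessarily the fix that landed) is to point the secure-registry task at the master's admin kubeconfig:

# hypothetical tweak to the failing task; /etc/origin/master/admin.kubeconfig
# is the standard admin kubeconfig location on an enterprise master
- name: Check that Openshift Docker Registry exists
  command: oc get deploymentconfig docker-registry -n default
  environment:
    KUBECONFIG: /etc/origin/master/admin.kubeconfig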

@oybed (Contributor) commented Dec 3, 2016

@etsauer in general, it seems to work quite well. Thanks again for all your hard work on this one.

A few observations along the way that we should write up as github issues to get implemented/corrected, or at least bring up for discussion:

  • When setting the env_id to something more meaningful, it seems to have issues generating the DNS records (i.e.: the records seem to be the instance id rather than the fqdn). Will retry without setting the env_id to validate that it works as expected in that case. NVM - this was due to missing the file clouds.yaml in ~/.config/openstack
  • For some reason, some of the objects created by heat cannot be removed using the regular OSP console. Need to investigate why and figure out if there's some additional work needed to allow for this (or if it's just a heat limitation that we have to live with)
  • As discussed already, we need to support multiple "stacks" in the same tenant - currently it doesn't
  • It would be nice to control the name of the key pair and be able to re-use existing ones (so we know which private key goes with the public key - with a random name it can easily be forgotten in a few months)
  • Same for security groups and networks - would be nice to be able to reuse them (although this may complicate things for "heat delete")
  • Would like to be able to control the numbering of the instances - i.e.: personally I prefer to start at 1, and eliminate the dash, for these use cases.
  • Is the NFS instance left out on purpose, or is it just a missing piece?
  • Need to support using separate volumes for storage (i.e.: for etcd, logs, etc.)

Next steps (beyond working the above items):

  • Need to support multiple masters, or at least multiple etcd instances, ASAP. I've seen more corruption with single etcd instances lately than I'd like. Hopefully, by having a clustered etcd, we can at least recover without a reinstall (see the inventory sketch after this list).
  • tower integration
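
As a sketch of the clustered-etcd item above, in the static inventory style used elsewhere in this PR (hostnames hypothetical):

[etcd]
master-1.example.com
master-2.example.com
master-3.example.com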

@etsauer (Contributor, Author) commented Dec 16, 2016

@oybed I'll respond bullet-by-bullet:

  • Yep, clouds.yaml
  • Yeah, the individual objects don't show in the UI. However, the intention of HEAT is that you create/delete the entire stack from the orchestration tab. Also, on the command line, you can clean up via:
    heat stack-delete <stackname>
    
    This is how I generally clean up my test runs.
  • Yes, I'm working on a fix to the dynamic inventory to support multiple stacks
  • So the deal with HEAT for the keys (and other resources) is: either you let heat create the key, and you don't get the control, or you pre-create it and just pass in the name, and then it's not managed by heat. I think the latter makes sense for keys: since they're more of a user-managed thing, it makes sense to have them pre-created, and we can just pass heat the name of the key (see the sketch after this list).
  • I disagree a bit on the security groups and networks. I like the fact that all of that is created and cleaned up by heat, and the user doesn't have to maintain any of it. On the network side, if we were reusing network resources ACROSS tenants, then I could see it differently, but I don't really see an issue with security groups being part of the stack. If the only impact is that our shared tenants need a higher security group quota than we normally use, I think it's worth not having to maintain separate ansible modules for security groups.
  • NFS was just missing, no specific reason
  • Yes, agreed. Could this be a post-merge step?
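
To illustrate the pre-created-key approach, a hypothetical HOT template fragment (parameter and resource names are made up, not taken from this PR):

# the stack consumes an existing Nova keypair by name instead of creating one
parameters:
  ssh_key_name:
    type: string
    description: Name of a pre-created keypair, managed outside of heat

resources:
  master_server:
    type: OS::Nova::Server
    properties:
      key_name: { get_param: ssh_key_name }
      # image, flavor, networks, etc. omitted for brevity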

@oybed (Contributor) left a comment:

@etsauer please see some global search/replace comments

openstack_default_image_name="rhel-guest-image-7.2"
openstack_default_flavor="m1.medium"
openstack_external_network_name="external"
openstack_dns_domain="casl.rht-labs.com"

@oybed (Contributor), Dec 20, 2016:

please change to 'example.com'

openstack_nameservers=10.9.48.31
openstack_num_nodes=1
openstack_num_infra=1
dns_domain="casl.rht-labs.com"

@oybed (Contributor), Dec 20, 2016:

please change to 'example.com'


# Subscription Management Details
rhsm_register=True
rhsm_satellite='sat6-1.etl.lab.eng.rdu2.redhat.com'

@oybed (Contributor), Dec 20, 2016:

please change to '.example.com'

deployment_type=openshift-enterprise
openshift_deployment_type=openshift-enterprise

openshift_master_default_subdomain=apps.test-stack.casl.rht-labs.com

@oybed (Contributor), Dec 20, 2016:

please change to 'example.com'

@@ -0,0 +1,22 @@
-----BEGIN CERTIFICATE-----

@oybed (Contributor), Dec 20, 2016:

I'd recommend renaming this directory to something with 'example.com' and maybe clobbering the content of some of the important files ...

deployment_type: openshift-enterprise
openshift_deployment_type: openshift-enterprise

openshift_master_default_subdomain: "apps.{{ env_id }}.casl.rht-labs.com"

@oybed (Contributor), Dec 20, 2016:

Please change to 'example.com'.
Also, looking at the content below, this shouldn't have been on github.com in the first place. It will now stick in the history - not sure how big of a problem that is. :-)

openstack_default_image_name="rhel-guest-image-7.2"
openstack_default_flavor="m1.large"
openstack_external_network_name="external"
openstack_dns_domain="casl.rht-labs.com"

@oybed (Contributor), Dec 20, 2016:

same as above - please change to 'example.com' (global search/replace)

@etsauer (Contributor, Author) commented Dec 20, 2016

@oybed Cleaned up the inventory per your request.

@oybed (Contributor) commented Dec 20, 2016

In https://github.com/etsauer/casl-ansible/blob/openstack-heat/ose-provision.yml#L112, can you please change openshift-provision to openshift-prep?

@etsauer (Contributor, Author) commented Dec 20, 2016

@oybed done

@oybed (Contributor) commented Dec 20, 2016

@etsauer Very, very close - please update https://github.com/etsauer/casl-ansible/blob/openstack-heat/provision.sh#L23 to read post-install.yml, as the 'a' was removed from the filename extension.

@oybed (Contributor) commented Dec 20, 2016

@etsauer Looks like the post-install has changed in a way that breaks the (not-so-)old implementation:

TASK [Add cluster hosts to cluster groups] *************************************
fatal: [localhost]: FAILED! => {"failed": true, "msg": "{{ groups['meta-environment_' ~ dns_domain] | intersect(groups['meta-clusterid_' ~ env_id]) }}: 'dns_domain' is undefined"}

@etsauer (Contributor, Author) commented Dec 21, 2016

@oybed Since post-install.yml currently only does two things (syncing additional ssh keys and creating htpasswd users), I propose there's little value in keeping it in the old script, and that the workaround during the transitional period is to remove that playbook run from provision.sh. What do you think?

@oybed (Contributor) commented Dec 21, 2016

@etsauer Agreed - with the change you just made, I believe we are good to merge.

@oybed merged commit 9283ebb into redhat-cop:master on Dec 21, 2016