This repository has been archived by the owner on Jul 11, 2020. It is now read-only.

Scaffold initial required containers #1

Closed
5 tasks done
geerlingguy opened this issue Oct 30, 2019 · 13 comments

Comments


geerlingguy commented Oct 30, 2019

See: https://github.com/geerlingguy/awx-container/blob/master/docker-compose.yml (for a starting point).

Basically, I need to add in containers and resources for:

  • postgres
  • memcached
  • rabbitmq
  • awx_web
  • awx_task

Note that I'm still trying to see if there are any 'official' tower docker images available via Docker Hub, Quay, or elsewhere. Will ask around internally to see if there are or not.
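For orientation, a minimal docker-compose sketch of those five services might look like the following. This is a hypothetical fragment, not the project's actual compose file — image tags, ports, and credentials are placeholder assumptions (see the linked awx-container docker-compose.yml for a real starting point):

```yaml
# Hypothetical sketch only; image tags and credentials are placeholders.
version: '3'
services:
  postgres:
    image: postgres:9.6
    environment:
      POSTGRES_USER: awx
      POSTGRES_PASSWORD: awxpass
      POSTGRES_DB: awx
  memcached:
    image: memcached:alpine
  rabbitmq:
    image: rabbitmq:3
  awx_web:
    image: ansible/awx_web:latest
    depends_on: [postgres, memcached, rabbitmq]
    ports:
      - "80:8052"
  awx_task:
    image: ansible/awx_task:latest
    depends_on: [postgres, memcached, rabbitmq, awx_web]
```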

@geerlingguy

It looks like the main images are here:

And the installer for Tower on OpenShift is here: https://releases.ansible.com/ansible-tower/setup_openshift/

I'll have to dig through the installer and see exactly what it's doing / how it's plugging things together. Hopefully it can be mostly a 1:1 matchup.

@geerlingguy

Found this issue upstream as I was working on Minikube support for local testing and development: operator-framework/operator-sdk#2168

@geerlingguy

I have the basic components put together, but it seems I need to do some updating to get the SECRET_KEY working properly. Checking the logs on the awx web pod, I see a lot of repeat errors:

  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/conf/__init__.py", line 176, in __init__
    raise ImproperlyConfigured("The SECRET_KEY setting must not be empty.")
django.core.exceptions.ImproperlyConfigured: The SECRET_KEY setting must not be empty.
2019-11-06 22:23:28,430 INFO exited: daphne (exit status 1; not expected)
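In the docker-compose setup, SECRET_KEY is provided via a mounted file; the Kubernetes analogue starts with a Secret object. A sketch (the name and value here are placeholder assumptions, not the operator's actual resources):

```yaml
# Hypothetical Secret holding the AWX SECRET_KEY; the value is a placeholder
# and should be a long random string in any real deployment.
apiVersion: v1
kind: Secret
metadata:
  name: awx-secret-key
type: Opaque
stringData:
  SECRET_KEY: change-me-to-a-long-random-string
```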

@geerlingguy

I'll need to mount that as a file inside the pod; reference the AWX docker-compose.yml.j2: https://github.com/ansible/awx/blob/devel/installer/roles/local_docker/templates/docker-compose.yml.j2#L22-L26

@geerlingguy

Heh... did that and now getting:

django.core.exceptions.ImproperlyConfigured: No AWX configuration found at /etc/tower/settings.py.
Define the AWX_SETTINGS_FILE environment variable to specify an alternate path.

I need to be able to set the SECRET_KEY in that file but not have Kubernetes interfere with the directory in which that file resides.
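One way to project a single file without letting the mount shadow the rest of /etc/tower is a subPath volume mount. A hypothetical pod spec fragment (volume, secret, and container names are assumptions, not the operator's actual manifest):

```yaml
# Hypothetical fragment: mount only the SECRET_KEY item of a Secret as a
# single file, leaving the rest of /etc/tower untouched.
volumes:
  - name: awx-secret-key
    secret:
      secretName: awx-secret-key
containers:
  - name: awx-web
    volumeMounts:
      - name: awx-secret-key
        mountPath: /etc/tower/SECRET_KEY
        subPath: SECRET_KEY
        readOnly: true
```

One caveat with subPath mounts: unlike whole-volume Secret mounts, the file is not updated automatically when the Secret changes.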

@geerlingguy

Grr, trying to mount secret to one location and copy in a postStart lifecycle event in the pod... but getting:

PostStartHookError: command '/bin/sh -c cp /etc/tower-secrets/SECRET_KEY /etc/tower/SECRET_KEY' exited with 1: cp: -r not specified; omitting directory '/etc/tower-secrets/SECRET_KEY'
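Reading the error, one likely cause (an inference from the log, not confirmed) is that the Secret was mounted at /etc/tower-secrets/SECRET_KEY itself, so Kubernetes created a directory by that name with the key file inside it. Mounting the Secret one directory level up makes each key appear as a plain file. A hypothetical fragment:

```yaml
# Hypothetical fragment: mounting the whole Secret at /etc/tower-secrets
# makes each key (e.g. SECRET_KEY) a regular file inside that directory,
# so a `cp /etc/tower-secrets/SECRET_KEY /etc/tower/SECRET_KEY` hook
# copies a file rather than hitting a directory.
volumes:
  - name: tower-secrets
    secret:
      secretName: awx-secrets
containers:
  - name: awx-web
    volumeMounts:
      - name: tower-secrets
        mountPath: /etc/tower-secrets
        readOnly: true
```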

@geerlingguy

The /etc/tower directory is owned by root but the container starts as the awx user. I think the easiest path forward is to manage all the contents (currently on a blank install just the settings.py file) of that directory through Kubernetes using ConfigMaps.

Going to work on that now.
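One way to sketch that approach is a projected volume, which can combine a settings ConfigMap and the secret key into a single /etc/tower directory (all resource names below are assumptions, not the operator's actual manifests):

```yaml
# Hypothetical pod spec fragment: a projected volume merges the settings
# ConfigMap (providing settings.py) and the Secret (providing SECRET_KEY)
# into one read-only /etc/tower directory.
volumes:
  - name: awx-application-config
    projected:
      sources:
        - configMap:
            name: awx-settings      # assumed to carry settings.py
        - secret:
            name: awx-secret-key    # assumed to carry SECRET_KEY
containers:
  - name: awx-web
    volumeMounts:
      - name: awx-application-config
        mountPath: /etc/tower
        readOnly: true
```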

@geerlingguy

I am also just noticing that there is a kubernetes role in the official AWX installer... which would've saved at least a little setup time: https://github.com/ansible/awx/blob/devel/installer/roles/kubernetes

Note that it looks like it's trying to be all things to all people, and has a ton of configurable options in it that make it a bit harder to reason about. Makes me ponder the Pareto principle :)

@geerlingguy

K, 99% of the way there, it looks like AWX is installing correctly, but I'm seeing:

2019/11/08 21:18:46 [emerg] 159#0: bind() to 0.0.0.0:80 failed (13: Permission denied)
nginx: [emerg] bind() to 0.0.0.0:80 failed (13: Permission denied)
2019-11-08 21:18:46,568 INFO exited: nginx (exit status 1; not expected)
2019-11-08 21:18:47,578 INFO gave up: nginx entered FATAL state, too many start retries too quickly

Pretty sure that's because supervisord (and, indeed, everything else) is being run as the awx user, which can't bind to port 80 since it doesn't have that privilege.
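The usual fix for rootless nginx is to listen on an unprivileged port inside the pod and let the Service map 80 to it. A hypothetical fragment (names are assumptions; 8052 is the port AWX's images commonly use for nginx, but verify against the actual image config):

```yaml
# Hypothetical Service fragment: expose port 80 externally, target nginx's
# unprivileged listen port inside the pod.
apiVersion: v1
kind: Service
metadata:
  name: awx-web
spec:
  selector:
    app: awx-web
  ports:
    - name: http
      port: 80
      targetPort: 8052
```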

@geerlingguy

Woohoo, I'm getting somewhere!

(screenshot: AWX "upgrading" splash screen)

Now I need to set the environment correctly, I think.

@geerlingguy

I'm still getting stuck at migrations_notran, and it looks like the upgrade is not completing. There are a ton of extra containers in the AWX stateful set template (https://github.com/ansible/awx/blob/devel/installer/roles/kubernetes/templates/deployment.yml.j2), so my guess is maybe I'm missing something from there.

Tower is a kinda-crazy-complex beast, apparently more so than the last time I checked in on custom installation a year or so ago :P

@geerlingguy

Hmm... it might actually just be that the 'management-pod' (https://github.com/ansible/awx/blob/devel/installer/roles/kubernetes/templates/management-pod.yml.j2) is not currently set up (I was so focused on the 'web' pod that I haven't been looking at the task deployment at all).

Going to check that out now.

@geerlingguy

After a long while...

2019-11-08 22:32:40,842 DEBUG    awx.main.migrations Removing all Rackspace InventorySource from database.
2019-11-08 22:32:41,019 DEBUG    awx.main.migrations Removing all Azure Credentials from database.
2019-11-08 22:32:41,545 DEBUG    awx.main.migrations Removing all Azure InventorySource from database.
2019-11-08 22:32:44,696 DEBUG    awx.main.migrations Removing all InventorySource that have no link to an Inventory from database.
2019-11-08 22:33:18,567 INFO     rbac_migrations Computing role roots..
2019-11-08 22:33:18,569 INFO     rbac_migrations Found 0 roots in 0.000392 seconds, rebuilding ancestry map
2019-11-08 22:33:18,570 INFO     rbac_migrations Rebuild completed in 0.000151 seconds
...
2019-11-08 22:34:53,084 INFO success: channels-worker entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2019-11-08 22:34:58,030 ERROR    celery.beat Removing corrupted schedule file '/var/lib/awx/beat.db': error(11, 'Resource temporarily unavailable')
Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/celery/beat.py", line 485, in setup_schedule
    self._store = self._open_schedule()
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/celery/beat.py", line 475, in _open_schedule
    return self.persistence.open(self.schedule_filename, writeback=True)
  File "/usr/lib64/python3.6/shelve.py", line 243, in open
    return DbfilenameShelf(filename, flag, protocol, writeback)
  File "/usr/lib64/python3.6/shelve.py", line 227, in __init__
    Shelf.__init__(self, dbm.open(filename, flag), protocol, writeback)
  File "/usr/lib64/python3.6/dbm/__init__.py", line 94, in open
    return mod.open(file, flag, mode)
_gdbm.error: [Errno 11] Resource temporarily unavailable
2019-11-08 22:34:58,106 WARNING  awx.main.dispatch scaling up worker pid:184
2019-11-08 22:34:58,154 WARNING  awx.main.dispatch scaling up worker pid:185
2019-11-08 22:34:58,207 WARNING  awx.main.dispatch scaling up worker pid:186
2019-11-08 22:34:58,259 WARNING  awx.main.dispatch scaling up worker pid:187
2019-11-08 22:34:58,290 DEBUG    awx.main.tasks Syncing Schedules
...

And then a wild login screen appears:

(screenshot: AWX login screen, 2019-11-08 4:38 PM)

And a login succeeds!

(screenshot: AWX UI after successful login, 2019-11-08 4:40 PM)

Time to close this issue and start opening follow-ups.
