Add helper script and related docs #7218

Closed
mrjones-plip opened this issue May 4, 2021 · 9 comments

@mrjones-plip
Contributor

See parent issue for more info/context.

There have been some issues where docker-compose has failed and it's unclear how to fix it.

We should publish a bash script that can be run to check the health of a docker-compose up call. Specifically it should:

  • make sure the two containers are up: haproxy, medic-os
  • make sure all services in medic-os are running: API, CouchDB, nginx
  • offer helpful tips on how to troubleshoot any of the services that may have gotten stuck
  • be aware that more than one instance of medic-os may be running: exec calls should specify the container name, and there may be port conflicts
  • be sure the admin password to CHT is easy to find. It can sometimes be set to something unknown (alternatively, just forcefully set it to the same value every time)
  • publish documentation on the docs site on how to use this helper script and its output.
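A minimal sketch of the first and fourth checks, written as shell functions the script could source. It assumes the container names haproxy and medic-os seen in the docker-compose logs in this thread; the function names are hypothetical, and it relies only on the Go-template output of docker inspect and docker ps.

```shell
# Hypothetical helpers for the proposed health-check script (sketch only).

# Print "<name> - RUNNING" or "<name> - NOT RUNNING" for one container.
# docker inspect prints "true" for .State.Running on a running container;
# for a missing container it fails and prints nothing on stdout.
container_status() {
  local name="$1" state
  state=$(docker inspect -f '{{.State.Running}}' "$name" 2>/dev/null)
  if [ "$state" = "true" ]; then
    echo "$name - RUNNING"
  else
    echo "$name - NOT RUNNING"
  fi
}

# Count running containers whose name contains "medic-os"
# (docker ps name filters match substrings).
count_medic_os() {
  docker ps --filter "name=medic-os" --format '{{.Names}}' | grep -c .
}

check_containers() {
  local c n
  for c in haproxy medic-os; do
    container_status "$c"
  done
  n=$(count_medic_os)
  if [ "$n" -gt 1 ]; then
    echo "WARNING: $n medic-os containers are running; check exec targets and port conflicts"
  fi
}
```

After docker-compose up -d, running check_containers would print one line per container, in the spirit of the mockup below.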

It'd be great to have the script give a fixed output that users can get used to and easily find the information they're looking for. Just brainstorming here, but this might look something like this:

-------------------------------------------------------------
                CHT Instance: Foo Bar Smang

     CHT Version: 3.9.1
     URL:  https://192-168-1-1.my.local-ip.co
     Fauxton: http://172.18.0.2:5984/_utils/
     Admin Login: Medic
     Admin Password: Medic321

     + Medic OS
           - horticulturalist - RUNNING ('horticulturalist/horticulturalist' started successfully)
           - medic-api - RUNNING ('medic-api/medic-api' started successfully)
           - couchdb - RUNNING ('medic-core/couchdb' started successfully)
           - nginx - RUNNING ('medic-core/nginx' started successfully)
           - openssh - RUNNING ('medic-core/openssh' started successfully)
           - medic-couch2pg - RUNNING ('medic-couch2pg/medic-couch2pg' started successfully)
           - postgresql - RUNNING ('medic-rdbms/postgresql' started successfully)
           - medic-sentinel - RUNNING ('medic-sentinel/medic-sentinel' started successfully)
           - cron - RUNNING ('system-services/cron' started successfully)
           - syslog - RUNNING ('system-services/syslog' started successfully)
     + HA Proxy
           - haproxy - RUNNING (May 11 21:32:59 0df0af7b7c90 haproxy[1]: Proxy couch-backend2 started.)
-------------------------------------------------------------

This is likely TMI and could skip a lot of the granular statuses. However, the top part is information I've had to go looking for a lot, and it would be massively helpful to have it easily accessible.
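If the granular statuses are kept, one hedged way to produce them is to reuse the medic-os start-up lines such as "Info: Service 'medic-api/medic-api' started successfully" that appear in the logs in this thread. The sketch below, with a hypothetical function name, rewrites those lines into the mockup's format. Note that such a line only records that a service started, not that it is still running.

```shell
# Hypothetical: read medic-os log text on stdin and turn lines like
#   Info: Service 'medic-core/couchdb' started successfully
# into mockup-style lines like
#   - couchdb - RUNNING ('medic-core/couchdb' started successfully)
# Lines that do not match are dropped. "|" is the sed delimiter so that
# "/" can appear literally in the pattern and replacement.
service_statuses() {
  sed -n "s|.*Info: Service '\([^/']*\)/\([^']*\)' started successfully.*|- \2 - RUNNING ('\1/\2' started successfully)|p"
}
```

For example: docker logs medic-os 2>&1 | service_statuses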

@mrjones-plip
Contributor Author

@binokaryg - thoughts/comments on my requirements above?

@binokaryg
Member

This looks super helpful, thanks @mrjones-plip.
Can we hide the password unless specified when running the script? We might want to share the output with others and might forget to hide the password, especially if the same password is used elsewhere.

@mrjones-plip
Contributor Author

@binokaryg - Thanks for the feedback! Glad this is looking helpful.

Can we hide the password unless specified when running the script? We might want to share the output with others and might forget to hide the password, especially if the same password is used elsewhere.

Yup yup, this should be easy enough!
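Hiding the password by default could be as small as the following sketch. The --show-password flag and the function names are hypothetical; the output labels are copied from the mockup above.

```shell
# Hypothetical: mask the admin password unless --show-password is passed.
SHOW_PASSWORD=0

parse_flags() {
  local arg
  for arg in "$@"; do
    case "$arg" in
      --show-password) SHOW_PASSWORD=1 ;;
    esac
  done
}

# Print the password itself, or a fixed-width mask so the length of the
# real password is not leaked in shared output either.
display_password() {
  if [ "$SHOW_PASSWORD" -eq 1 ]; then
    echo "$1"
  else
    echo "********"
  fi
}

print_credentials() {
  printf '     Admin Login: %s\n' "$1"
  printf '     Admin Password: %s\n' "$(display_password "$2")"
}
```

A fixed mask (rather than one asterisk per character) is a deliberate choice, since output is meant to be shared.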

@mrjones-plip
Contributor Author

I've intentionally done a lot of testing of docker-compose up calls, and so far this is the error mode I reliably see (haproxy exited with code 1). When checking with curl it reported:

curl  -vv https://localhost/
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to localhost:443
* Closing connection 0
curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to localhost:443

And then:


docker-compose up                                                
Recreating haproxy ... done                                                                                                                                                                        
Recreating medic-os ... done                                                                   
Attaching to haproxy, medic-os                                                                                                                                                                     
medic-os    | mesg: ttyname failed: Inappropriate ioctl for device                                         
medic-os    | [2021/05/12 21:50:52] Info: Setting up software...                                                                                                                                   
haproxy     | Starting enhanced syslogd: rsyslogd.                                                               
haproxy     | May  8 00:49:09 702c68f04f1b haproxy[25]: 172.18.0.3,200,POST,/medic/_changes?include_docs=true&feed=longpoll&heartbeat=10000&filter=_doc_ids&since=205-g1AAAAFTeJzLYWBg4MhgTmEQTM4vTc5ISXIwNDLXMwBCwxygFFMiQ5L8____sxIF8ShKUgCSSfZgdbz41DmA1MWD1bHjU5cAUlcPVieGR10eC5BkaABSQKXzsxIlCKpdAFG7PytRiKDaAxC197MSxQmqfQBRC3RvYBYArFBfWw&limit=25,-,horticulturalist,'{"doc_ids":["horti-upgrade","_design/medic:staged"]}',3658,10006,-,'node-fetch/1.0 (+https://github.com/bitinn/node-fetch)'
haproxy     | May  8 00:49:09 702c68f04f1b haproxy[25]: 172.18.0.3,200,GET,/medic/_changes?feed=longpoll&heartbeat=10000&since=205-g1AAAAFTeJzLYWBg4MhgTmEQTM4vTc5ISXIwNDLXMwBCwxygFFMiQ5L8____sxIF8ShKUgCSSfZgdbz41DmA1MWD1bHjU5cAUlcPVieGR10eC5BkaABSQKXzsxIlCKpdAFG7PytRiKDaAxC197MSxQmqfQBRC3RvYBYArFBfWw&limit=25,api,medic-api,'-',3658,10004,-,'node-fetch/1.0 (+https://github.com/bitinn/node-fetch)'
haproxy     | May  8 00:49:10 702c68f04f1b haproxy[25]: Stopping frontend http-in in 0 ms.
haproxy     | May  8 00:49:10 702c68f04f1b haproxy[25]: Stopping frontend http-in2 in 0 ms.
haproxy     | May  8 00:49:10 702c68f04f1b haproxy[25]: Stopping backend couch-backend in 0 ms.
haproxy     | May  8 00:49:10 702c68f04f1b haproxy[25]: Stopping backend couch-backend2 in 0 ms.
haproxy     | May  8 00:49:10 702c68f04f1b haproxy[25]: Proxy http-in stopped (FE: 271 conns, BE: 0 conns).
haproxy     | May  8 00:49:10 702c68f04f1b haproxy[25]: Proxy http-in2 stopped (FE: 0 conns, BE: 0 conns).
haproxy     | May  8 00:49:10 702c68f04f1b haproxy[25]: Proxy couch-backend stopped (FE: 0 conns, BE: 271 conns).
haproxy     | May  8 00:49:10 702c68f04f1b haproxy[25]: Proxy couch-backend2 stopped (FE: 0 conns, BE: 0 conns).
medic-os    | [2021/05/12 21:50:52] Info: Running setup task 'horticulturalist/sudoers'
haproxy     | # Setting `log` here with the address of 127.0.0.1 will have the effect
haproxy     | # of haproxy sending the udp log messages to its own rsyslog instance
haproxy     | # (which sits at `127.0.0.1`) at the `local0` facility including all
haproxy     | # logs that have a priority greater or equal to the specified log level
haproxy     | # log 127.0.0.1 local0 warning
haproxy     | global
haproxy     |   maxconn 4096
haproxy     |   lua-load /usr/local/etc/haproxy/parse_basic.lua
haproxy     |   lua-load /usr/local/etc/haproxy/parse_cookie.lua
haproxy     |   lua-load /usr/local/etc/haproxy/replace_password.lua
haproxy     |   log /dev/log len 65535 local2 info
haproxy     |
haproxy     | defaults
haproxy     |   mode http
haproxy     |   log global
haproxy     |   option dontlognull
haproxy     |   option http-ignore-probes
haproxy     |   timeout client 150000
haproxy     |   timeout server 3600000
haproxy     |   timeout connect 15000
haproxy     |   stats enable
haproxy     |   stats refresh 30s
haproxy     |   stats auth admin:password
haproxy     |   stats uri /haproxy?stats
haproxy     |
haproxy     | frontend http-in
haproxy     |   bind  *:5984
haproxy     |   acl has_user req.hdr(x-medic-user) -m found
haproxy     |   acl has_cookie req.hdr(cookie) -m found
haproxy     |   acl has_basic_auth req.hdr(authorization) -m found
haproxy     |   declare capture request len 400000
haproxy     |   http-request set-header x-medic-user %[lua.parseBasic] if has_basic_auth
haproxy     |   http-request set-header x-medic-user %[lua.parseCookie] if !has_basic_auth !has_user has_cookie
haproxy     |   http-request capture req.body id 0 # capture.req.hdr(0)
haproxy     |   http-request capture req.hdr(x-medic-service) len 200 # capture.req.hdr(1)
haproxy     |   http-request capture req.hdr(x-medic-user) len 200 # capture.req.hdr(2)
haproxy     |   http-request capture req.hdr(user-agent) len 600 # capture.req.hdr(3)
haproxy     |   capture response header Content-Length len 10 # capture.res.hdr(0)
haproxy     |   log-format "%ci,%ST,%[capture.req.method],%[capture.req.uri],%[capture.req.hdr(1)],%[capture.req.hdr(2)],'%[capture.req.hdr(0),lua.replacePassword]',%B,%Tr,%[capture.res.hdr(0)],'%[capture.req.hdr(3)]'"
haproxy     |   default_backend couch-backend
haproxy     |
haproxy     | frontend http-in2
haproxy     |   bind  *:5986
haproxy     |   default_backend couch-backend2
haproxy     |
haproxy     | backend couch-backend
haproxy     |   balance roundrobin
haproxy     |   server couchdb1 medic-os:5985
haproxy     |
haproxy     | backend couch-backend2
haproxy     |   balance roundrobin
haproxy     |   server couchdb1 medic-os:5987
haproxy     | [alert] 131/215051 (1) : parseBasic loaded
haproxy     | [alert] 131/215051 (1) : parseCookie loaded
haproxy     | [alert] 131/215051 (1) : replacePassword loaded
haproxy     | [ALERT] 131/215051 (1) : parsing [/usr/local/etc/haproxy/haproxy.cfg:48] : 'server couchdb1' : could not resolve address 'medic-os'.
haproxy     | [ALERT] 131/215051 (1) : parsing [/usr/local/etc/haproxy/haproxy.cfg:52] : 'server couchdb1' : could not resolve address 'medic-os'.
haproxy     | [ALERT] 131/215051 (1) : Failed to initialize server(s) addr.
haproxy exited with code 1
medic-os    | [2021/05/12 21:50:52] Info: Running setup task 'horticulturalist/users'
medic-os    | [2021/05/12 21:50:52] Info: Setting up software (14% complete)...
medic-os    | [2021/05/12 21:50:52] Info: Running setup task 'medic-api/link-logs'
medic-os    | [2021/05/12 21:50:52] Info: Running setup task 'medic-api/logrotate'
medic-os    | [2021/05/12 21:50:52] Info: Running setup task 'medic-api/users'
medic-os    | [2021/05/12 21:50:52] Info: Setting up software (28% complete)...
medic-os    | [2021/05/12 21:50:52] Info: Running setup task 'medic-core/ldconfig'
medic-os    | [2021/05/12 21:50:53] Info: Running setup task 'medic-core/link-logs'
medic-os    | [2021/05/12 21:50:53] Info: Running setup task 'medic-core/logrotate'
medic-os    | [2021/05/12 21:50:53] Info: Running setup task 'medic-core/nginx'
medic-os    | [2021/05/12 21:50:53] Info: Running setup task 'medic-core/nginx-ssl'
medic-os    | [2021/05/12 21:50:53] Info: Running setup task 'medic-core/profile'
medic-os    | [2021/05/12 21:50:53] Info: Running setup task 'medic-core/ssh-authorized-keys'
medic-os    | [2021/05/12 21:50:53] Info: Running setup task 'medic-core/ssh-keygen'
medic-os    | [2021/05/12 21:50:53] Info: Running setup task 'medic-core/usb-modeswitch'
medic-os    | [2021/05/12 21:50:53] Info: Setting up software (42% complete)...
medic-os    | [2021/05/12 21:50:53] Info: Running setup task 'medic-couch2pg/link-logs'
medic-os    | [2021/05/12 21:50:53] Info: Running setup task 'medic-couch2pg/logrotate'
medic-os    | [2021/05/12 21:50:53] Info: Setting up software (57% complete)...
medic-os    | [2021/05/12 21:50:53] Info: Running setup task 'medic-rdbms/ldconfig'
medic-os    | [2021/05/12 21:50:53] Info: Running setup task 'medic-rdbms/link-logs'
medic-os    | [2021/05/12 21:50:53] Info: Running setup task 'medic-rdbms/reconfigure'
medic-os    | [2021/05/12 21:50:53] Info: Setting up software (71% complete)...
medic-os    | [2021/05/12 21:50:53] Info: Running setup task 'medic-sentinel/link-logs'
medic-os    | [2021/05/12 21:50:53] Info: Running setup task 'medic-sentinel/logrotate'
medic-os    | [2021/05/12 21:50:53] Info: Running setup task 'medic-sentinel/users'
medic-os    | [2021/05/12 21:50:53] Info: Setting up software (85% complete)...
medic-os    | [2021/05/12 21:50:53] Info: Running setup task 'system-services/home-directories'
medic-os    | [2021/05/12 21:50:53] Info: Running setup task 'system-services/link-logs'
medic-os    | [2021/05/12 21:50:53] Info: Running setup task 'system-services/logrotate'
medic-os    | [2021/05/12 21:50:53] Info: Setting up software (100% complete)...
medic-os    | [2021/05/12 21:50:53] Info: Starting services...
medic-os    | [2021/05/12 21:50:53] Info: Service 'horticulturalist/horticulturalist' started successfully
medic-os    | [2021/05/12 21:50:53] Info: Horticulturalist has already bootstrapped
medic-os    | [2021/05/12 21:50:53] Info: Service 'medic-api/medic-api' started successfully
medic-os    | [2021/05/12 21:50:53] Info: Service 'medic-core/couchdb' started successfully
medic-os    | [2021/05/12 21:50:53] Info: Service 'medic-core/nginx' started successfully
medic-os    | [2021/05/12 21:50:53] Info: CouchDB is already configured
medic-os    | [2021/05/12 21:50:53] Info: Service 'medic-core/openssh' started successfully
medic-os    | [2021/05/12 21:50:53] Info: Service 'medic-couch2pg/medic-couch2pg' started successfully
medic-os    | [2021/05/12 21:50:53] Info: Service 'medic-rdbms/postgresql' started successfully
medic-os    | [2021/05/12 21:50:53] Info: Service 'medic-sentinel/medic-sentinel' started successfully
medic-os    | [2021/05/12 21:50:53] Info: Service 'system-services/cron' started successfully
medic-os    | [2021/05/12 21:50:53] Info: Service 'system-services/syslog' started successfully
medic-os    | [2021/05/12 21:50:53] Info: Synchronizing disks...
medic-os    | [2021/05/12 21:50:56] Info: Horticulturalist has already bootstrapped
medic-os    | [2021/05/12 21:50:58] Info: Horticulturalist has already bootstrapped
medic-os    | [2021/05/12 21:51:01] Info: Horticulturalist has already bootstrapped
medic-os    | [2021/05/12 21:51:03] Info: Horticulturalist has already bootstrapped
medic-os    | [2021/05/12 21:51:03] Info: System started successfully
medic-os    | [2021/05/12 21:51:03] Info: Starting log streaming
medic-os    | [2021/05/12 21:51:06] Info: Horticulturalist has already bootstrapped
medic-os    | [2021/05/12 21:51:08] Info: Horticulturalist has already bootstrapped
medic-os    | [2021/05/12 21:51:11] Info: Horticulturalist has already bootstrapped
medic-os    | [2021/05/12 21:51:13] Info: Horticulturalist has already bootstrapped
medic-os    | [2021/05/12 21:51:16] Info: Horticulturalist has already bootstrapped
medic-os    | [2021/05/12 21:51:18] Info: Horticulturalist has already bootstrapped
medic-os    | [2021/05/12 21:51:21] Info: Horticulturalist has already bootstrapped
medic-os    | [2021/05/12 21:51:23] Info: Horticulturalist has already bootstrapped
medic-os    | [2021/05/12 21:51:26] Info: Horticulturalist has already bootstrapped
medic-os    | [2021/05/12 21:51:28] Info: Horticulturalist has already bootstrapped
medic-os    | [2021/05/12 21:51:31] Info: Horticulturalist has already bootstrapped
medic-os    | [2021/05/12 21:51:33] Info: Horticulturalist has already bootstrapped
medic-os    | [2021/05/12 21:51:36] Info: Horticulturalist has already bootstrapped
medic-os    | [2021/05/12 21:51:38] Info: Horticulturalist has already bootstrapped
medic-os    | [2021/05/12 21:51:41] Info: Horticulturalist has already bootstrapped
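Since this haproxy failure has a fixed signature, the script could look for it and print a hint. A sketch, assuming the log text comes from docker logs haproxy; the function name is hypothetical, the matched string is copied from the [ALERT] lines above, and the suggested fix (stop, then docker-compose up again) is the one found later in this thread.

```shell
# Hypothetical check: scan haproxy log text on stdin for the start-up
# failure above and print a troubleshooting hint. Returns 0 if found.
haproxy_resolve_hint() {
  if grep -q "could not resolve address 'medic-os'"; then
    echo "haproxy could not resolve medic-os at start-up and exited."
    echo "Try: stop the containers (Ctrl+C) and run 'docker-compose up' again."
    return 0
  fi
  return 1
}
```

For example: docker logs haproxy 2>&1 | haproxy_resolve_hint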

@mrjones-plip
Contributor Author

After hitting the above error today, I pressed Ctrl+C and then, for kicks, pressed it again ("press Ctrl+C again to force"). When re-running docker-compose up I got a new error! medic-os failed to start:

medic-os    | [2021/06/02 22:46:37] Warning: Package 'medic-sentinel' is not currently installed
medic-os    | Fatal: Failed to extract required software from disk/image
medic-os exited with code 1

Indeed, nobody is home as expected:

➜  ~ curl  -vv https://localhost/
*   Trying 127.0.0.1...
* TCP_NODELAY set
* connect to 127.0.0.1 port 443 failed: Connection refused
* Failed to connect to localhost port 443: Connection refused
* Closing connection 0
curl: (7) Failed to connect to localhost port 443: Connection refused

➜  ~ docker ps                   
CONTAINER ID   IMAGE                         COMMAND                  CREATED         STATUS         PORTS     NAMES
0205c5fd25a2   medicmobile/haproxy:rc-1.17   "/entrypoint.sh -f /…"   7 minutes ago   Up 7 minutes             haproxy
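The two curl failures in this thread map to distinct curl exit codes: 35 (SSL connect error, in the first comment's output) and 7 (failed to connect, just above). A hedged sketch of turning that into a hint; the function names and hint wording are hypothetical.

```shell
# Hypothetical: translate curl's exit code into a troubleshooting hint.
# Exit codes per curl's documentation: 0 = success,
# 7 = failed to connect to host, 35 = SSL/TLS handshake failed.
curl_hint() {
  case "$1" in
    0)  echo "OK: something answered over HTTPS" ;;
    7)  echo "Connection refused: nothing is listening; check that both containers are up (docker ps)" ;;
    35) echo "TCP connected but TLS failed: check the haproxy and medic-os logs" ;;
    *)  echo "curl failed with exit code $1" ;;
  esac
}

probe_url() {
  # -k: skip certificate verification; -s: silent; -o /dev/null: discard body
  curl -k -s -o /dev/null --max-time 10 "$1"
  curl_hint "$?"
}
```

For example: probe_url https://localhost/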

And then:


➜  cht-core git:(master) ✗ docker-compose up --remove-orphans
Starting haproxy ... done
Starting medic-os ... done
Attaching to haproxy, medic-os
haproxy     | Starting enhanced syslogd: rsyslogd.
haproxy     | # Setting `log` here with the address of 127.0.0.1 will have the effect
haproxy     | # of haproxy sending the udp log messages to its own rsyslog instance
haproxy     | # (which sits at `127.0.0.1`) at the `local0` facility including all
haproxy     | # logs that have a priority greater or equal to the specified log level
haproxy     | # log 127.0.0.1 local0 warning
haproxy     | global
haproxy     |   maxconn 4096
haproxy     |   lua-load /usr/local/etc/haproxy/parse_basic.lua
haproxy     |   lua-load /usr/local/etc/haproxy/parse_cookie.lua
haproxy     |   lua-load /usr/local/etc/haproxy/replace_password.lua
haproxy     |   log /dev/log len 65535 local2 info
haproxy     |
haproxy     | defaults
haproxy     |   mode http
haproxy     |   log global
haproxy     |   option dontlognull
haproxy     |   option http-ignore-probes
haproxy     |   timeout client 150000
haproxy     |   timeout server 3600000
haproxy     |   timeout connect 15000
haproxy     |   stats enable
haproxy     |   stats refresh 30s
haproxy     |   stats auth admin:password
haproxy     |   stats uri /haproxy?stats
haproxy     |
haproxy     | frontend http-in
haproxy     |   bind  *:5984
haproxy     |   acl has_user req.hdr(x-medic-user) -m found
haproxy     |   acl has_cookie req.hdr(cookie) -m found
haproxy     |   acl has_basic_auth req.hdr(authorization) -m found
haproxy     |   declare capture request len 400000
haproxy     |   http-request set-header x-medic-user %[lua.parseBasic] if has_basic_auth
haproxy     |   http-request set-header x-medic-user %[lua.parseCookie] if !has_basic_auth !has_user has_cookie
haproxy     |   http-request capture req.body id 0 # capture.req.hdr(0)
haproxy     |   http-request capture req.hdr(x-medic-service) len 200 # capture.req.hdr(1)
haproxy     |   http-request capture req.hdr(x-medic-user) len 200 # capture.req.hdr(2)
haproxy     |   http-request capture req.hdr(user-agent) len 600 # capture.req.hdr(3)
haproxy     |   capture response header Content-Length len 10 # capture.res.hdr(0)
haproxy     |   log-format "%ci,%ST,%[capture.req.method],%[capture.req.uri],%[capture.req.hdr(1)],%[capture.req.hdr(2)],'%[capture.req.hdr(0),lua.replacePassword]',%B,%Tr,%[capture.res.hdr(0)],'%[capture.req.hdr(3)]'"
haproxy     |   default_backend couch-backend
haproxy     |
haproxy     | frontend http-in2
haproxy     |   bind  *:5986
haproxy     |   default_backend couch-backend2
haproxy     |
haproxy     | backend couch-backend
haproxy     |   balance roundrobin
haproxy     |   server couchdb1 medic-os:5985
haproxy     |
haproxy     | backend couch-backend2
haproxy     |   balance roundrobin
haproxy     |   server couchdb1 medic-os:5987
haproxy     | [alert] 152/224635 (1) : parseBasic loaded
haproxy     | [alert] 152/224635 (1) : parseCookie loaded
haproxy     | [alert] 152/224635 (1) : replacePassword loaded
haproxy     | Jun  2 22:46:35 0205c5fd25a2 haproxy[1]: Proxy http-in started.
haproxy     | Jun  2 22:46:35 0205c5fd25a2 haproxy[1]: Proxy http-in2 started.
haproxy     | Jun  2 22:46:35 0205c5fd25a2 haproxy[1]: Proxy couch-backend started.
haproxy     | Jun  2 22:46:35 0205c5fd25a2 haproxy[1]: Proxy couch-backend2 started.
medic-os    | mesg: ttyname failed: Inappropriate ioctl for device
medic-os    | [2021/06/02 22:46:36] Info: Setting up software...
medic-os    | [2021/06/02 22:46:36] Info: Running setup task 'horticulturalist/sudoers'
medic-os    | [2021/06/02 22:46:36] Info: Running setup task 'horticulturalist/users'
medic-os    | [2021/06/02 22:46:36] Info: Setting up software (14% complete)...
medic-os    | [2021/06/02 22:46:37] Info: Running setup task 'medic-api/link-logs'
medic-os    | [2021/06/02 22:46:37] Info: Running setup task 'medic-api/logrotate'
medic-os    | [2021/06/02 22:46:37] Info: Running setup task 'medic-api/users'
medic-os    | [2021/06/02 22:46:37] Info: Setting up software (28% complete)...
medic-os    | [2021/06/02 22:46:37] Info: Running setup task 'medic-core/ldconfig'
medic-os    | [2021/06/02 22:46:37] Info: Running setup task 'medic-core/link-logs'
medic-os    | [2021/06/02 22:46:37] Info: Running setup task 'medic-core/logrotate'
medic-os    | [2021/06/02 22:46:37] Info: Running setup task 'medic-core/nginx'
medic-os    | [2021/06/02 22:46:37] Info: Running setup task 'medic-core/nginx-ssl'
medic-os    | [2021/06/02 22:46:37] Info: Running setup task 'medic-core/profile'
medic-os    | [2021/06/02 22:46:37] Info: Running setup task 'medic-core/ssh-authorized-keys'
medic-os    | [2021/06/02 22:46:37] Info: Running setup task 'medic-core/ssh-keygen'
medic-os    | [2021/06/02 22:46:37] Info: Running setup task 'medic-core/usb-modeswitch'
medic-os    | [2021/06/02 22:46:37] Info: Setting up software (42% complete)...
medic-os    | [2021/06/02 22:46:37] Info: Running setup task 'medic-couch2pg/link-logs'
medic-os    | [2021/06/02 22:46:37] Info: Running setup task 'medic-couch2pg/logrotate'
medic-os    | [2021/06/02 22:46:37] Info: Setting up software (57% complete)...
medic-os    | [2021/06/02 22:46:37] Info: Running setup task 'medic-rdbms/ldconfig'
medic-os    | [2021/06/02 22:46:37] Info: Running setup task 'medic-rdbms/link-logs'
medic-os    | [2021/06/02 22:46:37] Info: Running setup task 'medic-rdbms/reconfigure'
medic-os    | [2021/06/02 22:46:37] Info: Setting up software (71% complete)...
medic-os    | [2021/06/02 22:46:37] Warning: Package 'medic-sentinel' is not currently installed
medic-os    | Fatal: Failed to extract required software from disk/image
medic-os exited with code 1
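For this failure mode the useful information is the last Warning/Fatal lines and the container's exit code. A sketch with hypothetical function names, assuming log text from docker logs medic-os on stdin and using the .State.Running and .State.ExitCode fields of docker inspect.

```shell
# Hypothetical: print the last few Warning/Fatal lines from medic-os log
# text on stdin (the container exits right after the Fatal line).
medic_os_problems() {
  grep -E 'Warning:|Fatal:' | tail -n 5
}

# Print "<name> exited with code N" if the container is stopped with a
# non-zero exit code, mirroring docker-compose's own message.
exit_report() {
  local running code
  running=$(docker inspect -f '{{.State.Running}}' "$1" 2>/dev/null)
  code=$(docker inspect -f '{{.State.ExitCode}}' "$1" 2>/dev/null)
  if [ "$running" = "false" ] && [ "$code" != "0" ]; then
    echo "$1 exited with code $code"
  fi
}
```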

@mrjones-plip
Contributor Author

OK! I'm happy to report I believe I have a way to reliably reproduce a startup bug. On Ubuntu 18 with docker 20.10.6 (build 370c289) and docker-compose 1.25.4 (build 8d51620a), I can reliably get an error:

  1. bootstrap the containers successfully (don't use -d, so you stay attached)
  2. stop the containers with Ctrl+C
  3. nuke the planet from orbit: docker system prune && docker volume prune
  4. run docker-compose up

This reliably results in:

haproxy     | [ALERT] 154/191358 (1) : parsing [/usr/local/etc/haproxy/haproxy.cfg:48] : 'server couchdb1' : could not resolve address 'medic-os'.
haproxy     | [ALERT] 154/191358 (1) : parsing [/usr/local/etc/haproxy/haproxy.cfg:52] : 'server couchdb1' : could not resolve address 'medic-os'.
haproxy     | [ALERT] 154/191358 (1) : Failed to initialize server(s) addr.
haproxy exited with code 1
[SNIP]
medic-os    | [2021/06/04 19:20:39] Info: Starting services...
medic-os    | [2021/06/04 19:20:39] Info: Synchronizing disks...
medic-os    | [2021/06/04 19:20:50] Info: System started successfully
medic-os    | [2021/06/04 19:20:50] Info: Starting log streaming
medic-os    | Fatal: CouchDB failed to start properly

The fix is to press Ctrl+C again and run docker-compose up again.
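The reproduction steps above could be scripted roughly as follows. This is a hypothetical sketch: it runs detached and uses docker-compose stop as a stand-in for Ctrl+C, and it treats "HTTPS on localhost answers" as "bootstrapped successfully", which may not be sufficient. The prune commands are destructive for the whole Docker host.

```shell
# Hypothetical script form of the four reproduction steps.
reproduce_startup_bug() {
  # COMPOSE can be overridden (e.g. COMPOSE="docker compose"); $compose is
  # left unquoted on purpose so a two-word command splits correctly.
  local compose="${COMPOSE:-docker-compose}"

  # 1. bootstrap detached, then poll until HTTPS on localhost answers
  $compose up -d
  until curl -k -s -o /dev/null --max-time 10 https://localhost/; do
    sleep 10
  done

  # 2. stop the containers; roughly what Ctrl+C does to an attached "up"
  $compose stop

  # 3. DESTRUCTIVE, host-wide: removes stopped containers, unused networks,
  #    dangling images, build cache, then unused volumes (-f: no prompt)
  docker system prune -f
  docker volume prune -f

  # 4. attached this time, so haproxy's [ALERT] lines are visible
  $compose up
}
```

The COMPOSE override also lets the function be pointed at a stub when testing it.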

@Hareet
Member

Hareet commented Jun 6, 2021

@mrjones-plip

A couple of things here, all related to entrypoints and networking. Your haproxy container is unable to talk to your medic-os container. As you highlighted, we discussed how restarts would be the best fix. You can also experiment with the restart declaration in the compose template to handle some of the scenarios you will discover.
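A scripted version of "just restart it", as a hypothetical sketch: retry docker-compose up -d until haproxy is still running after a short settle period. The restart key in the compose file (for example restart: on-failure) is the declarative alternative mentioned above; the function names, attempt counts, and delays here are assumptions.

```shell
# Hypothetical: run a command up to ATTEMPTS times, sleeping DELAY seconds
# between tries. Returns 0 on the first success, 1 if every attempt fails.
retry() {
  local attempts="$1" delay="$2" n=1
  shift 2
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
  return 0
}

haproxy_running() {
  [ "$(docker inspect -f '{{.State.Running}}' haproxy 2>/dev/null)" = "true" ]
}

# One attempt: (re)start the stack detached, let it settle, check haproxy.
# "up -d" starts exited containers again, approximating the manual fix.
up_and_check() {
  ${COMPOSE:-docker-compose} up -d && sleep "${SETTLE:-10}" && haproxy_running
}
```

For example: retry 5 15 up_and_check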

Entrypoints

The issue above is resolved by entrypoint scripts that essentially run something like wait-for-it.sh until the other container's service comes up. We have resolved this in architecture v3 for all the separate containers and services, though we still need to test that thoroughly. You should be able to use our test haproxy image that contains the wait-for-it script for port 5984, but it may break in other untested scenarios. For this particular scenario, I'd recommend just restarting compose, as you suggested at the bottom, and doing further testing for startup issues on our new images and compose architecture.
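A wait-for-it.sh style wrapper could look like the sketch below. It assumes the image has a netcat that supports -z and -w, and it would wait on medic-os:5985, the backend address in the haproxy.cfg printed above. The function names are hypothetical, and this is not the test image mentioned here.

```shell
# Hypothetical wait-for-it style helpers.
port_open() {
  # nc -z: only test that a TCP connection can be made; -w 1: 1s timeout.
  # A connection also implies the host name resolved.
  nc -z -w 1 "$1" "$2" >/dev/null 2>&1
}

# Block until HOST:PORT accepts connections, or fail after TIMEOUT seconds.
wait_for() {
  local host="$1" port="$2" timeout="${3:-60}" waited=0
  until port_open "$host" "$port"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s waiting for $host:$port" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "$host:$port is available"
}
```

A hypothetical wrapper entrypoint would then run wait_for medic-os 5985 120 and only afterwards hand off to the image's existing /entrypoint.sh with the original arguments.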

Networking

Say haproxy never resolves medic-os, but CouchDB is up inside it? Networking behaves differently across operating systems, and in this setup your haproxy container isn't able to find the medic-os container. This is resolved by https://github.com/medic/medic-infrastructure/blob/master/images/haproxy/haproxy.cfg#L48 together with setting that COUCHDB_HOST variable to localhost in your docker-compose file. You will also need to declare host networking in your compose file. I'd recommend this approach, just using a range of uncommon ports. This would work on most versions of operating systems. Obviously, we don't recommend this setup for production.
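To tell this case apart from a slow start, a script could check whether the two containers even share a Docker network, since Docker's built-in name resolution (haproxy finding medic-os) only works between containers on a common user-defined network. A sketch with hypothetical function names; under host networking both containers report the network "host" and names are not resolved this way, which is why COUCHDB_HOST=localhost is needed there.

```shell
# Hypothetical: list the Docker networks a container is attached to, using a
# Go template over .NetworkSettings.Networks (a map keyed by network name).
networks_of() {
  docker inspect -f '{{range $name, $cfg := .NetworkSettings.Networks}}{{$name}} {{end}}' "$1" 2>/dev/null
}

# Succeed if the two containers share at least one network.
share_network() {
  local a b
  for a in $(networks_of "$1"); do
    for b in $(networks_of "$2"); do
      [ "$a" = "$b" ] && return 0
    done
  done
  return 1
}
```

For example: share_network haproxy medic-os || echo "haproxy and medic-os are not on a common network"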

The proper way for a "one solution" fix would be to discover networking configuration before docker-compose launches to configure necessary networking variables in the compose file. Something we may think about resolving later.
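One small piece of discovering the networking configuration before launch is checking that the ports compose needs are free (the port conflicts mentioned in the original requirements). A sketch assuming iproute2's ss is available; the function names are hypothetical.

```shell
# Hypothetical pre-launch check: is anything already listening on TCP PORT?
# Parses `ss -ltn`; column 4 is "Local Address:Port" (e.g. 0.0.0.0:443 or
# [::]:443), so we test whether it ends with ":PORT". Header line skipped.
port_in_use() {
  ss -ltn | awk -v p=":$1" 'NR > 1 && substr($4, length($4) - length(p) + 1) == p { found = 1 } END { exit !found }'
}

check_ports() {
  local port
  for port in "$@"; do
    if port_in_use "$port"; then
      echo "Port $port is already in use; docker-compose may fail to bind it"
    fi
  done
}
```

For example: check_ports 443, plus whatever ports your docker-compose.yml publishes.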

Also, there are a lot of scripts and systemctl/supervisor statuses inside medic-os. Those do not correctly reflect the status of the services over time, due to issues with running a large VM inside a container. A lot of what you are solving is essentially that, and it is solved out of the box in arch v3.

@mrjones-plip
Contributor Author

@Hareet - thanks for the thoughts on how this might be improved! I very much defer to SRE on the best way to publish public updates to docker-compose.yml, such as changing networking to host. I think this would be better than publishing a new docker-compose.yml privately, internal to App Services only.

My working assumption right now is that SRE is working through the queue of tickets for the larger App Builder effort and trying to see where they fit schedule-wise. Please let me know if this is not the case!

@mrjones-plip mrjones-plip self-assigned this Jul 27, 2021
@mrjones-plip
Contributor Author

Adding some notes now that I'm getting started on this:

  • likely use bashsimplecurses to handle "windowing"
  • can use a spicier version of this script to find the LAN IP, which can then be turned into a local-ip.co URL: https://YOU-IP-HERE-OK.my.local-ip.co. We'll just enhance the script to match the LAN IP with the router IP!
  • wondering if ip is a common enough util? 🤞
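A hedged sketch of the LAN IP idea using ip: ip route get reports which source address the kernel would use to reach an outside address, which on a typical home or office LAN is the address on the router's subnet. 1.1.1.1 is an arbitrary outside address (no packets are sent), the function names are hypothetical, and the URL pattern follows the mockup's https://192-168-1-1.my.local-ip.co.

```shell
# Hypothetical: print the source IP of the default route.
# `ip route get` prints e.g.
#   1.1.1.1 via 192.168.1.1 dev eth0 src 192.168.1.5 uid 1000
# so we print the field that follows "src".
lan_ip() {
  ip route get 1.1.1.1 2>/dev/null | awk '{ for (i = 1; i < NF; i++) if ($i == "src") { print $(i + 1); exit } }'
}

# Build the local-ip.co URL by replacing dots with dashes.
local_ip_url() {
  echo "https://$(echo "$1" | tr '.' '-').my.local-ip.co"
}
```

For example: local_ip_url "$(lan_ip)"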

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants