
GlusterFS catalog version won't start anymore #3670

Closed
joostliketoast opened this issue Feb 22, 2016 · 29 comments

@joostliketoast

Version:
rancher v0.59.0
cattle v0.148.0
user interface v0.90.0
rancher compose v0.7.2

Steps:

  1. Create GlusterFS stack
  2. Create convoy-gluster stack
  3. Create some volumes
  4. Remove the volumes
  5. Remove the GlusterFS and convoy stacks
  6. Re-create GlusterFS

Results:
GlusterFS stays in a boot loop.

Expected:
A running GlusterFS stack.

EDIT:
Step 7: Create convoy-gluster

This results in the following error:


2/22/2016 2:26:48 PM Waiting for metadata.
2/22/2016 2:26:48 PM time="2016-02-22T13:26:48Z" level=info msg="Execing [/usr/bin/nsenter --mount=/proc/2522/ns/mnt -F -- /var/lib/docker/aufs/mnt/a6ba41d189c1d3adbf3c8cbdb347ada8f0786910277c3184785a21ccc937441e/var/lib/rancher/convoy-agent/share-mnt --stage2 /var/lib/rancher/convoy/convoy-gluster-fc47dbdb-4a9c-4475-84e1-da035f0ede30 -- /launch volume-agent-glusterfs-internal]"
2/22/2016 2:26:48 PM Waiting for metadata
2/22/2016 2:26:48 PM Registering convoy socket at /var/run/convoy-convoy-gluster.sock
2/22/2016 2:26:48 PM time="2016-02-22T13:26:48Z" level=info msg="Listening for health checks on 0.0.0.0:10241/healthcheck"
2/22/2016 2:26:48 PM time="2016-02-22T13:26:48Z" level=info msg="Got: root /var/lib/rancher/convoy/convoy-gluster-fc47dbdb-4a9c-4475-84e1-da035f0ede30"
2/22/2016 2:26:48 PM time="2016-02-22T13:26:48Z" level=info msg="Got: drivers [glusterfs]"
2/22/2016 2:26:48 PM time="2016-02-22T13:26:48Z" level=info msg="Got: driver-opts [glusterfs.defaultvolumepool=web_vol glusterfs.servers=glusterfs]"
2/22/2016 2:26:48 PM time="2016-02-22T13:26:48Z" level=info msg="Launching convoy with args: [--socket=/host/var/run/convoy-convoy-gluster.sock daemon --root=/var/lib/rancher/convoy/convoy-gluster-fc47dbdb-4a9c-4475-84e1-da035f0ede30 --drivers=glusterfs --driver-opts=glusterfs.defaultvolumepool=web_vol --driver-opts=glusterfs.servers=glusterfs]"
2/22/2016 2:26:48 PM time="2016-02-22T13:26:48Z" level=debug msg="Creating config at /var/lib/rancher/convoy/convoy-gluster-fc47dbdb-4a9c-4475-84e1-da035f0ede30" pkg=daemon
2/22/2016 2:26:48 PM time="2016-02-22T13:26:48Z" level=debug msg= driver=glusterfs driver_opts=map[glusterfs.servers:glusterfs glusterfs.defaultvolumepool:web_vol] event=init pkg=daemon reason=prepare root="/var/lib/rancher/convoy/convoy-gluster-fc47dbdb-4a9c-4475-84e1-da035f0ede30"
2/22/2016 2:26:48 PM time="2016-02-22T13:26:48Z" level=debug msg="Volume web_vol is being mounted it to /var/lib/rancher/convoy/convoy-gluster-fc47dbdb-4a9c-4475-84e1-da035f0ede30/glusterfs/mounts/web_vol, with option [-t glusterfs]" pkg=util
2/22/2016 2:26:49 PM time="2016-02-22T13:26:49Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: connection refused"
2/22/2016 2:26:50 PM time="2016-02-22T13:26:50Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: connection refused"
2/22/2016 2:26:51 PM time="2016-02-22T13:26:51Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: connection refused"
2/22/2016 2:26:52 PM time="2016-02-22T13:26:52Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: connection refused"
2/22/2016 2:26:53 PM time="2016-02-22T13:26:53Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: connection refused"
2/22/2016 2:26:54 PM time="2016-02-22T13:26:54Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: connection refused"
2/22/2016 2:26:55 PM time="2016-02-22T13:26:55Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: connection refused"
2/22/2016 2:26:56 PM time="2016-02-22T13:26:56Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: connection refused"
2/22/2016 2:26:57 PM time="2016-02-22T13:26:57Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: connection refused"
2/22/2016 2:26:58 PM time="2016-02-22T13:26:58Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: connection refused"
2/22/2016 2:26:59 PM time="2016-02-22T13:26:59Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: connection refused"
2/22/2016 2:27:00 PM time="2016-02-22T13:27:00Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: connection refused"
2/22/2016 2:27:01 PM time="2016-02-22T13:27:01Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: connection refused"
2/22/2016 2:27:02 PM time="2016-02-22T13:27:02Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: connection refused"
2/22/2016 2:27:03 PM time="2016-02-22T13:27:03Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: connection refused"
2/22/2016 2:27:04 PM time="2016-02-22T13:27:04Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: connection refused"
2/22/2016 2:27:04 PM time="2016-02-22T13:27:04Z" level=debug msg="Cleaning up environment..." pkg=daemon
2/22/2016 2:27:04 PM time="2016-02-22T13:27:04Z" level=error msg="Failed to execute: mount [-t glusterfs glusterfs:/web_vol /var/lib/rancher/convoy/convoy-gluster-fc47dbdb-4a9c-4475-84e1-da035f0ede30/glusterfs/mounts/web_vol], output Mount failed. Please check the log file for more details.\n, error exit status 1"
2/22/2016 2:27:04 PM {
2/22/2016 2:27:04 PM     "Error": "Failed to execute: mount [-t glusterfs glusterfs:/web_vol /var/lib/rancher/convoy/convoy-gluster-fc47dbdb-4a9c-4475-84e1-da035f0ede30/glusterfs/mounts/web_vol], output Mount failed. Please check the log file for more details.\n, error exit status 1"
2/22/2016 2:27:04 PM }
2/22/2016 2:27:04 PM time="2016-02-22T13:27:04Z" level=info msg="convoy exited with error: exit status 1"
2/22/2016 2:27:04 PM time="2016-02-22T13:27:04Z" level=info msg=Exiting.
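
For reference, the failing step can be narrowed down by hand from the affected host. A hedged sketch, assuming the glusterfs client tools are installed on the host and using the volume and server names from the log above:

gluster --remote-host=glusterfs peer status          # is a gluster daemon reachable under the service name at all?
gluster --remote-host=glusterfs volume info web_vol  # does the volume exist on the gluster side?
mkdir -p /mnt/test && mount -t glusterfs glusterfs:/web_vol /mnt/test   # the same mount convoy attempts, run manually

If the volume info call already fails, the mount error in the log is just a symptom of the gluster pool never having come up.
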
@deniseschannon

We recently replaced the existing version of GlusterFS, which may have caused this issue.

When did you launch your original GlusterFS and your new one? Also, when did you launch convoy-gluster?

@joostliketoast
Author

I just tried it again: I created a new environment, set up GlusterFS and convoy-gluster, added some volumes, and then deleted the GlusterFS stack and then the convoy-gluster stack.

Then I created a new GlusterFS stack, which seems to end up running all right after restarting itself a couple of times, but convoy-gluster keeps saying:

2/25/2016 9:17:31 PM Waiting for metadata.
2/25/2016 9:17:31 PM time="2016-02-25T20:17:31Z" level=info msg="Execing [/usr/bin/nsenter --mount=/proc/2197/ns/mnt -F -- /var/lib/docker/aufs/mnt/08c60e5598cc2d655b86472b0fea779c0f6d0f4e3c1fe67d0fda2f49685a3510/var/lib/rancher/convoy-agent/share-mnt --stage2 /var/lib/rancher/convoy/convoy-gluster-7709e40d-6754-4e97-ade2-741f7051a2ad -- /launch volume-agent-glusterfs-internal]"
2/25/2016 9:17:31 PM Waiting for metadata
2/25/2016 9:17:31 PM Registering convoy socket at /var/run/convoy-convoy-gluster.sock
2/25/2016 9:17:31 PM time="2016-02-25T20:17:31Z" level=info msg="Listening for health checks on 0.0.0.0:10241/healthcheck"
2/25/2016 9:17:31 PM time="2016-02-25T20:17:31Z" level=info msg="Got: driver-opts [glusterfs.defaultvolumepool=web_storage glusterfs.servers=glusterfs]"
2/25/2016 9:17:31 PM time="2016-02-25T20:17:31Z" level=info msg="Got: root /var/lib/rancher/convoy/convoy-gluster-7709e40d-6754-4e97-ade2-741f7051a2ad"
2/25/2016 9:17:31 PM time="2016-02-25T20:17:31Z" level=info msg="Got: drivers [glusterfs]"
2/25/2016 9:17:31 PM time="2016-02-25T20:17:31Z" level=info msg="Launching convoy with args: [--socket=/host/var/run/convoy-convoy-gluster.sock daemon --driver-opts=glusterfs.defaultvolumepool=web_storage --driver-opts=glusterfs.servers=glusterfs --root=/var/lib/rancher/convoy/convoy-gluster-7709e40d-6754-4e97-ade2-741f7051a2ad --drivers=glusterfs]"
2/25/2016 9:17:31 PM time="2016-02-25T20:17:31Z" level=debug msg="Creating config at /var/lib/rancher/convoy/convoy-gluster-7709e40d-6754-4e97-ade2-741f7051a2ad" pkg=daemon
2/25/2016 9:17:31 PM time="2016-02-25T20:17:31Z" level=debug msg= driver=glusterfs driver_opts=map[glusterfs.defaultvolumepool:web_storage glusterfs.servers:glusterfs] event=init pkg=daemon reason=prepare root="/var/lib/rancher/convoy/convoy-gluster-7709e40d-6754-4e97-ade2-741f7051a2ad"
2/25/2016 9:17:31 PM time="2016-02-25T20:17:31Z" level=debug msg="Volume web_storage is being mounted it to /var/lib/rancher/convoy/convoy-gluster-7709e40d-6754-4e97-ade2-741f7051a2ad/glusterfs/mounts/web_storage, with option [-t glusterfs]" pkg=util
2/25/2016 9:17:31 PM time="2016-02-25T20:17:31Z" level=debug msg="Cleaning up environment..." pkg=daemon
2/25/2016 9:17:31 PM time="2016-02-25T20:17:31Z" level=error msg="Failed to execute: mount [-t glusterfs glusterfs:/web_storage /var/lib/rancher/convoy/convoy-gluster-7709e40d-6754-4e97-ade2-741f7051a2ad/glusterfs/mounts/web_storage], output Mount failed. Please check the log file for more details.\n, error exit status 1"
2/25/2016 9:17:31 PM {
2/25/2016 9:17:31 PM     "Error": "Failed to execute: mount [-t glusterfs glusterfs:/web_storage /var/lib/rancher/convoy/convoy-gluster-7709e40d-6754-4e97-ade2-741f7051a2ad/glusterfs/mounts/web_storage], output Mount failed. Please check the log file for more details.\n, error exit status 1"
2/25/2016 9:17:31 PM }
2/25/2016 9:17:31 PM time="2016-02-25T20:17:31Z" level=info msg="convoy exited with error: exit status 1"
2/25/2016 9:17:31 PM time="2016-02-25T20:17:31Z" level=info msg=Exiting.

Edit:

This is the output from the convoy-gluster storage pool container:


2/25/2016 9:05:13 PM Waiting for metadata.
2/25/2016 9:05:13 PM time="2016-02-25T20:05:13Z" level=info msg="Listening for health checks on 0.0.0.0:10241/healthcheck"
2/25/2016 9:05:13 PM time="2016-02-25T20:05:13Z" level=info msg="Socket file: /host/var/run/convoy-convoy-gluster.sock"
2/25/2016 9:05:13 PM time="2016-02-25T20:05:13Z" level=info msg="Initializing event router" workerCount=10
2/25/2016 9:05:13 PM time="2016-02-25T20:05:13Z" level=info msg="Connection established"
2/25/2016 9:05:18 PM time="2016-02-25T20:05:18Z" level=debug msg="storagepool event [096eefea-1df7-4ce9-a2a2-6d219cd2a5e3 7e61b68b-7a83-4e67-9a27-572e08eac5b0 65c5cbce-71c6-4749-ba88-8860269f62af]"

And from the GlusterFS server:


2/25/2016 8:44:13 PM Waiting for all service containers to start...
2/25/2016 8:48:19 PM Containers are starting...
2/25/2016 8:48:19 PM Waiting for Gluster Daemons to come up
2/25/2016 8:49:02 PM gluster peer probe 10.42.195.55
2/25/2016 8:49:02 PM peer probe: success. Host 10.42.195.55 port 24007 already in peer list
2/25/2016 8:49:03 PM gluster peer probe 10.42.163.174
2/25/2016 8:49:03 PM peer probe: failed: Probe returned with Transport endpoint is not connected
2/25/2016 8:50:26 PM Waiting for all service containers to start...
2/25/2016 8:50:33 PM Containers are starting...
2/25/2016 8:50:33 PM Waiting for Gluster Daemons to come up
2/25/2016 8:51:06 PM gluster peer probe 10.42.195.55
2/25/2016 8:51:06 PM peer probe: success. Host 10.42.195.55 port 24007 already in peer list
2/25/2016 8:51:07 PM gluster peer probe 10.42.163.174
2/25/2016 8:51:07 PM peer probe: success.

Is any more information useful?

@tfiduccia

Rancher version - master
Glusterfs template version - 3.7.5-rancher1
Convoy glusterfs template version - 2.0

I followed the steps above. With the newest build and the latest glusterfs and convoy glusterfs templates, I was not able to reproduce. I did notice that if you don't wait a bit before recreating glusterfs, there are problems; even after the delete, it takes a few minutes for everything to clear out. Please wait for the next build and try again; this should be resolved. If it is not, please reopen and let me know.
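
Before recreating, it can also help to verify on each host that the old state has actually cleared out. A hedged sketch, using paths that appear elsewhere in this thread:

ls /var/lib/rancher/convoy/     # leftover convoy-gluster-<uuid> roots from the old stack
ls /etc/docker/plugins/         # stale convoy .spec plugin files, if any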

@joostliketoast
Author

I think my issue might be related to #2903, seeing as I got the same problem with the gluster convoy.

@deniseschannon

@joostliketoast Yes, if you deployed when we first launched these templates, you might be having these issues. The newer templates address these re-deployment scenarios.

@joostliketoast
Author

@deniseschannon I assumed I was using the latest templates when selecting from the catalog; is there any other way to force updating them? The issue is still here after recreating the two stacks.

@joostliketoast
Author

Also, my GlusterFS stack seems to have a problem getting the server containers up in time:

3/3/2016 12:34:07 PM Waiting for Gluster Daemons to come up
3/3/2016 12:34:37 PM pool list: failed

It only waits 30 seconds, and sometimes it takes longer than that to get a container up. This causes the stack to go into a reboot loop while trying to get all three running.
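
A more tolerant wait loop would retry for several minutes instead of giving up after 30 seconds. A hypothetical sketch, not the actual template script:

# Hypothetical: retry 'gluster pool list' for up to 5 minutes
for i in $(seq 1 30); do
    gluster pool list > /dev/null 2>&1 && break
    echo "Waiting for Gluster Daemons to come up"
    sleep 10
done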

@soundman666

@deniseschannon, @joostliketoast, I have the same problem.

@joostliketoast
Author

From the agent.log, in case it might be useful:

2016-03-03 12:20:05,411 INFO agent [139819748437360] [utils.py:430] Response: {"name": "reply.8410449701303523903", "transitioningProgress": null, "resourceType": null, "resourceId": null, "id": "21a593c3-1bac-4b90-ad17-e8d4b2ad3b29", "transitioningMessage": "Update failed", "time": 1457007605000, "previousNames": ["delegate.request"], "transitioning": "yes", "data": {"name": "config.update.reply", "transitioningProgress": null, "resourceType": "agent", "resourceId": "461", "id": "0588464c-7505-4b39-a598-49ed42dff6b8", "transitioningMessage": "Update failed", "time": 1457007605000, "previousNames": ["config.update"], "transitioning": "yes", "data": {"output": "Lock failed", "exitCode": 122}, "previousIds": ["34b1bbb3-92d9-45bb-8ca8-8a107f7cd81c"]}, "previousIds": ["a0aa1e95-b9f6-4bc6-94ac-0f0a9cdc5d5a"]} [0.00612998008728] seconds

@deniseschannon deniseschannon reopened this Mar 9, 2016
@deniseschannon

@cloudnautique Can you take a look?

@cloudnautique
Contributor

Out of curiosity, what size hosts are these? Are they located within a relatively close network?

@joostliketoast
Author

I think it's related to issue #3750; the screenshot there also has the specs of the hosts.

@pboos

pboos commented Mar 17, 2016

We are experiencing the same problem. It seems like GlusterFS is not starting up correctly (though it shows up as green). convoy-gluster fails at startup with the message further below.

We used the older versions of glusterfs before. We removed them completely and switched to the new version. Maybe that triggers the problem?

We are willing to try out anything you suggest and report back.

Info on the hosts: they are within the same data center (two of them even in the same rack), and they are 4-core machines with 64 GB RAM. Pretty powerful machines. The network is really fast as well; ping between the servers through the VPN is < 1 ms.

On glusterfs_glusterfs-server_1 through _3 we see the following (they show up green).

3/17/2016 1:33:00 PM Waiting for all service containers to start...
3/17/2016 1:33:01 PM Containers are starting...
3/17/2016 1:33:01 PM Waiting for Gluster Daemons to come up
3/17/2016 1:38:48 PM Waiting for all service containers to start...
3/17/2016 1:38:49 PM Containers are starting...
3/17/2016 1:38:49 PM Waiting for Gluster Daemons to come up
3/17/2016 1:55:36 PM gluster peer probe 10.42.162.226
3/17/2016 1:55:36 PM Connection failed. Please check if gluster daemon is operational.
3/17/2016 1:56:37 PM Waiting for all service containers to start...
3/17/2016 1:56:38 PM Containers are starting...
3/17/2016 1:56:38 PM Waiting for Gluster Daemons to come up

Error on the convoy-gluster container convoy-gluster_convoy-gluster_1:

3/17/2016 1:56:28 PM Waiting for metadata.
3/17/2016 1:56:28 PM time="2016-03-17T12:56:28Z" level=info msg="Execing [/usr/bin/nsenter --mount=/proc/645/ns/mnt -F -- /var/lib/docker/aufs/mnt/d08f1b25cb1d7d119db93d942329f25599f5981b2e6ed65c5b4b7b27f48e424a/var/lib/rancher/convoy-agent/share-mnt --stage2 /var/lib/rancher/convoy/convoy-gluster-75e26d85-7e46-402b-a0ed-ce357900bc54 -- /launch volume-agent-glusterfs-internal]"
3/17/2016 1:56:28 PM Waiting for metadata
3/17/2016 1:56:28 PM Registering convoy socket at /var/run/convoy-convoy-gluster.sock
3/17/2016 1:56:28 PM time="2016-03-17T12:56:28Z" level=info msg="Listening for health checks on 0.0.0.0:10241/healthcheck"
3/17/2016 1:56:28 PM time="2016-03-17T12:56:28Z" level=info msg="Got: root /var/lib/rancher/convoy/convoy-gluster-75e26d85-7e46-402b-a0ed-ce357900bc54"
3/17/2016 1:56:28 PM time="2016-03-17T12:56:28Z" level=info msg="Got: drivers [glusterfs]"
3/17/2016 1:56:28 PM time="2016-03-17T12:56:28Z" level=info msg="Got: driver-opts [glusterfs.defaultvolumepool=integral_vol glusterfs.servers=glusterfs]"
3/17/2016 1:56:28 PM time="2016-03-17T12:56:28Z" level=info msg="Launching convoy with args: [--socket=/host/var/run/convoy-convoy-gluster.sock daemon --root=/var/lib/rancher/convoy/convoy-gluster-75e26d85-7e46-402b-a0ed-ce357900bc54 --drivers=glusterfs --driver-opts=glusterfs.defaultvolumepool=integral_vol --driver-opts=glusterfs.servers=glusterfs]"
3/17/2016 1:56:28 PM time="2016-03-17T12:56:28Z" level=debug msg="Creating config at /var/lib/rancher/convoy/convoy-gluster-75e26d85-7e46-402b-a0ed-ce357900bc54" pkg=daemon
3/17/2016 1:56:28 PM time="2016-03-17T12:56:28Z" level=debug msg= driver=glusterfs driver_opts=map[glusterfs.defaultvolumepool:integral_vol glusterfs.servers:glusterfs] event=init pkg=daemon reason=prepare root="/var/lib/rancher/convoy/convoy-gluster-75e26d85-7e46-402b-a0ed-ce357900bc54"
3/17/2016 1:56:28 PM time="2016-03-17T12:56:28Z" level=debug msg="Volume integral_vol is being mounted it to /var/lib/rancher/convoy/convoy-gluster-75e26d85-7e46-402b-a0ed-ce357900bc54/glusterfs/mounts/integral_vol, with option [-t glusterfs]" pkg=util
3/17/2016 1:56:29 PM time="2016-03-17T12:56:29Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: connection refused"
3/17/2016 1:56:30 PM time="2016-03-17T12:56:30Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: connection refused"
3/17/2016 1:56:31 PM time="2016-03-17T12:56:31Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: connection refused"
3/17/2016 1:56:32 PM time="2016-03-17T12:56:32Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: connection refused"
3/17/2016 1:56:33 PM time="2016-03-17T12:56:33Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: connection refused"
3/17/2016 1:56:34 PM time="2016-03-17T12:56:34Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: connection refused"
3/17/2016 1:56:35 PM time="2016-03-17T12:56:35Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: connection refused"
3/17/2016 1:56:36 PM time="2016-03-17T12:56:36Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: connection refused"
3/17/2016 1:56:37 PM time="2016-03-17T12:56:37Z" level=debug msg="Cleaning up environment..." pkg=daemon
3/17/2016 1:56:37 PM time="2016-03-17T12:56:37Z" level=error msg="Failed to execute: mount [-t glusterfs glusterfs:/integral_vol /var/lib/rancher/convoy/convoy-gluster-75e26d85-7e46-402b-a0ed-ce357900bc54/glusterfs/mounts/integral_vol], output Mount failed. Please check the log file for more details.\n, error exit status 1"
3/17/2016 1:56:37 PM {
3/17/2016 1:56:37 PM     "Error": "Failed to execute: mount [-t glusterfs glusterfs:/integral_vol /var/lib/rancher/convoy/convoy-gluster-75e26d85-7e46-402b-a0ed-ce357900bc54/glusterfs/mounts/integral_vol], output Mount failed. Please check the log file for more details.\n, error exit status 1"
3/17/2016 1:56:37 PM }
3/17/2016 1:56:37 PM time="2016-03-17T12:56:37Z" level=info msg="convoy exited with error: exit status 1"
3/17/2016 1:56:37 PM time="2016-03-17T12:56:37Z" level=info msg=Exiting.

And some screenshots here

@deniseschannon deniseschannon added the kind/bug Issues that are defects reported by users or that we know have reached a real release label Mar 17, 2016
@cloudnautique
Contributor

What were the logs on glusterfs_glusterfs-server_glusterfs-volume-create_1?

@pboos

pboos commented Mar 17, 2016

Log of: glusterfs_glusterfs-server_glusterfs-volume-create_1

3/17/2016 5:21:35 PM Waiting for all containers to come up...
3/17/2016 5:22:14 PM Containers are coming up...

Delete + recreate of glusterfs

glusterfs_glusterfs-server_glusterfs-volume-create_1-3

We now deleted and recreated glusterfs. The log on create_1 is now:

3/17/2016 5:20:20 PM Waiting for all containers to come up...
3/17/2016 5:22:13 PM Containers are coming up...
3/17/2016 5:22:13 PM Waiting for pool...
3/17/2016 5:22:18 PM Waiting for pool...
3/17/2016 5:22:23 PM Waiting for pool...
3/17/2016 5:22:28 PM Waiting for pool...
3/17/2016 5:22:33 PM Waiting for pool...
3/17/2016 5:22:38 PM Waiting for pool...
3/17/2016 5:22:43 PM Waiting for pool...
3/17/2016 5:22:48 PM Waiting for pool...
3/17/2016 5:22:53 PM Waiting for pool...
3/17/2016 5:22:58 PM Waiting for peerprobes and gluster daemons to come on line
3/17/2016 5:23:33 PM Getting peer mount points...
3/17/2016 5:23:33 PM Volume integral_vol does not exist
3/17/2016 5:23:33 PM Creating volume integral_vol...
3/17/2016 5:23:33 PM volume create: integral_vol: success: please start the volume to access data
3/17/2016 5:23:38 PM Starting volume integral_vol...
3/17/2016 5:23:38 PM volume start: integral_vol: success

But create_2 and create_3 have been stuck at Containers are coming up... for 30 minutes now.

It seems like create_2 and create_3 never even reach Waiting for pool..., as if something did not get started correctly.

glusterfs_glusterfs-server_1-3

There is also a difference between glusterfs_glusterfs-server_1, _2, and _3. server_1 and server_2 seem to start up okay:

3/17/2016 5:19:58 PM Waiting for all service containers to start...
3/17/2016 5:22:12 PM Containers are starting...
3/17/2016 5:22:12 PM Waiting for Gluster Daemons to come up
3/17/2016 5:22:56 PM gluster peer probe 10.42.57.8
3/17/2016 5:22:56 PM peer probe: success.
3/17/2016 5:22:57 PM gluster peer probe 10.42.179.194
3/17/2016 5:22:57 PM peer probe: success.

but glusterfs_glusterfs-server_3:

3/17/2016 5:22:30 PM Waiting for all service containers to start...
3/17/2016 5:22:31 PM Containers are starting...
3/17/2016 5:22:31 PM Waiting for Gluster Daemons to come up

The log has looked like this for over 20 minutes already.

Anything else we can provide you with? Other log files somewhere in the containers, or whatever.

@cjellick

I looked into @deniseschannon's setup where she had this problem.

The volume never got created by GlusterFS. The create-volume container had this output:

3/17/2016 9:10:52 AM Waiting for all containers to come up...
3/17/2016 9:13:02 AM Containers are coming up...
3/17/2016 9:13:02 AM Waiting for pool...
3/17/2016 9:13:07 AM Waiting for pool...
3/17/2016 9:13:12 AM Waiting for pool...
3/17/2016 9:13:17 AM Waiting for pool...
3/17/2016 9:13:22 AM Waiting for pool...
3/17/2016 9:13:27 AM Waiting for pool...
3/17/2016 9:13:32 AM Waiting for pool...
3/17/2016 9:13:37 AM Waiting for pool...
3/17/2016 9:13:42 AM pool list: failed
3/17/2016 9:13:42 AM Waiting for pool...

and was stopped. I started that container back up; it created the gluster volume, and convoy-gluster started up properly.

I'm sure there is some fix that will work around this problem, but I would also suggest that we do #3356, so that the GlusterFS stack appears unhealthy if it fails to create the gluster volume.
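
For anyone stuck in the same state, the manual workaround amounts to restarting the stopped create-volume container and watching it finish. A hedged sketch, using the container name quoted earlier in this thread:

docker start glusterfs_glusterfs-server_glusterfs-volume-create_1
docker logs -f glusterfs_glusterfs-server_glusterfs-volume-create_1   # wait for 'volume start: ... success'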

@cjellick

Is the issue that when this happened (3/17/2016 9:13:42 AM pool list: failed), the create-volume container prematurely quit and did not restart?

@mholttech

I tried the steps @cjellick mentioned on my own last week with no success. I'll update everything to the latest codebase and try this again tomorrow.

@cloudnautique
Contributor

@pboos, the create_volume containers do a leader election. Only the first one, under normal circumstances, will create the volume. The others just exit, and we show them as having completed their task in the UI. They only ever run once.
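
A minimal sketch of this kind of leader election, assuming it keys off the container's create_index in Rancher's metadata service (the actual template script may differ):

# Only the first-created container goes on to create the volume; the rest just exit.
INDEX=$(curl -s http://rancher-metadata/latest/self/container/create_index)
if [ "$INDEX" != "1" ]; then
    exit 0
fi
# Leader continues: wait for the pool, then create and start the volume.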

@cloudnautique
Contributor

@joostliketoast, we pushed a new template up today. It has a revamped initialization process that we feel is more stable. Please give it a try.

@sangeethah
Contributor

With the latest gluster-fs version (3.7.9) and rancher-server version (v1.0.0-rc1):

  1. Create GlusterFS stack.
  2. Create convoy-gluster stack.
  3. Launch a service with volumes using volume-driver convoy-gluster.
  4. Delete the service.
  5. Remove and purge the volume.

The purged volume continues to be listed in the docker volume ls output. I also see this error when listing docker volumes: "list convoy-gluster: invalid character 'H' looking for beginning of value".

root@sangeemyrc1-10acre-2:/home/sangeethahariharan1# docker volume ls
list convoy-gluster: invalid character 'H' looking for beginning of value
DRIVER              VOLUME NAME
local               f9f5998d80e40031c19274069db0245a2eaabd2eccb3c066e9ec9afbd2f84a55
local               a25506dbda733a505fea83a2b26f4757f8b8aa85d7ec1f5f793fa90a2f257c66
local               fed7bf62e63f0e2755e822128618c27217e9b817b6b9f4167fd7794de9256b3a
local               1d80100d5a14bb6fc42eb4ab0ccee24270a0d0e1e6815e045a9ba02d068d497d
local               cd5b18f15269b4046b9ea7dda1dcc44d3430bfae2cadcf1407ae7c28ceb4d829
local               ee0e649506b3cc03e5bb5f00a109eb196ff74be90e6c64472c341f7572c3cf7e
convoy-gluster      test1
root@sangeemyrc1-10acre-2:/home/sangeethahariharan1# 

rancher/convoy-agent v0.3.0 is being used by the convoy-gluster instances.

This issue is tracked in #3671
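
A hedged cleanup sketch for the stale entry; it may well fail with the same driver error until #3671 is addressed:

docker volume rm test1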

@sangeethah
Contributor

Will need to retest this scenario once #3671 gets addressed

@sangeethah
Contributor

If I remove the existing gluster and convoy stacks and recreate the GlusterFS and convoy-gluster stacks, the GlusterFS stack comes up fine, but the instances in convoy-gluster are not able to start successfully and are in "stopped" state.

Following are the container logs of the convoy-gluster_convoy-gluster_1 instance:

3/23/2016 1:54:40 PM time="2016-03-23T20:54:40Z" level=info msg="Execing [/usr/bin/nsenter --mount=/proc/8715/ns/mnt -F -- /var/lib/docker/aufs/mnt/2521eb6c2cc4a543a06384696c2f38c481a3bf9b4606ea94ee40d0923860386d/var/lib/rancher/convoy-agent/share-mnt --stage2 /var/lib/rancher/convoy/convoy-gluster-265a2c39-f391-48ef-898e-6bc6bd6162f5 -- /launch volume-agent-glusterfs-internal]"
3/23/2016 1:54:40 PM Waiting for metadata
3/23/2016 1:54:40 PM Registering convoy socket at /var/run/convoy-convoy-gluster.sock
3/23/2016 1:54:40 PM time="2016-03-23T20:54:40Z" level=info msg="Listening for health checks on 0.0.0.0:10241/healthcheck"
3/23/2016 1:54:40 PM time="2016-03-23T20:54:40Z" level=info msg="Got: drivers [glusterfs]"
3/23/2016 1:54:40 PM time="2016-03-23T20:54:40Z" level=info msg="Got: driver-opts [glusterfs.defaultvolumepool=my_vol glusterfs.servers=glusterfs]"
3/23/2016 1:54:40 PM time="2016-03-23T20:54:40Z" level=info msg="Got: root /var/lib/rancher/convoy/convoy-gluster-265a2c39-f391-48ef-898e-6bc6bd6162f5"
3/23/2016 1:54:40 PM time="2016-03-23T20:54:40Z" level=info msg="Launching convoy with args: [--socket=/host/var/run/convoy-convoy-gluster.sock daemon --drivers=glusterfs --driver-opts=glusterfs.defaultvolumepool=my_vol --driver-opts=glusterfs.servers=glusterfs --root=/var/lib/rancher/convoy/convoy-gluster-265a2c39-f391-48ef-898e-6bc6bd6162f5]"
3/23/2016 1:54:40 PM time="2016-03-23T20:54:40Z" level=debug msg="Creating config at /var/lib/rancher/convoy/convoy-gluster-265a2c39-f391-48ef-898e-6bc6bd6162f5" pkg=daemon
3/23/2016 1:54:40 PM time="2016-03-23T20:54:40Z" level=debug msg= driver=glusterfs driver_opts=map[glusterfs.defaultvolumepool:my_vol glusterfs.servers:glusterfs] event=init pkg=daemon reason=prepare root="/var/lib/rancher/convoy/convoy-gluster-265a2c39-f391-48ef-898e-6bc6bd6162f5"
3/23/2016 1:54:40 PM time="2016-03-23T20:54:40Z" level=debug msg="Volume my_vol is being mounted it to /var/lib/rancher/convoy/convoy-gluster-265a2c39-f391-48ef-898e-6bc6bd6162f5/glusterfs/mounts/my_vol, with option [-t glusterfs]" pkg=util
3/23/2016 1:54:40 PM time="2016-03-23T20:54:40Z" level=debug msg="Cleaning up environment..." pkg=daemon
3/23/2016 1:54:40 PM time="2016-03-23T20:54:40Z" level=error msg="Failed to execute: mount [-t glusterfs glusterfs:/my_vol /var/lib/rancher/convoy/convoy-gluster-265a2c39-f391-48ef-898e-6bc6bd6162f5/glusterfs/mounts/my_vol], output Mount failed. Please check the log file for more details.\n, error exit status 1"
3/23/2016 1:54:40 PM {
3/23/2016 1:54:40 PM     "Error": "Failed to execute: mount [-t glusterfs glusterfs:/my_vol /var/lib/rancher/convoy/convoy-gluster-265a2c39-f391-48ef-898e-6bc6bd6162f5/glusterfs/mounts/my_vol], output Mount failed. Please check the log file for more details.\n, error exit status 1"
3/23/2016 1:54:40 PM }
3/23/2016 1:54:40 PM time="2016-03-23T20:54:40Z" level=info msg="convoy exited with error: exit status 1"
3/23/2016 1:54:40 PM time="2016-03-23T20:54:40Z" level=info msg=Exiting.

@will-chan will-chan modified the milestones: Release 1.1, Release 1.0 Mar 23, 2016
@iangcarroll

Also running into this issue fairly frequently (same log as @sangeethah).

@guruvan

guruvan commented Apr 10, 2016

Well, I'm trying to track this down as well; I've clearly got GlusterFS running correctly.

  • started with preexisting environment, having attempted this previously, and having marked entries in the storage_pool and storage_pool_host_map tables purged or removed - storage pool appeared to be removed.
  • created convoy-gluster stack by grabbing .yml files from catalog + upload slightly modded via UI
  • storagepool service is stuck initializing - appears to not pass healthcheck(?)
  • Infrastructure/Storagepools shows all the hosts, but not volume
  • convoy-agent instances (global) would not remain in Running state, continued to stop/start
  • deleted the convoy-gluster stack
  • rebuilt the convoy-gluster stack
  • convoy-agent runs now on one host successfully
    • this host happens to be the rancher-server with a rancher-agent running on it
    • all other services appear to have correctly working intercontainer networking (we have several services in full production now)
    • the remaining hosts have no evidence of the convoy-convoy-gluster.sock that should be present
    • rm /etc/docker/plugins/*.spec && reboot host appears to have no effect

Convoy isn't creating the socket on these hosts. (but did on the one host?)

@guruvan

guruvan commented Apr 10, 2016

UPDATE: semi-working state

  • Removed and replaced the convoy-gluster stack, setting up the storagepool service instance to run on the host I had working. This finally finished initializing on this host (this host had produced a proper convoy-gluster socket).
  • Rebooted a glusterfs server that appears to be underpowered and was lagging
  • At this time (after a few rancher-initiated container restarts) the convoy-gluster containers ceased reporting errors, and appeared to be in the working state. Test data expected to be on the glusterfs mount was present on the host as expected in /var/lib/rancher/convoy/convoy-gluster-126a2d27-845f-44b3-96bd-6b303cf8f985/glusterfs/mounts/my_vol
  • One host refused to create a socket and run the convoy-gluster container - I deactivated this host
  • I spun up a fresh host (on AWS) and added to rancher via Custom host option
  • convoy-gluster deployed to this host and fired right up (looks like it restarted 1 or 2 times first, and then correctly launched)

Infrastructure/Storagepools shows "convoy-glusterfs", shows all hosts as green, and no volumes.
Adding a volume here results in a "Requested" state for the requested volume name (it never progresses).

Testing docker, I'm able to mount the glusterfs volume:
docker run -it --rm --volume my_vol:/data --volume-driver=convoy-gluster guruvan/bash touch /data/anewfile
This file is correctly replicated and available across hosts; however, it appears in a subdirectory of the original volume named after the volume:
/var/lib/rancher/convoy/convoy-gluster-126a2d27-845f-44b3-96bd-6b303cf8f985/glusterfs/mounts/my_vol/my_vol
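
The double nesting can be confirmed directly on the host, using the convoy root from the path above:

ls /var/lib/rancher/convoy/convoy-gluster-126a2d27-845f-44b3-96bd-6b303cf8f985/glusterfs/mounts/my_vol/
# -> my_vol/   (the volume contents live one level deeper than expected)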

I am as yet unable to get Rancher to launch a container with the same parameters.

From the rancher-server logs:
2016-04-10 22:14:05,542 ERROR [be5bb2bf-6580-4684-a8d1-2b2da3ada6fe:118955] [instance:1952] [instance.start->(InstanceStart)->instance.allocate] [] [torService-1715] [c.p.e.p.i.DefaultProcessInstanceImpl] Unknown exception io.cattle.platform.eventing.exception.EventExecutionException: Scheduling failed: volume [3027] have exactly these pool(s): [13]

Conclusion: a Rancher networking issue appears to be at the root of the original problem. The host above, which is now deactivated, also seems otherwise unable to participate in the Rancher-managed overlay network (no Rancher-managed containers on this host can ping any other containers on any other hosts, AFAICT so far).

  • this host has networking services as standalone containers that may interfere

Convoy appears to be working, while it appears likely that the Rancher DB is a mess at this point.

YMMV ;)

@cjellick

cjellick commented Oct 3, 2016

GlusterFS: I don't think this needs to be in 1.2.0.

@deniseschannon

Please note that we have removed GlusterFS and Convoy Gluster from the catalog. Users were expecting a robust tool as an alternative persistent storage for Docker volumes, but due to the lack of active maintenance, we cannot recommend this solution going forward.

Instead, we recommend and certify Convoy NFS, which is actively maintained by Rancher. As a user, you can get GlusterFS support directly from Red Hat and use it in Rancher using Rancher's NFS plugin.

Due to these changes regarding GlusterFS and Convoy Gluster, we will not be addressing this bug for 1.2.0.

@deniseschannon

We aren't able to actively help maintain GlusterFS in the catalog and will not be able to fix these issues.
