GlusterFS catalog version won't start anymore #3670
Comments
We recently replaced the existing version of GlusterFS, which may have caused this issue. When did you launch your original GlusterFS and your new one? Also, when did you launch convoy-gluster?
I just tried it again: I created a new environment, set up glusterfs and convoy gluster, added some volumes, then deleted the glusterfs stack and then the convoy gluster stack. After that I created a new glusterfs stack, which seems to get running alright after it restarts itself a couple of times.
Edit: this is the output from the convoy gluster storage pool container:
and this is from the glusterfs server:
Is any more information useful?
Rancher version - master. I followed the steps above. With the newest build and the latest glusterfs and convoy glusterfs templates, I was not able to reproduce. I did notice that if you don't wait a little bit before recreating glusterfs, there are problems; even after the delete it takes a few minutes for everything to clear out. Please wait for the next build and try again, this should be resolved. If it is not, please reopen and let me know.
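As a side note, one quick way to confirm on each host that the old stacks have actually cleared before recreating them (a rough sketch; the name filters are assumptions about how the stack containers are named):

```sh
# Check that no containers from the old glusterfs stack are left on this host.
docker ps -a --filter "name=glusterfs"

# Likewise for the old convoy-gluster stack.
docker ps -a --filter "name=convoy"

# Check for leftover volumes registered by the old convoy-gluster plugin.
docker volume ls
```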
I think my issue might be related to #2903
@joostliketoast Yes, if you deployed when we first launched these templates, you might be having these issues. The newer templates address these re-deployment scenarios.
@deniseschannon I assumed I was using the latest templates when selecting from the catalog; is there another way to force them to update? The issue is still here after recreating the two stacks.
Also, my glusterfs stack seems to have a problem getting the server containers up in time: it only waits 30 seconds, and sometimes it takes longer than that to get a container up.
@deniseschannon, @joostliketoast, I have the same problem
From the agent.log, in case it might be useful:
@cloudnautique Can you take a look?
Out of curiosity, what size hosts are these? Are they located within a relatively close network?
I think it's related to issue #3750
We are experiencing the same problem. It seems like glusterfs is not starting up correctly (though it shows up as green), and convoy-gluster fails terribly at starting up with the message further below. We used the older versions of glusterfs before; we removed them completely and switched to the new version. Maybe that triggers the problem? We are willing to try out anything you suggest and report back. Info on the hosts: they are within the same data center (two of them even in the same rack). The machines are 4-core machines with 64 GB of RAM, so they are pretty powerful. The network is really fast as well; ping between the servers through the VPN is < 1 ms. On the
Error on convoy-gluster container
What were the logs on glusterfs_glusterfs-server_glusterfs-volume-create_1?
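For reference, those logs can be pulled straight from the host the container ran on (assuming the container name shown in the UI matches the docker container name):

```sh
# Dump the log of the volume-create container, including anything written to stderr.
docker logs glusterfs_glusterfs-server_glusterfs-volume-create_1 2>&1 | tail -n 200
```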
Log of:
Delete + recreate of glusterfs
glusterfs_glusterfs-server_glusterfs-volume-create_1-3
We now deleted and recreated glusterfs. The log on create_1 is now:
But create_2 and create_3 are stuck at
It seems like create_2 and create_3 are not even
glusterfs_glusterfs-server_1-3
As well, there is a difference between glusterfs_glusterfs-server_1, glusterfs_glusterfs-server_2, and glusterfs_glusterfs-server_3. server_1 and server_2 seem to start up okay:
but glusterfs_glusterfs-server_3:
The log has looked like this for over 20 minutes already. Is there anything else we can provide you with? Other log files somewhere in the containers, or whatever.
I looked into @deniseschannon's setup where she had this problem. The volume never got created by glusterfs. The create-volume container had this output:
and was stopped. I started that container back up, it created the gluster volume, and convoy-gluster started up properly. I'm sure there is some fix that will work around this problem, but I would also suggest that we do this:
Is the issue that when this happened:
I tried the steps @cjellick mentioned on my own last week with no success. I'll update everything to the latest codebase and try this again tomorrow.
@pboos, the create_volume containers do a leader election. Only the first one, under normal circumstances, will create the volume. The others just exit, and we show them as having completed their task in the UI. They only ever run once.
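To illustrate the intended behavior (this is only a simplified sketch, not the actual template script; the volume name, replica count, and brick paths are placeholders):

```sh
#!/bin/sh
# Each create_volume container effectively behaves like this: if the volume
# already exists (another container won), exit cleanly; otherwise create and start it.
VOLUME_NAME="my_volume"   # placeholder

if gluster volume info "$VOLUME_NAME" >/dev/null 2>&1; then
    echo "volume $VOLUME_NAME already exists; nothing to do"
    exit 0
fi

gluster volume create "$VOLUME_NAME" replica 3 \
    server1:/data/brick server2:/data/brick server3:/data/brick force \
  && gluster volume start "$VOLUME_NAME"
```

The real templates coordinate this through a proper leader election rather than a simple check-then-create, but the end result is the same: one container creates the volume and the rest exit as completed.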
@joostliketoast, we pushed a new template up today. It has a revamped initialization process that we feel is more stable. Please give it a try.
With the latest glusterfs version (3.7.9) and rancher-server version v1.0.0-rc1: create a GlusterFS stack. A purged volume continues to be listed in the docker volume ls output:
rancher/convoy-agent v0.3.0 is being used by the convoy-gluster instances. This issue is tracked in #3671
Will need to retest this scenario once #3671 gets addressed |
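For anyone hitting the stale-volume symptom, a quick way to spot and clean up leftover entries by hand (a minimal sketch; the volume name is a placeholder, and removal assumes nothing still references it):

```sh
# List the volumes docker still knows about, including convoy-gluster ones.
docker volume ls

# Inspect a suspicious entry to see which driver owns it (name is a placeholder).
docker volume inspect my_old_volume

# Remove it once you are sure no container references it.
docker volume rm my_old_volume
```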
If I remove the existing gluster and convoy stacks and recreate the glusterfs and convoy-gluster stacks, the glusterfs stack comes up fine, but the instances in convoy-gluster are not able to start successfully and are in the "stopped" state. Following are the container logs of the convoy-gluster_convoy-gluster_1 instance:
Also running into this issue fairly frequently (same log as @sangeethah).
Well, I'm trying to track this down as well; I've clearly got glusterfs running correctly.
Convoy isn't creating the socket on these hosts. (but did on the one host?) |
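A couple of checks that can help narrow this down (a sketch; the gluster commands are standard, but the convoy socket path is an assumption and may differ between template versions):

```sh
# Verify gluster itself is healthy, run inside a glusterfs-server container.
gluster peer status
gluster volume info

# On the host, docker discovers volume plugins via .sock/.spec files here.
ls -l /run/docker/plugins/ /etc/docker/plugins/ /usr/lib/docker/plugins/ 2>/dev/null

# The convoy-gluster containers usually bind-mount their socket from the host;
# this exact path is an assumption.
ls -l /var/run/conv*/*.sock 2>/dev/null
```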
UPDATE: semi-working state
Infrastructure/Storagepools shows "convoy-glusterfs", shows all hosts as green, and no volumes.
Testing with docker directly, I'm able to mount the glusterfs volume:
I am as yet unable to get Rancher to launch a container with the same parameters. From the rancher-server logs:
Conclusion: a Rancher networking issue appears to be at the root of the original problem. The host above, which is now deactivated, also seems otherwise unable to participate in the Rancher-managed overlay network (no Rancher-managed containers on this host can ping any other containers on any other hosts, AFAICT so far).
Convoy appears to be working, while it appears likely that the Rancher DB is... a mess at this point. YMMV ;)
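For reference, the docker-level mount test described in the comment above might look roughly like this (a sketch; the volume and image names are placeholders, and convoy-gluster is assumed to be the registered driver name):

```sh
# Ask docker to create and mount a volume through the convoy-gluster plugin
# directly, bypassing Rancher. Names are placeholders.
docker run --rm --volume-driver=convoy-gluster -v test_vol:/data busybox \
    sh -c 'echo hello > /data/hello && cat /data/hello'

# The volume should then show up in the plugin's listing.
docker volume ls
```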
GlusterFS. I don't think this needs to be in 1.2.0.
Please note that we have removed GlusterFS and Convoy Gluster from the catalog. Users were expecting a robust tool as an alternative persistent storage for Docker volumes, but due to the lack of active maintenance we cannot recommend this solution going forward. Instead, we recommend and certify Convoy NFS, which is actively maintained by Rancher. As a user, you can get GlusterFS support directly from Red Hat and use it in Rancher through Rancher's NFS plugin. Because of these changes regarding glusterfs and convoy gluster, we will not be addressing this bug for 1.2.0.
We aren't able to actively help maintain GlusterFS in the catalog and will not be able to fix these issues. |
Version:
rancher v0.59.0
cattle v0.148.0
user interface v0.90.0
rancher compose v0.7.2
Steps:
Results:
The glusterfs stack stays in a boot loop
Expected:
A running glusterfs stack
EDIT:
Step 7 (Create convoy gluster)
results in the following error: