Failed mon assert when mds standby replay is assigned with multiple file systems #1027
Note that multiple file systems are still considered an experimental feature in Ceph. Perhaps Rook makes it too easy to go down the path of an experimental feature.
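For context, Ceph requires an explicit opt-in before a second filesystem can be created. A minimal sketch of that opt-in (the filesystem and pool names here are made up for illustration):

```shell
# The first filesystem can be created without any special flag.
ceph fs new myfs myfs-metadata myfs-data

# Creating a second filesystem is refused until the experimental
# multiple-filesystems flag is enabled:
ceph fs flag set enable_multiple true --yes-i-really-mean-it
ceph fs new yourfs yourfs-metadata yourfs-data
```

If Rook sets this flag on the user's behalf when a second filesystem is requested, that would explain why the experimental path is so easy to enter without realizing it.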
Any way to fix this?
This seems like a Ceph issue with the FSMap when MDSs are running in standby-replay.
I opened #1382; I had created a second file system not long before. I'd second the suggestion of limiting the system to one FS until this is fixed. The documentation and tools make it seem like having any number of file systems is fine.
@travisn We need to implement a limit of just one FS, as more and more users are hitting this.
Any way to recover?
@Coolfeather2 I think someone on Slack said that you can potentially remove the second FS by modifying the fsmap directly, but I can't tell you the exact steps.
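The exact fsmap-editing steps were not shared in the thread. As a hedged alternative, assuming the mons are still healthy enough to accept commands, the conventional CLI path for removing an unwanted second filesystem looks roughly like the following (the name `yourfs` and the pool names are examples, not from this cluster):

```shell
# Take the filesystem down and fail its active MDS rank(s)
# (ceph fs rm refuses to remove a filesystem with active MDSs).
ceph fs set yourfs cluster_down true
ceph mds fail yourfs:0

# Remove the filesystem from the FSMap; this does not delete the pools.
ceph fs rm yourfs --yes-i-really-mean-it

# Optionally remove the now-unused pools (destructive; double-check names).
ceph osd pool rm yourfs-metadata yourfs-metadata --yes-i-really-really-mean-it
ceph osd pool rm yourfs-data yourfs-data --yes-i-really-really-mean-it
```

Note this is not the direct fsmap surgery mentioned above; if the mons are already crashing on the assert, they may not be able to service these commands at all.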
@travisn, it looks like Luminous 12.2.7 is out now. Does it have this fix? http://docs.ceph.com/docs/master/releases/luminous/#v12-2-7-luminous
Yes, we can pick up this fix now.
@leseb when should we expect a ceph-container v3.0.7 release that includes Luminous 12.2.7? My previous comment missed that we would require that release first.
@travisn it's here!

```
$ docker run -ti --entrypoint=ceph ceph/daemon:v3.0.7-stable-3.0-luminous-centos-7 -v
ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) luminous (stable)
```
@leseb thanks!
While testing multiple file systems with varying standby and replay settings, two of the mons core dumped with the following assert:
It would appear there is an issue with the standby being assigned by the mon after adding a third file system. The configuration of the file systems in the cluster was:

- myfs: two MDS active, two MDS on standby-replay
- yourfs: three MDS active, three MDS on standby
- jaredsfs: one MDS active, one MDS on standby-replay

After the first two were created, `ceph status` showed the following mds status:

The pod status after the crash is