[Bug] 0.25.0 maxscale statefulset failing to start #381
I tried to set up an initContainer (to do the chown) inside the MaxScaleSpec. The CRD accepted it, but the operator seems to ignore it. |
Same problem here |
Mhh, strange, same problem, even with podSecurityContext set to:

maxScale:
  podSecurityContext:
    runAsUser: 0 |
But another problem (related to the operator) is that it ignores the initContainer. |
Hey there! Thanks for reporting. As you mentioned, it looks like a problem of the image that can be mitigated with an init container. Adding support for initContainers in the MaxScale CRD, like we currently do for MariaDB, should address this.
Contributions welcome! I will do it myself before the next release if needed 👍🏻 |
Out of curiosity, which storage are you using? |
It works now. I use Longhorn for local storage, but as I see, I need a minimum of 300Mi of storage and to set the security context for it to work (same as needed for Galera with Longhorn). Hope it can help.

maxScale:
  enabled: true
  config:
    volumeClaimTemplate:
      resources:
        requests:
          storage: 300Mi
      storageClassName: longhorn-encrypted-global-noretain
      accessModes:
        - ReadWriteMany
  podSecurityContext:
    runAsUser: 0 # For Galera, caused by Longhorn storage permissions
|
@K-MeLeOn it does, thank you! Actually, some Longhorn users reported the same error with Galera. I'm not a Longhorn user unfortunately; maybe there is something we could do to prevent this? I will take a closer look at the Longhorn docs. |
My favorite Longhorn hack is to add a busybox init container to grant permission to the right user or group:

- name: volume-hack
  image: busybox
  command:
    - /bin/sh
    - -c
    - chown -R USER_ID:GROUP_ID /var/lib/maxscale
  securityContext:
    runAsUser: 0 |
This will be doable after we add support for the initContainers field. |
Unfortunately, runAsUser: 0 did not work for me. I tried it with Hetzner-CSI (RWO) and with NFS-CSI (RWX), same result. |
Do you have a minimum of 300Mi of storage set on your volumeClaimTemplate with Longhorn? I haven't tried it with Hetzner CSI, but its minimum is 10GB, so it should theoretically work. |
Yes, I used your example from above for NFS and 10 Gi for Hetzner. No success. |
Mmmh, after many recreations, I got the error again with RWO; with RWX it's working. This is strange behavior, I need to wait for the initContainers support. |
Very strange indeed. It would be very much appreciated if someone could try the following steps:

1. Scale the operator down so it does not reconcile the StatefulSet:

kubectl scale deployment mariadb-operator --replicas=0

2. Add an init container like this to the MaxScale StatefulSet:

- name: volume-hack
  image: busybox
  command:
    - /bin/sh
    - -c
    - chown -R USER_ID:GROUP_ID /var/lib/maxscale
  securityContext:
    runAsUser: 0

I guess it should work, but just to confirm. |
It works with:

initContainers:
  - name: volume-hack
    image: busybox
    command:
      - /bin/sh
      - -c
      - chown -R 998:996 /var/lib/maxscale
    resources: {}
    volumeMounts:
      - name: storage
        mountPath: /var/lib/maxscale

998:996 is the user and group id of the maxscale user in the image. Applying this works. But as I see, the module directory may need it too, so I also tried:

initContainers:
  - name: volume-hack
    image: busybox
    command:
      - /bin/sh
      - -c
      - chown -R 998:996 /var/lib/maxscale && chown -R 998:996 /usr/lib64/maxscale
    resources: {}
    volumeMounts:
      - name: storage
        mountPath: /var/lib/maxscale |
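The 998:996 pair above is image-specific. A minimal sketch of how one might derive a UID/GID from a passwd entry; the sample line below is hypothetical, on a real cluster you would read /etc/passwd inside the MaxScale container instead:

```shell
# Hypothetical /etc/passwd entry for the maxscale user; on a real image you would
# run e.g. `grep maxscale /etc/passwd` inside the container instead.
passwd_line="maxscale:x:998:996:maxscale:/var/lib/maxscale:/sbin/nologin"

# Fields 3 and 4 of a passwd entry are the UID and GID.
uid=$(echo "$passwd_line" | cut -d: -f3)
gid=$(echo "$passwd_line" | cut -d: -f4)

echo "$uid:$gid"   # the owner to pass to chown -R
```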
Works for me, thanks! My storage is ceph/rbd/ext4. After the init container finished, I started the operator. The MariaDB and MaxScale CRs went Ready, and I see the new config in the maxscale pod's log. Now the problem is I can't log in to the admin console with the password from the adminPasswordSecretKeyRef. I also tried to use maxctrl and got permission denied. There is a /var/lib/maxscale/passwd file with the admin user and a password hash. |
Same problem here again. However, it is not a matter of password. This is mariadb-operator / (maxscale-admin secret password). The Service is exposed via ingress {URL}=My URL (removed for privacy reasons). However, when I log in, I immediately get logged out with this error in the console:
|
Thanks a lot for testing this @K-MeLeOn! Very much appreciated 🙏🏻 I can confidently add the init container then.
Not entirely sure about this, I will ask and add the permissions accordingly. |
Good news @pasztorl! If the resources are in Ready status we are good to go. I will keep this issue open until we add the initContainers support. |
That's a different problem, could you please file another issue for this? The operator uses these credentials to create resources in the MaxScale API, so if you don't see any error log, they should be valid. Be sure to use the password from the Secret:

kubectl get secret maxscale-admin -o jsonpath="{.data.password}" | base64 -d

Also, take into account that, if you are accessing MaxScale via an Ingress controller, the headers might have been modified, which might result in a 401. Please try to access the MaxScale instance directly via a port-forward to understand where the problem is first. |
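The decode pipeline above can be sanity-checked locally with a made-up value; the password below is a placeholder, not a real credential:

```shell
# Encode a placeholder password the way Kubernetes stores Secret values
# (base64 under .data), then decode it as the kubectl pipeline does.
encoded=$(printf 'my-placeholder-password' | base64)
printf '%s' "$encoded" | base64 -d   # prints my-placeholder-password
```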
I'm connecting directly to the admin port, not using ingress (yet). Here is the maxscale log:
So when the maxscale statefulset was created, I stopped the operator, and when the permissions were fixed, I started it again. Is it possible that the operator cannot continue its job and misses something? |
Update: now I dropped the namespace and recreated the MariaDB CR. When the MaxScale statefulset was created, I stopped (kill -STOP) the operator. I fixed the permissions, then ran kill -CONT on the operator process. Related logs from the maxscale pod:
In the operator logs:
|
You got it right. The secret contains the password, not the user itself:
The logs above are expected ☝🏻. |
Oops, I just realized this part can't work, because /usr/lib64/maxscale is part of the container itself! There's already a module directory in there, so the && chown -R 998:996 /usr/lib64/maxscale part is unnecessary. |
We have added support for initContainers in this PR. The operator adds one init container to change the permissions. Closing! 🙏🏻 This will be released in the next version. |
Hi all, could we use fsGroup instead? For CSI/storage providers that do not support fsGroup, we could allow the user to add a volume initContainer. This avoids the need for a chown -R and an additional container for most providers. This is what is used in the Bitnami Helm charts as well, with volume initContainers disabled by default. To allow MaxScale to work, just use:

securityContext:
  fsGroup: 996 |
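As a rough local illustration of what fsGroup does: the kubelet changes the group of the volume to the fsGroup and sets the setgid bit, so files created inside the volume inherit that group. The sketch below uses a scratch directory and the current user's group instead of 996 (996 being the group of the maxscale user in the image, as discussed earlier in this thread):

```shell
# Emulate the kubelet's fsGroup handling on a scratch directory:
# group-own the volume root and set the setgid bit so new files
# inherit the directory's group rather than the creator's primary group.
vol=$(mktemp -d)
chgrp "$(id -g)" "$vol"
chmod g+rwxs "$vol"

touch "$vol/data.cnf"
stat -c '%g' "$vol/data.cnf"   # prints the directory's GID, inherited via setgid
```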
Hey @lwj5! Thanks for your suggestion, I've managed to spin up a MaxScale instance with:

podSecurityContext:
  fsGroup: 996

I am using the Synology CSI driver, which by default has:

apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: csi.san.synology.com
spec:
  attachRequired: true
  fsGroupPolicy: ReadWriteOnceWithFSType
  podInfoOnMount: true
  requiresRepublish: false
  storageCapacity: false
  volumeLifecycleModes:
    - Persistent

For context, the CSI driver specification allows specifying a fsGroupPolicy. Most CSI drivers nowadays support compatible values of fsGroupPolicy by default. Not sure about Longhorn though, see the following issues:

I think it will be sensible to default the fsGroup. I will be moving forward with this change and I will add a troubleshooting section in the docs. Thanks! Let me know what you think. |
Added the takeaways from this discussion to the docs: |
Documentation
Describe the bug
I'm testing the latest operator by creating a MariaDB replication with MaxScale. The maxscale statefulset pods are failing to start because of a permission problem.
Here is the log output from the maxscale pod:
I see that the pod has a volume mounted at this directory; the problem is that it tries to create a directory inside this mountpoint as the maxscale user, which is not allowed by the mountpoint's filesystem permissions.
Here is the statefulset spec for the maxscale that the operator created:
I've also tried image: mariadb/maxscale:23.08.4 with the same result.
MariaDB created with this manifest: