v1.5.6 -> v1.6.10 doesn't upgrade RGW #9134
Comments
I am guessing the upgrade didn't happen for RGW due to
@thotz Also, sorry for the changes to the instance count... That was me trying to check if there was an error. Maybe this could help as a timeline.
In the operator log I see that you must have restarted the operator and updated the rgw settings to start 3 daemons, then scaled back to 0, and back to 1 rgw daemon. Did you see the number of rgw daemons change? Or were the rgw pods never created or updated? I would certainly expect the rgw daemons to be updated. What does the image on the rgw deployment(s) show? The previous ceph image?
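Something like the following should show it; a sketch assuming the default `rook-ceph` namespace and the standard `app=rook-ceph-rgw` label:

```sh
# Print each rgw deployment with the ceph image it is currently running;
# namespace and label selector assume a default Rook install.
kubectl -n rook-ceph get deployment -l app=rook-ceph-rgw \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.template.spec.containers[0].image}{"\n"}{end}'
```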
Yes @travisn, I restarted the operator. With the instance count changes there weren't any new pods, and even when scaled down to 0, the same pods kept running.
Ok, I found a bug in the code where errors are being swallowed when creating/updating the rgw pod. You must be hitting some error, and then the rgw update is ignored. The only place that looks like it should cause this error is in this method, if the TLS secret with the certificate is not found. Does your CephObjectStore CR have TLS enabled, or does it perhaps still need to be configured?
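For reference, a minimal sketch of where TLS is configured on the CR; the store name and namespace here are placeholders:

```yaml
# Minimal CephObjectStore sketch; "my-store" is a placeholder name.
apiVersion: ceph.rook.io/v1
kind: CephObjectStore
metadata:
  name: my-store
  namespace: rook-ceph
spec:
  gateway:
    port: 80
    # TLS is enabled by setting securePort and sslCertificateRef; if the
    # referenced secret is missing, the rgw reconcile can fail here.
    # securePort: 443
    # sslCertificateRef: my-store-tls-secret
    instances: 1
```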
I don't remember, and today I don't have access to the environment (it's an on-prem env), but I actually believe TLS isn't needed, since it's just for workload usage.
There is no need for TLS? Ok, anyway we will get this fix into the next release and see if we can get a real error message out of the log after that.
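Once that release is out, tailing the operator log and filtering for the object store should surface the real error; assuming a default install in the `rook-ceph` namespace:

```sh
# Follow the operator log and filter for object-store/rgw messages.
kubectl -n rook-ceph logs deploy/rook-ceph-operator -f | grep -iE 'rgw|object'
```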
Fine, but there is another question @travisn... For some reason, after the update to
That looks likely. If the value is
Yes... but I didn't set that value... it was an automatic update after the operator upgrade.
Is this a bug report or feature request?
Deviation from expected behavior:
After following the procedure to upgrade the operator, all daemons move to the v1.6.10 labels except rgw.
Same for a Ceph version upgrade (v15.2.8-0 -> v15.2.13-0); see the version check below.
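This is how the lagging daemon shows up, per the upgrade guide's verification step; assuming the cluster namespace and the `rook_cluster` label are both `rook-ceph`:

```sh
# List each daemon deployment with its rook-version and ceph-version
# labels; only the rgw deployment stays on the old versions.
kubectl -n rook-ceph get deployments -l rook_cluster=rook-ceph \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\trook-version="}{.metadata.labels.rook-version}{"\tceph-version="}{.metadata.labels.ceph-version}{"\n"}{end}'
```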
Also seeing HEALTH_WARN because of insecure global_id reclaim for clients and mons.
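For reference, the usual remediation for that warning, run from the Rook toolbox, but only once every client and daemon is on a patched release (v15.2.11 or later):

```sh
# Clears the insecure global_id reclaim warning once all clients are patched.
ceph config set mon auth_allow_insecure_global_id_reclaim false
```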
Expected behavior:
RGW upgrades along with the other daemons, and HEALTH_OK, so that a full upgrade to v1.7.x can be completed.
How to reproduce it (minimal and precise):
Follow the upgrade process from https://rook.io/docs/rook/v1.6/ceph-upgrade.html (including CSI upgrades).
Disconnected environment, so all container images are being pulled from a local mirror.
File(s) to submit:
cluster.yaml
operator.log
Environment:
Kernel (`uname -a`): Linux bnc000mon48.openshift.bncrcp.inst.bncr.fi.cr 4.18.0-193.47.1.el8_2.x86_64 #1 SMP Thu Mar 4 03:03:32 EST 2021 x86_64 x86_64 x86_64 GNU/Linux
Cloud provider or hardware configuration: vSphere
Kubernetes version (`kubectl version`): OpenShift
Storage backend status (`ceph health` in the Rook Ceph toolbox):