New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ceph: fix improper json parsing #7354
ceph: fix improper json parsing #7354
Conversation
pkg/operator/ceph/object/admin.go
Outdated
@@ -67,7 +67,7 @@ func RunAdminCommandNoMultisite(c *Context, args ...string) (string, error) { | |||
} | |||
|
|||
// start the rgw admin command | |||
output, err := c.Context.Executor.ExecuteCommandWithTimeout(timeout, command, args...) | |||
output, err := c.Context.Executor.ExecuteCommandWithTimeout(timeout, true, command, args...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can instead call ExecuteCommandWithOutputFile()
to have the output written to a file so the stdout and stderr won't conflict. Does that work for you? Then we wouldn't need the new flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, good idea, I'll fix!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately this function can't be used because radosgw-admin
doesn't have an option to specify the output file. I introduced a new function that returns only stdout and can specify timeout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found that extracting json from combined output seems to be better. Some error handling routine s use whether the output contains some strings, to judge the type of errors. I'll fix my PR later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@satoru-takeuchi How did you repro this issue? I heard another instance of it for an external cluster scenario, but I can't get a repro yet. Anyway, looking forward to the fix, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This issue happens in the CI of my k8s cluster. Then OB is created under heavy load consists of RBD creation and many applications' initialization. However, I don't know the reliable simple reproducer yet.
7287334
to
57a523c
Compare
pkg/operator/ceph/object/admin.go
Outdated
@@ -75,6 +76,17 @@ func RunAdminCommandNoMultisite(c *Context, args ...string) (string, error) { | |||
return output, nil | |||
} | |||
|
|||
func ExtractJson(output string) (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you thought about putting this inside the Executor directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean that embedding this logic in Executor.Execute*() methods? If so, I didn't adopt this way because of two reasons. First, some callers don't parse the output as json. In addition, this behavior seems to be radosgw-admin specific. So I hesitate to put this logic in Executer which is for all commands.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but do the extract json in an executor that ran with json output only. I think it's fine for now and more an RGW bug so I'm fine with that!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we call ExtractJson()
from RunAdminCommandNoMultisite()
? That seems like the right place instead of calling it everywhere else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Year, it's nice idea. The new PR is far simpler than before.
I introduced a flag in runAdminCommand()
to decide whether calling extractJSON()
inside them. It's because some callers checking output before parsing JSON. In this case, that calllers directly call extractJSON()
after checking output.
pkg/operator/ceph/object/admin.go
Outdated
@@ -75,6 +76,17 @@ func RunAdminCommandNoMultisite(c *Context, args ...string) (string, error) { | |||
return output, nil | |||
} | |||
|
|||
func ExtractJson(output string) (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we call ExtractJson()
from RunAdminCommandNoMultisite()
? That seems like the right place instead of calling it everywhere else.
57a523c
to
21dbf97
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just some small suggestions
@@ -230,7 +230,7 @@ func (r *ReconcileObjectZone) createCephZone(zone *cephv1.CephObjectZone, realmN | |||
} | |||
|
|||
// create zone | |||
_, err = object.RunAdminCommandNoMultisite(objContext, "zone", "get", realmArg, zoneGroupArg, zoneArg) | |||
_, err = object.RunAdminCommandNoMultisite(objContext, false, "zone", "get", realmArg, zoneGroupArg, zoneArg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though the result is ignored, how about still passing true
whenever we have a get
.
_, err = object.RunAdminCommandNoMultisite(objContext, false, "zone", "get", realmArg, zoneGroupArg, zoneArg) | |
_, err = object.RunAdminCommandNoMultisite(objContext, true, "zone", "get", realmArg, zoneGroupArg, zoneArg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, fixed.
@@ -308,7 +308,7 @@ func (r *ReconcileObjectZone) reconcileCephZoneGroup(zone *cephv1.CephObjectZone | |||
zoneGroupArg := fmt.Sprintf("--rgw-zonegroup=%s", zone.Spec.ZoneGroup) | |||
objContext := object.NewContext(r.context, r.clusterInfo, zone.Name) | |||
|
|||
_, err := object.RunAdminCommandNoMultisite(objContext, "zonegroup", "get", realmArg, zoneGroupArg) | |||
_, err := object.RunAdminCommandNoMultisite(objContext, false, "zonegroup", "get", realmArg, zoneGroupArg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_, err := object.RunAdminCommandNoMultisite(objContext, false, "zonegroup", "get", realmArg, zoneGroupArg) | |
_, err := object.RunAdminCommandNoMultisite(objContext, true, "zonegroup", "get", realmArg, zoneGroupArg) |
@@ -230,7 +230,7 @@ func (r *ReconcileObjectZoneGroup) createCephZoneGroup(zoneGroup *cephv1.CephObj | |||
} | |||
|
|||
// create zone group | |||
output, err = object.RunAdminCommandNoMultisite(objContext, "zonegroup", "get", realmArg, zoneGroupArg) | |||
output, err = object.RunAdminCommandNoMultisite(objContext, false, "zonegroup", "get", realmArg, zoneGroupArg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
output, err = object.RunAdminCommandNoMultisite(objContext, false, "zonegroup", "get", realmArg, zoneGroupArg) | |
output, err = object.RunAdminCommandNoMultisite(objContext, true, "zonegroup", "get", realmArg, zoneGroupArg) |
@@ -279,7 +279,7 @@ func (r *ReconcileObjectZoneGroup) reconcileCephRealm(zoneGroup *cephv1.CephObje | |||
realmArg := fmt.Sprintf("--rgw-realm=%s", zoneGroup.Spec.Realm) | |||
objContext := object.NewContext(r.context, r.clusterInfo, zoneGroup.Name) | |||
|
|||
_, err := object.RunAdminCommandNoMultisite(objContext, "realm", "get", realmArg) | |||
_, err := object.RunAdminCommandNoMultisite(objContext, false, "realm", "get", realmArg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_, err := object.RunAdminCommandNoMultisite(objContext, false, "realm", "get", realmArg) | |
_, err := object.RunAdminCommandNoMultisite(objContext, true, "realm", "get", realmArg) |
38a8f02
to
de5c40f
Compare
Looks good, there is just a unit test failure now. |
Sometimes `radosgw-admin` succeeds after showing logs to stderr. We should skip non-json strings if parsing output as json. Here is an example. ``` 2021-02-26 04:10:44.190418 I | op-bucket-prov: creating Ceph user "ceph-user-aSzNqgE7" E0226 04:11:37.901310 8 controller.go:199] error syncing 'logging/loki-bucket': error provisioning bucket: Provision: can't create ceph user: error creating ceph user "ceph-user-aSzNqgE7": failed to unmarshal json. 2021-02-26T04:11:21.425+0000 7f6714be4980 1 robust_notify: If at first you don't succeed: (110) Connection timed out 2021-02-26T04:11:21.426+0000 7f6714be4980 0 ERROR: failed to distribute cache for ceph-hdd-object-store.rgw.meta:users.uid:ceph-user-aSzNqgE7 2021-02-26T04:11:32.168+0000 7f6714be4980 1 robust_notify: If at first you don't succeed: (110) Connection timed out 2021-02-26T04:11:32.168+0000 7f6714be4980 0 ERROR: failed to distribute cache for ceph-hdd-object-store.rgw.meta:users.keys:23Z8GUEXR0TJDO86PSJR { "user_id": "ceph-user-aSzNqgE7", "display_name": "ceph-user-aSzNqgE7", ... "mfa_ids": [] }: invalid character '-' after top-level value: failed to unmarshal json. ``` In this case, some logs like "robust_notify:..." was shown in stderr. Unmarsharing was failed due to tried to parse these logs as json. Signed-off-by: Satoru Takeuchi <satoru.takeuchi@gmail.com>
de5c40f
to
a89d35e
Compare
This failure was resolved by returning json from fake |
ceph: fix improper json parsing (bp #7354)
Description of your changes:
Sometimes radosgw-admin succeed after showing error message.
Here is an example.
In this case, some messages like "robust_notify: If at first you don't succeed: (110) Connection timed out"
was shown in stderr before showing standard error that consists of json. Since stdout and stderr
are joined, the caller can't parse json properly.
Checklist:
make codegen
) has been run to update object specifications, if necessary.