New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support duration chaos for Network Chaos #198
Support duration chaos for Network Chaos #198
Conversation
Codecov Report
@@ Coverage Diff @@
## master #198 +/- ##
=========================================
Coverage ? 43.64%
=========================================
Files ? 18
Lines ? 685
Branches ? 0
=========================================
Hits ? 299
Misses ? 353
Partials ? 33 Continue to review full report at Codecov.
|
controllers/common/types.go
Outdated
@@ -80,6 +80,7 @@ func (r *Reconciler) Reconcile(req ctrl.Request) (ctrl.Result, error) { | |||
r.Log.Error(err, "failed to recover chaos") | |||
return ctrl.Result{Requeue: true}, nil | |||
} | |||
return ctrl.Result{}, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bug during handling the recover logic in common chaos
controller, I fixed in this request when I found it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need r.Update(ctx, chaos)
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need
r.Update(ctx, chaos)
here?
good catch. I adjust the common reconciler.
import ( | ||
"context" | ||
|
||
"github.com/pingcap/chaos-mesh/controllers/networkchaos/netem" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
group this import?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I execute make groupimports
locally and there is no change. Is this something wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is wield 😯
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to be default behavior golang/go#27365 (comment) and golang/go#26846 (comment) you need to put imports together so goimport will treat them as a group and sort within the group. Otherwise it assumes you put them into different group for some reason, e.g. third party, protobuf etc.
r.Log.Error(err, fmt.Sprintf("unable to get podchaos[%s/%s]'s duration", networkchaos.Namespace, networkchaos.Name)) | ||
return ctrl.Result{}, nil | ||
} | ||
if scheduler == nil && duration == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if duration != nil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhouqiang-cl Networkchaos
controller only handle the situation when scheduler
and duration
are both defined or omitted. For the other situation, it would log the error and return the empty response.
More over, I also think Adding validation admission webhook for each chaos kind to check the scheduler
and duration
situation is necessary as I mentioned in #171
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got
@at15 Can you take a look at other part of this PR too |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhouqiang-cl I added some nitpicking, I am not familiar with the code base, just came across the PR in slack ...
controllers/networkchaos/types.go
Outdated
} | ||
} | ||
|
||
func (r *Reconciler) defaultResponse(networkchaos *v1alpha1.NetworkChaos) ctrl.Result { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe rename it to invalidAction
or ignoreInvalidAction
because it is logging error for invalid input (that should get caught in admission webhook?) instead of doing default chaos.
// See the License for the specific language governing permissions and | ||
// limitations under the License. | ||
|
||
package netem |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to have package level go doc, i.e. https://pkg.go.dev/github.com/pingcap/chaos-mesh does not tell much when you look at it. Also some package name might need some explanation like netem
is for ... network event manager? ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice advice, I think it should be recorded as an new issue.
controllers/networkchaos/types.go
Outdated
|
||
ctrl "sigs.k8s.io/controller-runtime" | ||
"sigs.k8s.io/controller-runtime/pkg/client" | ||
|
||
"github.com/pingcap/chaos-mesh/api/v1alpha1" | ||
commonNetem "github.com/pingcap/chaos-mesh/controllers/common/networkchaos/netem" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find the common and twwphase wrapping and renaming a bit duplicated. Is it possible to eliminate them? I am not very familiar with the code base ... it seems we have two set of interfaces under common and twophase. And the wrappers are almost same (except for the last parameter)
// under common/types
// InnerCommonReconcile used in common chaos reconcile
type InnerCommonReconcile interface {
Apply(ctx context.Context, req ctrl.Request, chaos InnerCommonObject) error
Recover(ctx context.Context, req ctrl.Request, chaos InnerCommonObject) error
Object() InnerCommonObject
}
// under twophase/types
type InnerReconciler interface {
Apply(ctx context.Context, req ctrl.Request, chaos InnerObject) error
Recover(ctx context.Context, req ctrl.Request, chaos InnerObject) error
Object() InnerObject
}
// under common/podfailure
// CommonReconciler is reconciler for podfailure
type CommonReconciler struct {
*podfailure.Reconciler
}
// Apply would perform common chaos for podchaos
func (r *CommonReconciler) Apply(ctx context.Context, req ctrl.Request, chaos common.InnerCommonObject) error {
return r.Perform(ctx, req, chaos)
}
// under twophase/podfailure
// TwoPhaseReconciler reconcile the networkchaos
type TwoPhaseReconciler struct {
*podfailure.Reconciler
}
// Apply implement InnerReconciler.Apply
func (r *TwoPhaseReconciler) Apply(ctx context.Context, req ctrl.Request, chaos twophase.InnerObject) error {
return r.Perform(ctx, req, chaos)
}
Instead of having a layout where we duplicate package names under common and twophase
- podfailure
- netem
- common
- podfailure
- netem
- twophase
- podfaillure
- netem
We can define those wrappers in podfailure
and netem
directly with new factory func like NewCommonReconciler
, to avoid having two extra packages for each chaos and use netem.NewCommonReconciler
instead of commonNetem.NewReconciler
. This removes subpackage from common
and twophase
packages and solves the renaming problem.
// podfailure
// NewReconciler would create new reconciler for podfailure chaos
func NewReconciler(c client.Client, log logr.Logger, req ctrl.Request) *Reconciler {
return &Reconciler{Client: c, Log: log}
}
func NewCommonReconciler(...) *CommonReconciler {
}
func NewTwoPhaseReconciler(...) *TwoPhaseReconciler {
}
btw: the wrappers should be generated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deeply agree with you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in #201
@at15 thank you very much❤️ |
@Yisaer please solve the conflict |
} else { | ||
// Start failure action | ||
r.Log.Info("Performing Action") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adjust the common reconciler, make sure the chaos object would be udated after Performing or Recovering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
err = r.Apply(ctx, req, chaos) | ||
if err != nil { | ||
r.Log.Error(err, "failed to apply chaos action") | ||
updateError := retry.RetryOnConflict(retry.DefaultRetry, func() error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If apply action
is failed, we should probably not update the chaos object and recover it. @YangKeao what do you think? Maybe we need a separate pr to fix this issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
controllers/common/types.go
Outdated
@@ -63,33 +63,32 @@ func (r *Reconciler) Reconcile(req ctrl.Request) (ctrl.Result, error) { | |||
err = r.Recover(ctx, req, chaos) | |||
if err != nil { | |||
r.Log.Error(err, "failed to recover chaos") | |||
return ctrl.Result{Requeue: true}, nil | |||
return ctrl.Result{}, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
emm, so we won't retry if the recovery fails? I think it will cause more unrecovered chaos actions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agreed. What's your opinion? @cwen0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And could we set a field like Status
, let user be aware of it when observer chaos result?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, requeue
is better and if recover action
was failed, we also can block this delete action. @mahjonp There is already a status field in chaos object, but we have not used it well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so, this pr's handle is ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated.
What problem does this PR solve?
#171
Make
Network
chaos support duration chaos feature.What is changed and how does it work?
Making each kind of
Networkchaos
implement common controller interface. I have testednetem
andpartiton
chaos locally in kind and works fine.Check List
Tests
Code changes
Side effects
Related changes
Does this PR introduce a user-facing change?: