Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support duration chaos for Network Chaos #198

Merged

Conversation

Yisaer
Copy link
Contributor

@Yisaer Yisaer commented Feb 5, 2020

What problem does this PR solve?

#171
Make Network chaos support duration chaos feature.

What is changed and how does it work?

Making each kind of Networkchaos implement common controller interface. I have tested netem and partiton chaos locally in kind and works fine.

Check List

Tests

  • Manual test (add detailed scripts or steps below)

Code changes

  • Has Go code change

Side effects

  • None

Related changes

  • Need to update the documentation

Does this PR introduce a user-facing change?:

Support duration chaos for network chaos by omitting the scheduler and duration.

@codecov-io
Copy link

codecov-io commented Feb 5, 2020

Codecov Report

❗ No coverage uploaded for pull request base (master@3aee6ba). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff            @@
##             master     #198   +/-   ##
=========================================
  Coverage          ?   43.64%           
=========================================
  Files             ?       18           
  Lines             ?      685           
  Branches          ?        0           
=========================================
  Hits              ?      299           
  Misses            ?      353           
  Partials          ?       33

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3aee6ba...3379f38. Read the comment docs.

@@ -80,6 +80,7 @@ func (r *Reconciler) Reconcile(req ctrl.Request) (ctrl.Result, error) {
r.Log.Error(err, "failed to recover chaos")
return ctrl.Result{Requeue: true}, nil
}
return ctrl.Result{}, nil
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bug during handling the recover logic in common chaos controller, I fixed in this request when I found it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need r.Update(ctx, chaos) here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need r.Update(ctx, chaos) here?

good catch. I adjust the common reconciler.

@Yisaer Yisaer changed the title [WIP] Support duration chaos for Network Chaos Support duration chaos for Network Chaos Feb 5, 2020
import (
"context"

"github.com/pingcap/chaos-mesh/controllers/networkchaos/netem"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

group this import?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I execute make groupimports locally and there is no change. Is this something wrong?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is wield 😯

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to be default behavior golang/go#27365 (comment) and golang/go#26846 (comment) you need to put imports together so goimport will treat them as a group and sort within the group. Otherwise it assumes you put them into different group for some reason, e.g. third party, protobuf etc.

r.Log.Error(err, fmt.Sprintf("unable to get podchaos[%s/%s]'s duration", networkchaos.Namespace, networkchaos.Name))
return ctrl.Result{}, nil
}
if scheduler == nil && duration == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if duration != nil

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhouqiang-cl Networkchaos controller only handle the situation when scheduler and duration are both defined or omitted. For the other situation, it would log the error and return the empty response.

More over, I also think Adding validation admission webhook for each chaos kind to check the scheduler and duration situation is necessary as I mentioned in #171

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got

@zhouqiang-cl
Copy link
Contributor

@YangKeao @cwen0 PTAL

@zhouqiang-cl
Copy link
Contributor

@at15 Can you take a look at other part of this PR too

Copy link
Contributor

@at15 at15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhouqiang-cl I added some nitpicking, I am not familiar with the code base, just came across the PR in slack ...

}
}

func (r *Reconciler) defaultResponse(networkchaos *v1alpha1.NetworkChaos) ctrl.Result {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rename it to invalidAction or ignoreInvalidAction because it is logging error for invalid input (that should get caught in admission webhook?) instead of doing default chaos.

// See the License for the specific language governing permissions and
// limitations under the License.

package netem
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to have package level go doc, i.e. https://pkg.go.dev/github.com/pingcap/chaos-mesh does not tell much when you look at it. Also some package name might need some explanation like netem is for ... network event manager? ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice advice, I think it should be recorded as an new issue.

controllers/twophase/networkchaos/netem/types.go Outdated Show resolved Hide resolved
controllers/twophase/networkchaos/netem/types.go Outdated Show resolved Hide resolved

ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/client"

"github.com/pingcap/chaos-mesh/api/v1alpha1"
commonNetem "github.com/pingcap/chaos-mesh/controllers/common/networkchaos/netem"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the common and twwphase wrapping and renaming a bit duplicated. Is it possible to eliminate them? I am not very familiar with the code base ... it seems we have two set of interfaces under common and twophase. And the wrappers are almost same (except for the last parameter)

// under common/types
// InnerCommonReconcile used in common chaos reconcile
type InnerCommonReconcile interface {
	Apply(ctx context.Context, req ctrl.Request, chaos InnerCommonObject) error
	Recover(ctx context.Context, req ctrl.Request, chaos InnerCommonObject) error
	Object() InnerCommonObject
}

// under twophase/types
type InnerReconciler interface {
	Apply(ctx context.Context, req ctrl.Request, chaos InnerObject) error
	Recover(ctx context.Context, req ctrl.Request, chaos InnerObject) error
	Object() InnerObject
}

// under common/podfailure
// CommonReconciler is reconciler for podfailure
type CommonReconciler struct {
	*podfailure.Reconciler
}

// Apply would perform common chaos for podchaos
func (r *CommonReconciler) Apply(ctx context.Context, req ctrl.Request, chaos common.InnerCommonObject) error {
	return r.Perform(ctx, req, chaos)
}

// under twophase/podfailure
// TwoPhaseReconciler reconcile the networkchaos
type TwoPhaseReconciler struct {
	*podfailure.Reconciler
}

// Apply implement InnerReconciler.Apply
func (r *TwoPhaseReconciler) Apply(ctx context.Context, req ctrl.Request, chaos twophase.InnerObject) error {
	return r.Perform(ctx, req, chaos)
}

Instead of having a layout where we duplicate package names under common and twophase

- podfailure
- netem
- common
   - podfailure
   - netem
- twophase
   - podfaillure
   - netem

We can define those wrappers in podfailure and netem directly with new factory func like NewCommonReconciler, to avoid having two extra packages for each chaos and use netem.NewCommonReconciler instead of commonNetem.NewReconciler. This removes subpackage from common and twophase packages and solves the renaming problem.

// podfailure

// NewReconciler would create new reconciler for podfailure chaos
func NewReconciler(c client.Client, log logr.Logger, req ctrl.Request) *Reconciler {
	return &Reconciler{Client: c, Log:  log}
}

func NewCommonReconciler(...) *CommonReconciler {
}

func NewTwoPhaseReconciler(...) *TwoPhaseReconciler {
}

btw: the wrappers should be generated

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deeply agree with you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in #201

controllers/networkchaos/types.go Outdated Show resolved Hide resolved
@zhouqiang-cl
Copy link
Contributor

@at15 thank you very much❤️

@Yisaer
Copy link
Contributor Author

Yisaer commented Feb 6, 2020

Thanks a lot for the code review.
Current layout is a bit of complex and I can't agree it anymore. As @at15 's advice, I would refactor the current layout first to make the whole controller structure easier.

The Duration Networkchaos feature is suspended until #201 is merged.

@zhouqiang-cl zhouqiang-cl added this to In progress in Weekly Plan 2020-02-08 ~ 2020-02-15 via automation Feb 7, 2020
@zhouqiang-cl
Copy link
Contributor

@Yisaer please solve the conflict

@zhouqiang-cl
Copy link
Contributor

@cwen0 @YangKeao PTAL

Comment on lines +68 to +70
} else {
// Start failure action
r.Log.Info("Performing Action")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adjust the common reconciler, make sure the chaos object would be udated after Performing or Recovering.

@Yisaer Yisaer requested a review from mahjonp February 7, 2020 10:56
mahjonp
mahjonp previously approved these changes Feb 7, 2020
Copy link
Contributor

@mahjonp mahjonp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

controllers/common/types.go Show resolved Hide resolved
err = r.Apply(ctx, req, chaos)
if err != nil {
r.Log.Error(err, "failed to apply chaos action")
updateError := retry.RetryOnConflict(retry.DefaultRetry, func() error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If apply action is failed, we should probably not update the chaos object and recover it. @YangKeao what do you think? Maybe we need a separate pr to fix this issue.

Weekly Plan 2020-02-08 ~ 2020-02-15 automation moved this from In progress to Review in progress Feb 7, 2020
cwen0
cwen0 previously approved these changes Feb 7, 2020
Copy link
Member

@cwen0 cwen0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Yisaer Yisaer requested a review from mahjonp February 7, 2020 12:55
@@ -63,33 +63,32 @@ func (r *Reconciler) Reconcile(req ctrl.Request) (ctrl.Result, error) {
err = r.Recover(ctx, req, chaos)
if err != nil {
r.Log.Error(err, "failed to recover chaos")
return ctrl.Result{Requeue: true}, nil
return ctrl.Result{}, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

emm, so we won't retry if the recovery fails? I think it will cause more unrecovered chaos actions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agreed. What's your opinion? @cwen0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And could we set a field like Status, let user be aware of it when observer chaos result?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, requeue is better and if recover action was failed, we also can block this delete action. @mahjonp There is already a status field in chaos object, but we have not used it well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, this pr's handle is ok?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated.

Weekly Plan 2020-02-08 ~ 2020-02-15 automation moved this from Review in progress to Reviewer approved Feb 7, 2020
@cwen0 cwen0 merged commit f380702 into chaos-mesh:master Feb 7, 2020
Weekly Plan 2020-02-08 ~ 2020-02-15 automation moved this from Reviewer approved to Done Feb 7, 2020
sjwsl pushed a commit to sjwsl/chaos-mesh that referenced this pull request May 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

6 participants