Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make NewDeployment wait timeout configurable #1260

Merged
merged 12 commits into from Aug 16, 2019

Conversation

@vadasambar
Copy link
Contributor

commented Aug 5, 2019

details

  • add "deploytimeout" flag to specify timeout in seconds

Fix for #1213


This change is Reviewable

Make NewDeployment wait timeout configurable
details
-----------
- add "deploytimeout" flag to specify timeout in seconds
return nil, errors.New("failed to create deployment within timeout window")

// this error appears in the executor pod logs
timeoutError := fmt.Sprintf("failed to create deployment within timeout window of %d seconds", waitTimeOut)

This comment has been minimized.

Copy link
@bhavin192

bhavin192 Aug 6, 2019

Contributor

You can use fmt.Errorf here.

This comment has been minimized.

Copy link
@vadasambar

vadasambar Aug 6, 2019

Author Contributor

Makes sense. I have made the change. Thank you :)

Use Errorf instead of errors.New
details
----------
- Errorf reads better in this case instead of errors.New
@life1347
Copy link
Member

left a comment

Reviewable status: 0 of 4 files reviewed, 5 unresolved discussions (waiting on @bhavin192 and @vadasambar)


pkg/apis/fission.io/v1/typefields.go, line 174 at r2 (raw file):

		MaxScale         int
		TargetCPUPercent int
		Timeout          int

The term Timeout is too general for many cases and may confuse with function response timeout.
It's better to describe the purpose of the timeout at field name.
How about renaming it to SpecializationTimeout?


pkg/executor/newdeploy/newdeploy.go, line 419 at r2 (raw file):

func (deploy *NewDeploy) waitForDeploy(depl *v1beta1.Deployment, replicas int32, waitTimeOut int) (*v1beta1.Deployment, error) {
	for i := 0; i < waitTimeOut; i++ {

We should be careful every time we add a field since it may break the compatibility of existing CRDs.
All existing ExecutionStrategy doesn't contain the timeout setting and the default value of int in Go is 0 , so the value will be zero for all existing function setting and break user service once they upgrade the fission.
Please check the timeout to ensure that it's >=0. If not then use the default one.


pkg/fission-cli/function.go, line 154 at r2 (raw file):

		if c.IsSet("deploytimeout") {
			timeout = c.Int("deploytimeout")
			if timeout < 0 {

Consider that kubernetes takes time for creating, pulling images, scheduling pod to nodes and time for fetcher to specialize pod, it will never be zero.
Since this flag is for the functions that take a long specialization and people normally don't need to set this except for the case described in the issue, I suggest for now we should set the minimum specialization timeout to 120 (the original one).


pkg/fission-cli/main.go, line 97 at r2 (raw file):

	maxScale := cli.IntFlag{Name: "maxscale", Usage: "Maximum number of pods (Uses resource inputs to configure HPA)"}
	targetcpu := cli.IntFlag{Name: "targetcpu", Usage: "Target average CPU usage percentage across pods for scaling"}
	deployTimeoutFlag := cli.IntFlag{Name: "deploytimeout", Usage: "Timeout for newdeploy function creation"}

Since we may need to implement the same thing to poolmanager as well, so if the flag name is something like specializationtimeout, st then we don't need to add another flag for poolmanager in future.

Add test cases for deploytimeout
other
--------
- add default newdeploy timeout for rest of the test cases
@vadasambar
Copy link
Contributor Author

left a comment

Reviewable status: 0 of 5 files reviewed, 5 unresolved discussions (waiting on @bhavin192 and @life1347)


pkg/apis/fission.io/v1/typefields.go, line 174 at r2 (raw file):

SpecializationTimeout

This name makes more sense. Let me rename them :)


pkg/executor/newdeploy/newdeploy.go, line 419 at r2 (raw file):

Previously, life1347 (Ta-Ching Chen) wrote…

We should be careful every time we add a field since it may break the compatibility of existing CRDs.
All existing ExecutionStrategy doesn't contain the timeout setting and the default value of int in Go is 0 , so the value will be zero for all existing function setting and break user service once they upgrade the fission.
Please check the timeout to ensure that it's >=0. If not then use the default one.

I have set the default time out to 120 seconds. Please check DEFAULT_DEPLOY_TIMEOUT in pkg/fission-cli/function.go


pkg/fission-cli/function.go, line 154 at r2 (raw file):

Previously, life1347 (Ta-Ching Chen) wrote…

Consider that kubernetes takes time for creating, pulling images, scheduling pod to nodes and time for fetcher to specialize pod, it will never be zero.
Since this flag is for the functions that take a long specialization and people normally don't need to set this except for the case described in the issue, I suggest for now we should set the minimum specialization timeout to 120 (the original one).

This piece of code ensures that the timeout set through cli is a non-negative number. Default specialization timeout has been set to 120 in pkg/fission-cli/function.go (please check DEFAULT_DEPLOY_TIMEOUT)


pkg/fission-cli/main.go, line 97 at r2 (raw file):

Previously, life1347 (Ta-Ching Chen) wrote…

Since we may need to implement the same thing to poolmanager as well, so if the flag name is something like specializationtimeout, st then we don't need to add another flag for poolmanager in future.

Makes sense. Let me change the name to specializationtimeout. Thank you :)

@life1347
Copy link
Member

left a comment

Reviewable status: 0 of 5 files reviewed, 6 unresolved discussions (waiting on @bhavin192, @life1347, and @vadasambar)


pkg/executor/newdeploy/newdeploy.go, line 419 at r2 (raw file):

Previously, vadasambar (Suraj Banakar) wrote…

I have set the default time out to 120 seconds. Please check DEFAULT_DEPLOY_TIMEOUT in pkg/fission-cli/function.go

But it only affects the newly created functions, what about the old existing functions?
They don't have timeout field.


pkg/fission-cli/function.go, line 154 at r2 (raw file):

Previously, vadasambar (Suraj Banakar) wrote…

This piece of code ensures that the timeout set through cli is a non-negative number. Default specialization timeout has been set to 120 in pkg/fission-cli/function.go (please check DEFAULT_DEPLOY_TIMEOUT)

I know you've added DEFAULT_DEPLOY_TIMEOUT, but what if users enter --specializationtimeout 1, isn't it overrides the default one? Then the function will never get up since it doesn't have enough time to do things describe above, right? You have to check whether the input value is a reasonable value and prompt user the correct message.

if timeout < DEFAULT_DEPLOY_TIMEOUT {
    return nil, errors.New("deploytimeout must be greater than or equal to 120")
}

pkg/fission-cli/function.go, line 453 at r3 (raw file):

}

func fnUpdate(c *cli.Context) error {

We should allow users to update timeout setting in fnUpdate as well.

@life1347
Copy link
Member

left a comment

Reviewable status: 0 of 5 files reviewed, 6 unresolved discussions (waiting on @bhavin192, @life1347, and @vadasambar)


pkg/apis/fission.io/v1/typefields.go, line 174 at r2 (raw file):

Previously, vadasambar (Suraj Banakar) wrote…

SpecializationTimeout

This name makes more sense. Let me rename them :)

Please also add the validation to

func (is InvokeStrategy) Validate() error {
to make sure that the timeout setting that the user provided is a reasonable value.

vadasambar added 3 commits Aug 7, 2019
Set specialization timeout of 120 seconds if not present
details
----------
- this is to make sure existing functions have a default specialization timeout
Throw error if specialization timeout is less than 120 seconds
details
-----------
- 120 seconds is the minimum specialization timeout
- make changes to reflect the same in tests
@vadasambar
Copy link
Contributor Author

left a comment

Reviewable status: 0 of 6 files reviewed, 6 unresolved discussions (waiting on @bhavin192 and @life1347)


pkg/apis/fission.io/v1/typefields.go, line 174 at r2 (raw file):

Previously, life1347 (Ta-Ching Chen) wrote…

Please also add the validation to

func (is InvokeStrategy) Validate() error {
to make sure that the timeout setting that the user provided is a reasonable value.

I will add the validation for SpecializationTimeout. Thank you.


pkg/executor/newdeploy/newdeploy.go, line 434 at r1 (raw file):

Previously, vadasambar (Suraj Banakar) wrote…

Makes sense. I have made the change. Thank you :)

Done.


pkg/executor/newdeploy/newdeploy.go, line 419 at r2 (raw file):

Previously, life1347 (Ta-Ching Chen) wrote…

But it only affects the newly created functions, right? What about the old existing functions?
They don't have timeout field.

Makes sense. I will add a default specialization timeout here as well. Something similar to cli flag validation

if specializationTimeout < DEFAULT_SPECIALIZATION_TIMEOUT {
		specializationTimeout = DEFAULT_SPECIALIZATION_TIMEOUT
	}

Thanks for pointing it out! Please let me know if there's any other change.


pkg/fission-cli/function.go, line 154 at r2 (raw file):

Previously, life1347 (Ta-Ching Chen) wrote…

I know you've added DEFAULT_DEPLOY_TIMEOUT, but what if users enter --specializationtimeout 1, isn't it overrides the default one? Then the function will never get up since it doesn't have enough time to do things describe above, right? You have to check whether the input value is a reasonable value and prompt user the correct message.

if timeout < DEFAULT_DEPLOY_TIMEOUT {
    return nil, errors.New("deploytimeout must be greater than or equal to 120")
}

Thank you for pointing it out! I will change the minimum timeout to 120.


pkg/fission-cli/function.go, line 453 at r3 (raw file):

Validate
Makes sense. I will work on this.


pkg/fission-cli/main.go, line 97 at r2 (raw file):

Previously, vadasambar (Suraj Banakar) wrote…

Makes sense. Let me change the name to specializationtimeout. Thank you :)

Done.

@vadasambar

This comment has been minimized.

Copy link
Contributor Author

commented Aug 7, 2019

If there are any more changes, please let me know. Thank you :)

Fix fn update throws error 'specializationtimeout' flag is not specified
other
---------
- do not allow setting 'specializationtimeout' if executortype is not of the type 'newdeploy'
@vadasambar

This comment has been minimized.

Copy link
Contributor Author

commented Aug 8, 2019

Travis Build has failed due to network timeout. Can someone please trigger a rebuild?

Timeout waiting for network availability.

@vishal-biyani vishal-biyani added this to the 1.5 milestone Aug 8, 2019

@life1347
Copy link
Member

left a comment

Reviewable status: 0 of 6 files reviewed, 3 unresolved discussions (waiting on @bhavin192, @life1347, and @vadasambar)


pkg/executor/newdeploy/newdeploy.go, line 419 at r2 (raw file):

Previously, vadasambar (Suraj Banakar) wrote…

Makes sense. I will add a default specialization timeout here as well. Something similar to cli flag validation

if specializationTimeout < DEFAULT_SPECIALIZATION_TIMEOUT {
		specializationTimeout = DEFAULT_SPECIALIZATION_TIMEOUT
	}

Thanks for pointing it out! Please let me know if there's any other change.

I suggest moving the code block in different places

// if no specializationTimeout is set, use default value
if specializationTimeout < DEFAULT_SPECIALIZATION_TIMEOUT {
	specializationTimeout = DEFAULT_SPECIALIZATION_TIMEOUT
}

into this function (waitForDeploy ). So that people don't need to write the same thing again before calling waitForDeploy.

@vadasambar
Copy link
Contributor Author

left a comment

Reviewable status: 0 of 6 files reviewed, 3 unresolved discussions (waiting on @bhavin192 and @life1347)


pkg/executor/newdeploy/newdeploy.go, line 419 at r2 (raw file):

Previously, life1347 (Ta-Ching Chen) wrote…

I suggest moving the code block in different places

// if no specializationTimeout is set, use default value
if specializationTimeout < DEFAULT_SPECIALIZATION_TIMEOUT {
	specializationTimeout = DEFAULT_SPECIALIZATION_TIMEOUT
}

into this function. So that people don't need to write the same thing again before calling waitForDeploy.

Done

@life1347

This comment has been minimized.

Copy link
Member

commented Aug 13, 2019

From the test_spec_merge error log, looks like we have to comment out the validation part and shows warning for now to prevent compatibility issue.

test_specs/test_spec_merge/test_spec_merge.sh.log
========== start test_specs/test_spec_merge/test_spec_merge.sh.log ==========
~/gopath/src/github.com/fission/fission/test/tests/test_specs/test_spec_merge ~/gopath/src/github.com/fission/fission
Warning: You have provided both - container spec and pod spec and while merging the pod spec will take precedence.
Warning: You have provided both - container spec and pod spec and while merging the pod spec will take precedence.
Warning: Poolsize can only be configured when environment version equals to 3, default poolsize 3 will be used for creating environment pool.
Fatal error: Failed to validate specs: 1 error occurred:
	* ExecutionStrategy.SpecializationTimeout: Invalid value: 0: [SpecializationTimeout must be a value equal to or greater than 120]
Cleaning up...
~/gopath/src/github.com/fission/fission
Fatal error: Spec directory specs doesn't exist. Please check directory path or run "fission spec init" to create it.

Here is the diff I've made, you may be able to apply it directly. Changes in diff

  1. Comment out specializationtimeout validation
  2. Shows warning when running fission spec apply without aborting the task.
diff --git a/pkg/apis/fission.io/v1/validation.go b/pkg/apis/fission.io/v1/validation.go
index 085a9ca8..4eddaf6d 100644
--- a/pkg/apis/fission.io/v1/validation.go
+++ b/pkg/apis/fission.io/v1/validation.go
@@ -347,9 +347,10 @@ func (es ExecutionStrategy) Validate() error {
 			result = multierror.Append(result, MakeValidationErr(ErrorInvalidValue, "ExecutionStrategy.TargetCPUPercent", es.TargetCPUPercent, "TargetCPUPercent must be a value between 1 - 100"))
 		}
 
-		if es.SpecializationTimeout < 120 {
-			result = multierror.Append(result, MakeValidationErr(ErrorInvalidValue, "ExecutionStrategy.SpecializationTimeout", es.SpecializationTimeout, "SpecializationTimeout must be a value equal to or greater than 120"))
-		}
+		// TODO Add validation warning
+		//if es.SpecializationTimeout < 120 {
+		//	result = multierror.Append(result, MakeValidationErr(ErrorInvalidValue, "ExecutionStrategy.SpecializationTimeout", es.SpecializationTimeout, "SpecializationTimeout must be a value equal to or greater than 120"))
+		//}
 	}
 
 	return result.ErrorOrNil()
diff --git a/pkg/fission-cli/spec.go b/pkg/fission-cli/spec.go
index ec9ef25a..736aa729 100644
--- a/pkg/fission-cli/spec.go
+++ b/pkg/fission-cli/spec.go
@@ -437,6 +437,10 @@ func (fr *FissionResources) validate(c *cli.Context) error {
 		if _, ok := environments[fmt.Sprintf("%s:%s", f.Spec.Environment.Name, f.Spec.Environment.Namespace)]; !ok {
 			log.Warn(fmt.Sprintf("Environment %s is referenced in function %s but not declared in specs", f.Spec.Environment.Name, f.Metadata.Name))
 		}
+		strategy := f.Spec.InvokeStrategy.ExecutionStrategy
+		if strategy.ExecutorType == fv1.ExecutorTypeNewdeploy && strategy.SpecializationTimeout < DEFAULT_SPECIALIZATION_TIMEOUT {
+			log.Warn(fmt.Sprintf("SpecializationTimeout in function spec.InvokeStrategy.ExecutionStrategy should be a value equal to or greater than %v", DEFAULT_SPECIALIZATION_TIMEOUT))
+		}
 	}
 
 	// (ErrorOrNil returns nil if there were no errors appended.)

After applying the diff, result becomes

Warning: You have provided both - container spec and pod spec and while merging the pod spec will take precedence.
Warning: You have provided both - container spec and pod spec and while merging the pod spec will take precedence.
Warning: Poolsize can only be configured when environment version equals to 3, default poolsize 3 will be used for creating environment pool.
Warning: SpecializationTimeout in function spec.InvokeStrategy.ExecutionStrategy should be a value equal to or greater than 120
Comment out specialization timeout in validations
details/other
-------------------
- this is to prevent test_spec_merge from failing
- add warning if specializationtimeout is lower than default value
@vadasambar

This comment has been minimized.

Copy link
Contributor Author

commented Aug 14, 2019

I could apply the diff without any problems.

@life1347

This comment has been minimized.

Copy link
Member

commented Aug 14, 2019

CI result with kubernetes integration test: https://travis-ci.org/fission/fission/builds/571724467
LGTM, I will merge this PR later today.


if c.IsSet("specializationtimeout") {
if c.String("executortype") != types.ExecutorTypeNewdeploy {
log.Fatal("specializationtimeout must be greater than or equal to 120 seconds")

This comment has been minimized.

Copy link
@life1347

life1347 Aug 14, 2019

Member

Sorry, I suddenly notice that the message here is wrong, please correct it.

This comment has been minimized.

Copy link
@vadasambar

vadasambar Aug 16, 2019

Author Contributor

I should have noticed that. Thank you for pointing it out. I have fixed it.

@@ -94,6 +94,7 @@ func NewCliApp() *cli.App {
minScale := cli.IntFlag{Name: "minscale", Usage: "Minimum number of pods (Uses resource inputs to configure HPA)"}
maxScale := cli.IntFlag{Name: "maxscale", Usage: "Maximum number of pods (Uses resource inputs to configure HPA)"}
targetcpu := cli.IntFlag{Name: "targetcpu", Usage: "Target average CPU usage percentage across pods for scaling"}
specializationTimeoutFlag := cli.IntFlag{Name: "specializationtimeout, st", Usage: "Timeout for newdeploy function creation"}

This comment has been minimized.

Copy link
@life1347

life1347 Aug 14, 2019

Member
Usage: "Timeout for newdeploy to wait for function pod creation"

This comment has been minimized.

Copy link
@vadasambar

vadasambar Aug 16, 2019

Author Contributor

Fixed

@life1347 life1347 merged commit 8091056 into fission:master Aug 16, 2019

6 of 7 checks passed

code-review/reviewable 7 files, 5 discussions left (bhavin192, life1347)
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
security/snyk - environments/jvm/pom.xml (soamvasani) No new issues
Details
security/snyk - environments/nodejs/package.json (soamvasani) No manifest changes detected
security/snyk - environments/python/requirements.txt (soamvasani) No manifest changes detected
security/snyk - environments/ruby/Gemfile.lock (soamvasani) No manifest changes detected
security/snyk - examples/nodejs/package.json (soamvasani) No manifest changes detected
@life1347

This comment has been minimized.

Copy link
Member

commented Aug 16, 2019

Merged! @vadasambar Thanks for the PR! And the changes' detail of each commit is great.

@vadasambar

This comment has been minimized.

Copy link
Contributor Author

commented Aug 19, 2019

Thank you :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.