Skip to content

Extend script execution timeout#15779

Merged
roperzh merged 7 commits intomainfrom
feat-extend-script-timeout
Jan 3, 2024
Merged

Extend script execution timeout#15779
roperzh merged 7 commits intomainfrom
feat-extend-script-timeout

Conversation

@mna
Copy link
Copy Markdown
Member

@mna mna commented Dec 20, 2023

#15196 This is the work of @ghernandez345 except for adding the ResponseController thing in Go to override the server timeout for that specific sync endpoint so that the calls don't timeout waiting for a script response (the default HTTP server timeout was 90s for our server).

Checklist for submitter

If some of the following don't apply, delete the relevant line.

  • Changes file added for user-visible changes in changes/ or orbit/changes/.
    See Changes files for more information.
  • Input data is properly validated, SELECT * is avoided, SQL injection is prevented (using placeholders for values in statements)
  • Added/updated tests
  • Manual QA for all new/changed functionality
    • For Orbit and Fleet Desktop changes:
      • Manual QA must be performed in the three main OSs, macOS, Windows and Linux.
      • Auto-update manual QA, from released version of component to new version (see tools/tuf/test).

@codecov
Copy link
Copy Markdown

codecov Bot commented Dec 20, 2023

Codecov Report

Attention: 8 lines in your changes are missing coverage. Please review.

Comparison is base (f3d400d) 66.04% compared to head (a3c3dcc) 66.03%.
Report is 1 commits behind head on main.

Files Patch % Lines
cmd/fleet/serve.go 0.00% 7 Missing ⚠️
server/fleet/scripts.go 50.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #15779      +/-   ##
==========================================
- Coverage   66.04%   66.03%   -0.01%     
==========================================
  Files        1067     1067              
  Lines       93546    93554       +8     
  Branches     2337     2337              
==========================================
+ Hits        61781    61782       +1     
- Misses      27141    27148       +7     
  Partials     4624     4624              
Flag Coverage Δ
backend 67.22% <33.33%> (-0.01%) ⬇️
frontend 52.04% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

expectErrMsg: fleet.RunScriptHostTimeoutErrMsg,
},
// TODO: this would take 5 minutes to run, we don't want that kind of slowdown in our test suite
// but can be useful to have around for manual testing.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we'd have a simple way to reduce the timeout duration, but for now this is only possible within the server/service package and the response with shorter timeouts is well tested in this package (in the integration tests), so I just commented-it out.

Comment thread cmd/fleet/serve.go
}
}
apiHandler.ServeHTTP(rw, req)
})
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is ugly, but the issue is in the promhttp third-party package, they wrap the raw http.ResponseWriter but don't provide the Unwrap method required to unwrap it back to the original value, and this is required for ResponseController to work properly.

In the meantime and pressed by time I went with this ugly approach, but I'm sure the prometheus folks would be open to integrate the change as this is the right way to do it since Go 1.20, and then we could have a cleaner approach and do this timeout extension where the route is defined, e.g. with a ue.WithHTTPMiddleware(...).POST("/...") (which I attempted at first before noticing the prometheus thing).

gillespi314
gillespi314 previously approved these changes Dec 20, 2023
Copy link
Copy Markdown
Contributor

@gillespi314 gillespi314 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Happy to review again if there are further changes.

Comment thread cmd/fleetctl/scripts_test.go
@roperzh roperzh marked this pull request as ready for review January 3, 2024 19:37
@roperzh roperzh requested review from a team as code owners January 3, 2024 19:37
Copy link
Copy Markdown
Contributor

@roperzh roperzh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pulled and tested and works perfectly

@roperzh
Copy link
Copy Markdown
Contributor

roperzh commented Jan 3, 2024

@georgekarrv @sabrinabuckets per @lukeheath's request I'm including this in the release.

@roperzh roperzh merged commit d943fbb into main Jan 3, 2024
@roperzh roperzh deleted the feat-extend-script-timeout branch January 3, 2024 19:39
roperzh pushed a commit that referenced this pull request Jan 4, 2024
for #15196. The main problem was
that we have two timeouts:

1. The timeout used by the host to kill the script execution
2. The timeout used by the server to wait for the script results

Before the changes in #15779, the
server timeout was longer than the host timeout, but we inadvertently
set both values to 5 minutes, which breaks the logic we have to handle
both kinds of timeouts.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants