Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Abort jobs as early as possible #9927

Merged
merged 3 commits into from
May 13, 2024
Merged

Conversation

hubertp
Copy link
Contributor

@hubertp hubertp commented May 12, 2024

Pull Request Description

We don't seem to run abortJobs under a lock, and especially not under the write compilation lock, in other scenarios. This is causing some major slowdown when there is a long running execution or compilation, as currently experienced in the cloud.

This should reduce chances of a timeout.

Also added an option to override the global executor. Currently it would always default to the runtime number of available process which may be suboptimal.

Important Notes

Pending testing on the impact it will have.

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • The documentation has been updated, if necessary.
  • Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
  • All code follows the
    Scala,
    Java,
    TypeScript,
    and
    Rust
    style guides. In case you are using a language not listed above, follow the Rust style guide.
  • Unit tests have been written where possible.

We don't seem to run `abortJobs` under a lock, and especially not under
the write compilation lock, in other scenarios. This is causing some
major slowdown when there is a long running execution or compilation,
as currently experienced in the cloud.

This should reduce chances of a timeout.
@hubertp hubertp added the CI: No changelog needed Do not require a changelog entry for this PR. label May 12, 2024
@@ -130,6 +130,7 @@ class CollaborativeBuffer(
stop(Map.empty)

case IOTimeout =>
logger.warn("Timeout reached when awaiting file's content")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't be this the right moment to dump all stacktraces or turn on profiling?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not there yet, but yes, probably.

@@ -42,12 +42,12 @@ public Future<BoxedUnit> executeAsynchronously(RuntimeContext ctx, ExecutionCont
private void setExecutionEnvironment(
Runtime$Api$ExecutionEnvironment executionEnvironment, UUID contextId, RuntimeContext ctx) {
var logger = ctx.executionService().getLogger();
ctx.jobControlPlane().abortJobs(contextId);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is moving abortJobs call outside of critical section, right? But the PR description says:

We don't seem to run abortJobs under a lock

How's that accurate? We do seem to run abortJobs under a lock and this PR is changing that, am I not right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant in other locations that call abortJobs

@hubertp hubertp added the CI: Ready to merge This PR is eligible for automatic merge label May 13, 2024
@mergify mergify bot merged commit c67218c into develop May 13, 2024
37 checks passed
@mergify mergify bot deleted the wip/hubert/9789-setexecutionenv-slow branch May 13, 2024 07:39
hubertp added a commit that referenced this pull request May 13, 2024
`System.getProperty` does not return `null`, it returns `"null"`
:facepalm:.
I broke the internet, sorry.
@hubertp hubertp mentioned this pull request May 13, 2024
hubertp added a commit that referenced this pull request May 13, 2024
`System.getProperty` does not return `null`, it returns `"null"`
:facepalm:.
I broke the internet, sorry.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: No changelog needed Do not require a changelog entry for this PR. CI: Ready to merge This PR is eligible for automatic merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants