Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[151] Runner executor #404

Merged
merged 83 commits into from
Oct 2, 2023
Merged

[151] Runner executor #404

merged 83 commits into from
Oct 2, 2023

Conversation

giventocode
Copy link
Contributor

@giventocode giventocode commented Sep 15, 2023

Closes: #151

In this PR:

  • TES runner performs the execution of commands (executor).
  • Moves the batch script preparation (creation and uploading) logic outside the BatchScheduler.
    • TaskExecutionScriptingManager Manages the creation and uploading of the Batch script and all its dependencies, this is the entry-point class the BatchScheduler uses.

    • BatchNodeScriptBuilder. Fluid builder of the script to be executed by Batch. Used by TaskExecutionScriptingManager.

    • TaskToNodeTaskConverter Handles the creation of a NodeTask from a TesTask. Used by TaskExecutionScriptingManager. Creation of the NodeTask's inputs and outputs has the following key functionality:

      • Handles content inputs by creating them as blobs in storage and setting the NodeTasks' inputs to the blobs' URLs.
      • Handles local file paths in inputs when Cromwell is using the local filesystem with a blob FUSE driver.
    • NodeTaskBuilder Fluid builder of NodeTask instances. Contains the logic that sets the URL transformation strategy for inputs/outputs using the configuration settings. Used by TaskToNodeTaskConverter.

  • Added new properties to the NodeTask model:
    • id and workflowId, uniquely identifies the task and its associated workflow id.
    • Renamed SasTransformationStrategy property in FileOutput to TransformationStrategy
  • Logging entries and formatting changes to facilitate troubleshooting.
  • Added support for local path mapping to URL for Cromwell local directories (running with blob FUSE driver) and configured external storage. accounts
  • I added the RuntimeOptions.NodeManagedIdentityResourceId (workflow ID or global ID as fallback) to the node task and changed the runner to use it ManagedIdentityCredential(new ResourceIdentifier(runtimeOptions.NodeManagedIdentityResourceId));
    If NodeManagedIdentiyResourceId is not set, the runner will use DefaultAzureCredentials() as a fallback, which works when identity is assigned and available.
    The TES server now uploads the TES task to storage account.
  • Implemented retries in the runner:
    • When downloading docker images
    • When CredentialUnavailableException is thrown while getting the auth token to generate SAS tokens.
  • Known issues:

@giventocode giventocode changed the title [WIP] Runner executor [151] Runner executor Sep 25, 2023
@giventocode giventocode marked this pull request as ready for review September 25, 2023 14:39
src/Tes.Runner/Docker/DockerExecutor.cs Outdated Show resolved Hide resolved
src/TesApi.Web/BatchNodeScriptBuilder.cs Outdated Show resolved Hide resolved
src/TesApi.Web/BatchScheduler.cs Outdated Show resolved Hide resolved
src/TesApi.Web/Runner/NodeTaskBuilder.cs Outdated Show resolved Hide resolved
Copy link
Collaborator

@BMurri BMurri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly nitpicks and cleanup. Please open issues for any you don't do now.

@@ -42,7 +42,7 @@ public class RuntimeOptions
{
public TerraRuntimeOptions? Terra { get; set; }

public DockerCleanUpOptions DockerCleanUp { get; set; }
public string? NodeManagedIdentityResourceId { get; set; }
}

public class DockerCleanUpOptions
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is no longer used, please remove.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should have been line 48

{
return retryPolicy.Execute(() => GetTokenCredentialImpl(runtimeOptions));
}
private TokenCredential GetTokenCredentialImpl(RuntimeOptions runtimeOptions)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need a blank line here

@@ -169,8 +154,7 @@ public partial class BatchScheduler : IBatchScheduler
_batchPoolFactory = poolFactory;
batchPrefix = batchSchedulingOptions.Value.Prefix;
logger.LogInformation("BatchPrefix: {BatchPrefix}", batchPrefix);
taskRunScriptContent = File.ReadAllLines(Path.Combine(AppContext.BaseDirectory, "scripts/task-run.sh"));
taskCleanupScriptContent = File.ReadAllLines(Path.Combine(AppContext.BaseDirectory, "scripts/clean-executor.sh"));
File.ReadAllLines(Path.Combine(AppContext.BaseDirectory, "scripts/task-run.sh"));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this isn't storing the contents of the file anywhere, this line can go away. If that is the case, please also delete the two files in the scripts directory.

return await UploadContentAndCreateTesInputAsync(tesTask, input.Path, input.Content, cancellationToken);
}

private static string EscapeBashArgument(string arg)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you dumping the commands from the TesTask into a script in a single string or are you passing them to the Docker API? If the latter, we should NOT be shell escaping them. The Docker API does not turn them into a single string command-line, it passes them to the process's main() function directly.

@giventocode giventocode merged commit cf39c62 into main Oct 2, 2023
7 of 8 checks passed
@giventocode giventocode deleted the ja-nodetask-builder branch October 2, 2023 21:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[#148 Implement TES Task Runner] Execute Task commands via the Runner
3 participants