Contributor

@jeanschmidt jeanschmidt commented Oct 17, 2022

Adds the LaunchTemplate `${var.environment}-action-linux-runner-nvidia`, whose main startup script points to the SSM parameter that holds the CloudWatch agent config, `aws_ssm_parameter.cloudwatch_agent_config_runner_linux_nvidia[0].name`. This template is then chosen when the runner name contains `.nvidia.gpu`.
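
For readers skimming the thread, the selection rule described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the helper name `pickLaunchTemplateName`, the `prod` environment, the example runner type names, and the default (non-GPU) template name are all assumptions. The description is ambiguous about whether the name must end with or merely contain `.nvidia.gpu`; the sketch assumes "contains".

```typescript
// Marker taken from the PR description; runner names containing it get the GPU template.
const NVIDIA_GPU_MARKER = '.nvidia.gpu';

// Hypothetical helper: picks a LaunchTemplate name for a given runner name.
// The default (non-GPU) template name is an assumption made for this sketch.
function pickLaunchTemplateName(environment: string, runnerTypeName: string): string {
  const base = `${environment}-action-linux-runner`;
  return runnerTypeName.includes(NVIDIA_GPU_MARKER) ? `${base}-nvidia` : base;
}

// Illustrative runner names; only the `.nvidia.gpu` marker comes from the description.
console.log(pickLaunchTemplateName('prod', 'linux.4xlarge.nvidia.gpu')); // prod-action-linux-runner-nvidia
console.log(pickLaunchTemplateName('prod', 'linux.2xlarge')); // prod-action-linux-runner
```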

@facebook-github-bot facebook-github-bot added the CLA Signed label Oct 17, 2022
@jeanschmidt jeanschmidt changed the title adds new LaunchTemplate for linuxt instances with Nvidia GPUs enablin… adds new LaunchTemplate for linuxt instances with Nvidia GPUs enabling its monitoring Oct 17, 2022
@jeanschmidt jeanschmidt changed the title adds new LaunchTemplate for linuxt instances with Nvidia GPUs enabling its monitoring adds new LaunchTemplate for linux instances with Nvidia GPUs enabling its monitoring Oct 17, 2022
console.debug(`Created SSM Parameters(s): ${createdSSMParams.join(',')}`);
}

function getLaunchTemplateName(runnerParameters: RunnerInputParameters): Array<string | undefined> {
nit: returning an object with two named fields, or a tuple type `[string | undefined, string | undefined]`, would be better suited here.
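
To make the suggestion concrete, here is a rough sketch of both alternatives. It is not the code under review: the trimmed-down `RunnerInputParameters`, the field names `name` and `version`, the `TEMPLATES` table, and the helper names are assumptions chosen for illustration, since the thread does not show what the two returned values are.

```typescript
// Hypothetical, trimmed-down stand-in for the real RunnerInputParameters.
interface RunnerInputParameters {
  runnerTypeName: string;
}

// Option 1: an object with two named fields (the field names are assumptions).
interface LaunchTemplateRef {
  name: string | undefined;
  version: string | undefined;
}

// Option 2: a fixed-length tuple type, as written in the review comment.
type LaunchTemplateTuple = [string | undefined, string | undefined];

// Hypothetical lookup table standing in for the real configuration source.
const TEMPLATES: Record<'default' | 'nvidia', LaunchTemplateRef> = {
  default: { name: 'example-linux', version: '1' },
  nvidia: { name: 'example-linux-nvidia', version: undefined },
};

// Named fields make each returned value self-describing at the call site.
function getLaunchTemplateRef(params: RunnerInputParameters): LaunchTemplateRef {
  const key = params.runnerTypeName.includes('.nvidia.gpu') ? 'nvidia' : 'default';
  return TEMPLATES[key];
}

// The tuple keeps positional destructuring but fixes the length at exactly two.
function getLaunchTemplateTuple(params: RunnerInputParameters): LaunchTemplateTuple {
  const { name, version } = getLaunchTemplateRef(params);
  return [name, version];
}
```

Unlike `Array<string | undefined>`, both shapes let the compiler reject a caller that reads a third element or confuses the two values, e.g. `const { name, version } = getLaunchTemplateRef(params);`.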

Contributor

@izaitsevfb izaitsevfb left a comment

LGTM

I wonder, could there be an easier way to do optional measurements? Or do we want to track the metrics from nvidia runners separately?

@jeanschmidt jeanschmidt merged commit e32b496 into main Nov 2, 2022
@jeanschmidt jeanschmidt deleted the jeanschmidt/monitor_nvidia_gpu branch November 2, 2022 11:13
kit1980 pushed a commit that referenced this pull request Nov 23, 2022
… its monitoring (#890)

Adds the LaunchTemplate `${var.environment}-action-linux-runner-nvidia`,
whose main startup script points to the SSM parameter that holds the
CloudWatch agent config,
`aws_ssm_parameter.cloudwatch_agent_config_runner_linux_nvidia[0].name`.
This template is then chosen when the runner name contains
`.nvidia.gpu`.