
Debugging using with-ssh for GitHub Actions

Eli Uriegas edited this page Sep 9, 2021 · 23 revisions


Context

with-ssh is a feature of our GitHub Actions CI that was created to replicate the feature set of CircleCI’s “Re-run with SSH”, which much of the development team is used to.

Platform availability:

Comparisons to CircleCI’s Re-run with SSH

Similarities

  • A 2-hour time limit after the job has finished, after which SSH sessions are closed and the runner returns to the regular runner pool
  • Public SSH keys are pulled from GitHub using https://github.com/${github.actor}.keys
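GitHub serves every user's public SSH keys as plain text, one per line, at `https://github.com/<username>.keys`. The following is a minimal Python sketch of what a key-installation step could do with that URL. It assumes the keys are appended to the runner user's `authorized_keys` file; the function names and that destination are illustrative assumptions, not the actual `add-github-ssh-key` implementation.

```python
from __future__ import annotations

import urllib.request
from pathlib import Path


def keys_url(actor: str) -> str:
    """URL where GitHub serves the public SSH keys of user `actor`."""
    return f"https://github.com/{actor}.keys"


def parse_keys(body: str) -> list[str]:
    """Split the plain-text response (one key per line), dropping blank lines."""
    return [line.strip() for line in body.splitlines() if line.strip()]


def fetch_keys(actor: str, timeout: float = 10.0) -> list[str]:
    """Download and parse `actor`'s public keys (needs network access)."""
    with urllib.request.urlopen(keys_url(actor), timeout=timeout) as resp:
        return parse_keys(resp.read().decode("utf-8"))


def install_keys(keys: list[str], authorized_keys: Path) -> None:
    """Hypothetical install step: append keys to an authorized_keys file."""
    authorized_keys.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
    with authorized_keys.open("a", encoding="utf-8") as f:
        for key in keys:
            f.write(key + "\n")
    # sshd (with StrictModes) refuses group- or world-writable key files
    authorized_keys.chmod(0o600)
```

On a runner this might be called as `install_keys(fetch_keys(os.environ["GITHUB_ACTOR"]), Path.home() / ".ssh" / "authorized_keys")`. `GITHUB_ACTOR` is the environment variable GitHub Actions sets to the same value as `${github.actor}`.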

Differences

  • Only works for FB employees on the FB VPN
  • Requires a label (with-ssh in this case) to be applied to the Pull Request
    • As opposed to clicking Re-run with SSH through the CircleCI UI
    • NOTE: SSH keys will not be added to jobs run before the label is applied, so workflows will need to be re-run after the with-ssh label has been applied
      • This is true even if the job is re-run using GitHub’s “re-run jobs” button
      • A completely new workflow run needs to be triggered after the label is applied
  • Only works for pull_request events; it does not work for push events on the main branch
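The gating rules above (a `pull_request` event whose PR already carries the `with-ssh` label) can be expressed as a small check over the event payload that GitHub Actions writes to the file named by `GITHUB_EVENT_PATH`. This is a hedged Python sketch: the function names are hypothetical, and the real workflows may implement the check differently (for example, in a workflow `if:` expression).

```python
from __future__ import annotations

import json
import os

SSH_LABEL = "with-ssh"


def pr_labels(event: dict) -> set[str]:
    """Label names attached to the PR in a pull_request event payload."""
    pr = event.get("pull_request") or {}
    return {label["name"] for label in pr.get("labels", [])}


def ssh_enabled(event_name: str, event: dict) -> bool:
    """True only for pull_request events whose PR has the with-ssh label.

    Assumption: the payload reflects the labels present when the run was
    triggered, which matches the note that re-runs do not pick up a label
    applied afterwards.
    """
    return event_name == "pull_request" and SSH_LABEL in pr_labels(event)


def ssh_enabled_from_env() -> bool:
    """Read the event name and payload GitHub Actions exposes to each job."""
    with open(os.environ["GITHUB_EVENT_PATH"], encoding="utf-8") as f:
        event = json.load(f)
    return ssh_enabled(os.environ["GITHUB_EVENT_NAME"], event)
```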

Known limitations

  • Only works for FB employees on FB VPN
    • Support for outside collaborators is not currently planned

Workflow for users

  1. Label your PR with the with-ssh label
  2. Push a new commit to trigger a new workflow run (per the note above, re-running jobs through the GitHub UI will not add SSH keys)
  3. Navigate to the logs of a build or test job that runs the add-github-ssh-key step (currently all of our Linux workflows have this step enabled)
  4. Use the SSH command provided in the job's logs to log into the node
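The first step can also be done without the web UI. GitHub's REST API adds labels to a pull request through the issues endpoint (`POST /repos/{owner}/{repo}/issues/{issue_number}/labels`), since every PR is also an issue. This minimal Python sketch only builds the request. The function name is hypothetical, and actually sending it assumes a token with permission to label PRs in the repository.

```python
from __future__ import annotations

import json
import urllib.request

API = "https://api.github.com"


def add_label_request(owner: str, repo: str, pr_number: int, token: str,
                      label: str = "with-ssh") -> urllib.request.Request:
    """Build (but do not send) a request that adds `label` to a pull request."""
    url = f"{API}/repos/{owner}/{repo}/issues/{pr_number}/labels"
    body = json.dumps({"labels": [label]}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Passing the result to `urllib.request.urlopen(...)` sends it. A new workflow run still has to be triggered afterwards for the keys to be added.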

Notes for users

General

  • The default timeout for these jobs is 2 hours after workflows have completed
    • Users will be disconnected once workflows have either timed out or been cancelled
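Because sessions stay open for a fixed window after completion, the cutoff is just the completion time plus two hours. This small Python sketch assumes an ISO 8601 UTC completion timestamp such as `2021-08-04T18:30:00Z`. The two-hour constant mirrors the default described on this page and may change; the helper names are hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

SSH_GRACE_PERIOD = timedelta(hours=2)  # default from this page; may change


def parse_utc(ts: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp like '2021-08-04T18:30:00Z'."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def session_deadline(completed_at: str) -> datetime:
    """Time after which SSH sessions are closed and the runner is recycled."""
    return parse_utc(completed_at) + SSH_GRACE_PERIOD


def minutes_left(completed_at: str, now: datetime | None = None) -> int:
    """Whole minutes remaining before disconnection (0 if already past)."""
    now = now or datetime.now(timezone.utc)
    remaining = session_deadline(completed_at) - now
    return max(0, int(remaining.total_seconds() // 60))
```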

Windows

  • The Windows workspace is currently located at C:\actions-runner\_work\pytorch\pytorch\pytorch-${run_id}
  • To use a different shell on Windows, append the shell you'd like to run to your ssh command, for example:
    • ssh runneruser@ec2-3-238-136-38.compute-1.amazonaws.com -- bash.exe
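Both Windows details above can be derived from the run ID and host name. This hedged Python sketch builds the workspace path from the template documented above (which may change) and an `ssh` argument list with an optional trailing shell, matching the `-- bash.exe` example. The helper names are hypothetical.

```python
from __future__ import annotations

from pathlib import PureWindowsPath

WORKSPACE_ROOT = PureWindowsPath(r"C:\actions-runner\_work\pytorch\pytorch")


def windows_workspace(run_id: int | str) -> PureWindowsPath:
    """Workspace directory for a run, per the layout described on this page."""
    return WORKSPACE_ROOT / f"pytorch-{run_id}"


def ssh_command(host: str, shell: str | None = None,
                user: str = "runneruser") -> list[str]:
    """ssh argv; anything after `--` is run remotely instead of the default shell."""
    cmd = ["ssh", f"{user}@{host}"]
    if shell:
        cmd += ["--", shell]
    return cmd
```

For example, `subprocess.run(ssh_command("<runner host>", "bash.exe"))` opens the same session as the command above, with the host taken from the job's logs.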

Planned features

cc @ezyang @seemethere @malfet @walterddr @lg20987 @pytorch/pytorch-dev-infra
