Cannot use automate_training on Compute Canada cluster #496

Open
@jcohenadad

Description

Based on internal discussions, it appears that the script automate_training.py does not run on the Compute Canada cluster. The problem might come from the way this script allocates specific GPUs.

The purpose of this issue is to document the problem, what has been tried, the communication with Compute Canada, and (ultimately) the solution, so that others who run into the same problem in the future can start from this thread instead of re-investigating from the beginning (and re-emailing the Compute Canada folks with the same question).
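
To help test the GPU-allocation hypothesis, here is a minimal diagnostic sketch. It assumes (this is not confirmed in this thread) that the scheduler restricts a job's GPUs through the CUDA_VISIBLE_DEVICES environment variable, in which case the devices are renumbered from 0 inside the job. The function names visible_gpu_count and out_of_range_gpu_ids are hypothetical helpers, not part of automate_training.py, and the check is a simplification that only counts the entries in the variable.

```python
import os


def visible_gpu_count(env=None):
    """Count the devices listed in CUDA_VISIBLE_DEVICES.

    Returns None when the variable is unset, since no restriction
    can be inferred in that case.
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None
    # Entries may be indices or UUIDs; only their number matters here.
    return len([tok for tok in raw.split(",") if tok.strip()])


def out_of_range_gpu_ids(requested, env=None):
    """Return the requested in-process GPU indices that cannot exist.

    Inside a job, visible devices are renumbered 0..count-1, so any
    requested index outside that range is flagged.
    """
    count = visible_gpu_count(env)
    if count is None:
        return []  # assumption: nothing to flag without a known restriction
    return [gpu for gpu in requested if not 0 <= gpu < count]


if __name__ == "__main__":
    # Hypothetical list of GPU IDs one might ask the script to use.
    requested_ids = [0, 1, 2, 3]
    print("Visible GPU count:", visible_gpu_count())
    print("Out-of-range IDs:", out_of_range_gpu_ids(requested_ids))
```

Running this inside an allocated job and comparing the result with the GPU IDs passed to the script would show whether the requested IDs exceed what the job was actually granted. It does not by itself establish that this is the cause of the failure.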
