This repository was archived by the owner on Jun 6, 2024. It is now read-only.

[Bug] Network traffic monitor  #4834

@zheng-ningxin

Description


Organization Name: Microsoft

Short summary about the issue/question:
I submitted a job that occupies a whole GPU node. The job trains a classification model on ImageNet, and the ImageNet dataset is stored on the corresponding NFS storage. When I monitor the network bandwidth consumed by this job, the reported bandwidth is zero.
I then switched to the node view and found that the consumed network bandwidth is not zero there (the job started at 11:05). I therefore suspect the network bandwidth monitoring script has a bug.
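To compare the job view and the node view independently of the dashboard, one option is to sample the kernel's per-interface byte counters in `/proc/net/dev` twice and compute the rate. Because `/proc/net/dev` reflects the network namespace of the reading process, running the script inside the job container and again on the host shows which interfaces actually carry the traffic. This is a minimal sketch for cross-checking only; the function names (`parse_net_dev`, `byte_rates`, `sample_rates`) are hypothetical and are not part of OpenPAI's monitoring code.

```python
import time
from typing import Dict, Tuple

Counters = Dict[str, Tuple[int, int]]


def parse_net_dev(text: str) -> Counters:
    """Parse /proc/net/dev contents into {interface: (rx_bytes, tx_bytes)}."""
    counters: Counters = {}
    # The first two lines of /proc/net/dev are column headers.
    for line in text.splitlines()[2:]:
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is received bytes; field 8 is the first transmit field (bytes).
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def byte_rates(before: Counters, after: Counters, interval_s: float) -> Dict[str, Tuple[float, float]]:
    """Return {interface: (rx_bytes_per_s, tx_bytes_per_s)} between two samples."""
    return {
        iface: (
            (after[iface][0] - before[iface][0]) / interval_s,
            (after[iface][1] - before[iface][1]) / interval_s,
        )
        for iface in after
        if iface in before
    }


def sample_rates(interval_s: float = 1.0, path: str = "/proc/net/dev") -> Dict[str, Tuple[float, float]]:
    """Read the counters twice, interval_s seconds apart (Linux only)."""
    with open(path) as f:
        before = parse_net_dev(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_net_dev(f.read())
    return byte_rates(before, after, interval_s)
```

Calling `sample_rates()` while the training job is reading from NFS, once in the container and once on the node, would show whether the NFS traffic appears on the container's interfaces or only on the host's.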

Brief what process you are following:

How to reproduce it:

OpenPAI Environment:

  • OpenPAI version:
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Hardware (e.g. core number, memory size, storage size, GPU type etc.):
  • Others:

Anything else we need to know:
