```python
try:
    compute_procs = pynvml.nvmlDeviceGetComputeRunningProcesses(handle)
except pynvml.NVMLError:
    compute_procs = []
try:
    graphics_procs = pynvml.nvmlDeviceGetGraphicsRunningProcesses(handle)
except pynvml.NVMLError:
    graphics_procs = []

# A PID can appear in both lists, so deduplicate before summing.
procs = {p.pid: p for p in compute_procs + graphics_procs}.values()
summed_mib = 0.0
for proc in procs:
    if proc.usedGpuMemory is not None:  # None when NVML cannot attribute memory
        summed_mib += proc.usedGpuMemory / 1024 / 1024
```
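The tally above can be factored into a small testable helper. This is only a sketch of my intent; the `None` guard is an assumption about environments where NVML cannot attribute memory to a process (e.g. insufficient permissions):

```python
MIB = 1024 * 1024

def per_process_vram_mib(compute_procs, graphics_procs):
    """Merge the compute and graphics process lists (a PID can appear
    in both) and sum usedGpuMemory, reported in MiB.  Entries whose
    usedGpuMemory is None are skipped rather than crashing the sum."""
    merged = {p.pid: p for p in list(compute_procs) + list(graphics_procs)}
    return sum(
        p.usedGpuMemory / MIB
        for p in merged.values()
        if p.usedGpuMemory is not None
    )

if __name__ == "__main__":
    # Live usage requires an NVIDIA driver; pynvml is imported lazily
    # so the helper above stays importable on GPU-less machines.
    import pynvml
    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        compute = pynvml.nvmlDeviceGetComputeRunningProcesses(handle)
        graphics = pynvml.nvmlDeviceGetGraphicsRunningProcesses(handle)
        print(per_process_vram_mib(compute, graphics))
    finally:
        pynvml.nvmlShutdown()
```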
I'm facing an issue with vRAM metric calculation: when I allocate 10 GB with cudaMallocManaged, the per-process usedGpuMemory reported by NVML is only around 1 GB. With cudaMalloc, the same usedGpuMemory check shows the full 10 GB.

I tested the same workload (60 GB) with both allocators, fetching the per-process information from NVML. In each case only PID 1 is running:

cudaMalloc → usedGpuMemory reports 58.2 GB
cudaMallocManaged → usedGpuMemory reports only 5 GB

In nvidia-smi the whole GPU shows as used, but the per-process figure for cudaMallocManaged is not what I expect. Is there a way to get accurate per-process usage for managed memory, or is there a technical reason behind this behaviour?
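For reference, the device-wide counter that nvidia-smi shows (the one that does reflect the full allocation in my tests) can be read via `nvmlDeviceGetMemoryInfo`. A sketch, assuming device index 0:

```python
def device_vram_mib(meminfo):
    """Convert an NVML memory-info struct (byte counts in .total,
    .used, .free) into MiB figures."""
    mib = 1024 * 1024
    return {
        "total_mib": meminfo.total / mib,
        "used_mib": meminfo.used / mib,
        "free_mib": meminfo.free / mib,
    }

if __name__ == "__main__":
    # Requires an NVIDIA driver; imported here so the pure helper
    # above stays usable on machines without one.
    import pynvml
    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        print(device_vram_mib(pynvml.nvmlDeviceGetMemoryInfo(handle)))
    finally:
        pynvml.nvmlShutdown()
```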