
Incorrect VRAM metrics #63

@HashiramaSenjuhari

Description


```python
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

try:
    compute_procs = pynvml.nvmlDeviceGetComputeRunningProcesses(handle)
except pynvml.NVMLError:
    compute_procs = []

try:
    graphics_procs = pynvml.nvmlDeviceGetGraphicsRunningProcesses(handle)
except pynvml.NVMLError:
    graphics_procs = []

# Deduplicate by PID: a process can appear in both lists.
procs = {p.pid: p for p in compute_procs + graphics_procs}.values()
summed_ns_mib = 0.0

for proc in procs:
    pid = proc.pid
    # usedGpuMemory is reported in bytes; convert to MiB.
    mem_used = proc.usedGpuMemory / 1024 / 1024
    summed_ns_mib += mem_used
```
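The dedup-and-sum step can be isolated into a small helper that is easy to test without a GPU. `PInfo` below is a hypothetical stand-in for the process-info objects pynvml returns; the real objects expose `.pid` and `.usedGpuMemory` (in bytes, or `None` when the driver cannot attribute memory to the process):

```python
from collections import namedtuple

# Hypothetical stand-in for pynvml's process-info struct.
PInfo = namedtuple("PInfo", ["pid", "usedGpuMemory"])

def summed_process_mib(compute_procs, graphics_procs):
    """Deduplicate processes by PID (a PID can appear in both the
    compute and graphics lists) and sum reported GPU memory in MiB,
    skipping entries where the driver reported None."""
    procs = {p.pid: p for p in compute_procs + graphics_procs}.values()
    return sum(
        p.usedGpuMemory / (1024 * 1024)
        for p in procs
        if p.usedGpuMemory is not None
    )

# A PID present in both lists is counted once; None entries are skipped.
compute = [PInfo(1, 100 * 1024 * 1024), PInfo(2, None)]
graphics = [PInfo(1, 100 * 1024 * 1024)]
print(summed_process_mib(compute, graphics))  # → 100.0
```

Guarding against `None` matters in practice: pynvml can return `None` for `usedGpuMemory` (for example, on platforms where per-process accounting is unavailable), and the unguarded division in the loop above would raise a `TypeError`.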
I'm facing an issue with the VRAM metric calculation. I allocate 10 GB, but the per-process `usedGpuMemory` reports far less, around 1 GB.

With `cudaMalloc`, however, `usedGpuMemory` does report the full 10 GB for the process.

I tested the same workload (60 GB) with both allocators, with only PID 1 running:

cudaMalloc → reports 58.2 GB

cudaMallocManaged → reports 5 GB

I am fetching the device computation information from NVML. The nvidia-smi tool shows the whole GPU as used, but the per-process numbers for `cudaMallocManaged` do not behave as expected.

Is there any way to get accurate process-specific usage, or is there a technical reason behind this?
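One likely explanation (common NVML behavior, though details vary by driver and platform): `cudaMallocManaged` allocations live in unified memory, whose pages are migrated to the device on demand, and the driver does not attribute them to a PID the way it attributes plain `cudaMalloc` allocations. The device-level used counter (what nvidia-smi shows) therefore grows while per-process `usedGpuMemory` stays small. A quick way to quantify this is to compare the device total against the per-process sum; `unattributed_mib` is a hypothetical helper name:

```python
def unattributed_mib(device_used_bytes, per_process_mib_sum):
    """Memory that NVML reports as used at the device level
    (nvmlDeviceGetMemoryInfo) but that no process's usedGpuMemory
    accounts for. A large gap is typical with cudaMallocManaged,
    because unified-memory pages are not attributed to a PID."""
    device_used_mib = device_used_bytes / (1024 * 1024)
    return max(0.0, device_used_mib - per_process_mib_sum)

# E.g. the device reports 60 GiB used while the per-process sum is
# only 5 GiB — the gap is the unattributed (managed) memory.
gap = unattributed_mib(60 * 1024**3, 5 * 1024.0)
print(gap)  # → 56320.0 (MiB)
```

On a live system, `device_used_bytes` would come from `pynvml.nvmlDeviceGetMemoryInfo(handle).used`; a persistently large gap while a managed-memory workload runs is consistent with the behavior described above.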
