
CuArrays allocates a lot of memory on the default GPU #153

Closed
anj00 opened this issue Dec 27, 2018 · 3 comments
Labels: cuda array, performance

Comments


anj00 commented Dec 27, 2018

Hello,

In every multi-GPU config I use, I see that one GPU (the default) has significantly higher memory usage. I don't know whether I'm missing some configuration setting or whether this is a bug/missing feature. I would really appreciate hints on how to solve it.

The trouble is that (for a particular algorithm) all the memory overhead piling up on a single GPU makes it impossible to actually use the default GPU for calculations: no memory is left.

Here is a typical setup:
Win 10
Julia 1.0.3
CUDAdrv v0.8.6
CUDAnative v0.9.1
CuArrays v0.8.1
(the above combination of packages is the latest I could get working together)
CPU: 9900K, with all Windows graphics running off the built-in GPU, so the discrete GPUs have 0 load / 0 memory usage before Julia starts

GPUs: 4x RTX 2070 8 GB (but I have the same problem on 2-GPU configs with different models)

Here is the simplest example (it prepares 16 Julia processes, 4 per GPU).

The "master" code:

using Distributed
addprocs(15)
@everywhere include("worker_code.jl")

and the content of worker_code.jl is:

using CUDAdrv: CuDevice, CuContext, DeviceSet
dev = CuDevice(myid() % length(DeviceSet()))  # pick a device round-robin based on the process id
ctx = CuContext(dev)
println("Running on ", dev)

using CuArrays

As you can see, the code above doesn't create anything custom yet; it just prepares the basics. Yet the memory usage on the default GPU is 2.9 GB, versus only 0.4 GB on each of the other three.

I would have expected CuArrays to distribute the memory it needs evenly across GPUs (taking into account the device or context we set in CUDAdrv). Is this possible with some flags?

@vchuravy (Member) commented

You need to switch the device before loading CuArrays, since your dev and ctx did not change the default device:

asyncmap(collect(zip(workers(), CUDAnative.devices()))) do (p, d)
    remotecall_wait(() -> CUDAnative.device!(d), p)
    nothing
end
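
For completeness, a full master-side version might look like this (a sketch only; Iterators.cycle is added here so that more workers than GPUs still get a device assigned round-robin, since a plain zip stops at the shorter collection):

using Distributed
addprocs(15)

# load CUDAnative everywhere first, without touching CuArrays yet
@everywhere using CUDAnative

# assign one device per worker, cycling over the available GPUs
asyncmap(collect(zip(workers(), Iterators.cycle(CUDAnative.devices())))) do (p, d)
    remotecall_wait(() -> CUDAnative.device!(d), p)
    nothing
end

# only now load CuArrays, so its memory ends up on each worker's own device
@everywhere using CuArrays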


anj00 commented Dec 28, 2018

Thanks @vchuravy! Your proposal works perfectly! I now have a very even memory usage distribution.

In the context of the code above, I did:

using CUDAdrv: CuDevice, CuContext, DeviceSet

use_dev = myid() % length(DeviceSet())

dev = CuDevice(use_dev)
ctx = CuContext(dev)
println("Running on ", dev)

using CUDAnative
CUDAnative.device!(use_dev)

using CuArrays
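
A quick check along these lines should confirm the distribution from the master process (a sketch; it assumes CUDAdrv.Mem.info() is available in this CUDAdrv version and returns a (free, total) tuple in bytes):

using Distributed

@everywhere import CUDAdrv   # make the module name itself available on every worker

for p in workers()
    free, total = remotecall_fetch(p) do
        CUDAdrv.Mem.info()   # assumed API: memory info for the worker's current device
    end
    println("worker ", p, ": ", round(free / 2^30, digits=2), " GiB free of ", round(total / 2^30, digits=2), " GiB")
end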

@maleadt maleadt closed this as completed Oct 25, 2019

maleadt commented Oct 25, 2019

Oh, I guess the need for docs is a reason to leave this open.

@maleadt maleadt reopened this Oct 25, 2019
@maleadt maleadt transferred this issue from JuliaGPU/CuArrays.jl May 27, 2020
@maleadt maleadt added the cuda array and performance labels May 27, 2020
@maleadt maleadt closed this as completed Aug 25, 2020