For instance, if we have available GPU shares [0.3, 0.3] on two GPUs and try to allocate 0.4 vGPU, the current `FractionAllocMap` in the agent allocates [0.3, 0.1] vGPU.
This may cause problems for GPU applications that assume all the (multiple) GPUs they have access to have identical resources. For such applications, we need to allocate [0.2, 0.2] instead.
More test cases:
```
[0.3, 0.2, 0.1] allocate 0.1  => [0, 0, 0.1]    # favor the smallest fit to reduce fragmentation
[0.3, 0.2, 0.1] allocate 0.15 => [0, 0.15, 0]   # favor the smaller fit but use the largest chunk possible
[0.3, 0.2, 0.1] allocate 0.2  => [0, 0.2, 0]    # favor the smaller fit but use the largest chunk possible
[0.3, 0.2, 0.1] allocate 0.3  => [0.3, 0, 0]
[0.3, 0.2, 0.1] allocate 0.4  => [0.2, 0.2, 0]
[0.3, 0.2, 0.1] allocate 0.5  => [0.3, 0.2, 0] or [0.2, 0.2, 0.1]  # if both are possible, prefer fewer GPUs with bigger chunks
[0.3, 0.2, 0.1] allocate 0.6  => [0.3, 0.2, 0.1]
[0.3, 0.2, 0.1] allocate 0.7  => insufficient

[0.3, 0.3] allocate 0.3 => [0.3, 0]
[0.3, 0.3] allocate 0.4 => [0.2, 0.2]
[0.3, 0.3] allocate 0.5 => [0.25, 0.25]
[0.3, 0.3] allocate 0.6 => [0.3, 0.3]
[0.3, 0.3] allocate 0.7 => insufficient

[0.2, 0.2, 0.2] allocate 0.2 => [0.2, 0, 0]
[0.2, 0.2, 0.2] allocate 0.3 => [0.15, 0.15, 0]
[0.2, 0.2, 0.2] allocate 0.4 => [0.2, 0.2, 0]
[0.2, 0.2, 0.2] allocate 0.5 => [0.17, 0.17, 0.16]
[0.2, 0.2, 0.2] allocate 0.6 => [0.2, 0.2, 0.2]
[0.2, 0.2, 0.2] allocate 0.7 => insufficient
```
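The policy behind these test cases can be sketched roughly as follows. This is a minimal Python illustration, not the actual `FractionAllocMap` implementation; all names here are hypothetical. It uses best fit for single-device allocations and water-filling over the largest devices otherwise:

```python
# Sketch of an even-split fractional GPU allocator matching the test
# cases above (illustrative only, not the Backend.AI implementation).

def water_fill(caps, req):
    """Split `req` as evenly as possible across devices, capping each
    device at its available share (classic water-filling)."""
    out = [0.0] * len(caps)
    order = sorted(range(len(caps)), key=lambda i: caps[i])  # smallest cap first
    remaining, left = req, len(caps)
    for i in order:
        level = remaining / left          # ideal equal share among the rest
        out[i] = min(caps[i], level)      # a small device contributes all it has
        remaining -= out[i]
        left -= 1
    return out

def allocate(avail, req, eps=1e-9):
    """Return per-device allocations summing to `req`, or None if
    insufficient.  Prefers fewer devices; a single device uses best fit
    (the smallest share that still fits) to reduce fragmentation."""
    n = len(avail)
    # k = 1: best fit on a single device
    single = [(a, i) for i, a in enumerate(avail) if a >= req - eps]
    if single:
        _, i = min(single)
        out = [0.0] * n
        out[i] = req
        return out
    # k >= 2: spread over the k largest devices, as evenly as the caps allow
    for k in range(2, n + 1):
        chosen = sorted(range(n), key=lambda i: avail[i], reverse=True)[:k]
        caps = [avail[i] for i in chosen]
        if sum(caps) < req - eps:
            continue
        shares = water_fill(caps, req)
        out = [0.0] * n
        for i, s in zip(chosen, shares):
            out[i] = s
        return out
    return None  # insufficient total capacity
```

Note that `[0.2, 0.2, 0.2] allocate 0.5` yields three equal shares of 1/6 ≈ 0.1667 here; the `[0.17, 0.17, 0.16]` in the table is the same split after rounding to two decimal places.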
We also need to limit allocation of fractions that are too small, i.e., those mapping to under ~500 MiB of GPU memory (this threshold should be configurable), since such allocations would not be able to execute anything with some deep learning frameworks (e.g., PyTorch and TensorFlow) due to the frameworks' default GPU memory footprints.
This is now the default behavior as of Backend.AI v20.09, with the addition of a `quantum_size` configuration to limit fragmentation.
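The two constraints could be enforced along these lines. This is a hedged sketch, not the actual Backend.AI code: the quantum value, the 500 MiB threshold, and both function names are assumptions for illustration:

```python
import math

# Illustrative constants; not the actual Backend.AI configuration keys.
QUANTUM = 0.05        # allocations rounded to multiples of this fraction
MIN_MEM_MIB = 500     # reject shares mapping to less GPU memory than this

def quantize(fraction, quantum=QUANTUM):
    """Round a fractional share up to the next multiple of the quantum,
    so that allocations cannot leave arbitrarily small leftovers."""
    return math.ceil(fraction / quantum - 1e-9) * quantum

def is_allocatable(fraction, device_mem_mib, min_mem_mib=MIN_MEM_MIB):
    """Reject fractions whose share of the device memory is too small
    to run typical deep learning frameworks."""
    return fraction * device_mem_mib >= min_mem_mib
```

For example, with a 0.05 quantum a request for 0.17 vGPU would be rounded up to 0.2, and on a hypothetical 16 GiB device a 0.02 share (~328 MiB) would be rejected outright.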
We will revisit the allocator to reduce fragmentation further.
Internal ticket: OP#706