In [1]:
import torch

In [2]:
torch.cuda.is_available()

True

torch.cuda.init() initializes PyTorch's CUDA state. An explicit call is only needed when interacting with PyTorch through its C API.
From Python, the CUDA state is initialized lazily, on first use.

In [3]:
torch.cuda.init()

In [4]:
torch.cuda.current_device()

0

In [5]:
torch.cuda.device_count()

1

In [14]:
torch.cuda.memory_allocated()

512

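Behind the scenes, PyTorch uses a <b>caching memory
allocator</b> to speed up memory allocations. Freed blocks are kept in a cache for reuse, which allows fast
memory deallocation without device synchronization. torch.cuda.memory_allocated() reports the memory occupied by tensors,
while torch.cuda.memory_reserved() reports the total memory managed by the allocator.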

In [15]:
# torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved
torch.cuda.memory_reserved()

2097152
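The difference between the two counters is easiest to see by allocating and freeing a tensor. The following is a minimal sketch, not a benchmark: `snapshot` and `stats` are names introduced here for illustration, and the measurements only run when a CUDA device is present.

```python
import torch

def snapshot():
    # (bytes occupied by live tensors, bytes held by the caching allocator)
    return torch.cuda.memory_allocated(), torch.cuda.memory_reserved()

stats = {}
if torch.cuda.is_available():
    stats["start"] = snapshot()
    t = torch.zeros(1024, 1024, device="cuda")  # 4 MiB of float32 data
    stats["after allocation"] = snapshot()
    del t  # the block goes back to PyTorch's cache, not to the driver
    stats["after del"] = snapshot()
    torch.cuda.empty_cache()  # hand unused cached blocks back to the driver
    stats["after empty_cache"] = snapshot()

for label, (allocated, reserved) in stats.items():
    print(f"{label}: allocated={allocated} reserved={reserved}")
```

After `del t` the allocated count drops back, while the reserved count typically stays elevated until `torch.cuda.empty_cache()` is called. Cached blocks that still hold live tensors are not released.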

<b>"cuda"</b> refers to the currently selected CUDA device, which is device 0 by default.
The selection can be changed with the torch.cuda.device context manager.

In [9]:
cuda = torch.device('cuda')

cuda

device(type='cuda')

In [10]:
cuda0 = torch.device('cuda:0')
cuda1 = torch.device('cuda:1')
cuda2 = torch.device('cuda:2')

In [11]:
cuda0, cuda1, cuda2

(device(type='cuda', index=0),
 device(type='cuda', index=1),
 device(type='cuda', index=2))
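Note that constructing a device object does not check whether the hardware exists, which is why the cell above succeeds on a single-GPU machine. A short sketch of how to inspect device objects and list the indices that are actually usable (the name `usable` is introduced here for illustration):

```python
import torch

# Constructing a device object does not touch the hardware, so this works
# even on a machine without a third GPU.
cuda2 = torch.device("cuda:2")
print(cuda2.type, cuda2.index)

# A bare "cuda" device carries no index; it resolves to the current
# device only when a tensor is actually created on it.
print(torch.device("cuda").index)

# Only indices below device_count() can be used for allocation.
usable = [torch.device("cuda", i) for i in range(torch.cuda.device_count())]
print(usable)
```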

When you create a torch tensor and you haven't specified a CUDA device,
this tensor, by default, is created on the CPU.

In [12]:
x = torch.tensor([10., 20.])

x

tensor([10., 20.])

Create a tensor on the default CUDA device

In [13]:
x_default = torch.tensor([10., 20.], device=cuda)

x_default

tensor([10., 20.], device='cuda:0')

In [16]:
x0 = torch.tensor([10., 20.], device=cuda0)

x0

tensor([10., 20.], device='cuda:0')

In [17]:
x1 = torch.tensor([10., 20.], device=cuda1)

x1

RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
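The error above occurs because this machine has only one GPU, so index 1 is out of range. One way to avoid it is to check the index against torch.cuda.device_count() before allocating. The sketch below uses a hypothetical helper, `resolve_device`, which is not part of PyTorch, and assumes that falling back to the CPU is acceptable.

```python
import torch

def resolve_device(index):
    """Return cuda:<index> if that GPU exists, otherwise the CPU.

    Hypothetical helper for illustration; not part of PyTorch.
    """
    if torch.cuda.is_available() and 0 <= index < torch.cuda.device_count():
        return torch.device("cuda", index)
    return torch.device("cpu")

device = resolve_device(1)
x1 = torch.tensor([10., 20.], device=device)
print(x1.device)
```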


Tensor.cuda() returns a copy of the tensor in CUDA memory. If the tensor is
<b>already in CUDA memory</b> and on the <b>correct device</b>, no copy is performed and the original object is returned.

In [18]:
y = x.cuda()

y

tensor([10., 20.], device='cuda:0')
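The no-copy behavior can be checked by comparing data pointers. The sketch below uses Tensor.to() rather than Tensor.cuda() so that it also runs on a machine without a GPU; Tensor.to() likewise skips the copy when the tensor is already on the requested device.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.tensor([10., 20.])
y = x.to(device)  # copies only if x is not already on `device`
z = y.to(device)  # already there, so no copy is made

# Identical data pointers mean no new memory was allocated for z.
print(z.data_ptr() == y.data_ptr())
```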

In [19]:
print('Outside the context: ', torch.cuda.current_device())
with torch.cuda.device(0):
    print('Inside the context: ', torch.cuda.current_device())
print('Outside the context again: ', torch.cuda.current_device())

Outside the context:  0
Inside the context:  0
Outside the context again:  0


In [20]:
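with torch.cuda.device(0):
    # The context manager only selects the current CUDA device;
    # a tensor created without a device argument still lives on the CPU.
    a = torch.tensor([10., 20.])
    a0 = torch.tensor([10., 20.], device=cuda0)
    a, a0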

In [22]:
a0

tensor([10., 20.], device='cuda:0')

torch.Tensor.to() converts a tensor's data type, its device, or both.

In [24]:
b1 = a.to(device=cuda0)

b1

tensor([10., 20.], device='cuda:0')
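A brief sketch of the common call forms of Tensor.to(). The variable names are illustrative, and the device falls back to the CPU when no GPU is present.

```python
import torch

a = torch.tensor([10., 20.])   # float32 on the CPU

a64 = a.to(torch.float64)      # data type conversion only
print(a64.dtype)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
a16 = a.to(device, dtype=torch.float16)  # device and data type in one call
print(a16.device, a16.dtype)

like = a.to(a64)               # match another tensor's data type and device
print(like.dtype, like.device)
```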

Operations cannot be performed on tensors that live on different devices. Here a is on the CPU and a0 is on cuda:0.

In [25]:
sum_a = a + a0

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

In [26]:
sum_a = a.cuda() + a0

In [27]:
sum_a

tensor([20., 40.], device='cuda:0')
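A device-agnostic way to write the same addition is to move one operand to the other's device explicitly. This is a sketch under the assumption that a CPU fallback is acceptable when no GPU is available.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.tensor([10., 20.])                  # created on the CPU
a0 = torch.tensor([10., 20.], device=device)  # created on the target device

# Move the operand explicitly instead of relying on where it was created.
sum_a = a.to(a0.device) + a0
print(sum_a)
```

Because Tensor.to() is a no-op when the tensor is already on the requested device, this pattern costs nothing when both operands already share a device.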

In [28]:
torch.cuda.memory_allocated()

3072

In [29]:
torch.cuda.memory_reserved()

2097152

In [30]:
torch.cuda.empty_cache()

In [31]:
torch.cuda.memory_reserved()

2097152

Tensor methods prefixed with "new_" create new tensors with the same data type and on the same device as the source tensor.

In [32]:
preserve_context = x.new_full([2,2], fill_value=1.1)

preserve_context

tensor([[1.1000, 1.1000],
        [1.1000, 1.1000]])

In [33]:
preserve_context0 = x0.new_full([2,2], fill_value=1.1)

preserve_context0

tensor([[1.1000, 1.1000],
        [1.1000, 1.1000]], device='cuda:0')
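The same inheritance applies to the other new_* methods, and the data type can still be overridden per call. A short sketch with illustrative variable names:

```python
import torch

src = torch.tensor([1, 2, 3], dtype=torch.int64)

zeros = src.new_zeros((2, 2))  # int64, same device as src
ones = src.new_ones((3,))      # int64, same device as src
floats = src.new_full((2,), 1.5, dtype=torch.float32)  # explicit dtype wins

print(zeros.dtype, ones.dtype, floats.dtype)
print(zeros.device == src.device)
```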