<div style="display: flex; justify-content: space-between; align-items: center;">
    <div style="text-align: left; flex: 4">
        <strong>Author:</strong> Amirhossein Heydari — 
        📧 <a href="mailto:amirhosseinheydari78@gmail.com">amirhosseinheydari78@gmail.com</a> — 
        🐙 <a href="https://github.com/mr-pylin/pytorch-workshop" target="_blank" rel="noopener">github.com/mr-pylin</a>
    </div>
    <div style="text-align: right; flex: 1;">
        <a href="https://pytorch.org/" target="_blank" rel="noopener noreferrer">
            <img src="../assets/images/pytorch/logo/pytorch-logo-dark.svg" 
                 alt="PyTorch Logo"
                 style="max-height: 48px; width: auto; background-color: #ffffff; border-radius: 8px;">
        </a>
    </div>
</div>
<hr>


**Table of contents**<a id='toc0_'></a>    
- [Dependencies](#toc1_)    
- [array_like Sequences](#toc2_)    
- [PyTorch Tensors](#toc3_)    
  - [Ones, Zeros, Full, Empty](#toc3_1_)    
  - [Index & Slice](#toc3_2_)    
  - [Math operations](#toc3_3_)    
    - [Pointwise Ops](#toc3_3_1_)    
      - [Broadcasting](#toc3_3_1_1_)    
  - [Reshape & View](#toc3_4_)    
  - [Mutable Objects](#toc3_5_)    
    - [Copy Tensors](#toc3_5_1_)    
    - [torch.Tensor to numpy.ndarray](#toc3_5_2_)    
    - [numpy.ndarray to torch.Tensor](#toc3_5_3_)    
    - [In-Place Operations](#toc3_5_4_)    
  - [GPU Acceleration](#toc3_6_)    
  - [Reproducibility](#toc3_7_)    
    - [`torch.backends.cudnn.deterministic`](#toc3_7_1_)    
    - [`torch.backends.cudnn.benchmark`](#toc3_7_2_)    
    - [`torch.use_deterministic_algorithms`](#toc3_7_3_)    
  - [Random Sampling from a Distribution](#toc3_8_)    
  - [`torch.Tensor.item()`](#toc3_9_)    
  - [Miscellaneous](#toc3_10_)    
    - [`torch.float32` is preferred over `torch.float64` in most deep learning tasks](#toc3_10_1_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=1
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

# <a id='toc1_'></a>[Dependencies](#toc0_)


In [None]:
import numpy as np
import torch

In [None]:
# set print options to increase line width
torch.set_printoptions(linewidth=200)

# <a id='toc2_'></a>[array_like Sequences](#toc0_)

- `list`
  - Used for storing elements of different data types
  - Flexible: there is no length & shape limit
  - Not optimized for mathematical operations
- `numpy.ndarray`
  - Implemented in C
  - Used for mathematical operations
  - Arrays are homogeneous: they can only store elements of the same data type
- `torch.Tensor`
  - PyTorch's core functionality is implemented in C++
  - Optimized for deep learning operations, e.g., automatic differentiation (autograd)
  - Supports GPU acceleration (e.g., NVIDIA and AMD GPUs)
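
The difference between heterogeneous lists and homogeneous arrays/tensors can be checked directly. The following is a minimal sketch; the variable names are illustrative only:

```python
import numpy as np
import torch

# a Python list can mix element types freely
mixed_list = [1, 2.5, "three"]

# NumPy promotes mixed numeric input to one common dtype
mixed_array = np.array([1, 2.5])

# PyTorch does the same: the integer is promoted to floating point
mixed_tensor = torch.tensor([1, 2.5])

# log
print(f"list element types : {[type(item).__name__ for item in mixed_list]}")
print(f"mixed_array.dtype  : {mixed_array.dtype}")
print(f"mixed_tensor.dtype : {mixed_tensor.dtype}")
```

Note that NumPy defaults to 64-bit floats while PyTorch defaults to 32-bit floats.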

📝 **Docs**:

- More on Lists: [docs.python.org/3/tutorial/datastructures.html#more-on-lists](https://docs.python.org/3/tutorial/datastructures.html#more-on-lists)
- `numpy.ndarray`: [numpy.org/doc/stable/reference/generated/numpy.ndarray.html](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html)
- `torch.Tensor`: [docs.pytorch.org/docs/stable/tensors.html](https://docs.pytorch.org/docs/stable/tensors.html)

📚 **Tutorials**:

- Tensors: [pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html](https://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html)


In [None]:
# scalar : 0-dimensional array/tensor
scalar_1 = 2
scalar_2 = np.array(2)
scalar_3 = torch.tensor(2)

# log
print(f"scalar_1: {scalar_1} | ndim: 0 | dtype: {type(scalar_1)}")
print(f"scalar_2: {scalar_2} | ndim: {scalar_2.ndim} | dtype: numpy.{scalar_2.dtype}")
print(f"scalar_3: {scalar_3} | ndim: {scalar_3.ndim} | dtype: {scalar_3.dtype}")

In [None]:
# vector : 1-dimensional list/array/tensor
vector_1 = [1, 2, 3]
vector_2 = np.array(vector_1)
vector_3 = torch.tensor(vector_1)

# log
print(f"vector_1: {str(vector_1):<17} | ndim: 1 | dtype: {type(vector_1[0])}")
print(f"vector_2: {str(vector_2):<17} | ndim: {vector_2.ndim} | dtype: numpy.{vector_2.dtype}")
print(f"vector_3: {vector_3} | ndim: {vector_3.ndim} | dtype: {vector_3.dtype}")

In [None]:
# matrix : 2-dimensional list/array/tensor
matrix_1 = [[0, 1], [2, 3]]
matrix_2 = np.array(matrix_1)
matrix_3 = torch.tensor(matrix_1)

# log
print(f"matrix_1:\n{matrix_1}\nndim : 2\ndtype: {type(matrix_1[0][0])}")
print("-" * 50)
print(
    f"matrix_2:\n{matrix_2}\nmatrix_2.ndim : {matrix_2.ndim}\nmatrix_2.shape: {matrix_2.shape}\nmatrix_2.dtype: numpy.{matrix_2.dtype}"
)
print("-" * 50)
print(
    f"matrix_3:\n{matrix_3}\nmatrix_3.ndim : {matrix_3.ndim}\nmatrix_3.shape: {matrix_3.shape}\nmatrix_3.dtype: {matrix_3.dtype}"
)

In [None]:
# 3-dimensional list/array/tensor
list_3d_1 = [[[0, 1], [2, 3]], [[4, 5], [6, 7]]]
array_3d_1 = np.array(list_3d_1)
tensor_3d_1 = torch.tensor(list_3d_1)

# log
print(f"lst:\n{list_3d_1}\nndim : 3\ndtype: {type(list_3d_1[0][0][0])}")
print("-" * 50)
print(
    f"arr:\n{array_3d_1}\narr.ndim : {array_3d_1.ndim}\narr.shape: {array_3d_1.shape}\narr.dtype: numpy.{array_3d_1.dtype}"
)
print("-" * 50)
print(
    f"tsr:\n{tensor_3d_1}\ntsr.ndim : {tensor_3d_1.ndim}\ntsr.shape: {tensor_3d_1.shape}\ntsr.dtype: {tensor_3d_1.dtype}"
)

# <a id='toc3_'></a>[PyTorch Tensors](#toc0_)


## <a id='toc3_1_'></a>[Ones, Zeros, Full, Empty](#toc0_)

📝 **Docs**:

- Creation Ops: [docs.pytorch.org/docs/stable/torch.html#creation-ops](https://docs.pytorch.org/docs/stable/torch.html#creation-ops)


In [None]:
# ones
ones_1 = torch.ones(size=())
ones_2 = torch.ones(size=(2, 2))

# zeros
zeros_1 = torch.zeros(size=(2,))
zeros_2 = torch.zeros(size=(3,), dtype=torch.int16)

# full
full_1 = torch.full(size=(3,), fill_value=3, dtype=torch.int16)

# empty
empty_1 = torch.empty(size=(2, 3))

# log
for variable in ["ones_1", "ones_2", "zeros_1", "zeros_2", "full_1", "empty_1"]:
    print(f"{variable}:\n{eval(variable)}")
    print(f"{variable}.size() : {eval(variable).size()}")
    print(f"{variable}.ndim   : {eval(variable).ndim}")
    print(f"{variable}.dtype  : {eval(variable).dtype}")
    print(f"type({variable})  : {type(eval(variable))}")
    print("-" * 50)

## <a id='toc3_2_'></a>[Index & Slice](#toc0_)

- Indexing a tensor in the PyTorch C++ API works very similarly to the Python API.
- All index types such as `None` / `...` / `integer` / `boolean` / `slice` / `tensor` are available in the C++ API, making translation from Python indexing code to C++ very simple.
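
As a short sketch of the less common index types listed above (`None`, `...`, boolean, and tensor indices) in the Python API, assuming a small example tensor `t` that is not part of the original notebook:

```python
import torch

t = torch.arange(12).reshape(3, 4)

last_column = t[..., -1]               # Ellipsis keeps all leading dims, picks the last column
with_new_axis = t[None]                # None inserts a new leading dimension
greater_than_5 = t[t > 5]              # boolean mask returns the selected elements as 1-D
picked_rows = t[torch.tensor([0, 2])]  # integer tensor index selects rows 0 and 2

# log
print(f"last_column         : {last_column}")
print(f"with_new_axis.shape : {with_new_axis.shape}")
print(f"greater_than_5      : {greater_than_5}")
print(f"picked_rows:\n{picked_rows}")
```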

📝 **Docs**:

- Tensor Indexing API: [pytorch.org/cppdocs/notes/tensor_indexing.html](https://pytorch.org/cppdocs/notes/tensor_indexing.html)


In [None]:
tensor_2d_1 = torch.tensor([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# index
index_1 = tensor_2d_1[0]
index_2 = tensor_2d_1[1]
index_3 = tensor_2d_1[-1]
index_4 = tensor_2d_1[0, 0]
index_5 = tensor_2d_1[2, -2]

# log
print(f"index_1: {index_1}")
print(f"index_2: {index_2}")
print(f"index_3: {index_3}")
print(f"index_4: {index_4}")
print(f"index_5: {index_5}")

In [None]:
tensor_2d_2 = torch.arange(12).reshape((3, 4))

# slice
slice_1 = tensor_2d_2[0, :]  # same as tensor_2d_2[0]
slice_2 = tensor_2d_2[:, 1]
slice_3 = tensor_2d_2[:2, 2:]
slice_4 = tensor_2d_2[-1:, 0]

# log
print(f"slice_1:\n{slice_1}\n")
print(f"slice_2:\n{slice_2}\n")
print(f"slice_3:\n{slice_3}\n")
print(f"slice_4:\n{slice_4}")

## <a id='toc3_3_'></a>[Math operations](#toc0_)

📝 **Docs**:

- Math operations: [docs.pytorch.org/docs/stable/torch.html#math-operations](https://docs.pytorch.org/docs/stable/torch.html#math-operations)


### <a id='toc3_3_1_'></a>[Pointwise Ops](#toc0_)


In [None]:
tensor_2d_3 = torch.arange(4).reshape(2, 2)
tensor_2d_4 = torch.full(size=(2, 2), fill_value=2, dtype=torch.int64)

# arithmetic operations
arithmetic_1 = tensor_2d_3 + tensor_2d_4   # torch.add
arithmetic_2 = tensor_2d_3 - tensor_2d_4   # torch.sub
arithmetic_3 = tensor_2d_3 * tensor_2d_4   # torch.multiply
arithmetic_4 = tensor_2d_3 / tensor_2d_4   # torch.divide
arithmetic_5 = tensor_2d_3 // tensor_2d_4  # torch.floor_divide
arithmetic_6 = tensor_2d_3 % tensor_2d_4   # torch.remainder
arithmetic_7 = tensor_2d_3**tensor_2d_4    # torch.pow

# log
print(f"tensor_2d_3:\n{tensor_2d_3}\n")
print(f"tensor_2d_4:\n{tensor_2d_4}")
print("-" * 50)
for i in range(7):
    print(f"arithmetic_{i+1}:\n{eval(f'arithmetic_{i+1}')}\n")

#### <a id='toc3_3_1_1_'></a>[Broadcasting](#toc0_)

📝 **Docs**:

- Broadcasting semantics: [docs.pytorch.org/docs/stable/notes/broadcasting.html](https://docs.pytorch.org/docs/stable/notes/broadcasting.html)
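
Broadcasting aligns shapes from the trailing dimension; two dimensions are compatible when they are equal or one of them is 1. A minimal sketch with illustrative tensors `a` and `b` (not part of the original notebook):

```python
import torch

a = torch.ones(2, 1, 3)
b = torch.ones(4, 1)

# shapes are aligned from the right: (2, 1, 3) and (4, 1) broadcast to (2, 4, 3)
result = a + b
predicted_shape = torch.broadcast_shapes(a.shape, b.shape)

# log
print(f"result.shape    : {result.shape}")
print(f"predicted_shape : {predicted_shape}")

# incompatible trailing dimensions (3 vs. 2) raise a RuntimeError
try:
    torch.ones(2, 3) + torch.ones(2, 2)
except RuntimeError as error:
    print(f"RuntimeError: {error}")
```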


In [None]:
tensor_2d_5 = torch.arange(4).reshape(2, 2) + 1
tensor_2d_6 = torch.tensor([[1], [2]])

# broadcasting
broadcasting_1 = tensor_2d_5 + 1
broadcasting_2 = tensor_2d_5 + tensor_2d_6

# log
print(f"tensor_2d_5:\n{tensor_2d_5}\n")
print(f"tensor_2d_6:\n{tensor_2d_6}")
print("-" * 50)
print(f"broadcasting_1:\n{broadcasting_1}\n")
print(f"broadcasting_2:\n{broadcasting_2}\n")

## <a id='toc3_4_'></a>[Reshape & View](#toc0_)

- torch.Tensor.**view**:
  - requires the new shape to be compatible with the tensor's existing size and stride (always true for contiguous tensors)
  - less flexible due to this layout requirement
  - never copies data; it only changes the metadata, so it is cheap
  - [docs.pytorch.org/docs/stable/generated/torch.Tensor.view.html](https://docs.pytorch.org/docs/stable/generated/torch.Tensor.view.html)
- torch.Tensor.**reshape**:
  - handles non-contiguous tensors by copying data if necessary
  - more flexible as it works with both contiguous and non-contiguous tensors
  - might be slower if it needs to copy the data to create a contiguous block
  - [docs.pytorch.org/docs/stable/generated/torch.Tensor.reshape.html](https://docs.pytorch.org/docs/stable/generated/torch.Tensor.reshape.html)

✍️ **Note**:

- It is advisable to use `reshape`, which returns a view when the memory layout allows it and a copy otherwise.
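
The difference shows up on a non-contiguous tensor. The sketch below uses a transposed tensor as an assumed example; the variable names are illustrative:

```python
import torch

base = torch.arange(6).reshape(2, 3)
transposed = base.t()  # shares memory with base but is not contiguous

# view cannot reinterpret this memory layout and raises a RuntimeError
try:
    transposed.view(6)
except RuntimeError as error:
    print(f"view failed: {error}")

# reshape falls back to copying the data into a new buffer
flat_copy = transposed.reshape(6)

# on a contiguous tensor, reshape returns a view of the same memory
flat_view = base.reshape(6)

# log
print(f"transposed.is_contiguous()     : {transposed.is_contiguous()}")
print(f"flat_copy shares memory w/ base: {flat_copy.data_ptr() == base.data_ptr()}")
print(f"flat_view shares memory w/ base: {flat_view.data_ptr() == base.data_ptr()}")
```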


In [None]:
tensor_2d_7 = torch.arange(16).reshape(4, 4)

# reshape
reshape_1 = tensor_2d_7.reshape(2, 8)
reshape_2 = tensor_2d_7.reshape(2, -1, 2)

# log
print(f"tensor_2d_7:\n{tensor_2d_7}")
print("-" * 50)
print(f"reshape_1:\n{reshape_1}\n")
print(f"reshape_1.shape: {reshape_1.shape}")
print("-" * 50)
print(f"reshape_2:\n{reshape_2}\n")
print(f"reshape_2.shape: {reshape_2.shape}")

In [None]:
# assignment by index
tensor_2d_7[0, 0] = 100

# log
print(f"tensor_2d_7:\n{tensor_2d_7}")
print("-" * 50)
print(f"reshape_1:\n{reshape_1}\n")

## <a id='toc3_5_'></a>[Mutable Objects](#toc0_)

- Mutable objects are objects that can be modified after they are created.
- For example, `list`, `numpy.ndarray`, `torch.Tensor` are mutable objects.


### <a id='toc3_5_1_'></a>[Copy Tensors](#toc0_)

- `torch.clone`:
  - creates a hard/deep copy
  - This function is differentiable, so gradients will flow back from the result of this operation to `input`
- [docs.pytorch.org/docs/stable/generated/torch.clone.html](https://docs.pytorch.org/docs/stable/generated/torch.clone.html)
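
A minimal sketch of the differentiability point: gradients computed on a clone reach the original tensor, while `detach().clone()` gives a copy that is cut off from the graph. The tensor names are illustrative:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

cloned = x.clone()             # copy that stays in the autograd graph
detached = x.detach().clone()  # copy that is cut off from the graph

# backpropagate through the clone
(cloned * 2).sum().backward()

# log
print(f"x.grad                 : {x.grad}")
print(f"cloned.requires_grad   : {cloned.requires_grad}")
print(f"detached.requires_grad : {detached.requires_grad}")
```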


In [None]:
tensor_1d_1 = torch.zeros(size=(5,))

# clone
clone_1 = tensor_1d_1.clone()

# assignment by index
tensor_1d_1[0] = 1

# log
print(f"clone_1: {clone_1}")

### <a id='toc3_5_2_'></a>[torch.Tensor to numpy.ndarray](#toc0_)

- [docs.pytorch.org/docs/stable/generated/torch.Tensor.numpy.html](https://docs.pytorch.org/docs/stable/generated/torch.Tensor.numpy.html)


In [None]:
tensor_1d_2 = torch.tensor([1, 2, 6, 3])

# shared memory
tensor_to_numpy_1 = tensor_1d_2.numpy()

# hard/deep copy [using <np.array> is deprecated!]
tensor_to_numpy_2 = np.copy(tensor_1d_2)  # alternative: tensor_1d_2.clone().numpy() 

# assignment by index
tensor_1d_2[0] = 0

# deep dive into data buffer addresses (virtual memory address)
dba_1 = hex(tensor_1d_2.data_ptr())
dba_2 = hex(tensor_to_numpy_1.__array_interface__['data'][0])
dba_3 = hex(tensor_to_numpy_2.__array_interface__['data'][0])

# log
print(f"tensor_1d_2             : {tensor_1d_2}")
print(f"type(tensor_1d_2)       : {type(tensor_1d_2)}")
print(f"Memory Address (hex)    : {dba_1}")
print("-" * 50)
print(f"tensor_to_numpy_1       : {tensor_to_numpy_1}")
print(f"type(tensor_to_numpy_1) : {type(tensor_to_numpy_1)}")
print(f"Memory Address (hex)    : {dba_2}")
print("-" * 50)
print(f"tensor_to_numpy_2       : {tensor_to_numpy_2}")
print(f"type(tensor_to_numpy_2) : {type(tensor_to_numpy_2)}")
print(f"Memory Address (hex)    : {dba_3}")

### <a id='toc3_5_3_'></a>[numpy.ndarray to torch.Tensor](#toc0_)

- [docs.pytorch.org/docs/stable/generated/torch.from_numpy.html](https://docs.pytorch.org/docs/stable/generated/torch.from_numpy.html)


In [None]:
array_1d_1 = np.array([1, 4, 2, 3])

# convert + shared memory
numpy_to_tensor_1 = torch.from_numpy(array_1d_1)

# convert + copy
numpy_to_tensor_2 = torch.tensor(array_1d_1)

# assignment by index
array_1d_1[0] = 0

# deep dive into data buffer addresses (virtual memory address)
dba_1 = hex(array_1d_1.__array_interface__['data'][0])
dba_2 = hex(numpy_to_tensor_1.data_ptr())
dba_3 = hex(numpy_to_tensor_2.data_ptr())

# log
print(f"array_1d_1              : {array_1d_1}")
print(f"type(array_1d_1)        : {type(array_1d_1)}")
print(f"Memory Address (hex)    : {dba_1}")
print("-" * 50)
print(f"numpy_to_tensor_1       : {numpy_to_tensor_1}")
print(f"type(numpy_to_tensor_1) : {type(numpy_to_tensor_1)}")
print(f"Memory Address (hex)    : {dba_2}")
print("-" * 50)
print(f"numpy_to_tensor_2       : {numpy_to_tensor_2}")
print(f"type(numpy_to_tensor_2) : {type(numpy_to_tensor_2)}")
print(f"Memory Address (hex)    : {dba_3}")

### <a id='toc3_5_4_'></a>[In-Place Operations](#toc0_)

- Operations that have a `_` suffix are in-place.
- In-place operations save some memory, but can be problematic when computing derivatives because of an immediate loss of history.
- Hence, their use is discouraged.
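
The sketch below illustrates one way the problem can surface: `exp` needs its own output for the backward pass, so overwriting that output in place makes `backward()` fail. The `failed` flag is only there to record the outcome:

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = x.exp()  # autograd saves y, because the derivative of exp(x) is exp(x)
y.add_(1)    # the in-place update overwrites the saved value

try:
    y.sum().backward()
    failed = False
except RuntimeError as error:
    failed = True
    print(f"RuntimeError: {error}")
```

Replacing `y.add_(1)` with the out-of-place `y = y + 1` avoids the error.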

📚 **Tutorials**:

- in-place operations: [pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html](https://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html#:~:text=3.%2C%203.%2C%203.%5D%5D\)-,In%2Dplace%20operations,-Operations%20that%20have)


In [None]:
tensor_1d_3 = torch.tensor([1.0, 2.0, 3.0, 4.0])

# in-place addition
tensor_1d_3.add_(2)  # tensor_1d_3 += 2

# out-of-place addition
another_tensor = torch.add(tensor_1d_3, 2)  # another_tensor = tensor_1d_3 + 2

# log
print(f"tensor_1d_3    : {tensor_1d_3}")
print(f"another_tensor : {another_tensor}")

## <a id='toc3_6_'></a>[GPU Acceleration](#toc0_)

- PyTorch supports GPU acceleration, leveraging industry-standard libraries to enable high-performance computing.

❓ **Supported GPU Platforms**:

<table style="margin:0 auto; text-align:center">
  <thead>
    <tr>
      <th>GPU Vendor</th>
      <th>Operating System</th>
      <th>Supported</th>
      <th>Library/Backend</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>NVIDIA</td>
      <td>Linux</td>
      <td>✅</td>
      <td>CUDA, cuDNN, cuBLAS</td>
    </tr>
    <tr>
      <td>NVIDIA</td>
      <td>Windows</td>
      <td>✅</td>
      <td>CUDA, cuDNN, cuBLAS</td>
    </tr>
    <tr>
      <td>NVIDIA</td>
      <td>Mac</td>
      <td>❌</td>
      <td>N/A</td>
    </tr>
    <tr>
      <td>AMD</td>
      <td>Linux</td>
      <td>✅</td>
      <td>ROCm (HIP)</td>
    </tr>
    <tr>
      <td>AMD</td>
      <td>Windows</td>
      <td>⚠️</td>
      <td>DirectML</td>
    </tr>
    <tr>
      <td>AMD</td>
      <td>Mac</td>
      <td>✅</td>
      <td>MPS (Metal)</td>
    </tr>
    <tr>
      <td>Intel</td>
      <td>Linux</td>
      <td>✅</td>
      <td>SYCL</td>
    </tr>
    <tr>
      <td>Intel</td>
      <td>Windows</td>
      <td>✅</td>
      <td>SYCL</td>
    </tr>
    <tr>
      <td>Intel</td>
      <td>Mac</td>
      <td>❌</td>
      <td>N/A</td>
    </tr>
  </tbody>
</table>


✍️ **Notes**:

- PyTorch builds with CUDA support come **prebuilt** with the required `CUDA` and `cuDNN` libraries, so you do not need to install `CUDA` or `cuDNN` separately; a compatible NVIDIA driver is still required.
- Tensors on the `GPU` cannot be directly converted to `np.ndarray` or other structures that do not support GPU operations; move them to the CPU first (e.g., with `.cpu()`).
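
Device selection can be extended beyond CUDA. The helper below is a hypothetical sketch (`pick_device` is not a PyTorch function) that checks CUDA first, then Apple's MPS backend, and falls back to the CPU:

```python
import torch


def pick_device() -> torch.device:
    """Return the first available accelerator, falling back to the CPU."""
    if torch.cuda.is_available():          # NVIDIA CUDA (ROCm builds also report as "cuda")
        return torch.device("cuda")
    if torch.backends.mps.is_available():  # Apple Metal (macOS)
        return torch.device("mps")
    return torch.device("cpu")


selected_device = pick_device()
accelerated_tensor = torch.ones(3, device=selected_device)

# move the tensor to the CPU before converting it to NumPy
as_numpy = accelerated_tensor.cpu().numpy()

# log
print(f"selected_device : {selected_device}")
print(f"as_numpy        : {as_numpy}")
```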

📝 **Docs**:

- CUDA semantics: [docs.pytorch.org/docs/stable/notes/cuda.html](https://docs.pytorch.org/docs/stable/notes/cuda.html)
- HIP (ROCm) semantics: [docs.pytorch.org/docs/stable/notes/hip.html](https://docs.pytorch.org/docs/stable/notes/hip.html)
- Getting Started on Intel GPU: [docs.pytorch.org/docs/stable/notes/get_start_xpu.html](https://docs.pytorch.org/docs/stable/notes/get_start_xpu.html)
- MPS backend: [docs.pytorch.org/docs/stable/notes/mps.html](https://docs.pytorch.org/docs/stable/notes/mps.html)


In [None]:
device = "cuda" if torch.cuda.is_available() else "cpu"

# alternative
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

# log
print(f"device: {device}")

In [None]:
if torch.cuda.is_available():

    # number of cuda devices
    num_cuda_devices = torch.cuda.device_count()
    print(f"num_cuda_devices  : {num_cuda_devices}")

    # cuda models
    for i in range(num_cuda_devices):
        print(f"cuda {i}:")
        print(f"\tname                  : {torch.cuda.get_device_properties(i).name}")
        print(f"\ttotal_memory          : {torch.cuda.get_device_properties(i).total_memory} bytes")
        print(f"\tmulti_processor_count : {torch.cuda.get_device_properties(i).multi_processor_count}")

In [None]:
tensor_1d_4 = torch.ones(5)                 # CPU                                     [default]
tensor_1d_5 = torch.ones(5, device=device)  # CPU/GPU (depends on device)             [dynamic]
tensor_1d_6 = tensor_1d_3.to(device)        # convert to CPU/GPU (depends on device)  [dynamic]
tensor_1d_7 = tensor_1d_3.cuda() if torch.cuda.is_available() else tensor_1d_3  # convert to GPU (needs CUDA)  [static]
tensor_1d_8 = tensor_1d_3.cpu()             # convert to CPU                          [static]

# log
print(f"tensor_1d_4        : {tensor_1d_4}")
print(f"tensor_1d_4.device : {tensor_1d_4.device}\n")
print(f"tensor_1d_5        : {tensor_1d_5}")
print(f"tensor_1d_5.device : {tensor_1d_5.device}\n")
print(f"tensor_1d_6        : {tensor_1d_6}")
print(f"tensor_1d_6.device : {tensor_1d_6.device}\n")
print(f"tensor_1d_7        : {tensor_1d_7}")
print(f"tensor_1d_7.device : {tensor_1d_7.device}\n")
print(f"tensor_1d_8        : {tensor_1d_8}")
print(f"tensor_1d_8.device : {tensor_1d_8.device}")

In [None]:
tensor_1d_9 = torch.ones(size=(5,), device=device)

# torch.Tensor to numpy.ndarray
try:
    tensor_1d_9.numpy()
except TypeError as e:
    print(e)

In [None]:
tensor_1d_10 = torch.ones(size=(5,), device=device)

# torch.Tensor to numpy.ndarray
tensor_to_numpy_3 = tensor_1d_10.cpu().numpy()

# log
print(f"tensor_to_numpy_3       : {tensor_to_numpy_3}")
print(f"type(tensor_to_numpy_3) : {type(tensor_to_numpy_3)}")

## <a id='toc3_7_'></a>[Reproducibility](#toc0_)

- **Seed**: An initial value used to initialize a pseudo-random number generator, ensuring reproducibility of random sequences.

- **Platform and Release Variations**:
  - Completely reproducible results are not guaranteed across:
    - Different PyTorch releases
    - Individual commits
    - Different platforms (e.g., CPU vs. GPU, different OS)

- **Performance Trade-offs**:
  - Deterministic operations are often slower than nondeterministic operations.
  
- **Benefits of Determinism**:
  - Determinism can save time in development by facilitating:
    - Experimentation
    - Debugging
    - Testing

- **Random Seed Functions**:

<table style="margin:0 auto; text-align:left">
  <thead>
    <tr>
      <th>Function</th>
      <th>Scope</th>
      <th>Devices Affected</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code>torch.manual_seed(seed)</code></td>
      <td>CPU and CUDA/ROCm GPUs</td>
      <td>All CPU and GPU devices</td>
    </tr>
    <tr>
      <td><code>torch.cuda.manual_seed(seed)</code></td>
      <td>Current CUDA/ROCm GPU</td>
      <td>Single GPU (current device)</td>
    </tr>
    <tr>
      <td><code>torch.cuda.manual_seed_all(seed)</code></td>
      <td>All CUDA/ROCm GPUs</td>
      <td>All GPU devices</td>
    </tr>
  </tbody>
</table>

📝 **Docs**:

- Reproducibility: [docs.pytorch.org/docs/stable/notes/randomness.html](https://docs.pytorch.org/docs/stable/notes/randomness.html)
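
As a minimal sketch of what a seed does: re-seeding resets the generator, so the same sequence of random numbers is produced again. The variable names are illustrative:

```python
import torch

torch.manual_seed(0)
first_draw = torch.rand(3)

torch.manual_seed(0)         # reset the generator to the same state
second_draw = torch.rand(3)

third_draw = torch.rand(3)   # the generator has advanced, so the values differ

# log
print(f"first == second : {torch.equal(first_draw, second_draw)}")
print(f"second == third : {torch.equal(second_draw, third_draw)}")
```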


In [None]:
# set a seed only for GPU
torch.cuda.manual_seed(42)

# deep dive into random number generator state
state_1 = torch.get_rng_state()
state_1_int = int.from_bytes(state_1[:8].numpy().tobytes(), byteorder='little')
state_2 = torch.cuda.get_rng_state()
state_2_int = int.from_bytes(state_2[:8].numpy().tobytes(), byteorder='little')

# log
print(f"torch.get_rng_state()             : {state_1}")
print(f"state_1 as integer [only 64 bits] : {state_1_int}")
print(f"torch.cuda.get_rng_state()        : {state_2}")
print(f"state_2 as integer [only 64 bits] : {state_2_int}")

In [None]:
# set a seed for both CPU & GPU
torch.manual_seed(42)

# deep dive into random number generator state
state_1 = torch.get_rng_state()
state_1_int = int.from_bytes(state_1[:8].numpy().tobytes(), byteorder='little')
state_2 = torch.cuda.get_rng_state()
state_2_int = int.from_bytes(state_2[:8].numpy().tobytes(), byteorder='little')

# log
print(f"torch.get_rng_state()             : {state_1}")
print(f"state_1 as integer [only 64 bits] : {state_1_int}")
print(f"torch.cuda.get_rng_state()        : {state_2}")
print(f"state_2 as integer [only 64 bits] : {state_2_int}")

### <a id='toc3_7_1_'></a>[`torch.backends.cudnn.deterministic`](#toc0_)

- This flag ensures that the CUDA Deep Neural Network library (cuDNN) uses deterministic algorithms.
- With this flag enabled, cuDNN operations produce the same results on every run when the same input and seed are provided.
- Default value is `False`.

💥 **Impact**

- When set to `True`, it can slow down your computations because deterministic algorithms are typically slower due to fewer optimizations.

📝 **Docs**:

- [docs.pytorch.org/docs/stable/backends.html#torch.backends.cudnn.deterministic](https://docs.pytorch.org/docs/stable/backends.html#torch.backends.cudnn.deterministic)


In [None]:
torch.backends.cudnn.deterministic = True

### <a id='toc3_7_2_'></a>[`torch.backends.cudnn.benchmark`](#toc0_)

- This flag enables the cuDNN auto-tuner to find the best algorithm for your hardware.
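- It is most useful when the input sizes to your model are fixed; if the sizes change frequently, the auto-tuner re-runs for each new configuration, which can hurt performance.
- Default value is `False`.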

💥 **Impact**

- When set to `True`, cuDNN will select the best algorithm for your hardware, potentially improving performance.
- If you need exact reproducibility, you should not set benchmark to `True`!

📝 **Docs**:

- [docs.pytorch.org/docs/stable/backends.html#torch.backends.cudnn.benchmark](https://docs.pytorch.org/docs/stable/backends.html#torch.backends.cudnn.benchmark)


In [None]:
torch.backends.cudnn.benchmark = False

### <a id='toc3_7_3_'></a>[`torch.use_deterministic_algorithms`](#toc0_)

- This function ensures that all the operations that could be non-deterministic are forced to use deterministic algorithms.
- Default value is `False`.

💥 **Impact**

- When set to `True`, this could lead to slower performance as deterministic algorithms are often slower due to the lack of certain optimizations.
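
The seed and determinism settings from this section are often bundled into one function. The following is a sketch under assumptions: `seed_everything` is a hypothetical helper name, and `warn_only=True` is used so that operations without a deterministic implementation emit a warning instead of raising an error:

```python
import random

import numpy as np
import torch


def seed_everything(seed: int) -> None:
    """Hypothetical helper: seed common RNGs and request deterministic kernels."""
    random.seed(seed)        # Python's built-in RNG
    np.random.seed(seed)     # NumPy's legacy global RNG
    torch.manual_seed(seed)  # PyTorch CPU and GPU generators
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    torch.use_deterministic_algorithms(True, warn_only=True)


seed_everything(42)

# check
print(f"torch.are_deterministic_algorithms_enabled(): {torch.are_deterministic_algorithms_enabled()}")
```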

📝 **Docs**:

- [docs.pytorch.org/docs/stable/generated/torch.use_deterministic_algorithms.html#torch.use_deterministic_algorithms](https://docs.pytorch.org/docs/stable/generated/torch.use_deterministic_algorithms.html#torch.use_deterministic_algorithms)


In [None]:
torch.use_deterministic_algorithms(True)

# check
print(f"torch.are_deterministic_algorithms_enabled(): {torch.are_deterministic_algorithms_enabled()}")

## <a id='toc3_8_'></a>[Random Sampling from a Distribution](#toc0_)

- Random sampling from a distribution refers to the process of generating random samples from a specific probability distribution.
- In most cases, the goal is to sample from these distributions to simulate or model real-world phenomena.
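
Besides the global seed, sampling functions accept a local `torch.Generator`. The sketch below draws from a few common distributions with such a generator; the seed value and variable names are arbitrary choices for illustration:

```python
import torch

# local RNG, independent of the global seed
generator = torch.Generator().manual_seed(123)

uniform = torch.rand(4, generator=generator)                              # uniform on [0, 1)
gaussian = torch.randn(4, generator=generator)                            # standard normal
integers = torch.randint(low=0, high=10, size=(4,), generator=generator)  # integers in [0, 10)
permutation = torch.randperm(5, generator=generator)                      # shuffled 0..4

# log
print(f"uniform     : {uniform}")
print(f"gaussian    : {gaussian}")
print(f"integers    : {integers}")
print(f"permutation : {permutation}")
```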

📝 **Docs**:

- [docs.pytorch.org/docs/stable/torch.html#random-sampling](https://docs.pytorch.org/docs/stable/torch.html#random-sampling)


In [None]:
# a tensor filled with random numbers from a uniform distribution on the interval [0,1)
rand_1 = torch.rand(size=(5,))

# log
print(f"rand_1       : {rand_1}")
print(f"rand_1.dtype : {rand_1.dtype}")

In [None]:
# a tensor of random numbers drawn from a normal distribution with the given mean and standard deviation
normal_1 = torch.normal(mean=0, std=0.1, size=(5,))

# log
print(f"normal_1        : {normal_1}")
print(f"normal_1.mean() : {normal_1.mean()}")
print(f"normal_1.std()  : {normal_1.std()}")
print(f"normal_1.dtype  : {normal_1.dtype}")

## <a id='toc3_9_'></a>[`torch.Tensor.item()`](#toc0_)

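- What you see is not necessarily the actual value: printed tensors are rounded to 4 decimal digits by default, while `item()` returns the stored value as a standard Python number.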
- [docs.pytorch.org/docs/stable/generated/torch.Tensor.item.html](https://docs.pytorch.org/docs/stable/generated/torch.Tensor.item.html)
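
A small sketch of the gap between the printed and the stored value, using `1 / 3` as an assumed example:

```python
import torch

one_third = torch.tensor([1 / 3])  # stored as float32

printed = str(one_third)           # rounded for display
stored = one_third[0].item()       # Python float holding the stored float32 value

# log
print(f"printed : {printed}")
print(f"stored  : {stored}")

# the display precision is configurable
torch.set_printoptions(precision=8)
print(f"precision=8 : {one_third}")
torch.set_printoptions(precision=4)  # restore the default precision
```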


In [None]:
tensor_1d_11 = torch.rand(size=(6,))

# item()
value_1 = tensor_1d_11[0]
value_2 = tensor_1d_11[0].item()

# log
print(f"tensor_1d_11: {tensor_1d_11}")
print("-" * 50)
print(f"value_1       : {value_1}")
print(f"value_1.dtype : {value_1.dtype}\n")
print(f"value_2       : {value_2}")
print(f"type(value_2) : {type(value_2)}")

## <a id='toc3_10_'></a>[Miscellaneous](#toc0_)


### <a id='toc3_10_1_'></a>[`torch.float32` is preferred over `torch.float64` in most deep learning tasks](#toc0_)

<table style="margin:0 auto;">
  <thead>
    <tr>
      <th style="text-align:center">Aspect</th>
      <th style="text-align:center">Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><b>Performance and Speed</b></td>
      <td>Single-precision (<code>torch.float32</code>) operations are faster and require less computational effort compared to double-precision (<code>torch.float64</code>).</td>
    </tr>
    <tr>
      <td><b>Memory Usage</b></td>
      <td><code>torch.float32</code> uses 32 bits (4 bytes) per value, while <code>torch.float64</code> uses 64 bits (8 bytes), leading to lower memory requirements for <code>float32</code>.</td>
    </tr>
    <tr>
      <td><b>Adequate Precision</b></td>
      <td>For most deep learning tasks, <code>torch.float32</code> offers sufficient precision. Double-precision (<code>torch.float64</code>) is often unnecessary.</td>
    </tr>
    <tr>
      <td><b>Energy Efficiency</b></td>
      <td>Single-precision arithmetic is more energy-efficient than double-precision, making it ideal for tasks that demand lower power consumption.</td>
    </tr>
    <tr>
      <td><b>Industry Standards</b></td>
      <td><code>torch.float32</code> is the standard in deep learning frameworks and is widely used across research and production environments.</td>
    </tr>
    <tr>
      <td><b>Hardware Constraints</b></td>
      <td>Many hardware platforms, including embedded systems and mobile devices, have limited computational resources and memory, making <code>torch.float32</code> more suitable.</td>
    </tr>
  </tbody>
</table>

✍️ **Notes**:

- `torch.float32` is also called `float` or `single-precision`.
- `torch.float64` is also called `double` or `double-precision`.
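
The memory claim in the table can be checked with `element_size()`. A minimal sketch with illustrative tensors of 1000 elements each:

```python
import torch

single = torch.ones(1000, dtype=torch.float32)
double = torch.ones(1000, dtype=torch.float64)

# bytes per element multiplied by the number of elements
single_bytes = single.element_size() * single.nelement()
double_bytes = double.element_size() * double.nelement()

# log
print(f"default dtype : {torch.get_default_dtype()}")
print(f"float32 bytes : {single_bytes}")
print(f"float64 bytes : {double_bytes}")
print(f"aliases match : {torch.float == torch.float32 and torch.double == torch.float64}")
```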
