## Introduction

Tensors are the core data structure in PyTorch. In this notebook you will create and work with tensors, and see how they resemble NumPy arrays while offering additional features for deep learning, such as running on GPUs.

## Setup

### Import Libraries

In [None]:
import time
import torch
import numpy as np

## Section 1: Tensors are Fully Compatible with NumPy

PyTorch tensors can easily be converted to and from NumPy arrays, making it simple to use PyTorch alongside existing NumPy code. Many NumPy-style operations work directly on tensors, such as indexing, slicing, and common mathematical functions.

| Code | Description |
| :-- | :-- |
| `x.numpy()` | Convert a tensor to a NumPy array. |
| `torch.tensor(arr)` | Convert a NumPy array to a tensor (copies the data). |
| `torch.from_numpy(arr)` | Convert a NumPy array to a tensor (shares memory with the original array). |
| `x = torch.tensor(2)` <br> `float(x)` | Convert a tensor containing a single value to a Python float. |
| `x = torch.tensor(2.4)` <br> `x.item()` | Get the value of a single-element tensor as a standard Python number (`int` or `float`, depending on the tensor's dtype). |
| `x.sum()` | Sum all elements in the tensor. |
| `x.mean()` | Calculate the mean of all elements. |
| `x[0]` | Index the first element. |
| `x[1:3]` | Slice the elements at indices 1 and 2 (the end index 3 is excluded). |
| `torch.matmul(A,B)` | Matrix-multiply the tensors `A` and `B`. |

**Example**: Create a tensor `x` containing the values 10, 20, 30 and convert it to a NumPy array using `.numpy()`.

In [None]:
x = torch.tensor([10, 20, 30])

In [None]:
x.numpy()

array([10, 20, 30])

**Exercise**: Create a NumPy array and convert it to a PyTorch tensor using `torch.tensor()`.

In [None]:
arr = np.array([1.5, 2.5, 3.5])

In [None]:
torch.tensor(arr)

tensor([1.5000, 2.5000, 3.5000], dtype=torch.float64)
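
The `dtype=torch.float64` in the output above comes from NumPy, whose default floating-point type is 64-bit, whereas PyTorch defaults to 32-bit floats. The following is a minimal sketch of requesting 32-bit floats during conversion; the variable names `t64` and `t32` are only illustrative.

```python
import numpy as np
import torch

arr = np.array([1.5, 2.5, 3.5])  # NumPy defaults to float64

# torch.tensor() keeps the NumPy dtype unless told otherwise
t64 = torch.tensor(arr)

# Passing dtype converts the values while copying
t32 = torch.tensor(arr, dtype=torch.float32)

print(t64.dtype)  # torch.float64
print(t32.dtype)  # torch.float32
```

Whether the 32-bit version is preferable depends on the model the tensor is fed into; most PyTorch layers create their weights as `float32` by default.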

**Exercise**: Create a NumPy array and convert it to a tensor using `torch.from_numpy()`. This method shares memory with the original array, so changes to one will affect the other.

In [None]:
arr = np.array([1.0, 2.0, 3.0])

In [None]:
torch.from_numpy(arr)

tensor([1., 2., 3.], dtype=torch.float64)
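
To make the memory sharing concrete, the sketch below modifies the NumPy array after conversion and compares a tensor made with `torch.from_numpy()` against one made with `torch.tensor()`. The names `shared` and `copied` are chosen here for illustration.

```python
import numpy as np
import torch

arr = np.array([1.0, 2.0, 3.0])

shared = torch.from_numpy(arr)  # uses the same memory as arr
copied = torch.tensor(arr)      # makes an independent copy

# Change the original array in place
arr[0] = 100.0

print(shared)  # first element follows the change to arr
print(copied)  # first element is still 1.0
```

If the array should stay untouched by later tensor operations, `torch.tensor()` is the safer choice; `torch.from_numpy()` avoids the copy when that matters.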

**Exercise**: Use `.item()` to get the value of the tensor `x` below as a standard Python number.

In [None]:
x = torch.tensor(3.14)

In [None]:
x.item()

3.140000104904175
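
The extra digits in the output above appear because the tensor stores 3.14 as a 32-bit float, which `.item()` then converts to a 64-bit Python float. The short sketch below shows how the returned Python type follows the tensor's dtype; the variable names are illustrative.

```python
import torch

int_value = torch.tensor(2).item()      # integer tensor -> Python int
float_value = torch.tensor(2.5).item()  # float tensor -> Python float

print(type(int_value))
print(type(float_value))
```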

**Exercise**: Convert the tensor object of the scalar number 5 into a float object.

In [None]:
x = torch.tensor(5)
x

tensor(5)

In [None]:
float(x)

5.0

**Exercise**: Calculate the sum of all elements in tensor `x`.

In [None]:
x = torch.tensor([1.0, 2.0, 3.0, 4.0])

In [None]:
x.sum()

tensor(10.)

**Exercise**: Calculate the mean of all elements in tensor `y` below.

In [None]:
y = torch.tensor([2.0, 4.0, 6.0, 8.0])

In [None]:
y.mean()

tensor(5.)

**Exercise**: Matrix-multiply tensor `A` (shape 3×2) with tensor `B` (shape 2×3) using `torch.matmul()`.

In [None]:
A = torch.tensor([[1,2], [3,4], [5,6]])
B = torch.tensor([[7,8,9], [10,11,12]])

In [None]:
torch.matmul(A,B)

tensor([[ 27,  30,  33],
        [ 61,  68,  75],
        [ 95, 106, 117]])
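
The table at the start of this section also lists indexing and slicing, which follow the same rules as NumPy arrays and Python lists. A minimal sketch, with tensor values chosen only for illustration:

```python
import torch

x = torch.tensor([10, 20, 30, 40])

first = x[0]     # a zero-dimensional tensor holding 10
middle = x[1:3]  # elements at indices 1 and 2; index 3 is excluded

print(first)   # tensor(10)
print(middle)  # tensor([20, 30])
```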

## Section 2: Tensors Can Run on GPUs

One of PyTorch's key advantages over NumPy is its ability to run computations on GPUs, which can dramatically speed up neural network training. You can check if a GPU is available, move tensors between CPU and GPU, or create tensors directly on the GPU.

| Code | Description |
| :-- | :-- |
| `x.device` | Show which device the tensor is stored on. |
| `torch.cuda.is_available()` | Check whether a GPU is available. |
| `torch.cuda.device_count()` | Get the number of available GPUs. |
| `torch.cuda.get_device_name()` | Get the name of the current GPU (raises an error if none is available). |
| `x = torch.tensor([1, 2, 3], device='cuda')` | Create a tensor directly on the GPU. |
| `x_cpu = x.to('cpu')` | Copy the tensor to the CPU. |
| `x_gpu = x_cpu.to('cuda')` | Copy the tensor to the GPU. |

This section should be done in Google Colab, where a GPU runtime can be enabled.

**Example**: Check if a GPU is available on your machine using `torch.cuda.is_available()`.

In [None]:
import torch
torch.cuda.is_available()

False

**Exercise**: Click on "Runtime" in the toolbar, then "Change runtime type", select "T4 GPU" and click "Save". The session is restarted when you change the runtime type, so run the first cell below to import the libraries again. Then, check if a GPU is available on your machine. What is the output now?


In [None]:
import torch

In [None]:
torch.cuda.is_available()

True

**Exercise**: Check how many GPUs are available using `torch.cuda.device_count()`.

In [None]:
torch.cuda.device_count()

1

**Exercise**: If a GPU is available, get the name of the GPU using `torch.cuda.get_device_name()`. If no GPU is available, this cell will produce an error.

In [None]:
torch.cuda.get_device_name()

'Tesla T4'

**Exercise**: Create a tensor `x` containing the values 1, 2, 3 and check which device it is stored on using `.device`.

In [None]:
x = torch.tensor([1, 2, 3])
x.device

device(type='cpu')

**Exercise**: If a GPU is available, move the tensor `x` to the GPU using `.to('cuda')`. If no GPU is available, this cell will produce an error.

In [None]:
x_gpu = x.to('cuda')
x_gpu.device

device(type='cuda', index=0)

**Exercise**: Create a tensor directly on the GPU by specifying `device='cuda'` in `torch.tensor()`. If no GPU is available, this cell will produce an error.

In [None]:
y = torch.tensor([4, 5, 6], device='cuda')
y.device

device(type='cuda', index=0)

**Exercise**: Move the tensor `y` created above back to the CPU using `.to('cpu')`.

In [None]:
y_cpu = y.to('cpu')
y_cpu.device

device(type='cpu')
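
Several cells above raise an error when no GPU is present. A common way to avoid this is to pick the device once and pass it around, so the same code runs on either kind of machine. The sketch below assumes nothing beyond the calls already used in this section; the variable names are illustrative.

```python
import torch

# Fall back to the CPU when no GPU is available
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Create a tensor directly on the chosen device
x = torch.tensor([1, 2, 3], device=device)

# Move an existing CPU tensor to the chosen device
y = torch.tensor([4, 5, 6]).to(device)

# Both tensors live on the same device, so they can be combined
total = x + y
print(total.device)
```

On a machine without a GPU this prints `cpu`; on a GPU runtime it prints a CUDA device.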

## (Extra, Demo) Section 3: PyTorch Can Massively Speed up Calculations

This section should be done in Google Colab with a GPU runtime enabled, as in Section 2.

**Exercise**: Run the cells below to compare how long a matrix multiplication takes with pure Python, NumPy, PyTorch on the CPU, and PyTorch on the GPU.

In [None]:
import time
import torch
import numpy as np

In [None]:
# Check if GPU is available
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f"Using device: {device}")

Using device: cuda


In [None]:
# Matrix multiplication comparison, including pure Python
size = 500  # kept small because the pure-Python loop is very slow

# Create large random matrices with NumPy
A_np = np.random.randn(size, size).astype(np.float32)
B_np = np.random.randn(size, size).astype(np.float32)

# Time pure Python (nested loops)
A_list = A_np.tolist()
B_list = B_np.tolist()

start = time.perf_counter()
C_python = [[sum(A_list[i][k] * B_list[k][j] for k in range(size))
             for j in range(size)] for i in range(size)]
python_time = time.perf_counter() - start

print(f"Matrix multiplication ({size}x{size}):")
print(f"  Pure Python:   {python_time:.4f} seconds")

# Time NumPy (CPU)
start = time.perf_counter()
C_np = np.matmul(A_np, B_np)
numpy_time = time.perf_counter() - start
print(f"  NumPy (CPU):   {numpy_time:.4f} seconds")

# Time PyTorch CPU
A_cpu = torch.from_numpy(A_np)
B_cpu = torch.from_numpy(B_np)

start = time.perf_counter()
C_cpu = torch.matmul(A_cpu, B_cpu)
pytorch_cpu_time = time.perf_counter() - start
print(f"  PyTorch (CPU): {pytorch_cpu_time:.4f} seconds")

# Time PyTorch GPU (if available)
if device == 'cuda':
    A_gpu = A_cpu.to('cuda')
    B_gpu = B_cpu.to('cuda')

    # Warm-up run (first GPU call has extra overhead)
    _ = torch.matmul(A_gpu, B_gpu)
    torch.cuda.synchronize()

    # Timed run
    start = time.perf_counter()
    C_gpu = torch.matmul(A_gpu, B_gpu)
    torch.cuda.synchronize()  # wait for GPU to finish
    gpu_time = time.perf_counter() - start

    print(f"  PyTorch (GPU): {gpu_time:.4f} seconds")

# Report speedups relative to pure Python
print("\nSpeedups vs Pure Python:")
print(f"  NumPy:       {python_time/numpy_time:.0f}x faster")
print(f"  PyTorch CPU: {python_time/pytorch_cpu_time:.0f}x faster")
if device == 'cuda':
    print(f"  PyTorch GPU: {python_time/gpu_time:.0f}x faster")

Matrix multiplication (500x500):
  Pure Python:   14.4273 seconds
  NumPy (CPU):   0.0041 seconds
  PyTorch (CPU): 0.0092 seconds
  PyTorch (GPU): 0.0002 seconds

Speedups vs Pure Python:
  NumPy:       3494x faster
  PyTorch CPU: 1566x faster
  PyTorch GPU: 80552x faster
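
A speed comparison is only meaningful if the implementations compute the same result. The sketch below checks that NumPy and PyTorch agree on a small matrix product, using a tolerance because 32-bit floating-point results can differ slightly between libraries. The matrix size, seed, and tolerance are assumptions chosen for illustration.

```python
import numpy as np
import torch

# Small random matrices with a fixed seed for reproducibility
rng = np.random.default_rng(0)
A_np = rng.standard_normal((64, 64)).astype(np.float32)
B_np = rng.standard_normal((64, 64)).astype(np.float32)

# Same product computed by both libraries
C_np = np.matmul(A_np, B_np)
C_torch = torch.matmul(torch.from_numpy(A_np), torch.from_numpy(B_np))

# Compare within a tolerance rather than requiring exact equality
results_match = np.allclose(C_np, C_torch.numpy(), atol=1e-3)
print(results_match)
```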


In [None]:
# Large matrix multiplication comparison
size = 4096

# Create large random matrices with NumPy
A_np = np.random.randn(size, size).astype(np.float32)
B_np = np.random.randn(size, size).astype(np.float32)

# Time NumPy (CPU)
start = time.perf_counter()
C_np = np.matmul(A_np, B_np)
numpy_time = time.perf_counter() - start

print(f"Matrix multiplication ({size}x{size}):")
print(f"  NumPy (CPU):   {numpy_time:.4f} seconds")

# Time PyTorch CPU
A_cpu = torch.from_numpy(A_np)
B_cpu = torch.from_numpy(B_np)

start = time.perf_counter()
C_cpu = torch.matmul(A_cpu, B_cpu)
pytorch_cpu_time = time.perf_counter() - start

print(f"  PyTorch (CPU): {pytorch_cpu_time:.4f} seconds")

# Time PyTorch GPU (if available)
if device == 'cuda':
    A_gpu = A_cpu.to('cuda')
    B_gpu = B_cpu.to('cuda')

    # Warm-up run (first GPU call has extra overhead)
    _ = torch.matmul(A_gpu, B_gpu)
    torch.cuda.synchronize()

    # Timed run
    start = time.perf_counter()
    C_gpu = torch.matmul(A_gpu, B_gpu)
    torch.cuda.synchronize()  # wait for GPU to finish
    gpu_time = time.perf_counter() - start

    print(f"  PyTorch (GPU): {gpu_time:.4f} seconds")
    print(f"\nGPU is {numpy_time/gpu_time:.1f}x faster than NumPy")

Matrix multiplication (4096x4096):
  NumPy (CPU):   1.1602 seconds
  PyTorch (CPU): 1.2312 seconds
  PyTorch (GPU): 0.0467 seconds

GPU is 24.9x faster than NumPy
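
Each measurement above times a single run, so the numbers can vary noticeably between executions. One possible refinement is to average over several runs. The helper `time_matmul` below is a hypothetical function written for this sketch, not part of PyTorch; it reuses the warm-up and `torch.cuda.synchronize()` pattern from the cells above.

```python
import time
import torch

def time_matmul(a, b, repeats=5):
    """Return the average seconds per matmul over `repeats` runs (hypothetical helper)."""
    torch.matmul(a, b)  # warm-up run, excluded from the timing
    if a.is_cuda:
        torch.cuda.synchronize()

    start = time.perf_counter()
    for _ in range(repeats):
        torch.matmul(a, b)
    if a.is_cuda:
        torch.cuda.synchronize()  # wait for the GPU to finish all runs
    return (time.perf_counter() - start) / repeats

device = 'cuda' if torch.cuda.is_available() else 'cpu'
a = torch.randn(256, 256, device=device)
b = torch.randn(256, 256, device=device)

avg_seconds = time_matmul(a, b)
print(f"Average matmul time on {device}: {avg_seconds:.6f} seconds")
```

The matrix size of 256 keeps the sketch quick on a CPU; larger sizes such as the 4096 used above show the GPU advantage more clearly.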
