## 1. What is PyTorch? The Core Idea 💡

At its heart, PyTorch is a Python library for tensor computation and deep learning, popular with researchers and developers for its simplicity and power. It provides two main features that make it well suited to deep learning:

* **Tensors:** These are multi-dimensional arrays, similar to NumPy arrays. The superpower of PyTorch tensors is that they can be easily moved to a **GPU** for massive speedups in computation.
* **Automatic Differentiation:** PyTorch can automatically calculate gradients (derivatives). This is the magic that allows neural networks to "learn" from data, and it's managed by a system called `autograd`.

It's known for being "Pythonic," meaning its design feels natural and intuitive to anyone familiar with Python. Let's dive into the most fundamental building block: the Tensor.

In [10]:
import torch
import numpy as np

# Print the PyTorch version we are using
print(f"PyTorch Version: {torch.__version__}")

PyTorch Version: 2.10.0


## 2. The Building Blocks: PyTorch Tensors 🧱

Everything in PyTorch revolves around the **Tensor**. A tensor is a number, vector, matrix, or any n-dimensional array. It's the primary data structure we'll be working with.

### Creating Tensors

You can create tensors in several ways.

#### From a Python list:

The most basic way is to create a tensor directly from a Python list.

In [11]:
# Create a simple 2-dimensional tensor (a 2x2 matrix) from a nested list
data = [[1, 2], [3, 4]]
my_tensor = torch.tensor(data)

my_tensor

tensor([[1, 2],
        [3, 4]])
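
As a small sketch of the "number, vector, matrix" idea, the snippet below builds tensors of different dimensionality from plain Python values and prints what PyTorch infers for each. The variable names are illustrative, and the inferred dtypes assume PyTorch's default settings (the default floating-point dtype has not been changed).

```python
import torch

scalar = torch.tensor(7)                  # 0-dimensional tensor (a single number)
vector = torch.tensor([1.0, 2.0, 3.0])    # 1-dimensional tensor
matrix = torch.tensor([[1, 2], [3, 4]])   # 2-dimensional tensor

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix)]:
    print(f"{name}: ndim={t.ndim}, shape={tuple(t.shape)}, dtype={t.dtype}")

# Python ints are inferred as int64 and Python floats as float32.
# Pass dtype explicitly when the inferred type is not the one you want.
float_matrix = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)
print(float_matrix.dtype)
```

Neural network layers generally expect floating-point inputs, so setting `dtype` at creation time can save a conversion later.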

#### From a NumPy array:

PyTorch integrates seamlessly with NumPy. You can create a tensor from a NumPy array and vice-versa. This is incredibly useful since many data processing libraries (like Scikit-learn, Pandas) are built on NumPy.

In [12]:
# Create a NumPy array
numpy_array = np.array([[5., 6.], [7., 8.]])
print(f"NumPy array:\n {numpy_array}\n")

NumPy array:
 [[5. 6.]
 [7. 8.]]



In [13]:
# Convert NumPy array to a PyTorch tensor
numpy_to_tensor = torch.from_numpy(numpy_array)
print(f"Tensor from NumPy:\n {numpy_to_tensor}")

Tensor from NumPy:
 tensor([[5., 6.],
        [7., 8.]], dtype=torch.float64)
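
The conversion also works in the other direction, and it is worth knowing when data is shared rather than copied. The following is a minimal sketch, assuming CPU tensors; the variable names are illustrative. `torch.from_numpy` and `Tensor.numpy()` share the underlying memory, while `torch.tensor` makes an independent copy.

```python
import numpy as np
import torch

numpy_array = np.array([[5., 6.], [7., 8.]])
shared_tensor = torch.from_numpy(numpy_array)   # shares memory with numpy_array

# An in-place change to the array is visible through the tensor
numpy_array[0, 0] = 100.0
print(shared_tensor[0, 0].item())

# Tensor -> NumPy: .numpy() on a CPU tensor shares memory as well
back_to_numpy = shared_tensor.numpy()

# torch.tensor() copies the data, so later changes to the array do not affect it
independent_copy = torch.tensor(numpy_array)
numpy_array[0, 0] = -1.0
print(shared_tensor[0, 0].item(), independent_copy[0, 0].item())
```

If you need a tensor that cannot be changed by later edits to the array, prefer the copying form.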


#### Using built-in functions:

PyTorch also provides functions to create tensors with specific shapes and values, which is very common when initializing a neural network's weights.

In [4]:
# Create a tensor of shape (3, 4) with all ones
ones_tensor = torch.ones(3, 4)
print(f"Tensor of ones:\n {ones_tensor}\n")

# Create a tensor of shape (3, 4) with all zeros
zeros_tensor = torch.zeros(3, 4)
print(f"Tensor of zeros:\n {zeros_tensor}\n")

# Create a tensor of shape (3, 4) with random numbers from a standard normal distribution
random_tensor = torch.randn(3, 4)
print(f"Random tensor:\n {random_tensor}")

Tensor of ones:
 tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])

Tensor of zeros:
 tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])

Random tensor:
 tensor([[-1.9324e+00, -2.2962e+00,  2.3067e-01, -5.9618e-01],
        [-1.8841e+00, -1.3308e+00,  1.0977e+00, -6.1980e-01],
        [ 3.5177e-01, -1.3615e+00,  4.1155e-05,  1.0963e-01]])
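
Because `torch.randn` draws new values on every call, the numbers above will differ from run to run. A minimal sketch of making them reproducible, along with a few related creation functions, is shown below; the seed value and variable names are arbitrary choices for illustration.

```python
import torch

# Seeding the generator makes random tensors repeatable
torch.manual_seed(0)
first = torch.randn(3, 4)
torch.manual_seed(0)
second = torch.randn(3, 4)
print(torch.equal(first, second))  # True: same seed, same values

# Other common creation helpers
like_zeros = torch.zeros_like(first)   # zeros with the same shape and dtype as `first`
uniform = torch.rand(3, 4)             # uniform samples from [0, 1)
filled = torch.full((3, 4), 0.5)       # every element set to 0.5
```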


### Tensor Attributes

A tensor has attributes that describe its `shape`, `dtype` (data type), and the `device` (CPU or GPU) where it's stored.

In [5]:
# Let's inspect our random tensor
print(f"Shape of tensor: {random_tensor.shape}")
print(f"Datatype of tensor: {random_tensor.dtype}")
print(f"Device tensor is stored on: {random_tensor.device}")

Shape of tensor: torch.Size([3, 4])
Datatype of tensor: torch.float32
Device tensor is stored on: cpu
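
A few related attributes and conversions are often useful alongside these three. The sketch below uses an illustrative tensor `x`; `.to(dtype)` returns a converted tensor and leaves the original unchanged.

```python
import torch

x = torch.randn(3, 4)

print(x.ndim)      # number of dimensions
print(x.numel())   # total number of elements
print(x.shape[0])  # size of the first dimension

# Convert to another dtype; x itself keeps its original dtype
x64 = x.to(torch.float64)
print(x.dtype, x64.dtype)
```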


### Moving Tensors to the GPU

One of the key advantages of PyTorch is its ability to perform computations on a GPU for significant speed-ups. You can check if a GPU is available and move your tensor to it using the `.to()` method.

In [6]:
# 1. Check for Apple Silicon GPU (MPS)
if torch.backends.mps.is_available():
    device = "mps"
    print("Apple Silicon GPU (MPS) is available! We'll use the GPU.")
# 2. Check for NVIDIA GPU (CUDA)
elif torch.cuda.is_available():
    device = "cuda"
    print("NVIDIA GPU (CUDA) is available! We'll use the GPU.")
# 3. Fallback to CPU
else:
    device = "cpu"
    print("No GPU available, we'll use the CPU.")

# --- Using the Device ---

# Move our tensor to the selected device
# (.to() returns a tensor on that device; random_tensor itself is not modified)
tensor_on_device = random_tensor.to(device)

print(f"\nOur random tensor is now on: {tensor_on_device.device}")

Apple Silicon GPU (MPS) is available! We'll use the GPU.

Our random tensor is now on: mps:0
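
Two practical points follow from this: operands of an operation need to live on the same device, and a tensor must be back on the CPU before it can be converted to NumPy. The sketch below is one way to write device-agnostic code under those constraints; it falls back to the CPU when no GPU is found, and the variable names are illustrative.

```python
import torch

# Pick the best available device, falling back to the CPU
if torch.backends.mps.is_available():
    device = torch.device("mps")
elif torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

# Create tensors directly on the target device to avoid an extra copy
a = torch.ones(2, 2, device=device)
b = torch.full((2, 2), 3.0, device=device)
total = a + b  # both operands are on the same device

# Bring the result back to the CPU before handing it to NumPy
total_cpu = total.cpu()
print(total_cpu.numpy())
```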


### Tensor Operations

Operations on tensors work much like you'd expect. We can perform standard arithmetic in an intuitive way.

In [14]:
# Let's create two tensors with the SAME dtype
t1 = torch.tensor([[1, 2], [3, 4]], dtype=torch.int32)
t2 = torch.ones(2, 2, dtype=torch.int32) * 10 # Creates a 2x2 tensor of 10s

print(f"t1 : {t1}")
print(f"t2 : {t2}")

t1 : tensor([[1, 2],
        [3, 4]], dtype=torch.int32)
t2 : tensor([[10, 10],
        [10, 10]], dtype=torch.int32)


In [15]:
# check their dtypes
print(f"t1 dtype: {t1.dtype}")
print(f"t2 dtype: {t2.dtype}\n")

t1 dtype: torch.int32
t2 dtype: torch.int32



In [16]:
# Addition
print("Addition:\n", t1 + t2)

Addition:
 tensor([[11, 12],
        [13, 14]], dtype=torch.int32)


In [17]:
# Element-wise multiplication
print("\nMultiplication:\n", t1 * t2)


Multiplication:
 tensor([[10, 20],
        [30, 40]], dtype=torch.int32)


In [18]:
# Matrix multiplication
print("\nMatrix Multiplication:\n", t1.matmul(t2))


Matrix Multiplication:
 tensor([[30, 30],
        [70, 70]], dtype=torch.int32)
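
A few variations on these operations are worth seeing side by side. The sketch below, which recreates `t1` and `t2` so it runs on its own, shows the `@` operator as shorthand for `matmul`, broadcasting a row vector across a matrix, and what happens when dtypes differ. The names `row`, `halves`, and `mixed` are illustrative.

```python
import torch

t1 = torch.tensor([[1, 2], [3, 4]], dtype=torch.int32)
t2 = torch.full((2, 2), 10, dtype=torch.int32)

# The @ operator is shorthand for matrix multiplication
print(torch.equal(t1 @ t2, t1.matmul(t2)))

# Broadcasting: the 1-D row is added to every row of t1
row = torch.tensor([100, 200], dtype=torch.int32)
print(t1 + row)

# Mixing integer and floating-point tensors promotes the result to float
halves = torch.full((2, 2), 0.5)
mixed = t1 + halves
print(mixed.dtype)
```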


## What are Directed Acyclic Graphs?

* **Graph:** It's a structure made of **nodes** (the tasks or data) and **edges** (the dependencies or operations connecting them).
* **Directed:** The edges have a direction, represented by arrows. This means the relationship is one-way. If you need to mix flour and eggs (Inputs) to make batter (Output), the arrow points from the ingredients *to* the batter, not the other way around. The flow is fixed.
* **Acyclic:** This is the crucial part. It means there are **no cycles or loops**. You can never start at a node, follow the directed arrows, and end up back at the same node. In our recipe analogy, you can't un-bake a cake to get the batter back. The process only moves forward.

In short, a DAG is a flowchart with no loops, where every step flows in one direction from start to finish.

In PyTorch, this DAG is called the computational graph. PyTorch builds it automatically in the background as your code runs, recording every operation performed on tensors that have `requires_grad=True`.
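
To make the three properties concrete outside of PyTorch, here is a small sketch of the recipe analogy using Python's standard-library `graphlib` module (Python 3.9+). The dictionary layout and the "cake" node are illustrative assumptions; this is not how PyTorch stores its graph internally.

```python
from graphlib import TopologicalSorter, CycleError

# Each key is a node; its value is the set of nodes it depends on
recipe = {
    "batter": {"flour", "eggs"},
    "cake": {"batter"},
}

# A valid order always lists dependencies before the things built from them
order = list(TopologicalSorter(recipe).static_order())
print(order)

# Adding an edge from "cake" back to "flour" creates a loop, so it is no longer a DAG
recipe_with_loop = dict(recipe, flour={"cake"})
try:
    list(TopologicalSorter(recipe_with_loop).static_order())
    has_cycle = False
except CycleError:
    has_cycle = True
print(has_cycle)
```

The absence of cycles is what guarantees that such an ordering exists, which is the same property that lets gradients be propagated through a computational graph one step at a time.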

In [8]:
# requires_grad=True tells PyTorch to track operations for this tensor
a = torch.tensor([2.0], requires_grad=True)
b = torch.tensor([3.0], requires_grad=True)

c = a * b
d = c + 5.0

The code above builds the following computational graph:

    (Tensor a) -----> (*) -----> (Tensor c) -----> (+) -----> (Tensor d)
                       ^                            ^
                       |                            |
    (Tensor b) --------+         (Scalar 5.0) ------+
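
The graph can also be inspected and used from code. The sketch below repeats the `a`, `b`, `c`, `d` example so it runs on its own, looks at the `grad_fn` attribute that links each result to the operation that produced it, and then calls `backward()` to walk the graph in reverse and compute gradients.

```python
import torch

a = torch.tensor([2.0], requires_grad=True)
b = torch.tensor([3.0], requires_grad=True)

c = a * b
d = c + 5.0

# Tensors created by the user are leaves and have no grad_fn;
# results of operations remember which operation produced them
print(a.is_leaf, a.grad_fn)
print(c.grad_fn)
print(d.grad_fn)

# Traverse the graph backward from d to compute gradients
d.backward()

# d = a * b + 5, so the derivative with respect to a is b, and with respect to b is a
print(a.grad)  # tensor([3.])
print(b.grad)  # tensor([2.])
```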