
**What is a Tensor?**
* A tensor is a multidimensional array, like NumPy arrays but optimized for GPU acceleration.

* Everything in PyTorch (embeddings, model weights, attention matrices) is a tensor.

* Tensors can track gradients for backpropagation when created with `requires_grad=True`.

| Property        | Description                                |
| --------------- | ------------------------------------------ |
| `shape`         | Dimensions of the tensor, e.g., `(3,4)`    |
| `dtype`         | Data type (`float32`, `int64`)             |
| `device`        | Location of the tensor (`cpu` or `cuda:0`) |
| `requires_grad` | Tracks operations for autograd             |
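Here is a minimal sketch that prints each property in the table and then lets autograd fill in a gradient. The tensor `w` and the toy loss are illustrative choices, not part of the original notebook.

```python
import torch

# A small float tensor that autograd will track
w = torch.tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)

print(w.shape)          # torch.Size([2, 2])
print(w.dtype)          # torch.float32
print(w.device)         # cpu (tensors are created on the CPU by default)
print(w.requires_grad)  # True

# Toy loss: sum of squares. Its gradient with respect to w is 2 * w.
loss = (w ** 2).sum()
loss.backward()
print(w.grad)           # 2 * w, i.e. [[2., 4.], [6., 8.]]
```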


**Tensor creation**

In [None]:
import torch
a = torch.tensor([1, 2, 3], dtype=torch.float32)

In [None]:
zeros = torch.zeros(2, 3)
zeros

tensor([[0., 0., 0.],
        [0., 0., 0.]])

In [None]:
ones = torch.ones(2, 3)
ones

tensor([[1., 1., 1.],
        [1., 1., 1.]])
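`torch.zeros` and `torch.ones` are only two of the creation functions. For comparison, here are a few other common ones. The variable names are illustrative.

```python
import torch

r = torch.arange(6)                # values 0..5, dtype torch.int64
m = torch.arange(6).reshape(2, 3)  # the same values as a 2x3 matrix
f = torch.full((2, 2), 7.0)        # 2x2 filled with 7.0, dtype torch.float32
i = torch.eye(3)                   # 3x3 identity matrix
z = torch.zeros_like(f)            # same shape and dtype as f, filled with 0

print(r)
print(m)
print(f.dtype, z.shape)
```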

In [None]:
device = "cuda" if torch.cuda.is_available() else "cpu"
print("Using device:", device)

Using device: cpu


In [None]:
tensor_on_device = torch.zeros(3,3).to(device)
print("Tensor device:", tensor_on_device.device)

Tensor device: cpu
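This sketch shows two ways to place a tensor on `device`, using the same CPU/GPU check as above (the variable names are illustrative). Passing `device=` at creation time avoids an extra CPU-to-GPU copy. Any operation that mixes tensors needs all of them on the same device.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

x = torch.zeros(3, 3, device=device)  # allocated directly on the target device
y = torch.ones(3, 3).to(device)       # created on the CPU, then moved with .to(device)

z = x + y                             # both operands live on the same device
print("z device:", z.device)

z_cpu = z.cpu()                       # copy back to the CPU, e.g. before converting to NumPy
print("z_cpu device:", z_cpu.device)
```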


**Challenge**

In [None]:
# Step 1: Create a 3D tensor of shape (2, 3, 4) with random numbers
tensor3d = torch.randn(2, 3, 4)

# Step 2: Move it to the GPU if one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
tensor3d = tensor3d.to(device)

# Step 3: Multiply every element by 2
tensor3d = tensor3d * 2

# Step 4: Slice the first 2 "matrices" along the first axis (dim 0).
# dim 0 has size 2, so this slice keeps the whole tensor.
sliced = tensor3d[:2, :, :]

# Step 5: Print the full tensor, then the slice's shape, dtype, device and values
print("Tensor 3d ", tensor3d)
print("Sliced tensor shape:", sliced.shape)
print("Dtype:", sliced.dtype)
print("Device:", sliced.device)
print("Tensor:", sliced)


Tensor 3d  tensor([[[ 1.7096, -1.1733,  0.5819,  2.1277],
         [-1.6149, -0.9564, -2.2833, -2.0570],
         [-1.1769, -0.7180, -1.0052, -0.4275]],

        [[-4.1561,  1.9215,  4.1073,  1.0599],
         [ 2.8609, -0.9351,  2.5445, -2.6114],
         [-1.4519,  1.9645,  2.4712,  0.0575]]])
Sliced tensor shape: torch.Size([2, 3, 4])
Dtype: torch.float32
Device: cpu
Tensor: tensor([[[ 1.7096, -1.1733,  0.5819,  2.1277],
         [-1.6149, -0.9564, -2.2833, -2.0570],
         [-1.1769, -0.7180, -1.0052, -0.4275]],

        [[-4.1561,  1.9215,  4.1073,  1.0599],
         [ 2.8609, -0.9351,  2.5445, -2.6114],
         [-1.4519,  1.9645,  2.4712,  0.0575]]])
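`torch.randn` draws new numbers on every run, so your printed values will differ from the output above. Fixing the seed with `torch.manual_seed` makes CPU results reproducible. The seed value 0 here is arbitrary.

```python
import torch

torch.manual_seed(0)
first = torch.randn(2, 3, 4)

torch.manual_seed(0)
second = torch.randn(2, 3, 4)

print(torch.equal(first, second))  # True: same seed, same numbers
```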


**tensor3d**

`tensor3d` holds two 3×4 matrices along dim 0:

matrix 0 → tensor3d[0]

matrix 1 → tensor3d[1] ← this is the 2nd matrix


In [None]:
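# Indexing with a list ([0]) keeps dim 0: the result has shape (1, 3, 4),
# whereas tensor3d[0] would drop dim 0 and return shape (3, 4)
dropped = tensor3d[[0]]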
print(dropped.shape)
print(dropped)

torch.Size([1, 3, 4])
tensor([[[ 1.7096, -1.1733,  0.5819,  2.1277],
         [-1.6149, -0.9564, -2.2833, -2.0570],
         [-1.1769, -0.7180, -1.0052, -0.4275]]])


In [None]:
# Slicing with :1 also keeps dim 0 and selects only the first matrix
dropped = tensor3d[:1]
print(dropped.shape)
print(dropped)

torch.Size([1, 3, 4])
tensor([[[ 1.7096, -1.1733,  0.5819,  2.1277],
         [-1.6149, -0.9564, -2.2833, -2.0570],
         [-1.1769, -0.7180, -1.0052, -0.4275]]])


In [None]:
# Slicing from index 1 drops the first matrix and keeps the second
result = tensor3d[1:]
print(result)
print(result.shape)


tensor([[[-4.1561,  1.9215,  4.1073,  1.0599],
         [ 2.8609, -0.9351,  2.5445, -2.6114],
         [-1.4519,  1.9645,  2.4712,  0.0575]]])
torch.Size([1, 3, 4])
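The three indexing styles above look similar but behave differently. This sketch uses a deterministic `torch.arange` tensor instead of the random `tensor3d` to compare an integer index, a list index, and a slice. It relies on PyTorch's behavior that basic slicing returns a view of the original data, while indexing with a list (advanced indexing) returns a copy.

```python
import torch

t = torch.arange(24, dtype=torch.float32).reshape(2, 3, 4)

int_idx = t[0]      # integer index removes dim 0 -> shape (3, 4)
list_idx = t[[0]]   # list (advanced) indexing keeps dim 0 -> shape (1, 3, 4), copies data
slice_idx = t[:1]   # basic slicing keeps dim 0 -> shape (1, 3, 4), a view into t

print(int_idx.shape, list_idx.shape, slice_idx.shape)

# Writing through the slice changes t, because the slice is a view...
slice_idx[0, 0, 0] = 100.0
print(t[0, 0, 0].item())   # 100.0

# ...but writing into the list-indexed result does not, because it is a copy
list_idx[0, 0, 1] = -1.0
print(t[0, 0, 1].item())   # 1.0
```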


**Transformer and tensor connection**

**Step 1️⃣ Sentences**

S1: "I like AI"

S2: "AI likes math"

S3: "Math is fun"

S4: "I like math"

**Batch size = 4**


**Step 2️⃣ Tokenization (numbers only)**

I → 1

like → 2

AI → 3

likes → 4

math → 5

is → 6

fun → 7

**Tokenized (every sentence already has 3 tokens, so no padding is needed; seq_len = 3):**

[

 [1, 2, 3],   # I like AI

 [3, 4, 5],   # AI likes math

 [5, 6, 7],   # Math is fun

 [1, 2, 5]    # I like math

]

In [None]:
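tokens = torch.tensor([
    [1, 2, 3],   # I like AI
    [3, 4, 5],   # AI likes math
    [5, 6, 7],   # Math is fun
    [1, 2, 5],   # I like math
])

# shape: (batch=4, seq_len=3); Python ints give dtype torch.int64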
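The token IDs above were assigned by hand. The following is a minimal word-level tokenizer sketch that reproduces them. `encode`, `encode_batch` and `PAD_ID` are hypothetical helpers, and reserving ID 0 for padding is an assumption (the table above simply leaves 0 unused). Real LLM tokenizers split text into subwords rather than whole words.

```python
import torch

# Vocabulary from the table above, lowercased so "Math" and "math" share ID 5.
vocab = {"i": 1, "like": 2, "ai": 3, "likes": 4, "math": 5, "is": 6, "fun": 7}
PAD_ID = 0  # ASSUMPTION: ID 0 is reserved for padding

def encode(sentence):
    """Map each lowercased, whitespace-separated word to its ID."""
    return [vocab[word] for word in sentence.lower().split()]

def encode_batch(sentences):
    """Encode every sentence and right-pad with PAD_ID to the longest length."""
    ids = [encode(s) for s in sentences]
    max_len = max(len(seq) for seq in ids)
    padded = [seq + [PAD_ID] * (max_len - len(seq)) for seq in ids]
    return torch.tensor(padded)  # Python ints -> torch.int64

sentences = ["I like AI", "AI likes math", "Math is fun", "I like math"]
tokens = encode_batch(sentences)
print(tokens)
print(tokens.shape, tokens.dtype)  # torch.Size([4, 3]) torch.int64
```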

**Step 3️⃣ Embedding**

The embedding table is a matrix of shape (vocab_size × hidden_dim), with one row per token ID.

Assume hidden_dim = 6.

In [None]:
embedding_table = torch.randn(8, 6)  # vocab_size=8 (IDs 0..7), hidden_dim=6
X = embedding_table[tokens]          # one row per token ID: (4, 3) -> (4, 3, 6)
print(X.shape)
X


torch.Size([4, 3, 6])


tensor([[[-0.1929,  0.3516,  0.3627,  0.4145, -0.6670, -1.1954],
         [-0.8289, -0.9557,  0.4611,  0.0383, -1.2085, -1.7519],
         [-0.6806, -0.0751, -1.3931,  0.9885, -0.4611, -0.2347]],

        [[-0.6806, -0.0751, -1.3931,  0.9885, -0.4611, -0.2347],
         [ 0.2808, -0.7160,  1.3994, -0.4271, -0.0214,  1.0952],
         [ 0.2251, -1.8447,  0.8342,  0.2588, -0.0382,  0.5012]],

        [[ 0.2251, -1.8447,  0.8342,  0.2588, -0.0382,  0.5012],
         [-1.6523,  0.6432, -0.0726,  0.0510, -0.5297, -1.1286],
         [-1.6155,  0.3753, -1.5254, -0.6479,  1.3709,  0.2496]],

        [[-0.1929,  0.3516,  0.3627,  0.4145, -0.6670, -1.1954],
         [-0.8289, -0.9557,  0.4611,  0.0383, -1.2085, -1.7519],
         [ 0.2251, -1.8447,  0.8342,  0.2588, -0.0382,  0.5012]]])

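Meaning (shape (4, 3, 6)):

* 4 sentences → batch dimension = 4

* 3 tokens per sentence → seq_len = 3

* each token becomes a 6-dim vector → hidden_dim = 6

The same token ID always maps to the same row of the table. For example, "AI" (ID 3) is the last vector of sentence 1 and the first vector of sentence 2. Sentences 1 and 4 ("I like AI", "I like math") share their first two vectors.

📌 This tensor is the input to the transformer.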
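In a real model, the embedding table is a learnable layer rather than a fixed `torch.randn` matrix. This sketch uses PyTorch's `nn.Embedding` with the token IDs from Step 2. It shows that the layer's forward pass is the same row lookup as `embedding_table[tokens]`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
tokens = torch.tensor([[1, 2, 3], [3, 4, 5], [5, 6, 7], [1, 2, 5]])

# nn.Embedding stores a learnable (num_embeddings x embedding_dim) weight matrix
embedding = nn.Embedding(num_embeddings=8, embedding_dim=6)
X = embedding(tokens)

print(X.shape)                                   # torch.Size([4, 3, 6])
print(torch.equal(X, embedding.weight[tokens]))  # True: the forward pass is a row lookup
print(embedding.weight.requires_grad)            # True: the rows are trained by backprop
```

If ID 0 were reserved for padding, passing `padding_idx=0` to `nn.Embedding` would initialize row 0 to zeros and keep it out of gradient updates.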

**Some interview questions**

**What is a tensor in PyTorch, and how is it different from a NumPy array?**

Answer:

A tensor is a multidimensional array that can:

* Run on CPU or GPU

* Track gradients for automatic differentiation

* Integrate directly with deep learning models

NumPy arrays:

* Are CPU-only

* Do not support autograd

* Are not designed for large-scale neural networks

👉 In LLMs, every input, weight, and output is a tensor, not a NumPy array.
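Tensors and NumPy arrays are easy to convert between on the CPU. Here is a short sketch of the conversion functions. `torch.from_numpy` and `Tensor.numpy()` share memory with their source, while `torch.tensor` always copies.

```python
import numpy as np
import torch

arr = np.array([1.0, 2.0, 3.0])      # NumPy defaults to float64

shared = torch.from_numpy(arr)       # shares memory with arr, dtype torch.float64
copied = torch.tensor(arr)           # independent copy of the data

arr[0] = 10.0                        # modify the NumPy array in place
print(shared[0].item())              # 10.0: the shared tensor sees the change
print(copied[0].item())              # 1.0: the copy does not

back = shared.numpy()                # tensor -> NumPy, again without copying
print(type(back), back.dtype)
```

For a GPU tensor, call `.cpu()` before `.numpy()`. For a tensor that requires grad, call `.detach()` first.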

**What does the shape (batch_size, seq_len, hidden_dim) represent?**

Answer:

* batch_size: number of sequences processed in parallel

* seq_len: number of tokens per sequence

* hidden_dim: size of vector representing each token

Example:

(4, 10, 768)


Means:

* 4 sentences

* each with 10 tokens

* each token represented as a 768-dim vector

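👉 This batch-first layout is the most common input shape for transformers. Note that PyTorch's built-in `nn.Transformer` layers expect `(seq_len, batch_size, hidden_dim)` by default, unless you pass `batch_first=True`.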
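This sketch passes a batch-first `(4, 10, 768)` input through a single PyTorch `nn.TransformerEncoderLayer`. The choice of `nhead=12` is an assumption (it divides 768 evenly). The output keeps the `(batch_size, seq_len, hidden_dim)` shape, which is what allows transformer blocks to be stacked.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(4, 10, 768)  # (batch_size, seq_len, hidden_dim)

# batch_first=True tells the layer that dim 0 is the batch dimension
layer = nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True)
layer.eval()  # disable dropout for a deterministic forward pass

with torch.no_grad():
    out = layer(x)

print(out.shape)  # torch.Size([4, 10, 768]): same shape in and out
```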

**What is dtype and why does it matter?**

Answer:

dtype defines the data type of a tensor (e.g. float32, float16, int64).

It matters because:

* It affects memory usage

* It affects speed

* It affects numerical stability

Example:

* float32 → standard training

* float16 / bfloat16 → faster, less memory (used in LLMs)

* int64 → token IDs

👉 The wrong dtype can mean slower models, wasted memory, or runtime errors (for example, embedding lookups need integer token IDs, not floats).
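This sketch measures the memory effect for the `(4, 10, 768)` example and shows one kind of dtype runtime error: indexing an embedding table with float IDs. `size_in_bytes` is a small hypothetical helper (`numel() * element_size()`).

```python
import torch

def size_in_bytes(t):
    """Hypothetical helper: number of elements times bytes per element."""
    return t.numel() * t.element_size()

x32 = torch.zeros(4, 10, 768, dtype=torch.float32)
x16 = x32.to(torch.bfloat16)

print(size_in_bytes(x32))  # 122880 (4 bytes per element)
print(size_in_bytes(x16))  # 61440 (2 bytes per element)

embedding_table = torch.randn(8, 6)
ids = torch.tensor([1, 2, 3])            # integer IDs default to torch.int64
print(embedding_table[ids].shape)        # torch.Size([3, 6])

lookup_failed = False
try:
    embedding_table[ids.float()]         # float IDs are not valid indices
except (IndexError, RuntimeError) as err:
    lookup_failed = True
    print("Lookup failed:", err)
```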

**Why are embeddings 3D tensors?**

How the tensor is built:

You can think of the transition from a single word to a 3D tensor like this:

* 1D (vector): You have one word, like "Apple." It is represented by a list of 512 numbers, so its shape is (512,).

* 2D (matrix): You have a full sentence, like "Apple is a fruit." Now you have a grid (a matrix) where each row is the vector for one of those words, so its shape is (4, 512).

* 3D (tensor): To train faster, you process 100 sentences at once by stacking 100 of those "sentence matrices" on top of each other. This stack is your 3D tensor, with shape (100, seq_len, 512). Stacking requires every sentence to have the same seq_len, which is why shorter sentences are padded.
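Here is a sketch of the stacking step. It assumes three hypothetical sentences of 4, 2, and 3 tokens with random 512-dim vectors. `torch.stack` needs equal shapes, so `torch.nn.utils.rnn.pad_sequence` first pads the shorter sentences with zero vectors.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

hidden_dim = 512
torch.manual_seed(0)

# Hypothetical sentence matrices of shape (num_tokens, hidden_dim)
sentence_matrices = [
    torch.randn(4, hidden_dim),  # e.g. "Apple is a fruit"
    torch.randn(2, hidden_dim),
    torch.randn(3, hidden_dim),
]

# The lengths differ, so pad with zero vectors up to the longest sentence
batch = pad_sequence(sentence_matrices, batch_first=True, padding_value=0.0)
print(batch.shape)  # torch.Size([3, 4, 512]) -> (batch_size, seq_len, hidden_dim)

# Equal-length matrices can be stacked directly along a new batch dimension
same_len = torch.stack([torch.randn(4, hidden_dim) for _ in range(100)])
print(same_len.shape)  # torch.Size([100, 4, 512])
```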

Why do we use 3D?

* Parallel processing: GPUs are designed to do math on entire blocks of data simultaneously. Storing embeddings in a 3D tensor lets the GPU compute the relationships between the words of every sentence in the batch at the same time.

* Contextual awareness: In models like BERT or GPT, the model needs to see the entire sequence (the seq_len dimension, which is dim 1 in PyTorch's 0-based indexing) to understand that "bank" in a river context is different from "bank" in a money context.
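To make the parallel-processing point concrete, here is a simplified single-head self-attention sketch. It skips the learned query/key/value projections of a real transformer and uses random embeddings with the Step 3 shape `(4, 3, 6)`. One batched matrix multiply produces a 3×3 token-to-token score matrix for every sentence.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
batch_size, seq_len, hidden_dim = 4, 3, 6
X = torch.randn(batch_size, seq_len, hidden_dim)  # stands in for the embeddings

# Simplified: no learned Q/K/V projections, a single head.
# (4, 3, 6) @ (4, 6, 3) -> (4, 3, 3): every token compared with every token in its sentence
scores = X @ X.transpose(1, 2) / hidden_dim ** 0.5
weights = F.softmax(scores, dim=-1)  # each row of weights sums to 1
context = weights @ X                # (4, 3, 6): context-aware token vectors

print(scores.shape, context.shape)
```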