
# PyTorch Methods: INTERVIEW REVISION SHEET

## Purpose

Ultra-compact revision sheet for PyTorch APIs commonly asked in interviews.
Focus: what it does, when to use it, and common traps.

Use this before interviews or for rapid revision.

---

## 1. TENSOR CREATION & INSPECTION

**Creation**

* `torch.tensor(data)`
  * Creates a tensor with new memory
  * Safe copy from Python / NumPy

* `torch.zeros(shape)`, `torch.ones(shape)`, `torch.empty(shape)`
  * zeros / ones: initialized
  * empty: uninitialized (fast, unsafe for reading)

* `torch.rand`, `torch.randn`
  * rand: uniform [0, 1)
  * randn: standard normal (mean 0, std 1)

* `torch.arange(start, end, step)`

* `torch.linspace(start, end, steps)`

**Inspection**

* `tensor.shape`
* `tensor.dtype`
* `tensor.device`
* `tensor.numel()`

**Common trap:**
* Assuming `torch.tensor` shares memory with NumPy (it copies; `torch.from_numpy` is the call that shares)
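
A minimal sketch of the copy-vs-share trap above. The variable names (`arr`, `copied`, `shared`) are illustrative only; the point is that mutating the NumPy array afterwards shows which tensor is tied to its buffer.

```python
import numpy as np
import torch

arr = np.array([1, 2, 3])

copied = torch.tensor(arr)      # allocates new memory and copies the data
shared = torch.from_numpy(arr)  # wraps the same buffer as the NumPy array

arr[0] = 100                    # mutate the NumPy array afterwards

print(copied[0].item())  # 1   (unaffected)
print(shared[0].item())  # 100 (sees the change)
```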

---

## 2. SHAPE & STRUCTURE METHODS

* `reshape(new_shape)`
  * Safer default
  * May return copy or view

* `view(new_shape)`
  * Shares memory
  * Requires compatible strides (in practice, a contiguous tensor); raises an error otherwise

* `flatten(start_dim=0)`

* `squeeze(dim?)` / `unsqueeze(dim)`

* `permute(dims)`

* `transpose(dim0, dim1)`

* `contiguous()`

**Common traps:**
* Using `view` on non-contiguous tensors
* Losing batch dimension with `squeeze()`
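
The two traps above can be reproduced in a few lines. This is a sketch with made-up tensors: a transposed tensor stands in for "non-contiguous", and a `(1, 3, 1)` tensor stands in for a batch of size 1.

```python
import torch

x = torch.arange(6).reshape(2, 3)
xt = x.t()  # transposed view: same memory, non-contiguous strides

# view() cannot express this layout without a copy, so it raises.
try:
    xt.view(6)
    view_failed = False
except RuntimeError:
    view_failed = True

flat = xt.reshape(6)             # reshape falls back to a copy here
flat2 = xt.contiguous().view(6)  # explicit copy, then view

# squeeze() with no argument drops every size-1 dim, including the batch dim.
batch = torch.zeros(1, 3, 1)
all_squeezed = batch.squeeze()   # shape (3,)
last_only = batch.squeeze(-1)    # shape (1, 3), batch dim preserved

print(view_failed, flat.tolist(), all_squeezed.shape, last_only.shape)
```

Passing an explicit `dim` to `squeeze` is the usual way to keep the batch dimension intact.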

---

## 3. TENSOR MATH & REDUCTIONS

**Elementwise**
* `+`, `-`, `*`, `/`
* `abs`, `pow`, `sqrt`

**Reductions**
* `sum(dim?)`
* `mean(dim?)`
* `max` / `min` (return values + indices when `dim` is given; a single value otherwise)
* `argmax` / `argmin`

**Key rule:**
* Reductions reduce dimensions unless `keepdim=True`
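
A short sketch of why `keepdim=True` matters, assuming the common case of normalizing each row by its sum. Without it the reduced result has shape `(2,)` and no longer broadcasts against a `(2, 3)` tensor.

```python
import torch

x = torch.ones(2, 3)

row_sums = x.sum(dim=1)                      # shape (2,): dim 1 is removed
row_sums_kept = x.sum(dim=1, keepdim=True)   # shape (2, 1): dim 1 kept as size 1

# (2, 3) / (2, 1) broadcasts row-wise; (2, 3) / (2,) would raise a shape error.
normalized = x / row_sums_kept

print(row_sums.shape, row_sums_kept.shape, normalized.sum(dim=1))
```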

---

## 4. INDEXING & SELECTION

* Basic indexing: `x[i]`, `x[i, j]`
* Slicing: `x[:, :2]`
* Boolean masking: `x[x > 0]`
* `torch.where(condition, a, b)`
* `torch.gather(input, dim, index)`
* `torch.nonzero(x)`

**Common trap:**
* Indexing often reduces dimensions
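
To make the dimension-reduction trap concrete, here is a small sketch comparing integer indexing with length-1 slices on an illustrative `(3, 4)` tensor.

```python
import torch

x = torch.arange(12).reshape(3, 4)

row = x[0]            # integer index removes the dim -> shape (4,)
row_kept = x[0:1]     # length-1 slice keeps the dim  -> shape (1, 4)

col = x[:, 1]         # shape (3,)
col_kept = x[:, 1:2]  # shape (3, 1)

masked = x[x > 5]     # boolean masking always returns a 1-D tensor

print(row.shape, row_kept.shape, col.shape, col_kept.shape, masked.shape)
```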

---

## 5. MEMORY & COPY SEMANTICS

* Assignment: `b = a` (shared memory)
* `clone()`: new memory
* `detach()`: breaks gradient tracking, shares memory
* `requires_grad_(True/False)`
* `copy_(src)`: in-place copy
* `is_contiguous()`
* In-place ops: `add_()`, `mul_()`, etc.

**Common traps:**
* Silent bugs from shared memory
* Breaking gradients with in-place ops
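
Both traps can be shown in one sketch. The first half compares assignment, `detach()`, and `clone()` after an in-place update; the second half uses `exp()` as an example of an op that saves its output for the backward pass, so modifying that output in place makes `backward()` fail.

```python
import torch

a = torch.zeros(3)
b = a            # same tensor object
c = a.detach()   # different tensor object, same underlying memory
d = a.clone()    # independent copy

a.add_(1)        # in-place update is visible through b and c, not d
print(b.tolist(), c.tolist(), d.tolist())

# In-place op on a tensor that autograd still needs.
x = torch.ones(3, requires_grad=True)
y = x.exp()      # exp() keeps its result to compute the gradient later
y.add_(1)        # overwrites that saved result
try:
    y.sum().backward()
    backward_failed = False
except RuntimeError:
    backward_failed = True

print(backward_failed)
```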

---

## 6. DEVICE & DTYPE UTILITIES

* `tensor.to(device=?, dtype=?)`
* `tensor.cpu()`
* `tensor.cuda()` (requires a CUDA-capable GPU)

**Dtype casts**
* `float()`, `long()`, `int()`, `type()`

**Common traps:**
* Mixing CPU and GPU tensors
* Integer tensors in division (`/` returns a float tensor; use `//` for floor division)
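
A sketch of the usual way around both traps: pick one `device` up front and create every tensor on it, and be explicit about which kind of division is wanted. The tensor values are arbitrary.

```python
import torch

# Choose the device once; fall back to CPU when no GPU is present.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.tensor([7, 8], device=device)
b = torch.tensor([2, 3], device=device)

true_div = a / b                                    # float result
floor_div = a // b                                  # integer result, rounds down
trunc_div = torch.div(a, b, rounding_mode="trunc")  # integer result, rounds toward zero

print(true_div.dtype, floor_div.dtype, floor_div.tolist())
```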

---

## 7. RANDOMNESS & INITIALIZATION

* `torch.manual_seed(seed)`
* `rand`, `randn`, `randint`
* `normal(mean, std, size)`
* `uniform_()` (in-place)

**Common traps:**
* Forgetting to set seed
* Confusing `rand` vs `randn`
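
A brief sketch of both points: re-seeding reproduces the same draw, and `rand` stays in [0, 1) while `randn` is centered on zero and takes negative values. The sample size of 10,000 is an arbitrary choice.

```python
import torch

torch.manual_seed(0)
first = torch.rand(3)

torch.manual_seed(0)          # same seed -> same sequence
second = torch.rand(3)

u = torch.rand(10_000)        # uniform on [0, 1)
n = torch.randn(10_000)       # standard normal

print(torch.equal(first, second))
print(u.min().item() >= 0.0, u.max().item() < 1.0, (n < 0).any().item())
```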

---

## 8. nn.Module & PARAMETERS

* `nn.Module`
  * Base class for models
  * Define `forward()`, not `backward()`

* `nn.Parameter`
  * Marks tensor as trainable

* `model.parameters()`
* `model.named_parameters()`
* `register_buffer(name, tensor)`
* `state_dict()`

**Key rule:**
* Parameters are updated by the optimizer; buffers are not, but both are saved in `state_dict()` by default
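
The parameter/buffer split can be checked directly. `Scaler` below is a hypothetical module written for this sketch; it holds one trainable `weight` and one non-trainable `running_mean` buffer.

```python
import torch
import torch.nn as nn


class Scaler(nn.Module):
    """Hypothetical module: one parameter, one buffer."""

    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(3))              # trainable
        self.register_buffer("running_mean", torch.zeros(3))   # state only

    def forward(self, x):
        return (x - self.running_mean) * self.weight


m = Scaler()
param_names = [name for name, _ in m.named_parameters()]
buffer_names = [name for name, _ in m.named_buffers()]
state_keys = list(m.state_dict().keys())

print(param_names, buffer_names, state_keys)
print(m.weight.requires_grad, m.running_mean.requires_grad)
```

Only `weight` is handed to an optimizer via `m.parameters()`, yet both tensors move with `m.to(device)` and appear in the saved state.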

---

## 9. LOSS FUNCTIONS

* `nn.MSELoss`
  * Regression

* `nn.CrossEntropyLoss`
  * Multi-class classification
  * Input: raw logits of shape (N, C); Target: class indices of shape (N)
  * Applies log-softmax internally, so do not apply softmax yourself

* `nn.BCELoss`
  * Binary classification (probabilities)

* `nn.BCEWithLogitsLoss`
  * Binary classification (logits)
  * Sigmoid included

* `reduction`: mean / sum / none

**Common traps:**
* Applying softmax/sigmoid twice
* Wrong target shape
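
The "applied internally" claims can be verified numerically. This sketch uses random logits and hand-picked targets (both illustrative) and compares each loss with its manual equivalent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Multi-class: logits (N, C), integer class targets (N,)
logits = torch.randn(4, 3)
target = torch.tensor([0, 2, 1, 2])

ce = nn.CrossEntropyLoss()(logits, target)
manual_ce = F.nll_loss(F.log_softmax(logits, dim=1), target)

# The trap: feeding probabilities applies softmax a second time.
double_softmax = nn.CrossEntropyLoss()(F.softmax(logits, dim=1), target)

# Binary: BCEWithLogitsLoss on logits matches BCELoss on sigmoid(logits).
bin_logits = torch.randn(4)
bin_target = torch.tensor([1.0, 0.0, 1.0, 0.0])

bce_logits = nn.BCEWithLogitsLoss()(bin_logits, bin_target)
bce_probs = nn.BCELoss()(torch.sigmoid(bin_logits), bin_target)

print(ce.item(), manual_ce.item(), double_softmax.item())
print(bce_logits.item(), bce_probs.item())
```

The double-softmax call runs without error, which is why this bug tends to go unnoticed.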

---

## 10. OPTIMIZERS

* `torch.optim.SGD`
* `torch.optim.Adam`

**Core calls**
* `optimizer.zero_grad()`
* `loss.backward()`
* `optimizer.step()`

**Key rule:**
* Gradients accumulate unless zeroed

**Common traps:**
* Forgetting `zero_grad()`
* Assuming Adam needs no LR tuning
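
The core calls and the accumulation rule fit in one sketch. The model, data, and learning rate below are arbitrary toy choices: two `backward()` calls without `zero_grad()` double the stored gradient, and the loop then shows the usual call order.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(2, 1)
x = torch.randn(8, 2)
y = torch.randn(8, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Gradients accumulate: a second backward() adds to .grad.
loss_fn(model(x), y).backward()
g1 = model.weight.grad.clone()
loss_fn(model(x), y).backward()
g2 = model.weight.grad.clone()   # equals 2 * g1

# Usual order inside a training step.
losses = []
for _ in range(5):
    optimizer.zero_grad()        # clear gradients from the previous step
    loss = loss_fn(model(x), y)  # forward pass
    loss.backward()              # compute gradients
    optimizer.step()             # update parameters
    losses.append(loss.item())

print(torch.allclose(g2, 2 * g1), losses[0], losses[-1])
```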

---

## 11. DATASET & DATALOADER

* `Dataset`
  * `__len__()`
  * `__getitem__(idx)`

* `DataLoader`
  * `batch_size`
  * `shuffle`

**Key ideas**
* Dataset: one sample
* DataLoader: batches of samples

**Common trap:**
* Putting batching logic inside Dataset
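
A sketch of the division of labor described above. `SquaresDataset` is a hypothetical dataset invented for this example: it returns one `(x, x**2)` pair per index, and `DataLoader` does all of the batching.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class SquaresDataset(Dataset):
    """Hypothetical dataset: sample i is the pair (i, i**2)."""

    def __init__(self, n):
        self.x = torch.arange(n, dtype=torch.float32)

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        # Return a single sample; no batching logic here.
        return self.x[idx], self.x[idx] ** 2


dataset = SquaresDataset(10)
loader = DataLoader(dataset, batch_size=4, shuffle=False)

# 10 samples with batch_size=4 -> batches of 4, 4, and 2.
batch_shapes = [tuple(xb.shape) for xb, _ in loader]
print(len(loader), batch_shapes)
```

Passing `drop_last=True` to `DataLoader` would discard the final, smaller batch.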

---

## ONE-LINE GLOBAL SUMMARY

PyTorch is predictable if you track three things: shape, memory, and gradients.

In [1]:
import torch
import numpy as np

In [2]:
# torch.tensor(data)
x = torch.tensor([1, 2, 3])
print(x)

tensor([1, 2, 3])


In [3]:
# torch.zeros(shape)
x = torch.zeros((2, 3))
print(x)

tensor([[0., 0., 0.],
        [0., 0., 0.]])


In [4]:
# torch.ones(shape)
x = torch.ones((2, 2))
print(x)

tensor([[1., 1.],
        [1., 1.]])


In [5]:
# torch.empty(shape)
x = torch.empty((2, 2))
print(x)

tensor([[5.1777e-28, 0.0000e+00],
        [5.1774e-28, 0.0000e+00]])


In [6]:
# torch.rand
x = torch.rand((2, 2))
print(x)

tensor([[0.8812, 0.0924],
        [0.5765, 0.5940]])


In [7]:
# torch.randn
x = torch.randn((2, 2))
print(x)

tensor([[0.2833, 0.7113],
        [0.0904, 0.0659]])


In [8]:
# torch.arange(start, end, step)
x = torch.arange(0, 10, 2)
print(x)

tensor([0, 2, 4, 6, 8])


In [9]:
# torch.linspace(start, end, steps)
x = torch.linspace(0, 1, 5)
print(x)

tensor([0.0000, 0.2500, 0.5000, 0.7500, 1.0000])


In [10]:
# tensor.shape
x = torch.randn(3, 4)
print(x.shape)

torch.Size([3, 4])


In [11]:
# tensor.dtype
x = torch.tensor([1.0, 2.0])
print(x.dtype)

torch.float32


In [12]:
# tensor.device
x = torch.ones(1)
print(x.device)

cpu


In [13]:
# tensor.numel()
x = torch.zeros(2, 3)
print(x.numel())

6


In [14]:
# reshape(new_shape)
x = torch.arange(6)
y = x.reshape(2, 3)
print(y)

tensor([[0, 1, 2],
        [3, 4, 5]])


In [15]:
# view(new_shape)
x = torch.arange(6)
y = x.view(3, 2)
print(y)

tensor([[0, 1],
        [2, 3],
        [4, 5]])


In [16]:
# flatten(start_dim=0)
x = torch.randn(2, 3, 4)
y = x.flatten(start_dim=1)
print(y.shape)

torch.Size([2, 12])


In [17]:
# squeeze(dim?)
x = torch.zeros(1, 3, 1)
y = x.squeeze()
print(y.shape)

torch.Size([3])


In [18]:
# unsqueeze(dim)
x = torch.zeros(3, 3)
y = x.unsqueeze(0)
print(y.shape)

torch.Size([1, 3, 3])


In [19]:
# permute(dims)
x = torch.randn(2, 3, 4)
y = x.permute(2, 0, 1)
print(y.shape)

torch.Size([4, 2, 3])


In [20]:
# transpose(dim0, dim1)
x = torch.randn(2, 3)
y = x.transpose(0, 1)
print(y.shape)

torch.Size([3, 2])


In [21]:
# contiguous()
x = torch.randn(3, 2).t()
print(x.is_contiguous())
y = x.contiguous()
print(y.is_contiguous())

False
True


In [22]:
# Elementwise (+, -, *, /)
a = torch.tensor([1, 2])
b = torch.tensor([3, 4])
print(a + b, a * b)

tensor([4, 6]) tensor([3, 8])


In [23]:
# abs, pow, sqrt
x = torch.tensor([-1.0, 4.0])
print(torch.abs(x), torch.pow(x, 2), torch.sqrt(torch.abs(x)))

tensor([1., 4.]) tensor([ 1., 16.]) tensor([1., 2.])


In [24]:
# sum(dim?)
x = torch.ones(2, 3)
print(x.sum(), x.sum(dim=0))

tensor(6.) tensor([2., 2., 2.])


In [25]:
# mean(dim?)
x = torch.tensor([1.0, 2.0, 3.0])
print(x.mean())

tensor(2.)


In [26]:
# max / min
x = torch.randn(3)
val, idx = x.max(dim=0)
print(val, idx)

tensor(1.8942) tensor(1)


In [27]:
# argmax / argmin
x = torch.tensor([1, 5, 2])
print(x.argmax())

tensor(1)


In [28]:
# Basic indexing & Slicing
x = torch.randn(3, 3)
print(x[0, 1], x[:, :2])

tensor(0.4983) tensor([[-0.2434,  0.4983],
        [-0.3668,  0.4063],
        [ 0.7006, -0.2079]])


In [29]:
# Boolean masking
x = torch.tensor([1, -1, 2])
print(x[x > 0])

tensor([1, 2])


In [30]:
# torch.where(condition, a, b)
x = torch.tensor([1, -1, 2])
y = torch.where(x > 0, x, torch.zeros_like(x))
print(y)

tensor([1, 0, 2])


In [31]:
# torch.gather(input, dim, index)
x = torch.tensor([[1, 2], [3, 4]])
indices = torch.tensor([[0, 0], [1, 0]])
print(torch.gather(x, 1, indices))

tensor([[1, 1],
        [4, 3]])


In [32]:
# torch.nonzero(x)
x = torch.tensor([1, 0, 2])
print(torch.nonzero(x))

tensor([[0],
        [2]])


In [33]:
# clone()
x = torch.ones(2)
y = x.clone()
print(y)

tensor([1., 1.])


In [34]:
# detach()
x = torch.ones(2, requires_grad=True)
y = x.detach()
print(y.requires_grad)

False


In [35]:
# requires_grad_(True/False)
x = torch.ones(2)
x.requires_grad_(True)
print(x.requires_grad)

True


In [36]:
# copy_(src)
x = torch.zeros(2)
y = torch.ones(2)
x.copy_(y)
print(x)

tensor([1., 1.])


In [37]:
# is_contiguous()
x = torch.randn(3, 2)
print(x.is_contiguous())

True


In [38]:
# In-place ops (add_)
x = torch.ones(2)
x.add_(5)
print(x)

tensor([6., 6.])


In [39]:
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

In [40]:
# tensor.to(device=?, dtype=?)
x = torch.ones(1)
y = x.to(device='cpu', dtype=torch.float64)
print(y.dtype, y.device)

torch.float64 cpu


In [41]:
# tensor.cpu()
x = torch.ones(1)
y = x.cpu()
print(y.device)

cpu


In [42]:
# tensor.cuda() (conceptual)
if torch.cuda.is_available():
    x = torch.ones(1).cuda()
    print(x.device)

In [43]:
# float()
x = torch.tensor([1, 2])
print(x.float().dtype)

torch.float32


In [44]:
# long()
x = torch.tensor([1.0, 2.0])
print(x.long().dtype)

torch.int64


In [45]:
# int()
x = torch.tensor([1.0, 2.0])
print(x.int().dtype)

torch.int32


In [46]:
# type() (legacy style; x.float() or x.to(torch.float32) is the usual choice today)
x = torch.tensor([1, 2])
print(x.type(torch.FloatTensor))

tensor([1., 2.])


In [47]:
# torch.manual_seed(seed)
torch.manual_seed(42)
print(torch.rand(1))

tensor([0.8823])


In [48]:
# rand
print(torch.rand(2, 2))

tensor([[0.9150, 0.3829],
        [0.9593, 0.3904]])


In [49]:
# randn
print(torch.randn(2, 2))

tensor([[ 0.3258, -0.8676],
        [ 1.5231,  0.6647]])


In [50]:
# randint
print(torch.randint(0, 10, (2, 2)))

tensor([[4, 1],
        [2, 5]])


In [51]:
# normal(mean, std, size)
print(torch.normal(mean=0.0, std=1.0, size=(2, 2)))

tensor([[-0.1671, -0.1079],
        [-1.4285, -0.2810]])


In [52]:
# uniform_() (in-place)
x = torch.empty(2, 2).uniform_(0, 1)
print(x)

tensor([[0.2695, 0.3588],
        [0.1994, 0.5472]])


In [53]:
# nn.Module
class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 1)
    def forward(self, x):
        return self.linear(x)
model = SimpleModel()
print(model)

SimpleModel(
  (linear): Linear(in_features=10, out_features=1, bias=True)
)


In [54]:
# nn.Parameter
class ParamModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.p = nn.Parameter(torch.randn(1))
model = ParamModel()
print(list(model.parameters()))

[Parameter containing:
tensor([1.3525], requires_grad=True)]


In [55]:
# model.parameters()
model = nn.Linear(2, 1)
for p in model.parameters(): print(p)

Parameter containing:
tensor([[ 0.4087, -0.3091]], requires_grad=True)
Parameter containing:
tensor([0.4082], requires_grad=True)


In [56]:
# model.named_parameters()
model = nn.Linear(2, 1)
for name, p in model.named_parameters(): print(name, p.shape)

weight torch.Size([1, 2])
bias torch.Size([1])


In [57]:
# register_buffer(name, tensor)
class BuffModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer('my_buffer', torch.ones(3))
model = BuffModel()
print(model.my_buffer)

tensor([1., 1., 1.])


In [58]:
# state_dict()
model = nn.Linear(2, 1)
print(model.state_dict().keys())

odict_keys(['weight', 'bias'])


In [59]:
# nn.MSELoss
criterion = nn.MSELoss()
loss = criterion(torch.tensor([1.0]), torch.tensor([1.5]))
print(loss)

tensor(0.2500)


In [60]:
# nn.CrossEntropyLoss
criterion = nn.CrossEntropyLoss()
logits = torch.randn(1, 5)
target = torch.tensor([3])
print(criterion(logits, target))

tensor(1.7099)


In [61]:
# nn.BCELoss
criterion = nn.BCELoss()
probs = torch.tensor([0.1, 0.9])
target = torch.tensor([0.0, 1.0])
print(criterion(probs, target))

tensor(0.1054)


In [62]:
# nn.BCEWithLogitsLoss
criterion = nn.BCEWithLogitsLoss()
logits = torch.tensor([1.0, -1.0])
target = torch.tensor([1.0, 0.0])
print(criterion(logits, target))

tensor(0.3133)


In [63]:
# torch.optim.SGD & torch.optim.Adam
model = nn.Linear(10, 1)
optimizer_sgd = torch.optim.SGD(model.parameters(), lr=0.01)
optimizer_adam = torch.optim.Adam(model.parameters(), lr=0.001)

In [64]:
# optimizer.zero_grad()
optimizer_sgd.zero_grad()

In [65]:
# loss.backward()
# Toy case: a scalar leaf tensor, so backward() fills its own .grad with d(loss)/d(loss) = 1.
loss = torch.tensor(1.0, requires_grad=True)
loss.backward()
print(loss.grad)

tensor(1.)


In [66]:
# optimizer.step()
# Effectively a no-op here: no model parameter has received a gradient yet.
optimizer_sgd.step()

In [67]:
# Dataset
class MyDataset(Dataset):
    def __len__(self):
        return 10
    def __getitem__(self, idx):
        return torch.randn(3), torch.tensor(1)
dataset = MyDataset()
print(len(dataset), dataset[0])

10 (tensor([ 0.1498, -0.2089, -0.3870]), tensor(1))


In [68]:
# DataLoader
loader = DataLoader(dataset, batch_size=2, shuffle=True)
for x, y in loader:
    print(x.shape, y)
    break

torch.Size([2, 3]) tensor([1, 1])
