## Tensor Introduction

#### Analogy
Imagine a library:

- **Scalar**: A single book.
- **Vector**: A shelf with a row of books.
- **Matrix**: A bookshelf with rows (shelves) and columns (positions on each shelf).
- **Tensor**: A whole library building with multiple floors, each containing several bookshelves. Each book’s location can be identified by specifying the floor, shelf, and position on the shelf. This layered structure is similar to how tensors work in multiple dimensions.

#### Notes
Tensors are kind of like rigid, strictly defined multidimensional hashes, at least as far as indexing goes. This is a very naive comparison, but a very useful one for my understanding. For example, a Perl hash lookup like `$HASH->{floor}->{shelf}->{position}` corresponds to indexing a tensor with `t[floor, shelf, position]`.

In mathematics and physics, tensors are multidimensional arrays that obey specific transformation rules, so they capture intrinsic geometric or physical relationships regardless of the coordinate system they are viewed in. In PyTorch, a tensor is simply a multidimensional array of numbers; those transformation rules are not enforced.
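A small sketch of the library analogy in PyTorch. The sizes (2 floors, 3 shelves, 4 positions) and the variable names are made up purely for illustration: the number of indices needed to pick out one element is `ndim`, and the nested hash lookup becomes a single index tuple.

```python
import torch

book = torch.tensor(7.0)                  # scalar: a single book
shelf = torch.tensor([1.0, 2.0, 3.0])     # vector: one row of books
bookshelf = torch.zeros(3, 4)             # matrix: 3 shelves x 4 positions
library = torch.zeros(2, 3, 4)            # 3D tensor: 2 floors x 3 shelves x 4 positions

print(book.ndim, shelf.ndim, bookshelf.ndim, library.ndim)  # 0 1 2 3

# Like $HASH->{floor}->{shelf}->{position}, except every floor has the same number
# of shelves and every shelf the same number of positions (the "rigid" part)
floor, shelf_idx, position = 1, 2, 3
library[floor, shelf_idx, position] = 42.0
print(library[floor][shelf_idx][position])  # chained indexing reaches the same element: tensor(42.)
```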

## Important tensor functions

#### Descriptive analysis
Given a tensor defined as `t = torch.tensor([1, 2, 3])`

- `t.ndim` - number of dimensions
- `t.shape` - shape of the tensor
- `t.dtype` - data type of tensor
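Running those three attributes on the example tensor, as a quick sketch. Integer literals give the default integer dtype, `torch.int64`, and `shape` is a `torch.Size`, which behaves like a tuple.

```python
import torch

t = torch.tensor([1, 2, 3])
print(t.ndim)   # 1
print(t.shape)  # torch.Size([3])
print(t.dtype)  # torch.int64 (integer literals default to 64-bit ints)

# shape is a torch.Size, which can be unpacked like a tuple of ints
m = torch.zeros(2, 5)
rows, cols = m.shape
print(rows, cols, m.ndim == len(m.shape))  # 2 5 True
```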

#### Tensor construction
- `t = torch.tensor([1, 2, 3], dtype=torch.float32)` - set the data type to 32-bit float (**you almost always want to do this**)
- `t = torch.zeros(3)` - create a tensor of 3 zeros
- `t = torch.ones(3)` - create a tensor of 3 ones
- `t = torch.rand(3,3,3)` - create a 3x3x3 tensor of random values drawn uniformly from [0, 1)
- `t = torch.arange(10)` - create a tensor of 10 elements going from 0 to 9 (the end value is excluded)
- Create a tensor from an image:
  ```
  from PIL import Image
  import numpy as np
  import torch

  img = Image.open('cat.jpg')
  t = torch.as_tensor(np.array(img))  # as_tensor accepts numpy arrays directly
  # for an RGB image, t has shape (height, width, 3) and dtype torch.uint8
  ```
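A sketch of why the float `dtype` advice matters: a tensor built from integer literals gets an integer dtype, and some operations, such as `mean()`, refuse to run on integer tensors. It also checks the defaults of the other constructors listed above (assuming the global default dtype has not been changed).

```python
import torch

ints = torch.tensor([1, 2, 3])
floats = torch.tensor([1, 2, 3], dtype=torch.float32)

print(floats.mean())  # tensor(2.)

try:
    ints.mean()       # mean() needs a floating point (or complex) dtype
    int_mean_failed = False
except RuntimeError as err:
    int_mean_failed = True
    print("mean() on an int64 tensor failed:", err)

print(torch.zeros(3).dtype)   # torch.float32: zeros/ones/rand default to float32
print(torch.arange(10))       # 0 through 9; the end value 10 is excluded
print(torch.rand(3, 3, 3).shape)
```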

#### Basic tensor operations
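Given `a = torch.ones(10)` and `b = torch.arange(10)`:

- `a + b` - add the tensors element-wise (the same works for subtraction `-`, multiplication `*`, division `/`, and powers `**`)

**Tensors must have the same shape for these operations to work, unless their shapes are compatible under broadcasting (covered below)**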
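A quick sketch of the element-wise operations with the same `a` and `b`. Mixing a float tensor (`ones`) with an integer one (`arange`) works because PyTorch promotes the result to float. The last part shows the shape rule failing: shapes (10,) and (3,) can't be combined.

```python
import torch

a = torch.ones(10)       # float32
b = torch.arange(10)     # int64

print(a + b)        # element-wise sum: 1, 2, ..., 10 (promoted to float32)
print(b * b)        # element-wise product, not a matrix product
print(b ** 2)       # element-wise power: same values as b * b
print(a / (b + 1))  # element-wise division (b + 1 avoids dividing by zero)

c = torch.arange(3)
try:
    b + c           # shapes (10,) and (3,) are neither equal nor broadcastable
    mismatch_failed = False
except RuntimeError:
    mismatch_failed = True
print("(10,) + (3,) raised an error:", mismatch_failed)
```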

#### Linear algebra
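- `t.mT` - transpose the last two dimensions (a matrix transpose, applied to every matrix in a batch; needs at least 2 dimensions)
- `t.permute(1, 2, 0)` - reorder the dimensions into any order (like transpose, but for any number of dimensions)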
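A sketch contrasting the two on a 3D tensor (the shape (2, 3, 4) is an arbitrary choice): `mT` only swaps the last two dimensions, treating the first as a batch of matrices, while `permute` can put every dimension anywhere.

```python
import torch

t = torch.arange(24).view(2, 3, 4)

print(t.mT.shape)                # torch.Size([2, 4, 3]): last two dims swapped
print(t.permute(1, 2, 0).shape)  # torch.Size([3, 4, 2]): dims reordered as (1, 2, 0)

# mT is the same as swapping the last two dims explicitly...
print(torch.equal(t.mT, t.transpose(-2, -1)))  # True
# ...which is also one particular permutation
print(torch.equal(t.mT, t.permute(0, 2, 1)))   # True
```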

#### Other useful functions
- `torch.as_tensor(np.array(var))` - convert an array into a tensor (shares memory with the numpy array when possible)
- `t.view(2,3)` - view a tensor as a different shape without copying (here, for a size 6 tensor)
- `t.reshape(2,3)` - like `view`, but copies the underlying data when a view isn't possible (e.g. for a non-contiguous tensor)
- `%timeit` - IPython/Jupyter magic that times the statement that follows
- `t.to('cuda')` - move a tensor to a GPU so operations on it run there
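A sketch of where `view` and `reshape` actually differ. Transposing produces a non-contiguous tensor (its elements are no longer laid out row by row in memory); `view` refuses to flatten it, while `reshape` falls back to copying. Comparing `data_ptr()` is just one way to check whether two tensors start at the same memory.

```python
import torch

t = torch.arange(6).view(2, 3)

# On a contiguous tensor, reshape can return a view: same underlying memory
r = t.reshape(3, 2)
print(r.data_ptr() == t.data_ptr())     # True, no copy was made

tt = t.mT                               # transposed view, shape (3, 2)
print(tt.is_contiguous())               # False

try:
    tt.view(6)                          # view cannot flatten this non-contiguous layout
    view_failed = False
except RuntimeError:
    view_failed = True
print("view failed:", view_failed)

flat = tt.reshape(6)                    # reshape copies the data instead
print(flat)                             # tensor([0, 3, 1, 4, 2, 5])
print(flat.data_ptr() == t.data_ptr())  # False, this is a new copy
```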

```python
import torch

t = torch.tensor([1, 2, 3])
print(t.ndim)
```

```
1
```


## Manipulating Tensors

### Reshaping

```python
print("We will start by creating a tensor with six elements.\n")

a = torch.arange(6)
print(f"We can see that the shape of the tensor is just 6 using a.shape: \n\t{a.shape}\n")
print(f"But we can also view this tensor as 2x3 using a.view(2,3) or a.reshape(2,3): \n\t{a.view(2,3)}")

print("\nThe difference between the two is that view never copies the underlying data; "
      "it only changes the shape. Reshape returns a view when it can and copies the data only when it must.")

print(f"\n\nNeither call modifies a in place. The size of a is still 6 after using view(): {a.size()}")
a.reshape(2,3)  # the reshaped result is discarded; a itself is unchanged
print(f"And after using reshape: {a.size()}")
```

```
We will start by creating a tensor with six elements.

We can see that the shape of the tensor is just 6 using a.shape: 
	torch.Size([6])

But we can also view this tensor as 2x3 using a.view(2,3) or a.reshape(2,3): 
	tensor([[0, 1, 2],
        [3, 4, 5]])

The difference between the two is that view never copies the underlying data; it only changes the shape. Reshape returns a view when it can and copies the data only when it must.


Neither call modifies a in place. The size of a is still 6 after using view(): torch.Size([6])
And after using reshape: torch.Size([6])
```
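Because `view` shares memory with the original tensor, writing through the view changes the original too. A minimal sketch (the value 100 is arbitrary); `clone()` is used to show how to get an independent copy instead.

```python
import torch

a = torch.arange(6)
v = a.view(2, 3)          # no copy: v and a share storage

v[0, 0] = 100             # write through the view...
print(a)                  # ...and a changes too: tensor([100,   1,   2,   3,   4,   5])

c = a.clone().view(2, 3)  # clone() makes an independent copy first
c[0, 0] = -1
print(a[0])               # still tensor(100): the clone does not share memory
```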


### Transposing

```python
print("Next we will look at some basic linear algebra, starting with transposing.\n")

a = torch.rand(2,3)
print(f"Starting with a random tensor of size 2, 3: {a.size()}")
print(f"{a}")
print(f"\nThe transpose:\n{a.mT}")
```

```
Next we will look at some basic linear algebra, starting with transposing.

Starting with a random tensor of size 2, 3: torch.Size([2, 3])
tensor([[0.9049, 0.9831, 0.1013],
        [0.7978, 0.7693, 0.9045]])

The transpose:
tensor([[0.9049, 0.7978],
        [0.9831, 0.7693],
        [0.1013, 0.9045]])
```
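A sketch confirming what the printout shows: element `[i, j]` of the transpose is element `[j, i]` of the original. `mT` also returns a view rather than a copy, so the result points at the same memory but is non-contiguous.

```python
import torch

a = torch.rand(2, 3)
at = a.mT

print(at.shape)  # torch.Size([3, 2])
for i in range(3):
    for j in range(2):
        assert at[i, j] == a[j, i]    # rows become columns

print(at.data_ptr() == a.data_ptr())  # True: same memory, only the strides differ
print(at.is_contiguous())             # False
print(torch.equal(at.mT, a))          # True: transposing twice gets a back
```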


### Permuting

```python
print("And we will also look at permutations, which let us reorder the dimensions in any order\n")

a = torch.rand(2,3,4)
print(f"Starting with a random tensor of size 2, 3, 4: {a.size()}")
print(f"{a}")
print(f"\nThe transpose using permutation:\n{a.permute(1,2,0)}")
print(f"\nThe shape of the permuted tensor is: {a.permute(1,2,0).shape}")
print("\na.permute(1,2,0) makes the old dimension 1 the new dimension 0, the old dimension 2 the new"
      "\ndimension 1, and the old dimension 0 the new dimension 2 (counting from 0). So we have"
      "\nmanually reordered the dimensions.")
```

```
And we will also look at permutations, which let us reorder the dimensions in any order

Starting with a random tensor of size 2, 3, 4: torch.Size([2, 3, 4])
tensor([[[0.4102, 0.6469, 0.8001, 0.8156],
         [0.1233, 0.5795, 0.5240, 0.3115],
         [0.0099, 0.0912, 0.5016, 0.7316]],

        [[0.4813, 0.9499, 0.1991, 0.5247],
         [0.0276, 0.6686, 0.7974, 0.6779],
         [0.0204, 0.6619, 0.7100, 0.4141]]])

The transpose using permutation:
tensor([[[0.4102, 0.4813],
         [0.6469, 0.9499],
         [0.8001, 0.1991],
         [0.8156, 0.5247]],

        [[0.1233, 0.0276],
         [0.5795, 0.6686],
         [0.5240, 0.7974],
         [0.3115, 0.6779]],

        [[0.0099, 0.0204],
         [0.0912, 0.6619],
         [0.5016, 0.7100],
         [0.7316, 0.4141]]])

The shape of the permuted tensor is: torch.Size([3, 4, 2])

a.permute(1,2,0) makes the old dimension 1 the new dimension 0, the old dimension 2 the new
dimension 1, and the old dimension 0 the new dimension 2 (counting from 0). So we have
manually reordered the dimensions.
```
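A sketch checking that rule element by element: for `p = a.permute(1, 2, 0)`, `p[i, j, k]` is `a[k, i, j]`. The inverse permutation `(2, 0, 1)` used at the end is worked out by hand here, not taken from a library helper.

```python
import torch

a = torch.rand(2, 3, 4)
p = a.permute(1, 2, 0)   # new dim 0 = old dim 1, new dim 1 = old dim 2, new dim 2 = old dim 0

for i in range(3):
    for j in range(4):
        for k in range(2):
            assert p[i, j, k] == a[k, i, j]

# Old dim 0 now sits at position 2, old dim 1 at 0, old dim 2 at 1,
# so permuting with (2, 0, 1) undoes the reordering
back = p.permute(2, 0, 1)
print(back.shape)            # torch.Size([2, 3, 4])
print(torch.equal(back, a))  # True
```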

### The Singleton Dimension

The singleton dimension is a dimension with size 1. It acts as a placeholder and can be expanded to match the size of another tensor's corresponding dimension during **broadcasting**.

With broadcasting, basic math operations can be performed on tensors with the following shapes:
- (3,3,1,3,1,1)
- (3,3,5,3,4,7)

This works because wherever the sizes differ, the first tensor has size 1, and a singleton dimension broadcasts (conceptually copies) its single value across that axis as many times as needed. PyTorch does not actually duplicate the data in memory to do this.

**For example, multiplying an (n, 1) column by a (1, m) row this way computes their outer product**
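A sketch of both claims, using the two shapes listed above and two made-up vectors for the outer product. `torch.broadcast_shapes` reports the shape broadcasting would produce without doing any arithmetic, and `torch.outer` serves only as a reference to compare against.

```python
import torch

# The two shapes from the list above are compatible: every size-1 axis stretches to match
print(torch.broadcast_shapes((3, 3, 1, 3, 1, 1), (3, 3, 5, 3, 4, 7)))
# torch.Size([3, 3, 5, 3, 4, 7])

x = torch.rand(3, 3, 1, 3, 1, 1)
y = torch.rand(3, 3, 5, 3, 4, 7)
print((x + y).shape)  # same broadcast shape as above

# Outer product: a (3, 1) column times a (1, 2) row broadcasts to (3, 2)
u = torch.tensor([1.0, 2.0, 3.0])
v = torch.tensor([10.0, 20.0])
outer = u.view(3, 1) * v.view(1, 2)
print(outer)                                  # each entry is u[i] * v[j]
print(torch.equal(outer, torch.outer(u, v)))  # True
```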

### Conceptualizing higher dimensional tensors
Consider a tensor of shape (3,3,5,3,4,7). This tensor has six axes:
- Axis 0 (size 3): You have 3 blocks (like 3 independent groups).
- Axis 1 (size 3): Each block contains 3 matrices.
- Axis 2 (size 5): Each matrix consists of 5 slices.
- Axis 3 (size 3): Each slice contains 3 rows.
- Axis 4 (size 4): Each row consists of 4 columns.
- Axis 5 (size 7): Each column has 7 values.

We can visualize this like a warehouse:
- There are 3 floors → axis 0
- Each floor has 3 aisles → axis 1
- Each aisle has 5 shelves → axis 2
- Each shelf has 3 rows of bins → axis 3
- Each row of bins contains 4 columns of bins → axis 4
- Each bin holds 7 individual items → axis 5

No real standard naming exists beyond row and column for 2D tensors; slice or depth is often used for the third axis of a 3D tensor. For 4D tensors in ML, such as a batch of images, the axes are usually called batch, channel (or feature map), height, and width. Beyond four dimensions there is little shared vocabulary, so it's best to refer to the axes by index.

**Each dimension of a tensor can be thought of as a container for the next**
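A sketch of the "container" idea on the warehouse tensor: each index you supply peels off one axis, and fixing the first five picks out a single bin of 7 items. The variable names are just the warehouse labels from above.

```python
import torch

warehouse = torch.zeros(3, 3, 5, 3, 4, 7)

print(warehouse.ndim)     # 6 axes
print(warehouse.numel())  # 3*3*5*3*4*7 = 3780 items in total

floor = warehouse[0]             # one floor: shape (3, 5, 3, 4, 7)
aisle = warehouse[0, 1]          # one aisle: shape (5, 3, 4, 7)
bin_ = warehouse[0, 1, 2, 0, 3]  # one bin: shape (7,)
print(floor.shape, aisle.shape, bin_.shape)

# The same bin, reached one container at a time
print(torch.equal(warehouse[0][1][2][0][3], bin_))  # True
```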

### Adding Singleton Dimension

```python
a = torch.arange(6)
print(f"Starting with a tensor of size 6\n\n{a}\n\nWe can add a dimension of size 1 using"
      f" the keyword None and indexing where we want this \ndimension to be. For example,"
      f" a[None] puts the dimension at the beginning, and \na[:,None] at the end. \n\n{a[None]}"
      f"\n\nNotice the shape of a with the dimension added: {a[None].shape}")

# Another example with more dimensions
print("\nAnother example using a = torch.arange(6).view(3,2), a[None, :, :, None].shape:")
a = torch.arange(6).view(3,2)
print(a[None, :, :, None].shape)

print("\nAnother way to add a dimension at the very end of a tensor is a[..., None].")

print("\nP.S. a.unsqueeze(idx) does the same thing: it inserts a size-1 dimension at index idx.")
```

```
Starting with a tensor of size 6

tensor([0, 1, 2, 3, 4, 5])

We can add a dimension of size 1 using the keyword None and indexing where we want this 
dimension to be. For example, a[None] puts the dimension at the beginning, and 
a[:,None] at the end. 

tensor([[0, 1, 2, 3, 4, 5]])

Notice the shape of a with the dimension added: torch.Size([1, 6])

Another example using a = torch.arange(6).view(3,2), a[None, :, :, None].shape:
torch.Size([1, 3, 2, 1])

Another way to add a dimension at the very end of a tensor is a[..., None].

P.S. a.unsqueeze(idx) does the same thing: it inserts a size-1 dimension at index idx.
```
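A short sketch checking that the `None` indexing forms and `unsqueeze` agree on a (3, 2) tensor: `a[None]` matches `a.unsqueeze(0)`, `a[:, None]` matches `a.unsqueeze(1)`, and `a[..., None]` matches `a.unsqueeze(-1)`.

```python
import torch

a = torch.arange(6).view(3, 2)

print(a.unsqueeze(0).shape)   # torch.Size([1, 3, 2])
print(a.unsqueeze(-1).shape)  # torch.Size([3, 2, 1])

print(torch.equal(a[None], a.unsqueeze(0)))        # True
print(torch.equal(a[:, None], a.unsqueeze(1)))     # True: shape (3, 1, 2)
print(torch.equal(a[..., None], a.unsqueeze(-1)))  # True
```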


### Removing Singleton Dimension

```python
a = torch.arange(6).view(3,2,1,1)

print(f"We can also remove singleton dimensions using the squeeze function. Let's start\nwith"
      f" a tensor of the following shape: \n\n{a.shape}. \n\nWe can remove a dimension by indexing"
      f" it at 0 using: a[..., 0], but this is \nnot ideal: it silently drops data if that dimension is not size 1."
      f"\n\nInstead, we can use squeeze to remove the last dimension with a.squeeze(idx), where \nidx is the index of the dimension"
      f" to remove. a with the last dimension removed: \n\n{a.squeeze(-1).shape}")

print("\nSqueeze, when called with no arguments, removes all singleton dimensions in the"
      " tensor. \nThis is risky: a dimension that just happens to have size 1 (e.g. a batch of one)"
      " \nis removed too, so prefer passing an explicit index.")
```

```
We can also remove singleton dimensions using the squeeze function. Let's start
with a tensor of the following shape: 

torch.Size([3, 2, 1, 1]). 

We can remove a dimension by indexing it at 0 using: a[..., 0], but this is 
not ideal: it silently drops data if that dimension is not size 1.

Instead, we can use squeeze to remove the last dimension with a.squeeze(idx), where 
idx is the index of the dimension to remove. a with the last dimension removed: 

torch.Size([3, 2, 1])

Squeeze, when called with no arguments, removes all singleton dimensions in the tensor. 
This is risky: a dimension that just happens to have size 1 (e.g. a batch of one) 
is removed too, so prefer passing an explicit index.
```
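A sketch of why the explicit index is safer, assuming a hypothetical batch of 1 sample with 3 features and a trailing singleton axis, shape (1, 3, 1). A bare `squeeze()` drops the batch axis along with the trailing one, `squeeze(idx)` on an axis that is not size 1 leaves the tensor unchanged, and indexing at 0 always removes the axis, data and all.

```python
import torch

# hypothetical: a batch of 1 sample with 3 features and a trailing singleton axis
x = torch.rand(1, 3, 1)

print(x.squeeze().shape)    # torch.Size([3]): the batch axis vanished too
print(x.squeeze(-1).shape)  # torch.Size([1, 3]): only the axis we asked for

# squeeze(idx) on a dimension whose size is not 1 leaves the tensor unchanged
print(x.squeeze(1).shape)   # torch.Size([1, 3, 1])

# Indexing at 0 always removes the axis, even when it holds real data
print(x[:, 0].shape)        # torch.Size([1, 1]): features 1 and 2 were discarded
```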


### Broadcasting

```python
print("Using broadcasting, we can add (or perform other basic math on) tensors that are"
      " \nnot the same size, as long as, on every axis where their sizes differ, one of"
      " the tensors \nhas a singleton dimension.\n\n")

a = torch.rand(4,1)
b = torch.rand(4,5)

print(f"a's shape = {a.shape}")
print(f"b's shape = {b.shape}")
print("\n")
print(a + b)

print("\n\nAn illustrative example:\n")

a = torch.arange(4).view(4,1)
b = torch.arange(5).view(1,5) * 10

print(f"a's shape = {a.shape}")
print(f"b's shape = {b.shape}")
print("\n")
print(a + b)
```

```
Using broadcasting, we can add (or perform other basic math on) tensors that are 
not the same size, as long as, on every axis where their sizes differ, one of the tensors 
has a singleton dimension.


a's shape = torch.Size([4, 1])
b's shape = torch.Size([4, 5])


tensor([[1.4773, 1.1399, 0.6014, 1.3290, 1.5759],
        [1.1367, 1.5279, 1.3871, 1.5639, 0.7121],
        [1.5028, 1.2352, 1.5918, 1.1748, 1.4737],
        [0.6673, 1.1397, 1.2079, 1.0231, 1.1857]])


An illustrative example:

a's shape = torch.Size([4, 1])
b's shape = torch.Size([1, 5])


tensor([[ 0, 10, 20, 30, 40],
        [ 1, 11, 21, 31, 41],
        [ 2, 12, 22, 32, 42],
        [ 3, 13, 23, 33, 43]])
```
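Broadcasting lines shapes up from the right, and a tensor with fewer dimensions is treated as if it had extra leading size-1 axes. A sketch of the consequence, reusing the (4, 5) shape from the cell above: a plain (4,) vector lines up against the 5, not the 4, so it has to be turned into a (4, 1) column first.

```python
import torch

b = torch.rand(4, 5)
col = torch.arange(4.0)     # shape (4,)
row = torch.arange(5.0)     # shape (5,)

print((b + row).shape)      # (5,) is treated as (1, 5) -> torch.Size([4, 5])

try:
    b + col                 # (4,) is treated as (1, 4): 4 vs 5 on the last axis
    col_failed = False
except RuntimeError:
    col_failed = True
print("(4, 5) + (4,) raised an error:", col_failed)  # True

print((b + col.view(4, 1)).shape)  # explicit column: torch.Size([4, 5])
```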


```python
print("Here is a clever use of broadcasting to calculate pairwise squared Euclidean \ndistances"
      " between rows of x\n\n")

x = torch.randn(10,2)

# (10,1,2) - (1,10,2) broadcasts to (10,10,2); squaring and summing the last axis gives (10,10)
d = (x[:, None, :] - x[None, :, :]).pow(2).sum(-1)

print(d)
```

```
Here is a clever use of broadcasting to calculate pairwise squared Euclidean 
distances between rows of x


tensor([[ 0.0000,  7.4260, 13.2203, 22.2614, 13.7267, 10.3759,  2.8846, 11.7383,
         15.2080, 19.5554],
        [ 7.4260,  0.0000,  0.8759,  4.7477,  1.7006,  0.2576,  3.3030,  0.7801,
          1.6822,  4.3813],
        [13.2203,  0.8759,  0.0000,  1.7125,  0.5600,  0.1841,  6.2261,  0.1886,
          0.8633,  1.9291],
        [22.2614,  4.7477,  1.7125,  0.0000,  1.0380,  2.9086, 10.8271,  1.7488,
          3.7939,  0.3338],
        [13.7267,  1.7006,  0.5600,  1.0380,  0.0000,  0.8927,  5.2269,  0.1936,
          2.7968,  0.6276],
        [10.3759,  0.2576,  0.1841,  2.9086,  0.8927,  0.0000,  4.7240,  0.2576,
          0.9868,  2.8763],
        [ 2.8846,  3.3030,  6.2261, 10.8271,  5.2269,  4.7240,  0.0000,  4.6168,
          9.6616,  8.3670],
        [11.7383,  0.7801,  0.1886,  1.7488,  0.1936,  0.2576,  4.6168,  0.0000,
          1.7984,  1.4749],
        [15.2080,  1
```
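As a cross-check, a sketch comparing the broadcasting result with `torch.cdist`, which computes pairwise (non-squared) Euclidean distances directly. The fixed seed and the use of float64 are choices made here so the comparison is repeatable and tight.

```python
import torch

torch.manual_seed(0)
x = torch.randn(10, 2, dtype=torch.float64)

d = (x[:, None, :] - x[None, :, :]).pow(2).sum(-1)  # squared distances, shape (10, 10)
ref = torch.cdist(x, x).pow(2)                      # cdist gives plain distances, so square them

print(d.shape)                              # torch.Size([10, 10])
print(torch.allclose(d, ref))               # True
print(torch.equal(d, d.T))                  # True: distance is symmetric
print(torch.all(d.diagonal() == 0).item())  # True: each point is 0 away from itself
```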

```python
print("Here is a clever use of broadcasting to find the maximum squared distance between "
      "\ntwo points in x, and which pair of points it belongs to\n\n")

x = torch.randn(10,2)

d = (x[:, None, :] - x[None, :, :]).pow(2).sum(-1)

# argmax() indexes the flattened tensor, so convert it back to (row, column)
n = d.shape[1]
print(d.max(), (d.argmax() // n, d.argmax() % n))
```

```
Here is a clever use of broadcasting to find the maximum squared distance between 
two points in x, and which pair of points it belongs to


tensor(17.9494) (tensor(3), tensor(8))
```
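The cell above reports a squared distance. A sketch that turns the flat `argmax` back into a (row, column) pair with Python's `divmod` and takes the square root to get the actual largest Euclidean distance, checked against `torch.linalg.norm` on the two points it found. The seed is fixed here only so the run is repeatable.

```python
import torch

torch.manual_seed(0)  # fixed seed so the run is repeatable
x = torch.randn(10, 2)

# Pairwise squared distances via broadcasting: (10,1,2) - (1,10,2) -> (10,10,2) -> (10,10)
d = (x[:, None, :] - x[None, :, :]).pow(2).sum(-1)

flat_idx = d.argmax()                       # index into the flattened 10*10 tensor
i, j = divmod(flat_idx.item(), d.shape[1])  # back to (row, column)

max_dist = d.max().sqrt()                   # undo the squaring to get a true distance
check = torch.linalg.norm(x[i] - x[j])      # distance between the two points directly
print(i, j, max_dist.item(), check.item())
```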
