import torch


# Tensors

`Tensors` are the backbone of neural network programming. The **inputs**, **outputs**, and **transformations** are all represented using `tensors`. 

By definition, a `tensor` is a mathematical object that holds $n$-dimensional data.  

The term itself is a generalization of multiple terms referring to the same concept. People will refer to them differently depending on their background:

- Mathematicians: `scalar, vector, matrix`
- Computer scientists: `0d-array, 1d-array, 2d-array`
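To make the terminology concrete, here is a minimal sketch that builds one tensor for each of the terms above and prints its number of dimensions. The variable names (`scalar`, `vector`, `matrix`) are chosen only for illustration.

```python
import torch

scalar = torch.tensor(7)                 # scalar / 0d-array
vector = torch.tensor([1, 2, 3])         # vector / 1d-array
matrix = torch.tensor([[1, 2], [3, 4]])  # matrix / 2d-array

# .dim() reports how many dimensions each tensor has
for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix)]:
    print(name, t.dim(), tuple(t.shape))
```

All three are instances of the same `torch.Tensor` class; only the number of dimensions differs.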

-----------------------------

# Tensor creation

Most of the **factory functions** (for example `torch.tensor`, `torch.zeros`, and `torch.ones`) have an optional argument `dtype`, where we can directly specify the tensor type.

(**Note**: a factory function accepts parameter inputs and returns a particular type of object)
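As a short sketch of the `dtype` argument, the snippet below creates one tensor whose type is inferred from the data and two whose type is requested explicitly. The variable names are illustrative only.

```python
import torch

inferred = torch.tensor([1, 2, 3])                       # dtype inferred from the Python ints
explicit = torch.tensor([1, 2, 3], dtype=torch.float64)  # dtype requested explicitly
int_zeros = torch.zeros(2, 2, dtype=torch.int32)         # works for data-free factories too

print(inferred.dtype)   # torch.int64
print(explicit.dtype)   # torch.float64
print(int_zeros.dtype)  # torch.int32
```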

## Tensor creation with data
**The first two ways `create a copy of the data`. This means they create tensors that occupy different memory addresses**

1. `torch.Tensor(data)` - This is the class constructor for tensors. **All tensors are instances of this class.** Calling the constructor directly is not typically done because of its limitations (no dtype arg, etc...)

2. `torch.tensor(data)` - The **main** factory function that creates Tensor objects for us. `<--Generally, this is the go-to option for making tensors`

**The last two ways `share the same memory as the data itself`. This can be memory efficient, but can have unintended side effects (`mutating data == mutating tensor (and vice versa)`)**

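3. `torch.as_tensor(data)` - Also a factory function. Creates a tensor object from any array-like input; memory is shared when the input is a numpy array (or tensor) that already has the requested dtype and device.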

4. `torch.from_numpy(data)` - Also a factory function. Creates a tensor from a numpy array.
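The difference between copying and sharing can be checked with a small experiment. This is a sketch that assumes the data starts out as a numpy array of integers; the variable names are made up for the example.

```python
import numpy as np
import torch

data = np.array([1, 2, 3])

copied_ctor = torch.Tensor(data)    # copy, converted to the default float dtype
copied = torch.tensor(data)         # copy, dtype inferred from the array
shared_as = torch.as_tensor(data)   # shares memory with the array
shared_np = torch.from_numpy(data)  # shares memory with the array

data[0] = 100  # mutate the original numpy array

print(copied_ctor)  # first element unchanged
print(copied)       # first element unchanged
print(shared_as)    # first element is now 100
print(shared_np)    # first element is now 100
```

Only the two tensors that share memory see the change made to `data`.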


**We can also convert a tensor into a numpy array by using the `.numpy()` tensor method**
```python
>>> torch.tensor([[1,2,3]]).numpy()
array([[1, 2, 3]])
```



## Tensor creation without data

**There are many ways of instantiating tensors without data, such as (but not limited to)**:

-------
1. `torch.eye()` - returns the identity matrix
```python
>>> torch.eye(2)
tensor([[1., 0.],
        [0., 1.]])
```
-------
2. `torch.zeros()` - returns a tensor of zeros of specified shape
```python
>>> torch.zeros(2,2)
tensor([[0., 0.],
        [0., 0.]])
```
-------
3. `torch.ones()` - returns a tensor of ones of specified shape
```python
>>> torch.ones(3,3)
tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])
```
-------
4. `tensor.fill_()` - fills an already existing tensor with a specified value
```python
>>> t = torch.ones(2,2)
>>> t.fill_(5)
tensor([[5., 5.],
        [5., 5.]])
```
-------
5. `torch.rand()` - returns a tensor of random floats drawn from a `uniform distribution` on the interval [0, 1) (a flat line)
```python
>>> torch.rand(2,2)
tensor([[0.9219, 0.5228],
        [0.8266, 0.2281]])
```
-------
6. `torch.randn()` - returns a tensor of random floats drawn from a standard `normal distribution` with mean 0 and variance 1 (a gaussian curve)
```python
>>> torch.randn(2,2)
tensor([[ 0.8713, -0.0966],
        [ 2.4263,  0.3114]])
```
-------
7. `torch.randint()` - returns a tensor of random integers between `low` (inclusive) and `high` (exclusive).
```python
>>> torch.randint(1,100, (2,2))
tensor([[ 7, 32],
        [19, 87]])
```
-------
8. `torch.arange()` - returns a tensor of evenly spaced values from `start` (inclusive) to `end` (exclusive), separated by `step`
```python
>>> torch.arange(1,10,2)
tensor([1, 3, 5, 7, 9])
```
-------

# PyTorch tensor attributes

There are numerous PyTorch-specific attributes that torch tensors have:
```python
>>> t = torch.tensor([[1,2,3]])
```
## .dtype 
Returns the **data type** of the tensor.

Tensor operations generally expect **tensors of the same data type** (recent PyTorch versions automatically promote mixed types for many arithmetic operations).
```python
>>> t.dtype
torch.int64
```
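When two tensors have different data types, one option is to cast explicitly before operating on them. The sketch below assumes a reasonably recent PyTorch release, where adding an integer tensor to a float tensor is also promoted automatically; the variable names are illustrative.

```python
import torch

ints = torch.tensor([1, 2, 3])          # torch.int64
floats = torch.tensor([0.5, 0.5, 0.5])  # torch.float32

# Explicit cast: both operands are float32 before the addition
explicit = ints.to(torch.float32) + floats

# Automatic type promotion (assumed available in the installed version)
promoted = ints + floats

print(explicit.dtype, promoted.dtype)
```

Casting explicitly makes the resulting data type obvious to the reader, which can help when debugging dtype errors.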
## .device 
Returns the device (cpu or cuda) on which the tensor's data is allocated

Tensor operations must occur with **tensors on the same device**
```python
>>> t.device
device(type='cpu')
>>> device = torch.device("cuda:0")  # cuda supports multiple devices, hence the indexing
>>> device
device(type='cuda', index=0)

>>> tensor = torch.tensor([[1,2,3]], device=device)
>>> tensor
tensor([[1, 2, 3]], device='cuda:0')

```
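Because operations require both tensors to be on the same device, a common pattern is to pick the device once and place every tensor on it. The following is a minimal sketch that falls back to the CPU when no GPU is available; the variable names are illustrative.

```python
import torch

# Use the first CUDA device if one is available, otherwise the CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

a = torch.tensor([[1, 2, 3]], device=device)  # created directly on the device
b = torch.tensor([[4, 5, 6]]).to(device)      # created on the CPU, then moved

total = a + b  # valid because both operands live on the same device
print(total.device)
```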
## .layout

Returns how our tensor's data is stored in memory
```python
>>> tensor = torch.tensor([[1,2,3]])
>>> tensor.layout
torch.strided
```


# Rank, Axes, and Shape

## Rank

The `rank` of a tensor refers to the number of dimensions present within the tensor. Suppose we have a rank-2 tensor. This means the following:
- We have a **matrix**
- We have a 2d-array
- We have a 2d-tensor
    
The `rank` tells us how many **indices** are required to access a specific data element in the tensor:
```python
>>> a = torch.tensor([1,2,3,4])  # rank-1 tensor: one index reaches an element
>>> a[0]
tensor(1)

>>> b = torch.tensor([[1,2,3,4], [5,6,7,8]])  # rank-2 tensor: two indices reach an element
>>> b[1][0]
tensor(5)
```
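The rank can also be read off programmatically. As a small sketch, `.dim()` returns the number of dimensions, and indexing once per dimension leaves a 0-dimensional tensor.

```python
import torch

a = torch.tensor([1, 2, 3, 4])
b = torch.tensor([[1, 2, 3, 4], [5, 6, 7, 8]])

print(a.dim())  # rank of a
print(b.dim())  # rank of b

element = b[1][0]     # two indices for a rank-2 tensor
print(element.dim())  # nothing left to index

same_element = b[1, 0]  # equivalent single-bracket form of b[1][0]
```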

## Axes

An `axis` of a tensor is a specific dimension. For example, a **rank-2** tensor has 2 dimensions, or 2 `axes`.

Elements are said to **exist** or **run** along a particular axis. The `length of each axis` tells us **how many indices are available along the axis**

Consider the tensor: 
```python
>>> t = torch.tensor([[1,2,3], [4,5,6],[7,8,9]])

```
Each element along the first axis is an **array** (1d-tensor), and each element along the second axis is a **scalar** (0d-tensor)

```python
>>> t[0]
tensor([1, 2, 3])
>>> t[1]
tensor([4, 5, 6])
>>> t[2]
tensor([7, 8, 9])
>>> t[0][0]
tensor(1)
>>> t[1][0]
tensor(4)
>>> t[2][0]
tensor(7)
>>> t[0][1]
tensor(2)
>>> t[1][1]
tensor(5)
>>> t[2][1]
tensor(8)
>>> t[0][2]
tensor(3)
>>> t[1][2]
tensor(6)
>>> t[2][2]
tensor(9)
```
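Slicing gives another way to see what runs along each axis. In this sketch, fixing an index on the first axis yields a row, while `:` on the first axis combined with a fixed index on the second axis yields a column. The variable names are illustrative.

```python
import torch

t = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

first_row = t[0]        # index 0 on the first axis
first_column = t[:, 0]  # every index on the first axis, index 0 on the second

print(first_row)
print(first_column)
print(t.shape[0], t.shape[1])  # length of each axis
```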

## Shape

The `shape` of a tensor is determined by the length of each axis. The `shape` tells us how many indices are available along each axis.

**The `rank` of a tensor is equal to the length of its `shape`**

Consider the tensor: 
```python
>>> t = torch.tensor([[1,2,3,4],[4,5,6,7],[7,8,9,10]])
>>> t.shape
torch.Size([3, 4])
>>> t.size()
torch.Size([3, 4])
>>> len(t.shape) # <--- Gives us the rank
2
```

In the above tensor, we have **3 rows** and **4 columns**. Therefore, there are 3 indices available along the first axis and 4 indices available along the second axis.

**The `product` of the `shape values`** == **# of `elements in tensor`**
```python
>>> torch.tensor(t.shape).prod()
tensor(12)
```
We can check **how many elements are in a tensor** with the `.numel()` method
```python
>>> t.numel()
12
```
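The relationship between shape, rank, and element count can be verified in plain Python as well. This sketch uses `math.prod` from the standard library (Python 3.8 or newer), which is an alternative to wrapping the shape in a tensor.

```python
import math

import torch

t = torch.tensor([[1, 2, 3, 4], [4, 5, 6, 7], [7, 8, 9, 10]])

rows, cols = t.shape               # a torch.Size unpacks like a tuple
rank = len(t.shape)                # the rank is the length of the shape
element_count = math.prod(t.shape) # product of the axis lengths

print(rows, cols, rank, element_count)
```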

# Tensor operations

## Reshaping operations

Often we need to **`change the shape`** of our tensors **without mutating the data within them**. This is because as data propagates through a NN, different layers expect different shapes from tensors.

-------
### Reshape

The `.reshape()` method lets us do this. We can take a tensor and reshape it into a specified shape **as long as the product of the reshaping values == the amount of items in the tensor**.

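(Note: the `.view()` method does essentially the same thing, but it always returns a tensor that shares memory with the original, so it only works when the tensor's memory layout is compatible with the new shape. `.reshape()` returns a view when it can and a copy otherwise. In general, use `.reshape()` instead of `.view()`.)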
```python
>>> t = torch.tensor([[1,1,1,1], [2,2,2,2],[3,3,3,3]])

>>> t.reshape(1,12)
tensor([[1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]])

>>> t.reshape(2,6)
tensor([[1, 1, 1, 1, 2, 2],
        [2, 2, 3, 3, 3, 3]])

>>> t.reshape(6,2)
tensor([[1, 1],
        [1, 1],
        [2, 2],
        [2, 2],
        [3, 3],
        [3, 3]])
>>> t.reshape(3,4)
tensor([[1, 1, 1, 1],
        [2, 2, 2, 2],
        [3, 3, 3, 3]])

>>> t.reshape(4,3)
tensor([[1, 1, 1],
        [1, 2, 2],
        [2, 2, 3],
        [3, 3, 3]])

>>> t.reshape(12,1)
tensor([[1],
        [1],
        [1],
        [1],
        [2],
        [2],
        [2],
        [2],
        [3],
        [3],
        [3],
        [3]])
```
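The difference between `.view()` and `.reshape()` mentioned in the note above shows up with non-contiguous tensors. The sketch below uses a transposed tensor as one example of such a layout; the variable names are illustrative.

```python
import torch

t = torch.tensor([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]])

# A view shares memory with the original tensor
viewed = t.view(2, 6)
viewed[0, 0] = 99  # also changes t[0, 0]

# Transposing produces a non-contiguous tensor
transposed = t.t()

try:
    transposed.view(12)  # the layout is incompatible with a flat view
    view_failed = False
except RuntimeError:
    view_failed = True

# reshape falls back to copying the data when a view is not possible
reshaped = transposed.reshape(12)

print(t[0, 0], view_failed, reshaped.shape)
```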
------
### Squeeze and unsqueeze

`.squeeze()` removes all axes that have a length of 1
```python
>>> t.reshape(1,12).squeeze()
tensor([1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3])

```

`.unsqueeze()` inserts a new axis of length 1 at the specified position
```python
>>> t.reshape(1,12).squeeze().unsqueeze(dim=0)
tensor([[1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]])
```
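Printing shapes makes the effect of these two methods easier to follow. This is a small sketch; the tensors are filled with ones only to keep the focus on the shapes.

```python
import torch

t = torch.ones(1, 12)

squeezed = t.squeeze()              # shape (12,): the length-1 axis is removed
as_row = squeezed.unsqueeze(dim=0)  # shape (1, 12): new axis in front
as_col = squeezed.unsqueeze(dim=1)  # shape (12, 1): new axis at the end

# Passing dim to squeeze removes only that axis (if its length is 1)
partly = torch.ones(1, 3, 1).squeeze(dim=0)  # shape (3, 1)

print(squeezed.shape, as_row.shape, as_col.shape, partly.shape)
```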
------

### Flatten

To `flatten` a tensor means to collapse its axes into a lower-rank tensor, typically a 1-dimensional one. Flattening is needed when passing the output of a convolutional layer to a fully-connected layer.

PyTorch has a `.flatten()` method that does just that, but here is an implementation using the previous methods:

```python

def flatten(t):
    # -1 tells reshape to infer this axis length from the total number of elements
    t = t.reshape(1, -1)
    # drop the leading axis of length 1, leaving a 1d tensor
    t = t.squeeze()
    return t

```
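As a quick check, the hand-written function can be compared against the built-in method. The sketch below also shows `start_dim`, which keeps the first axis intact; treating that first axis as a batch of samples is an assumption made for this example.

```python
import torch

def flatten(t):
    # same approach as above: reshape to (1, n), then drop the length-1 axis
    return t.reshape(1, -1).squeeze()

batch = torch.arange(24).reshape(2, 3, 4)  # e.g. 2 samples of shape (3, 4)

manual = flatten(batch)
builtin = batch.flatten()

# Flatten everything except the first axis
per_sample = batch.flatten(start_dim=1)

print(manual.shape, builtin.shape, per_sample.shape)
```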

## Element-wise operations
`Element-wise operations` are operations between two tensors that operate over elements that share corresponding indices between the two tensors.

Other names for element wise include: `component-wise` and `point-wise`

**Tensors must have the `same shape` to be operated upon** (or shapes that can be made equal through broadcasting, described below).

Basic arithmetic operations between tensors are element-wise: **addition, subtraction, multiplication, division**

```python

>>> t1 = torch.tensor([[1,2], [3,4]])
>>> t2 = torch.tensor([[9,8], [7,6]])

>>> t1 + t2
tensor([[10, 10],
        [10, 10]])

>>> torch.add(t1,t2)
tensor([[10, 10],
        [10, 10]])
```

The other arithmetic operations follow the same syntax as the code snippet above.
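For completeness, here is a sketch of the remaining arithmetic operations on the same two tensors. The tensors are cast to float before dividing so that the result does not depend on how a particular PyTorch version handles integer division.

```python
import torch

t1 = torch.tensor([[1, 2], [3, 4]])
t2 = torch.tensor([[9, 8], [7, 6]])

difference = t1 - t2               # same as torch.sub(t1, t2)
product = t1 * t2                  # same as torch.mul(t1, t2), element by element
quotient = t1.float() / t2.float() # same as torch.div on the float tensors

print(difference)
print(product)
print(quotient)
```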

### Broadcasting

**`Broadcasting`** means (virtually) expanding a smaller tensor so that its shape matches that of a larger one.

Often, we need to perform an arithmetic operation `between a tensor and a scalar`.

In the code snippet below, the scalar values (a.k.a. 0-d tensors) are `broadcast` to the shape of the 2-dimensional tensor before the operation is performed
```python
>>> t1 - 2
tensor([[-1,  0],
        [ 1,  2]])

>>> t1.div(2)
tensor([[0.5000, 1.0000],
        [1.5000, 2.0000]])
```


We can also use numpy's `np.broadcast_to()` function to check what a scalar value is being broadcast to.
```python
>>> import numpy as np
>>> np.broadcast_to(2, t1.shape)
array([[2, 2],
       [2, 2]])
```
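Broadcasting is not limited to scalars. As a sketch, the snippet below adds a 1-dimensional tensor to a 2-dimensional one; `.expand()` is used only to make the broadcast shape visible, and the variable names are illustrative.

```python
import torch

t1 = torch.tensor([[1, 2], [3, 4]])
row = torch.tensor([10, 20])  # shape (2,)

# What the smaller tensor looks like once expanded to t1's shape
expanded = row.expand(2, 2)

implicit = t1 + row       # broadcasting happens automatically
explicit = t1 + expanded  # same result with the expansion spelled out

print(expanded)
print(implicit)
```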

**`Comparison operations`** are also **element-wise**. When comparing a tensor and a scalar value, the scalar is **broadcast** and then compared.
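A short sketch of element-wise comparisons follows. Each comparison returns a tensor with the same shape as the input, holding one boolean per element (assuming a PyTorch version with the `torch.bool` dtype); the example values are arbitrary.

```python
import torch

t = torch.tensor([[0, 5, 0], [6, 0, 7], [0, 8, 0]])

is_zero = t.eq(0)  # method form, same as t == 0
positive = t > 0   # operator form, same as t.gt(0)

# Booleans count as 0 and 1 when summed, so this counts the positive elements
positive_count = positive.sum()

print(is_zero)
print(positive_count)
```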

## Reduction operations

A **`Reduction`** operation on a tensor is an operation that reduces the number of elements contained within the tensor.

`Reduction operations` operate on the elements within a **single tensor**, whereas element-wise operations work across two tensors.

```python
>>> t = torch.tensor([[0,1,0], [2,0,2], [0,3,0]])

>>> t.sum()
tensor(8)

>>> t.prod()
tensor(0)

>>> t.numel() # number of elements
9

>>> t.sum().numel() < t.numel() # in this case: 1 < 9
True

>>> t.float().mean() # mean() and std() require a floating-point tensor
tensor(0.8889)

>>> t.float().std()
tensor(1.1667)

>>> t.max()
tensor(3)

>>> t.argmax() # returns index of highest value. Flattens the tensor before doing this. 
tensor(7)

>>> t.min()
tensor(0)

>>> t.argmin()
tensor(0)
```
To get the **scalar value** from a reduced tensor, we can use the `.item()` tensor method
```python

>>> t.max().item()
3
```

We can also specify the **axis along which to perform the reduction** with the `dim` argument.
```python

>>> t = torch.tensor([[1,1,1,1], 
                      [2,2,2,2], 
                      [3,3,3,3]])

>>> t.sum(dim=0)
tensor([6, 6, 6, 6])

>>> t.sum(dim=1)
tensor([4, 8, 12])

>>> t[0].sum()
tensor(4)

>>> t[1].sum()
tensor(8)

>>> t[2].sum()
tensor(12)

>>> t.max(dim=0)
torch.return_types.max(values=tensor([3, 3, 3, 3]), indices=tensor([2, 2, 2, 2]))

>>> t.argmax(dim=0)
tensor([2, 2, 2, 2])

>>> t.max(dim=1)
torch.return_types.max(values=tensor([1, 2, 3]), indices=tensor([0, 0, 0]))

>>> t.argmax(dim=1)
tensor([0, 0, 0]) # index 0 of all three arrays

```
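Reducing along an axis removes that axis from the shape. The sketch below checks the resulting shapes and also shows the `keepdim` argument, which keeps the reduced axis with length 1; the variable names are illustrative.

```python
import torch

t = torch.tensor([[1, 1, 1, 1],
                  [2, 2, 2, 2],
                  [3, 3, 3, 3]])  # shape (3, 4)

col_sums = t.sum(dim=0)                # first axis removed, shape (4,)
row_sums = t.sum(dim=1)                # second axis removed, shape (3,)
kept = t.sum(dim=1, keepdim=True)      # reduced axis kept, shape (3, 1)

# max along an axis returns both the values and their indices
values, indices = t.max(dim=1)

print(col_sums.shape, row_sums.shape, kept.shape)
print(values)
```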


