In [1]:
import torch

A tensor represents a (possibly multi-dimensional) array of numerical values. With one axis, a tensor is called a *vector*. With two axes, a tensor is called a *matrix*. With $k > 2$ axes, we drop the specialized names and just refer to the object as a $k^\mathrm{th}$ *order tensor*.

In [2]:
x = torch.arange(12, dtype=torch.float32)
x

tensor([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11.])

We can access a tensor's shape (the length along each axis) by inspecting its `shape` attribute.

In [3]:
x.shape

torch.Size([12])

If we just want to know the total number of elements in a tensor, i.e., the product of all the shape elements, we can call its `numel` method. (In PyTorch, `x.size()` returns the shape rather than the element count.) Because we are dealing with a vector here, the single element of its shape is identical to its number of elements.

In [4]:
x.numel()

12

To change the shape of a tensor without altering either the number of elements or their values, we can invoke the `reshape` function. For example, we can transform our tensor, `x`, from a vector with shape (12,) to a matrix with shape (3, 4). This new tensor contains the exact same values, but views them as a matrix organized as 3 rows and 4 columns. To reiterate, although the shape has changed, the elements have not. Note that the number of elements is unaltered by reshaping.

In [5]:
X = x.reshape(3, 4)
X

tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.]])

Note that specifying every shape component
to `reshape` is redundant.
Because we already know our tensor's size,
we can work out one component of the shape given the rest.
For example, given a tensor of size $n$
and target shape ($h$, $w$),
we know that $w = n/h$.
To automatically infer one component of the shape,
we can place a `-1` for the shape component
that should be inferred automatically.
In our case, instead of calling `x.reshape(3, 4)`,
we could have equivalently called `x.reshape(-1, 4)` or `x.reshape(3, -1)`.
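The inference rule above can be checked with a small sketch. It builds a fresh 12-element vector (the names `v`, `a`, and `b` are introduced here only for illustration) and confirms that both `-1` forms give the same result as spelling out the full shape.

```python
import torch

# `v`, `a`, and `b` are hypothetical names used only in this illustration.
v = torch.arange(12, dtype=torch.float32)

# -1 asks reshape to infer that component from the total element count.
a = v.reshape(-1, 4)   # rows inferred as 12 / 4 = 3
b = v.reshape(3, -1)   # columns inferred as 12 / 3 = 4

print(a.shape, b.shape)
# Both forms should match the fully specified reshape.
print(torch.equal(a, v.reshape(3, 4)), torch.equal(b, v.reshape(3, 4)))
```

Only one component can be left as `-1`, since a single equation $n = h \cdot w$ can determine only one unknown.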

Practitioners often need to work with tensors
initialized to contain all zeros or ones.
We can construct a tensor with all elements set to zero
and a shape of (2, 3, 4) via the `zeros` function.

In [6]:
torch.zeros((2, 3, 4))

tensor([[[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]],

        [[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]]])

Similarly, we can create tensors with each element set to 1 as follows:

In [7]:
torch.ones((2, 3, 4))

tensor([[[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]],

        [[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]]])

Often, we want to randomly sample the values for each element in a tensor from some probability distribution. For example, when we construct tensors to serve as parameters in a neural network, we will typically initialize their values randomly. The following snippet creates a tensor with shape (3, 4). Each of its elements is randomly sampled from a standard Gaussian (normal) distribution with a mean of 0 and a standard deviation of 1.

In [8]:
torch.randn(3, 4)

tensor([[-0.7613, -0.2004,  0.2136, -0.2919],
        [-0.6873,  0.9467,  1.3393, -0.2268],
        [-0.8859, -0.9897,  0.3200, -1.1477]])

We can also specify the exact values for each element in the desired tensor by supplying a Python list (or list of lists) containing the numerical values. Here, the outermost list corresponds to axis 0, and the inner list to axis 1.

In [9]:
torch.tensor([[2, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])

tensor([[2, 1, 4, 3],
        [1, 2, 3, 4],
        [4, 3, 2, 1]])

### Indexing and Slicing

As with Python lists, we can access tensor elements by indexing (starting with 0). To access an element based on its position relative to the end of the tensor, we can use negative indexing.
We can also access whole ranges of indices via slicing (e.g., `X[start:stop]`), where the returned value includes the first index (`start`) *but not the last* (`stop`). Finally, when only one index (or slice) is specified for a $k^\mathrm{th}$ order tensor, it is applied along axis 0.
Thus, in the following code, `[-1]` selects the last row and `[1:3]` selects the second and third rows.

In [11]:
X[-1], X[1:3]

(tensor([ 8.,  9., 10., 11.]),
 tensor([[ 4.,  5.,  6.,  7.],
         [ 8.,  9., 10., 11.]]))

Beyond reading them, we can also write elements of a tensor by specifying indices.

In [12]:
X[1, 2] = 17
X

tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5., 17.,  7.],
        [ 8.,  9., 10., 11.]])

If we want to assign multiple elements the same value, we apply the indexing on the left-hand side of the assignment operation. For instance, `[:2, :]` accesses the first and second rows, where `:` takes all the elements along axis 1 (column). While we discussed indexing for matrices, this also works for vectors and for tensors of more than two dimensions.

In [13]:
X[:2, :] = 12
X

tensor([[12., 12., 12., 12.],
        [12., 12., 12., 12.],
        [ 8.,  9., 10., 11.]])
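To illustrate the last remark, here is a minimal sketch applying the same indexing rules to a vector and to a third-order tensor. The names `v` and `T` and their contents are assumptions made for this example, not part of the running notebook state.

```python
import torch

# Hypothetical example tensors, unrelated to X above.
v = torch.arange(6)                     # vector with shape (6,)
T = torch.arange(24).reshape(2, 3, 4)   # third-order tensor

last = v[-1]      # negative index: last element of the vector
mid = v[1:3]      # slice: elements at positions 1 and 2

# One index per axis selects a single element of T.
elem = T[0, 1, 0]

# Write the same value to row 0 of every axis-0 slice.
T[:, 0, :] = -1

print(last, mid, elem)
print(T)
```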

## Operations

Now that we know how to construct tensors
and how to read from and write to their elements,
we can begin to manipulate them
with various mathematical operations.
Among the most useful tools 
are the *elementwise* operations.
These apply a standard scalar operation
to each element of a tensor.
For functions that take two tensors as inputs,
elementwise operations apply some standard binary operator
on each pair of corresponding elements.
We can create an elementwise function 
from any function that maps 
from a scalar to a scalar.

In mathematical notation, we denote such
*unary* scalar operators (taking one input)
by the signature 
$f: \mathbb{R} \rightarrow \mathbb{R}$.
This just means that the function maps
from any real number onto some other real number.
Most standard operators can be applied elementwise
including unary operators like $e^x$.

In [14]:
torch.exp(x)

tensor([162754.7969, 162754.7969, 162754.7969, 162754.7969, 162754.7969,
        162754.7969, 162754.7969, 162754.7969,   2980.9580,   8103.0840,
         22026.4648,  59874.1406])
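The first eight entries of this output all equal $e^{12} \approx 162754.8$ rather than $e^0, e^1, \ldots, e^7$. The likely explanation is that `X` was created by `x.reshape(3, 4)`, which returns a view sharing memory with `x` whenever possible, so the earlier assignment `X[:2, :] = 12` also overwrote the first eight entries of `x`. The sketch below reproduces the effect on fresh tensors; the names `v` and `V` are introduced only for this illustration.

```python
import torch

# Hypothetical names mirroring x and X from the cells above.
v = torch.arange(12, dtype=torch.float32)
V = v.reshape(3, 4)   # for a contiguous tensor this is a view of v

V[:2, :] = 12         # writes through to the first eight entries of v

print(v)              # the first eight entries are now 12
# For comparison: exp applied to an unmodified 0..11 vector.
print(torch.exp(torch.arange(12, dtype=torch.float32)))
```

If an independent copy is needed, calling `clone()` on the reshaped tensor breaks the link to the original storage.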

Likewise, we denote *binary* scalar operators, which map pairs of real numbers to a (single) real number, via the signature $f: \mathbb{R}, \mathbb{R} \rightarrow \mathbb{R}$. Given any two vectors $\mathbf{u}$ and $\mathbf{v}$ *of the same shape*, and a binary operator $f$, we can produce a vector $\mathbf{c} = F(\mathbf{u},\mathbf{v})$
by setting $c_i \gets f(u_i, v_i)$ for all $i$, where $c_i, u_i$, and $v_i$ are the $i^\mathrm{th}$ elements of vectors $\mathbf{c}, \mathbf{u}$, and $\mathbf{v}$. Here, we produced the vector-valued $F: \mathbb{R}^d, \mathbb{R}^d \rightarrow \mathbb{R}^d$ by *lifting* the scalar function to an elementwise vector operation. The standard arithmetic operators for addition (`+`), subtraction (`-`), multiplication (`*`), division (`/`), and exponentiation (`**`) have all been *lifted* to elementwise operations for identically shaped tensors of arbitrary shape.


In [15]:
x = torch.tensor([1.0, 2, 4, 8])
y = torch.tensor([2, 2, 2, 2])
x + y, x - y, x * y, x / y, x ** y

(tensor([ 3.,  4.,  6., 10.]),
 tensor([-1.,  0.,  2.,  6.]),
 tensor([ 2.,  4.,  8., 16.]),
 tensor([0.5000, 1.0000, 2.0000, 4.0000]),
 tensor([ 1.,  4., 16., 64.]))
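The idea of lifting can be made concrete with a deliberately naive sketch. The helper `lift` below is hypothetical (it is not a PyTorch function) and handles only one-dimensional tensors; it applies a scalar function $f$ to each pair $(u_i, v_i)$ in a Python loop and compares the result with the built-in elementwise operator.

```python
import torch

def lift(f):
    """Hypothetical helper: lift a scalar binary function f to 1-D tensors."""
    def F(u, v):
        assert u.shape == v.shape, "inputs must have the same shape"
        # c_i <- f(u_i, v_i) for every position i
        return torch.tensor([f(ui, vi) for ui, vi in zip(u.tolist(), v.tolist())])
    return F

u = torch.tensor([1.0, 2.0, 4.0, 8.0])
v = torch.tensor([2.0, 2.0, 2.0, 2.0])

power = lift(lambda a, b: a ** b)   # scalar exponentiation, lifted
c = power(u, v)

print(c)
print(torch.allclose(c, u ** v))    # agrees with the built-in `**`
```

The built-in operators compute the same values without a Python-level loop, which is why they are preferred in practice.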

In addition to elementwise operations, we can also perform linear algebra operations, such as dot products and matrix multiplications. We will elaborate on these shortly in the linear algebra section (`sec_linear-algebra`).

We can also *concatenate* multiple tensors, stacking them end-to-end to form a larger tensor. We just need to provide a list of tensors and tell the system along which axis to concatenate. The example below shows what happens when we concatenate two matrices along rows (axis 0) vs. columns (axis 1).
We can see that the first output's axis-0 length ($6$) is the sum of the two input tensors' axis-0 lengths ($3 + 3$), while the second output's axis-1 length ($8$) is the sum of the two input tensors' axis-1 lengths ($4 + 4$).

In [17]:
X = torch.arange(12, dtype=torch.float32).reshape((3, 4))
Y = torch.tensor([[2.0, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])
torch.cat((X, Y), dim=0), torch.cat((X, Y), dim=1)

(tensor([[ 0.,  1.,  2.,  3.],
         [ 4.,  5.,  6.,  7.],
         [ 8.,  9., 10., 11.],
         [ 2.,  1.,  4.,  3.],
         [ 1.,  2.,  3.,  4.],
         [ 4.,  3.,  2.,  1.]]),
 tensor([[ 0.,  1.,  2.,  3.,  2.,  1.,  4.,  3.],
         [ 4.,  5.,  6.,  7.,  1.,  2.,  3.,  4.],
         [ 8.,  9., 10., 11.,  4.,  3.,  2.,  1.]]))

Sometimes, we want to construct a binary tensor via *logical statements*.
Take `X == Y` as an example. For each position `i, j`, if `X[i, j]` and `Y[i, j]` are equal, then the corresponding entry in the result takes value `True`; otherwise it takes value `False`.

In [18]:
X == Y

tensor([[False,  True, False,  True],
        [False, False, False, False],
        [False, False, False, False]])
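The result printed above has a Boolean dtype. When numeric 0/1 entries are needed, the mask can be cast explicitly. The sketch below rebuilds the two matrices under the hypothetical names `P` and `Q` so that it runs on its own.

```python
import torch

# Hypothetical stand-ins for X and Y from the cell above.
P = torch.arange(12, dtype=torch.float32).reshape(3, 4)
Q = torch.tensor([[2.0, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])

mask = P == Q              # Boolean tensor, same shape as P and Q
as_float = mask.float()    # True -> 1.0, False -> 0.0
n_equal = mask.sum()       # number of positions where P and Q agree

print(mask.dtype)
print(as_float)
print(n_equal)
```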

**Summing all the elements in the tensor** yields a tensor with only one element.

In [19]:
X.sum()

tensor(66.)

## Broadcasting

By now, you know how to perform elementwise binary operations on two tensors of the same shape. Under certain conditions, even when shapes differ, we can still **perform elementwise binary operations by invoking the *broadcasting mechanism*.** Broadcasting works according to the following two-step procedure: (i) expand one or both tensors by copying elements along axes with length 1 so that after this transformation, the two tensors have the same shape; (ii) perform an elementwise operation on the resulting tensors.

In [23]:
a = torch.arange(3).reshape((3, 1))
b = torch.arange(2).reshape((1, 2))
a, b

(tensor([[0],
         [1],
         [2]]),
 tensor([[0, 1]]))

Since `a` and `b` are $3\times1$ and $1\times2$ matrices, respectively, their shapes do not match up.
Broadcasting produces a larger $3\times2$ matrix by replicating matrix `a` along the columns and matrix `b` along the rows before adding them elementwise.

In [24]:
a + b

tensor([[0, 1],
        [1, 2],
        [2, 3]])
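The two-step procedure can be spelled out by hand. The following sketch uses `Tensor.expand` to carry out step (i) explicitly and then adds the expanded tensors in step (ii). Note that `expand` does not physically copy data, so the "copying" in step (i) is best read as a conceptual description; the names `a_exp`, `b_exp`, and `manual` are introduced for illustration.

```python
import torch

a = torch.arange(3).reshape((3, 1))
b = torch.arange(2).reshape((1, 2))

# Step (i): stretch each length-1 axis to the common shape (3, 2).
a_exp = a.expand(3, 2)   # every column repeats a's single column
b_exp = b.expand(3, 2)   # every row repeats b's single row

# Step (ii): ordinary elementwise addition on same-shape tensors.
manual = a_exp + b_exp

print(a_exp)
print(b_exp)
print(torch.equal(manual, a + b))   # matches implicit broadcasting
```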

## Saving Memory

**Running operations can cause new memory to be allocated to host results.** For example, if we write `Y = Y + X`, we drop the reference to the tensor that `Y` used to point to and instead point `Y` at the newly allocated memory. We can demonstrate this with Python's `id()` function, which returns a unique identifier for the referenced object (in CPython, its address in memory). Note that after we run `Y = Y + X`, `id(Y)` differs from its earlier value. That is because Python first evaluates `Y + X`, allocating new memory for the result, and then points `Y` to this new location in memory.

In [25]:
before = id(Y)
Y = Y + X
id(Y) == before

False

This might be undesirable for two reasons. First, we do not want to run around allocating memory unnecessarily all the time. In machine learning, we often have hundreds of megabytes of parameters and update all of them multiple times per second. Whenever possible, we want to perform these updates *in place*. Second, we might point at the same parameters from multiple variables. If we do not update in place, we must be careful to update all of these references, lest we spring a memory leak or inadvertently refer to stale parameters.

Fortunately, performing in-place operations is easy. We can assign the result of an operation to a previously allocated tensor `Y` by using slice notation: `Y[:] = <expression>`. To illustrate this concept, we overwrite the values of tensor `Z`, after initializing it with `zeros_like` to have the same shape as `Y`.

In [26]:
Z = torch.zeros_like(Y)
print('id(Z):', id(Z))
Z[:] = X + Y
print('id(Z):', id(Z))

id(Z): 139897137255904
id(Z): 139897137255904


**If the value of `X` is not reused in subsequent computations, we can also use `X[:] = X + Y` or `X += Y` to reduce the memory overhead of the operation.**

In [27]:
before = id(X)
X += Y
id(X) == before

True
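The second concern raised earlier, stale references, can be seen in a short sketch. It uses the hypothetical names `P` and `alias` rather than the notebook's `X` and `Y`, and shows that a second name follows in-place updates but not a rebinding assignment.

```python
import torch

P = torch.ones(3)
alias = P                 # a second name for the same tensor object

P += 1                    # in place: the shared storage is updated
seen_by_alias = torch.equal(alias, torch.full((3,), 2.0))

P = P + 1                 # allocates a new tensor and rebinds only P
alias_is_stale = not torch.equal(alias, P)

print(seen_by_alias, alias_is_stale)
print(alias, P)
```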

## Conversion to Other Python Objects

**Converting to a NumPy tensor (`ndarray`)**, or vice versa, is easy. For tensors on the CPU, the PyTorch tensor and the NumPy array share their underlying memory, so changing one through an in-place operation also changes the other.

In [28]:
A = X.numpy()
B = torch.from_numpy(A)
type(A), type(B)

(numpy.ndarray, torch.Tensor)
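The shared-memory behavior can be observed directly. This is a minimal sketch assuming a CPU tensor; the names `t` and `n` are introduced only for the example.

```python
import numpy as np
import torch

t = torch.zeros(3)     # CPU tensor
n = t.numpy()          # NumPy array backed by the same memory

t += 1                 # in-place update on the tensor side
after_tensor_update = n.tolist()

n[0] = 5.0             # in-place write on the NumPy side
after_array_update = t.tolist()

print(isinstance(n, np.ndarray))
print(after_tensor_update, after_array_update)
```

When an independent copy is wanted, cloning first (for example `t.clone().numpy()`) avoids the aliasing.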

To **convert a size-1 tensor to a Python scalar**, we can invoke the `item` function or Python's built-in functions.

In [30]:
a = torch.tensor([3.5])
a, a.item(), float(a), int(a)

(tensor([3.5000]), 3.5, 3.5, 3)
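The size-1 requirement matters: `item` is expected to fail on a tensor with more than one element. The sketch below demonstrates this with hypothetical tensors `one` and `many`. The exact exception type may vary between PyTorch versions, so the code catches both `RuntimeError` and `ValueError` as an assumption.

```python
import torch

one = torch.tensor([3.5])
many = torch.tensor([1.0, 2.0])

scalar = one.item()        # fine: exactly one element

try:
    many.item()            # two elements: no single scalar to return
    raised = False
except (RuntimeError, ValueError):   # type assumed to vary by version
    raised = True

values = many.tolist()     # convert a multi-element tensor to a list

print(scalar, raised, values)
```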

## Summary

 * The tensor class is the main interface for storing and manipulating data in deep learning libraries.
 * Tensors provide a variety of functionalities including construction routines; indexing and slicing; basic mathematics operations; broadcasting; memory-efficient assignment; and conversion to and from other Python objects.

## Exercises

1. Run the code in this section. Change the conditional statement `X == Y` to `X < Y` or `X > Y`, and then see what kind of tensor you can get.
1. Replace the two tensors that operate by element in the broadcasting mechanism with other shapes, e.g., 3-dimensional tensors. Is the result the same as expected?

In [34]:
X, Y

(tensor([[ 2.,  3.,  8.,  9.],
         [ 9., 12., 15., 18.],
         [20., 21., 22., 23.]]),
 tensor([[ 2.,  2.,  6.,  6.],
         [ 5.,  7.,  9., 11.],
         [12., 12., 12., 12.]]))

In [33]:
X < Y, X > Y

(tensor([[False, False, False, False],
         [False, False, False, False],
         [False, False, False, False]]),
 tensor([[False,  True,  True,  True],
         [ True,  True,  True,  True],
         [ True,  True,  True,  True]]))

In [49]:
X = torch.arange(30, dtype=torch.float32).reshape((2, 5, 3))
Y = torch.arange(15, dtype=torch.float32).reshape((1, 5, 3))
X, Y

(tensor([[[ 0.,  1.,  2.],
          [ 3.,  4.,  5.],
          [ 6.,  7.,  8.],
          [ 9., 10., 11.],
          [12., 13., 14.]],
 
         [[15., 16., 17.],
          [18., 19., 20.],
          [21., 22., 23.],
          [24., 25., 26.],
          [27., 28., 29.]]]),
 tensor([[[ 0.,  1.,  2.],
          [ 3.,  4.,  5.],
          [ 6.,  7.,  8.],
          [ 9., 10., 11.],
          [12., 13., 14.]]]))

In [50]:
X + Y

tensor([[[ 0.,  2.,  4.],
         [ 6.,  8., 10.],
         [12., 14., 16.],
         [18., 20., 22.],
         [24., 26., 28.]],

        [[15., 17., 19.],
         [21., 23., 25.],
         [27., 29., 31.],
         [33., 35., 37.],
         [39., 41., 43.]]])