# numpy.pad

`numpy.pad` is a function that pads an array ([documentation](https://numpy.org/doc/stable/reference/generated/numpy.pad.html)). Padding an array means increasing the size of one or more of its dimensions and filling the new elements with values, often called "padded values".

There are many different ways we can choose to pad an array. We could add different numbers of padded elements to the start and end of a given dimension. We could pad just one dimension, leaving the rest as they are. We could use many different strategies to determine what the padded values will be. One of the most common ways to pad an array is to simply add zero-valued padding of constant width to both sides of every dimension. This is the default behavior of `numpy.pad`:

In [1]:
import numpy as np

a = np.ones((2, 3))
a_padded = np.pad(a, 2)

print(f'a:\n{a}')
print(f'a_padded:\n{a_padded}')

a:
[[1. 1. 1.]
 [1. 1. 1.]]
a_padded:
[[0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 1. 1. 0. 0.]
 [0. 0. 1. 1. 1. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]]


We can pad arrays with more than two dimensions as well:

In [2]:
a = np.ones((2, 2, 2))
a_padded = np.pad(a, 1)
print(f'a: \n{a}')
print(f'a_padded:\n{a_padded}')

a: 
[[[1. 1.]
  [1. 1.]]

 [[1. 1.]
  [1. 1.]]]
a_padded:
[[[0. 0. 0. 0.]
  [0. 0. 0. 0.]
  [0. 0. 0. 0.]
  [0. 0. 0. 0.]]

 [[0. 0. 0. 0.]
  [0. 1. 1. 0.]
  [0. 1. 1. 0.]
  [0. 0. 0. 0.]]

 [[0. 0. 0. 0.]
  [0. 1. 1. 0.]
  [0. 1. 1. 0.]
  [0. 0. 0. 0.]]

 [[0. 0. 0. 0.]
  [0. 0. 0. 0.]
  [0. 0. 0. 0.]
  [0. 0. 0. 0.]]]


Let's look at some of the different ways we can call `numpy.pad`.

## `pad_width`

`numpy.pad`'s `pad_width` argument can be a single `int`, as we saw above. However, it can also be a sequence of numbers. If we give a pair of numbers, we can apply different pad widths to the start and end of all dimensions. The padding we add to the start of a dimension is often called "before-padding", and the padding we add to the end of a dimension is often called "after-padding". In the following example, we add after-padding of width 3 to each dimension and we don't add any before-padding.

In [3]:
a = np.ones((2, 2))
a_padded = np.pad(a, (0, 3))
print(f'a: \n{a}')
print(f'a_padded:\n{a_padded}')

a: 
[[1. 1.]
 [1. 1.]]
a_padded:
[[1. 1. 0. 0. 0.]
 [1. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]


We can specify separate before- and after-padding widths for each dimension by giving a sequence of pairs for the `pad_width` argument. The Nth pair of numbers is applied to the Nth dimension of the array.

In [4]:
a = np.ones((2, 2))
a_padded = np.pad(a, ((0, 1), (2, 3)))
print(f'a: \n{a}')
print(f'a_padded:\n{a_padded}')

a: 
[[1. 1.]
 [1. 1.]]
a_padded:
[[0. 0. 1. 1. 0. 0. 0.]
 [0. 0. 1. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]]
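The three accepted forms of `pad_width` (an `int`, one pair, or one pair per dimension) can all be seen as shorthand for "one `(before, after)` pair per dimension". Here is a minimal sketch of that normalization, assuming `np.broadcast_to` as the expansion rule; the helper names `normalize_pad_width` and `padded_shape` are hypothetical, and the sketch skips numpy's own validation (for example, its rejection of negative widths).

```python
import numpy as np

def normalize_pad_width(pad_width, ndim):
    """Hypothetical helper: expand pad_width to one (before, after) pair per dimension."""
    # An int broadcasts to every slot, a single pair broadcasts to every
    # dimension, and a sequence of pairs must already have one pair per dim.
    pairs = np.broadcast_to(np.asarray(pad_width, dtype=int), (ndim, 2))
    return [(int(before), int(after)) for before, after in pairs]

def padded_shape(shape, pad_width):
    """Each dimension grows by its before-padding plus its after-padding."""
    pairs = normalize_pad_width(pad_width, len(shape))
    return tuple(n + before + after for n, (before, after) in zip(shape, pairs))

print(normalize_pad_width(2, 2))                 # the int form
print(normalize_pad_width((0, 3), 2))            # the single-pair form
print(normalize_pad_width(((0, 1), (2, 3)), 2))  # one pair per dimension
print(padded_shape((2, 2), ((0, 1), (2, 3))))
```

The last call gives the `(3, 7)` shape of the previous example.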


## `mode`

The `mode` argument lets you pick different ways to determine the pad values for each new element. Every mode is compatible with every valid `pad_width`.

### `mode='constant'`

This option uses constant values to fill the padding. This is the default option for `mode`, and the default behavior of this mode is to use `0` for every pad value, as we saw above. We can specify a different pad value with the optional `constant_values` kwarg.

In [5]:
a = np.ones((2, 2))
a_padded = np.pad(a, 2, constant_values=2)
print(f'a: \n{a}')
print(f'a_padded:\n{a_padded}')

a: 
[[1. 1.]
 [1. 1.]]
a_padded:
[[2. 2. 2. 2. 2. 2.]
 [2. 2. 2. 2. 2. 2.]
 [2. 2. 1. 1. 2. 2.]
 [2. 2. 1. 1. 2. 2.]
 [2. 2. 2. 2. 2. 2.]
 [2. 2. 2. 2. 2. 2.]]


We can also specify a pair of numbers for `constant_values`. The first number is the value used for the before-padding and the second number is used for the after-padding. These are applied to all dimensions.

Notice a conflict here: pad elements in the corners can be the before-padding of one dimension and the after-padding of another. To resolve this conflict, the padding is logically applied to each dimension in order, starting from dim 0, so the pad values of higher dimensions override those of lower dimensions. For instance, in the following example, the element `a_padded[0][-1]` (top right corner) is before-padding in dimension 0 but after-padding in dimension 1. Dimension 1 overrides dimension 0, since it is the higher dimension, so `a_padded[0][-1]` is assigned the after-padding value that we specified, `3`.


In [6]:
a = np.ones((2, 2))
a_padded = np.pad(a, 2, constant_values=(2, 3))
print(f'a: \n{a}')
print(f'a_padded:\n{a_padded}')


a: 
[[1. 1.]
 [1. 1.]]
a_padded:
[[2. 2. 2. 2. 3. 3.]
 [2. 2. 2. 2. 3. 3.]
 [2. 2. 1. 1. 3. 3.]
 [2. 2. 1. 1. 3. 3.]
 [2. 2. 3. 3. 3. 3.]
 [2. 2. 3. 3. 3. 3.]]



We can obtain the same result as above by separating the two dimensions into two different `numpy.pad` calls. The first call applies dimension 0 padding, and the second call applies the dimension 1 padding.

In [7]:

a_padded_dim0 = np.pad(a, ((2, 2), (0, 0)), constant_values=(2, 3))
a_padded_dim0_dim1 = np.pad(a_padded_dim0, ((0, 0), (2, 2)), constant_values=(2, 3))

print(f'a_padded_dim0_dim1:\n{a_padded_dim0_dim1}')

a_padded_dim0_dim1:
[[2. 2. 2. 2. 3. 3.]
 [2. 2. 2. 2. 3. 3.]
 [2. 2. 1. 1. 3. 3.]
 [2. 2. 1. 1. 3. 3.]
 [2. 2. 3. 3. 3. 3.]
 [2. 2. 3. 3. 3. 3.]]


But if we apply the padding to dimension 1 first, we get a different result.

In [8]:
a_padded_dim1 = np.pad(a, ((0, 0), (2, 2)), constant_values=(2, 3))
a_padded_dim1_dim0 = np.pad(a_padded_dim1, ((2, 2), (0, 0)), constant_values=(2, 3))

print(f'a_padded_dim1_dim0:\n{a_padded_dim1_dim0}')

a_padded_dim1_dim0:
[[2. 2. 2. 2. 2. 2.]
 [2. 2. 2. 2. 2. 2.]
 [2. 2. 1. 1. 3. 3.]
 [2. 2. 1. 1. 3. 3.]
 [3. 3. 3. 3. 3. 3.]
 [3. 3. 3. 3. 3. 3.]]


We can also specify a sequence of pairs of numbers for `constant_values` to apply different before- and after-padding to each dimension. The Nth pair corresponds with the Nth dimension.

Again, notice we have a conflict where corner elements can correspond with more than one dimension. And again, since the padding for each dimension is applied in order, the higher dimensions override the lower dimensions. In the example below, corner padding elements are assigned either `4` or `5` since those are the padding constants we specified for the higher dimension.

In [9]:
a = np.ones((2, 2))
a_padded = np.pad(a, 2, constant_values=((2, 3), (4, 5)))
print(f'a: \n{a}')
print(f'a_padded:\n{a_padded}')

a: 
[[1. 1.]
 [1. 1.]]
a_padded:
[[4. 4. 2. 2. 5. 5.]
 [4. 4. 2. 2. 5. 5.]
 [4. 4. 1. 1. 5. 5.]
 [4. 4. 1. 1. 5. 5.]
 [4. 4. 3. 3. 5. 5.]
 [4. 4. 3. 3. 5. 5.]]
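The "dimensions are padded in order" rule generalizes to any number of dimensions. As a sketch (the helper name `pad_constant_in_order` is hypothetical), we can pad one axis per `np.pad` call, starting from axis 0, which should reproduce a single `np.pad` call with per-dimension `constant_values`:

```python
import numpy as np

def pad_constant_in_order(a, pad_width, constant_values):
    """Hypothetical helper: constant padding applied one axis at a time.

    Both pad_width and constant_values are assumed to be given as one
    (before, after) pair per dimension.
    """
    out = a
    for axis in range(a.ndim):
        # Pad only the current axis; every other axis gets zero width.
        widths = [(0, 0)] * a.ndim
        widths[axis] = pad_width[axis]
        # The new slabs span the full (already padded) extent of earlier
        # axes, so this axis's constants overwrite their corners.
        out = np.pad(out, widths, constant_values=constant_values[axis])
    return out

a = np.ones((2, 2))
print(pad_constant_in_order(a, ((2, 2), (2, 2)), ((2, 3), (4, 5))))
```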


### `mode='edge'`

With 'edge' mode, each pad element is filled with the value of the nearest element on the edge of the original array.

In [10]:
a = np.random.randint(0, 10, (3, 3))
a_padded = np.pad(a, 2, mode='edge')
print(f'a: \n{a}')
print(f'a_padded:\n{a_padded}')

a: 
[[6 0 7]
 [2 0 2]
 [1 7 7]]
a_padded:
[[6 6 6 0 7 7 7]
 [6 6 6 0 7 7 7]
 [6 6 6 0 7 7 7]
 [2 2 2 0 2 2 2]
 [1 1 1 7 7 7 7]
 [1 1 1 7 7 7 7]
 [1 1 1 7 7 7 7]]
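Another way to describe 'edge' mode: every padded position reads from the input at an index clipped into `[0, n - 1]` along each axis. Here is a sketch of that idea using `np.clip` and `np.ix_` (the helper name `edge_pad_by_indexing` is hypothetical, and `pad_width` is assumed to be one pair per dimension):

```python
import numpy as np

def edge_pad_by_indexing(a, pad_width):
    """Hypothetical sketch of mode='edge' via clipped index arrays."""
    index_arrays = [
        # Positions before the start clip to 0, positions past the end to n - 1.
        np.clip(np.arange(-before, n + after), 0, n - 1)
        for n, (before, after) in zip(a.shape, pad_width)
    ]
    # np.ix_ builds an open mesh, so each axis is indexed independently.
    return a[np.ix_(*index_arrays)]

a = np.array([[6, 0, 7], [2, 0, 2], [1, 7, 7]])
print(edge_pad_by_indexing(a, ((2, 2), (2, 2))))
```

With the same input as above, this prints the same 7x7 result as `np.pad(a, 2, mode='edge')`.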


### `mode='linear_ramp'`

This mode is similar to `mode='constant'`, except that when the pad width is more than `1`, the pad values form a linear ramp: they start at the value on the outer edge of the padded array and step evenly toward the edge value of the original array (the original edge value itself is not repeated in the padding). Somewhat confusingly, the padding elements on the outer edge of the padded array are referred to as "end values", even if they are before-padding values. (Perhaps we can improve this in the `torch` pad function interface. "edge values" would probably be clearer, although maybe that would cause confusion with 'edge' mode.)

Just as in 'constant' mode, padding is logically applied to each dimension in order, which explains why we get a value of `1` in positions `[1][1]`, `[1][5]`, `[5][1]`, and `[5][5]` of the result `a_padded`. (I'm going to stop mentioning that padding is applied to each dimension in order for the remaining `mode`s, so assume they all act the same way unless I specifically say differently.)

In [11]:
a = np.ones((3, 3)) * 4
a_padded = np.pad(a, 2, mode='linear_ramp')
print(f'a:\n{a}')
print(f'a_padded:\n{a_padded}')

a_padded_dim0 = np.pad(a, ((2, 2), (0, 0)), mode='linear_ramp')
print(f'a_padded_dim0:\n{a_padded_dim0}')
a_padded_dim0_dim1 = np.pad(a_padded_dim0, ((0, 0), (2, 2)), mode='linear_ramp')
print(f'a_padded_dim0_dim1:\n{a_padded_dim0_dim1}')

a:
[[4. 4. 4.]
 [4. 4. 4.]
 [4. 4. 4.]]
a_padded:
[[0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 2. 2. 2. 1. 0.]
 [0. 2. 4. 4. 4. 2. 0.]
 [0. 2. 4. 4. 4. 2. 0.]
 [0. 2. 4. 4. 4. 2. 0.]
 [0. 1. 2. 2. 2. 1. 0.]
 [0. 0. 0. 0. 0. 0. 0.]]
a_padded_dim0:
[[0. 0. 0.]
 [2. 2. 2.]
 [4. 4. 4.]
 [4. 4. 4.]
 [4. 4. 4.]
 [2. 2. 2.]
 [0. 0. 0.]]
a_padded_dim0_dim1:
[[0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 2. 2. 2. 1. 0.]
 [0. 2. 4. 4. 4. 2. 0.]
 [0. 2. 4. 4. 4. 2. 0.]
 [0. 2. 4. 4. 4. 2. 0.]
 [0. 1. 2. 2. 2. 1. 0.]
 [0. 0. 0. 0. 0. 0. 0.]]


With 'linear_ramp' mode, we can specify end values other than the default `0` using the optional `end_values` kwarg. This argument acts exactly the same as the `constant_values` argument for 'constant' mode. (Perhaps we can combine `end_values` and `constant_values` into one arg for PyTorch)

In [12]:
a = np.zeros((3, 3))
a_padded = np.pad(a, 2, mode='linear_ramp', end_values=((2, 3), (4, 5)))
print(f'a:\n{a}')
print(f'a_padded:\n{a_padded}')

a:
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
a_padded:
[[4.   3.   2.   2.   2.   3.5  5.  ]
 [4.   2.5  1.   1.   1.   3.   5.  ]
 [4.   2.   0.   0.   0.   2.5  5.  ]
 [4.   2.   0.   0.   0.   2.5  5.  ]
 [4.   2.   0.   0.   0.   2.5  5.  ]
 [4.   2.75 1.5  1.5  1.5  3.25 5.  ]
 [4.   3.5  3.   3.   3.   4.   5.  ]]
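Along a single axis, both ramps can be written with `np.linspace(..., endpoint=False)`, which excludes the original edge value. The sketch below is a hypothetical 1-D helper that assumes a non-empty float input. It reproduces the column ramps seen above, for example column 2 of the last result. For more dimensions, numpy applies the same idea one axis at a time, as discussed earlier.

```python
import numpy as np

def linear_ramp_pad_1d(a, before, after, end_values=(0.0, 0.0)):
    """Hypothetical sketch of mode='linear_ramp' for a non-empty 1-D float array."""
    before_end, after_end = end_values
    # Before-padding: from the end value toward a[0], excluding a[0] itself.
    left = np.linspace(before_end, a[0], before, endpoint=False)
    # After-padding: built from the outer end value toward a[-1], then
    # reversed so it runs from next to a[-1] out to the end value.
    right = np.linspace(after_end, a[-1], after, endpoint=False)[::-1]
    return np.concatenate([left, a, right])

print(linear_ramp_pad_1d(np.full(3, 4.0), 2, 2))
print(linear_ramp_pad_1d(np.zeros(3), 2, 2, end_values=(2.0, 3.0)))
```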


### `mode='maximum'`, `'mean'`, `'median'`, `'minimum'`

In these modes, padding values are set to a reduced value (max, mean, median, or minimum) of a subset of the corresponding vector along the axis being padded. By default, the reduction is applied to the entire corresponding vector.

In [13]:
a = np.random.randint(0, 10, (3, 3)).astype(float)
a_padded = np.pad(a, 2, mode='mean')
print(f'a:\n{a}')
print(f'a_padded:\n{np.around(a_padded, 3)}')

a:
[[9. 5. 5.]
 [3. 8. 4.]
 [3. 4. 4.]]
a_padded:
[[5.    5.    5.    5.667 4.333 5.    5.   ]
 [5.    5.    5.    5.667 4.333 5.    5.   ]
 [6.333 6.333 9.    5.    5.    6.333 6.333]
 [5.    5.    3.    8.    4.    5.    5.   ]
 [3.667 3.667 3.    4.    4.    3.667 3.667]
 [5.    5.    5.    5.667 4.333 5.    5.   ]
 [5.    5.    5.    5.667 4.333 5.    5.   ]]


We can use the optional `stat_length` argument to specify the length of a subset of each vector to reduce. The subset vector is always going to be directly next to the padding. If this argument is a single `int`, it is used in the reduction for each before- and after-padding value for all dimensions. Notice that we can get the same behavior as `mode='edge'` if we use `stat_length=1`:

In [14]:
a = np.random.randint(0, 10, (3, 3)).astype(float)
a_padded = np.pad(a, 2, mode='mean', stat_length=1)
print(f'a:\n{a}')
print(f'a_padded:\n{np.around(a_padded, 3)}')

a:
[[8. 0. 7.]
 [7. 0. 1.]
 [2. 9. 3.]]
a_padded:
[[8. 8. 8. 0. 7. 7. 7.]
 [8. 8. 8. 0. 7. 7. 7.]
 [8. 8. 8. 0. 7. 7. 7.]
 [7. 7. 7. 0. 1. 1. 1.]
 [2. 2. 2. 9. 3. 3. 3.]
 [2. 2. 2. 9. 3. 3. 3.]
 [2. 2. 2. 9. 3. 3. 3.]]


`stat_length` can also be a pair of ints (different length for before- and after-padding reductions) or a sequence of pairs (one pair for each dimension). If an element of `stat_length` is greater than the length of the corresponding dimension, the entire vector is reduced for the corresponding padding elements.

In [15]:
a = np.random.randint(0, 10, (3, 3)).astype(float)
a_padded = np.pad(a, 2, mode='mean', stat_length=((1, 2), (3, 4)))
print(f'a:\n{a}')
print(f'a_padded:\n{np.around(a_padded, 3)}')

a:
[[5. 9. 5.]
 [7. 6. 2.]
 [2. 0. 5.]]
a_padded:
[[6.333 6.333 5.    9.    5.    6.333 6.333]
 [6.333 6.333 5.    9.    5.    6.333 6.333]
 [6.333 6.333 5.    9.    5.    6.333 6.333]
 [5.    5.    7.    6.    2.    5.    5.   ]
 [2.333 2.333 2.    0.    5.    2.333 2.333]
 [3.667 3.667 4.5   3.    3.5   3.667 3.667]
 [3.667 3.667 4.5   3.    3.5   3.667 3.667]]
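For a single axis, the reduction rule with `stat_length` can be sketched as follows. The helper name `stat_pad_1d` is hypothetical, and the sketch assumes float input, since numpy rounds the statistic for integer arrays. Each side is filled with one statistic of the elements next to it, and the length is clipped to the size of the input.

```python
import numpy as np

def stat_pad_1d(a, before, after, stat_func=np.mean, stat_length=(None, None)):
    """Hypothetical sketch of the maximum/mean/median/minimum modes in 1-D.

    Each side is filled with stat_func applied to the stat_length elements
    adjacent to that side. None, or a length longer than the input, means
    the whole input is used.
    """
    n = len(a)
    left_len = n if stat_length[0] is None else min(stat_length[0], n)
    right_len = n if stat_length[1] is None else min(stat_length[1], n)
    left_value = stat_func(a[:left_len])
    right_value = stat_func(a[n - right_len:])
    return np.concatenate([np.full(before, left_value), a, np.full(after, right_value)])

a = np.array([5.0, 9.0, 5.0, 7.0, 2.0])
print(stat_pad_1d(a, 2, 2, np.mean, stat_length=(1, 2)))
```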


### `mode='reflect'`

This mode has two different variants, which can be chosen with the `reflect_type` argument. By default, we have `reflect_type='even'`, but we can also specify `reflect_type='odd'`.

#### `mode='reflect', reflect_type='even'`

With `reflect_type='even'`, padded elements are a reflection, in reverse order, of the elements from the input array. It's as if we placed a mirror on top of the elements on the edges of the input array and copied the numbers that appear in the reflection. If the padding width exceeds the input array length for a given dimension, the reflection continues in the opposite direction, bouncing back and forth. The elements on the edges of the input array are included only once per bounce (we'll see later on that `mode='symmetric'` works similarly, except that it includes the edge elements twice in a row).

In the example below, we reflection pad a 5-element vector with an after-pad width of 10. In the result, the first pad value at position `a_padded[5]` is equal to the value to the left of the last element of `a`, or `a[3]`. Then, as we continue further to the right within `a_padded`, we continue moving to the left within `a` until we get to `a_padded[8]`, which equals `a[0]`.  After that point, we bounce back, and `a_padded[9]` equals `a[1]`. This continues until we fill up the full width of the padding.

In [16]:
np.random.seed(0)
a = np.random.randint(0, 10, (5,))
a_padded = np.pad(a, (0, 10), mode='reflect', reflect_type='even')
print(f'a:\n{a}')
print(f'a_padded:\n{a_padded}')

a:
[5 0 3 3 7]
a_padded:
[5 0 3 3 7 3 3 0 5 0 3 3 7 3 3]
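The bouncing behavior can be captured by a closed-form index: the reflected sequence repeats with period `2 * (n - 1)`, and positions in the second half of each period walk back toward `a[0]`. A sketch of this formula follows. The helper name `reflect_pad_1d` is hypothetical, and it assumes a 1-D input with at least 2 elements. It reproduces the output above.

```python
import numpy as np

def reflect_pad_1d(a, before, after):
    """Hypothetical sketch of mode='reflect' (reflect_type='even') in 1-D.

    Assumes len(a) >= 2, so the period 2 * (n - 1) is positive.
    """
    n = len(a)
    period = 2 * (n - 1)
    # numpy's % (like Python's) is non-negative for a positive divisor,
    # so negative before-padding positions map into [0, period) as well.
    idx = np.arange(-before, n + after) % period
    # The second half of each period is the mirror image of the first half.
    idx = np.where(idx < n, idx, period - idx)
    return a[idx]

a = np.array([5, 0, 3, 3, 7])
print(reflect_pad_1d(a, 0, 10))
```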


And of course, we can reflection pad higher dimensional arrays. It's interesting to point out that with this type of padding, if we split the operation up into multiple `numpy.pad` calls, one for each dimension, we obtain the same result no matter which dimension we pad first.

In [17]:
a = np.array([[0, 1, 2], [1, 2, 3], [2, 3, 4]])
a_padded = np.pad(a, 4, mode='reflect')
print(f'a:\n{a}')
print(f'a_padded:\n{a_padded}')

a_padded_dim0_first = np.pad(
    np.pad(a, ((4, 4), (0, 0)), mode='reflect'),
    ((0, 0), (4, 4)),
    mode='reflect')

a_padded_dim1_first = np.pad(
    np.pad(a, ((0, 0), (4, 4)), mode='reflect'),
    ((4, 4), (0, 0)),
    mode='reflect')

assert (a_padded_dim0_first == a_padded_dim1_first).all()

a:
[[0 1 2]
 [1 2 3]
 [2 3 4]]
a_padded:
[[0 1 2 1 0 1 2 1 0 1 2]
 [1 2 3 2 1 2 3 2 1 2 3]
 [2 3 4 3 2 3 4 3 2 3 4]
 [1 2 3 2 1 2 3 2 1 2 3]
 [0 1 2 1 0 1 2 1 0 1 2]
 [1 2 3 2 1 2 3 2 1 2 3]
 [2 3 4 3 2 3 4 3 2 3 4]
 [1 2 3 2 1 2 3 2 1 2 3]
 [0 1 2 1 0 1 2 1 0 1 2]
 [1 2 3 2 1 2 3 2 1 2 3]
 [2 3 4 3 2 3 4 3 2 3 4]]



#### `mode='reflect', reflect_type='odd'`

With the aptly named `reflect_type = 'odd'`, we do get odd behavior. The documentation explains it as:

> For the ‘odd’ style, the extended part of the array is created by subtracting the reflected values from two times the edge value.

So we have a formula: `pad_value = (2 * edge_value) - reflected_value`

In the example below, we are adding ten elements to the right of a 5-element input vector, using odd reflection padding. At element `a_padded[5]`, the edge value from the input is `7`, and the reflected value is `3`, so we get `(2 * 7) - 3 = 11`. The pattern continues through `a_padded[8]`, where the reflected value is `5`, so we get `(2 * 7) - 5 = 9`.

What happens at `a_padded[9]`? Well, the reflected value *should* be `0` here, since we're now bouncing back from the leftmost element in the input array, and that would mean `reflected_value == a[1]`. We can see that `a_padded[9] == 4`. Using the formula above, we should have `4 = (2 * edge_value) - 0` --> `edge_value = 2`. But there is no edge value of 2 in the input array. What's going on?

Maybe we were wrong to assume that `reflected_value == a[1]`. Perhaps the `edge_value` is always supposed to stay the same for all of the padding on one side of the input array? So let's go ahead and assume that in this example, `edge_value = 7` always. So for `a_padded[9] == 4`, we have `4 = (2 * 7) - reflected_value` --> `reflected_value = 10`. Wrong again: there is no element with value 10 anywhere.

Perhaps the edge value changes each time we bounce back. If that's true, I would assume that we use the previous `reflected_value` that we used before the bounce. For `a_padded[9]`, that would be `edge_value = a[0] = 5`. So let's plug that into the equation to get `4 = (2 * 5) - reflected_value` --> `reflected_value = 6`. Huh? There's no `6` in the input either.

I'm stumped. I'll try to figure this out later.

In [18]:
np.random.seed(0)
a = np.random.randint(0, 10, (5,))
a_padded = np.pad(a, (0, 10), mode='reflect', reflect_type='odd')
print(f'a:\n{a}')
print(f'a_padded:\n{a_padded}')

a:
[5 0 3 3 7]
a_padded:
[ 5  0  3  3  7 11 11 14  9  4  7  7 11 15 15]
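One hypothesis that reproduces this output: numpy seems to build large padding in chunks, and after each bounce it reflects the array *as padded so far* (not the original input) about its current last element. Under that reading, `a_padded[9]` uses the edge value `a_padded[8] = 9` and the reflected value `a_padded[7] = 14`, so `(2 * 9) - 14 = 4`. Likewise, `a_padded[10] = (2 * 9) - a_padded[6] = 18 - 11 = 7`. The sketch below implements that reading. The function name is hypothetical, and it only handles after-padding of a 1-D array with at least 2 elements.

```python
import numpy as np

def odd_reflect_pad_after(a, after):
    """Hypothetical sketch of np.pad(a, (0, after), mode='reflect',
    reflect_type='odd') for a 1-D array with len(a) >= 2."""
    out = np.asarray(a)
    remaining = after
    while remaining > 0:
        edge = out[-1]
        # 'reflect' mirrors about the edge element itself, so the edge is
        # excluded; everything padded so far is eligible to be reflected.
        chunk = out[:-1][::-1][:remaining]
        # Odd reflection: pad_value = (2 * edge_value) - reflected_value
        out = np.concatenate([out, 2 * edge - chunk])
        remaining -= len(chunk)
    return out

a = np.array([5, 0, 3, 3, 7])
print(odd_reflect_pad_after(a, 10))
```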


### `mode='symmetric'`

This mode is similar to reflection padding, with a slight difference. Instead of placing a mirror on top of the edge elements of the input, the mirror is placed next to the edge elements, so that they get included in the reflection.

It also uses the `reflect_type` argument, and `reflect_type='even'` is the default.



In [19]:
np.random.seed(0)
a = np.random.randint(0, 10, (5,))
a_padded = np.pad(a, (0, 10), mode='symmetric', reflect_type='even')
print(f'a:\n{a}')
print(f'a_padded:\n{a_padded}')

a:
[5 0 3 3 7]
a_padded:
[5 0 3 3 7 7 3 3 0 5 5 0 3 3 7]


I'm still stumped on how `reflect_type=odd` works.

In [20]:
np.random.seed(0)
a = np.random.randint(0, 10, (5,))
a_padded = np.pad(a, (0, 10), mode='symmetric', reflect_type='odd')
print(f'a:\n{a}')
print(f'a_padded:\n{a_padded}')

a:
[5 0 3 3 7]
a_padded:
[ 5  0  3  3  7  7 11 11 14  9  9  4  7  7 11]
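The chunked reading also appears to explain this output. The only change is that each reflected chunk includes the edge element, because the mirror sits just past it. So `a_padded[5] = (2 * 7) - 7 = 7`. After the first bounce the edge value becomes `a_padded[9] = 9`, which gives `a_padded[10] = (2 * 9) - 9 = 9` and `a_padded[11] = (2 * 9) - 14 = 4`. Again, this is a hypothesis checked only against outputs like the one above. The function name is hypothetical, and only 1-D after-padding is handled.

```python
import numpy as np

def odd_symmetric_pad_after(a, after):
    """Hypothetical sketch of np.pad(a, (0, after), mode='symmetric',
    reflect_type='odd') for a non-empty 1-D array."""
    out = np.asarray(a)
    remaining = after
    while remaining > 0:
        edge = out[-1]
        # 'symmetric' mirrors just past the edge, so the edge element is
        # included as the first reflected value.
        chunk = out[::-1][:remaining]
        # Odd reflection: pad_value = (2 * edge_value) - reflected_value
        out = np.concatenate([out, 2 * edge - chunk])
        remaining -= len(chunk)
    return out

a = np.array([5, 0, 3, 3, 7])
print(odd_symmetric_pad_after(a, 10))
```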


### `mode='wrap'`

With this mode, we just wrap around the input array with periodic boundaries to find the pad value. It follows the formulas `after_pad[i] = input[i % len(input)]` for `0 <= i < len(after_pad)`, and `before_pad[i] = input[(i - len(before_pad)) % len(input)]` for `0 <= i < len(before_pad)`, where `%` is Python's modulo operator (non-negative for a positive divisor).


In [21]:
np.random.seed(0)
a = np.random.randint(0, 10, (5,))
a_padded = np.pad(a, (9, 10), mode='wrap')
print(f'a:\n{a}')
print(f'a_padded:\n{a_padded}')

def my_wrap_pad(arr, before_len, after_len):
    n = len(arr)
    before_pad = np.empty(before_len, dtype=arr.dtype)
    after_pad = np.empty(after_len, dtype=arr.dtype)

    # before_pad ends just before arr[0], so its last element is arr[-1]
    for i in range(before_len):
        before_pad[i] = arr[(i - before_len) % n]

    # after_pad starts just after arr[-1], so its first element is arr[0]
    for i in range(after_len):
        after_pad[i] = arr[i % n]

    return np.concatenate([before_pad, arr, after_pad])

print(f'custom implementation:\n{my_wrap_pad(a, 9, 10)}')

a:
[5 0 3 3 7]
a_padded:
[0 3 3 7 5 0 3 3 7 5 0 3 3 7 5 0 3 3 7 5 0 3 3 7]
custom implementation:
[0 3 3 7 5 0 3 3 7 5 0 3 3 7 5 0 3 3 7 5 0 3 3 7]
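The same periodic rule vectorizes naturally and extends to any number of dimensions: build one index array per axis with `% n` and combine them with `np.ix_`. Here is a sketch of that approach. The helper name `wrap_pad_by_indexing` is hypothetical, and `pad_width` is assumed to be one pair per dimension.

```python
import numpy as np

def wrap_pad_by_indexing(a, pad_width):
    """Hypothetical sketch of mode='wrap' using periodic index arrays."""
    index_arrays = [
        # Padded position j reads a[j mod n]; positions start at -before so
        # that a[0] lands right after the before-padding.
        np.arange(-before, n + after) % n
        for n, (before, after) in zip(a.shape, pad_width)
    ]
    # np.ix_ builds an open mesh, so each axis is indexed independently.
    return a[np.ix_(*index_arrays)]

a = np.array([5, 0, 3, 3, 7])
print(wrap_pad_by_indexing(a, [(9, 10)]))
```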


### `mode='empty'`

In this mode, we do not assign any particular value to padding elements. They simply contain whatever arbitrary values happen to be in memory when the result array is allocated.

In [22]:
a = np.random.randint(0, 10, (2, 2))
a_padded = np.pad(a, 1, mode='empty')
print(f'a:\n{a}')
print(f'a_padded:\n{a_padded}')

a:
[[9 3]
 [5 2]]
a_padded:
[[     93985394645104                   0 8315178877616024947
  7071278284481194355]
 [3256719580876464950                   9                   3
  3834028039434482534]
 [2478515131577936930                   5                   2
  3689343304282551603]
 [2462380867116938290 2481042943426848118 6520405438716256826
  3329705475912400162]]
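Because the padding is left uninitialized, `mode='empty'` is only safe when every padded element is overwritten afterwards. Here is a minimal sketch of that pattern with a made-up rule, where each padded element of a 2-D array is set to its distance, in rings, from the original region. The helper names `axis_distance` and `pad_with_ring_index` are hypothetical.

```python
import numpy as np

def axis_distance(n, width):
    """Distance of each padded position from the original index range [0, n - 1]."""
    j = np.arange(-width, n + width)
    return np.maximum(0, np.maximum(-j, j - (n - 1)))

def pad_with_ring_index(a, width):
    # mode='empty' copies `a` into the middle but leaves the padding
    # uninitialized, so every padded element is overwritten below.
    out = np.pad(a, width, mode='empty')
    dist = np.maximum.outer(axis_distance(a.shape[0], width),
                            axis_distance(a.shape[1], width))
    pad_mask = dist > 0  # True exactly on the padded elements
    out[pad_mask] = dist[pad_mask]
    return out

print(pad_with_ring_index(np.full((2, 2), 9), 2))
```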




Why would you want this behavior? Well, you could write your own custom padding rule that differs from the modes `np.pad` already offers. In that case it is more efficient to tell `np.pad` not to assign any values to the padded elements, since even setting them to 0 has a performance cost, and then fill them in yourself. Alternatively, you could pass your custom padding function as `mode`, although numpy then zero-fills the result and calls your function once for every 1-D slice along each axis, which is likely to be slower than vectorized assignments. Speaking of which...


### `mode=<function>`

We can specify our own padding function, instead of a builtin one.

I imagine that this isn't going to be possible in PyTorch. I don't think native_functions.yaml supports functions as arguments. Until we do support that, I won't even bother learning how to use this mode.
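For reference, the `numpy.pad` documentation describes the callable form as `padding_func(vector, iaxis_pad_width, iaxis, kwargs)`. The function receives a 1-D slice of the already padded array, along with that axis's `(before, after)` widths, and is expected to modify the slice in place. Any extra keyword arguments given to `np.pad` arrive in `kwargs`. The sketch below is modeled on the example in that documentation. The names `pad_with` and `padder` are illustrative, not part of numpy.

```python
import numpy as np

def pad_with(vector, pad_width, iaxis, kwargs):
    """Fill both ends of one 1-D slice in place; `padder` is a user kwarg."""
    pad_value = kwargs.get('padder', 10)
    before, after = pad_width
    vector[:before] = pad_value
    # Avoid vector[-0:], which would select the whole slice when after == 0.
    vector[len(vector) - after:] = pad_value

a = np.arange(6).reshape(2, 3)
print(np.pad(a, 2, pad_with))
print(np.pad(a, 2, pad_with, padder=100))
```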

# torch.nn.functional.pad

PyTorch has a padding function as well, [`torch.nn.functional.pad`](https://pytorch.org/docs/master/generated/torch.nn.functional.pad.html), but it's not as user friendly and it doesn't have as many features. We want to add a NumPy-like `torch.pad` function, but we need to find out if there's anything that `torch.nn.functional.pad` supports which `numpy.pad` does not.

It would also be good to note some of the things that `torch.nn.functional.pad` does not support.

Let's look at the arguments for `torch.nn.functional.pad`:

## `pad`

The `pad` argument is similar to `numpy.pad`'s `pad_width` argument. It specifies basically the same thing, but in a different format.

`pad` must be a 1-D sequence with an even number of elements. Each pair of elements corresponds to a different dimension of the input. The first number of a pair specifies the width of the before-padding, and the second number specifies after-padding.

One very important difference between this function and `numpy.pad` is that the dimension corresponding to each pair of padding widths goes in opposite order. Recall that `numpy.pad` has a mapping of `pair[N] --> dim[N]`. But `torch.nn.functional.pad` has the opposite order, `pair[N] --> dim[ndim - 1 - N]`.

In the example below, we can see that the first pair is being applied to dimension 1 of `a`, because we get no padding on the left and padding of width `1` on the right. Then, the second pair is applied to dimension 0, since the top padding has width 2 and the bottom width 3.

In [23]:
import torch

a = torch.ones(2, 2)
a_padded = torch.nn.functional.pad(a, (0, 1, 2, 3))
print(f'a:\n{a}')
print(f'a_padded:\n{a_padded}')

a:
tensor([[1., 1.],
        [1., 1.]])
a_padded:
tensor([[0., 0., 0.],
        [0., 0., 0.],
        [1., 1., 0.],
        [1., 1., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])
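Converting between the two conventions is mechanical: reverse the order of the pairs and flatten them. Below is a small sketch with a hypothetical function name. It converts a numpy-style `pad_width` into a torch-style `pad` tuple and uses `np.pad` to check that `((2, 3), (0, 1))` gives the same `(7, 3)` result as `(0, 1, 2, 3)` above.

```python
import numpy as np

def numpy_pad_width_to_torch_pad(pad_width):
    """Hypothetical helper: convert numpy-style pad_width (one (before, after)
    pair per dimension, dimension 0 first) into the flat `pad` tuple that
    torch.nn.functional.pad expects (last dimension first)."""
    flat = []
    for before, after in reversed(pad_width):
        flat.extend([before, after])
    return tuple(flat)

numpy_pad_width = ((2, 3), (0, 1))  # dim 0: (2, 3), dim 1: (0, 1)
print(numpy_pad_width_to_torch_pad(numpy_pad_width))
print(np.pad(np.ones((2, 2)), numpy_pad_width))
```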



Another difference from `numpy.pad` is that specifying just one pair does not apply padding to every dimension; it *only* pads the last dimension.

In [24]:
a = torch.ones(2, 2)
a_padded = torch.nn.functional.pad(a, (2, 2))
print(f'a:\n{a}')
print(f'a_padded:\n{a_padded}')

a:
tensor([[1., 1.],
        [1., 1.]])
a_padded:
tensor([[0., 0., 1., 1., 0., 0.],
        [0., 0., 1., 1., 0., 0.]])


These are definitely points of confusion, and it will be great to fix them with `torch.pad`.

We can also specify an empty tuple for the `pad` argument to specify no padding at all. This is not supported in `numpy.pad` (although `np.pad(a, 0)` has the same effect). However, it doesn't seem particularly useful, so maybe we should not add support for it in `torch.pad`.

In [25]:
a = torch.ones(2, 2)
a_padded = torch.nn.functional.pad(a, ())
print(f'a:\n{a}')
print(f'a_padded:\n{a_padded}')

a:
tensor([[1., 1.],
        [1., 1.]])
a_padded:
tensor([[1., 1.],
        [1., 1.]])


## `mode`

This function also has multiple different modes to specify how the padding values should be filled. However, there are only four of them.

### `mode='constant'`

This is similar to the `numpy.pad` mode of the same name: we just fill the padding with constant values. However, this function only supports a single scalar, given with the `value` argument, which is used as padding everywhere. It does not allow us to specify different constants for before- and after-padding or for each dimension, so it's not as flexible as `numpy.pad`'s `constant_values` argument.

In [26]:
a = torch.ones(2, 2)
a_padded = torch.nn.functional.pad(a, (2, 2, 2, 2), value=8)
print(f'a:\n{a}')
print(f'a_padded:\n{a_padded}')

a:
tensor([[1., 1.],
        [1., 1.]])
a_padded:
tensor([[8., 8., 8., 8., 8., 8.],
        [8., 8., 8., 8., 8., 8.],
        [8., 8., 1., 1., 8., 8.],
        [8., 8., 1., 1., 8., 8.],
        [8., 8., 8., 8., 8., 8.],
        [8., 8., 8., 8., 8., 8.]])


Constant mode supports inputs of any dimension.

In [27]:
print('1-D:')
print(torch.nn.functional.pad(torch.ones(1), (1, 0), value=8))

print('2-D:')
print(torch.nn.functional.pad(torch.ones(1, 1), (1, 0, 1, 0), value=8))

print('3-D:')
print(torch.nn.functional.pad(torch.ones(1, 1, 1), (1, 0, 1, 0, 1, 0), value=8))

print('4-D:')
print(torch.nn.functional.pad(torch.ones(1, 1, 1, 1), (1, 0, 1, 0, 1, 0, 1, 0), value=8))

1-D:
tensor([8., 1.])
2-D:
tensor([[8., 8.],
        [8., 1.]])
3-D:
tensor([[[8., 8.],
         [8., 8.]],

        [[8., 8.],
         [8., 1.]]])
4-D:
tensor([[[[8., 8.],
          [8., 8.]],

         [[8., 8.],
          [8., 8.]]],


        [[[8., 8.],
          [8., 8.]],

         [[8., 8.],
          [8., 1.]]]])



### `mode='reflect'`

This mode is similar to `numpy.pad`'s reflection padding. However, there are some limitations:
* Only inputs with 3 or 4 dimensions are supported
* After- and before-padding widths must be less than the corresponding dimension's size. In other words, we cannot reflect more than once like we did above with `numpy.pad`
* Can only operate on the last N-2 dimensions of the input, where N is the number of dimensions in the input. So, if the input is 3-D, it can only pad the third dimension, and the `pad` argument *has* to have length 2. With a 4-D input, we can only pad the 3rd and 4th dims, and `pad` has to have length 4

Here, we use a 3-D input:


In [28]:
torch.manual_seed(0)
a = torch.randint(0, 10, (2, 2, 3)).to(torch.float)
a_padded = torch.nn.functional.pad(a, (2, 2), mode='reflect')
print(f'a:\n{a}')
print(f'a.size():\n{a.size()}')
print(f'a_padded:\n{a_padded}')
print(f'a_padded.size():\n{a_padded.size()}')

a:
tensor([[[4., 9., 3.],
         [0., 3., 9.]],

        [[7., 3., 7.],
         [3., 1., 6.]]])
a.size():
torch.Size([2, 2, 3])
a_padded:
tensor([[[3., 9., 4., 9., 3., 9., 4.],
         [9., 3., 0., 3., 9., 3., 0.]],

        [[7., 3., 7., 3., 7., 3., 7.],
         [6., 1., 3., 1., 6., 1., 3.]]])
a_padded.size():
torch.Size([2, 2, 7])


Let's try a 4-D input:

In [29]:
torch.manual_seed(0)
a = torch.randint(0, 10, (2, 2, 2, 3)).to(torch.float)
a_padded = torch.nn.functional.pad(a, (2, 2, 1, 1), mode='reflect')
print(f'a:\n{a}')
print(f'a.size():\n{a.size()}')
print(f'a_padded:\n{a_padded}')
print(f'a_padded.size():\n{a_padded.size()}')

a:
tensor([[[[4., 9., 3.],
          [0., 3., 9.]],

         [[7., 3., 7.],
          [3., 1., 6.]]],


        [[[6., 9., 8.],
          [6., 6., 8.]],

         [[4., 3., 6.],
          [9., 1., 4.]]]])
a.size():
torch.Size([2, 2, 2, 3])
a_padded:
tensor([[[[9., 3., 0., 3., 9., 3., 0.],
          [3., 9., 4., 9., 3., 9., 4.],
          [9., 3., 0., 3., 9., 3., 0.],
          [3., 9., 4., 9., 3., 9., 4.]],

         [[6., 1., 3., 1., 6., 1., 3.],
          [7., 3., 7., 3., 7., 3., 7.],
          [6., 1., 3., 1., 6., 1., 3.],
          [7., 3., 7., 3., 7., 3., 7.]]],


        [[[8., 6., 6., 6., 8., 6., 6.],
          [8., 9., 6., 9., 8., 9., 6.],
          [8., 6., 6., 6., 8., 6., 6.],
          [8., 9., 6., 9., 8., 9., 6.]],

         [[4., 1., 9., 1., 4., 1., 9.],
          [6., 3., 4., 3., 6., 3., 4.],
          [4., 1., 9., 1., 4., 1., 9.],
          [6., 3., 4., 3., 6., 3., 4.]]]])
a_padded.size():
torch.Size([2, 2, 4, 7])


Of course, we can sometimes work around the input dimensionality restrictions by reshaping the input first. For instance, if we want to reflection-pad a 1-D tensor, we can reshape it to add two leading dimensions of size 1, and then remove those dimensions from the output. But this is not user friendly; it could have been done automatically.

In [30]:
torch.manual_seed(0)
a = torch.arange(10).to(torch.float)
a_reshaped = a.reshape([1, 1] + list(a.size()))

a_padded = torch.nn.functional.pad(a_reshaped, (0, 9), mode='reflect')
a_padded = a_padded.reshape(a_padded.size()[-1:])
print(f'a:\n{a}')
print(f'a_padded:\n{a_padded}')

a:
tensor([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
a_padded:
tensor([0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 8., 7., 6., 5., 4., 3., 2., 1.,
        0.])


Likewise with a 2-D tensor:

In [31]:
torch.manual_seed(0)
a = torch.arange(16).reshape(4, 4).to(torch.float)
a_reshaped = a.reshape([1, 1] + list(a.size()))

a_padded = torch.nn.functional.pad(a_reshaped, (3, 3, 3, 3), mode='reflect')
a_padded = a_padded.reshape(a_padded.size()[-2:])
print(f'a:\n{a}')
print(f'a_padded:\n{a_padded}')

a:
tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.],
        [12., 13., 14., 15.]])
a_padded:
tensor([[15., 14., 13., 12., 13., 14., 15., 14., 13., 12.],
        [11., 10.,  9.,  8.,  9., 10., 11., 10.,  9.,  8.],
        [ 7.,  6.,  5.,  4.,  5.,  6.,  7.,  6.,  5.,  4.],
        [ 3.,  2.,  1.,  0.,  1.,  2.,  3.,  2.,  1.,  0.],
        [ 7.,  6.,  5.,  4.,  5.,  6.,  7.,  6.,  5.,  4.],
        [11., 10.,  9.,  8.,  9., 10., 11., 10.,  9.,  8.],
        [15., 14., 13., 12., 13., 14., 15., 14., 13., 12.],
        [11., 10.,  9.,  8.,  9., 10., 11., 10.,  9.,  8.],
        [ 7.,  6.,  5.,  4.,  5.,  6.,  7.,  6.,  5.,  4.],
        [ 3.,  2.,  1.,  0.,  1.,  2.,  3.,  2.,  1.,  0.]])


### `mode='replicate'`

This is `torch.nn.functional.pad`'s equivalent to `numpy.pad`'s `mode='edge'`. However, it only accepts inputs with 3, 4, or 5 dimensions, and only the last N - 2 dimensions are padded, where N is the dimensionality of the input.

In [32]:
a = torch.arange(16).reshape(1, 1, 4, 4).to(torch.float)
a_padded = torch.nn.functional.pad(a, (2, 2, 2, 2), mode='replicate')
print(f'a:\n{a}')
print(f'a.size():\n{a.size()}')
print(f'a_padded:\n{a_padded}')
print(f'a_padded.size():\n{a_padded.size()}')

a:
tensor([[[[ 0.,  1.,  2.,  3.],
          [ 4.,  5.,  6.,  7.],
          [ 8.,  9., 10., 11.],
          [12., 13., 14., 15.]]]])
a.size():
torch.Size([1, 1, 4, 4])
a_padded:
tensor([[[[ 0.,  0.,  0.,  1.,  2.,  3.,  3.,  3.],
          [ 0.,  0.,  0.,  1.,  2.,  3.,  3.,  3.],
          [ 0.,  0.,  0.,  1.,  2.,  3.,  3.,  3.],
          [ 4.,  4.,  4.,  5.,  6.,  7.,  7.,  7.],
          [ 8.,  8.,  8.,  9., 10., 11., 11., 11.],
          [12., 12., 12., 13., 14., 15., 15., 15.],
          [12., 12., 12., 13., 14., 15., 15., 15.],
          [12., 12., 12., 13., 14., 15., 15., 15.]]]])
a_padded.size():
torch.Size([1, 1, 8, 8])


### `mode='circular'`

This mode is analogous to the 'wrap' mode for `numpy.pad`. This only works for 3, 4, and 5 dimensions, and the padding cannot be applied to the first two dimensions. Also, padding widths cannot exceed the size of the corresponding dimension of the input.

In [33]:
torch.manual_seed(0)
a = torch.arange(9).reshape(1, 1, 3, 3).to(torch.float)
a_padded = torch.nn.functional.pad(a, (3, 3, 3, 3), mode='circular')
print(f'a:\n{a}')
print(f'a.size():\n{a.size()}')
print(f'a_padded:\n{a_padded}')
print(f'a_padded.size():\n{a_padded.size()}')

a:
tensor([[[[0., 1., 2.],
          [3., 4., 5.],
          [6., 7., 8.]]]])
a.size():
torch.Size([1, 1, 3, 3])
a_padded:
tensor([[[[0., 1., 2., 0., 1., 2., 0., 1., 2.],
          [3., 4., 5., 3., 4., 5., 3., 4., 5.],
          [6., 7., 8., 6., 7., 8., 6., 7., 8.],
          [0., 1., 2., 0., 1., 2., 0., 1., 2.],
          [3., 4., 5., 3., 4., 5., 3., 4., 5.],
          [6., 7., 8., 6., 7., 8., 6., 7., 8.],
          [0., 1., 2., 0., 1., 2., 0., 1., 2.],
          [3., 4., 5., 3., 4., 5., 3., 4., 5.],
          [6., 7., 8., 6., 7., 8., 6., 7., 8.]]]])
a_padded.size():
torch.Size([1, 1, 9, 9])
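Many of the differences above come down to argument conventions. Below is a NumPy-only sketch that accepts `torch.nn.functional.pad`-style `pad` and `mode` arguments but has none of its dimensionality or width restrictions. The mode correspondence (`'replicate'` to `'edge'`, `'circular'` to `'wrap'`) follows the comparisons made in this notebook. The function and dictionary names are hypothetical, and the checks only cover the examples shown here.

```python
import numpy as np

# Mode correspondence assumed from the comparisons in this notebook.
TORCH_TO_NUMPY_MODE = {
    'constant': 'constant',
    'reflect': 'reflect',
    'replicate': 'edge',
    'circular': 'wrap',
}

def torch_style_pad_with_numpy(a, pad, mode='constant', value=0):
    """Hypothetical sketch: accept torch.nn.functional.pad-style arguments,
    but pad with numpy.pad."""
    if len(pad) % 2 != 0 or len(pad) > 2 * a.ndim:
        raise ValueError('pad must have an even length of at most 2 * a.ndim')
    # torch's first pair pads the last dimension, so reverse the pairs...
    pairs = [tuple(pad[i:i + 2]) for i in range(0, len(pad), 2)][::-1]
    # ...and leave the leading dimensions that `pad` does not mention unpadded.
    pad_width = [(0, 0)] * (a.ndim - len(pairs)) + pairs
    np_mode = TORCH_TO_NUMPY_MODE[mode]
    if np_mode == 'constant':
        return np.pad(a, pad_width, mode='constant', constant_values=value)
    return np.pad(a, pad_width, mode=np_mode)

# A 1-D reflect pad, which needed a reshape with torch.nn.functional.pad above
print(torch_style_pad_with_numpy(np.arange(10.0), (0, 9), mode='reflect'))
```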
