## Problem

Rotate a square matrix clockwise by 90 degrees, e.g.,
$$
    \begin{array}{cccc}
         1 &  2 &  3 &  4  \\
         5 &  6 &  7 &  8  \\
         9 & 10 & 11 & 12  \\
        13 & 14 & 15 & 16
    \end{array}
$$
becomes
$$
    \begin{array}{cccc}
        13 &  9 &  5 &  1  \\
        14 & 10 &  6 &  2  \\
        15 & 11 &  7 &  3  \\
        16 & 12 &  8 &  4
    \end{array}
$$
The rotation should happen in place, i.e., no auxiliary matrix can be used.

Several implementations are possible.

## Helper functions

First, you can write a number of functions to help test and visualize the solutions:
  * a function to create an $n \times n$ array with integer elements from $1$ to $n^2$,
  * a function to pretty-print such a matrix, and
  * a function to print the original matrix, run an algorithm, and print the result.

In [1]:
def create_matrix(size):
    return [list(range(1 + size*start, 1 + size*(start + 1))) for start in range(size)]

In [2]:
def print_matrix(matrix):
    print('\n'.join(' '.join(f'{x:3d}' for x in row) for row in matrix))

In [3]:
matrix4 = create_matrix(4)

In [4]:
print_matrix(matrix4)

  1   2   3   4
  5   6   7   8
  9  10  11  12
 13  14  15  16


In [5]:
def test_rotation(size, rotator, is_numpy=False):
    matrix = create_matrix(size)
    if is_numpy:
        matrix = np.array(matrix)
    print_matrix(matrix)
    matrix = rotator(matrix)
    print()
    print_matrix(matrix)

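Note that we make provisions for handling numpy arrays, although this will not be required for the pure Python solutions.  The name `np` is only imported further on, which is fine since it is looked up when `test_rotation` is called with `is_numpy=True`, not when the function is defined.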

## Solution 1

Observe that if you first transpose the matrix, i.e., swap the elements above the diagonal with those below it, and then exchange the first and last columns, the second and second-to-last, and so on, you obtain the matrix rotated clockwise by 90 degrees.

In [6]:
def inplace_rotate1(matrix):
    n = len(matrix)
    # flip matrix along diagonal
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j], matrix[j][i] = matrix[j][i], matrix[i][j]
    # swap columns
    for j in range(n//2):
        for i in range(n):
            matrix[i][j], matrix[i][n - j - 1] = matrix[i][n - j - 1], matrix[i][j]
    return matrix

Below you can test for a $4 \times 4$ and a $5 \times 5$ matrix.

In [7]:
test_rotation(4, inplace_rotate1)

  1   2   3   4
  5   6   7   8
  9  10  11  12
 13  14  15  16

 13   9   5   1
 14  10   6   2
 15  11   7   3
 16  12   8   4


In [8]:
test_rotation(5, inplace_rotate1)

  1   2   3   4   5
  6   7   8   9  10
 11  12  13  14  15
 16  17  18  19  20
 21  22  23  24  25

 21  16  11   6   1
 22  17  12   7   2
 23  18  13   8   3
 24  19  14   9   4
 25  20  15  10   5
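
To make the two steps of this solution visible, here is a minimal sketch that applies them one at a time to a $3 \times 3$ list of lists and prints the intermediate state.  The function names `transpose_in_place` and `reverse_rows_in_place` are illustrative and not part of the notebook; reversing each row with `list.reverse` is an assumed alternative to the explicit column swap and only works for lists, not for numpy arrays.

```python
def transpose_in_place(matrix):
    """Swap the elements above the diagonal with those below it."""
    n = len(matrix)
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j], matrix[j][i] = matrix[j][i], matrix[i][j]
    return matrix


def reverse_rows_in_place(matrix):
    """Reverse every row, which exchanges column j with column n - j - 1."""
    for row in matrix:
        row.reverse()  # list.reverse works in place (lists only)
    return matrix


m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
transpose_in_place(m)
print(m)  # [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
reverse_rows_in_place(m)
print(m)  # [[7, 4, 1], [8, 5, 2], [9, 6, 3]]
```

After the transpose, the first row of the original has become the first column; reversing the rows then moves it to the last column, which is where a clockwise rotation puts it.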


## Solution 2

With some index fiddling, it is possible to obtain a faster algorithm by rotating quadrants of the matrix, so that each element is moved only once.  The first quadrant should go to the second, that one to the third, and so on.  If $n$ is even, the quadrants are squares; otherwise they are rectangles and the center element stays in place.

In [9]:
def inplace_rotate2(matrix):
    n = len(matrix)
    for i in range((n + 1)//2):
        for j in range(n//2):
            matrix[i][j], matrix[n - j - 1][i], \
                matrix[n - i - 1][n - j - 1], matrix[j][n - i - 1] = \
                    matrix[n - j - 1][i], matrix[n - i - 1][n - j - 1], \
                        matrix[j][n - i - 1], matrix[i][j]
    return matrix

Again, you can test for a $4 \times 4$ and a $5 \times 5$ matrix.

In [10]:
test_rotation(4, inplace_rotate2)

  1   2   3   4
  5   6   7   8
  9  10  11  12
 13  14  15  16

 13   9   5   1
 14  10   6   2
 15  11   7   3
 16  12   8   4


In [11]:
test_rotation(5, inplace_rotate2)

  1   2   3   4   5
  6   7   8   9  10
 11  12  13  14  15
 16  17  18  19  20
 21  22  23  24  25

 21  16  11   6   1
 22  17  12   7   2
 23  18  13   8   3
 24  19  14   9   4
 25  20  15  10   5
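
The index arithmetic of this solution can be checked separately.  Under a clockwise rotation, the element at position $(i, j)$ moves to $(j, n - 1 - i)$, so four positions form a cycle.  The sketch below, with the hypothetical helpers `rotation_cycle` and `covered_positions`, lists such a cycle and verifies that the loop bounds `(n + 1)//2` and `n//2` visit every position exactly once, except the center when $n$ is odd.

```python
def rotation_cycle(i, j, n):
    """Return the four positions visited when (i, j) is rotated clockwise."""
    positions = [(i, j)]
    for _ in range(3):
        row, col = positions[-1]
        positions.append((col, n - 1 - row))
    return positions


def covered_positions(n):
    """Collect all positions touched by the loops of the quadrant algorithm."""
    seen = []
    for i in range((n + 1)//2):
        for j in range(n//2):
            seen.extend(rotation_cycle(i, j, n))
    return seen


print(rotation_cycle(0, 1, 4))  # [(0, 1), (1, 3), (3, 2), (2, 0)]

for n in range(1, 8):
    seen = covered_positions(n)
    # no position is visited twice
    assert len(seen) == len(set(seen))
    # everything is visited, except the center element for odd n
    assert len(seen) == n*n - n % 2
```

In the $4 \times 4$ example, this cycle corresponds to the values 2, 8, 15 and 9.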


## Solution 3

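Using numpy, the problem can trivially be solved since this library has a `rot90` function that implements this matrix transformation.  By default, `rot90` rotates counterclockwise, so to obtain a clockwise rotation, you have to swap the default axes of the array.  Note that `rot90` returns a rotated view of the array rather than modifying it in place, so despite its name, `inplace_rotate3` does not alter its argument.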

In [12]:
import numpy as np

In [13]:
def inplace_rotate3(matrix):
    return np.rot90(matrix, axes=(1, 0))

In [14]:
test_rotation(4, inplace_rotate3, is_numpy=True)

  1   2   3   4
  5   6   7   8
  9  10  11  12
 13  14  15  16

 13   9   5   1
 14  10   6   2
 15  11   7   3
 16  12   8   4
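
Since `rot90` returns a view, the result shares its data with the original array.  The following minimal sketch illustrates this with `np.shares_memory` on a small example array; it also shows that passing `k=-1` is an equivalent way to request a clockwise rotation.

```python
import numpy as np

a = np.arange(1, 10).reshape(3, 3)
rotated = np.rot90(a, axes=(1, 0))

# the rotated array is a view on the same data
print(np.shares_memory(a, rotated))  # True

# the original array itself is not rearranged
print(a[0, 0])  # 1

# writing through the view changes the original:
# rotated[0, 0] refers to the same element as a[2, 0]
rotated[0, 0] = -1
print(a[2, 0])  # -1

# k=-1 with the default axes gives the same clockwise rotation
assert np.array_equal(np.rot90(a, k=-1), np.rot90(a, axes=(1, 0)))
```

If an independent rotated array is needed, calling `.copy()` on the result creates one, at the cost of allocating a second matrix.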


## Verification

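To verify whether the three solutions yield the same results, you can compare them pairwise for a range of matrix sizes.  Note that `itertools.pairwise` requires Python 3.10 or later.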

In [15]:
import itertools

In [16]:
for n in range(1, 10):
    solutions = [rotator(create_matrix(n))
                 for rotator in (inplace_rotate1, inplace_rotate2, inplace_rotate3)]
    for m_a, m_b in itertools.pairwise(solutions):
        assert np.array_equal(m_a, m_b)
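
The pairwise comparison establishes that the solutions agree with each other, but not that they agree with an independently computed rotation.  As a complementary check, the sketch below compares a rotator against a simple out-of-place reference that reverses the rows and then transposes with `zip`.  The names `rotate_reference` and `check_rotator` are hypothetical helpers, not part of the notebook.

```python
def rotate_reference(matrix):
    """Out-of-place clockwise rotation: reverse the rows, then transpose."""
    return [list(row) for row in zip(*matrix[::-1])]


def check_rotator(rotator, sizes=range(1, 10)):
    """Compare the result of rotator with the reference for several sizes."""
    for n in sizes:
        original = [list(range(1 + n*r, 1 + n*(r + 1))) for r in range(n)]
        expected = rotate_reference(original)
        # pass a copy, since the rotator may modify its argument
        result = rotator([row[:] for row in original])
        assert len(result) == len(expected), f'wrong size for n = {n}'
        # compare row by row so that lists and numpy arrays both work
        assert all(
            list(actual_row) == expected_row
            for actual_row, expected_row in zip(result, expected)
        ), f'mismatch for n = {n}'


print(rotate_reference([[1, 2], [3, 4]]))  # [[3, 1], [4, 2]]
check_rotator(rotate_reference)
```

In the notebook, a call such as `check_rotator(inplace_rotate1)` would then test a solution against the reference.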

## Performance

To compare the performance of the three solutions, you can create a large matrix, say $1000 \times 1000$, and use `%timeit` to time the rotations.

In [17]:
n = 1_000

In [18]:
matrix1000 = create_matrix(n)

In [19]:
%timeit inplace_rotate1(matrix1000)

107 ms ± 2.55 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [20]:
%timeit inplace_rotate2(matrix1000)

77.6 ms ± 2.35 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [21]:
%timeit inplace_rotate3(matrix1000)

35.6 ms ± 597 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


The solution that uses numpy's `rot90` function is more than twice as fast as the pure Python implementations.  However, if you use a numpy array as input, the difference is even more striking.

In [22]:
array1000 = np.arange(1, n**2 + 1).reshape(n, n)

In [23]:
%timeit inplace_rotate1(array1000)

686 ms ± 5.09 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [24]:
%timeit inplace_rotate2(array1000)

402 ms ± 1.51 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [25]:
%timeit inplace_rotate3(array1000)

5.22 µs ± 34.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


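As can be expected, applying the pure Python implementations to numpy arrays is terribly slow, while eliminating the conversion of the list of lists by providing a numpy array as input directly gives a very marked performance advantage.  Keep in mind that `rot90` returns a view, so the last timing measures the creation of that view rather than the cost of moving the $10^6$ elements.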
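
Because `rot90` returns a view, timing it on a numpy array does not include any data movement.  To get an impression of the cost when the rotated data is actually materialized, you could time a variant that forces a copy.  The sketch below uses the standard `timeit` module so that it also runs outside a notebook; `rotate_view` and `rotate_copy` are hypothetical names, and the timings depend on the machine, so no output is shown.

```python
import timeit

import numpy as np

n = 1_000
array = np.arange(1, n**2 + 1).reshape(n, n)


def rotate_view(matrix):
    # returns a view, no elements are moved
    return np.rot90(matrix, axes=(1, 0))


def rotate_copy(matrix):
    # forces the rotated data into a new, contiguous array
    return np.ascontiguousarray(np.rot90(matrix, axes=(1, 0)))


number = 20
for func in (rotate_view, rotate_copy):
    # best of 3 repetitions, each consisting of `number` calls
    best = min(timeit.repeat(lambda: func(array), number=number, repeat=3))
    print(f'{func.__name__}: {best/number*1e6:.1f} µs per call')
```

Note that `rotate_copy` allocates a second matrix, so it is not an in-place rotation in the sense of the problem statement.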