<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Maximum-(or-minimum)-value-over-the-entire-matrix" data-toc-modified-id="Maximum-(or-minimum)-value-over-the-entire-matrix-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Maximum (or minimum) value over the entire matrix</a></span></li><li><span><a href="#Minimum-(or-maximum)-value-in-every-row/column" data-toc-modified-id="Minimum-(or-maximum)-value-in-every-row/column-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Minimum (or maximum) value in every row/column</a></span></li><li><span><a href="#The-average-value-of-every-column-or-row" data-toc-modified-id="The-average-value-of-every-column-or-row-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>The average value of every column or row</a></span></li><li><span><a href="#Sort-the-rows,-so-at-the-end-every-column-goes-from-low-to-high" data-toc-modified-id="Sort-the-rows,-so-at-the-end-every-column-goes-from-low-to-high-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Sort the rows, so at the end every column goes from low to high</a></span></li><li><span><a href="#Calculate-the-(cumulative)-sum-going-across-each-column" data-toc-modified-id="Calculate-the-(cumulative)-sum-going-across-each-column-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Calculate the (cumulative) sum going across each column</a></span></li><li><span><a href="#Other-functions-that-might-be-of-interest" data-toc-modified-id="Other-functions-that-might-be-of-interest-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Other functions that might be of interest</a></span></li></ul></div>

>All content is released under Creative Commons Attribution [CC-BY 4.0](https://creativecommons.org/licenses/by/4.0/) and all source code is released under a [BSD-3 clause license](https://en.wikipedia.org/wiki/BSD_licenses). Parts of these materials were inspired by https://github.com/engineersCode/EngComp/ (CC-BY 4.0), L.A. Barba, N.C. Clementi.
>
>Please reuse, remix, revise, and reshare this content in any way, keeping this notice.
>
><img style="float: right;" width="150px" src="images/jupyter-logo.png">**Are you viewing this on jupyter.org?** Then this notebook will be read-only. <br>
>See how you can interactively run the code in this notebook by visiting our [instruction page about Notebooks](https://yint.org/notebooks). 

# Functions on the rows or columns of a matrix

In the [prior notebook](./) you learned about elementwise operations. In other words, NumPy performed the mathematical calculation on every element (entry) in the array.

Sometimes we need calculations that work on every row, or every column, of an array. We will cover:
1. Find the maximum value in the entire array (over all rows and all columns)
2. Calculate the minimum value in every row (give back a column vector that has the minimum value of every row)
3. Calculate the average value of every column (give back a row vector that has the average value of every column)
4. Sort the rows, so at the end every column goes from low to high
5. Show the (cumulative) sum going across each column
6. Other functions that might be of interest

In this notebook we will talk about matrices, but the operations can be applied to multi-dimensional arrays, with 3, 4, or more dimensions.

We also introduce the important term ***``axis``***, which you regularly see in the NumPy documentation.

## Maximum (or minimum) value over the entire matrix

You have just received all the data in your matrix, and now you wish to find the largest, or smallest value.


In [1]:
import numpy as np
rnd = np.array([[ 7, 3, 11, 12, 2], [10, 13, 8, 8, 2], [3, 13, 6, 2, 3], [5, 3, 9, 2, 6]])
print('The matrix is:\n {}'.format(rnd))

max_value = np.amax(rnd)
print('The maximum value is {}'.format(max_value))

min_value = np.amin(rnd)
print('The minimum value is {}'.format(min_value))

The matrix is:
 [[ 7  3 11 12  2]
 [10 13  8  8  2]
 [ 3 13  6  2  3]
 [ 5  3  9  2  6]]
The maximum value is 13
The minimum value is 2


The ``np.amax(...)`` and ``np.amin(...)`` functions will work along the entire array: all dimensions, looking at every element.

### Enrichment:

Conceptually, NumPy treats the array as if it were unfolded, or flattened, into a single long vector. You can see what that looks like with the ``.flatten()`` method on the array: ``rnd.flatten()``. It works row by row: across all the columns of the first row, then across the second row, and so on:

```python
print(rnd.flatten())
```
``[ 7  3 11 12  2 10 13  8  8  2  3 13  6  2  3  5  3  9  2  6]``

With NumPy's default settings, this is also the order in which the data are stored in the computer's memory.
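To see the row-by-row order on a smaller scale, here is a minimal sketch using a hypothetical 2×3 array called `small` (it is not part of the data above). The optional `order='F'` input asks for the column-by-column order instead, purely for comparison.

```python
import numpy as np

# Hypothetical small array, used only for illustration
small = np.array([[1, 2, 3],
                  [4, 5, 6]])

# Default order ('C'): across the first row, then across the second row
row_by_row = small.flatten()
print(row_by_row)

# Optional order='F': down the first column, then down the second, and so on
column_by_column = small.flatten(order='F')
print(column_by_column)
```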

The reason we point this ``.flatten(...)`` function out is because sometimes knowing what the maximum value is is only half the work. The other half is knowing *where* that maximum value is. For that we have the ``np.argmax(...)`` function.

Try this code:

```python
max_position = np.argmax(rnd)
print('The maximum value is in position {} of the flattened array'.format(max_position))
```

Verify that that is actually the case, using the space below:

In [2]:
# Copy the above code here and run it.
# In which position is the maximum value?
# And the minimum value?
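
A position in the flattened array can be translated back into a row and a column. The sketch below assumes a hypothetical array `scores`, not the `rnd` matrix above, and uses `np.unravel_index`, a function this notebook does not otherwise cover.

```python
import numpy as np

# Hypothetical data, used only for illustration
scores = np.array([[4, 9, 1],
                   [7, 2, 8]])

# Position of the largest value in the flattened array
flat_position = np.argmax(scores)

# Convert that flat position into a (row, column) pair
row, column = np.unravel_index(flat_position, scores.shape)
print('Flat position {} is row {}, column {}'.format(flat_position, row, column))
```

Remember that rows and columns are counted from 0, just like the flat position.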



## Minimum (or maximum) value in every row/column

Above we found the minimum or maximum in the entire matrix. But what if we wanted that extreme value given per row, or per column?

Think of a matrix containing the daily temperatures per city; one city per column, and every row is a day of the year. 
* What is the max or min temperature for each city (per column)?
* What is the max or min temperature each day for all cities (per row)?

For this we also use the ``np.amax(matrix, axis=...)`` or ``np.amin(matrix, axis=...)`` function.

You must specify, as a second input, along which ***``axis``*** you want that extreme value to be calculated. 
* Axis 0 is the first axis, along the direction of the rows, going from top to bottom
* Axis 1 is the next axis, along the direction of the columns, going from left to right

See the code below.

In [3]:
import numpy as np
temps = np.array([[7, 9, 12, 10], [1, 4, 5, 2], [-3, 1, -2, -3], [-2, -1, -2, -2], [-3, -1, -2, -4]])
print('The temperatures are given one column per city, each row is a daily average:\n{}'.format(temps))

max_value_0 = np.amax(temps, axis=0)
print('The maximum value along axis 0 (row-wise, per city for all days) is {}'.format(max_value_0))

max_value_1 = np.amax(temps, axis=1)
print('The maximum value along axis 1 (column-wise, per day for all cities) is {}'.format(max_value_1))

# Notice the above output is 'flattened' and returned as a row,
# instead of a column, as you might hope for. We can use the `keepdims` input though:
max_value_1_col = np.amax(temps, axis=1, keepdims=True)
print('The maximum value along axis 1 (column-wise, per day for all cities) is\n{}'.format(max_value_1_col))



The temperatures are given one column per city, each row is a daily average:
[[ 7  9 12 10]
 [ 1  4  5  2]
 [-3  1 -2 -3]
 [-2 -1 -2 -2]
 [-3 -1 -2 -4]]
The maximum value along axis 0 (row-wise, per city for all days) is [ 7  9 12 10]
The maximum value along axis 1 (column-wise, per day for all cities) is [12  5  1 -1 -1]
The maximum value along axis 1 (column-wise, per day for all cities) is
[[12]
 [ 5]
 [ 1]
 [-1]
 [-1]]


You can visually verify that the maximum values returned are what you expected.
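
A handy way to keep the two axes apart, offered here as a rule of thumb: the axis you name is the one that gets collapsed, so its length disappears from the shape of the result. A minimal sketch with a hypothetical 2×3 array `demo`:

```python
import numpy as np

demo = np.array([[1, 5, 2],
                 [4, 3, 6]])   # hypothetical data: 2 rows, 3 columns

per_column = np.amax(demo, axis=0)                 # collapses the 2 rows
per_row = np.amax(demo, axis=1)                    # collapses the 3 columns
per_row_2d = np.amax(demo, axis=1, keepdims=True)  # keeps a length-1 axis

print(per_column, per_column.shape)
print(per_row, per_row.shape)
print(per_row_2d.shape)
```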

Now try it below for the minimum values:

In [4]:
# Give the minimum temperature for all cities
# Print the minimum temperature for all days for every city


### Enrichment:

Many functions in NumPy take ***``axis``*** as an input argument, including the ``np.argmin(...)`` and ``np.argmax(...)`` functions you saw above. 

Try this in the code block above:
```python
min_position_0 = np.argmin(temps, axis=0)
print('The minimum temperature for each city occurred in position {} of each column'.format(min_position_0))
```

What position value is returned if there is more than one entry of the same minimum value (see column 3, for example, which has ``12, 5, -2, -2, -2``)?

## The average value of every column or row

Just like with the minimum or maximum value in the part above, you can expect to calculate averages per row and per column.

In [5]:
import numpy as np
temps = np.array([[7, 9, 12, 10], [1, 4, 5, 2], [-3, 1, -2, -3], [-2, -1, -2, -2], [-3, -1, -2, -4]])
print('The temperatures are given one column per city, each row is a daily average:\n{}'.format(temps))

mean_value_0 = np.mean(temps, axis=0)
print('The average value along axis 0 (row-wise, per city, over all days) is {}'.format(mean_value_0))

mean_value_1 = np.mean(temps, axis=1, keepdims=True) # <-- notice the extra input
print('The average value along axis 1 (column-wise, per day, over all cities) is:\n{}'.format(mean_value_1))

The temperatures are given one column per city, each row is a daily average:
[[ 7  9 12 10]
 [ 1  4  5  2]
 [-3  1 -2 -3]
 [-2 -1 -2 -2]
 [-3 -1 -2 -4]]
The average value along axis 0 (row-wise, per city, over all days) is [0.  2.4 2.2 0.6]
The average value along axis 1 (column-wise, per day, over all cities) is:
[[ 9.5 ]
 [ 3.  ]
 [-1.75]
 [-1.75]
 [-2.5 ]]
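
One place where `keepdims=True` pays off is when the result is combined with the original matrix again. The sketch below subtracts each day's average from that day's temperatures; the names `daily_mean` and `deviations` are only illustrative. Because the averages keep their column shape `(5, 1)`, NumPy can line them up against the `(5, 4)` matrix row by row.

```python
import numpy as np

temps = np.array([[7, 9, 12, 10], [1, 4, 5, 2], [-3, 1, -2, -3],
                  [-2, -1, -2, -2], [-3, -1, -2, -4]])

# Average per day (row), kept as a column with shape (5, 1)
daily_mean = np.mean(temps, axis=1, keepdims=True)

# How far each city is from that day's average
deviations = temps - daily_mean
print(deviations)

# Each row of deviations should now average to (approximately) zero
print(np.mean(deviations, axis=1))
```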


## Sort the rows, so at the end every column goes from low to high

Just as you might have been interested in finding the [minimum or maximum](#Minimum-(or-maximum)-value-in-every-row/column) value in every column, you might also be interested in sorting each column.

We want to sort every column **independently** of the others. In other words, every column will run from low to high by the end. This is in contrast to sorting the rows based on the values in one column, where the entries in the other columns move along with their row.
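
For comparison, that other kind of sort, where whole rows are reordered according to one column, could be sketched as follows. It uses `np.argsort`, which is not covered in this notebook, and the choice of the first column as the sorting key is just an example.

```python
import numpy as np

temps = np.array([[7, 9, 12, 10], [1, 4, 5, 2], [-3, 1, -2, -3],
                  [-2, -1, -2, -2], [-3, -1, -2, -4]])

# Row order that would sort the first column from low to high.
# kind='stable' keeps tied rows in their original relative order.
row_order = np.argsort(temps[:, 0], kind='stable')

# Reorder entire rows; the values within each row stay together
sorted_by_first_city = temps[row_order]
print(sorted_by_first_city)
```

Only the first column is guaranteed to run from low to high here; the other columns simply follow their rows.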

We have seen the ***``axis``*** input several times now, and here we will use it again to indicate which axis we would like to sort in.

In [6]:
import numpy as np
temps = np.array([[7, 9, 12, 10], [1, 4, 5, 2], [-3, 1, -2, -3], [-2, -1, -2, -2], [-3, -1, -2, -4]])
print('The temperatures are given one column per city, each row is a daily average:\n{}'.format(temps))

sorted_columns = np.sort(temps, axis=0)
print('The temperatures sorted in each column (along axis 0): \n{}'.format(sorted_columns))

sorted_rows = np.sort(temps, axis=1)
print('The temperatures sorted in each row (along axis 1): \n{}'.format(sorted_rows))

print('To be sure, the original data are left unchanged:\n{}'.format(temps))

The temperatures are given one column per city, each row is a daily average:
[[ 7  9 12 10]
 [ 1  4  5  2]
 [-3  1 -2 -3]
 [-2 -1 -2 -2]
 [-3 -1 -2 -4]]
The temperatures sorted in each column (along axis 0): 
[[-3 -1 -2 -4]
 [-3 -1 -2 -3]
 [-2  1 -2 -2]
 [ 1  4  5  2]
 [ 7  9 12 10]]
The temperatures sorted in each row (along axis 1): 
[[ 7  9 10 12]
 [ 1  2  4  5]
 [-3 -3 -2  1]
 [-2 -2 -2 -1]
 [-4 -3 -2 -1]]
To be sure, the original data are left unchanged:
[[ 7  9 12 10]
 [ 1  4  5  2]
 [-3  1 -2 -3]
 [-2 -1 -2 -2]
 [-3 -1 -2 -4]]


In the code above we see that the sort takes place and the result is provided in a new matrix, as the output:
```python
output_array = np.sort(input_array, axis=...)
```

This can be inefficient, especially if the ``input_array`` is really large: a copy of the data is made first, using extra memory and computer time, and only then is the copy sorted.

It is possible to simply sort the original array. This is called ***in-place sorting***. You will see that terminology in several places in NumPy's documentation: ***in-place***. It means that the original matrix is used, calculated on, and the result is in the same variable as the original matrix.

You perform an in-place sort as follows:
```python
input_array.sort(axis=...)
```
By definition an in-place operation means there is no need to assign the result to another variable as output.

Let's see what the implication of in-place sorting is:

In [7]:
import numpy as np
temps = np.array([[7, 9, 12, 10], [1, 4, 5, 2], [-3, 1, -2, -3], [-2, -1, -2, -2], [-3, -1, -2, -4]])
print('The temperatures are given one column per city, each row is a daily average:\n{}'.format(temps))

# In-place sort. We don't need to use `output=` but let's see what happens.
# The result of the in-place sort is stored in the original variable "temps"
output = temps.sort(axis=0)  
print('The sorted values along axis 0: \n{}'.format(temps))
print('Out of curiosity, the value of "output" is: {}'.format(output))

# So you can simply say:
temps.sort(axis=0)

# and the result will be sorted in the original variable.

The temperatures are given one column per city, each row is a daily average:
[[ 7  9 12 10]
 [ 1  4  5  2]
 [-3  1 -2 -3]
 [-2 -1 -2 -2]
 [-3 -1 -2 -4]]
The sorted values along axis 0: 
[[-3 -1 -2 -4]
 [-3 -1 -2 -3]
 [-2  1 -2 -2]
 [ 1  4  5  2]
 [ 7  9 12 10]]
Out of curiosity, the value of "output" is: None


### Enrichment

1. What if you want to sort the entire array, as if it were one long sequence of numbers? See the sections above for a hint.
2. The above code examples have all sorted from lowest values to highest. How can you sort from largest to smallest? Try this:
```python
np.sort(input_array)[::-1]
```
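
For a matrix, it matters which axis the reversal is applied to: `[::-1]` on its own reverses the order of the rows. A minimal sketch, assuming a hypothetical 3×2 array `values`, of sorting from largest to smallest along either axis:

```python
import numpy as np

values = np.array([[3, 8],
                   [9, 1],
                   [5, 4]])   # hypothetical data

# Every column from high to low: sort down the rows, then reverse the row order
columns_descending = np.sort(values, axis=0)[::-1]
print(columns_descending)

# Every row from high to low: sort across the columns, then reverse the column order
rows_descending = np.sort(values, axis=1)[:, ::-1]
print(rows_descending)
```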

## Calculate the (cumulative) sum going across each column

If the values in the rows or columns represent some property, such as weight in a container, then it might be interesting to calculate the cumulative value. 

The values in columns 2, 3, 4 and 5 represent the weight (in kilograms) added to each of 4 containers. Each row represents 1 minute. The first column is a counter. 

**The goal**: Find the point in time when the weight in the container just exceeds 100kg. You will see why we have a counter as column 1.

In [8]:
import numpy as np
n = 20
k = 4

counter = np.ones(shape=(n, 1)) 
weights = np.random.randint(low=4, high=9, size=(n, k))  # whole numbers from 4 to 8 kg
weight_matrix = np.hstack((counter, weights))

print('The weight added to the {0} containers every minute [ignore the first column]:\n{1}'.format(k, weight_matrix))

accumulation = np.cumsum(weight_matrix, axis=0)
print('The cumulative weight over time is:\n{}'.format(accumulation))
print('At which minute in time does the container weight exceed 100kg?')

The weight added to the 4 containers every minute [ignore the first column]:
[[1. 5. 8. 6. 7.]
 [1. 5. 8. 7. 5.]
 [1. 6. 8. 4. 4.]
 [1. 6. 4. 6. 5.]
 [1. 7. 7. 4. 4.]
 [1. 4. 8. 4. 7.]
 [1. 6. 4. 7. 5.]
 [1. 4. 7. 4. 4.]
 [1. 7. 5. 8. 4.]
 [1. 5. 8. 8. 8.]
 [1. 4. 8. 8. 4.]
 [1. 7. 5. 4. 5.]
 [1. 5. 7. 4. 7.]
 [1. 4. 7. 7. 8.]
 [1. 4. 5. 5. 8.]
 [1. 8. 8. 6. 8.]
 [1. 8. 7. 4. 5.]
 [1. 4. 7. 7. 8.]
 [1. 5. 7. 8. 8.]
 [1. 7. 7. 8. 6.]]
The cumulative weight over time is:
[[  1.   5.   8.   6.   7.]
 [  2.  10.  16.  13.  12.]
 [  3.  16.  24.  17.  16.]
 [  4.  22.  28.  23.  21.]
 [  5.  29.  35.  27.  25.]
 [  6.  33.  43.  31.  32.]
 [  7.  39.  47.  38.  37.]
 [  8.  43.  54.  42.  41.]
 [  9.  50.  59.  50.  45.]
 [ 10.  55.  67.  58.  53.]
 [ 11.  59.  75.  66.  57.]
 [ 12.  66.  80.  70.  62.]
 [ 13.  71.  87.  74.  69.]
 [ 14.  75.  94.  81.  77.]
 [ 15.  79.  99.  86.  85.]
 [ 16.  87. 107.  92.  93.]
 [ 17.  95. 114.  96.  98.]
 [ 18.  99. 121. 103. 106.]
 [ 19. 104. 128. 111. 114.]
 [ 20. 111. 135. 119. 120.]]
At which minute in time does the container weight exceed 100kg?
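
One possible way to find the minute at which each container first exceeds a limit, in code rather than by reading the printout, is sketched below. It uses a small hypothetical data set (2 containers, 5 minutes, and a 20 kg limit instead of 100 kg) so the result is easy to check by hand. The counter column, once accumulated, gives the minute number directly.

```python
import numpy as np

limit = 20  # hypothetical limit, in kg, chosen to suit the small example

# Column 0 is the counter; columns 1 and 2 are the weights added per minute
weight_matrix = np.array([[1, 5, 8],
                          [1, 5, 8],
                          [1, 6, 8],
                          [1, 6, 4],
                          [1, 7, 7]])
accumulation = np.cumsum(weight_matrix, axis=0)

# True wherever the accumulated weight is above the limit
exceeded = accumulation[:, 1:] > limit

# argmax along axis 0 gives the first row in which each column is True.
# It also returns 0 for a column that never exceeds the limit, so check for that.
ever_exceeded = exceeded.any(axis=0)
first_row = np.argmax(exceeded, axis=0)

# Look up the minute in the accumulated counter column
minutes = accumulation[first_row, 0]
print('Did each container exceed the limit? {}'.format(ever_exceeded))
print('Minute at which each container first exceeds {} kg: {}'.format(limit, minutes))
```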

## Other functions that might be of interest

Investigate these functions as enrichment:
1. The 'distance' from the maximum to the minimum value, also called the range, or 'peak to peak': ``np.ptp(..., axis=...)``
2. The standard deviation (ignore this if you do not know yet what the standard deviation is): ``np.std(..., axis=...)``
3. The cumulative product, similar to the cumulative sum, except for multiplication: ``np.cumprod(..., axis=...)``
4. The difference, from row to row: ``np.diff(..., axis=...)``
5. Randomly shuffle the values around in the array: ``np.random.shuffle(...)``. It only shuffles along the first dimension of the array (the axis=0 direction): whole rows change places, while the values within each row stay together. The shuffle is done in-place.

In [9]:
import numpy as np
temps = np.array([[7, 9, 12, 10], [1, 4, 5, 2], [-3, 1, -2, -3], [-2, -1, -2, -2], [-3, -1, -2, -4]])
print('The temperatures are given one column per city, each row is a daily average:\n{}'.format(temps))

print('The range, from maximum to minimum, for every city is {} degrees.'.format(np.ptp(temps, axis=0)))

print('The standard deviation for each city is {} degrees'.format(np.std(temps, axis=0)))

print('The cumulative product of the temperature values is:\n{}'.format(np.cumprod(temps, axis=0)))

# Notice that the output has one fewer row than the input
print('The difference from one day to the next of the temperatures is:\n{}'.format(np.diff(temps, axis=0)))

np.random.shuffle(temps)
print('The values, shuffled from row-to-row, are shuffled in-place:\n{}'.format(temps))

The temperatures are given one column per city, each row is a daily average:
[[ 7  9 12 10]
 [ 1  4  5  2]
 [-3  1 -2 -3]
 [-2 -1 -2 -2]
 [-3 -1 -2 -4]]
The range, from maximum to minimum, for every city is [10 10 14 14] degrees.
The standard deviation for each city is [3.79473319 3.77359245 5.6        5.12249939] degrees
The cumulative product of the temperature values is:
[[   7    9   12   10]
 [   7   36   60   20]
 [ -21   36 -120  -60]
 [  42  -36  240  120]
 [-126   36 -480 -480]]
The difference from one day to the next of the temperatures is:
[[-6 -5 -7 -8]
 [-4 -3 -7 -5]
 [ 1 -2  0  1]
 [-1  0  0 -2]]
The values, shuffled from row-to-row, are shuffled in-place:
[[-3 -1 -2 -4]
 [-3  1 -2 -3]
 [-2 -1 -2 -2]
 [ 7  9 12 10]
 [ 1  4  5  2]]
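
Two of these functions can be cross-checked against functions used earlier in this notebook. The sketch below, using the same `temps` data, checks that the range equals the maximum minus the minimum, and that taking row-to-row differences of a cumulative sum gives back the original rows (all except the first).

```python
import numpy as np

temps = np.array([[7, 9, 12, 10], [1, 4, 5, 2], [-3, 1, -2, -3],
                  [-2, -1, -2, -2], [-3, -1, -2, -4]])

# Range per city, calculated in two ways
range_ptp = np.ptp(temps, axis=0)
range_manual = np.amax(temps, axis=0) - np.amin(temps, axis=0)
print(np.array_equal(range_ptp, range_manual))

# np.diff reverses np.cumsum, apart from the first row
recovered = np.diff(np.cumsum(temps, axis=0), axis=0)
print(np.array_equal(recovered, temps[1:]))
```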
