# 7. Other Useful NumPy Functions

We start by importing NumPy under the alias `np`.

In [1]:
import numpy as np

In this part, we will look at some other useful NumPy functions.

The function `np.where(condition, x, y)` builds a new array element by element, taking the value from `x` where `condition` is true and from `y` where it is false. An example makes this clearer.

In [2]:
arr = np.array([10, 20, 30, 40, 50])

condition = arr > 30
x = np.array([-1, -2, -3, -4, -5])
y = np.array([1, 2, 3, 4, 5])
result = np.where(condition, x, y)

print(f"Original Array: {arr}")
print(f"Condition (array > 30): {condition}")
print(f"x array (selected when True): {x}")
print(f"y array (selected when False): {y}")
print(f"Result of np.where: {result}")

Original Array: [10 20 30 40 50]
Condition (array > 30): [False False False  True  True]
x array (selected when True): [-1 -2 -3 -4 -5]
y array (selected when False): [1 2 3 4 5]
Result of np.where: [ 1  2  3 -4 -5]
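
The `x` and `y` arguments do not have to be full arrays. The sketch below uses a made-up array `values` to show two other common uses: passing scalars, which are broadcast against the condition, and passing only a condition, in which case `np.where()` returns the indices where the condition is true.

```python
import numpy as np

values = np.array([10, 20, 30, 40, 50])  # illustrative data, not from the dataset

# Scalars are broadcast: 1 where the condition holds, 0 everywhere else
flags = np.where(values > 30, 1, 0)
print(f"Flags: {flags}")

# With only a condition, np.where returns a tuple with one index array per dimension
(indices,) = np.where(values > 30)
print(f"Indices where values > 30: {indices}")
```

The scalar form is handy when we want to turn a condition into numeric labels.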


The function `np.unique()` returns the sorted unique elements of an array. If we pass `return_counts=True`, it also returns how many times each unique element appears in the original array. With the `axis=` argument, we can instead find the unique slices along a given axis, for example the unique rows of a 2D array.

In [62]:
arr = np.array([1, 2, 2, 3, 1, 5, 6, 5])
unique_elements = np.unique(arr)

print(f"Original Array: {arr}")
print(f"Unique Elements: {unique_elements}")

# With counting
unique_elements, counts = np.unique(arr, return_counts=True)
print(f"Unique Elements: {unique_elements}")
print(f"Counts: {counts}")

Original Array: [1 2 2 3 1 5 6 5]
Unique Elements: [1 2 3 5 6]
Unique Elements: [1 2 3 5 6]
Counts: [2 2 1 2 1]
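
Because the unique values and their counts line up position by position, the counts can be used to look up values. As a small sketch with made-up data, the most frequent element can be found by combining `return_counts=True` with `np.argmax()`:

```python
import numpy as np

arr = np.array([1, 2, 2, 3, 1, 5, 6, 5, 2])  # illustrative data

values, counts = np.unique(arr, return_counts=True)

# counts[i] belongs to values[i], so the index of the largest count
# points at the most frequent value
most_common = values[np.argmax(counts)]
print(f"Most common value: {most_common}")
```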


In [63]:
# Unique values along an axis (unique rows in this case)
arr = np.array([[1, 0],
                [3, 1],
                [1, 1],
                [1, 0],
                [6, 2],
                [3, 1],
                [1, 3],
                [1, 0]])

unique_elements = np.unique(arr, axis=0)
print(f"Original Array:\n{arr}")
print(f"Unique Elements:\n{unique_elements}")

Original Array:
[[1 0]
 [3 1]
 [1 1]
 [1 0]
 [6 2]
 [3 1]
 [1 3]
 [1 0]]
Unique Elements:
[[1 0]
 [1 1]
 [1 3]
 [3 1]
 [6 2]]
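
The `axis=` and `return_counts=` arguments can be combined. The following sketch, on a small made-up array `rows`, counts how often each distinct row occurs:

```python
import numpy as np

rows = np.array([[1, 0],
                 [3, 1],
                 [1, 0],
                 [3, 1],
                 [1, 0]])  # illustrative data

# One count per unique row, in the same order as unique_rows
unique_rows, row_counts = np.unique(rows, axis=0, return_counts=True)

for row, count in zip(unique_rows, row_counts):
    print(f"Row {row} appears {count} time(s)")
```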


There are two main sorting functions provided by NumPy:
- `np.sort()` - returns a sorted copy of an array.
- `np.argsort()` - returns the indices that would sort the array.

In [64]:
array = np.array([3, 1, 2, 5, 4])

sorted_array = np.sort(array)
sorted_indices = np.argsort(array)

print(f"Original Array: {array}")
print(f"Sorted Array: {sorted_array}")
print(f"Indices of Sorted Array: {sorted_indices}")

Original Array: [3 1 2 5 4]
Sorted Array: [1 2 3 4 5]
Indices of Sorted Array: [1 2 0 4 3]
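
The indices from `np.argsort()` are mostly useful for reordering a second array that belongs to the first. The sketch below assumes a hypothetical `names` array with one label per score; indexing with the sorted indices keeps scores and labels aligned.

```python
import numpy as np

scores = np.array([3, 1, 2, 5, 4])
names = np.array(["c", "a", "b", "e", "d"])  # hypothetical labels, one per score

order = np.argsort(scores)

# Indexing with the argsort result reproduces np.sort
assert np.array_equal(scores[order], np.sort(scores))

# The same indices reorder the related array
print(f"Names by increasing score: {names[order]}")

# Reversing the index array gives descending order
print(f"Names by decreasing score: {names[order[::-1]]}")
```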


Both functions also support the `axis=` argument.

In [65]:
array_2d = np.array([[3, 1, 4],
                     [1, 5, 9],
                     [2, 6, 5]])

# Sort along axis 0 (columns)
sorted_axis0 = np.sort(array_2d, axis=0)
sorted_indices_axis0 = np.argsort(array_2d, axis=0)

# Sort along axis 1 (rows)
sorted_axis1 = np.sort(array_2d, axis=1)
sorted_indices_axis1 = np.argsort(array_2d, axis=1)

print("Original 2D Array:")
print(array_2d)
print()

print("Sorted along Axis 0 (columns):")
print("Sorted Array:")
print(sorted_axis0)
print("Indices of Sorted Array:")
print(sorted_indices_axis0)
print()

print("Sorted along Axis 1 (rows):")
print("Sorted Array:")
print(sorted_axis1)
print("Indices of Sorted Array:")
print(sorted_indices_axis1)

Original 2D Array:
[[3 1 4]
 [1 5 9]
 [2 6 5]]

Sorted along Axis 0 (columns):
Sorted Array:
[[1 1 4]
 [2 5 5]
 [3 6 9]]
Indices of Sorted Array:
[[1 0 0]
 [2 1 2]
 [0 2 1]]

Sorted along Axis 1 (rows):
Sorted Array:
[[1 3 4]
 [1 5 9]
 [2 5 6]]
Indices of Sorted Array:
[[1 0 2]
 [0 1 2]
 [0 2 1]]
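
For multi-dimensional arrays, the index array from `np.argsort()` cannot be used with plain indexing in the same way as in the 1D case. One option, sketched here, is `np.take_along_axis()`, which applies the indices along the chosen axis:

```python
import numpy as np

array_2d = np.array([[3, 1, 4],
                     [1, 5, 9],
                     [2, 6, 5]])

# Indices that sort each row
idx = np.argsort(array_2d, axis=1)

# Apply the indices row by row
rows_sorted = np.take_along_axis(array_2d, idx, axis=1)

# Same result as sorting directly along axis 1
assert np.array_equal(rows_sorted, np.sort(array_2d, axis=1))
print(rows_sorted)
```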


## Exercise

You are given two files, `data_dates.npy` and `measurements.npy`. Both files contain NumPy arrays. The file `data_dates.npy` contains an array of shape `(367,)` of date strings. The file `measurements.npy` contains an array of shape `(367, 2)`, where the first and second columns hold the average temperature and the amount of precipitation for a given day. The measurements are taken from [Seklima](https://seklima.met.no/observations/) and were recorded at Florida, Bergen.

In [66]:
dates = np.load("data_dates.npy", allow_pickle=True)
measurements = np.load("measurements.npy")

print("First 10 rows of data:")
print("Date\t\tTemperature\tPrecipitation")
for date, temp, prec in zip(dates[:10], measurements[:10, 0], measurements[:10, 1]):
    print(f"{date}\t{temp}\t\t{prec}")

First 10 rows of data:
Date		Temperature	Precipitation
14.06.2023	19.4		0.0
15.06.2023	20.2		0.0
16.06.2023	18.5		0.0
17.06.2023	17.4		0.0
18.06.2023	18.9		0.0
19.06.2023	19.9		0.0
20.06.2023	20.1		0.3
21.06.2023	17.9		0.2
22.06.2023	14.7		7.0
23.06.2023	14.7		1.6


### Task 1

Using `np.argsort()`, find and print the 10 dates with the highest average temperatures and the 10 dates with the lowest average temperatures in the dataset.

Print each date together with the temperature on that day. The output should look something like this:

```
Top 10 dates with the highest temperatures
14.06.2023 Temperature: 19.4
21.05.2024 Temperature: 19.4
08.09.2023 Temperature: 19.5
...

Top 10 dates with the lowest temperatures
06.01.2024 Temperature: -7.5
05.01.2024 Temperature: -6.9
09.02.2024 Temperature: -5.2
...
```

**Hint:** You can extract the temperatures from `measurements` by indexing the first column with `measurements[:, 0]`. This will give you a 1-dimensional array.

In [67]:
# Your code here
sorted_idxs = ...
highest_idxs = ...
lowest_idxs = ...

# Solution
sorted_idxs = np.argsort(measurements[:, 0])
highest_idxs = sorted_idxs[-10:]
lowest_idxs = sorted_idxs[:10]

print("Top 10 dates with the highest temperatures")
for idx in highest_idxs:
    print(f"{dates[idx]} Temperature: {measurements[idx, 0]}")


print("\nTop 10 dates with the lowest temperatures")
for idx in lowest_idxs:
    print(f"{dates[idx]} Temperature: {measurements[idx, 0]}")

Top 10 dates with the highest temperatures
14.06.2023 Temperature: 19.4
21.05.2024 Temperature: 19.4
08.09.2023 Temperature: 19.5
19.06.2023 Temperature: 19.9
20.06.2023 Temperature: 20.1
15.06.2023 Temperature: 20.2
14.05.2024 Temperature: 20.3
22.05.2024 Temperature: 20.9
09.07.2023 Temperature: 21.4
23.05.2024 Temperature: 21.4

Top 10 dates with the lowest temperatures
06.01.2024 Temperature: -7.5
05.01.2024 Temperature: -6.9
09.02.2024 Temperature: -5.2
17.01.2024 Temperature: -5.1
07.01.2024 Temperature: -4.8
04.01.2024 Temperature: -4.6
01.12.2023 Temperature: -4.3
23.12.2023 Temperature: -3.8
08.02.2024 Temperature: -3.3
28.11.2023 Temperature: -3.3


### Task 2

Use `np.argmin()` and `np.argmax()` to find the day with the lowest temperature and the day with the highest temperature.

Does this agree with your results from the previous task? What happens when there is a tie (as in this case for `argmax`)?

In [68]:
# Your code here
temperatures = ...
max_idx = ...
min_idx = ...

# Solution
temperatures = measurements[:, 0]
max_idx = np.argmax(temperatures)
min_idx = np.argmin(temperatures)

print(f"Lowest temperature was {temperatures[min_idx]} on {dates[min_idx]}")
print(f"Highest temperature was {temperatures[max_idx]} on {dates[max_idx]}")

Lowest temperature was -7.5 on 06.01.2024
Highest temperature was 21.4 on 09.07.2023
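
On ties: `np.argmax()` and `np.argmin()` return the index of the first occurrence only. `np.argsort()` with its default sorting algorithm does not guarantee the relative order of equal elements, so the two approaches can point at different days with the same temperature. The sketch below uses made-up temperatures to show how to recover all tied positions:

```python
import numpy as np

temps = np.array([20.9, 21.4, 19.5, 21.4])  # made-up values containing a tie

# Only the first position of the maximum is returned
first_max = np.argmax(temps)
print(f"First index of the maximum: {first_max}")

# All positions that share the maximum value
all_max = np.flatnonzero(temps == temps.max())
print(f"All indices of the maximum: {all_max}")
```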


### Task 3

Compute the mean temperature and precipitation for all dates.

**Hint:** Use `np.mean()` on `measurements` together with the `axis=` argument to compute both means at the same time. Which axis should we compute the mean along?

In [69]:
# Your code here
means = ...

# Solution
means = np.mean(measurements, axis=0)
print(f"Mean temperature: {means[0]:.2f}, mean precipitation: {means[1]:.2f}")

Mean temperature: 8.67, mean precipitation: 6.59
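
To see why `axis=0` is the right choice here, it can help to compare both axes on a tiny array. This sketch uses made-up rows laid out like `measurements`, with one row per day and one column per quantity:

```python
import numpy as np

data = np.array([[10.0, 0.0],
                 [12.0, 3.0],
                 [14.0, 6.0]])  # made-up (temperature, precipitation) rows

# axis=0 collapses the rows: one mean per column (per quantity)
col_means = np.mean(data, axis=0)
print(f"Column means: {col_means}")

# axis=1 collapses the columns: one mean per row (per day), which
# mixes temperature and precipitation and is not meaningful here
row_means = np.mean(data, axis=1)
print(f"Row means: {row_means}")
```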


### Task 4

Create an array `labels` using `np.where()` on the precipitation measurements, such that it has the value `0` on days with zero precipitation and `1` on days with precipitation greater than zero.

Then use `np.stack()` to stack the precipitation measurements and the labels as columns, and print the first 20 rows.

Finally, print the sum of all values in `labels`. What does this number mean? Can you find other ways to compute the same number using NumPy?

The output should be:
```
[[ 0.   0. ]
 [ 0.   0. ]
 [ 0.   0. ]
 [ 0.   0. ]
 [ 0.   0. ]
 [ 0.   0. ]
 [ 0.3  1. ]
 [ 0.2  1. ]
 [ 7.   1. ]
 [ 1.6  1. ]
 [ 0.3  1. ]
 [ 0.   0. ]
 [ 0.   0. ]
 [47.1  1. ]
 [ 0.   0. ]
 [10.5  1. ]
 [ 0.1  1. ]
 [ 3.   1. ]
 [ 8.7  1. ]
 [ 2.6  1. ]]
238
```

In [70]:
# Your code here
precipitation = ...
labels = ...
stacked = ...

# Solution
precipitation = measurements[:, 1]
labels = np.where(precipitation > 0, 1, 0)
stacked = np.stack((precipitation, labels), axis=-1)
print(stacked[:20])
print(labels.sum()) # another way to count rainy days: (precipitation > 0).sum()

[[ 0.   0. ]
 [ 0.   0. ]
 [ 0.   0. ]
 [ 0.   0. ]
 [ 0.   0. ]
 [ 0.   0. ]
 [ 0.3  1. ]
 [ 0.2  1. ]
 [ 7.   1. ]
 [ 1.6  1. ]
 [ 0.3  1. ]
 [ 0.   0. ]
 [ 0.   0. ]
 [47.1  1. ]
 [ 0.   0. ]
 [10.5  1. ]
 [ 0.1  1. ]
 [ 3.   1. ]
 [ 8.7  1. ]
 [ 2.6  1. ]]
238
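
The sum of `labels` is the number of days with precipitation. A few equivalent ways to get that number are sketched below on a made-up `precip` array. The sketch also shows that `np.stack()` with `axis=-1` places the two 1D arrays side by side as columns, and that the integer labels are converted to floats in the stacked result.

```python
import numpy as np

precip = np.array([0.0, 0.3, 0.0, 7.0, 1.6])  # made-up precipitation values

labels = np.where(precip > 0, 1, 0)

# Two 1D arrays of length 5 become one array of shape (5, 2)
stacked = np.stack((precip, labels), axis=-1)
print(stacked.shape, stacked.dtype)

# Three ways to count the days with precipitation
n_from_labels = labels.sum()
n_from_mask = (precip > 0).sum()          # True counts as 1
n_from_count = np.count_nonzero(precip > 0)
print(n_from_labels, n_from_mask, n_from_count)
```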
