# Numpy Basics.

The purpose of this tutorial is to understand what numpy is, how it differs from vanilla python code, and how to take full advantage of numpy to speed up your code as much as possible!


# Contents

1. Numpy vs python, what makes them different?
2. The basics of numpy arrays, indexing, assignments and operations
3. Typecasting and views
4. Useful methods, size, shape, max/min
5. Broadcasting
6. All about boolean indexing
7. Extra numpy functions (.nan)
8. Loading and saving numpy data
9. Other cool numpy features




In [3]:
# imports
import numpy as np


# Numpy VS Python

How is data managed when we work through the python API? Let's see an example.


In [2]:
# Lists can store any type of object
a = [1, 2.0, 3, 4.0, 5]

for i in range(5):  # check bounds of loop (i < 5), then move to the next i
    
    a[i] += 1       # look up a[i]
                    # check type of a[i]
                    # check if i index is within bounds of list a
                    # check if index i is negative (for wrapping purposes)
                    # check type of (1)
                    # add 1 to a[i]

print(a)

[2, 3.0, 4, 5.0, 6]


In [4]:
a = np.array([1, 2, 3, 4, 5], dtype = float)

a += 1              # check type of array
                    # add 1 to each element in C

print(a)

[2. 3. 4. 5. 6.]


In [5]:
# If you wanted to do indexing
a[[1,3]] += 10      # check type of array
                    # check index is within bounds (done through C)
                    # add 10 to element 1 and 3

print(a)

[ 2. 13.  4. 15.  6.]
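
One reason the array version needs far fewer checks is that every element of an array shares a single dtype, whereas a list can hold objects of any type. The sketch below illustrates that difference; the variable names are chosen here for illustration only.

```python
import numpy as np

# a python list keeps each element as its own object, with its own type
py_list = [1, 2.0, 3]
list_types = [type(v).__name__ for v in py_list]
print(list_types)          # ['int', 'float', 'int']

# a numpy array converts everything to one common dtype
arr = np.array(py_list)
print(arr.dtype)           # float64

# so the type only needs to be checked once for the whole array
arr += 1
print(arr)                 # [2. 3. 4.]
```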


# The Numpy array
* Indexing

In [9]:
# we can make a simple 1D numpy array like so
a = np.array([1,2,3,4,5], dtype = float)

# just like python lists, numpy arrays can be indexed
print(a[1])        # gets the second element (indexing starts at 0)

# we can retrieve multiple elements in a numpy array similar to lists using slicing
print(a[1:4])      # gets the elements at indices 1 to 3; the stop index 4 is not included

# we can use negative indexes
print(a[-1])       # gets last element

# unlike python lists, we can also retrieve multiple elements using lists
print(a[[0,1,3,4]])  # gets first two and last two elements 


2.0
[2. 3. 4.]
5.0
[1. 2. 4. 5.]


array([1., 2., 3., 4., 5.])

In [11]:
# Numpy arrays are ndarray objects (N-dimensional arrays); they are designed to work with multidimensional data
print(type(a))

<class 'numpy.ndarray'>


In [16]:
# lets create a new numpy array, with 2 dimensions
a = np.array([[1,2,3], [4,5,6], [7,8,9]], dtype = float)
b = [[1,2,3], [4,5,6], [7,8,9]]
print(a)

[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]


In [19]:
# indexing N-D numpy arrays is similar to indexing nested python lists, but with a slight difference

# python lists
print(b[0][1])

# numpy array
print(a[0, 1])


2
2.0


In [23]:
# other ways of indexing
# I could get the first element of each row (i.e. the first column)
print(a[:,0])

# reverse them
print(a[::-1,0])

# the same works along a row: the first row, reversed
print(a[0,::-1])

[1. 4. 7.]
[7. 4. 1.]
[3. 2. 1.]


* Operators and Assignment

In [24]:
# basic python operators in Numpy are applied to all elements
a += 1      # same for -=, /=, *= and %=
print(a)

[[ 2.  3.  4.]
 [ 5.  6.  7.]
 [ 8.  9. 10.]]


In [25]:
# I can specify what elements I want to operate on
a[:,0] += 10.0

print(a)

[[12.  3.  4.]
 [15.  6.  7.]
 [18.  9. 10.]]


In [26]:
# I can do a lot of things

# reverse the elements of the 2nd row
a[1, :] = a[1, ::-1]        # can also be written as a[1] = a[1, ::-1]; unspecified dims default to : (all)
print(a)

[[12.  3.  4.]
 [ 7.  6. 15.]
 [18.  9. 10.]]


In [27]:
# transpose the top-left 2x2 block of elements
a[0:2, 0:2] = a[0:2, 0:2].T 

print(a)

[[12.  7.  4.]
 [ 3.  6. 15.]
 [18.  9. 10.]]


* Views 

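Something interesting happens in the next cell: the changes made to array [b] also affect the data in array [a]. Why is this? Basic indexing and slicing return a view, which shares its memory with the original array instead of copying it.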

In [30]:
# let's reset our array
a = np.array([[1,2,3], [4,5,6], [7,8,9]], dtype = float)

# set a new array b to the 2nd row
b = a[1]

# multiply each element in b by 2
b *= 2

# print out both a and b
print(a)
print(b)

[[ 1.  2.  3.]
 [ 8. 10. 12.]
 [ 7.  8.  9.]]
[ 8. 10. 12.]


We can bypass this using the .copy() method

In [31]:
# make a copy of the 2nd row of a
b = a[1].copy()

# multiply by 2
b *= 2

print(a)
print(b)



[[ 1.  2.  3.]
 [ 8. 10. 12.]
 [ 7.  8.  9.]]
[16. 20. 24.]
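
If it is unclear whether two arrays share data, `np.shares_memory` and the `.base` attribute can be used to check. A short sketch, with variable names that are illustrative only:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)

row_view = a[1]          # basic indexing: a view onto a's memory
row_copy = a[1].copy()   # an independent copy

print(np.shares_memory(a, row_view))   # True
print(np.shares_memory(a, row_copy))   # False

# a view keeps a reference to the array that owns the data
print(row_view.base is a)              # True

# indexing with a list (as in a[[0, 1]]) returns a copy, not a view
fancy = a[[0, 1]]
print(np.shares_memory(a, fancy))      # False
```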


* Typecasting

In [32]:
# another useful feature in Numpy is typecasting. In python we can change the type of a variable, within reason

# say I have a string of a number
c = "64.5"

# I can cast it as a float
print(float(c))

64.5


In [36]:
# the same works for numpy arrays, with one difference: .astype() returns a new array of the requested type and leaves the original array unchanged
c = a.astype(int)

print(c)

[[ 1  2  3]
 [ 8 10 12]
 [ 7  8  9]]
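
One thing to watch for when casting floats to integers is that the fractional part is dropped rather than rounded. A small sketch with made-up values:

```python
import numpy as np

x = np.array([1.7, -1.7, 2.2])

# casting float -> int drops the fractional part (truncates towards zero)
y = x.astype(int)
print(y)            # [ 1 -1  2]

# astype returns a new array; the original keeps its dtype and values
print(x.dtype)      # float64

# to round to the nearest integer instead, round before casting
z = np.round(x).astype(int)
print(z)            # [ 2 -2  2]
```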


In [37]:
# a lot of numpy functions also allow you to specify the type of the array
a = np.array([[1,2,3], [4,5,6], [7,8,9]], dtype = float)

print(a)

[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]


# Useful Numpy methods, functions and attributes

* size and shape

In [38]:
# You can get the number of elements in an array using .size
print(a.size)

# you can get the shape of an array using .shape
print(a.shape)

# you can get the number of dimensions of an array using .ndim
print(a.ndim)

9
(3, 3)
2


* Flatten


In [39]:
# you can flatten out an array (1-D)
print(a.flatten())

[1. 2. 3. 4. 5. 6. 7. 8. 9.]


* min, max, mean, sum

In [49]:
# find min
print(a.min())      # can also use np.min(a)

# find max
print(a.max())      # can also use np.max(a)

# Many numpy functions/methods have an argument called [axis]; this is used to specify which dimension to apply an operation along
# for example

# find the mean of each row
print(a.mean(axis = 1))         # can also use np.mean(a, axis = 1)

# not specifying an axis will treat the array as a 1-D flattened array, so the operation is applied to all elements in array
print(a.mean())

# find the sum of each column
print(a.sum(axis = 0))          # can also use np.sum(a, axis = 0)

# not specifying axis
print(a.sum())


1.0
9.0
[2. 5. 8.]
5.0
[12. 15. 18.]
45.0


In [51]:
# we can also find where the maximum/minimum is (without an axis, the index refers to the flattened array)
print(a.argmax())           # can also use np.argmax(a)
print(a.argmin())           # can also use np.argmin(a)

# we will see MUCH MORE of this later in this session

8
0
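
Since `argmax` without an axis returns a position in the flattened array, `np.unravel_index` can be used to turn it back into a (row, column) index. A minimal sketch:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)

flat_index = a.argmax()                        # 8, position in the flattened array
row, col = np.unravel_index(flat_index, a.shape)
print(row, col)                                # 2 2

# with an axis, argmax returns one index per row/column instead
print(a.argmax(axis=1))                        # [2 2 2]
```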


* Transpose

In [52]:
# we can also get the transpose of an array using .T
print(a.T)

[[1. 4. 7.]
 [2. 5. 8.]
 [3. 6. 9.]]


* Strides

To understand reshaping, we must understand array strides.

In [53]:
print(a.shape)

print(a[1,1])

# see workshop slides ...


(3, 3)
5.0


In [55]:
# for a 3-D array
d = np.arange(1, 28).reshape(3,3,3)
print(d)

print(d[2,1,1])

[[[ 1  2  3]
  [ 4  5  6]
  [ 7  8  9]]

 [[10 11 12]
  [13 14 15]
  [16 17 18]]

 [[19 20 21]
  [22 23 24]
  [25 26 27]]]
23


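Our new array has shape (3,3,3). Using strides, to get 23, which is at index (2,1,1), we compute its position in the flattened array:

2*3*3 + 1*3 + 1 = 22

or, in general,

x*N1*N0 + y*N0 + z

where 

(N2, N1, N0) -> (3,3,3) and (x,y,z) -> (2,1,1)

The stride of a dimension is the number of elements to skip when its index increases by one: N1*N0 for x, N0 for y and 1 for z. Numpy's .strides attribute stores the same steps measured in bytes.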
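
The flat-index calculation can be checked against numpy itself. The sketch below uses a shape of (2,3,4) rather than (3,3,3), an assumption made here so that it is clear which dimension lengths enter the calculation: each index is multiplied by the product of the lengths of the dimensions to its right. The dtype is fixed to int64 so that the byte strides are the same on every platform.

```python
import numpy as np

# a non-cubic shape makes it clear which lengths enter the formula
d = np.arange(24, dtype=np.int64).reshape(2, 3, 4)
N2, N1, N0 = d.shape
x, y, z = 1, 2, 3

flat_index = x * N1 * N0 + y * N0 + z
print(flat_index)                                   # 23
print(np.ravel_multi_index((x, y, z), d.shape))     # 23
print(d[x, y, z], d.ravel()[flat_index])            # 23 23

# .strides holds the same steps, measured in bytes (8 bytes per int64)
print(d.strides)                                    # (96, 32, 8)
element_strides = tuple(s // d.itemsize for s in d.strides)
print(element_strides)                              # (12, 4, 1)
```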

In [56]:
# using strides: flatten the array to 1-D (note that .flatten() returns a copy, not a view)
d = d.flatten()

print(d[2*3*3 + 1*3 + 1])




23


* reshaping

In [58]:
# reshaping is powerful since we can change the strides

# make our 3x3x3 array
d = np.arange(1, 28).reshape(3,3,3)
print(d)

# reshape it to a 9x3 array
print(d.reshape(9,3))

[[[ 1  2  3]
  [ 4  5  6]
  [ 7  8  9]]

 [[10 11 12]
  [13 14 15]
  [16 17 18]]

 [[19 20 21]
  [22 23 24]
  [25 26 27]]]
[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]
 [13 14 15]
 [16 17 18]
 [19 20 21]
 [22 23 24]
 [25 26 27]]
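
Two details of `reshape` are worth knowing: one dimension can be given as -1 and numpy will work it out, and the result is a view of the same data whenever that is possible. A short sketch:

```python
import numpy as np

d = np.arange(1, 28).reshape(3, 3, 3)

# -1 asks numpy to work out that dimension from the total size
e = d.reshape(-1, 3)
print(e.shape)                            # (9, 3)

# reshape returns a view when it can, so no data is copied here
print(np.shares_memory(d, e))             # True

# flatten always copies, ravel returns a view when possible
print(np.shares_memory(d, d.flatten()))   # False
print(np.shares_memory(d, d.ravel()))     # True
```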


# Let's combine all of this and do an example - Average every N samples together

# Example #1
I want to take an array and downsample it by some factor. For example, if I have an array of 10 elements and I downsample by a factor of two, I get back an array with 5 elements.


In [59]:
# The code!
def average(x, N):
    """
    Average every N consecutive elements of a 1-D array.

    x:      The 1-D array of data to average (its length must be divisible by N)
    N:      Averaging/Downsampling factor
    
    """

    # get shape of array
    shape = x.shape

    # we will reshape our array so that the last dimension is of length N
    new_shape = (shape[0]//N, N)

    # reshape x array
    y = x.reshape(new_shape)

    # take mean along axis 1
    y = np.mean(y, axis = 1)

    # return new array
    return y




In [61]:
a = np.array([1,2,3,4,5,6,7,8,9,10], dtype = float)
print(a)

print(average(a,2))

[ 1.  2.  3.  4.  5.  6.  7.  8.  9. 10.]
[1.5 3.5 5.5 7.5 9.5]


In [62]:
a = np.arange(20, dtype = float)

print(a)

print("Averaging 2 elements together")
print(average(a, 2))

print("Averaging 4 elements together")
print(average(a, 4))

print("Averaging 5 elements together")
print(average(a, 5))

print("Averaging 10 elements together")
print(average(a, 10))

[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12. 13. 14. 15. 16. 17.
 18. 19.]
Averaging 2 elements together
[ 0.5  2.5  4.5  6.5  8.5 10.5 12.5 14.5 16.5 18.5]
Averaging 4 elements together
[ 1.5  5.5  9.5 13.5 17.5]
Averaging 5 elements together
[ 2.  7. 12. 17.]
Averaging 10 elements together
[ 4.5 14.5]
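
The `average` function above relies on the array length being a multiple of N; otherwise the reshape raises a ValueError. One possible way around this, sketched here under the assumption that dropping the leftover elements is acceptable, is to trim the array first. The name `average_trim` is hypothetical.

```python
import numpy as np

def average_trim(x, N):
    """Average every N elements of a 1-D array, dropping any leftover tail."""
    n_groups = x.size // N            # number of complete groups
    trimmed = x[:n_groups * N]        # discard elements that do not fill a group
    return trimmed.reshape(n_groups, N).mean(axis=1)

a = np.arange(10, dtype=float)
result = average_trim(a, 3)
print(result)                         # [1. 4. 7.]  (the last element, 9, is dropped)
```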


# Problem #1
You have an array. For every group of 3 consecutive elements, I want the largest value. I then want you to take the average of those largest values.

ANSWER: 22470.6036

In [71]:
a = np.load("Shuffled_array.npy")

print(a)

[11493 29617 29552 ... 16484 22613 12072]


In [72]:
# Put code here

# a = a.reshape(a.size//3, 3)

# a_max = np.max(a, axis = 1)

# a_max_mean = np.mean(a_max)

# print(a_max_mean)

22470.6036


# Broadcasting

In [76]:
# we know we can multiply, add, divide etc. numpy arrays
a = np.array([[1,2,3], [4,5,6], [7,8,9]], dtype = float)
print(a)

print(a*2)

[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]
[[ 2.  4.  6.]
 [ 8. 10. 12.]
 [14. 16. 18.]]


In [77]:
# but, we can also operate on a numpy array, with another numpy array
b = np.array([1,2,3])

print(a * b)        # How does this work? What are the limitations?

[[ 1.  4.  9.]
 [ 4. 10. 18.]
 [ 7. 16. 27.]]


In [80]:
c = np.array([[1,2,3],[4,5,6]])

print(c*b)

[[ 1  4  9]
 [ 4 10 18]]


In [82]:
d = np.arange(1, 28).reshape(3,3,3)

print(d * b)

[[[ 1  4  9]
  [ 4 10 18]
  [ 7 16 27]]

 [[10 22 36]
  [13 28 45]
  [16 34 54]]

 [[19 40 63]
  [22 46 72]
  [25 52 81]]]


Final note on broadcasting: you can combine this with reshaping, transposing etc. to manipulate arrays in any way you like. When you get good
enough at it, it becomes nothing more than a block puzzle game!
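
As for how this works and what the limitations are: numpy compares the two shapes starting from the last dimension, and two dimensions are compatible when they are equal or when one of them is 1 (a missing dimension counts as 1). The sketch below shows a case that works, a way to broadcast along the other axis, and a case that fails.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)
b = np.array([1, 2, 3])

# (3, 3) * (3,): b is stretched across the rows, scaling each column
print(a * b)

# to scale each row instead, make b a column of shape (3, 1)
col = b[:, np.newaxis]
scaled_rows = a * col
print(scaled_rows)
# [[ 1.  2.  3.]
#  [ 8. 10. 12.]
#  [21. 24. 27.]]

# shapes that are neither equal nor 1 in some dimension cannot be broadcast
try:
    a * np.array([1, 2])
    broadcast_failed = False
except ValueError:
    broadcast_failed = True
print(broadcast_failed)               # True
```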

# Boolean Indexing
Indexing with conditions

In [93]:
# if I make a new array with two elements
a = np.array([1,2])

# I can index with True or False values; let's make a list of booleans
bool_list = [True, True]

print(a[bool_list])

[1 2]


In [94]:
# numpy also vectorizes condition statements
print(a > 0)

[ True  True]


In [98]:
# we can then combine these
a = np.arange(27).reshape(3,3,3)

b = a > 10

print(a)
print(b)

print("------------------------------------")
print(a[b])

[[[ 0  1  2]
  [ 3  4  5]
  [ 6  7  8]]

 [[ 9 10 11]
  [12 13 14]
  [15 16 17]]

 [[18 19 20]
  [21 22 23]
  [24 25 26]]]
[[[False False False]
  [False False False]
  [False False False]]

 [[False False  True]
  [ True  True  True]
  [ True  True  True]]

 [[ True  True  True]
  [ True  True  True]
  [ True  True  True]]]
------------------------------------
[11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26]


In [117]:
# or, even simpler, 
print(a[a>10])

# trick: '~' inverts the boolean array
print(a[~(a>10)])


# Important: the bool array must match the shape of the array you are indexing (or of its leading dimensions)

[11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26]
[ 0  1  2  3  4  5  6  7  8  9 10]
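
Boolean masks are not only for reading values out; they can also be used to assign values, and `np.where` offers a way to choose between two values while keeping the array's shape. A brief sketch, with illustrative variable names:

```python
import numpy as np

a = np.arange(27).reshape(3, 3, 3)

# a boolean mask can also be used for assignment
clipped = a.copy()
clipped[clipped > 10] = 10
print(clipped.max())                  # 10

# np.where picks between two values element by element, keeping the shape
labels = np.where(a > 10, 1, 0)
print(labels.shape)                   # (3, 3, 3)
print(labels.sum())                   # 16

# counting matches: each True counts as 1
print(np.count_nonzero(a > 10))       # 16
```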


In [126]:
# you can apply multiple conditions whilst indexing, using separate numpy functions

# for 'and' operator
bool_arr = np.logical_and(a > 10, a < 25)

print(a[bool_arr])

# for 'or' operator
bool_arr = np.logical_or(a % 2 == 0,a % 3 == 0)       # this gives all elements divisible by 2 or by 3 (remainder of zero)

print(a[bool_arr])

[11 12 13 14 15 16 17 18 19 20 21 22 23 24]
[ 0  2  3  4  6  8  9 10 12 14 15 16 18 20 21 22 24 26]


* Bitwise operators

https://numpy.org/doc/stable/reference/routines.logic.html

In [136]:
# logical and can be replaced by "&" (keep the parentheses: & binds more tightly than the comparisons)

print(a[(a > 10) & (a < 25)])

# logical or can be replaced by "|"

print(a[(a > 10) | (a < 25)])

# logical xor (one condition or the other, but not both) can be replaced by "^"

print(a[(a > 10) ^ (a < 25)])

# logical not can be replaced by "~"

print(a[~(a > 10)])

# You probably won't need "<<" or ">>"


[11 12 13 14 15 16 17 18 19 20 21 22 23 24]
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26]
[ 0  1  2  3  4  5  6  7  8  9 10 25 26]
[ 0  1  2  3  4  5  6  7  8  9 10]
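
A common stumbling block is reaching for python's own `and`/`or` keywords, which do not work element by element. The sketch below shows the error this produces and the equivalent expression using `&`.

```python
import numpy as np

a = np.arange(27)

# python's `and` needs a single True/False, which an array cannot give
try:
    mask = (a > 10) and (a < 25)
    raised = False
except ValueError:
    raised = True
print(raised)                         # True

# `&` works element by element; keep the parentheses around each comparison
mask = (a > 10) & (a < 25)
print(np.array_equal(mask, np.logical_and(a > 10, a < 25)))   # True
```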


# Extra numpy functions, .nan

In [139]:
# Sometimes, you might have a numpy array with undefined data that might look something like this
a = np.array([[1,2,3], [4, np.nan, 6]], dtype = float)

print(a)

[[ 1.  2.  3.]
 [ 4. nan  6.]]


In [141]:
# applying the methods we have learnt in this case will result in unwanted behavior

# take the mean of each row
print(np.mean(a, axis = 1))

[ 2. nan]


In [142]:
# instead, there are built-in numpy functions for this purpose
print(np.nanmean(a, axis = 1))      # in this case, all nans are simply ignored.

[2. 5.]


Note: This will be slower in general, since numpy has to check each element to see whether it is a nan. This can be sped up somewhat
if you know ahead of time where the nans are, for example by building a mask once and operating only on the valid values.

In [146]:
# let's find the values that are not nan
print(~np.isnan(a))

# then we can operate on just those values

mask = ~np.isnan(a)

a[mask] *= 1.0
a[mask] /= 10

print(a)



[[ True  True  True]
 [ True False  True]]
[[0.1 0.2 0.3]
 [0.4 nan 0.6]]


Definitely explore the different nan functions if you come across this issue; most of the functions you need are likely to have a nan variant.
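
A few of the nan variants, applied to the same kind of array as above, as a quick illustration:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, np.nan, 6]], dtype=float)

n_nans = np.isnan(a).sum()
print(n_nans)                     # 1   -> number of nans
print(np.nansum(a, axis=1))       # [ 6. 10.]
print(np.nanmax(a))               # 6.0
print(np.nanmin(a, axis=0))       # [1. 2. 3.]
```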

# Saving and Loading numpy arrays
Numpy offers useful functions for loading and saving data

In [147]:
# lets make an array
a = np.arange(27).reshape(3,3,3)

print(a)

[[[ 0  1  2]
  [ 3  4  5]
  [ 6  7  8]]

 [[ 9 10 11]
  [12 13 14]
  [15 16 17]]

 [[18 19 20]
  [21 22 23]
  [24 25 26]]]


In [148]:
# let's save it using the np.save function

np.save("test.npy", a)

In [149]:
# now let's load the data back in
b = np.load("test.npy")

print(b)

[[[ 0  1  2]
  [ 3  4  5]
  [ 6  7  8]]

 [[ 9 10 11]
  [12 13 14]
  [15 16 17]]

 [[18 19 20]
  [21 22 23]
  [24 25 26]]]


In [152]:
# you can save multiple arrays to a single file
a = np.arange(27).reshape(3,3,3)
b = np.arange(1000)
np.random.shuffle(b)
with open("test.npy", "wb") as file:        # make sure to include the 'b' specifier, since this is a binary file!!!
    np.save(file, a)
    np.save(file, b)

print(a)
print(b)

[[[ 0  1  2]
  [ 3  4  5]
  [ 6  7  8]]

 [[ 9 10 11]
  [12 13 14]
  [15 16 17]]

 [[18 19 20]
  [21 22 23]
  [24 25 26]]]
[363 172 813 249 700 541 815 218 248 653 257 546 217  27 451 846  29 828
 944 322 472 731 829 586 723 386 910 135 761 888 881 584 577  98 594 465
 169 866 599 222 714 137 678 184  90 519 993 402 425 110 582 892 543 632
 156  81 597 855 175 327 788  40 475 966 801 809 560 370 769 614 162 513
 540 240 553 250 270 352 201 221  52  43 756 790 463 738  17 143 514 437
 951 958 442 351 367  11 615 785 378 239 736 565 912 224  26 920 947 179
 656 237 841 627 262 195 754 848 493 633 891 832 890 506  31 119 702 494
 820 533 937 167 310 121 652 375 729 446 100 309 787 994 255 680 647  94
 807 768  14 118 328 234  85 466 432 187 574 269 849 882 856 245 583 444
 365 933 247 388 203 867  86 824 960 905 283 163  38 588 579  73 354 681
 669 682 645 357 171 806 811 961 691 424 391 517 779 977 739 350 625  55
 142 404 659 380 854 450 189 504 190  20 277 778 823 293 333 985 986 667
 

In [153]:
# now load those arrays back in
with open("test.npy", "rb") as file:        # the order in which the arrays were saved is preserved!
    a = np.load(file)
    b = np.load(file)

print(a)
print(b)

[[[ 0  1  2]
  [ 3  4  5]
  [ 6  7  8]]

 [[ 9 10 11]
  [12 13 14]
  [15 16 17]]

 [[18 19 20]
  [21 22 23]
  [24 25 26]]]
[363 172 813 249 700 541 815 218 248 653 257 546 217  27 451 846  29 828
 944 322 472 731 829 586 723 386 910 135 761 888 881 584 577  98 594 465
 169 866 599 222 714 137 678 184  90 519 993 402 425 110 582 892 543 632
 156  81 597 855 175 327 788  40 475 966 801 809 560 370 769 614 162 513
 540 240 553 250 270 352 201 221  52  43 756 790 463 738  17 143 514 437
 951 958 442 351 367  11 615 785 378 239 736 565 912 224  26 920 947 179
 656 237 841 627 262 195 754 848 493 633 891 832 890 506  31 119 702 494
 820 533 937 167 310 121 652 375 729 446 100 309 787 994 255 680 647  94
 807 768  14 118 328 234  85 466 432 187 574 269 849 882 856 245 583 444
 365 933 247 388 203 867  86 824 960 905 283 163  38 588 579  73 354 681
 669 682 645 357 171 806 811 961 691 424 391 517 779 977 739 350 625  55
 142 404 659 380 854 450 189 504 190  20 277 778 823 293 333 985 986 667
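
An alternative to calling `np.save` several times on one file handle is `np.savez`, which stores several arrays in a single `.npz` file and lets you look them up by name instead of by order. The sketch below writes to a temporary directory; the file name and the keys `cube` and `values` are made up for this example.

```python
import os
import tempfile

import numpy as np

a = np.arange(27).reshape(3, 3, 3)
b = np.arange(1000)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "arrays.npz")      # hypothetical file name

    # keyword names become the keys used to look the arrays up later
    np.savez(path, cube=a, values=b)

    with np.load(path) as data:
        names = sorted(data.files)
        a_loaded = data["cube"]
        b_loaded = data["values"]

print(names)                                        # ['cube', 'values']
print(np.array_equal(a, a_loaded), np.array_equal(b, b_loaded))   # True True
```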
 

* Memory maps: in case you are memory limited

In [154]:
# let's say we have a very large file, perhaps 10GB in size
file = "/fred/oz002/tdial/HTR_paper_data/230708/230708_calib_I.npy"     # SIZE -> ~10 GB
# NOTE: you will not have access to this specific file, it is for demonstration purposes only; this can be any large file!

# now let's load it in using np.load
a = np.load(file, mmap_mode = 'r')      # mmap_mode = 'r' means open this file as a read-only memory map



In [156]:
# If I want a specific bit of data, I can index this memory map
# the data is only read from disk when it is actually accessed; use .copy() to bring it fully into memory
print(a[1])     # will only load in that bit of data



[ 0.36547863 -0.08134608 -0.4068808  ...  0.08859127 -0.95454574
 -1.0632687 ]
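
Since the file above is not available, here is a small self-contained sketch of the same idea using a tiny temporary file as a stand-in for a large one. The file name is hypothetical, and the array is far too small to need a memory map in practice.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "small.npy")       # hypothetical stand-in for a large file
    np.save(path, np.arange(12, dtype=float).reshape(3, 4))

    m = np.load(path, mmap_mode="r")
    is_memmap = isinstance(m, np.memmap)

    # np.array(...) copies just this row into ordinary memory
    row = np.array(m[1])

    # a map opened with mode 'r' cannot be written to
    try:
        m[0, 0] = 1.0
        write_failed = False
    except ValueError:
        write_failed = True

    del m       # release the map before the temporary file is removed

print(is_memmap)        # True
print(row)              # [4. 5. 6. 7.]
print(write_failed)     # True
```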


NOTE: many numpy functions that manipulate array strides, shapes etc. may not work with memory maps, so keep this in mind.

# THAT'S ALL FOR NOW...
Do go through some of the numpy tutorials online, as well as the numpy documentation; I cannot go through all of it in 1 hour. But all the basics are here. GOOD LUCK!