# 5 - Numpy

**Summary**

> *   What is a numpy array (`array`)
>> * Construction (`zeros`, `ones`, `arange`, `linspace`)
>> * Array shape (`shape`)
>> * Indexing and slicing
> * Operations with arrays
>> * Arithmetic operations (`dot`)
>> * Universal functions (`sin`, `exp`, `sqrt`)
>> * Operations among the array elements (`sum`, `min`, `mean`, `std`)
> * Manipulating arrays
>> * Changing the shape of an array (`reshape`)
>> * Append (`append`)
>> * Stacking together different arrays (`hstack`, `vstack`, `empty`)
> * Random functions
>> * Generating random data (`random.rand`, `random.randint`, `random.normal`, `random.poisson`)
>> * Sampling (`random.shuffle`, `random.choice`)
> * Why numpy instead of Python lists?







For a more detailed tutorial see: https://numpy.org/devdocs/user/quickstart.html

Official reference: https://docs.scipy.org/doc/numpy-1.16.0/reference/index.html

## What is a numpy array

Numpy is the core module for scientific computing in Python. It provides **homogeneous multidimensional arrays** (homogeneous means that an array contains elements of a single type, unlike Python lists). In practice, numpy arrays support efficient mathematical operations on vectors, matrices and, in general, multi-dimensional objects.

The module must be imported with the following command:


In [0]:
import numpy as np  
# The "as" keyword specifies the local name referring to numpy.
# The usual convention is to use "np", but you can choose whatever name you want.
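
As a minimal sketch of what homogeneity means in practice (the variable names are only illustrative), the snippet below shows that numpy picks a single element type, exposed by the `dtype` attribute, and converts every element to it:

```python
import numpy as np

ints = np.array([1, 2, 3])      # all integers: integer dtype (the exact width depends on the platform)
mixed = np.array([1, 2.5, 3])   # one float is enough to convert every element to float
print(ints.dtype, mixed.dtype)
print(mixed)                    # the integers are stored as 1.0 and 3.0

a_list = [1, 2.5, "three"]      # a Python list, by contrast, can mix types freely
print([type(x).__name__ for x in a_list])
```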

### Construction

Numpy arrays can be created from lists as follows:


In [2]:
a1 = np.array([1,2,3,4,5])  # 1-d array from 1-d list
print(a1)

a2 = np.array([[1,2],[3,4]])  # 2-d array from a nested list
print(a2)

[1 2 3 4 5]
[[1 2]
 [3 4]]


There exist several constructors for every need:

In [3]:
a = np.zeros((2,5))  # Array of zeros. The argument specifies the size of each dimension. 
print("zeros\n", a)

a = np.ones((4,3))  # Array of ones.
print("ones\n", a)

a = np.arange(2, 10, 2)  # Sequence of numbers (start included, stop excluded, step).
print("arange\n", a)

a = np.linspace(1, 30, 6)  # Sequence of numbers evenly spaced (start included, stop included, number of points).
print("linspace\n", a)

zeros
 [[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]
ones
 [[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]
arange
 [2 4 6 8]
linspace
 [ 1.   6.8 12.6 18.4 24.2 30. ]


### Array shape

The size of each dimension can be obtained with the `shape` attribute. In NumPy dimensions are called *axes*.

In [4]:
a = np.zeros(5)  # 1-d vector
print(a.shape)

a = np.zeros((3,2,4))  # 3-d array
print(a.shape)
print(len(a))  # The len function reads only the number of elements in the first dimension/axis

(5,)
(3, 2, 4)
3
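
Besides `shape`, arrays expose the number of axes and the total number of elements. A short sketch using the `ndim` and `size` attributes:

```python
import numpy as np

a = np.zeros((3, 2, 4))
print(a.ndim)                # number of axes: 3
print(a.size)                # total number of elements: 3 * 2 * 4 = 24
print(a.shape[0] == len(a))  # len corresponds to the first entry of shape
```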


### Indexing and slicing

An array can be read through indexing and slicing, almost like a list.

In [5]:
m = np.array([[1,2,3],[4,5,6],[7,8,9]])
print("Complete matrix\n", m)

print("Getting one row\n", m[2])

print("Getting one element\n", m[2][2], m[2,2])  # Numpy can read the next dimension after a comma (in lists this is not possible)

print("Slicing a sub-matrix\n", m[1:,1:])  # One can slice different dimensions at the same time

print("Getting a column\n", m[:,0])  # In this way you can select one column

Complete matrix
 [[1 2 3]
 [4 5 6]
 [7 8 9]]
Getting one row
 [7 8 9]
Getting one element
 9 9
Slicing a sub-matrix
 [[5 6]
 [8 9]]
Getting a column
 [1 4 7]
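
One difference from lists is worth a short illustration: basic indexing and slicing of an array return a *view* on the same data, while slicing a list copies the elements. The sketch below (variable names are illustrative) shows the effect and how `copy` avoids it:

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
row = m[0]               # a view on the first row, not a copy
row[0] = 100
print(m[0, 0])           # the original matrix has changed as well

safe_row = m[1].copy()   # an explicit copy is independent
safe_row[0] = -1
print(m[1, 0])           # unchanged

a_list = [1, 2, 3]
sub = a_list[0:2]        # slicing a list copies the elements
sub[0] = 100
print(a_list[0])         # unchanged
```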


A useful feature of numpy arrays is the *boolean mask*, an array of booleans with the same shape as a given array, say `a`. If `a` is indexed through the mask, only the elements at the positions of the `True` values are returned (in a new array).

In [6]:
a = np.arange(10)
print("Original array\n",a)

mask = a > 5
print("Mask for the elements greater than 5\n",mask)

print("Masked array\n",a[mask])

Original array
 [0 1 2 3 4 5 6 7 8 9]
Mask for the elements greater than 5
 [False False False False False False  True  True  True  True]
Masked array
 [6 7 8 9]


Boolean masks are extremely useful to avoid loops (in which a given property is checked for all the array elements) and save computational time. In this respect, see the last part of this tutorial.
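
Masks can also be combined and used for assignment. The following sketch assumes the elementwise operators `&` (and) and `|` (or); the conditions themselves are arbitrary examples:

```python
import numpy as np

a = np.arange(10)
mask = (a > 2) & (a % 2 == 0)  # parentheses are required: & binds tighter than the comparisons
selected = a[mask]             # even elements greater than 2
print(selected)

either = (a < 2) | (a > 7)     # | is the elementwise "or"
extremes = a[either]
print(extremes)

a[a > 5] = 0                   # a mask can also select the elements to overwrite
print(a)
```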

## Operations with arrays

### Arithmetic operations

Arithmetic operators on arrays apply elementwise.

In [7]:
a = np.array( [20,30,40,50] )
b = np.arange( 4 )
print("a: ", a, "b: ", b)
print()

print("a - b:",  a-b)

print("a * b:",  a*b)  # Elementwise, not a scalar product! 

print("b ^ 2: ", b**2)

print("is a < 35? ", a<35)  # Comparison operators are also applied elementwise

a:  [20 30 40 50] b:  [0 1 2 3]

a - b: [20 29 38 47]
a * b: [  0  30  80 150]
b ^ 2:  [0 1 4 9]
is a < 35?  [ True  True False False]
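
The same operators behave differently on Python lists, which is a common source of confusion. A small comparison, with illustrative variable names:

```python
import numpy as np

a_list = [1, 2, 3]
an_array = np.array([1, 2, 3])

print(a_list + a_list)      # lists: concatenation
print(an_array + an_array)  # arrays: elementwise sum

print(a_list * 2)           # lists: repetition
print(an_array * 2)         # arrays: every element is multiplied by the scalar
```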


The matrix product is computed with the method `dot()` or with the operator `@`.

In [8]:
print(a.dot(b), a @ b)  # Scalar product. The two commands are equivalent


m1 = np.array([[1,2],[3,4]])

m2 = np.array([[1,0],[1,1]])

print(m1.dot(m2))  # Matrix product

260 260
[[3 2]
 [7 4]]
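
To make the difference with the elementwise product explicit, the sketch below reuses the same values: the scalar product equals the sum of the elementwise products, while `*` and `@` give different results on matrices.

```python
import numpy as np

a = np.array([20, 30, 40, 50])
b = np.arange(4)
print(a @ b, (a * b).sum())  # scalar product = sum of the elementwise products

m1 = np.array([[1, 2], [3, 4]])
m2 = np.array([[1, 0], [1, 1]])
print(m1 * m2)               # elementwise product
print(m1 @ m2)               # matrix product (rows times columns)
```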


### Universal functions

Numpy contains several mathematical functions called *universal functions*. You can explore the numpy documentation linked above to see all of them. Again, they are applied element-wise.

In [9]:
print("sin(b):", np.sin(b))

print("exp(b):", np.exp(b))

print("square-root(b):", np.sqrt(b))

sin(b): [0.         0.84147098 0.90929743 0.14112001]
exp(b): [ 1.          2.71828183  7.3890561  20.08553692]
square-root(b): [0.         1.         1.41421356 1.73205081]
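
For comparison, the functions of the standard `math` module accept one number at a time, so they need an explicit loop. A minimal sketch showing that both approaches give the same values and that a universal function preserves the shape of its input:

```python
import math
import numpy as np

b = np.arange(4)
with_loop = [math.sin(x) for x in b]  # math.sin accepts one number at a time
with_ufunc = np.sin(b)                # np.sin handles the whole array in one call
print(np.allclose(with_loop, with_ufunc))

m = np.array([[0.0, 1.0], [4.0, 9.0]])
roots = np.sqrt(m)                    # the output has the same shape as the input
print(roots)
```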


### Operations among the array elements

Some examples of operations involving the array elements. See the documentation for more details and other examples.

In [10]:
print(a.sum())  # Sum of all the array elements

print(a.min())  # Minimum

print(a.mean())  # Average value

print(a.std())  # Standard deviation

140
20
35.0
11.180339887498949


The argument `axis` of the numpy functions specifies the dimension along which the operation is applied.

In [11]:
print(m)

print(m.sum(axis=0))  # Summing the elements in each column (across the first dimension)

print(m.sum(axis=1))  # Summing the elements in each row

[[1 2 3]
 [4 5 6]
 [7 8 9]]
[12 15 18]
[ 6 15 24]
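
The `axis` argument works the same way for the other reductions. A short sketch on the same 3x3 matrix:

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(m.mean(axis=0))  # mean of each column
print(m.max(axis=1))   # maximum of each row
print(m.sum())         # without axis, all the elements are used
```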



## Manipulating arrays

Here we show some functions to modify arrays and their shape. As for all the other examples in this tutorial, here we show just a small fraction of the wide range of possibilities that Numpy offers.

### Changing the shape of an array

The `reshape` method returns the same elements arranged in a new shape.

In [12]:
a = np.arange(12)  # Generating a 1-d array with 12 elements
print("Original array:\n", a)

print("Reshaped as a matrix:\n", a.reshape((3,4)))  # Be careful, the number of elements must be the same! You cannot reshape it into a 4x4 matrix

print("Reshaped as a 3-d array:\n", a.reshape((3,2,2)))

Original array:
 [ 0  1  2  3  4  5  6  7  8  9 10 11]
Reshaped as a matrix:
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
Reshaped as a 3-d array:
 [[[ 0  1]
  [ 2  3]]

 [[ 4  5]
  [ 6  7]]

 [[ 8  9]
  [10 11]]]
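
Two details of `reshape` can be sketched briefly: one dimension can be given as `-1`, in which case numpy infers it from the number of elements, and an incompatible shape raises a `ValueError`.

```python
import numpy as np

a = np.arange(12)
inferred = a.reshape((3, -1))  # -1 lets numpy infer the missing size
print(inferred.shape)          # (3, 4)

try:
    a.reshape((4, 4))          # 16 slots for 12 elements
    failed = False
except ValueError as err:
    failed = True
    print("reshape failed:", err)
```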


A matrix can be transposed with the `T` attribute.

In [13]:
m = a.reshape((4,3))
print("A matrix\n", m)
print("and its transpose\n", m.T)

A matrix
 [[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
and its transpose
 [[ 0  3  6  9]
 [ 1  4  7 10]
 [ 2  5  8 11]]


### Append

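The function `append` of numpy is slightly different from the `append` method of lists: it does not modify the array in place but returns a new array, which must be assigned to a variable.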

In [14]:
a = np.array([])  # An empty array
for i in range(10):
  a = np.append(a, i**2)  # Appending a single element at the array end
 
print(a)


a = np.array([])
for i in range(10):
  a = np.append(a, [1,0])  # Appending another array/list
 
print(a)


[ 0.  1.  4.  9. 16. 25. 36. 49. 64. 81.]
[1. 0. 1. 0. 1. 0. 1. 0. 1. 0. 1. 0. 1. 0. 1. 0. 1. 0. 1. 0.]
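
The difference with the list method can be seen in a minimal sketch (variable names are illustrative): `np.append` leaves its input untouched and, when no `axis` is given, flattens the inputs.

```python
import numpy as np

a_list = [1, 2]
a_list.append(3)            # modifies the list in place
print(a_list)

a = np.array([1, 2])
b = np.append(a, 3)         # returns a new array
print(a, b)                 # a is unchanged

c = np.append(a, [[3, 4]])  # without the axis argument the inputs are flattened first
print(c)
```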


### Stacking together different arrays

In [15]:
a = np.zeros((2,2))
b = np.ones((2,2))
print ("a:\n", a, "\nb\n", b)

print("vertical stack\n", np.vstack((a,b)))
print("horizontal stack\n", np.hstack((a,b)))

a:
 [[0. 0.]
 [0. 0.]] 
b
 [[1. 1.]
 [1. 1.]]
vertical stack
 [[0. 0.]
 [0. 0.]
 [1. 1.]
 [1. 1.]]
horizontal stack
 [[0. 0. 1. 1.]
 [0. 0. 1. 1.]]


To dynamically generate a matrix row by row, use `vstack`. Note that you need to initialize an empty array whose second dimension matches the length of the rows (here 2).

In [16]:
a = np.empty((0,2))  # To initialize an array 
for i in range(5):
  a = np.vstack((a, [1,0]))
print (a)

[[1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]]
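
Since `vstack` builds a new array at every iteration, a common alternative (shown here only as a sketch, with arbitrary row contents) is to collect the rows in a Python list and convert it to an array once at the end:

```python
import numpy as np

rows = []
for i in range(5):
    rows.append([i, i**2])  # cheap list append inside the loop
a = np.array(rows)          # a single conversion at the end
print(a.shape)              # (5, 2)
print(a)
```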


## Random functions

All the random functions in Numpy are contained in the sub-module `random`.

See: https://docs.scipy.org/doc/numpy-1.16.0/reference/routines.random.html

### Generating random data

In [17]:
print(np.random.rand())  # A single random number in [0,1)

r = np.random.rand(3,4)  # Generating random numbers uniformly in [0,1) with a given shape
print(r)

# Generating random integers. Arguments: start included, stop excluded, size.
r = np.random.randint(2,5,(3,4))  
print(r)

0.8249116739973619
[[0.98613611 0.75341721 0.08745534 0.15652279]
 [0.17693284 0.61881836 0.75192325 0.75986122]
 [0.31065762 0.37385463 0.486645   0.24158191]]
[[3 3 2 2]
 [3 2 2 3]
 [3 3 4 2]]


Data can also be generated from specific distributions (here are a few examples).

In [18]:
# From normal distribution. Args: mean, standard dev, size.
r = np.random.normal(1,2,(3,4))
print(r)

# From poisson distribution. Args: mean, size.
r = np.random.poisson(2,(3,4))
print(r)

[[-0.75906746  2.0912223   1.78758564 -1.5410368 ]
 [ 0.16005903  5.02498657 -1.02545321  0.21716835]
 [-0.18730408  0.47196262 -0.95146977 -1.30801109]]
[[3 0 3 2]
 [0 2 2 3]
 [2 5 3 1]]
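
The numbers above change at every run. When reproducible results are needed, the generator can be initialized with `np.random.seed`. A minimal sketch, where the seed value 42 is an arbitrary choice:

```python
import numpy as np

np.random.seed(42)         # 42 is an arbitrary choice
first = np.random.rand(3)

np.random.seed(42)         # same seed: the sequence restarts from the beginning
second = np.random.rand(3)

print(np.array_equal(first, second))
```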


### Sampling

Shuffling an array.

This is equivalent to sampling all the elements without replacement: at each step an element is randomly extracted, put in a new array, and removed from the original one.

In [19]:
a = [1, 1, 1, 1, 2, 2, 2, 3, 3, 4]
print(a)

np.random.shuffle(a)  # The function modifies the sequence in-place (here a list, but arrays work too)

print(a)

[1, 1, 1, 1, 2, 2, 2, 3, 3, 4]
[1, 1, 2, 4, 3, 3, 1, 2, 1, 2]


Sampling with replacement: at each step an element is randomly extracted and copied into a new array, without being removed from the original one.

In [20]:
new_a = np.random.choice(a, 15)  # Note that here the new array can be larger than the original
print (new_a)

[3 1 3 4 1 2 3 2 2 1 1 3 1 1 2]
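
The `replace` argument of `random.choice` controls whether the same element can be extracted more than once (the default is `replace=True`). A short sketch of both behaviours, with illustrative variable names:

```python
import numpy as np

a = np.arange(10)
with_repl = np.random.choice(a, 15)                   # default replace=True: repetitions are possible
without_repl = np.random.choice(a, 5, replace=False)  # every element is extracted at most once
print(with_repl)
print(without_repl)

try:
    np.random.choice(a, 15, replace=False)            # cannot extract 15 distinct elements out of 10
    failed = False
except ValueError as err:
    failed = True
    print("choice failed:", err)
```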


## Why numpy instead of Python lists?

The obvious advantage is that Numpy offers built-in functions for most of the needs of a scientist.

Moreover, Numpy significantly improves the performance of your code, especially when dealing with a large amount of numbers (https://stackoverflow.com/questions/993984/what-are-the-advantages-of-numpy-over-regular-python-lists):

* Numpy arrays consume less memory than lists when stored.

* Using the built-in functions speeds up your program a lot (see below).
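
The memory claim can be checked with a rough sketch. It assumes CPython, where a list stores references to separate float objects, and it counts only the data buffer of the array (`nbytes`), ignoring the small fixed overhead of the array object:

```python
import sys
import numpy as np

n = 1000
a_list = [float(i) for i in range(n)]
an_array = np.array(a_list)

# The list itself holds references; the float objects are counted separately
list_bytes = sys.getsizeof(a_list) + sum(sys.getsizeof(x) for x in a_list)
array_bytes = an_array.nbytes  # n * 8 bytes for float64

print("list:", list_bytes, "bytes, array:", array_bytes, "bytes")
```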


In [0]:
def sum_without_numpy(a_list):
  counter = 0
  for element in a_list:
    counter += element
  return counter

def sum_with_numpy(a_list):
  return np.sum(a_list)

In [22]:
import time

a_big_list = np.random.rand(1000000)

t0 = time.time()
result = sum_without_numpy(a_big_list)
python_time = time.time()-t0
print("Result ", result, " obtained in ", python_time)

t0 = time.time()
result = sum_with_numpy(a_big_list)
numpy_time = time.time()-t0
print("Result with numpy", result, " obtained in ", numpy_time)

print("Numpy is ", python_time / numpy_time, " times faster!")

Result  500243.7971134517  obtained in  0.14604878425598145
Result with numpy 500243.79711347405  obtained in  0.00092315673828125
Numpy is  158.2058367768595  times faster!


In general, explicit Python loops slow down your code. A useful trick to avoid them is to use boolean masks. In this example the task is to select the numbers larger than 0.5 and compute their average. This can be done by iterating over the array elements or by using a mask.

In [24]:
t0 = time.time()
count = 0
summation = 0
for elem in a_big_list:
  if elem > 0.5:
    count += 1
    summation += elem
python_time = time.time()-t0
print("Average of elements larger than 0.5 ", summation/count, " obtained in ", python_time)

t0 = time.time()
mask = a_big_list > 0.5
result = a_big_list[mask].mean()
numpy_time = time.time()-t0
print("Elements larger than 0.5 with boolean masks", result, " obtained in ", numpy_time)

print("Numpy is ", python_time / numpy_time, " times faster!")

Average of elements larger than 0.5  0.7500944533970322  obtained in  0.26610755920410156
Elements larger than 0.5 with boolean masks 0.7500944533970565  obtained in  0.012341737747192383
Numpy is  21.56159567275186  times faster!
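
A single measurement with `time.time()` can fluctuate from run to run. As a hedged alternative, the sketch below uses the standard `timeit` module and keeps the best of a few repetitions; the array size and the number of repetitions are arbitrary choices, and the function names are illustrative.

```python
import timeit
import numpy as np

data = np.random.rand(100_000)

def python_sum():
    # Explicit loop over the array elements
    total = 0.0
    for x in data:
        total += x
    return total

def numpy_sum():
    # Built-in reduction
    return data.sum()

# Each function is run once per repetition; the minimum is the least disturbed measurement
python_time = min(timeit.repeat(python_sum, number=1, repeat=3))
numpy_time = min(timeit.repeat(numpy_sum, number=1, repeat=3))
print("python:", python_time, "numpy:", numpy_time)
```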
