# NumPy introduction

## NumPy arrays

There are multiple ways of creating NumPy arrays:
- from a list
- from scratch (with constructors such as `np.zeros`, `np.ones` or `np.arange`)

In [2]:
import numpy as np

mylist = [1,2,3,4]
np_list = np.array(mylist)
alt_np_list = np.array([2,3,4,5])

print(mylist)
print(np_list)
print(alt_np_list)


[1, 2, 3, 4]
[1 2 3 4]
[2 3 4 5]


In NumPy arrays, all data must be of the same type. If we try to mix types, some values are upcast (for example, integers are upcast to floats):

In [3]:
import numpy as np

mylist = [1.1,2,3,4]
np_list = np.array(mylist)

print(mylist)
print(np_list)


[1.1, 2, 3, 4]
[1.1 2.  3.  4. ]
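
The upcasting can be checked through the `dtype` attribute. The following is a minimal sketch; the variable names `ints` and `mixed` are only illustrative, and the exact integer type depends on the platform.

```python
import numpy as np

ints = np.array([1, 2, 3, 4])
mixed = np.array([1.1, 2, 3, 4])

print(ints.dtype)   # a platform-dependent integer type
print(mixed.dtype)  # float64: the integers were upcast to floats

# np.result_type reports the type NumPy would pick when combining two dtypes
print(np.result_type(np.int8, np.float32))  # float32
```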


It is also possible to explicitly declare the type:

In [4]:
typeList = np.array([1, 2, 3, 4], dtype="int8")  # int8 is a signed 8-bit integer (-128 to 127); use "uint8" for unsigned values
print(typeList)

[1 2 3 4]
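
To see which values an integer type can hold, `np.iinfo` can be queried. This is a small sketch; the names `info`, `signed` and `unsigned` are illustrative only.

```python
import numpy as np

info = np.iinfo(np.int8)
print(info.min, info.max)  # -128 127

signed = np.array([1, 2, 3, 4], dtype="int8")
unsigned = np.array([1, 2, 3, 4], dtype="uint8")

print(signed.itemsize)  # 1 (each element takes one byte)
print(np.iinfo(np.uint8).min, np.iinfo(np.uint8).max)  # 0 255
```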


Arrays can also be built from a list comprehension. Multidimensional arrays are obtained by passing nested lists or, as in the following cells, a shape tuple:

In [5]:
np_list = np.array([i for i in [2, 3, 6]])
print(np_list)

[2 3 6]
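
A nested list comprehension produces a two-dimensional array. This is a minimal sketch; the name `grid` and the multiplication used to fill it are arbitrary choices for illustration.

```python
import numpy as np

# each inner list becomes one row of the resulting 2-D array
grid = np.array([[i * j for j in range(1, 4)] for i in [2, 3, 6]])
print(grid)
print(grid.shape)  # (3, 3)
```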


In [6]:
#Arrays of zeros and ones: passing a tuple as shape gives a multidimensional array

zeros = np.zeros(10, dtype="int8")
ones = np.ones((2, 3, 6), dtype="int8")
print(zeros)
print(ones)

[0 0 0 0 0 0 0 0 0 0]
[[[1 1 1 1 1 1]
  [1 1 1 1 1 1]
  [1 1 1 1 1 1]]

 [[1 1 1 1 1 1]
  [1 1 1 1 1 1]
  [1 1 1 1 1 1]]]


In [7]:
#an array filled with a single value
mylist = np.full((2, 3, 3), 3)
print(mylist)

[[[3 3 3]
  [3 3 3]
  [3 3 3]]

 [[3 3 3]
  [3 3 3]
  [3 3 3]]]


In [8]:
#this works like Python's range(start, stop, step), but returns an array and also accepts float steps
np.arange(0, 20, 2)

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [9]:
#this returns 5 evenly spaced values between 0 and 1, both endpoints included
mylist = np.linspace(0, 1, 5)
print(mylist)

[0.   0.25 0.5  0.75 1.  ]
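
The difference between the two functions is mainly in how the end of the interval is treated. A short sketch follows, with illustrative variable names:

```python
import numpy as np

# for integer arguments, arange produces the same values as range
print(np.arange(0, 20, 2).tolist() == list(range(0, 20, 2)))  # True

steps = np.arange(0, 1, 0.25)                   # fixed step, stop excluded
points = np.linspace(0, 1, 5)                   # fixed number of points, stop included
no_end = np.linspace(0, 1, 4, endpoint=False)   # stop excluded on request

print(steps)   # [0.   0.25 0.5  0.75]
print(no_end)  # [0.   0.25 0.5  0.75]
```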


NumPy also supports various kinds of random number generation. For example, you can draw numbers from specific distributions.

In [10]:
mylist = np.random.random(3)
print(mylist)

[0.61614433 0.25635162 0.29913193]


In [11]:
mylist = np.random.normal(0, 2, (2, 2, 2))
print(mylist)

[[[ 4.39307567  1.53667313]
  [-0.17654921  2.74499171]]

 [[-1.72326525 -5.2761079 ]
  [ 1.52376932  1.68048304]]]


In [12]:
mylist = np.random.randint(0, 10, 5)
print(mylist)

[4 9 1 8 0]
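
Recent NumPy versions also offer a generator object, created with `np.random.default_rng`, as an alternative to the functions above. The sketch below assumes NumPy 1.17 or later; the seed `42` is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)          # 42 is an arbitrary seed
uniform = rng.random(3)                  # uniform values in [0, 1)
gaussian = rng.normal(0, 2, (2, 2, 2))   # mean 0, standard deviation 2
integers = rng.integers(0, 10, 5)        # integers in [0, 10)

# a second generator with the same seed reproduces the same stream
rng_again = np.random.default_rng(42)
print(np.array_equal(uniform, rng_again.random(3)))  # True
```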


NumPy also has some shortcuts to create notable matrices.

In [13]:
#Identity
mylist = np.eye(5)
print(mylist)

[[1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]]


In [14]:
#Uninitialized array: values are whatever is found in memory
mylist = np.empty(3)
print(mylist)

[0.61614433 0.25635162 0.29913193]


Each NumPy array has 4 important attributes:
* number of dimensions (`ndim`)
* elements in each dimension (`shape`)
* total number of entries (`size`)
* type of the elements (`dtype`)

In [15]:
img = np.random.randint(0, 10, (3,5,3))

print(img)
print("array dimensions: ")
print(img.ndim)
print(img.shape)
print(img.size)
print(img.dtype)

[[[3 5 9]
  [6 6 7]
  [8 7 4]
  [5 6 8]
  [3 2 6]]

 [[2 6 8]
  [6 5 8]
  [2 2 0]
  [3 2 7]
  [7 7 2]]

 [[3 2 5]
  [9 8 5]
  [8 4 2]
  [7 7 4]
  [9 9 6]]]
array dimensions: 
3
(3, 5, 3)
45
int64


Indexing is slightly different from standard Python: all the indices go inside a single pair of brackets, separated by commas (negative indices are still supported):

In [16]:
img[1, 2, 2]

0
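
Since `img` is random, the same indexing rules can be checked on a deterministic array. This is a small sketch; the name `cube` is only illustrative.

```python
import numpy as np

cube = np.arange(24).reshape(2, 3, 4)

print(cube[1, 2, 3])     # 23: last element, one index per dimension
print(cube[-1, -1, -1])  # 23: same element through negative indices
print(cube[1][2][3])     # 23: chained indexing, as with nested lists
print(cube[0, 1])        # [4 5 6 7]: fewer indices return a sub-array
```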

Every time an array is sliced, the result is a **view** of the array, not a copy. Likewise, assigning an array to a new variable only creates a second name for the same data.  
The original array is not changed by these operations, but no copy is created either.  
Every change made through the new variable is also applied to the original array.  
An independent array can be obtained with the `copy()` method.
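
The behaviour can be verified with a short experiment. The names `original`, `view` and `independent` are illustrative only.

```python
import numpy as np

original = np.arange(9).reshape(3, 3)

view = original[:2, :2]      # slicing returns a view
view[0, 0] = 99
print(original[0, 0])        # 99: the change is visible in the original

independent = original[:2, :2].copy()
independent[0, 0] = -1
print(original[0, 0])        # 99: the copy does not touch the original

print(np.shares_memory(original, view))         # True
print(np.shares_memory(original, independent))  # False
```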

In [17]:
#some practice with slicing

mylist = np.random.randint(0, 10, (3, 3))
subarray = mylist[:1, 1:2]
print(mylist)
print(subarray)

[[6 8 3]
 [9 4 3]
 [5 3 9]]
[[8]]


In [18]:
col = mylist[:, 2]  #this is an entire column, the third
print(col)
row = mylist[0, :]  #this is an entire row, the first
print(row)

[3 3 9]
[6 8 3]


In [19]:
col.ndim

1

It is possible to rearrange the elements of an array into a different shape through **reshaping** (the total number of elements must stay the same):

In [26]:
a = np.arange(27)
a.reshape(1, 3, 9)

array([[[ 0,  1,  2,  3,  4,  5,  6,  7,  8],
        [ 9, 10, 11, 12, 13, 14, 15, 16, 17],
        [18, 19, 20, 21, 22, 23, 24, 25, 26]]])

In [31]:
a.reshape(9, 3, 1)
print(a) #note how reshape does not affect the original tensor
print(a.reshape(27))
print(a.reshape(1, 1, 1, 27)) #difference: vector vs 4-d tensor

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26]
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26]
[[[[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21
    22 23 24 25 26]]]]
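
One dimension can be left to NumPy by passing `-1`, and a shape with the wrong number of elements is rejected. This is a minimal sketch; `b` and `c` are illustrative names.

```python
import numpy as np

a = np.arange(27)

b = a.reshape(3, -1)      # -1 is inferred as 9
c = a.reshape(-1, 3, 3)   # -1 is inferred as 3
print(b.shape)  # (3, 9)
print(c.shape)  # (3, 3, 3)

try:
    a.reshape(4, 7)       # 28 elements do not match the 27 available
except ValueError as err:
    print("reshape failed:", err)
```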


In [33]:
a = a.reshape(1, 3, 9)
print(a.shape[0])
print(a.shape[1])
print(a.shape[2])

1
3
9


To expand a tensor along a new dimension, index it with the dedicated `np.newaxis` object:

In [36]:
a = a.reshape(27)
a = a[np.newaxis, :]
print(a)

[[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
  24 25 26]]


In [39]:
a = a.reshape(1, 3, 9)
print(a)
a = a[:, :, :, np.newaxis]
print(a)

[[[ 0  1  2  3  4  5  6  7  8]
  [ 9 10 11 12 13 14 15 16 17]
  [18 19 20 21 22 23 24 25 26]]]
[[[[ 0]
   [ 1]
   [ 2]
   [ 3]
   [ 4]
   [ 5]
   [ 6]
   [ 7]
   [ 8]]

  [[ 9]
   [10]
   [11]
   [12]
   [13]
   [14]
   [15]
   [16]
   [17]]

  [[18]
   [19]
   [20]
   [21]
   [22]
   [23]
   [24]
   [25]
   [26]]]]
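
The position of `np.newaxis` in the index decides where the new dimension goes. The sketch below uses a small vector `v` (an illustrative name) and also shows two equivalent spellings.

```python
import numpy as np

v = np.arange(3)

row = v[np.newaxis, :]   # shape (1, 3)
col = v[:, np.newaxis]   # shape (3, 1)
print(row.shape, col.shape)

print(np.newaxis is None)               # True: newaxis is an alias for None
same_row = np.expand_dims(v, axis=0)    # function form of v[np.newaxis, :]
same_col = v.reshape(-1, 1)             # reshape form of v[:, np.newaxis]
```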


## Operations on multiple tensors

In [45]:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = np.concatenate([a, b])
print(c)
d = np.vstack([a, b])       #this requires the two arrays to have the same number of columns
print(d)
e = np.hstack([a, b])
print(e)

[1 2 3 4 5 6]
[[1 2 3]
 [4 5 6]]
[1 2 3 4 5 6]


In [50]:
a = a[:, np.newaxis]
print(a)
b = b[:, np.newaxis]
np.hstack([a, b])

[[1]
 [2]
 [3]]


array([[1, 4],
       [2, 5],
       [3, 6]])
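
For two-dimensional inputs, the stacking helpers can also be expressed through `np.concatenate` and `np.stack` with an explicit `axis`. The following sketch uses two small matrices chosen only for illustration.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

rows = np.concatenate([a, b], axis=0)   # same result as np.vstack([a, b])
cols = np.concatenate([a, b], axis=1)   # same result as np.hstack([a, b])
depth = np.stack([a, b], axis=-1)       # same result as np.dstack([a, b])

print(rows.shape)   # (4, 2)
print(cols.shape)   # (2, 4)
print(depth.shape)  # (2, 2, 2)
```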

In [63]:
a = np.random.randint(0, 10, (3, 3))
print(a)
b = np.random.randint(0, 10, (3, 3))
print(b)
c = np.dstack([a, b])
print(c)
print(c.shape)    #this allows stacking in a third dimension

[[2 2 2]
 [7 5 6]
 [8 5 3]]
[[1 9 7]
 [9 0 9]
 [0 9 5]]
[[[2 1]
  [2 9]
  [2 7]]

 [[7 9]
  [5 0]
  [6 9]]

 [[8 0]
  [5 9]
  [3 5]]]
(3, 3, 2)


In [66]:
#Multiple splitting: hsplit cuts along the columns, at column indices 1 and 3

a = np.arange(16).reshape((4, 4))
left, middle, right = np.hsplit(a, [1, 3])
print(a)
print(left)
print(middle)
print(right)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]
[[ 0]
 [ 4]
 [ 8]
 [12]]
[[ 1  2]
 [ 5  6]
 [ 9 10]
 [13 14]]
[[ 3]
 [ 7]
 [11]
 [15]]
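
Splitting along the rows works the same way. A short sketch with `np.vsplit` and the general `np.split` follows; the names `top`, `bottom` and `halves` are illustrative.

```python
import numpy as np

a = np.arange(16).reshape((4, 4))

top, bottom = np.vsplit(a, [1])    # cut before row 1
print(top)     # [[0 1 2 3]]
print(bottom.shape)  # (3, 4)

halves = np.split(a, 2, axis=1)    # two equal parts along the columns
print(halves[0].shape, halves[1].shape)  # (4, 2) (4, 2)
```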


## UFunc

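Universal functions (ufuncs) are a way of speeding up element-wise operations on arrays. They appear as normal, simple operations, but they are actually wrappers for fast compiled functions implemented by NumPy: the loop over the elements runs in compiled code instead of in the Python interpreter, and can take advantage of vectorized CPU instructions where available.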

In [78]:
np.random.seed(0)  #fix the seed so that every execution of the program produces the same random numbers

import time

def compute_reciprocals(values):
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0/values[i]
    return output

values = np.random.randint(1, 100, size=10000000)
start_time = time.time()
out = compute_reciprocals(values)
end_time = time.time()
print(out)
print("Regular Python time:")
print(end_time - start_time)

start_time = time.time()
out = 1.0/values
end_time = time.time()
print(out)
print("UFunc time:")
print(end_time - start_time)

[0.02222222 0.02083333 0.01538462 ... 0.04166667 0.02857143 0.02083333]
Regular Python time:
21.27959632873535
[0.02222222 0.02083333 0.01538462 ... 0.04166667 0.02857143 0.02083333]
UFunc time:
0.037723541259765625
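
Arithmetic operators on arrays are shorthand for named ufuncs, which also provide methods such as `reduce` and `accumulate`. The following is a minimal sketch on a small array `x` (an illustrative name).

```python
import numpy as np

x = np.arange(1, 6)   # [1 2 3 4 5]

print(np.add(x, 10))          # [11 12 13 14 15], same as x + 10
print(np.divide(1.0, x))      # same as 1.0 / x

# ufunc methods combine the elements of one array
print(np.multiply.reduce(x))  # 120: product of all elements
print(np.add.accumulate(x))   # [ 1  3  6 10 15]: running sum
```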
