Introduction to NumPy
===========


Matrix operations
====================


One of the main advantages of the *ndarray* structure is its ability to process whole matrices at once.
For example, to multiply every element of an array by a scalar, just write *a * 5*.
To perform any logical or arithmetic operation between arrays, simply write *a <oper> b*:

In [1]:
import numpy as np
a = np.arange(20).reshape(5,4)
b = 2 * np.ones((5,4))    
c = np.arange(12,0,-1).reshape(4,3)
print('a=\n', a )
print('b=\n', b )
print('c=\n', c )

a=
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]
 [16 17 18 19]]
b=
 [[2. 2. 2. 2.]
 [2. 2. 2. 2.]
 [2. 2. 2. 2.]
 [2. 2. 2. 2.]
 [2. 2. 2. 2.]]
c=
 [[12 11 10]
 [ 9  8  7]
 [ 6  5  4]
 [ 3  2  1]]


Matrix multiplication by scalar: *b x 5*
--------------------------------------------------

In [2]:
b5 = 5 * b

print('b5=\n', b5 )

b5=
 [[10. 10. 10. 10.]
 [10. 10. 10. 10.]
 [10. 10. 10. 10.]
 [10. 10. 10. 10.]
 [10. 10. 10. 10.]]


Sum of arrays: *a + b*
---------------------------

In [3]:
amb = a + b

print('amb=\n', amb )

amb=
 [[ 2.  3.  4.  5.]
 [ 6.  7.  8.  9.]
 [10. 11. 12. 13.]
 [14. 15. 16. 17.]
 [18. 19. 20. 21.]]
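
The text above mentions logical operations between arrays, but the cells only show arithmetic ones. The following is a minimal sketch of elementwise comparison and logical AND, reusing the same *a* and *b*. The names `gt` and `in_range` are chosen here for illustration only.

```python
import numpy as np

a = np.arange(20).reshape(5, 4)
b = 2 * np.ones((5, 4))

# Elementwise comparison: True where the element of a is greater than that of b
gt = a > b

# Elementwise logical AND of two boolean arrays (note the parentheses)
in_range = (a > b) & (a < 10)

print('gt=\n', gt)
print('in_range=\n', in_range)
```

The result of a comparison is a boolean array with the same shape as the operands.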


Transpose of a matrix: *a.T*
-----------------------------------

Transposing a matrix swaps the coordinate axes. The element that
was in position *(r,c)* will now be in position *(c,r)*. The shape
of the resulting matrix will therefore have its values swapped. Transposition
returns a view of the original data rather than copying it, so it is
very efficient and cheap to use whenever it is needed.

See the following example:

In [4]:
at = a.T
print('a.shape=',a.shape )
print('a.T.shape=',a.T.shape )    
print('a=\n', a )
print('at=\n', at )

a.shape= (5, 4)
a.T.shape= (4, 5)
a=
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]
 [16 17 18 19]]
at=
 [[ 0  4  8 12 16]
 [ 1  5  9 13 17]
 [ 2  6 10 14 18]
 [ 3  7 11 15 19]]
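
Because the transpose does not copy the data, writing to it also changes the original array. The sketch below illustrates this with `np.shares_memory`. The names `m` and `mt` are hypothetical and are used here so that the array *a* from the cells above is left untouched.

```python
import numpy as np

m = np.arange(20).reshape(5, 4)
mt = m.T

# The transpose shares its data buffer with the original array
shares = np.shares_memory(m, mt)
print('shares memory:', shares)

# Writing to position (0, 1) of the transpose writes to position (1, 0) of m
mt[0, 1] = -1
print('m[1, 0] =', m[1, 0])
```

If an independent transposed array is needed, `m.T.copy()` makes an explicit copy.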


Matrix multiplication: *a x c*
------------------------------------

Matrix multiplication is done using the *dot* method.
For multiplication to be possible, the number of
columns of the first *ndarray* must equal the number of rows of the
second. The result has as many rows as the
first *ndarray* and as many columns as the second *ndarray*. For example:

In [5]:
ac = a.dot(c)

print('a.shape:',a.shape )
print('c.shape:',c.shape )
print('a=\n',a )
print('c=\n',c )
print('ac=\n', ac )
print('ac.shape:',ac.shape )

a.shape: (5, 4)
c.shape: (4, 3)
a=
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]
 [16 17 18 19]]
c=
 [[12 11 10]
 [ 9  8  7]
 [ 6  5  4]
 [ 3  2  1]]
ac=
 [[ 30  24  18]
 [150 128 106]
 [270 232 194]
 [390 336 282]
 [510 440 370]]
ac.shape: (5, 3)
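
As a complement, the sketch below shows that the `@` operator gives the same result as `dot` for 2-D arrays, and that incompatible shapes raise a `ValueError`. The variable names `ac_dot`, `ac_at`, `same` and `mismatch_raised` are illustrative only.

```python
import numpy as np

a = np.arange(20).reshape(5, 4)
c = np.arange(12, 0, -1).reshape(4, 3)

ac_dot = a.dot(c)   # method form
ac_at = a @ c       # operator form, equivalent for 2-D arrays
same = np.array_equal(ac_dot, ac_at)
print('same result:', same)

# (4,3) @ (5,4) is not possible: 3 columns versus 5 rows
try:
    c @ a
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print('shape mismatch raised ValueError:', mismatch_raised)
```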


Linspace and Arange
==================


The NumPy functions **linspace** and **arange** have the same objective: generating a numpy.array of linearly
spaced values in an interval given as a parameter.

The primary difference between these functions is how the interval is divided.
In the linspace function, the division is defined by the closed interval [start, end], which contains both the
beginning and the end, and by the number of
elements n that the final numpy.array will have. The step is therefore calculated as (end - start)/(n - 1).
Thus, to generate a numpy.array between 0 and 1 with 10 elements, we use linspace as follows:


In [6]:
# generates a numpy.array of 10 elements, linearly spaced between 0 and 1
print(np.linspace(0, 1.0, num=10).round(2) ) 

[0.   0.11 0.22 0.33 0.44 0.56 0.67 0.78 0.89 1.  ]


In the arange function, the semi-open interval [start, end) and the step between consecutive elements are defined.
Thus, to generate
a numpy.array between 0 and 1 with 10 elements, we have to calculate the step (0.1) and pass it as a parameter.

In [7]:
# generates a numpy.array linearly spaced from 0 to 1 (1 excluded) with step 0.1

print(np.arange(0, 1.0, 0.1) )


[0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9]


Note that the main difference between the two, visible in the examples above, is that
in linspace the upper limit of the interval is inclusive (closed range),
while in arange it is not (semi-open range).
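
The relationship between the two functions can be checked numerically. The sketch below uses the `retstep` and `endpoint` parameters of `np.linspace`. The variable names are illustrative, and the comparison uses `np.allclose` because floating point steps are not exact.

```python
import numpy as np

n = 10

# retstep=True also returns the step, which is (end - start)/(n - 1)
x_closed, step = np.linspace(0, 1.0, num=n, retstep=True)
print('step =', step)

# endpoint=False makes linspace use the semi-open interval [start, end)
x_open = np.linspace(0, 1.0, num=n, endpoint=False)
x_arange = np.arange(0, 1.0, 0.1)
close = np.allclose(x_open, x_arange)
print('linspace without endpoint matches arange:', close)
```

When the number of elements matters more than the exact step, linspace is usually the safer choice, since the length of an arange result with a fractional step depends on floating point rounding.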

Indices and meshgrid functions
=================================

The *indices* and *meshgrid* functions are extremely useful for generating synthetic images, and learning them also helps to
understand the advantages of matrix programming, which avoids the sequential scanning of the image that is very common when programming in C.

The indices function in small numeric examples
===========================================================

The *indices* function receives as a parameter a tuple with the dimensions (H,W) of the matrices to be created. In the following example, we
generate matrices of 5 rows and 10 columns. The function returns an array of shape (2,H,W) that can be unpacked into two matrices by assignment,
as in the following example, where we create the matrices *r* and *c*, both of shape (5,10), that is, 5 rows and 10 columns:

In [8]:
r,c = np.indices( (5, 10) )
print('r=\n', r )
print('c=\n', c )

r=
 [[0 0 0 0 0 0 0 0 0 0]
 [1 1 1 1 1 1 1 1 1 1]
 [2 2 2 2 2 2 2 2 2 2]
 [3 3 3 3 3 3 3 3 3 3]
 [4 4 4 4 4 4 4 4 4 4]]
c=
 [[0 1 2 3 4 5 6 7 8 9]
 [0 1 2 3 4 5 6 7 8 9]
 [0 1 2 3 4 5 6 7 8 9]
 [0 1 2 3 4 5 6 7 8 9]
 [0 1 2 3 4 5 6 7 8 9]]


Note that *r* is a matrix where each element is its row coordinate and *c* is a matrix where each element is
its column coordinate. Any matrix operation done with *r* and *c* is therefore in reality processing the
matrix coordinates. Thus, it is possible to generate several synthetic images as a function of their coordinates.

As NumPy processes matrices directly, without the need for an explicit *for* loop, the program's notation is very simple
and efficient. The only drawback is the memory used to hold the index matrices *r* and *c*. We will
see later that this can be minimized.

For example, let the function be the sum of its coordinates $f(r,c) = r + c$:

In [9]:
f = r + c
print('f=\n', f )

f=
 [[ 0  1  2  3  4  5  6  7  8  9]
 [ 1  2  3  4  5  6  7  8  9 10]
 [ 2  3  4  5  6  7  8  9 10 11]
 [ 3  4  5  6  7  8  9 10 11 12]
 [ 4  5  6  7  8  9 10 11 12 13]]


Or even the difference function between the row and column coordinates $f(r,c) = r - c$:

In [10]:
f = r - c
print('f=\n', f )

f=
 [[ 0 -1 -2 -3 -4 -5 -6 -7 -8 -9]
 [ 1  0 -1 -2 -3 -4 -5 -6 -7 -8]
 [ 2  1  0 -1 -2 -3 -4 -5 -6 -7]
 [ 3  2  1  0 -1 -2 -3 -4 -5 -6]
 [ 4  3  2  1  0 -1 -2 -3 -4 -5]]


Or even the function $f(r,c) = (r + c) \% 2$ where % is the modulus operator. This function returns 1 if the sum of the coordinates is odd and 0 otherwise.
It's a chessboard-style image of values 0 and 1:

In [11]:
f = (r + c) % 2
print('f=\n', f )

f=
 [[0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1 0 1]]


Or even the straight line function $f(r,c) = (r = \lfloor \frac{1}{2}c \rfloor)$, where the integer division keeps the coordinates whole:

In [12]:
f = (r == c//2)
print('f=\n', f )

f=
 [[ True  True False False False False False False False False]
 [False False  True  True False False False False False False]
 [False False False False  True  True False False False False]
 [False False False False False False  True  True False False]
 [False False False False False False False False  True  True]]


Or even the parabolic function given by the sum of the squares of its coordinates $f(r,c) = r^2 + c^2$:

In [13]:
f = r**2 + c**2
print('f=\n', f )

f=
 [[ 0  1  4  9 16 25 36 49 64 81]
 [ 1  2  5 10 17 26 37 50 65 82]
 [ 4  5  8 13 20 29 40 53 68 85]
 [ 9 10 13 18 25 34 45 58 73 90]
 [16 17 20 25 32 41 52 65 80 97]]


Or even the function of a circle with radius 4, with center at (0,0) $f(r,c) = (r^2 + c^2 < 4^2)$:

In [14]:
f = ((r**2 + c**2) < 4**2)
print('f=\n', f * 1 )

f=
 [[1 1 1 1 0 0 0 0 0 0]
 [1 1 1 1 0 0 0 0 0 0]
 [1 1 1 1 0 0 0 0 0 0]
 [1 1 1 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0]]
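
The *meshgrid* function can produce the same coordinate matrices as *indices*, and the memory cost of the full matrices can be reduced with sparse coordinates. The sketch below assumes NumPy 1.17 or later, where `np.indices` accepts `sparse=True`. The names `rm`, `cm`, `rs` and `cs` are illustrative only.

```python
import numpy as np

H, W = 5, 10
r, c = np.indices((H, W))

# meshgrid with indexing='ij' yields the same row and column matrices
rm, cm = np.meshgrid(np.arange(H), np.arange(W), indexing='ij')
same = np.array_equal(r, rm) and np.array_equal(c, cm)
print('meshgrid matches indices:', same)

# Sparse coordinates: a column of row indices and a row of column indices
rs, cs = np.indices((H, W), sparse=True)
print(rs.shape, cs.shape)

# Broadcasting expands them on the fly, giving the same f(r,c) = r + c
f_sparse = rs + cs
f_full = r + c
print('same image:', np.array_equal(f_sparse, f_full))
```

With sparse coordinates only H + W index values are stored instead of 2 x H x W, and broadcasting takes care of the rest.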


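Resize
======

The *np.resize* function returns a new array with the requested shape. If the new array has more elements than the original,
it is filled with repeated copies of the original elements:

In [15]: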
a = np.array([[0,1],[2,3]])

print('a = \n', a )
print()

print('np.resize(a,(1,7)) = \n', np.resize(a,(1,7)) )
print()

print('np.resize(a,(2,5)) = \n', np.resize(a,(2,5)) )

a = 
 [[0 1]
 [2 3]]

np.resize(a,(1,7)) = 
 [[0 1 2 3 0 1 2]]

np.resize(a,(2,5)) = 
 [[0 1 2 3 0]
 [1 2 3 0 1]]


Clip
======


The clip function replaces values in an array that are below a minimum threshold or above a maximum threshold
with those minimum and maximum thresholds, respectively. This function is especially useful in image processing to keep
indices from exceeding the limits of the image.

Examples
========

In [16]:
a = np.array([11,1,2,3,4,5,12,-3,-4,7,4])
print('a = ',a )
print('np.clip(a,0,10) = ', np.clip(a,0,10) )

a =  [11  1  2  3  4  5 12 -3 -4  7  4]
np.clip(a,0,10) =  [10  1  2  3  4  5 10  0  0  7  4]
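
To illustrate the image processing use mentioned above, the sketch below shifts a small hypothetical image one column to the left by reading each pixel's right neighbour. Clipping keeps the column index inside the image, so the last column is repeated. The array `img` and the names `c_right` and `shifted` are assumptions made for this example.

```python
import numpy as np

# Hypothetical 3x4 "image" used only for illustration
img = np.arange(12).reshape(3, 4)
H, W = img.shape
r, c = np.indices((H, W))

# Column index of the right neighbour, clipped to the valid range [0, W-1]
c_right = np.clip(c + 1, 0, W - 1)

# Indexing with the clipped coordinates never leaves the image
shifted = img[r, c_right]
print('img=\n', img)
print('shifted=\n', shifted)
```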


Example with floating point
=============================

Note that if the clip parameters are floating point, the result will also be floating point:

In [17]:
a = np.arange(10).astype(int)  # np.int was removed from NumPy; use the builtin int
print('a=',a )
print('np.clip(a,2.5,7.5)=',np.clip(a,2.5,7.5) )

a= [0 1 2 3 4 5 6 7 8 9]
np.clip(a,2.5,7.5)= [2.5 2.5 2.5 3.  4.  5.  6.  7.  7.5 7.5]

Formatting arrays for printing
=================================

Printing floating point arrays
=======================================

When printing arrays with floating point values, NumPy generally prints the array with many decimal
places and in scientific notation, which makes it hard to read.

In [18]:
A = np.exp(np.linspace(0.1,10,32)).reshape(4,8)/3000.
print('A: \n', A )

A: 
 [[3.68390306e-04 5.06993321e-04 6.97744275e-04 9.60263289e-04
  1.32155235e-03 1.81877265e-03 2.50306691e-03 3.44481975e-03]
 [4.74089730e-03 6.52461051e-03 8.97942724e-03 1.23578432e-02
  1.70073529e-02 2.34061923e-02 3.22125283e-02 4.43321564e-02]
 [6.10116684e-02 8.39666729e-02 1.15558259e-01 1.59035850e-01
  2.18871431e-01 3.01219527e-01 4.14550236e-01 5.70520443e-01]
 [7.85172815e-01 1.08058591e+00 1.48714510e+00 2.04666794e+00
  2.81670543e+00 3.87646151e+00 5.33493976e+00 7.34215526e+00]]


It is possible to reduce the number of decimal places and suppress exponential notation using
numpy's **set_printoptions** function:

In [19]:
np.set_printoptions(suppress=True, precision=3)

print('A: \n', A )

A: 
 [[0.    0.001 0.001 0.001 0.001 0.002 0.003 0.003]
 [0.005 0.007 0.009 0.012 0.017 0.023 0.032 0.044]
 [0.061 0.084 0.116 0.159 0.219 0.301 0.415 0.571]
 [0.785 1.081 1.487 2.047 2.817 3.876 5.335 7.342]]
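
Since **set_printoptions** changes the format for every subsequent print, it can be convenient to restrict the change to a single block. The sketch below uses the `np.printoptions` context manager, assuming NumPy 1.15 or later. The names `inside` and `outside` are illustrative only.

```python
import numpy as np

A = np.exp(np.linspace(0.1, 10, 32)).reshape(4, 8) / 3000.

# The options apply only inside the with block
with np.printoptions(suppress=True, precision=3):
    inside = np.array2string(A)
    print('inside:\n', inside)

# After the block, the previous print options are restored
outside = np.array2string(A)
print('outside:\n', outside)
```

If the global options were already changed earlier in the session, the text printed after the block follows those global options, not the NumPy defaults.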


Printing binary arrays
===========================================

Boolean arrays are printed with the words **True** and **False**, as in the following example:

In [20]:
A = np.random.rand(5,10) > 0.5
print('A = \n', A )

A = 
 [[ True  True False  True False False False  True False False]
 [False False False False False  True  True False  True False]
 [False False False False False  True False  True  True  True]
 [ True False False False  True  True  True False False False]
 [ True  True  True False False False False False  True  True]]


To facilitate the visualization of these arrays, it is possible to convert the values to integers using
the **astype(int)** method:

In [21]:
print ('A = \n', A.astype(int))

A = 
 [[1 1 0 1 0 0 0 1 0 0]
 [0 0 0 0 0 1 1 0 1 0]
 [0 0 0 0 0 1 0 1 1 1]
 [1 0 0 0 1 1 1 0 0 0]
 [1 1 1 0 0 0 0 0 1 1]]
