# NumPy

NumPy is a library for numerical computing in Python; its reference documentation is [here](https://numpy.org/).

`import numpy as np`

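To search the documentation for a topic, use `np.lookfor("solve")` (available in NumPy 1.x; it was removed in NumPy 2.0). NumPy also provides mathematical constants such as `np.pi` and `np.e`. An array can be created from values of several types, but it stores a single type: all values are converted to the most general type among them (for example, mixing integers and floats yields a float array). Note also that assigning an array to another name (`b = a`) does not copy the data: both names refer to the same array, so a change made through one is visible through the other.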
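The two points above (a single element type, and assignment sharing the data) can be checked with a minimal sketch. The variable names are illustrative only:

```python
import numpy as np

# Mixed integers and floats are stored with one, more general dtype.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# Plain assignment does not copy: both names refer to the same array.
a = np.array([1, 2, 3])
b = a
b[0] = 99
print(a[0])  # 99

# An explicit copy is independent of the original.
c = a.copy()
c[1] = -1
print(a[1])  # 2
```

Use `.copy()` whenever the original must stay untouched.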


In [1]:
import numpy as np
# np.lookfor("gauss")

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
# show all outputs of a cell in the notebook, not only the last one

Arrays are multidimensional, matrix-like objects (grids) with an efficient structure for computation, and mathematical functions can be applied to them directly. Structurally, an array is a pointer to raw data plus the information about where that data is located and how to interpret it. It is made up of the following elements, which are also available as attributes:

- `.data` buffer pointing to the start of the array's data (the memory address of the first byte).
- `.dtype` type of the elements.
- `.shape` tuple with the length of each dimension.
- `.strides` number of bytes to skip to reach the next element along each dimension. With strides (32, 8), typical for an _int64_ array with four columns, we skip 32 bytes to reach the next row and 8 bytes to reach the next column.
- `.size` total number of elements in the array.
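As a small sketch of these attributes, the following builds a 2x4 _int64_ array (the name `arr` is just for illustration) and prints its metadata. The transpose is included to show that it is a view over the same data with the strides swapped:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]], dtype=np.int64)

print(arr.dtype)      # int64
print(arr.shape)      # (2, 4)
print(arr.size)       # 8
print(arr.strides)    # (32, 8): 4 columns * 8 bytes per row, 8 bytes per element

# The transpose reuses the same memory; only shape and strides change.
print(arr.T.shape)    # (4, 2)
print(arr.T.strides)  # (8, 32)
```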

The simplest way to create an array is from nested lists, each inner list corresponding to a row. When working with large volumes of data it is also worth specifying the data type explicitly, since it determines memory use and the cost of operations.

In [2]:
array1 = np.array(([1,2,3,4], [5,6,7,8]), dtype=np.int64)
array1.ndim # number of dimensions
array1.size # total number of elements
array1.shape # shape of array
len(array1) # size of first dimension
array1.astype(float) # change data type

2

8

(2, 4)

2

array([[1., 2., 3., 4.],
       [5., 6., 7., 8.]])

In most cases arrays are not written out by hand like this: the values are read from files or generated as a sequence. The following functions create arrays with placeholder or generated values:

In [3]:
np.full((4,4), 2) # 4x4 array filled with 2s (the fill value is the second argument)
np.ones((5,5), dtype=np.int64) # array of 1s; the dtype is optional, same for zeros
np.ones_like(array1) # array of 1s with the same shape and dtype as array1
np.zeros((5,5)) # array of 0s
np.eye(3) # identity matrix of that size
np.random.random((3,4)) # random values drawn from a uniform distribution on [0, 1)
np.linspace(0,3,5) # vector of five evenly spaced values from 0 to 3 (3 is included)
np.arange(0,30,10) # vector starting at 0 in steps of 10, stopping before 30 (30 is not included)
np.logspace(1, 20, num=5, base=10.0) # five values evenly spaced on a logarithmic scale, from 10**1 to 10**20

x = np.arange(-5, 5, 1)
y = np.arange(-5, 5, 1)
xx, yy = np.meshgrid(x, y, sparse=True) # vectorized evaluations of N-D scalar/vector fields over N-D grids
# sparse True to conserve memory
xx
yy

array([[2, 2, 2, 2],
       [2, 2, 2, 2],
       [2, 2, 2, 2],
       [2, 2, 2, 2]])

array([[1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1]], dtype=int64)

array([[1, 1, 1, 1],
       [1, 1, 1, 1]], dtype=int64)

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

array([[0.68998034, 0.26632362, 0.56800611, 0.07774563],
       [0.54495157, 0.36904528, 0.76911217, 0.13414322],
       [0.80328298, 0.27859485, 0.40634183, 0.56935069]])

array([0.  , 0.75, 1.5 , 2.25, 3.  ])

array([ 0, 10, 20])

array([1.00000000e+01, 5.62341325e+05, 3.16227766e+10, 1.77827941e+15,
       1.00000000e+20])

array([[-5, -4, -3, -2, -1,  0,  1,  2,  3,  4]])

array([[-5],
       [-4],
       [-3],
       [-2],
       [-1],
       [ 0],
       [ 1],
       [ 2],
       [ 3],
       [ 4]])
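The sparse grids returned by `np.meshgrid` are meant to be combined through broadcasting. As a minimal sketch, the following evaluates the scalar field z = x² + y² (chosen here only as an example) over the same grid:

```python
import numpy as np

x = np.arange(-5, 5, 1)
y = np.arange(-5, 5, 1)
xx, yy = np.meshgrid(x, y, sparse=True)

# xx is a single row and yy a single column; their combination
# is expanded to the full 10x10 grid only when the field is evaluated.
z = xx**2 + yy**2

print(xx.shape, yy.shape)  # (1, 10) (10, 1)
print(z.shape)             # (10, 10)
print(z[0, 0])             # 50, the value at x = -5, y = -5
print(z[5, 5])             # 0, the value at x = 0, y = 0
```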

If the data comes from a plain-text file such as .txt or .csv, `np.genfromtxt` is a very useful function with several handy parameters:

In [4]:
#array1 = np.genfromtxt('datos.csv', delimiter= ';', autostrip= True, skip_header=1, filling_values=-9999, usecols = (1, 3, 5), max_rows=6)

In order, the parameters are: the value delimiter, whether to strip surrounding whitespace, the number of header lines to skip, the value used to fill in missing entries, the columns to read, and the maximum number of rows to read.
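To try these parameters without a file on disk, the sketch below reads from an in-memory string through `io.StringIO`. The column names and values are made up for illustration and do not come from `datos.csv`:

```python
import io
import numpy as np

# Hypothetical semicolon-separated data with a header line,
# a text column, extra whitespace and one missing value.
text = """id;a;label;b
1; 2.5 ;x;3.0
2;;y;4.0
3;7.5;z;5.0
"""

data = np.genfromtxt(
    io.StringIO(text),
    delimiter=";",         # value delimiter
    autostrip=True,        # strip whitespace around each value
    skip_header=1,         # skip the header line
    filling_values=-9999,  # replacement for missing entries
    usecols=(1, 3),        # read only the numeric columns "a" and "b"
    max_rows=2,            # read at most two data rows
)

print(data.shape)  # (2, 2)
print(data)
```

The empty field in the second data row is replaced by -9999, and the third row is not read because of `max_rows=2`.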

Once you are done working with the arrays, you can save them either as text or in NumPy's binary format to load them again later:

In [5]:
#np.savetxt('name.txt', array1, delimiter=',')
#np.save('name', array1) # the .npy extension is added automatically
#np.savez('name2', x=array1, y=array2) # .npz extension; arrays passed positionally are stored as arr_0, arr_1, ...

To open numpy binary files, use `load`:

In [6]:
#arrayload = np.load('name.npy') #loaded as an array
#arraygroup = np.load('name2.npz')
#arraygroup['x'] #array1
#arraygroup.close() # close the .npz file once you are done reading from it
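A runnable round trip might look like the following sketch. It writes into a temporary directory so nothing is left behind; the file names are placeholders. Opening the `.npz` file in a `with` block closes it automatically:

```python
import os
import tempfile
import numpy as np

first = np.arange(6).reshape(2, 3)
second = np.linspace(0, 1, 5)

with tempfile.TemporaryDirectory() as tmp:
    npy_path = os.path.join(tmp, "single.npy")
    npz_path = os.path.join(tmp, "group.npz")

    np.save(npy_path, first)               # one array per .npy file
    np.savez(npz_path, x=first, y=second)  # several named arrays per .npz file

    loaded = np.load(npy_path)             # returned directly as an array
    with np.load(npz_path) as group:       # dictionary-like, closed on exit
        names = sorted(group.files)
        x_back = group["x"]
        y_back = group["y"]

print(names)                           # ['x', 'y']
print(np.array_equal(loaded, first))   # True
print(np.array_equal(y_back, second))  # True
```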

A distinctive concept in NumPy is broadcasting, which makes it possible to perform arithmetic on arrays of different shapes. A typical case is adding a vector to, or multiplying it with, every row of an array. The shapes are compared dimension by dimension, starting from the last one, and three rules apply: (i) two dimensions are compatible if they are equal; (ii) they are also compatible if one of them is 1, in which case that array is stretched along that dimension; (iii) the result takes the larger size along each dimension. If one array has fewer dimensions, its shape is treated as if it were padded with 1s on the left.

In [7]:
x1 = np.ones((2,3))
y1 = np.zeros((4,1,3))
x1 + y1 # results in (4,2,3)
np.transpose(y1)

array([[[1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.]]])

array([[[0., 0., 0., 0.]],

       [[0., 0., 0., 0.]],

       [[0., 0., 0., 0.]]])
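The broadcasting rules can be seen on smaller arrays. This sketch, with illustrative names, adds a row vector and a column vector to a matrix, and shows an incompatible pair being rejected:

```python
import numpy as np

matrix = np.arange(6).reshape(2, 3)  # shape (2, 3): [[0 1 2] [3 4 5]]
row = np.array([10, 20, 30])         # shape (3,), treated as (1, 3)
col = np.array([[100], [200]])       # shape (2, 1)

by_row = matrix + row  # the vector is added to every row
by_col = matrix + col  # the column is added to every column
print(by_row)          # [[10 21 32] [13 24 35]]
print(by_col)          # [[100 101 102] [203 204 205]]

# Last dimensions 3 and 2 are neither equal nor 1, so this fails.
error_raised = False
try:
    matrix + np.array([1, 2])
except ValueError:
    error_raised = True
print(error_raised)    # True
```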

There are two ways to resize an array. The function `np.resize` returns a new array: the input is flattened and its elements are laid out in order, and if the new shape needs more elements than are available, the sequence is repeated from the beginning until the array is full. The method `.resize` changes the array in place and fills any missing elements with 0s. To change the dimensions while keeping the total number of elements (rows * columns), use the `reshape` method, making sure the new shape is consistent with the current size.

In [8]:
x1 = np.ones((2,3))
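np.resize(x1, (3,3)) # function: returns a new array, repeating elements as needed
x1.resize((3,3)) # method: modifies x1 in place (returns None), padding with 0s
x1.ravel() # flattened 1-D array
x1.reshape(9,1) # column vector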

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

array([1., 1., 1., 1., 1., 1., 0., 0., 0.])

array([[1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [0.],
       [0.],
       [0.]])
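With an array of ones the repetition done by `np.resize` is hard to see, so this sketch uses distinct values. Passing `refcheck=False` to the method is a choice made here because notebooks often keep extra references to an array, which would otherwise make the in-place resize fail:

```python
import numpy as np

base = np.array([[1, 2, 3], [4, 5, 6]])

# Function: new array, elements repeated from the start when they run out.
repeated = np.resize(base, (3, 3))
print(repeated)  # [[1 2 3] [4 5 6] [1 2 3]]

# Method: in place, missing elements filled with 0s.
padded = base.copy()
padded.resize((3, 3), refcheck=False)
print(padded)    # [[1 2 3] [4 5 6] [0 0 0]]

# reshape keeps the six elements and only changes their layout.
reshaped = base.reshape(3, 2)
print(reshaped)  # [[1 2] [3 4] [5 6]]

# -1 lets NumPy infer that dimension from the total size.
column = base.reshape(-1, 1)
print(column.shape)  # (6, 1)
```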

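You can append elements as with lists, insert them at a given position, or delete the ones indicated. These functions return a new array and leave the original unchanged; without an `axis` argument they work on the flattened array: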

In [9]:
x1 = np.ones((2,3))
np.append(x1, [2,3,4,5])
np.append(x1, [[7],[9]], axis=1) # append a column with those values
np.insert(x1, 3, 4) # insert the value 4 at position 3 of the flattened array
np.insert(x1, 2, 4, axis=1) # insert a column of 4s at column index 2 (axis=1)
np.delete(x1,[2]) # delete the element at position 2 of the flattened array
np.delete(x1, 1, 0) # delete the row at index 1 (axis=0)

array([1., 1., 1., 1., 1., 1., 2., 3., 4., 5.])

array([[1., 1., 1., 7.],
       [1., 1., 1., 9.]])

array([1., 1., 1., 4., 1., 1., 1.])

array([[1., 1., 4., 1.],
       [1., 1., 4., 1.]])

array([1., 1., 1., 1., 1.])

array([[1., 1., 1.]])

`np.concatenate` is similar to append: it joins arrays along an existing axis (the first one by default), provided their shapes match in every other dimension. To stack arrays vertically with `np.vstack`, they must have the same number of columns.

In [10]:
x1 = np.ones((2,3))
x2 = np.zeros((2,3))
x3 = np.zeros((4,3))
np.concatenate((x1,x2))
np.vstack((x1, x3)) 

array([[1., 1., 1.],
       [1., 1., 1.],
       [0., 0., 0.],
       [0., 0., 0.]])

array([[1., 1., 1.],
       [1., 1., 1.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])
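The shape requirements become clearer when joining along the other axis as well. A minimal sketch using the same shapes as the cell above, with `np.hstack` added as the horizontal counterpart of `np.vstack`:

```python
import numpy as np

ones = np.ones((2, 3))
zeros = np.zeros((2, 3))
tall = np.zeros((4, 3))

rows = np.concatenate((ones, zeros))          # along axis 0
cols = np.concatenate((ones, zeros), axis=1)  # along axis 1
stacked = np.vstack((ones, tall))             # same number of columns is enough
side = np.hstack((ones, zeros))               # same number of rows is needed

print(rows.shape)     # (4, 3)
print(cols.shape)     # (2, 6)
print(stacked.shape)  # (6, 3)
print(side.shape)     # (2, 6)

# Side by side, 2 rows against 4 rows cannot be joined.
mismatch = False
try:
    np.hstack((ones, tall))
except ValueError:
    mismatch = True
print(mismatch)       # True
```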

The library comes with frequently used mathematical functions, which can be applied to a single value or element-wise to an array. The `axis` parameter selects the axis along which a reduction operates: for a 2-D array, `axis=0` runs down the rows and gives one result per column, while `axis=1` runs across the columns and gives one result per row.

In [11]:
array = np.arange(-5,5,1)
array
array.sum() #sum of all elements
array.mean() # mean of all elements
np.median(array) # median
array.max(axis=0) # maximum along axis 0 (the only axis of a 1-D array)
np.corrcoef(array) # correlation coefficient (trivially 1.0 for a single variable)
np.std(array) #standard deviation
array.cumsum(axis=0) #cumulative sum of elements

np.sqrt(4) #square root
np.sin(np.pi) #sine
np.cos(np.pi) #cosine
np.log(np.e) #natural logarithm
np.exp(1) #exponential
np.dot([2j, 3j], [3j, 5j]) # dot product of two 1-D vectors (no complex conjugation)
a = [[4, 0], [0, 3]]
b = [[4, 1], [4, 8]]
np.dot(a, b) # matrix product for 2-D inputs

array([-5, -4, -3, -2, -1,  0,  1,  2,  3,  4])

-5

-0.5

-0.5

4

1.0

2.8722813232690143

array([ -5,  -9, -12, -14, -15, -15, -14, -12,  -9,  -5], dtype=int32)

2.0

1.2246467991473532e-16

-1.0

1.0

2.718281828459045

(-21+0j)

array([[16,  4],
       [12, 24]])
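The cell above uses a 1-D array, where there is only one axis. The effect of the `axis` parameter is easier to see on a small 2-D array, as in this sketch:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

total = m.sum()               # all elements
per_column = m.sum(axis=0)    # collapses the rows: one value per column
per_row = m.sum(axis=1)       # collapses the columns: one value per row
col_max = m.max(axis=0)
row_mean = m.mean(axis=1)
running = m.cumsum(axis=1)    # cumulative sum within each row

print(total)       # 21
print(per_column)  # [5 7 9]
print(per_row)     # [ 6 15]
print(col_max)     # [4 5 6]
print(row_mean)    # [2. 5.]
print(running)     # [[ 1  3  6] [ 4  9 15]]
```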

Element-wise comparisons are made with `==`, `<` and `>`. There are also functions that combine booleans, applied either to boolean values obtained from a previous operation (True, True, False...) or to conditions written inline:

In [12]:
array = np.arange(-2,2,1)
np.logical_and(array>0, array<6)
np.logical_or(array>0, array<6)
np.logical_not(array>0) # takes a single condition; a second positional argument would be used as the output array

array([False, False, False,  True])

array([ True,  True,  True,  True])

array([ True,  True,  True, False])

Comparisons return arrays of booleans. To reduce them to a single value, or to compare floats within a tolerance, there are several functions:

In [13]:
np.all(array>0)
np.any(array>0)
#np.isclose(A,B,rtol=1e-6)
#np.allclose(A,B,rtol=1e-6) # compare with a tolerance due to the float accuracy problem

False

True
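The tolerance-based functions are commented out above because `A` and `B` are not defined at that point. As a sketch with made-up values, the classic case of 0.1 + 0.2 shows why they are needed:

```python
import numpy as np

A = np.array([0.1 + 0.2, 1.0])
B = np.array([0.3, 1.0])

exact = A == B                         # exact comparison is affected by rounding
close = np.isclose(A, B, rtol=1e-6)    # element-wise comparison with a tolerance
all_close = np.allclose(A, B, rtol=1e-6)

print(exact)          # [False  True]
print(close)          # [ True  True]
print(all_close)      # True
print(np.all(exact))  # False
print(np.any(exact))  # True
```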

There are several ways to work with only part of an array. The index operator is `[]`; inside it, indices are given per dimension and separated by commas: rows first and then columns for a 2-D array, preceded by the index of the element (the sub-array) when there are more dimensions.

In [14]:
A = np.array(([1,2,3,4], [5,6,7,8]))
A
##subsetting
A[0] 
A[0,0] 
##slicing
A[0,1:3] # 2,3
A[0,::2] # 1,3
A[1:] # 5,6,7,8
A[0,:3] #1,2,3
##boolean
A[A>3] #4,5,6,7,8
##fancy indexing
A[[1,1,0],[0,1,1]] # pairs of (row, column) indices: 5,6,2
A[[1, 0, 1]][:,[0,1,2,0]] # take rows 1, 0, 1, keep them all (:), then pick columns 0, 1, 2, 0: 5,6,7,5 ; 1,2,3,1 ; 5,6,7,5

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

array([1, 2, 3, 4])

1

array([2, 3])

array([1, 3])

array([[5, 6, 7, 8]])

array([1, 2, 3])

array([4, 5, 6, 7, 8])

array([5, 6, 2])

array([[5, 6, 7, 5],
       [1, 2, 3, 1],
       [5, 6, 7, 5]])
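The same index expressions can also be used on the left-hand side of an assignment to modify part of an array. The following sketch works on a copy so that `A` stays as defined above. `np.ix_` is shown as one possible alternative to the chained indexing in the last line of the cell; it is not used in the original:

```python
import numpy as np

A = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
B = A.copy()

B[B > 3] = 0   # boolean mask: replace every value greater than 3
print(B)       # [[1 2 3 0] [0 0 0 0]]

B[:, 0] = -1   # slice: overwrite the first column
print(B)       # [[-1  2  3  0] [-1  0  0  0]]

# Rows 1 and 0 combined with columns 0 and 3, in two equivalent ways.
chained = A[[1, 0]][:, [0, 3]]
with_ix = A[np.ix_([1, 0], [0, 3])]
print(chained)  # [[5 8] [1 4]]
```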

Linear algebra is covered by `np.linalg`, with functions for the norm of a vector, the inverse of a matrix, the determinant, the trace, solving linear systems, eigenvalues and eigenvectors, matrix decompositions (QR, SVD) and pseudoinverses.

To solve a system such as x0 + 2 * x1 = 1 and 3 * x0 + 5 * x1 = 2:

In [15]:
a = np.array([[1, 2], [3, 5]])
b = np.array([1, 2])
x = np.linalg.solve(a, b)
x
np.allclose(np.dot(a, x), b) # checking solution 

array([-1.,  1.])

True
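The other linear algebra functions listed above can be tried on the same matrix. This is a short sketch rather than a complete tour; the trace is taken with `np.trace`, and `@` is the matrix multiplication operator, equivalent to `np.dot` for 2-D arrays:

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 5.0]])
v = np.array([3.0, 4.0])

norm = np.linalg.norm(v)   # Euclidean norm, 5.0
det = np.linalg.det(a)     # 1*5 - 2*3 = -1, up to rounding
trace = np.trace(a)        # 1 + 5 = 6.0
inv = np.linalg.inv(a)
pinv = np.linalg.pinv(a)   # matches the inverse for an invertible matrix

eigenvalues, eigenvectors = np.linalg.eig(a)
q, r = np.linalg.qr(a)
u, s, vt = np.linalg.svd(a)

# Each result is checked with a tolerance because of float rounding.
print(np.isclose(det, -1.0))                                       # True
print(np.allclose(a @ inv, np.eye(2)))                             # True
print(np.allclose(a @ eigenvectors, eigenvectors * eigenvalues))   # True
print(np.allclose(q @ r, a))                                       # True
print(np.allclose(u @ np.diag(s) @ vt, a))                         # True
```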

NumPy can also compute histograms. `np.histogram` does not draw anything: it returns the frequency of the values in each bin together with the bin edges, so the bins have to be specified. To visualize the result you can use matplotlib, passing the data as a flat array:

In [16]:
np.histogram(array1, bins=range(0,20)) # range is set knowing the values within the array

(array([0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       dtype=int64),
 array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
        17, 18, 19]))
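The returned tuple is easier to use when unpacked. In this sketch `values` repeats the contents of `array1`, and the second call uses a number of bins plus a range instead of explicit edges. The plotting line is left as a comment and assumes matplotlib is installed and imported as `plt`:

```python
import numpy as np

values = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

# Explicit edges 0, 1, ..., 19 give 19 bins; there is always one more edge than bins.
counts, edges = np.histogram(values, bins=range(0, 20))
print(len(counts), len(edges))  # 19 20
print(counts.sum())             # 8, every value falls in some bin

# Four equal-width bins over [0, 8]; the last bin includes its right edge.
counts4, edges4 = np.histogram(values, bins=4, range=(0, 8))
print(edges4)   # [0. 2. 4. 6. 8.]
print(counts4)  # [1 2 2 3]

# Possible way to draw it (assumption: matplotlib available as plt):
# plt.bar(edges4[:-1], counts4, width=np.diff(edges4), align="edge")
```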