***NumPy*** provides a multidimensional
array object for storing homogeneous or heterogeneous (structured) data arrays, and it
supports vectorization of code.

Constructing arrays with lists:

In [1]:
v = [0.5, 0.75, 1.0, 1.5, 2.0]

m = [v,v,v]
m

[[0.5, 0.75, 1.0, 1.5, 2.0],
 [0.5, 0.75, 1.0, 1.5, 2.0],
 [0.5, 0.75, 1.0, 1.5, 2.0]]

In [2]:
m[1][0]

0.5

In [3]:
v1 = [0.5, 1.5]
v2 = [1, 2]
m = [v1, v2]
c = [m, m]
c

[[[0.5, 1.5], [1, 2]], [[0.5, 1.5], [1, 2]]]

In [9]:
c[1][1][0]

1

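A list of lists stores references to the inner list objects, so a change to the original list would show up in every row built from it. To avoid this, physical copies can be used instead of reference pointers. As a consequence, a change in the original object no longer has any impact. The ***deepcopy() function*** of the copy module serves this purpose: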

In [10]:
from copy import deepcopy

v = [0.5, 0.75, 1.0, 1.5, 2.0]

m = 3 * [deepcopy(v), ]

m

[[0.5, 0.75, 1.0, 1.5, 2.0],
 [0.5, 0.75, 1.0, 1.5, 2.0],
 [0.5, 0.75, 1.0, 1.5, 2.0]]

In [13]:
v[0] = "Python"
v

['Python', 0.75, 1.0, 1.5, 2.0]

In [15]:
m

[[0.5, 0.75, 1.0, 1.5, 2.0],
 [0.5, 0.75, 1.0, 1.5, 2.0],
 [0.5, 0.75, 1.0, 1.5, 2.0]]
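
The difference between references and copies can be checked directly. The following is a minimal sketch; the names m_ref, m_copy and m_rows are made up for illustration. It also shows that 3 * [deepcopy(v)] repeats one and the same copied object three times, whereas a list comprehension (an alternative not used above) gives three independent rows.

```python
from copy import deepcopy

v = [0.5, 0.75, 1.0, 1.5, 2.0]

# Rows are references to v: changing v changes every row.
m_ref = 3 * [v]
# One deep copy, referenced three times: independent of v,
# but the three rows are still the same object.
m_copy = 3 * [deepcopy(v)]
# Three separate copies: every row is its own object.
m_rows = [deepcopy(v) for _ in range(3)]

v[0] = "Python"
m_copy[0][1] = -1.0
m_rows[0][1] = -1.0

print(m_ref[2][0])      # Python
print(m_copy[2][:2])    # [0.5, -1.0]
print(m_rows[2][:2])    # [0.5, 0.75]
```

If the rows are meant to be edited independently later on, the list comprehension variant is probably the safer choice.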

There is a dedicated ***array module*** available in Python. Arrays
are sequence types and behave very much like lists, except that the type
of objects stored in them is constrained. The type is specified at object
creation time by using a type code, which is a single character.

In [27]:
v = [0.5, 0.75, 1.0, 1.5, 2.0]

In [28]:
import array
a = array.array("f", v)
a

array('f', [0.5, 0.75, 1.0, 1.5, 2.0])

In [29]:
a.append(0.5)
a

array('f', [0.5, 0.75, 1.0, 1.5, 2.0, 0.5])

In [30]:
a.extend([5.0,6.75])
a

array('f', [0.5, 0.75, 1.0, 1.5, 2.0, 0.5, 5.0, 6.75])

In [31]:
2 * a

array('f', [0.5, 0.75, 1.0, 1.5, 2.0, 0.5, 5.0, 6.75, 0.5, 0.75, 1.0, 1.5, 2.0, 0.5, 5.0, 6.75])

Trying to append an object of a different data type than the one specified
raises a TypeError. However, the array object can easily be converted back to a list
object if such flexibility is required.

In [32]:
a.tolist()

[0.5, 0.75, 1.0, 1.5, 2.0, 0.5, 5.0, 6.75]
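
The type restriction mentioned above can be illustrated with a short sketch. The variable names message and as_list are chosen here for illustration only.

```python
import array

a = array.array("f", [0.5, 0.75, 1.0])

message = None
try:
    a.append("string")          # a str is not accepted by type code "f"
except TypeError as err:
    message = str(err)
    print("TypeError:", message)

# Converting to a list lifts the type restriction.
as_list = a.tolist()
as_list.append("string")
print(as_list)                  # [0.5, 0.75, 1.0, 'string']
```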

An advantage of the array class is that it has **built-in storage and retrieval
functionality**:

In [34]:
#opens a file on disk for writing binary data
f = open("array.apy", "wb")
#writes array data to file
a.tofile(f)
#closes the file
f.close()

In [35]:
#alternative for above process
with open("array.apy", "wb") as f:
    a.tofile(f)

In [44]:
#shows the file written on disk
!ls -n arr*

-rw-r--r-- 1 0 0 32 Aug 19 03:56 array.apy


In [40]:
#Instantiates a new array object with type code "f" (float).
b = array.array("f")

with open('array.apy', "rb") as f:
    #reads five elements into the b object
    b.fromfile(f,5)

b

array('f', [0.5, 0.75, 1.0, 1.5, 2.0])

In [46]:
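#Instantiates a new array object with type code "d" (double)
b = array.array("d")

with open("array.apy", "rb") as f:
    #reads two elements; the file holds 4-byte floats, so interpreting
    #the bytes as 8-byte doubles leads to "wrong" numbers
    b.fromfile(f, 2)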

b

array('d', [0.0004882813645963324, 0.12500002956949174])
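
Why the numbers come out wrong can be seen without a file by reinterpreting the raw bytes in memory. This is a small sketch using tobytes() and frombytes(). It assumes the usual item sizes of 4 bytes for "f" and 8 bytes for "d", which hold on common platforms.

```python
import array

a = array.array("f", [0.5, 0.75, 1.0, 1.5])
raw = a.tobytes()                  # 4 values of 4 bytes each

same = array.array("f")
same.frombytes(raw)                # read back with the same type code

other = array.array("d")
other.frombytes(raw)               # same bytes, read as 8-byte doubles

print(a.itemsize, other.itemsize)
print(len(raw), len(same), len(other))
print(same == a)                   # True
print(other.tolist())              # two unrelated-looking values
```

Each double consumes the bytes of two neighboring floats, so the values read back have nothing to do with the numbers originally stored.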

Lists and the array module are not really convenient for multidimensional numerical data. **numpy.ndarray** is a class built with the specific goal of handling
n-dimensional arrays both conveniently and efficiently, i.e., in a highly
performant manner.

In [47]:
import numpy as np

a = np.array([0, 0.5, 1.0, 1.5, 2.0])
a

array([0. , 0.5, 1. , 1.5, 2. ])

In [49]:
type(a)

numpy.ndarray

In [50]:
a = np.array(['a', 'b', 'c'])
a

array(['a', 'b', 'c'], dtype='<U1')

np.arange()
works similarly to range() but also accepts a dtype parameter.

In [60]:
#the np.float alias was removed in NumPy 1.24; the built-in float (float64) is used instead
a = np.arange(8, dtype = float)
a

array([0., 1., 2., 3., 4., 5., 6., 7.])

A major feature of the ndarray class is the multitude of built-in methods.

In [61]:
a.sum()

28.0

In [62]:
a.std()

2.29128784747792

In [63]:
a.cumsum()

array([ 0.,  1.,  3.,  6., 10., 15., 21., 28.])

Initializing (instantiating) ndarray objects with dedicated functions:

In [66]:
c = np.zeros((2,3), dtype = "i", order = "C")
c

array([[0, 0, 0],
       [0, 0, 0]], dtype=int32)

In [70]:
c = np.ones((2,3,4), dtype = "i", order = "C")
c

array([[[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]],

       [[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]]], dtype=int32)

In [71]:
d = np.zeros_like(c, dtype = "f16", order = "C")
d

array([[[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]],

       [[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]]], dtype=float128)

In [73]:
d = np.ones_like(c, dtype = "f16", order = "C")
d

array([[[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]],

       [[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]]], dtype=float128)

In [79]:
e = np.empty((2,3,2))
e

array([[[2.63965441e-316, 2.50395503e-312],
        [2.37663529e-312, 2.41907520e-312],
        [2.31297541e-312, 2.46151512e-312]],

       [[2.41907520e-312, 2.44029516e-312],
        [8.27578359e-313, 8.70018274e-313],
        [3.99910963e+252, 2.66090406e-312]]])

In [81]:
f = np.empty_like(c)
f

array([[[  53427200,          0,        110,        118],
        [        40,        112,         97,        114],
        [        97,        109,        101,        116]],

       [[       101,        114,         95,        115],
        [        61,         39,         39,         41],
        [ 775501625, 1952543859, 1702065013,        125]]], dtype=int32)

In [82]:
np.eye(5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

In [86]:
g = np.linspace(5,15,12)
g

array([ 5.        ,  5.90909091,  6.81818182,  7.72727273,  8.63636364,
        9.54545455, 10.45454545, 11.36363636, 12.27272727, 13.18181818,
       14.09090909, 15.        ])

***Metainformation -***
Every ndarray object provides access to a number of useful attributes:

In [88]:
#number of elements
g.size

12

In [89]:
#number of bytes used to represent one element
g.itemsize

8

In [90]:
#number of dimensions
g.ndim

1

In [91]:
#shape of ndarray
g.shape

(12,)

In [92]:
#dtype of elements
g.dtype

dtype('float64')

In [93]:
#total number of bytes used in memory
g.nbytes

96
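
The attributes are related to each other, which a short check makes visible. The array k below is a made-up example, not one from the cells above.

```python
import numpy as np

g = np.linspace(5, 15, 12)

# nbytes is the number of elements times the bytes per element
print(g.size * g.itemsize == g.nbytes)    # True

# for multidimensional arrays, size is the product of the shape entries
k = np.zeros((2, 3, 4), dtype="i4")
print(k.ndim, k.shape, k.size, k.itemsize, k.nbytes)    # 3 (2, 3, 4) 24 4 96
```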

***Reshaping and Resizing -***
Although the number of elements of an ndarray object is fixed once it is created, there are multiple
options to reshape and resize such an object. While reshaping in general
just provides another view on the same data, resizing in general creates a
new (temporary) object.

In [95]:
g = np.arange(15)
g

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [96]:
g.shape

(15,)

In [97]:
g.reshape(3,5)

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [98]:
h = g.reshape(5,3)
h

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [99]:
h.T

array([[ 0,  3,  6,  9, 12],
       [ 1,  4,  7, 10, 13],
       [ 2,  5,  8, 11, 14]])

In [100]:
h.transpose()

array([[ 0,  3,  6,  9, 12],
       [ 1,  4,  7, 10, 13],
       [ 2,  5,  8, 11, 14]])

During a reshaping operation, the total number of elements in the ndarray
object is unchanged. During a resizing operation, this number changes — it
either decreases (“down-sizing”) or increases (“up-sizing”).

In [101]:
g

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [103]:
np.resize(g, (5,4))

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14,  0],
       [ 1,  2,  3,  4]])
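
The distinction between a view and a new object can be verified with np.shares_memory(). This is a sketch; the name big is chosen here for illustration.

```python
import numpy as np

g = np.arange(15)
h = g.reshape(5, 3)           # reshaping: a view on the data of g
big = np.resize(g, (5, 4))    # resizing: a new object with its own data

print(np.shares_memory(g, h))      # True
print(np.shares_memory(g, big))    # False

h[0, 0] = 100                 # writing through the view changes g as well
print(g[0], big[0, 0])        # 100 0
```

Because a reshaped array typically shares its data with the original, calling .copy() is an option whenever an independent object is needed.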

***Stacking*** is a special operation that allows the horizontal or vertical
combination of two ndarray objects. However, the size of the “connecting”
dimension must be the same:

In [104]:
h

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [106]:
np.hstack((h, 2 * h))

array([[ 0,  1,  2,  0,  2,  4],
       [ 3,  4,  5,  6,  8, 10],
       [ 6,  7,  8, 12, 14, 16],
       [ 9, 10, 11, 18, 20, 22],
       [12, 13, 14, 24, 26, 28]])

In [107]:
np.vstack((h, 0.5 * h))

array([[ 0. ,  1. ,  2. ],
       [ 3. ,  4. ,  5. ],
       [ 6. ,  7. ,  8. ],
       [ 9. , 10. , 11. ],
       [12. , 13. , 14. ],
       [ 0. ,  0.5,  1. ],
       [ 1.5,  2. ,  2.5],
       [ 3. ,  3.5,  4. ],
       [ 4.5,  5. ,  5.5],
       [ 6. ,  6.5,  7. ]])
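
What happens when the connecting dimension does not match can be sketched as follows. The second array k with four rows is a hypothetical example, and the flag failed is only used to record the outcome.

```python
import numpy as np

h = np.arange(15).reshape(5, 3)
k = np.arange(12).reshape(4, 3)    # hypothetical second array with 4 rows

# vstack only needs matching column counts: (5, 3) on top of (4, 3)
print(np.vstack((h, k)).shape)     # (9, 3)

# hstack needs matching row counts, so 5 rows next to 4 rows fails
failed = False
try:
    np.hstack((h, k))
except ValueError as err:
    failed = True
    print("ValueError:", err)
```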

In [108]:
h

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [110]:
h.flatten(order = "C")

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [111]:
h.flatten(order = "F")

array([ 0,  3,  6,  9, 12,  1,  4,  7, 10, 13,  2,  5,  8, 11, 14])

In [112]:
h 

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [113]:
h > 5

array([[False, False, False],
       [False, False, False],
       [ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]])

In [114]:
(h > 5).astype(int)

array([[0, 0, 0],
       [0, 0, 0],
       [1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]])

Such Boolean arrays can be used for indexing and data selection. Notice
that the following operations flatten the data:

In [115]:
h[h > 5]

array([ 6,  7,  8,  9, 10, 11, 12, 13, 14])
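
Conditions can also be combined. The sketch below uses the element-wise operators & and |. The copy k is introduced here only to leave h untouched.

```python
import numpy as np

h = np.arange(15).reshape(5, 3)

# & is element-wise "and", | is element-wise "or"; the parentheses are required
print(h[(h > 4) & (h <= 12)])    # [ 5  6  7  8  9 10 11 12]
print(h[(h < 4) | (h >= 12)])    # [ 0  1  2  3 12 13 14]

# a Boolean mask can also be used for assignment, which keeps the shape
k = h.copy()
k[k > 5] = 0
print(k.shape)                   # (5, 3)
```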

A powerful tool in this regard is the **np.where()** function, which allows the
definition of actions/operations depending on whether a condition is True or
False. The result of applying np.where() is a new ndarray object of the
same shape as the original one:

In [116]:
np.where(h > 7, 1, 0)

array([[0, 0, 0],
       [0, 0, 0],
       [0, 0, 1],
       [1, 1, 1],
       [1, 1, 1]])

In [117]:
np.where(h % 2 == 0, "even", "odd")

array([['even', 'odd', 'even'],
       ['odd', 'even', 'odd'],
       ['even', 'odd', 'even'],
       ['odd', 'even', 'odd'],
       ['even', 'odd', 'even']], dtype='<U4')

In [118]:
np.where(h <= 7, h * 2, h / 2)

array([[ 0. ,  2. ,  4. ],
       [ 6. ,  8. , 10. ],
       [12. , 14. ,  4. ],
       [ 4.5,  5. ,  5.5],
       [ 6. ,  6.5,  7. ]])
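
When np.where() is called with the condition only, it returns the positions at which the condition holds instead of a new array of the same shape. A short sketch of this second form (not used in the cells above):

```python
import numpy as np

h = np.arange(15).reshape(5, 3)

rows, cols = np.where(h > 11)    # one index array per dimension
print(rows)                      # [4 4 4]
print(cols)                      # [0 1 2]
print(h[rows, cols])             # [12 13 14]
```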

***Structured NumPy Arrays -***
Structured arrays are a generalization of the regular ndarray
object type in that the data type only has to be the same per column, like in
tables in SQL databases. One advantage of structured arrays is that a single
element of a column can be another multidimensional object and does not
have to conform to the basic NumPy data types.

In [121]:
dt = np.dtype([('Name', 'S10'), ('Age', 'i4'), ('Height', 'f'), ('Children/Pets', 'i4', 2)])

In [122]:
dt

dtype([('Name', 'S10'), ('Age', '<i4'), ('Height', '<f4'), ('Children/Pets', '<i4', (2,))])

In [123]:
#An alternative syntax for a similar structure; note that the resulting field dtypes differ from those above.
dt = np.dtype({'names': ['Name', 'Age', 'Height', 'Children/Pets'], 'formats':'O int float int,int'.split()})

In [124]:
dt

dtype([('Name', 'O'), ('Age', '<i8'), ('Height', '<f8'), ('Children/Pets', [('f0', '<i8'), ('f1', '<i8')])])

In [125]:
s = np.array([('Smith', 45, 1.83, (0, 1)), ('Jones', 53, 1.72, (2, 2))], dtype=dt)

In [128]:
s

array([('Smith', 45, 1.83, (0, 1)), ('Jones', 53, 1.72, (2, 2))],
      dtype=[('Name', 'O'), ('Age', '<i8'), ('Height', '<f8'), ('Children/Pets', [('f0', '<i8'), ('f1', '<i8')])])

In [129]:
#The object type is still ndarray.
type(s)

numpy.ndarray

In a sense, this construction comes quite close to the operation for
initializing tables in a SQL database: one has column names and column
data types, with maybe some additional information (e.g., maximum
number of characters per str object).

The single columns can now be easily
accessed by their names and the rows by their index values:

In [130]:
s["Name"]

array(['Smith', 'Jones'], dtype=object)

In [131]:
s["Height"].mean()

1.775

In [132]:
s[0]

('Smith', 45, 1.83, (0, 1))

In [133]:
s[1]["Age"]

53
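
Column access combines naturally with Boolean selection. The following sketch uses the first dtype definition from above (with the fixed-width 'S10' field and the sub-array field) to show how this might look:

```python
import numpy as np

dt = np.dtype([("Name", "S10"), ("Age", "i4"),
               ("Height", "f"), ("Children/Pets", "i4", 2)])
s = np.array([("Smith", 45, 1.83, (0, 1)),
              ("Jones", 53, 1.72, (2, 2))], dtype=dt)

# the sub-array field yields one entry of shape (2,) per record
print(s["Children/Pets"].shape)     # (2, 2)
# a Boolean mask on one column selects whole records
print(s[s["Age"] > 50]["Name"])     # [b'Jones']
# the 'S10' field stores bytes with a fixed width of 10
print(s["Name"].dtype.itemsize)     # 10
```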

***Vectorization of Code -***

Vectorization is a strategy to get more compact code that is possibly
executed faster. The fundamental idea is to conduct an operation on or to
apply a function to a complex object “at once” and not by looping over the
single elements of the object. In Python, functional programming tools such
as map() and filter() provide some basic means for vectorization.
However, NumPy has vectorization built in deep down in its core.
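
As a simple illustration of the idea, the same calculation can be written as an explicit loop, with map(), and as a vectorized NumPy expression. The names looped, mapped and vectorized are chosen here for illustration.

```python
import numpy as np

v = [0.5, 0.75, 1.0, 1.5, 2.0]

looped = [2 * x + 3 for x in v]               # explicit loop over the elements
mapped = list(map(lambda x: 2 * x + 3, v))    # functional style
vectorized = 2 * np.array(v) + 3              # one expression on the whole array

print(looped)                 # [4.0, 4.5, 5.0, 6.0, 7.0]
print(vectorized.tolist())    # [4.0, 4.5, 5.0, 6.0, 7.0]
```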

**Broadcasting:** This allows you to
combine objects of different shapes within a single operation.

In [135]:
r = np.arange(12).reshape((4, 3))
r

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])

In [136]:
r + 3

array([[ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [137]:
s = np.arange(0,12,4)
s

array([0, 4, 8])

In [138]:
r + s

array([[ 0,  5, 10],
       [ 3,  8, 13],
       [ 6, 11, 16],
       [ 9, 14, 19]])

These operations work with differently shaped ndarray objects as well, up
to a certain point:

In [141]:
s = np.arange(0,12,3)
s

array([0, 3, 6, 9])

In [142]:
r + s

ValueError: operands could not be broadcast together with shapes (4,3) (4,)

In [143]:
r.T + s

array([[ 0,  6, 12, 18],
       [ 1,  7, 13, 19],
       [ 2,  8, 14, 20]])

In [145]:
sr = s.reshape(-1,1)
sr

array([[0],
       [3],
       [6],
       [9]])

In [146]:
r + sr

array([[ 0,  1,  2],
       [ 6,  7,  8],
       [12, 13, 14],
       [18, 19, 20]])
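
The pattern behind these results is the broadcasting rule: shapes are compared dimension by dimension from the right, and two dimensions are compatible if they are equal or if one of them is 1. The helper can_broadcast() below is a hypothetical function written for this sketch, not part of NumPy.

```python
import numpy as np

def can_broadcast(shape_a, shape_b):
    """Hypothetical helper: check the broadcasting rule by hand."""
    # compare dimensions from the right; missing leading dimensions are compatible
    for m, n in zip(reversed(shape_a), reversed(shape_b)):
        if m != n and m != 1 and n != 1:
            return False
    return True

r = np.arange(12).reshape((4, 3))
s = np.arange(0, 12, 3)

print(can_broadcast(r.shape, s.shape))                   # False: 3 vs 4
print(can_broadcast(r.T.shape, s.shape))                 # True:  4 vs 4
print(can_broadcast(r.shape, s.reshape(-1, 1).shape))    # True:  (4, 3) vs (4, 1)
print(np.broadcast(r, s.reshape(-1, 1)).shape)           # (4, 3)
```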

On the NumPy level, looping over the ndarray object is taken care of by optimized code, most of it written in C and therefore generally faster than pure Python. This explains the "secret" behind the performance benefits of using NumPy for array-based use cases.

***Memory Layout -***

When ndarray objects are initialized by using np.zeros(), as in the
initialization examples above, an optional argument (order) for the memory layout is provided. This
argument specifies, roughly speaking, which elements of an array get stored
in memory next to each other (contiguously). When working with small
arrays, this has hardly any measurable impact on the performance of array
operations. However, when arrays get large, and depending on the
(financial) algorithm to be implemented on them, the story might be
different. This is when memory layout comes into play.

Consider the following construction of
multidimensional ndarray objects:

In [157]:
x = np.random.standard_normal((1000000, 5))
x

array([[ 0.88856452,  0.51274549,  0.92123913,  0.79005258, -1.53495154],
       [ 0.84872973,  0.69921004, -0.53638366,  0.05992931, -0.87694289],
       [ 1.22574041, -0.00375774,  0.5185445 ,  0.27948334,  0.19083482],
       ...,
       [-0.31248747, -0.89688893, -0.7838734 ,  1.30445266, -0.36016444],
       [ 0.14502176, -0.69081802, -1.72543806,  1.46816233,  1.39259037],
       [ 0.86040368,  1.12537485, -0.90881135, -1.60113848, -0.61654879]])

In [148]:
y = 2 * x + 3

In [149]:
C = np.array((x,y), order = "C")

In [150]:
F = np.array((x,y), order = "F")

In [152]:
#Memory is freed up (contingent on garbage collection).
x = 0.0; y = 0.0

In [156]:
C[:2].round(2)

array([[[-1.75,  0.34,  1.15, -0.25,  0.98],
        [ 0.51,  0.22, -1.07, -0.19,  0.26],
        [-0.46,  0.44, -0.58,  0.82,  0.67],
        ...,
        [-0.05,  0.14,  0.17,  0.33,  1.39],
        [ 1.02,  0.3 , -1.23, -0.68, -0.87],
        [ 0.83, -0.73,  1.03,  0.34, -0.46]],

       [[-0.5 ,  3.69,  5.31,  2.5 ,  4.96],
        [ 4.03,  3.44,  0.86,  2.62,  3.51],
        [ 2.08,  3.87,  1.83,  4.63,  4.35],
        ...,
        [ 2.9 ,  3.28,  3.33,  3.67,  5.78],
        [ 5.04,  3.6 ,  0.54,  1.65,  1.26],
        [ 4.67,  1.54,  5.06,  3.69,  2.07]]])
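
The two layouts hold the same values and differ only in how these are arranged in memory. The flags and strides attributes make this visible. The sketch below uses a small (4, 5) array as a stand-in for the large one above so that the numbers stay readable.

```python
import numpy as np

x = np.random.standard_normal((4, 5))    # small stand-in for the large array
y = 2 * x + 3

C = np.array((x, y), order="C")          # row-major (C-like) layout
F = np.array((x, y), order="F")          # column-major (Fortran-like) layout

print(C.shape, F.shape)                                    # (2, 4, 5) (2, 4, 5)
print(C.flags["C_CONTIGUOUS"], F.flags["F_CONTIGUOUS"])    # True True
# strides: number of bytes to step to the next element along each axis
print(C.strides)                         # (160, 40, 8)
print(F.strides)                         # (8, 16, 64)
print(np.array_equal(C, F))              # True: same values, different layout
```

With the C layout the last axis is stored contiguously, with the F layout the first one. Which layout is faster for a given operation depends on the axis along which the data is traversed.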