<h3 style="text-align:center;color:cadetblue;">NumPy</h3>

NumPy (**Num**erical **Py**thon) is an open source Python library that’s widely used in science and engineering. The NumPy library contains multidimensional array data structures, such as the homogeneous, N-dimensional `ndarray`, and a large library of functions that operate efficiently on these data structures.

Python lists are excellent, general-purpose containers. They can be “heterogeneous”, meaning that they can contain elements of a variety of types, and they are quite fast when used to perform individual operations on a handful of elements.

Depending on the characteristics of the data and the types of operations that need to be performed, other containers may be more appropriate; by exploiting these characteristics, we can improve speed, reduce memory consumption, and offer a high-level syntax for performing a variety of common processing tasks. NumPy shines when there are large quantities of “homogeneous” (same-type) data to be processed on the CPU.
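To illustrate the "homogeneous data" point, the sketch below computes the same squares two ways: with a Python list comprehension and with one vectorized NumPy expression. The array size and the explicit `int64` dtype are arbitrary choices for the example. Actual speed-ups depend on the machine and the operation, so no timings are claimed here.

```python
import numpy as np

n = 1_000  # arbitrary example size

# Pure-Python approach: a loop producing a list of Python int objects
squares_list = [i * i for i in range(n)]

# NumPy approach: one homogeneous int64 buffer, squared in a single vectorized call
arr = np.arange(n, dtype=np.int64)
squares_arr = arr * arr

print(squares_arr[:5])  # [ 0  1  4  9 16]
# The array stores exactly 8 bytes per element, with no per-element object overhead
print(arr.nbytes)       # 8000
```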

**What is an _array_?**

In computer programming, an array is a structure for storing and retrieving data. We often talk about an array as if it were a grid in space, with each cell storing one element of the data. For instance, if each element of the data were a number, we might visualize a “one-dimensional” array like a list:

<table>
<tr>
<td>1</td>
<td>5</td>
<td>2</td>
<td>0</td>
</tr>
</table>

A two-dimensional array would be like a table:
<table>
<tr>
<td>1</td><td>5</td><td>2</td><td>0</td>
</tr>
<tr>
<td>8</td><td>3</td><td>6</td><td>1</td>
</tr>
<tr>
<td>1</td><td>7</td><td>2</td><td>9</td>
</tr>
</table>

A three-dimensional array would be like a set of tables, perhaps stacked as though they were printed on separate pages. In NumPy, this idea is generalized to an arbitrary number of dimensions, and so the fundamental array class is called `ndarray`: it represents an “N-dimensional array”.

<img src="images/3d_array.png" style="width:500px;object-fit:cover;" />

Most NumPy arrays have some restrictions. For instance:

- All elements of the array must be of the same type of data.
- Once created, the total size of the array can’t change.
- The shape must be “rectangular”, not “jagged”; e.g., each row of a two-dimensional array must have the same number of columns.

When these conditions are met, NumPy exploits these characteristics to make the array faster, more memory efficient, and more convenient to use than less restrictive data structures.
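Each of the three restrictions can be observed directly. This is a small sketch with arbitrary values. The jagged-array case raises `ValueError` in recent NumPy releases (1.24 and later), which matches the error shown further down. Older releases behaved differently.

```python
import numpy as np

# 1. Same type: mixed ints and floats are upcast to one common dtype
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# 2. Fixed size: np.append does not grow `base` in place; it returns a new, larger array
base = np.array([1, 2, 3])
grown = np.append(base, 4)
print(base.size, grown.size)  # 3 4

# 3. Rectangular: every row must have the same length
try:
    np.array([[1, 2, 3], [4, 5]])
except ValueError as exc:
    print("ValueError:", exc)
```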

**Basic array operations**

```bash
pip install numpy
```

In [2]:
import numpy as np

In [3]:
a = np.array([1, 2, 3])
a

array([1, 2, 3])

In [4]:
type(a)

numpy.ndarray

In [94]:
a1d = np.array([1, 2, 3, 4, 5])
a1d

array([1, 2, 3, 4, 5])

In [97]:
a1d.shape

(5,)

In [95]:
a2d = np.array([
    [1, 2, 3],
    [4, 5, 6]
])
a2d

array([[1, 2, 3],
       [4, 5, 6]])

In [96]:
a2d.shape

(2, 3)

In [7]:
type(a2d)

numpy.ndarray

In [48]:
a3d = np.array([
    [
        [1, 2], [3, 4]
    ],
    [
        [5, 6], [7, 8]
    ]
])

a3d

array([[[1, 2],
        [3, 4]],

       [[5, 6],
        [7, 8]]])

In [11]:
l3 = [
    [1, 2, 3],
    [4, 5, 6, 7, 8]
]
np.array(l3)

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.

In [14]:
a1d

array([1, 2, 3, 4, 5])

In [13]:
a1d.dtype

dtype('int64')

In [21]:
a = np.array([1, 2, 3, 4, 5, 256], dtype=np.uint64) # unsigned integer
a

array([  1,   2,   3,   4,   5, 256], dtype=uint64)

In [None]:
# np.int8 -> (-128, 127)
# np.uint8 -> (0, 255)
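The ranges in the comments above can be queried with `np.iinfo` rather than memorized. The sketch also shows a consequence of fixed-width types: array arithmetic that leaves the range wraps around silently. The value 250 is an arbitrary example.

```python
import numpy as np

# np.iinfo reports the representable range of an integer dtype
for dt in (np.int8, np.uint8):
    info = np.iinfo(dt)
    print(dt.__name__, info.min, info.max)
# int8 -128 127
# uint8 0 255

# uint8 arithmetic wraps modulo 2**8: 250 + 10 = 260 -> 4
small = np.array([250], dtype=np.uint8)
print(small + 10)  # [4]
```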

In [29]:
np.arange(10, 15, 0.2)

array([10. , 10.2, 10.4, 10.6, 10.8, 11. , 11.2, 11.4, 11.6, 11.8, 12. ,
       12.2, 12.4, 12.6, 12.8, 13. , 13.2, 13.4, 13.6, 13.8, 14. , 14.2,
       14.4, 14.6, 14.8])

In [30]:
range(10, 15, 0.2)

TypeError: 'float' object cannot be interpreted as an integer

In [35]:
np.arange(10, 15, dtype=float)

array([10., 11., 12., 13., 14.])

In [41]:
np.linspace(0, 10, 11)

array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.])
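`np.arange` takes a step and excludes `stop`. `np.linspace` takes a number of samples and includes `stop` by default (`endpoint=True`). With a non-integer step, floating-point rounding can affect how many values `arange` produces, which is why the NumPy documentation suggests `linspace` for that case. The values below are arbitrary.

```python
import numpy as np

# arange: start, stop (excluded), step -> 0, 0.25, 0.5, 0.75
by_step = np.arange(0, 1, 0.25)

# linspace: start, stop (included by default), number of samples -> 0, 0.25, 0.5, 0.75, 1
by_count = np.linspace(0, 1, 5)

# endpoint=False drops the stop value, which matches arange here
no_end = np.linspace(0, 1, 4, endpoint=False)

print(len(by_step), len(by_count), len(no_end))  # 4 5 4
```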

In [52]:
b = np.zeros((2, 3))
b

array([[0., 0., 0.],
       [0., 0., 0.]])

In [55]:
ones = np.ones((2, 3), dtype=int)
ones

array([[1, 1, 1],
       [1, 1, 1]])

In [57]:
ones.astype(float)

array([[1., 1., 1.],
       [1., 1., 1.]])

In [56]:
ones * 7 # element-wise multiplication

array([[7, 7, 7],
       [7, 7, 7]])

In [72]:
ns = np.full((2, 7), fill_value=10)
ns

array([[10, 10, 10, 10, 10, 10, 10],
       [10, 10, 10, 10, 10, 10, 10]])

In [76]:
empty = np.empty((4, 2))  # allocates without initializing: contents are whatever was already in memory
empty

array([[0.00000000e+000, 0.00000000e+000],
       [0.00000000e+000, 0.00000000e+000],
       [0.00000000e+000, 5.59282311e-321],
       [7.56592337e-307, 9.34588060e-307]])

In [47]:
b.ndim

2

In [49]:
a3d.ndim

3

In [80]:
a = np.array([1, 2, 3], dtype=np.int64)
a.itemsize

8

In [79]:
a.strides

(8,)
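In a contiguous one-dimensional array, moving to the next element means skipping exactly `itemsize` bytes, so the single stride equals the item size. The sketch checks this for two dtypes, picked arbitrarily.

```python
import numpy as np

for dt in (np.int8, np.int64):
    arr = np.array([1, 2, 3], dtype=dt)
    # One step along the only axis skips one element's worth of bytes
    print(arr.dtype, arr.itemsize, arr.strides)
# int8 1 (1,)
# int64 8 (8,)
```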

In [67]:
c = np.array([1, '1', True])
c.dtype

dtype('<U21')

In [68]:
c

array(['1', '1', 'True'], dtype='<U21')

In [84]:
np.eye(5, 7)

array([[1., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0., 0.]])

<img src="images/np_array_dataones.png" style="width:800px;object-fit:cover;" />

<img src="images/np_data_plus_ones.png" style="width:800px;object-fit:cover;" />

<img src="images/np_sub_mult_divide.png" style="width:1000px;object-fit:cover;" />

In [86]:
data = np.array([1, 2], dtype=int)
ones = np.ones(2, dtype=int)
data + ones

array([2, 3])

In [87]:
data - ones

array([0, 1])

In [88]:
data - np.array([1, 2, 3])

ValueError: operands could not be broadcast together with shapes (2,) (3,) 

In [91]:
data.shape

(2,)

In [93]:
a3d.shape

(2, 2, 2)

In [89]:
data * ones

array([1, 2])

In [90]:
data / ones

array([1., 2.])

In [101]:
np.array([1, 2]).shape

(2,)

In [103]:
np.array(1.6).shape

()

In [None]:
# (2,)   -> 1-D: two elements along a single axis
# (1, 4) -> 2-D: one row of four columns

**Broadcasting**

<img src="images/np_multiply_broadcasting.png" style="width:800px;object-fit:cover;" />

<img src="images/array_broadcasting.png" style="width:800px;object-fit:cover;" />

In [104]:
a1 = np.array([1, 2, 3], dtype=int)
a2 = np.array([[1], [2], [3]], dtype=int)
print(a1.shape)
print(a2.shape)

(3,)
(3, 1)


In [105]:
a1 * a2

array([[1, 2, 3],
       [2, 4, 6],
       [3, 6, 9]])
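The rule behind these results: NumPy compares shapes starting from the trailing dimension. Two dimensions are compatible when they are equal or when one of them is 1. A missing leading dimension counts as 1. `np.broadcast_shapes` (available since NumPy 1.20) applies this rule without allocating any arrays. The example shapes come from the cells in this section.

```python
import numpy as np

# (3,) is treated as (1, 3); paired with (3, 1), both size-1 axes stretch -> (3, 3)
print(np.broadcast_shapes((3,), (3, 1)))  # (3, 3)

# A scalar has shape () and broadcasts against anything
print(np.broadcast_shapes((2, 3), ()))    # (2, 3)

# Trailing sizes 2 and 3 differ and neither is 1 -> incompatible, as in `data - np.array([1, 2, 3])`
try:
    np.broadcast_shapes((2,), (3,))
except ValueError as exc:
    print("ValueError:", exc)
```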

**More useful array operations**

<img src="images/np_aggregation.png" style="width:900px;object-fit:cover;" />

In [107]:
a1

array([1, 2, 3])

In [106]:
a1.max()

np.int64(3)

In [110]:
np.max(a1)

np.int64(3)

In [108]:
a1.min()

np.int64(1)

In [111]:
np.min(a1)

np.int64(1)

In [109]:
a1.sum()

np.int64(6)

In [112]:
np.sum(a1)

np.int64(6)
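`max`, `min`, and `sum` are only a few of the available reductions. `mean`, `prod`, and `std` follow the same pattern and are available both as a method and as an `np.` function. Note that `np.std` computes the population standard deviation by default (`ddof=0`).

```python
import numpy as np

a1 = np.array([1, 2, 3])

print(a1.mean())   # 2.0
print(a1.prod())   # 6
print(np.std(a1))  # sqrt(2/3), about 0.816 (population std, ddof=0)
```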

**Matrices**

<img src="images/np_create_matrix.png" style="width:800px;object-fit:cover;" />

Indexing and slicing operations are useful when you’re manipulating matrices:

<img src="images/np_matrix_indexing.png" style="width:1000px;object-fit:cover;" />

You can aggregate matrices the same way you aggregated vectors:
<img src="images/np_matrix_aggregation.png" style="width:1000px;object-fit:cover;" />

You can aggregate all the values in a matrix and you can aggregate them across columns or rows using the axis parameter. To illustrate this point, let’s look at a slightly modified dataset:

<img src="images/np_matrix_aggregation_row.png" style="width:1000px;object-fit:cover;" />

In [113]:
data = np.array([
    [1, 2],
    [3, 4],
    [5, 6]
])
data.max(axis=0)

array([5, 6])

In [114]:
data.max(axis=1)

array([2, 4, 6])
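One way to read `axis`: the reduction collapses that dimension, so it disappears from the result's shape. For `data` above, shape `(3, 2)`, `max(axis=0)` gives shape `(2,)` and `max(axis=1)` gives shape `(3,)`. The 3-D shape `(2, 3, 4)` below is an arbitrary example.

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)

print(x.sum(axis=0).shape)  # (3, 4): axis 0 collapsed
print(x.sum(axis=1).shape)  # (2, 4): axis 1 collapsed
print(x.sum(axis=2).shape)  # (2, 3): axis 2 collapsed
print(x.sum().shape)        # (): no axis given, so every element is summed into a scalar
```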

In [115]:
a3d

array([[[1, 2],
        [3, 4]],

       [[5, 6],
        [7, 8]]])

In [118]:
a3d.max(axis=0)

array([[5, 6],
       [7, 8]])

In [119]:
# the two 2x2 blocks stacked along axis 0; a3d.max(axis=0) compares them element by element
[[1, 2],[3, 4]], [[5, 6],[7, 8]]

([[1, 2], [3, 4]], [[5, 6], [7, 8]])

In [120]:
data = [
    [1, 2],
    [3, 4],
    [5, 6]
]
arr = np.array(data)

In [122]:
data[0][1]

2

In [125]:
arr[0][1]

np.int64(2)

In [128]:
data[(0, 1)]

TypeError: list indices must be integers or slices, not tuple

In [129]:
arr[0, 1]

np.int64(2)
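The comma inside `arr[0, 1]` builds a tuple, so passing a tuple object explicitly gives the same element. A plain list has no meaning for a tuple index, which explains the `TypeError` from `data[(0, 1)]`.

```python
import numpy as np

arr = np.array([[1, 2], [3, 4], [5, 6]])

idx = (0, 1)                # exactly what Python passes for arr[0, 1]
print(arr[idx])             # 2
print(arr[idx] == arr[0, 1])  # True
```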

In [147]:
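class List(list):
    """A list that also accepts numeric strings as indices, e.g. lst['2']."""
    def __getitem__(self, index):
        # Convert only string indices; ints and slices (e.g. lst[1:3]) pass through unchanged
        if isinstance(index, str):
            index = int(index)  # raises ValueError for non-numeric strings such as 'a'
        return super().__getitem__(index)

a1 = List([1, 2, 3, 4])

a1.append(3)
print(a1)
a1['2']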

[1, 2, 3, 4, 3]


3

In [136]:
l1 = [1, 2, 3, 4]
l1['1']

TypeError: list indices must be integers or slices, not str

In [149]:
arr[0:2, 0]

array([1, 3])

In [152]:
data[0:2][0]

[1, 2]

In [153]:
data

[[1, 2], [3, 4], [5, 6]]

In [154]:
arr

array([[1, 2],
       [3, 4],
       [5, 6]])

In [156]:
a2 = np.array([
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12]
])

a2

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [157]:
a2[1:, 2:]

array([[ 7,  8],
       [11, 12]])

In [158]:
a2[::2, ::3]

array([[ 1,  4],
       [ 9, 12]])

In [170]:
a2[[0, 0, 2, 2], [0, 2, 0, 2]]

array([ 1,  3,  9, 11])
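Two index lists are paired element by element, so `a2[[0, 0, 2, 2], [0, 2, 0, 2]]` returns four scalars. To select every combination of rows `[0, 2]` and columns `[0, 2]` as a 2x2 block, `np.ix_` builds index arrays that broadcast against each other. This is offered as an alternative technique; the matrix is the original `a2`.

```python
import numpy as np

a2 = np.array([
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
])

# Paired indexing: picks (0,0), (0,2), (2,0), (2,2) -> a flat result
print(a2[[0, 0, 2, 2], [0, 2, 0, 2]])  # [ 1  3  9 11]

# Cross-product indexing: rows {0, 2} x columns {0, 2} -> a 2x2 block
print(a2[np.ix_([0, 2], [0, 2])])
# [[ 1  3]
#  [ 9 11]]
```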

In [171]:
a2[[0, 0, 0, 0, 0]]

array([[1, 2, 3, 4],
       [1, 2, 3, 4],
       [1, 2, 3, 4],
       [1, 2, 3, 4],
       [1, 2, 3, 4]])

In [172]:
a2

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [174]:
a2[0, 0] = 10

In [175]:
a2

array([[10,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [176]:
data = [[10,  2,  3,  4],
        [ 5,  6,  7,  8],
        [ 9, 10, 11, 12]]
sub_data = data[:2]
sub_data

[[10, 2, 3, 4], [5, 6, 7, 8]]

In [178]:
sub_data[0][0] = -1

In [179]:
sub_data

[[-1, 2, 3, 4], [5, 6, 7, 8]]

In [180]:
data

[[-1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
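`data[:2]` makes a *shallow* copy: it creates a new outer list, but the items are the same inner list objects. That is why assigning through `sub_data[0][0]` also changed `data`. `copy.deepcopy` from the standard library copies the inner lists too.

```python
import copy

data = [[10, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]

shallow = data[:2]
print(shallow[0] is data[0])  # True: same inner list object

deep = copy.deepcopy(data[:2])
print(deep[0] is data[0])     # False: independent inner lists
deep[0][0] = -1
print(data[0][0])             # 10: original unchanged
```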

In [182]:
l1 = [1, 2, 3, 4, 5, 6, 7]
l2 = l1[:5]

In [184]:
l2[0] = 100
l2

[100, 2, 3, 4, 5]

In [185]:
l1

[1, 2, 3, 4, 5, 6, 7]

In [186]:
a1 = np.array(l1)
a2 = a1[:5]
a2

array([1, 2, 3, 4, 5])

In [187]:
a2[0] = 10
a2

array([10,  2,  3,  4,  5])

In [188]:
a1

array([10,  2,  3,  4,  5,  6,  7])
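Unlike list slicing, basic slicing of an `ndarray` returns a *view* that shares memory with the original. `np.shares_memory` confirms this, and `.copy()` produces an independent array when that is what you want. The values are the same as in the cells above.

```python
import numpy as np

a1 = np.array([1, 2, 3, 4, 5, 6, 7])

view = a1[:5]                # basic slice -> view on a1's buffer
independent = a1[:5].copy()  # explicit copy -> new buffer

print(np.shares_memory(a1, view))         # True
print(np.shares_memory(a1, independent))  # False

independent[0] = 100  # does not touch a1
view[0] = 10          # writes through to a1
print(a1)             # [10  2  3  4  5  6  7]
```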

In [191]:
a2d

array([[1, 2, 3],
       [4, 5, 6]])

In [195]:
a2d.shape

(2, 3)

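In memory, the six elements of `a2d` sit in one contiguous block in row-major (C) order, each `int64` taking 8 bytes: `|1|2|3|4|5|6|`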

In [192]:
a2d.strides

(24, 8)

In [193]:
a2d[1, 1]

np.int64(5)
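The strides explain how `a2d[1, 1]` is located. The byte offset of element `(i, j)` is `i * strides[0] + j * strides[1]`, here `1 * 24 + 1 * 8 = 32` bytes. That is the element at index 4 in the flat buffer. The sketch recomputes this by hand. It assumes a C-contiguous array, so that `ravel()` returns the elements in memory order. The `int64` dtype is made explicit.

```python
import numpy as np

a2d = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)

i, j = 1, 1
offset = i * a2d.strides[0] + j * a2d.strides[1]  # bytes from the start of the buffer
flat_index = offset // a2d.itemsize               # convert bytes to an element count

print(a2d.strides, offset, flat_index)  # (24, 8) 32 4
print(a2d.ravel()[flat_index])          # 5
```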

In [196]:
b2d = a2d.transpose()
b2d

array([[1, 4],
       [2, 5],
       [3, 6]])

In [197]:
b2d.shape

(3, 2)

In [198]:
b2d.strides

(8, 24)

In [199]:
b2d[1, 0]

np.int64(2)

In [200]:
a2d[0, 1]

np.int64(2)

In [201]:
a2d.shape

(2, 3)

In [202]:
b2d.shape

(3, 2)

In [203]:
a2d.T

array([[1, 4],
       [2, 5],
       [3, 6]])
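`transpose()` and `.T` move no data. They return a view on the same buffer with the strides swapped, which is why `b2d.strides` is `(8, 24)`. One consequence is that the transposed view is no longer C-contiguous; it is Fortran-contiguous instead. The sketch checks both points and shows that writing through the view changes the original.

```python
import numpy as np

a2d = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)
b2d = a2d.T

print(np.shares_memory(a2d, b2d))                            # True: same buffer
print(a2d.strides, b2d.strides)                              # (24, 8) (8, 24)
print(a2d.flags['C_CONTIGUOUS'], b2d.flags['C_CONTIGUOUS'])  # True False
print(b2d.flags['F_CONTIGUOUS'])                             # True

b2d[0, 1] = 40    # writing through the view...
print(a2d[1, 0])  # ...changes the original: 40
```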

In [210]:
a2d.reshape((1, 2, 1, 3, 1))

array([[[[[1],
          [2],
          [3]]],


        [[[4],
          [5],
          [6]]]]])
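`reshape` only reinterprets the same elements, so the new shape must hold exactly as many elements as the old one (6 here). One dimension may be given as `-1`, and NumPy infers its size.

```python
import numpy as np

a2d = np.array([[1, 2, 3], [4, 5, 6]])

print(a2d.reshape(3, -1).shape)  # (3, 2): -1 inferred as 6 // 3
print(a2d.reshape(-1).shape)     # (6,): flattened

try:
    a2d.reshape(4, 2)  # 8 != 6 elements
except ValueError as exc:
    print("ValueError:", exc)
```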

In [214]:
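import csv

with open('simple.csv', newline='') as f:  # newline='' is what the csv module docs recommend
    reader = csv.reader(f)
    next(reader)  # skip the header row
    data = [[int(i) for i in row] for row in reader]  # convert every field to int
    arr1 = np.array(data)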
arr1

array([[0, 0],
       [1, 1],
       [2, 4],
       [3, 9]])

In [218]:
arr2 = np.loadtxt('simple.csv', delimiter=',', skiprows=1, dtype=int)
arr2

array([[0, 0],
       [1, 1],
       [2, 4],
       [3, 9]])
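The contents of `simple.csv` are not shown. From the outputs, it appears to be a header row followed by four rows of two integers. The sketch below substitutes an in-memory `io.StringIO` with hypothetical column names `x,y` (an assumption) so it runs without the file. `np.loadtxt` accepts a file-like object as well as a path. `np.savetxt` writes an array back out in the same format.

```python
import io
import numpy as np

# Hypothetical stand-in for simple.csv: a header line, then integer rows
csv_text = "x,y\n0,0\n1,1\n2,4\n3,9\n"

arr = np.loadtxt(io.StringIO(csv_text), delimiter=',', skiprows=1, dtype=int)
print(arr.shape)  # (4, 2)

# Round-trip: write the array back out as CSV text with the same header line
out = io.StringIO()
np.savetxt(out, arr, delimiter=',', fmt='%d', header='x,y', comments='')
print(out.getvalue())
```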