## Numpy: Numeric computing library

NumPy (Numerical Python) is one of the core packages for `numerical computing` in Python. `Pandas`, `Matplotlib`, `Statsmodels` and many other scientific libraries rely on NumPy.

NumPy's major contributions are:

* Efficient numeric computation with C primitives
* Efficient collections with vectorized operations
* An integrated and natural Linear Algebra API
* A C API for connecting NumPy with libraries written in C, C++, or FORTRAN.

Let's expand on efficiency. In Python, everything is an object, so even a simple int carries all the machinery an object needs (a type pointer, a reference count, and so on). Such values are often called "boxed ints". In contrast, NumPy stores primitive numeric types (floats, ints) directly, which makes both storage and computation efficient.

### Binary representation of numbers 

![image.png](attachment:image.png)

The `binary representation` of a number is the way computers store it. In binary, each digit (bit) is the coefficient of a power of 2. For example, the number 13 in binary is represented as 1101, which is equal to `1*2^3 + 1*2^2 + 0*2^1 + 1*2^0 = 8 + 4 + 0 + 1 = 13`.
The number of bits used to represent a number determines the range of values that can be represented. For example, a signed `8-bit` integer can represent values from `-128 to 127`, while a signed `32-bit` integer can represent values from `-2,147,483,648 to 2,147,483,647`.

For floating-point numbers, the number of bits also determines the precision. For example, a `32-bit` floating-point number can represent values with a precision of about 7 decimal digits, while a `64-bit` floating-point number can represent values with a precision of about 15 decimal digits.
It determines the range as well: a `32-bit` floating-point number can represent values from approximately `-3.4 x 10^38 to 3.4 x 10^38`, while a `64-bit` floating-point number can represent values from approximately `-1.8 x 10^308 to 1.8 x 10^308`.
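
The ranges quoted above can be checked with `np.iinfo` and `np.finfo`. This is a small sketch rather than part of the original notebook. Note that `finfo(...).precision` is a conservative estimate, so it reports 6 digits for `float32`, slightly below the "about 7" rule of thumb.

```python
import numpy as np

# Integer types: exact minimum and maximum for a given bit width.
for dtype in (np.int8, np.int32):
    info = np.iinfo(dtype)
    print(f"{dtype.__name__}: {info.min} to {info.max}")

# Floating-point types: largest finite value and approximate decimal precision.
for dtype in (np.float32, np.float64):
    info = np.finfo(dtype)
    print(f"{dtype.__name__}: max={info.max}, decimal digits={info.precision}")
```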

Let's see how the binary representation of numbers works with some practical examples:

In [73]:
import numpy as np
import sys

With the `bin()` function, we can convert a number to its binary representation. The `bin()` function returns a string that starts with the prefix `0b`, which indicates that the number is in binary format. For example, the binary representation of the decimal number 29 is `0b11101`. Slicing the string with `[2:]` drops the prefix:


In [74]:
n = 29

print(bin(n)[2:])

11101


Let's break down the binary representation of the number 29 step by step:

![image.png](attachment:image.png)

1. Start with the decimal number 29.
2. Divide the number by 2 and keep track of the quotient and the remainder. If the number is even, the remainder will be 0; if it is odd, the remainder will be 1.
3. Continue dividing the quotient by 2 until the quotient is 0.
4. Write down the remainders in reverse order to get the binary representation.
5. The binary representation of 29 is `11101`, which can be written as `0b11101` in Python.
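
The steps above can be sketched in a few lines of Python. The helper name `to_binary` is hypothetical (it is not part of Python or NumPy), and the sketch assumes a non-negative integer input.

```python
def to_binary(n):
    """Convert a non-negative integer to a binary string by repeated division by 2."""
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, remainder = divmod(n, 2)  # quotient and remainder in one step
        remainders.append(str(remainder))
    # The remainders come out least-significant bit first, so reverse them.
    return "".join(reversed(remainders))


print(to_binary(29))  # 11101
```

The result matches `bin(29)[2:]`.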

Now let's see how to convert an integer to binary without the `bin()` function, using `binary_repr()` from the `numpy` library. The `binary_repr()` function takes an integer as input and returns its binary representation as a string. The function also has an optional parameter `width`, which specifies the number of bits to use in the binary representation. If the width is not specified, the function will use the minimum number of bits required to represent the number.

In the example below, we convert the decimal number 29 to its binary representation using the `binary_repr()` function. We also specify a width of 20 bits, which means that the binary representation will be padded with leading zeros to make it 20 bits long. The output is `00000000000000011101`, which is the binary representation of 29 with a width of 20 bits.

In [75]:
n = 29

def binary_rep(n):
    # Pad the binary representation with leading zeros to 20 bits
    return np.binary_repr(n, width=20)

print(binary_rep(n))

00000000000000011101


---

### Useful Numpy functions

#### `numpy.random`

Generates random numbers. The `numpy.random` module provides functions to generate random numbers from various probability distributions, including uniform, normal, and binomial distributions. For example, `numpy.random.rand(3)` generates an array of 3 random numbers uniformly distributed in the half-open interval [0, 1).

In [114]:
np.random.random(size=2)

array([0.62804386, 0.35046026])

In [115]:
np.random.normal(size=2)

array([ 0.72078081, -0.58036119])

In [116]:
np.random.rand(2, 4)

array([[0.52208979, 0.0127048 , 0.14031548, 0.46310496],
       [0.15372924, 0.80092038, 0.94201718, 0.78026309]])
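
The calls above use NumPy's global random state, so their outputs change on every run. As a hedged alternative, the sketch below uses `np.random.default_rng` with a seed, which makes the results reproducible. The seed value 42 and the variable names are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # seed chosen arbitrarily for this sketch

uniform = rng.random(size=2)                   # uniform in [0, 1)
normal = rng.normal(size=2)                    # standard normal distribution
integers = rng.integers(0, 100, size=(3, 3))   # integers in [0, 100), upper bound excluded

print(uniform)
print(normal)
print(integers)
```

Re-creating the generator with the same seed reproduces the same sequence of numbers.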

#### `numpy.arange()`

Creates an array of evenly spaced values within a specified range. It is similar to the built-in `range()` function, but it returns a NumPy array instead of a `range` object and also accepts non-integer steps. The `arange()` function takes up to three arguments: `start`, `stop`, and `step`. The `start` argument specifies the starting value of the sequence, the `stop` argument specifies the end value (exclusive), and the `step` argument specifies the spacing between values. If `start` is not specified, it defaults to 0; if `step` is not specified, it defaults to 1.

In [117]:
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [118]:
np.arange(5, 10)

array([5, 6, 7, 8, 9])

In [119]:
np.arange(0, 1, 0.1)

array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])

#### `numpy.reshape()`

Reshapes an array without changing its data. The `reshape()` method takes the new shape of the array, either as a tuple or as separate integers, where each value is the size of the corresponding dimension. The total number of elements must stay the same. For example, `a.reshape((2, 3))` (or equivalently `numpy.reshape(a, (2, 3))`) reshapes a 1D array `a` with 6 elements into a 2D array with 2 rows and 3 columns.

In [120]:
np.arange(10).reshape(2, 5)

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

In [121]:
np.arange(10).reshape(5, 2)

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])
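
Two details of `reshape` are worth a quick sketch: one dimension can be given as `-1` so that NumPy infers it, and a shape whose element count does not match raises a `ValueError`. The array and variable names below are illustrative only.

```python
import numpy as np

a = np.arange(12)

# -1 asks NumPy to infer that dimension from the total number of elements.
b = a.reshape(3, -1)
print(b.shape)  # (3, 4)

# The function form takes the array as its first argument.
c = np.reshape(a, (2, 6))
print(c.shape)  # (2, 6)

# 12 elements cannot fill a 5x3 array, so this raises ValueError.
try:
    a.reshape(5, 3)
except ValueError as err:
    print("reshape failed:", err)
```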

#### `numpy.linspace()`
Creates an array of evenly spaced values over a specified range. It is similar to the `arange()` function, but it allows you to specify the number of values to generate instead of the step size. The `linspace()` function takes three main arguments: `start`, `stop`, and `num`. The `start` argument specifies the starting value of the sequence, the `stop` argument specifies the end value (inclusive by default; pass `endpoint=False` to exclude it), and the `num` argument specifies the number of values to generate. If the `num` argument is not specified, it defaults to 50.

In [122]:
np.linspace(0, 1, 5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])

In [123]:
np.linspace(0, 1, 20)

array([0.        , 0.05263158, 0.10526316, 0.15789474, 0.21052632,
       0.26315789, 0.31578947, 0.36842105, 0.42105263, 0.47368421,
       0.52631579, 0.57894737, 0.63157895, 0.68421053, 0.73684211,
       0.78947368, 0.84210526, 0.89473684, 0.94736842, 1.        ])

In [124]:
np.linspace(0, 1, 20, endpoint=False)  # exclude the stop value

array([0.  , 0.05, 0.1 , 0.15, 0.2 , 0.25, 0.3 , 0.35, 0.4 , 0.45, 0.5 ,
       0.55, 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85, 0.9 , 0.95])
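
To relate `arange()` and `linspace()`, here is a minimal sketch showing that a fixed step and a fixed count can describe the same sequence. The variable names are hypothetical.

```python
import numpy as np

by_step = np.arange(0, 1, 0.1)                     # fixed step, stop excluded
by_count = np.linspace(0, 1, 10, endpoint=False)   # fixed count, stop excluded

print(np.allclose(by_step, by_count))  # True

# With the default endpoint=True, 11 points are needed for a spacing of 0.1.
with_stop = np.linspace(0, 1, 11)
print(with_stop[-1])  # 1.0
```

When the step is not an integer, `linspace()` is often the safer choice, because the number of elements is stated explicitly instead of depending on floating-point rounding.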

#### `numpy.zeros()`
Creates an array filled with zeros. The `zeros()` function takes the shape of the array to create, given as an integer for a 1D array or as a tuple where each element is the size of the corresponding dimension. An optional `dtype` argument sets the element type (the default is `float64`). For example, `numpy.zeros((2, 3))` creates a 2D array with 2 rows and 3 columns, filled with zeros.

In [125]:
np.zeros(5)

array([0., 0., 0., 0., 0.])

In [126]:
np.zeros((3, 3))

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [128]:
np.zeros((3, 3), dtype=np.int8)

array([[0, 0, 0],
       [0, 0, 0],
       [0, 0, 0]], dtype=int8)

#### `numpy.ones()`
Creates an array filled with ones. The `ones()` function works similarly to the `zeros()` function, but it fills the array with ones instead of zeros. The shape of the array can be specified as a tuple, just like in the `zeros()` function.

In [129]:
np.ones(5)

array([1., 1., 1., 1., 1.])

In [130]:
np.ones((3, 3))

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

#### `numpy.empty()`
Creates an array without initializing its values. The shape of the array can be specified as a tuple, just like in the `zeros()` and `ones()` functions. The values in the array are not guaranteed to be zero or any other specific value; they are whatever was already present in the memory allocated for the array, which is why the outputs below look arbitrary.

In [131]:
np.empty(5)

array([1., 1., 1., 1., 1.])

In [132]:
np.empty((2, 2))

array([[0.25, 0.5 ],
       [0.75, 1.  ]])

#### `numpy.identity()`
Creates a 2D identity matrix: a square matrix with ones on the main diagonal and zeros elsewhere. The `identity()` function takes a single integer `n`, which is both the number of rows and the number of columns, plus an optional `dtype`. For example, `numpy.identity(3)` creates a 3x3 identity matrix.

In [133]:
np.identity(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

#### `numpy.eye()`
Creates a 2D array with ones on a diagonal and zeros elsewhere. The `eye()` function is more general than `identity()`: it takes the number of rows `N`, an optional number of columns `M` (which defaults to `N`), and an optional diagonal offset `k` (`0` is the main diagonal, a positive value selects a diagonal above it, and a negative value one below it). For example, `numpy.eye(3)` creates a 3x3 identity matrix, while `numpy.eye(3, 4)` creates a 3x4 matrix.

In [134]:
np.eye(3, 3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [135]:
np.eye(8, 4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [136]:
np.eye(8, 4, k=1)

array([[0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [137]:
np.eye(8, 4, k=-3)

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.],
       [0., 0., 0., 0.]])
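
As a short sketch of how the two functions relate: `identity(n)` and `eye(n)` give the same square matrix, while only `eye()` accepts a column count and a diagonal offset. The variable names are illustrative.

```python
import numpy as np

# identity(n) and eye(n) produce the same square matrix.
print(np.array_equal(np.identity(3), np.eye(3)))  # True

rect = np.eye(2, 3)               # 2 rows, 3 columns
upper = np.eye(3, k=1)            # ones on the first diagonal above the main one
ints = np.identity(2, dtype=int)  # integer entries instead of floats

print(rect)
print(upper)
print(ints)
```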

---

### Basic Numpy Arrays

In [76]:
np.array([1, 2, 3, 4])

array([1, 2, 3, 4])

In [77]:
a = np.array([1, 2, 3, 4])

In [78]:
b = np.array([0, 0.5, 1, 1.5, 2])

In [79]:
a[0], a[1]

(1, 2)

In [80]:
a[0:]

array([1, 2, 3, 4])

In [81]:
a[1:3]

array([2, 3])

In [82]:
a[1:-1]

array([2, 3])

In [83]:
a[::2]  # every second element (step of 2)

array([1, 3])

In [84]:
b

array([0. , 0.5, 1. , 1.5, 2. ])

In [85]:
b[[0, 2, -1]]

array([0., 1., 2.])

---

### Array Types

In the examples below we will see how to create arrays with different types. The `dtype` parameter specifies the data type of the array elements. The available data types include:
* `int`: Integer type
* `float`: Floating-point type
* `bool`: Boolean type
* `str`: String type
* `object`: Object type
* `complex`: Complex number type
* `datetime`: Date and time type
* `timedelta`: Time duration type
* `void`: Void type (used for custom data types)
* `bytes`: Byte type (used for binary data)
* `unicode`: Unicode string type (used for text data)
* `int8`: 8-bit signed integer type
* `int16`: 16-bit signed integer type
* `int32`: 32-bit signed integer type
* `int64`: 64-bit signed integer type
* `uint8`: 8-bit unsigned integer type
* `uint16`: 16-bit unsigned integer type
* `uint32`: 32-bit unsigned integer type
* `uint64`: 64-bit unsigned integer type
* `float16`: 16-bit floating-point type
* `float32`: 32-bit floating-point type
* `float64`: 64-bit floating-point type
* `float128`: 128-bit floating-point type (platform-dependent; not available on all systems)
* `complex64`: 64-bit complex number type (2 x 32-bit floats)
* `complex128`: 128-bit complex number type (2 x 64-bit floats)
* `complex256`: 256-bit complex number type (2 x 128-bit floats; platform-dependent)
* `datetime64`: 64-bit date and time type
* `timedelta64`: 64-bit time duration type

Remember that each data type has a fixed size in bytes, so choosing the smallest type that fits the data can save memory and speed up computations. For example, a large array of non-negative integers that are all less than 256 can be stored as `uint8`, which uses only 1 byte per element instead of the 4 bytes of `int32`. This can save a lot of memory and improve performance when working with large arrays.
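
The saving can be measured with the `nbytes` attribute. The sketch below uses made-up data (one million values between 0 and 255) and also shows the main caveat of small integer types: arithmetic wraps around silently when a result does not fit.

```python
import numpy as np

small_values = np.arange(1_000_000) % 256   # all values lie between 0 and 255

as_int32 = small_values.astype(np.int32)
as_uint8 = small_values.astype(np.uint8)

print(as_int32.nbytes)  # 4000000
print(as_uint8.nbytes)  # 1000000
print(np.array_equal(as_int32, as_uint8))  # True: no information was lost

# Caveat: 200 + 100 = 300 does not fit in 8 bits, so the result wraps to 44.
x = np.array([200], dtype=np.uint8)
y = np.array([100], dtype=np.uint8)
print(x + y)  # [44]
```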



In [86]:
a

array([1, 2, 3, 4])

In [87]:
a.dtype  # int32 means 32-bit integer (the default integer dtype is platform-dependent)

dtype('int32')

In [88]:
b

array([0. , 0.5, 1. , 1.5, 2. ])

In [89]:
b.dtype # float64 means 64 bit float

dtype('float64')

In [90]:
np.array([1, 2, 3, 4], dtype=np.float32)  # convert to float32 32 bit float

array([1., 2., 3., 4.], dtype=float32)

In [91]:
np.array([1, 2, 3, 4], dtype=np.int8)

array([1, 2, 3, 4], dtype=int8)

In [92]:
c = np.array(["a", "b", "c"])

---

### Dimensions and shapes

The `shape` of an array is a tuple that represents the size of each dimension of the array. For example, a 2D array with shape `(3, 4)` has 3 rows and 4 columns. The `ndim` attribute of an array returns the number of dimensions of the array. For example, a 1D array has `ndim` equal to 1, while a 2D array has `ndim` equal to 2. 

The `size` attribute of an array returns the total number of elements in the array. For example, a 2D array with shape `(3, 4)` has `size` equal to 12, since it has 3 rows and 4 columns (3 * 4 = 12). The `itemsize` attribute of an array returns the size of each element in bytes. For example, a `float64` array has `itemsize` equal to 8, since a `float64` takes up 8 bytes of memory.

In [93]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

In [94]:
A.shape

(2, 3)

In [95]:
A.ndim # number of dimensions , in this case 2D 

2

In [96]:
A.size # number of elements in the array

6

In this example, we create a 3D array with shape `(2, 2, 3)`, which means it has 2 blocks ([matrices](https://www.geeksforgeeks.org/matrices/)), each with 2 rows and 3 columns.

In [97]:
B = np.array(
    [
        [
            [12, 11, 10],
            [9, 8, 7],
        ],
        [[6, 5, 4], [3, 2, 1]],
    ]
)

In [98]:
B

array([[[12, 11, 10],
        [ 9,  8,  7]],

       [[ 6,  5,  4],
        [ 3,  2,  1]]])

In [99]:
B.shape # 2x2x3 2 matrices of 2x3 

(2, 2, 3)

In [100]:
B.ndim # 3D array

3

In [101]:
B.size # 12 elements in total

12
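
The `itemsize` attribute mentioned earlier, together with the related `nbytes` attribute, can be inspected in the same way. A minimal sketch with an illustrative `float64` array:

```python
import numpy as np

M = np.zeros((3, 4), dtype=np.float64)

print(M.shape, M.ndim, M.size)  # (3, 4) 2 12
print(M.itemsize)               # 8 bytes per float64 element
print(M.nbytes)                 # 96 = size * itemsize
```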

If the shape isn't consistent, we will get an error. For example, in the nested list below the first block has 2 rows but the second block has only 1, so the data cannot form a rectangular array with a single well-defined shape.

In [102]:
C = np.array(
    [
        [
            [12, 11, 10],
            [9, 8, 7],
        ],
        [[6, 5, 4]],
    ]
)

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.

---

### Indexing and Slicing of Matrices

The `indexing` and `slicing` of matrices in NumPy is similar to the indexing and slicing of lists in Python. We can use the `:` operator to slice an array, and we can use the `[]` operator to index an array.

In [None]:
# Square matrix
A = np.array(
    [
        # 0. 1. 2
        [1, 2, 3],  # 0
        [4, 5, 6],  # 1
        [7, 8, 9],  # 2
    ]
)

In [None]:
A[1] # 1D array of the second row

array([4, 5, 6])

In [None]:
A[1][0]  # element in the second row, first column (same as A[1, 0])

4

In [None]:
A[0:2]

array([[1, 2, 3],
       [4, 5, 6]])

In [None]:
A[:, :2]

array([[1, 2],
       [4, 5],
       [7, 8]])

In [None]:
A[:2, :2]

array([[1, 2],
       [4, 5]])

In [None]:
A[:2, 2:]

array([[3],
       [6]])

In [None]:
A

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [None]:
A[1] = np.array([10, 10, 10])

In [None]:
A

array([[ 1,  2,  3],
       [10, 10, 10],
       [ 7,  8,  9]])

In [None]:
A[2] = 99 # replace the last row with 99

In [None]:
A

array([[ 1,  2,  3],
       [10, 10, 10],
       [99, 99, 99]])
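
Unlike nested lists, NumPy arrays accept one index per dimension inside a single pair of brackets. The sketch below (with illustrative names) shows that form, along with column selection and assignment.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# A[1][0] indexes twice; A[1, 0] reaches the same element in one step.
print(A[1][0] == A[1, 0])  # True

second_col = A[:, 1].copy()  # all rows, second column
print(second_col)            # [2 5 8]

A[:, 0] = 0          # a scalar is assigned to every element of the first column
A[0, :] = [7, 7, 7]  # a sequence of matching length replaces the first row
print(A)
```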

---

### Summary statistics

In [None]:
a = np.array([1, 2, 3, 4])

In [None]:
a.sum()

10

In [None]:
a.mean()

2.5

In [None]:
a.std() # standard deviation

1.118033988749895

In [None]:
a.var() # variance

1.25

In [None]:
A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

In [None]:
A.sum()

45

In [None]:
A.mean()

5.0

In [None]:
A.std()

2.581988897471611

We can sum by columns or by rows. The `axis` parameter specifies the axis that the operation collapses. With `axis=0` the operation runs down the rows and returns one value per column; with `axis=1` it runs across the columns and returns one value per row. The default value of `axis` is `None`, which means that the operation will be performed on the entire array.

In [None]:
A.sum(axis=0)  # one sum per column

array([12, 15, 18])

In [None]:
A.sum(axis=1)  # one sum per row

array([ 6, 15, 24])

In [None]:
A.mean(axis=0) 

array([4., 5., 6.])

In [None]:
A.mean(axis=1)

array([2., 5., 8.])

In [None]:
A.std(axis=0)

array([2.44948974, 2.44948974, 2.44948974])

In [None]:
A.std(axis=1)

array([0.81649658, 0.81649658, 0.81649658])
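
Two optional parameters are useful with these methods. By default `std()` and `var()` compute the population statistic (dividing by `n`), and passing `ddof=1` gives the sample statistic (dividing by `n - 1`). Passing `keepdims=True` keeps the collapsed axis with length 1. A minimal sketch:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(a.std())        # population standard deviation (ddof=0), about 1.118
print(a.std(ddof=1))  # sample standard deviation, about 1.291

row_sums = A.sum(axis=1)                      # shape (3,)
row_sums_2d = A.sum(axis=1, keepdims=True)    # shape (3, 1)
print(row_sums.shape, row_sums_2d.shape)
```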

And many [more](https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.ndarray.html#array-methods)...

---

### Broadcasting and Vectorized operations

`Broadcasting` is a powerful mechanism that allows NumPy to work with arrays of different shapes when performing arithmetic operations. Conceptually, it "stretches" the smaller array to match the shape of the larger one (without actually copying the data), so element-wise operations work without explicit loops and the code stays concise and efficient. The simplest case is combining an array with a scalar, as in `a + 10`, where the scalar is applied to every element.

The `numpy` library also provides many functions that operate on whole arrays at once. They are optimized for performance and can be applied to large arrays without explicit loops. Some of the most commonly used ones, covering element-wise math as well as array creation and manipulation, include:
* `numpy.add()`: Adds two arrays element-wise.
* `numpy.subtract()`: Subtracts one array from another element-wise.
* `numpy.multiply()`: Multiplies two arrays element-wise.
* `numpy.divide()`: Divides one array by another element-wise.
* `numpy.power()`: Raises one array to the power of another element-wise.
* `numpy.sqrt()`: Computes the square root of each element in an array.
* `numpy.exp()`: Computes the exponential of each element in an array.
* `numpy.log()`: Computes the natural logarithm of each element in an array.
* `numpy.sin()`: Computes the sine of each element in an array.
* `numpy.cos()`: Computes the cosine of each element in an array.
* `numpy.tan()`: Computes the tangent of each element in an array.
* `numpy.arcsin()`: Computes the inverse sine of each element in an array.
* `numpy.arccos()`: Computes the inverse cosine of each element in an array.
* `numpy.arctan()`: Computes the inverse tangent of each element in an array.
* `numpy.arctan2()`: Computes the inverse tangent of two arrays element-wise.
* `numpy.degrees()`: Converts radians to degrees.
* `numpy.radians()`: Converts degrees to radians.
* `numpy.clip()`: Clips the values in an array to a specified range.
* `numpy.where()`: Returns the indices of elements that satisfy a condition or, when called as `where(condition, x, y)`, picks elements from `x` or `y` depending on the condition.
* `numpy.nonzero()`: Returns the indices of non-zero elements in an array.
* `numpy.meshgrid()`: Creates a mesh grid from two or more arrays.
* `numpy.mgrid`: An index object, used with slice syntax such as `np.mgrid[0:3, 0:2]`, that creates a dense mesh grid.
* `numpy.ogrid`: An index object, used with the same slice syntax, that creates an open (sparse) mesh grid suited to broadcasting.
* `numpy.indices()`: Creates an array of indices for a given shape.
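* `numpy.ravel()`: Flattens an array into a 1D array, returning a view of the original data when possible.
* `ndarray.flatten()`: Array method that flattens an array into a 1D copy (there is no `numpy.flatten()` function).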
* `numpy.transpose()`: Transposes an array.
* `numpy.swapaxes()`: Swaps two axes of an array.
* `numpy.moveaxis()`: Moves an axis of an array to a new position.
* `numpy.rollaxis()`: Rolls the specified axis of an array to a new position.
* `numpy.expand_dims()`: Expands the dimensions of an array.
* `numpy.squeeze()`: Removes single-dimensional entries from the shape of an array.
* `numpy.concatenate()`: Joins two or more arrays along a specified axis.
* `numpy.stack()`: Joins two or more arrays along a new axis.
* `numpy.hstack()`: Joins two or more arrays horizontally.
* `numpy.vstack()`: Joins two or more arrays vertically.
* `numpy.dstack()`: Joins two or more arrays depth-wise.
* `numpy.column_stack()`: Joins two or more arrays column-wise.
* `numpy.row_stack()`: Joins two or more arrays row-wise (an alias of `vstack()`, deprecated in NumPy 2.0).
* `numpy.hsplit()`: Splits an array into multiple sub-arrays horizontally.
* `numpy.vsplit()`: Splits an array into multiple sub-arrays vertically.
* `numpy.dsplit()`: Splits an array into multiple sub-arrays depth-wise.
* `numpy.split()`: Splits an array into multiple sub-arrays along a specified axis.
* `numpy.array_split()`: Splits an array into multiple sub-arrays along a specified axis, allowing for uneven splits.
* `numpy.tile()`: Constructs an array by repeating an input array a specified number of times along each axis.
* `numpy.repeat()`: Constructs an array by repeating elements of an input array a specified number of times along each axis.
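
A few of the functions listed above, sketched on a small made-up array. The variable names are illustrative only.

```python
import numpy as np

a = np.array([-2, -1, 0, 1, 2])

clipped = np.clip(a, -1, 1)            # limit values to the range [-1, 1]
signs = np.where(a >= 0, 1, -1)        # choose 1 or -1 element by element
positive_idx = np.where(a > 0)[0]      # indices where the condition holds

stacked_v = np.vstack([a, clipped])    # two rows, shape (2, 5)
stacked_h = np.hstack([a, clipped])    # one long row, shape (10,)
flat = stacked_v.ravel()               # back to 1D

print(clipped)        # [-1 -1  0  1  1]
print(signs)          # [-1 -1  1  1  1]
print(positive_idx)   # [3 4]
print(stacked_v.shape, stacked_h.shape, flat.shape)
```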



In [None]:
a = np.arange(4)

In [None]:
a

array([0, 1, 2, 3])

In [None]:
a + 10

array([10, 11, 12, 13])

In [None]:
a * 10

array([ 0, 10, 20, 30])

In [None]:
a

array([0, 1, 2, 3])

In [None]:
a += 100

In [None]:
a

array([100, 101, 102, 103])

In [None]:
np.add(a, 10) # add 10 to each element

array([110, 111, 112, 113])

In [None]:
np.subtract(a, 10) # subtract 10 from each element

array([90, 91, 92, 93])

In [None]:
np.multiply(a, 10) # multiply each element by 10

array([1000, 1010, 1020, 1030])

In [None]:
np.divide(a, 10) # divide each element by 10

array([10. , 10.1, 10.2, 10.3])

In [None]:
np.power(a, 2) # raise each element to the power of 2

array([10000, 10201, 10404, 10609], dtype=int32)
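
The cells above broadcast a scalar against an array. Broadcasting also works between arrays of different shapes, as long as each pair of trailing dimensions is either equal or has size 1. A minimal sketch with illustrative arrays:

```python
import numpy as np

A = np.arange(6).reshape(2, 3)    # [[0 1 2], [3 4 5]], shape (2, 3)
row = np.array([10, 20, 30])      # shape (3,)
col = np.array([[100], [200]])    # shape (2, 1)

print(A + row)   # the row is added to every row of A
print(A + col)   # the column is added to every column of A
print((row + col).shape)  # (2, 3): both operands are stretched

# Shapes (2, 3) and (2,) are not compatible, so this raises ValueError.
try:
    A + np.array([1, 2])
except ValueError as err:
    print("cannot broadcast:", err)
```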

---

### Boolean arrays

Also called [masks](https://plainenglish.io/blog/numpy-masks-in-python).

Boolean arrays are arrays that contain only `True` and `False` values. They are often used in NumPy for indexing and filtering data. Boolean arrays can be created using comparison operators, logical operators, and functions that return boolean values. 

In [None]:
a = np.arange(5)

print(a <= 3)

[ True  True  True  True False]


In [None]:
a = np.arange(4)

In [None]:
a

array([0, 1, 2, 3])

In [None]:
a[0], a[-1]

(0, 3)

In [None]:
a[[0, -1]]

array([0, 3])

In [None]:
a[[True, False, False, True]] # boolean indexing 

array([0, 3])

In [None]:
a

array([0, 1, 2, 3])

In [None]:
a >= 2

array([False, False,  True,  True])

In [None]:
a[a >= 2]

array([2, 3])

In [None]:
a.mean()

1.5

In [None]:
a[a > a.mean()]

array([2, 3])

In [None]:
a[(a == 0) | (a == 1)]

array([0, 1])

In [None]:
a[(a <= 2) & (a % 2 == 0)]

array([0, 2])

In [None]:
A = np.random.randint(100, size=(3, 3))

In [None]:
A

array([[46, 38, 44],
       [83, 48,  3],
       [99,  7, 59]])

In [None]:
A[np.array([[True, False, True], [False, True, False], [True, False, True]])]

array([46, 44, 48, 99, 59])

In [None]:
A > 30

array([[ True,  True,  True],
       [ True,  True, False],
       [ True, False,  True]])

In [None]:
A[A > 30]

array([46, 38, 44, 83, 48, 99, 59])
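
Masks can also be counted, negated, and used on the left-hand side of an assignment. The following sketch uses a made-up array to illustrate these three uses.

```python
import numpy as np

a = np.arange(10)
mask = a % 2 == 0        # True for even values

print(mask.sum())        # 5: True counts as 1, so this counts the matches
print(a[~mask])          # [1 3 5 7 9]: ~ negates the mask

b = a.copy()
b[b > 5] = 0             # assign to every element selected by the mask
print(b)                 # [0 1 2 3 4 5 0 0 0 0]
```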

---

### Size of objects in Memory

Let's compare the size of objects in memory in plain Python and in NumPy. The `sys.getsizeof()` function returns the size of an object in bytes. The `numpy` library provides the `nbytes` attribute of an array, which returns the total number of bytes used by the array data. The `itemsize` attribute of an array returns the size of each element in bytes. The `nbytes` attribute is equal to the product of the `size` and `itemsize` attributes.

With `NumPy`, the same data takes less memory than with plain Python objects, because the values are stored as fixed-size primitive types rather than as full Python objects. This matters most with large datasets: it reduces the memory needed to hold the data and speeds up operations on it.

#### Int, floats

In [103]:
# An integer in Python takes more than 24 bytes
sys.getsizeof(1)

28

In [104]:
# Longs are even larger
sys.getsizeof(10**100)

72

In [105]:
# A NumPy integer is much smaller (4 bytes here; the default integer size is platform-dependent)
np.dtype(int).itemsize

4

In [106]:
# An int8 takes a single byte
np.dtype(np.int8).itemsize

1

In [107]:
np.dtype(float).itemsize

8

#### Lists are even larger

In [108]:
# A one-element list
sys.getsizeof([1])

64

In [109]:
# An array of one element in numpy
np.array([1]).nbytes

4
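
Note that `sys.getsizeof()` on a list counts only the list object and its internal array of pointers, not the integer objects it refers to. The sketch below is a rough estimate under that assumption. The exact numbers depend on the Python version and platform, so no output is shown for the list.

```python
import sys
import numpy as np

n = 100_000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# Add the size of every int object to the size of the list itself.
list_total = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

print(list_total)        # platform-dependent, several times larger than the array
print(np_array.nbytes)   # 800000 = 100000 elements * 8 bytes
```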

#### Performance is also important

In [110]:
l = list(range(100000))

In [111]:
a = np.arange(100000)

In [None]:
%time np.sum(a ** 2) # much faster than python

CPU times: total: 0 ns
Wall time: 997 μs


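216474736

Note that this result is wrong: the correct sum is `333328333350000`, as the pure Python version below shows. The array `a` appears to have dtype `int32` on this platform, so `a ** 2` silently overflows for values above 46340 and the sum wraps around as well. Creating the array with `np.arange(100000, dtype=np.int64)` avoids the overflow.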

In [113]:
%time sum([x ** 2 for x in l])

CPU times: total: 15.6 ms
Wall time: 8.14 ms


333328333350000
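
The `%time` magic only works inside IPython or Jupyter. As a hedged alternative, the comparison can be sketched with the standard `timeit` module. The function names are hypothetical, the array uses an explicit `int64` dtype so both versions return the same value, and the timings depend on the machine.

```python
import timeit
import numpy as np

n = 100_000
values = list(range(n))
array = np.arange(n, dtype=np.int64)  # 64-bit integers are wide enough for this sum


def python_sum():
    # Pure Python: one interpreted operation per element.
    return sum(x ** 2 for x in values)


def numpy_sum():
    # NumPy: the squaring and the sum run in compiled loops.
    return int(np.sum(array ** 2))


print(python_sum() == numpy_sum())  # True

runs = 10
t_py = timeit.timeit(python_sum, number=runs)
t_np = timeit.timeit(numpy_sum, number=runs)
print(f"pure Python: {t_py / runs * 1e3:.2f} ms per run")
print(f"NumPy:       {t_np / runs * 1e3:.2f} ms per run")
```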