# Numpy Lesson

## Introduction

Numpy, short for Numerical Python, is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. 

Many computational and data science packages use Numpy as the main building block. It is a fundamental library for scientific computing in Python.

Some features of Numpy:
- `ndarray`, an efficient multidimensional array providing fast array-oriented arithmetic operations and flexible broadcasting capabilities.
- Mathematical functions for fast operations on entire arrays of data without having to write loops.
- Tools for reading/writing array data to disk and working with memory-mapped files.
- Linear algebra, random number generation, and Fourier transform capabilities.
- A C API for connecting Numpy with libraries written in C, C++, or FORTRAN.

The advantages of using Numpy:

- Numpy internally stores data in a contiguous block of memory, independent of other built-in Python objects. Numpy's library of algorithms written in the C language can operate on this memory without any type checking or other overhead. NumPy arrays also use much less memory than built-in Python sequences (e.g. lists).
- Numpy operations perform complex computations on entire arrays without the need for Python for loops, which can be slow for large sequences. This is called _vectorization_.

![numpy_vs_list](../assets/numpy_vs_python_list.png)

You can install Numpy by using `conda` or `pip`:

```bash
conda install numpy
```

```bash
pip install numpy
```

Then, you can import Numpy as follows:

In [1]:
import numpy as np


Here, `np` is the standard alias for `numpy`.

To give you an idea of the performance difference, consider a Numpy array of one million integers, and the equivalent Python list:

In [4]:
my_arr = np.arange(1_000_000)
my_list = list(range(1_000_000))

Let's multiply each sequence by 2. You can use IPython's `%timeit` magic command to measure the execution time of each statement:

In [8]:
%timeit my_arr2 = my_arr * 2

129 μs ± 1.02 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


In [9]:
%timeit my_list2 = [x * 2 for x in my_list]

19.1 ms ± 425 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In this run, the NumPy version is roughly 150 times faster. In general, NumPy-based algorithms are often 10 to 100 times faster (or more) than their pure Python counterparts, and they use significantly less memory.
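You can roughly check the memory claim in plain Python. The sketch below compares the ndarray's data buffer (`nbytes`) with an estimate for the list. That estimate is the list's own pointer array plus one Python `int` object per element. It assumes CPython. Small integers (-5 to 256) are cached and shared, so the list estimate overcounts slightly. Treat the numbers as approximate.

```python
import sys

import numpy as np

n = 1_000_000
my_arr = np.arange(n, dtype=np.int64)  # fixed-size 8-byte integers
my_list = list(range(n))

# NumPy: one contiguous buffer of raw 8-byte integers
arr_bytes = my_arr.nbytes

# List: an array of pointers plus a separate Python int object per element
list_bytes = sys.getsizeof(my_list) + sum(sys.getsizeof(x) for x in my_list)

print(f"ndarray: {arr_bytes / 1e6:.1f} MB")
print(f"list:    {list_bytes / 1e6:.1f} MB")
```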

## Numpy ndarray

Numpy's `ndarray`, or N-dimensional array, is a fast, flexible container for large datasets in Python. Arrays enable you to perform mathematical operations on whole blocks of data using similar syntax to the equivalent operations between scalar elements. 

In [10]:
data = np.array([1.5, -0.1, 3])

In [11]:
data

array([ 1.5, -0.1,  3. ])

Multiply all of the elements by 10.

In [12]:
data * 10

array([15., -1., 30.])

Add the array to itself. Corresponding elements ("cells") are added together.

In [13]:
data + data

array([ 3. , -0.2,  6. ])

> Practice with the array above using different arithmetic operators: 
> 
> `-`, `/`, `**`, `%`, `//`.
>

In [19]:
print(data ** 2)
print(data / 2)
print(data // 3)
print(data - 3)
print(data % 3)

[2.25 0.01 9.  ]
[ 0.75 -0.05  1.5 ]
[ 0. -1.  1.]
[-1.5 -3.1  0. ]
[1.5 2.9 0. ]


### ndarray illustration

An ndarray is a multidimensional (n-dimensional) array of fixed size with homogeneous elements (i.e. all elements must be of the same type). Every array has a `shape`, a tuple indicating the size of each dimension, and a `dtype`, an object describing the data type of the array.

![ndarray](../assets/numpy_ndarray.png)

In [20]:
data.shape

(3,)

In [21]:
data.dtype

dtype('float64')

The easiest way to create an array is to use the `array` function.

In [24]:
data1 = [6, 7.5, 8, 0, 1]

arr1 = np.array(data1)

arr1

array([6. , 7.5, 8. , 0. , 1. ])

Nested sequences, like a list of equal-length lists, will be converted into a multidimensional array.

In [25]:
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]

arr2 = np.array(data2)

arr2

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

In [26]:
arr2.dtype

dtype('int64')

In [27]:
arr2.shape

(2, 4)

We can also check the number of dimensions.

In [28]:
arr2.ndim

2

Besides `array`, there are other functions for creating new arrays. We have seen `arange` above, which is similar to the built-in `range` function but returns an array instead of a list.

`ones` and `zeros` create arrays of 1s and 0s, respectively, with a given length or shape. `empty` creates an array without initializing its values, so it may contain arbitrary leftover data. To create a higher-dimensional array with these functions, pass a tuple for the shape.

In [16]:
np.zeros(5)

array([0., 0., 0., 0., 0.])

In [17]:
np.zeros((3,6))

array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])
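The text above mentions `empty` and `arange` without showing them. Here is a short sketch. `np.empty` returns an array whose contents are whatever happened to be in memory, so only its shape and dtype mean anything until you fill it. `np.arange` also accepts `start`, `stop`, and `step`, like `range`.

```python
import numpy as np

# Allocated but not initialized: the values are arbitrary leftovers
uninit = np.empty((2, 3))
print(uninit.shape, uninit.dtype)  # (2, 3) float64

# Like range(start, stop, step); the stop value is excluded
evens = np.arange(0, 10, 2)
print(evens)  # [0 2 4 6 8]

# A floating-point step also works
halves = np.arange(0, 2, 0.5)
print(halves)  # [0.  0.5 1.  1.5]
```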

> Create a new array with 3 dimensions using `ones`.

In [31]:
np.ones((6,3,3))

array([[[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]]])

You can also explicitly specify the data type of the array.

In [32]:
arr1 = np.array([1, 2, 3], dtype=np.float64)

arr1.dtype

dtype('float64')

In [36]:
arr2 = np.array([1, 2, 3], dtype=np.int32)

arr2.dtype

dtype('int32')

Data types provide a mapping directly onto an underlying disk or memory representation. The numerical data types are named the same way: a type name, like `float` or `int`, followed by a number indicating the number of bits per element. A standard double-precision floating point value (what's used under the hood in Python's `float` object) takes up 8 bytes or 64 bits. Thus, this type is known in Numpy as `float64`. See the following table for a list of the numerical data types.

| Data type | Type code | Description |
| --- | --- | --- |
| int8, uint8 | i1, u1 | Signed and unsigned 8-bit (1 byte) integer types |
| int16, uint16 | i2, u2 | Signed and unsigned 16-bit integer types |
| int32, uint32 | i4, u4 | Signed and unsigned 32-bit integer types |
| int64, uint64 | i8, u8 | Signed and unsigned 64-bit integer types |
| float16 | f2 | Half-precision floating point |
| float32 | f4 or f | Standard single-precision floating point. Compatible with C float |
| float64 | f8 or d | Standard double-precision floating point. Compatible with C double and Python float object |
| float128 | f16 or g | Extended-precision floating point (platform dependent) |
| complex64, complex128, complex256 | c8, c16, c32 | Complex numbers represented by two 32-, 64-, or 128-bit floats, respectively (complex256 is platform dependent) |
| bool | ? | Boolean type storing True and False values |
| object | O | Python object type |
| bytes_ (alias string_ removed in NumPy 2.0) | S | Fixed-length ASCII string type (1 byte per character). For example, to create a string dtype with length 10, use 'S10' |
| str_ (alias unicode_ removed in NumPy 2.0) | U | Fixed-length Unicode type (4 bytes per character). Same specification semantics as bytes_ (e.g. 'U10') |
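You can pass the type codes in the table anywhere a dtype is expected. Here is a quick check, assuming a standard NumPy install. The platform-dependent `float128`/`complex256` rows are left out.

```python
import numpy as np

# Type codes are interchangeable with the named dtypes
small = np.array([1, 2, 3], dtype="i1")
print(small.dtype)                   # int8
print(np.dtype("f8") == np.float64)  # True

# itemsize is the number of bytes per element (bits / 8)
print(np.dtype("f4").itemsize)   # 4

# 'S10' stores 10 bytes; 'U10' stores 10 characters at 4 bytes each
print(np.dtype("S10").itemsize)  # 10
print(np.dtype("U10").itemsize)  # 40
```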

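You can explicitly convert (cast) an array from one dtype to another using the `astype` method. By default, `astype` always creates a new array, even if the new dtype is the same as the old one.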

In [37]:
arr = np.array([1, 2, 3, 4, 5])

arr.dtype

dtype('int64')

In [38]:
float_arr = arr.astype(np.float64)

float_arr

array([1., 2., 3., 4., 5.])

In [39]:
float_arr.dtype

dtype('float64')

If you cast floating-point numbers to an integer dtype, the decimal part is truncated (rounded toward zero), so `-2.6` becomes `-2`.

In [40]:
arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])

arr.astype(np.int32)

array([ 3, -1, -2,  0, 12, 10], dtype=int32)

You can also convert strings representing numbers to numeric form.

In [43]:
numeric_strings = np.array(['1.25', '-9.6', '42'], dtype=np.bytes_)  # np.string_ was removed in NumPy 2.0

numeric_strings.astype(float)

array([ 1.25, -9.6 , 42.  ])

Here `float` was passed instead of `np.float64`. This works because NumPy maps Python's built-in types to equivalent dtypes, so `float` becomes `float64`.
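Here is a minimal check of that aliasing. The Python type `float` maps to `float64`, and `bool` maps to NumPy's boolean dtype. The default for `int` depends on the platform (it was `int32` on Windows before NumPy 2.0), so this sketch does not check it.

```python
import numpy as np

strings = np.array(["1.25", "-9.6", "42"], dtype=np.bytes_)

# Python's float is treated as an alias for np.float64
as_float = strings.astype(float)
print(as_float.dtype)                 # float64
print(np.dtype(float) == np.float64)  # True
print(np.dtype(bool) == np.bool_)     # True
```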

> Create an array with a shape of (3, 4) and a data type of `float64`. Then convert it to `float32`.

In [47]:
arr = np.array([[3.0, 2.0, 1.0, 0.0],[5.0, 9.0, 13.0, 17.0],[-3.0, -0.4, 2.55, 0.1]], dtype=np.float64)

arr.astype(np.float32)

array([[ 3.  ,  2.  ,  1.  ,  0.  ],
       [ 5.  ,  9.  , 13.  , 17.  ],
       [-3.  , -0.4 ,  2.55,  0.1 ]], dtype=float32)

## Arithmetic with ndarrays

Arithmetic operations are applied as batch operations on whole arrays without any `for` loops. This is called _vectorization_. Any arithmetic operation between equal-size arrays is applied element-wise.

![vectorization](../assets/vectorization.png)

In [48]:
arr = np.array([[1., 2., 3.], [4., 5., 6.]])

arr

array([[1., 2., 3.],
       [4., 5., 6.]])

In [49]:
arr * arr

array([[ 1.,  4.,  9.],
       [16., 25., 36.]])

In [50]:
arr - arr

array([[0., 0., 0.],
       [0., 0., 0.]])

Broadcasting is another powerful feature of Numpy. It describes how arithmetic works between arrays of different shapes. Conceptually, you can think of the smaller array (or scalar value) as being replicated to match the shape of the larger array, although NumPy does this without actually copying the data.

In [60]:
arr + np.array([1, 1, 1])

array([[2., 3., 4.],
       [5., 6., 7.]])

Here, `[1, 1, 1]` is stretched, or _broadcast_, across each row of `arr` so that the shapes match.

In [52]:
arr1 = np.array([1, 2, 3, 4])

In [53]:
arr1 + 4

array([5, 6, 7, 8])

Conceptually, the scalar `4` becomes `[4, 4, 4, 4]` under the hood, and then the addition happens element-wise.

In [61]:
arr1 ** 2

array([ 1,  4,  9, 16])

In [64]:
1 / arr1

array([1.        , 0.5       , 0.33333333, 0.25      ])

To find out more about broadcasting, check out the [official documentation](https://numpy.org/doc/stable/user/basics.broadcasting.html).
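Broadcasting also works when both arrays need stretching. NumPy compares shapes starting from the trailing dimension. Two dimensions are compatible when they are equal or when one of them is 1. Here is a minimal sketch of a compatible case and an incompatible one:

```python
import numpy as np

col = np.array([[0], [10], [20]])  # shape (3, 1)
row = np.array([1, 2, 3, 4])       # shape (4,), treated as (1, 4)

grid = col + row                   # both are stretched to (3, 4)
print(grid.shape)  # (3, 4)
print(grid)
# [[ 1  2  3  4]
#  [11 12 13 14]
#  [21 22 23 24]]

# Trailing dimensions 4 and 3 are neither equal nor 1, so this fails
broadcast_failed = False
try:
    np.ones((2, 4)) + np.ones(3)
except ValueError as err:
    broadcast_failed = True
    print("ValueError:", err)
```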

Comparisons between arrays of the same size yield boolean arrays.

In [65]:
arr2 = np.array([[0., 4., 1.], [7., 2., 12.]])

arr2

array([[ 0.,  4.,  1.],
       [ 7.,  2., 12.]])

In [66]:
arr2 > arr

array([[False,  True, False],
       [ True, False,  True]])

## Indexing and slicing

Indexing and slicing allow you to select subsets of array data.

One-dimensional arrays are simple; on the surface they act similarly to Python lists.

In [82]:
arr = np.arange(10)

arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Indexing to select a single element.

In [83]:
arr[5]

5

Slicing to select a range of elements.

In [84]:
arr[5:8]


array([5, 6, 7])

You can also assign a scalar to a slice. The value is propagated (broadcast) to the entire selection.

In [85]:
arr[5:8] = 12

arr

array([ 0,  1,  2,  3,  4, 12, 12, 12,  8,  9])

Array slices are views on the original array. This means that the data is not copied, and any modifications to the view will be reflected in the source array (in-place).

In [86]:
arr_slice = arr[5:8]

arr_slice

array([12, 12, 12])

In [87]:
# Use .copy() to get an independent copy of the data instead of a view

arr_to_copy = np.array([[1.0, 3.0, 4.0], [-3, -.2, .4]])
arr_copy = arr_to_copy.copy()
print("arr_to_copy\n", arr_to_copy)
print("arr_copy\n", arr_copy)
arr_copy[0, 1] = -5  # modify only the copy
print("arr_copy_mod\n", arr_copy)
print("arr_to_copy\n", arr_to_copy)  # the original is unchanged

arr_to_copy
 [[ 1.   3.   4. ]
 [-3.  -0.2  0.4]]
arr_copy
 [[ 1.   3.   4. ]
 [-3.  -0.2  0.4]]
arr_copy_mod
 [[ 1.  -5.   4. ]
 [-3.  -0.2  0.4]]
arr_to_copy
 [[ 1.   3.   4. ]
 [-3.  -0.2  0.4]]


In [88]:
arr_slice[1] = 10
arr

array([ 0,  1,  2,  3,  4, 12, 10, 12,  8,  9])

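The "bare" slice `[:]` selects all values in an array. Assigning to it sets every element of `arr_slice`, and because `arr_slice` is a view, that also sets elements 5 to 7 of `arr`. Note that this differs from regular Python lists, which do not accept a scalar assigned to a slice.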

In [89]:
arr_slice[:] = 64

arr

array([ 0,  1,  2,  3,  4, 64, 64, 64,  8,  9])
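Slicing behaves differently for Python lists and NumPy arrays. Below is a minimal comparison. It uses `np.shares_memory` to check whether two arrays use the same data buffer.

```python
import numpy as np

# Python list: a slice is a new list, so changing it leaves lst alone
lst = list(range(10))
lst_slice = lst[5:8]
lst_slice[0] = 99
print(lst[5])  # 5

# A list slice only accepts an iterable on assignment, not a scalar
list_raised = False
try:
    lst[5:8] = 64
except TypeError:
    list_raised = True
print(list_raised)  # True

# NumPy array: a slice is a view on the same memory
arr = np.arange(10)
arr_slice = arr[5:8]
arr_slice[0] = 99
print(arr[5])                            # 99
print(np.shares_memory(arr, arr_slice))  # True

# .copy() gives independent data
independent = arr[5:8].copy()
print(np.shares_memory(arr, independent))  # False
```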

In a two-dimensional array, the elements at each index are no longer scalars but rather one-dimensional arrays.

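*In other words, a single index on a 2D array returns a whole row instead of a number: index `1` selects the row `[4, 5, 6]`. On a 1D array, the same index `1` returns a single value.*
```py
# 2D: a single index selects a whole row
arr2d = [[1, 2, 3],
         [4, 5, 6],   # arr2d[1] -> [4, 5, 6]
         [7, 8, 9]]

# 1D: a single index selects one element
arr1d = [1, 2, 3, 4, 5]  # arr1d[1] -> 2
```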


In [90]:
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

arr2d[1]

array([4, 5, 6])

You can index it "twice" to get individual elements. These two expressions are equivalent.

In [91]:
arr2d[1][2]

6

In [92]:
arr2d[1, 2]

6

For 2D array indexing the syntax is `arr2d[row_index, col_index]` or `arr2d[axis_0_index, axis_1_index]`. Think of axis 0 as the "rows" of the array and axis 1 as the "columns."

![2d_array_indexing](../assets/ndarray_axis_index.png)

To slice out the first two rows of the `arr2d` array, you can pass `[:2]` as the row index.

In [102]:
# Remember that the stop index is excluded, so this selects rows 0 and 1 only
arr2d[:2]

array([[1, 2, 3],
       [4, 5, 6]])

You can pass multiple slices just like you can pass multiple indexes:

In [94]:
arr2d[:2, 1:]

array([[2, 3],
       [5, 6]])

You can mix indexing and slicing.

In [109]:
arr2d[1, :2]

array([4, 5])

A slice consisting of just `:` selects the entire axis. For example, to select the first column:

*More precisely, omitting both bounds means the slice starts at the beginning of the axis and stops at its end.*

*`::` also works, since the full slice syntax is `start:stop:step` and the step defaults to `1`.*

In [96]:
arr2d[:, :1]  # keeps 2 dimensions; arr2d[:, 0] returns the same values as a 1D array

array([[1],
       [4],
       [7]])

In [110]:
# check the shape

arr2d[:, :1].shape

(3, 1)

To select the first row.

In [111]:
arr2d[:1, :]  # keeps 2 dimensions; arr2d[0, :] returns the same values as a 1D array

array([[1, 2, 3]])

In [99]:
# check the shape

arr2d[:1, :].shape

(1, 3)
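The choice between a slice and an integer index changes the shape of the result. An integer index removes that axis, while a length-1 slice keeps it. Here is a minimal check:

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

first_col_2d = arr2d[:, :1]  # slice keeps the column axis
first_col_1d = arr2d[:, 0]   # integer index drops it

print(first_col_2d.shape)  # (3, 1)
print(first_col_1d.shape)  # (3,)
print(first_col_1d)        # [1 4 7]
```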

> Assign values such that the final array looks like the following:
>
> | 1 | 2 | 3 |
> | --- | --- | --- |
> | 4 | -1 | -1 |
> | 7 | 8 | 9 |


In [114]:
arr_to_make = arr2d.copy()
arr_to_make[1,1:] = -1
arr_to_make

array([[ 1,  2,  3],
       [ 4, -1, -1],
       [ 7,  8,  9]])

## Boolean indexing

Let's consider an example where we have an array of names with duplicates, and an array of scores (for 2 subjects) that correspond to each name.

In [117]:
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
scores = np.array([[75, 80], [85, 90], [95, 100], [100, 77], [85, 92], [95, 80], [72, 80]])

In [118]:
scores

array([[ 75,  80],
       [ 85,  90],
       [ 95, 100],
       [100,  77],
       [ 85,  92],
       [ 95,  80],
       [ 72,  80]])

Suppose we want to select all the rows corresponding to the name 'Bob'. Like arithmetic operations, comparisons (such as `==`) with arrays are also vectorized, so comparing `names` with the string 'Bob' yields a boolean array.

In [119]:
names == "Bob"

array([ True, False, False,  True, False, False, False])

This boolean array can be used to index `scores`, selecting the rows where the mask is `True`. The mask must have the same length as the axis it indexes.

In [120]:
scores[names == "Bob"]

array([[ 75,  80],
       [100,  77]])

You can mix boolean indexing with other slicing and indexing methods.

In [124]:
# The boolean mask selects the rows, then 1 selects the second column
scores[names == "Bob", 1]  # not the same as scores[names == "Bob"][1], which is the second matching row

array([80, 77])

To select everything but 'Bob', you can either use `!=` or negate the condition using `~`.

In [125]:
names != "Bob"

array([False,  True,  True, False,  True,  True,  True])

In [126]:
~(names == "Bob")

array([False,  True,  True, False,  True,  True,  True])

In [127]:
scores[names != "Bob"]

array([[ 85,  90],
       [ 95, 100],
       [ 85,  92],
       [ 95,  80],
       [ 72,  80]])

The `~` operator can be useful when you want to invert a boolean array referenced by a variable.

In [128]:
cond = names == "Bob"

cond

array([ True, False, False,  True, False, False, False])

In [61]:
scores[~cond]

array([[ 85,  90],
       [ 95, 100],
       [ 85,  92],
       [ 95,  80],
       [ 72,  80]])

> Show the scores for `Joe`.

In [129]:
scores[names == "Joe"]

array([[85, 90],
       [95, 80],
       [72, 80]])

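You can select two or more names by combining multiple boolean conditions with the element-wise operators `&` (and) and `|` (or), which are Python's ***bitwise operators***. The keywords `and` and `or` do not work with boolean arrays. Each condition needs its own parentheses because `&` and `|` bind more tightly than `==`.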

In [130]:
mask = (names == "Bob") | (names == "Will")

mask

array([ True, False,  True,  True,  True, False, False])

In [131]:
scores[mask]

array([[ 75,  80],
       [ 95, 100],
       [100,  77],
       [ 85,  92]])
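You can combine conditions on different arrays the same way, as long as they have the same length. The sketch below redefines the `names` and `scores` arrays from above so that it runs on its own. It selects Bob's rows where the first score is above 80, and shows that the keyword `and` fails on arrays.

```python
import numpy as np

names = np.array(["Bob", "Joe", "Will", "Bob", "Will", "Joe", "Joe"])
scores = np.array([[75, 80], [85, 90], [95, 100], [100, 77],
                   [85, 92], [95, 80], [72, 80]])

# Parentheses are required: & binds more tightly than == and >
mask = (names == "Bob") & (scores[:, 0] > 80)
print(scores[mask])  # [[100  77]]

# Python's `and` needs a single True/False, so it raises on arrays
and_failed = False
try:
    (names == "Bob") and (scores[:, 0] > 80)
except ValueError as err:
    and_failed = True
    print("ValueError:", err)
```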

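To keep only the rows where both scores are above 80, compare the whole array. Then use `np.all` with `axis=1` to require every value in a row to be `True`:

In [147]:
print(scores > 80)
both_scores_above_80 = np.all(scores > 80, axis=1)
scores[both_scores_above_80]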

[[False False]
 [ True  True]
 [ True  True]
 [ True False]
 [ True  True]
 [ True False]
 [False False]]


array([[ 85,  90],
       [ 95, 100],
       [ 85,  92]])

You can also set the values based on these boolean arrays. For example, to set all scores less than 80 to 70:

In [149]:
scores[scores < 80] = 70

In [150]:
scores

array([[ 70,  80],
       [ 85,  90],
       [ 95, 100],
       [100,  70],
       [ 85,  92],
       [ 95,  80],
       [ 70,  80]])

To select a subset of the rows in a particular order, you can pass a list or ndarray of integers specifying the desired order. This is called _fancy indexing_.

In [151]:
arr = np.zeros((8, 4))

for i in range(8):
    arr[i] = i

arr

array([[0., 0., 0., 0.],
       [1., 1., 1., 1.],
       [2., 2., 2., 2.],
       [3., 3., 3., 3.],
       [4., 4., 4., 4.],
       [5., 5., 5., 5.],
       [6., 6., 6., 6.],
       [7., 7., 7., 7.]])

In [153]:
arr[[4, 3, 0, 6, -1]]

array([[4., 4., 4., 4.],
       [3., 3., 3., 3.],
       [0., 0., 0., 0.],
       [6., 6., 6., 6.],
       [7., 7., 7., 7.]])

Negative indices select rows from the end.

In [154]:
arr[[-3, -5, -7]]

array([[5., 5., 5., 5.],
       [3., 3., 3., 3.],
       [1., 1., 1., 1.]])
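Unlike slicing, indexing with an integer list always copies the data into a new array. Here is a minimal check with `np.shares_memory`:

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))

picked = arr[[4, 3, 0]]  # integer-list indexing returns a copy
picked[0] = -1           # modifies only the copy
print(arr[4])                         # [16 17 18 19]
print(np.shares_memory(arr, picked))  # False

sliced = arr[3:5]        # a slice is still a view
print(np.shares_memory(arr, sliced))  # True
```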

## Reshaping and transposing arrays

Arrays have the `reshape` method, which changes an array to a new shape with the same number of elements. For example, the 15 elements of `np.arange(15)` can be reshaped into a 2D array with 3 rows and 5 columns.

In [155]:
arr = np.arange(15).reshape((3, 5))

arr

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
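You can pass `-1` for one dimension in `reshape`, and NumPy infers it from the array's size. A shape whose total size does not match raises `ValueError`. Here is a short sketch:

```python
import numpy as np

arr = np.arange(15)

# -1 asks NumPy to infer that dimension: 15 / 5 = 3
inferred = arr.reshape((5, -1))
print(inferred.shape)  # (5, 3)

# The total number of elements must stay the same (15 != 16)
reshape_failed = False
try:
    arr.reshape((4, 4))
except ValueError as err:
    reshape_failed = True
    print("ValueError:", err)
```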

Arrays have the `transpose` method for rearranging data. For a 2D array, `transpose` will return a new view on the data with axes swapped.

In [160]:
arr.transpose()

array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]])

The `T` attribute is a shortcut for `transpose`.

In [161]:
arr.T

array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]])

> Create an array with 3 dimensions using `arange` and `reshape`.

In [167]:
rubik_cube = np.arange(54).reshape((6,3,3))
for i in range(6):
    rubik_cube[i][:] = i
    
print(rubik_cube)

[[[0 0 0]
  [0 0 0]
  [0 0 0]]

 [[1 1 1]
  [1 1 1]
  [1 1 1]]

 [[2 2 2]
  [2 2 2]
  [2 2 2]]

 [[3 3 3]
  [3 3 3]
  [3 3 3]]

 [[4 4 4]
  [4 4 4]
  [4 4 4]]

 [[5 5 5]
  [5 5 5]
  [5 5 5]]]


## Universal functions

A universal function, or `ufunc`, is a function that performs element-wise operations on data in ndarrays. You can think of them as fast vectorized wrappers for simple functions that take one or more scalar values and produce one or more scalar results.

In [171]:
arr = np.arange(10)

arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [172]:
# calculate the square root of each element in the array

np.sqrt(arr)

array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ,
       2.23606798, 2.44948974, 2.64575131, 2.82842712, 3.        ])

In [173]:
# calculate the exponential of each element in the array
np.exp(arr)

array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
       5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
       2.98095799e+03, 8.10308393e+03])

These are referred to as unary ufuncs. Others, such as `add` or `maximum`, take 2 arrays (thus, binary ufuncs) and return a single array as the result.

In [174]:
x = np.array([3, 7, 15, 5, 12])
y = np.array([11, 2, 4, 6, 8])

np.maximum(x, y)

array([11,  7, 15,  6, 12])

You can refer to the [Numpy documentation](https://numpy.org/doc/stable/reference/ufuncs.html#available-ufuncs) for a list of all available universal functions.
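Binary ufuncs back the arithmetic operators used earlier, so `x + y` and `np.add(x, y)` give the same result. Here is a small sketch using the `x` and `y` from above:

```python
import numpy as np

x = np.array([3, 7, 15, 5, 12])
y = np.array([11, 2, 4, 6, 8])

# Binary ufunc: two input arrays, one output array
print(np.add(x, y))                         # [14  9 19 11 20]
print(np.array_equal(np.add(x, y), x + y))  # True

# Unary ufunc: one input array, one output array
print(np.abs(np.array([-2, 0, 3])))         # [2 0 3]
```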

> Search for a ufunc that returns element-wise quotient and remainder simultaneously.
>
> Run it on x and y. Note that it will return a tuple of two arrays.

In [175]:
np.divmod(x,y)

(array([0, 3, 3, 0, 1]), array([3, 1, 3, 5, 4]))

## Conditional Logic

If you want to evaluate all elements in an array based on a condition, you can use `np.where`, a vectorized version of the ternary expression `x if condition else y`.

Suppose we have a boolean array and two arrays of values:

In [176]:
xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])

yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])

cond = np.array([True, False, True, True, False])

To take the value from `xarr` wherever the corresponding value in `cond` is `True`, and the value from `yarr` otherwise:

In [177]:
np.where(cond, xarr, yarr)

array([1.1, 2.2, 1.3, 1.4, 2.5])

The second and third arguments to `numpy.where` don’t need to be arrays; one or both of them can be scalars. A typical use of `where` in data analysis is to produce a new array of values based on another array. 
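For example, given an array of values, you could replace every positive value with 2 and every other value with -2. You could also replace only the positive values and keep the rest. Here is a minimal sketch with a small fixed array:

```python
import numpy as np

arr = np.array([[0.5, -1.2], [-0.3, 2.0]])

# Both branches are scalars
signs = np.where(arr > 0, 2, -2)
print(signs)
# [[ 2 -2]
#  [-2  2]]

# Mix a scalar with the original array: only positive values change
replaced = np.where(arr > 0, 2, arr)
print(replaced)
```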

## Array methods

You can generate random arrays using the `np.random` module. The `randn` function returns a sample (or samples) from the "standard normal" distribution, which is a normal distribution with a mean of 0 and a standard deviation of 1.

Here we generate a random 3x4 array of samples from the standard normal distribution.

In [199]:
arr = np.random.randn(3, 4)  # legacy API
arr_non_legacy = np.random.default_rng().standard_normal((3, 4))  # recommended Generator API
# For reproducible results, pass a seed: np.random.default_rng(seed=42)

print(arr)
print(arr_non_legacy)

[[-0.36073472 -0.1760946   0.41749927  0.80276732]
 [ 0.67446841  1.17618185  0.46012281  0.5379167 ]
 [-0.97574997  1.94941384  1.20623372 -0.02721719]]
[[-0.81452891  1.36377949 -0.63793677  0.66098764]
 [-1.29948251 -0.01026557  2.0209221   0.8231871 ]
 [ 0.6959076  -1.3335941   0.04088891 -0.47733463]]


In [200]:
# average
arr.mean()

0.4737339546684951

In [201]:
# you can also use universal function
np.mean(arr)

0.4737339546684951

In [202]:
# sum

arr.sum()

5.684807456021941

You can also provide an optional argument `axis` that specifies the axis along which the statistic is computed, resulting in an array with one fewer dimension.

In [203]:
arr.mean(axis=1)

array([0.17085932, 0.71217244, 0.5381701 ])

In [84]:
arr.mean(axis=0)

array([-0.22067209,  0.98316703,  0.6946186 ,  0.43782228])

`axis=1` means "compute across the columns" (one result per row), whereas `axis=0` means "compute down the rows" (one result per column).

Refer to the diagram again for the illustration on axes.

![ndarray](../assets/numpy_ndarray.png)
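The random values above change on every run, so here is the same idea on a small fixed array. The numbers are illustrative only. `axis=0` collapses the rows and leaves one result per column. `axis=1` collapses the columns and leaves one result per row.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])  # shape (2, 3)

print(a.sum(axis=0))   # [5 7 9]   one total per column, shape (3,)
print(a.sum(axis=1))   # [ 6 15]   one total per row, shape (2,)
print(a.mean(axis=1))  # [2. 5.]
```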

> Compute the sum of each column of `arr` (that is, sum down the rows).

In [204]:
arr.sum(axis = 0)

array([-0.66201627,  2.94950109,  2.08385581,  1.31346683])

For boolean arrays, `any` tests whether one or more values in an array is `True`, while `all` checks if every value is `True`.

In [205]:
bools = np.array([False, False, True, False])

In [206]:
bools.any()

True

In [207]:
bools.all()

False

Like Python’s built-in list type, NumPy arrays can be sorted with the `sort` method. Note that this method sorts a data array _in-place_, meaning that the array contents are rearranged rather than a new array being created.

In [215]:
arr = np.random.randn(8)
arr1 = np.random.default_rng().standard_normal(8)

print(arr)
print(arr1)

[-0.6404847  -0.26343994 -0.26478306 -0.04132875  0.43984684  1.3364937
  0.80706878 -0.1138939 ]
[-1.2935401  -0.16029701  0.76919615  0.68909085 -0.55073395  0.32391256
  0.53437201  0.96761605]


In [217]:
arr.sort()    # sorts arr in place and returns None
sorted(arr1)  # built-in sorted() returns a new Python list

[-1.2935401044542767,
 -0.5507339538999123,
 -0.1602970093627756,
 0.3239125554660381,
 0.5343720054528612,
 0.6890908480756954,
 0.7691961513234468,
 0.9676160482226573]

In [218]:
arr

array([-0.6404847 , -0.26478306, -0.26343994, -0.1138939 , -0.04132875,
        0.43984684,  0.80706878,  1.3364937 ])
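Three sorting options behave differently. The `ndarray.sort` method changes the array and returns `None`. The top-level `np.sort` function returns a sorted copy and leaves its input alone. The built-in `sorted` used above returns a Python list, not an array. Here is a minimal comparison:

```python
import numpy as np

data = np.array([3, 1, 2])

sorted_copy = np.sort(data)  # new array; data is untouched
print(sorted_copy, data)     # [1 2 3] [3 1 2]

result = data.sort()         # sorts data in place
print(result, data)          # None [1 2 3]

as_list = sorted(np.array([2, 1]))
print(type(as_list))         # <class 'list'>
```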

## Unique and Other Set Logic

`np.unique` returns the sorted unique values of an array.

In [219]:
names = np.array(['Bob', 'Will', 'Joe', 'Bob', 'Will', 'Joe', 'Joe'])

np.unique(names)

array(['Bob', 'Joe', 'Will'], dtype='<U4')

In [220]:
np.unique(np.array([3, 3, 3, 2, 2, 1, 1, 4, 4]))

array([1, 2, 3, 4])

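`np.in1d` tests membership: it checks whether each value in one array is also present in another, returning a boolean array.

The official docs describe it as: "Test whether each element of a 1-D array is also present in a second array." Since NumPy 2.0, `in1d` is deprecated in favor of `np.isin`, which returns the same result for 1D inputs.

In [93]:
np.isin([2, 3, 6], [1, 2, 3, 4, 5])  # np.in1d(...) in older code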

array([ True,  True, False])

Refer to the [official documentation](https://numpy.org/doc/stable/reference/routines.set.html) for more set operations.

> Search for a set function that finds the common values between two arrays.
>
> Run it on x and y arrays below.

In [223]:
x = np.array([1, 2, 3, 4, 5])
y = np.array([3, 4, 5, 6, 7])

np.intersect1d(x, y)

array([3, 4, 5])

## Linear Algebra

Linear algebra operations, like matrix multiplication, decompositions, determinants, and other square matrix math, are an important part of many array libraries. 

Multiplying two two-dimensional arrays with `*` computes an element-wise product, while matrix multiplication requires either `dot` (as a function or method) or the `@` infix operator.

![matrix_multiplication](../assets/matrix_multiplication.png)

In [224]:
x = np.array([[1, 2, 3], [4, 5, 6]])
y = np.array([[6, 23], [-1, 7], [8, 9]])

x.dot(y)

array([[ 28,  64],
       [ 67, 181]])

In [225]:
# you can also use the @ operator

x @ y

array([[ 28,  64],
       [ 67, 181]])

In [226]:
# or the dot function

np.dot(x, y)

array([[ 28,  64],
       [ 67, 181]])
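The two operators have different shape rules. `*` needs shapes that broadcast. `@` needs the inner dimensions to agree, so `(2, 3) @ (3, 2)` gives `(2, 2)`. Here is a small sketch reusing `x` and `y` from above:

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])      # shape (2, 3)
y = np.array([[6, 23], [-1, 7], [8, 9]])  # shape (3, 2)

print((x @ y).shape)  # (2, 2)
print((y @ x).shape)  # (3, 3)

# Element-wise product needs matching (broadcastable) shapes
print(x * x)
# [[ 1  4  9]
#  [16 25 36]]

elementwise_failed = False
try:
    x * y  # (2, 3) and (3, 2) do not broadcast
except ValueError as err:
    elementwise_failed = True
    print("ValueError:", err)
```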

You can refer to the [official documentation](https://numpy.org/doc/stable/reference/routines.linalg.html) for more linear algebra operations.

> Search for a linalg function that computes the determinant of a matrix.
>
> Run it on the array below.

In [228]:
a = np.array([[1, 2], [3, 4]])
np.linalg.det(a)

-2.0000000000000004
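The result is not exactly `-2` because `np.linalg.det` works in floating point: it computes the determinant via an LU factorization. When checking such results, compare with a tolerance, for example using `np.isclose`:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
d = np.linalg.det(a)

print(d)                    # close to -2.0; may show a tiny rounding error
print(np.isclose(d, -2.0))  # True

# If you know the result should be an integer, round explicitly
print(int(np.round(d)))     # -2
```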