## Intro to NumPy

* **NumPy** is the primary scientific computing library in Python that adds support for fast and efficient handling of large multi-dimensional arrays (matrices).
* The primary building block of NumPy is the `numpy.ndarray` (n-dimensional array) class. Arrays are usually created with the `np.array()` function, which is why they are commonly referred to as `np.array` as well.

### Importing NumPy

In [2]:
import numpy as np

## NumPy Arrays
The easiest way to create a NumPy `np.array` is to use the `np.array()` constructor with an existing Python `list` or `tuple`. <br>
Note that the `np.array()` constructor expects a ***single*** `list` or `tuple` as the first argument.

In [2]:
np.array([1, 2, 3])

array([1, 2, 3])

To create a multi-dimensional `np.array`, we can simply use a nested `list` or `tuple`.

In [3]:
# nested lists
np.array([ [1, 2, 3], [4, 5, 6], [7, 8, 9] ])

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [7]:
# nested tuples
np.array((((1, 2), (3, 4)), ((5, 6), (7, 8)), ((9, 10), (11, 12))))

array([[[ 1,  2],
        [ 3,  4]],

       [[ 5,  6],
        [ 7,  8]],

       [[ 9, 10],
        [11, 12]]])

In [8]:
# this will result in an error
np.array(1, 2, 3)

TypeError: array() takes from 1 to 2 positional arguments but 3 were given

If desired, we can use the `dtype` argument to set the datatype of the `np.array` upon creation.

In [None]:
np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)

array([[1., 2., 3.],
       [4., 5., 6.],
       [7., 8., 9.]])

In [9]:
np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=str)

array([['1', '2', '3'],
       ['4', '5', '6'],
       ['7', '8', '9']], dtype='<U1')

In [10]:
np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=complex)

array([[1.+0.j, 2.+0.j, 3.+0.j],
       [4.+0.j, 5.+0.j, 6.+0.j],
       [7.+0.j, 8.+0.j, 9.+0.j]])

### Array Attributes

NumPy arrays have multiple useful attributes that give us information about the array. Here are some of the most common and useful ones.

- `.size` gives you the **number of elements** in the array
- `.shape` gives you the **dimensions** (size in each dimension) of the array
- `.ndim` gives you the **number of dimensions** (dimensionality) of the array
- `.dtype` gives you the **datatype** of the array

In [11]:
a = np.array([[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
              [[13, 14, 15, 16], [17, 18, 19, 20], [21, 22, 23, 24]]])
print(a)

[[[ 1  2  3  4]
  [ 5  6  7  8]
  [ 9 10 11 12]]

 [[13 14 15 16]
  [17 18 19 20]
  [21 22 23 24]]]


In [12]:
# number of elements
a.size

24

In [13]:
# dimensions
a.shape

(2, 3, 4)

In [14]:
# number of dimensions
a.ndim

3

In [15]:
# datatype
a.dtype

dtype('int64')
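These attributes are related to each other. `.ndim` is the length of `.shape`, and `.size` is the product of the entries of `.shape`. The following small sketch checks this on an illustrative array `x`, which has the same layout as `a` above:

```python
import numpy as np

# A 2 x 3 x 4 array holding the numbers 1..24 (same layout as `a` above)
x = np.arange(1, 25).reshape(2, 3, 4)

print(x.shape)  # (2, 3, 4)
print(x.ndim)   # 3, i.e. len(x.shape)
print(x.size)   # 24, i.e. 2 * 3 * 4

# These relationships hold for any NumPy array
assert x.ndim == len(x.shape)
assert x.size == np.prod(x.shape)
```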

### Other ways of Creating Arrays

Often we might want to create an empty array with desired dimensions or initialize an array with a common value. NumPy contains a lot of helpful constructors for that.

- `np.zeros()` creates an array and fills it with zeros
- `np.ones()` creates an array and fills it with ones
- `np.full()` creates an array and sets all elements to a specified value

For a one-dimensional array, you just pass the number of elements to these constructors.

In [16]:
np.zeros(3)

array([0., 0., 0.])

In [17]:
np.ones(4)

array([1., 1., 1., 1.])

In [18]:
np.full(5, fill_value=7)

array([7, 7, 7, 7, 7])

If we would like a multi-dimensional array instead, we simply pass a `tuple` or `list` with the dimensions.

In [19]:
np.zeros((2, 3))

array([[0., 0., 0.],
       [0., 0., 0.]])

In [22]:
np.full([3, 2, 2], fill_value=7)

array([[[7, 7],
        [7, 7]],

       [[7, 7],
        [7, 7]],

       [[7, 7],
        [7, 7]]])

Note how the dimensions in the tuple are ordered from the highest (outermost) to the lowest (innermost). This ordering applies across all of NumPy.
- `(2, 3)` creates an array with two rows and three elements in each row
- `(2, 3, 4)` creates an array with two panes, each consisting of three rows of four elements
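One way to see this ordering is to index into an array. Indexing with a single integer removes the first (outermost) dimension, so each pane of a `(2, 3, 4)` array is itself a `(3, 4)` array. A short sketch:

```python
import numpy as np

# Two panes, each with three rows of four elements
x = np.zeros((2, 3, 4))

pane = x[0]     # the first pane
row = x[0, 0]   # the first row of the first pane

print(x.shape)     # (2, 3, 4)
print(pane.shape)  # (3, 4)
print(row.shape)   # (4,)
```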

Also note how both `np.zeros()` and `np.ones()` create an array with floating-point elements by default. If we want an integer array, we have to specify that using the `dtype` argument.

In [23]:
np.zeros((2, 3), dtype=int)

array([[0, 0, 0],
       [0, 0, 0]])

There also exists a constructor `np.empty()` that creates an ***uninitialized*** array of specified dimensions. This means that it allocates space in the computer memory for this array but does not change the contents of said memory. The result is not an array one would usually consider *empty*. Instead, you get an array filled with arbitrary garbage values: an array created with `np.empty()` contains the numeric representation of **whatever was in memory before the creation of the array**. The reason to use `np.empty()` is to quickly create an array that will be **completely filled** with new values later on.

In [24]:
np.empty([5, 6])

array([[0.00000000e+000, 2.05833592e-312, 2.41907520e-312,
        2.56761491e-312, 1.93101617e-312, 1.03977794e-312],
       [6.79038653e-313, 9.33678148e-313, 1.08221785e-312,
        9.33678148e-313, 1.93101617e-312, 9.33678148e-313],
       [1.12465777e-312, 6.79038653e-313, 1.97345609e-312,
        6.79038653e-313, 1.16709769e-312, 6.79038653e-313],
       [9.33678148e-313, 1.20953760e-312, 1.97345609e-312,
        6.79038653e-313, 2.46151512e-312, 2.37663529e-312],
       [1.29441743e-312, 2.35541533e-312, 2.37663529e-312,
        2.14321575e-312, 8.70018275e-313, 0.00000000e+000]])
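The typical pattern is to allocate with `np.empty()` and then overwrite every element before reading any of them. The function name `squares` below is only an illustration. In practice, a vectorized expression such as `np.arange(n) ** 2` would do the same job.

```python
import numpy as np

def squares(n):
    """Return the first n squares by filling an uninitialized array (illustrative)."""
    out = np.empty(n, dtype=int)  # contents are garbage until assigned
    for i in range(n):
        out[i] = i * i            # every element gets overwritten
    return out

result = squares(5)
print(result)  # [ 0  1  4  9 16]
```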

To create a sequence of elements, use `np.arange()` or `np.linspace()`.

- `np.arange()` works similarly to the built-in Python `range()` function and should be used for **integer** sequences
- `np.linspace()` takes a start and end point and the ***number*** of elements desired (instead of a step) and is more suitable for use with **floating-point** numbers

In [25]:
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [26]:
np.arange(-10, 10, 2)

array([-10,  -8,  -6,  -4,  -2,   0,   2,   4,   6,   8])

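**Note** how `np.arange()` uses a half-open interval *`[start, stop)`* just like the normal `range()` function. This means the `stop` value itself is not included. To include it, `stop` has to be moved past the last desired value.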

In [27]:
np.arange(11)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [28]:
np.arange(-10, 11, 2)

array([-10,  -8,  -6,  -4,  -2,   0,   2,   4,   6,   8,  10])

**While** possible, it is not recommended to use `np.arange()` with **floating-point** numbers, as it is difficult to predict the final number of elements. (This is due to the *somewhat imprecise* way floating-point numbers are stored in computer memory.) Hence, it is recommended to use `np.linspace()` when working with floating-point numbers. (Note that `np.linspace()` can also be used with integers.)

In [29]:
np.linspace(0, np.pi, 20)

array([0.        , 0.16534698, 0.33069396, 0.49604095, 0.66138793,
       0.82673491, 0.99208189, 1.15742887, 1.32277585, 1.48812284,
       1.65346982, 1.8188168 , 1.98416378, 2.14951076, 2.31485774,
       2.48020473, 2.64555171, 2.81089869, 2.97624567, 3.14159265])
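To see why `np.arange()` is awkward with floating-point steps, consider a stop value that should be excluded. The sketch below assumes standard IEEE 754 double-precision floats. In that case, `(1.3 - 1) / 0.1` comes out slightly above `3`, so `np.arange()` produces one more element than the half-open interval suggests. By contrast, `np.linspace()` returns exactly the number of elements requested, and its last element is exactly the endpoint.

```python
import numpy as np

# The half-open interval [1.0, 1.3) with step 0.1 "should" give 1.0, 1.1, 1.2
stepped = np.arange(1, 1.3, 0.1)
print(len(stepped))       # 4: a value of about 1.3 sneaks in due to rounding

# linspace: ask for the number of points instead of the step size
spaced = np.linspace(1, 1.3, 4)
print(len(spaced))        # 4
print(spaced[-1] == 1.3)  # True: the endpoint is hit exactly
```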

### Creating Random Arrays

Note that although `np.empty()` gives you an array with *random garbage*, the values themselves might not be **random** at all. To create an array with random values (strictly speaking, pseudorandom values produced by an algorithm), we should use the functions in `np.random`.

- `np.random.random()` gives you an array with random elements uniformly distributed in the half-open range *`[0.0, 1.0)`*
- `np.random.normal()` draws random elements from a normal (Gaussian) distribution and lets you specify its mean and standard deviation

In [30]:
np.random.random((3, 4))

array([[0.01967374, 0.10630848, 0.17578497, 0.38758572],
       [0.26286781, 0.86618898, 0.48810453, 0.57858092],
       [0.13420439, 0.2437913 , 0.93517337, 0.7215739 ]])

In [31]:
np.random.normal(0, 1, (3, 4))  # mean of zero and a standard deviation of one

array([[ 0.49284432,  0.02520874,  1.29517373, -0.79215277],
       [-0.40950395, -1.20288892,  1.15142491, -0.05660124],
       [ 0.05605237, -2.18719282,  0.2817525 ,  1.85399511]])
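The first two arguments of `np.random.normal()` are the mean (`loc`) and the standard deviation (`scale`) of the normal distribution. With enough samples, the sample statistics land close to those parameters. In the sketch below, the values 5 and 2 and the sample count are arbitrary choices. `np.random.seed()` is used only to make the run repeatable.

```python
import numpy as np

np.random.seed(0)  # fix the seed so the run is repeatable

uniform = np.random.random(100_000)            # uniform on [0.0, 1.0)
normal = np.random.normal(5, 2, size=100_000)  # mean 5, standard deviation 2

print(uniform.min() >= 0.0, uniform.max() < 1.0)        # True True
print(round(normal.mean(), 1), round(normal.std(), 1))  # close to 5.0 and 2.0
```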

## Basic Operations

All arithmetic operators on NumPy arrays apply ***element-wise***.

In [32]:
a = np.array([[ 1,  2,  3],
              [ 4,  5,  6],
              [ 7,  8,  9]])

b = np.array([[10, 11, 12],
              [13, 14, 15],
              [16, 17, 18]])

In [33]:
a + 1

array([[ 2,  3,  4],
       [ 5,  6,  7],
       [ 8,  9, 10]])

In [35]:
b - 1

array([[ 9, 10, 11],
       [12, 13, 14],
       [15, 16, 17]])

In [36]:
a + b

array([[11, 13, 15],
       [17, 19, 21],
       [23, 25, 27]])

In [37]:
a * b

array([[ 10,  22,  36],
       [ 52,  70,  90],
       [112, 136, 162]])

In [40]:
b / 10

array([[1. , 1.1, 1.2],
       [1.3, 1.4, 1.5],
       [1.6, 1.7, 1.8]])

To do ***matrix multiplication*** in NumPy, use the `np.dot()` function (or the equivalent `.dot()` method) or the `@` matrix multiplication operator.

In [41]:
np.dot(a, b)

array([[ 84,  90,  96],
       [201, 216, 231],
       [318, 342, 366]])

In [42]:
a.dot(b)

array([[ 84,  90,  96],
       [201, 216, 231],
       [318, 342, 366]])

In [43]:
a @ b

array([[ 84,  90,  96],
       [201, 216, 231],
       [318, 342, 366]])
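The difference between `*` and `@` is most visible with non-square matrices. Element-wise multiplication needs matching (or broadcastable) shapes, while matrix multiplication needs the inner dimensions to agree. The matrices `m` and `n` below are illustrative:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])  # shape (2, 3)
n = np.array([[1, 0],
              [0, 1],
              [1, 1]])     # shape (3, 2)

product = m @ n            # (2, 3) @ (3, 2) -> (2, 2)
print(product)
# [[ 4  5]
#  [10 11]]

# Element-wise m * n fails: shapes (2, 3) and (3, 2) are not broadcastable
try:
    m * n
except ValueError as err:
    print("ValueError:", err)
```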

Comparison operators also apply ***element-wise*** in NumPy and return a *boolean* array.

In [44]:
a > b

array([[False, False, False],
       [False, False, False],
       [False, False, False]])

In [45]:
a < b

array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]])

In [47]:
a == 2

array([[False,  True, False],
       [False, False, False],
       [False, False, False]])

In [48]:
# true if element is even
a % 2 == 0

array([[False,  True, False],
       [ True, False,  True],
       [False,  True, False]])

### Broadcasting
Note how we could easily do both arithmetic and comparison operations between NumPy arrays and scalar values (single numbers). This is because NumPy does something called ***broadcasting***, which allows operations between arrays of different but compatible shapes. Conceptually, NumPy stretches the single value, or the smaller array, as many times as needed to match the dimensions of the other operand, without actually copying the data in memory.

*You can learn more about broadcasting here: https://numpy.org/doc/stable/user/basics.broadcasting.html*

In [49]:
a = np.array([[1, 1, 1],
              [1, 1, 1],
              [1, 1, 1]])

b = np.array([2, 3, 4])

c = np.array([[2],
              [3],
              [4]])

In [50]:
a * b

array([[2, 3, 4],
       [2, 3, 4],
       [2, 3, 4]])

In [51]:
a * c

array([[2, 2, 2],
       [3, 3, 3],
       [4, 4, 4]])

In [52]:
b * c

array([[ 4,  6,  8],
       [ 6,  9, 12],
       [ 8, 12, 16]])
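NumPy decides whether two shapes are compatible by comparing them dimension by dimension, starting from the last one. Two dimensions are compatible when they are equal or when one of them is 1, and missing leading dimensions count as 1. The sketch below reuses the shapes of `b` and `c` from above and adds one incompatible pair:

```python
import numpy as np

b = np.array([2, 3, 4])        # shape (3,)
c = np.array([[2], [3], [4]])  # shape (3, 1)

# (3,) is treated as (1, 3); combined with (3, 1) this gives (3, 3)
print((b * c).shape)  # (3, 3)

# (3,) and (4,): the last dimensions 3 and 4 differ and neither is 1
d = np.array([1, 2, 3, 4])
try:
    b * d
except ValueError as err:
    print("ValueError:", err)
```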

### Vectorization

Vectorization, in very basic terms, means expressing a calculation as an operation on whole arrays instead of writing a loop over individual elements. Avoiding slow Python loops this way allows fast and efficient computation.

Most of the functions you call using NumPy in your Python code are merely wrappers for underlying code in C, where most of the heavy lifting happens. In this way, NumPy can move the execution of loops to C, which is much more efficient than Python when it comes to looping.

In fact, broadcasting works by providing a means of vectorizing array operations so that looping occurs in C instead of Python.

In [30]:
# multiplication comparison with and without vectorization
size = 100

a = np.ones((size,size), dtype='int')
b = np.full((size,size), fill_value = 5, dtype='int')

In [31]:
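# element-wise array multiplication with and without NumPy vectorization

def multiply_lists_no_vectorization(a,b):
    # the loop over rows runs in Python; each i*j still multiplies two whole rows with NumPy
    # (the results are discarded, since only the timing matters here)
    for i,j in zip(a,b):
        i*j

def multiply_arrays_with_vectorization(a,b):
    # a single NumPy call: the loop over all elements runs in compiled C code
    a*b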

In [32]:
# compare computation time
%timeit -n 10000 -r 5 multiply_lists_no_vectorization(a, b)
%timeit -n 10000 -r 5 multiply_arrays_with_vectorization(a,b)

44.2 µs ± 1.37 µs per loop (mean ± std. dev. of 5 runs, 10,000 loops each)
5.02 µs ± 43.9 ns per loop (mean ± std. dev. of 5 runs, 10,000 loops each)
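Note that `multiply_lists_no_vectorization()` above still lets NumPy multiply each pair of rows, so only the outer loop runs in Python. The sketch below is a fully element-by-element Python loop, and its function names are hypothetical. It performs all 10,000 multiplications in Python and can be expected to be slower still. Both versions produce the same result:

```python
import numpy as np

size = 100
a = np.ones((size, size), dtype=int)
b = np.full((size, size), fill_value=5, dtype=int)

def multiply_elementwise_loop(a, b):
    """Multiply two 2-D arrays one element at a time in pure Python."""
    out = np.empty_like(a)
    rows, cols = a.shape
    for r in range(rows):
        for col in range(cols):
            out[r, col] = a[r, col] * b[r, col]
    return out

def multiply_vectorized(a, b):
    """Let NumPy run the loop over all elements in compiled code."""
    return a * b

# Same result either way; only the speed differs
assert np.array_equal(multiply_elementwise_loop(a, b), multiply_vectorized(a, b))
```

In the notebook, `%timeit multiply_elementwise_loop(a, b)` times the loop version the same way as above. Timings depend on the machine, so none are quoted here.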


## Aggregation Functions

NumPy also contains some handy aggregation functions that either operate on the whole array or along a specified axis of the array.

In [53]:
a = np.array([[1, 2],
              [5, 3],
              [4, 6]])

In [54]:
a.max()

6

In [55]:
a.min()

1

In [56]:
a.sum()

21

To run these aggregations along a certain axis, specify the axis number using the `axis` keyword argument. The axis numbers range from `0` to `ndim-1` and follow the same order as the dimensions in `.shape`. Axis `0` is the highest (outermost) dimension, and axis `ndim-1` is the lowest (innermost) one. However, most of the time you will be dealing with two-dimensional arrays, in which case it is good to just keep the following in mind.

- `axis=0` performs the operation **across rows** and results in a single output value for each column
- `axis=1` performs the operation **across columns** and results in a single output value for each row

In [57]:
a.max(axis=0)

array([5, 6])

In [58]:
a.max(axis=1)

array([2, 5, 6])
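The same `axis` argument works for the other aggregation functions. For the 3×2 array `a` above, summing with `axis=0` collapses the three rows into one value per column. Summing with `axis=1` collapses the two columns into one value per row:

```python
import numpy as np

a = np.array([[1, 2],
              [5, 3],
              [4, 6]])

print(a.sum(axis=0))  # [10 11]     one value per column
print(a.sum(axis=1))  # [ 3  8 10]  one value per row
print(a.min(axis=0))  # [1 2]
```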

## Indexing and Slicing

You can select elements or ranges of elements from NumPy arrays as you would from a built-in Python `list`. If you are an avid MATLAB user, just keep in mind these three key differences:

1. Python uses **zero-based indexing**, meaning that the first element of an array (or list) is at position zero.
2. Square brackets **`[ ]`** are the indexing operator in Python (MATLAB uses parentheses).
3. Negative indices count from the end, meaning that the **last** element of an array (or list) is at position `[-1]`.

In [59]:
a = np.array([[ 1,  2,  3,  4],
              [ 5,  6,  7,  8], 
              [ 9, 10, 11, 12], 
              [13, 14, 15, 16], 
              [17, 18, 19, 20]])
print(a)

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]
 [13 14 15 16]
 [17 18 19 20]]


In [60]:
a[0]

array([1, 2, 3, 4])

In [61]:
a[1]

array([5, 6, 7, 8])

In [62]:
a[1:3]

array([[ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

*Remember* that when using *`[start:end]`* to slice in Python, the *`end`* index is exclusive, meaning that the element at index *`end`* is not included in the slice.

You can also use *`[start:end:step]`* with NumPy arrays. (Remember that omitting the *`start`* index means *slice from beginning* and omitting the *`end`* index means *slice until end*.)

In [63]:
a[::2]

array([[ 1,  2,  3,  4],
       [ 9, 10, 11, 12],
       [17, 18, 19, 20]])

It is good to know that using `-1` as the step when slicing reverses the selection.

In [64]:
a[0][::-1]

array([4, 3, 2, 1])

In [65]:
a[::-1]

array([[17, 18, 19, 20],
       [13, 14, 15, 16],
       [ 9, 10, 11, 12],
       [ 5,  6,  7,  8],
       [ 1,  2,  3,  4]])

To access elements from multi-dimensional arrays, we can use **chained indexing**.

In [66]:
a[-1][0]

17

In [67]:
a[0][1:3]

array([2, 3])

However, chained indexing has its limitations. For example, slicing a multi-dimensional array also returns a multi-dimensional array, often leading to confusion when using chained indexing.

You can also index multi-dimensional NumPy arrays by including multiple comma-separated indices or ranges in the **`[ ]`** indexing operator, one for each dimension. It is recommended to use this approach as opposed to chained indexing. Note that the order of dimensions is again from highest to lowest.
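A common pitfall is that `a[1:3][0]` does **not** select the first column of rows 1 and 2. The slice `a[1:3]` is itself a two-dimensional array, so the following `[0]` picks its first *row*. The comma-separated form expresses the intended selection directly. The array below is rebuilt with the same values as `a`:

```python
import numpy as np

a = np.arange(1, 21).reshape(5, 4)  # same values as the 5x4 array `a`

chained = a[1:3][0]   # first ROW of the slice: [5 6 7 8]
combined = a[1:3, 0]  # first COLUMN of rows 1 and 2: [5 9]

print(chained)
print(combined)
```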

In [68]:
# print out the matrix again for reference
print(a)

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]
 [13 14 15 16]
 [17 18 19 20]]


In [69]:
# second element in first row
a[0, 1]

2

In [70]:
# first element in second row
a[1, 0]

5

In [71]:
# entire second row
a[1, :]

array([5, 6, 7, 8])

In [72]:
# entire second column
a[:, 1]

array([ 2,  6, 10, 14, 18])

In [73]:
# the last element from the second and third rows
a[1:3, -1]

array([ 8, 12])

In [74]:
# the middle 3x2 selection
a[1:4, 1:3]

array([[ 6,  7],
       [10, 11],
       [14, 15]])

In [75]:
# the upper-right-most 4x3 selection
a[:4, 1:]

array([[ 2,  3,  4],
       [ 6,  7,  8],
       [10, 11, 12],
       [14, 15, 16]])

You can use indexing to change single elements and slicing to change entire selections.

In [76]:
a[0, 0] = 0
print(a)

[[ 0  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]
 [13 14 15 16]
 [17 18 19 20]]


In [77]:
a[3:, 2:] = 0
print(a)

[[ 0  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]
 [13 14  0  0]
 [17 18  0  0]]


## Iterating

Iterating over multi-dimensional NumPy arrays is done with respect to the highest dimension (first axis).

In [78]:
a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

In [79]:
for row in a:
    print(row)

[1 2 3]
[4 5 6]
[7 8 9]


To iterate over ***each element*** of a multi-dimensional array, one may use nested loops.

In [80]:
for row in a:
    for element in row:
        print(element)

1
2
3
4
5
6
7
8
9


However, nested loops are often inefficient and could easily lead to confusion and unmaintainable code. Hence, it is recommended to avoid nested loops if possible.

Luckily, NumPy includes functionality for easily iterating over all elements of an array. For example, you could use the `.flat` attribute.

In [81]:
for element in a.flat:
    print(element)

1
2
3
4
5
6
7
8
9


Note that `.flat` returns an iterator. Basically, that is just something that tells Python how to iterate over ***all*** the elements of the array using a `for` loop. It does not actually return a flattened one-dimensional version of the original array. To do that, we can use the `.flatten()` method.

In [82]:
a.flatten()

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

The `order` argument of `.flatten()` controls the order in which the elements are read. The two most common options are:

- `order='F'` results in **Fortran**-like ***column-major*** behavior
- `order='C'` results in **C**-like ***row-major*** behavior (which is also the default)

In [1]:
a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

In [None]:
a.flatten()

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [None]:
a.flatten(order='F')

array([1, 4, 7, 2, 5, 8, 3, 6, 9])

## Copy vs View

Let's say we have the following matrix `a`.

In [86]:
a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

In [87]:
b = a[1:, 1:]
print(b)

[[5 6]
 [8 9]]


In [88]:
# Now let's modify the upper-left-most element of b.
b[0, 0] = 0
print(b)

[[0 6]
 [8 9]]


*After* playing around with `b` and modifying its values, we want to go back to `a` and take a look at the original values again.

In [89]:
print(a)

[[1 2 3]
 [4 0 6]
 [7 8 9]]


**The values in our original matrix `a` have also changed!**

That is because basic slicing like `a[1:, 1:]` returns a ***view*** of the original array instead of a copy. This is computationally more efficient and allows NumPy to perform fast operations even on really large and complex arrays, because the data is not copied in computer memory. Instead, we are shown the same array stored in memory through a slightly different view (you can think of it as a window). That view may block out some elements and change the order of others. Many operations in NumPy return views, including basic indexing and slicing with integers and `start:stop:step` ranges. Note, however, that arithmetic operations create new arrays. Indexing with a boolean or integer array (as in filtering) also returns a copy.

However, this is not how MATLAB handles things. In MATLAB, most operations result in a ***copy*** of the original array, allowing you to modify the outputs of various operations without having to worry about changing the original data. Hence, avid MATLAB users must keep in mind that this is not the case in NumPy to avoid unintentionally overwriting data.

It is also crucial to note that returning a ***view*** is not universal in NumPy. While many operations return a ***view***, others return a ***copy***. Furthermore, the same function or operation might return a view in some cases and a copy in others, depending on its input. For example, `.reshape()` returns a view when possible and a copy otherwise. Hence, you should **always read the documentation** of a function or method to know for sure whether it returns a view or a copy in your particular use case.

When in doubt, it is safest to assume that a NumPy operation might return a ***view***. To make sure you are working with a ***copy***, use the `.copy()` method.

In [90]:
# repeat exercise with copy instead now
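
One way to do this exercise is to take the same slice but call `.copy()` on it. Modifying the copy then leaves `a` untouched:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

b = a[1:, 1:].copy()  # an independent copy of the lower-right 2x2 block
b[0, 0] = 0           # modify the copy only

print(b)
# [[0 6]
#  [8 9]]
print(a)  # unchanged: a[1, 1] is still 5
```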


## Shape Manipulation

NumPy makes it really easy to manipulate the shape of an array. Note that these manipulations do not modify the original array. Whenever possible, they return a different ***view*** of the same data rather than copying it in memory. `.T` always returns a view, while `.reshape()` falls back to a copy when a view is not possible.

In [91]:
a = np.array([[ 1,  2,  3,  4,  5,  6],
              [ 7,  8,  9, 10, 11, 12],
              [13, 14, 15, 16, 17, 18]])

In [92]:
a.reshape(2, 9)

array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18]])

In [93]:
a.reshape(6, 3)

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12],
       [13, 14, 15],
       [16, 17, 18]])

In [94]:
a.T

array([[ 1,  7, 13],
       [ 2,  8, 14],
       [ 3,  9, 15],
       [ 4, 10, 16],
       [ 5, 11, 17],
       [ 6, 12, 18]])
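Whether a reshaped or transposed result is a view can be checked with `np.shares_memory()`. In the sketch below, the array is just an example. Reshaping a contiguous array and transposing it both give views. Flattening the transposed array in row-major order cannot be expressed as a view of the same memory, so `.reshape()` returns a copy.

```python
import numpy as np

a = np.arange(1, 19).reshape(3, 6)

print(np.shares_memory(a, a.reshape(2, 9)))  # True: a view of the same data
print(np.shares_memory(a, a.T))              # True: a view of the same data
print(np.shares_memory(a, a.T.reshape(18)))  # False: NumPy had to copy

# Writing through a view changes the original array
v = a.reshape(2, 9)
v[0, 0] = 100
print(a[0, 0])  # 100
```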

In [95]:
a = np.arange(24).reshape(6,4)
print(a)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]
 [16 17 18 19]
 [20 21 22 23]]


In [96]:
a.reshape(2, -1)

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]])

In [97]:
a.reshape(-1, 3)

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14],
       [15, 16, 17],
       [18, 19, 20],
       [21, 22, 23]])

In [99]:
a.reshape(2, -1, 3)

array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8],
        [ 9, 10, 11]],

       [[12, 13, 14],
        [15, 16, 17],
        [18, 19, 20],
        [21, 22, 23]]])
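In the calls above, `-1` tells `.reshape()` to infer that dimension from the total number of elements (24 here), so `a.reshape(2, -1)` becomes `(2, 12)`. At most one dimension may be `-1`, and the other sizes must divide the element count evenly. Otherwise, NumPy raises a `ValueError`:

```python
import numpy as np

a = np.arange(24).reshape(6, 4)

print(a.reshape(2, -1).shape)     # (2, 12)
print(a.reshape(-1, 3).shape)     # (8, 3)
print(a.reshape(2, -1, 3).shape)  # (2, 4, 3)

# 24 elements cannot be split into rows of 5
try:
    a.reshape(-1, 5)
except ValueError as err:
    print("ValueError:", err)
```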

## Search, Sort, Filter

You can search an array for a certain value and get the indices of the matches with `np.where()`.

In [39]:
# Find the indexes where the value is 4:
a = np.array([1, 2, 3, 4, 5, 4, 4])
x = np.where(a == 4)
print(x)

(array([3, 5, 6]),)


In [34]:
# Find the indexes where the values are even
a = np.array([1, 2, 3, 4, 5, 6, 7, 8])
x = np.where(a % 2 == 0)
print(x)

(array([1, 3, 5, 7]),)
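When called with only a condition, `np.where()` returns a tuple with one index array per dimension. This is why the results above are wrapped in a one-element tuple. For a two-dimensional array, the tuple holds the row indices and the column indices of the matches, and it can be used directly to index the array. The array `m` is illustrative:

```python
import numpy as np

m = np.array([[1, 4, 2],
              [4, 3, 4]])

rows, cols = np.where(m == 4)
print(rows)  # [0 1 1]
print(cols)  # [1 0 2]

# The tuple of index arrays selects the matching elements
print(m[np.where(m == 4)])  # [4 4 4]
```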


The NumPy function `np.sort()` returns a sorted copy of a specified array.

In [35]:
a = np.array([3, 2, 0, 1])
print(np.sort(a))

[0 1 2 3]


In [36]:
a = np.array(['banana', 'cherry', 'apple'])
print(np.sort(a))

['apple' 'banana' 'cherry']
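`np.sort()` leaves its input unchanged. The array method `.sort()`, by contrast, sorts the array in place and returns `None`. For two-dimensional arrays, `np.sort()` sorts along the last axis by default, so each row is sorted separately:

```python
import numpy as np

a = np.array([3, 2, 0, 1])

sorted_copy = np.sort(a)
print(sorted_copy)  # [0 1 2 3]
print(a)            # [3 2 0 1]  (unchanged)

a.sort()            # sorts a itself
print(a)            # [0 1 2 3]

m = np.array([[3, 1, 2],
              [9, 7, 8]])
print(np.sort(m))   # each row is sorted separately
```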


Getting some elements out of an existing array and creating a new array out of them is called ***filtering***. In NumPy, you filter an array using a boolean index list. If the value at an index is `True`, that element is contained in the filtered array. If the value at that index is `False`, that element is excluded from the filtered array.

In [37]:
a = np.array([41, 42, 43, 44])
x = [True, False, True, False]

new_array = a[x]

print(new_array)

[41 43]
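In practice, the boolean list is rarely written by hand. A comparison such as `a > 42` produces a boolean array of the same shape, which can be used directly as the filter:

```python
import numpy as np

a = np.array([41, 42, 43, 44])

mask = a > 42   # array([False, False,  True,  True])
print(a[mask])  # [43 44]

# Conditions can be combined with & (and) and | (or); the parentheses are required
print(a[(a > 41) & (a % 2 == 0)])  # [42 44]
```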


Sometimes you want the index (position) of an element, not the element itself. In that case, the `arg-` functions come in handy.
- `np.argmax()` returns the indices of the maximum values along an axis
- `np.argmin()` returns the indices of the minimum values along an axis
- `np.argsort()` returns the indices that would sort an array

In [40]:
a = np.array([1, 2, 3, 4, 5, 6, 7, 8])

In [41]:
np.argmax(a)

7

In [42]:
np.argmin(a)

0

In [43]:
np.argsort(a)

array([0, 1, 2, 3, 4, 5, 6, 7])
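Since `a` above is already sorted, `np.argsort(a)` just returns `0, 1, 2, ...`. The result is more telling with an unsorted array: indexing the array with the output of `np.argsort()` returns the sorted values. `np.argmax()` and `np.argmin()` also accept an `axis` argument, just like the aggregation functions. The arrays below are illustrative:

```python
import numpy as np

a = np.array([30, 10, 20])
order = np.argsort(a)
print(order)     # [1 2 0]
print(a[order])  # [10 20 30]

m = np.array([[1, 9, 3],
              [8, 2, 7]])
print(np.argmax(m, axis=1))  # [1 0]  column of the maximum in each row
print(np.argmax(m))          # 1      index into the flattened array
```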

## File I/O with NumPy

### Writing a NumPy Array to a File

Let's say we have an array `a` that we would like to export to a file for some reason.

In [44]:
a = np.random.random((5,5))

In [45]:
print(a)

[[0.4087423  0.85036062 0.11727093 0.64444802 0.94214229]
 [0.98394459 0.86807377 0.41167469 0.44711477 0.50634084]
 [0.41937433 0.01886007 0.16057767 0.27276109 0.69533834]
 [0.2016641  0.32661234 0.06245383 0.15094267 0.15039554]
 [0.71624623 0.60866139 0.34005894 0.23732947 0.80218705]]


One option would be to use `np.save()`, which saves the array to a binary `.npy` file. If the filename does not already end in `.npy`, NumPy appends that extension automatically.

In [46]:
np.save('array1', a)

However, in many cases you might actually want to be able to see the contents of the file and use it with other programs like MATLAB. In that case, it makes much more sense to save the NumPy array as a human-readable text file. This can be done using `np.savetxt()`.

In [47]:
np.savetxt('array2.txt', a)

### Reading a NumPy Array from a File

To read a binary `.npy` file into a NumPy array, we can use `np.load()`.

In [48]:
b = np.load('array1.npy')

In [49]:
b

array([[0.4087423 , 0.85036062, 0.11727093, 0.64444802, 0.94214229],
       [0.98394459, 0.86807377, 0.41167469, 0.44711477, 0.50634084],
       [0.41937433, 0.01886007, 0.16057767, 0.27276109, 0.69533834],
       [0.2016641 , 0.32661234, 0.06245383, 0.15094267, 0.15039554],
       [0.71624623, 0.60866139, 0.34005894, 0.23732947, 0.80218705]])

To read data from a text file into a NumPy array, we can use `np.loadtxt()`.

In [50]:
c = np.loadtxt('array2.txt')

In [51]:
c

array([[0.4087423 , 0.85036062, 0.11727093, 0.64444802, 0.94214229],
       [0.98394459, 0.86807377, 0.41167469, 0.44711477, 0.50634084],
       [0.41937433, 0.01886007, 0.16057767, 0.27276109, 0.69533834],
       [0.2016641 , 0.32661234, 0.06245383, 0.15094267, 0.15039554],
       [0.71624623, 0.60866139, 0.34005894, 0.23732947, 0.80218705]])

An important thing to note when saving floating-point arrays to text files is ***loss of significance***. Because we can only store a set number of significant digits in the text file, it is possible that the number of significant digits will be reduced when writing data to a file, introducing round-off errors and causing precision loss.

Note that this is not the case when using the binary `.npy` format.

When writing to a text file with the default format of `np.savetxt()` (`fmt='%.18e'`, i.e. scientific notation with 18 digits after the decimal point), precision loss does not occur under normal circumstances for the default `float64` datatype. However, note that this depends on the *datatype* of your array.

However, when you specify a smaller number of decimal places or significant digits (for example `fmt='%.4f'`), or export using fixed-point notation, precision loss is commonplace and very likely to occur.
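The sketch below writes the same random array three ways and checks what survives the round trip. The binary `.npy` file and the default text format reproduce the `float64` values exactly. A fixed-point format with four decimal places does not. The file names and the `fmt='%.4f'` format are arbitrary choices, and a temporary directory is used so nothing is left behind.

```python
import os
import tempfile

import numpy as np

a = np.random.random((5, 5))

with tempfile.TemporaryDirectory() as tmp:
    npy_path = os.path.join(tmp, "array1")        # np.save appends ".npy"
    txt_path = os.path.join(tmp, "array2.txt")
    short_path = os.path.join(tmp, "array3.txt")

    np.save(npy_path, a)
    np.savetxt(txt_path, a)                # default fmt='%.18e'
    np.savetxt(short_path, a, fmt="%.4f")  # only four decimal places

    from_npy = np.load(npy_path + ".npy")
    from_txt = np.loadtxt(txt_path)
    from_short = np.loadtxt(short_path)

print(np.array_equal(a, from_npy))              # True
print(np.array_equal(a, from_txt))              # True
print(np.array_equal(a, from_short))            # values were rounded, so typically False
print(np.allclose(a, from_short, atol=1e-4))    # True: the error stays below 0.0001
```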

## Acknowledgement

This is a modified version of the Tufts University Data Lab Workshop https://tuftsdatalab.github.io/intro-numpy/ developed by Uku-Kaspar Uustalu.