# 02. The N-dimensional array, *ndarray*

Documentation:  https://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html

From the docs:

>An **ndarray** is a (usually fixed-size) multidimensional container of items of the same type and size. The number of dimensions and items in an array is defined by its **shape**, which is a **tuple** of N positive integers that specify the sizes of each dimension. The type of items in the array is specified by a separate **data-type object (dtype)**, one of which is associated with each ndarray.

<img src="https://www.safaribooksonline.com/library/view/elegant-scipy/9781491922927/assets/elsp_0105.png" alt="ndarray image" width="800"/>

In [None]:
import numpy as np
print(f'numpy version: {np.__version__}')

## Creating Arrays

The **ndarray** is the primary data structure in numpy, and is used heavily in many statistical and machine learning toolkits. For this reason, it is important to become very familiar with it. Let's start off by learning a few different ways to create them.

### One Dimensional

One-dimensional arrays are easy to create and understand: you simply pass in the number of elements you want and, optionally, a datatype (**dtype**).

Let's create the one-dimensional array below. The indexes are printed above the array to show the 0-based indexing.

$$\begin{matrix} 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \end{matrix} \\
\begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

In [None]:
one_d_zeros = np.zeros(10)
print(one_d_zeros)
print(one_d_zeros.dtype)

By default, `np.zeros()` creates **float64** values. NumPy floating-point types follow the IEEE 754 standard; a float64 consists of 1 sign bit, 11 exponent bits, and 52 bits of significand.<br/>
https://en.wikipedia.org/wiki/Double-precision_floating-point_format
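If you want to check these numbers from Python, `np.finfo` reports the layout of a floating-point type. A minimal sketch:

```python
import numpy as np

info = np.finfo(np.float64)
print(info.bits)                          # total bits: 64
print(info.nexp)                          # exponent bits: 11
print(info.nmant)                         # stored significand bits: 52
print(info.bits - info.nexp - info.nmant) # what is left over is the sign bit: 1
print(np.dtype(np.float64).itemsize)      # bytes per element: 8
```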

You can retrieve the **shape** of an array with the attribute of the same name:

In [None]:
one_d_zeros.shape

Here is a Jupyter trick: if the last line in a cell is an expression rather than an assignment, Jupyter calls its `__repr__()` and displays the result. For an array whose dtype is not the default, such as the float32 array below, this shows the **dtype**; calling `print()` instead uses the array's `__str__()` method, which does not display the dtype.

In [None]:
np.zeros(10, dtype=np.float32)
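To see the difference between the two methods directly, you can call `repr()` and `str()` yourself. Note that NumPy only includes the dtype in the repr when it is not the default for that kind of data, so a float64 array does not show it:

```python
import numpy as np

a32 = np.zeros(3, dtype=np.float32)
a64 = np.zeros(3)  # default dtype is float64

print(repr(a32))  # array([0., 0., 0.], dtype=float32)
print(str(a32))   # [0. 0. 0.]
print(repr(a64))  # array([0., 0., 0.])  (the default dtype is not shown)
```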

### Multi-Dimensional

Multi-dimensional arrays are created by passing a **shape** sequence instead of a single length parameter. The $i^{th}$ entry in the **shape** specifies the length of the $i^{th}$ dimension. When creating a two-dimensional array (a matrix), the first dimension is the number of rows, and the second dimension is the number of columns. Many NumPy functions refer to a dimension as an **axis** (for example through an `axis` parameter). See the picture above for a visual guide to dimensions.

Let's create the following 2D array filled with zeros, which has 2 rows (**axis 0**), and 3 columns (**axis 1**):

$$\begin{matrix} & 0 & 1 & 2 \end{matrix} \\
\begin{matrix} 0 \\ 1 \end{matrix} \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

In [None]:
np.zeros([2, 3])

In [None]:
np.zeros([2, 3, 4])

In [None]:
np.zeros([2, 3, 4]).shape
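Many reductions take an `axis` argument that names the dimension to collapse. A small sketch using `sum()` on a 2 x 3 array of ones shows which axis is which:

```python
import numpy as np

m = np.ones([2, 3])   # 2 rows (axis 0), 3 columns (axis 1)
print(m.ndim)         # number of dimensions: 2
print(m.sum(axis=0))  # collapse the rows -> one value per column: [2. 2. 2.]
print(m.sum(axis=1))  # collapse the columns -> one value per row: [3. 3.]
```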

#### Exercise:

Now write code to create a 4D array with dimensions 1, 2, 2, 2 and dtype **np.bool_** that is initialized to all False (0 will be converted to False).

How many total elements will it contain? Write code to calculate the total number of elements.

In [None]:
from functools import reduce
import operator
a = np.zeros([1, 2, 2, 2], dtype=np.bool_)  # np.bool_ works across NumPy versions
element_count = a.shape[0] * a.shape[1] * a.shape[2] * a.shape[3]
element_count = reduce(operator.mul, a.shape, 1)  # more generic: works for any number of dimensions

print(a)
print()
print(f'shape: {a.shape}')
print()
print(f'element count: {element_count}')
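For reference, NumPy also exposes the element count directly through the `size` attribute, and `np.prod` can compute it from the shape. A short sketch:

```python
import numpy as np

a = np.zeros([1, 2, 2, 2], dtype=np.bool_)
print(a.size)                 # total number of elements: 8
print(int(np.prod(a.shape)))  # the same value computed from the shape: 8
print(a.ndim)                 # number of dimensions: 4
```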

In [None]:
a = # Create a 4D array of bool here
element_count = # Calculate total element count here

print(a)
print()
print(f'shape: {a.shape}')
print()
print(f'element count: {element_count}')

### Other Initialization methods

#### Using Python sequences
So far we've been using **np.zeros()**, which initializes the array to contain all zero (or False) values. We can also use the **np.array()** function, which takes any (nested) Python sequence or another array object.

https://docs.scipy.org/doc/numpy/reference/generated/numpy.array.html

In [None]:
np.array([1,2,3,4,5])

We can use lists or tuples, or even mix them:

In [None]:
np.array( [(1,2,3), (4,5,6)] )
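`np.array()` infers a single dtype that can hold every element, so mixing integers and floats produces a float array. You can override the inference with the `dtype` argument. A minimal sketch:

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2, 3.5])                   # one float upcasts the whole array
forced = np.array([1, 2, 3], dtype=np.float32)  # explicit dtype wins over inference

print(ints.dtype.kind)  # 'i' (signed integer; the exact width is platform dependent)
print(mixed.dtype)      # float64
print(forced.dtype)     # float32
```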

We can also make a copy of another array:

In [None]:
a = np.zeros(4, dtype=np.int32)
b = np.array(a)
a[0] = 1                              # modify a to show that b holds its own copy of the data
print(f'a: {repr(a)}, b: {repr(b)}')  # call repr to show they both have the same dtype
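By contrast, `np.asarray()` avoids the copy when its input is already an ndarray of the requested dtype. The sketch below uses `np.shares_memory` to check whether two arrays overlap in memory:

```python
import numpy as np

a = np.zeros(4, dtype=np.int32)
b = np.array(a)    # copies by default
c = np.asarray(a)  # no copy needed: returns the input array itself

print(np.shares_memory(a, b))  # False
print(c is a)                  # True
```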

#### Random data

https://docs.scipy.org/doc/numpy-1.15.0/reference/routines.random.html
    
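The **np.random** functions used below belong to NumPy's legacy interface, which generates numbers with the fast Mersenne Twister algorithm.

Unlike the array creation functions, some random generators, such as **np.random.rand()** and **np.random.randn()**, accept the **shape** as separate arguments instead of as a single sequence object; others, such as **np.random.randint()**, take a `size` argument.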

**np.random.rand()** generates floating point numbers uniformly distributed over the half-open interval [0, 1):

In [None]:
np.random.rand(2, 3)

**np.random.randint()** takes an argument specifying the (exclusive) upper limit of the range, and a `size` argument for the shape:

In [None]:
np.random.randint(5, size=(2, 3))
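`randint()` also accepts an explicit lower bound as its first argument. As a small illustration (the die-rolling scenario is just an example), the low value is inclusive and the high value is exclusive:

```python
import numpy as np

# Simulate ten rolls of a six-sided die: values are drawn from 1 through 6
rolls = np.random.randint(1, 7, size=10)
print(rolls)
print(rolls.min() >= 1, rolls.max() <= 6)  # True True
```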

The functions we have been using in the **random** module share a global **RandomState** object. Alternatively, you can construct your own **RandomState** object and seed it explicitly, which makes the results reproducible.

In [None]:
rng = np.random.RandomState(seed=42)
rng.randn(3, 5)  # standard normal distribution (mean 0, standard deviation 1)
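A quick way to confirm that seeding makes results reproducible is to draw from two identically seeded objects. The sketch also shows the newer `np.random.default_rng()` Generator API (available since NumPy 1.17), which takes the shape as a single tuple; it is included only for comparison:

```python
import numpy as np

# Two RandomState objects with the same seed produce the same sequence
r1 = np.random.RandomState(seed=42)
r2 = np.random.RandomState(seed=42)
print(np.array_equal(r1.randn(3, 5), r2.randn(3, 5)))  # True

# Newer Generator API; note that the shape is passed as one tuple
gen = np.random.default_rng(seed=42)
print(gen.standard_normal((3, 5)).shape)  # (3, 5)
```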

#### Miscellaneous

In [None]:
# Identity matrix
np.eye(4)

In [None]:
# Create an array without initializing it: the contents are whatever was already in memory, so be careful!
np.empty([3,3])

In [None]:
# Create an array that is the same shape as another array, but filled with zeros or ones
a = np.random.randn(2, 2)
print(a)
print()
print(np.zeros_like(a))
print()
print(np.ones_like(a))

In [None]:
# Create a range
np.arange(10)

In [None]:
# Evenly spaced floats from 5 to 10 (both endpoints included)
np.linspace(5, 10, num=11)
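`np.linspace()` fixes the number of points and includes the stop value, while `np.arange()` fixes the step and excludes the stop value. The sketch below uses `retstep=True` to ask `linspace` for the step it chose and rebuilds the same values with `arange`:

```python
import numpy as np

values, step = np.linspace(5, 10, num=11, retstep=True)
print(step)        # 0.5
print(values[-1])  # 10.0: the stop value is included by default

# arange excludes the stop value, so the stop must go one step past 10
same = np.arange(5, 10.5, 0.5)
print(np.allclose(values, same))  # True
```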

In [None]:
# Read from a text file (assumes a comma-separated file named data.txt exists in the working directory)
np.loadtxt('data.txt', delimiter=',')
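If you don't have a `data.txt` handy, `np.loadtxt()` also accepts file-like objects. This sketch uses `io.StringIO` as a stand-in for a file on disk; the contents are made up for illustration:

```python
import io
import numpy as np

# Hypothetical file contents: two rows of comma-separated numbers
fake_file = io.StringIO("1,2,3\n4,5,6\n")
data = np.loadtxt(fake_file, delimiter=',')
print(data)        # [[1. 2. 3.]
                   #  [4. 5. 6.]]
print(data.dtype)  # float64 (the default)
```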

## Data Types (dtype)

So far we have been using mainly integer and floating-point data, which are the two most common types. NumPy supports a variety of other types, however, including arbitrary objects.

https://docs.scipy.org/doc/numpy-1.15.0/reference/arrays.dtypes.html

In [None]:
np.array(['foo', 'bar'])
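String arrays use a fixed-width dtype sized to the longest string at creation time. One consequence worth knowing: assigning a longer string later silently truncates it, as this sketch shows:

```python
import numpy as np

s = np.array(['foo', 'bar'])
print(s.dtype)   # e.g. <U3: Unicode strings of at most 3 characters
s[0] = 'foobar'  # too long for the fixed-width dtype
print(s)         # ['foo' 'bar'] (silently truncated)
```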

In [None]:
a = np.array([complex(3,2), complex(5,6)])
print(a)
print(a.dtype)

In [None]:
class Foo:
    def __init__(self, val):
        self.val = val
    def __repr__(self):
        return f'Foo(val={self.val})'

np.array([Foo(1), Foo(2)])
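Arrays of arbitrary Python objects get the `object` dtype, and each element is an ordinary reference to the original object. A small check, redefining a minimal `Foo` so the snippet runs on its own:

```python
import numpy as np

class Foo:
    def __init__(self, val):
        self.val = val

objs = np.array([Foo(1), Foo(2)])
print(objs.dtype)   # object
print(objs[1].val)  # 2: attributes are accessed on the stored Python object
```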

We can convert the dtype of an array using the astype() method. Be careful when doing this... things like overflow, underflow, and loss of precision can occur.

In [None]:
a = np.array([complex(3,2), complex(5,6)])

print('Original:')
print(a)

print()
print('Converted to int32:')
int_a = a.astype(np.int32)  # discards the imaginary part (NumPy emits a ComplexWarning)
print(int_a)

print()
print('Converted to float32:')
int_a.astype(np.float32)
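Here are a few concrete examples of what can go wrong with `astype()`. Its default casting mode is `'unsafe'`, so none of these raise an error:

```python
import numpy as np

# Overflow: 300 does not fit in an unsigned 8-bit integer, so it wraps modulo 256
print(np.array([300]).astype(np.uint8))  # [44]

# Float -> int truncates toward zero rather than rounding
print(np.array([1.7, -1.7]).astype(np.int32))  # [ 1 -1]

# Loss of precision: float32 has a 24-bit significand, so 2**24 + 1 rounds to 2**24
print(np.array([2**24 + 1]).astype(np.float32))  # [16777216.]
```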

## Array Access

There are three primary ways of accessing data in an array:
1. Single element access
2. Slicing
3. Advanced Indexing

### Single element access

NumPy arrays use square bracket notation ([]) just like Python lists, and also use 0-based indexing.

The primary difference from indexing a Python list is that a NumPy array accepts one index per dimension inside a single pair of brackets, separated by commas. To select a single element, provide an index for each dimension of the array. Remember that for two-dimensional arrays, the first dimension indexes rows and the second dimension indexes columns.

In [None]:
a = np.random.randn(3, 3)
print(a)
a[0, 1] # first row, second column
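Comma-separated indexing is not the only option: chained brackets also work, and supplying fewer indices than dimensions returns a sub-array. A sketch on a small deterministic matrix:

```python
import numpy as np

a = np.arange(9).reshape(3, 3)
print(a[0, 1])  # 1: one index per dimension selects a single element
print(a[0][1])  # 1: also works, but indexes twice (first a row, then an element)
print(a[0])     # [0 1 2]: fewer indices than dimensions returns the first row
```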

#### Exercise:

Write code in the cell below to access the bottom-right corner of the matrix **a**

$$\begin{bmatrix} a_{0,0} & a_{0,1} & a_{0,2} \\ a_{1,0} & a_{1,1} & a_{1,2} \\ a_{2,0} & a_{2,1} & \fbox{ $a_{2,2}$ } \end{bmatrix}$$

In [None]:
# Sample implementation
a[2, 2]

In [None]:
# Put your code here

Write code in the cell below that creates a 3 x 3 x 3 array filled with random integers and selects the "middle" element in this cube.

In [None]:
# Sample implementation
a = np.random.randint(10, size=(3, 3, 3))  # random integers from 0 to 9
print(a)
a[1, 1, 1]

In [None]:
# Put your code here

### Single element updates

We can also change array data using indexing:

In [None]:
a = np.random.randn(2, 2)

print('Original a:')
print(a)

a[0, 0] = 99999

print()
print('Modified a:')
print(a)

### Slicing

Sometimes it is convenient to access a contiguous portion of an array, for example the first K elements, the last K, or a run somewhere in the middle. The syntax is the same as slicing lists and tuples in plain Python:

`array[start : stop : increment]`

First let's create the following array:

$$\begin{bmatrix} 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 \end {bmatrix}$$

In [None]:
a = np.arange(8)
a
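All three parts of the slice can be combined, and the colon syntax is shorthand for Python's built-in `slice` object. A short sketch:

```python
import numpy as np

a = np.arange(8)
print(a[1:7:2])           # [1 3 5]: start at 1, stop before 7, step by 2
print(a[slice(1, 7, 2)])  # the same selection written with a slice object
```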

Now let's extract the first 3 elements of the array:

In [None]:
a[ : 3]

Notice that we didn't need to specify the starting index of 0, that the stop index is not inclusive, and that if we leave out the increment it defaults to 1.

Now let's use the increment specifier to extract only the even numbered elements:

In [None]:
a[ : : 2 ]

We can also specify a negative increment to reverse the array:

In [None]:
a[ : : -1 ]

Finally, let's take the last three elements of the array. There are two ways to do this:

In [None]:
# First way
a[ len(a)-3 : ]

In [None]:
# Second way
a[ -3 : ]

From the docs:

>"Negative indices are interpreted as counting from the end of the array (i.e., if n_i < 0, it means n_i + d_i)."
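The rule in the quote can be checked directly: adding the length of the dimension (`d_i` above) to a negative index gives the equivalent positive index. A minimal sketch:

```python
import numpy as np

a = np.arange(8)
n = len(a)  # d_i in the quote: the length of the dimension

print(a[-1], a[-1 + n])                    # 7 7
print(np.array_equal(a[-3:], a[-3 + n:]))  # True
```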

#### Exercise:

For multidimensional arrays, you can mix indexing and slicing.

First let's create the following matrix:

$$\begin{bmatrix} 0 & 1 & 2 & 3 \\ 4 & 5 & 6 & 7 \\ 8 & 9 & 10 & 11 \end {bmatrix}$$

In [None]:
a = np.arange(12).reshape(3, 4)
a

Now use what you've learned about indexing and slicing to write code that extracts the first 2 columns of the second row.

In [None]:
# Sample implementation
extraction = a[1, :2]
print(extraction)
np.array_equal(extraction, [4, 5])

In [None]:
extraction = # your code here
print(extraction)
np.array_equal(extraction, [4, 5])

One important question about our extraction above is whether the extracted array is a copy of the original, or just a view onto it. The code below tests this, which is it?

In [None]:
my_a = np.arange(12).reshape(3, 4)

print('Original my_a:')
print(my_a)

my_extraction = my_a[1, :2]
my_extraction[0] = 999

print('Modified extraction:')
print(my_extraction)

print('my_a is now:')
print(my_a)
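You can also answer this question without modifying anything, using `np.shares_memory()`. If you need an independent result, call `.copy()` on the slice:

```python
import numpy as np

my_a = np.arange(12).reshape(3, 4)
view = my_a[1, :2]
independent = my_a[1, :2].copy()  # explicit copy: safe to modify on its own

print(np.shares_memory(my_a, view))         # True
print(np.shares_memory(my_a, independent))  # False
```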

### Advanced Indexing

https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html

From the docs:

>Advanced indexing is triggered when the selection object, obj, is a non-tuple sequence object, an ndarray (of data type integer or bool), or a tuple with at least one sequence object or ndarray (of data type integer or bool). There are two types of advanced indexing: integer and Boolean.

>Advanced indexing always returns a copy of the data (contrast with basic slicing that returns a view).

We won't have time today to go over all of the intricacies of advanced indexing, so be sure to carefully read the docs when you need this functionality.
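To see the "always returns a copy" rule in action, modify the result of an integer index and check the original. A short sketch:

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])
picked = a[[0, 1], 1]  # advanced (integer) indexing -> a copy
picked[0] = 100

print(a[0, 1])                      # 2: the original is unchanged
print(np.shares_memory(a, picked))  # False
```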

We'll try the first kind described above, indexing with a sequence of integers:

In [None]:
a = np.array([[1, 2], [3, 4], [5, 6]])
a

In [None]:
# Take the first two rows of the second column
a[[0, 1], 1]

In [None]:
# Take the first two rows of all columns
a[[0, 1], ]

We can also pass an index sequence for each dimension. Before running the cell below, what do you think it will extract?

In [None]:
# Getting fancy: passing both a row index and a column index
a[[0, 1, 2], [0, 1, 0]]

In [None]:
# This is equivalent to:
row_idx = [0, 1, 2]
col_idx = [0, 1, 0]
print(np.array([a[r, c] for r, c in zip(row_idx, col_idx)]))

# Or
print(np.array([a[0, 0], a[1, 1], a[2, 0]]))

The definition above calls out tuples as being different: "a non-tuple sequence object... or a tuple with at least one sequence object".

That means that these two cases are different:

In [None]:
a[(2, 1)] # equivalent to a[2, 1]

In [None]:
a[(2, 1), ]

This may seem confusing, but you just need to remember that tuples can be used to select a single element in a multidimensional array, which can be handy when writing code that computes an index:

In [None]:
my_tuple_index = (0, 1)
a[my_tuple_index]
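As an example of computing such an index, `np.unravel_index()` converts a flat position (here the one returned by `np.argmax()`) into a tuple of per-dimension indices, which can be used directly to select the element:

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])
idx = np.unravel_index(np.argmax(a), a.shape)  # flat position -> (row, column) tuple
print(idx)     # a tuple: row 2, column 1
print(a[idx])  # 6, the largest element
```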

#### Exercise:

We can combine slicing and advanced indexing. Write code below that uses an integer index list to select the first two rows, and slicing to extract the last two columns.

In [None]:
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
a

In [None]:
# Sample implementation
my_extract = a[[0, 1], -2 : ]
print(my_extract)
np.array_equal(my_extract, [[2, 3], [5, 6]])

In [None]:
extract = # put your code here
print(extract)
np.array_equal(extract, [[2, 3], [5, 6]])

Finally, let's explore the second kind of advanced indexing: using boolean sequences.

Each True element in the boolean sequence will extract the corresponding element, and each False will be left out. This is very useful for filtering arrays, and in the next section we'll show how this can be combined with overloaded comparison operators like ">".

In [None]:
a = np.arange(4)
a

In [None]:
a[[True, False, True, True]]
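One way to think about a boolean mask is as a list of the positions where it is True; `np.nonzero()` computes exactly those positions, and indexing with them gives the same result. A sketch:

```python
import numpy as np

a = np.arange(4)
mask = np.array([True, False, True, True])

print(np.nonzero(mask))                              # positions of the True entries: 0, 2, 3
print(np.array_equal(a[mask], a[np.nonzero(mask)]))  # True
```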

In [None]:
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
a

#### Exercise:

Combine slicing and boolean indexing to extract the first and third columns of all rows:

In [None]:
# Sample implementation
my_extract = a[:, [True, False, True]]
print(my_extract)
np.array_equal(my_extract, [[1, 3], [4, 6], [7, 9]])

In [None]:
extract = # your code here
print(extract)
np.array_equal(extract, [[1, 3], [4, 6], [7, 9]])

### Setting elements with indexing

So far we have shown how one can use slicing and indexing to extract data from an array, but they can also be used to set values.

In [None]:
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
a

In [None]:
a[:, 1] = 0
a

In [None]:
a[2, [1, 2]] = 99
a
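One pitfall when setting values: because advanced indexing returns a copy, chaining two indexing operations on the left-hand side can write into a temporary array instead of the original. A sketch contrasting the two forms:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Chained: a[[0, 2]] makes a copy, and the assignment modifies that temporary copy
a[[0, 2]][:, 0] = -1
print(a[0, 0], a[2, 0])  # 1 7 (unchanged)

# A single combined index writes into a directly
a[[0, 2], 0] = -1
print(a[0, 0], a[2, 0])  # -1 -1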

#### Exercise:

Use any combination of the indexing methods to set the diagonal of the matrix to 1, 2, and 3. In other words, it should look like this:

$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end {bmatrix}$$

In [None]:
a = np.zeros([3, 3])
a

In [None]:
# Sample implementation
my_a = np.zeros([3, 3])
my_a[[0, 1, 2], [0, 1, 2]] = [1, 2, 3]
print(my_a)
np.array_equal(my_a, [[1,0,0],[0,2,0],[0,0,3]])

In [None]:
# your code here
print(a)
np.array_equal(a, [[1,0,0],[0,2,0],[0,0,3]])
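Once you have solved the exercise with indexing, note that NumPy also has helpers for this common task: `np.fill_diagonal()` writes values along the diagonal of an existing array in place, and `np.diag()` builds a diagonal matrix from a 1D sequence. A sketch for comparison:

```python
import numpy as np

target = [[1, 0, 0], [0, 2, 0], [0, 0, 3]]

b = np.zeros([3, 3])
np.fill_diagonal(b, [1, 2, 3])    # modifies b in place and returns None
print(np.array_equal(b, target))  # True

c = np.diag([1, 2, 3])            # builds a new matrix with the values on the diagonal
print(np.array_equal(c, target))  # True
```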