## Working with NumPy Arrays

Once we have created an array (or several), we will want to work with it in different ways. The simplest ways to manipulate an array are to access the data within it, extract subsets of the data, split it, reshape it, or join it to another array.

In this notebook we will cover the following topics:

* Attributes of arrays: size, shape, memory consumption
* Indexing of arrays: Extracting and setting the value of individual array elements
* Slicing of arrays: Extracting and setting smaller subarrays within a larger array
* Reshaping of arrays: Changing the shape of a given array
* Joining and splitting of arrays: Combining multiple arrays into one, and splitting one array into many

First let's import the NumPy package:

In [None]:
import numpy as np
np.random.seed(1234567890)  # seed for reproducibility

## Array Attributes

Before we start manipulating an array, it's probably useful to understand the attributes of an array. In the previous notebook we already discussed the data type of an array, but there are other useful attributes we may be interested in. 

Let's start by creating an array:

In [None]:
x1 = np.random.randint(10, size=(3, 4))
x1

Every array has attributes that detail information about the dimensionality, shape and size:

In [None]:
print("x1 ndim: ", x1.ndim)
print("x1 shape:", x1.shape)
print("x1 size: ", x1.size)

We can also extract the data type, and convert the array to another type with `astype`:

In [None]:
print("dtype:", x1.dtype)

In [None]:
x1.astype(np.float64).dtype  # astype returns a converted copy of the array

We can also find out the size of individual elements of an array (in bytes), and the size of the entire array:

In [None]:
print("itemsize:", x1.itemsize, "bytes")
print("nbytes:", x1.nbytes, "bytes")

NumPy arrays of different data types are stored with different sizes on our computer:

In [None]:
print("itemsize as float:", x1.astype(np.float64).itemsize, "bytes")
print("nbytes as float:", x1.astype(np.float64).nbytes, "bytes")
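
The relationship between these attributes can be checked directly. The following is a minimal sketch using a hypothetical array `a` (not part of the notebook above) with an explicitly chosen `int32` dtype, so that the numbers do not depend on the platform's default integer type:

```python
import numpy as np

# Hypothetical example array with an explicit 4-byte integer type
a = np.zeros((3, 4), dtype=np.int32)
b = a.astype(np.float64)  # converted copy that uses 8-byte floats

# nbytes is the number of elements times the size of one element
print(a.nbytes == a.size * a.itemsize)  # True
print(a.itemsize, a.nbytes)             # 4 48
print(b.itemsize, b.nbytes)             # 8 96
```

The same 12 elements take twice the memory once each element is stored in 8 bytes instead of 4.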

## Array Indexing

Similar to lists and other Python objects we discussed in the 'Basics' module, NumPy arrays are indexed, and we can access particular elements of an array using the square bracket notation:

In [None]:
x2 = np.random.randint(5, size=10)
x2

In [None]:
x2[1]

As with Python lists, it is important to remember that indexing starts at zero:

In [None]:
x2[0] # indexing starts at zero

We can also access elements by counting from the back of an array using negative indices:

In [None]:
x2[-1]

In [None]:
x2[-2]

### Indexing Multidimensional Arrays

We can also index arrays in higher dimensions.

For two-dimensional arrays, providing a single value inside the square brackets will extract the entire row:

In [None]:
x1[0]

To extract individual elements we need to provide the complete index, using two dimensions:

In [None]:
x1[0,0]

The indexing 'from the back' method also works on multidimensional arrays:

In [None]:
x1[-1,-1]

In [None]:
x1[-1,-2]

### Using Indexing to Modify Values

We can use the index notation to modify the values:

In [None]:
x1[0,0] = 5
x1

But we should keep in mind that, unlike a Python list, a NumPy array requires all of its elements to have the same fixed type. Assigning a float to an integer array silently truncates the value:

In [None]:
x1[0,0] = 3.14
x1

In [None]:
try:
    x1[0,0] = 'hello'
except ValueError:
    print('Elements must have same type')
    
x1
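
The truncation is easy to miss, so here is a small sketch of it in isolation. The arrays `ints` and `floats` are hypothetical names introduced only for this illustration:

```python
import numpy as np

ints = np.array([1, 2, 3])   # integer dtype inferred from the values
ints[0] = 3.99               # the fractional part is dropped, not rounded
print(ints[0])               # 3

# To keep fractional values, convert to a float array first
floats = ints.astype(np.float64)  # astype returns a new array
floats[0] = 3.99
print(floats[0])             # 3.99
print(ints[0])               # 3 (the integer array is unchanged)
```

If an array needs to hold non-integer values, it is usually simplest to create it with a float dtype from the start.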

## Array Slicing

Instead of only extracting individual array elements, we can use indexing to access sub-arrays through *slicing*. The standard syntax is

```python
x[start:stop:step]
```

Any of the three values can be omitted. An omitted value takes its default: `start=0`, `stop=` the size of the dimension, and `step=1`.

Let's see this in action:

In [None]:
x = np.arange(10)
x

In [None]:
x[:5]

In [None]:
x[5:]

In [None]:
x[2:5]

In [None]:
x[::2]

In [None]:
x[2::2]

In [None]:
x[0:7:2]

In [None]:
x[1::2]
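
Slices also accept negative values. The sketch below is an addition to the examples above: a negative `step` walks the array backwards (and swaps the default `start` and `stop`), while a negative `start` counts from the end:

```python
import numpy as np

x = np.arange(10)

print(x[::-1])    # all elements in reverse order
print(x[7:2:-2])  # [7 5 3]  (from index 7 down to, but excluding, index 2)
print(x[-3:])     # [7 8 9]  (the last three elements)
```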

### Multi-dimensional arrays

We can also slice along several dimensions at once. The slice for each dimension is separated by a comma, just as we did when indexing individual values:

In [None]:
x1

In [None]:
x1[1:]

In [None]:
x1[1:, :2]

In [None]:
x1[1::2, ::2] # odd rows, even columns

#### Accessing specific rows or columns

Often we want to extract an entire row or column of a 2-D array:

In [None]:
x1[0, :]

In [None]:
# equivalent to
x1[0]

In [None]:
x1[:,3]
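
Note that indexing a column with a single integer drops that dimension, so the result is one-dimensional. A minimal sketch, using a hypothetical array `m` rather than `x1`, shows how a slice of length one keeps the column as a 2-D array instead:

```python
import numpy as np

m = np.arange(12).reshape(3, 4)  # hypothetical 3x4 example array

col_1d = m[:, 3]     # integer index: the column axis is dropped
col_2d = m[:, 3:4]   # length-one slice: the column axis is kept

print(col_1d.shape)  # (3,)
print(col_2d.shape)  # (3, 1)
```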

## No Copy Views

An important fact about array slicing is that it returns a *view* of the array data rather than a copy:

In [None]:
x3 = x1[:2, 1:3]
x3

In [None]:
x3[1,0] = 30
x3

In [None]:
x1 # yikes!

Returning a view is memory efficient: slicing a large array does not store a second copy of the same data.

If we do want an explicit copy of the array, we can get it using the `copy()` method:

In [None]:
x4 = x1[:2, 1:3].copy()
x4

Modifications to the copy then do not affect the original array:

In [None]:
x4[1,0] = 11
x4

In [None]:
x1 ## much more desirable
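
When it is unclear whether two arrays refer to the same data, `np.shares_memory` can be used to check. The sketch below uses hypothetical arrays `a`, `view` and `copied` to contrast the two cases:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)  # hypothetical example array
view = a[:2, 1:3]                # slicing returns a view
copied = a[:2, 1:3].copy()       # explicit, independent copy

print(np.shares_memory(a, view))    # True
print(np.shares_memory(a, copied))  # False

view[0, 0] = -1    # writes through to a[0, 1]
copied[0, 0] = 99  # leaves a untouched
print(a[0, 1])     # -1
```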

## Array Reshaping

An array can be reshaped to take on a different set of dimensions using the `reshape` method:

In [None]:
grid = np.arange(1,10)
grid

In [None]:
matrix = grid.reshape(3,3)
matrix

But the total number of elements must stay the same. Otherwise `reshape` raises a `ValueError`:

In [None]:
grid.reshape(3,4)
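
As a sketch of how to handle this, the failing call can be wrapped in `try`/`except`. The example also passes `-1` for one dimension, which asks NumPy to infer that dimension from the array's size; the name `inferred` is introduced here only for illustration:

```python
import numpy as np

grid = np.arange(1, 10)  # 9 elements

try:
    grid.reshape(3, 4)   # 3 * 4 = 12 elements would be needed
except ValueError as err:
    print("reshape failed:", err)

# -1 lets NumPy work out the missing dimension: 9 / 3 = 3
inferred = grid.reshape(3, -1)
print(inferred.shape)    # (3, 3)
```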

A one-dimensional array does not have the shape of a matrix, i.e. it carries no row or column information:

In [None]:
x = np.arange(1,4)
x

In [None]:
x.shape

We can use the `reshape` method to turn it into a two-dimensional array:

In [None]:
x.reshape(1,3)

In [None]:
x.reshape(1,3).shape

Or, we can use `np.newaxis` inside an indexing operation:

In [None]:
x[:, np.newaxis]

In [None]:
x[:, np.newaxis].shape

In [None]:
x[np.newaxis]

In [None]:
x[np.newaxis].shape
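
`np.newaxis` is simply an alias for `None`, and there are several equivalent ways to add an axis. The following sketch builds the same column vector three ways; the names `col_a`, `col_b` and `col_c` are introduced here for illustration:

```python
import numpy as np

x = np.arange(1, 4)

print(np.newaxis is None)          # True

col_a = x[:, np.newaxis]           # index with np.newaxis
col_b = x.reshape(-1, 1)           # reshape, with the row count inferred
col_c = np.expand_dims(x, axis=1)  # insert a new axis at position 1

print(col_a.shape, col_b.shape, col_c.shape)  # (3, 1) (3, 1) (3, 1)
```

Which form to use is largely a matter of readability; all three produce arrays with the same shape and contents.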

## Array Concatenation

Multiple arrays can be combined into one using `np.concatenate`, `np.vstack` and `np.hstack`:

In [None]:
y1 = np.arange(1,4)
y1

In [None]:
y2 = np.arange(4,7)
y2

In [None]:
np.concatenate([y1,y2])

In [None]:
x1

In [None]:
np.concatenate([x1,x1])

By default, `np.concatenate` joins along the first axis (row-wise). For column-wise concatenation we can set `axis=1`:

In [None]:
np.concatenate([x1,x1], axis=1)

We can only concatenate arrays that have the same number of dimensions and matching sizes along every axis except the one being joined:

In [None]:
y3 = np.arange(4,10).reshape(2,3)
y3

In [None]:
y3.shape

In [None]:
y1.shape

In [None]:
np.concatenate([y1,y3])

We can instead use `np.vstack`, which treats a one-dimensional array as a single row, to get the desired result:

In [None]:
np.vstack([y1,y3])

Arrays can be stacked along the column/horizontal axis too, using `np.hstack`:

In [None]:
np.hstack([y3, [[99],[99]] ])
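
To make the connection to `np.concatenate` explicit, the sketch below reproduces both stacking results by hand. It assumes the same `y1` and `y3` as above; `stacked_v`, `manual_v`, `stacked_h` and `manual_h` are names introduced here for illustration:

```python
import numpy as np

y1 = np.arange(1, 4)                 # shape (3,)
y3 = np.arange(4, 10).reshape(2, 3)  # shape (2, 3)

# vstack: promote y1 to shape (1, 3), then join along axis 0
stacked_v = np.vstack([y1, y3])
manual_v = np.concatenate([y1[np.newaxis, :], y3], axis=0)
print(np.array_equal(stacked_v, manual_v))  # True

# hstack on 2-D inputs: join along axis 1
extra = np.array([[99], [99]])       # shape (2, 1)
stacked_h = np.hstack([y3, extra])
manual_h = np.concatenate([y3, extra], axis=1)
print(np.array_equal(stacked_h, manual_h))  # True
```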

## Array Splitting

We can also do the opposite of concatenation, known as splitting, using `np.split`, `np.hsplit` and `np.vsplit`:

In [None]:
z = np.arange(15)
z

In [None]:
np.split(z, 3)  # three equal parts; the length must be divisible by 3

In [None]:
z2 = z.reshape(3,5)
z2

We can assign the pieces returned by a split to their own variables using unpacking:

In [None]:
z21, z22, z23 = np.split(z2, 3)

z21

In [None]:
np.split(z2, 5, axis=1)  # five equal parts along the columns

In [None]:
np.hsplit(z2, [2])  # split the columns at index 2

In [None]:
np.vsplit(z2, [2])  # split the rows at index 2
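
The second argument to `np.split` can be either a number of equal sections or a list of indices at which to cut. The sketch below illustrates the list form and also `np.array_split`, which allows sections of unequal length; the variable names are introduced here for illustration:

```python
import numpy as np

z = np.arange(15)

# A list of indices gives the cut points: z[:3], z[3:10], z[10:]
first, middle, last = np.split(z, [3, 10])
print(len(first), len(middle), len(last))  # 3 7 5

# np.array_split does not require the length to divide evenly
parts = np.array_split(z, 4)
print([len(p) for p in parts])             # [4, 4, 4, 3]
```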