# NumPy is a library of prewritten mathematical functions

- Useful for everything from simple arithmetic to complex mathematical operations
- NumPy provides a convenient API for data manipulation through its array and matrix operations
- It is conventional to import NumPy as "np"

In [1]:
import numpy as np

# Intro to arrays, attributes, and functions

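First, let us create a normal Python list and convert it to a NumPy array. Wrapping the list in an extra pair of brackets produces a two-dimensional array with a single row.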

In [2]:
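list1 = [1, 2, 3, 4, 5]  # named list1 to avoid shadowing the built-in list type
array = np.array([list1])  # the extra brackets make this a 1 x 5 array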
print(array)

[[1 2 3 4 5]]


Now the ```array``` variable has all the functionality of NumPy! Let's make a second list so we can start exploring some of NumPy's built-in operations.

In [3]:
list2 = [6, 7, 8, 9, 10]
array2 = np.array([list2])
print(array2)

[[ 6  7  8  9 10]]


With two arrays, we can now perform mathematical operations on them

In [4]:
add = np.add(array, array2) # Performs elementwise addition
sub = np.subtract(array, array2) # Performs elementwise subtraction
# mult = np.matmul(array, array2) # Performs matrix multiplication
div = np.divide(array, array2) # Performs elementwise division

Can anyone spot the problem with the commented-out line in the code above?

NumPy arrays have great attributes such as ```ndim```, ```shape```, ```size```, and ```dtype```.

```ndim```: The number of dimensions

```shape```: The size of each dimension presented as a tuple

```size```: The total number of elements in the array

```dtype```: The data type of the array's elements

Using these attributes, we can find the problem.

In [5]:
print("array has", array.ndim, "dimensions")
print("array has the shape", array.shape)
print("array is of size", array.size)
print("array is of type", array.dtype)
print()
print("array2 has", array2.ndim, "dimensions")
print("array2 has the shape", array2.shape)
print("array2 is of size", array2.size)
print("array2 is of type", array2.dtype)

('array has', 2, 'dimensions')
('array has the shape', (1L, 5L))
('array is of size', 5)
('array is of type', dtype('int32'))
()
('array2 has', 2, 'dimensions')
('array2 has the shape', (1L, 5L))
('array2 is of size', 5)
('array2 is of type', dtype('int32'))


The problem is that ```array``` and ```array2``` both have shape ```1 x 5```, so they cannot be matrix-multiplied together. A matrix of shape ```1 x 5``` can only be multiplied by a matrix of shape ```5 x k```, which produces a matrix of shape ```1 x k```.
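
The shape rule can be checked directly. The sketch below uses placeholder arrays built with ```np.ones``` (the names ```a```, ```b```, and ```c``` are illustrative and not part of this tutorial) to show one valid product and the error raised by a mismatched one.

```python
import numpy as np

a = np.ones((1, 5))   # shape 1 x 5
b = np.ones((5, 3))   # shape 5 x 3, so the inner dimensions (5 and 5) match

c = np.matmul(a, b)
print(c.shape)        # (1, 3)

# Multiplying two 1 x 5 arrays fails: the inner dimensions (5 and 1) differ
try:
    np.matmul(a, a)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)  # True
```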

Let's apply a simple transformation so that the matrix multiplication works! First, comment out the buggy line in the earlier cell.

In [6]:
array2_transformed = array2.transpose()

We can check this new array's ```shape```

In [7]:
print(array2_transformed.shape)

(5L, 1L)


Now we can perform our matrix multiplication! Since the multiplication is ```(1 x 5) x (5 x 1)```, this should result in an array of shape ```1 x 1```

In [8]:
mult = np.matmul(array, array2_transformed)
print(mult.shape)

(1L, 1L)


In [9]:
print(add)
print(sub)
print(mult)
print(div)

[[ 7  9 11 13 15]]
[[-5 -5 -5 -5 -5]]
[[130]]
[[0 0 0 0 0]]
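
The zeros in the last row deserve a note. The printed output in this notebook appears to come from Python 2, where ```np.divide``` on two integer arrays performs floor division; under Python 3 the same call returns floating-point quotients. A minimal sketch, assuming Python 3 and a recent NumPy, that makes the choice explicit with ```np.true_divide``` and ```np.floor_divide``` (the names ```a``` and ```b``` are illustrative):

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5]])
b = np.array([[6, 7, 8, 9, 10]])

true_div = np.true_divide(a, b)    # floating-point quotients
floor_div = np.floor_divide(a, b)  # quotients rounded down to integers

print(true_div)
print(floor_div)  # [[0 0 0 0 0]]
```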


# Manipulating Arrays

Now that we have a basic overview of NumPy arrays, we can work to manipulate them

First, let me introduce ```arange()```, a function for creating arrays like the ones we used before. Calling ```np.arange(n)``` returns the integers from 0 up to, but not including, ```n```

In [10]:
x1 = np.arange(12)
print(x1)

[ 0  1  2  3  4  5  6  7  8  9 10 11]
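
As a small aside, ```arange()``` also accepts explicit ```start```, ```stop```, and ```step``` arguments, much like Python's built-in ```range```. A minimal sketch (the name ```evens``` is illustrative):

```python
import numpy as np

# start=2, stop=12 (exclusive), step=2
evens = np.arange(2, 12, 2)
print(evens)  # [ 2  4  6  8 10]
```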


## Indexing Arrays

We can index arrays much as we would in other programming languages.

__Note__: Remember that array indexing starts at index 0

In [11]:
print(x1[0])
print(x1[4])
print(x1[11])

0
4
11


Python also allows indexing backwards by using negative indices, so we can access the last element of an array with index ```-1```

In [12]:
print(x1[-1])

11


## Slicing Arrays: Subarrays

Accessing subarrays can be done with the slice notation, which is marked by the ```:``` character. The format is as follows:

``` x[start:stop:step] ```

where the default values are ```start = 0```, ```stop = size of dimension```, and ```step = 1```

So we can access the first five elements by overriding the default of ```stop```

In [13]:
print(x1[:5])

[0 1 2 3 4]


We can access the middle elements by overriding the defaults of ```start``` and ```stop```

In [14]:
print(x1[4:7])

[4 5 6]


We can access every other element by overriding the default of ```step```

In [15]:
print(x1[::2])

[ 0  2  4  6  8 10]


__Note__: A negative ```step``` value is valid; in that case the defaults of ```start``` and ```stop``` are swapped, so the slice begins at the last element and runs toward the first

Thus we can reverse an array in the following manner

In [16]:
print(x1[::-1])

[11 10  9  8  7  6  5  4  3  2  1  0]
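
The three slice parameters can also be combined. The sketch below uses a fresh array named ```x``` (an illustrative name, so it does not disturb ```x1```) to show a few combinations, including explicit bounds with a negative ```step```.

```python
import numpy as np

x = np.arange(12)

print(x[2:10:3])   # [2 5 8]      every third element from index 2 up to 9
print(x[7:2:-1])   # [7 6 5 4 3]  walk backwards from index 7 down to 3
print(x[5::-2])    # [5 3 1]      every other element backwards from index 5
```

Note that with a negative ```step``` the ```stop``` index is still excluded, which is why index 2 does not appear in the second result.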


## Array Dimensions

It is important to note the ```shape``` of the above array

In [17]:
print(x1.shape)

(12L,)


This is different from the shapes we have seen previously. This is a 1-dimensional array with 12 elements.

Sometimes this is what we want, but what if we wanted a matrix with 1 row and 12 columns instead?

We can use the ```reshape()``` function in this case to reshape the dimensions of the array

In [18]:
x1 = x1.reshape((1, 12))
print(x1.shape)
print(x1)

(1L, 12L)
[[ 0  1  2  3  4  5  6  7  8  9 10 11]]


We can also use NumPy's ```random``` module to create randomly generated arrays to work with

In [19]:
np.random.seed(0) # The seed allows for consistent examples

x1 = np.random.randint(10, size=(1, 12))
print(x1)

[[5 0 3 3 7 9 3 5 2 4 7 6]]


We can use the ```reshape()``` function to change ```x1``` to be of a different shape like ```3 x 4```

In [20]:
x1 = x1.reshape((3, 4))
print(x1)

[[5 0 3 3]
 [7 9 3 5]
 [2 4 7 6]]
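
One constraint is worth spelling out: ```reshape()``` only works when the new shape holds exactly as many elements as the original. A minimal sketch, using an illustrative array ```x``` of 12 elements:

```python
import numpy as np

x = np.arange(12)

print(x.reshape((3, 4)).shape)   # (3, 4)
print(x.reshape((2, -1)).shape)  # (2, 6): -1 asks NumPy to infer that dimension

# 5 * 3 = 15 does not equal 12, so this reshape is rejected
try:
    x.reshape((5, 3))
    reshape_failed = False
except ValueError:
    reshape_failed = True
print(reshape_failed)            # True
```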


## Indexing and Slicing with Multi-Dimensional Arrays

The earlier techniques of indexing and slicing extend naturally to multiple dimensions.

For indexing, it is as simple as specifying the index in all of the dimensions you wish to index

In [21]:
print(x1[1][3])

5


We can use this same idea to access not only specific elements, but also entire rows or columns of our array

In [22]:
print(x1[1])
# An equivalent line of code using the slicing notation is
# print(x1[1][:])

[7 9 3 5]


We can use slicing notation along with the indexing notation to do things like get subarrays from multi-dimensional arrays

In [23]:
print(x1[1][2:4])

[3 5]
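
NumPy also accepts all of the indices inside a single pair of brackets, separated by commas. This form is needed to select a column, because chaining brackets as in ```x[:][1]``` still returns a row. A minimal sketch, using an illustrative array ```x``` with the same values as the ```3 x 4``` array above:

```python
import numpy as np

x = np.array([[5, 0, 3, 3],
              [7, 9, 3, 5],
              [2, 4, 7, 6]])

print(x[1, 3])      # 5, the same element as x[1][3]
print(x[:, 1])      # [0 9 4], the second column
print(x[:][1])      # [7 9 3 5], still the second row, not a column
print(x[0:2, 2:4])  # the 2 x 2 block from rows 0-1 and columns 2-3
```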


## Subarrays are views and not copies

It is best to know early on that when you slice arrays you are not getting back a copy of that subarray, but simply a view of that subarray.

Thus, any changes made to the view affect the original array as you will see below. First let's print our array

In [24]:
print(x1)

[[5 0 3 3]
 [7 9 3 5]
 [2 4 7 6]]


Now let's print a subarray of ```x1``` that we will define as ```x1_subarray```

In [25]:
x1_subarray = x1[2][1:]
print(x1_subarray)

[4 7 6]


Now let's make a change to ```x1_subarray``` and see how it affects ```x1```

In [26]:
x1_subarray[0] = -1
print(x1_subarray)
print(x1)

[-1  7  6]
[[ 5  0  3  3]
 [ 7  9  3  5]
 [ 2 -1  7  6]]


While this may seem unhelpful, it is actually a great feature when working with big data. If we have a large quantity of data, we can use views to access and process smaller chunks of our large dataset without copying it.

For now though, it would be best to know how to make actual copies of an array. We can do this using the ```copy()``` function

In [27]:
x1_subarray_copy = x1[2][1:].copy()
print(x1_subarray_copy)

[-1  7  6]


In [28]:
x1_subarray_copy[0] = 100
print(x1_subarray_copy)

[100   7   6]


Now let's see how making this change to the copy of the subarray didn't change the original ```x1``` at all.

In [29]:
print(x1)

[[ 5  0  3  3]
 [ 7  9  3  5]
 [ 2 -1  7  6]]
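
If it is ever unclear whether a variable is a view or a copy, NumPy can report it. The sketch below is one way to check, using ```np.shares_memory``` and the ```base``` attribute; the names ```x```, ```view```, and ```copied``` are illustrative.

```python
import numpy as np

x = np.arange(12).reshape((3, 4))

view = x[2][1:]           # slicing returns a view onto x's data
copied = x[2][1:].copy()  # copy() allocates separate memory

print(np.shares_memory(x, view))    # True
print(np.shares_memory(x, copied))  # False
print(view.base is not None)        # True: a view keeps a reference to the array that owns the data
print(copied.base is None)          # True: a copy owns its own data
```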
