# Week 1 - Numpy tutorial

In [None]:
import numpy as np

## Numpy arrays

NumPy is the basis of many Python packages. Its central data type is the `ndarray`, which stands for n-dimensional array. It looks similar to a Python list, but common operations run much faster. One reason is that a NumPy array holds only one type of data, so NumPy does not have to check the type of every single element while it is doing the computations.
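As a small illustration of the single-type rule, the sketch below builds an array from a list that mixes integers and a float. The variable names are chosen here for illustration only.

```python
import numpy as np

# A Python list can mix types freely.
mixed_list = [1, 2.5, 3]

# Converting to an array forces a single dtype: the ints are upcast to float64.
mixed_array = np.array(mixed_list)
print(mixed_array)
print(mixed_array.dtype)  # float64
```

Because every element shares one dtype, NumPy can run the same compiled operation over the whole block of memory.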

### Creating arrays

#### 1D arrays

In [None]:
a = np.array([1, 2, 3])
a

In [None]:
print(type(a))

In [None]:
print(a.shape)

In [None]:
print(a.dtype)

In [None]:
print(a[0], a[1], a[2])

In [None]:
a[0] = 5

In [None]:
a

#### 2D arrays

In [None]:
b = np.array([[1, 2, 3], [4, 5, 6]])
b

In [None]:
b.shape

In [None]:
print(b[0, 0], b[0, 1], b[1, 0])

Arrays of zeros can be created quickly and easily with `np.zeros()`.

In [None]:
np.zeros((2, 2))

Similarly, arrays of ones can be created with `np.ones()`.

In [None]:
np.ones((1, 2))

Constant arrays can be created using `np.full()`, or alternatively using `np.ones()` and multiplying by the constant.

In [None]:
np.full((2, 2), 7)

In [None]:
7 * np.ones((2, 2), dtype=np.int64)

Identity matrices can be created using `np.eye()`.

In [None]:
np.eye(2)

Arrays can also be created whose entries are drawn uniformly from the half-open interval [0, 1).

In [None]:
np.random.random((2, 2))

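Evenly spaced arrays can be created using `np.linspace()` and `np.arange()`. Note that the two functions work differently: `linspace` takes the number of points and includes the endpoint by default, while `arange` takes a step size and excludes the stop value.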

In [None]:
np.linspace(0, 10, 5)

In [None]:
np.arange(0, 10, 2.5)
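To make the difference between the two functions concrete, here is a minimal side-by-side sketch using the same arguments as the cells above. The names `points` and `steps` are illustrative.

```python
import numpy as np

# linspace: ask for 5 points; both endpoints 0 and 10 are included.
points = np.linspace(0, 10, 5)

# arange: ask for a step of 2.5; the stop value 10 is excluded.
steps = np.arange(0, 10, 2.5)

print(points)  # 5 values: 0, 2.5, 5, 7.5, 10
print(steps)   # 4 values: 0, 2.5, 5, 7.5
```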

## Array indexing

One of the benefits of working with numpy is its powerful indexing features.

In [None]:
a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

In [None]:
a

Entries can be retrieved by indexing into the dimensions of the array.

In [None]:
a[0, 1]

If only one index is provided, it indexes the first dimension.

In [None]:
a[0]

## Array slicing

We can explicitly retrieve all entries in one dimension by using `:`. The following is equivalent to the last expression:

In [None]:
a[0, :]

Retrieve a single column:

In [None]:
a[:, 2]

Note that slicing returns a view of the original array, so modifying the slice also modifies the original:

In [None]:
b = a[:2, 1:3]
print(b)

In [None]:
b[0, 0] = 77
print(a[0, 1])
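If the original array should stay untouched, one option is to take an explicit copy of the slice. The sketch below contrasts a view with a copy; `view` and `independent` are names introduced here for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

view = a[:2, 1:3]                # a slice: shares memory with a
independent = a[:2, 1:3].copy()  # an explicit copy: owns its data

independent[0, 0] = 77           # does not touch a
print(a[0, 1])                   # 2

print(np.shares_memory(a, view))         # True
print(np.shares_memory(a, independent))  # False
```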

In general, slicing in NumPy has the structure `[start:stop:step]`, where the element at `stop` is not included.

In [None]:
a = np.arange(1, 11, dtype=int)
a

In [None]:
# Get the first two elements of a
a[:2]

In [None]:
# Get the numbers 3, 4 and 5
a[2:5]

In [None]:
# Get the even numbers (every second element, starting at index 1)
a[1::2]
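A few more slices on the same array may help fix the `start`, `stop` and `step` roles, including negative values. This is only a sketch; the variable names are illustrative.

```python
import numpy as np

a = np.arange(1, 11, dtype=int)  # 1, 2, ..., 10

last_three = a[-3:]    # negative start counts from the end: 8, 9, 10
odd_numbers = a[::2]   # every second element from index 0: 1, 3, 5, 7, 9
reversed_a = a[::-1]   # negative step walks backwards: 10, 9, ..., 1

print(last_three)
print(odd_numbers)
print(reversed_a)
```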

Other arrays can also be used for indexing.

In [None]:
a = np.arange(12).reshape((4, 3))
print(a)

In [None]:
b = np.array([0, 2, 0, 1])
print(a[np.arange(4), b])

In [None]:
a[np.arange(4), b] += 10
print(a)
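Indexing with two integer arrays pairs them up element by element, so `a[np.arange(4), b]` selects one entry per row. The sketch below checks this against an explicit loop; `picked` and `picked_loop` are illustrative names.

```python
import numpy as np

a = np.arange(12).reshape((4, 3))
b = np.array([0, 2, 0, 1])

# Integer-array indexing pairs row index i with column index b[i] ...
picked = a[np.arange(4), b]

# ... which is the same as collecting a[i, b[i]] for each row i.
picked_loop = np.array([a[i, b[i]] for i in range(4)])

print(picked)       # [ 0  5  6 10]
print(picked_loop)  # [ 0  5  6 10]
```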

#### Filtering array elements

In [None]:
a = np.array([[1, 2], [3, 4], [5, 6]])
print(a)
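bool_idx = (a > 2)  # boolean mask: True where the element is greater than 2
print(bool_idx)
print(a[bool_idx])  # 1D array of the elements where the mask is True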
print(a[a > 2])
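A boolean mask can also be used on the left-hand side of an assignment, and `np.where` offers a way to choose between two values per element. The following is a minimal sketch with illustrative variable names.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

# Assign through a mask: set every element greater than 2 to zero.
zeroed = a.copy()
zeroed[zeroed > 2] = 0
print(zeroed)

# np.where keeps the original shape: take a where the condition holds, else 0.
kept = np.where(a > 2, a, 0)
print(kept)
```

Unlike `a[a > 2]`, which flattens the result to one dimension, both variants here preserve the shape of `a`.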

## Datatypes

NumPy arrays have a fixed element type, which can be displayed using the `.dtype` attribute.

In [None]:
x = np.array([1, 2])
print(x.dtype)

In [None]:
x = np.array([1.0, 2.0])
print(x.dtype)

In [None]:
x = np.array([1, 2], dtype=np.int64)
print(x.dtype)
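An existing array can be converted to another type with `.astype()`, which returns a new array. The sketch below shows that converting floats to integers truncates toward zero, and that true division of integer arrays yields floats. Variable names are illustrative.

```python
import numpy as np

floats = np.array([1.7, -1.7, 2.0])

# astype returns a new array; float -> int drops the fractional part.
as_int = floats.astype(np.int64)
print(as_int)         # [ 1 -1  2]
print(as_int.dtype)   # int64

# True division of an integer array produces a float64 result.
ints = np.array([1, 2], dtype=np.int64)
halves = ints / 2
print(halves.dtype)   # float64
```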

## Mathematical operations

In general, mathematical operations on arrays are done elementwise.

In [None]:
x = np.array([[1, 2], [3, 4]], dtype=np.float64)
y = np.array([[5, 6], [7, 8]], dtype=np.float64)
print(x + y)
print(np.add(x, y))

In [None]:
print(x - y)
print(np.subtract(x, y))

In [None]:
print(x * y)
print(np.multiply(x, y))

In [None]:
print(x / y)
print(np.divide(x, y))

In [None]:
print(np.sqrt(x))

### Matrix and vector products

In [None]:
x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])

v = np.array([9, 10])
w = np.array([11, 12])

print(v.dot(w))
print(np.dot(v, w))

In [None]:
print(x.dot(v))
print(np.dot(x, v))

In [None]:
print(x.dot(y))
print(np.dot(x, y))
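For 1D and 2D arrays, the `@` operator gives the same results as `dot`, and it is worth contrasting with `*`, which stays elementwise. A short sketch using the arrays from above:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])
v = np.array([9, 10])
w = np.array([11, 12])

inner = v @ w          # inner product of two vectors: 9*11 + 10*12
matrix_product = x @ y # matrix product
elementwise = x * y    # elementwise product, not a matrix product

print(inner)           # 219
print(matrix_product)
print(elementwise)
```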

### Array sums and means

In [None]:
x = np.array([[1, 2], [3, 4]])
print(x)
print(np.sum(x))
print(np.sum(x, axis=0))
print(np.sum(x, axis=1))

In [None]:
x.mean(axis=0)
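The `axis` argument names the dimension that is collapsed: `axis=0` reduces over rows (one result per column) and `axis=1` reduces over columns (one result per row). As a sketch, the optional `keepdims=True` argument keeps the collapsed axis with length 1, which is convenient for normalizing each row. The names `row_sums` and `normalized` are illustrative.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])

column_sums = np.sum(x, axis=0)              # shape (2,): one sum per column
row_sums = np.sum(x, axis=1, keepdims=True)  # shape (2, 1): one sum per row

# Dividing by the (2, 1) array rescales each row so that it sums to 1.
normalized = x / row_sums

print(column_sums)             # [4 6]
print(row_sums.shape)          # (2, 1)
print(normalized.sum(axis=1))
```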

### Transpose

In [None]:
x = np.array([[1, 2], [3, 4]])
print(x)
print(x.T)

In [None]:
v = np.array([1, 2, 3])
print(v)
print(v.T)
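As the last cell shows, transposing a 1D array leaves it unchanged, because there is only one axis to permute. If an explicit row or column vector is needed, one approach is to add an axis first, as in this sketch:

```python
import numpy as np

v = np.array([1, 2, 3])

print(v.T.shape)           # (3,): still 1D

column = v[:, np.newaxis]  # shape (3, 1): a column vector
row = v[np.newaxis, :]     # shape (1, 3): a row vector

print(column.shape)        # (3, 1)
print(row.T.shape)         # (3, 1): transposing a 2D row gives a column
```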

## Universal functions (Ufuncs)

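Universal functions (ufuncs) apply an operation element by element to whole arrays, which makes them useful for mathematical and statistical work with NumPy arrays. Reductions such as `np.max` and `np.mean` are not ufuncs in the strict sense, but they are vectorized in the same way.

NumPy ufuncs are significantly faster than equivalent pure-Python code, because the loop over the elements runs in compiled code rather than in the Python interpreter.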

In [None]:
x = np.arange(1, 11, dtype=int)
x

In [None]:
np.max(x)

In [None]:
np.mean(x)

In [None]:
np.power(x, 4)

In [None]:
print(np.sin(x))
print(np.tan(x))

In [None]:
np.square(np.sin(x)) + np.square(np.cos(x))
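To see what a ufunc replaces, the sketch below computes the sine of each element once with `np.sin` and once with an explicit Python loop over `math.sin`. Both give the same values; the ufunc simply moves the loop out of Python. The names `vectorized` and `looped` are illustrative.

```python
import math

import numpy as np

x = np.arange(1, 11, dtype=float)

# Ufunc: one call, the loop over elements happens inside NumPy.
vectorized = np.sin(x)

# Pure Python: an explicit loop calling math.sin on each element.
looped = np.array([math.sin(value) for value in x])

print(np.allclose(vectorized, looped))  # True
```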

## Broadcasting

Broadcasting is a powerful and important concept when working with numpy arrays. In fact, we have already seen an example of broadcasting earlier:

In [None]:
7 * np.ones((2, 2), dtype=np.int64)

The multiplication is 'broadcast' across all elements of the 2x2 array.

In [None]:
x = np.arange(3)[:, np.newaxis]
print(x.shape)
x

In [None]:
y = np.arange(1, 10).reshape((3, 3))
y

In [None]:
print(np.multiply(y, x))
print(y * x)

In [None]:
v = np.array([1, 2, 3])
print(v)
w = np.array([4, 5])
print(w)
print(np.reshape(v, (3, 1)) * w)
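Broadcasting works when, comparing shapes from the last dimension backwards, each pair of dimensions is either equal or one of them is 1. The sketch below checks the resulting shape for the example above and shows that the unreshaped vectors are not compatible. The flag `shapes_incompatible` is a name introduced here for illustration.

```python
import numpy as np

v = np.array([1, 2, 3])
w = np.array([4, 5])

# (3, 1) against (2,) broadcasts to (3, 2).
column = v.reshape(3, 1)
print(np.broadcast(column, w).shape)  # (3, 2)

# (3,) against (2,) cannot be broadcast: 3 != 2 and neither is 1.
shapes_incompatible = False
try:
    v * w
except ValueError:
    shapes_incompatible = True
print(shapes_incompatible)            # True
```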

## Masking, Comparing and Sorting

In [None]:
# Create an array of 10 random integers from 1 to 4 (the upper bound 5 is exclusive)
x = np.random.randint(1, 5, 10)
x

In [None]:
# Create a (3, 3) array of random integers from 1 to 4 (the upper bound 5 is exclusive)
y = np.random.randint(1, 5, (3, 3))
y

In [None]:
# Sort elements in array x
np.sort(x)

In [None]:
# Sort along axis 0: each column is sorted independently
np.sort(y, axis=0)

In [None]:
# Sort along axis 1: each row is sorted independently
np.sort(y, axis=1)
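Because the arrays above are random, the effect of `axis` can be hard to read off. Here is a sketch with a fixed array (chosen here purely for illustration) so that the two results can be compared directly.

```python
import numpy as np

fixed = np.array([[3, 1, 2],
                  [1, 4, 1],
                  [2, 2, 3]])

sorted_down = np.sort(fixed, axis=0)    # each column sorted top to bottom
sorted_across = np.sort(fixed, axis=1)  # each row sorted left to right

print(sorted_down)
print(sorted_across)
```

Note that `np.sort` returns a sorted copy and leaves `fixed` unchanged.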

In [None]:
# Comparison operators (==, !=, <, >, >=, <=) work elementwise on arrays
# and return a boolean array
x > 3

In [None]:
# Use the boolean array as a mask to select the values that satisfy the comparison
x[x>3]

In [None]:
# Another example: combine two conditions with & (the parentheses are required)
x[(x <= 3) & (x > 1)]
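Masks can also be combined with `|` (or) and inverted with `~` (not), and counting the `True` entries tells how many elements match. The sketch below uses a fixed array, chosen here for illustration, instead of random values.

```python
import numpy as np

values = np.array([1, 4, 2, 3, 4, 1])

extremes = values[(values < 2) | (values > 3)]  # smaller than 2 OR larger than 3
not_large = values[~(values > 3)]               # NOT larger than 3
n_large = np.count_nonzero(values > 3)          # how many are larger than 3

print(extremes)   # [1 4 4 1]
print(not_large)  # [1 2 3 1]
print(n_large)    # 2
```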

## Sparse matrices

Sometimes we deal with datasets where most of the features are zero for each data example. This leads to a *sparse matrix*, where there are very few nonzero elements. It is useful to store these in a special sparse format where only the nonzero elements (and their matrix positions) are stored, to avoid memory problems.

In [None]:
from scipy import sparse

In [None]:
# Dense matrix
A = np.array([[1, 0, 0, 1, 0, 0], [0, 0, 2, 0, 0, 1], [0, 0, 0, 2, 0, 0]])
A

In [None]:
S = sparse.csr_matrix(A)
S

In [None]:
print(S)

We can see that only the nonzero entries (and their positions) are saved in the sparse matrix.

In [None]:
S.todense()  # Convert back to a dense matrix
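To see what the CSR format actually stores, one can inspect the three arrays behind the matrix. This is a sketch that assumes SciPy is installed: `data` holds the nonzero values row by row, `indices` holds their column positions, and `indptr` marks where each row starts in `data`.

```python
import numpy as np
from scipy import sparse

A = np.array([[1, 0, 0, 1, 0, 0], [0, 0, 2, 0, 0, 1], [0, 0, 0, 2, 0, 0]])
S = sparse.csr_matrix(A)

print(S.nnz)      # 5 stored nonzero values
print(S.data)     # the nonzero values, row by row
print(S.indices)  # the column index of each stored value
print(S.indptr)   # row i occupies data[indptr[i]:indptr[i + 1]]

# toarray() converts back to a regular ndarray.
print(np.array_equal(S.toarray(), A))  # True
```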

The available sparse matrix types include the following:

1. csc_matrix: Compressed Sparse Column format
2. csr_matrix: Compressed Sparse Row format
3. lil_matrix: List of Lists format
4. dok_matrix: Dictionary of Keys format
5. coo_matrix: Coordinate format