## Reshaping arrays



NumPy lets us change the shape of an array with `reshape`, as long as the total number of elements stays the same



In [None]:
import numpy as np
arr = np.random.randn(3, 4)
arr

In [None]:
arr2 = arr.reshape((4, 3))
arr2

Why do the reshaped values follow that order?

![img](images/reshape_layout.png)
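By default, `reshape` reads elements out of the original array and writes them into the new shape in row-major (C) order, where the last index changes fastest. Passing `order='F'` switches to column-major (Fortran) order. The small `(2, 3)` array below is chosen only for illustration. The sketch also shows that one dimension may be given as `-1`, in which case NumPy infers it:

```python
import numpy as np

a = np.arange(6).reshape((2, 3))   # [[0, 1, 2], [3, 4, 5]]

# Row-major (C) order, the default: fill each row before moving to the next
c = a.reshape((3, 2), order='C')   # [[0, 1], [2, 3], [4, 5]]

# Column-major (Fortran) order: read and fill down the columns first
f = a.reshape((3, 2), order='F')   # [[0, 4], [3, 2], [1, 5]]

# One dimension may be -1; NumPy infers it from the total size
flat = a.reshape(-1)               # [0, 1, 2, 3, 4, 5]
print(c, f, flat, sep='\n')
```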



In [None]:
# The new shape can also be taken from another array's `shape` attribute
arr.reshape(arr2.shape)

## Transposing and swapping axes



Transposing is another way to rearrange an array: it swaps the axes (rows become columns) and returns a view of the same data without copying it



In [None]:
arr.T
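Transposing is not the same as reshaping to the swapped shape. For a `(3, 4)` array both give a `(4, 3)` result, but `.T` moves element `[i, j]` to position `[j, i]`, while `reshape` keeps the row-major reading order. A quick check on a small example array:

```python
import numpy as np

a = np.arange(12).reshape((3, 4))

t = a.T                    # element a[i, j] ends up at t[j, i]
r = a.reshape((4, 3))      # same elements, read row by row into the new shape

print(t.shape, r.shape)            # both (4, 3)
print(np.array_equal(t, r))        # False: the layouts differ
print(t[1, 2] == a[2, 1])          # True
print(np.shares_memory(t, a))      # True: .T returns a view, not a copy
```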

For higher-dimensional arrays, `transpose` accepts a tuple of axis numbers that gives the new order of the axes



In [None]:
arr3 = np.arange(16).reshape((2, 2, 4))
arr3

In [None]:
arr3.transpose((2, 1, 0))
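The tuple lists the old axes in their new order. With `(2, 1, 0)`, old axis 2 becomes axis 0 and old axis 0 becomes axis 2, so the shape goes from `(2, 2, 4)` to `(4, 2, 2)` and element `[i, j, k]` moves to `[k, j, i]`. The sketch below verifies this element by element:

```python
import numpy as np

arr3 = np.arange(16).reshape((2, 2, 4))
t = arr3.transpose((2, 1, 0))
print(t.shape)   # (4, 2, 2)

# Every element arr3[i, j, k] should appear at t[k, j, i]
ok = all(
    arr3[i, j, k] == t[k, j, i]
    for i in range(2) for j in range(2) for k in range(4)
)
print(ok)        # True
```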

For a 2-D array, `.T` is a special case of swapping axes: the `swapaxes` method takes a pair of axis numbers and exchanges them, again returning a view



In [None]:
arr3.swapaxes(1, 2)
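`swapaxes(i, j)` exchanges just two axes, so it is equivalent to a `transpose` whose permutation swaps those two entries and leaves the rest in place. The following sketch checks both the 2-D and the 3-D case:

```python
import numpy as np

arr = np.arange(12).reshape((3, 4))
arr3 = np.arange(16).reshape((2, 2, 4))

# 2-D: .T and swapping axes 0 and 1 give the same result
print(np.array_equal(arr.T, arr.swapaxes(0, 1)))                        # True

# 3-D: swapping axes 1 and 2 equals the permutation (0, 2, 1)
print(np.array_equal(arr3.swapaxes(1, 2), arr3.transpose((0, 2, 1))))   # True
print(arr3.swapaxes(1, 2).shape)                                        # (2, 4, 2)
```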

## Concatenating arrays



`np.concatenate` takes a sequence of arrays and joins them along an existing axis, given by the `axis` argument (default 0)



In [None]:
arr1 = np.array([[1, 2, 3], [4, 5, 6]])
arr2 = np.array([[7, 8, 9], [10, 11, 12]])
print(arr1)
print(arr2)

In [None]:
np.concatenate([arr1, arr2], axis=0)

In [None]:
np.concatenate([arr1, arr2], axis=1)
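The `axis` argument selects the dimension that grows; all other dimensions must match. With the two `(2, 3)` arrays above, the shapes work out as follows, and a mismatch raises an error:

```python
import numpy as np

arr1 = np.array([[1, 2, 3], [4, 5, 6]])
arr2 = np.array([[7, 8, 9], [10, 11, 12]])

rows = np.concatenate([arr1, arr2], axis=0)   # stack rows: shape (4, 3)
cols = np.concatenate([arr1, arr2], axis=1)   # join side by side: shape (2, 6)
print(rows.shape, cols.shape)

# Mismatched sizes along a non-concatenation axis raise ValueError
try:
    np.concatenate([arr1, np.ones((2, 2))], axis=0)
except ValueError as err:
    print("ValueError:", err)
```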

`np.vstack` and `np.hstack` are convenience functions for the two most common cases: stacking arrays vertically (row-wise) and horizontally (column-wise)



In [None]:
np.vstack((arr1, arr2))

In [None]:
np.hstack((arr1, arr2))
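For 2-D inputs, `np.vstack` matches `np.concatenate(..., axis=0)` and `np.hstack` matches `axis=1`. One difference shows up with 1-D inputs: `hstack` joins them end to end, while `vstack` treats each one as a row. A short check:

```python
import numpy as np

arr1 = np.array([[1, 2, 3], [4, 5, 6]])
arr2 = np.array([[7, 8, 9], [10, 11, 12]])

print(np.array_equal(np.vstack((arr1, arr2)), np.concatenate([arr1, arr2], axis=0)))  # True
print(np.array_equal(np.hstack((arr1, arr2)), np.concatenate([arr1, arr2], axis=1)))  # True

a, b = np.array([1, 2]), np.array([3, 4])
h = np.hstack((a, b))   # shape (4,): 1, 2, 3, 4
v = np.vstack((a, b))   # shape (2, 2): rows [1, 2] and [3, 4]
print(h.shape, v.shape)
```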

## Fast element-wise array functions



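-   We have already seen functions such as `np.sqrt` that operate on every element of an array
-   These element-wise functions are called *universal functions*, or **ufuncs**
-   *Unary* ufuncs (e.g. `np.sqrt`, `np.exp`) take one array and *binary* ufuncs (e.g. `np.add`, `np.maximum`) take two; most return a single array, though a few (e.g. `np.modf`) return more than one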



In [None]:
x = np.random.randn(8)
y = np.random.randn(8)
print(x)
print(y)
np.maximum(x, y)
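A few ufuncs side by side, using small fixed arrays so the results can be checked by hand: `np.sqrt` is unary, `np.add` and `np.maximum` are binary (`x + y` on arrays gives the same result as `np.add(x, y)`), and `np.modf` returns two arrays, the fractional and integral parts:

```python
import numpy as np

x = np.array([1.0, 4.0, 9.0])
y = np.array([2.0, 3.0, 10.0])

print(np.sqrt(x))            # unary: 1, 2, 3
print(np.add(x, y))          # binary, same as x + y: 3, 7, 19
print(np.maximum(x, y))      # binary, element-wise maximum: 2, 4, 10

frac, whole = np.modf(np.array([1.5, -2.25]))
print(frac, whole)           # fractional parts 0.5, -0.25; integral parts 1, -2
```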

Ufuncs accept an optional `out` argument that tells them where to write the result; passing the input array itself makes the operation in-place



In [None]:
# Take absolute values: the square root of a negative number gives nan (with a warning)
arr1 = np.abs(np.random.randn(3, 4))
np.sqrt(arr1)

In [None]:
np.sqrt(arr1, out=arr1)
arr1
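`out` does not have to be the input itself: any existing array with a suitable shape and dtype works, which avoids allocating a new result array. The ufunc also returns that same `out` array. A minimal sketch with a preallocated buffer (`buf` is just an illustrative name):

```python
import numpy as np

a = np.array([1.0, 4.0, 9.0, 16.0])
buf = np.empty_like(a)          # preallocated output buffer, same shape/dtype as a

res = np.sqrt(a, out=buf)       # result is written into buf
print(res is buf)               # True: the returned array is the buffer
print(buf)                      # 1, 2, 3, 4
print(a)                        # the input is unchanged
```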

![img](images/ufuncs1.png)


![img](images/ufuncs2.png)

![img](images/ufuncs3.png)

## Mathematical operations



Mathematical functions that compute statistics over an array, such as `mean`, `sum` and `std`, are available both as functions in the NumPy module and as methods of the array object



In [None]:
arr = np.random.randn(5, 4)
arr

In [None]:
np.mean(arr)

In [None]:
arr.mean()
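Most of these functions also take an `axis` argument: `axis=0` reduces down the rows (one result per column) and `axis=1` reduces across the columns (one result per row). Accumulating functions such as `cumsum` keep the shape instead of reducing it. The sketch uses a small fixed array so the results are easy to verify:

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4],
              [5, 6]])

print(a.mean())              # 3.5: mean of all elements
print(a.mean(axis=0))        # column means: 3.0, 4.0
print(a.sum(axis=1))         # row sums: 3, 7, 11
print(a.cumsum(axis=0))      # running sums down each column, shape stays (3, 2)
print(np.std(a) == a.std())  # True: function and method forms agree
```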

![img](images/numpy_stats.png)



## Sorting arrays



Arrays can be sorted in place with the `sort` method, or you can get a sorted copy with `np.sort`



In [None]:
arr = np.random.randn(6)
arr

In [None]:
arr.sort()
arr
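The two forms differ in what they modify: `np.sort` returns a sorted copy and leaves its input alone, while the `sort` method reorders the array in place and returns `None`. A small check:

```python
import numpy as np

a = np.array([3, 1, 2])

b = np.sort(a)       # sorted copy; a is still [3, 1, 2]
print(a, b)

result = a.sort()    # sorts a itself
print(result, a)     # None, and a is now [1, 2, 3]
```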

In [None]:
arr = np.random.randn(5, 3)
arr

In [None]:
arr.sort(1)  # sort each row (along axis 1) in place
arr

`np.argsort` returns the indices that would sort the array



In [None]:
arr = np.random.randn(10)
np.argsort(arr)
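Indexing an array with the result of `argsort` puts it in sorted order, which is handy for ordering one array by the values of another. The arrays `scores` and `names` below are made-up examples:

```python
import numpy as np

scores = np.array([0.3, 0.9, 0.1])
names = np.array(['a', 'b', 'c'])

idx = np.argsort(scores)        # indices that would sort scores: 2, 0, 1
print(scores[idx])              # same as np.sort(scores): 0.1, 0.3, 0.9
print(names[idx])               # names ordered by score: 'c', 'a', 'b'
```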

`np.argmax` and `np.argmin` return the index of the maximum or minimum element (the first one, in case of ties)



In [None]:
np.argmax(arr)
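On a multidimensional array, `np.argmax` without an `axis` returns an index into the *flattened* array. `np.unravel_index` converts it back to a row/column pair, and passing `axis` gives one index per row or column instead:

```python
import numpy as np

a = np.array([[1, 7, 3],
              [9, 2, 5]])

flat_idx = np.argmax(a)                          # 3: position of 9 in a.ravel()
row, col = np.unravel_index(flat_idx, a.shape)   # (1, 0)
print(flat_idx, row, col)                        # 3 1 0
print(np.argmax(a, axis=1))                      # per-row argmax: [1 0]
print(np.argmin(a, axis=0))                      # per-column argmin: [0 1 0]
```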

## Broadcasting



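Broadcasting describes how NumPy performs arithmetic between arrays of different shapes. Like other vectorized operations, it replaces explicit Python loops and usually makes code much faster. The simplest case combines an array with a scalar, which is applied to every element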



In [None]:
arr = np.arange(7)
arr * 4 + arr

What happens when array shapes don't match?



In [None]:
arr = np.random.randn(4, 3)
arr

Let's subtract the column means



In [None]:
arr.mean(0)

In [None]:
arrm = arr - arr.mean(0)
print(arrm)
print(arrm.mean(0))

![img](images/broadcast1.png)

Broadcasting works in higher dimensions as well
![img](images/broadcast2.png)

Let's try to subtract the row means this time

In [None]:
arr.mean(1)

In [None]:
try:
    arr - arr.mean(1)  # shapes (4, 3) and (4,) are not compatible
except ValueError as err:
    print(err)

The broadcasting rule
> Two arrays are compatible for broadcasting if for each trailing dimension (i.e., starting
> from the end) the axis lengths match or if either of the lengths is 1. Broadcasting is
> then performed over the missing or length 1 dimensions.
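The rule can be applied by hand: line the shapes up from the right and compare each pair of lengths. The helper below (`broadcast_shape` is a hypothetical name written for this sketch, not a NumPy function) implements exactly that rule, and its result is compared with NumPy's own `np.broadcast_shapes`, which assumes NumPy 1.20 or later:

```python
import numpy as np
from itertools import zip_longest

def broadcast_shape(s1, s2):
    """Hypothetical helper: apply the trailing-dimension rule to two shapes."""
    result = []
    # Walk both shapes from the last axis backwards; missing axes count as length 1
    for a, b in zip_longest(reversed(s1), reversed(s2), fillvalue=1):
        if a == b or a == 1 or b == 1:
            # A length-1 axis is stretched to match the other length
            result.append(b if a == 1 else a)
        else:
            raise ValueError(f"incompatible shapes {s1} and {s2}")
    return tuple(reversed(result))

print(broadcast_shape((4, 3), (3,)))        # (4, 3): column means broadcast
print(broadcast_shape((4, 3), (4, 1)))      # (4, 3): row means as a column
print(np.broadcast_shapes((4, 3), (4, 1)))  # (4, 3): NumPy agrees
try:
    broadcast_shape((4, 3), (4,))           # row means without reshaping
except ValueError as err:
    print(err)
```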

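To subtract the row means, reshape them into a column of shape `(4, 1)`; the trailing lengths are then 3 and 1, which the rule allows



In [None]:
arr - arr.mean(1).reshape((4, 1))

Calling `reshape` can be tedious, so NumPy offers a more convenient syntax: indexing with `np.newaxis` inserts a new axis of length 1

In [None]:
arr = np.zeros((4, 4))
arr3 = arr[:, np.newaxis, :]
arr3.shape  # (4, 1, 4)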

Suppose we had a 3-D array and wanted to subtract the mean along the z-axis (the last axis, axis 2)



In [None]:
arr = np.random.randn(3, 4, 5)
arrm = arr.mean(2)
arrm.shape

In [None]:
arrd = arr - arrm[:, :, np.newaxis]
arrd.mean(2)

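-   Is there a generic way to do this for any axis?
-   Python's built-in `slice` object: `slice(None)` is equivalent to a bare `:` in an index, so index tuples can be built programmatically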



In [None]:
slice(None)

In [None]:
indexer = [slice(None)] * 3
arr[tuple(indexer)]
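Because `slice(None)` stands for `:`, the index can be assembled as a list, edited for one axis, and then converted to a tuple. A short sketch on a `(3, 4, 5)` random array, checking the equivalences against ordinary indexing:

```python
import numpy as np

arr = np.random.randn(3, 4, 5)

indexer = [slice(None)] * arr.ndim      # equivalent to arr[:, :, :]
print(np.array_equal(arr[tuple(indexer)], arr))             # True

indexer[1] = 0                          # now equivalent to arr[:, 0, :]
print(np.array_equal(arr[tuple(indexer)], arr[:, 0, :]))    # True
print(arr[tuple(indexer)].shape)                            # (3, 5)
```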

**Exercise!**
Write a function that takes an array and an integer `axis` as arguments and subtracts the mean along that axis



## Array-oriented programming



-   We saw how vectorized operations let us apply functions to whole arrays as if we were working with scalars
-   As an example, suppose we want to evaluate $\sqrt{x^2 + y^2}$ (the Euclidean distance from the origin) over a regular grid of $(x, y)$ coordinates



In [None]:
points = np.arange(-5, 5, 0.01) # 1000 equally spaced points
xs, ys = np.meshgrid(points, points)
ys
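`np.meshgrid` takes two 1-D coordinate arrays and returns two 2-D arrays: `xs` repeats the x values along every row and `ys` repeats the y values down every column, so `(xs[i, j], ys[i, j])` is one grid point. A tiny grid makes the pattern visible:

```python
import numpy as np

xs, ys = np.meshgrid(np.array([1, 2, 3]), np.array([10, 20]))
print(xs)        # rows: [1 2 3] and [1 2 3]
print(ys)        # rows: [10 10 10] and [20 20 20]
print(xs.shape)  # (2, 3): len(y) rows, len(x) columns (default indexing='xy')
```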

In [None]:
z = np.sqrt(xs**2 + ys**2)

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.imshow(z, cmap=plt.cm.gray)
plt.colorbar();

We will explore visualization later in the semester in more detail!



## Conditional logic as array operations



In [None]:
xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
cond = np.array([True, False, True, True, False])

Suppose we wanted to take a value from `xarr` whenever the corresponding value in
`cond` is `True`, and otherwise take the value from `yarr`



In [None]:
result = [(x if c else y) for x, y, c in zip(xarr, yarr, cond)]
result

-   The list comprehension runs in pure Python, so it is slow for large arrays
-   It also does not work with multidimensional arrays, because `zip` only iterates over the first axis



In [None]:
result = np.where(cond, xarr, yarr)
result

The second and third arguments of `np.where` don't have to be arrays: scalars work too and are broadcast



In [None]:
arr = np.random.randn(4, 4)
arr

In [None]:
np.where(arr > 0, 2, arr+1.2)

Called with only a condition, `np.where` returns the indices where the condition is `True`, as a tuple with one index array per dimension



In [None]:
np.where(arr > 0)
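With a 2-D condition, `np.where` returns two index arrays, rows and columns. Indexing with them selects the same elements as a boolean mask, and zipping them lists the coordinates. A small fixed example:

```python
import numpy as np

a = np.array([[ 1, -2],
              [-3,  4]])

rows, cols = np.where(a > 0)
print(rows, cols)                                # [0 1] [0 1]
print(list(zip(rows.tolist(), cols.tolist())))   # [(0, 0), (1, 1)]
print(np.array_equal(a[rows, cols], a[a > 0]))   # True: both select 1 and 4
```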

## File I/O with Numpy



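`np.save` writes an array to disk in NumPy's binary `.npy` format (appending the `.npy` extension if the file name lacks it), and `np.load` reads it back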



In [None]:
arr = np.arange(10)
np.save('some_array', arr)
np.load('some_array.npy')

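`np.savez` saves several arrays into a single uncompressed `.npz` archive, with each keyword argument becoming a key; `np.savez_compressed` works the same way but compresses the archive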



In [None]:
np.savez('array_archive.npz', a=arr, b=arr)
np.load('array_archive.npz')
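Loading an `.npz` file returns a dict-like `NpzFile` rather than an array; each array is read when it is accessed by the keyword name used when saving. The sketch below uses `np.savez_compressed` (same interface as `np.savez`) and opens the archive in a `with` block so the file is closed afterwards; the file name is illustrative:

```python
import numpy as np

arr = np.arange(10)
np.savez_compressed('arrays_compressed.npz', a=arr, b=arr * 2)

with np.load('arrays_compressed.npz') as archive:
    keys = archive.files        # names of the stored arrays: 'a' and 'b'
    b = archive['b']            # the array is read from the file here
print(keys)
print(b)                        # 0, 2, 4, ..., 18
```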

## Random number generation



The `np.random` module can draw samples from many probability distributions



In [None]:
samples = np.random.normal(0, 1, (4,4))
samples
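Draws change on every run unless the random generator is seeded. Newer NumPy code (version 1.17 and later) usually creates a `Generator` with `np.random.default_rng(seed)` and calls the distribution methods on it. The seed `42` below is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(42)
samples = rng.normal(0, 1, size=(4, 4))   # same distribution as np.random.normal(0, 1, (4, 4))
coin = rng.integers(0, 2, size=10)        # integers in [0, 2): each is 0 or 1

# Re-creating the generator with the same seed reproduces the draws
rng2 = np.random.default_rng(42)
print(np.array_equal(samples, rng2.normal(0, 1, size=(4, 4))))   # True
```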

Partial list of `np.random` functions
![img](images/numpy_random.png)



## Example: Random walk



> A random walk is a stochastic process that describes a path that consists of a succession of random steps on some mathematical space such as the integers. An elementary example of a random walk is the random walk on the integer number line, $\mathbb{Z}$, which starts at 0 and at each step moves +1 or −1 with equal probability.

1. Simulate a random walk of 1000 steps with two implementations (one with loops and one with Numpy)
2. How many steps did it take to first reach position 10 or -10 (the *first crossing time*)?
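A minimal sketch of the two implementations, assuming the ±1 steps are drawn up front so both versions walk the same path and can be compared; the seed and the names `rng` and `n_steps` are illustrative choices, not part of the exercise:

```python
import numpy as np

rng = np.random.default_rng(0)
n_steps = 1000
steps = rng.choice(np.array([-1, 1]), size=n_steps)   # each step is +1 or -1

# Loop version: accumulate the position one step at a time
position = 0
walk_loop = []
for s in steps:
    position += s
    walk_loop.append(position)

# NumPy version: the walk is the cumulative sum of the steps
walk = steps.cumsum()

# First crossing time: first index where |position| reaches 10
# (index k is the position after step k + 1). argmax returns the first True,
# but it also returns 0 when nothing is True, so check any() first.
hit = np.abs(walk) >= 10
first_crossing = int(hit.argmax()) if hit.any() else None
print(first_crossing)
```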



## Homework



1.  A Monte Carlo method to approximate $\pi$ involves randomly selecting points $(x_{i}, y_{i})$, $i = 1, \dots, n$, in the unit square (coordinates between 0 and 1) and computing the ratio $\frac{m}{n}$, where $m$ is the number of points satisfying $x_{i}^{2}+y_{i}^{2} \leq 1$ and $n$ is the sample size. The ratio should approximate $\pi / 4$. Write two implementations of this algorithm, one using loops and one using broadcasting, and compare them with `%timeit`.
2.  Simulate 5000 random walks (same configuration as in the class example) and calculate the average, maximum and minimum number of steps reached. What was the minimum crossing time for step 30 in either direction?
3.  Solve Problem #57 from Code Abbey ([https://www.codeabbey.com/index/task_view/smoothing-the-weather](https://www.codeabbey.com/index/task_view/smoothing-the-weather)).

