# Numpy

To import `numpy`, the convention is to do:

In [2]:
import numpy as np

## Getting help

In [3]:
np.lookfor('evenly spaced numbers')

Search results for 'evenly spaced numbers'
------------------------------------------
numpy.linspace
    Return evenly spaced numbers over a specified interval.
numpy.logspace
    Return numbers spaced evenly on a log scale.
numpy.arange
    Return evenly spaced values within a given interval.
numpy.ma.arange
    Return evenly spaced values within a given interval.


In [4]:
np.linspace?

In [5]:
np.arc*?

---

## Why?

Python's collections (e.g. `list`) can be heterogeneous -- e.g., can contain objects of different types:

In [6]:
[1, "cat", 15.13451, [1,2,3]]

[1, 'cat', 15.13451, [1, 2, 3]]

This is very convenient for many applications, but it means there is a lot of overhead if you simply have a list of numbers and want to, for example, apply the same operation to every element. The main feature of Numpy is an array object that represents a contiguous block of memory, much like arrays in C and other compiled languages. Arrays most commonly share a single data type across all elements (e.g., integer), although certain kinds of arrays may have mixed types. For homogeneous arrays, Python doesn't have to check the type of every element. For this and many other more detailed reasons, numerical operations on arrays are generally __much__ more efficient than on lists:

In [7]:
np.array([1,2,3,4,5,6])

array([1, 2, 3, 4, 5, 6])
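To get a feel for the efficiency claim, here is a rough sketch that doubles every number in a list and in an array and times both. The names `values_list` and `values_arr` and the problem size are just illustrative choices, and the timings depend entirely on your machine, so I don't quote any here:

```python
import timeit

import numpy as np

n = 200_000
values_list = list(range(n))
values_arr = np.arange(n)

# Pure-Python version: the interpreter loops over every element.
t_list = timeit.timeit(lambda: [v * 2 for v in values_list], number=5)

# Numpy version: a single vectorized operation on the whole array.
t_arr = timeit.timeit(lambda: values_arr * 2, number=5)

print("list: {:.4f} s   array: {:.4f} s".format(t_list, t_arr))
```

On most machines the array version should come out considerably faster, but run it yourself rather than taking my word for it.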

---

## The ndarray object

Numpy arrays are created by passing an iterable (like a Python `list`) to the numpy `array()` function:

In [8]:
some_array = np.array([1,2,3,4])

In [11]:
some_array

array([1, 2, 3, 4])

Numpy infers that the datatype is integer because all of the elements are integers. We can make this explicit by specifying a datatype with the `dtype` argument:

In [12]:
np.array([1,2,3,4,5,6], dtype=int)

array([1, 2, 3, 4, 5, 6])

In [13]:
np.array([1,2,3,4,5,6], dtype=float)

array([ 1.,  2.,  3.,  4.,  5.,  6.])

In [17]:
np.array([1, 2, 4.], dtype=np.float128) # np.float128 is not available on every platform; np.longdouble is the portable name

array([ 1.0,  2.0,  4.0], dtype=float128)

Complex (in a math sense) data types are supported as well:

In [18]:
a = np.array([1 + 6j, 2.3 - 11j])
a.dtype

dtype('complex128')

Numpy arrays can have an arbitrary number of dimensions. For example, we could create a 2D array by passing in a list of lists:

In [19]:
arr = np.array([[1,2], [3,4]])
arr

array([[1, 2],
       [3, 4]])

Some useful attributes for inspecting arrays:

In [20]:
arr.ndim # number of dimensions

2

In [21]:
arr.shape # length along each dimension

(2, 2)

In [22]:
arr.size # total number of elements

4

What about a 3 by 3 array?

$$\left(
    \begin{array}{ccc}
    1 & 2 & 3\\
    4 & 5 & 6\\
    7 & 8 & 9
    \end{array}
\right)
$$

In [23]:
arr = np.array([[1,2,3],
                [4,5,6],
                [7,8,9]])
arr

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [24]:
arr.ndim

2

In [25]:
arr.shape

(3, 3)

In [26]:
arr.size

9

A quick note: in Numpy terminology, dimensions are called _axes_, and they are numbered from zero -- so, the 2nd dimension is `axis=1`.

---

## Creating arrays

There are a number of useful convenience functions for creating arrays with various properties. For example, Numpy provides an array version of Python's built-in `range()` function called `arange()`

In [27]:
np.arange(5) # create numbers from 0 up to but not including 5

array([0, 1, 2, 3, 4])

The bonus to using `arange()` is that you can specify a float step:

In [29]:
np.arange(0, 10+0.25, 0.25) # numbers from 0 up to and including 10 with a spacing of 0.25 (the stop value, 10.25, is excluded)

array([  0.  ,   0.25,   0.5 ,   0.75,   1.  ,   1.25,   1.5 ,   1.75,
         2.  ,   2.25,   2.5 ,   2.75,   3.  ,   3.25,   3.5 ,   3.75,
         4.  ,   4.25,   4.5 ,   4.75,   5.  ,   5.25,   5.5 ,   5.75,
         6.  ,   6.25,   6.5 ,   6.75,   7.  ,   7.25,   7.5 ,   7.75,
         8.  ,   8.25,   8.5 ,   8.75,   9.  ,   9.25,   9.5 ,   9.75,  10.  ])

_It's good to remember that `arange()` creates elements up to but not including the `stop` value (10.25 above, which is why 10 is the last element)_

Another useful one to remember is `linspace()`, which generates uniformly spaced values between the start and end value. For example, 11 evenly spaced values from 0 to 1:

In [30]:
np.linspace(0, 1, 11) # 11 evenly spaced numbers between 0 and 1

array([ 0. ,  0.1,  0.2,  0.3,  0.4,  0.5,  0.6,  0.7,  0.8,  0.9,  1. ])

Yes, confusingly `linspace()` _includes_ the final value...
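Here is a small side-by-side sketch of the two endpoint conventions. The variable names are mine, and the `endpoint` keyword of `linspace()` is used to switch off the inclusive behavior:

```python
import numpy as np

# arange excludes the stop value; linspace includes it by default.
a = np.arange(0, 1, 0.25)
b = np.linspace(0, 1, 5)
print(a)  # last element is 0.75
print(b)  # last element is 1.0

# endpoint=False makes linspace leave out the final value, like arange.
c = np.linspace(0, 1, 4, endpoint=False)
print(np.allclose(a, c))
```

If you care about the exact number of points, `linspace()` is usually the safer choice, because a float step in `arange()` is subject to rounding.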

Another two that I use most frequently are `zeros()` and `ones()`, which generate arrays full of zeros or ones given a specified shape. For example, to create a 4 by 4 by 3 array of zeros:

In [31]:
arr = np.zeros((4,4,3))
arr.shape

(4, 4, 3)

If you need integer values, you can specify a data type with the `dtype` argument:

In [32]:
arr = np.zeros((4,4,3), dtype=int)
arr.dtype

dtype('int64')

The same holds for `ones()`:

In [33]:
arr = np.ones((4,4,3), dtype=float)
print(arr.shape, arr.dtype)

(4, 4, 3) float64


Both `zeros()` and `ones()` are useful for creating placeholder arrays that you will later fill with data. However, 0 and 1 are valid numbers! If your code doesn't check that you filled the array successfully, it might not fail where you expect it to, and strange things can happen. Numpy has a special floating-point value called NaN, which stands for "not a number." Any arithmetic operation involving NaN produces another NaN, which makes it obvious that something went wrong. For placeholder arrays, I often recommend adding NaN to an array of zeros so that it is full of invalid numbers. Note that the shape has to be passed to `zeros()`; `np.array([3,3])` just creates an array containing two 3s:

In [38]:
np.array([3,3])

array([3, 3])

In [41]:
np.zeros(shape=(3,3))

array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

In [37]:
placeholder = np.zeros([3,3]) + np.nan
placeholder

array([[ nan,  nan,  nan],
       [ nan,  nan,  nan],
       [ nan,  nan,  nan]])

In [35]:
placeholder * 4.

array([[ nan,  nan,  nan],
       [ nan,  nan,  nan],
       [ nan,  nan,  nan]])
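As an alternative sketch of the same idea, `np.full()` builds the NaN-filled placeholder in one step, and `np.isnan()` lets you check how many entries were never filled. The name `n_unfilled` is just for illustration:

```python
import numpy as np

# np.full creates an array of the given shape filled with one value.
placeholder = np.full((3, 3), np.nan)

# Fill only the first row with real data.
placeholder[0, :] = [1.0, 2.0, 3.0]

# np.isnan flags the entries that still hold NaN.
n_unfilled = np.isnan(placeholder).sum()
print(n_unfilled)
```

A check like this at the end of a fill loop is a cheap way to catch elements you forgot to set.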

### Random numbers

Numpy provides a subpackage, `np.random`, for generating (pseudo)random numbers. There are many useful functions for sampling from distributions:

In [45]:
np.random.uniform(0, 2*np.pi, size=(5,5)) # make a 5x5 array of numbers uniformly sampled between 0 and 2π

array([[ 1.64983458,  5.56830798,  0.10782421,  4.65861036,  5.61106194],
       [ 1.24371509,  1.52244186,  1.7324425 ,  1.09770356,  4.07706488],
       [ 3.52683269,  4.27775792,  4.51904762,  0.75217378,  3.83557969],
       [ 3.88934022,  1.25247229,  0.98708209,  2.21893213,  3.57225022],
       [ 2.98925828,  0.48521326,  4.40309958,  2.76361533,  3.42487584]])

In [50]:
np.set_printoptions(precision=3)

In [51]:
arr = np.random.normal(0., 0.5, size=4)
arr

array([-0.703,  0.724,  1.14 , -0.395])

In [48]:
# sample an array of 4 numbers, normally (Gaussian) distributed with mean=0, stddev=0.5
arr = np.random.normal(0., 0.5, size=4)
["{0:.3f}".format(x) for x in arr]

['0.140', '0.023', '0.719', '-0.550']
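If you need repeatable random numbers (for example, in tests), one option is a seeded generator object. This is a sketch that assumes Numpy 1.17 or later, where `np.random.default_rng()` is available; the seed value 42 and the variable names are arbitrary:

```python
import numpy as np

# Any integer seed gives a repeatable sequence; 42 is arbitrary.
rng = np.random.default_rng(42)
angles = rng.uniform(0, 2 * np.pi, size=(5, 5))
noise = rng.normal(0.0, 0.5, size=4)

# Re-creating the generator with the same seed reproduces the same draws.
rng_again = np.random.default_rng(42)
angles_again = rng_again.uniform(0, 2 * np.pi, size=(5, 5))
print(np.array_equal(angles, angles_again))
```

The generator methods take the same arguments as the `np.random.uniform()` and `np.random.normal()` functions used above.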

In [52]:
arr = np.array([1,2,3,4])
arr.shape

(4,)

---

## Array operations and differences with lists

Array operations act __element-wise__, unlike how `list`s behave. For example, multiplying a `list` by an integer will return a new list with the original replicated that many times. Multiplying an array of numbers by a scalar behaves like multiplying a vector by a scalar.

Again, multiplying a list by an integer, `n`, produces a new list with `n` copies of the original list:

In [53]:
print([1, 2]*5)

[1, 2, 1, 2, 1, 2, 1, 2, 1, 2]


Multiplying an array by a number acts element-wise:

In [55]:
np.array([1, 2]) * 5

array([ 5, 10])

$5 \times \left(\begin{array}{c} 1\\ 2 \end{array} \right) = \left(\begin{array}{c} 5\\ 10 \end{array} \right)$

Adding two lists will concatenate the two into a new list:

In [56]:
[1,2] + [3,4]

[1, 2, 3, 4]

Adding two arrays is like adding two vectors:

$\left(\begin{array}{c} 1\\ 2 \end{array} \right) + \left(\begin{array}{c} 3\\ 4 \end{array} \right) = \left(\begin{array}{c} 4\\ 6 \end{array} \right)$

In [57]:
np.array([1,2]) + np.array([3,4])

array([4, 6])

This works for comparison operators as well. For example, comparing an array of numbers with a single number (or with another array of the same shape) returns a new array of boolean values with the same shape as the original.

$\left(\begin{array}{c} 1\\ 2\\ 3\\ 4\\ 5 \end{array} \right) > 3 = \left(\begin{array}{c} \mathrm{False}\\ \mathrm{False}\\ \mathrm{False}\\ \mathrm{True}\\ \mathrm{True} \end{array} \right)$

In [58]:
5 > 3

True

In [59]:
np.array([1,2,3,4,5]) > 3

array([False, False, False,  True,  True], dtype=bool)

In [None]:
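def do_the_thing(a, b):
    # np.asarray converts lists (or scalars) to arrays and avoids a copy
    # when the input is already an array
    a = np.asarray(a)
    b = np.asarray(b)
    return a * b  # element-wise product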


---

## Reshaping

Arrays with a given shape can be _reshaped_, as long as the total number of elements is conserved. For example, a 1D array of length 100 can be reshaped into a 10 x 10 2D array:

In [65]:
np.arange(100).reshape(10,10)

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
       [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
       [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
       [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])
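A short sketch of the "number of elements is conserved" rule. Passing `-1` for one dimension asks Numpy to work out that length for you, and a shape that doesn't fit raises a `ValueError`. The array here is a made-up example:

```python
import numpy as np

arr = np.arange(12)

# -1 tells numpy to infer this dimension from the total size (12 / 3 = 4).
grid = arr.reshape(3, -1)
print(grid.shape)

# A shape whose product is not 12 is rejected.
try:
    arr.reshape(5, 3)
except ValueError as err:
    print("reshape failed:", err)
```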

A handy one to remember is the function or method `ravel()`, which takes an n-dimensional array and flattens it into a 1D array:

In [66]:
arr = np.random.random(size=(4,7,2))
arr.shape

(4, 7, 2)

In [69]:
arr2 = arr.ravel()
arr2.shape

(56,)

In [68]:
arr.flat

<numpy.flatiter at 0x101bbfc00>

In [71]:
arr = np.arange(100).reshape(10,10).T
arr

array([[ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90],
       [ 1, 11, 21, 31, 41, 51, 61, 71, 81, 91],
       [ 2, 12, 22, 32, 42, 52, 62, 72, 82, 92],
       [ 3, 13, 23, 33, 43, 53, 63, 73, 83, 93],
       [ 4, 14, 24, 34, 44, 54, 64, 74, 84, 94],
       [ 5, 15, 25, 35, 45, 55, 65, 75, 85, 95],
       [ 6, 16, 26, 36, 46, 56, 66, 76, 86, 96],
       [ 7, 17, 27, 37, 47, 57, 67, 77, 87, 97],
       [ 8, 18, 28, 38, 48, 58, 68, 78, 88, 98],
       [ 9, 19, 29, 39, 49, 59, 69, 79, 89, 99]])

---

## Slicing and indexing

Arrays support slicing and indexing just like Python lists:

In [72]:
arr = np.arange(10)
arr[3:]

array([3, 4, 5, 6, 7, 8, 9])

The main difference is that arrays can be multi-dimensional, so you can specify indices or slices along multiple dimensions by separating the indices or slices with a comma. To see this, let's first create a 10 by 10 array of numbers from 0 to 99:

In [73]:
arr = np.arange(100).reshape(10,10)
arr

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
       [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
       [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
       [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])

We can select a single element from this 2D array by specifying two indices:

In [74]:
arr[3,5]

35

We can also use slicing to select a sub-rectangle from this array by specifying slices along the two dimensions:

In [75]:
arr[1:5,3:5]

array([[13, 14],
       [23, 24],
       [33, 34],
       [43, 44]])

Don't think about arrays as 2D matrices, even if they are. But since you're going to anyway, remember that the first index refers to __rows__ and the second refers to __columns__.

If you want to slice along, say, the last axis, and don't want to slice along any others, you can use an ellipsis (`...`) as a placeholder. For example, let's imagine we had a 3D array with shape (128,128,3) -- imagine this is a 128 pixel by 128 pixel image, and the last axis represents the Red, Green, and Blue filters:

In [76]:
img = np.random.random(size=(128,128,3))
img.shape

(128, 128, 3)

If we wanted to select just the Green channel (index 1 on the last axis), we can use the ellipsis in place of the two leading slices:

In [78]:
img[...,1].shape, img[:,:,1].shape

((128, 128), (128, 128))

--- 

## Views vs. arrays?

Many operations with numpy arrays are extremely fast because numpy doesn't alter the memory block associated with the array values; it just changes how the data are presented to you. For example, basic slicing of an array does __not__ return a copy of the selected array values -- it simply returns a _view_ of a subset of the array values:

In [83]:
arr = np.arange(100)
arr2 = arr[10:15]
arr2

array([10, 11, 12, 13, 14])

In [84]:
arr2.base

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
       51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
       68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
       85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])

The `.base` attribute of a view gets back the full "parent" array.

Most of the time, you don't need to think about this -- numpy does some magic underneath to reduce memory usage and optimize performance. However, this can lead to a few gotchas. Because the view points to the same block in memory as the parent array, if you change an element in the view, it will change that place in memory and therefore the parent array as well:

In [85]:
arr2

array([10, 11, 12, 13, 14])

In [86]:
arr2[3] = 999999999

In [None]:
arr
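To see the effect on a smaller array, and one way to avoid it, here is a minimal sketch. The names `parent`, `view`, and `independent` are mine; `.copy()` is the standard way to get an array with its own memory:

```python
import numpy as np

parent = np.arange(10)

view = parent[2:5]                # basic slicing returns a view
independent = parent[2:5].copy()  # .copy() allocates new memory

view[0] = -1         # also changes parent[2]
independent[1] = -2  # parent[3] is untouched

print(parent)
print(np.shares_memory(parent, view), np.shares_memory(parent, independent))
```

If you plan to modify a slice and still need the original values, take a copy first.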

---

<h1 style='background-color: #cccccc; padding: 15px;'>Exercise A</h1>

1) Create the following array:

In [88]:
arr = np.ones((4,4), dtype=int)
arr[2,-1] = 2
arr[-1,1] = 6
arr

array([[1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 2],
       [1, 6, 1, 1]])

2) Create a 128x128 array of numbers sampled from a Normal distribution with mean=0 and standard deviation=5 (_hint: above we used `np.random.random()`, try to find another function within the `np.random` subpackage that will help you_)

In [91]:
norm_nums = np.random.normal(0, 5, size=(128,128))

3) Select out the inner square of values with shape 96x96

In [92]:
sub = norm_nums[16:-16,16:-16]
sub.shape

(96, 96)

In [101]:
arr = np.arange(0,10+0.5,0.5)
arr[list(range(3)) + list(range(5,8))] # in Python 3, range objects must be converted to lists before concatenating

array([ 0. ,  0.5,  1. ,  2.5,  3. ,  3.5])

---

## Using arrays to index other arrays

Integer arrays can be used as indices for other arrays

In [102]:
a = np.array([0,3,1])
b = np.array([1,2,3,4,5,6,7,8,9])
b[a]

array([1, 4, 2])

In the above example, we use the array `a` as an index array to select the 0th, 3rd, and 1st elements out of the array `b`.

### Boolean arrays

Numpy supports boolean-valued arrays, and boolean arrays can be used to select elements out of other arrays (for those of you coming from IDL, welcome -- this should be a familiar concept to you). To use an array of booleans as an index array, it must have the same shape as the valued array. For example, we could use `array([True, False, True])` to select the first and last elements from `array([1,2,3])`:

In [103]:
# select the 0th and 2nd elements out of array b
a = np.array([True, False, True])
b = np.array([1,2,3])
b[a]

array([1, 3])

One common use case is to select all values from an array that satisfy some comparison:

In [104]:
a = np.arange(10)
print((a > 1) & (a < 5))
a[(a > 1) & (a < 5)] # select all elements from a that are greater than 1 and less than 5

[False False  True  True  True False False False False False]


array([2, 3, 4])
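The parentheses around each comparison matter. In Python, `&` binds more tightly than `>` and `<`, so leaving them out changes the meaning of the expression. A small sketch of what goes wrong (the `failed` flag is only there to record the outcome):

```python
import numpy as np

a = np.arange(10)

# Without parentheses this is parsed as a > (1 & a) < 5, a chained
# comparison that needs a single True/False from an array and fails.
try:
    a[a > 1 & a < 5]
    failed = False
except ValueError:
    failed = True
print(failed)

# With parentheses each comparison is evaluated first, then combined.
selected = a[(a > 1) & (a < 5)]
print(selected)
```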

Boolean arrays can also be combined using the bitwise operators `&` (AND), `|` (OR), and `~` (NOT) so that compound logic can be used in selection expressions:

In [106]:
np.array([True, True, True]) & np.array([False, True, False])

array([False,  True, False], dtype=bool)

In [107]:
np.array([False, True, True]) | np.array([False, True, False])

array([False,  True,  True], dtype=bool)

In [110]:
a[~(a > 5)]

array([0, 1, 2, 3, 4, 5])

In [108]:
np.logical_not(np.array([True, False, True]))

array([False,  True, False], dtype=bool)

In [111]:
a = np.arange(100)
a[(a > 51) & ((a % 3) == 0)] # select multiples of 3 greater than 51

array([54, 57, 60, 63, 66, 69, 72, 75, 78, 81, 84, 87, 90, 93, 96, 99])

---

<h1 style='background-color: #cccccc; padding: 15px;'>Exercise B</h1>

1) Create an array of 256 random integers between 0 and 1024. Select all numbers greater than 100 that are _not_ divisible by 5 or 10.

In [114]:
arr = np.random.randint(low=0,high=1024,size=256)
arr[((arr > 100) & ((arr % 5) != 0))]

array([ 184,  541,  444,  529, 1009,  912,  954,  853,  359, 1007,  573,
        384,  458,  637,  443,  591,  507,  602,  129,  199,  587,  306,
        152, 1017,  917, 1019,  241,  179,  708,  819,  404,  661,  313,
        187,  961,  604,  557,  927,  703,  262,  366,  971,  476,  952,
        138,  974,  254,  946,  401,  581,  363,  694,  373,  908,  219,
        418,  611,  977,  776,  794,  539,  999,  187,  786,  638,  817,
        769,  674,  748,  527,  484,  662,  383,  427,  623,  228,  512,
        697,  687,  592,  184,  717,  623,  397,  304,  561,  909,  467,
        427, 1016,  578,  877,  328, 1018,  236,  877,  752,  588,  928,
        792,  234,  811,  284,  368,  582,  712,  407,  149,  789,  788,
        586,  733,  622,  813,  998,  177,  867,  106,  973,  554,  653,
        402,  611,  897,  253,  823,  382,  749,  228,  912,  733,  999,
        437,  181,  383,  663,  703,  276,  907,  383,  631,  819,  906,
        792,  421,  997,  363,  203,  496,  183,  9

2) Create a 128x128 array of Normally distributed random numbers (same as previous exercise), but count how many elements are less than zero

In [115]:
# arr = np.random.normal(<fill in here>)
# np.sum(arr < 0.)
arr = np.random.normal(0, 5, size=(128, 128))
mybool = arr < 0
np.sum(mybool)

8196

3) Using the same array of Normal random numbers, replace all of the negative numbers with 0.

In [116]:
# <fill in here>
arr[mybool] = 0
print(arr[0])

[  0.      0.      0.      4.75    2.558   0.      0.      0.      0.
   2.515   1.758   1.423   0.      2.997   0.      2.012   0.     12.056
   2.02    0.      0.      0.804   0.      0.072   0.      0.      4.287
   0.      0.      0.      2.428   0.      2.704   7.017   6.126   0.      0.
   0.      0.92    0.      1.271   2.302   0.     12.24    0.      0.      0.
   1.977   0.      6.92    0.      0.      2.142   5.298   0.      0.
   8.687   0.      0.      0.605   1.324   2.991   4.588   0.      1.032
   0.      0.668   0.547   5.369   0.      0.      1.447   0.      2.146
   1.978   2.448   0.      3.199   8.363   0.      0.      0.      0.      0.
   5.196   1.116   0.      0.      0.      5.667   0.      0.      4.714
   4.963   0.      4.489   0.      0.      0.      0.      0.      0.
   1.497   2.903   7.466   0.      0.      2.467   0.      8.739  10.919
   0.      8.173   0.      0.      0.      0.      0.      0.      0.      0.
   0.071   0.278   0.      1.889   6.151

---

## Logical operators

Boolean arrays support the element-wise logical operations AND, OR, and NOT through the bitwise operators `&`, `|`, and `~`. We saw an example above using the `&` operator between two boolean arrays. This is useful because it lets us combine several inequalities into one selection, as we saw:

In [124]:
a = np.random.randint(0, 20, size=100) # randint expects integer bounds
idx = ((a > 5) & (a < 10)) | (a > 18)
a[idx] # select numbers either between 5 and 10 OR greater than 18

array([ 7,  8,  7,  8,  7,  8, 19,  6,  9,  8,  8, 19, 19,  9,  7,  9,  6,
        9,  7, 19,  7,  9, 19,  7,  7,  8,  9,  8])

Above, the array `idx` is a boolean array where the True elements meet the selection criteria, and the False elements don't:

In [125]:
idx

array([False, False, False,  True, False, False, False,  True, False,
       False,  True, False,  True, False,  True, False, False,  True,
       False, False, False, False,  True, False, False, False, False,
       False,  True, False, False,  True, False, False, False,  True,
        True, False,  True, False, False, False, False, False,  True,
       False, False, False,  True,  True, False, False,  True, False,
       False, False, False,  True, False,  True, False, False, False,
        True, False, False,  True, False, False,  True, False, False,
       False,  True, False, False, False, False,  True,  True, False,
       False, False, False, False, False,  True, False, False,  True,
        True, False, False, False, False,  True, False, False, False, False], dtype=bool)

Sometimes, you may want the actual index values of any elements that match some constraint. We can get the indices of the values that match the constraint with `np.where()`:

In [134]:
indices = np.where(idx)
indices

(array([ 3,  7, 10, 12, 14, 17, 22, 28, 31, 35, 36, 38, 44, 48, 49, 52, 57,
        59, 63, 66, 69, 73, 78, 79, 86, 89, 90, 95]),)

In this case, selecting from an array using a boolean array or an array of indices is equivalent:

In [135]:
np.all(a[idx] == a[indices])

True

In [133]:
np.any(np.array([True, True, False])), np.all(np.array([True, True, False]))

(True, False)

Here, the `any()` function returns True if at least one value is True, and the `all()` function returns True only if all of the values are True (and False if any value is False).

Sometimes we want to partition or split a data set based on some condition. For simple cases, we can do something like:

In [None]:
chunk1 = a[a > 10]
chunk2 = a[a <= 10]
chunk1, chunk2

But it's often annoying with long constraints to go through and invert all of the operators. For example:

In [None]:
a[((a > 10) & ((a % 5) == 1)) | ((a < 2) & (a > 1))]

For cases like the above, we can store the boolean index array as a separate variable, then use a logical NOT on the array to turn all of the `False` values to `True` and vice versa.

In [None]:
idx = ((a > 10) & ((a % 5) == 1)) | ((a < 2) & (a > 1))
idx

In [None]:
chunk1 = a[idx]
chunk2 = a[np.logical_not(idx)]
a.shape, chunk1.shape, chunk2.shape

In [136]:
arr = np.random.normal(0,1,size=(6,6,8))
arr[np.where(arr > 0.25)]

array([ 0.742,  2.163,  1.521,  1.267,  0.299,  0.693,  0.41 ,  1.642,
        0.331,  0.671,  0.475,  1.011,  0.391,  0.911,  0.999,  0.691,
        0.954,  0.437,  1.149,  0.813,  0.623,  0.784,  0.265,  0.753,
        0.947,  1.199,  1.019,  0.488,  0.747,  0.466,  1.856,  1.431,
        0.458,  1.572,  0.71 ,  0.883,  0.686,  0.43 ,  0.37 ,  0.736,
        0.362,  2.09 ,  0.741,  0.803,  1.005,  0.554,  1.352,  1.83 ,
        0.675,  0.395,  1.612,  1.369,  0.797,  0.286,  1.303,  1.431,
        0.464,  1.425,  1.199,  1.088,  0.824,  1.502,  0.837,  0.919,
        0.458,  2.32 ,  1.536,  0.346,  0.584,  0.265,  0.621,  1.114,
        1.421,  1.537,  2.102,  0.262,  0.596,  1.569,  0.845,  0.293,
        1.087,  0.525,  1.185,  1.033,  0.776,  1.352,  1.349,  1.314,
        0.936,  1.276,  0.397,  0.739,  0.412,  0.944,  0.397,  0.898,
        0.507,  0.791,  1.459,  0.262,  1.534,  2.596,  2.038,  0.736,
        0.61 ,  0.497,  0.876,  1.604,  1.044,  0.517,  0.888,  1.468,
      

In [137]:
np.where(arr > 0.25)

(array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2,
        2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3,
        3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
        4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
        5]),
 array([0, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 5, 5, 5, 0, 0, 0,
        0, 0, 1, 1, 2, 2, 2, 2, 2, 2, 4, 4, 4, 4, 5, 5, 5, 5, 0, 0, 1, 1, 1,
        2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 0, 0, 0, 0, 1, 2, 2, 2, 2,
        3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 3, 3,
        3, 4, 4, 4, 5, 5, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 4, 4, 4, 5, 5,
        5]),
 array([2, 0, 2, 3, 6, 0, 1, 4, 5, 6, 0, 1, 2, 5, 6, 7, 0, 3, 6, 7, 1, 2, 3,
        5, 7, 2, 3, 0, 1, 2, 3, 4, 5, 0, 2, 3, 6, 1, 2, 3, 5, 5, 7, 0, 3, 5,
        1, 2, 3, 5, 0, 1, 3, 4, 0, 3, 4, 6, 5, 7, 
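For a multi-dimensional array, `np.where()` returns a tuple with one index array per axis, which is why three arrays appear above. A tiny 2D sketch makes this easier to read; the array `grid` is made up for illustration, and `np.argwhere()` gives the same information grouped by element:

```python
import numpy as np

grid = np.array([[0, 5, 0],
                 [7, 0, 9]])

# One index array per axis: row indices first, then column indices.
rows, cols = np.where(grid > 0)
print(rows, cols)

# np.argwhere returns the same positions as (row, column) pairs.
pairs = np.argwhere(grid > 0)
print(pairs)
```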

---

<h1 style='background-color: #cccccc; padding: 15px;'>Exercise C</h1>

1) Create an array of 1024 normally distributed random numbers (mean=0, stddev=5.). Split this array into 3 parts: values less than 0, values between 0 and 1, and values greater than or equal to 1

In [None]:
# <fill in here>

---

## Dimensionality reduction

We often end up with complicated array structures where we may want to sum along one axis, or take the mean or median along an axis. For example, let's say we had a 2D array, `A`, where each _row_ is a spectrum of the same star at a different time, so that `A[0]` is an array of fluxes at time=0, `A[1]` is an array of fluxes at time=1, and so on.

We'll create an array of random numbers to represent this: let's assume we have 100 timesteps, and 5000 wavelength bins.

In [138]:
A = np.random.random((100,5000))
A.shape

(100, 5000)

We may want to coadd the individual "spectra":

In [139]:
coadd = A.sum(axis=0)
print(coadd.shape)

(5000,)


Most mathematical operations that reduce the dimensionality of an array will by default collapse it all the way down to a single number (0-dimensional), but they usually support the keyword `axis`, which tells them to squash only that one dimension:

In [140]:
np.sum(A, axis=0).shape

(5000,)

In [141]:
np.mean(A, axis=0).shape # take the mean of all spectra

(5000,)

In [142]:
np.var(A, axis=0).shape # the variance

(5000,)

In [143]:
np.sum(A)

250061.25814225923
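It can be hard to remember which axis disappears, so here is a sketch on an array small enough to check by hand. The array `small` is made up for this example; the `keepdims` keyword keeps the squashed axis around with length 1:

```python
import numpy as np

small = np.array([[1, 2, 3],
                  [4, 5, 6]])

total = small.sum()             # every axis collapsed: a single number
per_column = small.sum(axis=0)  # axis 0 (rows) collapsed -> shape (3,)
per_row = small.sum(axis=1)     # axis 1 (columns) collapsed -> shape (2,)

# keepdims=True leaves the reduced axis in place with length 1.
row_means = small.mean(axis=1, keepdims=True)
print(total, per_column, per_row, row_means.shape)
```

The axis you pass is the one that goes away: `axis=0` sums down the rows and leaves one value per column.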

---

## Broadcasting

_This is a complicated topic but understanding array broadcasting is critical for making your code efficient!_

Broadcasting describes how numpy treats arrays with different shapes during arithmetic operations. As an example, let's consider the simplest case: multiplying a 1D array by a single number:

In [None]:
arr = np.arange(5)
arr * 5.

One way to think about this is that you are multiplying a 1D array by a 0D array. In situations like this, Numpy will try to promote the array with lower dimensionality up to match the dimensionality of the other array. Of course, it doesn't actually do this (it uses tricks that we won't go into here), but conceptually it is illustrated by this cartoon:

```
[0,1,2,3,4] x 5
[0,1,2,3,4] x [5,5,5,5,5]
<perform operation element-wise>
```

Above we are effectively multiplying an array with shape=(5,) by a single number (a 0D array, with shape=()) -- Numpy knows that in situations like this, the number should be "copied" until its shape matches the shape of the array. Here's a more complicated example -- multiplying an array with shape (1,3) by an array with shape (3,1):

In [144]:
a = np.array([[1,2,3]])
b = np.array([[4],[5],[6]])
print(a.shape)
print(b.shape)

(1, 3)
(3, 1)


In [145]:
a*b

array([[ 4,  8, 12],
       [ 5, 10, 15],
       [ 6, 12, 18]])

Multiplying these arrays makes a new array with shape 3x3! How did this happen? Array `a` has 1 element in axis=0, array `b` has 3 elements in axis=0. Similarly, array `a` has 3 elements in axis=1, and array `b` has 1 element in axis=1. The length-1 axes are _broadcast_ to match the length (3) of the corresponding axis in the other array. This is effectively an outer product between the arrays.

Even though I used the word "copied" when describing this above, numpy doesn't actually make copies of anything -- this is why operations that make use of broadcasting are so efficient.

When utilizing broadcasting, you may need to add dimensions to arrays in order to make the number of dimensions of two arrays match. For example, imagine I have a 2D array:

In [146]:
arr = np.array([[1,2,3],
                [4,5,6],
                [7,8,9]])

Now imagine I have a 1D array that I would like to multiply into each *row* of the above matrix:

In [148]:
arr2 = np.array([10,20,30])

In [149]:
arr.shape

(3, 3)

In [150]:
arr2.shape

(3,)

In [152]:
arr2.reshape(1,3).shape, arr2[np.newaxis].shape

((1, 3), (1, 3))

In [153]:
arr2[np.newaxis] * arr

array([[ 10,  40,  90],
       [ 40, 100, 180],
       [ 70, 160, 270]])

In [157]:
arr[np.newaxis].shape

(1, 3, 3)

In [159]:
arr2.shape

(3,)

In [161]:
(arr[:,np.newaxis] * arr2).shape

(3, 1, 3)

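The `np.newaxis` object lets us promote the array `arr2` to have shape (1,3), so that it lines up with the rows of `arr`. Note that `arr[:,np.newaxis]` instead inserts a new axis into `arr`, giving it shape (3,1,3).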
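Where you put the new axis decides whether the 1D array is applied along rows or along columns. Here is a sketch using the same two arrays as above; the result names `by_row` and `by_col` are mine:

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])
arr2 = np.array([10, 20, 30])

# Shapes (3, 3) and (3,): arr2 is treated as (1, 3), so every row of arr
# is multiplied element-wise by arr2.
by_row = arr * arr2

# Shapes (3, 3) and (3, 1): row i of arr is scaled by arr2[i].
by_col = arr * arr2[:, np.newaxis]

print(by_row)
print(by_col)
```

When two shapes are compared for broadcasting, they are lined up from the last axis backwards, which is why a plain `(3,)` array behaves like `(1, 3)` here.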

---

## Structured array

Another useful numpy data type is the __structured array__, which acts like a table with named columns (formally, it is a 1-dimensional array in which each element is a row with named fields):

<table>
    <tr>
        <td>ID</td>
        <td>RA</td>
        <td>Dec</td>
    </tr>
    <tr>
        <td>15125</td>
        <td>115.51244</td>
        <td>21.51363</td>
    </tr>
    <tr>
        <td>1157</td>
        <td>173.12155</td>
        <td>-11.15124</td>
    </tr>
    <tr>
        <td>31346</td>
        <td>201.15135</td>
        <td>13.51613</td>
    </tr>
    <tr>
        <td>3463</td>
        <td>278.23523</td>
        <td>67.24114</td>
    </tr>
</table>    


In [162]:
table_data = [(15125, 115.51244, 21.51363), 
              (1157, 173.12155, -11.15124),
              (31346, 201.15135, 13.51613),
              (3463, 278.23523, 67.24114)]
structured_array = np.array(table_data, dtype=[("id", int), 
                                               ("ra", float),
                                               ("dec", float)])

In [163]:
structured_array["ra"] # returns the whole "ra" column

array([ 115.512,  173.122,  201.151,  278.235])

In [165]:
structured_array[0]['ra'] # returns the 'ra' value of the first row

115.51244

In [167]:
structured_array.dtype.names

('id', 'ra', 'dec')
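Because each column can be pulled out by name, the boolean selection tricks from earlier work on structured arrays too. A brief sketch using the same table values as above (the names `table`, `northern`, and `by_dec` are mine):

```python
import numpy as np

table = np.array([(15125, 115.51244, 21.51363),
                  (1157, 173.12155, -11.15124),
                  (31346, 201.15135, 13.51613),
                  (3463, 278.23523, 67.24114)],
                 dtype=[("id", int), ("ra", float), ("dec", float)])

# A boolean mask built from one column selects whole rows.
northern = table[table["dec"] > 0]
print(northern["id"])

# np.sort with order= sorts the rows by the named field.
by_dec = np.sort(table, order="dec")
print(by_dec["id"])
```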

Numpy provides two functions for reading in numeric data from text files: `genfromtxt()` and `loadtxt()`. I recommend using `genfromtxt()`, but (as you'll hear tomorrow) Astropy contains some more useful tools for reading from ASCII files.

---

<h1 style='background-color: #cccccc; padding: 15px;'>Exercise D</h1>

1) Read in the provided data file (`borkova2002.csv`)

In [168]:
# you'll need to set the 'delimiter', 'dtype', and 'names' arguments

data = np.genfromtxt("/Users/adrian/projects/tutorials/data/borkova2002.csv",
                     delimiter=',',
                     dtype=None,
                     names=True)

In [169]:
data

array([(5.9295, 29.401, 'SWAnd', -0.38, 200, 40, 9.0, 0.4, 0.01),
       (19.3642, 38.9506, 'XXAnd', -2.01, 18, 325, 19.3, 11.3, 0.91),
       (355.6285, 43.0143, 'ATAnd', -0.97, 10, 292, 12.5, 7.6, 0.91),
       (28.7846, 43.7657, 'CIAnd', -0.83, 276, 113, 21.5, 2.6, 0.38),
       (16.3569, 34.2204, 'DRAnd', -1.48, -94, 357, 16.0, 6.6, 0.83),
       (154.0206, -29.7284, 'WYAnt', -1.66, -46, 351, 17.5, 14.0, 0.97),
       (222.2084, -71.3283, 'TYAps', -1.21, 8, 247, 7.7, 7.0, 0.8),
       (223.0226, -79.6796, 'XZAps', -1.57, -17, 243, 8.1, 5.7, 0.94),
       (318.8244, 0.0763, 'SWAqr', -1.24, 3, 369, 18.2, 9.5, 0.94),
       (324.0352, 3.2306, 'SXAqr', -1.83, -67, 482, 34.9, 16.9, 0.88),
       (330.4814, -5.6008, 'TZAqr', -1.24, 162, 65, 8.2, 1.1, 0.14),
       (343.534, -12.3607, 'BOAqr', -1.8, 163, 111, 9.4, 1.9, 0.37),
       (354.637, -9.3187, 'BRAqr', -0.84, 180, 62, 8.5, 1.7, 0.01),
       (314.4517, -5.6852, 'BTAqr', -0.29, 144, 75, 7.5, 0.9, 0.3),
       (317.5535, -1.7212, 'C

2) Select all rows with a metallicity [Fe/H] < -1.2 (column `FeH`) and a maximum height above the midplane greater than 1 kpc (column `zmax`)

In [170]:
data.dtype.names

('ra', 'dec', 'name', 'FeH', 'theta', 'Vres', 'apo', 'zmax', 'ecc')

In [171]:
subdata = data[(data['FeH'] < -1.2) & (data['zmax'] > 1.)]

In [172]:
subdata

array([(19.3642, 38.9506, 'XXAnd', -2.01, 18, 325, 19.3, 11.3, 0.91),
       (16.3569, 34.2204, 'DRAnd', -1.48, -94, 357, 16.0, 6.6, 0.83),
       (154.0206, -29.7284, 'WYAnt', -1.66, -46, 351, 17.5, 14.0, 0.97),
       (222.2084, -71.3283, 'TYAps', -1.21, 8, 247, 7.7, 7.0, 0.8),
       (223.0226, -79.6796, 'XZAps', -1.57, -17, 243, 8.1, 5.7, 0.94),
       (318.8244, 0.0763, 'SWAqr', -1.24, 3, 369, 18.2, 9.5, 0.94),
       (324.0352, 3.2306, 'SXAqr', -1.83, -67, 482, 34.9, 16.9, 0.88),
       (330.4814, -5.6008, 'TZAqr', -1.24, 162, 65, 8.2, 1.1, 0.14),
       (343.534, -12.3607, 'BOAqr', -1.8, 163, 111, 9.4, 1.9, 0.37),
       (349.8217, -24.2163, 'DNAqr', -1.63, -3, 383, 25.9, 13.0, 0.95),
       (308.1315, 0.5853, 'V341 Aql', -1.37, 103, 201, 9.7, 5.2, 0.41),
       (270.7679, -52.7224, 'MSAra', -1.48, 21, 220, 7.4, 4.8, 0.92),
       (220.5557, 28.2065, 'SZBoo', -1.68, 101, 119, 8.1, 2.0, 0.33),
       (221.2748, 41.0289, 'TWBoo', -1.41, -1, 327, 14.7, 9.5, 0.94),
       (229.2722,

3) What is the maximum apocenter? (use the column `apo`) What is the eccentricity (`ecc`) of the star with the largest apocenter?

In [173]:
subdata['apo'].max()

52.5

In [176]:
subdata['ecc'][subdata['apo'].argmax()]

0.70999999999999996