## Arrays

As we saw above, generalizing the way we subset vectors to subsetting matrices was quite easy: we just subset our matrices by doing what we did with vectors, but with two terms in our square brackets separated by a comma (e.g. `[1, 1]` instead of just `[1]`). 

But guess what? Just as it was easy to generalize from one dimension to two, it turns out that we can also generalize from two dimensions to N dimensions the same way! In fact, rather than thinking of vectors and matrices as two different things, we can think of them as special cases (the cases of N=1 and N=2) of a more general data structure: **Arrays**. 

Arrays are collections of data of the same type with a regular structure organized into N dimensions. When N=1, an array is the equivalent of a vector:

In [3]:
import numpy as np

np.arange(20)


array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])

And when N=2, an array is just a matrix:

In [4]:
np.arange(20).reshape(5, 4)


array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19]])

When N > 2, though, things get a little harder to visualize, but also a lot more powerful. 
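As a small sketch of what that looks like in practice (the array names here are illustrative, not from the examples above), the `.ndim` and `.shape` attributes report how many dimensions an array has and how long each one is:

```python
import numpy as np

# Build 1-, 2-, and 3-dimensional arrays from the same kind of data.
vector = np.arange(24)                    # N = 1
matrix = np.arange(24).reshape(6, 4)      # N = 2
cube = np.arange(24).reshape(2, 3, 4)     # N = 3

for name, arr in [("vector", vector), ("matrix", matrix), ("cube", cube)]:
    # .ndim is the number of dimensions; .shape is the length of each one.
    print(name, arr.ndim, arr.shape)
```

One way to read the 3-dimensional case is as a stack of matrices: `cube` holds two 3 x 4 matrices.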

*But WHY*, I hear you asking, *would I ever want more than two dimensions?!* 

Well, let's start with use-cases for N=3. Suppose that you wanted to build a climate model. One thing that you would have to do is come up with a way to store the temperature at different points within a 3D space! And you could do that with a three-dimensional array where the first two dimensions are used to store the x and y coordinates of a point on the ground, the third could represent an elevation above that point, and the value in that entry would be the temperature. 

For example, let's create a 3 x 3 x 10 array filled with random temperatures generated with `np.random.normal()`. We'll imagine the first dimension to be the number of kilometers North of a reference point (say, the center of Duke's campus), the second dimension to be the number of kilometers East of that point, and the third dimension to be the elevation in kilometers. (Obviously if this weren't an exercise, we'd be using real temperatures to fill our array!):

In [7]:
# Make random temperatures.
# In a real world application
# you'd have measured these or
# started with seed values based on
# measured values.

# using a reasonable Fahrenheit mean (70) and sd (10)
rand_temps = np.random.normal(loc=70, scale=10, size=90)
rand_temps


array([74.4805914 , 63.4636685 , 76.57108736, 74.73777105, 69.5091633 ,
       60.23823657, 69.12935326, 67.55646317, 65.10868325, 64.58053686,
       79.07979103, 82.21836456, 74.92904092, 69.15802631, 54.63349454,
       67.93559111, 72.62280125, 78.94457043, 70.50896904, 73.79335368,
       75.36353838, 76.99135483, 67.67623238, 69.42428859, 71.21606243,
       86.89645753, 48.33803445, 75.46721846, 78.86047331, 84.59021349,
       79.95296623, 61.9922921 , 64.33604576, 81.62034998, 75.35339524,
       69.23745221, 58.70209583, 75.01664044, 71.16015887, 70.19108603,
       75.7114543 , 49.31545425, 85.71400994, 68.53578825, 59.84809671,
       60.92305415, 61.60753433, 82.82671149, 69.55628355, 64.49613608,
       69.84175434, 71.78339683, 76.92045085, 65.63166412, 70.35045024,
       71.93352092, 70.06825931, 80.36982052, 59.68859737, 72.71113089,
       65.49021116, 78.52684681, 66.89187906, 83.02452446, 58.41207515,
       71.71369369, 90.19331332, 69.31810151, 87.8016218 , 90.54

In [8]:
temperatures = rand_temps.reshape((3, 3, 10))
temperatures


array([[[74.4805914 , 63.4636685 , 76.57108736, 74.73777105,
         69.5091633 , 60.23823657, 69.12935326, 67.55646317,
         65.10868325, 64.58053686],
        [79.07979103, 82.21836456, 74.92904092, 69.15802631,
         54.63349454, 67.93559111, 72.62280125, 78.94457043,
         70.50896904, 73.79335368],
        [75.36353838, 76.99135483, 67.67623238, 69.42428859,
         71.21606243, 86.89645753, 48.33803445, 75.46721846,
         78.86047331, 84.59021349]],

       [[79.95296623, 61.9922921 , 64.33604576, 81.62034998,
         75.35339524, 69.23745221, 58.70209583, 75.01664044,
         71.16015887, 70.19108603],
        [75.7114543 , 49.31545425, 85.71400994, 68.53578825,
         59.84809671, 60.92305415, 61.60753433, 82.82671149,
         69.55628355, 64.49613608],
        [69.84175434, 71.78339683, 76.92045085, 65.63166412,
         70.35045024, 71.93352092, 70.06825931, 80.36982052,
         59.68859737, 72.71113089]],

       [[65.49021116, 78.52684681, 66.89187906, 
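Because these values are random, they will differ every time the code is run. If you want repeatable numbers, one option (a sketch, not what the cells above do) is a seeded generator from `np.random.default_rng`; the seed `42` and the variable names below are arbitrary choices for illustration:

```python
import numpy as np

# Assumption: seed 42 is arbitrary; any fixed integer makes the draws repeatable.
rng = np.random.default_rng(42)

# Draw directly into a 3 x 3 x 10 array instead of reshaping afterwards.
seeded_temps = rng.normal(loc=70, scale=10, size=(3, 3, 10))
print(seeded_temps.shape)

# A second generator with the same seed reproduces the same values.
same_again = np.random.default_rng(42).normal(loc=70, scale=10, size=(3, 3, 10))
print(np.array_equal(seeded_temps, same_again))
```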

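Now because `temperatures` has three dimensions, it's hard to print out, but it's easy to extract values. Suppose we wanted to know the temperature 1km North of Duke, 2km East, and at an elevation of 10km. Since numpy counts from zero, we're treating index 0 along the first dimension as the first kilometer North, index 1 along the second as the second kilometer East, and index 9 along the third as the tenth kilometer of elevation: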

In [9]:
temperatures[0, 1, 9]


73.79335367808505

Ta-Da! I could also get the temperature at all elevations at the point 1km North and 2km East:

In [10]:
temperatures[0, 1, :]


array([79.07979103, 82.21836456, 74.92904092, 69.15802631, 54.63349454,
       67.93559111, 72.62280125, 78.94457043, 70.50896904, 73.79335368])

Or the temperature at 5km elevation across all ground locations: 

In [11]:
temperatures[:, :, 4]


array([[69.5091633 , 54.63349454, 71.21606243],
       [75.35339524, 59.84809671, 70.35045024],
       [58.41207515, 86.31893806, 60.6086747 ]])

As you can see, this is a *really* powerful idea, and honestly arrays are probably the most fundamental data structure in data science. 
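To see how each of the three indexing patterns above changes the shape of the result, here is a minimal sketch using a stand-in array (`grid` is a hypothetical name) with the same 3 x 3 x 10 shape as `temperatures` but predictable values:

```python
import numpy as np

# Stand-in for `temperatures`: same 3 x 3 x 10 shape, but predictable values.
grid = np.arange(90).reshape(3, 3, 10)

single_value = grid[0, 1, 9]     # one integer per dimension -> a single number
column = grid[0, 1, :]           # one ground location, all elevations
layer = grid[:, :, 4]            # one elevation, all ground locations

print(single_value)   # 19
print(column.shape)   # (10,)
print(layer.shape)    # (3, 3)
```

Every dimension indexed with a single integer disappears from the result, while every dimension indexed with `:` is kept.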

### Other Uses for Arrays

Not into climate modeling? OK, here are some other uses of arrays with more than 2 dimensions:

- **Color Images:** In our last reading, we saw that black and white images can be thought of as matrices, as each cell can contain the greyscale value for a given pixel. When working with color images, however, we often work with three-dimensional arrays, where the first two dimensions are the x-y coordinates of each pixel, and the third dimension is used to differentiate between the value for red in a pixel, the value of green in a pixel, and the value of blue in a pixel (plus, in some image formats, a fourth value for the transparency of the pixel). 
- **Repeated measurements over time:** Just as a single survey is easily represented as a matrix, so too can repeated surveys be easily represented by making time a third dimension. This makes it easy to pull out a single wave of the survey, or to pick out all the responses for a given person over time. 
- **Brain Scans:** fMRI scans capture the entire volume of the brain, and that volumetric data is most naturally stored in a three-dimensional array, just like the temperature data above. Obviously not all social scientists will end up working with brain scans, but there's certainly a lot of cutting-edge work in this area!
- **3D measurements that evolve over time:** Just as we can model survey data that evolves over time in three dimensions, we can also model three-dimensional volumetric data that evolves over time (e.g. not just a snapshot of a climate model, but its evolution) in a four-dimensional array!
- **Satellite Data:** Satellite image data usually comes in the form of sets of 2-dimensional images, where each image includes information about light intensity at a given wavelength. When these images are stacked to, say, generate a color image, or identify wavelength combinations common to certain types of pollution, flood waters, or specific crops, you get a three-dimensional array.

So yeah, high dimensional arrays are a SUPER powerful data structure, and one you shouldn't shy away from.
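Two of the uses listed above can be sketched with tiny made-up arrays. The shapes, values, and names below are purely hypothetical, and the image is assumed to have just three color channels:

```python
import numpy as np

# Hypothetical 2 x 2 pixel image with 3 color channels (red, green, blue).
image = np.zeros((2, 2, 3), dtype=np.uint8)
image[0, 0] = [255, 0, 0]        # top-left pixel is pure red
image[1, 1] = [0, 0, 255]        # bottom-right pixel is pure blue

red_channel = image[:, :, 0]     # a 2 x 2 matrix of red intensities
print(red_channel)

# Hypothetical climate grid observed at 5 time steps:
# (time, km north, km east, km elevation).
climate_over_time = np.zeros((5, 3, 3, 10))
first_snapshot = climate_over_time[0]   # a 3 x 3 x 10 array, like `temperatures`
print(first_snapshot.shape)
```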

## Moving Between Dimensions

Now that we've embraced the fact that there's nothing fundamentally special about an array that's one, two, or three-dimensional from the perspective of numpy (even if we tend to use arrays of different dimensions for different purposes), it's worth pausing to talk a little about challenges that can arise when moving between arrays of different dimensionality.  

### Narrow Matrices v. 1-Dimensional Vectors

In numpy, there is a distinction between a 1-dimensional vector (the data structure we've been working with up until now), and a 2-dimensional matrix with only 1 row or 1 column.

To illustrate, let's start by creating a simple vector and getting its `.shape`:

In [12]:
my_vector = np.array([1, 2, 3])
my_vector


array([1, 2, 3])

In [13]:
my_vector.shape


(3,)

As we can see, numpy only reports the size of our vector in one dimension (the trailing comma is included so you know that it's a tuple with one entry, not a weirdly formatted 3). One value is given because our data is one-dimensional. 

But if we create a matrix with one row and three columns (note I'm passing a list within a list to `np.array`), we get a data structure that *looks* similar, but is actually different in an important way, as evident from the output of `.shape`:

In [14]:
skinny_matrix = np.array([[1, 2, 3]])
skinny_matrix


array([[1, 2, 3]])

In [None]:
skinny_matrix.shape


(1, 3)

As we can see, this matrix is two-dimensional, as evidenced by the fact that `.shape` is reporting two numbers. 
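This difference is more than cosmetic. As a brief sketch of two places it shows up (indexing and transposing), using the same two arrays as above:

```python
import numpy as np

my_vector = np.array([1, 2, 3])
skinny_matrix = np.array([[1, 2, 3]])

# Indexing with a single integer behaves differently.
print(my_vector[0])        # a single number: 1
print(skinny_matrix[0])    # the whole first row: [1 2 3]

# Transposing a 1-dimensional vector does nothing;
# transposing a 1 x 3 matrix gives a 3 x 1 matrix.
print(my_vector.T.shape)       # (3,)
print(skinny_matrix.T.shape)   # (3, 1)
```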

If we want to make our `skinny_matrix` into a one-dimensional vector, we can do so with `.reshape`: 

In [19]:
now_a_vector = skinny_matrix.reshape(3)
now_a_vector


array([1, 2, 3])

In [21]:
now_a_vector.shape


(3,)

Or we could use `.reshape` to make our original one-dimensional vector a matrix:

In [23]:
now_a_matrix = my_vector.reshape((1, 3))
now_a_matrix


array([[1, 2, 3]])

In [25]:
now_a_matrix.shape


(1, 3)
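When you don't want to spell out every length by hand, `.reshape` also accepts `-1` for one dimension, which asks numpy to infer that length from the data. A minimal sketch (the new variable names are illustrative):

```python
import numpy as np

my_vector = np.array([1, 2, 3])
skinny_matrix = np.array([[1, 2, 3]])

# -1 asks numpy to infer that dimension's length from the data.
flattened = skinny_matrix.reshape(-1)       # matrix -> 1-dimensional vector
row_matrix = my_vector.reshape(1, -1)       # vector -> 1 row, as many columns as needed
column_matrix = my_vector.reshape(-1, 1)    # vector -> 1 column, as many rows as needed

print(flattened.shape)      # (3,)
print(row_matrix.shape)     # (1, 3)
print(column_matrix.shape)  # (3, 1)
```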