
# Exercise 2.1: Introduction to Python Data Science using NumPy

Having covered the basics of Python, we will now explore its applications for data science. Setting aside the [hype](https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century), **data science** is an interdisciplinary subject that lies at the intersection of statistics, computer programming, and domain expertise. It is best to think of data science not as a new field of knowledge itself, but rather as a *set of skills* for analysing and interrogating datasets within your existing area of expertise – in our case, environmental science and management.

<img src="./assets/datascience.png" alt="datascience" width="400"/>

Python's extensive, active "ecosystem" of packages like NumPy, Pandas, SciPy, and Matplotlib – all of which we will explore in this next set of exercises – lends itself well to data analysis and scientific computing. In addition, this section outlines techniques for importing, manipulating, visualizing, and exporting data in Python. While data come in a wide variety of formats, it is useful to conceptualize all data as **arrays of numbers** (recall the spreadsheet analogy from Exercise 1.4). For example, an image is, at its core, a two-dimensional array of numbers representing the brightness of each pixel across the image area. When envisioned this way, it is easy to see how the image can be transformed and analysed by manipulating values in the array: 

![imagearray](./assets/imagearray.png)


<p style="height:1pt"> </p>

<div class="boxhead2">
    Exercise 2.1 Topics
</div>

<div class="boxtext2">
<ul class="a">
    <li> 📌 NumPy Arrays </li>
    <ul class="b">
        <li> Constructing arrays from lists </li>
        <li> Constructing arrays from scratch </li>
    </ul>
    <li> 📌 Array Manipulation </li>
    <ul class="b">
        <li> Array attributes </li>
        <li> Indexing + slicing </li>
        <li> Array reduction </li>
        <li> Reshaping, resizing, + rearranging arrays </li>
        <li> Joining + splitting arrays </li>
    </ul>
    <li> 📌 Array Math</li>
    <ul class="b">
        <li> Universal functions </li>
        <li> Array-to-array math: broadcasting </li>
    </ul>
    <li> 📌 Handling missing data </li>
</ul>
</div>


<div class="boxhead2">
    Readings
</div>
<div class="boxtext2">
    This notebook is designed to be run as a standalone exercise. However, the material covered can be supplemented by <a href="https://proquest-safaribooksonline-com.proxy.library.ucsb.edu:9443/book/programming/python/9781491912126/2dot-introduction-to-numpy/introduction_to_numpy_html"> Chapter 2</a> of the <a href="https://proquest-safaribooksonline-com.proxy.library.ucsb.edu:9443/book/programming/python/9781491912126"> <i>Python Data Science Handbook</i></a>.
</div>

<hr style="border-top: 0.2px solid gray; margin-top: 12pt; margin-bottom: 0pt"></hr>

### Instructions
Work through the exercise, writing code where indicated. To run a cell, click on the cell and press "Shift" + "Enter" or click the "Run" button in the toolbar at the top. Note: Do not restart the kernel and clear all outputs. If this happens, run the last cell in the notebook before proceeding.

<p style="color:#408000; font-weight: bold"> 🐍 &nbsp; &nbsp; This symbol designates an important note about Python structure, syntax, or another quirk.  </p>

<p style="color:#008C96; font-weight: bold"> ▶️ &nbsp; &nbsp; This symbol designates a cell with code to be run.  </p>

<p style="color:#008C96; font-weight: bold"> ✏️ &nbsp; &nbsp; This symbol designates a partially coded cell with an example.  </p>

<p style="color:#008C96; font-weight: bold"> 📚 &nbsp; &nbsp; This symbol designates a practice question.  </p>


<hr style="border-top: 1px solid gray; margin-top: 24px; margin-bottom: 1px"></hr>

## Introduction to NumPy

<img src="./assets/numpy.jpeg" alt="numpy" width="500"/>

NumPy, an abbreviation for *Numerical Python*, is the core library for scientific computing in Python. In addition to manipulation of array-based data, NumPy provides an efficient way to store and operate on very large datasets. In fact, nearly all Python packages for data storage and computation are built on NumPy arrays. 

This exercise will provide an overview of NumPy, including how arrays are created, NumPy functions to operate on arrays, and array math. While most of the basics of the NumPy package will be covered here, there are many, many more operations, functions, and modules. As always, you should consult the [NumPy Docs](https://docs.scipy.org/doc/numpy/reference/index.html) to explore its additional functionality.

Before jumping into NumPy, we should take a brief detour through importing libraries in Python. While most packages we will use – including NumPy – are developed by third parties, there are a number of "standard" modules that are built into the Python standard library. The following table describes a few of the most useful ones.

| Module | Description | Syntax |
| :----- | :---------- | :----- |
| <a href="https://docs.python.org/3.8/library/os.html" style="text-decoration: none; font-family: Lucida Console, Courier, monospace; font-weight: bold"> os </a> | Provides access to operating system functionality | <span style="font-family: Lucida Console, Courier, monospace; font-weight: bold"> import os </span> |
| <a href="https://docs.python.org/3.8/library/math.html" style="text-decoration: none; font-family: Lucida Console, Courier, monospace; font-weight: bold"> math </a> | Provides access to basic mathematical functions | <span style="font-family: Lucida Console, Courier, monospace; font-weight: bold"> import math </span> |
| <a href="https://docs.python.org/3.8/library/random.html" style="text-decoration: none; font-family: Lucida Console, Courier, monospace; font-weight: bold"> random </a> | Implements pseudo-random number generators for various distributions | <span style="font-family: Lucida Console, Courier, monospace; font-weight: bold"> import random </span> |
| <a href="https://docs.python.org/3.8/library/datetime.html" style="text-decoration: none; font-family: Lucida Console, Courier, monospace; font-weight: bold"> datetime </a> | Supplies classes for generating and manipulating dates and times | <span style="font-family: Lucida Console, Courier, monospace; font-weight: bold"> import datetime as dt </span> |

<div class="python">
    🐍 <b>Import syntax.</b> 
    As we saw in Exercise 1.5, modules and packages can be loaded into a script using an <code>import</code> statement: <code>import [module]</code> for the entire module, or <code>from [module] import [identifier]</code> to import a specific name (such as a class or function) from the module. All modules and packages used in a program should be imported at the beginning of the program.
    
Many packages are imported with standard abbreviations (such as <code>dt</code> for the <code>datetime</code> module) using the following syntax:
    
<p style="margin-left:60pt"><code>import [module] as [name]</code></p>

The standard syntax for importing NumPy is:

<p style="margin-left:60pt"> <code><span style="font-weight:bold; color:#007A00">import</span> numpy <span style="font-weight:bold; color:#007A00">as</span> np</code></p>

</div>
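
As a quick illustration of the three import forms described above, here is a minimal sketch that uses only standard-library modules from the table. The date is an arbitrary example value, and the variable name `new_year` is our own choice.

```python
# Import an entire module, then access its contents through the module name
import math
print(math.sqrt(16.0))

# Import a single name from a module
from math import pi
print(round(pi, 2))

# Import a module under an abbreviated name
import datetime as dt
new_year = dt.date(2020, 1, 1)  # arbitrary example date
print(new_year.year)
```

Importing the whole module keeps it obvious where each function comes from, which is why the abbreviated form (`np`, `dt`) is so common in data science code.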

<div class="run">
    ▶️ <b> Run the cell below. </b>
</div>


In [1]:
import numpy as np

### NumPy Arrays
<hr style="border-top: 0.2px solid gray; margin-top: 12px; margin-bottom: 1px"></hr>

The *n*-dimensional array object in NumPy is referred to as an `ndarray`, a multidimensional container of *homogeneous* items – i.e. all values in the array are the same type and size. These arrays can be one-dimensional (a single sequence of values), two-dimensional (*m* rows x *n* columns), or higher-dimensional (e.g. a three-dimensional array can be thought of as a stack of two-dimensional arrays).

<h4 style="border:1px; border-style:solid; border-color:black; padding: 0.5em;"> <span style="color:black"> Constructing arrays from lists </span> </h4>

There are two main ways to construct NumPy arrays. The first involves using the `np.array()` function to generate an array from one or more lists:

```python
np.array([8,0,9,1,4])
>>> array([8, 0, 9, 1, 4])
```

Recall that unlike lists, all elements within an array must be of the same type. If the types do not match, NumPy will "upcast" if possible (e.g. convert integers to floats):

```python
np.array([8.14,0.12,9,1.77,4])
>>> array([8.14, 0.12, 9.  , 1.77, 4.  ])
```
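
To see the upcasting for ourselves, we can inspect the `dtype` attribute of each array (attributes are covered in more detail later in this exercise). The variable names below are our own, and the exact integer type may differ between platforms.

```python
import numpy as np

# All integers: NumPy keeps an integer data type
ints = np.array([8, 0, 9, 1, 4])

# Integers mixed with floats: every element is upcast to a float
mixed = np.array([8.14, 0.12, 9, 1.77, 4])

print(ints.dtype)   # an integer type (the exact width depends on the platform)
print(mixed.dtype)  # float64
```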

In these examples, we have created one-dimensional arrays. A one-dimensional array has a single axis, so strictly speaking it is neither a row vector nor a column vector. If we want an explicit row vector, we can use double brackets `[[]]` to create a two-dimensional array with one row and multiple columns:

```python
np.array([[8,0,9,1,4]]) # row vector with 5 columns
>>> array([[8, 0, 9, 1, 4]])
```

This is because NumPy treats the inner element(s) or list(s) as rows. This is easier to see with a multidimensional array:

```python
np.array([[3,2,0,1],[9,1,8,7],[4,0,1,6]]) # array with 3 rows x 4 columns

>>> array([[3, 2, 0, 1],
           [9, 1, 8, 7],
           [4, 0, 1, 6]])
```


<div class="practice">
    📚  <b> Practice 1. </b> 
    Create the following arrays and assign the corresponding variable names:
<ol class="alpha">
    <li> <code>a</code> </li>
        $ \begin{matrix}
            4 &   5 &   0 & 12 & -1 \\
            8 & -21 &  -4 &  6 &  3 \\
           17 &   1 & -13 &  7 &  0
          \end{matrix}$
    <li> <code>b</code> </li> 
        $ \begin{matrix}
            1.0 & 2.7 & 0 & 0.188 & 4.07 & 0.24
          \end{matrix}$
    <li> <code>c</code> </li> 
        $ \begin{matrix}
            0.4 \\
            0.8 \\
            1.2 \\
            1.6 \\
            2.0 \\
            2.4
          \end{matrix}$
</ol>    
</div>

In [90]:
np.array([])

array([], dtype=float64)

In [4]:
import numpy as np

a = np.array([[4, 5, 0, 12, -1] , [8, -21, -4, 6, 3] , [17, 1, -13, 7, 0]])

b = np.array([[1.0, 2.7, 0, 0.188, 4.07, 0.24]])

c = np.array([[0.4], [0.8], [1.2], [1.6], [2.0], [2.4]])

print(a)

print(b)

print(c)

[[  4   5   0  12  -1]
 [  8 -21  -4   6   3]
 [ 17   1 -13   7   0]]
[[1.    2.7   0.    0.188 4.07  0.24 ]]
[[0.4]
 [0.8]
 [1.2]
 [1.6]
 [2. ]
 [2.4]]


<h4 style="border:1px; border-style:solid; border-color:black; padding: 0.5em;"> <span style="color:black"> Constructing arrays using functions </span> </h4>

Oftentimes, it will be more efficient to construct arrays from scratch using NumPy functions. The `np.arange()` function is used to generate an array with evenly spaced values within a given interval. `np.arange()` can be used with one, two, or three parameters to specify the *start*, *stop*, and *step* values. If only one value is passed to the function, it will be interpreted as the *stop* value:


```python
# Create an array of the first seven integers 
np.arange(7)
>>> array([0, 1, 2, 3, 4, 5, 6])

# Create an array of floats from 1 to 12
np.arange(1.,13.)
>>> array([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12.])

# Create an array of values between 0 and 20, stepping by 2
np.arange(0,20,2)
>>> array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])
```

Similarly, the `np.linspace()` function is used to construct an array with evenly spaced numbers over a given interval. However, instead of the *step* parameter, `np.linspace()` takes a *num* parameter to specify the number of samples within the given interval:

```python
# Create an array of 5 evenly spaced values between 0 and 1
np.linspace(0,1,5)
>>> array([0.  , 0.25, 0.5 , 0.75, 1.  ])
```

Note that unlike `np.arange()`, `np.linspace()` includes the *stop* value by default (this can be changed by passing *`endpoint=False`*). Finally, it should be noted that while we could have used `np.arange()` to generate the same array in the above example, it is recommended to use `np.linspace()` when a non-integer step (e.g. 0.25) is desired.
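
The effect of the *`endpoint`* argument is easiest to see side by side. This is a small sketch with variable names of our own choosing:

```python
import numpy as np

# Default behaviour: the stop value (1) is the last element
with_stop = np.linspace(0, 1, 5)

# endpoint=False: the stop value is excluded, so the spacing becomes 0.2
without_stop = np.linspace(0, 1, 5, endpoint=False)

print(with_stop)
print(without_stop)
```

Both arrays contain five values, but only the first one ends exactly at the *stop* value.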



<div class="practice">
    📚  <b> Practice 2. </b> 
<ol class="alpha">
    <li> Create a new array <code>d</code> containing the integer multiples of 3 between 0 and 100.</li>
    <li> Create an array <code>f</code> of 10 evenly spaced elements between 0 and 2.</li> 
    <li> Re-create array <code>c</code> from Practice 1c using a function. Assign this to variable name <code>g</code>. </li>
</ol>    
</div>

In [14]:
d = np.arange(0, 100, 3)
d

f = np.linspace(0,2,10)
f

g = np.linspace(0.4, 2.4, 6)  # linspace is preferred over arange for a non-integer step
print(g)


[0.4 0.8 1.2 1.6 2.  2.4]


There are several functions that take a *shape* argument to generate single-value arrays with specified dimensions passed as a tuple (*rows,columns*):

```python
# Create a 1D array of zeros of length 4
np.zeros(4)
>>> array([0., 0., 0., 0.])

# Create a 4 x 3 array filled with zeros
np.zeros((4,3))
>>> array([[0., 0., 0.],
           [0., 0., 0.],
           [0., 0., 0.],
           [0., 0., 0.]])

# Create a 4 x 3 array filled with ones
np.ones((4,3))
>>> array([[1., 1., 1.],
           [1., 1., 1.],
           [1., 1., 1.],
           [1., 1., 1.]])


# Create a 4 x 3 array filled with 9.87
np.full((4,3),9.87)
>>> array([[9.87, 9.87, 9.87],
           [9.87, 9.87, 9.87],
           [9.87, 9.87, 9.87],
           [9.87, 9.87, 9.87]])
```

The `np.random.rand()` function is used to generate *n*-dimensional arrays filled with random numbers between 0 and 1:

```python
# Create a 4 x 3 array of uniformly distributed random values
np.random.rand(4,3)
>>> array([[0.17461878, 0.74586348, 0.9770975 ],
           [0.77861373, 0.28807114, 0.10639001],
           [0.09845499, 0.36038089, 0.58533369],
           [0.30983962, 0.74786381, 0.27765305]])
```

As we will see, the `np.random.rand()` function is very useful for sampling and modeling.
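
Because the values are random, the output will differ every time the cell is run. When reproducible results are needed, one common approach (not otherwise covered in this exercise) is to seed the random number generator first with `np.random.seed()`. In the sketch below, the seed value 42 is arbitrary and the variable names are our own.

```python
import numpy as np

np.random.seed(42)             # 42 is an arbitrary seed value
first = np.random.rand(2, 3)

np.random.seed(42)             # re-seeding restarts the same random sequence
second = np.random.rand(2, 3)

# The two arrays are identical because they were drawn from the same seed
print(np.array_equal(first, second))
```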

The last array-construction function we will consider (but by no means the last in the [NumPy API](https://docs.scipy.org/doc/numpy/reference/routines.array-creation.html)!) is the `np.eye()` function, which is used to generate the two-dimensional identity matrix:

```python
# Create a 4 x 4 identity matrix
np.eye(4)
>>> array([[1., 0., 0., 0.],
           [0., 1., 0., 0.],
           [0., 0., 1., 0.],
           [0., 0., 0., 1.]])
```

Lastly, it's worth noting that nearly all of these functions contain an optional *dtype* parameter, which can be used to specify the data-type of the resulting array (e.g. `np.ones((4,3),dtype=int)` would return a 4 x 3 array of ones as integers, rather than the default floats).
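
A short sketch of the *dtype* parameter in action, using variable names of our own choosing:

```python
import numpy as np

# Default: an array of floats
float_ones = np.ones((4, 3))

# Same shape, but stored as integers
int_ones = np.ones((4, 3), dtype=int)

# dtype also works with np.arange(), here forcing floats
float_range = np.arange(5, dtype=float)

print(float_ones.dtype)
print(int_ones.dtype)
print(float_range)
```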

<div class="practice">
    📚  <b> Practice 3. </b> Assign the following to variables:
    <ol class="alpha">
        <li> A 5x3 array of ones. </li>
        <li> A one-dimensional array of 6 zeros. </li> 
        <li> A 7x7 identity array. </li>
        <li> A random 10x10 array. </li>
</ol>
</div>

In [31]:
ones_array = np.ones((5, 3))

zeros_array = np.zeros(6)

identity_array = np.eye(7)

rand_array = np.random.rand(10,10)
rand_array

array([[0.9753657 , 0.7812122 , 0.21300136, 0.77169591, 0.9769775 ,
        0.92654097, 0.09359439, 0.96369153, 0.55594898, 0.14831787],
       [0.72495379, 0.4931098 , 0.88895476, 0.90778538, 0.0698082 ,
        0.86432969, 0.89656358, 0.49070386, 0.54957278, 0.25960646],
       [0.27885953, 0.8578936 , 0.62820722, 0.04417704, 0.47295186,
        0.39057057, 0.44315299, 0.58214867, 0.92801418, 0.68440713],
       [0.41474167, 0.44685893, 0.05380595, 0.69550307, 0.95208015,
        0.7204484 , 0.71501811, 0.35062258, 0.91382094, 0.53618549],
       [0.96427435, 0.91962751, 0.7331714 , 0.48656134, 0.0661219 ,
        0.32708815, 0.20219855, 0.11580922, 0.14290983, 0.49706505],
       [0.15410424, 0.48517381, 0.28824445, 0.7361962 , 0.13101368,
        0.24359515, 0.29326842, 0.49166018, 0.15494012, 0.98072884],
       [0.70192818, 0.35373674, 0.32380839, 0.75851733, 0.30385186,
        0.19135498, 0.53976641, 0.16509295, 0.60222566, 0.13257798],
       [0.73454367, 0.68079143, 0.1441609

### Array Manipulation
<hr style="border-top: 0.2px solid gray; margin-top: 12px; margin-bottom: 1px"></hr>

Having established how to construct arrays in NumPy, let's explore some of the attributes of the `ndarray`, including how to manipulate arrays. Nearly all data manipulation in Python involves NumPy array manipulation; many other Python data tools like Pandas (Exercise 2.2) are built on the NumPy array. Thus, while many of the examples below may seem trivial, understanding these operations will be critical to understanding more complex operations and Python data manipulation more broadly.


<h4 style="border:1px; border-style:solid; border-color:black; padding: 0.5em;"> <span style="color:black"> Array attributes </span> </h4>

Array attributes are properties that are intrinsic to the array itself. While there are quite a few attributes of NumPy arrays, the ones we will use most often provide information about the size, shape, and type of the arrays:

| Attribute | Description |
| :----- | :---------- |
| <span style="font-family: Lucida Console, Courier, monospace; font-weight: bold"> ndarray.ndim </span> | Number of array dimensions |
| <span style="font-family: Lucida Console, Courier, monospace; font-weight: bold"> ndarray.shape </span> | Tuple of array dimensions (*rows*, *columns*) |
| <span style="font-family: Lucida Console, Courier, monospace; font-weight: bold"> ndarray.size </span> | Total number of elements in the array |
| <span style="font-family: Lucida Console, Courier, monospace; font-weight: bold"> ndarray.dtype </span> | Data-type of array elements |

For example, let's create a random two-dimensional array and explore its attributes.

```python
# Initialize array
a = np.random.rand(4,7)

# Determine array dimensions
a.ndim
>>> 2

# Determine array shape
a.shape
>>> (4, 7)

# Determine array size
a.size
>>> 28

# Determine data-type
a.dtype
>>> dtype('float64')

```

<div class="example">
    ✏️ <b> Try it. </b> 
    Construct two arrays from the list <code>[8,0,9,1,4]</code>, one using single brackets and one using double brackets, as in the first examples. Using the <code>ndarray.ndim</code> and <code>ndarray.shape</code> attributes, show the difference between constructing an array with single vs. double brackets.
</div>

In [43]:
a = np.array([8, 0, 9, 1, 4])

print(a)
print(a.ndim)
print(a.shape)

b = np.array([[8], [0], [9], [1], [4]])

print(b)
print(b.ndim)
print(b.shape)

print(b.dtype)

[8 0 9 1 4]
1
(5,)
[[8]
 [0]
 [9]
 [1]
 [4]]
2
(5, 1)
int64


<div class="practice">
    📚  <b> Practice 4. </b> Use array attributes and the array you created in Practice 2a (<code>d</code>) to count the number of multiples of 3 between 0 and 100.
</div>

In [42]:
d = np.arange(0,100,3)

d.shape
d.size

abc = np.random.rand(10, 10, 2)
print(abc)

abc.size

[[[0.42006648 0.03961602]
  [0.60102133 0.9391203 ]
  [0.50929561 0.81569137]
  [0.64271059 0.43691179]
  [0.81739457 0.19954269]
  [0.30616252 0.49637925]
  [0.06409667 0.60924324]
  [0.56301552 0.60907231]
  [0.46485821 0.44458697]
  [0.21334547 0.6325322 ]]

 [[0.90446636 0.04720751]
  [0.79560997 0.58368606]
  [0.75500127 0.78855217]
  [0.2257198  0.49816296]
  [0.65749382 0.29106669]
  [0.04882232 0.90037988]
  [0.07711728 0.95145542]
  [0.08462848 0.32555278]
  [0.628541   0.69355345]
  [0.21974338 0.94954557]]

 [[0.67984992 0.76021835]
  [0.99652366 0.19661601]
  [0.90155186 0.98441869]
  [0.72824059 0.42174297]
  [0.87583339 0.66246721]
  [0.88335743 0.84796736]
  [0.94702488 0.68578674]
  [0.09612692 0.57813229]
  [0.44904924 0.75211625]
  [0.44122546 0.01499516]]

 [[0.64255219 0.70176309]
  [0.50450521 0.8477316 ]
  [0.71588704 0.81221865]
  [0.39389064 0.05843888]
  [0.69439162 0.73645369]
  [0.25795535 0.27482502]
  [0.23708271 0.63729939]
  [0.25664612 0.04877287]
  [0.7

200

In [44]:
np.size(d)

34

<h4 style="border:1px; border-style:solid; border-color:black; padding: 0.5em;"> <span style="color:black"> Indexing + slicing </span> </h4>

Indexing arrays is analogous to indexing lists:

```python
# Initialize a one-dimensional array
x1 = np.array([8,0,9,1,4])

# Return the value in position 1
x1[1]
>>> 0
```

With multidimensional arrays, a tuple of indices can be passed to access the rows and columns of an array: `ndarray[row,column]`. If a single index is passed, the corresponding row element will be returned:

```python
# Initialize a two-dimensional array
x2 = np.array([[3,2,0,1],
               [9,1,8,7],
               [4,0,1,6]])

# Return the value of the element in the 2nd row, 3rd column
x2[1,2]
>>> 8 

# Return the entire second row
x2[1]
>>> array([9, 1, 8, 7])
```

Slicing of arrays allows you to access parts of arrays or *subarrays*. Just like with lists, slicing follows the syntax `ndarray[start:stop:step]`.

```python
# Return the elements in positions 1-4
x1[1:]
>>> array([0, 9, 1, 4])
```

For multidimensional arrays, a tuple of slices is used: `ndarray[row_start:row_end:row_step, col_start:col_end:col_step]`.
    
```python
# Return the entire third column
x2[:,2]
>>> array([0, 8, 1])

# Return the first two rows and two columns
x2[:2,:2]
>>> array([[3, 2],
           [9, 1]])

# Return all rows and every other column
x2[:,::2]
>>> array([[3, 0],
           [9, 8],
           [4, 1]])
```
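
Negative indices and negative steps work with arrays just as they do with lists. The following sketch reuses the `x2` array from above. The other variable names are our own.

```python
import numpy as np

x2 = np.array([[3,2,0,1],
               [9,1,8,7],
               [4,0,1,6]])

last_row = x2[-1]          # array([4, 0, 1, 6])
last_col = x2[:, -1]       # array([1, 7, 6])
rows_reversed = x2[::-1]   # same rows, in reverse order
inner = x2[1:, 1:3]        # last two rows, middle two columns

print(last_row)
print(last_col)
print(rows_reversed)
print(inner)
```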

<div class="practice">
    📚  <b> Practice 5. </b> Using the array you created in Practice 3d,
    <ol class="alpha">
        <li> Print all the elements in column 4. </li>
        <li> Print all the elements in row 7. </li>
        <li> Extract the 4x4 subarray at the center of the array and assign it as a new variable. </li>
        <li> Print the last two values in column 10. </li>
</ol>
</div>

In [81]:
#rand_array = np.random.rand(10,10)
#print(rand_array)

#rand_array[:,3] #1

#rand_array[6,:] #2

#rand_subset = rand_array[3:7, 3:7]

#rand_array[8:10, 9]
#rand_array[-2:, 9]

small_array = np.random.rand(4,4)

print(small_array)

small_array[-2:,-1]

small_array[-1,-2:]


[[0.46328908 0.73515083 0.64882641 0.43771294]
 [0.04440577 0.97295703 0.49298278 0.02368259]
 [0.8628705  0.0222338  0.19913835 0.67664111]
 [0.94192041 0.57993722 0.22933749 0.48703136]]


array([0.22933749, 0.48703136])

<div class="practice">
    📚  <b> Practice 6. </b> Create a blank 8x8 matrix and fill it with a checkerboard pattern of 0s and 1s using indexing.
</div>

In [89]:
check = np.zeros((8, 8))

# Fill every other cell with 1, offsetting the pattern on alternating rows
check[0::2, 0::2] = 1
check[1::2, 1::2] = 1

print(check)


    

[[1. 0. 1. 0. 1. 0. 1. 0.]
 [0. 1. 0. 1. 0. 1. 0. 1.]
 [1. 0. 1. 0. 1. 0. 1. 0.]
 [0. 1. 0. 1. 0. 1. 0. 1.]
 [1. 0. 1. 0. 1. 0. 1. 0.]
 [0. 1. 0. 1. 0. 1. 0. 1.]
 [1. 0. 1. 0. 1. 0. 1. 0.]
 [0. 1. 0. 1. 0. 1. 0. 1.]]


<h4 style="border:1px; border-style:solid; border-color:black; padding: 0.5em;"> <span style="color:black"> Array reduction </span> </h4>

**Array reduction** refers to the computation of summary statistics on an array – i.e. *reducing* an array to a single aggregate value, such as the mean, minimum, maximum, etc. These array reduction methods are similar to those used for lists:

```python
x2 = np.array([[3,2,0,1],
               [9,1,8,7],
               [4,0,1,6]])

# Sum of all values in array
x2.sum()
>>> 42

# Maximum value of the array
x2.max()
>>> 9

# Minimum value of the array
x2.min()
>>> 0

# Mean value of the array
x2.mean()
>>> 3.5

# Standard deviation of the array
x2.std()
>>> 3.095695936834452

```

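All of these methods accept an *`axis`* argument, which allows for aggregation across the rows or columns of the array. In NumPy – as well as the many libraries built on NumPy – axis `0` always refers to the *rows* of an array, while axis `1` refers to the *columns*. The axis you pass is the one that is collapsed: `axis=0` aggregates down the rows and returns one value per column, while `axis=1` aggregates across the columns and returns one value per row: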

```python
# Mean of each row (calculated across columns)
x2.mean(axis=1)
>>> array([1.5 , 6.25, 2.75])

# Maximum value of each column (calculated across rows)
x2.max(axis=0)
>>> array([9, 2, 8, 7])
```
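
A quick way to keep the axes straight is to check the shape of the result. The sketch below sums the same `x2` array along each axis. The variable names `col_sums` and `row_sums` are our own.

```python
import numpy as np

x2 = np.array([[3,2,0,1],
               [9,1,8,7],
               [4,0,1,6]])

# axis=0 collapses the 3 rows, leaving one value per column
col_sums = x2.sum(axis=0)   # array([16,  3,  9, 14])

# axis=1 collapses the 4 columns, leaving one value per row
row_sums = x2.sum(axis=1)   # array([ 6, 25, 11])

print(col_sums, col_sums.shape)
print(row_sums, row_sums.shape)
```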

<div class="python">
    🐍 <b>Functions vs. Methods.</b> 
    Recall from Exercise 1.5 that <i>functions</i> and <i>methods</i> in Python are essentially the same thing. The key difference, however, is that functions can be called generically, while methods are always attached to and called on objects. It is also worth noting that while a method may alter the object itself, a function <i>usually</i> simply operates on an object without changing it, and then prints something or returns a value.
    
For each of the array reduction <i>methods</i> demonstrated above, there is a corresponding <i>function</i>. For example, the mean of an array can be calculated using the <i>method</i> <code>ndarray.mean()</code> or the <i>function</i> <code>np.mean(ndarray)</code>.
    
These – and the many additional – aggregation <i>functions</i> in NumPy can be used not only on arrays, but also on other array-like numerical objects, such as lists of numbers.
</div>
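
The distinction between the function and the method can be seen in a short sketch. The variable names are our own, and the list is simply the first row of `x2`.

```python
import numpy as np

values = [3, 2, 0, 1]     # a plain Python list
x = np.array(values)      # the same values as an array

# The function works on both the list and the array
print(np.mean(values))    # 1.5
print(np.mean(x))         # 1.5

# The method exists only on the array; a list has no .mean()
print(x.mean())                  # 1.5
print(hasattr(values, "mean"))   # False
```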

<h4 style="border:1px; border-style:solid; border-color:black; padding: 0.5em;"> <span style="color:black"> Reshaping, resizing, + rearranging arrays </span> </h4>

Other useful array operations include **reshaping**, **resizing**, and **rearranging** arrays. The `ndarray.reshape()` method is used to change the shape of an array: 

```python
# Initialize a one-dimensional array with 16 elements
a = np.arange(1.0,17.0)

a
>>> array([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12., 13.,
       14., 15., 16.])

# Reshape array a into a 4x4 array
b = a.reshape(4,4)

b
>>> array([[ 1.,  2.,  3.,  4.],
           [ 5.,  6.,  7.,  8.],
           [ 9., 10., 11., 12.],
           [13., 14., 15., 16.]])


```

There are a few important things to note about the `ndarray.reshape()` method. First and unsurprisingly, the *size* of the array must be preserved (i.e. the size of the reshaped array must match that of the original array). Secondly, and perhaps more importantly, the `ndarray.reshape()` method returns a **view** of the original array `a` whenever possible (as it does here), rather than a **copy**, which would allow the two variables to exist independently. Because `b` is a *view* of `a`, any changes made to `b` will also be applied to `a`:

```python
# Reset the value in the third row, third column (11.0)
b[2,2] = 0.0

b
>>> array([[ 1.,  2.,  3.,  4.],
           [ 5.,  6.,  7.,  8.],
           [ 9., 10.,  0., 12.],
           [13., 14., 15., 16.]])

a
>>> array([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.,  0., 12., 13., 
           14., 15., 16.])

```
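
If the reshaped array needs to be independent of the original, one option is to chain `.copy()` onto the reshape. The sketch below contrasts the two cases and uses `np.shares_memory()` to confirm which arrays share data. The variable names `view` and `independent` are our own.

```python
import numpy as np

a = np.arange(1.0, 17.0)

view = a.reshape(4, 4)                # shares data with a
independent = a.reshape(4, 4).copy()  # has its own data

independent[0, 0] = -1.0   # does not affect a
view[2, 2] = 0.0           # also changes a[10]

print(a[0])    # 1.0
print(a[10])   # 0.0
print(np.shares_memory(a, view))          # True
print(np.shares_memory(a, independent))   # False
```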

Unlike `ndarray.reshape()`, the `ndarray.resize()` method operates *in-place* on the original array. The `ndarray.resize()` method changes both the shape and the size of an array: shrinking discards elements from the end of the array, while growing pads the array with zeros:

```python
# Initialize a 2 x 3 array
a = np.array([[1,2,3],[4,5,6]])

# Copy the original array
smaller = a.copy()
# Use ndarray.resize() to reshape to a 2x2 array and delete the last two elements
smaller.resize(2,2)

smaller
>>> array([[1, 2],
           [3, 4]])

# Copy the original array
bigger = a.copy()
# Use ndarray.resize() to reshape to a 6x6 array by adding zeros
bigger.resize(6,6)

bigger
>>> array([[1, 2, 3, 4, 5, 6],
           [0, 0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0, 0]])
```


<div class="python">
    🐍 <b>Copies vs. Views</b> 
    This is just one example of many occasions when it is advisable to create a <b>copy</b> of the original object before manipulating it. Had we not copied <code>a</code> before resizing it to a 2x2 array, the last two elements would have been permanently deleted, as <code>a</code> itself would have been resized. A good rule of thumb is to <b>always create a copy</b> before changing or deleting any data.
</div>

Often it is useful to **rearrange** the elements in an array. The `ndarray.transpose()` method – or simply `ndarray.T` – transposes the array, switching the rows and columns, while the `np.flip()`, `np.flipud()`, and `np.fliplr()` functions reverse the order of elements in the array along a given axis:

```python
# Initialize a new 4x5 array
x = np.array([[4, 2, 0, 1, 5],
              [9, 4, 1, 3, 0],
              [6, 0, 8, 5, 9],
              [7, 3, 2, 7, 4]])

# Transpose rows + columns
x.T
>>> array([[4, 9, 6, 7],
           [2, 4, 0, 3],
           [0, 1, 8, 2],
           [1, 3, 5, 7],
           [5, 0, 9, 4]])

# Flip the array (reverse the order of all elements)
np.flip(x)
>>> array([[4, 7, 2, 3, 7],
           [9, 5, 8, 0, 6],
           [0, 3, 1, 4, 9],
           [5, 1, 0, 2, 4]])

# Flip the array up/down (reverse the order of the rows)
np.flipud(x)
>>> array([[7, 3, 2, 7, 4],
           [6, 0, 8, 5, 9],
           [9, 4, 1, 3, 0],
           [4, 2, 0, 1, 5]])

# Flip the array left/right (reverse the order of the columns)
np.fliplr(x)
>>> array([[5, 1, 0, 2, 4],
           [0, 3, 1, 4, 9],
           [9, 5, 8, 0, 6],
           [4, 7, 2, 3, 7]])
```

When passed with the *`axis`* argument, `np.flip()` mimics the `np.flipud()` and `np.fliplr()` functions:

```python
# Flip the array over the row axis (same as np.flipud(x))
np.flip(x, axis=0)
>>> array([[7, 3, 2, 7, 4],
           [6, 0, 8, 5, 9],
           [9, 4, 1, 3, 0],
           [4, 2, 0, 1, 5]])

# Flip the array over the column axis (same as np.fliplr(x))
np.flip(x, axis=1)
>>> array([[5, 1, 0, 2, 4],
           [0, 3, 1, 4, 9],
           [9, 5, 8, 0, 6],
           [4, 7, 2, 3, 7]])
```
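
These equivalences can be checked directly with `np.array_equal()`, which returns `True` when two arrays have the same shape and elements. The sketch below reuses the 4x5 array `x` from above and also shows that flipping is the same as slicing with a step of `-1`.

```python
import numpy as np

x = np.array([[4, 2, 0, 1, 5],
              [9, 4, 1, 3, 0],
              [6, 0, 8, 5, 9],
              [7, 3, 2, 7, 4]])

# np.flip() along one axis matches the dedicated functions
print(np.array_equal(np.flip(x, axis=0), np.flipud(x)))   # True
print(np.array_equal(np.flip(x, axis=1), np.fliplr(x)))   # True

# Flipping every axis is the same as reversing both slices
print(np.array_equal(np.flip(x), x[::-1, ::-1]))          # True

# Transposing swaps the dimensions: 4x5 becomes 5x4
print(x.T.shape)                                          # (5, 4)
```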

<div class="practice">
    📚  <b> Practice 7. </b>
    <ol class="alpha">
        <li> Create a 3x3 matrix with values ranging from 0 to 8. </li>
        <li> Reverse the order of elements in your random 10x10 array from 3d. </li>
</ol>
</div>

In [115]:
#problem a
a = np.arange(0,9).reshape(3,3)

#problem b
rand_array = np.random.rand(10,10)  # stand-in for the 10x10 array from Practice 3d
rand_array_flipped = np.flip(rand_array)
rand_array_flipped

##flipping left/right and up/down
np.fliplr(a)
np.flipud(a)



array([[6, 7, 8],
       [3, 4, 5],
       [0, 1, 2]])

In [132]:
a = np.random.rand(5,5) * 6 - 3

a = np.array(np.random.rand(5,5) * 6 - 3).round()

r = np.array(np.random.rand(5,5) * 20 - 10)

r


array([[ 0.37180788, -8.58215252,  3.52456861,  3.52676801,  4.38474875],
       [ 4.67365017, -8.84745251, -6.80310834, -4.17167386, -8.92994822],
       [ 8.25352424, -7.51552724,  7.29373797, -9.84013959, -5.88989698],
       [ 6.84904487, -7.38033019, -2.28256366,  5.6921663 , -2.44697108],
       [-6.28675596,  4.37397462, -0.80426792,  4.87303567,  4.18485872]])

<h4 style="border:1px; border-style:solid; border-color:black; padding: 0.5em;"> <span style="color:black"> Joining + splitting arrays </span> </h4>

So far, we have considered array manipulation routines that operate on a single array. We will encounter many scenarios in which it is necessary to combine multiple arrays into one or, conversely, to split a single array into two or more separate objects.

**Concatenation** in computer programming refers to the process of joining multiple objects end-to-end. The most common way of concatenating arrays in NumPy is with the `np.concatenate()` function, which takes a tuple of arrays:

```python
# Initialize a 3x3 array
x = np.array([[4,2,0],
              [9,4,1],
              [6,0,8]])
# Initialize a 1x3 array
y = np.array([[2,8,6]])

# Concatenate x and y
np.concatenate((x,y))
>>> array([[4, 2, 0],
           [9, 4, 1],
           [6, 0, 8],
           [2, 8, 6]])
```

Note that, by default, `np.concatenate()` operates along the *row* axis (`0`). To concatenate along the column axis, we must specify `axis=1` as an argument:

```python
# Concatenate x and y along the column axis
np.concatenate((x,y), axis=1)
>>> ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-65-6c2205ef28d2> in <module>
          5 y = np.array([[2,8,6]])
          6 
    ----> 7 np.concatenate((x,y),axis=1)

    <__array_function__ internals> in concatenate(*args, **kwargs)

    ValueError: all the input array dimensions for the concatenation axis must match exactly, but along 
    dimension 0, the array at index 0 has size 3 and the array at index 1 has size 1
```

Uh-oh! Unsurprisingly, when we tried to concatenate an array with 1 row to an array with 3 rows, we got a `ValueError`. For `np.concatenate()` to work, the dimensions must match. Thus, we must first transpose `y` before adding it to `x` as a column:

```python
# Transpose y and concatenate x and y along the column axis
np.concatenate((x,y.T),axis=1)
>>> array([[4, 2, 0, 2],
           [9, 4, 1, 8],
           [6, 0, 8, 6]])
```

Equivalently, we could use the `np.vstack()` or `np.hstack()` function to concatenate directly along the row or column axis, respectively:

```python 
# Stack rows of x and y (same as np.concatenate((x,y), axis=0))
np.vstack((x,y))
>>> array([[4, 2, 0],
           [9, 4, 1],
           [6, 0, 8],
           [2, 8, 6]])

# Stack columns of x and y.T (same as np.concatenate((x,y.T), axis=1))
np.hstack((x,y.T))
>>> array([[4, 2, 0, 2],
           [9, 4, 1, 8],
           [6, 0, 8, 6]])
```
<div class="practice">
    📚  <b> Practice 8. </b> Create two random 1-D arrays of length 10. Merge them into a 2x10 array and then a 10x2 array.
</div>

In [149]:
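# A first attempt with 1-D arrays did not give a 2x10 result: concatenating two
# 1-D arrays just produces one longer 1-D array, so they would need reshaping first.
#dog = np.arange(0, 110, 11)
#cat = np.arange(2, 22, 2)
#print(dog.ndim)
#dog_cat = np.concatenate((dog, cat))

# Two random arrays with shape (1, 10)
a = np.random.rand(1,10)
b = np.random.rand(1,10)

# Merge into a 2x10 array (np.concatenate takes a tuple of arrays)
np.concatenate((a, b))

# Merge into a 10x2 array
np.concatenate((a.T,b.T), axis = 1)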



1


array([[0.83207199, 0.89720156],
       [0.41075076, 0.47937695],
       [0.50515307, 0.52661865],
       [0.77239471, 0.78327747],
       [0.59990329, 0.85693889],
       [0.88788498, 0.54768182],
       [0.08037391, 0.13053282],
       [0.88442464, 0.49495313],
       [0.71784954, 0.05203905],
       [0.61951157, 0.38045112]])

In [155]:
a = np.random.rand(10)
a.ndim

b = np.random.rand(1,10)
b.ndim

c = np.random.rand(10,1)
c.ndim



2

In [None]:
x = np.array([[4, 2, 0],
            [9, 4, 4],
            [6, 0, 8]])

y = np.array([[5, 7, 3]])


np.concatenate((x, y))

Conversely, **splitting** allows you to break down a single array into multiple arrays. Splitting is implemented with the `np.split()`, `np.vsplit()`, and `np.hsplit()` functions.

```python
# Initialize a 4x3 array
z = np.array([[4, 2, 0],
              [9, 4, 1],
              [6, 0, 8],
              [2, 8, 6]])

# Split z into two arrays at row 1
np.split(z,[1])
>>> [array([[4, 2, 0]]), array([[9, 4, 1],
                                [6, 0, 8],
                                [2, 8, 6]])]

# OR
np.vsplit(z,[1])
>>> [array([[4, 2, 0]]), array([[9, 4, 1],
                                [6, 0, 8],
                                [2, 8, 6]])]

# Split z into two arrays at column 1
np.hsplit(z,[1])
>>> [array([[4],
            [9],
            [6],
            [2]]), 
     array([[2, 0],
            [4, 1],
            [0, 8],
            [8, 6]])]

```

Multiple indices can be passed to the `np.split()` and related functions, with *n* indices (split points) resulting in *n + 1* subarrays.

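As a minimal sketch of the multiple-index case, the snippet below reuses the 4x3 array `z` from the example above and passes two split points. The names `top`, `middle`, and `bottom` are chosen here purely for illustration.

```python
import numpy as np

# Same 4x3 array as in the splitting example above
z = np.array([[4, 2, 0],
              [9, 4, 1],
              [6, 0, 8],
              [2, 8, 6]])

# Two split points (before row 1 and before row 3) give three subarrays
top, middle, bottom = np.split(z, [1, 3])

print(top.shape, middle.shape, bottom.shape)   # (1, 3) (2, 3) (1, 3)

# Stacking the pieces back together recovers the original array
print(np.array_equal(np.vstack((top, middle, bottom)), z))   # True
```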
<div class="practice">
    📚  <b> Practice 9. </b>
    <ol class="alpha">
        <li> Split your random 10x10 array from 3d into two 10x5 arrays. </li>
        <li> Combine the first 10x5 array from (a), the 10x2 array from 8b, and the other 10x5 array from (a). In other words, recombine the 10x10 array from 3d with two new columns in index positions 5 and 6. Your final array should have 10 rows and 12 columns. Verify this by printing the shape of the resulting array. </li>
</ol>
</div>

In [173]:
#problem a
random_array = np.random.rand(10,10)
random_array

print(np.hsplit(random_array, [5]))

first = np.ones([10,5])
second = np.ones([10,2]) * 2
third = np.ones([10,5]) * 3

print(first)
print(second)
print(third)


# problem b
np.concatenate((first, second, third), axis = 1)



[array([[0.48473494, 0.67271204, 0.49348845, 0.99272751, 0.78242128],
       [0.59457815, 0.74046319, 0.38615192, 0.51494038, 0.61684541],
       [0.5797333 , 0.70093733, 0.83749014, 0.36944348, 0.41168413],
       [0.06715802, 0.49737245, 0.8133778 , 0.20476177, 0.67382813],
       [0.36733957, 0.86522231, 0.07832902, 0.06594352, 0.23660911],
       [0.604765  , 0.97837916, 0.38973423, 0.31505264, 0.90776089],
       [0.2506873 , 0.57275289, 0.41317174, 0.51639639, 0.06748707],
       [0.97678669, 0.25732111, 0.68512469, 0.17748098, 0.64971749],
       [0.71812396, 0.55359389, 0.11726082, 0.24373461, 0.85696893],
       [0.02879313, 0.58879206, 0.1197084 , 0.65999674, 0.85110773]]), array([[0.83744503, 0.67662322, 0.88059301, 0.15547323, 0.52454038],
       [0.35407688, 0.04737487, 0.12936284, 0.46232559, 0.35972694],
       [0.3544713 , 0.7023631 , 0.66438757, 0.27623451, 0.48653762],
       [0.51023288, 0.28351406, 0.35294497, 0.87467297, 0.88428869],
       [0.75936368, 0.98292223,

array([[1., 1., 1., 1., 1., 2., 2., 3., 3., 3., 3., 3.],
       [1., 1., 1., 1., 1., 2., 2., 3., 3., 3., 3., 3.],
       [1., 1., 1., 1., 1., 2., 2., 3., 3., 3., 3., 3.],
       [1., 1., 1., 1., 1., 2., 2., 3., 3., 3., 3., 3.],
       [1., 1., 1., 1., 1., 2., 2., 3., 3., 3., 3., 3.],
       [1., 1., 1., 1., 1., 2., 2., 3., 3., 3., 3., 3.],
       [1., 1., 1., 1., 1., 2., 2., 3., 3., 3., 3., 3.],
       [1., 1., 1., 1., 1., 2., 2., 3., 3., 3., 3., 3.],
       [1., 1., 1., 1., 1., 2., 2., 3., 3., 3., 3., 3.],
       [1., 1., 1., 1., 1., 2., 2., 3., 3., 3., 3., 3.]])

In [171]:
random_array = np.random.rand(10,10)


[a, b] = np.hsplit(random_array, [5])

print(a, "\n\n\n", b)


[[0.27894227 0.02533418 0.21981849 0.20303552 0.86338881]
 [0.37283641 0.27090703 0.34609136 0.38157743 0.76173973]
 [0.86399343 0.07690506 0.94960056 0.42441217 0.28781252]
 [0.94927187 0.12473387 0.34317322 0.99358239 0.8754051 ]
 [0.0107408  0.47739586 0.2739473  0.55015891 0.51857266]
 [0.49898433 0.9922226  0.20456264 0.21528433 0.56309997]
 [0.8380312  0.99043082 0.88245633 0.09175759 0.64600299]
 [0.08984134 0.64213403 0.79917645 0.15438514 0.21793857]
 [0.48134868 0.41470381 0.93344585 0.30744417 0.43663535]
 [0.66268003 0.4763383  0.33882824 0.76435247 0.8622519 ]] 


 [[0.57253784 0.19417293 0.96926551 0.44982943 0.98631408]
 [0.96696181 0.46731953 0.45393797 0.41171576 0.68640447]
 [0.49471937 0.75407755 0.20177787 0.23832262 0.30907208]
 [0.86745197 0.57092365 0.5205455  0.42477202 0.36740726]
 [0.51428129 0.90507374 0.02719672 0.41269496 0.4916678 ]
 [0.67499249 0.66992859 0.72414101 0.38653893 0.4718473 ]
 [0.82691364 0.50208882 0.93033142 0.30169144 0.10050005]
 [0.62591

### Array Math

<hr style="border-top: 0.2px solid gray; margin-top: 12px; margin-bottom: 1px"></hr>

One of the key advantages of NumPy is its ability to perform *vectorized* operations using **universal functions** (ufuncs), which perform element-wise operations on arrays very quickly. For example, say we had a very large list of data, and we wanted to perform some mathematical operation on all of the data elements. We could store this data as a `list` or an `ndarray`:

```python
# Create a list of the first 10,000 integers
a = list(range(10000))

# Create a one-dimensional array of the first 10,000 integers
b = np.arange(10000)
```

Now, let's multiply each element in our dataset by 2. We can accomplish this by using a `for` loop for the list `a` and a **ufunc** for array `b`. (`%timeit` is an IPython "magic" command, available in Jupyter notebooks, that measures the time it takes to execute short code snippets.)

```python
# Use a for loop to multiply every element in a by 2
%timeit [i*2 for i in a]
# Use a ufunc to multiply every element in b by 2
%timeit b * 2

>>> 388 µs ± 30.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    3.58 µs ± 41.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```



<div class="run">
    ▶️ <b> Run the cell below. </b>
</div>

In [175]:
# Create a list of the first 10,000 integers
list10 = list(range(10000))
# Use a for loop to multiply every element in list10 by 2
%timeit [i*2 for i in list10]

# Create a one-dimensional array of the first 10,000 integers
# (element-wise multiplication needs a NumPy array; list10 * 2 would repeat the list instead)
array10 = np.arange(10000)
# Use a ufunc to multiply every element in array10 by 2
%timeit array10 * 2

384 µs ± 1.51 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
4.65 µs ± 37.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


As you can see, the `for` loop took about 100 times longer than the exact same element-wise array operation!

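Because `%timeit` only exists inside IPython and Jupyter, a plain Python script needs a different tool. The sketch below assumes the standard-library `timeit` module as a substitute; the variable names and the choice of 100 repetitions are illustrative, and the timings will differ from machine to machine.

```python
import timeit
import numpy as np

values_list = list(range(10000))
values_array = np.arange(10000)

# Total time in seconds for 100 repetitions of each approach
loop_time = timeit.timeit(lambda: [i * 2 for i in values_list], number=100)
ufunc_time = timeit.timeit(lambda: values_array * 2, number=100)

print(f"list comprehension: {loop_time / 100 * 1e6:.1f} µs per run")
print(f"NumPy ufunc:        {ufunc_time / 100 * 1e6:.1f} µs per run")

# Both approaches compute the same values
same_result = [i * 2 for i in values_list] == (values_array * 2).tolist()
print(same_result)   # True
```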
<h4 style="border:1px; border-style:solid; border-color:black; padding: 0.5em;"> <span style="color:black"> Computation on single arrays using ufuncs </span> </h4>

Ufuncs are fairly straightforward to use, as they rely on Python's native operators (e.g. `+`, `-`, `*`, `/`):

```python
# Create a 2x4 array of floats
x  = np.array([[1.,2.,3.,4.],
               [5.,6.,7.,8.]])

# Do some math
# Addition
x + 12
>>> array([[13., 14., 15., 16.],
           [17., 18., 19., 20.]])

# Subtraction
x - 400
>>> array([[-399., -398., -397., -396.],
           [-395., -394., -393., -392.]])

# Exponentiation
x ** 2
>>> array([[ 1.,  4.,  9., 16.],
           [25., 36., 49., 64.]])

# Combine operations
10 ** (x/2)
>>> array([[3.16227766e+00, 1.00000000e+01, 3.16227766e+01, 1.00000000e+02],
           [3.16227766e+02, 1.00000000e+03, 3.16227766e+03, 1.00000000e+04]])
```

These arithmetic operators act as *wrappers* (effectively shortcuts) around specific built-in NumPy functions; for example, the `+` operator is a convenient shortcut for the `np.add()` function:

```python
x + 2
>>> array([[ 3.,  4.,  5.,  6.],
           [ 7.,  8.,  9., 10.]])

np.add(x,2)
>>> array([[ 3.,  4.,  5.,  6.],
           [ 7.,  8.,  9., 10.]])
```

The following table contains a list of arithmetic operators implemented by NumPy. Note that these functions work on *all* numerical objects, not just arrays.

<p style="height:12pt"> </p>

<center> <b>Arithmetic functions in NumPy </b> </center>

| Operator | ufunc | Description |
| :------- | :---- | :---------- |
| <span style="font-family: Lucida Console, Courier, monospace; font-weight: bold"> +</span> | <span style="font-family: Lucida Console, Courier, monospace; font-weight: bold"> np.add() </span> | Addition |
| <span style="font-family: Lucida Console, Courier, monospace; font-weight: bold"> -</span> | <span style="font-family: Lucida Console, Courier, monospace; font-weight: bold"> np.subtract() </span> | Subtraction |
| <span style="font-family: Lucida Console, Courier, monospace; font-weight: bold"> * </span> | <span style="font-family: Lucida Console, Courier, monospace; font-weight: bold"> np.multiply() </span> | Multiplication |
| <span style="font-family: Lucida Console, Courier, monospace; font-weight: bold"> / </span> | <span style="font-family: Lucida Console, Courier, monospace; font-weight: bold"> np.divide() </span> | Division |
| <span style="font-family: Lucida Console, Courier, monospace; font-weight: bold"> // </span> | <span style="font-family: Lucida Console, Courier, monospace; font-weight: bold"> np.floor_divide() </span> | Floor division (rounds the quotient down to the nearest integer) |
| <span style="font-family: Lucida Console, Courier, monospace; font-weight: bold"> ** </span> | <span style="font-family: Lucida Console, Courier, monospace; font-weight: bold"> np.power() </span> | Exponentiation |
| <span style="font-family: Lucida Console, Courier, monospace; font-weight: bold"> % </span> | <span style="font-family: Lucida Console, Courier, monospace; font-weight: bold"> np.mod() </span> | Modulus/remainder |
| <span style="font-family: Lucida Console, Courier, monospace; font-weight: bold"> \*\*(1/2) </span> | <span style="font-family: Lucida Console, Courier, monospace; font-weight: bold"> np.sqrt() </span> | Square root |

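To make the operator and ufunc pairing in the table concrete, here is a small sketch comparing a few of the less familiar operators with their function forms. The array `v` is an arbitrary example.

```python
import numpy as np

v = np.array([7., 8., 9.])

# Each operator gives the same result as its ufunc counterpart
print(np.array_equal(v // 2, np.floor_divide(v, 2)))   # True
print(np.array_equal(v % 2, np.mod(v, 2)))             # True
print(np.allclose(v ** (1/2), np.sqrt(v)))             # True

print(v // 2)   # [3. 4. 4.]
print(v % 2)    # [1. 0. 1.]
```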

Furthermore, as a *numerical* package, NumPy implements many additional mathematical operations for use in Python – on arrays or otherwise. The following tables show some of the more commonly used mathematical functions in NumPy. The `x` is used to denote a numerical object – this could be an `int`, `float`, `list`, `ndarray`, etc.

<p style="height:12pt"> </p>

<center> <b>Logarithmic functions </b> </center>

| ufunc | Operation |
| :---- | :---------- |
| <span style="font-family: Lucida Console, Courier, monospace; font-weight: bold"> np.exp(x) </span> | $e^x$ |
| <span style="font-family: Lucida Console, Courier, monospace; font-weight: bold"> np.log(x) </span> | $\ln x$|
| <span style="font-family: Lucida Console, Courier, monospace; font-weight: bold"> np.log10(x) </span> | $\log_{10} x$ |

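As a quick illustration that these functions accept different kinds of numerical input, the sketch below passes a scalar, a list, and an array. The input values are arbitrary.

```python
import numpy as np

# Scalar input
exp_zero = np.exp(0)
print(exp_zero)       # 1.0

# List input: the result comes back as an ndarray
natural_logs = np.log([1, np.e])
print(type(natural_logs).__name__)   # ndarray

# Array input: base-10 logarithm of 1, 10, and 1000
base10_logs = np.log10(np.array([1, 10, 1000]))
print(base10_logs)    # [0. 1. 3.]
```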

<p style="height:12pt"> </p>

<center> <b> Trigonometric functions </b> </center>

| ufunc | Description |
| :---- | :---------- |
|<span style="font-family: Lucida Console, Courier, monospace; font-weight: bold"> np.sin(x) </span>|$\sin{x}$|
|<span style="font-family: Lucida Console, Courier, monospace; font-weight: bold"> np.cos(x) </span>|$\cos{x}$|
|<span style="font-family: Lucida Console, Courier, monospace; font-weight: bold"> np.tan(x) </span>|$\tan{x}$|
|<span style="font-family: Lucida Console, Courier, monospace; font-weight: bold"> np.arcsin(x) </span>| $\sin^{-1}{x}$|
|<span style="font-family: Lucida Console, Courier, monospace; font-weight: bold"> np.arccos(x) </span>| $\cos^{-1}{x}$|
|<span style="font-family: Lucida Console, Courier, monospace; font-weight: bold"> np.arctan(x) </span>| $\tan^{-1}{x}$|

<p style="height:12pt"> </p>

Note: NumPy assumes all inputs to trigonometric functions are in units of *radians*. The `np.radians()` function can be used to convert from degrees to radians, while the `np.degrees()` function does the opposite.

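A short sketch of the degrees-to-radians workflow described in the note, using a few illustrative angles:

```python
import numpy as np

# Angles in degrees (illustrative values)
angles_deg = np.array([0., 30., 90., 180.])

# Convert to radians before calling a trigonometric function
angles_rad = np.radians(angles_deg)
sines = np.sin(angles_rad)

# Round to hide floating-point noise (sin(pi) is about 1.2e-16, not exactly 0)
print(np.round(sines, 3))

# np.degrees() reverses the conversion
print(np.degrees(angles_rad))
```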
<p style="height:12pt"> </p>
<center> <b> Useful mathematical constants </b> </center>

| Constants | Description |
| :-------- | :---------- |
| <span style="font-family: Lucida Console, Courier, monospace; font-weight: bold"> np.e </span> | $e$ |
| <span style="font-family: Lucida Console, Courier, monospace; font-weight: bold"> np.pi </span> | $\pi$ |



<h4 style="border:1px; border-style:solid; border-color:black; padding: 0.5em;"> <span style="color:black"> Array-to-array math </span> </h4>

So far, we have only considered operations between a single array and a scalar, but often it is necessary to perform mathematical operations on multiple arrays. Much like NumPy handles single array operations, array-to-array math in NumPy uses ufuncs to perform element-wise calculations. For arrays of the same dimensions, this is straightforward:

```python
x  = np.array([[1.,2.,3.,4.],
               [5.,6.,7.,8.]])

y = np.array([[9.,87.,3.,5.6],
              [-1.,4.,7.1,8.]])

# Addition
x + y
>>> array([[10. , 89. ,  6. ,  9.6],
           [ 4. , 10. , 14.1, 16. ]])

# Division
x / y
>>> array([[ 0.11111111,  0.02298851,  1.        ,  0.71428571],
           [-5.        ,  1.5       ,  0.98591549,  1.        ]])
```

For arrays whose dimensions do not match, NumPy does something called **broadcasting**. NumPy compares the shapes of the two arrays dimension by dimension, starting from the last one; two dimensions are compatible if they are equal or if one of them is 1. When the shapes are compatible, the smaller array is "broadcast" to the dimensions of the larger array: its row or column is replicated to match the larger array. This is best illustrated in the following diagram:

<img src="./assets/broadcasting.png" alt="broadcasting" width="600"/>


```python
a = np.array([[1.,2.,3.,4.],
             [5.,6.,7.,8.]])

b = np.array([10,11,12,13])

c = np.array([[1.],
             [20.]])

# Row-wise
a + b
>>> array([[11., 13., 15., 17.],
           [15., 17., 19., 21.]])

# Column-wise
a + c
>>> array([[ 2.,  3.,  4.,  5.],
           [25., 26., 27., 28.]])

# Multiple operations
a + c**2
>>> array([[  2.,   3.,   4.,   5.],
           [405., 406., 407., 408.]])

```

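One way to check whether two arrays will broadcast is to look at the shape of the result. The sketch below uses placeholder arrays of ones (`grid`, `row`, `col`, and `bad` are names invented for this example) and also shows what happens when the shapes are incompatible.

```python
import numpy as np

grid = np.ones((2, 4))   # shape (2, 4)
row = np.ones(4)         # shape (4,), treated as (1, 4)
col = np.ones((2, 1))    # shape (2, 1)
bad = np.ones(3)         # shape (3,), does not match the 4 columns of grid

print((grid + row).shape)   # (2, 4)
print((grid + col).shape)   # (2, 4)
print((row + col).shape)    # (2, 4): both arrays are stretched

# Incompatible shapes raise a ValueError instead of broadcasting
try:
    grid + bad
except ValueError as err:
    print("Cannot broadcast:", err)
```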
<div class="practice">
    📚  <b> Practice 10. </b>
    <ol class="alpha">
        <li> Raise array <code>b</code> to the power of array <code>c</code>. </li>
        <li> Create a new 5x10 array of random values. Subtract the mean of each row from every value. </li>
</ol>
</div>

In [195]:
# PROBLEM A
a = np.array([[1.,2.,3.,4.],
             [5.,6.,7.,8.]])

b = np.array([10,11,12,13])

c = np.array([[1.],
             [20.]])

print(a)
print(b)
print(c)

b**c

# PROBLEM B

a = np.random.rand(10, 5)
row_means = np.mean(a, axis = 1)

print(row_means)
print(a)

b = np.array([row_means])

print(b)

a - b.T

[[1. 2. 3. 4.]
 [5. 6. 7. 8.]]
[10 11 12 13]
[[ 1.]
 [20.]]
[0.44407127 0.44825247 0.64716993 0.6730969  0.67077252 0.64927212
 0.45133655 0.63568864 0.33194963 0.45533181]
[[0.23238015 0.37375745 0.60436299 0.73367685 0.2761789 ]
 [0.40985973 0.11096443 0.75178775 0.30932012 0.6593303 ]
 [0.55156097 0.70981371 0.34147648 0.82815894 0.80483956]
 [0.22663068 0.89339116 0.80905956 0.94700919 0.48939393]
 [0.6002245  0.47505653 0.84699423 0.96298569 0.46860164]
 [0.92890483 0.82673753 0.43625691 0.08477902 0.9696823 ]
 [0.98159961 0.31379942 0.56707769 0.05211598 0.34209004]
 [0.5307306  0.68986791 0.16380159 0.97140771 0.82263542]
 [0.04084004 0.88037681 0.38016168 0.20288776 0.15548184]
 [0.22864806 0.48404621 0.32472822 0.68150042 0.55773615]]
[[0.44407127 0.44825247 0.64716993 0.6730969  0.67077252 0.64927212
  0.45133655 0.63568864 0.33194963 0.45533181]]


array([[-0.21169112, -0.07031381,  0.16029172,  0.28960558, -0.16789237],
       [-0.03839274, -0.33728804,  0.30353529, -0.13893234,  0.21107783],
       [-0.09560896,  0.06264378, -0.30569345,  0.18098901,  0.15766963],
       [-0.44646623,  0.22029426,  0.13596266,  0.27391229, -0.18370298],
       [-0.07054802, -0.19571598,  0.17622171,  0.29221317, -0.20217088],
       [ 0.27963271,  0.17746541, -0.21301521, -0.5644931 ,  0.32041018],
       [ 0.53026306, -0.13753713,  0.11574115, -0.39922057, -0.10924651],
       [-0.10495805,  0.05417927, -0.47188705,  0.33571906,  0.18694677],
       [-0.29110958,  0.54842719,  0.04821205, -0.12906187, -0.17646778],
       [-0.22668376,  0.0287144 , -0.13060359,  0.22616861,  0.10240434]])

In [183]:
b = np.ones([4,5])
a = np.array([[1, 2, 3, 4]])
print(a.T)

b + a.T

[[1]
 [2]
 [3]
 [4]]


array([[2., 2., 2., 2., 2.],
       [3., 3., 3., 3., 3.],
       [4., 4., 4., 4., 4.],
       [5., 5., 5., 5., 5.]])

### Missing Data
<hr style="border-top: 0.2px solid gray; margin-top: 12px; margin-bottom: 1px"></hr>

Most real-world datasets – environmental or otherwise – have data gaps. Data can be missing for any number of reasons, including observations not being recorded or data corruption. While a cell corresponding to a data gap may just be left blank in a spreadsheet, when imported into Python, there must be some way to handle "blank" or missing values. 

Missing data should not be replaced with zeros, as 0 can be a valid value for many datasets (e.g. temperature, precipitation, etc.). Instead, the convention is to fill all missing data with the constant **NaN**. NaN stands for "Not a Number" and is implemented in NumPy as `np.nan`.

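Two properties of `np.nan` are worth seeing in code: it is an ordinary Python `float`, and it never compares equal to anything, including itself. The sketch below therefore uses `np.isnan()` to locate missing values; the `obs` array is a made-up set of observations.

```python
import numpy as np

print(type(np.nan))       # <class 'float'>
print(np.nan == np.nan)   # False, so == cannot be used to find NaNs

# Hypothetical observations with one gap
obs = np.array([12.5, np.nan, 0.0, 7.1])

# Boolean mask that is True wherever a value is missing
missing = np.isnan(obs)
print(missing)            # [False  True False False]

# Keep only the valid observations (the genuine 0.0 is preserved)
valid = obs[~missing]
print(valid)
```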
NaNs are handled differently by different packages. In NumPy, all computations involving NaN values will return `nan`:

```python
data = np.array([[2., 2.7, 1.89],
                 [1.1, 0.0, np.nan],
                 [3.2, 0.74, 2.1]])

data.mean()
>>> nan
```

In this case, we'd want to use the alternative `np.nanmean()` function, which ignores NaNs:

```python
np.nanmean(data)
>>> 1.71625
```

NumPy has several other functions – including `np.nanmin()`, `np.nanmax()`, `np.nansum()` – that are analogous to the regular aggregation functions (`np.min()`, `np.max()`, `np.sum()`), but allow for computation on arrays containing NaN values.

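As a brief sketch, the NaN-aware functions can be applied to a 3x3 array with one missing value (the same values as the `data` example in this section, redefined here so the snippet runs on its own):

```python
import numpy as np

data = np.array([[2.0, 2.7, 1.89],
                 [1.1, 0.0, np.nan],
                 [3.2, 0.74, 2.1]])

# Count the missing values
n_missing = np.isnan(data).sum()
print(n_missing)          # 1

# NaN-aware aggregations skip the missing value
print(np.nanmin(data))    # 0.0
print(np.nanmax(data))    # 3.2
print(np.nansum(data))    # approximately 13.73

# The axis argument works as usual, e.g. one mean per column
col_means = np.nanmean(data, axis=0)
print(col_means)
```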


<hr style="border-top: 0.2px solid gray; margin-top: 12pt; margin-bottom: 0pt"></hr>

### Wrapping up

The topics covered in this exercise are but a small window into the wide world of NumPy, but by now you should be familiar with the basic objects and operations in the NumPy library, which are the building blocks of data science in Python. As always – especially now that we've begun exploring third-party packages – refer to the **[NumPy docs](https://numpy.org/doc/1.18/reference/index.html)** for comprehensive information on all functions, methods, routines, etc. and to check out more of NumPy's capabilities. 

Next, we'll explore one of data scientists' favorite libraries: 🐼.


<hr style="border-top: 1px solid gray; margin-top: 24px; margin-bottom: 1px"></hr>

In [91]:
# IGNORE THIS CELL
from IPython.core.display import HTML
def css_styling():
    styles = open("./styles/exercises.css", "r").read()
    return HTML(styles)
css_styling()