# Introduction to NumPy

This notebook provides an introduction to the `numpy` package. The content borrows heavily from the book *Python Data Science Handbook*, which was written by Jake VanderPlas and is available at https://jakevdp.github.io/PythonDataScienceHandbook/ (accessed 12/17/2019).

The following table of contents lists the topics discussed in this notebook. Clicking on any topic will advance the notebook to the associated area.


# Table of Contents
<a id="Table_of_Contents"> </a>

1. [NumPy](#numpy)<br>
    1.1 [Importing NumPy](#Importing_NumPy)<br>
    1.2 [NumPy Arrays](#NumPy_Arrays)<br>
    1.3 [Basic Computations with NumPy Arrays](#Array_Computation)<br>
    1.4 [Aggregations](#Array_Aggregations)<br>
    1.5 [Broadcasting](#Array_Broadcasting)<br>
    1.6 [Indexing NumPy Arrays](#Array_Indexing)<br>
    1.7 [Boolean Logic with NumPy Arrays](#Array_Boolean_Logic)<br>
    
#### Disclaimer

This notebook is by no means a comprehensive resource for the `numpy` package. Also, it is important to realize that the Python language and the available packages will continue to evolve. As a result, the objects, functions, and methods described in this notebook may one day change. If changes occur, areas of this notebook that use deprecated features may cease to work and will need to be revised or omitted.

## NumPy
<a id="numpy"> </a>

This section of the notebook will introduce the `numpy` package and demonstrate several features of the `numpy` multi-dimensional array. 

From https://en.wikipedia.org/wiki/NumPy (accessed on 1/6/2018):

>NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. The ancestor of NumPy, Numeric, was originally created by Jim Hugunin with contributions from several other developers. In 2005, Travis Oliphant created NumPy by incorporating features of the competing Numarray into Numeric, with extensive modifications. NumPy is open-source software and has many contributors.

[Back to Table of Contents](#Table_of_Contents)<br>

### Importing NumPy
<a id="Importing_NumPy"> </a>

Users can import available packages and modules using Python's `import` statement. Two forms of import expressions are commonly used.
1. The first common import expression takes the form **import mypackage as mp**. This statement imports a package named *mypackage*, and assigns it to the alias *mp*. Suppose that *mypackage* contains the definition for a function named *myfunction*. If this were true, we would call *myfunction* using the syntax `mp.myfunction(*args)`, where `*args` is a placeholder for any function arguments.<br>

2. The second common import expression takes the form **from mypackage import mysubmodule**. This statement imports a specific submodule from a package named *mypackage* and binds it to the name *mysubmodule* in the importing code. Since there is no alias, the imported name is used exactly as it is written. For example, if the submodule *mysubmodule* includes a function called *myfunction*, we would call it using the syntax `mysubmodule.myfunction(*args)`. The same form can also import a function directly, e.g., **from mypackage.mysubmodule import myfunction**, in which case we would call *myfunction* using the syntax `myfunction(*args)`, where `*args` is a placeholder for any function arguments.

<div class="alert alert-block alert-danger">
    <b>Name conflicts with <i>from - import</i> approach:</b> When using the <i>from - import</i> approach specified in item 2, it is important to make sure that the names you import do not conflict with names defined in the importing code. For example, if we import a function named <i>myfunction</i> directly from a submodule, but we also define a function named <i>myfunction</i> in the importing code, there will be a naming conflict: whichever definition is executed last silently replaces the other.
</div>
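
As a minimal sketch of the two import forms and the name conflict described above, the following block uses the standard library `math` module in place of the hypothetical *mypackage*. The locally defined `sqrt` function and the variable `conflict_result` are illustrative assumptions, not part of any package.

```python
# Form 1: import a module under an alias and call functions through the alias
import math as m

print(m.sqrt(16.0))

# Form 2: import a single name directly into the current namespace
from math import sqrt

print(sqrt(16.0))  # no prefix is needed


# Name conflict: defining a local function with the same name
# silently replaces the imported one
def sqrt(x):
    return "local sqrt called"


conflict_result = sqrt(16.0)
print(conflict_result)
```

After the local definition, the name `sqrt` no longer refers to `math.sqrt`, while `m.sqrt` is unaffected because it is reached through the alias.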

The following code block uses the *import - as* approach to import NumPy. The alias *np* is a standard convention.

[Back to Table of Contents](#Table_of_Contents)<br>

In [1]:
import numpy as np

#### Motivation for NumPy

Before we look at specific details of the NumPy package, it is important to understand its motivation. NumPy was developed to support scientific computations via the efficient implementation of a multi-dimensional array. In addition to an efficient array implementation, NumPy also includes functions for performing operations on NumPy arrays that are optimized for computational efficiency. The following code block illustrates the substantial increase in efficiency that NumPy provides in comparison to a standard Python list. Specifically, the example considers the task of adding two vectors of a specified size using both standard Python lists and NumPy arrays. The time required for the addition and the size of the resulting objects are reported for comparison purposes.


<div class="alert alert-block alert-info">
    <b>The <i>del</i> statement:</b> Although it is often written like a function call, <i>del</i> is a built-in Python statement that deletes a name. For example, <i>del(my_var)</i> deletes the Python variable named <i>my_var</i>; the memory used by the underlying object can be reclaimed once no other names refer to it. The parentheses are optional, so <i>del my_var</i> is equivalent. The <i>del</i> statement can take multiple targets. For example, <i>del(my_var1, my_var2)</i> deletes the Python variables named <i>my_var1</i> and <i>my_var2</i>. If you pass a name to <i>del</i> that does not correspond to an existing Python variable, a <i>NameError</i> will be raised.
</div>
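
The behavior of `del` can be checked with a small sketch. The variable names below are illustrative only. The example assumes the usual Python semantics in which `del` removes a name, and the object itself is released only once no other name refers to it.

```python
my_var1 = [1, 2, 3]
my_var2 = my_var1      # a second name bound to the same list

del my_var1            # removes the name my_var1 only
print(my_var2)         # the list is still reachable through my_var2

try:
    del my_var1        # my_var1 no longer exists, so this fails
except NameError as err:
    deletion_error = err
    print("NameError:", err)
```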

[Back to Table of Contents](#Table_of_Contents)<br>

In [2]:
import time
import sys

SIZE = 100000

list1 = list(range(SIZE))
list2 = list(range(SIZE))

start = time.time()
result = [(x+y) for x,y in zip(list1,list2)]
print("Using Python lists, the addition took",(time.time() - start)*1000,"milliseconds.")
print("The size of the result object based on Python lists is",sys.getsizeof(result),"bytes.\n")

del(list1, list2, result)

nparray1 = np.arange(SIZE)
nparray2 = np.arange(SIZE)
start = time.time()
result = nparray1 + nparray2
print("Using NumPy arrays, the addition took",(time.time() - start)*1000,"milliseconds.")
print("The size of the result object based on NumPy arrays is",sys.getsizeof(result),"bytes.\n")

del(nparray1, nparray2, result)

Using Python lists, the addition took 8.997678756713867 milliseconds.
The size of the result object based on Python lists is 824464 bytes.

Using NumPy arrays, the addition took 0.0 milliseconds.
The size of the result object based on NumPy arrays is 400096 bytes.



In addition to demonstrating the substantial performance gains offered by NumPy, the previous code block also illustrates some of the subtle differences between working with Python lists and NumPy arrays, as well as a method for measuring the time required to execute a portion of code. Specifically:

- The `time.time()` function, from the `time` module, returns the current system time in seconds. Saving the current time in a variable `start` and then computing the difference `time.time() - start` gives the number of seconds that elapsed between the two calls to `time.time()`. Multiplying by 1000 converts the elapsed time to milliseconds. The resolution of `time.time()` is limited on some systems, which is likely why the very fast NumPy addition is reported as taking 0.0 milliseconds.


- The built-in `range()` function produces the integers starting at zero and stopping one short of the argument passed to `range()`. In our example, we pass the variable `SIZE`, so the sequence is 0, 1, ..., `SIZE`-2, `SIZE`-1. In Python 3, `range()` returns a lazy `range` object rather than a list; passing it to `list()` produces an actual list.


- When working with NumPy arrays, the `np.arange()` function returns an array of integers starting at zero and stopping one short of the argument passed to `np.arange()`. In our example, we pass the variable `SIZE` to the `np.arange()` function. Thus, the sequence stored in the NumPy array is 0, 1, ..., `SIZE`-2, `SIZE`-1.


- The `sys.getsizeof()` function, from the `sys` module, returns the size of an object in bytes. For a Python list, this value covers only the list container itself and not the integer objects it references, so the true memory footprint of the list is larger than the value shown.


- When working with Python lists, the `zip()` function pairs up the corresponding elements of two or more iterables (here, the two lists), which allows element-wise operations to be performed inside a loop or list comprehension.


- When working with NumPy arrays, there is no need to *zip* arrays. Instead, element-wise operations are performed using standard mathematical operators.
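
The points in the list above can be tied together in a short sketch that checks that both approaches compute the same values. It uses `time.perf_counter()` instead of `time.time()`, which is an assumption made here because `perf_counter()` offers a higher-resolution clock; the variable names are illustrative.

```python
import time

import numpy as np

SIZE = 100_000

# Element-wise addition with Python lists requires pairing elements explicitly
list1 = list(range(SIZE))
list2 = list(range(SIZE))
start = time.perf_counter()
list_result = [x + y for x, y in zip(list1, list2)]
list_ms = (time.perf_counter() - start) * 1000

# Element-wise addition with NumPy arrays uses the + operator directly
nparray1 = np.arange(SIZE)
nparray2 = np.arange(SIZE)
start = time.perf_counter()
array_result = nparray1 + nparray2
array_ms = (time.perf_counter() - start) * 1000

print(f"Lists: {list_ms:.3f} ms, NumPy arrays: {array_ms:.3f} ms")

# Both approaches compute the same values
print(np.array_equal(array_result, np.array(list_result)))
```

The exact timings depend on the machine, so no specific values are shown here.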

[Back to Table of Contents](#Table_of_Contents)<br>

<div class="alert alert-block alert-info">
    <b>Jupyter's <i>timeit</i> magic command:</b> Another approach for timing operations that is <b>specific to Jupyter notebooks</b> is the <i>timeit</i> magic command. This command can be used with syntax that follows the form <b>%timeit [-n &lt;N&gt; -r &lt;R&gt; [-t|-c] -q -p &lt;P&gt; -o]</b>, where

<li> -n &lt;N&gt;: specifies to execute the given statement &lt;N&gt; times in a loop. If &lt;N&gt; is not provided, &lt;N&gt; is determined so as to get sufficient accuracy.</li>

<li> -r &lt;R&gt;: specifies the number of repeats &lt;R&gt;, each consisting of &lt;N&gt; loops. The statistics reported in the output (the mean and standard deviation shown later in this section) are computed across these repeats. Default: 7</li>

<li> -t: specifies to use time.time to measure the time, which is the default on Unix. This function measures wall time, i.e., elapsed real time.</li>

<li> -c: specifies to use time.clock to measure the time, which is the default on Windows and measures wall time. On Unix, resource.getrusage is used instead and returns the CPU user time.</li>

<li> -p &lt;P&gt;: specifies to use a precision of &lt;P&gt; digits to display the timing result. Default: 3</li>

<li> -q: specifies quiet calculation, where no results are printed.</li>
</div>

The following block performs the timing check using the `timeit` magic command, with 5 repeats of 10 executions each. Note that by performing the calculations multiple times, the `timeit` magic is able to provide estimates of the variability in computation time.

[Back to Table of Contents](#Table_of_Contents)<br>

In [3]:
SIZE = 100000

list1 = list(range(SIZE))
list2 = list(range(SIZE))

print("Time statistics for Python lists:")
%timeit -n 10 -r 5 [(x+y) for x,y in zip(list1,list2)]

del(list1,list2)

nparray1 = np.arange(SIZE)
nparray2 = np.arange(SIZE)

print("\nTime statistics for NumPy arrays:")
%timeit -n 10 -r 5 nparray1 + nparray2

del(nparray1, nparray2)

Time statistics for Python lists:
7.89 ms ± 117 µs per loop (mean ± std. dev. of 5 runs, 10 loops each)

Time statistics for NumPy arrays:
52 µs ± 4.66 µs per loop (mean ± std. dev. of 5 runs, 10 loops each)
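
Outside of a Jupyter notebook, a similar measurement can be sketched with the standard library `timeit` module. The block below is one possible arrangement, not the notebook's own method: it mirrors the 5 repeats of 10 executions used above and reports the best run, and the variable names are illustrative.

```python
import timeit

import numpy as np

SIZE = 100_000
list1 = list(range(SIZE))
list2 = list(range(SIZE))
nparray1 = np.arange(SIZE)
nparray2 = np.arange(SIZE)

# repeat=5 runs, each executing the statement number=10 times;
# every entry is the total time in seconds for one run of 10 executions
list_times = timeit.repeat(
    lambda: [x + y for x, y in zip(list1, list2)], number=10, repeat=5
)
array_times = timeit.repeat(lambda: nparray1 + nparray2, number=10, repeat=5)

# Convert the best run to milliseconds per execution
print(f"Python lists: {min(list_times) / 10 * 1000:.3f} ms per loop")
print(f"NumPy arrays: {min(array_times) / 10 * 1000:.3f} ms per loop")
```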


#### Searching for Relevant NumPy Methods

A final point worth noting is that the NumPy package includes a function for searching for other functions that may be used to accomplish a desired task. This is extremely helpful given the large size of the package. As an example, suppose that we are interested in functions that work with the *standard deviation* of a set of numbers. The following code block shows how NumPy's `lookfor()` function allows us to search for such functions. Note that `lookfor()` is available in NumPy 1.x releases but was removed in NumPy 2.0, where searching the online documentation is the suggested alternative.

[Back to Table of Contents](#Table_of_Contents)<br>

In [4]:
np.lookfor('Standard deviation')

Search results for 'standard deviation'
---------------------------------------
numpy.std
    Compute the standard deviation along the specified axis.
numpy.nanstd
    Compute the standard deviation along the specified axis, while
numpy.ma.std
    Returns the standard deviation of the array elements along given axis.
numpy.matrix.std
    Return the standard deviation of the array elements along the given axis.
numpy.chararray.std
    Returns the standard deviation of the array elements along given axis.
numpy.ma.MaskedArray.std
    Returns the standard deviation of the array elements along given axis.
numpy.var
    Compute the variance along the specified axis.
numpy.nanvar
    Compute the variance along the specified axis, while ignoring NaNs.
numpy.ma.var
    Compute the variance along the specified axis.
numpy.histogram_bin_edges
    Function to calculate only the edges of the bins used by the `histogram` function.
numpy.ma.MaskedArray.var
    Compute the variance along the specifie

### NumPy Arrays
<a id="NumPy_Arrays"> </a>

The key data structure used by NumPy is the multi-dimensional array, or `ndarray`. The following code blocks demonstrate several methods for creating NumPy arrays.

<div class="alert alert-block alert-info">
    <b>NumPy's <i>random</i> module:</b> The final two examples utilize random number generators from NumPy's <i>random</i> module. The methods available in this module can be accessed via the prefix <i>np.random.</i> (this assumes NumPy was imported as np).
</div>

[Back to Table of Contents](#Table_of_Contents)<br>

In [5]:
# Creating a one-dimensional (1-d) array with a list
np.array([1,2,3,4])

array([1, 2, 3, 4])

In [6]:
# Creating a two-dimensional (2-d) array with a list
my_list = [[1,2,3,4],
           [5,6,7,8]]

np.array(my_list)

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

In [7]:
# Creating a two-dimensional (2-d) array of size 4 x 5 (rows X columns)
# that is filled with zeros
np.zeros((4,5))

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

In [8]:
# Creating a two-dimensional 5 x 5 identity matrix
np.eye(5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

In [9]:
# Creating a two-dimensional 5 x 3 matrix filled with
# values randomly drawn from a continuous uniform distribution
# that ranges from 0.0 to 100.0
np.random.uniform(low = 0.0, high = 100.0, size = (5, 3))

array([[50.75550682,  2.11933018, 43.35217582],
       [44.6313056 , 23.88199894, 83.02457318],
       [74.47641766, 58.6479001 , 49.28678526],
       [48.73558798, 26.67406959, 60.50111014],
       [75.3543721 , 27.0584228 , 52.23032831]])

In [10]:
# Creating a two-dimensional 3 x 3 matrix filled with
# values randomly drawn from a normal distribution with
# a mean of 10.0 and standard deviation of 2.0 
np.random.normal(loc = 10.0, scale = 2.0, size = (3,3))

array([[10.57716264,  8.91484097, 10.34320476],
       [11.96563566,  9.9493022 ,  9.4248951 ],
       [11.84888573,  9.8775074 ,  8.53793331]])
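
Because the last two examples draw random values, their output changes on every run. As a hedged sketch of how reproducible draws can be obtained, the block below uses NumPy's `np.random.default_rng` generator interface (available in NumPy 1.17 and later) rather than the `np.random.uniform` and `np.random.normal` functions used above. The seed value 42 and the variable names are arbitrary choices for illustration.

```python
import numpy as np

# Two generators created with the same seed produce the same stream of values
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

sample_a = rng_a.uniform(low=0.0, high=100.0, size=(5, 3))
sample_b = rng_b.uniform(low=0.0, high=100.0, size=(5, 3))

print(sample_a.shape)
print(np.array_equal(sample_a, sample_b))

# The same generator also provides normally distributed draws
normal_sample = rng_a.normal(loc=10.0, scale=2.0, size=(3, 3))
print(normal_sample.shape)
```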

The following code blocks demonstrate several attributes of NumPy arrays that we can use to gain information regarding their structure.

[Back to Table of Contents](#Table_of_Contents)<br>

In [11]:
# Create a two-dimensional 3 x 5 matrix filled with
# values randomly drawn from a normal distribution with
# a mean of 10.0 and standard deviation of 2.0 and store
# the array in a variable named my_array

my_array = np.random.normal(loc = 10.0, scale = 2.0, size = (3, 5))

# Use ndim attribute to get number of dimensions
print('my_array has', my_array.ndim, 'dimensions.\n')

# Use shape attribute to get shape of the array
print('The shape of my_array is', my_array.shape,'.\n')

# Use size attribute to get the size of the array
# (i.e., the number of entries)
print('my_array has', my_array.size,'entries.\n')

# Use dtype attribute to determine the data type for array elements
print('The data type for entries of my_array is', my_array.dtype,'.\n')

# Use itemsize attribute to determine size of each array element
print('The size of each element of my_array is', my_array.itemsize,'bytes.\n')

# Use nbytes attribute to get the size of the array data, which we
# expect to equal the product of the itemsize and size attribute values
print('The size of my_array is', my_array.nbytes,'bytes.\n')

my_array has 2 dimensions.

The shape of my_array is (3, 5) .

my_array has 15 entries.

The data type for entries of my_array is float64 .

The size of each element of my_array is 8 bytes.

The size of my_array is 120 bytes.
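
The relationship between `nbytes`, `itemsize`, and `size` noted in the code comments can be checked for several data types. The following sketch uses arrays of zeros with the same 3 x 5 shape; the choice of data types and the name `size_report` are illustrative assumptions.

```python
import numpy as np

size_report = {}
for dtype in (np.float64, np.float32, np.int8):
    arr = np.zeros((3, 5), dtype=dtype)
    # nbytes should equal the bytes per element times the number of elements
    size_report[arr.dtype.name] = (arr.itemsize, arr.size, arr.nbytes)
    print(arr.dtype, arr.itemsize, arr.size, arr.nbytes)
```

Choosing a smaller data type reduces the memory used by the array data in direct proportion to the item size.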



### Basic Computations with NumPy Arrays
<a id="Array_Computation"> </a>

NumPy is able to achieve fast computation times through the use of *vectorized* operations. Essentially, instead of performing operations on NumPy array elements one at a time, NumPy operations can be performed on entire arrays or slices of arrays. The following code block shows some examples.

[Back to Table of Contents](#Table_of_Contents)<br>

In [12]:
# Create two two-dimensional 3 x 5 arrays, the first with
# values randomly drawn from a uniform distribution with low
# and high values of 0.0 and 1.0, respectively, and the second
# with values drawn from a discrete uniform distribution ranging from
# 1 to 9 (the high value of 10 passed to randint is excluded).

array1 = np.random.uniform(low = 0.0, high = 1.0, size = (3, 5))
array2 = np.random.randint(low = 1, high = 10, size = (3, 5))

In [13]:
# Print array1
array1

array([[0.93864246, 0.74504455, 0.91073504, 0.23722471, 0.49496735],
       [0.80987834, 0.95456578, 0.63748325, 0.91084975, 0.69213675],
       [0.04294299, 0.8335869 , 0.36994852, 0.936557  , 0.48305288]])

In [14]:
# Print array2
array2

array([[5, 6, 7, 4, 9],
       [2, 8, 5, 2, 3],
       [5, 4, 4, 5, 3]])

In [15]:
# Add the arrays
array1 + array2

array([[5.93864246, 6.74504455, 7.91073504, 4.23722471, 9.49496735],
       [2.80987834, 8.95456578, 5.63748325, 2.91084975, 3.69213675],
       [5.04294299, 4.8335869 , 4.36994852, 5.936557  , 3.48305288]])

In [16]:
# Subtract array1 from array2
array2 - array1

array([[4.06135754, 5.25495545, 6.08926496, 3.76277529, 8.50503265],
       [1.19012166, 7.04543422, 4.36251675, 1.08915025, 2.30786325],
       [4.95705701, 3.1664131 , 3.63005148, 4.063443  , 2.51694712]])

In [17]:
# Compute the reciprocal of values in array2
1.0/array2

array([[0.2       , 0.16666667, 0.14285714, 0.25      , 0.11111111],
       [0.5       , 0.125     , 0.2       , 0.5       , 0.33333333],
       [0.2       , 0.25      , 0.25      , 0.2       , 0.33333333]])

In [18]:
# Add 1 to all elements of array1
array1 + 1.0

array([[1.93864246, 1.74504455, 1.91073504, 1.23722471, 1.49496735],
       [1.80987834, 1.95456578, 1.63748325, 1.91084975, 1.69213675],
       [1.04294299, 1.8335869 , 1.36994852, 1.936557  , 1.48305288]])

In [19]:
# Compute 20 times the square of array1, minus array2
20*(array1**2)-array2

array([[12.62099319,  5.10182754,  9.58876637, -2.87448875, -4.10014641],
       [11.11805847, 10.22391671,  3.127698  , 14.59294544,  6.58106565],
       [-4.96311799,  9.89734242, -1.26276191, 12.54278032,  1.6668016 ]])

In [20]:
# Compute the absolute value of 20*(array1**2)-array2
abs(20*(array1**2)-array2)

array([[12.62099319,  5.10182754,  9.58876637,  2.87448875,  4.10014641],
       [11.11805847, 10.22391671,  3.127698  , 14.59294544,  6.58106565],
       [ 4.96311799,  9.89734242,  1.26276191, 12.54278032,  1.6668016 ]])

In [21]:
del(array1, array2)
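
To connect the vectorized expressions above with the element-by-element view, the following sketch computes the same expression twice: once with a vectorized operation and once with explicit loops. The small arrays are fixed values chosen for illustration rather than the random arrays used above.

```python
import numpy as np

array1 = np.array([[0.5, 1.0, 1.5],
                   [2.0, 2.5, 3.0]])
array2 = np.array([[1, 2, 3],
                   [4, 5, 6]])

# Vectorized: one expression operates on every element
vectorized = 20 * (array1 ** 2) - array2

# Equivalent explicit loops over rows and columns
looped = np.empty(array1.shape)
for i in range(array1.shape[0]):
    for j in range(array1.shape[1]):
        looped[i, j] = 20 * (array1[i, j] ** 2) - array2[i, j]

print(vectorized)
print(np.allclose(vectorized, looped))
```

Both versions give the same values; the vectorized form is shorter and lets NumPy perform the loop in compiled code.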

### Aggregations
<a id="Array_Aggregations"> </a>

In addition to performing computations on each element of an array, NumPy also allows users to perform aggregations across all elements and to perform computations along a particular array *axis*. Before continuing, it is important to understand exactly what is meant by an *array axis*.

In NumPy, each dimension of an array is called an *axis*, and axes are numbered starting from 0. For example, a 2-dimensional array has two axes: the first running vertically downwards across rows (axis 0), and the second running horizontally across columns (axis 1). Another way to think about this is that if we have an $m \times n$ array, where $m$ is the number of rows and $n$ is the number of columns, the first axis, axis 0, runs across the $m$ rows and the second axis, axis 1, runs across the $n$ columns.

Using this concept of array axes, we can do computations across rows and columns of arrays. The following code blocks provide examples.

[Back to Table of Contents](#Table_of_Contents)<br>

In [22]:
# Create a two-dimensional 3 x 5 array with values drawn
# from a discrete uniform distribution ranging from 1 to 9
# (the high value of 10 passed to randint is excluded).

my_array1 = np.random.randint(low = 1, high = 10, size = (3, 5))
my_array1

array([[7, 1, 6, 5, 7],
       [7, 7, 4, 9, 9],
       [9, 6, 8, 1, 8]])

In [23]:
# Sum all array values
np.sum(my_array1)

94

In [24]:
# Sum the values in each row
# Note that we need to sum the values across
# each column, or axis 1
np.sum(my_array1, axis = 1)

array([26, 36, 32])

In [25]:
# Sum the values in each column
# Note that we need to sum the values across
# each row, or axis 0
np.sum(my_array1, axis = 0)

array([23, 14, 18, 15, 24])

In [26]:
# Find the maximum value in each row
np.max(my_array1, axis = 1)

array([7, 9, 9])

In [27]:
# Find the index of the maximum value in each row
# Note that indexing starts at 0!
np.argmax(my_array1, axis = 1)

array([0, 3, 0], dtype=int64)
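
One way to keep the axis convention straight is to look at the shape of the result: aggregating along an axis removes that axis. The sketch below uses a small fixed array (an illustrative assumption) so the values can be checked by hand.

```python
import numpy as np

small_array = np.array([[1, 2, 3],
                        [4, 5, 6]])   # shape (2, 3)

# Summing along axis 0 collapses the rows, leaving one value per column
col_sums = np.sum(small_array, axis=0)
# Summing along axis 1 collapses the columns, leaving one value per row
row_sums = np.sum(small_array, axis=1)

print(col_sums, col_sums.shape)
print(row_sums, row_sums.shape)

# keepdims=True keeps the collapsed axis with length 1
row_sums_2d = np.sum(small_array, axis=1, keepdims=True)
print(row_sums_2d.shape)
```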

NumPy provides many other aggregation functions, but we won't discuss them in detail here. Additionally, most aggregates have a ``NaN``-safe counterpart that computes the result while ignoring missing values, which are marked by the special IEEE floating-point ``NaN`` value. 

The following table provides a list of useful aggregation functions available in NumPy:

|Function Name      |   NaN-safe Version  | Description                                   |
|-------------------|---------------------|-----------------------------------------------|
| ``np.sum``        | ``np.nansum``       | Compute sum of elements                       |
| ``np.prod``       | ``np.nanprod``      | Compute product of elements                   |
| ``np.mean``       | ``np.nanmean``      | Compute mean of elements                      |
| ``np.std``        | ``np.nanstd``       | Compute standard deviation                    |
| ``np.var``        | ``np.nanvar``       | Compute variance                              |
| ``np.min``        | ``np.nanmin``       | Find minimum value                            |
| ``np.max``        | ``np.nanmax``       | Find maximum value                            |
| ``np.argmin``     | ``np.nanargmin``    | Find index of minimum value                   |
| ``np.argmax``     | ``np.nanargmax``    | Find index of maximum value                   |
| ``np.median``     | ``np.nanmedian``    | Compute median of elements                    |
| ``np.percentile`` | ``np.nanpercentile``| Compute rank-based statistics of elements     |
| ``np.any``        | N/A                 | Evaluate whether any elements are true        |
| ``np.all``        | N/A                 | Evaluate whether all elements are true        |
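
The difference between a standard aggregate and its NaN-safe counterpart can be seen with a small sketch. The array values below are made up for illustration.

```python
import numpy as np

values = np.array([1.0, np.nan, 3.0, 4.0])

# The standard aggregate propagates the missing value
plain_sum = np.sum(values)

# The NaN-safe versions ignore the missing value
safe_sum = np.nansum(values)
safe_mean = np.nanmean(values)
safe_argmax = np.nanargmax(values)

print(plain_sum)
print(safe_sum, safe_mean, safe_argmax)
```

Here `np.nanmean` averages only the three valid entries, and `np.nanargmax` still reports the index within the original array.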

[Back to Table of Contents](#Table_of_Contents)<br>

### Broadcasting
<a id="Array_Broadcasting"> </a>

Earlier, when we looked at basic NumPy array computations, we considered arrays that were all the same size. However, there are cases where we may wish to perform operations involving arrays that are not the same size. In such cases, *array broadcasting* is helpful. Consider the following example.

[Back to Table of Contents](#Table_of_Contents)<br>

In [28]:
my_array= np.array([5, 6, 7])

print(f'my_array is {my_array}')
print(f'my_array + 1 is {my_array + 1}')

my_array is [5 6 7]
my_array + 1 is [6 7 8]


In [29]:
my_array.shape

(3,)

When we look at the previous example, it is very clear that a value of 1 is added to each element of `my_array`. However, `my_array` is an array of length 3, whereas the value 1 is a single number. Thus, NumPy behaves as if the value 1 were stretched (broadcast) into the array `[1, 1, 1]` and then added to the `my_array` object; no such array is actually created in memory.

Broadcasting in NumPy follows a strict set of rules to determine the interaction between two arrays:

- Rule 1: If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading (left) side.
- Rule 2: If the shape of the two arrays does not match in any dimension, the array with shape equal to 1 in that dimension is stretched to match the other shape.
- Rule 3: If in any dimension the sizes disagree and neither is equal to 1, an error is raised.
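
The three rules above can be sketched as a small pure-Python function that works on shape tuples. The function name `broadcast_shape` and the variable names are hypothetical and written only for illustration; this is not how NumPy implements broadcasting internally.

```python
import numpy as np


def broadcast_shape(shape_a, shape_b):
    """Apply the three broadcasting rules to two shape tuples."""
    ndim = max(len(shape_a), len(shape_b))
    # Rule 1: pad the shorter shape with ones on its left side
    padded_a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    padded_b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    for size_a, size_b in zip(padded_a, padded_b):
        if size_a == size_b or size_b == 1:
            # Sizes agree, or Rule 2 stretches the second shape
            result.append(size_a)
        elif size_a == 1:
            # Rule 2: stretch the first shape to match the second
            result.append(size_b)
        else:
            # Rule 3: sizes disagree and neither is 1
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
    return tuple(result)


compatible = broadcast_shape((3,), (3, 3))
print(compatible)
# The shape NumPy itself produces for the same pair of shapes
print((np.ones(3) + np.ones((3, 3))).shape)

try:
    broadcast_shape((3,), (3, 2))
    error_raised = False
except ValueError as err:
    error_raised = True
    print("ValueError:", err)
```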

The following examples aim to make these rules clear.

[Back to Table of Contents](#Table_of_Contents)<br>

#### Broadcasting example 1

Let's consider adding a 2D array and a 1D array.

[Back to Table of Contents](#Table_of_Contents)<br>

In [30]:
my_1D_array = np.array([1, 2, 3])
print(f'my_1D_array is {my_1D_array}\n')

my_2D_array = np.eye(3)
print(f'my_2D_array is\n {my_2D_array}')

my_1D_array is [1 2 3]

my_2D_array is
 [[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]


The shape of `my_1D_array` is (3,) and the shape of `my_2D_array` is (3, 3). Thus, according to the first broadcasting rule, we need to pad the shape of the array with fewer dimensions (`my_1D_array`) on the left with ones. This padding applies to the array's shape, so `my_1D_array` will be treated as having shape (1, 3).

Next, we look at rule 2, and note that the shape of the two arrays does not match in the first dimension. Specifically, `my_1D_array` is (1, 3) and `my_2D_array` is (3, 3). Thus, we stretch the first dimension of `my_1D_array` to get a (3, 3) array.

We can now add the arrays together.

[Back to Table of Contents](#Table_of_Contents)<br>

In [31]:
my_1D_array + my_2D_array

array([[2., 2., 3.],
       [1., 3., 3.],
       [1., 2., 4.]])

In [32]:
del(my_1D_array, my_2D_array)

#### Broadcasting example 2

We will now slightly modify the previous example to illustrate a situation that will result in an error.

[Back to Table of Contents](#Table_of_Contents)<br>

In [33]:
my_1D_array = np.array([1, 2, 3])
print(f'my_1D_array is {my_1D_array}\n')

my_2D_array = np.array([[1,1],
                        [2,2],
                        [3,3]])
print(f'my_2D_array is\n {my_2D_array}')

my_1D_array is [1 2 3]

my_2D_array is
 [[1 1]
 [2 2]
 [3 3]]


The shape of `my_1D_array` is (3,) and the shape of `my_2D_array` is (3, 2). Thus, according to the first broadcasting rule, we need to pad the shape of the array with fewer dimensions (`my_1D_array`) on the left with ones. This padding applies to the array's shape, so `my_1D_array` will be treated as having shape (1, 3).

Next, we look at rule 2, and note that the shape of the two arrays does not match in the first dimension. Specifically, `my_1D_array` is (1, 3) and `my_2D_array` is (3, 2). Thus, we stretch the first dimension of `my_1D_array` to get a (3, 3) array.

Now, we reach rule 3 with arrays of shape (3, 3) (`my_1D_array`) and (3, 2) (`my_2D_array`). Since the sizes disagree in the second dimension (3 versus 2) and neither is equal to 1, an error will be raised.

[Back to Table of Contents](#Table_of_Contents)<br>

In [34]:
my_1D_array + my_2D_array

ValueError: operands could not be broadcast together with shapes (3,) (3,2) 

In [35]:
del(my_1D_array, my_2D_array)
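
If the intent in example 2 were to add each element of `my_1D_array` to the corresponding row of `my_2D_array`, one possible fix is to give the 1D array an explicit second dimension of length 1 so that rule 2 can stretch it. The sketch below assumes that intent; the names `column` and `total` are illustrative.

```python
import numpy as np

my_1D_array = np.array([1, 2, 3])
my_2D_array = np.array([[1, 1],
                        [2, 2],
                        [3, 3]])

# np.newaxis adds a trailing dimension of length 1: shape (3,) becomes (3, 1)
column = my_1D_array[:, np.newaxis]
print(column.shape)

# Shapes (3, 1) and (3, 2) are compatible: the second dimension is stretched
total = column + my_2D_array
print(total)
```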

### Indexing NumPy Arrays
<a id="Array_Indexing"> </a>

This subsection looks at how to access elements of a NumPy array, which is very similar to how we access elements of lists. The following code block shows basic methods that can be used to access elements or portions of a NumPy array. It is important to keep in mind that Python and NumPy both use 0 as the first index.

[Back to Table of Contents](#Table_of_Contents)<br>

In [36]:
# Create a 3 x 4 array using values sampled from a
# standard normal distribution (i.e., mean = 0 and standard
# deviation = 1)
my_array = np.random.normal(size = (3,4))

print(f'my_array is\n {my_array}\n')

print(f'The first element of the second row of my_array is {my_array[1,0]}\n')

print(f'The first row of my_array is {my_array[0,:]}\n')

print(f'The first column of my_array is {my_array[:,0]}\n')

my_array is
 [[-1.8879036   0.61746113 -1.26026979 -1.21270503]
 [ 0.81937935  1.83003338  0.06174604 -0.64577448]
 [-0.02463203  0.06691692 -1.03999184 -0.2560261 ]]

The first element of the second row of my_array is 0.8193793515272458

The first row of my_array is [-1.8879036   0.61746113 -1.26026979 -1.21270503]

The first column of my_array is [-1.8879036   0.81937935 -0.02463203]
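
A few further indexing patterns can be sketched with a deterministic array so that the results are easy to verify. The array `grid` and the other names below are illustrative. The example also shows that a basic slice is a view of the original data, whereas `copy()` produces independent data.

```python
import numpy as np

# A 3 x 4 array holding 0, 1, ..., 11 row by row
grid = np.arange(12).reshape((3, 4))

# Negative indices count from the end
print(grid[-1, -1])

# Slices select sub-blocks: rows 0-1 and columns 1-2
print(grid[:2, 1:3])

# A basic slice is a view, so changing it changes the original array
first_row = grid[0, :]
first_row[0] = 100
print(grid[0, 0])

# An explicit copy is independent of the original array
independent = grid[1, :].copy()
independent[0] = -1
print(grid[1, 0])
```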



### Boolean Logic with NumPy Arrays
<a id="Array_Boolean_Logic"> </a>

We can use conditions with NumPy arrays to return boolean arrays specifying which elements meet the condition. This feature will come in handy often when working with NumPy and Pandas. In this section, we demonstrate how this functionality can be used. The following code block generates a NumPy array that we will use for this demonstration.

<div class="alert alert-block alert-info">
    <b>NumPy's <i>reshape()</i> method:</b> NumPy's <i>reshape()</i> method can be used to change the shape of an array without changing its data. The only requirement is that the original shape and the new shape include the same number of elements, e.g., you can reshape a (12,) array to shape (3, 4) but you cannot reshape a (12,) array to shape (3, 3).
</div>
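
The requirement on the number of elements can be checked with a short sketch. The array `flat` and the other names are illustrative.

```python
import numpy as np

flat = np.arange(12)              # shape (12,)

# 12 elements fit exactly into a 3 x 4 array
reshaped = flat.reshape((3, 4))
print(reshaped.shape)

# -1 asks NumPy to infer the remaining dimension from the element count
inferred = flat.reshape((3, -1))
print(inferred.shape)

# 12 elements cannot fill a 3 x 3 array, so a ValueError is raised
try:
    flat.reshape((3, 3))
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("ValueError:", err)
```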

[Back to Table of Contents](#Table_of_Contents)<br>

In [37]:
# Create a 3 x 4 array using values sampled from a
# standard normal distribution (i.e., mean = 0 and standard
# deviation = 1)
my_array = np.random.normal(size = 12).reshape((3, 4))

print(f'my_array is\n {my_array}')

my_array is
 [[-0.23238899 -1.00755283 -0.96900851  0.3094744 ]
 [-0.11024096  1.60897676 -0.75604294  1.02829799]
 [-0.02520225 -0.33355825 -0.3729897   0.17476908]]


Let's use a condition to create a boolean array specifying values less than zero.

[Back to Table of Contents](#Table_of_Contents)<br>

In [38]:
my_bool_array = my_array < 0

print(f'my_bool_array is\n {my_bool_array}')

my_bool_array is
 [[ True  True  True False]
 [ True False  True False]
 [ True  True  True False]]


We can now use the boolean array to assign different values to the elements that are less than zero.

[Back to Table of Contents](#Table_of_Contents)<br>

In [39]:
my_array[my_bool_array] = 0

print(f'my_array is\n {my_array}')

my_array is
 [[0.         0.         0.         0.3094744 ]
 [0.         1.60897676 0.         1.02829799]
 [0.         0.         0.         0.17476908]]
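
Boolean arrays can also be used to count and select elements, and conditions can be combined. The following sketch uses a small fixed array for illustration. It also shows `np.where` as one alternative that returns a new array instead of modifying the original in place.

```python
import numpy as np

data = np.array([[-1.5, 0.2, 3.0],
                 [0.7, -0.3, 1.1]])

# Count and select the elements that meet a condition
negative = data < 0
print(np.sum(negative))
print(data[negative])

# Combine conditions with & (and) or | (or); the parentheses are required
in_range = (data > 0) & (data < 1)
print(data[in_range])

# np.where builds a new array: 0.0 where the condition holds, data elsewhere
clipped = np.where(data < 0, 0.0, data)
print(clipped)
```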


This concludes the NumPy introduction.