# Lesson 1: NumPy Part 1

This notebook is based on the official `NumPy` [documentation](https://docs.scipy.org/doc/numpy/user/quickstart.html).  Unless otherwise credited, quoted text comes from this document.  The NumPy documentation describes NumPy in the following way:

> NumPy is the fundamental package for scientific computing with Python. It contains among other things:
> - a powerful N-dimensional array object
> - sophisticated (broadcasting) functions
> - tools for integrating C/C++ and Fortran code
> - useful linear algebra, Fourier transform, and random number capabilities
>
> Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.


## Instructions
This tutorial provides step-by-step training divided into numbered sections. The sections often contain embedded executable code for demonstration.  This tutorial is accompanied by a practice notebook: [L01-Numpy_Part1-Practice.ipynb](./L01-Numpy_Part1-Practice.ipynb). 

Throughout this tutorial, sections labeled as "Tasks" are interspersed and indicated with the icon: ![Task](http://icons.iconarchive.com/icons/sbstnblnd/plateau/16/Apps-gnome-info-icon.png). You should follow the instructions provided in these sections by performing them in the practice notebook.  When the tutorial is completed you can turn in the final practice notebook. 

---
## 1. Getting Started
First, we must import the NumPy library.  All packages are imported at the top of the notebook. Execute the code in the following cell to get started with this notebook (press Ctrl+Enter in the cell below).

In [2]:
# Import numpy
import numpy as np

The code above imports numpy under the name `np`. We can use this name to access the functionality of NumPy.  This is the import we will use for the rest of this class.

You may be wondering why we didn't import numpy like this:  
```python
import numpy
```
We could, but the first form is far more commonly seen, and it allows us to use the short `np` name to access the functions and variables of the NumPy package. This makes the code more readable because it is always clear where the functions we are using come from.

### Task 1a: Setup
<span style="float:right; margin-left:10px; clear:both;">![Task](http://icons.iconarchive.com/icons/sbstnblnd/plateau/96/Apps-gnome-info-icon.png)
</span>

In the practice notebook, import the following packages:
+ `numpy` as `np`

## 2. The NumPy Array

What is an array?  An array is a data structure that stores one or more objects of the same type (e.g. integers, strings, etc.) and can be multi-dimensional (e.g. 2D matrices). In Python, the list data type provides some of this functionality; however, it lacks important operations that make arrays useful for scientific computing.  NumPy is a Python package that fills this gap: it defines N-dimensional arrays and provides support for linear algebra and other functions useful to scientific computing.

From the Numpy QuickStart Tutorial: 
> NumPy’s main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of positive integers. In NumPy dimensions are called axes. 

_Note: a "tuple" is an ordered sequence of values that cannot be changed after it is created. For example, the pair of numbers surrounded by parentheses, (2,4), is a tuple containing two numbers._

NumPy arrays can be visualized in the following way:

<img src="http://community.datacamp.com.s3.amazonaws.com/community/production/ckeditor_assets/pictures/332/content_arrays-axes.png">

(image source: https://www.datacamp.com/community/tutorials/python-numpy-tutorial)

Using built-in Python lists, arrays are created in the following way:

```python
# A 1-dimensional list of numbers.
my_array = [1,2,3]  

# A 2-dimensional list of numbers.
my_2d_array = [[1,2,3],[4,5,6]]

# A 3-dimensional list of numbers.
my_3d_array = [[[1,2,3], [4,5,6]], [[7,8,9], [10,11,12]]]

# Two lists of boolean values
a = [True, True, False, False]
b = [False, False, True, True]

```

Using NumPy, arrays are created using the `np.array()` function. For example, arrays analogous to the lists above are created in the following way:

```python
# A 1-dimensional array of numbers.
my_array = np.array([1,2,3,4])

# A 2-dimensional array of numbers.
my_2d_array = np.array([[1,2,3,4], [5,6,7,8]])

# A 3-dimensional array of numbers.
my_3d_array = np.array([[[1,2,3,4], [5,6,7,8]], [[1,2,3,4], [9,10,11,12]]])

# Two arrays of boolean values
a = np.array([True,True,False,False])
b = np.array([False,False,True,True])
```

In NumPy, these arrays are objects of type `ndarray`.  You can learn more about the `ndarray` class on the [NumPy ndarray introduction page](https://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html). However, this tutorial will walk you through some of the most important attributes, functions and uses of NumPy.
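
As a small illustration of why lists fall short for numerical work, the sketch below multiplies a list and an array by 2. The variable names `py_list` and `np_arr` are made up for this example.

```python
import numpy as np

# Illustrative names; not used elsewhere in this tutorial.
py_list = [1, 2, 3]
np_arr = np.array([1, 2, 3])

# Multiplying a list by an integer repeats the list.
list_result = py_list * 2
print(list_result)

# Multiplying an array by an integer multiplies each element.
arr_result = np_arr * 2
print(arr_result)

# np.array() returns an object of type numpy.ndarray.
print(type(np_arr))
```

The list becomes twice as long, while the array keeps its length and each value is doubled.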

### Task 2a: Creating Arrays

<span style="float:right; margin-left:10px; clear:both;">![Task](http://icons.iconarchive.com/icons/sbstnblnd/plateau/96/Apps-gnome-info-icon.png)
</span>

In the practice notebook, perform the following.  
- Create a 1-dimensional numpy array and print it.
- Create a 2-dimensional numpy array and print it.
- Create a 3-dimensional numpy array and print it.

## 3. Accessing Array Attributes
For this section we will retrieve information about the arrays. Once an array is created you can access information about the array such as the number of dimensions, its shape, its size, the data type that it stores, and the number of bytes it is consuming. There are a variety of attributes you can use such as:
+ `ndim`
+ `shape`
+ `size`
+ `dtype`
+ `itemsize`
+ `data`
+ `nbytes`

For example, to get the number of dimensions for an array:
```Python
# Print the number of dimensions for the array:
print(my_3d_array.ndim)
```

If you need help understanding these attributes, you can learn more about them, and others, from the [NumPy ndarray reference page](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html).

Notice that we use dot notation to access these attributes, yet we do not provide the parentheses `()` like we would for a function call.  This is because we are accessing attributes (i.e. member variables) of the array object; we are not calling a function.
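
The following is a minimal sketch of reading several attributes from one array. The array `attr_demo` is hypothetical, and its `dtype` is set explicitly so that the byte counts do not depend on the platform.

```python
import numpy as np

# A 2 x 3 array of 64-bit floats (illustrative data).
attr_demo = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float64)

print(attr_demo.ndim)      # number of dimensions (axes): 2
print(attr_demo.shape)     # size along each dimension: (2, 3)
print(attr_demo.size)      # total number of elements: 6
print(attr_demo.dtype)     # type of each element: float64
print(attr_demo.itemsize)  # bytes per element: 8
print(attr_demo.nbytes)    # total bytes (size * itemsize): 48
```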

### Task 3a: Accessing Array Attributes

<span style="float:right; margin-left:10px; clear:both;">![Task](http://icons.iconarchive.com/icons/sbstnblnd/plateau/96/Apps-gnome-info-icon.png)
</span>

In the practice notebook, perform the following.

- Create a NumPy array.
- Write code that prints these attributes (one per line): `ndim`, `shape`, `size`, `dtype`, `itemsize`, `data`, `nbytes`.
- Add a comment line before each line describing what value the attribute holds. 


## 4. Creating Initialized Arrays

Here we will learn to create initialized arrays. These arrays are pre-initialized with default values.  NumPy provides a variety of easy-to-use functions for creating and initializing arrays. Some of these include: 

+ [np.ones()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ones.html#numpy.ones): Returns a new array of given shape and type, filled with ones.
+ [np.zeros()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.zeros.html#numpy.zeros): Returns a new array of given shape and type, filled with zeros.
+ [np.empty()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.empty.html#numpy.empty): Returns a new array of given shape and type, without initializing entries (the contents are arbitrary).
+ [np.full()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.full.html#numpy.full): Returns a new array of given shape and type, filled with a given fill value.
+ [np.arange()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.arange.html#numpy.arange): Returns a new array of evenly spaced values within a given interval.
+ [np.linspace()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html#numpy.linspace): Returns a new array of evenly spaced numbers over a specified interval.
+ [np.random.random()](https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.random.random.html): Returns a single random value or an array of random values in the half-open interval [0.0, 1.0).

Take a moment to learn more about the functions listed above by clicking on each function name, which links to the NumPy documentation.  Pay attention to the arguments that each receives and the type of output (i.e. array) it generates.

NumPy has a large list of array creation functions. You can learn more about these functions on the [array creation routines page](https://docs.scipy.org/doc/numpy/reference/routines.array-creation.html) of the NumPy documentation. 

To demonstrate the use of these functions, the following code will create a two-dimensional array with 3 rows and 4 columns (i.e. 3 *x* 4) filled with 0's.  

```Python
zeros = np.zeros((3, 4))
```

The following creates a 1D array of integers from 3 up to, but not including, 7:

```Python
np.arange(3, 7)
```
The result is: `array([3, 4, 5, 6])`

The following creates a 1D array of values from 0 up to, but not including, 10 in steps of 2:

```Python
np.arange(0, 10, 2)
```
The result is: `array([0, 2, 4, 6, 8])`

Notice that, just like with Python list slicing, the range includes values up to, but not including, the "stop" value.
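
A few of the other creation functions can be sketched the same way. The shapes and fill values below are arbitrary choices for illustration. Note that, unlike `np.arange()`, `np.linspace()` includes the stop value by default.

```python
import numpy as np

# 2 x 3 array filled with 1.0
ones = np.ones((2, 3))

# 2 x 2 array filled with the value 7
sevens = np.full((2, 2), 7)

# 5 evenly spaced values from 0 to 1, endpoint included:
# 0, 0.25, 0.5, 0.75, 1
points = np.linspace(0, 1, 5)

# 2 x 2 array of random values in [0.0, 1.0); values differ on every run
rand = np.random.random((2, 2))

print(ones)
print(sevens)
print(points)
print(rand)
```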


### Task 4a: Initializing Arrays

<span style="float:right; margin-left:10px; clear:both;">![Task](http://icons.iconarchive.com/icons/sbstnblnd/plateau/96/Apps-gnome-info-icon.png)
</span>

In the practice notebook, perform the following.

+ Create an initialized array by using these functions:  `ones`, `zeros`, `empty`, `full`, `arange`, `linspace` and `random.random`. Be sure to follow each array creation with a call to `print()` to display your newly created arrays. 
+ Add a comment above each function call describing what is being done.  

## 5. Performing Math and Broadcasting

At times you may want to apply mathematical operations between arrays. For example, suppose you wanted to add, multiply or divide the contents of two arrays.  If the two arrays have the same shape this is straightforward. However, if the arrays do not have the same shape then it is more challenging.  This is where broadcasting comes into play:

> The term broadcasting describes how numpy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes. (https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html)


### 5.1 Arrays of the same size
To demonstrate math with arrays of the same size, the following cell contains code that creates two arrays of the exact same size: _3 x 4_.  Execute the cell to create those arrays:

In [3]:
# Define demo arrays:
demo_a = np.ones((3,4))
demo_b = np.random.random((3,4))

# Print the shapes of each array.
print(f"demo_a shape: {demo_a.shape}")
print(f"demo_b Shape: {demo_b.shape}")

demo_a shape: (3, 4)
demo_b Shape: (3, 4)


Let's print the arrays to see what they contain:

In [4]:
print(demo_a)
print(demo_b)

[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]
[[0.60408097 0.03982343 0.46110385 0.07311238]
 [0.35346515 0.80237855 0.67899183 0.18806865]
 [0.15782033 0.63441383 0.37100691 0.79133124]]


Because these arrays are the same size we can perform basic math by using common arithmetic symbols. Execute the following cell to see the results of adding the two demo arrays:

In [5]:
# These arrays have the same shape, so they are added element by element.
demo_a + demo_b

array([[1.60408097, 1.03982343, 1.46110385, 1.07311238],
       [1.35346515, 1.80237855, 1.67899183, 1.18806865],
       [1.15782033, 1.63441383, 1.37100691, 1.79133124]])

The addition resulted in the corresponding positions in each matrix being added to the other and creating a new matrix.  If you need clarification for how two matrices can be added or subtracted see the [Purple Math](https://www.purplemath.com/modules/mtrxadd.htm) site for examples.
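
The other arithmetic symbols behave the same way. The sketch below uses two small hypothetical arrays, `m` and `n`, with fixed values so the results are easy to check by hand. Note that `*` multiplies matching positions; it is not a matrix product.

```python
import numpy as np

# Illustrative arrays of the same shape (2 x 2).
m = np.array([[1.0, 2.0], [3.0, 4.0]])
n = np.array([[10.0, 20.0], [30.0, 40.0]])

# Each operation is applied position by position.
diff = n - m   # 9, 18, 27, 36
prod = m * n   # 10, 40, 90, 160
quot = n / m   # 10 in every position

print(diff)
print(prod)
print(quot)
```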

### 5.2 Broadcasting for Arrays of Different Sizes
When arrays are not the same shape, element-by-element math is not directly defined.  For these cases, NumPy provides a mechanism known as "broadcasting". When broadcasting, NumPy treats the smaller array as if it were stretched to match the shape of the larger one, repeating its values along the stretched dimensions.

To broadcast, NumPy begins at the right-most dimensions of the arrays and compares them, then moves left and compares the next pair. If one array has fewer dimensions than the other, its missing leading dimensions are treated as size 1. As long as each pair meets one of the following criteria, broadcasting can be performed:

+  The dimensions are equal or
+  One of the dimensions is 1.

Consider two arrays of the following dimensions:

+ 4D array 1:  10 x 1 x 3 x 1
+ 3D array 2:       2 x 1 x 9

These arrays are not the same size, but they are compatible with broadcasting because at each dimension (from right to left) the dimension criteria are met. When performing math, the value in each dimension of size 1 is broadcast to fill that dimension (an example is provided below). The resulting array, if the above arrays are added, will have a shape of _10 x 2 x 3 x 9_.
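
The shape rule can be checked directly. This is a minimal sketch using arrays of ones with the two shapes listed above; the second half uses two hypothetical arrays whose right-most dimensions (4 and 3) are neither equal nor 1, so the addition fails.

```python
import numpy as np

# Arrays with the shapes from the example above.
big = np.ones((10, 1, 3, 1))
small = np.ones((2, 1, 9))

total = big + small
print(total.shape)   # (10, 2, 3, 9)

# Incompatible shapes: the right-most dimensions are 4 and 3.
bad_left = np.ones((3, 4))
bad_right = np.ones((3,))
try:
    bad_left + bad_right
    broadcast_failed = False
except ValueError as err:
    broadcast_failed = True
    print(f"ValueError: {err}")
```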

To demonstrate math with arrays of different size, the following cell contains code that creates two arrays: one of size _3 x 4_ and another of shape `(4,)`, a 1-dimensional array with 4 elements that broadcasting treats as _1 x 4_.  Execute the cell to create those arrays:

In [6]:
# Create the arrays.
demo_c = np.ones((3,4))
demo_d = np.arange(4)

# Print the array shapes.
print(f"demo_c shape: {demo_c.shape}")
print(f"demo_d Shape: {demo_d.shape}")

demo_c shape: (3, 4)
demo_d Shape: (4,)


Let's print the arrays to see what they contain:

In [7]:
print(demo_c)
print(demo_d)

[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]
[0 1 2 3]


Because these arrays meet our broadcasting requirements, we can perform basic math by using common arithmetic symbols. Execute the following cell to see the results of adding the two demo arrays:

In [8]:
demo_c + demo_d

array([[1., 2., 3., 4.],
       [1., 2., 3., 4.],
       [1., 2., 3., 4.]])

The addition resulted in the single row of values in `demo_d` being "broadcast" or "stretched" across each of the three rows of `demo_c` and then used in the operation. 

### 5.3 Broadcasting With Higher Dimensions

Consider the following arrays of 2 and 3 dimensions. 

In [9]:
demo_e = np.ones((3, 4))
demo_f = np.random.random((5, 1, 4))
print(f"demo_e shape: {demo_e.shape}")
print(f"demo_f shape: {demo_f.shape}")

demo_e shape: (3, 4)
demo_f shape: (5, 1, 4)


Print the arrays to see what they contain:

In [10]:
print(demo_e)
print(demo_f)

[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]
[[[0.17374832 0.1429455  0.45663313 0.49749361]]

 [[0.74360706 0.26550668 0.32312428 0.82141586]]

 [[0.24271685 0.75143477 0.34317865 0.72745169]]

 [[0.01960713 0.68453376 0.65404253 0.84864024]]

 [[0.69510494 0.65463217 0.48525065 0.81904048]]]


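These two arrays meet the rules for broadcasting because they both have a 4 in their last dimension and there is a 1 in the 2nd dimension of `demo_f`, which is paired with the 3 of `demo_e`. `demo_e` has no third dimension, so it is treated as having size 1 there.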

Perform the math by executing the following cell:

In [11]:
result = demo_e + demo_f
print(result)

[[[1.17374832 1.1429455  1.45663313 1.49749361]
  [1.17374832 1.1429455  1.45663313 1.49749361]
  [1.17374832 1.1429455  1.45663313 1.49749361]]

 [[1.74360706 1.26550668 1.32312428 1.82141586]
  [1.74360706 1.26550668 1.32312428 1.82141586]
  [1.74360706 1.26550668 1.32312428 1.82141586]]

 [[1.24271685 1.75143477 1.34317865 1.72745169]
  [1.24271685 1.75143477 1.34317865 1.72745169]
  [1.24271685 1.75143477 1.34317865 1.72745169]]

 [[1.01960713 1.68453376 1.65404253 1.84864024]
  [1.01960713 1.68453376 1.65404253 1.84864024]
  [1.01960713 1.68453376 1.65404253 1.84864024]]

 [[1.69510494 1.65463217 1.48525065 1.81904048]
  [1.69510494 1.65463217 1.48525065 1.81904048]
  [1.69510494 1.65463217 1.48525065 1.81904048]]]


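The resulting array has dimensions of _5 x 3 x 4_.  For this math to work, the values from `demo_f` had to be "stretched" (i.e. repeated and then added) along the second dimension, and the values from `demo_e` had to be repeated for each of the 5 entries in the first dimension.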
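
To make the "stretching" visible, the sketch below uses `np.broadcast_to()` to expand both operands to the final shape before adding, and compares that with the ordinary addition. The arrays `e` and `f` are stand-ins with the same shapes as `demo_e` and `demo_f`, filled with fixed values so the result is repeatable. `np.broadcast_to()` returns a read-only view, so no data is actually copied.

```python
import numpy as np

# Stand-ins with the same shapes as demo_e (3, 4) and demo_f (5, 1, 4).
e = np.ones((3, 4))
f = np.arange(20, dtype=float).reshape(5, 1, 4)

# Explicitly stretch both arrays to the result shape (5, 3, 4).
e_stretched = np.broadcast_to(e, (5, 3, 4))
f_stretched = np.broadcast_to(f, (5, 3, 4))

explicit = e_stretched + f_stretched
implicit = e + f   # NumPy broadcasts automatically

print(implicit.shape)
print(np.array_equal(explicit, implicit))
```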

### Task 5a: Broadcasting Arrays

<span style="float:right; margin-left:10px; clear:both;">![Task](http://icons.iconarchive.com/icons/sbstnblnd/plateau/96/Apps-gnome-info-icon.png)
</span>

In the practice notebook, perform the following.

+ Create two arrays of differing sizes but compatible with broadcasting.
+ Perform addition, multiplication and subtraction.
+ Create two additional arrays of differing size that do not meet the rules for broadcasting and try a mathematical operation.  

## 6. NumPy Aggregate Functions
NumPy also provides a variety of functions that "aggregate" data. Examples of aggregation of data include calculating the sum of every element in the array, calculating the mean, standard deviation, etc.  Below are a few examples of aggregation functions provided by NumPy.

**Mathematics Functions**:
+ [np.sum()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.sum.html): sums the array elements over a given axis
+ [np.minimum()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.minimum.html#numpy.minimum): compares two arrays and returns a new array of the minimum at each position (i.e. element-wise)
+ [np.maximum()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.maximum.html#numpy.maximum): compares two arrays and returns a new array of the maximum at each position (i.e. element-wise).
+ [np.cumsum()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.cumsum.html#numpy.cumsum): returns the cumulative sum of the elements along a given axis.

You can find more about mathematical functions for arrays at the [Numpy mathematical functions page](https://docs.scipy.org/doc/numpy/reference/routines.math.html).

**Statistics**:
+ [np.mean()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.mean.html): compute the arithmetic mean along the specified axis.
+ [np.median()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.median.html#numpy.median): compute the median along the specified axis.
+ [np.corrcoef()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.corrcoef.html#numpy.corrcoef): return Pearson product-moment correlation coefficients between two 1D arrays or one 2D array.
+ [np.std()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.std.html#numpy.std): compute the standard deviation along the specified axis.
+ [np.var()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.var.html#numpy.var): compute the variance along the specified axis.

You can find more about statistical functions for arrays at the [Numpy statistical functions page](https://docs.scipy.org/doc/numpy/reference/routines.statistics.html).


Take a moment to learn more about the functions listed above by clicking on each function name, which links to the NumPy documentation.  Pay attention to the arguments that each receives and the type of output it generates.

For example:
```Python
# Calculate the sum of our demo data from above
np.sum(demo_e)
```
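
Many of these functions accept an `axis` argument. The following sketch uses a small hypothetical 2 x 3 array named `data` to show the difference between aggregating over the whole array and aggregating along one axis.

```python
import numpy as np

# Illustrative 2 x 3 array.
data = np.array([[1, 2, 3],
                 [4, 5, 6]])

total = np.sum(data)               # all elements: 21
col_sums = np.sum(data, axis=0)    # down each column: 5, 7, 9
row_sums = np.sum(data, axis=1)    # across each row: 6, 15
average = np.mean(data)            # 3.5
running = np.cumsum(data)          # flattened running total: 1, 3, 6, 10, 15, 21

# np.minimum compares two arrays position by position.
smaller = np.minimum(data[0], [2, 2, 2])   # 1, 2, 2

print(total, col_sums, row_sums, average)
print(running)
print(smaller)
```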


### Task 6a: Math/Stats Aggregate Functions

<span style="float:right; margin-left:10px; clear:both;">![Task](http://icons.iconarchive.com/icons/sbstnblnd/plateau/96/Apps-gnome-info-icon.png)
</span>

In the practice notebook, perform the following.

+ Create three to five arrays
+ Experiment with each of the aggregation functions: `sum`, `minimum`, `maximum`, `cumsum`, `mean`, `corrcoef`, `std`, `var`. 
+ For each function call, add a comment line above it that describes what it does.  

### 6.1 Logical Aggregate Functions
When arrays contain boolean values there are additional logical aggregation functions you can use: 

 + [logical_and()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.logical_and.html#numpy.logical_and): computes the element-wise truth value of two arrays using AND.
 + [logical_or()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.logical_or.html#numpy.logical_or): computes the element-wise truth value of two arrays using OR.
 + [logical_not()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.logical_not.html#numpy.logical_not):  computes the element-wise truth value of NOT for a single array (each value is inverted).
 
 
You can find more about logical functions for arrays at the [Numpy Logic functions page](https://docs.scipy.org/doc/numpy/reference/routines.logic.html).

Take a moment to learn more about the functions listed above by clicking on each function name, which links to the NumPy documentation.  Pay attention to the arguments that each receives and the type of output it generates.

To demonstrate usage of the logical functions, please execute the following cells and examine the results produced.

In [12]:
# Two lists of boolean values
a = [True, True, False, False]
b = [False, False, True, True]

# Perform a logical "or":
np.logical_or(a, b)

array([ True,  True,  True,  True])

In [None]:
# Perform a logical "and":
np.logical_and(a, b)
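
For a side-by-side comparison of the three logical functions, here is a short sketch with two hypothetical boolean arrays, `p` and `q`, chosen so that every combination of `True` and `False` appears once.

```python
import numpy as np

# Illustrative inputs covering all four True/False combinations.
p = np.array([True, True, False, False])
q = np.array([True, False, True, False])

both = np.logical_and(p, q)    # True only where p and q are both True
either = np.logical_or(p, q)   # True where at least one of p, q is True
not_p = np.logical_not(p)      # takes a single array and inverts each value

print(both)
print(either)
print(not_p)
```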

### Task 6b: Logical Aggregate Functions

<span style="float:right; margin-left:10px; clear:both;">![Task](http://icons.iconarchive.com/icons/sbstnblnd/plateau/96/Apps-gnome-info-icon.png)
</span>

In the practice notebook, perform the following.

+ Create two arrays containing boolean values.
+ Experiment with each of the aggregation functions: `logical_and`, `logical_or`, `logical_not`. 
+ For each function call, add a comment line above it that describes what it does.  

In [None]:
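# Lists are mutable: an element can be reassigned in place.
x = [1,2,3]
print(x)
x[2]=4
print(x)

# Tuples are immutable: assigning to an element raises a TypeError.
a = (1,2,3)
print(a)
try:
    a[2] = 4
except TypeError as err:
    print(f"TypeError: {err}")
print(a)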