# Introduction to Numpy

Numpy (which is short for Numerical Python) is the fundamental package for scientific computing with Python.  With it you can:
- store and work with data in multi-dimensional arrays
- calculate statistical properties of large sets of data

To use Numpy, you must ALWAYS first import the Numpy package at the beginning of your Python script:

In [1]:
import numpy as np

Notice there is no output generated.  This is good because that means that the package imported successfully.  
<br><br>
Also, I imported Numpy as *np*.  That means whenever I want to call a builtin Numpy function, I call it from *np* instead of having to type out the entire word *numpy*.

Let's create a 1-Dimensional Numpy array (i.e., an array of RANK 1) from a list.  To do this, we call the **np.array( )** function and pass in a list of the items that we want to convert to an array.  This list must be enclosed in square brackets [  ].

In [4]:
import numpy as np
myArray = np.array([1,2,3])
print(myArray)

[1 2 3]


Once a list (or list of lists) is defined as a Numpy array, there are many powerful functions and features that become available for working with the list data.
<br><br>
**Functions and features for:**
- Creating pre-filled multi-dimensional arrays
- Generating arrays of random numbers
- Slicing and filtering
- Calculating statistical properties
- Sorting and finding unique array elements

And much, much more!

# Creating Pre-filled Multi-dimensional Arrays

Let's create a Numpy array of RANK 2 by calling **np.array( )** and passing in a list of 2 lists.  **NOTE:** Since Numpy has already been imported previously in this notebook, I will not need to import it again to use it.

In [5]:
myArray2 = np.array([[1,2,3],[4,5,6]])
print(myArray2)

[[1 2 3]
 [4 5 6]]


Pay close attention to the syntax.  Each list is enclosed in square brackets, [ ].  In addition, there have to be square brackets enclosing the entire list of lists.  Lastly, we are passing all of this inside the parentheses of the np.array( ) function.
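To check the RANK of an array, you can look at its *ndim* and *shape* attributes.  Here is a minimal sketch; the variable names *flat* and *grid* are made up for this illustration and are not used elsewhere in the notebook.

```python
import numpy as np

flat = np.array([1, 2, 3])                  # a list gives an array of RANK 1
grid = np.array([[1, 2, 3], [4, 5, 6]])     # a list of lists gives an array of RANK 2

# ndim is the number of dimensions; shape is the length of each dimension
print(flat.ndim, flat.shape)   # 1 (3,)
print(grid.ndim, grid.shape)   # 2 (2, 3)
```

The shape (2, 3) reads as 2 rows and 3 columns.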

Now let's create a Numpy array of RANK 2 containing all zeros.  Use the **np.zeros( )** function to do this.  Pass into this function a tuple representing the desired dimensions of your array.  Here our tuple is (2, 2) since we want a 2 x 2 array.

In [7]:
myArray3 = np.zeros((2,2))
print(myArray3)

[[ 0.  0.]
 [ 0.  0.]]
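Notice the decimal points in the output: np.zeros( ) fills the array with floats unless told otherwise.  The sketch below, which uses illustrative variable names of my own, shows a non-square shape and the optional *dtype* argument for getting integers instead.

```python
import numpy as np

zeros_float = np.zeros((2, 3))             # 2 rows, 3 columns, floats by default
zeros_int = np.zeros((2, 3), dtype=int)    # same shape, but integer zeros

print(zeros_float.shape)   # (2, 3)
print(zeros_float.dtype)   # float64
print(zeros_int)
```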


Similarly, we can create a Numpy array of RANK 2 pre-filled with all ones by using the **np.ones( )** function.  Here our tuple is (3, 3), so we get a 3 x 3 array.

In [8]:
myArray4 = np.ones((3,3))
print(myArray4)

[[ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]]
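The RANK of the array equals the number of entries in the tuple you pass in, not the size of those entries.  As a small sketch (the names *square* and *cube* are only for illustration):

```python
import numpy as np

square = np.ones((3, 3))      # two entries in the tuple, so RANK 2
cube = np.ones((2, 3, 4))     # three entries in the tuple, so RANK 3

print(square.ndim)   # 2
print(cube.ndim)     # 3
print(cube.shape)    # (2, 3, 4)
```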


We can even create a Numpy array of RANDOM floats between 0 and 1 (0 is a possible value, 1 is not).  This is very useful when you don't have any real data to work with.  Use the **np.random.random( )** function to do this.  Pass into this function a tuple of your desired dimensions.

In [15]:
myArray5 = np.random.random((3,3))
print(myArray5)

[[ 0.15131646  0.54774323  0.23830944]
 [ 0.76400822  0.42019182  0.52107382]
 [ 0.57287014  0.358928    0.22892549]]
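Your numbers will differ from mine each time you run the cell.  If you need the same random numbers on every run, one option is to set a seed first with np.random.seed( ).  The sketch below assumes an arbitrary seed value of 42 and a 2 x 4 shape chosen only for illustration.

```python
import numpy as np

np.random.seed(42)                    # fix the seed so the results repeat
first = np.random.random((2, 4))

np.random.seed(42)                    # same seed again
second = np.random.random((2, 4))

print(np.array_equal(first, second))             # True
print(first.min() >= 0.0, first.max() < 1.0)     # True True
```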


### Exercise 

In the next cell, randomly generate a Numpy array of size 3 x 2.  Assign this array to the variable, *ex1*.  Then print *ex1* to the screen.

# Numpy Array Slicing

We can access individual items or sub-regions of a Numpy array by slice indexing.  That is, if you want to pull out only the item in the 2nd ROW and 3rd COLUMN of myArray5 you would do the following:

In [16]:
myArray6 = myArray5[1,2]
print(myArray6)

0.521073822816


**NOTES:**
- Numpy arrays in Python begin row and column indexing at 0 NOT 1.  So the first ROW/COLUMN INDEX would be 0, the second ROW/COLUMN INDEX would be 1 and so on.  
- Since myArray5 is a multi-dimensional array, we pass in the desired indexes as a ROW INDEX and COLUMN INDEX enclosed in square brackets, [ ].  Here [1,2] means we want the item with the ROW INDEX of 1 and COLUMN INDEX of 2.  That is, the item in the 2nd row and 3rd column.
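Because the values in myArray5 are random, here is the same idea on a small array with known values.  The array *table* is a made-up example, so you can check each result by eye.

```python
import numpy as np

table = np.array([[10, 20, 30],
                  [40, 50, 60],
                  [70, 80, 90]])

item = table[1, 2]       # ROW INDEX 1, COLUMN INDEX 2: 2nd row, 3rd column
corner = table[0, 0]     # ROW INDEX 0, COLUMN INDEX 0: 1st row, 1st column

print(item)     # 60
print(corner)   # 10
```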

We can access a sub-region of a Numpy array by passing in a ROW RANGE and COLUMN RANGE.  If you want to, for example, pull out the sub-matrix made up of the 1st and 2nd rows and the 2nd and 3rd columns (that is, ROW INDEXES 0 and 1 and COLUMN INDEXES 1 and 2), you would do the following:

In [17]:
myArray7 = myArray5[0:2,1:3]
print(myArray7)

[[ 0.54774323  0.23830944]
 [ 0.42019182  0.52107382]]


**NOTES:**
- When specifying a ROW RANGE or COLUMN RANGE, use a colon.  
- Since myArray5 is a multi-dimensional array, we pass in the desired ranges as a ROW RANGE and a COLUMN RANGE enclosed in square brackets, [ ].
- The ending INDEX of a RANGE is NOT included.  That is, for the ROW RANGE 0:2, we begin with the ROW INDEX of 0 all the way up to BUT NOT INCLUDING the ROW INDEX of 2.  This gives us access to the rows with indexes 0 and 1.
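The sketch below repeats the range syntax on a small made-up array called *table*.  It also shows that you can leave out the start or end of a range: a bare colon means "everything" along that dimension.

```python
import numpy as np

table = np.array([[10, 20, 30],
                  [40, 50, 60],
                  [70, 80, 90]])

block = table[0:2, 1:3]      # ROW INDEXES 0 and 1, COLUMN INDEXES 1 and 2
first_column = table[:, 0]   # every row, COLUMN INDEX 0
last_rows = table[1:, :]     # ROW INDEX 1 to the end, every column

print(block)          # [[20 30]
                      #  [50 60]]
print(first_column)   # [10 40 70]
print(last_rows)      # [[40 50 60]
                      #  [70 80 90]]
```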

### Exercise

In the next cell, access the number 5 in myArray2.  Assign this value to the variable, *ex2*.  Then print *ex2* to the screen.

In the next cell, access the sub-region of myArray2 containing numbers 1,2,4 and 5. Assign this array to the variable, *ex3*.  Then print *ex3* to the screen.

# Numpy Array Filtering

We can access the items of a Numpy array that meet certain criteria by filtering.  Consider the following Numpy array:

In [19]:
myArray8 = np.array([[11,12,13,14],[21,22,23,24],[31,32,33,34]])
print(myArray8)

[[11 12 13 14]
 [21 22 23 24]
 [31 32 33 34]]


If we want to access all the items in this Numpy array that are GREATER THAN 20, we would do the following:

In [22]:
myArray8[myArray8 > 20]

array([21, 22, 23, 24, 31, 32, 33, 34])

**NOTES:**
- I could have assigned the filtered array to a variable and then printed it to the screen, but Jupyter notebooks automatically display the value of the last line of a cell.  So from here on out, I will choose not to use a print( ) function to display output to the screen.
- To filter a Numpy array, you first write the array name and then pass the filter conditions within square brackets, [ ].
- The result of my filter is a Numpy array of RANK 1.
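It can help to see what the condition produces on its own.  A comparison such as *values > 20* creates an array of True/False values with the same shape as the original, and indexing with it keeps only the items marked True.  Here is a sketch, where *values* and *mask* are names I made up for this illustration:

```python
import numpy as np

values = np.array([[11, 12, 13, 14],
                   [21, 22, 23, 24],
                   [31, 32, 33, 34]])

mask = values > 20        # an array of True/False, one entry per item
print(mask.shape)         # (3, 4)
print(mask.sum())         # 8, the number of True entries

filtered = values[mask]   # keep only the items where mask is True
print(filtered)           # [21 22 23 24 31 32 33 34]
print(filtered.ndim)      # 1
```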

You can also do filtering on multiple conditions.  For example, if you want to filter the items in this Numpy array that are GREATER THAN 20 AND LESS THAN 30, you would do the following:

In [23]:
myArray8[(myArray8 > 20) & (myArray8 < 30)]

array([21, 22, 23, 24])

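Notice I used the symbol, &.  On Numpy arrays, the ampersand performs an element-wise *AND*.  Similarly, the symbol | performs an element-wise *OR*.  Python's keywords *and* and *or* do not work for filtering arrays.  Each condition must also be wrapped in its own parentheses, because & and | are evaluated before comparisons such as > and <.  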
<br>
**For example**, if you want to filter the items in the Numpy array that are LESS THAN 15 OR GREATER THAN 30, you would do the following:

In [24]:
myArray8[(myArray8 < 15) | (myArray8 > 30)]

array([11, 12, 13, 14, 31, 32, 33, 34])
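A common mistake is to write the keyword *or* instead of the symbol |.  The sketch below, using a small made-up array called *values*, shows that the keyword version raises a ValueError, because Python tries to reduce each whole array to a single True or False.

```python
import numpy as np

values = np.array([11, 12, 21, 22, 31, 32])

# Element-wise OR with |, each condition in its own parentheses
picked = values[(values < 15) | (values > 30)]
print(picked)   # [11 12 31 32]

# The keyword `or` does not work on arrays with more than one item
try:
    values[(values < 15) or (values > 30)]
    raised = False
except ValueError:
    raised = True
print(raised)   # True
```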

A very nice list of Python operators can be found [here](https://www.tutorialspoint.com/python/python_basic_operators.htm).

### Exercise

In the next cell, use the Python operators list to filter myArray8 to include numbers GREATER THAN OR EQUAL to 21 AND NOT EQUAL to 32.

In the next cell, use the Python operators list to filter myArray8 to include numbers LESS THAN OR EQUAL to 30 OR EQUAL to 34.

# Sorting and Finding Unique Elements

It is very useful to be able to SORT and find UNIQUE elements of an array.  Numpy has 2 functions for doing that.  They are **np.sort( )** and **np.unique( )**, respectively.  Consider the following Numpy array:

In [25]:
myArray9 = np.array([[3,1,2],[6,5,3],[2,9,1]])

Let's apply the np.sort( ) function.

In [26]:
np.sort(myArray9)

array([[1, 2, 3],
       [3, 5, 6],
       [1, 2, 9]])

Notice this function returns a Numpy array of the same rank as myArray9 with the items in each row sorted from least to greatest.  Now let's apply the np.unique( ) function.

In [27]:
np.unique(myArray9)

array([1, 2, 3, 5, 6, 9])

This function returns a Numpy array of RANK 1 containing a list of all UNIQUE elements of myArray9.
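Both functions take optional arguments that change what they return.  As a sketch: the *axis* argument of np.sort( ) chooses the direction of the sort, and *return_counts=True* makes np.unique( ) also report how many times each element appears.  The variable names here are my own.

```python
import numpy as np

data = np.array([[3, 1, 2],
                 [6, 5, 3],
                 [2, 9, 1]])

by_column = np.sort(data, axis=0)        # sort each column instead of each row
flattened = np.sort(data, axis=None)     # flatten first, then sort everything

print(by_column)   # [[2 1 1]
                   #  [3 5 2]
                   #  [6 9 3]]
print(flattened)   # [1 1 2 2 3 3 5 6 9]

elements, counts = np.unique(data, return_counts=True)
print(elements)    # [1 2 3 5 6 9]
print(counts)      # [2 2 2 1 1 1]
```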

### Exercise

In the next cell, generate a RANDOM Numpy array of floats between 0 and 1.  Then apply the np.sort( ) function.  Is the result what you expected?  Explain.

In the next cell, generate a Numpy array of all ONES.  Then apply the np.unique( ) function.  Is the result what you expected?  Explain.

# Calculating Statistical Properties

Numpy makes calculating the MEAN, MEDIAN, PERCENTILE, STANDARD DEVIATION and many other statistical properties of list data easy.  Consider the following Numpy array:

In [29]:
myArray10 = np.array([4.01,4.03,4.27,4.19,4.23,4.08,14.03,11.20,8.29,6.17,4.11,3.99,4.23])
print(myArray10)

[  4.01   4.03   4.27   4.19   4.23   4.08  14.03  11.2    8.29   6.17
   4.11   3.99   4.23]


To calculate the MEAN of this Numpy array, we simply use **np.mean( )**.

In [30]:
np.mean(myArray10)

5.9100000000000001

To calculate the MEDIAN of this Numpy array, we simply use **np.median( )**.  Recall that the MEDIAN is the value below which 50% of our data falls.

In [32]:
np.median(myArray10)

4.2300000000000004

To calculate the 75th PERCENTILE of this Numpy array, we simply use **np.percentile( )**.  We must pass the percentile into this function in addition to the Numpy array.  Recall that the 75th PERCENTILE is the value below which 75% of our data falls.

In [34]:
np.percentile(myArray10, 75)

6.1699999999999999
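You can also pass a list of percentiles to get several at once.  The sketch below uses the same numbers as myArray10 (stored in a variable I call *data*) and confirms that the 50th percentile matches the median.

```python
import numpy as np

data = np.array([4.01, 4.03, 4.27, 4.19, 4.23, 4.08, 14.03,
                 11.20, 8.29, 6.17, 4.11, 3.99, 4.23])

# Ask for the 25th, 50th and 75th percentiles in one call
quartiles = np.percentile(data, [25, 50, 75])
print(quartiles)

# The 50th percentile is the median
same_as_median = np.isclose(quartiles[1], np.median(data))
print(same_as_median)   # True
```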

To calculate the STANDARD DEVIATION of this Numpy array, we use **np.std( )**.

In [35]:
np.std(myArray10)

3.1423043969477114
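By default, np.std( ) calculates the population standard deviation, which divides by the number of items N.  If you want the sample standard deviation (dividing by N - 1), pass *ddof=1*.  Here is a sketch that also reproduces the default result by hand; *data* holds the same numbers as myArray10 and the other names are only for illustration.

```python
import numpy as np

data = np.array([4.01, 4.03, 4.27, 4.19, 4.23, 4.08, 14.03,
                 11.20, 8.29, 6.17, 4.11, 3.99, 4.23])

population_std = np.std(data)         # default ddof=0, divides by N
sample_std = np.std(data, ddof=1)     # divides by N - 1

# The default result by hand: square root of the mean squared deviation
manual = np.sqrt(np.mean((data - np.mean(data)) ** 2))

print(np.isclose(manual, population_std))   # True
print(sample_std > population_std)          # True
```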

# Statistical Properties Exploration 

In the next notebook, *Numpy-Statistics.ipynb*, you will explore these Numpy statistical functions in greater detail using a simulated real-world project.  Have fun!