### FINA 4380 with Marius Popescu

## NumPy - Part I

NumPy (short for Numerical Python) is Python's core library for numerical computing, providing fast arrays and linear algebra routines. It is an important library because many data analytics libraries (such as Pandas) rely on it as one of their main building blocks. NumPy is already installed with the Anaconda distribution.

We will use NumPy to generate and work with arrays. Arrays are similar to Python lists, except that all elements of an array must be of the same type.

### NumPy can be easily imported as follows:

In [4]:
import numpy as np
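
As a minimal sketch of the same-type rule mentioned above (the variable names are illustrative and not part of the original notebook), `np.array()` converts a list that mixes integers and floats into an array in which every element is a float:

```python
import numpy as np

mixed_list = [1, 2.5, 3]          # a Python list may mix int and float
mixed_arr = np.array(mixed_list)  # NumPy upcasts every element to float

print(mixed_arr)        # all elements are now floats
print(mixed_arr.dtype)  # float64
```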

### Generating NumPy Arrays with NumPy Built-in Methods

We will work with NumPy arrays that are either **vectors** (*one-dimensional arrays*) or **matrices** (*two-dimensional arrays*).

#### We can use the `np.arange()` method to generate a one-dimensional array of evenly spaced integers

A start, an end and a step (which may be negative) can be given; the start defaults to 0 and the step defaults to 1. The function returns integers up to but **not including** the endpoint. If a step is provided, then the starting point must also be provided. This function is similar to the Python built-in `range()` function I covered earlier.

In [5]:
# Generate an array with all integers in the interval [0,10]
np.arange(0,11)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [6]:
# Generate an array with all the even integers in the interval [4,26]
np.arange(4,27,2)

array([ 4,  6,  8, 10, 12, 14, 16, 18, 20, 22, 24, 26])
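
The negative step and the similarity to `range()` can be illustrated with a short sketch (the variable names are hypothetical, chosen only for this example):

```python
import numpy as np

# Count down from 10 in steps of -2; the endpoint 0 is excluded
countdown = np.arange(10, 0, -2)
print(countdown)  # [10  8  6  4  2]

# With a single argument, np.arange() mirrors range(): start 0, step 1
same_as_range = np.arange(5)
print(same_as_range.tolist() == list(range(5)))  # True
```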

#### We can use the `np.linspace()` method to generate a one-dimensional array of evenly spaced values

A start and an end must be given; the number of values is optional and defaults to 50. The function returns values up to and **including** the endpoint.

In [7]:
# Generate an array of thirty evenly spaced values in the interval [0,10]
np.linspace(0,10,30)

array([ 0.        ,  0.34482759,  0.68965517,  1.03448276,  1.37931034,
        1.72413793,  2.06896552,  2.4137931 ,  2.75862069,  3.10344828,
        3.44827586,  3.79310345,  4.13793103,  4.48275862,  4.82758621,
        5.17241379,  5.51724138,  5.86206897,  6.20689655,  6.55172414,
        6.89655172,  7.24137931,  7.5862069 ,  7.93103448,  8.27586207,
        8.62068966,  8.96551724,  9.31034483,  9.65517241, 10.        ])
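
To see where the spacing of 0.34482759 in the output above comes from, here is a small sketch that uses the optional `retstep` argument of `np.linspace()`. The variable names are illustrative:

```python
import numpy as np

# retstep=True also returns the spacing between consecutive values
values, step = np.linspace(0, 10, 30, retstep=True)

# With the endpoint included, the spacing is (end - start) / (count - 1)
print(step)                                   # about 0.3448
print(np.isclose(step, (10 - 0) / (30 - 1)))  # True
```

Because both endpoints are included, thirty values span twenty-nine intervals, which is why the step is 10/29 rather than 10/30.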

#### We can use the `np.zeros()` method to generate a one-dimensional or two-dimensional array of zeros.

In [8]:
# Generate a size 3 one-dimensional array of zeros
np.zeros(3)

array([0., 0., 0.])

In [9]:
# Generate a 5x5 matrix of zeros 
# (Use a tuple to indicate the size of each dimension of the array)
np.zeros((5,5))

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])
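
The trailing dots in the outputs above indicate that `np.zeros()` produces floats by default. As a brief sketch (variable names are illustrative), the optional `dtype` argument can be used to request integers instead:

```python
import numpy as np

float_zeros = np.zeros((2, 3))           # default dtype is float64
int_zeros = np.zeros((2, 3), dtype=int)  # request integers instead

print(float_zeros.dtype)  # float64
print(int_zeros)          # zeros printed without a decimal point
```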

### Generating Random Number-Populated Arrays 

The `np.random` module contains functions for efficiently generating arrays of sample values from many kinds of probability distributions.

The random numbers we generate will be based on the **seed** of NumPy's random number generator. We can easily set the random seed as follows:

In [10]:
np.random.seed(100)

We can choose any non-negative integer (up to 2^32 - 1) and set it as the seed of NumPy's random number generator. You can think of the seed as the starting point of an algorithm that generates the values from a distribution (or interval) of our choice.


By resetting NumPy's random number generator seed to the same value right before we generate an array of random numbers, we get the same numbers every time. This is very useful for projects whose results need to be replicated.
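
The effect of resetting the seed can be checked with a minimal sketch. It uses `np.random.uniform()`, which is introduced in the next subsection, and the variable names are illustrative:

```python
import numpy as np

np.random.seed(100)
first_draw = np.random.uniform(0, 1, 3)

np.random.seed(100)                      # reset to the same seed
second_draw = np.random.uniform(0, 1, 3)

third_draw = np.random.uniform(0, 1, 3)  # no reset: the generator moves on

print(np.array_equal(first_draw, second_draw))  # True
print(np.array_equal(second_draw, third_draw))  # False
```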

#### We can use the `np.random.uniform()` method to generate an array with a specific size, and populated with random numbers from the uniform distribution U(a,b).

In [11]:
# Generate a size 3 array from the uniform distribution U(0,1)
np.random.seed(1000)
np.random.uniform(0,1,3)
# Equivalent shortcut for U(0,1): np.random.rand(3)

array([0.65358959, 0.11500694, 0.95028286])

#### We can use the `np.random.normal()` method to generate an array with a specific size, and populated with random numbers from the normal distribution $N(\mu,\sigma^2)$, where $\mu$ is the mean and $\sigma$ is the standard deviation

In [12]:
# Generate a size 5 array from the standard normal distribution N(0,1); mean is 0 and standard
# deviation is 1
np.random.seed(1000)
np.random.normal(0,1,5)
# Equivalent shortcut for N(0,1): np.random.randn(5)

array([-0.8044583 ,  0.32093155, -0.02548288,  0.64432383, -0.30079667])
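
Note that the second argument of `np.random.normal()` is the standard deviation $\sigma$, not the variance $\sigma^2$. A rough sketch with a large sample (the parameter values 5 and 2 are arbitrary choices for this illustration) shows the sample statistics landing near the requested values:

```python
import numpy as np

np.random.seed(1000)
# Second argument is the standard deviation (2), not the variance (4)
sample = np.random.normal(5, 2, 100_000)

print(sample.mean())  # close to 5
print(sample.std())   # close to 2
```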

#### We can use `np.random.randint()` to generate an array populated with random integers from a given range [low,high)

In [13]:
# Generate a 3x4 matrix of random integers from the interval [1,100]
np.random.seed(1000)
arr = np.random.randint(1,101,(3,4))
print(arr)

[[52 88 72 65]
 [95 93  2 62]
 [ 1 90 46 41]]
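Because the upper bound of `np.random.randint()` is excluded, we pass 101 to allow 100 as a possible value. A quick check on a larger sample (the name `draws` is illustrative) confirms the range:

```python
import numpy as np

np.random.seed(1000)
draws = np.random.randint(1, 101, 10_000)

print(draws.min())  # never below 1
print(draws.max())  # never above 100
```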


### NumPy Array Attributes

An array attribute is a value stored on a NumPy array object that describes the array (for example, its shape or size). Attributes are accessed without parentheses.

#### `array_name.ndim` returns the number of dimensions of an array

In [14]:
arr.ndim

2

#### `array_name.shape` returns a tuple indicating the size of each dimension of an array
Since we will be working only with one-dimensional or two-dimensional arrays, the `shape` attribute will return a tuple with only one or two values.

In [15]:
arr.shape

(3, 4)

#### `array_name.size` returns the number of elements in an array

In [16]:
arr.size

12

#### The `array_name.T` attribute can be used to transpose a two-dimensional array.

In [17]:
arr.T

array([[52, 95,  1],
       [88, 93, 90],
       [72,  2, 46],
       [65, 62, 41]])
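
For comparison, the following sketch applies the same attributes to a one-dimensional array next to a two-dimensional one. The names `vector` and `matrix` are illustrative. Transposing a one-dimensional array leaves its shape unchanged:

```python
import numpy as np

vector = np.arange(3)      # one-dimensional array
matrix = np.zeros((3, 4))  # two-dimensional array

print(vector.ndim, vector.shape)  # 1 (3,)
print(vector.T.shape)             # (3,)
print(matrix.T.shape)             # (4, 3)
print(matrix.size == matrix.shape[0] * matrix.shape[1])  # True
```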

### Indexing and Slicing of Two-Dimensional Arrays

#### We use the syntax `array_name[row,col]` to select either an element or multiple elements from the array.

In [18]:
arr

array([[52, 88, 72, 65],
       [95, 93,  2, 62],
       [ 1, 90, 46, 41]])

In [19]:
#Selecting the element in the upper left corner
arr[0,0]

52

In [20]:
#Slicing the first two rows, and all columns
arr[:2,:]

array([[52, 88, 72, 65],
       [95, 93,  2, 62]])

In [21]:
#Slicing the first two rows, but only the second column
arr[:2,1]

array([88, 93])

In [22]:
#Slicing all rows, but only the third column
arr[:,2]

array([72,  2, 46])

In [23]:
#Slicing the first two rows and the last three columns
arr[:2,1:]

array([[88, 72, 65],
       [93,  2, 62]])
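
Two details of slicing are worth a short sketch. The array `demo` below is typed in by hand with the same values as `arr` so that the example is self-contained. First, an integer index drops a dimension while a slice keeps it. Second, a slice is a view of the original data, so writing to it changes the original array:

```python
import numpy as np

demo = np.array([[52, 88, 72, 65],
                 [95, 93,  2, 62],
                 [ 1, 90, 46, 41]])

column_1d = demo[:, 2]    # integer index drops a dimension
column_2d = demo[:, 2:3]  # slice keeps two dimensions
print(column_1d.shape)  # (3,)
print(column_2d.shape)  # (3, 1)

block = demo[:2, 1:]
block[0, 0] = -1   # writing to the slice...
print(demo[0, 1])  # ...also changes the original: -1
```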

### Boolean Indexing

Applying a comparison operator to an array yields a boolean array of the same shape.

In [24]:
arr > 60

array([[False,  True,  True,  True],
       [ True,  True, False,  True],
       [False,  True, False, False]])

To return the elements in the original array that meet the condition, we pass the boolean array as the index. The result is a one-dimensional array. Boolean indexing *always* creates a copy of the data, even if the returned array is unchanged.

In [25]:
arr[arr > 60]

array([88, 72, 65, 95, 93, 62, 90])

In [28]:
# Return the elements that are greater than 60 and less than 80
arr[(arr>60) & (arr<80)]

array([72, 65, 62])
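
Conditions can also be combined with `|` (or) and negated with `~` (not). The sketch below uses a hand-typed array `demo` with the same values as `arr`; the other names are illustrative. It also checks that changing the result of boolean indexing leaves the original array untouched, since the result is a copy:

```python
import numpy as np

demo = np.array([[52, 88, 72, 65],
                 [95, 93,  2, 62],
                 [ 1, 90, 46, 41]])

# Parentheses are required: & and | bind more tightly than > and <
extremes = demo[(demo < 10) | (demo > 90)]
not_large = demo[~(demo > 60)]
print(extremes)   # [95 93  2  1]
print(not_large)  # [52  2  1 46 41]

selected = demo[demo > 60]
selected[0] = 0           # modify the copy only
print((demo == 0).any())  # False: the original is unchanged
```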

### np.nan
`np.nan` is a special floating-point value that stands for Not a Number. It is widely used to represent missing values.

In [29]:
# Generate a 3x4 matrix of random values from the standard normal distribution 
np.random.seed(1000)
new_arr=np.random.normal(0,1,(3,4))
new_arr

array([[-0.8044583 ,  0.32093155, -0.02548288,  0.64432383],
       [-0.30079667,  0.38947455, -0.1074373 , -0.47998308],
       [ 0.5950355 , -0.46466753,  0.66728131, -0.80611561]])

In [30]:
# Setting the negative values in the array as missing
new_arr[new_arr < 0] = np.nan
new_arr

array([[       nan, 0.32093155,        nan, 0.64432383],
       [       nan, 0.38947455,        nan,        nan],
       [0.5950355 ,        nan, 0.66728131,        nan]])
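
Missing values need some care in later calculations. As a minimal sketch (the array `data` is made up for this illustration), `np.nan` is not equal to itself, so we detect it with `np.isnan()`. Ordinary statistics such as `mean()` return `nan`, while `np.nanmean()` ignores the missing entries:

```python
import numpy as np

data = np.array([0.5, np.nan, 1.5])

print(np.nan == np.nan)  # False
print(np.isnan(data))    # [False  True False]
print(data.mean())       # nan
print(np.nanmean(data))  # 1.0
```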