# Data Programming in Python | BAIS:6040
# Module 4 - Handling Numbers with NumPy

Written by Kang-Pyo Lee 

Topics to be covered:
- NumPy Array Creation and Random Number Generation (+ exercises)
- Array Attributes and Comparison
- Array Indexing and Slicing (+ exercises)
- Array Operations
- Fast Element-wise Array Functions and Array Methods (+ exercises)

NumPy, short for Numerical Python, is the fundamental package for high-performance scientific computing and data analysis in Python.

## Import the NumPy Package

In [None]:
# ! pip install --user --upgrade numpy

Note that you should install an external package just once unless you want to upgrade it. Once it's installed, you can simply load it whenever you use it.  

In [None]:
import numpy as np

## Create NumPy Arrays from Python Lists

One of the key features of NumPy is its N-dimensional array object, called ndarray, which is a fast, flexible container for large data sets in Python.

In [None]:
x = np.array([1, 2, 3, 4, 5])
x

numpy.array: https://numpy.org/doc/stable/reference/generated/numpy.array.html

You can create a NumPy ndarray from a primitive Python list using the <b>array</b> function. Now you can take advantage of all the useful features of NumPy that primitive Python lists do not offer. 

In [None]:
type(x)

In [None]:
y = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
y

A nested list of equal-length lists will be converted into a 2-dimensional array.

In [None]:
type(y)

## Other Ways to Create New Arrays

These NumPy functions are useful when you need to generate an array of values that meet certain criteria. 

In [None]:
np.zeros(5)                    # a numpy array filled with 5 float zeros

numpy.zeros: https://numpy.org/doc/stable/reference/generated/numpy.zeros.html

The <b>zeros</b>(shape, dtype=float, ...) function returns a new array of given `shape` and `dtype`, filled with all zeros. The default data type is numpy.float64. 

In [None]:
np.zeros(5, dtype=int)         # a numpy array filled with 5 integer zeros

In [None]:
np.zeros((5, 10), dtype=int)    # a 5 x 10 numpy array filled with all integer zeros

You can specify the shape of a 2-dimensional array as a tuple. The first value in the tuple refers to the number of rows and the second value to the number of columns.
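As a minimal sketch of the row-and-column convention (the variable name `table` is made up for this illustration), the following creates an array with 2 rows and 3 columns and checks its shape:

```python
import numpy as np

# Hypothetical example: the tuple (2, 3) means 2 rows and 3 columns
table = np.zeros((2, 3), dtype=int)

print(table)
print(table.shape)   # (2, 3)
```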

In [None]:
np.ones((5, 10))                 # a 5 x 10 numpy array filled with all float ones

numpy.ones: https://numpy.org/doc/stable/reference/generated/numpy.ones.html

The <b>ones</b>(shape, dtype=None, ...) function returns a new array of given `shape` and `dtype`, filled with all ones. The default data type is numpy.float64.

In [None]:
np.ones((5, 10), dtype=int)       # a 5 x 10 numpy array filled with all integer ones

In [None]:
np.full((5, 10), 3.14)            # a 5 x 10 numpy array filled with all 3.14's

numpy.full: https://numpy.org/doc/stable/reference/generated/numpy.full.html

The <b>full</b>(shape, fill_value, dtype=None, ...) function returns a new array of given `shape` and `dtype`, filled with `fill_value`.

In [None]:
np.arange(0, 10)                 # a numpy array with integers from 0 to 9

numpy.arange: https://numpy.org/doc/stable/reference/generated/numpy.arange.html

The <b>arange</b>([start, ]stop, [step, ]dtype=None) function works basically the same as the built-in **range** function, except that it returns a NumPy array. The function returns a new array of evenly spaced values within a given interval. Note that the parameter `start` is inclusive, while `stop` is exclusive. 

In [None]:
np.arange(10)

You can skip `start` if it is 0.

In [None]:
np.arange(0, 10, 2)               # a numpy array with integers from 0 to 9 stepping by 2

The parameter `step` determines spacing between values.
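To compare the two side by side, here is a small sketch (the names `from_range` and `from_arange` are illustrative only) showing that **range** and <b>arange</b> produce the same values for the same `start`, `stop`, and `step`, but in different containers:

```python
import numpy as np

from_range = list(range(2, 10, 3))    # built-in range, converted to a list
from_arange = np.arange(2, 10, 3)     # NumPy array with the same values

print(from_range)          # [2, 5, 8]
print(from_arange)         # [2 5 8]
print(type(from_arange))   # <class 'numpy.ndarray'>
```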

In [None]:
np.linspace(0, 1, 5)              # a numpy array with 5 evenly spaced numbers from 0 to 1

numpy.linspace: https://numpy.org/doc/stable/reference/generated/numpy.linspace.html

The <b>linspace</b>(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0) function returns a new array of evenly spaced numbers over a specified interval. Note that both the `start` and `stop` parameters are inclusive. 
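The effect of the `endpoint` parameter can be sketched as follows (the variable names are made up for this example). With `endpoint=False`, `stop` is excluded and the spacing changes accordingly:

```python
import numpy as np

with_stop = np.linspace(0, 1, 5)                      # stop included
without_stop = np.linspace(0, 1, 5, endpoint=False)   # stop excluded

print(with_stop)      # [0.   0.25 0.5  0.75 1.  ]
print(without_stop)   # [0.  0.2 0.4 0.6 0.8]
```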

In [None]:
np.identity(5)                    # a 5 x 5 numpy identity array

numpy.identity: https://numpy.org/doc/stable/reference/generated/numpy.identity.html

The <b>identity</b>(n, dtype=None) function returns the identity array, which is a square array with 1's on the main diagonal and 0's elsewhere.

<hr>

## Create Arrays of Random Numbers

The <b>numpy.random</b> module is used to efficiently generate arrays of sample values from commonly-used probability distributions.

In [None]:
np.random.normal(0, 1, 10)           # a numpy array of 10 random samples that follow a normal distribution 
                                     # with 0 being the mean and 1 being the standard deviation 

numpy.random.normal: https://numpy.org/doc/stable/reference/random/generated/numpy.random.normal.html

The <b>random.normal</b>(loc=0.0, scale=1.0, size=None) function returns a new array of random samples from a normal (Gaussian) distribution with `loc` being the mean and `scale` being the standard deviation of the distribution.

In [None]:
np.random.uniform(0, 1, (3, 3))       # a 3 x 3 numpy array of random samples that follow a uniform distribution 
                                      # over the interval [0, 1) 

numpy.random.uniform: https://numpy.org/doc/stable/reference/random/generated/numpy.random.uniform.html

The <b>random.uniform</b>(low=0.0, high=1.0, size=None) function returns a new array of random samples from a uniform distribution over the half-open interval [`low`, `high`) (`low` inclusive, but `high` exclusive). In other words, any value within the given interval is equally likely to be drawn by uniform.

Other functions return random samples from other types of distributions, such as <b>random.binomial</b> for the binomial distribution, <b>random.beta</b> for the beta distribution, <b>random.chisquare</b> for the chi-square distribution, and <b>random.gamma</b> for the gamma distribution. 

numpy.random.binomial: https://numpy.org/doc/stable/reference/random/generated/numpy.random.binomial.html<br>
numpy.random.beta: https://numpy.org/doc/stable/reference/random/generated/numpy.random.beta.html<br>
numpy.random.chisquare: https://numpy.org/doc/stable/reference/random/generated/numpy.random.chisquare.html<br>
numpy.random.gamma: https://numpy.org/doc/stable/reference/random/generated/numpy.random.gamma.html

In [None]:
np.random.randint(0, 10, (3, 3))     # a 3 x 3 numpy array with random integers from 0 to 9

numpy.random.randint: https://numpy.org/doc/stable/reference/random/generated/numpy.random.randint.html

The <b>random.randint</b>(low, high=None, size=None, dtype='l') function returns a new array of random integers from `low` (inclusive) to `high` (exclusive).

In [None]:
np.random.choice(np.arange(10), 3, replace=False)     # choose 3 values from np.arange(10) with no duplicates 

numpy.random.choice: https://numpy.org/doc/stable/reference/random/generated/numpy.random.choice.html

There are cases where you do not want duplicates among the generated random numbers. The <b>random.choice</b>(a, size=None, replace=True, p=None) function returns a new array of random samples from a given 1-D array. The first parameter `a` specifies the population to sample from, and the second parameter `size` specifies the sample size. When the third parameter `replace` is set to `False`, sampling is done without replacement, so no value is drawn more than once. 
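A minimal sketch of sampling without replacement follows; the names `population`, `sample`, and `error_raised` are illustrative. It also shows that asking for more values than the population holds fails when `replace=False`:

```python
import numpy as np

population = np.arange(10)

# Draw 5 distinct values: no value can appear twice
sample = np.random.choice(population, 5, replace=False)
print(sample)
print(len(np.unique(sample)) == len(sample))   # True

# Requesting more values than the population holds cannot work
# without replacement, so NumPy raises a ValueError
error_raised = False
try:
    np.random.choice(population, 11, replace=False)
except ValueError:
    error_raised = True
print(error_raised)   # True
```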

In [None]:
np.random.permutation([1, 4, 9, 16, 25])

numpy.random.permutation: https://numpy.org/doc/stable/reference/random/generated/numpy.random.permutation.html

The <b>random.permutation</b>(x) function shuffles the values in a sequence and returns a permuted array.

In [None]:
np.random.seed(seed=0)   

numpy.random.seed: https://numpy.org/doc/stable/reference/random/generated/numpy.random.seed.html

The <b>random.seed</b>(seed=None) function seeds the generator. A random seed is the starting point for generating pseudo-random numbers, which look random but are fully determined by the seed value. The same random number generator with the same seed always generates the same sequence of random numbers, which is useful for ensuring reproducibility. The seed is typically a non-negative integer (between 0 and 2**32 - 1). 
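Reproducibility can be checked with a short sketch like the one below. The seed value 42 and the names `first` and `second` are arbitrary choices for this illustration:

```python
import numpy as np

np.random.seed(42)
first = np.random.randint(0, 10, 5)

np.random.seed(42)                     # reset the generator to the same seed
second = np.random.randint(0, 10, 5)

print(first)
print(second)
print(np.array_equal(first, second))   # True
```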

In [None]:
np.random.seed(seed=0) 
np.random.randint(0, 10, (3, 3))

## Exercises for Array Creation

<hr>

## Array Attributes

In [None]:
np.random.seed(seed=0) 
x = np.random.randint(0, 100, (3, 3))
x

In [None]:
x.ndim 

The <b>ndim</b> attribute returns the number of dimensions of an array.

In [None]:
x.shape

The <b>shape</b> attribute returns the shape of an array.

In [None]:
x.size

The <b>size</b> attribute returns the total number of values in an array.

In [None]:
x.dtype

The <b>dtype</b> attribute returns the data type of an array.

NumPy data types: https://numpy.org/doc/stable/user/basics.types.html

Note that <b>ndim</b>, <b>shape</b>, <b>size</b>, and <b>dtype</b> are attributes, not methods, of NumPy arrays, so no parentheses follow them.

In [None]:
len(x)

The length of an array is the size of its first dimension, which for a 2-dimensional array is the number of rows.
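To see how `len` relates to the attributes above, consider this small sketch with a non-square array (the name `m_demo` is made up for the example):

```python
import numpy as np

m_demo = np.zeros((4, 6))   # 4 rows, 6 columns

print(len(m_demo))          # 4, the number of rows
print(m_demo.shape[0])      # 4, same as len(m_demo)
print(m_demo.shape[1])      # 6, the number of columns
print(m_demo.size)          # 24, the total number of values
```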

## Array Comparisons

Comparison of arrays yields a Boolean array of element-wise answers.

In [None]:
x = np.array([1, 2, 3, 4, 5])
x

In [None]:
x < 3

In [None]:
x == 3

In [None]:
(x < 3) | (x == 3)        # or

In [None]:
(x < 3) & (x == 3)        # and

<hr>

## Array Indexing & Slicing

Indexing and slicing of NumPy arrays work basically the same as general indexing and slicing in Python, with some additional functionality such as Boolean indexing and fancy indexing. 

### 1-dimensional Arrays

In [None]:
np.random.seed(0)
x = np.random.randint(10, 100, 10)
x

In [None]:
x[0]

In [None]:
x[-1]

In [None]:
x[:3]

In [None]:
x[-3:]

In [None]:
x[:]

`x[:]` selects every element, so it shows the same values as `x`. 

In [None]:
x[::2]

In [None]:
x[::-1] 

`x[::-1]` reverses `x`.

### 2-dimensional Arrays

In [None]:
np.random.seed(0)
x = np.random.randint(0, 100, (5, 10))
x

You can think of a 2-dimensional array as a matrix with rows and columns. 

In [None]:
x[0]        # Selects the first row from the matrix

In [None]:
x[:3]       # The first 3 rows

`x[:n]` retrieves the first `n` rows.

In [None]:
x[0, 1]      # The first row and the second column

When retrieving a particular value in a 2-dimensional array, look up the row index first and then the column index, separated by a comma in matching square brackets.

In [None]:
x[:3, :3]    # The first 3 rows and the first 3 columns

In [None]:
x[-3:, -3:]   # The last 3 rows and the last 3 columns

In [None]:
x[:3, -5:]    # The first 3 rows and the last 5 columns 

In [None]:
x[:, :3]    # All rows and the first 3 columns

`x[:, :n]` retrieves the first `n` columns.
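The row and column patterns above can be checked on a small array whose values are easy to read. This is only an illustrative sketch, and the name `grid` is not part of the original notebook:

```python
import numpy as np

grid = np.array([[0, 1, 2, 3],
                 [4, 5, 6, 7],
                 [8, 9, 10, 11]])

print(grid[1, 2])    # 6: second row, third column
print(grid[-1])      # [ 8  9 10 11]: the last row
print(grid[:, 1])    # [1 5 9]: all rows, second column only
print(grid[:, :2])   # all rows and the first 2 columns
```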

<hr>

### Boolean indexing

In [None]:
np.random.seed(0)
data = np.random.normal(75, 10, (7, 3))
data

In [None]:
data > 90

Comparing `data` with a number 90 yields a Boolean array of element-wise answers.

In [None]:
data[data > 90]     # Returns all the values in data that are greater than 90.

This Boolean array can be passed as a mask when indexing the array. This is called Boolean indexing. The Boolean array must be of the same length as the axis it is indexing. Boolean indexing returns an array of the values that correspond to True. 
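Because `data` above holds random values, a fixed example may make the mechanics easier to follow. The sketch below uses made-up numbers and the hypothetical names `scores` and `high`:

```python
import numpy as np

scores = np.array([[95, 70],
                   [88, 92],
                   [60, 99]])

high = scores > 90      # Boolean array with the same shape as scores
print(high)
print(scores[high])     # [95 92 99]: values where the mask is True
```

Note that the selected values come back as a 1-dimensional array, in row-by-row order.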

In [None]:
l = list(data)
l[l > 90]        # raises TypeError: a list cannot be compared with a number

Note that primitive Python lists do not support this kind of Boolean indexing. 

In [None]:
names = np.array(['Bob', 'Alice', 'Sam', 'Bob', 'Sam', 'Alice', 'Alice'])
names

Suppose each name in `names` corresponds to a row in `data`.

In [None]:
names == "Bob"

Comparing `names` with a string *Bob* yields a Boolean array of element-wise answers.

In [None]:
data[names == "Bob"]    # Returns all the values in data that correspond to Bob.

This Boolean array can be passed as a mask when indexing the `data` array.

In [None]:
data[names == "Alice"]   # Returns all the values in data that correspond to Alice.

In [None]:
data[names != "Alice"]   # Returns all the values in data that do not correspond to Alice.

In [None]:
data[(names == 'Bob') | (names == 'Alice')]

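To build a compound condition, combine Boolean arrays with the element-wise operators `&` (and) and `|` (or), wrapping each comparison in parentheses. The Python keywords `and` and `or` do not work with Boolean arrays. 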

In [None]:
mask = (names == 'Bob') | (names == 'Alice')
data[mask]

If the condition is too long and complex, you can assign the resulting Boolean array to a variable, say `mask`, and then you can pass the `mask` for indexing `data`.

Note that selecting data from an array by Boolean indexing always returns a copy of the selected values, so modifying the result does not change the content of the original array.
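The copy behavior can be verified with a short sketch (the names `original` and `selected` are made up for this example):

```python
import numpy as np

original = np.array([10, 20, 30, 40])

selected = original[original > 15]   # Boolean indexing creates a copy
selected[0] = -1                     # change the copy only

print(selected)   # [-1 30 40]
print(original)   # [10 20 30 40], unchanged
```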

In [None]:
data < 80

In [None]:
data[data < 80]

In [None]:
data[data < 80] = 0     # Sets all of the values less than 80 to 0.
data

You can also set values with Boolean arrays. Only the values that correspond to True are affected. 

In [None]:
data[names == "Sam"]

In [None]:
data[names == "Sam"] = 100
data

This way, you can set entire rows or columns at once.

### Fancy Indexing

Fancy Indexing is a term coined by NumPy to describe indexing using integer arrays.

In [None]:
x = np.array([num * 10 for num in np.arange(10)])
x

In [None]:
mask = [0, 7, 1]
x[mask]

To select a subset of the values in a particular order, you can simply pass a list or array of index positions, specifying the desired order.

In [None]:
mask = [-1, -5, -3]
x[mask]

You can also use negative index positions. 
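As a further sketch (with the illustrative names `letters` and `picks`), an index list may repeat positions and mix positive and negative indexes. The result follows the order of the index list:

```python
import numpy as np

letters = np.array(['a', 'b', 'c', 'd', 'e'])

# Position 0 is requested twice; -2 counts from the end
picks = letters[[4, 0, 0, -2]]
print(picks)   # ['e' 'a' 'a' 'd']
```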

## Exercises for Array Indexing and Slicing

<hr>

## Array Operations

NumPy arrays allow you to express batch operations on data without writing any <b>for</b> loops. This is usually called vectorization. Any arithmetic operation between equal-size arrays is applied element-wise.

In [None]:
x = np.array([1, 2, 3, 4, 5])
x

In [None]:
x + 1                   # a new numpy array with each value in x added by 1

This is called element-wise addition.

In [None]:
l = [1, 2, 3, 4, 5]
l + 1                   # raises TypeError: a list and an integer cannot be added

Note that primitive Python lists do not support this element-wise operation.

In [None]:
[num + 1 for num in l]

You need a <b>for</b> loop or list comprehension to do the same thing with a primitive Python list.

In [None]:
x

In [None]:
x - 1

In [None]:
x * 2

In [None]:
x / 2

In [None]:
1 / x

In [None]:
x ** 2

In [None]:
-x

In [None]:
x = np.array([1, 2, 3])
y = np.array([1, 3, 5])

In [None]:
x + y

In [None]:
x * y

In [None]:
x = np.array([1, 2, 3, 4, 5])
y = np.array([1, 3, 5])
x + y                   # raises ValueError: shapes (5,) and (3,) do not match

For element-wise addition, subtraction, multiplication, and division, the two operands must have the same shape, or shapes that are compatible under NumPy's broadcasting rules, as with the single number in `x + 1`. 
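The shape requirement can be sketched as follows. The names `left`, `right`, and `mismatch_detected` are hypothetical, and trimming `left` to three values is just one possible way to make the shapes agree:

```python
import numpy as np

left = np.array([1, 2, 3, 4, 5])
right = np.array([1, 3, 5])

# Arrays of length 5 and length 3 cannot be combined element-wise
mismatch_detected = False
try:
    left + right
except ValueError as err:
    mismatch_detected = True
    print("Addition failed:", err)

# After slicing left to the first 3 values, the shapes match
print(left[:3] + right)   # [2 5 8]
```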

## Fast Element-wise Array Functions

Mathematical functions: https://numpy.org/doc/stable/reference/routines.math.html

In [None]:
x = np.array([-1, 2, -3, 4, -5])
np.absolute(x)      # absolute values

numpy.absolute: https://numpy.org/doc/stable/reference/generated/numpy.absolute.html

In [None]:
x = np.array([1, 2, 3])
np.exp(x)            # exponential (= e^x)

numpy.exp: https://numpy.org/doc/stable/reference/generated/numpy.exp.html

In [None]:
x = np.array([1, 2, 3])
np.power(3, x)        # power (= 3^x)

numpy.power: https://numpy.org/doc/stable/reference/generated/numpy.power.html

In [None]:
x = [1, 2, 4, 8]
np.log(x)             # ln(x)

numpy.log: https://numpy.org/doc/stable/reference/generated/numpy.log.html

In [None]:
x = [1, 2, 4, 8]
np.log2(x)            # log2(x)

numpy.log2: https://numpy.org/doc/stable/reference/generated/numpy.log2.html

In [None]:
x = [1, 2, 4, 8]
np.log10(x)           # log10(x)

numpy.log10: https://numpy.org/doc/stable/reference/generated/numpy.log10.html

In [None]:
x = np.array([[1, 2], [3, 4]])
x

In [None]:
y = np.array([[5, 6], [7, 8]])
y

In [None]:
np.dot(x, y)          # dot product; for two 2-D arrays this is matrix multiplication

numpy.dot: https://numpy.org/doc/stable/reference/generated/numpy.dot.html
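For 2-dimensional arrays, `np.dot` performs matrix multiplication, which differs from the element-wise `*` operator. The sketch below contrasts the two, using the illustrative names `p` and `q`:

```python
import numpy as np

p = np.array([[1, 2], [3, 4]])
q = np.array([[5, 6], [7, 8]])

print(p * q)          # element-wise product: [[ 5 12] [21 32]]
print(np.dot(p, q))   # matrix product:       [[19 22] [43 50]]
print(p @ q)          # the @ operator gives the same matrix product
```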

## Array Methods

In [None]:
np.random.seed(0)
x = np.random.randint(0, 50, 10)
x

In [None]:
x.sum()

numpy.ndarray.sum: https://numpy.org/doc/stable/reference/generated/numpy.ndarray.sum.html

In [None]:
x.cumsum()

numpy.ndarray.cumsum: https://numpy.org/doc/stable/reference/generated/numpy.ndarray.cumsum.html

The <b>cumsum</b>(axis=None, dtype=None, out=None) method returns the cumulative sum of the values along the given `axis`.
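The `axis` parameter matters only for arrays with more than one dimension. As a small sketch on a 2-dimensional array (the name `m2` is made up for this example):

```python
import numpy as np

m2 = np.array([[1, 2, 3],
               [4, 5, 6]])

print(m2.cumsum())         # [ 1  3  6 10 15 21]: over the flattened array
print(m2.cumsum(axis=0))   # [[1 2 3] [5 7 9]]: running total down each column
print(m2.cumsum(axis=1))   # [[ 1  3  6] [ 4  9 15]]: running total across each row
```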

In [None]:
x.prod()

numpy.ndarray.prod: https://numpy.org/doc/stable/reference/generated/numpy.ndarray.prod.html

In [None]:
x.cumprod()

numpy.ndarray.cumprod: https://numpy.org/doc/stable/reference/generated/numpy.ndarray.cumprod.html

The <b>cumprod</b>(axis=None, dtype=None, out=None) method returns the cumulative product of the values along the given `axis`.

In [None]:
x.mean()

numpy.ndarray.mean: https://numpy.org/doc/stable/reference/generated/numpy.ndarray.mean.html

In [None]:
x.var()

numpy.ndarray.var: https://numpy.org/doc/stable/reference/generated/numpy.ndarray.var.html

In [None]:
x.std()

numpy.ndarray.std: https://numpy.org/doc/stable/reference/generated/numpy.ndarray.std.html

In [None]:
x.min()

numpy.ndarray.min: https://numpy.org/doc/stable/reference/generated/numpy.ndarray.min.html

In [None]:
x

In [None]:
x.max()

numpy.ndarray.max: https://numpy.org/doc/stable/reference/generated/numpy.ndarray.max.html

In [None]:
x.argmin()

numpy.ndarray.argmin: https://numpy.org/doc/stable/reference/generated/numpy.ndarray.argmin.html

The <b>argmin</b>(axis=None, out=None) method returns the indices of the minimum values along the given `axis`.

In [None]:
x.argmax()

numpy.ndarray.argmax: https://numpy.org/doc/stable/reference/generated/numpy.ndarray.argmax.html

The <b>argmax</b>(axis=None, out=None) method returns the indices of the maximum values along the given `axis`.
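The difference between a value and its index can be sketched with fixed numbers (the names `v` and `t` are illustrative only):

```python
import numpy as np

v = np.array([30, 10, 50, 20])
print(v.argmin())      # 1: position of the smallest value (10)
print(v.argmax())      # 2: position of the largest value (50)
print(v[v.argmax()])   # 50: same as v.max()

t = np.array([[3, 9, 2],
              [7, 1, 8]])
print(t.argmax(axis=0))   # [1 0 1]: row index of the maximum in each column
print(t.argmax(axis=1))   # [1 2]: column index of the maximum in each row
```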

In [None]:
x = np.arange(15)
x

In [None]:
y = x.reshape((3, 5))
y

numpy.ndarray.reshape: https://numpy.org/doc/stable/reference/generated/numpy.ndarray.reshape.html

The <b>reshape</b>(shape, ...) method returns an array containing the same data with a new `shape`.

In [None]:
y.transpose()

numpy.ndarray.transpose: https://numpy.org/doc/stable/reference/generated/numpy.ndarray.transpose.html

The <b>transpose</b>(*axes) method returns a view of the array with `axes` transposed.

In [None]:
y.flatten()

numpy.ndarray.flatten: https://numpy.org/doc/stable/reference/generated/numpy.ndarray.flatten.html

The <b>flatten</b> method returns a copy of the array collapsed into one dimension. In this sense, <b>flatten</b> undoes what <b>reshape</b> did above: it turns the 3 x 5 array back into a 1-dimensional array of 15 values.
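A round trip through <b>reshape</b> and <b>flatten</b> can be sketched as follows (the names `flat`, `grid2`, and `back` are made up for this example). It also shows that the flattened result is a copy:

```python
import numpy as np

flat = np.arange(6)
grid2 = flat.reshape((2, 3))   # 1-D array of 6 values -> 2 x 3 array
back = grid2.flatten()         # 2 x 3 array -> 1-D array again

print(np.array_equal(flat, back))   # True

back[0] = 99                   # flatten returned a copy,
print(grid2[0, 0])             # so grid2 still holds 0 here
```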

In [None]:
x.astype(float)

numpy.ndarray.astype: https://numpy.org/doc/stable/reference/generated/numpy.ndarray.astype.html

The <b>astype</b>(dtype, ...) method casts an array to a specified `dtype`.
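One detail worth checking when casting is what happens to fractional parts. In the sketch below (with the illustrative names `f` and `as_int`), converting floats to integers drops the fractional part rather than rounding, and the original array is left unchanged:

```python
import numpy as np

f = np.array([1.7, 2.2, -3.9])
as_int = f.astype(int)    # returns a new array with integer values

print(as_int)             # [ 1  2 -3]: fractional parts are dropped
print(f)                  # [ 1.7  2.2 -3.9]: the original is unchanged
print(as_int.dtype.kind)  # 'i' for integer
```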

## Exercises for Array Functions and Methods