# Data Programming in Python | BAIS:6040
# Handling Numbers with NumPy

Instructor: Jeff Hendricks 

Topics to be covered:
- NumPy Array Creation
- Random Number Generation
- Array Attributes and Methods
- Array Indexing & Slicing
- Array Concatenation, Splitting, Comparison, and Sorting
- Operations between Arrays and Scalars
- Fast Element-wise Array Functions

References: 
- NumPy official website (http://www.numpy.org/) 
- Python Data Science Handbook by Jake VanderPlas (http://shop.oreilly.com/product/0636920034919.do)
- Python for Data Analysis by Wes McKinney (https://www.oreilly.com/library/view/python-for-data/9781491957653/)

NumPy, which stands for Numerical Python, is the fundamental package required for high performance scientific computing and data analysis.

## Import the NumPy Package

In [2]:
import numpy as np

## Create NumPy Arrays from Python Lists

One of the key features of NumPy is its N-dimensional array object, or ndarray, which is a fast, flexible container for large data sets in Python.

In [2]:
x = np.array([1, 2, 3, 4, 5])

numpy.array: https://docs.scipy.org/doc/numpy/reference/generated/numpy.array.html

You can create a NumPy ndarray from a primitive Python list using the <b>array</b> function. Now you are ready to take advantage of all useful features of NumPy that primitive Python lists do not offer. 

In [3]:
x, type(x)

(array([1, 2, 3, 4, 5]), numpy.ndarray)

In [7]:
y = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])

A nested list of equal-length lists will be converted into a multidimensional array.

In [8]:
y, type(y)

(array([[1., 2., 3.],
        [4., 5., 6.],
        [7., 8., 9.]]),
 numpy.ndarray)

## Other Ways to Create New Arrays

These NumPy functions are very useful when you need to quickly generate an array of values that follow some rule. 

In [9]:
np.zeros(5)

array([0., 0., 0., 0., 0.])

numpy.zeros: https://docs.scipy.org/doc/numpy/reference/generated/numpy.zeros.html

The <b>zeros(shape, dtype=float, ...)</b> function returns a new array of given shape and type, filled with all zeros. The default data type is numpy.float64. 

In [10]:
np.zeros(5, dtype=int)

array([0, 0, 0, 0, 0])

In [11]:
np.zeros((3, 5), dtype=int)

array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]])

You can specify the shape of a two-dimensional array. 

In [12]:
np.ones((5, 5))

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

numpy.ones: https://docs.scipy.org/doc/numpy/reference/generated/numpy.ones.html

The <b>ones(shape, dtype=None, ...)</b> function returns a new array of given shape and type, filled with all ones. The default data type is numpy.float64.

In [13]:
np.full((5, 5), 3.14)

array([[3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14]])

numpy.full: https://docs.scipy.org/doc/numpy/reference/generated/numpy.full.html

The <b>full(shape, fill_value, dtype=None, ...)</b> function returns a new array of given shape and type, filled with `fill_value`.

In [14]:
np.empty((5, 5), int)

array([[4614253070214989087, 4614253070214989087, 4614253070214989087,
        4614253070214989087, 4614253070214989087],
       [4614253070214989087, 4614253070214989087, 4614253070214989087,
        4614253070214989087, 4614253070214989087],
       [4614253070214989087, 4614253070214989087, 4614253070214989087,
        4614253070214989087, 4614253070214989087],
       [4614253070214989087, 4614253070214989087, 4614253070214989087,
        4614253070214989087, 4614253070214989087],
       [4614253070214989087, 4614253070214989087, 4614253070214989087,
        4614253070214989087, 4614253070214989087]])

numpy.empty: https://docs.scipy.org/doc/numpy/reference/generated/numpy.empty.html

The <b>empty(shape, dtype=float, ...)</b> function returns a new array of given shape and type, without initializing entries, so the values it displays are arbitrary (whatever happened to be in memory). Because it skips the fill step, <b>empty</b> may be marginally faster than the functions above that write a specific value into every element. It is often used to quickly create an array with a given shape whose values are all set manually afterwards.

In [15]:
x = np.empty((5, 5))
for row in range(len(x)):
    x[row] = row
x

array([[0., 0., 0., 0., 0.],
       [1., 1., 1., 1., 1.],
       [2., 2., 2., 2., 2.],
       [3., 3., 3., 3., 3.],
       [4., 4., 4., 4., 4.]])

In [16]:
np.arange(0, 10).reshape(2,5)

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

numpy.arange: https://docs.scipy.org/doc/numpy/reference/generated/numpy.arange.html

The <b>arange([start, ]stop, [step, ]dtype=None)</b> function returns a new array of evenly spaced values within a given interval. Note that the parameter `start` is inclusive, while `stop` is exclusive. 

In [17]:
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

You can skip `start` if it is 0.

In [18]:
np.arange(0, 10, 2)

array([0, 2, 4, 6, 8])

The parameter `step` determines spacing between values.

In [19]:
np.linspace(0, 1, 5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])

numpy.linspace: https://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html

The <b>linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0)</b> function returns a new array of `num` evenly spaced numbers over a specified interval. Note that the parameter `stop` is inclusive by default (`endpoint=True`).
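As a minimal sketch of how the `endpoint` and `retstep` parameters change the result (the variable names below are illustrative and not part of the original notebook):

```python
import numpy as np

# Default: stop is included, so 5 points span [0, 1] with spacing 0.25.
closed = np.linspace(0, 1, 5)

# endpoint=False excludes stop, so the spacing becomes 1/5 = 0.2.
half_open = np.linspace(0, 1, 5, endpoint=False)

# retstep=True additionally returns the spacing between samples.
samples, step = np.linspace(0, 1, 5, retstep=True)

print(closed)
print(half_open)
print(step)
```

With `endpoint=False`, <b>linspace</b> behaves more like <b>arange</b>, where `stop` is exclusive.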

In [None]:
np.identity(5)

numpy.identity: https://docs.scipy.org/doc/numpy/reference/generated/numpy.identity.html

The <b>identity(n, dtype=None)</b> function returns the identity array, which is a square array with 1's on the main diagonal and 0's elsewhere.

In [None]:
np.eye(5)

In [None]:
np.eye(5,5)

numpy.eye: https://numpy.org/doc/stable/reference/generated/numpy.eye.html
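The difference between <b>identity</b> and <b>eye</b> can be sketched as follows. This is a small illustration with assumed variable names: <b>eye</b> accepts a column count and a diagonal offset `k`, while <b>identity</b> is always square.

```python
import numpy as np

# identity(n) always returns a square n x n array.
square = np.identity(3, dtype=int)

# eye(N, M) may be rectangular, and k shifts the diagonal:
# k=1 places the ones one position above the main diagonal.
shifted = np.eye(3, 4, k=1, dtype=int)

print(square)
print(shifted)
```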

## Create Arrays of Random Numbers

The <b>numpy.random</b> module supplements the built-in Python <b>random</b> with functions for efficiently generating whole arrays of sample values from many kinds of probability distributions.

In [34]:
np.random.normal(0, 1, (5, 3))

array([[ 0.48431215,  0.57914048, -0.18158257],
       [ 1.41020463, -0.37447169,  0.27519832],
       [-0.96075461,  0.37692697,  0.03343893],
       [ 0.68056724, -1.56349669, -0.56669762],
       [-0.24214951,  1.51439128, -0.3330574 ]])

numpy.random.normal: https://numpy.org/doc/stable/reference/random/generated/numpy.random.normal.html

The <b>normal(loc=0.0, scale=1.0, size=None)</b> function returns a new array of random samples from a normal (Gaussian) distribution with `loc` being the mean and `scale` being the standard deviation of the distribution.

In [35]:
np.random.seed(seed=0)   
np.random.uniform(0, 1, 1)

array([0.5488135])

numpy.random.uniform: https://numpy.org/doc/stable/reference/random/generated/numpy.random.uniform.html

The <b>random.uniform(low=0.0, high=1.0, size=None)</b> function returns a new array of random samples from a uniform distribution over the half-open interval [`low`, `high`) (`low` inclusive, but `high` exclusive). In other words, any value within the given interval is equally likely to be drawn by uniform.

There are more functions that return random samples from other types of distributions, such as <b>random.binomial</b> for the binomial distribution, <b>random.beta</b> for the beta distribution, <b>random.chisquare</b> for the chi-square distribution, and <b>random.gamma</b> for the gamma distribution. 

numpy.random.binomial: https://numpy.org/devdocs/reference/random/generated/numpy.random.binomial.html<br>
numpy.random.beta: https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.random.beta.html<br>
numpy.random.chisquare: https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.random.chisquare.html<br>
numpy.random.gamma: https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.random.gamma.html

In [36]:
np.random.randint(0, 10, (3, 3))

array([[5, 0, 3],
       [3, 7, 9],
       [3, 5, 2]])

numpy.random.randint: https://numpy.org/devdocs/reference/random/generated/numpy.random.randint.html

The <b>random.randint(low, high=None, size=None, dtype='l')</b> function returns a new array of random integers from `low` (inclusive) to `high` (exclusive).

In [42]:
np.random.choice(np.arange(10), 3, replace=False)     # choose 3 values from np.arange(10) with no duplicates 

array([8, 9, 7])

numpy.random.choice: https://numpy.org/doc/stable/reference/random/generated/numpy.random.choice.html

There are cases where you want no duplicates among the generated random numbers. The <b>random.choice(a, size=None, replace=True, p=None)</b> function returns a new array of random samples from a given 1-D array. The first parameter `a` serves as the population, while the second parameter `size` is the sample size. When the third parameter `replace` is set to `False`, the sample is drawn without replacement. 
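A short sketch of what sampling without replacement implies, using an assumed population of ten integers: every drawn value is distinct, and the sample cannot be larger than the population.

```python
import numpy as np

np.random.seed(0)
population = np.arange(10)

# Without replacement, every drawn value is distinct.
sample = np.random.choice(population, 5, replace=False)
print(sample, len(set(sample)) == 5)

# Asking for more values than the population holds fails
# when replace=False.
try:
    np.random.choice(population, 11, replace=False)
    oversample_failed = False
except ValueError as err:
    oversample_failed = True
    print("ValueError:", err)
```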

In [45]:
np.random.permutation(10)

array([6, 2, 1, 8, 7, 0, 5, 9, 3, 4])

numpy.random.permutation: https://numpy.org/devdocs/reference/random/generated/numpy.random.permutation.html

The <b>random.permutation(x)</b> function randomly permutes a sequence or returns a permuted range. If `x` is an integer, it randomly permutes `np.arange(x)`.

In [46]:
np.random.permutation([1, 4, 9, 16, 25])

array([ 1, 16,  4,  9, 25])

If `x` is an array, the function makes a copy and shuffles its elements randomly, leaving the original unchanged.

numpy.random.seed: https://numpy.org/doc/stable/reference/random/generated/numpy.random.seed.html

In [47]:
np.random.seed(seed=0)   
np.random.randint(0, 10, (3, 3))

array([[5, 0, 3],
       [3, 7, 9],
       [3, 5, 2]])

The <b>random.seed(seed=None)</b> function seeds the generator. A random seed initializes a pseudo-random number generator, which produces numbers that appear random but are fully determined by the seed value. The same generator with the same seed always produces the same sequence of numbers, which is very useful for ensuring reproducibility.
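The reproducibility claim can be checked directly. The sketch below reseeds the global generator and compares two draws. It also shows `np.random.default_rng`, the Generator-based interface available in newer NumPy versions, which is not used elsewhere in this notebook and is included only as a pointer.

```python
import numpy as np

np.random.seed(0)
first = np.random.randint(0, 10, (3, 3))

np.random.seed(0)          # reset the generator to the same state
second = np.random.randint(0, 10, (3, 3))

print(np.array_equal(first, second))

# Newer NumPy code often uses an independent Generator object
# instead of the global seed; it is reproducible in the same way.
rng_a = np.random.default_rng(0)
rng_b = np.random.default_rng(0)
draw_a = rng_a.integers(0, 10, 5)
draw_b = rng_b.integers(0, 10, 5)
print(np.array_equal(draw_a, draw_b))
```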

## Exercises for Array Creation (13 Questions)

1\. Create an array of length 100 filled with all zeros that are integers.

In [51]:
# Your answer here
np.zeros(100, int)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

2\. Create a 10 x 10 array filled with all ones that are floating-point numbers.

In [54]:
# Your answer here
np.ones((10,10))

array([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]])

3\. Create a 10 x 10 array filled with all -1s that are integers.

In [55]:
# Your answer here
np.full((10, 10), -1)

array([[-1, -1, -1, -1, -1, -1, -1, -1, -1, -1],
       [-1, -1, -1, -1, -1, -1, -1, -1, -1, -1],
       [-1, -1, -1, -1, -1, -1, -1, -1, -1, -1],
       [-1, -1, -1, -1, -1, -1, -1, -1, -1, -1],
       [-1, -1, -1, -1, -1, -1, -1, -1, -1, -1],
       [-1, -1, -1, -1, -1, -1, -1, -1, -1, -1],
       [-1, -1, -1, -1, -1, -1, -1, -1, -1, -1],
       [-1, -1, -1, -1, -1, -1, -1, -1, -1, -1],
       [-1, -1, -1, -1, -1, -1, -1, -1, -1, -1],
       [-1, -1, -1, -1, -1, -1, -1, -1, -1, -1]])

4\. Create an array of two-digit integers. 

In [61]:
# Your answer here
np.full((10,100),11)


array([[11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11,
        11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11,
        11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11,
        11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11,
        11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11,
        11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11,
        11, 11, 11, 11],
       [11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11,
        11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11,
        11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11,
        11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11,
        11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11,
        11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11,
        11, 11, 11, 11],
       [11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11,
        11, 11

5\. Create an array of integers from -100 to 100 stepping by 10. 

In [63]:
# Your answer here
np.arange(-100,100,10)

array([-100,  -90,  -80,  -70,  -60,  -50,  -40,  -30,  -20,  -10,    0,
         10,   20,   30,   40,   50,   60,   70,   80,   90])

6\. Create an array of 10 evenly spaced numbers from -100 (inclusive) to 100 (inclusive).

In [64]:
# Your answer here
np.linspace(-100,100,10)

array([-100.        ,  -77.77777778,  -55.55555556,  -33.33333333,
        -11.11111111,   11.11111111,   33.33333333,   55.55555556,
         77.77777778,  100.        ])

7\. Create a 10 x 10 square array with integer 1's on the main diagonal and integer 0's elsewhere.

In [66]:
# Your answer here
np.eye(10, dtype=int)

array([[1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]])

8\. Set the random seed value to 999.

In [69]:
# Your answer here
np.random.seed(999)

9\. Create a 5 x 5 array of random numbers that follow a normal distribution with 70 being the mean and 4 being the standard deviation. 

In [73]:
# Your answer here
np.random.normal(70,4,(5,5))

array([[78.52021991, 70.28461689, 62.63343103, 68.58963721, 68.36438482],
       [72.23724068, 71.89605239, 63.18946653, 64.76685381, 71.69932695],
       [70.02667473, 68.11726689, 71.92208865, 74.26474521, 74.62527365],
       [69.99036099, 70.27697938, 73.38809627, 72.64673741, 67.16706496],
       [68.00557826, 72.4195029 , 71.20217253, 68.99508533, 72.22444032]])

10\. Create an array of 100 random numbers that follow a binomial distribution with 10 trials and 0.5 probability of success. You may need to refer to the documentation for the usage of the <b>random.binomial</b> function. 

In [76]:
# Your answer here
np.random.binomial(10,.5,100)

array([8, 6, 2, 6, 5, 8, 6, 4, 8, 7, 5, 6, 9, 5, 5, 5, 7, 5, 5, 6, 4, 2,
       7, 4, 4, 4, 7, 4, 8, 6, 5, 5, 6, 4, 6, 5, 7, 4, 8, 3, 7, 5, 7, 6,
       4, 5, 5, 7, 6, 8, 4, 3, 6, 4, 6, 6, 5, 6, 8, 4, 6, 5, 5, 3, 6, 7,
       4, 8, 4, 3, 3, 6, 5, 3, 5, 6, 7, 5, 4, 4, 6, 5, 3, 8, 4, 5, 6, 7,
       5, 4, 4, 3, 4, 6, 8, 3, 3, 6, 6, 8])

11\. Create a 10 x 10 array of three-digit random integers.

In [77]:
# Your answer here
np.random.randint(100, 1000, (10, 10))   # high is exclusive, so 1000 is needed to allow 999

array([[797, 199, 969, 890, 138, 503, 490, 480, 436, 274],
       [677, 910, 124, 742, 171, 248, 719, 519, 523, 158],
       [521, 193, 160, 953, 115, 902, 113, 484, 410, 427],
       [384, 173, 754, 580, 847, 214, 686, 277, 202, 694],
       [788, 942, 301, 557, 884, 965, 216, 139, 366, 259],
       [976, 942, 692, 778, 874, 920, 730, 794, 361, 614],
       [395, 542, 355, 947, 468, 653, 129, 664, 469, 912],
       [770, 473, 585, 926, 579, 693, 677, 222, 750, 465],
       [201, 594, 682, 226, 310, 124, 176, 545, 159, 986],
       [244, 323, 357, 295, 209, 639, 116, 214, 438, 529]])

12\. Create an array of 10 random integers from 1 to 100 without replacement.

In [87]:
# Your answer here
np.random.choice(np.arange(1, 101), 10, replace=False)   # stop is exclusive, so 101 is needed to allow 100

array([21, 43, 70, 12, 19, 91, 13, 87, 18, 20])

13\. Create a permuted array of multiples of 3 between 1 and 100.

In [90]:
# Your answer here
np.random.permutation(np.arange(3,100,3))

array([33, 75, 84, 21, 57, 15, 66, 78,  9, 87, 51, 93, 69, 99, 24, 18, 96,
       12, 81, 90, 27,  3, 48, 60, 54, 63,  6, 36, 72, 42, 45, 30, 39])

## NumPy Array Attributes

In [3]:
x = np.random.randint(0, 100, (3,3))
x

array([[63, 46, 80],
       [44, 62, 23],
       [58, 84, 97]])

In [4]:
x.ndim 

2

The <b>ndim</b> attribute returns the number of dimensions of an array.

In [5]:
x.shape

(3, 3)

The <b>shape</b> attribute returns the shape of an array.

In [6]:
x.size

9

The <b>size</b> attribute returns the number of all elements in an array.

In [7]:
x.dtype

dtype('int64')

The <b>dtype</b> attribute returns the data type of an array.

NumPy data types: https://docs.scipy.org/doc/numpy/user/basics.types.html

Note that the <b>ndim</b>, <b>shape</b>, <b>size</b>, and <b>dtype</b> are attributes, not methods, of NumPy arrays. There are no parentheses after them.

In [8]:
len(x)

3

The length of an array is the size of its first dimension, which for a 2D array is the number of rows.
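As a quick illustration of how `len` relates to the attributes above, here is a sketch using an assumed 3 x 5 array rather than the random one from the cells above:

```python
import numpy as np

x = np.zeros((3, 5))

print(len(x))        # size of the first axis (the rows)
print(x.shape[0])    # same value as len(x)
print(x.size)        # total number of elements (rows * columns)
```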

## Array Indexing & Slicing

### 1-dimensional Arrays

In [9]:
np.random.seed(0)
x = np.random.randint(10, 100, 10)
x

array([54, 57, 74, 77, 77, 19, 93, 31, 46, 97])

In [10]:
x[0]

54

In [11]:
x[-1]

97

In [12]:
x[:]

array([54, 57, 74, 77, 77, 19, 93, 31, 46, 97])

x[:] selects every element of x, so it shows the same values as x itself.

In [13]:
x[::2]

array([54, 74, 77, 93, 46])

The third value in a slice (the step) determines the spacing between the selected values. 

In [14]:
x[::-1] 

array([97, 46, 31, 93, 19, 77, 77, 74, 57, 54])

x[::-1] is equivalent to the reverse of x.

In [15]:
np.array(list(reversed(x)))

array([97, 46, 31, 93, 19, 77, 77, 74, 57, 54])

### 2-dimensional Arrays

In [16]:
np.random.seed(0)
x = np.random.randint(0, 100, (5, 5))
x

array([[44, 47, 64, 67, 67],
       [ 9, 83, 21, 36, 87],
       [70, 88, 88, 12, 58],
       [65, 39, 87, 46, 88],
       [81, 37, 25, 77, 72]])

In [17]:
x[0]

array([44, 47, 64, 67, 67])

In [18]:
x[:3]

array([[44, 47, 64, 67, 67],
       [ 9, 83, 21, 36, 87],
       [70, 88, 88, 12, 58]])

`x[:n]` is the easiest way to retrieve the first n rows.

In [19]:
x[0, 1]

47

When retrieving a particular value in a 2D array, look up the row index first, followed by the column index.

In [20]:
x[:3, :3]

array([[44, 47, 64],
       [ 9, 83, 21],
       [70, 88, 88]])

`x[:n, :n]` is the easiest way to retrieve the values that are in the first n rows and n columns.

In [21]:
x[:, :3]

array([[44, 47, 64],
       [ 9, 83, 21],
       [70, 88, 88],
       [65, 39, 87],
       [81, 37, 25]])

`x[:, :n]` is the easiest way to retrieve the first n columns.

### Boolean indexing

In [22]:
np.random.seed(0)
data = np.random.normal(75, 10, (7, 3))
data

array([[92.64052346, 79.00157208, 84.78737984],
       [97.40893199, 93.6755799 , 65.2272212 ],
       [84.50088418, 73.48642792, 73.96781148],
       [79.10598502, 76.44043571, 89.54273507],
       [82.61037725, 76.21675016, 79.43863233],
       [78.33674327, 89.94079073, 72.94841736],
       [78.13067702, 66.45904261, 49.47010184]])

In [23]:
data>90

array([[ True, False, False],
       [ True,  True, False],
       [False, False, False],
       [False, False, False],
       [False, False, False],
       [False, False, False],
       [False, False, False]])

In [24]:
np.nonzero(data > 90)

(array([0, 1, 1]), array([0, 0, 1]))

Comparing <i>data</i> with the number 90 yields a Boolean array of element-wise answers, and <b>np.nonzero</b> returns the row indices and column indices of its `True` entries.

In [25]:
data[data>90]     # Returns all the values in data that are greater than 90.

array([92.64052346, 97.40893199, 93.6755799 ])

This Boolean array can be passed when indexing the array. This is called Boolean indexing. The Boolean array must be of the same length as the axis it is indexing.

In [26]:
l = list(data)
l[l > 90]

TypeError: '>' not supported between instances of 'list' and 'int'

Note that primitive Python lists do not support this kind of Boolean indexing. 

In [27]:
names = np.array(['Bob', 'Alice', 'Sam', 'Bob', 'Sam', 'Alice', 'Alice'])
names

array(['Bob', 'Alice', 'Sam', 'Bob', 'Sam', 'Alice', 'Alice'], dtype='<U5')

Suppose each name in <i>names</i> corresponds to a row in <i>data</i>.

In [28]:
names == "Bob"

array([ True, False, False,  True, False, False, False])

Comparing <i>names</i> with the string "Bob" yields a Boolean array of element-wise answers.

In [29]:
data[names == "Bob"]    # Returns all the values in data that correspond to Bob.

array([[92.64052346, 79.00157208, 84.78737984],
       [79.10598502, 76.44043571, 89.54273507]])

This Boolean array can be passed when indexing the array.

In [None]:
data[(names == 'Bob') | (names == 'Alice')]   # Returns all the values in data that correspond to Bob or Alice.

In [None]:
data[names != "Alice"]   # Returns all the values in data that do not correspond to Alice.

In [None]:
data[~(names == "Alice")]

The <b>~</b> operator negates the whole Boolean array. 

In [None]:
mask = (names == 'Bob') | (names == 'Alice')
mask

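To combine multiple Boolean conditions, for example to select two of the three names, use the element-wise Boolean operators & (and) and | (or), wrap each condition in parentheses, and assign the resulting Boolean array to a variable, say <i>mask</i>. The Python keywords <b>and</b> and <b>or</b> do not work with Boolean arrays.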

In [None]:
data[mask]

Then you can pass <i>mask</i> for indexing <i>data</i>.

Note that selecting data from an array by Boolean indexing always creates a copy of the data.
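A minimal sketch contrasting the copy made by Boolean indexing with the view returned by basic slicing. The array and variable names are illustrative, and `np.shares_memory` is used only to make the difference visible.

```python
import numpy as np

x = np.arange(10)

view = x[:3]           # basic slicing returns a view on the same data
view[0] = 99           # so this write is visible through x

picked = x[x > 5]      # Boolean indexing returns a new array (a copy)
picked[0] = -1         # so this write leaves x untouched

print(x)
print(np.shares_memory(x, view), np.shares_memory(x, picked))
```

Assigning through a Boolean index on the left-hand side, as in `x[x > 5] = 0`, still modifies `x` in place. The copy is made only when the selection is read.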

In [None]:
data[data < 80] = 0     # Sets all of the values less than 80 to 0.

Setting values with boolean arrays works in a common-sense way.

In [None]:
data

In [None]:
data[names == "Sam"] = 100

Setting whole rows or columns using a 1D boolean array is also easy.

In [None]:
data

### Fancy Indexing

Fancy indexing is a term coined by NumPy to describe indexing using integer arrays.

In [None]:
x = np.array([num * 10 for num in range(10)])
x

In [None]:
mask = [0, 7, 1]
x[mask]

To select out a subset of the rows in a particular order, you can simply pass a list or array of integers, or index numbers, specifying the desired order.

In [None]:
mask = [-1, -5, -3]
x[mask]

You can always use negative index numbers. 

In [None]:
y = np.empty((10, 3))
for i in range(len(y)):
    y[i] = 10 * i
y

In [None]:
mask = [0, 7, 1]
y[mask]

In [None]:
mask = [-1, -5, -3]
y[mask]
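Fancy indexing also works on both axes of a 2D array. The sketch below uses an assumed 4 x 3 array. Note that passing two index lists picks individual elements pairwise, while `np.ix_` selects the block formed by the chosen rows and columns.

```python
import numpy as np

y = np.arange(12).reshape(4, 3)
# [[ 0  1  2]
#  [ 3  4  5]
#  [ 6  7  8]
#  [ 9 10 11]]

rows = y[[3, 0]]                     # rows 3 and 0, in that order
pairs = y[[3, 0], [2, 1]]            # elements at (3, 2) and (0, 1)
block = y[np.ix_([3, 0], [2, 1])]    # rows 3, 0 crossed with columns 2, 1

print(rows)
print(pairs)
print(block)
```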

## Exercises for Indexing and Slicing Arrays (15 Questions)

In [None]:
np.random.seed(seed=0)
x = np.random.randint(1, 101, 10)
x

1\. Get the first 5 values from <i>x</i>.

In [None]:
# Your answer here


2\. Get the last 5 values from <i>x</i>. 

In [None]:
# Your answer here


3\. Get the values from <i>x</i> stepping by 3.

In [None]:
# Your answer here


4\. Get the reverse of <i>x</i>. 

In [None]:
# Your answer here


In [None]:
np.random.seed(seed=0)
y = np.random.randint(100, 1000, (10, 10))
y

5\. Get the shape of <i>y</i>.

In [None]:
# Your answer here


6\. Get the number of all elements in <i>y</i>.

In [None]:
# Your answer here


7\. Get the first 5 rows. 

In [None]:
# Your answer here


8\. Get the first 5 columns.

In [None]:
# Your answer here


9\. Get the values that are in the last 5 rows and the last 5 columns.

In [None]:
# Your answer here


10\. Get the element on the third row (i.e., row index number 2) and the fifth column (i.e., column index number 4) of y.

In [None]:
# Your answer here


In [None]:
np.random.seed(seed=0)
z = np.random.randint(-100, 101, 20)
z

11\. Get the values of <i>z</i> that are negative.

In [None]:
# Your answer here


12\. Get the values of <i>z</i> that are greater than or equal to 50 or less than or equal to -50.

In [None]:
# Your answer here


In [None]:
np.random.seed(seed=0)
data = np.random.randint(-100, 101, (5, 10))
data

In [None]:
labels = np.array(["c", "b", "c", "a", "b"])

13\. Suppose each label in <i>labels</i> corresponds to a row in <i>data</i>. Get all the rows in <i>data</i> with the corresponding labels "a" and "b".

In [None]:
# Your answer here


14\. Set all values in the rows of <i>data</i> that correspond to the label "c" to 0.

In [None]:
# Your answer here


15\. Get the rows in <i>data</i> at the index positions 4, 0, and 2 (order is important). 

In [None]:
# Your answer here


## Array Concatenation and Splitting

In [None]:
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

In [None]:
np.concatenate([x, y])

In [None]:
np.concatenate((x, y),axis=0)

numpy.concatenate: https://docs.scipy.org/doc/numpy/reference/generated/numpy.concatenate.html

The <b>concatenate((a1, a2, ...), axis=0, out=None)</b> joins a sequence of arrays along an existing axis.

In [None]:
from IPython.display import Image
Image(url="https://i.stack.imgur.com/DL0iQ.jpg")

In NumPy and pandas, axis 0 refers to the row axis, while axis 1 refers to the column axis.

In [None]:
x = np.array([[1, 2, 3], [4, 5, 6]])
x

In [None]:
np.concatenate([x, x], axis=0)

Concatenating two 2D arrays along the row axis means placing the second array below the bottom of the first array.

In [None]:
np.concatenate([x, x], axis=1)

Concatenating two 2D arrays along the column axis means placing the second array to the right of the first array.

In [None]:
x = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

np.split(x, [3, 4])       # Split x with 3 and 4 being the indices of split points.

numpy.split: https://docs.scipy.org/doc/numpy/reference/generated/numpy.split.html

The <b>split(ary, indices_or_sections, axis=0)</b> function splits an array into multiple sub-arrays. Splitting is the opposite of concatenation.
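To make the role of the split indices concrete, here is a small sketch with assumed split points 3 and 5. It also checks that concatenating the pieces restores the original array.

```python
import numpy as np

x = np.arange(10, 101, 10)        # 10, 20, ..., 100

# Split points 3 and 5 give the pieces x[:3], x[3:5], and x[5:].
first, middle, last = np.split(x, [3, 5])

# An integer asks for that many equal-sized pieces instead.
halves = np.split(x, 2)

# Concatenating the pieces restores the original array.
restored = np.concatenate([first, middle, last])

print(first, middle, last)
print(halves)
```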

## Operations between Arrays and Scalars

NumPy arrays enable you to express batch operations on data without writing any <b>for</b> loops. This is usually called <b><i>vectorization</i></b>. Any arithmetic operation between equal-size arrays is applied element-wise.

Vectorization __generally__ results in faster execution, because an operation is applied to the whole array at once instead of looping over single elements in Python.

It is tempting to use vectorization with NumPy all the time for its concise syntax and speed. Sometimes, those benefits come at the cost of __higher memory utilization__, because intermediate results are stored as whole arrays.
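The trade-off can be sketched with an assumed array of 100,000 floats. Both versions below compute the same values; the vectorized one avoids the Python-level loop but allocates a temporary array for the intermediate product. No timings are claimed here; you can measure them yourself with `%timeit` in a notebook.

```python
import numpy as np

values = np.arange(100_000, dtype=float)

# Loop version: one Python-level operation per element.
looped = np.empty_like(values)
for i in range(len(values)):
    looped[i] = values[i] * 2 + 1

# Vectorized version: the loop runs inside NumPy's compiled code.
# values * 2 creates a temporary array before + 1 is applied,
# which is where the extra memory use comes from.
vectorized = values * 2 + 1

print(np.array_equal(looped, vectorized))
```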

In [None]:
x = np.array([1, 2, 3, 4, 5])

In [None]:
x + 1

In [None]:
l = [1, 2, 3, 4, 5]
l + 1

Note that built-in Python lists do not support element-wise arithmetic; `l + 1` raises a TypeError.

In [None]:
[num + 1 for num in l]

You need a <b>for</b> loop or list comprehension to do the same thing with a primitive Python list.

In [None]:
x * 2

In [None]:
1 / x

In [None]:
x = np.array([-3, -2, -1, 0, 1, 2, 3])

In [None]:
x ** 2

In [None]:
-x

In [104]:
x = np.array([1, 2, 3])
y = np.array([1, 3, 5])

In [105]:
x + y

array([2, 5, 8])

In [None]:
x * y

In [None]:
x = np.array([1, 2, 3, 4, 5])
y = np.array([1, 3, 5])
x + y

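The two operands must have the same shape (or shapes that NumPy can broadcast to a common shape, as with a scalar) for addition, subtraction, multiplication, and division. Here the shapes (5,) and (3,) are incompatible, so `x + y` raises a ValueError.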
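A brief sketch of the shape rule, using assumed arrays `a` and `b`: mismatched 1D shapes raise an error, whereas shapes that differ only in size-1 dimensions are stretched to match, which is the same mechanism that lets an array be combined with a scalar.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([1, 3, 5])

try:
    a + b                       # shapes (5,) and (3,) do not match
    mismatch_failed = False
except ValueError as err:
    mismatch_failed = True
    print("ValueError:", err)

# Shapes (3, 1) and (5,) are compatible: NumPy stretches both to (3, 5).
table = b.reshape(3, 1) + a
print(table.shape)
print(table)
```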

## Fast Element-wise Array Functions

Mathematical functions: https://docs.scipy.org/doc/numpy/reference/routines.math.html

In [None]:
np.absolute(x)      # absolute values

numpy.absolute: https://docs.scipy.org/doc/numpy/reference/generated/numpy.absolute.html

In [None]:
np.exp(x)            # exponential (= e^x)

numpy.exp: https://docs.scipy.org/doc/numpy/reference/generated/numpy.exp.html

In [None]:
x = [1, 2, 3]
np.power(3, x)        # power (= 3^x)

numpy.power: https://docs.scipy.org/doc/numpy/reference/generated/numpy.power.html

In [None]:
x = [1, 2, 4, 10]
np.log(x)             # ln(x)

numpy.log: https://docs.scipy.org/doc/numpy/reference/generated/numpy.log.html

In [None]:
x = [1, 2, 4, 10]
np.log2(x)            # log2(x)

numpy.log2: https://docs.scipy.org/doc/numpy/reference/generated/numpy.log2.html

In [None]:
x = [1, 2, 4, 10]
np.log10(x)           # log10(x)

numpy.log10: https://docs.scipy.org/doc/numpy/reference/generated/numpy.log10.html

In [None]:
x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])
np.dot(x, y)          # dot product of two arrays

numpy.dot: https://docs.scipy.org/doc/numpy/reference/generated/numpy.dot.html
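For 2D arrays, <b>np.dot</b> computes the matrix product, which is different from the element-wise `*` operator. The sketch below reuses the same two arrays and also shows the `@` operator, which gives the same result as <b>np.dot</b> for 2D inputs; the result variable names are illustrative.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])

matrix_product = np.dot(x, y)   # rows of x combined with columns of y
same_product = x @ y            # the @ operator, equivalent for 2D arrays
elementwise = x * y             # multiplies matching entries instead

print(matrix_product)
print(elementwise)
```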

In [None]:
a = np.arange(15).reshape(3,5)
b = np.arange(5).reshape(5,1)

print(a.shape)
a

In [None]:
print(b.shape)
b

In [None]:
print(a.dot(b).shape)
a.dot(b)

## Array Methods

In [None]:
np.random.seed(0)
x = np.random.randint(0, 50, 10)
x

In [None]:
x.sum()

numpy.ndarray.sum: https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.sum.html

In [None]:
x.cumsum()

numpy.ndarray.cumsum: https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.cumsum.html

The <b>cumsum(axis=None, dtype=None, out=None)</b> returns the cumulative sum of the elements along the given axis.

In [None]:
x.prod()

numpy.ndarray.prod: https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.prod.html

In [None]:
x.cumprod()

numpy.ndarray.cumprod: https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.cumprod.html

The <b>cumprod(axis=None, dtype=None, out=None)</b> returns the cumulative product of the elements along the given axis.

In [None]:
x.mean()

numpy.ndarray.mean: https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.mean.html

In [None]:
x.var()

numpy.ndarray.var: https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.var.html

In [None]:
x.std()

numpy.ndarray.std: https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.std.html

In [None]:
x.min()

numpy.ndarray.min: https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.min.html

In [None]:
x.max()

numpy.ndarray.max: https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.max.html

In [None]:
x.argmin()

numpy.ndarray.argmin: https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.argmin.html

The <b>argmin(axis=None, out=None)</b> returns indices of the minimum values along the given axis.

In [None]:
x.argmax()

numpy.ndarray.argmax: https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.argmax.html

The <b>argmax(axis=None, out=None)</b> returns indices of the maximum values along the given axis.
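On a multidimensional array, <b>argmax</b> without an `axis` returns an index into the flattened array. The sketch below, with an assumed 2 x 3 array, shows how `np.unravel_index` converts that flat index to a (row, column) pair and how the `axis` parameter changes the result.

```python
import numpy as np

y = np.array([[3, 9, 1],
              [7, 2, 8]])

flat_index = y.argmax()                            # index into the flattened array
row, col = np.unravel_index(flat_index, y.shape)   # convert to (row, column)
per_column = y.argmax(axis=0)                      # row index of the max in each column
per_row = y.argmax(axis=1)                         # column index of the max in each row

print(flat_index, (row, col))
print(per_column, per_row)
```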

In [None]:
x = np.arange(15)
x

In [None]:
y = x.reshape((3, 5))
y

numpy.ndarray.reshape: https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.reshape.html

The <b>reshape(shape, ...)</b> method returns an array containing the same data with a new shape.

In [None]:
y.transpose()

numpy.ndarray.transpose: https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.transpose.html

The <b>transpose(*axes)</b> returns a view of the array with axes transposed.

In [None]:
y.flatten()

numpy.ndarray.flatten: https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.flatten.html

The <b>flatten(...)</b> method returns a copy of the array collapsed into one dimension. <b>flatten</b> is the opposite of <b>reshape</b>.
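The view-versus-copy distinction mentioned above can be sketched as follows, using an assumed small array. For a contiguous array such as this one, <b>reshape</b> and <b>transpose</b> share memory with the original, while <b>flatten</b> does not.

```python
import numpy as np

x = np.arange(6)
y = x.reshape((2, 3))       # a view here, because x is contiguous
t = y.transpose()           # transpose also returns a view
flat = y.flatten()          # flatten always returns a copy

flat[0] = 100               # does not affect x or y
y[0, 0] = -1                # visible through x and t

print(x)
print(t)
print(flat)
```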

In [None]:
x.astype(float)

numpy.ndarray.astype: https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.astype.html

The <b>astype(dtype, ...)</b> method returns a copy of the array, cast to a specified type.
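Two details of casting are worth seeing once, sketched here with assumed example values: converting floats to integers drops the fractional part rather than rounding, and the original array is left unchanged.

```python
import numpy as np

prices = np.array([1.9, -1.9, 2.5])

as_int = prices.astype(int)     # fractional parts are dropped (truncation toward zero)
as_text_numbers = np.array(["1", "2", "3"]).astype(int)   # numeric strings are parsed

print(as_int)
print(as_text_numbers)
print(prices)                   # the original array is unchanged
```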

## Array Comparisons

Comparison of arrays yields an array of answers.

In [None]:
x = np.array([1, 2, 3, 4, 5,1,3])
x

In [None]:
x < 3

In [None]:
x == 3

In [None]:
(x < 3) | (x == 3)       # or

In [None]:
(x < 3) & (x == 3)        # and

## Sorting Arrays

In [None]:
np.random.seed(0)
x = np.random.choice(10, 5, replace=False)
x

In [None]:
np.sort(x)

numpy.sort: https://docs.scipy.org/doc/numpy/reference/generated/numpy.sort.html

In [None]:
np.sort(x)[::-1]

There is no parameter like `reverse` in the <b>sort</b> function. 

In [None]:
x

Note that <i>x</i> has not changed when the <b>sort</b> function is used.

In [None]:
x.sort()

In [None]:
x

Note that <i>x</i> has changed when the <b>sort</b> method is used.

In [None]:
np.random.seed(0)
x = np.random.choice(10, 5, replace=False)
x

In [None]:
np.sort(x)

In [None]:
np.argsort(x)

numpy.argsort: https://docs.scipy.org/doc/numpy/reference/generated/numpy.argsort.html

The <b>argsort(a, axis=-1, kind='quicksort', ...)</b> function returns the indices that would sort the array, instead of the sorted elements themselves.
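A typical use of <b>argsort</b> is to reorder one array by the values of another. The sketch below uses assumed arrays `x` and `names`. Indexing `x` with the returned indices reproduces the sorted array, and reversing the indices gives descending order.

```python
import numpy as np

x = np.array([40, 10, 30, 20])

order = np.argsort(x)             # positions of the elements in ascending order
ascending = x[order]              # same values as np.sort(x)
descending = x[order[::-1]]       # reversed order for a descending sort

names = np.array(["d", "a", "c", "b"])
names_by_x = names[order]         # reorder a second array by the same key

print(order)
print(ascending, descending)
print(names_by_x)
```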

## Exercises for Array Functions and Methods (7 Questions)

1\. Create a 5 x 5 array <i>x</i> of integers from 1 to 25. Order is important. For example, the first row of <i>x</i> has integers from 1 to 5, the second row from 6 to 10, and so on and so forth. 

In [None]:
# Your answer here


2\. Get a copy of <i>x</i> collapsed into one dimension.

In [None]:
# Your answer here


3\. Get the min value in <i>x</i>.

In [None]:
# Your answer here


4\. Get the index number of the max value in <i>x</i>.

In [None]:
# Your answer here


5\. Negate <i>x</i> and assign the result to <i>y</i>.

In [None]:
# Your answer here


6\. Get the element-wise addition of <i>x</i> and <i>y</i>.

In [None]:
# Your answer here


7\. Get the dot product of <i>x</i> and <i>y</i>.

In [None]:
# Your answer here
